CN109871463B - Audio processing method, device, electronic equipment and storage medium

Info

Publication number: CN109871463B
Application number: CN201910168211.9A
Authority: CN
Inventors: 孔令城
Original assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Current assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date: 2019-03-06
Filing date: 2019-03-06
Publication date: 2024-04-09
Anticipated expiration: 2039-03-06
Also published as: CN109871463A

Abstract

The embodiment of the invention discloses an audio processing method, an audio processing device, electronic equipment and a storage medium. The method comprises the following steps: extracting an audio fingerprint of a target audio, and obtaining an inverted index table, wherein the inverted index table comprises the target audio and fingerprint information of the target audio; acquiring the fingerprint information representativeness of the target audio according to the fingerprint information of the target audio; and if the fingerprint information representativeness of the target audio is lower than expected, deleting the fingerprint information of the target audio from the inverted index table. By screening the data in the inverted index table, memory consumption can be reduced and retrieval efficiency can be improved.

Description

Audio processing method, device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of multimedia data technology, and in particular, to an audio processing method, an audio processing apparatus, an electronic device, and a storage medium.

Background

With the development of the internet and of audio fingerprint technology, an audio retrieval mode based on audio fingerprints has emerged. This retrieval mode only needs to extract the audio fingerprint from the audio segment input by the user and compare it with the audio fingerprints in an inverted index table, which records the mapping relation between audio fingerprints and audios; relevant audio can then be retrieved according to the comparison result. Because the user does not need to manually input text, audio can be retrieved more conveniently, and this retrieval mode is favored by more and more people. In practice, it is found that if the inverted index table includes too many audio fingerprints, more audio can be retrieved for the user, but the workload of secondary filtering increases and more memory is consumed to store the inverted index table. Secondary filtering means that, after a plurality of relevant audios have been retrieved according to the fingerprints of the audio segment input by the user, the one audio needed by the user must be screened out from the retrieved audios. If the inverted index table includes too few audio fingerprints, the workload of secondary filtering and the memory consumption are both reduced, but less audio can be retrieved for the user. Thus, the inverted index table is a key factor affecting retrieval performance.

Disclosure of Invention

The technical problem to be solved by the embodiment of the invention is to provide an audio processing method, an audio processing device, electronic equipment and a storage medium, which can reduce memory consumption and improve retrieval efficiency by screening data in an inverted index table.

In one aspect, an embodiment of the present invention provides an audio processing method, including:

extracting an audio fingerprint of the target audio;

acquiring an inverted index table, wherein the inverted index table comprises the target audio and fingerprint information of the target audio, and the fingerprint information of the target audio is a hash value of an audio fingerprint of the target audio;

acquiring the fingerprint information representativeness of the target audio according to the fingerprint information of the target audio, wherein the fingerprint information representativeness of the target audio is the inverse text frequency of the fingerprint information of the target audio, and the inverse text frequency is inversely proportional to the number of matched audio;

and if the representative degree of the fingerprint information of the target audio is lower than expected, deleting the fingerprint information of the target audio from the inverted index table.

In one aspect, an embodiment of the present invention provides an audio processing apparatus, including:

an extraction unit for extracting an audio fingerprint of the target audio;

the acquisition unit is used for acquiring an inverted index table, wherein the inverted index table comprises the target audio and fingerprint information of the target audio, and the fingerprint information of the target audio is a hash value of an audio fingerprint of the target audio; acquiring the fingerprint information representativeness of the target audio according to the fingerprint information of the target audio, wherein the fingerprint information representativeness of the target audio is the inverse text frequency of the fingerprint information of the target audio, and the inverse text frequency is inversely proportional to the number of matched audio;

and the deleting unit is used for deleting the fingerprint information of the target audio from the inverted index table if the fingerprint information representativeness of the target audio is lower than expected.

In one aspect, an embodiment of the present invention provides an electronic device, including: a processor and a storage device;

the storage device stores computer program instructions, and the processor invokes the computer program instructions for performing the steps of:

extracting an audio fingerprint of the target audio;

In one aspect, embodiments of the present invention provide a computer readable storage medium storing computer program instructions that, when executed, cause a computer to perform a method comprising:

extracting an audio fingerprint of the target audio;

In the embodiment of the invention, the audio fingerprint of the target audio is extracted, the representative degree of the fingerprint information of the target audio is obtained, and when the representative degree of the fingerprint information of the target audio is lower than the expected degree, the fingerprint information of the target audio is deleted from the inverted index table; the fingerprint information in the inverted index table can be screened through the fingerprint information representativeness of the target audio so as to retain the fingerprint information with higher representativeness and delete the fingerprint information with lower representativeness, so that the storage space can be saved, the resource consumption of the electronic equipment can be reduced, and the inverted index table is more simplified. Meanwhile, the retrieval performance of the fingerprint information with low representativeness is poor, so that the retrieval performance is not affected by deleting the fingerprint information; instead, the retrieval performance is higher by retrieving the fingerprint information with higher representativeness, so that the workload of secondary filtering can be reduced.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of an audio processing method according to the present invention;

FIG. 2 is a flow chart of another audio processing method provided by the present invention;

FIG. 3 is a schematic structural diagram of an audio processing device according to the present invention;

FIG. 4 is a schematic structural diagram of an electronic device according to the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In the prior art, in order to be able to retrieve more audio, all fingerprint information of each audio is usually added to the inverted index table, which puts heavy storage pressure on the storage device. For example, about 1 GB of storage space is typically required for 10,000 songs, and up to several TB of storage space is required for a song library on the scale of tens of millions of songs. Based on this, an embodiment of the present invention provides an audio processing method. Referring to FIG. 1, the method may be applied to an electronic device, where the electronic device may be a smart phone, a smart watch, a tablet computer, a server, or another device, and the method may include steps S101 to S104.

S101, extracting an audio fingerprint of the target audio.

The electronic device may perform time-frequency conversion and the like on the target audio to obtain an audio fingerprint of the target audio. The audio fingerprint refers to the characteristic information of the target audio.

S102, acquiring an inverted index table, wherein the inverted index table comprises target audio and fingerprint information of the target audio, and the fingerprint information of the target audio is a hash value of an audio fingerprint of the target audio.

In order to efficiently retrieve the audio required by the user, the electronic device may include an inverted index table, which corresponds to an audio database in the electronic device. The inverted index table comprises a plurality of audios in the audio database and the fingerprint information of each audio, and the target audio can be any one of the plurality of audios. The target audio may have a plurality of pieces of fingerprint information, and the fingerprint information of the target audio referred to in the embodiment of the present invention may be any one of them. The fingerprint information of the target audio is a hash value of an audio fingerprint of the target audio; that is, the electronic device may perform a hash operation on the audio fingerprint of the target audio to obtain the fingerprint information of the target audio.
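The relationship between audio fingerprints, their hash values and the inverted index table can be illustrated with a minimal sketch. The choice of SHA-1, the tuple representation of a fingerprint, and the names `fingerprint_hash`, `build_inverted_index` and `library` are assumptions made for illustration; the description does not specify a hash algorithm or a data layout.

```python
import hashlib
from collections import defaultdict


def fingerprint_hash(fingerprint):
    """Hash one audio fingerprint (assumed here to be a tuple of ints)."""
    data = ",".join(str(value) for value in fingerprint).encode("utf-8")
    # SHA-1 is an illustrative choice; any stable hash would serve.
    return hashlib.sha1(data).hexdigest()[:16]


def build_inverted_index(library):
    """Map each fingerprint hash to the set of audio IDs that contain it."""
    index = defaultdict(set)
    for audio_id, fingerprints in library.items():
        for fingerprint in fingerprints:
            index[fingerprint_hash(fingerprint)].add(audio_id)
    return index


# Hypothetical library: two audios that share one fingerprint.
library = {
    "audio_1": [(310, 2), (455, 16)],
    "audio_2": [(310, 2)],
}
index = build_inverted_index(library)
shared = fingerprint_hash((310, 2))
unique = fingerprint_hash((455, 16))
```

In this sketch `index[shared]` contains both audios, while `index[unique]` contains only `audio_1`, which is the situation the representativeness measure in step S103 is meant to distinguish.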

S103, acquiring the fingerprint information representativeness of the target audio according to the fingerprint information of the target audio, wherein the fingerprint information representativeness of the target audio is the inverse text frequency of the fingerprint information of the target audio, and the inverse text frequency is inversely proportional to the number of the matched audio.

The fingerprint information representativeness of the target audio describes how unique the fingerprint information of the target audio is. It may refer to the inverse text frequency of the fingerprint information of the target audio and/or the number of matches of the fingerprint information of the target audio. The inverse text frequency of the fingerprint information of the target audio is inversely proportional to the first audio number, that is, the number of audios in the inverted index table whose fingerprint information matches the fingerprint information of the target audio. If fewer audios in the inverted index table have fingerprint information that matches the fingerprint information of the target audio, the inverse text frequency of the fingerprint information of the target audio is higher, which indicates that the fingerprint information of the target audio is more unique and therefore more representative. If the target audio is retrieved using this fingerprint information, fewer audios are retrieved (namely, all audios whose fingerprint information matches the fingerprint information of the target audio), so the workload of secondary filtering is reduced.

Conversely, if more audios in the inverted index table have fingerprint information that matches the fingerprint information of the target audio, the inverse text frequency of the fingerprint information of the target audio is lower, which indicates that the fingerprint information of the target audio is less unique and therefore less representative. If the target audio is retrieved using this fingerprint information, more audios are retrieved (namely, all audios whose fingerprint information matches the fingerprint information of the target audio), so the workload of secondary filtering is increased.

For example, the fingerprint information of the target audio includes fingerprint information A and fingerprint information B, and the inverted index table includes 1000 audios. The fingerprint information of 100 audios (the first audio number is 100) matches fingerprint information A, and the fingerprint information of 10 audios (the first audio number is 10) matches fingerprint information B. If fingerprint information A is used to retrieve the target audio, the 100 audios whose fingerprint information matches fingerprint information A are retrieved, and secondary filtering must be performed on these 100 audios to screen out the target audio. If fingerprint information B is used to retrieve the target audio, the 10 audios whose fingerprint information matches fingerprint information B are retrieved, and secondary filtering must be performed on only these 10 audios to screen out the target audio. It can be seen that the representativeness of fingerprint information A is lower than that of fingerprint information B; that is, the inverse text frequency of fingerprint information A is lower than that of fingerprint information B, the workload of retrieving the target audio using fingerprint information A is higher than the workload using fingerprint information B, and the retrieval performance using fingerprint information A is lower than the retrieval performance using fingerprint information B.

The number of matches of the fingerprint information of the target audio refers to the number of times the fingerprint information of the target audio has matched the fingerprint information of an audio segment carried in a query instruction, namely the number of times users have queried with an audio segment corresponding to the fingerprint information of the target audio. The more matches there are, the higher the representativeness of the fingerprint information of the target audio; the fewer matches there are, the lower the representativeness.
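The numbers in this example can be worked through explicitly. The symbols M (total number of audios), V (first audio number) and f (inverse text frequency) follow formula (1) later in the description; the subscripts A and B are added here only for illustration.

```latex
\begin{align*}
\frac{M}{V_A} &= \frac{1000}{100} = 10, & f_A &= \log_{10} 10 = 1 \\
\frac{M}{V_B} &= \frac{1000}{10} = 100, & f_B &= \log_{10} 100 = 2
\end{align*}
```

Whether the plain ratio or its logarithm is used, fingerprint information B scores higher than fingerprint information A, which agrees with B being the more representative of the two.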

S104, if the representative degree of the fingerprint information of the target audio is lower than the expected degree, deleting the fingerprint information of the target audio from the inverted index table.

When the target audio is retrieved using the fingerprint information of the target audio, if the fingerprint information is highly representative, the target audio can be retrieved quickly, and the utilization value of the fingerprint information is high; if the fingerprint information is less representative, the workload of retrieving the target audio increases, the retrieval performance is reduced, and the utilization value of the fingerprint information is low. In order to make the inverted index table more compact, and thereby reduce the memory consumed for storing it, fingerprint information with low utilization value can be deleted from the inverted index table. Specifically, if the fingerprint information representativeness of the target audio is lower than expected, the electronic device may delete the target audio and the fingerprint information of the target audio from the inverted index table and release the storage space that held them, so that more fingerprint information with high utilization value can be stored and the retrieval performance is improved.

In the embodiment of the invention, the representativeness of the fingerprint information of the target audio is obtained, and when the representativeness of the fingerprint information of the target audio is lower than expected, the fingerprint information of the target audio is deleted from the inverted index table; the fingerprint information in the inverted index table can be screened through the fingerprint information representativeness of the target audio so as to retain the fingerprint information with higher representativeness and delete the fingerprint information with lower representativeness, so that the storage space can be saved, the resource consumption of the electronic equipment can be reduced, and the inverted index table is more simplified. Meanwhile, the retrieval performance of the fingerprint information with low representativeness is poor, so that the retrieval performance is not affected by deleting the fingerprint information; instead, the retrieval performance is higher by retrieving the fingerprint information with higher representativeness, so that the workload of secondary filtering can be reduced.

Referring to FIG. 2, FIG. 2 is a flow chart of another audio processing method according to an embodiment of the present invention. The method may be applied to an electronic device, and the electronic device may be a smart phone, a smart watch, a tablet computer, a server, or the like. The first audio number is the number of audios in the inverted index table whose fingerprint information matches the fingerprint information of the target audio; the inverted index table comprises a plurality of audios, and the target audio is any one of the plurality of audios. The method may include steps S201 to S206.

S201, extracting an audio fingerprint of the target audio.

In one embodiment, step S201 includes the following steps s21 to s23.

s21, performing time-frequency conversion on the target audio to obtain frequency domain information of the target audio.

s22, acquiring an energy matrix of the target audio according to the frequency domain information of the target audio.

s23, determining the audio fingerprint of the target audio according to the energy matrix of the target audio.

In steps s21 to s23, the electronic device may perform time-frequency conversion on the target audio by using a Fast Fourier Transform (FFT) algorithm to obtain the frequency domain information of the target audio, where the frequency domain information of the target audio describes a relationship between frequency and pitch. The energy matrix of the target audio can then be calculated from the frequency domain information, local maxima are detected in the energy matrix, and the detected local maximum energy values are used as the audio fingerprint of the target audio.
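Steps s21 to s23 can be sketched as follows. This is only an illustration under stated assumptions: it uses a direct discrete Fourier transform instead of an FFT so that it needs nothing beyond the standard library (the resulting spectrum is the same), it uses non-overlapping frames, and the names `dft_energy`, `energy_matrix`, `local_maxima` and the `min_energy` floor are hypothetical.

```python
import cmath
import math


def dft_energy(frame):
    """s21: energy |X[k]|^2 of the first half of the DFT bins of one frame."""
    n = len(frame)
    energies = []
    for k in range(n // 2):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        energies.append(abs(x_k) ** 2)
    return energies


def energy_matrix(samples, frame_size):
    """s22: one row of bin energies per non-overlapping frame."""
    return [dft_energy(samples[i:i + frame_size])
            for i in range(0, len(samples) - frame_size + 1, frame_size)]


def local_maxima(matrix, min_energy=1e-6):
    """s23: (frame, bin) cells whose energy exceeds every adjacent cell."""
    peaks = []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value < min_energy:  # skip numerical noise
                continue
            neighbours = [matrix[rr][cc]
                          for rr in range(max(0, r - 1), min(len(matrix), r + 2))
                          for cc in range(max(0, c - 1), min(len(row), c + 2))
                          if (rr, cc) != (r, c)]
            if all(value > other for other in neighbours):
                peaks.append((r, c))
    return peaks


# Synthetic signal: frame 0 is a pure tone in bin 3, frame 1 in bin 6.
FRAME = 16
samples = ([math.cos(2 * math.pi * 3 * t / FRAME) for t in range(FRAME)]
           + [math.cos(2 * math.pi * 6 * t / FRAME) for t in range(FRAME)])
peaks = local_maxima(energy_matrix(samples, FRAME))
```

For this synthetic input the detected peaks are the cells (frame 0, bin 3) and (frame 1, bin 6), which would then be hashed to obtain fingerprint information.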

S202, acquiring an inverted index table, wherein the inverted index table comprises target audio and fingerprint information of the target audio, and the fingerprint information of the target audio is a hash value of an audio fingerprint of the target audio.

The electronic device may obtain parameter information of the target audio, where the parameter information includes at least one of the energy, chromaticity, loudness, pitch and similar parameters of the target audio. The parameter information of the target audio is analyzed to obtain the audio fingerprint of the target audio; for example, if the parameter information of the target audio is pitch, the pitches in the target audio that are greater than a preset pitch are used as the audio fingerprint of the target audio. A hash value of the audio fingerprint of the target audio is then calculated by a hash algorithm, and the target audio and the hash value of its audio fingerprint are added to the inverted index table.

In one embodiment, the inverted index table further includes location information of the fingerprint information of the target audio in the target audio and/or a frequency of occurrence of the fingerprint information of the target audio in the target audio; the location information is the position in the target audio at which the fingerprint information occurs. For example, the inverted index table is shown in Table 1 and includes audio 1, audio 2 and audio 3. Audio 1 includes fingerprint information A and B; the position of fingerprint information A in audio 1 is 2 s, and its frequency of occurrence in audio 1 is 1; the position of fingerprint information B in audio 1 is 16 s, and its frequency of occurrence in audio 1 is 1. Audio 2 includes fingerprint information A; the position of fingerprint information A in audio 2 is 5 s, and its frequency of occurrence in audio 2 is 1. Audio 3 includes fingerprint information A; the position of fingerprint information A in audio 3 is 5 s, and its frequency of occurrence in audio 3 is 1. It can be seen that fingerprint information A of audio 1 is identical to the fingerprint information A of audio 1, audio 2 and audio 3 in the inverted index table, whereas fingerprint information B of audio 1 is identical only to the fingerprint information B of audio 1 in the inverted index table. Therefore, fingerprint information B of audio 1 is highly unique and highly representative, while fingerprint information A of audio 1 is weakly unique and weakly representative.

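TABLE 1

Audio     Fingerprint information   Position   Frequency of occurrence
Audio 1   A                         2 s        1
Audio 1   B                         16 s       1
Audio 2   A                         5 s        1
Audio 3   A                         5 s        1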
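One possible in-memory layout for the entries described for Table 1 is sketched below. The record fields and the names `ROWS`, `build_postings` and `matched_audio` are assumptions for illustration; the description only states which items the table may contain (audio, fingerprint information, position, frequency of occurrence).

```python
from collections import defaultdict

# (audio, fingerprint information, position in seconds, frequency of occurrence)
ROWS = [
    ("audio 1", "A", 2, 1),
    ("audio 1", "B", 16, 1),
    ("audio 2", "A", 5, 1),
    ("audio 3", "A", 5, 1),
]


def build_postings(rows):
    """Group the rows by fingerprint information, as an inverted index does."""
    postings = defaultdict(list)
    for audio, fingerprint, position, frequency in rows:
        postings[fingerprint].append(
            {"audio": audio, "position_s": position, "frequency": frequency}
        )
    return postings


postings = build_postings(ROWS)
# Audios matched by each piece of fingerprint information.
matched_audio = {
    fingerprint: sorted({entry["audio"] for entry in entries})
    for fingerprint, entries in postings.items()
}
```

Here fingerprint information A maps to all three audios and fingerprint information B maps only to audio 1, so the first audio numbers are 3 and 1 respectively.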

S203, counting the total number of the audios included in the inverted index table and the number of the audios of which the fingerprint information in the inverted index table is matched with the fingerprint information of the target audio.

S204, calculating the ratio between the total number of the audios and the number of the matched audios.

S205, determining the inverse text frequency of the fingerprint information of the target audio according to the ratio.

S206, deleting the fingerprint information of the target audio from the inverted index table if the inverse text frequency of the fingerprint information of the target audio is lower than expected. As shown in Table 1, if the target audio is audio 1, the fingerprint information is fingerprint information A, and the inverse text frequency of fingerprint information A is lower than expected, then fingerprint information A of audio 1 is deleted. Identical fingerprint information has the same inverse text frequency regardless of which audio it belongs to. Therefore, if the calculated inverse text frequency of the fingerprint information of the target audio is lower than expected, all fingerprint information in the inverted index table that is the same as the fingerprint information of the target audio is deleted.

In steps S203 to S206, the electronic device may filter the fingerprint information in the inverted index table by using the inverse text frequency of the fingerprint information. Specifically, the electronic device may count the total number of audios in the inverted index table (which may also be the total number of audios in the audio database) and count the first audio number, that is, the number of audios in the inverted index table whose fingerprint information matches the fingerprint information of the target audio. The ratio between the total number of audios and the first audio number is calculated, and the inverse text frequency of the fingerprint information of the target audio is determined according to the ratio. The larger the ratio, the fewer audios have fingerprint information matching the fingerprint information of the target audio, the larger the inverse text frequency of the fingerprint information of the target audio, and the higher its representativeness. The smaller the ratio, the more audios have fingerprint information matching the fingerprint information of the target audio, the smaller the inverse text frequency, and the lower the representativeness. Therefore, if the inverse text frequency of the fingerprint information of the target audio is lower than expected, the representativeness of the fingerprint information of the target audio is low, and the fingerprint information of the target audio is deleted from the inverted index table.

For example, as shown in Table 1, the target audio is audio 1, the fingerprint information of audio 1 includes fingerprint information A and fingerprint information B, and the total number of audios in the inverted index table is 3. The fingerprint information of 3 audios in the inverted index table (i.e. the first audio number is 3) is the same as fingerprint information A of audio 1; taking the inverse text frequency to be the ratio of the total number of audios to the first audio number, the inverse text frequency of fingerprint information A is 3/3 = 1. The fingerprint information of 1 audio in the inverted index table (i.e. the first audio number is 1) is the same as fingerprint information B of audio 1, so the inverse text frequency of fingerprint information B is 3/1 = 3. Assuming that the expected inverse text frequency is 2, the inverse text frequency of fingerprint information A is lower than expected, and fingerprint information A is deleted from the inverted index table; the inverse text frequency of fingerprint information B is higher than expected, and fingerprint information B is retained. The same method can be applied to every piece of fingerprint information in the inverted index table to calculate its representativeness and delete all fingerprint information whose representativeness is lower than expected, which makes the inverted index table more compact and saves storage space.
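The worked example for steps S203 to S206 can be reproduced with a short sketch. It uses the plain ratio of the total number of audios to the first audio number as the inverse text frequency, as in the example, and the names `prune_by_ratio`, `index` and `expected` are hypothetical.

```python
def prune_by_ratio(index, total_audio, expected):
    """Drop fingerprint entries whose ratio total/matched is lower than expected.

    `index` maps fingerprint information to the set of matching audios.
    """
    kept = {}
    for fingerprint, audio_ids in index.items():
        ratio = total_audio / len(audio_ids)  # S204: total / first audio number
        if ratio >= expected:                 # S206: keep unless lower than expected
            kept[fingerprint] = audio_ids
    return kept


# The entries described for Table 1: A matches three audios, B matches one.
index = {
    "A": {"audio 1", "audio 2", "audio 3"},
    "B": {"audio 1"},
}
pruned = prune_by_ratio(index, total_audio=3, expected=2)
```

With an expected value of 2, fingerprint information A (ratio 1) is removed and fingerprint information B (ratio 3) is kept, matching the outcome described in the example.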

In one embodiment, if the inverse text frequency of the fingerprint information of the target audio is less than a preset threshold, determining that the fingerprint information representative of the target audio is lower than expected, where the preset threshold is determined according to the amount of information included in the inverted index table and/or the number of fingerprint information included in the inverted index table.

If the inverse text frequency of the fingerprint information of the target audio is smaller than a preset threshold, the representativeness of the fingerprint information of the target audio is low, and it is determined that the fingerprint information representativeness of the target audio is lower than expected. The preset threshold is determined according to the amount of information included in the inverted index table and/or the number of pieces of fingerprint information included in the inverted index table. For example, the more information and/or the more pieces of fingerprint information the inverted index table includes, the smaller the preset threshold can be set, so as to delete a large amount of fingerprint information with lower representativeness; the less information and/or the fewer pieces of fingerprint information the inverted index table includes, the larger the preset threshold can be set, so as to delete a small amount of fingerprint information with lower representativeness.

In one embodiment, assuming that M pieces of audio are included in the inverted index table, the number of audio pieces in the inverted index table, in which fingerprint information matches with fingerprint information a of the target audio, is V, and the inverse text frequency of fingerprint information a of the target audio is f, the inverse text frequency of fingerprint information a of the target audio may be expressed by the following formula (1).

$$f = \log_{10}\left(\frac{M}{V}\right) \tag{1}$$
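Formula (1) translates directly into code. The function name `inverse_text_frequency` and the input check are assumptions added for illustration.

```python
import math


def inverse_text_frequency(total_audio, matched_audio):
    """Formula (1): f = log10(M / V).

    total_audio   -- M, number of audios in the inverted index table
    matched_audio -- V, number of audios whose fingerprint information matches
    """
    if not 0 < matched_audio <= total_audio:
        raise ValueError("expected 0 < V <= M")
    return math.log10(total_audio / matched_audio)


f_common = inverse_text_frequency(1000, 100)  # matched by many audios
f_rare = inverse_text_frequency(1000, 10)     # matched by few audios
```

A piece of fingerprint information that matches every audio (V = M) gets f = 0, and f grows as V shrinks, so rarer fingerprint information scores higher.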

In one embodiment, the inverted index table is loaded into a target function in memory; an audio query instruction including an audio segment is received; fingerprint information of the audio segment is obtained; and the target function is executed to retrieve, according to the inverted index table, audio associated with the fingerprint information of the audio segment.

In order to ensure real-time retrieval, the electronic device may load the inverted index table into memory. Specifically, the electronic device may load the inverted index table into a target function in memory, where the target function may be a function for retrieving audio and may be a remote procedure call function. When a query instruction is received, the fingerprint information of the audio segment carried by the query instruction is extracted, and the target function is executed to retrieve, from the audio database and according to the inverted index table, the audio associated with the fingerprint information of the audio segment. The audio wanted by the user can thus be retrieved through the inverted index table, which improves retrieval efficiency.
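The in-memory lookup described above might look like the following sketch. It is a plain local class rather than a remote procedure call function, and ranking candidates by the number of matching fingerprints is only one simple assumed form of secondary filtering; `AudioRetriever` and `query` are hypothetical names.

```python
from collections import Counter


class AudioRetriever:
    """Holds the inverted index table in memory and answers segment queries."""

    def __init__(self, inverted_index):
        # Copy the table so later pruning of the source does not affect lookups.
        self._index = {fp: set(ids) for fp, ids in inverted_index.items()}

    def query(self, segment_fingerprints):
        """Return (audio, matching fingerprint count) pairs, best match first."""
        votes = Counter()
        for fingerprint in segment_fingerprints:
            for audio_id in self._index.get(fingerprint, ()):
                votes[audio_id] += 1
        # Sort by descending vote count, then by name for a stable order.
        return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


retriever = AudioRetriever({
    "A": {"audio 1", "audio 2", "audio 3"},
    "B": {"audio 1"},
})
candidates = retriever.query(["A", "B"])
```

A query segment containing both A and B returns audio 1 first, because it is the only audio matched by both pieces of fingerprint information; the remaining candidates are what secondary filtering would have to discard.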

In one embodiment, the representativeness of the fingerprint information of the target audio may be the number of matches of the fingerprint information of the target audio, and the electronic device may obtain this number of matches from a history query record. The history query record comprises fingerprint information of a plurality of audios and the number of matches of the fingerprint information of each audio, where the number of matches of the fingerprint information of an audio is the number of times that fingerprint information has matched the fingerprint information of an audio segment carried in a query instruction. If the fingerprint information of the target audio has many matches, users often retrieve audio with this fingerprint information, so its representativeness is higher and its utilization value is higher; if it has few matches, users seldom retrieve audio with this fingerprint information, so its representativeness is lower and its utilization value is lower. Therefore, when the number of matches of the fingerprint information of the target audio is smaller than a preset number, the fingerprint information of the target audio is deleted from the inverted index table.
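Pruning by the number of matches can be sketched in the same style. The shape of the history query record (a mapping from fingerprint information to its number of matches) and the choice to treat fingerprint information absent from the record as having zero matches are assumptions; `prune_by_match_count` is a hypothetical name.

```python
def prune_by_match_count(index, match_counts, preset_count):
    """Drop fingerprint entries matched fewer than `preset_count` times.

    `match_counts` stands in for the history query record; fingerprint
    information missing from it is assumed to have zero matches.
    """
    return {
        fingerprint: audio_ids
        for fingerprint, audio_ids in index.items()
        if match_counts.get(fingerprint, 0) >= preset_count
    }


index = {
    "A": {"audio 1", "audio 2", "audio 3"},
    "B": {"audio 1"},
    "C": {"audio 2"},
}
history = {"A": 12, "B": 1}  # hypothetical numbers of matches
pruned = prune_by_match_count(index, history, preset_count=5)
```

Note that this criterion can disagree with the inverse text frequency: in the sketch the widely shared fingerprint information A survives because it is queried often, which is why the description goes on to combine both measures.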

In another embodiment, the fingerprint information representativeness of the target audio may include the number of matches of the fingerprint information of the target audio and the inverse text frequency of the fingerprint information. The electronic device may perform a weighted summation of the number of matches and the inverse text frequency to obtain a representative sum; if the representative sum is smaller than a preset value, the representativeness of the fingerprint information of the target audio is low, and the fingerprint information of the target audio may be deleted. For example, let the representative sum of the target audio be D, the inverse text frequency of the fingerprint information be f with weight K1, and the number of matches of the fingerprint information be S with weight K2; the representative sum may then be expressed by the following formula (2). Because the inverse text frequency is related to retrieval efficiency and the number of matches is related to the user's retrieval habits and preferences, the electronic device may set the two weights according to the user's requirements. If the inverse text frequency is given the larger weight, the filtered inverted index table supports more efficient retrieval; if the number of matches is given the larger weight, the filtered inverted index table better suits the user's retrieval preferences.

D = f·K1 + S·K2        (2)
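Formula (2) and the comparison against the preset value can be sketched as below. The function names, the weights, and the preset value are hypothetical; the embodiment leaves their choice to the requirements of the user.

```python
def representative_sum(inverse_text_freq, match_count, k1, k2):
    """Formula (2): D = f*K1 + S*K2."""
    return inverse_text_freq * k1 + match_count * k2


def should_delete(inverse_text_freq, match_count, k1, k2, preset_value):
    """A fingerprint is deleted when its representative sum is below the preset value."""
    return representative_sum(inverse_text_freq, match_count, k1, k2) < preset_value


# Illustrative values only: f = 2.0, S = 8, K1 = 0.5, K2 = 0.25.
d = representative_sum(2.0, 8, k1=0.5, k2=0.25)  # 2.0*0.5 + 8*0.25
delete_it = should_delete(2.0, 8, k1=0.5, k2=0.25, preset_value=4.0)
```

Raising K1 relative to K2 favours fingerprints that discriminate well between audios, while raising K2 favours fingerprints that users actually hit in past queries.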

In the embodiment of the invention, the fingerprint information representativeness of the target audio is obtained, and when it is lower than expected, the fingerprint information of the target audio is deleted from the inverted index table. The fingerprint information in the inverted index table can thus be screened by representativeness, retaining fingerprint information with higher representativeness and deleting fingerprint information with lower representativeness, which saves storage space, reduces the resource consumption of the electronic device, and makes the inverted index table more compact. Meanwhile, fingerprint information with low representativeness contributes little to retrieval, so deleting it does not degrade retrieval performance; on the contrary, retrieval with the more representative fingerprint information performs better, so screening the inverted index table can improve retrieval performance to a certain extent.

Based on the above description, the embodiment of the present invention provides a schematic structural diagram of an audio processing apparatus, where the audio processing apparatus may be operated on an electronic device, and the electronic device may include a smart phone, a smart watch, a computer, or the like. As shown in fig. 3, the apparatus includes:

An extraction unit 301, configured to extract an audio fingerprint of the target audio.

An obtaining unit 302, configured to obtain an inverted index table, where the inverted index table includes the target audio and fingerprint information of the target audio, and the fingerprint information of the target audio is a hash value of an audio fingerprint of the target audio; and acquiring the fingerprint information representativeness of the target audio according to the fingerprint information of the target audio, wherein the fingerprint information representativeness of the target audio is the inverse text frequency of the fingerprint information of the target audio, and the inverse text frequency is inversely proportional to the number of the matched audio.

A deleting unit 303, configured to delete the fingerprint information of the target audio from the inverted index table if the fingerprint information representativeness of the target audio is lower than expected.

Optionally, the extracting unit 301 is configured to perform time-frequency transformation on the target audio to obtain frequency domain information of the target audio; acquiring an energy matrix of the target audio according to the frequency domain information of the target audio; and determining the audio fingerprint of the target audio according to the energy matrix of the target audio.
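The extraction steps above can be sketched as follows. This is a minimal sketch, not the implementation of the embodiment: it assumes non-overlapping frames, a naive discrete Fourier transform as the time-frequency transformation, squared magnitude as the energy, and a strict comparison with the surrounding cells to find local maxima. The function names energy_matrix and local_maxima are hypothetical.

```python
import cmath
import math


def energy_matrix(samples, frame_size):
    """Naive short-time DFT: one row per non-overlapping frame, one column per frequency bin."""
    matrix = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        row = []
        for k in range(frame_size // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            row.append(abs(coeff) ** 2)  # energy of bin k in this frame
        matrix.append(row)
    return matrix


def local_maxima(matrix):
    """Return (frame, bin) cells whose energy exceeds every existing neighbour."""
    peaks = []
    for t, row in enumerate(matrix):
        for k, value in enumerate(row):
            neighbours = [
                matrix[i][j]
                for i in range(max(t - 1, 0), min(t + 2, len(matrix)))
                for j in range(max(k - 1, 0), min(k + 2, len(row)))
                if (i, j) != (t, k)
            ]
            if all(value > n for n in neighbours):
                peaks.append((t, k))
    return peaks


# One 8-sample frame of a cosine that completes two cycles per frame.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(8)]
energies = energy_matrix(tone, frame_size=8)
strongest_bin = max(range(len(energies[0])), key=energies[0].__getitem__)

# Peak picking on a small hand-written energy matrix.
peaks = local_maxima([[1, 2, 1], [2, 9, 2], [1, 2, 1]])
```

A practical extractor would likely use a fast Fourier transform with overlapping windows and ignore peaks below a noise floor; those refinements are omitted here.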

Optionally, the number of matching audios is the number of audios in the inverted index table whose fingerprint information matches the fingerprint information of the target audio; the inverted index table includes a plurality of audios, and the target audio is any one of the plurality of audios.

Optionally, the obtaining unit 302 is configured to count the total number of audios included in the inverted index table and the number of audios in the inverted index table whose fingerprint information matches the fingerprint information of the target audio; calculate the ratio between the total number of audios and the number of matching audios; and determine the inverse text frequency of the fingerprint information of the target audio according to the ratio.
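The counting and ratio steps can be sketched as below. The text here states only that the inverse text frequency is determined according to the ratio; taking the natural logarithm of the ratio, as in the conventional inverse document frequency, is an assumption of this sketch, and the function name inverse_text_frequency is hypothetical.

```python
import math


def inverse_text_frequency(index, fingerprint_hash):
    """Assumed form: log(total number of audios / number of matching audios).

    index: fingerprint hash -> set of audio ids
    """
    total = len({audio_id for ids in index.values() for audio_id in ids})
    matched = len(index.get(fingerprint_hash, ()))
    if matched == 0:
        raise ValueError("fingerprint is not present in the inverted index table")
    return math.log(total / matched)


index = {
    0xA1: {"song_1"},
    0xB2: {"song_1", "song_2", "song_3"},
    0xC3: {"song_3"},
}
rare = inverse_text_frequency(index, 0xA1)    # matches 1 of 3 audios
common = inverse_text_frequency(index, 0xB2)  # matches all 3 audios
```

A fingerprint that matches every audio gets the lowest value, consistent with the inverse text frequency being inversely proportional to the number of matching audios.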

Optionally, the determining unit 304 is configured to determine that the representative degree of the fingerprint information of the target audio is lower than the expected degree if the inverse text frequency of the fingerprint information of the target audio is smaller than a preset threshold, where the preset threshold is determined according to the information amount included in the inverted index table and/or the number of fingerprint information included in the inverted index table.

Optionally, the inverted index table further includes location information of the fingerprint information of the target audio in the target audio, and/or frequency of occurrence of the fingerprint information of the target audio in the target audio.
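One possible in-memory layout for such an entry is sketched below. The Posting class, its field names, and the use of frame offsets as location information are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Posting:
    """Occurrences of one fingerprint hash inside one audio."""
    audio_id: str
    positions: list = field(default_factory=list)  # frame offsets within the audio

    @property
    def frequency(self):
        """Number of times the fingerprint occurs in this audio."""
        return len(self.positions)


def add_occurrence(index, fingerprint_hash, audio_id, position):
    """Record that fingerprint_hash occurs in audio_id at the given position."""
    postings = index.setdefault(fingerprint_hash, {})
    posting = postings.setdefault(audio_id, Posting(audio_id))
    posting.positions.append(position)


index = {}
add_occurrence(index, 0xA1, "song_1", 3)
add_occurrence(index, 0xA1, "song_1", 40)
add_occurrence(index, 0xA1, "song_2", 7)
```

Deriving the frequency from the stored positions keeps the two values consistent; an index that stores only the frequency would use less memory.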

Optionally, the query unit 305 is configured to load the inverted index table into an objective function in memory; receive an audio query instruction, where the audio query instruction includes an audio segment; acquire fingerprint information of the audio segment; and execute the objective function to retrieve the audio associated with the fingerprint information of the audio segment according to the inverted index table.

Referring to fig. 4, a schematic structural diagram of an electronic device according to an embodiment of the present invention is shown. The electronic device 1000 includes a processor 1001, a user interface 1003, a network interface 1004, and a storage device 1005, which are connected to each other via a bus 1002.

The user interface 1003 is used for human-machine interaction and may include a display screen, a keyboard, or the like. The network interface 1004 is used for communication connection with external devices. The storage device 1005 is coupled to the processor 1001 and stores various software programs and/or sets of instructions. In a specific implementation, the storage device 1005 may include high-speed random access memory and may also include non-volatile memory, such as one or more disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The storage device 1005 may store an operating system (hereinafter referred to as a system), such as ANDROID, IOS, or LINUX. The storage device 1005 may also store a network communication program that may be used to communicate with one or more additional devices, one or more terminal devices, and one or more network devices. The storage device 1005 may also store a user interface program that can vividly display the content of an application program through a graphical operation interface and receive the user's control operations on the application program through input controls such as menus, dialog boxes, and buttons. The storage device 1005 may also store one or more applications, such as an audio processing application for filtering the inverted index table.

In one embodiment, the storage device 1005 may also be used to store one or more program instructions; the processor 1001 can execute the audio processing method when invoking the one or more program instructions, so as to filter the inverted index table. Specifically, the processor 1001 invokes the program instructions to perform the following steps:

extracting an audio fingerprint of the target audio;

Optionally, the processor 1001 may invoke the program instructions to perform the following steps:

performing time-frequency transformation on the target audio to obtain frequency domain information of the target audio;

acquiring an energy matrix of the target audio according to the frequency domain information of the target audio;

and determining the audio fingerprint of the target audio according to the energy matrix of the target audio.

counting the total number of the audios included in the inverted index table and the number of the audios matched with the fingerprint information of the target audio by the fingerprint information in the inverted index table;

calculating a ratio between the total number of audio and the number of matched audio;

and determining the inverse text frequency of the fingerprint information of the target audio according to the ratio.

and if the inverse text frequency of the fingerprint information of the target audio is smaller than a preset threshold, determining that the fingerprint information representativeness of the target audio is lower than expected, wherein the preset threshold is determined according to the information amount included in the inverted index table and/or the number of the fingerprint information included in the inverted index table.

loading the inverted index table into an objective function of a memory;

receiving an audio query instruction, wherein the audio query instruction comprises an audio segment;

acquiring fingerprint information of the audio segment;

and executing the objective function to retrieve the audio associated with the fingerprint information of the audio segment according to the inverted index table.

In the embodiment of the invention, the fingerprint information representativeness of the target audio is obtained, and when it is lower than expected, the fingerprint information of the target audio is deleted from the inverted index table. The fingerprint information in the inverted index table can thus be screened by representativeness, retaining fingerprint information with higher representativeness and deleting fingerprint information with lower representativeness, which saves storage space, reduces the resource consumption of the electronic device, and makes the inverted index table more compact. Meanwhile, fingerprint information with low representativeness contributes little to retrieval, so deleting it does not degrade retrieval performance; on the contrary, retrieval with the more representative fingerprint information performs better.

In one embodiment, the processor 1001 may be configured to read and execute computer instructions to implement an audio processing method as described in fig. 1 or fig. 2 of the present application. The principle of solving the problem of the electronic device provided in the embodiment of the present invention is similar to that of the method embodiment described in fig. 1 and fig. 2, so that the implementation and beneficial effects of the electronic device can be referred to the implementation and beneficial effects of the method embodiment, and the repetition is omitted.

The embodiment of the present invention further provides a computer readable storage medium on which a computer program is stored. For the implementation and beneficial effects of the program in solving the problem, reference may be made to the implementation and beneficial effects of the audio processing method described in fig. 1 and fig. 2 above; repeated description is omitted.

The above disclosure is illustrative only of some embodiments of the invention and is not intended to limit the scope of the invention, which is defined by the claims and their equivalents.

Claims

1. An audio processing method, comprising:

performing time-frequency transformation on target audio to obtain frequency domain information of the target audio;

detecting an energy matrix of the target audio, and determining the detected local maximum energy value as an audio fingerprint of the target audio;

determining the fingerprint information representativeness of the target audio according to the ratio and the historical query record; the fingerprint information representative degree of the target audio comprises the matching times of the fingerprint information of the target audio and the inverse text frequency of the fingerprint information of the target audio, wherein the inverse text frequency is inversely proportional to the number of the matching audio; the number of the matched audios is the number of audios matched with the fingerprint information of the target audio and the fingerprint information in the inverted index table; the matching times of the fingerprint information of the target audio refer to the matching times of the fingerprint information reflecting the target audio in the history query record and the fingerprint information of the audio segment carried in the query instruction;

carrying out weighted summation on the matching times of the fingerprint information of the target audio and the inverse text frequency of the fingerprint information to obtain a representative sum;

and if the representative sum is smaller than a preset value, deleting the fingerprint information of the target audio from the inverted index table.

2. The method of claim 1, wherein the number of matching audio is a number of audio in which fingerprint information in the inverted index table matches fingerprint information of the target audio, the inverted index table includes a plurality of audio, and the target audio is any one of the plurality of audio.

3. The method of claim 1, wherein the method further comprises:

4. The method of claim 1, wherein the inverted index table further comprises location information of the fingerprint information of the target audio in the target audio and/or a frequency of occurrence of the fingerprint information of the target audio in the target audio.

5. The method of claim 1, wherein the method further comprises:

loading the inverted index table into an objective function of a memory;

acquiring fingerprint information of the audio segment;

6. An audio processing apparatus, comprising:

the extraction unit is used for performing time-frequency conversion on the target audio to obtain frequency domain information of the target audio; acquiring an energy matrix of the target audio according to the frequency domain information of the target audio; detecting an energy matrix of the target audio, and determining the detected local maximum energy value as an audio fingerprint of the target audio;

the acquisition unit is used for acquiring an inverted index table, wherein the inverted index table comprises the target audio and fingerprint information of the target audio, and the fingerprint information of the target audio is a hash value of an audio fingerprint of the target audio; counting the total number of the audios included in the inverted index table and the number of the audios matched with the fingerprint information of the target audio by the fingerprint information in the inverted index table; calculating a ratio between the total number of audio and the number of matched audio; determining the fingerprint information representativeness of the target audio according to the ratio and the historical query record, wherein the fingerprint information representativeness of the target audio comprises the matching times of the fingerprint information of the target audio and the inverse text frequency of the fingerprint information of the target audio, and the inverse text frequency is inversely proportional to the number of the matching audio; the number of the matched audios is the number of audios matched with the fingerprint information of the target audio and the fingerprint information in the inverted index table;

the deleting unit is used for carrying out weighted summation on the matching times of the fingerprint information of the target audio and the inverse text frequency of the fingerprint information to obtain a representative sum; and if the representative sum is smaller than a preset value, deleting the fingerprint information of the target audio from the inverted index table; wherein the matching times of the fingerprint information of the target audio refer to the number of times, reflected in the history query record, that the fingerprint information of the target audio has matched the fingerprint information of an audio segment carried in a query instruction.

7. An electronic device, the electronic device comprising:

a processor adapted to implement one or more instructions; and

a computer readable storage medium storing one or more instructions adapted to be loaded by a processor and to perform the audio processing method of any of claims 1-5.

8. A computer readable storage medium storing one or more instructions adapted to be loaded by a processor and to perform the audio processing method of any of claims 1-5.