WO2022161291A1 - Audio search method and apparatus, computer device, and storage medium - Google Patents

Audio search method and apparatus, computer device, and storage medium

Info

Publication number
WO2022161291A1
WO2022161291A1 PCT/CN2022/073291 CN2022073291W WO2022161291A1 WO 2022161291 A1 WO2022161291 A1 WO 2022161291A1 CN 2022073291 W CN2022073291 W CN 2022073291W WO 2022161291 A1 WO2022161291 A1 WO 2022161291A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
hash
hash feature
feature
audio
Application number
PCT/CN2022/073291
Other languages
French (fr)
Chinese (zh)
Inventor
吕镇光
Original Assignee
百果园技术(新加坡)有限公司
Application filed by 百果园技术(新加坡)有限公司
Publication of WO2022161291A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/61 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/686 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings

Definitions

  • the embodiments of the present application relate to the technical field of audio processing, for example, to an audio search method, apparatus, computer device, and storage medium.
  • users produce multimedia data by, for example, making short videos, humming songs, and making recordings.
  • multimedia data on the Internet is growing rapidly, and audio data is growing rapidly along with it.
  • audio data is compared to determine whether it is the same as or similar to other audio data.
  • the audio data is usually sorted by a queuing system and then compared in order.
  • the baseline method is usually used; that is, the audio data is sorted without any specific reference standard and is compared one by one.
  • although the accuracy of this approach is high, it occupies a lot of resources and is time-consuming, resulting in low overall efficiency.
  • the embodiments of the present application propose an audio search method, apparatus, computer equipment, and storage medium, so as to solve the problem of how to improve the efficiency of comparison while maintaining the accuracy of comparing audio data.
  • an embodiment of the present application provides an audio search method, including:
  • the first hash feature is calculated for the first audio data
  • the second hash feature is calculated for a plurality of the second audio data
  • the first hash feature is compared with a plurality of the second hash features in the order to find the second audio data that is the same as or similar to the first audio data.
  • the embodiment of the present application also provides an audio search method, including:
  • the first hash feature is compared with a plurality of the second hash features in the order to determine whether the plurality of second audio data contains second audio data that is the same as or similar to the first audio data;
  • the first audio data is determined to be illegal in response to the plurality of second audio data containing second audio data that is the same as or similar to the first audio data.
  • an embodiment of the present application also provides an audio search device, including:
  • an audio data determination module configured to determine the first audio data and a plurality of second audio data
  • a hash feature calculation module configured to calculate a first hash feature for the first audio data and a second hash feature for a plurality of the second audio data respectively;
  • an order determination module configured to determine the order in which the plurality of second audio data are arranged according to the density of the plurality of second hash features
  • a hash feature comparison module configured to compare the first hash feature with a plurality of the second hash features in the order to find the second audio data that is the same as or similar to the first audio data.
  • an embodiment of the present application also provides an audio search device, including:
  • an audio data receiving module configured to receive the first audio data uploaded by the client, and calculate a first hash feature for the first audio data
  • the blacklist search module is configured to search for a currently configured blacklist, where a plurality of second audio data are recorded in the blacklist, and a second hash feature has been configured for the plurality of second audio data;
  • an order determination module configured to determine the order in which the plurality of second audio data are arranged according to the density of the plurality of second hash features
  • a hash feature comparison module configured to compare the first hash feature with a plurality of the second hash features in the order to determine whether the plurality of second audio data contains second audio data that is the same as or similar to the first audio data;
  • the illegal audio determination module is configured to determine that the first audio data is illegal in response to the presence of second audio data in the plurality of second audio data that is identical to or similar to the first audio data.
  • an embodiment of the present application further provides a computer device, the computer device comprising:
  • a memory arranged to store at least one program;
  • when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the audio search method according to the first aspect or the second aspect.
  • embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the audio search method according to the first aspect or the second aspect is implemented.
  • FIG. 1 is a flowchart of an audio search method provided in Embodiment 1 of the present application.
  • FIG. 2 is an example diagram of calculating the density of the second hash feature according to Embodiment 1 of the present application;
  • FIG. 3A is an example diagram of a short audio search provided in Embodiment 1 of the present application.
  • FIG. 3B is an example diagram of a long audio search provided in Embodiment 1 of the present application.
  • FIG. 5 is a schematic structural diagram of an audio search apparatus according to Embodiment 3 of the present application.
  • FIG. 6 is a schematic structural diagram of an audio search apparatus according to Embodiment 4 of the present application.
  • FIG. 7 is a schematic structural diagram of a computer device according to Embodiment 5 of the present application.
  • FIG. 1 is a flowchart of an audio search method provided in Embodiment 1 of the application. This embodiment is applicable to sorting and comparing audio data according to the density of the hash feature of the audio data.
  • the method can be performed by an audio search apparatus.
  • the audio search apparatus can be implemented by software and/or hardware and can be configured in computer equipment such as servers, workstations, and personal computers. The method includes the following steps:
  • Step 101 Determine first audio data and a plurality of second audio data.
  • the first audio data and the plurality of second audio data are audio data
  • the audio data can take the form of songs released by singers or of audio data separated from video data such as short videos, movies, and TV dramas.
  • the format of the audio data may include MP3, WMA, and AAC, which is not limited in this embodiment.
  • the plurality of second audio data are audio data collected in advance in various ways; for example, users upload audio data, audio data is purchased from copyright owners, technicians record audio data, or a crawler client crawls audio data from the network.
  • the plurality of second audio data can form an audio library that provides search services externally. The first audio data is the audio data to be searched for; that is, the audio library is searched for second audio data that is the same as or similar to the first audio data.
  • the same or similar in this embodiment may refer to the first audio data and the second audio data being the same or similar in whole or in part.
  • Step 102 Calculate a first hash feature for the first audio data and calculate a second hash feature for a plurality of second audio data, respectively.
  • for the first audio data, a hash feature (also known as a hash or fingerprint) can be calculated and used as the feature of the first audio data.
  • for ease of distinction, this hash feature is recorded as the first hash feature.
  • for each second audio data, a hash feature (also known as a hash or fingerprint) can likewise be calculated and used as the feature of that second audio data.
  • for ease of distinction, this hash feature is recorded as the second hash feature.
  • the first hash feature and the second hash feature are calculated in the same way; that is, the same method is used to calculate the first hash feature for the first audio data and the second hash features for the plurality of second audio data.
  • step 102 may include the following steps:
  • Step 1021 Convert the first audio data into a first spectrogram.
  • the first audio data may be converted into a spectrogram by means of the discrete Fourier transform (DFT), the short-time Fourier transform (STFT, also called the short-term Fourier transform), etc.
  • the horizontal axis of the spectrogram is time and the vertical axis is frequency, so that the first audio data is converted from a time-domain signal to a frequency-domain signal.
  • the spectrogram is denoted as the first spectrogram.
  • the first audio data may be divided into a plurality of first data blocks (a data block is also known as a window).
  • the plurality of first data blocks are respectively converted into frequency-domain signals, so that time information is preserved to a certain extent.
  • for example, the parameters of the first audio data are two channels, 16-bit precision, and 44100 Hz sampling.
  • the data size of 1 s is then 44100 samples × 2 bytes × 2 channels ≈ 176 kB. If 4 kB is selected as the size of the data block, a Fourier transform is performed on 44 data blocks every second, and such a segmentation density can meet the requirements.
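The arithmetic in this example can be checked with a short sketch. The constant names are illustrative, and reading "4 kB" as 4000 bytes is an assumption made so that the result agrees with the 44 blocks per second stated above.

```python
# Parameters taken from the example in the text; names are illustrative.
SAMPLE_RATE_HZ = 44100   # samples per second per channel
BYTES_PER_SAMPLE = 2     # 16-bit precision
CHANNELS = 2             # two-channel audio
BLOCK_BYTES = 4000       # "4 kB" read as 4000 bytes (assumption)

# Raw data produced by one second of audio.
bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS

# Number of data blocks (windows) transformed per second.
blocks_per_second = bytes_per_second / BLOCK_BYTES

print(bytes_per_second)          # 176400
print(round(blocks_per_second))  # 44
```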
  • Step 1022 Search for a first key point on multiple spectral bands of the first spectrogram according to the energy.
  • the frequencies at which the first audio data has a large amplitude may span a very wide range, from low C (32.70 Hz) to high C (4186.01 Hz).
  • the first spectrogram may be divided into a plurality of spectral bands (also called sub-bands).
  • key points, that is, frequency peaks, are selected from each sub-band. For example, the following sub-bands may be selected: 30 Hz-40 Hz, 40 Hz-80 Hz, and 80 Hz-120 Hz as bass sub-bands (the fundamental frequencies of bass guitars and similar instruments fall in the bass sub-bands), and 120 Hz-180 Hz and 180 Hz-300 Hz as the midrange and treble sub-bands respectively (the fundamental frequencies of vocals and most other instruments appear in these two sub-bands).
  • for each sub-band, a key point can be selected according to the energy, which is recorded as the first key point for the convenience of distinction.
  • for example, the frequency point with the highest energy in each sub-band can be selected as the first key point.
  • Step 1023 Generate a first hash feature of the first audio data based on the first key point.
  • the first key point of each data block constitutes the signature of this frame of audio data, and the signatures of different data blocks constitute the first hash feature of the entire first audio data.
  • the first hash feature of the first audio data may be cached in the memory, waiting to be compared with the second hash feature of the second audio data.
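Steps 1021 to 1023 can be sketched as follows. This is a minimal illustration under stated assumptions, not the method as claimed: it uses a naive DFT from the standard library, non-overlapping blocks, and the example sub-bands from the text; the function names (`bin_magnitude`, `block_signature`, `fingerprint`) are hypothetical, and the block is assumed long enough that every sub-band contains at least one DFT bin.

```python
import cmath
import math

# Sub-band edges in Hz, following the example bands in the text.
BANDS_HZ = [(30, 40), (40, 80), (80, 120), (120, 180), (180, 300)]


def bin_magnitude(block, k):
    """Magnitude of DFT bin k of one block (naive sum over all samples)."""
    n = len(block)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(block))
    return abs(acc)


def block_signature(block, sample_rate):
    """Peak frequency (Hz, rounded) of each sub-band for one data block."""
    n = len(block)
    hz_per_bin = sample_rate / n
    signature = []
    for lo, hi in BANDS_HZ:
        # DFT bins whose centre frequency falls in [lo, hi).
        bins = [k for k in range(1, n // 2) if lo <= k * hz_per_bin < hi]
        if not bins:
            raise ValueError("block too short to resolve this sub-band")
        peak = max(bins, key=lambda k: bin_magnitude(block, k))
        signature.append(round(peak * hz_per_bin))
    return tuple(signature)


def fingerprint(samples, sample_rate, block_size):
    """One signature per non-overlapping block; together they form the hash feature."""
    return [
        block_signature(samples[start:start + block_size], sample_rate)
        for start in range(0, len(samples) - block_size + 1, block_size)
    ]


# Synthetic two-second signal with one tone inside each sub-band.
sr = 1024
tones = (35, 60, 100, 150, 250)
signal = [sum(math.sin(2 * math.pi * f * t / sr) for f in tones) for t in range(2 * sr)]
print(fingerprint(signal, sr, sr))
# [(35, 60, 100, 150, 250), (35, 60, 100, 150, 250)]
```

A production system would normally use an FFT library rather than this per-bin sum; the naive form is kept here only so the sketch has no external dependencies.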
  • Step 1024 Convert the second audio data into a second spectrogram.
  • the second audio data can be converted into a spectrogram by means of the Fourier transform, the short-time Fourier transform, etc.
  • the horizontal axis of the spectrogram is time and the vertical axis is frequency, so that the second audio data is converted from a time-domain signal to a frequency-domain signal. For the convenience of distinction, this spectrogram is denoted as the second spectrogram.
  • the second audio data may be divided into a plurality of second data blocks (a data block is also known as a window).
  • the plurality of second data blocks are respectively converted into frequency-domain signals, which preserves time information to a certain extent.
  • Step 1025 Search for a second key point on multiple spectral bands of the second spectrogram according to the energy.
  • the frequencies at which the second audio data has a large amplitude may span a very wide range, from low C (32.70 Hz) to high C (4186.01 Hz).
  • the second spectrogram may be divided into a plurality of spectral bands (also called sub-bands).
  • key points, that is, frequency peaks, are selected from each sub-band. For example, the following sub-bands may be selected: 30 Hz-40 Hz, 40 Hz-80 Hz, and 80 Hz-120 Hz as bass sub-bands (the fundamental frequencies of bass guitars and similar instruments fall in the bass sub-bands), and 120 Hz-180 Hz and 180 Hz-300 Hz as the midrange and treble sub-bands respectively (the fundamental frequencies of vocals and most other instruments appear in these two sub-bands).
  • for each sub-band, a key point can be selected according to the energy, which is recorded as the second key point for the convenience of distinction.
  • for example, the frequency point with the highest energy in each sub-band can be selected as the second key point.
  • Step 1026 Generate a second hash feature of the second audio data based on the second key point.
  • the second key point of each data block constitutes the signature of this frame of audio data, and the signatures of different data blocks constitute the second hash feature of the entire second audio data.
  • the second hash feature of the second audio data can be stored as a key of a hash table for retrieval.
  • the second hash feature is usually used as the key of the hash table, and the value pointed to by the key includes the time at which the second hash feature appears in the second audio data and the ID of the second audio data.
  • an example of the hash table:
    Second hash feature (Hash Tag)   Time in Seconds   Second audio data (Song)
    30 51 99 121 195                 53.52             Song A
    33 56 92 151 185                 12.32             Song B
    39 26 89 141 251                 15.34             Song C
    32 67 100 128 270                78.43             Song D
    30 51 99 121 195                 10.89             Song E
    34 57 95 111 200                 54.52             Song A
    34 41 93 161 202                 11.89             Song E
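The hash table above can be sketched as a dictionary keyed by the hash tag, with each value holding the time and song ID. The names `ROWS`, `index`, and `lookup` are illustrative, not part of the application.

```python
from collections import defaultdict

# Rows copied from the example table: (hash tag, time in seconds, song).
ROWS = [
    ((30, 51, 99, 121, 195), 53.52, "Song A"),
    ((33, 56, 92, 151, 185), 12.32, "Song B"),
    ((39, 26, 89, 141, 251), 15.34, "Song C"),
    ((32, 67, 100, 128, 270), 78.43, "Song D"),
    ((30, 51, 99, 121, 195), 10.89, "Song E"),
    ((34, 57, 95, 111, 200), 54.52, "Song A"),
    ((34, 41, 93, 161, 202), 11.89, "Song E"),
]

# Key: second hash feature. Value: every (time, song) where it occurs.
index = defaultdict(list)
for tag, seconds, song in ROWS:
    index[tag].append((seconds, song))


def lookup(tag):
    """Return all (time, song) entries stored under a hash tag."""
    return index.get(tag, [])


print(lookup((30, 51, 99, 121, 195)))
# [(53.52, 'Song A'), (10.89, 'Song E')]
```

Note that the same hash tag can point to more than one song, which is why each key maps to a list.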
  • the above method for calculating the first hash feature and the second hash feature is only an example.
  • in implementing this embodiment, those skilled in the art may adopt other methods for calculating the first hash feature and the second hash feature according to actual needs, which is not limited in this embodiment of the present application.
  • Step 103 Determine the order of arrangement among the plurality of second audio data according to the density of the plurality of second hash features.
  • when the hash features are dense, the comparison accuracy of the hash features is higher; when the hash features are sparse, the comparison accuracy is lower, and different or dissimilar audio data are easily mistaken for the same or similar audio data.
  • the density (Density) of the second hash feature can be calculated for each second audio data to represent how densely its second hash features are distributed. In the queuing system (Queuing System), the density of the second hash feature is used as a threshold, and the plurality of second audio data are sorted according to the density of their second hash features, so as to determine the order among the plurality of second audio data.
  • step 103 includes the following steps:
  • Step 1031 Count the number of overlapping second hash features in multiple local regions.
  • the second audio data can be divided into a plurality of local regions of the same size, and for each local region the number of overlapping second hash features in that region is counted. With the local region taken as the unit area, this number can be regarded as the local density.
  • a second spectrogram of the second audio data may be obtained, where the second spectrogram is the spectrogram obtained by converting the second audio data from time-domain information to frequency-domain information, and the second hash features may be marked on the second spectrogram.
  • multiple windows of the same size are added to the second spectrogram to represent the ranges of the multiple local regions, and the number of second hash features is counted in each window as the number of overlapping second hash features in the corresponding local region.
  • the number in a local region (that is, the local density) is expressed as follows:
  • i is the number of overlapping second hash features within the window (i.e., from t to t+k).
  • a preset window may be looked up, and a window may be added to the second spectrogram at every preset time interval, thereby dividing the second spectrogram into multiple local regions.
  • in one case, the width of the window is equal to the preset time length, that is, two adjacent windows do not overlap, which reduces the amount of calculation for the second hash feature.
  • in another case, the width of the window is smaller than the preset time length, that is, two adjacent windows partially overlap, which can improve the accuracy of the second hash feature.
  • Step 1032 Generate the density of the second hash feature in the second audio data to which it belongs based on the number of overlaps in the multiple local regions.
  • the numbers of overlapping second hash features in the multiple local regions may be used as a reference to generate the density of the second hash feature in the second audio data to which it belongs.
  • the numbers of overlaps in the plurality of local regions may be compared, and the largest number of overlaps among the local regions is determined as the density of the second hash feature in the second audio data to which it belongs.
  • max is the function that takes the maximum value.
  • for example, window 201, window 202, window 203, window 204, window 205, window 206, and window 207 are added to the second spectrogram on which the second hash features are marked.
  • the number of overlaps of the second hash features is highest in window 203; therefore, the number of overlaps in window 203 can be selected as the density of the second hash feature in the second audio data.
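Steps 1031 and 1032 can be sketched as follows, assuming each second hash feature is represented only by the time (in seconds) at which it is marked on the second spectrogram. The function names and the `width` and `hop` parameters are illustrative; `hop` stands in for the preset time interval between windows.

```python
import math


def local_counts(times, duration, width, hop):
    """Count the hash features that fall inside each window [start, start + width)."""
    n_windows = math.ceil(duration / hop)
    counts = []
    for i in range(n_windows):
        start = i * hop
        counts.append(sum(1 for t in times if start <= t < start + width))
    return counts


def density(times, duration, width, hop):
    """Density of the audio data: the largest local count over all windows."""
    return max(local_counts(times, duration, width, hop))


# Hypothetical hash-feature timestamps in a 10-second clip.
times = [0.5, 1.2, 4.1, 4.3, 4.6, 4.9, 9.0]
print(local_counts(times, duration=10, width=2, hop=2))  # [2, 0, 4, 0, 1]
print(density(times, duration=10, width=2, hop=2))       # 4
```

Here the third window plays the role that window 203 plays in the example: it holds the most hash features, so its count becomes the density.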
  • the above method for calculating the density of the second hash feature is only an example.
  • other methods for calculating the density of the second hash feature may be set according to actual conditions; for example, the numbers of overlaps are sorted from large to small, the numbers of overlaps in the top j (j is a positive integer) local regions are taken, and their average value is calculated as the density of the second hash feature in the second audio data. This is not limited in the embodiments of the present application.
  • those skilled in the art may also adopt other methods for calculating the density of the second hash feature according to actual needs, which are not limited in this embodiment of the present application.
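The alternative mentioned above, averaging the top j local counts, might look like the sketch below. The function name is hypothetical, and `counts` is assumed to be the non-empty list of per-window overlap counts.

```python
def top_j_mean_density(counts, j):
    """Average of the j largest local counts (j is a positive integer)."""
    top = sorted(counts, reverse=True)[:j]
    return sum(top) / len(top)


print(top_j_mean_density([2, 0, 4, 0, 1], j=2))  # 3.0
```

With j = 1 this reduces to the maximum-based density, so the maximum rule can be seen as a special case of this variant.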
  • Step 1033 Sort the plurality of second audio data in descending order according to the density to obtain the order of the plurality of second audio data.
  • the plurality of second audio data may be sorted in descending order of density to determine the position of each second audio data; that is, the higher the density of its second hash feature, the earlier the second audio data is placed, and the lower the density, the later it is placed.
  • Step 104 Compare the first hash feature with a plurality of second hash features in order to find second audio data that is the same as or similar to the first audio data.
  • the second hash feature of each second audio data may be compared with the first hash feature of the first audio data in turn, following the order in which the second audio data are arranged, so as to determine whether the first audio data and that second audio data are the same or similar.
  • if the difference between the second hash feature of the second audio data and the first hash feature of the first audio data is large, the similarity between that second audio data and the first audio data can be considered low, the first hash feature does not match the second hash feature, and the search continues with the next second audio data.
  • the search can be stopped.
  • a target position may be determined, where the target position is used to represent the quantity of the second audio data to be compared, and the target position is generally much smaller than the quantity of the second audio data.
  • if the first hash feature matches the second hash feature, it is determined that the first audio data and the second audio data to which the second hash feature belongs are identical or similar.
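Steps 1033 and 104 can be sketched together: sort the library by density, then compare only the first `target_position` candidates. The similarity measure (fraction of shared hashes) and the `threshold` value are assumptions chosen for illustration; the text does not specify how a match is scored.

```python
def similarity(first_hashes, second_hashes):
    """Fraction of the query's hashes that also occur in the candidate (assumed measure)."""
    if not first_hashes:
        return 0.0
    candidate = set(second_hashes)
    return sum(1 for h in first_hashes if h in candidate) / len(first_hashes)


def search(first_hashes, library, target_position, threshold=0.5):
    """library: list of (audio_id, hashes, density) tuples.

    Candidates are compared in descending order of density, and the search
    stops after target_position candidates or at the first match.
    """
    ordered = sorted(library, key=lambda item: item[2], reverse=True)
    for audio_id, hashes, _density in ordered[:target_position]:
        if similarity(first_hashes, hashes) >= threshold:
            return audio_id
    return None


# Hypothetical library with integer stand-ins for hash features.
library = [("a", [1, 2, 3], 5), ("b", [7, 8, 9], 9), ("c", [7, 8, 1], 2)]
print(search([7, 8, 9, 10], library, target_position=2))  # b
print(search([1, 2, 3], library, target_position=1))      # None
```

The second call returns `None` because only the densest candidate is examined; raising `target_position` trades time for recall.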
  • in this embodiment, the first audio data and a plurality of second audio data are determined, a first hash feature is calculated for the first audio data and a second hash feature is calculated for each of the plurality of second audio data, the order in which the plurality of second audio data are arranged is determined according to the density of the plurality of second hash features, and the first hash feature is compared with the plurality of second hash features in that order to find the second audio data that is the same as or similar to the first audio data.
  • denser hash features improve the accuracy of comparison. Adjusting the ordering of the audio data by the density of the hash features raises the probability that the same or similar audio data is found among the candidates compared first, so that the accuracy of searching for audio data is improved while the number of comparisons is reduced.
  • in the baseline method, the first audio data is compared with the second audio data one by one, and finding the matching second audio data is a matter of chance.
  • the process of searching for the second audio data matching the first audio data therefore consumes a lot of time, and the time complexity is O(N), where N is the number of second audio data.
  • the queue system A arranges the second audio data according to the absolute number (Absolute Matches) of the second hash feature.
  • the second audio data is placed in a queue, where the second audio data at the front of the queue are most likely to be the best match and those at the back of the queue are less likely to be the correct match.
  • the queue system A can provide a stop criterion: if the first m second audio data in the queue have been compared and no second audio data matching the first audio data has been found, the search can be stopped, and the search result is that there is no second audio data matching the first audio data.
  • m is a positive integer, and m ≪ N (m is much smaller than N).
  • the time complexity of the queue system A is O(m), and O(m) ≪ O(N).
  • although the queue system A saves time, it is effective only when the plurality of second audio data have the same duration; when the durations of the plurality of second audio data deviate greatly, the accuracy decreases.
  • for example, suppose the duration of the second audio data A is 2 minutes and the duration of the second audio data B is 30 minutes.
  • simply because its duration is so long, the second audio data B may have more second hash features than the second audio data A, so that the second audio data B is placed at the front of the queue and the second audio data A at the back.
  • the queue system B normalizes the number of second hash features by duration (Normalised by Duration), that is, divides it by the duration of the second audio data, and queues the second audio data accordingly.
  • this embodiment provides a queue system C, which normalizes according to the density of the second hash feature and sorts according to that density, thereby making a trade-off between the absolute number of second hash features and over-normalization by duration.
  • the second audio data are song A (Song A) and song B (Song B) respectively, the duration of song A is less than the duration of song B, and it is assumed that the given second audio data matching the first audio data is song A.
  • the second hash feature is marked on the second spectrogram of song A and the second spectrogram of song B, respectively, and the following data are counted on them:
  • the absolute number of second hash features in song A (727) is less than the absolute number of second hash features in song B (913), so song A ranks after song B.
  • the duration-normalized number of second hash features in song A (0.198) is greater than that in song B (0.033), so song A ranks ahead of song B.
  • the density of the second hash feature in song A (0.266) is greater than the density of the second hash feature in song B (0.067), so song A ranks ahead of song B.
  • the second audio data are song A (Song A) and song B (Song B) respectively, the duration of song A is shorter than the duration of song B, and it is assumed that the given second audio data matching the first audio data is song B.
  • the second hash feature is marked on the second spectrogram of song A and the second spectrogram of song B respectively, and the following data are counted on them:
  • the absolute number of second hash features in song A (347) is less than the absolute number of second hash features in song B (2481), so song A ranks after song B.
  • the duration-normalized number of second hash features in song A (0.094) is greater than that in song B (0.090), so song A ranks ahead of song B.
  • the density of the second hash feature in song A (0.127) is less than the density of the second hash feature in song B (0.182), so song A ranks after song B.
  • in scenario two, the query matching song B has a region of higher density; the duration of song B is longer and its absolute number of second hash features is greater than that of song A, but queue system B overcompensates for the duration. Thus queue system B is valid for scenario one (short audio search) but not for scenario two (long audio search), while queue system C is valid for both scenario one (short audio search) and scenario two (long audio search).
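The two scenarios can be replayed with the figures quoted above to see which queue system places the true match at the front. The dictionary layout and the names `SCENARIOS`, `front_of_queue`, and `results` are illustrative; only the numbers come from the text.

```python
# Scores quoted in the text for each queue system (A: absolute number,
# B: normalized by duration, C: density), plus the known matching song.
SCENARIOS = {
    "short audio search": {
        "match": "Song A",
        "A": {"Song A": 727, "Song B": 913},
        "B": {"Song A": 0.198, "Song B": 0.033},
        "C": {"Song A": 0.266, "Song B": 0.067},
    },
    "long audio search": {
        "match": "Song B",
        "A": {"Song A": 347, "Song B": 2481},
        "B": {"Song A": 0.094, "Song B": 0.090},
        "C": {"Song A": 0.127, "Song B": 0.182},
    },
}


def front_of_queue(scores):
    """Song with the highest score, i.e. the one compared first."""
    return max(scores, key=scores.get)


# results[(scenario, system)] is True when the true match is compared first.
results = {
    (name, system): front_of_queue(data[system]) == data["match"]
    for name, data in SCENARIOS.items()
    for system in ("A", "B", "C")
}

for key, ok in results.items():
    print(key, ok)
# ('short audio search', 'A') False
# ('short audio search', 'B') True
# ('short audio search', 'C') True
# ('long audio search', 'A') True
# ('long audio search', 'B') False
# ('long audio search', 'C') True
```

Only queue system C puts the matching song first in both scenarios, which is the conclusion drawn in the text.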
  • FIG. 4 is a flowchart of an audio search method provided in Embodiment 2 of the present application. This embodiment is applicable to the case where audio data is sorted and compared according to the density of the hash feature of the audio data, so as to perform content review.
  • the method may be performed by an audio search apparatus, which may be implemented in software and/or hardware, and may be configured in computer equipment, such as a server, workstation, personal computer, etc., including the following steps:
  • Step 401 Receive first audio data uploaded by a client, and calculate a first hash feature for the first audio data.
  • the computer device acts as a multimedia platform.
  • on the one hand, it provides users with audio-based services, such as live programs, short videos, voice conversations, and video conversations; on the other hand, it receives audio-carrying files uploaded by users, such as live broadcast data, short videos, and session information.
  • audio-carrying files are reviewed to filter out those that contain, for example, pornographic, vulgar, or violent content, so that audio-carrying files meeting the content review standards can be released.
  • in one example, a streaming real-time system can be set up in the multimedia platform.
  • the user uploads the audio-carrying file to the streaming real-time system in real time through the client, and the streaming real-time system can transmit the audio-carrying file to the computer equipment used for content review.
  • in another example, a database, such as a distributed database, can be set up in the multimedia platform.
  • the user uploads the audio-carrying file to the database through the client, and the computer equipment used for content review can read the audio-carrying file from the database.
  • the first audio data may be separated from the file carrying the audio for content auditing, and for the first audio data, a hash feature may be calculated for the first audio data as the first hash feature.
  • the first audio data can be converted into a first spectrogram, a first key point can be searched on a plurality of spectral bands of the first spectrogram according to the energy, and based on the first key point A first hash feature of the first audio data is generated.
  • Step 402 look up the currently configured blacklist.
  • some audio data containing sensitive content such as pornography, vulgarity, violence, etc. may be recorded in the blacklist as second audio data.
  • the second audio data can be continuously expanded.
  • a hash feature may be calculated for the second audio data as the second hash feature.
  • the second audio data can be converted into a second spectrogram, a second key point is searched on a plurality of spectral bands of the second spectrogram according to the energy, and based on the second key point A second hash feature of the second audio data is generated.
  • a plurality of second audio data are recorded in the blacklist, and each second audio data has been configured with a second hash feature, and the second hash feature may be loaded during content review.
  • Step 403 Determine the order of arrangement among the plurality of second audio data according to the density of the plurality of second hash features.
  • the magnitude of the first audio data uploaded by the client every day can reach tens of millions or even hundreds of millions.
  • the magnitude of the first audio data belonging to the blacklist is about several thousand, which makes the matching rate of the blacklist low.
  • the matching rate of the blacklist is about 0.005%.
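The matching rate can be checked with a small calculation. The sketch below is illustrative only: the text gives orders of magnitude rather than exact counts, so the figures of 5,000 matching uploads and 100 million daily uploads, and the function name `blacklist_match_rate`, are assumptions chosen to be consistent with the stated rate.

```python
def blacklist_match_rate(matching_uploads: int, total_uploads: int) -> float:
    """Return the share of uploads that hit the blacklist, as a percentage."""
    return 100.0 * matching_uploads / total_uploads


# Assumed figures: 5,000 matching uploads out of 100 million uploads per day.
rate_percent = blacklist_match_rate(5_000, 100_000_000)

# Share of uploads that match nothing in the blacklist.
non_matching_percent = 100.0 - rate_percent
```

Under these assumed figures almost every upload is compared against the blacklist without producing a match, which is why the cost of each search matters.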
  • the multimedia platform needs a queue system with low time consumption and high precision to capture the first audio data belonging to the blacklist as much as possible.
  • the baseline method compares the first audio data with all the second audio data in the blacklist. Although the accuracy is high, the time complexity is O(N) and the time consumption is high. Because 99.995% of the first audio data does not match any second audio data, most of these comparisons are unnecessary, which makes this an inefficient search method.
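A minimal sketch of the baseline method follows. The text does not state how two hash features are matched, so the set representation, the overlap threshold, and the names `is_match` and `baseline_search` are assumptions made for illustration.

```python
from typing import Dict, Optional, Set, Tuple


def is_match(first_hashes: Set[int], second_hashes: Set[int],
             threshold: float = 0.5) -> bool:
    """Assumed rule: enough of the second hash feature appears in the first."""
    if not second_hashes:
        return False
    shared = len(first_hashes & second_hashes)
    return shared / len(second_hashes) >= threshold


def baseline_search(first_hashes: Set[int],
                    blacklist: Dict[str, Set[int]]) -> Tuple[Optional[str], int]:
    """Compare against every entry; return (first match or None, comparisons)."""
    found: Optional[str] = None
    comparisons = 0
    for audio_id, second_hashes in blacklist.items():
        comparisons += 1  # every entry is visited, hence O(N)
        if found is None and is_match(first_hashes, second_hashes):
            found = audio_id
    return found, comparisons


blacklist = {"a": {1, 2, 3, 4}, "b": {10, 11, 12, 13}}
hit, hit_cost = baseline_search({10, 11, 12, 99}, blacklist)
miss, miss_cost = baseline_search({100}, blacklist)
```

In this sketch the number of comparisons always equals the size of the blacklist, whether or not a match exists.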
  • Queue System A: arrange the second audio data according to the absolute number of the second hash features (Absolute Matches).
  • Queue System B: arrange the second audio data according to the number of the second hash features normalized by the duration of the second audio data (Normalized by Duration).
  • This embodiment proposes a queue system C that allows pruning to more accurately select second audio data in the pruning queue using the density of the second hash feature while maintaining efficiency.
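For comparison with queue system C, the two reference orderings, queue systems A and B, might be sketched as follows. The `BlacklistEntry` type, the field names, and the descending sort direction are assumptions; the text only names the quantity each queue is arranged by.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlacklistEntry:
    audio_id: str
    hash_count: int    # number of second hash features
    duration_s: float  # duration of the second audio data, in seconds


def order_queue_a(entries: List[BlacklistEntry]) -> List[str]:
    """Queue system A: order by the absolute number of hash features."""
    ranked = sorted(entries, key=lambda e: e.hash_count, reverse=True)
    return [e.audio_id for e in ranked]


def order_queue_b(entries: List[BlacklistEntry]) -> List[str]:
    """Queue system B: order by hash features per second of audio."""
    ranked = sorted(entries, key=lambda e: e.hash_count / e.duration_s,
                    reverse=True)
    return [e.audio_id for e in ranked]


entries = [BlacklistEntry("long", 300, 100.0), BlacklistEntry("short", 120, 10.0)]
queue_a = order_queue_a(entries)
queue_b = order_queue_b(entries)
```

The example shows how the two keys can disagree: a long recording leads under A because it has more hash features in total, while a short recording leads under B because it has more per second.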
  • step 403 includes the following steps:
  • Step 4031: Count the number of overlapping second hash features in multiple local regions.
  • a second spectrogram of the second audio data can be obtained; multiple windows are added on the second spectrogram; the number of second hash features is counted in each of the windows, as the number of overlapping second hash features in the multiple local regions.
  • the width of the window is less than or equal to the preset time length.
  • Step 4032: Generate the density of the second hash feature in the second audio data to which it belongs based on the number of overlaps in the multiple local regions.
  • the numbers of overlaps in the multiple local regions can be compared; the number of overlaps in the local region with the largest number of overlaps is determined as the density of the second hash feature in the second audio data to which it belongs.
  • Step 4033: Sort the plurality of second audio data in descending order according to the density to obtain the order of the plurality of second audio data.
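Steps 4031 to 4033 can be sketched as follows. This is a simplified illustration, not the implementation of the embodiment: each second hash feature is assumed to be reducible to a time stamp on the spectrogram, and the names `hash_density` and `order_by_density` and the default window parameters are hypothetical.

```python
from typing import Dict, List


def hash_density(hash_times_s: List[float], duration_s: float,
                 interval_s: float = 1.0, width_s: float = 1.0) -> int:
    """Largest number of hash features that fall inside any single window."""
    if width_s > interval_s:
        raise ValueError("window width must not exceed the preset interval")
    best = 0
    index = 0
    while index * interval_s < duration_s:
        start = index * interval_s  # a window is added every preset interval
        count = sum(1 for t in hash_times_s if start <= t < start + width_s)
        best = max(best, count)     # step 4032: keep the largest count
        index += 1
    return best


def order_by_density(hash_times: Dict[str, List[float]],
                     durations_s: Dict[str, float]) -> List[str]:
    """Step 4033: sort second audio data by density, in descending order."""
    densities = {audio_id: hash_density(times, durations_s[audio_id])
                 for audio_id, times in hash_times.items()}
    return sorted(densities, key=densities.get, reverse=True)


density = hash_density([0.1, 0.2, 0.3, 1.5, 2.1, 2.2], duration_s=3.0)
order = order_by_density(
    {"sparse": [0.5, 1.5, 2.5], "dense": [0.1, 0.2, 0.3]},
    {"sparse": 3.0, "dense": 1.0},
)
```

Because the window width does not exceed the interval, the windows in this sketch never overlap, so each hash feature is counted at most once.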
  • Step 404: Compare the first hash feature with the plurality of second hash features in the order, to determine whether there is second audio data in the plurality of second audio data that is identical or similar to the first audio data.
  • the target position may be determined; the first hash feature is compared, in the order, with the second hash features located before the target position.
  • if the first hash feature matches a second hash feature, it is determined that the first audio data is the same as or similar to the second audio data to which that second hash feature belongs.
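The ordered, pruned comparison of step 404 might look like the sketch below. How the target position is chosen is not stated here, so it is passed in by the caller; the matching rule and the names `hashes_match` and `pruned_search` are likewise assumptions.

```python
from typing import List, Optional, Set, Tuple


def hashes_match(first: Set[int], second: Set[int],
                 threshold: float = 0.5) -> bool:
    """Assumed rule: enough of the second hash feature appears in the first."""
    return bool(second) and len(first & second) / len(second) >= threshold


def pruned_search(first_hashes: Set[int],
                  ordered: List[Tuple[str, Set[int]]],
                  target_position: int) -> Optional[str]:
    """Compare only with entries located before the target position."""
    for audio_id, second_hashes in ordered[:target_position]:
        if hashes_match(first_hashes, second_hashes):
            return audio_id  # same or similar second audio data found
    return None


# Entries are assumed to be already sorted by density, densest first.
ordered = [("dense", {1, 2, 3, 4}), ("mid", {5, 6, 7, 8}), ("sparse", {9, 10})]
found = pruned_search({5, 6, 7}, ordered, target_position=2)
pruned = pruned_search({9, 10}, ordered, target_position=2)
unpruned = pruned_search({9, 10}, ordered, target_position=3)
```

The second call illustrates the trade-off of pruning: an entry that lies at or after the target position is never compared, so a smaller target position saves time but can miss low-density entries.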
  • the baseline method, queue system A, queue system B, and queue system C are tested.
  • a test set consisting of 130 blacklisted second audio data and 1000 first audio data is used, of which 800 first audio data do not belong to the blacklist and 200 first audio data belong to the blacklist.
  • Queue system B can improve accuracy relative to queue system A, but at the expense of lowering the push rate.
  • Queue system C can provide high push rate and precision at the same time, and the time consumption is very small.
  • Step 405: If there is second audio data in the plurality of second audio data that is the same as or similar to the first audio data, determine that the first audio data is illegal.
  • if the first audio data is not the same as or similar to any second audio data in the blacklist, it can be determined that the first audio data is legal and passes the content review; other content reviews can then be performed according to business requirements, or the first audio data can be released to the public.
  • if the first audio data is the same as or similar to a certain second audio data in the blacklist, it can be determined that the first audio data is illegal, cannot pass the content review, and cannot be released to the public, and corresponding prompt information is generated and sent to the client.
  • users who log in to the client can be banned or frozen.
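The decision in step 405 can be expressed as a small function that maps the search result to a review outcome. The `ReviewResult` type, its field names, and the wording of the prompt are hypothetical and only illustrate the branch described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewResult:
    legal: bool                # True if the content review is passed
    matched_id: Optional[str]  # blacklist entry that matched, if any
    prompt: Optional[str]      # prompt information sent to the client


def review(matched_id: Optional[str]) -> ReviewResult:
    """Turn the result of the blacklist search into a review outcome."""
    if matched_id is None:
        # No same or similar second audio data: the upload may be released
        # or handed on to further reviews.
        return ReviewResult(legal=True, matched_id=None, prompt=None)
    return ReviewResult(
        legal=False,
        matched_id=matched_id,
        prompt=f"Upload rejected: audio matches blacklist entry {matched_id}.",
    )


passed = review(None)
rejected = review("entry-17")
```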
  • the second hash feature is calculated for the second audio data
  • the second audio data is sorted based on the density of the second hash feature
  • the first hash feature and the second hash feature are basically similar to the application of the first embodiment, so the description is relatively simple; for the relevant parts, refer to the description of the first embodiment, which is not repeated in detail here.
  • the first audio data uploaded by the client is received, and the first hash feature is calculated for the first audio data; the currently configured blacklist is searched, a plurality of second audio data are recorded in the blacklist, and each of the second audio data has been configured with a second hash feature; the order of arrangement among the plurality of second audio data is determined according to the density of the plurality of second hash features; the first hash feature is compared with the plurality of second hash features in the order to determine whether there is second audio data in the plurality of second audio data that is the same as or similar to the first audio data; if there is, it is determined that the first audio data is illegal.
  • denser hash features can improve the accuracy of comparison, and adjusting the ordering of the audio data by the density of the hash features can raise the probability of finding the same or similar audio data, so as to reduce the number of comparisons and improve the push rate of searching audio data.
  • FIG. 5 is a structural block diagram of an audio search apparatus provided in Embodiment 3 of the present application, including the following modules:
  • the audio data determination module 501 is configured to determine the first audio data and a plurality of second audio data
  • the hash feature calculation module 502 is configured to calculate a first hash feature for the first audio data and a second hash feature for a plurality of the second audio data respectively;
  • the order determination module 503 is configured to determine the order of arrangement among the plurality of the second audio data according to the density of the plurality of the second hash features;
  • the hash feature comparison module 504 is configured to compare the first hash feature with a plurality of the second hash features in the order to find the second audio data that is the same as or similar to the first audio data.
  • the audio data determination module 501 includes:
  • a first spectrogram conversion module configured to convert the first audio data into a first spectrogram
  • a first key point search module configured to search for a first key point on a plurality of frequency spectrum bands of the first spectrogram according to energy
  • a first hash feature generation module configured to generate a first hash feature of the first audio data based on the first key point
  • a second spectrogram conversion module configured to convert each second audio data into a second spectrogram
  • a second key point searching module configured to search for a second key point on a plurality of spectral bands of the second spectrogram according to energy
  • a second hash feature generation module configured to generate a second hash feature of each of the second audio data based on the second key point.
  • the ranking determining module 503 includes:
  • a local quantity statistics module set to count the overlapping quantity of each second hash feature in multiple local areas
  • a local density generation module configured to generate the density of each second hash feature in the second audio data to which it belongs based on the number of overlaps in a plurality of the local regions
  • the audio sequence determination module is configured to sort the plurality of second audio data in descending order according to the density to obtain the sequence of the plurality of second audio data.
  • the local quantity statistics module includes:
  • a spectrogram acquisition module configured to acquire a second spectrogram of the second audio data to which each second hash feature belongs
  • a window adding module configured to add multiple windows on the second spectrogram
  • the window number statistics module is configured to count the number of each second hash feature in a plurality of the windows respectively, as the number of each second hash feature in a plurality of local areas.
  • the window adding module includes:
  • a window search module configured to search for a preset window
  • a time adding module configured to add the window on the second spectrogram every preset time interval.
  • the width of the window is less than or equal to the length of the preset time.
  • the local density generation module includes:
  • a quantity comparison module configured to compare the overlapping quantities in a plurality of the local regions
  • the quantity value module is set to, if the overlapping quantity in a certain local area is the largest, determine the overlapping quantity in that local area as the density of each second hash feature in the second audio data to which it belongs.
  • the hash feature comparison module 504 includes:
  • the target position determination module is set to determine the target position
  • a partial feature comparison module configured to compare the first hash feature with the second hash feature located before the target position in the order
  • a search and determination module configured to determine that the first audio data and the second audio data to which the second hash feature belongs are identical or similar if the first hash feature matches the second hash feature.
  • the audio search apparatus provided by the embodiment of the present application can execute the audio search method provided by any embodiment of the present application, and has functional modules corresponding to the execution method.
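The module structure of FIG. 5 could be mirrored in software as a class whose collaborators are injected. This is a hedged sketch only: the class name `AudioSearchApparatus`, the callable parameters, and the toy functions in the usage lines are assumptions, and the audio data determination module 501 is represented simply by the arguments passed to `search`.

```python
from typing import Callable, Dict, Optional, Sequence, Set


class AudioSearchApparatus:
    """Illustrative composition of modules 502 to 504."""

    def __init__(self,
                 compute_hash: Callable[[Sequence[int]], Set[int]],
                 density_of: Callable[[Set[int]], float],
                 matches: Callable[[Set[int], Set[int]], bool]) -> None:
        self.compute_hash = compute_hash  # hash feature calculation module 502
        self.density_of = density_of      # used by order determination module 503
        self.matches = matches            # used by hash feature comparison module 504

    def search(self, first_audio: Sequence[int],
               second_audios: Dict[str, Sequence[int]]) -> Optional[str]:
        first_hash = self.compute_hash(first_audio)
        second_hashes = {audio_id: self.compute_hash(audio)
                         for audio_id, audio in second_audios.items()}
        # Module 503: arrange the second audio data by descending density.
        order = sorted(second_hashes,
                       key=lambda audio_id: self.density_of(second_hashes[audio_id]),
                       reverse=True)
        # Module 504: compare in that order and stop at the first match.
        for audio_id in order:
            if self.matches(first_hash, second_hashes[audio_id]):
                return audio_id
        return None


# Toy stand-ins: a "hash feature" is a set, its "density" is its size.
apparatus = AudioSearchApparatus(set, len, lambda a, b: len(a & b) >= 2)
library = {"x": [1, 2, 3], "y": [4, 5, 6, 7]}
match = apparatus.search([5, 6], library)
no_match = apparatus.search([9], library)
```

Injecting the three callables keeps the ordering logic independent of any particular fingerprinting or density scheme.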
  • FIG. 6 is a structural block diagram of an audio search apparatus provided in Embodiment 4 of the present application, including the following modules:
  • the audio data receiving module 601 is configured to receive the first audio data uploaded by the client, and calculate the first hash feature for the first audio data;
  • the blacklist search module 602 is configured to search for a currently configured blacklist, where a plurality of second audio data are recorded in the blacklist, and a second hash feature has been configured for the plurality of second audio data;
  • an order determination module 603, configured to determine the order of arrangement among a plurality of the second audio data according to the density of the plurality of second hash features
  • the hash feature comparison module 604 is configured to compare the first hash feature with a plurality of the second hash features in the order to determine whether there is second audio data in the plurality of the second audio data that is the same as or similar to the first audio data;
  • the illegal audio determination module 605 is configured to determine that the first audio data is illegal if there is second audio data in the plurality of second audio data that is the same as or similar to the first audio data.
  • the audio data receiving module 601 includes:
  • a first spectrogram conversion module configured to convert the first audio data into a first spectrogram
  • a first key point search module configured to search for a first key point on a plurality of frequency spectrum bands of the first spectrogram according to energy
  • a first hash feature generation module configured to generate a first hash feature of the first audio data based on the first key point.
  • a second spectrogram conversion module configured to convert each second audio data into a second spectrogram
  • a second key point searching module configured to search for a second key point on a plurality of spectral bands of the second spectrogram according to energy
  • a second hash feature generation module configured to generate a second hash feature of each of the second audio data based on the second key point.
  • the ranking determining module 603 includes:
  • a local quantity statistics module set to count the overlapped quantity of each second hash feature in multiple local areas
  • a local density generation module configured to generate the density of each second hash feature in the second audio data to which it belongs based on the number of overlaps in a plurality of the local regions
  • the audio sequence determination module is configured to sort the plurality of second audio data in descending order according to the density to obtain the sequence of the plurality of second audio data.
  • the local quantity statistics module includes:
  • a spectrogram acquisition module configured to acquire a second spectrogram of the second audio data to which each second hash feature belongs
  • a window adding module configured to add multiple windows on the second spectrogram
  • the window number statistics module is configured to count the number of each second hash feature in a plurality of the windows respectively, as the number of each second hash feature in a plurality of local areas.
  • the window adding module includes:
  • a window search module configured to search for a preset window
  • a time adding module configured to add the window on the second spectrogram every preset time interval.
  • the width of the window is less than or equal to the length of the preset time.
  • the local density generation module includes:
  • a quantity comparison module configured to compare the overlapping quantities in a plurality of the local regions
  • the quantity value module is set to, if the overlapping quantity in a certain local area is the largest, determine the overlapping quantity in that local area as the density of each second hash feature in the second audio data to which it belongs.
  • the hash feature comparison module 604 includes:
  • the target position determination module is set to determine the target position
  • a partial feature comparison module configured to compare the first hash feature with the second hash feature located before the target position in the order
  • a search and determination module configured to determine that the first audio data and the second audio data to which the second hash feature belongs are identical or similar if the first hash feature matches the second hash feature.
  • the audio search apparatus provided by the embodiment of the present application can execute the audio search method provided by any embodiment of the present application, and has functional modules corresponding to the execution method.
  • the fifth embodiment of the present application provides a computer device, in which the audio search apparatus provided by any one of the embodiments of the present application can be integrated.
  • FIG. 7 is a schematic structural diagram of a computer device according to Embodiment 5 of the present application.
  • the computer device includes at least one processor 701 and a memory 702, and the memory 702 is configured to store at least one program.
  • when the at least one program is executed by the at least one processor 701, the at least one processor 701 implements the audio search method described in any embodiment of the present application.
  • Embodiment 6 of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, each process of the above audio search method is implemented.

Abstract

Embodiments of the present application provide an audio search method and apparatus, a computer device, and a storage medium. The method comprises: determining first audio data and a plurality of pieces of second audio data; calculating a first hash feature for the first audio data and calculating second hash features for the plurality of pieces of second audio data, respectively; determining a sequence of arrangement of the plurality of pieces of second audio data according to densities of the plurality of second hash features; and comparing the first hash feature with the plurality of second hash features according to the sequence to search for second audio data the same as or similar to the first audio data.

Description

Audio search method and apparatus, computer device, and storage medium
This application claims priority to Chinese Patent Application No. 202110119351.4, filed with the China Patent Office on January 28, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of audio processing, for example, to an audio search method and apparatus, a computer device, and a storage medium.
Background
With the rapid development of the Internet, and especially the widespread adoption of mobile terminals, users can easily create multimedia data, for example by making short videos, humming songs, or making recordings. As a result, multimedia data on the Internet is growing rapidly, and audio data is growing rapidly with it.
In business scenarios such as song search and voice content review, audio data is compared to determine whether it is the same or similar.
Because the volume of audio data is large, the audio data is usually sorted by a queuing system and then compared in order.
In a queuing system, the baseline method is usually used: the audio data is sorted without reference to any specific criterion and is compared one by one. Although the accuracy is high, this occupies many resources and takes a long time, so the overall efficiency is low.
Summary of the Invention
The embodiments of the present application propose an audio search method and apparatus, a computer device, and a storage medium, to solve the problem of how to improve the efficiency of comparing audio data while maintaining the accuracy of the comparison.
In a first aspect, an embodiment of the present application provides an audio search method, including:
determining first audio data and a plurality of second audio data;
calculating a first hash feature for the first audio data and calculating second hash features for the plurality of second audio data, respectively;
determining an order of arrangement among the plurality of second audio data according to the density of the plurality of second hash features;
comparing the first hash feature with the plurality of second hash features in the order, to find the second audio data that is the same as or similar to the first audio data.
In a second aspect, an embodiment of the present application further provides an audio search method, including:
receiving first audio data uploaded by a client, and calculating a first hash feature for the first audio data;
looking up a currently configured blacklist, where a plurality of second audio data are recorded in the blacklist and each of the second audio data has been configured with a second hash feature;
determining an order of arrangement among the plurality of second audio data according to the density of the plurality of second hash features;
comparing the first hash feature with the plurality of second hash features in the order, to determine whether there is second audio data in the plurality of second audio data that is the same as or similar to the first audio data;
in response to there being second audio data in the plurality of second audio data that is the same as or similar to the first audio data, determining that the first audio data is illegal.
In a third aspect, an embodiment of the present application further provides an audio search apparatus, including:
an audio data determination module, configured to determine first audio data and a plurality of second audio data;
a hash feature calculation module, configured to calculate a first hash feature for the first audio data and calculate second hash features for the plurality of second audio data, respectively;
an order determination module, configured to determine an order of arrangement among the plurality of second audio data according to the density of the plurality of second hash features;
a hash feature comparison module, configured to compare the first hash feature with the plurality of second hash features in the order, to find the second audio data that is the same as or similar to the first audio data.
In a fourth aspect, an embodiment of the present application further provides an audio search apparatus, including:
an audio data receiving module, configured to receive first audio data uploaded by a client, and calculate a first hash feature for the first audio data;
a blacklist search module, configured to look up a currently configured blacklist, where a plurality of second audio data are recorded in the blacklist and each of the second audio data has been configured with a second hash feature;
an order determination module, configured to determine an order of arrangement among the plurality of second audio data according to the density of the plurality of second hash features;
a hash feature comparison module, configured to compare the first hash feature with the plurality of second hash features in the order, to determine whether there is second audio data in the plurality of second audio data that is the same as or similar to the first audio data;
an illegal audio determination module, configured to determine that the first audio data is illegal in response to there being second audio data in the plurality of second audio data that is the same as or similar to the first audio data.
In a fifth aspect, an embodiment of the present application further provides a computer device, the computer device including:
at least one processor;
a memory, configured to store at least one program,
where, when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the audio search method according to the first aspect or the second aspect.
In a sixth aspect, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the audio search method according to the first aspect or the second aspect is implemented.
Brief Description of the Drawings
FIG. 1 is a flowchart of an audio search method provided in Embodiment 1 of the present application;
FIG. 2 is an example diagram of calculating the density of the second hash feature provided in Embodiment 1 of the present application;
FIG. 3A is an example diagram of a short audio search provided in Embodiment 1 of the present application;
FIG. 3B is an example diagram of a long audio search provided in Embodiment 1 of the present application;
FIG. 4 is a flowchart of an audio search method provided in Embodiment 2 of the present application;
FIG. 5 is a schematic structural diagram of an audio search apparatus provided in Embodiment 3 of the present application;
FIG. 6 is a schematic structural diagram of an audio search apparatus provided in Embodiment 4 of the present application;
FIG. 7 is a schematic structural diagram of a computer device provided in Embodiment 5 of the present application.
Detailed Description
The present application is described in detail below with reference to the accompanying drawings and embodiments.
Embodiment 1
FIG. 1 is a flowchart of an audio search method provided in Embodiment 1 of the present application. This embodiment is applicable to sorting and comparing audio data according to the density of the hash features of the audio data. The method can be performed by an audio search apparatus, which can be implemented in software and/or hardware and can be configured in a computer device such as a server, a workstation, or a personal computer. The method includes the following steps:
Step 101: Determine first audio data and a plurality of second audio data.
In this embodiment, the first audio data and the plurality of second audio data are all audio data. The audio data may take the form of songs released by singers, audio data separated from video data such as short videos, movies, and TV dramas, voice signals recorded by users on mobile terminals, and so on. The format of the audio data may include MP3, WMA, and AAC, which is not limited in this embodiment.
Exemplarily, the plurality of second audio data are audio data collected in advance in various ways, for example, audio data uploaded by users, audio data purchased from copyright owners, audio data recorded by technicians, or audio data crawled from the network with a crawler client. The plurality of second audio data can form an audio library and can provide a search service externally. The first audio data is the audio data to be searched for, that is, the audio library is searched for second audio data that is the same as or similar to the first audio data.
Because of the influence of compression, cropping, and background noise, "the same or similar" in this embodiment may mean that all or part of the content of the first audio data is the same as or similar to that of the second audio data.
Step 102: Calculate a first hash feature for the first audio data and calculate second hash features for the plurality of second audio data, respectively.
For the first audio data, a hash feature (also called a fingerprint) can be calculated and used as the feature of the first audio data. For ease of distinction, this hash feature is denoted as the first hash feature.
For the second audio data, a hash feature (also called a fingerprint) can be calculated and used as the feature of the second audio data. For ease of distinction, this hash feature is denoted as the second hash feature.
In general, the first hash feature and the second hash feature are calculated in the same way, that is, the first hash feature is calculated for the first audio data and the second hash features are calculated for the plurality of second audio data based on the same method.
In an embodiment of the present application, step 102 may include the following steps:
Step 1021: Convert the first audio data into a first spectrogram.
In this embodiment, the first audio data may be converted into a spectrogram by means of the Fourier transform (discrete Fourier transform, DFT), the short-time Fourier transform (also called the short-term Fourier transform, STFT), and so on. The horizontal axis of the spectrogram is time and the vertical axis is frequency, so that the first audio data is converted from a time-domain signal into a frequency-domain signal. For ease of distinction, this spectrogram is denoted as the first spectrogram.
Converting a time-domain signal into a frequency-domain signal loses the time information. Therefore, data blocks (also called windows) can be used: a long segment of first audio data in the time domain is divided into a plurality of first data blocks, and each first data block is converted into a frequency-domain signal separately, so that the time information is preserved to a certain extent.
For example, suppose the first audio data is two-channel, 16-bit, and sampled at 44100 Hz. The data size of 1 s is then 44100 × 2 bytes × 2 channels ≈ 176 kB. If 4 kB is chosen as the data block size, the Fourier transform is performed on 44 data blocks every second, and this segmentation density can meet the requirements.
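The arithmetic in this example can be reproduced with a short calculation. The function name `blocks_per_second` is hypothetical, and 4 kB is interpreted as 4000 bytes, which is an assumption that is consistent with the stated result of about 44 blocks.

```python
from typing import Tuple


def blocks_per_second(sample_rate_hz: int, bytes_per_sample: int,
                      channels: int, block_bytes: int) -> Tuple[int, float]:
    """Return (bytes of audio per second, data blocks per second)."""
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    return bytes_per_second, bytes_per_second / block_bytes


# 16-bit precision means 2 bytes per sample; 4 kB is taken as 4000 bytes.
bytes_per_second, blocks = blocks_per_second(44100, 2, 2, 4000)
```

Here one second of audio occupies 176400 bytes, and dividing it into 4000-byte blocks gives about 44 blocks per second.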
Step 1022: Search for first key points on multiple spectral bands of the first spectrogram according to energy.
The frequencies at which the first audio data has a large amplitude may span a wide range, anywhere from low C (32.70 Hz) to high C (4186.01 Hz). To avoid analyzing the entire first spectrogram and to reduce the amount of computation, the first spectrogram may be divided into multiple spectral bands (also called sub-bands).
Key points, namely frequency peaks, are selected from each sub-band. For example, the following sub-bands may be chosen: the bass sub-bands are 30 Hz-40 Hz, 40 Hz-80 Hz, and 80 Hz-120 Hz (the fundamental frequencies of instruments such as the bass guitar fall in the bass sub-bands), and the mid-range and treble sub-bands are 120 Hz-180 Hz and 180 Hz-300 Hz, respectively (the fundamental frequencies of the human voice and most other instruments fall in these two sub-bands).
Points with greater energy (that is, greater amplitude on the first spectrogram) are more robust to noise. Therefore, for each sub-band, key points may be selected according to energy; for ease of distinction, they are referred to as first key points.
Typically, the point in each sub-band whose frequency component is strongest (that is, whose energy is largest) is selected as the first key point.
Step 1023: Generate a first hash feature of the first audio data based on the first key points.
The first key points of each data block constitute the signature of that frame of audio data, and the signatures of the different data blocks constitute the first hash feature of the entire first audio data.
The first hash feature of the first audio data may be cached in memory until it is compared with the second hash features of the second audio data.
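Steps 1021 to 1023 can be sketched as follows. This is a minimal illustration rather than the method of the application: the function names are hypothetical, a naive single-bin DFT stands in for a real FFT, the sub-bands are treated as half-open intervals, and the demo uses a toy sample rate of 1024 Hz with 1024-sample blocks so that each DFT bin is exactly 1 Hz wide.

```python
import cmath
import math

# Sub-bands named in the text, in Hz; treated here as half-open [low, high).
SUB_BANDS_HZ = [(30, 40), (40, 80), (80, 120), (120, 180), (180, 300)]


def bin_magnitude(block, k):
    """Magnitude of DFT bin k of one data block (naive single-bin DFT)."""
    n = len(block)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(block))
    return abs(acc)


def block_signature(block, sample_rate_hz):
    """Strongest frequency (Hz, rounded) in each sub-band of one data block."""
    hz_per_bin = sample_rate_hz / len(block)
    signature = []
    for low_hz, high_hz in SUB_BANDS_HZ:
        bins = range(math.ceil(low_hz / hz_per_bin), math.ceil(high_hz / hz_per_bin))
        if not bins:
            raise ValueError("block too short to resolve this sub-band")
        # Key point: the bin with the largest magnitude (energy) in the band.
        peak_bin = max(bins, key=lambda k: bin_magnitude(block, k))
        signature.append(round(peak_bin * hz_per_bin))
    return tuple(signature)


def fingerprint(samples, sample_rate_hz, block_size):
    """Split samples into consecutive blocks; return (start time, signature) pairs."""
    features = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        features.append((start / sample_rate_hz, block_signature(block, sample_rate_hz)))
    return features


# Demo: one second of five pure tones, one per sub-band.
RATE = 1024
TONES_HZ = [35, 60, 100, 150, 250]
signal = [sum(math.sin(2 * math.pi * f * i / RATE) for f in TONES_HZ) for i in range(RATE)]
features = fingerprint(signal, RATE, block_size=RATE)
print(features)
```

Each signature has the same shape as the hash tags in the lookup table of this embodiment: one peak frequency per sub-band.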
Step 1024: Convert the second audio data into a second spectrogram.
In this embodiment, the second audio data may be converted into a spectrogram by a Fourier transform, a short-time Fourier transform, or a similar method. The horizontal axis of the spectrogram is time and the vertical axis is frequency, so the second audio data is converted from a time-domain signal into a frequency-domain signal. For ease of distinction, this spectrogram is referred to as the second spectrogram.
Converting a time-domain signal into a frequency-domain signal loses time information. Therefore, a long segment of second audio data in the time domain may be divided into multiple data blocks (also called windows), and each data block is converted into a frequency-domain signal separately, which preserves time information to a certain extent.
Step 1025: Search for second key points on multiple spectral bands of the second spectrogram according to energy.
The frequencies at which the second audio data has a large amplitude may span a wide range, anywhere from low C (32.70 Hz) to high C (4186.01 Hz). To avoid analyzing the entire second spectrogram and to reduce the amount of computation, the second spectrogram may be divided into multiple spectral bands (also called sub-bands).
Key points, namely frequency peaks, are selected from each sub-band. For example, the following sub-bands may be chosen: the bass sub-bands are 30 Hz-40 Hz, 40 Hz-80 Hz, and 80 Hz-120 Hz (the fundamental frequencies of instruments such as the bass guitar fall in the bass sub-bands), and the mid-range and treble sub-bands are 120 Hz-180 Hz and 180 Hz-300 Hz, respectively (the fundamental frequencies of the human voice and most other instruments fall in these two sub-bands).
Points with greater energy (that is, greater amplitude on the second spectrogram) are more robust to noise. Therefore, for each sub-band, key points may be selected according to energy; for ease of distinction, they are referred to as second key points.
Typically, the point in each sub-band whose frequency component is strongest (that is, whose energy is largest) is selected as the second key point.
Step 1026: Generate a second hash feature of the second audio data based on the second key points.
The second key points of each data block constitute the signature of that frame of audio data, and the signatures of the different data blocks constitute the second hash feature of the entire second audio data.
The second hash feature of the second audio data may be stored as the key of a hash table used for retrieval. For ease of lookup, the second hash feature usually serves as the key, and the value it points to includes the time at which the second hash feature appears in the second audio data and the ID of that second audio data.
Second hash feature (Hash Tag) | Time (Time in Seconds) | Second audio data (Song)
30 51 99 121 195 | 53.52 | Song A
33 56 92 151 185 | 12.32 | Song B
39 26 89 141 251 | 15.34 | Song C
32 67 100 128 270 | 78.43 | Song D
30 51 99 121 195 | 10.89 | Song E
34 57 95 111 200 | 54.52 | Song A
34 41 93 161 202 | 11.89 | Song E
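The table above can be held in an ordinary hash table keyed by the hash tag. The sketch below uses a Python dictionary; `build_index` and the tuple layout are illustrative choices, not names from the application. Because the same tag can occur in more than one song, each key maps to a list of (time, song) entries.

```python
from collections import defaultdict

# Rows of the table: (hash tag, time in seconds, song ID).
ROWS = [
    ((30, 51, 99, 121, 195), 53.52, "Song A"),
    ((33, 56, 92, 151, 185), 12.32, "Song B"),
    ((39, 26, 89, 141, 251), 15.34, "Song C"),
    ((32, 67, 100, 128, 270), 78.43, "Song D"),
    ((30, 51, 99, 121, 195), 10.89, "Song E"),
    ((34, 57, 95, 111, 200), 54.52, "Song A"),
    ((34, 41, 93, 161, 202), 11.89, "Song E"),
]


def build_index(rows):
    """Map each hash tag to every (time, song ID) at which it occurs."""
    index = defaultdict(list)
    for hash_tag, time_s, song_id in rows:
        index[hash_tag].append((time_s, song_id))
    return index


index = build_index(ROWS)
hits = index[(30, 51, 99, 121, 195)]
print(hits)  # [(53.52, 'Song A'), (10.89, 'Song E')]
```

The lookup returns both Song A and Song E, which shows why a single matching tag is not enough to identify a song and why the number and density of matches matter in the following steps.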
Of course, the above methods for calculating the first hash feature and the second hash feature are only examples. When implementing the embodiments of the present application, other methods for calculating the first hash feature and the second hash feature may be set according to the actual situation, and those skilled in the art may also adopt other such methods according to actual needs; the embodiments of the present application do not limit this.
Step 103: Determine the order in which the multiple pieces of second audio data are arranged according to the densities of the multiple second hash features.
When hash features are dense, comparison of hash features is more accurate. When hash features are sparse, comparison is less accurate, and different or dissimilar audio data is easily mistaken for identical or similar audio data.
In this embodiment, a density may be computed from the second hash features of each piece of second audio data to characterize how densely those features are packed. In the queuing system, the density of the second hash features serves as the criterion: the multiple pieces of second audio data are sorted by the density of their second hash features, which determines their order.
In one embodiment of the present application, the density of the second hash features of the second audio data is a local density. In this embodiment, step 103 includes the following steps:
Step 1031: Count the number of second hash features that overlap in each of multiple local regions.
In this embodiment, the second audio data may be divided into multiple local regions of the same size. For each local region, the number of second hash features that overlap in that region is counted. Taking the local region as the unit area, this number can be regarded as the local density.
For example, a second spectrogram of the second audio data may be obtained. The second spectrogram is obtained by converting the second audio data from time-domain information into frequency-domain information, and the second hash features may be marked on it.
Multiple windows of the same size are added to the second spectrogram to delimit the local regions, and the number of second hash features is counted in each window as the number of second hash features in the corresponding local region.
Given second audio data A and a window of size k added at time t, the count in the local region (that is, the local density) is expressed as follows:
Figure PCTCN2022073291-appb-000001
Here, i is the number of overlapping second hash features within the window (that is, from t to t+k).
For example, for the entire second spectrogram, a preset window may be looked up and a window added to the second spectrogram at every preset time interval, thereby dividing the second spectrogram into multiple local regions.
The window and the preset time interval may be related in either of the following two ways:
In one relationship, the width of the window equals the length of the preset time interval, that is, adjacent windows do not overlap, which reduces the amount of computation for the second hash features.
In the other relationship, the width of the window is smaller than the length of the preset time interval, that is, adjacent windows partially overlap, which can improve the precision of the second hash features.
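The windowed count of step 1031 can be sketched as below. The function name, the half-open window [t, t + k), and the choice to start windows at t = 0 are assumptions made for illustration. Window width and interval are passed as independent parameters, so the sketch takes no position on how they should relate.

```python
def local_counts(feature_times, duration, window, step):
    """Count hash features falling in windows [t, t + window) for t = 0, step, 2*step, ...

    feature_times: times (in seconds) at which overlapping hash features occur.
    duration:      length of the audio; windows start while t < duration.
    """
    counts = []
    n = 0
    while n * step < duration:
        t = n * step
        counts.append(sum(1 for ft in feature_times if t <= ft < t + window))
        n += 1
    return counts


# Hypothetical feature times for an 8-second clip.
times = [0.5, 1.0, 1.2, 1.4, 3.9, 7.0]
equal_case = local_counts(times, duration=8, window=2, step=2)
other_case = local_counts(times, duration=8, window=2, step=1)
print(equal_case)  # [4, 1, 0, 1]
print(other_case)  # [4, 3, 1, 1, 0, 0, 1, 1]
```

Each entry of the returned list plays the role of D(A, t) for one window position t.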
Step 1032: Generate the density of the second hash features in the second audio data to which they belong, based on the overlap counts in the multiple local regions.
Once the number of overlapping second hash features has been counted in the multiple local regions, these counts may be used as a reference to generate the density of the second hash features in the second audio data.
In one example, the overlap counts of the multiple local regions are compared, and the largest overlap count among the local regions is taken as the density of the second hash features in the second audio data to which they belong.
Given second audio data A and a window (local region) added at time t, with the count in that window denoted D(A, t), the density D(A) of the second hash features in the second audio data is:
$$D(A) = \max_{t} D(A, t)$$
Here, max is the function that takes the maximum value.
In one example, as shown in FIG. 2, windows 201, 202, 203, 204, 205, 206, and 207 are added to the second spectrogram of a piece of second audio data. Window 203 contains the highest number of overlapping second hash features, so the count in window 203 may be selected as the density of the second hash features in that second audio data.
Of course, the above method for calculating the density of the second hash features is only an example. When implementing the embodiments of the present application, other methods may be set according to the actual situation. For example, the overlap counts of the local regions may be sorted in descending order, and the average of the counts in the top j local regions (j is a positive integer) taken as the density of the second hash features in the second audio data. Those skilled in the art may also adopt other methods for calculating the density of the second hash features according to actual needs; the embodiments of the present application do not limit this.
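Both density definitions mentioned here, the maximum local count and the mean of the top j local counts, are short enough to sketch directly. The function names are illustrative, and the input list stands for the per-window counts D(A, t).

```python
def density_max(counts):
    """Density as the largest local count: D(A) = max over t of D(A, t)."""
    return max(counts)


def density_top_j_mean(counts, j):
    """Alternative density: mean of the j largest local counts (j is a positive integer)."""
    if j < 1:
        raise ValueError("j must be a positive integer")
    top = sorted(counts, reverse=True)[:j]
    return sum(top) / len(top)


# Hypothetical per-window counts for one piece of second audio data.
counts = [4, 1, 0, 1]
print(density_max(counts))            # 4
print(density_top_j_mean(counts, 2))  # 2.5
```

The top-j mean is less sensitive to a single unusually dense window than the maximum, at the cost of one extra parameter.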
Step 1033: Sort the multiple pieces of second audio data in descending order of density to obtain their order.
Once the density of the second hash features has been calculated for each piece of second audio data, the multiple pieces of second audio data may be sorted in descending order of that density to determine the position of each. That is, the greater the density of its second hash features, the earlier a piece of second audio data is placed; conversely, the smaller the density, the later it is placed.
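Step 1033 amounts to a descending sort on the density. A minimal sketch, with hypothetical clip IDs and density values:

```python
def order_by_density(densities):
    """Return audio IDs sorted by density, densest first."""
    return sorted(densities, key=lambda audio_id: densities[audio_id], reverse=True)


# Hypothetical densities for three pieces of second audio data.
densities = {"clip-1": 3, "clip-2": 9, "clip-3": 5}
queue = order_by_density(densities)
print(queue)  # ['clip-2', 'clip-3', 'clip-1']
```

Python's `sorted` is stable, so pieces with equal density keep their original relative order.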
Step 104: Compare the first hash feature with the multiple second hash features in that order to find second audio data that is identical or similar to the first audio data.
In this embodiment, the second hash feature of each piece of second audio data is compared with the first hash feature of the first audio data in turn, following the order in which the second audio data is arranged, to determine whether the first audio data and the second audio data are identical or similar.
For the current second audio data, if the difference between its second hash feature and the first hash feature of the first audio data is large, the similarity between the second audio data and the first audio data is considered low, the first hash feature does not match the second hash feature, and the search continues with the next second audio data.
For the current second audio data, if the difference between its second hash feature and the first hash feature of the first audio data is small, the similarity between the second audio data and the first audio data is considered high, and the first hash feature matches the second hash feature. Second audio data identical or similar to the first audio data has then been found, and the search may stop.
For example, a target position may be determined. The target position indicates how many pieces of second audio data are to be compared and is generally much smaller than the total number of pieces of second audio data.
The first hash feature is compared, in order, with the second hash features located before the target position.
If the first hash feature matches a second hash feature, the first audio data is determined to be identical or similar to the second audio data to which that second hash feature belongs.
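The ordered comparison with a target position can be sketched as follows. The text does not define how the difference between hash features is measured, so the measure used here (the fraction of query signatures also present in the candidate) and the threshold of 0.5 are assumptions chosen only to make the sketch runnable; all names are hypothetical.

```python
def match_ratio(query_hashes, candidate_hashes):
    """Fraction of distinct query signatures that also occur in the candidate (assumed measure)."""
    query = set(query_hashes)
    if not query:
        return 0.0
    return len(query & set(candidate_hashes)) / len(query)


def search(query_hashes, ordered_candidates, target_position, threshold=0.5):
    """Compare against the first `target_position` candidates; stop at the first match."""
    for audio_id, candidate_hashes in ordered_candidates[:target_position]:
        if match_ratio(query_hashes, candidate_hashes) >= threshold:
            return audio_id  # identical or similar audio found, stop searching
    return None  # no match before the target position


# Candidates already sorted by density (densest first); values are hypothetical.
ordered = [
    ("clip-2", [(30, 51, 99, 121, 195), (34, 57, 95, 111, 200)]),
    ("clip-3", [(33, 56, 92, 151, 185)]),
    ("clip-1", [(39, 26, 89, 141, 251)]),
]
query = [(33, 56, 92, 151, 185)]
found = search(query, ordered, target_position=2)
missed = search(query, ordered, target_position=1)
print(found, missed)  # clip-3 None
```

The second call shows the trade-off: a smaller target position saves comparisons but can miss a match that sits further back in the queue.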
In this embodiment, first audio data and multiple pieces of second audio data are determined; a first hash feature is calculated for the first audio data and a second hash feature is calculated for each piece of second audio data; the order in which the multiple pieces of second audio data are arranged is determined according to the densities of the second hash features; and the first hash feature is compared with the second hash features in that order to find second audio data that is identical or similar to the first audio data. Denser hash features improve the accuracy of the comparison. Adjusting the order of the audio data by hash feature density raises the probability that identical or similar audio data is found among the candidates compared first, so search accuracy improves while the number of comparisons decreases.
Assume that the number of pieces of second audio data is N (N is a positive integer). In the queuing system:
In the baseline method, the order of the second audio data follows no particular criterion. The first audio data is compared with the second audio data one by one, finding the matching second audio data is a matter of chance, and the search often takes a long time; the time complexity is O(N).
Therefore, the queuing system may be improved as follows:
1. Queue System A:
Queue System A arranges the second audio data by the absolute number of second hash features (Absolute Matches).
The second audio data is placed in a queue. Second audio data at the front of the queue is most likely to be the best match, and second audio data at the back of the queue is unlikely to be the correct match.
Queue System A can therefore provide a stopping criterion: if the first m pieces of second audio data in the queue have been compared and no second audio data matching the first audio data has been found, the search may stop and report that no second audio data matching the first audio data exists.
Here, m is a positive integer and m << N (m is much smaller than N).
The time complexity of Queue System A is therefore O(m), with O(m) << O(N).
2. Queue System B:
Although Queue System A saves time, it is effective only when the multiple pieces of second audio data have the same duration. When their durations differ greatly, accuracy drops.
For example, suppose second audio data A lasts 2 minutes and second audio data B lasts 30 minutes. Even if second audio data A is the correct match for the first audio data, second audio data B may have more second hash features than second audio data A simply because it is so long, so second audio data B is placed toward the front of the queue and second audio data A toward the back.
When m pieces of long second audio data exhibit this phenomenon (that is, frequent collisions with long audio), the match with second audio data A is lost from the queue.
To address this, Queue System B arranges the second audio data after normalizing by duration (Normalised by Duration), for example by dividing by the duration.
However, simply dividing by the duration of the second audio data leads to over-normalization, which lets longer second audio data re-enter the queue, so the correct second audio data may still be lost from the queue.
3. Queue System C:
This embodiment provides Queue System C, which normalizes and sorts according to the density of the second hash features, striking a balance between the absolute number of second hash features and over-normalization by duration.
To help those skilled in the art better understand the embodiments of the present application, Queue System A, Queue System B, and Queue System C are compared below in specific scenarios:
Scenario 1: Short audio search
The second audio data consists of Song A and Song B, the duration of Song A is shorter than that of Song B, and the second audio data that matches the first audio data is assumed to be Song A.
As shown in FIG. 3A, second hash features are marked on the second spectrogram of Song A and on the second spectrogram of Song B, and the following statistics are collected:
Figure PCTCN2022073291-appb-000003
Figure PCTCN2022073291-appb-000004
With Queue System A, the absolute number of second hash features in Song A (727) is smaller than that in Song B (913), so Song A is ranked after Song B.
With Queue System B, the duration-normalized value of Song A (0.198) is greater than that of Song B (0.033), so Song A is ranked before Song B.
With Queue System C, the density of the second hash features in Song A (0.266) is greater than that in Song B (0.067), so Song A is ranked before Song B.
Scenario 2: Long audio search
The second audio data consists of Song A and Song B, the duration of Song A is shorter than that of Song B, and the second audio data that matches the first audio data is assumed to be Song B.
As shown in FIG. 3B, second hash features are marked on the second spectrogram of Song A and on the second spectrogram of Song B, and the following statistics are collected:
Figure PCTCN2022073291-appb-000005
With Queue System A, the absolute number of second hash features in Song A (347) is smaller than that in Song B (2481), so Song A is ranked after Song B.
With Queue System B, the duration-normalized value of Song A (0.094) is greater than that of Song B (0.090), so Song A is ranked before Song B.
With Queue System C, the density of the second hash features in Song A (0.127) is smaller than that in Song B (0.182), so Song A is ranked after Song B.
It can be seen that the query matching Song B has a region of higher density. Song B is longer and has a greater absolute number of second hash features than Song A, and Queue System B overcompensates for duration. Queue System B works for Scenario 1 (short audio search) but not for Scenario 2 (long audio search), whereas Queue System C works for both.
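The two scenarios can be replayed with the figures quoted in the text. The sketch below ranks the two songs under each queue system and checks whether the correct song ends up at the front. The numbers come from the scenarios above; the data layout and function name are illustrative.

```python
# Ranking key per queue system: A = absolute number, B = duration-normalized value, C = density.
SCENARIOS = {
    "scenario 1 (short audio search)": {
        "correct": "Song A",
        "songs": {
            "Song A": {"A": 727, "B": 0.198, "C": 0.266},
            "Song B": {"A": 913, "B": 0.033, "C": 0.067},
        },
    },
    "scenario 2 (long audio search)": {
        "correct": "Song B",
        "songs": {
            "Song A": {"A": 347, "B": 0.094, "C": 0.127},
            "Song B": {"A": 2481, "B": 0.090, "C": 0.182},
        },
    },
}


def front_of_queue(songs, system):
    """Song with the largest ranking key under the given queue system."""
    return max(songs, key=lambda name: songs[name][system])


# True where the queue system puts the correct song first.
outcome = {
    (scenario, system): front_of_queue(data["songs"], system) == data["correct"]
    for scenario, data in SCENARIOS.items()
    for system in "ABC"
}
for key, is_correct in outcome.items():
    print(key, is_correct)
```

Queue System A fails in scenario 1, Queue System B fails in scenario 2, and Queue System C places the correct song first in both, which matches the conclusion drawn in the text.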
Embodiment 2
FIG. 4 is a flowchart of an audio search method provided in Embodiment 2 of the present application. This embodiment is applicable to content review in which audio data is sorted and compared according to the density of its hash features. The method may be performed by an audio search apparatus, which may be implemented in software and/or hardware and configured in a computer device such as a server, a workstation, or a personal computer. The method includes the following steps:
Step 401: Receive first audio data uploaded by a client, and calculate a first hash feature for the first audio data.
In this embodiment, the computer device serves as a multimedia platform. On the one hand, it provides users with audio-based services, such as live programs, short videos, voice sessions, and video sessions. On the other hand, it receives audio-carrying files uploaded by users, such as live streaming data, short videos, and session information.
Different multimedia platforms may formulate video content review standards according to business, legal, and other factors. Before an audio-carrying file is published, its content is reviewed against these standards. Files that do not meet the standards, such as those containing pornographic, vulgar, or violent content, are filtered out, and files that meet the standards are published.
If real-time requirements are high, a streaming real-time system may be set up in the multimedia platform. The user uploads the audio-carrying file to the streaming real-time system in real time through the client, and the streaming real-time system transmits the file to the computer device used for content review.
If real-time requirements are low, a database, such as a distributed database, may be set up in the multimedia platform. The user uploads the audio-carrying file to the database through the client, and the computer device used for content review reads the file from the database.
In this embodiment, the first audio data may be separated from the audio-carrying file for content review, and a hash feature may be calculated for the first audio data as the first hash feature.
In one way of calculating the first hash feature, the first audio data is converted into a first spectrogram, first key points are searched for on multiple spectral bands of the first spectrogram according to energy, and the first hash feature of the first audio data is generated based on the first key points.
Step 402: Look up the currently configured blacklist.
In this embodiment, audio data containing sensitive content such as pornography, vulgarity, or violence may be recorded in the blacklist as second audio data. Because such audio data mutates into different forms, the second audio data in the blacklist can be expanded continuously.
When second audio data is collected and recorded in the blacklist, a hash feature may be calculated for it as the second hash feature.
In one way of calculating the second hash feature, the second audio data is converted into a second spectrogram, second key points are searched for on multiple spectral bands of the second spectrogram according to energy, and the second hash feature of the second audio data is generated based on the second key points.
The blacklist therefore records multiple pieces of second audio data, each already configured with a second hash feature, and the second hash features only need to be loaded during content review.
Step 403: Determine the order in which the multiple pieces of second audio data are arranged according to the densities of the multiple second hash features.
On a multimedia platform, the amount of first audio data uploaded by clients each day can reach tens of millions or even hundreds of millions of items. Of this large volume, only a few thousand items belong to the blacklist, so the blacklist match rate is low.
Taking 80 million items of first audio data on one multimedia platform on one day as an example, the blacklist match rate is about 0.005%.
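The example figures are consistent with each other, as a quick check shows. The sketch uses exact fractions to avoid floating-point rounding; the variable names are illustrative.

```python
from fractions import Fraction

uploads_per_day = 80_000_000       # items of first audio data in one day
match_rate = Fraction(5, 100_000)  # 0.005 % expressed as an exact fraction

expected_matches = uploads_per_day * match_rate  # items that hit the blacklist
unmatched_share = 1 - match_rate                 # share of items with no match

print(expected_matches)              # 4000
print(float(unmatched_share * 100))  # 99.995
```

About 4000 of the 80 million items match the blacklist, which is the "few thousand" mentioned above, and the remaining 99.995% match nothing.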
因此,多媒体平台需要一个耗时低、精度高的队列系统来尽可能地捕获属于黑名单的第一音频数据。Therefore, the multimedia platform needs a queue system with low time consumption and high precision to capture the first audio data belonging to the blacklist as much as possible.
基线方法(baseline method)使用第一音频数据与黑名单中所有的第二音频数进行比较,虽然准确率高,但是时间复杂度为O(N),耗时较高,这是不必要的,因为,有99.995%的第一音频数据是没有匹配到第二音频数据的,这是低效的搜索方法。The baseline method uses the first audio data to compare with all the second audio numbers in the blacklist. Although the accuracy rate is high, the time complexity is O(N) and the time-consuming is high, which is unnecessary. Because 99.995% of the first audio data does not match the second audio data, this is an inefficient search method.
其他队列系统,如队列系统A(Queue System A,按照第二哈希特征的绝对数(Absolute Matches)来排列第二音频数据)和队列系统B(Queue System B,对第二音频数据的时长进行归一化(Normalised by Duration)来排列第二音频数据),通过优先推荐可能性较高的第二音频数据来提高效率。Other queuing systems, such as Queue System A (Queue System A, arrange the second audio data according to the absolute number of the second hash feature (Absolute Matches)) and Queue System B (Queue System B, perform the second audio data duration Normalized by Duration to arrange the second audio data), which improves the efficiency by preferentially recommending the second audio data with higher possibility.
但是,由于第二音频数据的时长并不一致,这些排队系统的准确率较低。However, these queuing systems are less accurate due to the inconsistent duration of the second audio data.
本实施例提出队列系统C,允许剪枝在保持效率的同时,使用第二哈希特征的密度在剪枝队列中更准确地选择第二音频数据。This embodiment proposes a queue system C that allows pruning to more accurately select second audio data in the pruning queue using the density of the second hash feature while maintaining efficiency.
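The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses exact fractions to avoid floating-point rounding; the blacklist size passed to `baseline_comparisons` is a hypothetical value used only to show how the O(N) cost scales.

```python
from fractions import Fraction

daily_uploads = 80_000_000
match_rate = Fraction(5, 100_000)            # 0.005% expressed exactly

expected_matches = daily_uploads * match_rate   # uploads that hit the blacklist
non_matching_share = 1 - match_rate             # share of uploads with no match

def baseline_comparisons(uploads, blacklist_size):
    """Comparisons made when every upload is checked against every blacklist entry."""
    return uploads * blacklist_size

# Hypothetical blacklist of 1000 entries, for illustration only.
total = baseline_comparisons(daily_uploads, 1000)
```

Under these numbers about 4000 uploads per day match the blacklist, so nearly all of the baseline's comparisons are spent on uploads that match nothing.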
In an embodiment of the present application, step 403 includes the following steps:
Step 4031: Count the number of second hash features that overlap in each of a plurality of local regions.
Exemplarily, the second spectrogram of the second audio data may be obtained; a plurality of windows are added on the second spectrogram; and the number of second hash features is counted in each of the windows as the number of second hash features in the corresponding local region.
When adding the plurality of windows, a preset window may be looked up, and the window is added on the second spectrogram at every preset time interval.
The width of the window is less than or equal to the length of the preset time interval.
Step 4032: Generate the density of the second hash feature in the second audio data to which it belongs based on the numbers of overlaps in the plurality of local regions.
In one way of generating the density, the numbers of overlaps in the plurality of local regions may be compared, and the largest number of overlaps among the local regions is determined as the density of the second hash feature in the second audio data to which it belongs.
Step 4033: Sort the plurality of second audio data in descending order of density to obtain the order of the plurality of second audio data.
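Steps 4031 to 4033 can be sketched as below. The sketch assumes each second audio data is represented by the anchor times (in spectrogram frames) of its second hash features; the names `hash_density` and `order_blacklist` and the default window width and step of 50 frames are hypothetical.

```python
def hash_density(hash_times, window_width, step):
    """Largest number of hash features found in any window placed every `step` frames."""
    if window_width > step:
        # The text requires the window width not to exceed the preset interval.
        raise ValueError("window_width must be <= step")
    if not hash_times:
        return 0
    counts = [
        sum(1 for t in hash_times if start <= t < start + window_width)
        for start in range(0, max(hash_times) + 1, step)
    ]
    return max(counts)

def order_blacklist(blacklist, window_width=50, step=50):
    """Return blacklist keys sorted by descending hash-feature density."""
    return sorted(
        blacklist,
        key=lambda name: hash_density(blacklist[name], window_width, step),
        reverse=True,
    )

# Usage: three blacklist entries with different local densities.
blacklist = {
    "a": [0, 1, 2, 60],          # densest window holds 3 features
    "b": [0, 10, 20, 30, 40],    # densest window holds 5 features
    "c": [5],                    # densest window holds 1 feature
}
order = order_blacklist(blacklist)
```

Taking the maximum over windows, rather than the total count, keeps a long recording from ranking high merely because it is long, which is the weakness the text attributes to Queue Systems A and B.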
Step 404: Compare the first hash feature with the plurality of second hash features in the order, to determine whether any of the plurality of second audio data is the same as or similar to the first audio data.
Exemplarily, a target position may be determined, and the first hash feature is compared, in the order, with the second hash features located before the target position.
If the first hash feature matches a second hash feature, it is determined that the first audio data is the same as or similar to the second audio data to which that second hash feature belongs.
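The ordered comparison with a target position can be sketched as a pruned linear search. The matching rule used here (a minimum number of shared hash values, `min_matches`) is an assumption for illustration; the text does not specify how a match between hash features is decided.

```python
def search(first_hashes, ordered_blacklist, target_position, min_matches=5):
    """Compare against entries before `target_position`, in density order.

    `ordered_blacklist` is a list of (name, second_hashes) pairs already sorted
    by descending density. Returns the name of the first match, or None.
    """
    first = set(first_hashes)
    for name, second_hashes in ordered_blacklist[:target_position]:
        shared = len(first & set(second_hashes))
        if shared >= min_matches:      # assumed matching criterion
            return name
    return None

# Usage: the query shares six hashes with "song_b" and none with the others.
ordered = [
    ("song_a", [1, 2, 3, 4, 5, 6]),
    ("song_b", [10, 11, 12, 13, 14, 15]),
    ("song_c", [20, 21, 22, 23, 24, 25]),
]
query = [10, 11, 12, 13, 14, 15, 99]
hit = search(query, ordered, target_position=2)
miss = search(query, ordered, target_position=1)
```

The second call shows the trade-off of pruning: a target position that is too small skips the entry that would have matched, which is why the ordering in step 403 matters.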
In this embodiment, experiments were conducted on the baseline method, Queue System A, Queue System B, and Queue System C. The experiments used a test set consisting of 130 blacklisted second audio data and 1000 first audio data, of which 800 first audio data do not belong to the blacklist and 200 first audio data belong to the blacklist.
In the implementation, the time taken and the precision of every queue system were measured with the stopping criterion of comparing only the first m second audio data, together with a random search performed without a stopping criterion. The experimental results are as follows:
Queue system | Time Taken | Push Rate | Precision
Baseline method | 53.68 | 20.00% | 100.00%
Queue System A | 3.90 | 86.50% | 94.22%
Queue System B | 4.11 | 65.00% | 96.15%
Queue System C | 4.74 | 95.50% | 97.91%
For the baseline method, no stopping criterion is applied and every first audio data is tested against all of the second audio data. Because the entire database is tested exhaustively, all true positives are pushed; the push rate therefore reaches 20% (consistent with 200 of the 1000 first audio data belonging to the blacklist) and the precision reaches 100.00%.
For Queue System A, with the stopping criterion set, the time taken is reduced by about 92% compared with the baseline method, and both the push rate and the precision are good.
Queue System B improves precision relative to Queue System A, but at the cost of a lower push rate.
Queue System C provides both a high push rate and high precision, while its time taken remains small.
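The time savings relative to the baseline follow directly from the Time Taken column. The short calculation below reproduces them from the table values; it adds no data of its own.

```python
# Time Taken values copied from the table above (units as reported there).
time_taken = {
    "Baseline method": 53.68,
    "Queue System A": 3.90,
    "Queue System B": 4.11,
    "Queue System C": 4.74,
}

baseline = time_taken["Baseline method"]

# Relative reduction in time taken versus the baseline, in percent.
reduction = {
    name: round((1 - t / baseline) * 100, 1)
    for name, t in time_taken.items()
    if name != "Baseline method"
}
```

All three queue systems cut the time taken by more than 90%, so the choice between them rests mainly on push rate and precision, where Queue System C leads.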
Step 405: If any of the plurality of second audio data is the same as or similar to the first audio data, determine that the first audio data is illegal.
If the first audio data is neither the same as nor similar to any second audio data in the blacklist, the first audio data may be determined to be legal and passes this content review; depending on business requirements, other content reviews may then be performed, or the first audio data may be released to the public.
If the first audio data is the same as or similar to a second audio data in the blacklist, the first audio data may be determined to be illegal: it fails the content review, it cannot be released to the public, and corresponding prompt information is generated and sent to the client. At the same time, penalties such as muting, freezing, or banning the account may be imposed on the user logged in to the client.
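The review decision in step 405 can be sketched as a small function that wraps the blacklist comparison. The `ReviewResult` type, the action labels, and the `is_match` callback are hypothetical names introduced for this example; the actual follow-up actions depend on business requirements, as the text notes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

@dataclass
class ReviewResult:
    legal: bool
    matched: Optional[str] = None              # blacklist entry that matched, if any
    actions: List[str] = field(default_factory=list)

def review(
    first_hashes: Sequence[int],
    ordered_blacklist: Sequence[Tuple[str, Sequence[int]]],
    is_match: Callable[[Sequence[int], Sequence[int]], bool],
) -> ReviewResult:
    """Mark the upload illegal as soon as one blacklist entry matches."""
    for name, second_hashes in ordered_blacklist:
        if is_match(first_hashes, second_hashes):
            return ReviewResult(False, name, ["block_release", "notify_client"])
    return ReviewResult(True, None, ["continue_other_reviews_or_release"])

# Usage with a toy matching rule: at least two shared hash values.
def shares_two(a, b):
    return len(set(a) & set(b)) >= 2

blacklist = [("banned_clip", [1, 2, 3])]
rejected = review([2, 3, 9], blacklist, shares_two)
accepted = review([7, 8, 9], blacklist, shares_two)
```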
In this embodiment, technical features such as calculating the first hash feature for the first audio data, calculating the second hash feature for the second audio data, sorting the second audio data based on the density of the second hash features, and comparing the first hash feature with the second hash features are applied in essentially the same way as in Embodiment 1. They are therefore described only briefly here; for the related details, refer to the corresponding description of Embodiment 1.
In this embodiment, first audio data uploaded by a client is received and a first hash feature is calculated for it; the currently configured blacklist is looked up, where the blacklist records a plurality of second audio data, each already configured with a second hash feature; the order in which the plurality of second audio data are arranged is determined according to the densities of the plurality of second hash features; the first hash feature is compared with the plurality of second hash features in the order, to determine whether any of the plurality of second audio data is the same as or similar to the first audio data; and if so, the first audio data is determined to be illegal. Denser hash features improve the precision of the comparison. By adjusting the order of the audio data according to the density of their hash features, audio-based content review becomes more likely to find the same or similar audio data among the entries compared first, so that the push rate and the precision of the audio search both improve while the number of comparisons is reduced.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions. Those skilled in the art should understand, however, that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application certain steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and the actions involved are not necessarily required by the embodiments of the present application.
Embodiment 3
FIG. 5 is a structural block diagram of an audio search apparatus provided in Embodiment 3 of the present application, which includes the following modules:
an audio data determination module 501, configured to determine first audio data and a plurality of second audio data;
a hash feature calculation module 502, configured to calculate a first hash feature for the first audio data and second hash features for the plurality of second audio data;
an order determination module 503, configured to determine the order in which the plurality of second audio data are arranged according to the densities of the plurality of second hash features;
a hash feature comparison module 504, configured to compare the first hash feature with the plurality of second hash features in the order, to find the second audio data that is the same as or similar to the first audio data.
In an embodiment of the present application, the audio data determination module 501 includes:
a first spectrogram conversion module, configured to convert the first audio data into a first spectrogram;
a first key point search module, configured to search for first key points on a plurality of spectral bands of the first spectrogram according to energy;
a first hash feature generation module, configured to generate the first hash feature of the first audio data based on the first key points;
a second spectrogram conversion module, configured to convert each second audio data into a second spectrogram;
a second key point search module, configured to search for second key points on a plurality of spectral bands of the second spectrogram according to energy;
a second hash feature generation module, configured to generate the second hash feature of each second audio data based on the second key points.
In an embodiment of the present application, the order determination module 503 includes:
a local quantity statistics module, configured to count the number of overlaps of each second hash feature in a plurality of local regions;
a local density generation module, configured to generate the density of each second hash feature in the second audio data to which it belongs based on the numbers of overlaps in the plurality of local regions;
an audio order determination module, configured to sort the plurality of second audio data in descending order of density to obtain the order of the plurality of second audio data.
In an embodiment of the present application, the local quantity statistics module includes:
a spectrogram acquisition module, configured to acquire the second spectrogram of the second audio data to which each second hash feature belongs;
a window adding module, configured to add a plurality of windows on the second spectrogram;
a window quantity statistics module, configured to count the number of each second hash feature in each of the plurality of windows as the number of each second hash feature in the plurality of local regions.
In an embodiment of the present application, the window adding module includes:
a window lookup module, configured to look up a preset window;
a time adding module, configured to add the window on the second spectrogram at every preset time interval.
In an embodiment of the present application, the width of the window is less than or equal to the length of the preset time interval.
In an embodiment of the present application, the local density generation module includes:
a quantity comparison module, configured to compare the numbers of overlaps in the plurality of local regions;
a quantity selection module, configured to determine the largest number of overlaps among the local regions as the density of each second hash feature in the second audio data to which it belongs.
In an embodiment of the present application, the hash feature comparison module 504 includes:
a target position determination module, configured to determine a target position;
a partial feature comparison module, configured to compare, in the order, the first hash feature with the second hash features located before the target position;
a search determination module, configured to determine, if the first hash feature matches a second hash feature, that the first audio data is the same as or similar to the second audio data to which that second hash feature belongs.
The audio search apparatus provided by this embodiment of the present application can execute the audio search method provided by any embodiment of the present application, and has functional modules corresponding to the executed method.
Embodiment 4
FIG. 6 is a structural block diagram of an audio search apparatus provided in Embodiment 4 of the present application, which includes the following modules:
an audio data receiving module 601, configured to receive first audio data uploaded by a client and to calculate a first hash feature for the first audio data;
a blacklist lookup module 602, configured to look up the currently configured blacklist, where the blacklist records a plurality of second audio data, each of which is already configured with a second hash feature;
an order determination module 603, configured to determine the order in which the plurality of second audio data are arranged according to the densities of the plurality of second hash features;
a hash feature comparison module 604, configured to compare the first hash feature with the plurality of second hash features in the order, to determine whether any of the plurality of second audio data is the same as or similar to the first audio data;
an illegal audio determination module 605, configured to determine that the first audio data is illegal if any of the plurality of second audio data is the same as or similar to the first audio data.
In an embodiment of the present application, the audio data receiving module 601 includes:
a first spectrogram conversion module, configured to convert the first audio data into a first spectrogram;
a first key point search module, configured to search for first key points on a plurality of spectral bands of the first spectrogram according to energy;
a first hash feature generation module, configured to generate the first hash feature of the first audio data based on the first key points.
In an embodiment of the present application, the apparatus further includes:
a second spectrogram conversion module, configured to convert each second audio data into a second spectrogram;
a second key point search module, configured to search for second key points on a plurality of spectral bands of the second spectrogram according to energy;
a second hash feature generation module, configured to generate the second hash feature of each second audio data based on the second key points.
In an embodiment of the present application, the order determination module 603 includes:
a local quantity statistics module, configured to count the number of overlaps of each second hash feature in a plurality of local regions;
a local density generation module, configured to generate the density of each second hash feature in the second audio data to which it belongs based on the numbers of overlaps in the plurality of local regions;
an audio order determination module, configured to sort the plurality of second audio data in descending order of density to obtain the order of the plurality of second audio data.
In an embodiment of the present application, the local quantity statistics module includes:
a spectrogram acquisition module, configured to acquire the second spectrogram of the second audio data to which each second hash feature belongs;
a window adding module, configured to add a plurality of windows on the second spectrogram;
a window quantity statistics module, configured to count the number of each second hash feature in each of the plurality of windows as the number of each second hash feature in the plurality of local regions.
In an embodiment of the present application, the window adding module includes:
a window lookup module, configured to look up a preset window;
a time adding module, configured to add the window on the second spectrogram at every preset time interval.
In an embodiment of the present application, the width of the window is less than or equal to the length of the preset time interval.
In an embodiment of the present application, the local density generation module includes:
a quantity comparison module, configured to compare the numbers of overlaps in the plurality of local regions;
a quantity selection module, configured to determine the largest number of overlaps among the local regions as the density of each second hash feature in the second audio data to which it belongs.
In an embodiment of the present application, the hash feature comparison module 604 includes:
a target position determination module, configured to determine a target position;
a partial feature comparison module, configured to compare, in the order, the first hash feature with the second hash features located before the target position;
a search determination module, configured to determine, if the first hash feature matches a second hash feature, that the first audio data is the same as or similar to the second audio data to which that second hash feature belongs.
The audio search apparatus provided by this embodiment of the present application can execute the audio search method provided by any embodiment of the present application, and has functional modules corresponding to the executed method.
Embodiment 5
Embodiment 5 of the present application provides a computer device in which the audio search apparatus provided by any embodiment of the present application can be integrated.
FIG. 7 is a schematic structural diagram of a computer device provided in Embodiment 5 of the present application. The computer device includes at least one processor 701 and a memory 702. The memory 702 is configured to store at least one program, and when the at least one program is executed by the at least one processor 701, the at least one processor 701 implements the audio search method described in any embodiment of the present application.
Embodiment 6
Embodiment 6 of the present application further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, each process of the above audio search method is implemented; to avoid repetition, the details are not described again here.

Claims (13)

  1. An audio search method, comprising:
    determining first audio data and a plurality of second audio data;
    calculating a first hash feature for the first audio data and second hash features for the plurality of second audio data;
    determining an order in which the plurality of second audio data are arranged according to densities of the plurality of second hash features;
    comparing the first hash feature with the plurality of second hash features in the order, to find the second audio data that is the same as or similar to the first audio data.
  2. The method according to claim 1, wherein the calculating a first hash feature for the first audio data and second hash features for the plurality of second audio data comprises:
    converting the first audio data into a first spectrogram;
    searching for first key points on a plurality of spectral bands of the first spectrogram according to energy;
    generating the first hash feature of the first audio data based on the first key points;
    converting each second audio data into a second spectrogram;
    searching for second key points on a plurality of spectral bands of the second spectrogram according to energy;
    generating the second hash feature of each second audio data based on the second key points.
  3. The method according to claim 1, wherein the determining an order in which the plurality of second audio data are arranged according to densities of the plurality of second hash features comprises:
    counting the number of overlaps of each second hash feature in a plurality of local regions;
    generating the density of each second hash feature in the second audio data to which it belongs based on the numbers of overlaps in the plurality of local regions;
    sorting the plurality of second audio data in descending order of density to obtain the order of the plurality of second audio data.
  4. The method according to claim 3, wherein the counting the number of overlaps of each second hash feature in a plurality of local regions comprises:
    acquiring the second spectrogram of the second audio data to which each second hash feature belongs;
    adding a plurality of windows on the second spectrogram;
    counting the number of each second hash feature in each of the plurality of windows as the number of each second hash feature in the plurality of local regions.
  5. The method according to claim 4, wherein the adding a plurality of windows on the second spectrogram comprises:
    looking up a preset window;
    adding the window on the second spectrogram at every preset time interval.
  6. The method according to claim 5, wherein the width of the window is less than or equal to the length of the preset time interval.
  7. The method according to claim 3, wherein the generating the density of each second hash feature in the second audio data to which it belongs based on the numbers of overlaps in the plurality of local regions comprises:
    comparing the numbers of overlaps in the plurality of local regions;
    determining the largest number of overlaps among the local regions as the density of each second hash feature in the second audio data to which it belongs.
  8. The method according to any one of claims 1-7, wherein the comparing the first hash feature with the plurality of second hash features in the order, to find the second audio data that is the same as or similar to the first audio data, comprises:
    determining a target position;
    comparing, in the order, the first hash feature with the second hash features located before the target position;
    in response to the first hash feature matching a second hash feature, determining that the first audio data is the same as or similar to the second audio data to which that second hash feature belongs.
  9. An audio search method, comprising:
    receiving first audio data uploaded by a client, and calculating a first hash feature for the first audio data;
    looking up a currently configured blacklist, wherein the blacklist records a plurality of second audio data, each of which is already configured with a second hash feature;
    determining an order in which the plurality of second audio data are arranged according to densities of the plurality of second hash features;
    comparing the first hash feature with the plurality of second hash features in the order, to determine whether any of the plurality of second audio data is the same as or similar to the first audio data;
    in response to any of the plurality of second audio data being the same as or similar to the first audio data, determining that the first audio data is illegal.
  10. An audio search apparatus, comprising:
    an audio data determination module, configured to determine first audio data and a plurality of second audio data;
    a hash feature calculation module, configured to calculate a first hash feature for the first audio data and second hash features for the plurality of second audio data;
    an order determination module, configured to determine an order in which the plurality of second audio data are arranged according to densities of the plurality of second hash features;
    a hash feature comparison module, configured to compare the first hash feature with the plurality of second hash features in the order, to find the second audio data that is the same as or similar to the first audio data.
  11. An audio search apparatus, comprising:
    an audio data receiving module, configured to receive first audio data uploaded by a client and to calculate a first hash feature for the first audio data;
    a blacklist lookup module, configured to look up a currently configured blacklist, wherein the blacklist records a plurality of second audio data, each of which is already configured with a second hash feature;
    an order determination module, configured to determine an order in which the plurality of second audio data are arranged according to densities of the plurality of second hash features;
    a hash feature comparison module, configured to compare the first hash feature with the plurality of second hash features in the order, to determine whether any of the plurality of second audio data is the same as or similar to the first audio data;
    an illegal audio determination module, configured to determine that the first audio data is illegal in response to any of the plurality of second audio data being the same as or similar to the first audio data.
  12. A computer device, comprising:
    at least one processor; and
    a memory, configured to store at least one program,
    wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the audio search method according to any one of claims 1-9.
  13. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the audio search method according to any one of claims 1-9.
PCT/CN2022/073291 2021-01-28 2022-01-21 Audio search method and apparatus, computer device, and storage medium WO2022161291A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110119351.4A CN112784098A (en) 2021-01-28 2021-01-28 Audio searching method and device, computer equipment and storage medium
CN202110119351.4 2021-01-28

Publications (1)

Publication Number Publication Date
WO2022161291A1 (en) 2022-08-04

Family

ID=75759439

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/073291 WO2022161291A1 (en) 2021-01-28 2022-01-21 Audio search method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN112784098A (en)
WO (1) WO2022161291A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784098A (en) * 2021-01-28 2021-05-11 百果园技术(新加坡)有限公司 Audio searching method and device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915403A (en) * 2015-06-01 2015-09-16 腾讯科技(北京)有限公司 Information processing method and server
CN109189978A (en) * 2018-08-27 2019-01-11 广州酷狗计算机科技有限公司 The method, apparatus and storage medium of audio search are carried out based on speech message
CN110019921A (en) * 2017-11-16 2019-07-16 阿里巴巴集团控股有限公司 Correlating method and device, the audio search method and device of audio and attribute
CN112784098A (en) * 2021-01-28 2021-05-11 百果园技术(新加坡)有限公司 Audio searching method and device, computer equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8463719B2 (en) * 2009-03-11 2013-06-11 Google Inc. Audio classification for information retrieval using sparse features
CN103440313B (en) * 2013-08-27 2018-10-16 复旦大学 music retrieval system based on audio fingerprint feature
CN107526846B (en) * 2017-09-27 2021-09-24 百度在线网络技术(北京)有限公司 Method, device, server and medium for generating and sorting channel sorting model
CN111274360A (en) * 2020-01-20 2020-06-12 深圳五洲无线股份有限公司 Answer extraction method and input method of intelligent voice question and answer and intelligent equipment
CN111462775B (en) * 2020-03-30 2023-11-03 腾讯科技(深圳)有限公司 Audio similarity determination method, device, server and medium
CN111597379B (en) * 2020-07-22 2020-11-03 深圳市声扬科技有限公司 Audio searching method and device, computer equipment and computer-readable storage medium

Also Published As

Publication number Publication date
CN112784098A (en) 2021-05-11

Similar Documents

Publication Publication Date Title
Wang The Shazam music recognition service
US20200257722A1 (en) Method and apparatus for retrieving audio file, server, and computer-readable storage medium
US9798513B1 (en) Audio content fingerprinting based on two-dimensional constant Q-factor transform representation and robust audio identification for time-aligned applications
US9092518B2 (en) Automatic identification of repeated material in audio signals
US20160328473A1 (en) Systems and Methods for Recognizing Sound and Music Signals in High Noise and Distortion
Cano et al. Robust sound modeling for song detection in broadcast audio
JP5907511B2 (en) System and method for audio media recognition
US20160132600A1 (en) Methods and Systems for Performing Content Recognition for a Surge of Incoming Recognition Queries
CN106802960B (en) Fragmented audio retrieval method based on audio fingerprints
US8706276B2 (en) Systems, methods, and media for identifying matching audio
CN1759396A (en) Improved data retrieval method and system
CN108447501B (en) Pirated video detection method and system based on audio words in cloud storage environment
WO2016189307A1 (en) Audio identification method
CN108197319A (en) A kind of audio search method and system of the characteristic point based on time-frequency local energy
US20200081914A1 (en) Systems, methods, and apparatus to improve media identification
WO2022161291A1 (en) Audio search method and apparatus, computer device, and storage medium
WO2022194277A1 (en) Audio fingerprint processing method and apparatus, and computer device and storage medium
CN109271501A (en) A kind of management method and system of audio database
Kekre et al. A review of audio fingerprinting and comparison of algorithms
Bisio et al. Opportunistic estimation of television audience through smartphones
Tzanetakis Audio-based gender identification using bootstrapping
Senevirathna et al. Radio Broadcast Monitoring to Ensure Copyright Ownership
KR20130104878A (en) The music searching method using energy and statistical filtering, apparatus and system thereof
Jie et al. Improved algorithms of music information retrieval based on audio fingerprint
Medina et al. Audio fingerprint parameterization for multimedia advertising identification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 22745166; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 22745166; Country of ref document: EP; Kind code of ref document: A1