CN108428457B - Audio duplicate removal method and device - Google Patents

Audio duplicate removal method and device

Info

Publication number: CN108428457B
Authority: CN (China)
Prior art keywords: compared, audio, candidate, segment, frequency
Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Application number: CN201810146085.2A
Other languages: Chinese (zh)
Other versions: CN108428457A (en)
Inventor: 田超
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority application: CN201810146085.2A
Publications: CN108428457A (application), CN108428457B (grant)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/57: Speech or voice analysis techniques for comparison or discrimination, for processing of video signals
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval of audio data
    • G06F16/61: Indexing; Data structures therefor; Storage structures

Abstract

The invention provides an audio duplicate removal (deduplication) method and device. The method comprises the following steps: acquiring feature information of the audio to be compared; querying an audio library with an inverted-index method according to that feature information, and acquiring the feature information of the candidate audios corresponding to the audio to be compared; and, for each candidate audio, acquiring each candidate segment in the feature information of the candidate audio and each segment to be compared in the feature information of the audio to be compared. By acquiring the feature information of the audio to be compared, dividing the audio into segments, and comparing it with the candidate audios on the basis of the feature values in each segment, the amount of computation and the resources occupied by the comparison are reduced, and deduplication efficiency is improved.

Description

Audio duplicate removal method and device
Technical Field
The present invention relates to the field of audio processing technologies, and in particular, to an audio deduplication method and an audio deduplication device.
Background
At present, audio deduplication in services such as the rich-media Baidu Feed stream on mobile phones mainly works as follows: the audio to be compared is compared one by one with the audio in an audio library, the similar audio corresponding to it is determined, and a deduplication operation is performed; if the audio to be compared belongs to a video, the deduplication operation is performed when that video is also similar to the video of the similar audio. However, this method places high demands on the operating efficiency of every machine in the cluster, occupies a large amount of each machine's resources, and its deduplication efficiency and speed are too low to meet practical requirements.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present invention is to provide an audio deduplication method that solves the prior-art problems of poor deduplication efficiency, low speed, and high resource usage.
A second objective of the present invention is to provide an audio deduplication apparatus.
A third object of the invention is to propose another audio deduplication apparatus.
A fourth object of the invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the invention is to propose a computer program product.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides an audio deduplication method, including:
acquiring characteristic information of audio to be compared; the characteristic information is characteristic values corresponding to each time point and each frequency point of the audio to be compared in the frequency domain;
according to the characteristic information of the audio to be compared, an audio library is inquired by adopting an inverted index method, and the characteristic information of the candidate audio corresponding to the audio to be compared is obtained;
for each candidate audio, acquiring each candidate segment in the feature information of the candidate audio and each segment to be compared in the feature information of the audio to be compared;
comparing each segment to be compared with each candidate segment, and determining similar candidate segments corresponding to each segment to be compared;
determining whether the candidate audio is a similar candidate audio corresponding to the audio to be compared according to the similar candidate segment corresponding to each segment to be compared;
and when the candidate audio is the similar candidate audio corresponding to the audio to be compared, carrying out duplication removal operation on the audio to be compared.
Further, comparing each segment to be compared with each candidate segment to determine the similar candidate segment corresponding to each segment to be compared includes:
dividing each segment to be compared in the frequency dimension to obtain a low-frequency segment to be compared and a high-frequency segment to be compared;
dividing each candidate segment in the frequency dimension to obtain a candidate low-frequency segment and a candidate high-frequency segment;
comparing each low-frequency segment to be compared with each candidate low-frequency segment, and determining the similar candidate low-frequency segment corresponding to each low-frequency segment to be compared;
for each segment to be compared, acquiring the candidate high-frequency segment that belongs to the same candidate segment as its similar candidate low-frequency segment, comparing that candidate high-frequency segment with the high-frequency segment to be compared, and determining the similar candidate segment corresponding to each segment to be compared.
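The two-step comparison above can be sketched as follows. This is a minimal illustration, not the patented implementation: the midpoint frequency split, the fraction-of-equal-values similarity, and the threshold values are all assumptions.

```python
import numpy as np

def split_bands(segment):
    """Split a binary (time, frequency) feature segment into its
    low- and high-frequency halves; splitting at the midpoint of the
    frequency axis is an assumption, the text only says the segment
    is divided in the frequency dimension."""
    mid = segment.shape[1] // 2
    return segment[:, :mid], segment[:, mid:]

def two_stage_match(query_seg, candidate_segs, low_thresh=0.8, high_thresh=0.8):
    """Screen candidates on low-band similarity first, then confirm the
    survivors on the high band, mirroring the two-step comparison."""
    q_low, q_high = split_bands(query_seg)
    matches = []
    for idx, cand in enumerate(candidate_segs):
        c_low, c_high = split_bands(cand)
        if (q_low == c_low).mean() < low_thresh:
            continue                      # low band rules the candidate out
        if (q_high == c_high).mean() >= high_thresh:
            matches.append(idx)           # high band confirms the match
    return matches

query = np.ones((4, 8), dtype=np.uint8)
cands = [np.ones((4, 8), dtype=np.uint8), np.zeros((4, 8), dtype=np.uint8)]
matched = two_stage_match(query, cands)
```

Screening on the cheap low-band comparison before the high-band check is what lets most candidate segments be rejected early.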
Further, before dividing the segments to be compared in the frequency dimension to obtain the low-frequency and high-frequency segments to be compared, the method further includes:
acquiring background sounds and foreground sounds of the audio to be compared and background sounds and foreground sounds of the candidate audio;
judging whether the background sound of the audio to be compared is the same as the background sound of the candidate audio, and judging whether the foreground sound of the audio to be compared is the same as the foreground sound of the candidate audio;
and proceeding when it is determined that the background sound of the audio to be compared is the same as the background sound of the candidate audio while the foreground sound of the audio to be compared differs from the foreground sound of the candidate audio.
Further, after comparing each low-frequency segment to be compared with each candidate low-frequency segment and determining a similar candidate low-frequency segment corresponding to each low-frequency segment to be compared, the method further includes:
for each segment to be compared, dividing it in the time dimension to obtain the sub-segments to be compared;
dividing candidate segments comprising corresponding similar candidate low-frequency segments in a time dimension to obtain each candidate sub-segment; the time length of the to-be-compared sub-segment is equal to the time length of the candidate sub-segment;
comparing the sub-segment to be compared with the corresponding candidate sub-segment to obtain the similarity between the sub-segment to be compared and the corresponding candidate sub-segment;
and determining similar candidate segments corresponding to the to-be-compared segments according to the similarity between the to-be-compared sub-segments and the corresponding candidate sub-segments.
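The sub-segment comparison above can be sketched as a short routine. The fraction-of-equal-values similarity, the choice of n_sub, and the rule that every sub-segment must pass are illustrative assumptions; the text only requires a similarity per sub-segment pair and a decision derived from those similarities.

```python
import numpy as np

def subsegment_similarities(query_seg, cand_seg, n_sub=4):
    """Cut two equally long (time, frequency) segments into n_sub
    sub-segments along the time axis and return one similarity per
    pair of corresponding sub-segments."""
    q_parts = np.array_split(query_seg, n_sub, axis=0)
    c_parts = np.array_split(cand_seg, n_sub, axis=0)
    return [float((q == c).mean()) for q, c in zip(q_parts, c_parts)]

def is_similar_segment(similarities, thresh=0.9):
    """Accept a candidate segment when every sub-segment is similar
    enough (requiring all sub-segments to pass is an assumption)."""
    return all(s >= thresh for s in similarities)

sims = subsegment_similarities(np.ones((8, 4)), np.ones((8, 4)))
```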
Further, the audio library includes: each index segment and characteristic information of audio comprising the index segment;
the querying, according to the feature information of the audio to be compared, an audio library by using an inverted index method to obtain the feature information of the candidate audio corresponding to the audio to be compared includes:
inquiring the audio library according to the characteristic information of the audio to be compared, and acquiring an index fragment matched with the audio to be compared;
determining the audio that includes the matched index segment as a candidate audio corresponding to the audio to be compared;
and acquiring the characteristic information of the candidate audio.
Further, before obtaining, for each candidate audio, each candidate segment in the feature information of the candidate audio and each to-be-compared segment in the feature information of the to-be-compared audio, the method further includes:
for each candidate audio, acquiring a first time length of the feature information of the candidate audio and a second time length of the feature information of the audio to be compared;
when the first time length is smaller than the second time length, dividing the feature information of the candidate audio in the time dimension to obtain each candidate segment, and sliding a time-frequency window over the feature information of the audio to be compared in the time dimension to obtain each segment to be compared;
when the first time length is greater than the second time length, dividing the feature information of the audio to be compared in the time dimension to obtain each segment to be compared, and sliding a time-frequency window over the feature information of the candidate audio in the time dimension to obtain each candidate segment; the time length of a segment to be compared is equal to the time length of a candidate segment.
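The divide-the-shorter, slide-over-the-longer scheme above can be sketched in a few lines. The window step of one frame is an assumption; the text only fixes the window length to the segment length.

```python
def make_segments(short_feat, long_feat, seg_len):
    """The shorter feature sequence (fewer frames) is divided into
    consecutive, non-overlapping segments of seg_len frames; the
    longer one is scanned with a time-frequency window of the same
    length, producing one candidate position per step."""
    fixed = [short_feat[i:i + seg_len]
             for i in range(0, len(short_feat) - seg_len + 1, seg_len)]
    sliding = [long_feat[i:i + seg_len]
               for i in range(len(long_feat) - seg_len + 1)]
    return fixed, sliding

# 6 frames divided into two 3-frame segments; 10 frames scanned
# with a 3-frame window at step 1, giving 8 window positions.
fixed, sliding = make_segments(list(range(6)), list(range(10)), seg_len=3)
```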
Further, the feature information includes: mutation feature points and non-mutation feature points;
the comparing each to-be-compared fragment with each candidate fragment to determine a similar candidate fragment corresponding to each to-be-compared fragment includes:
comparing each fragment to be compared with each candidate fragment to obtain the same number of mutation characteristic points in each fragment to be compared and each candidate fragment;
determining the similarity between each fragment to be compared and each candidate fragment according to the same number of mutation characteristic points in each fragment to be compared and each candidate fragment;
and determining similar candidate fragments corresponding to the fragments to be compared according to the similarity between the fragments to be compared and the candidate fragments.
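A similarity built from shared mutation points might look like the sketch below. The normalisation by the larger mutation-point count is an assumption; the claim only states that the similarity follows from the count of identical mutation feature points.

```python
import numpy as np

def mutation_similarity(query_seg, cand_seg):
    """Count the positions where both binary feature segments hold a 1
    (a shared mutation point), then normalise by the larger of the two
    mutation-point counts so the result lies in [0, 1]."""
    shared = int(np.logical_and(query_seg == 1, cand_seg == 1).sum())
    denom = max(int((query_seg == 1).sum()), int((cand_seg == 1).sum()), 1)
    return shared / denom

a = np.array([[1, 0], [1, 1]])
b = np.array([[1, 0], [0, 1]])
sim = mutation_similarity(a, b)   # 2 shared points, max count 3
```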
Further, the determining whether the candidate audio is a similar candidate audio corresponding to the to-be-compared audio according to the similar candidate segment corresponding to each to-be-compared segment includes:
when the similar candidate segments are consecutive and their number exceeds a first number threshold, or
when the similar candidate segments are not consecutive and their number exceeds a second number threshold,
determining the candidate audio as a similar candidate audio corresponding to the audio to be compared; the first number threshold is less than the second number threshold.
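The decision rule above can be sketched as follows. The threshold values are illustrative; the claim only requires the consecutive threshold to be smaller than the scattered-match threshold.

```python
def is_similar_audio(match_flags, consecutive_thresh=3, total_thresh=5):
    """match_flags[i] is True when segment i to be compared has a
    similar candidate segment.  A long enough run of consecutive
    matches, or a larger total number of scattered matches, marks the
    candidate audio as similar."""
    run = best_run = 0
    for flag in match_flags:
        run = run + 1 if flag else 0      # length of the current run
        best_run = max(best_run, run)
    return best_run >= consecutive_thresh or sum(match_flags) >= total_thresh
```

Requiring fewer matches when they are consecutive reflects that a contiguous matching stretch is stronger evidence of duplication than the same number of isolated hits.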
According to the audio deduplication method of this embodiment, the feature information of the audio to be compared is acquired, where the feature information is the feature value corresponding to each time point and each frequency point of the audio to be compared in the frequency domain; an audio library is queried with an inverted-index method according to that feature information to acquire the feature information of the candidate audios corresponding to the audio to be compared; for each candidate audio, each candidate segment in its feature information and each segment to be compared in the feature information of the audio to be compared are acquired; each segment to be compared is compared with each candidate segment to determine its similar candidate segments; and whether the candidate audio is a similar candidate audio corresponding to the audio to be compared is determined from those similar candidate segments. Because the audio to be compared is divided into segments based on the feature values at each time and frequency point and is compared with the candidate audios segment by segment, the amount of computation and the resources occupied by the comparison are reduced, and both deduplication efficiency and deduplication speed are improved.
In order to achieve the above object, a second embodiment of the present invention provides an audio deduplication device, including:
the acquisition module is used for acquiring the characteristic information of the audio to be compared; the characteristic information is characteristic values corresponding to each time point and each frequency point of the audio to be compared in the frequency domain;
the query module is used for querying an audio library by adopting an inverted index method according to the characteristic information of the audio to be compared and acquiring the characteristic information of the candidate audio corresponding to the audio to be compared;
the acquisition module is further configured to acquire, for each candidate audio, each candidate segment in the feature information of the candidate audio and each to-be-compared segment in the feature information of the to-be-compared audio;
the comparison module is used for comparing each to-be-compared fragment with each candidate fragment and determining similar candidate fragments corresponding to each to-be-compared fragment;
a determining module, configured to determine whether the candidate audio is a similar candidate audio corresponding to the to-be-compared audio according to a similar candidate segment corresponding to each to-be-compared segment;
and the duplication removing module is used for carrying out duplication removing operation on the audio to be compared when the candidate audio is a similar candidate audio corresponding to the audio to be compared.
Further, the comparison module comprises:
the dividing unit is used for dividing each segment to be compared on a frequency dimension to obtain a low-frequency segment to be compared and a high-frequency segment to be compared;
the dividing unit is further configured to divide the candidate segments in a frequency dimension to obtain candidate low-frequency segments and candidate high-frequency segments;
the comparison unit is used for comparing each low-frequency segment to be compared with each candidate low-frequency segment and determining similar candidate low-frequency segments corresponding to each low-frequency segment to be compared;
the comparison unit is further configured to, for each segment to be compared, acquire the candidate high-frequency segment corresponding to its similar candidate low-frequency segment, compare that candidate high-frequency segment with the high-frequency segment to be compared, and determine the similar candidate segment corresponding to each segment to be compared.
Further, the comparing module further includes:
the acquisition unit is used for acquiring background sound and foreground sound of the audio to be compared and background sound and foreground sound of the candidate audio;
the judging unit is used for judging whether the background sound of the audio to be compared is the same as the background sound of the candidate audio and judging whether the foreground sound of the audio to be compared is the same as the foreground sound of the candidate audio;
the first determining unit is used for determining that the background sound of the audio to be compared is the same as the background sound of the candidate audio, and the foreground sound of the audio to be compared is different from the foreground sound of the candidate audio.
Further, the comparing module further includes: a second determination unit;
the dividing unit is further configured to divide the segments to be compared in a time dimension for each segment to be compared, so as to obtain each sub-segment to be compared;
the dividing unit is further configured to divide the candidate segments including the corresponding similar candidate low-frequency segments in a time dimension to obtain each candidate sub-segment; the time length of the to-be-compared sub-segment is equal to the time length of the candidate sub-segment;
the comparing unit is further configured to compare the to-be-compared sub-segment with the corresponding candidate sub-segment, and obtain a similarity between the to-be-compared sub-segment and the corresponding candidate sub-segment;
the second determining unit is configured to determine a similar candidate segment corresponding to the to-be-compared segment according to a similarity between the to-be-compared sub-segment and the corresponding candidate sub-segment.
Further, the audio library includes: each index segment and characteristic information of audio comprising the index segment;
the query module is specifically configured to,
inquiring the audio library according to the characteristic information of the audio to be compared, and acquiring an index fragment matched with the audio to be compared;
determining the audio that includes the matched index segment as a candidate audio corresponding to the audio to be compared;
and acquiring the characteristic information of the candidate audio.
Further, the device further comprises: a dividing module;
the acquisition module is further configured to acquire, for each candidate audio, a first time length of feature information of the candidate audio and a second time length of feature information of the audio to be compared;
the dividing module is configured to, when the first time length is smaller than the second time length, divide the feature information of the candidate audio in the time dimension to obtain each candidate segment, and slide a time-frequency window over the feature information of the audio to be compared in the time dimension to obtain each segment to be compared;
and, when the first time length is greater than the second time length, divide the feature information of the audio to be compared in the time dimension to obtain each segment to be compared, and slide a time-frequency window over the feature information of the candidate audio in the time dimension to obtain each candidate segment; the time length of a segment to be compared is equal to the time length of a candidate segment.
Further, the feature information includes: mutation feature points and non-mutation feature points;
the comparison module is specifically configured to:
compare each segment to be compared with each candidate segment to obtain the number of identical mutation feature points in the two segments;
determine the similarity between each segment to be compared and each candidate segment according to that number of identical mutation feature points;
and determine the similar candidate segment corresponding to each segment to be compared according to those similarities.
Further, the determining module is specifically configured to,
when the similar candidate segments are consecutive and their number exceeds a first number threshold, or
when the similar candidate segments are not consecutive and their number exceeds a second number threshold,
determine the candidate audio as a similar candidate audio corresponding to the audio to be compared; the first number threshold is less than the second number threshold.
The audio deduplication device of this embodiment acquires the feature information of the audio to be compared, where the feature information is the feature value corresponding to each time point and each frequency point of the audio to be compared in the frequency domain; queries an audio library with an inverted-index method according to that feature information to acquire the feature information of the candidate audios corresponding to the audio to be compared; acquires, for each candidate audio, each candidate segment in its feature information and each segment to be compared in the feature information of the audio to be compared; compares each segment to be compared with each candidate segment to determine its similar candidate segments; and determines from those similar candidate segments whether the candidate audio is a similar candidate audio corresponding to the audio to be compared. Because the audio to be compared is divided into segments based on the feature values at each time and frequency point and is compared with the candidate audios segment by segment, the amount of computation and the resources occupied by the comparison are reduced, and both deduplication efficiency and deduplication speed are improved.
In order to achieve the above object, a third embodiment of the present invention provides an audio deduplication device, including: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the audio deduplication method as described above when executing the program.
In order to achieve the above object, a fourth aspect of the present invention provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the audio deduplication method as described above.
In order to achieve the above objects, a fifth aspect of the present invention provides a computer program product; when the instructions in the computer program product are executed by a processor, an audio deduplication method is performed, the method including:
acquiring characteristic information of audio to be compared; the characteristic information is characteristic values corresponding to each time point and each frequency point of the audio to be compared in the frequency domain;
according to the characteristic information of the audio to be compared, an audio library is inquired by adopting an inverted index method, and the characteristic information of the candidate audio corresponding to the audio to be compared is obtained;
for each candidate audio, acquiring each candidate segment in the feature information of the candidate audio and each segment to be compared in the feature information of the audio to be compared;
comparing each segment to be compared with each candidate segment, and determining similar candidate segments corresponding to each segment to be compared;
determining whether the candidate audio is a similar candidate audio corresponding to the audio to be compared according to the similar candidate segment corresponding to each segment to be compared;
and when the candidate audio is the similar candidate audio corresponding to the audio to be compared, carrying out duplication removal operation on the audio to be compared.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart of an audio deduplication method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of dividing feature information of audio to be compared in a time dimension;
FIG. 3 is a schematic diagram of sliding feature information of candidate audio in a time dimension by using a time frequency window;
FIG. 4 is a flowchart illustrating another audio deduplication method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of dividing the segments to be compared in the frequency dimension;
FIG. 6 is a flowchart illustrating another audio deduplication method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of dividing the segments to be compared in the time dimension;
fig. 8 is a schematic structural diagram of an audio deduplication apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of another audio deduplication apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of another audio deduplication apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of another audio deduplication apparatus according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of another audio deduplication apparatus according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
An audio deduplication method and apparatus according to an embodiment of the present invention will be described below with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating an audio deduplication method according to an embodiment of the present invention. As shown in fig. 1, the audio deduplication method includes the following steps:
s101, acquiring characteristic information of audio to be compared; the characteristic information is the characteristic value corresponding to each time point and each frequency point of the audio to be compared in the frequency domain.
The audio deduplication method provided by the invention is executed by an audio deduplication apparatus, which may be a hardware device such as a server or a server cluster, or software installed on such a device. In this embodiment, the audio to be compared may be a standalone audio file or the audio track of a video.
The format of the audio to be compared may be Pulse Code Modulation (PCM). PCM is one of the coding modes of digital communication; its main steps are to sample an analog signal such as speech or image at regular intervals, discretize it, quantize the sampled values by rounding them to integers, and represent the amplitude of each sample pulse with a set of binary codes. The audio to be compared therefore consists of an array of binary codes corresponding to the sampling points.
In this embodiment, the feature information of the audio to be compared may be obtained as follows: (1) for audio in PCM format, take 4096 sampling points every 1000 sampling points and apply a Fast Fourier Transform (FFT) to obtain a 2048-point FFT output; (2) arrange the 2048-point FFT outputs in time order to obtain a sequence of 2048-point data frames; (3) slide a time-frequency window along the time axis of this data, where the window length is 2048 points (1 frame) and the sliding step is 1 point; (4) normalize the data within each time-frequency window and test the standard deviation at the window's center point: if it exceeds a certain threshold, the energy at that time point and frequency point is considered relatively large and the corresponding feature value is 1; otherwise the feature value is 0. This yields the feature information of the audio to be compared. Note that the feature information is the set of feature values on a grid whose abscissa is time and whose ordinate is frequency.
A point whose feature value is 1 may be determined to be a mutation feature point; a point whose feature value is 0 is determined to be a non-mutation feature point.
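As a rough illustration, the windowed-FFT-plus-thresholding pipeline above can be sketched as follows; the local context width `context` and the standard-deviation multiplier `k` are illustrative choices not fixed by the text:

```python
import numpy as np

def extract_features(pcm, hop=1000, win=4096, context=5, k=2.0):
    """Binary time-frequency fingerprint, a sketch of the scheme in the text.

    `hop` and `win` follow the 1000-sample hop and 4096-sample FFT window
    described above; `context` and `k` are illustrative assumptions.
    """
    # 1) Frame the PCM stream and take the FFT magnitude (2048 usable bins).
    frames = [pcm[i:i + win] for i in range(0, len(pcm) - win + 1, hop)]
    spec = np.array([np.abs(np.fft.rfft(f))[:2048] for f in frames])  # (T, 2048)

    # 2) Slide a window along the time axis; mark a point as a "mutation"
    #    feature (value 1) when it stands out from its local neighbourhood.
    feat = np.zeros_like(spec, dtype=np.uint8)
    for t in range(context, spec.shape[0] - context):
        window = spec[t - context:t + context + 1]            # local context
        mu, sigma = window.mean(axis=0), window.std(axis=0) + 1e-9
        feat[t] = (spec[t] > mu + k * sigma).astype(np.uint8)
    return feat  # feat[t, f] == 1 -> mutation feature point
```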
In addition, it should be noted that when the audio to be compared is the audio track of a video, the videos may be compared first to obtain similar videos; the audio of each similar video is then taken as a candidate audio, the audio to be compared is compared with these candidates, and deduplication is performed when the audio to be compared is similar to a candidate audio.
S102, querying the audio library by an inverted index method according to the feature information of the audio to be compared, and obtaining the feature information of the candidate audios corresponding to the audio to be compared.
In this embodiment, the audio library includes the index segments and, for each index segment, the feature information of the audios containing it.
Accordingly, step 102 may be performed by the audio deduplication apparatus as follows: query the audio library according to the feature information of the audio to be compared to obtain the index segments matching it; determine each audio containing a matched index segment as a candidate audio for the audio to be compared; and acquire the feature information of those candidate audios.
Querying the audio library to obtain the matched index segments specifically comprises: dividing the feature information of the audio to be compared into multiple segments whose time length equals that of the index segments; comparing each of these segments with each index segment to determine their similarity; and determining every index segment whose similarity meets a certain threshold as an index segment matching the audio to be compared.
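The inverted-index lookup described above can be sketched roughly as follows. For simplicity, this sketch matches index segments by exact byte equality rather than by the similarity threshold the text describes; the class and method names are hypothetical:

```python
from collections import defaultdict

class AudioIndex:
    """Hypothetical sketch of the inverted index: each index segment
    (keyed by its raw bytes) maps to the IDs of audios containing it."""

    def __init__(self):
        self.postings = defaultdict(set)   # segment key -> set of audio ids
        self.features = {}                 # audio id -> full feature data

    def add(self, audio_id, feat, seg_len=16):
        # Register every non-overlapping segment of this audio's features.
        self.features[audio_id] = feat
        for start in range(0, len(feat) - seg_len + 1, seg_len):
            self.postings[bytes(feat[start:start + seg_len])].add(audio_id)

    def candidates(self, feat, seg_len=16):
        """Divide the query features into segments of the index-segment
        length and collect every audio sharing at least one segment."""
        hits = set()
        for start in range(0, len(feat) - seg_len + 1, seg_len):
            hits |= self.postings.get(bytes(feat[start:start + seg_len]), set())
        return {a: self.features[a] for a in hits}
```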
S103, for each candidate audio, acquiring the candidate segments from its feature information and the segments to be compared from the feature information of the audio to be compared.
In this embodiment, to reduce the number of comparisons and increase comparison speed, the method may further include, before step 103: for each candidate audio, obtaining a first time length (that of the candidate audio's feature information) and a second time length (that of the feature information of the audio to be compared). When the first time length is smaller than the second, the candidate audio's feature information is divided along the time dimension to obtain the candidate segments, while a time-frequency window is slid over the feature information of the audio to be compared along the time dimension to obtain the segments to be compared. When the first time length is greater than the second, the feature information of the audio to be compared is divided along the time dimension to obtain the segments to be compared, while a time-frequency window is slid over the candidate audio's feature information to obtain the candidate segments. The time length of each segment to be compared is equal to the time length of each candidate segment.
In this embodiment, when the first time length is longer than the second time length, a schematic diagram of dividing the feature information of the audio to be compared in the time dimension may be as shown in fig. 2. In fig. 2, the abscissa of the feature information of the audio to be compared is time, and the ordinate is frequency, and 7 segments to be compared as shown in fig. 2 can be obtained after division in the time dimension.
When the first time length is greater than the second time length, a schematic diagram of sliding feature information of candidate audio in a time dimension by using a time frequency window may be as shown in fig. 3.
In this embodiment, when comparing the feature information of two audios, the feature information of the shorter audio is divided directly into segments, while the feature information of the longer audio is scanned with a sliding time-frequency window. This reduces, to a certain extent, the number of comparisons between the divided segments, thereby reducing the computation involved, increasing computation speed, and in turn speeding up deduplication.
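The length-dependent division rule above can be sketched as follows; `seg_len` and `step` are illustrative values, not fixed by the text:

```python
import numpy as np

def make_segments(cand_feat, query_feat, seg_len=16, step=1):
    """Sketch of the division rule: the shorter feature matrix is cut into
    non-overlapping segments, the longer one is scanned with a sliding
    time-frequency window of the same length."""
    def divide(feat):
        # Non-overlapping segments of seg_len frames each.
        return [feat[i:i + seg_len] for i in range(0, len(feat) - seg_len + 1, seg_len)]

    def slide(feat):
        # Overlapping windows advanced by `step` frames.
        return [feat[i:i + seg_len] for i in range(0, len(feat) - seg_len + 1, step)]

    if len(cand_feat) < len(query_feat):
        return divide(cand_feat), slide(query_feat)   # candidate segs, query segs
    return slide(cand_feat), divide(query_feat)
```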
S104, comparing each segment to be compared with each candidate segment, and determining similar candidate segments corresponding to each segment to be compared.
In this embodiment, the feature information may include mutation feature points and non-mutation feature points. Accordingly, step 104 may be performed by the audio deduplication apparatus as follows: compare each segment to be compared with each candidate segment and count the mutation feature points they have in common; determine the similarity between each segment to be compared and each candidate segment from that count; and determine the similar candidate segments corresponding to each segment to be compared from the similarities.
The similarity between a segment to be compared and a candidate segment may specifically be determined as follows: obtain the number of identical mutation feature points shared by the two segments; obtain the first number of mutation feature points in the segment to be compared and the second number of mutation feature points in the candidate segment; take the maximum of the first and second numbers; and determine the ratio of the number of identical mutation feature points to this maximum as the similarity between the segment to be compared and the candidate segment.
Specifically, the similar candidate segments may be determined by, for each segment to be compared, taking every candidate segment whose similarity exceeds a preset similarity threshold as a similar candidate segment for that segment to be compared.
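The similarity measure described above (shared mutation points divided by the larger of the two mutation counts) might be computed as follows; treating two segments with no mutation points as identical is an added convention, not stated in the text:

```python
import numpy as np

def segment_similarity(seg_a, seg_b):
    """Ratio of shared mutation feature points to the larger mutation count.

    Both segments are binary matrices of equal shape: 1 marks a mutation
    feature point, 0 a non-mutation feature point.
    """
    shared = int(np.logical_and(seg_a == 1, seg_b == 1).sum())
    denom = max(int((seg_a == 1).sum()), int((seg_b == 1).sum()))
    # Convention (assumption): two segments with no mutation points count
    # as identical rather than triggering a division by zero.
    return shared / denom if denom else 1.0
```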
And S105, determining whether the candidate audio is the similar candidate audio corresponding to the audio to be compared according to the similar candidate segments corresponding to the segments to be compared.
In this embodiment, step 105 may be performed by the audio deduplication apparatus as follows: when the similar candidate segments are consecutive and their number exceeds a first number threshold, or when the similar candidate segments are not consecutive and their number exceeds a second number threshold, the candidate audio is determined to be a similar candidate audio corresponding to the audio to be compared. The first number threshold is smaller than the second number threshold.
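A sketch of this decision rule, with illustrative threshold values (the text only states that the consecutive-run threshold is the smaller of the two):

```python
def is_similar_audio(similar_flags, run_threshold=3, total_threshold=5):
    """Decide similarity from per-segment match flags.

    A shorter run of *consecutive* similar segments suffices
    (run_threshold); otherwise a larger *total* count is required
    (total_threshold). Both values here are assumptions.
    """
    longest_run = run = 0
    for flag in similar_flags:
        run = run + 1 if flag else 0       # extend or reset the current run
        longest_run = max(longest_run, run)
    return longest_run > run_threshold or sum(similar_flags) > total_threshold
```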
And S106, when the candidate audio is the similar candidate audio corresponding to the audio to be compared, performing duplication elimination operation on the audio to be compared.
In addition, when no similar candidate audio exists for the audio to be compared, no deduplication is performed on it. It should further be noted that, in this case, the index segments in the audio library matching the audio to be compared are obtained, the audio to be compared is recorded as an audio whose feature information contains those matched index segments, and the feature information of the audio to be compared is added to the audio library.
In the audio deduplication method described above, the feature information of the audio to be compared is acquired, where the feature information is the feature value at each time point and frequency point of the audio in the frequency domain. The audio library is queried by the inverted index method according to this feature information to obtain the feature information of the candidate audios. For each candidate audio, the candidate segments and the segments to be compared are acquired from the respective feature information; each segment to be compared is compared with each candidate segment to determine its similar candidate segments; and whether the candidate audio is a similar candidate audio is determined from those similar candidate segments. By acquiring the feature values at each time and frequency point, dividing the audio to be compared into segments, and comparing it with the candidate audios in the audio library on the basis of the feature values within each segment, this embodiment reduces the amount of comparison computation and the resources occupied during computation, thereby improving deduplication efficiency and speed.
Fig. 4 is a schematic flow chart of another audio deduplication method according to an embodiment of the present invention, as shown in fig. 4, based on the embodiment shown in fig. 1, step 104 may specifically include the following steps:
S1041, dividing each segment to be compared in the frequency dimension to obtain a low-frequency segment to be compared and a high-frequency segment to be compared.
As shown in fig. 5, fig. 5 is a schematic diagram of dividing the segments to be compared in the frequency dimension, and in fig. 5, the frequency range of the segments to be compared is 0 to 640. The frequency range of the low frequency segment to be compared obtained by division is 0-320, and the frequency range of the high frequency segment to be compared obtained is 320-640.
In this embodiment, the mutation feature points of a segment are generally concentrated in the low frequency band, with few in the high frequency band. In some scenarios, for example when the background sound of the audio to be compared is the same as that of the candidate audio but their foreground sounds differ, the segment to be compared and the candidate segment share many mutation feature points in the low band but few in the high band. In such a scenario, directly comparing whole segments and judging similarity from the overall score may wrongly classify segments that are in fact dissimilar as similar. To avoid such misjudgment, the segment to be compared and the candidate segment can be divided in the frequency dimension: the low-frequency segment to be compared is first compared with the candidate low-frequency segment, and then the high-frequency segment to be compared is compared with the candidate high-frequency segment, thereby improving comparison accuracy.
Therefore, to further improve comparison accuracy and efficiency, the method may include, before step 1041: acquiring the background and foreground sounds of the audio to be compared and of the candidate audio; judging whether the two background sounds are the same and whether the two foreground sounds are the same; and proceeding with step 1041 upon determining that the background sounds are the same while the foreground sounds differ.
In addition, if the background sound of the audio to be compared differs from that of the candidate audio, or their foreground sounds are the same, either the segment comparison method of the embodiment shown in fig. 1 or that of the embodiment shown in fig. 4 may be adopted.
S1042, dividing each candidate segment in a frequency dimension to obtain a candidate low-frequency segment and a candidate high-frequency segment.
S1043, comparing each low-frequency fragment to be compared with each candidate low-frequency fragment, and determining similar candidate low-frequency fragments corresponding to each low-frequency fragment to be compared.
S1044, for each segment to be compared, acquiring the candidate high-frequency segments corresponding to its similar candidate low-frequency segments, comparing them with the high-frequency segment of the segment to be compared, and determining the similar candidate segments corresponding to each segment to be compared.
In this embodiment, after obtaining similar candidate low-frequency segments corresponding to the low-frequency segments to be compared in each segment to be compared, candidate high-frequency segments corresponding to the similar candidate low-frequency segments may be obtained, and then the high-frequency segments to be compared are compared with the candidate high-frequency segments to determine the similar candidate segments corresponding to the segments to be compared.
It should be noted that, in this embodiment, for each segment to be compared and each candidate segment, the low-frequency segment to be compared is first compared with the candidate low-frequency segments; then, for each segment to be compared, the candidate high-frequency segments corresponding to its similar candidate low-frequency segments are acquired and compared with its high-frequency segment to determine the similar candidate segments. This reduces the amount of comparison computation and increases comparison speed.
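The low-frequency-first cascade of steps S1041 to S1044 might look roughly like this; the split bin and similarity threshold are illustrative, and segments are assumed to be binary matrices shaped (time, frequency):

```python
import numpy as np

def cascade_match(query_seg, cand_segs, split_bin=320, threshold=0.6):
    """Two-stage comparison: match low-frequency halves first, then check
    high frequencies only for candidates whose low band already passed."""
    def sim(a, b):
        # Shared mutation points over the larger mutation count.
        shared = int(np.logical_and(a == 1, b == 1).sum())
        denom = max(int((a == 1).sum()), int((b == 1).sum()))
        return shared / denom if denom else 1.0

    q_low, q_high = query_seg[:, :split_bin], query_seg[:, split_bin:]
    matches = []
    for idx, cand in enumerate(cand_segs):
        c_low, c_high = cand[:, :split_bin], cand[:, split_bin:]
        if sim(q_low, c_low) < threshold:
            continue                        # low band filters most candidates
        if sim(q_high, c_high) >= threshold:
            matches.append(idx)             # high band confirms the match
    return matches
```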
In the audio deduplication method described above, the feature information of the audio to be compared is acquired, where the feature information is the feature value at each time point and frequency point of the audio in the frequency domain. The audio library is queried by the inverted index method to obtain the feature information of the candidate audios. For each candidate audio, the candidate segments and the segments to be compared are acquired, and each is divided in the frequency dimension into a low-frequency part and a high-frequency part. Each low-frequency segment to be compared is compared with each candidate low-frequency segment to determine the similar candidate low-frequency segments; then, for each segment to be compared, the candidate high-frequency segments corresponding to its similar candidate low-frequency segments are compared with its high-frequency segment to determine its similar candidate segments, from which it is determined whether the candidate audio is a similar candidate audio. By acquiring the feature values at each time and frequency point, dividing the audio into segments, and comparing on the basis of the feature values within each segment, this embodiment reduces the amount of comparison computation and the resources occupied during computation, thereby improving deduplication efficiency and speed.
Fig. 6 is a schematic flow chart of another audio deduplication method according to an embodiment of the present invention, as shown in fig. 6, based on the embodiment shown in fig. 4, step 104 may specifically include the following steps:
S1041, dividing each segment to be compared in the frequency dimension to obtain a low-frequency segment to be compared and a high-frequency segment to be compared.
S1042, dividing each candidate segment in a frequency dimension to obtain a candidate low-frequency segment and a candidate high-frequency segment.
S1043, comparing each low-frequency fragment to be compared with each candidate low-frequency fragment, and determining similar candidate low-frequency fragments corresponding to each low-frequency fragment to be compared.
S1045, for each segment to be compared, dividing it in the time dimension to obtain the sub-segments to be compared.
As shown in fig. 7, which is a schematic diagram of dividing a segment to be compared in the time dimension, the time range of the segment to be compared is 0-16 frames, and the time ranges of the two resulting sub-segments to be compared are 0-8 frames and 8-16 frames, respectively.
S1046, dividing the candidate segments including the corresponding similar candidate low-frequency segments in a time dimension to obtain each candidate sub-segment; the length of time of the to-be-compared sub-segment is equal to the length of time of the candidate sub-segment.
S1047, comparing the sub-segment to be compared with the corresponding candidate sub-segment, and obtaining the similarity between the sub-segment to be compared and the corresponding candidate sub-segment.
In this embodiment, the segment to be compared is divided along the time dimension into a first sub-segment to be compared (frames 0-8) and a second sub-segment to be compared (frames 8-16). After the candidate segment containing the corresponding similar candidate low-frequency segment is likewise divided into a first candidate sub-segment (frames 0-8) and a second candidate sub-segment (frames 8-16), the first sub-segment to be compared is compared with the first candidate sub-segment and the second sub-segment to be compared with the second candidate sub-segment, yielding a first similarity and a second similarity respectively.
S1048, according to the similarity between the sub-segment to be compared and the corresponding candidate sub-segment, determining a similar candidate segment corresponding to the segment to be compared.
In this embodiment, the audio deduplication apparatus may take the smaller of the first and second similarities; when this smaller value meets the corresponding similarity threshold, the segment to be compared is determined to be similar to the candidate segment, and otherwise they are determined to be dissimilar.
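The time-dimension sub-segment check of steps S1045 to S1048 might be sketched as follows; the similarity threshold is illustrative, and segments are assumed to be binary matrices shaped (time, frequency):

```python
import numpy as np

def subsegment_similarity(query_seg, cand_seg, threshold=0.6):
    """Split both segments in half along time, compare aligned halves,
    and accept only if the *smaller* of the two similarities clears
    the threshold."""
    def sim(a, b):
        # Shared mutation points over the larger mutation count.
        shared = int(np.logical_and(a == 1, b == 1).sum())
        denom = max(int((a == 1).sum()), int((b == 1).sum()))
        return shared / denom if denom else 1.0

    half = len(query_seg) // 2
    s1 = sim(query_seg[:half], cand_seg[:half])   # e.g. frames 0-8
    s2 = sim(query_seg[half:], cand_seg[half:])   # e.g. frames 8-16
    return min(s1, s2) >= threshold
```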
In this embodiment, since the mutation feature points of a segment are generally concentrated in the low frequency band with few in the high band, after the low-frequency segments have been compared and the similar candidate low-frequency segments determined, the segments to be compared and the candidate segments containing those similar low-frequency segments are divided along the time dimension and compared sub-segment by sub-segment. This increases the number of mutation feature points examined during comparison, thereby improving comparison accuracy.
In the audio deduplication method described above, the feature information of the audio to be compared is acquired, where the feature information is the feature value at each time point and frequency point of the audio in the frequency domain. The audio library is queried by the inverted index method to obtain the feature information of the candidate audios. For each candidate audio, the candidate segments and the segments to be compared are acquired, and each is divided in the frequency dimension into a low-frequency part and a high-frequency part. Each low-frequency segment to be compared is compared with each candidate low-frequency segment to determine the similar candidate low-frequency segments. Each segment to be compared is then divided in the time dimension into sub-segments, as are the candidate segments containing the corresponding similar candidate low-frequency segments, with each sub-segment to be compared equal in time length to its candidate sub-segment. The sub-segments to be compared are compared with their corresponding candidate sub-segments to obtain their similarities, from which the similar candidate segments corresponding to each segment to be compared are determined, and from those it is determined whether the candidate audio is a similar candidate audio. By acquiring the feature values at each time and frequency point, dividing the audio into segments, and comparing on the basis of the feature values within each segment, this embodiment reduces the amount of comparison computation and the resources occupied during computation, thereby improving deduplication efficiency and speed.
Fig. 8 is a schematic structural diagram of an audio deduplication apparatus according to an embodiment of the present invention. As shown in fig. 8, includes: an acquisition module 81, a query module 82, a comparison module 83, a determination module 84, and a deduplication module 85.
The acquiring module 81 is configured to acquire feature information of the audio to be compared; the characteristic information is characteristic values corresponding to each time point and each frequency point of the audio to be compared in the frequency domain;
the query module 82 is configured to query an audio library by using an inverted index method according to the feature information of the audio to be compared, and acquire feature information of candidate audio corresponding to the audio to be compared;
the obtaining module 81 is further configured to obtain, for each candidate audio, each candidate segment in the feature information of the candidate audio and each to-be-compared segment in the feature information of the to-be-compared audio;
a comparing module 83, configured to compare each of the to-be-compared fragments with each of the candidate fragments, and determine similar candidate fragments corresponding to each of the to-be-compared fragments;
a determining module 84, configured to determine whether the candidate audio is a similar candidate audio corresponding to the to-be-compared audio according to a similar candidate segment corresponding to each to-be-compared segment;
a duplicate removal module 85, configured to perform a duplicate removal operation on the audio to be compared when the candidate audio is a similar candidate audio corresponding to the audio to be compared.
The audio deduplication apparatus provided by the invention may be a hardware device such as a server or a server cluster, or software installed on such a device. In this embodiment, the audio to be compared may be a standalone audio file or the audio track of a video.
In this embodiment, the feature information of the audio to be compared may be obtained, for example, as follows: (1) for audio in PCM format, take a 4096-sample window every 1000 sampling points and apply a Fast Fourier Transform (FFT), yielding a 2048-point FFT output per window; (2) arrange the 2048-point FFT outputs in time order to obtain a sequence of 2048-point spectra; (3) slide a time-frequency window of length 2048 points (1 frame) along the time axis of this data, with a sliding step of 1 point; (4) normalize the data within each window and examine the standard deviation at the window's center point: if it exceeds a certain threshold, the energy at that time-frequency point is considered relatively large and the corresponding feature value is 1; otherwise the feature value is 0. The result is the feature information of the audio to be compared. It should be noted that this feature information is a feature value at each coordinate point, with time as the abscissa and frequency as the ordinate.
A point whose feature value is 1 may be determined to be a mutation feature point; a point whose feature value is 0 is determined to be a non-mutation feature point.
In addition, it should be noted that when the audio to be compared is the audio track of a video, the videos may be compared first to obtain similar videos; the audio of each similar video is then taken as a candidate audio, the audio to be compared is compared with these candidates, and deduplication is performed when the audio to be compared is similar to a candidate audio.
Further, on the basis of the above embodiment, the audio library includes the index segments and, for each index segment, the feature information of the audios containing it.
Correspondingly, the query module 82 is specifically configured to,
inquiring the audio library according to the characteristic information of the audio to be compared, and acquiring an index fragment matched with the audio to be compared; determining the audio frequency comprising the matched index fragment as a candidate audio frequency corresponding to the audio frequency to be compared; and acquiring the characteristic information of the candidate audio.
Querying the audio library according to the feature information of the audio to be compared to obtain the matched index segments specifically comprises: dividing the feature information of the audio to be compared into multiple segments whose time length equals that of the index segments; comparing each of these segments with each index segment to determine their similarity; and determining every index segment whose similarity meets a certain threshold as an index segment matching the audio to be compared.
Further, on the basis of the above embodiment, to reduce the number of comparisons and increase comparison speed, the apparatus may further include a dividing module.
The obtaining module 81 is further configured to obtain, for each candidate audio, a first time length of the feature information of the candidate audio, and a second time length of the feature information of the audio to be compared;
the dividing module is configured to divide feature information of the candidate audio in a time dimension to obtain each candidate segment when the first time length is smaller than the second time length; sliding the characteristic information of the audio to be compared on a time dimension by adopting a time frequency window to obtain each segment to be compared;
and, when the first time length is longer than the second time length, to divide the feature information of the audio to be compared in the time dimension to obtain the segments to be compared, while sliding a time-frequency window over the candidate audio's feature information in the time dimension to obtain the candidate segments; the time length of each segment to be compared is equal to the time length of each candidate segment.
In this embodiment, when comparing the feature information of two audios, the feature information of the shorter audio is divided directly into segments, while the feature information of the longer audio is scanned with a sliding time-frequency window. This reduces, to a certain extent, the number of comparisons between the divided segments, thereby reducing the computation involved, increasing computation speed, and in turn speeding up deduplication.
Further, on the basis of the above embodiment, the feature information includes: mutation feature points and non-mutation feature points.
Correspondingly, the comparing module 83 is specifically configured to,
comparing each segment to be compared with each candidate segment to obtain the number of identical mutation feature points in the segment to be compared and the candidate segment; determining the similarity between each segment to be compared and each candidate segment according to the number of identical mutation feature points; and determining the similar candidate segments corresponding to each segment to be compared according to the similarities between the segments to be compared and the candidate segments.
The process of determining the similarity between each segment to be compared and each candidate segment according to the number of identical mutation feature points may specifically be: acquiring a first number of mutation feature points in the segment to be compared; acquiring a second number of mutation feature points in the candidate segment; acquiring the maximum of the first number and the second number; and determining the ratio of the number of identical mutation feature points to this maximum as the similarity between the segment to be compared and the candidate segment.
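The similarity formula just described can be sketched as follows. Mutation feature points are modeled here as (time, frequency) tuples; this representation, and the function name, are assumptions for illustration:

```python
def mutation_similarity(points_compare, points_candidate):
    """Similarity = shared mutation feature points / max of the two point counts.

    points_compare / points_candidate: iterables of (time, frequency) tuples
    marking the mutation feature points of each segment (assumed encoding).
    """
    set_a, set_b = set(points_compare), set(points_candidate)
    denom = max(len(set_a), len(set_b))
    return len(set_a & set_b) / denom if denom else 0.0
```

Dividing by the larger count penalizes a length mismatch: a short segment sharing all of its few points with a point-rich candidate still scores low.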
Specifically, the process of determining the similar candidate segments corresponding to each segment to be compared according to the similarities may be: for each segment to be compared, determining the candidate segments whose similarity is greater than a preset similarity threshold as the similar candidate segments corresponding to that segment to be compared.
Further, on the basis of the above-mentioned embodiment, the determining module 84 is specifically configured to,
when the similar candidate segments are contiguous and their number exceeds a first number threshold, or when the similar candidate segments are non-contiguous and their number exceeds a second number threshold, determining the candidate audio as a similar candidate audio corresponding to the audio to be compared; the first number threshold is less than the second number threshold.
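This two-threshold decision rule can be sketched as follows, assuming the similar candidate segments are identified by their segment positions; the threshold values 3 and 6 are placeholders, not values from the patent:

```python
def longest_contiguous(positions):
    """Length of the longest run of consecutive segment positions."""
    best = run = 0
    prev = None
    for p in sorted(set(positions)):
        run = run + 1 if prev is not None and p == prev + 1 else 1
        best = max(best, run)
        prev = p
    return best

def is_similar_audio(similar_positions, t_contiguous=3, t_scattered=6):
    """Decide similarity: a lower threshold suffices for contiguous matches,
    a higher one is required for scattered matches (t_contiguous < t_scattered)."""
    if longest_contiguous(similar_positions) > t_contiguous:
        return True
    return len(set(similar_positions)) > t_scattered
```

A contiguous run of matching segments is stronger evidence of duplication than the same number of isolated matches, which motivates the lower threshold for the contiguous case.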
In addition, when no similar candidate audio corresponding to the audio to be compared exists, the deduplication operation is not performed on the audio to be compared. It should be further explained that, in this case, the index segments matched with the audio to be compared are obtained from the audio library, the audio to be compared is recorded as an audio whose feature information includes the matched index segments, and the feature information of the audio to be compared is added to the audio library.
The audio duplicate removal device of the embodiment of the invention obtains the characteristic information of the audio to be compared; the characteristic information is the characteristic value corresponding to each time point and each frequency point of the audio to be compared in the frequency domain; according to the characteristic information of the audio to be compared, inquiring an audio library by adopting an inverted index method, and acquiring the characteristic information of candidate audio corresponding to the audio to be compared; aiming at each candidate audio, acquiring each candidate segment in the feature information of the candidate audio and each segment to be compared in the feature information of the audio to be compared; comparing each segment to be compared with each candidate segment, and determining similar candidate segments corresponding to each segment to be compared; determining whether the candidate audio is a similar candidate audio corresponding to the audio to be compared according to the similar candidate segment corresponding to each segment to be compared; in the embodiment, the audio to be compared is subjected to segment division by acquiring each time point of the audio to be compared on a frequency domain and the characteristic value corresponding to each frequency point, and the audio to be compared is compared with the candidate audio in the audio library based on the characteristic value in each segment, so that the comparison calculated amount is reduced, the resources occupied during calculation are reduced, and the deduplication efficiency and the deduplication speed are improved.
Fig. 9 is a schematic structural diagram of another audio deduplication apparatus according to an embodiment of the present invention, as shown in fig. 9, based on the embodiment shown in fig. 8, the comparison module 83 includes:
a dividing unit 831, configured to divide the segments to be compared in a frequency dimension, and obtain a low-frequency segment to be compared and a high-frequency segment to be compared;
the dividing unit 831 is further configured to divide the candidate segments in a frequency dimension to obtain candidate low-frequency segments and candidate high-frequency segments;
a comparing unit 832, configured to compare each to-be-compared low-frequency segment with each candidate low-frequency segment, and determine a similar candidate low-frequency segment corresponding to each to-be-compared low-frequency segment;
the comparing unit 832 is further configured to, for each to-be-compared segment, obtain a candidate high-frequency segment corresponding to a corresponding similar candidate low-frequency segment, compare the candidate high-frequency segment with a to-be-compared high-frequency segment in the to-be-compared segments, and determine a similar candidate segment corresponding to each to-be-compared segment.
In this embodiment, the mutation feature points in the segments to be compared are generally concentrated in the low frequency band, with fewer mutation feature points in the high frequency band. In some scenarios, for example when the background sound of the audio to be compared is the same as the background sound of the candidate audio but their foreground sounds differ, the segment to be compared and the candidate segment share many mutation feature points in the low frequency band but few in the high frequency band. If such a segment to be compared were directly compared with the candidate segment as a whole, and the overall similarity were used to decide whether the two are similar, some segment pairs that are actually dissimilar might be judged similar. To avoid such misjudgment, the segments to be compared and the candidate segments can be divided in the frequency dimension: the low-frequency segments to be compared are first compared with the candidate low-frequency segments, and then the high-frequency segments to be compared are compared with the candidate high-frequency segments, thereby improving the comparison accuracy.
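The two-stage band-wise comparison can be sketched as below. Segments are modeled as collections of (frequency_bin, time) mutation points; the cutoff bin, the 0.5 threshold, and the empty-band convention are illustrative assumptions:

```python
def _sim(points_a, points_b):
    """Shared-point ratio; an empty pair of bands is treated as trivially similar."""
    sa, sb = set(points_a), set(points_b)
    denom = max(len(sa), len(sb))
    return len(sa & sb) / denom if denom else 1.0

def split_bands(segment, cutoff):
    """Split a segment's (frequency_bin, time) points into low and high bands."""
    low = [p for p in segment if p[0] < cutoff]
    high = [p for p in segment if p[0] >= cutoff]
    return low, high

def two_stage_match(compare_seg, candidate_segs, cutoff=16, threshold=0.5):
    """Match on the low band first; check the high band only when the low band passes."""
    cmp_low, cmp_high = split_bands(compare_seg, cutoff)
    matches = []
    for cand in candidate_segs:
        cand_low, cand_high = split_bands(cand, cutoff)
        # `and` short-circuits: the high band is only examined for low-band matches
        if _sim(cmp_low, cand_low) >= threshold and _sim(cmp_high, cand_high) >= threshold:
            matches.append(cand)
    return matches
```

In the same-background / different-foreground scenario described above, a candidate passes the low-band stage but fails the high-band stage, so it is correctly rejected.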
Further, in order to improve the accuracy and efficiency of the comparison, referring to fig. 10, the comparison module 83 further includes: an acquisition unit 833, a judgment unit 834, and a first determination unit 835;
the obtaining unit 833 is configured to obtain a background sound and a foreground sound of the audio to be compared, and a background sound and a foreground sound of the candidate audio;
a judging unit 834, configured to judge whether a background sound of the audio to be compared is the same as a background sound of the candidate audio, and judge whether a foreground sound of the audio to be compared is the same as a foreground sound of the candidate audio;
a first determining unit 835, configured to determine that a background sound of the to-be-compared audio is the same as a background sound of the candidate audio, and a foreground sound of the to-be-compared audio is different from a foreground sound of the candidate audio.
In this embodiment, after obtaining similar candidate low-frequency segments corresponding to the low-frequency segments to be compared in each segment to be compared, candidate high-frequency segments corresponding to the similar candidate low-frequency segments may be obtained, and then the high-frequency segments to be compared are compared with the candidate high-frequency segments to determine the similar candidate segments corresponding to the segments to be compared.
It should be noted that, in this embodiment, for each segment to be compared and each candidate segment, the low-frequency segment to be compared is first compared with each candidate low-frequency segment; then, for each segment to be compared, the candidate high-frequency segment corresponding to the matching similar candidate low-frequency segment is obtained and compared with the high-frequency segment to be compared, and the similar candidate segment corresponding to each segment to be compared is determined. This reduces the calculation amount of the comparison and improves the comparison speed.
The audio duplicate removal device of the embodiment of the invention obtains the characteristic information of the audio to be compared; the characteristic information is the characteristic value corresponding to each time point and each frequency point of the audio to be compared in the frequency domain; according to the characteristic information of the audio to be compared, inquiring an audio library by adopting an inverted index method, and acquiring the characteristic information of candidate audio corresponding to the audio to be compared; aiming at each candidate audio, acquiring each candidate segment in the feature information of the candidate audio and each segment to be compared in the feature information of the audio to be compared; dividing each segment to be compared in a frequency dimension to obtain a low-frequency segment to be compared and a high-frequency segment to be compared; dividing each candidate segment in a frequency dimension to obtain a candidate low-frequency segment and a candidate high-frequency segment; comparing each low-frequency fragment to be compared with each candidate low-frequency fragment, and determining similar candidate low-frequency fragments corresponding to each low-frequency fragment to be compared; aiming at each fragment to be compared, acquiring a candidate high-frequency fragment corresponding to the corresponding similar candidate low-frequency fragment, comparing the candidate high-frequency fragment with the high-frequency fragment to be compared in the fragments to be compared, and determining the similar candidate fragment corresponding to each fragment to be compared; determining whether the candidate audio is a similar candidate audio corresponding to the audio to be compared according to the similar candidate segment corresponding to each segment to be compared; in the embodiment, the audio to be compared is subjected to segment division by acquiring each time point of the audio to be compared on a frequency domain and the characteristic value corresponding to each frequency point, and the audio to be compared is compared with the candidate audio in the audio library based on the characteristic value in each segment, so that the comparison calculated amount is reduced, the resources occupied during calculation are reduced, and the deduplication efficiency and the deduplication speed are improved.
Fig. 11 is a schematic structural diagram of another audio deduplication apparatus according to an embodiment of the present invention, as shown in fig. 11, based on the embodiment shown in fig. 9, the comparison module 83 further includes: a second determination unit 836.
The dividing unit 831 is further configured to divide the segments to be compared in a time dimension for each segment to be compared, so as to obtain each sub-segment to be compared;
the dividing unit 831 is further configured to divide the candidate segments including the corresponding similar candidate low-frequency segments in a time dimension to obtain each candidate sub-segment; the time length of the to-be-compared sub-segment is equal to the time length of the candidate sub-segment;
the comparing unit 832 is further configured to compare the sub-segment to be compared with the corresponding candidate sub-segment, and obtain a similarity between the sub-segment to be compared and the corresponding candidate sub-segment;
the second determining unit 836 is configured to determine, according to the similarity between the to-be-compared sub-segment and the corresponding candidate sub-segment, a similar candidate segment corresponding to the to-be-compared segment.
In this embodiment, the segment to be compared is divided in the time dimension into a first sub-segment to be compared covering the time range of frames 0-8 and a second sub-segment to be compared covering the time range of frames 8-16; after the candidate segment including the corresponding similar candidate low-frequency segment is likewise divided into a first candidate sub-segment covering frames 0-8 and a second candidate sub-segment covering frames 8-16, the first sub-segment to be compared may be compared with the first candidate sub-segment and the second sub-segment to be compared with the second candidate sub-segment, obtaining a first similarity between the first sub-segment to be compared and the first candidate sub-segment and a second similarity between the second sub-segment to be compared and the second candidate sub-segment.
In this embodiment, the audio deduplication device may compare the first similarity with the second similarity to obtain a smaller value of the first similarity and the second similarity, and determine that the segment to be compared is similar to the candidate segment when the smaller value satisfies the corresponding similarity threshold; and if the smaller value does not meet the corresponding similarity threshold, determining that the segment to be compared is not similar to the candidate segment.
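The take-the-smaller-similarity rule described above can be sketched as follows, assuming segments are lists of per-frame feature values and using an illustrative frame-wise match ratio; names and the 0.5 threshold are assumptions:

```python
def _sim(sub_a, sub_b):
    """Fraction of frame positions whose feature values match exactly."""
    matches = sum(1 for x, y in zip(sub_a, sub_b) if x == y)
    denom = max(len(sub_a), len(sub_b))
    return matches / denom if denom else 0.0

def subsegment_similarities(compare_seg, candidate_seg, sub_len):
    """Divide both segments into equal-length sub-segments and score each aligned pair."""
    cmp_subs = [compare_seg[i:i + sub_len] for i in range(0, len(compare_seg), sub_len)]
    cand_subs = [candidate_seg[i:i + sub_len] for i in range(0, len(candidate_seg), sub_len)]
    return [_sim(a, b) for a, b in zip(cmp_subs, cand_subs)]

def segments_similar(compare_seg, candidate_seg, sub_len, threshold=0.5):
    """Similar only if the weakest sub-segment similarity still meets the threshold."""
    sims = subsegment_similarities(compare_seg, candidate_seg, sub_len)
    return bool(sims) and min(sims) >= threshold
```

Requiring the minimum sub-segment similarity to pass prevents one strongly matching half from masking a clearly dissimilar half.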
In this embodiment, since the mutation feature points in the segments to be compared are generally concentrated in the low frequency band, with fewer in the high frequency band, after each low-frequency segment to be compared has been compared with each candidate low-frequency segment and the corresponding similar candidate low-frequency segments determined, the segments to be compared and the candidate segments including the corresponding similar candidate low-frequency segments are further divided in the time dimension and compared sub-segment by sub-segment. This increases the number of mutation feature points that must agree during the comparison, thereby improving the comparison accuracy.
The audio duplicate removal device of the embodiment of the invention obtains the characteristic information of the audio to be compared; the characteristic information is the characteristic value corresponding to each time point and each frequency point of the audio to be compared in the frequency domain; according to the characteristic information of the audio to be compared, inquiring an audio library by adopting an inverted index method, and acquiring the characteristic information of candidate audio corresponding to the audio to be compared; aiming at each candidate audio, acquiring each candidate segment in the feature information of the candidate audio and each segment to be compared in the feature information of the audio to be compared; dividing each segment to be compared in a frequency dimension to obtain a low-frequency segment to be compared and a high-frequency segment to be compared; dividing each candidate segment in a frequency dimension to obtain a candidate low-frequency segment and a candidate high-frequency segment; comparing each low-frequency fragment to be compared with each candidate low-frequency fragment, and determining similar candidate low-frequency fragments corresponding to each low-frequency fragment to be compared; aiming at each fragment to be compared, the fragments to be compared are divided in a time dimension, and each sub fragment to be compared is obtained; dividing candidate segments comprising corresponding similar candidate low-frequency segments in a time dimension to obtain each candidate sub-segment; the time length of the sub-segment to be compared is equal to the time length of the candidate sub-segment; comparing the sub-segment to be compared with the corresponding candidate sub-segment to obtain the similarity between the sub-segment to be compared and the corresponding candidate sub-segment; determining similar candidate segments corresponding to the segments to be compared according to the similarity between the sub-segments to be compared and the corresponding candidate sub-segments; determining whether the candidate audio is a similar candidate audio corresponding to the audio to be compared according to the similar candidate segment corresponding to each segment to be compared; in the embodiment, the audio to be compared is subjected to segment division by acquiring each time point of the audio to be compared on a frequency domain and the characteristic value corresponding to each frequency point, and the audio to be compared is compared with the candidate audio in the audio library based on the characteristic value in each segment, so that the comparison calculated amount is reduced, the resources occupied during calculation are reduced, and the deduplication efficiency and the deduplication speed are improved.
Fig. 12 is a schematic structural diagram of another audio deduplication apparatus according to an embodiment of the present invention. The audio deduplication device comprises:
memory 1001, processor 1002, and computer programs stored on memory 1001 and executable on processor 1002.
The processor 1002, when executing the program, implements the audio deduplication method provided in the above embodiments.
Further, the audio deduplication apparatus further comprises:
a communication interface 1003 for communicating between the memory 1001 and the processor 1002.
A memory 1001 for storing computer programs that may be run on the processor 1002.
Memory 1001 may include high-speed RAM memory and may also include non-volatile memory (e.g., at least one disk memory).
The processor 1002 is configured to implement the audio deduplication method according to the foregoing embodiment when executing the program.
If the memory 1001, the processor 1002, and the communication interface 1003 are implemented independently, the communication interface 1003, the memory 1001, and the processor 1002 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 12, but this is not intended to represent only one bus or type of bus.
Optionally, in a specific implementation, if the memory 1001, the processor 1002, and the communication interface 1003 are integrated on one chip, the memory 1001, the processor 1002, and the communication interface 1003 may complete communication with each other through an internal interface.
The processor 1002 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present invention.
The invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements an audio deduplication method as described above.
The invention also provides a computer program product, which implements the audio deduplication method as described above when instructions in the computer program product are executed by a processor.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (19)

1. An audio deduplication method, comprising:
acquiring characteristic information of audio to be compared; the characteristic information is characteristic values corresponding to each time point and each frequency point of the audio to be compared in the frequency domain;
according to the characteristic information of the audio to be compared, an audio library is inquired by adopting an inverted index method, and the characteristic information of the candidate audio corresponding to the audio to be compared is obtained;
aiming at each candidate audio, acquiring each candidate segment in the feature information of the candidate audio and each segment to be compared in the feature information of the audio to be compared;
comparing each segment to be compared with each candidate segment, and determining similar candidate segments corresponding to each segment to be compared; the similar candidate segments are candidate segments, wherein the corresponding low frequency band is similar to the low frequency band of the segment to be compared, and the corresponding high frequency band is similar to the high frequency band of the segment to be compared;
determining whether the candidate audio is a similar candidate audio corresponding to the audio to be compared according to the similar candidate segment corresponding to each segment to be compared;
and when the candidate audio is the similar candidate audio corresponding to the audio to be compared, carrying out duplication removal operation on the audio to be compared.
2. The method of claim 1, wherein the comparing the each to-be-compared segment with the each candidate segment to determine a similar candidate segment corresponding to the each to-be-compared segment comprises:
dividing each segment to be compared in a frequency dimension to obtain a low-frequency segment to be compared and a high-frequency segment to be compared;
dividing each candidate segment in a frequency dimension to obtain a candidate low-frequency segment and a candidate high-frequency segment;
comparing each low-frequency fragment to be compared with each candidate low-frequency fragment, and determining similar candidate low-frequency fragments corresponding to each low-frequency fragment to be compared;
aiming at each fragment to be compared, acquiring a candidate high-frequency fragment corresponding to a corresponding similar candidate low-frequency fragment, comparing the candidate high-frequency fragment with the high-frequency fragment to be compared in the fragments to be compared, and determining the similar candidate fragment corresponding to each fragment to be compared.
3. The method according to claim 2, wherein before the dividing the respective segments to be compared in the frequency dimension to obtain the low-frequency segments to be compared and the high-frequency segments to be compared, the method further comprises:
acquiring background sounds and foreground sounds of the audio to be compared and background sounds and foreground sounds of the candidate audio;
judging whether the background sound of the audio to be compared is the same as the background sound of the candidate audio, and judging whether the foreground sound of the audio to be compared is the same as the foreground sound of the candidate audio;
and determining that the background sound of the audio to be compared is the same as the background sound of the candidate audio, and the foreground sound of the audio to be compared is different from the foreground sound of the candidate audio.
4. The method according to claim 2, wherein after comparing the each low-frequency segment to be compared with the each candidate low-frequency segment and determining a similar candidate low-frequency segment corresponding to the each low-frequency segment to be compared, the method further comprises:
aiming at each fragment to be compared, dividing the fragment to be compared in a time dimension to obtain each sub fragment to be compared;
dividing candidate segments comprising corresponding similar candidate low-frequency segments in a time dimension to obtain each candidate sub-segment; the time length of the to-be-compared sub-segment is equal to the time length of the candidate sub-segment;
comparing the sub-segment to be compared with the corresponding candidate sub-segment to obtain the similarity between the sub-segment to be compared and the corresponding candidate sub-segment;
and determining similar candidate segments corresponding to the to-be-compared segments according to the similarity between the to-be-compared sub-segments and the corresponding candidate sub-segments.
5. The method of claim 1, wherein the audio library comprises: each index segment and characteristic information of audio comprising the index segment;
the querying, according to the feature information of the audio to be compared, an audio library by using an inverted index method to obtain the feature information of the candidate audio corresponding to the audio to be compared includes:
inquiring the audio library according to the characteristic information of the audio to be compared, and acquiring an index fragment matched with the audio to be compared;
determining the audio comprising the matched index segment as a candidate audio corresponding to the audio to be compared;
and acquiring the characteristic information of the candidate audio.
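The inverted-index lookup of claim 5 can be sketched as follows. The representation of an index segment as a hashable fingerprint value, and the names `build_inverted_index` and `query_candidates`, are assumptions for illustration.

```python
from collections import defaultdict

def build_inverted_index(library):
    """Map each index segment (e.g. a fingerprint hash) to the ids of audios containing it."""
    index = defaultdict(set)
    for audio_id, segments in library.items():
        for seg in segments:
            index[seg].add(audio_id)
    return index

def query_candidates(index, probe_segments):
    """Return ids of library audios sharing at least one index segment with the probe."""
    candidates = set()
    for seg in probe_segments:
        candidates |= index.get(seg, set())
    return candidates
```

The inversion (segment -> audio ids) lets the query touch only the audios that share a segment with the probe, instead of scanning the whole library.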
6. The method according to claim 1, wherein before acquiring, for each candidate audio, the candidate segments in the feature information of the candidate audio and the segments to be compared in the feature information of the audio to be compared, the method further comprises:
for each candidate audio, acquiring a first time length of the feature information of the candidate audio and a second time length of the feature information of the audio to be compared;
when the first time length is less than the second time length, dividing the feature information of the candidate audio in the time dimension to obtain the candidate segments, and sliding a time-frequency window over the feature information of the audio to be compared in the time dimension to obtain the segments to be compared;
when the first time length is greater than the second time length, dividing the feature information of the audio to be compared in the time dimension to obtain the segments to be compared, and sliding a time-frequency window over the feature information of the candidate audio in the time dimension to obtain the candidate segments; the time length of each segment to be compared is equal to the time length of each candidate segment.
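The segmentation rule of claim 6 — divide the shorter signal into fixed windows and slide a window of the same length over the longer one — can be sketched as follows. The hop size of 1 frame and the function names are illustrative assumptions.

```python
import numpy as np

def fixed_windows(features, win):
    """Non-overlapping division of a (time x frequency) matrix along the time axis."""
    n = features.shape[0] // win
    return [features[i * win:(i + 1) * win] for i in range(n)]

def sliding_windows(features, win, hop=1):
    """Slide a time-frequency window of length `win` along the time axis."""
    return [features[i:i + win] for i in range(0, features.shape[0] - win + 1, hop)]

def make_segments(candidate, probe, win):
    """Claim 6's rule: divide whichever signal is shorter; slide over the longer one."""
    if candidate.shape[0] < probe.shape[0]:
        return fixed_windows(candidate, win), sliding_windows(probe, win)
    return sliding_windows(candidate, win), fixed_windows(probe, win)
```

Both outputs consist of windows of identical time length, which is what makes the later segment-by-segment comparison well defined.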
7. The method of claim 1, wherein the feature information comprises abrupt-change feature points and non-abrupt-change feature points;
the comparing each segment to be compared with each candidate segment to determine the similar candidate segment corresponding to each segment to be compared comprises:
comparing each segment to be compared with each candidate segment to obtain the number of identical abrupt-change feature points in the segment to be compared and the candidate segment;
determining the similarity between each segment to be compared and each candidate segment according to the number of identical abrupt-change feature points;
and determining, according to the similarities between the segments to be compared and the candidate segments, the similar candidate segment corresponding to each segment to be compared.
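Claim 7's similarity based on shared abrupt-change feature points can be sketched as follows. Representing each feature point as a (time, frequency) tuple and using a shared-count threshold `min_shared` are assumptions for illustration, not details fixed by the claim.

```python
def count_shared_peaks(seg_a, seg_b):
    """Count abrupt-change feature points at identical (time, frequency) positions."""
    return len(set(seg_a) & set(seg_b))

def best_matches(probe_segments, candidate_segments, min_shared=2):
    """For each probe segment, pick the candidate segment sharing the most feature points."""
    matches = {}
    for i, probe in enumerate(probe_segments):
        scored = [(count_shared_peaks(probe, cand), j)
                  for j, cand in enumerate(candidate_segments)]
        shared, j = max(scored)
        if shared >= min_shared:
            matches[i] = j  # similar candidate segment for probe segment i
    return matches
```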
8. The method according to claim 1, wherein the determining, according to the similar candidate segments corresponding to the segments to be compared, whether the candidate audio is a similar candidate audio corresponding to the audio to be compared comprises:
when the similar candidate segments are consecutive and the number of similar candidate segments exceeds a first number threshold, or
when the similar candidate segments are not consecutive and the number of similar candidate segments exceeds a second number threshold,
determining the candidate audio as a similar candidate audio corresponding to the audio to be compared; the first number threshold is less than the second number threshold.
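Claim 8's two-threshold decision can be sketched as follows. Treating "consecutive" as "all matched segments form a single run" is one plausible reading; that interpretation, the flag encoding, and the function name are assumptions.

```python
def is_similar_audio(matched_flags, first_threshold, second_threshold):
    """Claim 8's rule: a smaller count suffices when the matches are consecutive.

    matched_flags: per-segment 1/0 flags marking segments with a similar
    candidate segment. first_threshold must be below second_threshold.
    """
    assert first_threshold < second_threshold
    total = sum(matched_flags)
    longest_run = run = 0
    for flag in matched_flags:
        run = run + 1 if flag else 0
        longest_run = max(longest_run, run)
    if longest_run == total and total > first_threshold:
        return True                    # consecutive matches: lower bar
    return total > second_threshold    # scattered matches: higher bar
```

The lower bar for consecutive matches reflects that an unbroken run of similar segments is stronger evidence of duplication than the same count scattered across the audio.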
9. An audio deduplication apparatus, comprising:
an acquisition module, configured to acquire feature information of audio to be compared, the feature information being feature values corresponding to each time point and each frequency point of the audio to be compared in the frequency domain;
a query module, configured to query an audio library by using an inverted index method according to the feature information of the audio to be compared, and acquire feature information of a candidate audio corresponding to the audio to be compared;
the acquisition module being further configured to acquire, for each candidate audio, candidate segments in the feature information of the candidate audio and segments to be compared in the feature information of the audio to be compared;
a comparison module, configured to compare each segment to be compared with each candidate segment and determine a similar candidate segment corresponding to each segment to be compared, the similar candidate segment being a candidate segment whose low frequency band is similar to the low frequency band of the segment to be compared and whose high frequency band is similar to the high frequency band of the segment to be compared;
a determining module, configured to determine, according to the similar candidate segments corresponding to the segments to be compared, whether the candidate audio is a similar candidate audio corresponding to the audio to be compared;
and a deduplication module, configured to perform a deduplication operation on the audio to be compared when the candidate audio is a similar candidate audio corresponding to the audio to be compared.
10. The apparatus of claim 9, wherein the comparison module comprises:
a dividing unit, configured to divide each segment to be compared in the frequency dimension to obtain a low-frequency segment to be compared and a high-frequency segment to be compared;
the dividing unit being further configured to divide each candidate segment in the frequency dimension to obtain a candidate low-frequency segment and a candidate high-frequency segment;
a comparison unit, configured to compare each low-frequency segment to be compared with each candidate low-frequency segment and determine the similar candidate low-frequency segment corresponding to each low-frequency segment to be compared;
the comparison unit being further configured to acquire, for each segment to be compared, the candidate high-frequency segment corresponding to the corresponding similar candidate low-frequency segment, compare the candidate high-frequency segment with the high-frequency segment to be compared in the segment to be compared, and determine the similar candidate segment corresponding to each segment to be compared.
11. The apparatus of claim 10, wherein the comparison module further comprises:
an acquisition unit, configured to acquire the background sound and foreground sound of the audio to be compared, and the background sound and foreground sound of the candidate audio;
a judging unit, configured to judge whether the background sound of the audio to be compared is the same as the background sound of the candidate audio, and whether the foreground sound of the audio to be compared is the same as the foreground sound of the candidate audio;
and a first determining unit, configured to determine that the background sound of the audio to be compared is the same as the background sound of the candidate audio and that the foreground sound of the audio to be compared is different from the foreground sound of the candidate audio.
12. The apparatus of claim 10, wherein the comparison module further comprises a second determining unit;
the dividing unit is further configured to divide, for each segment to be compared, the segment to be compared in the time dimension to obtain sub-segments to be compared;
the dividing unit is further configured to divide, in the time dimension, the candidate segment comprising the corresponding similar candidate low-frequency segment to obtain candidate sub-segments, wherein the time length of each sub-segment to be compared is equal to the time length of the corresponding candidate sub-segment;
the comparison unit is further configured to compare each sub-segment to be compared with the corresponding candidate sub-segment to obtain a similarity between the sub-segment to be compared and the corresponding candidate sub-segment;
and the second determining unit is configured to determine, according to the similarities between the sub-segments to be compared and the corresponding candidate sub-segments, the similar candidate segment corresponding to the segment to be compared.
13. The apparatus of claim 9, wherein the audio library comprises index segments and feature information of the audios comprising the index segments;
the query module is specifically configured to:
query the audio library according to the feature information of the audio to be compared to obtain an index segment matching the audio to be compared;
determine the audio comprising the matched index segment as a candidate audio corresponding to the audio to be compared;
and acquire the feature information of the candidate audio.
14. The apparatus of claim 9, further comprising a dividing module;
the acquisition module is further configured to acquire, for each candidate audio, a first time length of the feature information of the candidate audio and a second time length of the feature information of the audio to be compared;
the dividing module is configured to, when the first time length is less than the second time length, divide the feature information of the candidate audio in the time dimension to obtain the candidate segments, and slide a time-frequency window over the feature information of the audio to be compared in the time dimension to obtain the segments to be compared;
and, when the first time length is greater than the second time length, divide the feature information of the audio to be compared in the time dimension to obtain the segments to be compared, and slide a time-frequency window over the feature information of the candidate audio in the time dimension to obtain the candidate segments; the time length of each segment to be compared is equal to the time length of each candidate segment.
15. The apparatus of claim 9, wherein the feature information comprises abrupt-change feature points and non-abrupt-change feature points;
the comparison module is specifically configured to:
compare each segment to be compared with each candidate segment to obtain the number of identical abrupt-change feature points in the segment to be compared and the candidate segment;
determine the similarity between each segment to be compared and each candidate segment according to the number of identical abrupt-change feature points;
and determine, according to the similarities between the segments to be compared and the candidate segments, the similar candidate segment corresponding to each segment to be compared.
16. The apparatus of claim 9, wherein the determining module is specifically configured to:
when the similar candidate segments are consecutive and the number of similar candidate segments exceeds a first number threshold, or
when the similar candidate segments are not consecutive and the number of similar candidate segments exceeds a second number threshold,
determine the candidate audio as a similar candidate audio corresponding to the audio to be compared; the first number threshold is less than the second number threshold.
17. An audio deduplication apparatus, comprising:
a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the audio deduplication method according to any one of claims 1-8 when executing the program.
18. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the audio deduplication method according to any one of claims 1-8.
19. A computer program product, wherein when instructions in the computer program product are executed by a processor, an audio deduplication method is performed, the method comprising:
acquiring feature information of audio to be compared, the feature information being feature values corresponding to each time point and each frequency point of the audio to be compared in the frequency domain;
querying an audio library by using an inverted index method according to the feature information of the audio to be compared, and acquiring feature information of a candidate audio corresponding to the audio to be compared;
acquiring, for each candidate audio, candidate segments in the feature information of the candidate audio and segments to be compared in the feature information of the audio to be compared;
comparing each segment to be compared with each candidate segment, and determining a similar candidate segment corresponding to each segment to be compared, the similar candidate segment being a candidate segment whose low frequency band is similar to the low frequency band of the segment to be compared and whose high frequency band is similar to the high frequency band of the segment to be compared;
determining, according to the similar candidate segments corresponding to the segments to be compared, whether the candidate audio is a similar candidate audio corresponding to the audio to be compared;
and performing a deduplication operation on the audio to be compared when the candidate audio is a similar candidate audio corresponding to the audio to be compared.
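The coarse-to-fine band comparison that the claims rely on (a similar candidate segment must match in the low band and in the high band) can be sketched as follows. The frequency cutoff, the sign-agreement similarity, the 0.9 thresholds, and all names are illustrative assumptions.

```python
import numpy as np

def split_bands(segment, cutoff):
    """Divide a (time x frequency) segment along frequency into low and high bands."""
    return segment[:, :cutoff], segment[:, cutoff:]

def two_stage_match(probe, candidates, cutoff, low_thresh=0.9, high_thresh=0.9):
    """Compare low bands first; run the high-band check only for low-band survivors."""
    def sim(a, b):
        # Fraction of positions whose binarized (sign) feature values agree.
        return float(np.mean((a > 0) == (b > 0)))

    probe_low, probe_high = split_bands(probe, cutoff)
    best = None
    for j, cand in enumerate(candidates):
        cand_low, cand_high = split_bands(cand, cutoff)
        if sim(probe_low, cand_low) < low_thresh:
            continue  # low bands differ: skip the high-band check entirely
        if sim(probe_high, cand_high) >= high_thresh:
            best = j
    return best
```

Screening on the low band first prunes most candidates cheaply; only segments that already agree where the signal energy is concentrated pay for the second comparison.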
CN201810146085.2A 2018-02-12 2018-02-12 Audio duplicate removal method and device Active CN108428457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810146085.2A CN108428457B (en) 2018-02-12 2018-02-12 Audio duplicate removal method and device

Publications (2)

Publication Number Publication Date
CN108428457A CN108428457A (en) 2018-08-21
CN108428457B true CN108428457B (en) 2021-03-23

Family

ID=63157006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810146085.2A Active CN108428457B (en) 2018-02-12 2018-02-12 Audio duplicate removal method and device

Country Status (1)

Country Link
CN (1) CN108428457B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110970054B (en) * 2019-11-06 2022-06-24 广州视源电子科技股份有限公司 Method and device for automatically stopping voice acquisition, terminal equipment and storage medium
CN112241467A (en) * 2020-12-18 2021-01-19 北京爱数智慧科技有限公司 Audio duplicate checking method and device
CN114444623B (en) * 2022-04-11 2022-08-12 智昌科技集团股份有限公司 Industrial robot-oriented anomaly detection and analysis method and system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06350940A (en) * 1993-06-03 1994-12-22 Sony Corp Audio multiplex receiver
CN101159834A (en) * 2007-10-25 2008-04-09 中国科学院计算技术研究所 Method and system for detecting repeatable video and audio program fragment
CN102567503A (en) * 2010-12-16 2012-07-11 微软公司 Extensible pipeline for data deduplication
CN104866604A (en) * 2015-06-01 2015-08-26 腾讯科技(北京)有限公司 Information processing method and server
US9626373B2 (en) * 2012-10-01 2017-04-18 Western Digital Technologies, Inc. Optimizing data block size for deduplication
CA3014675A1 (en) * 2016-03-18 2017-09-21 Qualcomm Incorporated Audio processing for temporally mismatched signals
CN107293307A (en) * 2016-03-31 2017-10-24 阿里巴巴集团控股有限公司 Audio-frequency detection and device
US9886446B1 (en) * 2011-03-15 2018-02-06 Veritas Technologies Llc Inverted index for text searching within deduplication backup system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10152977B2 (en) * 2015-11-20 2018-12-11 Qualcomm Incorporated Encoding of multiple audio signals

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Data Deduplication for Audio Data Files; Mohamad Zaini Nurshafiqah, Hikari Yoshii, Fumiya Enomoto; Google; 2017-03-22; entire document *
Similarity and Locality Based Indexing for High Performance Data Deduplication; Wen Xia, Hong Jiang, Dan Feng, Yu Hua; IEEE Transactions on Computers; 2014-02-15; entire document *
Content-Repetition-Based Audio and Video Detection; Wu Siyuan; China Masters' Theses Full-Text Database, Information Science and Technology; 2013-11-15; pp. 7, 27-32, Fig. 4-1 *
Data Deduplication Techniques; Ao Li, Shu Jiwu, Li Mingqiang; Journal of Software; 2010-05-31; entire document *

Also Published As

Publication number Publication date
CN108428457A (en) 2018-08-21

Similar Documents

Publication Publication Date Title
US10210884B2 (en) Systems and methods facilitating selective removal of content from a mixed audio recording
CN108428457B (en) Audio duplicate removal method and device
KR100896737B1 (en) Device and method for robustry classifying audio signals, method for establishing and operating audio signal database and a computer program
US8467892B2 (en) Content-based audio comparisons
WO2020248308A1 (en) Audio pop detection method and apparatus, and storage medium
JP6901798B2 (en) Audio fingerprinting based on audio energy characteristics
CN108305637B (en) Earphone voice processing method, terminal equipment and storage medium
CN110505169B (en) Phase calibration method and device
CN110111811A (en) Audio signal detection method, device and storage medium
CN111312290B (en) Audio data tone quality detection method and device
CN109903775B (en) Audio popping detection method and device
US20220254365A1 (en) Method and device for audio repair and readable storage medium
CN110400573B (en) Data processing method and device
WO2020186695A1 (en) Voice information batch processing method and apparatus, computer device, and storage medium
US8538748B2 (en) Method and apparatus for enhancing voice signal in noisy environment
CN105791602B (en) Sound quality testing method and system
CN107889031B (en) Audio control method, audio control device and electronic equipment
CN115243183A (en) Audio detection method, device and storage medium
US20190245522A1 (en) Method and device for adjusting passband width of filter
CN110097888B (en) Human voice enhancement method, device and equipment
US9165561B2 (en) Apparatus and method for processing voice signal
CN112291696A (en) Audio chip testing method, storage device and computer equipment
CN112423213A (en) Microphone array detection method and device
CN113611327A (en) Abnormal sound detection and analysis method and device, terminal equipment and readable storage medium
CN111145770B (en) Audio processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant