CN104090902B

CN104090902B - Audio tag setting method and device

Info

Publication number: CN104090902B
Application number: CN201410025446.XA
Authority: CN
Inventors: 赵伟峰
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Guangzhou Kugou Computer Technology Co Ltd
Priority date: 2014-01-20
Filing date: 2014-01-20
Publication date: 2016-06-08
Anticipated expiration: 2034-01-20
Also published as: CN104090902A

Abstract

The present invention provides an audio tag setting method and device. The method includes: setting correspondence rules between multiple color labels and sound element characteristics according to the element characteristics of color; extracting the sound element characteristic of an audio file; and configuring a corresponding color label for the audio file according to the extracted sound element characteristic and the correspondence rules. The invention combines audio with color and improves the extensibility of audio management, particularly for audio search. Compared with traditional audio search by genre or keyword, a user who knows nothing about an audio file can still infer its melodic character from its color label, based on a general understanding of color element characteristics. The user can therefore find the desired audio quickly and easily, which greatly improves search efficiency.

Description

Audio tag setting method and device

Technical field

The present invention relates to the field of computer technology, and in particular to an audio tag setting method and device.

Background

With the geometric growth of information on the Internet, quickly and accurately finding the required information in a massive information base has become a major bottleneck for Internet users. Content-based multimedia retrieval is an emerging research field that offers a new way to search: retrieving multimedia information by the multimedia content itself. Multimedia information takes many forms, including audio, video, images and animation, and audio accounts for a considerable share of it. Within audio, music is the most common form. At present, music is retrieved mainly by text keywords such as the track name, author, performer, album, genre or lyrics. Music, however, differs fundamentally from text keywords. Keyword search presupposes that the user already knows something about the target music and is familiar with the associated text. If the user is interested in the melody itself but knows nothing about the song title, lyrics or other text, existing search methods make it hard to find the desired music. Existing audio search therefore has the limitation that it cannot retrieve by audio content.

Summary of the invention

The purpose of the embodiments of the present invention is to provide an audio tag setting method and device, so as to solve the problem that existing audio search cannot retrieve by audio content.

An embodiment of the present invention provides an audio tag setting method, including:

setting correspondence rules between multiple color labels and sound element characteristics according to the element characteristics of color;

extracting the sound element characteristic of an audio file; and

configuring a corresponding color label for the audio file according to the extracted sound element characteristic and the correspondence rules;

wherein the step of setting correspondence rules between multiple color labels and sound element characteristics according to the element characteristics of color includes:

establishing two-dimensional vector combinations of the multiple color labels according to color saturation and brightness;

assigning a numeric index to each color label according to the distribution of its two-dimensional vector in a two-dimensional coordinate system; and

setting the correspondence between the sound element characteristic and the index of each color label.

An embodiment of the present invention also provides an audio tag setting device, including:

a rule setting module, for setting correspondence rules between multiple color labels and sound element characteristics according to the element characteristics of color;

a sound element characteristic extraction module, for extracting the sound element characteristic of an audio file; and

a label configuration module, for configuring a corresponding color label for the audio file according to the extracted sound element characteristic and the correspondence rules;

wherein the rule setting module further includes:

a vector combination establishing unit, for establishing two-dimensional vector combinations of the multiple color labels according to color saturation and brightness;

an index setting unit, for assigning a numeric index to each color label according to the distribution of its two-dimensional vector in a two-dimensional coordinate system; and

a correspondence setting unit, for setting the correspondence between the sound element characteristic and the index of each color label.

An embodiment of the present invention also provides one or more storage media containing computer-executable instructions, the computer-executable instructions being used to perform an audio tag setting method, the method including the following steps:

setting correspondence rules between multiple color labels and sound element characteristics according to the element characteristics of color;

extracting the sound element characteristic of an audio file; and

configuring a corresponding color label for the audio file according to the extracted sound element characteristic and the correspondence rules.

Compared with the prior art, the beneficial effects of the invention are as follows. The method and device of the embodiments combine audio with color and improve the extensibility of audio management, particularly for audio search. Compared with traditional audio search by genre or keyword, a user who knows nothing about an audio file can still infer its melodic character from its color label, based on a general understanding of color element characteristics. The user can therefore find the desired audio quickly and easily, which greatly improves search efficiency.

Brief description of the drawings

Fig. 1 is a flow chart of an audio tag setting method according to an embodiment of the present invention;

Fig. 2 is a flow chart of another audio tag setting method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a color label distribution according to an embodiment of the present invention;

Fig. 4 is a structural diagram of an audio tag setting device according to an embodiment of the present invention;

Fig. 5 is a structural diagram of another audio tag setting device according to an embodiment of the present invention.

Detailed description

The foregoing and other technical content, features and effects of the present invention are presented clearly in the following detailed description of preferred embodiments with reference to the drawings. The description of the specific embodiments gives a deeper and more concrete understanding of the technical means the invention adopts to achieve its intended purpose and of their effects. The accompanying drawings are provided for reference and illustration only and are not intended to limit the invention.

Through long experience in production and social life, people have developed distinct associations with, and emotional responses to, different colors. Some colors feel magnificent, plain, elegant, beautiful, vivid or passionate, while others feel festive, cheerful, happy, comfortable, sweet, melancholy or dull. Different colors evoke different emotions and aesthetic impressions. The main idea of the embodiments of the present invention is to combine audio with color by attaching a color label to each audio file, so that the emotional element of the audio is extracted and audio is tied to perception. This improves the extensibility of audio management for application scenarios such as music library management, music classification, melody association, personalized recommendation, user listening labels and social recommendation.

Fig. 1 is a flow chart of an audio tag setting method according to an embodiment of the present invention. The method includes the following steps:

S101: set correspondence rules between multiple color labels and sound element characteristics according to the element characteristics of color.

S102: extract the sound element characteristic of an audio file.

S103: configure a corresponding color label for the audio file according to the extracted sound element characteristic and the correspondence rules.

The color element characteristics and the sound element characteristics can both be chosen as needed. The color element characteristic can be one or more color properties such as saturation (chroma), brightness (value) and hue, and the sound element characteristic can be one or more acoustic properties such as the frequency of the audio, the amplitude of the audio and the spectral centroid. Different color labels among the multiple color labels may differ in their color element characteristics and can therefore be mapped to different sound element characteristics.

Specifically, the correspondence rules between color labels and sound element characteristics can be preset by a technician, or the user can be given a settings interface and configure them as needed. For example, color labels above a given brightness value can be mapped to audio whose mean amplitude exceeds a set value, or color labels above a given saturation value and below a given brightness value can be mapped to audio whose mean frequency exceeds a set value.
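The two example rules in the preceding paragraph can be sketched as data plus a small matching function. This is only an illustrative sketch: the threshold values, the dictionary keys and the names `Rule`, `RULES` and `matching_labels` are hypothetical and do not come from the embodiment.

```python
from typing import Callable, Dict, List, NamedTuple

class Rule(NamedTuple):
    """One correspondence rule: a condition on a color label paired with a condition on audio."""
    color_matches: Callable[[Dict[str, float]], bool]
    audio_matches: Callable[[Dict[str, float]], bool]

# Hypothetical thresholds; a real system would let a technician or user set them.
BRIGHTNESS_LIMIT = 0.7
SATURATION_LIMIT = 0.6
MEAN_AMPLITUDE_LIMIT = 0.5
MEAN_FREQUENCY_LIMIT = 2000.0

RULES: List[Rule] = [
    # Labels brighter than a set value <-> audio with a large mean amplitude.
    Rule(lambda c: c["brightness"] > BRIGHTNESS_LIMIT,
         lambda a: a["mean_amplitude"] > MEAN_AMPLITUDE_LIMIT),
    # Saturated but darker labels <-> audio with a high mean frequency.
    Rule(lambda c: c["saturation"] > SATURATION_LIMIT and c["brightness"] < BRIGHTNESS_LIMIT,
         lambda a: a["mean_frequency"] > MEAN_FREQUENCY_LIMIT),
]

def matching_labels(colors: Dict[str, Dict[str, float]], audio: Dict[str, float]) -> List[str]:
    """Return the names of the color labels that some rule associates with the audio."""
    return [name for name, color in colors.items()
            if any(r.color_matches(color) and r.audio_matches(audio) for r in RULES)]
```

Keeping the rules as data rather than hard-coded branches matches the statement that the rules may be preset or configured by the user.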

The method of this embodiment combines audio with color and improves the extensibility of audio management, particularly for audio search. Compared with traditional audio search by genre or keyword, a user who knows nothing about an audio file can still infer its melodic character from its color label, based on a general understanding of color element characteristics. The user can therefore find the desired audio quickly and easily, which greatly improves search efficiency.

To aid understanding of the method, a more detailed embodiment is described below.

Fig. 2 is a flow chart of another audio tag setting method according to an embodiment of the present invention. In this embodiment, the color element characteristics are saturation and brightness, and the sound element characteristic is the spectral centroid. The method includes the following steps:

S201: establish two-dimensional vector combinations of the multiple color labels according to color saturation and brightness.

S202: assign a numeric index to each color label according to the distribution of its two-dimensional vector in a two-dimensional coordinate system.

Referring also to Fig. 3, the x-axis represents saturation and the y-axis represents brightness. Combining the two dimensions gives four color labels in total. According to how the four color labels are distributed in the two-dimensional coordinate system, they are assigned the four indices "1", "2", "3" and "4", which can be understood as "bright", "strong", "gloomy" and "calm" respectively (the name of each color label can be defined according to the user's understanding of the color).
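As a rough illustration of steps S201 and S202, the sketch below places a (saturation, brightness) vector in one of four quadrants and returns an index. Fig. 3 is not reproduced in this text, so the normalization to [0, 1], the split point and the quadrant-to-index assignment are all assumptions, and `color_label_index` is a hypothetical helper name.

```python
# Names follow the embodiment: indices 1 to 4 read as bright, strong, gloomy, calm.
LABEL_NAMES = {1: "bright", 2: "strong", 3: "gloomy", 4: "calm"}

def color_label_index(saturation: float, brightness: float, split: float = 0.5) -> int:
    """Return the index of the quadrant containing the vector (saturation, brightness).

    ASSUMPTIONS: both values are normalized to [0, 1], the quadrants are split
    at `split`, and the quadrant-to-index assignment is a guess; the actual
    layout is defined by Fig. 3.
    """
    high_saturation = saturation >= split
    high_brightness = brightness >= split
    if high_brightness:
        return 2 if high_saturation else 1
    return 3 if high_saturation else 4
```

For instance, under these assumptions `LABEL_NAMES[color_label_index(0.9, 0.9)]` gives `"strong"`.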

S203: set the correspondence between the spectral centroid of the audio and the index of each color label. The correspondence is set by introducing several thresholds: the spectral centroid of the audio is compared with the preset thresholds, and the index corresponding to the spectral centroid is determined from the comparison result. This embodiment introduces a first threshold and a second threshold.

S204: divide the audio file into multiple audio frame signals.
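A minimal sketch of the framing in S204 follows. The embodiment only states that the frame length N is a power of 2, so the use of non-overlapping frames, the default length of 1024 samples and the zero-padding of the last frame are assumptions, and `split_into_frames` is a hypothetical name.

```python
from typing import List, Sequence

def split_into_frames(samples: Sequence[float], frame_length: int = 1024) -> List[List[float]]:
    """Split decoded samples into consecutive, non-overlapping frames.

    ASSUMPTION: the final partial frame is zero-padded so that every frame
    has exactly `frame_length` samples.
    """
    if frame_length <= 0 or frame_length & (frame_length - 1):
        raise ValueError("frame_length must be a positive power of two")
    frames = []
    for start in range(0, len(samples), frame_length):
        frame = list(samples[start:start + frame_length])
        frame.extend([0.0] * (frame_length - len(frame)))  # pad the last frame
        frames.append(frame)
    return frames
```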

S205: calculate the amplitude spectrum of each audio frame signal.

Suppose an audio file contains M audio frame signals, where M is a positive integer. Any audio frame signal in the file can then be written as x_i(n), where i is the position of the frame in the audio file, i = 1, 2, …, M; n is the sample index, n = 0, 1, 2, …, N-1; and N is the length of the frame, that is, its number of samples. The amplitude spectrum of x_i(n) is denoted X_i(n) and can be calculated with formula (1):

X_i(n) = \mathrm{abs}\left[\mathrm{fft}\left(x_i(n)\right)\right] \qquad (1)

In formula (1), abs[ ] is the modulus (absolute value) operation, fft(x_i(n)) is the fast Fourier transform of x_i(n), n = 0, 1, 2, …, N-1, and N is a power of 2.
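Formula (1) can be illustrated with a small pure-Python sketch. The embodiment does not name an FFT implementation, so the recursive radix-2 routine below is just one possible choice, and the function names are hypothetical. It relies on N being a power of 2, as stated above.

```python
import cmath
from typing import List, Sequence

def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 0 or n % 2:
        raise ValueError("input length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def amplitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Formula (1): X_i(n) = abs[fft(x_i(n))] for one audio frame."""
    return [abs(value) for value in fft(frame)]
```

In practice an optimized library FFT would normally replace the hand-written routine; it is written out here only to keep the example self-contained.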

S206: calculate the spectral centroid of each audio frame signal from the amplitude spectrum.

From the amplitude spectrum given by formula (1), the spectral centroid C of each audio frame signal of the audio file can be calculated as:

C = \frac{\sum_{n=0}^{N-1} X(n) \cdot n}{\sum_{n=0}^{N-1} X(n)} \qquad (2)

S207: build the spectral centroid sequence of the audio file according to the order of the audio frame signals in the audio file.

The spectral centroid sequence C(i) of the audio file can be expressed as:

C(i) = \frac{\sum_{n=0}^{N-1} X_i(n) \cdot n}{\sum_{n=0}^{N-1} X_i(n)} \qquad (3)
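Formulas (2) and (3) translate almost directly into code. In the sketch below the function names are hypothetical, and returning 0.0 for a silent frame (all amplitudes zero) is an assumption, since the formulas leave that case undefined.

```python
from typing import List, Sequence

def spectral_centroid(amplitudes: Sequence[float]) -> float:
    """Formula (2): amplitude-weighted mean of the bin index n over n = 0 .. N-1."""
    total = sum(amplitudes)
    if total == 0:
        return 0.0  # ASSUMPTION: a silent frame gets centroid 0
    return sum(n * a for n, a in enumerate(amplitudes)) / total

def centroid_sequence(amplitude_spectra: Sequence[Sequence[float]]) -> List[float]:
    """Formula (3): C(i) for every frame i, in the frames' original order."""
    return [spectral_centroid(spectrum) for spectrum in amplitude_spectra]
```

As in the formulas, the result is expressed in units of the bin index n rather than in hertz.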

S208: calculate the mean and standard deviation of the spectral centroid sequence.

S209: compare the calculated mean and standard deviation with the preset first threshold and second threshold respectively.

S210: configure a corresponding color label for the audio file according to the comparison result.

Let E and V be the mean and standard deviation of the spectral centroid sequence C(i), and let TE and TV be the first and second thresholds. The correspondence rules between the index ID of the color label and the spectral centroid are as follows:

(1) when E >= TE && V >= TV, ID = 2, which indicates that the amplitude of the audio is relatively large (the sound intensity is high) and the variation between audio frames is large (the tone varies widely);

(2) when E >= TE && V < TV, ID = 4;

(3) when E < TE && V >= TV, ID = 1;

(4) when E < TE && V < TV, ID = 3.

Correspondence rules (1) to (4) above are set according to a common understanding of color, but the method is not limited to them; the rules can be adjusted to the user's needs.
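Steps S208 to S210 and rules (1) to (4) can be sketched as follows. The embodiment does not say whether the standard deviation is the population or the sample form, so the use of the population standard deviation here is an assumption, and `color_label_id` is a hypothetical name.

```python
import statistics
from typing import Sequence

def color_label_id(centroids: Sequence[float], te: float, tv: float) -> int:
    """Map a spectral centroid sequence to a color label ID using rules (1) to (4).

    `te` and `tv` are the first and second thresholds (TE and TV).
    ASSUMPTION: V is the population standard deviation.
    """
    e = statistics.mean(centroids)    # E: mean of C(i)
    v = statistics.pstdev(centroids)  # V: standard deviation of C(i)
    if e >= te and v >= tv:
        return 2  # rule (1)
    if e >= te:
        return 4  # rule (2): E >= TE and V < TV
    if v >= tv:
        return 1  # rule (3): E < TE and V >= TV
    return 3      # rule (4): E < TE and V < TV
```

For the sequence `[2, 4, 4, 4, 5, 5, 7, 9]`, E is 5 and V is 2, so thresholds of TE = 4 and TV = 1.5 give ID 2.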

By analyzing the features of an audio file, the method of this embodiment ties audio to the colors found in images and attaches a color label to each audio file. The user can then grasp the audio content and melody quickly and intuitively, which makes audio files easier to query. The color label can also serve as a base label that extends to application scenarios such as music library management, music classification, melody association, personalized recommendation, user listening labels and social recommendation, which greatly improves the extensibility of audio management.

An embodiment of the present invention also provides an audio tag setting device. Referring to Fig. 4, the audio tag setting device includes a rule setting module 41, a sound element characteristic extraction module 42 and a label configuration module 43.

The rule setting module 41 is used to set correspondence rules between multiple color labels and sound element characteristics according to the element characteristics of color. The color element characteristics and the sound element characteristics can both be chosen as needed. The color element characteristic can be one or more color properties such as saturation, brightness and hue, and the sound element characteristic can be one or more acoustic properties such as the frequency of the audio, the amplitude of the audio and the spectral centroid. Different color labels among the multiple color labels may differ in their color element characteristics and can therefore be mapped to different sound element characteristics.

The sound element characteristic extraction module 42 is used to extract the sound element characteristic of an audio file.

The label configuration module 43 is used to configure a corresponding color label for the audio file according to the sound element characteristic extracted by the sound element characteristic extraction module 42 and the correspondence rules set by the rule setting module 41.

The device of this embodiment can configure color labels for audio, thereby improving the extensibility of audio file management.
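The cooperation of the three modules can be pictured with a small sketch. The class and attribute names are hypothetical, and modeling the modules as plain callables is an assumption made for brevity; the embodiment does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Features = Sequence[float]

@dataclass
class AudioTagDevice:
    # Output of the rule setting module 41: maps extracted features to a label ID.
    correspondence_rule: Callable[[Features], int]
    # Sound element characteristic extraction module 42: samples -> features.
    extract: Callable[[Sequence[float]], Features]

    def configure_label(self, samples: Sequence[float]) -> int:
        """Label configuration module 43: apply the rule to the extracted features."""
        return self.correspondence_rule(self.extract(samples))
```

A toy instance could pass `extract=lambda s: [abs(x) for x in s]` and a rule that returns one ID for loud input and another otherwise; both callables are placeholders for the real modules.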

Fig. 5 is a structural diagram of another audio tag setting device according to an embodiment of the present invention. The audio tag setting device of this embodiment includes a rule setting module 41, a sound element characteristic extraction module 42 and a label configuration module 43. In this embodiment, the color element characteristics are saturation and brightness, and the sound element characteristic is the spectral centroid.

Compared with the embodiment of Fig. 4, the rule setting module 41 of this embodiment further includes a vector combination establishing unit 411, an index setting unit 412 and a correspondence setting unit 413. The vector combination establishing unit 411 is used to establish two-dimensional vector combinations of the multiple color labels according to color saturation and brightness. The index setting unit 412 is used to assign a numeric index to each color label according to the distribution of its two-dimensional vector in a two-dimensional coordinate system. The correspondence setting unit 413 is used to set the correspondence between the sound element characteristic and the index of each color label.

The sound element characteristic extraction module 42 of this embodiment further includes an audio frame division unit 421, an amplitude spectrum computing unit 422, a spectral centroid computing unit 423 and a sequence construction unit 424. The audio frame division unit 421 is used to divide the audio file into multiple audio frame signals. The amplitude spectrum computing unit 422 is used to calculate the amplitude spectrum of each audio frame signal. The spectral centroid computing unit 423 is used to calculate the spectral centroid of each audio frame signal from the amplitude spectrum. The sequence construction unit 424 is used to build the spectral centroid sequence of the audio file according to the order of the audio frame signals in the audio file.

The label configuration module 43 of this embodiment further includes a sequence processing unit 431, a comparing unit 432 and a color label configuration unit 433. The sequence processing unit 431 is used to calculate the mean and standard deviation of the spectral centroid sequence. The comparing unit 432 is used to compare the calculated mean and standard deviation with the preset first threshold and second threshold respectively. The color label configuration unit 433 is used to configure a corresponding color label for the audio file according to the comparison result.

By analyzing the features of an audio file, the device of this embodiment ties audio to the colors found in images and attaches a color label to each audio file. The user can then grasp the audio content and melody quickly and intuitively, which makes audio files easier to query. The color label can also serve as a base label that extends to application scenarios such as music library management, music classification, melody association, personalized recommendation, user listening labels and social recommendation, which greatly improves the extensibility of audio management.

From the above description of the embodiments, those skilled in the art will clearly understand that the embodiments of the present invention can be implemented in hardware, or in software together with a necessary general-purpose hardware platform. On this understanding, the technical solution of the embodiments can be embodied as a software product. The software product can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a portable hard disk) and includes instructions that cause a computer device (such as a personal computer, a server or a network device) to perform the methods described in the implementation scenarios of the embodiments of the present invention.

The above are merely preferred embodiments of the present invention and do not limit the invention in any form. Although the invention has been disclosed above through preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the invention, use the technical content disclosed above to make minor changes or modifications that yield equivalent embodiments. Any simple modification, equivalent change or refinement made to the above embodiments according to the technical essence of the invention, without departing from the content of its technical solution, still falls within the scope of the technical solution of the invention.

Claims

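1. An audio tag setting method, characterized in that it includes:

setting correspondence rules between multiple color labels and sound element characteristics according to the element characteristics of color;

extracting the sound element characteristic of an audio file; and

configuring a corresponding color label for the audio file according to the extracted sound element characteristic and the correspondence rules;

wherein the step of setting correspondence rules between multiple color labels and sound element characteristics according to the element characteristics of color includes:

establishing two-dimensional vector combinations of the multiple color labels according to color saturation and brightness;

assigning a numeric index to each color label according to the distribution of its two-dimensional vector in a two-dimensional coordinate system; and

setting the correspondence between the sound element characteristic and the index of each color label.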

2. The audio tag setting method as claimed in claim 1, characterized in that, in the step of setting correspondence rules between multiple color labels and sound element characteristics according to the element characteristics of color, the element characteristics of color include saturation and brightness.

3. The audio tag setting method as claimed in claim 1, characterized in that the sound element characteristic is the spectral centroid of the audio.

4. The audio tag setting method as claimed in claim 3, characterized in that the step of extracting the sound element characteristic of the audio file includes:

dividing the audio file into multiple audio frame signals;

calculating the amplitude spectrum of each audio frame signal;

calculating the spectral centroid of each audio frame signal from the amplitude spectrum; and

building the spectral centroid sequence of the audio file according to the order of the audio frame signals in the audio file.

5. The audio tag setting method as claimed in claim 4, characterized in that the step of configuring a corresponding color label for the audio file includes:

calculating the mean and standard deviation of the spectral centroid sequence;

comparing the calculated mean and standard deviation with a preset first threshold and a preset second threshold respectively; and

configuring a corresponding color label for the audio file according to the comparison result.

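6. An audio tag setting device, characterized in that it includes:

a rule setting module, for setting correspondence rules between multiple color labels and sound element characteristics according to the element characteristics of color;

a sound element characteristic extraction module, for extracting the sound element characteristic of an audio file; and

a label configuration module, for configuring a corresponding color label for the audio file according to the extracted sound element characteristic and the correspondence rules;

wherein the rule setting module further includes:

a vector combination establishing unit, for establishing two-dimensional vector combinations of the multiple color labels according to color saturation and brightness;

an index setting unit, for assigning a numeric index to each color label according to the distribution of its two-dimensional vector in a two-dimensional coordinate system; and

a correspondence setting unit, for setting the correspondence between the sound element characteristic and the index of each color label.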

7. The audio tag setting device as claimed in claim 6, characterized in that the element characteristics of color include saturation and brightness.

8. The audio tag setting device as claimed in claim 6, characterized in that the sound element characteristic is the spectral centroid of the audio.

9. The audio tag setting device as claimed in claim 8, characterized in that the sound element characteristic extraction module further includes:

an audio frame division unit, for dividing the audio file into multiple audio frame signals;

an amplitude spectrum computing unit, for calculating the amplitude spectrum of each audio frame signal;

a spectral centroid computing unit, for calculating the spectral centroid of each audio frame signal from the amplitude spectrum; and

a sequence construction unit, for building the spectral centroid sequence of the audio file according to the order of the audio frame signals in the audio file.

10. The audio tag setting device as claimed in claim 9, characterized in that the label configuration module further includes:

a sequence processing unit, for calculating the mean and standard deviation of the spectral centroid sequence;

a comparing unit, for comparing the calculated mean and standard deviation with a preset first threshold and a preset second threshold respectively; and

a color label configuration unit, for configuring a corresponding color label for the audio file according to the comparison result.