CN103853749B - Mode-based audio retrieval method and system - Google Patents
- Publication number
- CN103853749B (application CN201210505562.2A)
- Authority
- CN
- China
- Prior art keywords
- audio
- audio data
- source
- sequence
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Abstract
The invention provides a mode-based audio retrieval method and system. The audio retrieval method includes: marking a plurality of original audio data on the basis of modes to acquire an audio marking sequence for each original audio data; acquiring the audio marking sequence of target audio data; determining the matching degree between the target audio data and each original audio data according to preset matching rules, on the basis of the audio marking sequence of the target audio data and the audio marking sequences of the original audio data; and outputting, as retrieval results, the original audio data whose matching degree is higher than a preset matching degree threshold. With the method and system, audio marking and retrieval can be performed automatically and iteratively on the basis of background modes without manual marking, thereby providing more accurate and reasonable audio retrieval results.
Description
Technical field
The present invention relates generally to the field of multimedia information retrieval, and in particular to a mode-based audio retrieval method and system.
Background technology
The wide availability of the Internet has driven the rapid development of multimedia information technology. The amount of multimedia data obtainable from the Internet is growing rapidly. For example, as much as 48 hours of audio and video content is uploaded to YouTube every minute. Such massive volumes of data make one-by-one browsing impossible and pose ever greater challenges for data indexing and retrieval.
How to correctly find data files on a desired subject in an information repository is one of the research focuses in the field of multimedia information retrieval. For example, a wedding planning company may want to search massive amounts of material, based on a small number of wedding ceremony samples, to produce a final wedding video. A radio station producer or a video website production team may wish to find program categories of interest in massive data sets based on limited information, to support rapid program production. In addition, users may want to automatically tag and archive their personal multimedia collections so as to manage them more effectively.
Compared with video-based retrieval, audio-based retrieval has a wider range of applications; in some cases only audio data is available (for example, radio broadcasts). Audio carries a considerable amount of information that helps in understanding content, and audio files are generally smaller than video files. Therefore, even when a video file must be compressed to the point of being slightly blurred because of limited network upload capacity, its audio can remain comparatively clear.
However, prior-art audio indexing and retrieval methods have many drawbacks. First, existing audio indexing and retrieval methods require a large amount of manual labeling. An audio website, for example, typically holds a large number of unlabeled or only superficially labeled files, with no good descriptions of these files and no effective relevance links to other data for recommendation. Staff can only manually label and cross-link the small number of well-known or frequently visited files. Such audio indexing and retrieval methods are therefore usable only in specific domains and on limited sets of data samples.
Second, existing audio indexing and retrieval methods model the audio labels in isolation, which can make indexing and retrieval results inaccurate. For example, the same sound of running water means something entirely different against a natural-river background mode than against a home-kitchen background mode. Likewise, the roar of a crowd differs between an entertainment show, a talk show, and a sports broadcast. If a user supplies a recording of a babbling stream as a sample and wants to retrieve similar material from a multimedia database, an existing audio retrieval method will indiscriminately return data files containing running-water sounds from both the natural-river mode and the home-kitchen mode. Clearly, when context is not considered, many audio retrieval results are inaccurate.
Third, existing audio retrieval methods generally adopt a single sequential strategy: first segment the audio data, then classify and identify each segment. As a result, errors in earlier steps affect the execution of later steps and gradually accumulate into the final retrieval result, making it inaccurate or even completely off target.
Accordingly, there is a need for an audio retrieval method and system that run automatically without manual labeling.
Further, there is a need for an audio retrieval method and system that are based on background modes and can take audio class similarity into account.
Further still, there is a need for an audio retrieval method and system that can automatically eliminate accumulated errors and thereby provide more accurate retrieval results.
Summary of the invention
An object of the present invention is to perform mode-based labeling and modeling of source audio data automatically, and to provide accurate audio retrieval results that take audio class similarity into account.
To this end, the audio retrieval method and system of the present invention label source audio data automatically through an iterative process that integrates segmentation with clustering: in each iteration, a decision tree based on background modes is built and a segment labeling model is trained for each leaf node of the decision tree; finally, audio retrieval results are produced by model comparison combined with audio class similarity.
According to a first aspect of the present invention, there is provided a mode-based audio retrieval method, comprising: labeling a plurality of source audio data based on modes, to obtain an audio label sequence for each source audio data; obtaining an audio label sequence for target audio data; determining a matching degree between the target audio data and each source audio data according to a predetermined matching rule, based on the audio label sequence of the target audio data and the audio label sequences of the source audio data; and outputting, as retrieval results, the source audio data whose matching degree is higher than a predetermined matching degree threshold.
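The patent leaves the predetermined matching rule open. As one illustration only, the matching degree could be a normalized longest-common-subsequence score between the two label sequences; the rule, the threshold, and all label names below are hypothetical, not the patent's actual rule.

```python
def match_degree(target_labels, source_labels):
    """Hypothetical matching rule: length of the longest common
    subsequence of the two label sequences, normalized by the
    length of the target sequence."""
    m, n = len(target_labels), len(source_labels)
    # classic LCS dynamic program
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if target_labels[i - 1] == source_labels[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n] / m if m else 0.0

def retrieve(target, sources, threshold):
    """Output every source whose matching degree exceeds the threshold."""
    return [name for name, labels in sources.items()
            if match_degree(target, labels) > threshold]

sources = {
    "river_walk":  ["water", "birds", "water", "wind"],
    "kitchen_tap": ["speech", "water", "clatter"],
}
results = retrieve(["water", "birds", "wind"], sources, 0.8)
```

Only `river_walk` clears the 0.8 threshold here, since its label sequence contains the whole target sequence in order.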
In one embodiment, labeling a plurality of source audio data based on modes comprises performing the following operations for each source audio data: (a) dividing the source audio data to obtain a plurality of segments; (b) determining an audio class sequence for the source audio data from the obtained segments using a clustering algorithm; (c) building a mode-based decision tree from the audio class sequences determined for the plurality of source audio data; (d) training a segment labeling model for each leaf node of the decision tree; (e) using the trained segment labeling models to obtain the audio label sequence of the source audio data and to adjust the division of the source audio data; and (f) repeating operations (b) through (e) while a predetermined iteration condition is met.
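Operations (a) through (f) amount to the control flow sketched below. Every stand-in here is invented for illustration (fixed-window division, a mean-rounding "clustering", a one-leaf-per-class "tree", a nearest-leaf "decoder"); the sketch only shows how the steps chain and iterate, not the actual algorithms.

```python
def divide(source):
    # (a) hypothetical coarse division: fixed windows of 2 samples
    return [source[i:i + 2] for i in range(0, len(source), 2)]

def cluster(segments):
    # (b) toy "clustering": the class of a segment is its rounded mean
    return [round(sum(s) / len(s)) for s in segments]

def build_decision_tree(classes):
    # (c) toy "tree": one leaf per distinct class
    return sorted(set(classes))

def train_model(leaf, segments):
    # (d) toy segment labeling "model": just remember the leaf id
    return leaf

def decode_and_adjust(models, segments):
    # (e) toy decoding: label each segment with the nearest leaf;
    # the division is left unchanged in this sketch
    labels = [min(models, key=lambda leaf: abs(leaf - sum(s) / len(s)))
              for s in segments]
    return labels, segments

def label_source_audio(source, max_iters=3):
    """Skeleton of operations (a)-(f) for one source audio data."""
    segments = divide(source)                     # (a)
    labels = None
    for _ in range(max_iters):                    # (f) iteration condition
        classes = cluster(segments)               # (b)
        tree = build_decision_tree(classes)       # (c)
        models = {leaf: train_model(leaf, segments) for leaf in tree}  # (d)
        labels, segments = decode_and_adjust(models, segments)         # (e)
    return labels

labels = label_source_audio([0, 0, 10, 10, 0, 0])
```

In the real method, step (e) feeds an improved division back into step (b), which is what lets the iteration refine the initially coarse segmentation.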
According to a second aspect of the present invention, there is provided a mode-based audio retrieval system, comprising: a labeling device configured to label a plurality of source audio data based on modes, to obtain an audio label sequence for each source audio data; a target acquisition device configured to obtain an audio label sequence for target audio data; a matching degree determination device configured to determine a matching degree between the target audio data and each source audio data according to a predetermined matching rule, based on the audio label sequence of the target audio data obtained by the target acquisition device and the audio label sequences of the source audio data obtained by the labeling device; and a retrieval output device configured to output, as retrieval results, the source audio data whose matching degree, as determined by the matching degree determination device, is higher than a predetermined matching degree threshold.
In one embodiment, the labeling device includes: a division device configured to divide each source audio data to obtain a plurality of segments; a clustering device configured to determine the audio class sequence of each source audio data from the obtained segments using a clustering algorithm; a decision tree building device configured to build a mode-based decision tree from the audio class sequences determined by the clustering device for the plurality of source audio data; a model training device configured to train a segment labeling model for each leaf node of the decision tree built by the decision tree building device; a division adjustment device configured to use the segment labeling models trained by the model training device to obtain the audio label sequence of each source audio data and to adjust the division of that source audio data; and an iteration condition judgment device configured to judge whether a predetermined iteration condition is met.
With the method and system of the present invention, audio retrieval can be performed automatically without manual labeling.
With the method and system of the present invention, audio class labeling can be carried out iteratively based on background modes, thereby providing more accurate and reasonable audio retrieval results.
With the method and system of the present invention, audio class similarity can be taken into account, in combination with background modes, when performing audio retrieval.
Description of the drawings
The above and other objects, features, and advantages of the disclosure will become more apparent from the following detailed description of exemplary embodiments of the disclosure, taken in conjunction with the accompanying drawings, in which the same reference numerals generally denote the same components.
Fig. 1 shows a block diagram of an exemplary computer system/server suitable for implementing embodiments of the present invention.
Fig. 2 is a general flowchart illustrating a mode-based audio retrieval method according to an embodiment of the present invention.
Fig. 3 schematically shows an example of an audio class sequence.
Fig. 4 is a flowchart illustrating a process for mode-based audio class labeling of source audio data according to an embodiment of the present invention.
Fig. 5 schematically shows an example of the clustering process.
Fig. 6 is a flowchart illustrating a process for building a mode-based decision tree according to an embodiment of the present invention.
Fig. 7 schematically shows an example of the decision tree building process.
Fig. 8 is a flowchart illustrating a process for determining the matching degree between target audio data and source audio data according to an embodiment of the present invention.
Fig. 9 shows a functional block diagram of a mode-based audio retrieval system according to an embodiment of the present invention.
Detailed description of embodiments
Preferred embodiments of the disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that the present invention may be implemented as a system, a method, or a computer program product. Accordingly, the disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.), or an embodiment combining hardware and software, generally referred to herein as a "circuit", "module", or "system". Furthermore, in some embodiments, the present invention may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied thereon.
Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The present invention is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the computer or other programmable data processing apparatus, create means for implementing the functions/operations specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the functions/operations specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed thereon so as to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/operations specified in the flowchart and/or block diagram block or blocks.
Fig. 1 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention. The computer system/server 12 shown in Fig. 1 is only an example, and should not impose any limitation on the functionality or scope of use of embodiments of the present invention.
As shown in Fig. 1, computer system/server 12 takes the form of a general-purpose computing device. The components of computer system/server 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components (including the system memory 28 and the processing unit 16).
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer system/server 12 typically includes a variety of computer-system-readable media. Such media may be any available media that are accessible by computer system/server 12, and include both volatile and non-volatile media, and both removable and non-removable media.
System memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be provided for reading from and writing to non-removable, non-volatile magnetic media (not shown in Fig. 1, commonly called a "hard disk drive"). Although not shown in Fig. 1, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from and writing to a removable, non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media), may be provided. In such instances, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40, having a set (at least one) of program modules 42, may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a networking environment. Program modules 42 generally carry out the functions and/or methods of the embodiments described in the invention.
Computer system/server 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with computer system/server 12, and/or with any devices (such as a network card, a modem, etc.) that enable computer system/server 12 to communicate with one or more other computing devices. Such communication can occur via input/output (I/O) interfaces 22. Moreover, computer system/server 12 can communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via network adapter 20. As depicted, network adapter 20 communicates with the other modules of computer system/server 12 via bus 18. It should be understood that, although not shown, other hardware and/or software modules could be used in conjunction with computer system/server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.
As mentioned above, the audio retrieval method and system of the present invention label source audio data automatically through an iterative process that integrates segmentation with clustering: in each iteration, a decision tree based on background modes is built and a segment labeling model is trained for each leaf node of the decision tree; finally, audio retrieval results are produced by model comparison combined with audio class similarity.
Embodiments of the present invention are described in detail below with reference to Figs. 2 to 9. Fig. 2 is a general flowchart of a mode-based audio retrieval method 200 according to an embodiment of the present invention. First, a plurality of source audio data, for example those contained in an audio database, are subjected to mode-based audio class labeling, to obtain an audio class sequence for each source audio data (step 202).
It should be noted that an "audio class" as used herein refers to a category of audio. Ideally, an "audio class" would be the event category of a piece of audio, such as a gunshot, a babbling stream, cheering, or screaming. In general, however, an "audio class" does not necessarily correspond exactly to the event category of the audio; it may merely be the result of running a particular audio processing algorithm (for example, a clustering algorithm) and may carry no semantic meaning. The present invention can perform accurate audio labeling and retrieval without knowing which specific event category each audio class represents, so the audio classification and retrieval method of the invention runs automatically, without supervision.
Audio data consists of many continuous or discrete audio segments, so an "audio class sequence" as used herein refers to a time-ordered sequence of audio classes; it describes the audio classes occurring in sequence in the audio data and their corresponding durations. Fig. 3 shows an example of an ideal audio class sequence. A "background mode" or "mode" as used herein refers to the environmental setting of the audio data, such as a natural river, a home kitchen, a railway station, an entertainment show, a talk show, or a sports broadcast.
Fig. 4 illustrates one detailed implementation 400 of step 202, in which source audio data are labeled automatically through an iterative process integrating segmentation with clustering; in each iteration, a decision tree based on background modes is built and a segment labeling model is trained for each leaf node of the decision tree.
Process 400 may start at step 402. In step 402, each of the plurality of source audio data is divided, to obtain a plurality of segments. In one embodiment, the division may be based on silence in the source audio data. In another embodiment, the source audio data may be divided using audio windows of a predetermined duration. In yet another embodiment, the source audio data may be divided evenly in time. In a further embodiment, any combination of silence-based division, audio-window division, and even division in time may be used to divide the source audio data.
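The silence-based and fixed-window divisions described above could be sketched as follows on a toy amplitude sequence. The silence threshold, minimum segment length, and window size are arbitrary assumptions, not values from the patent.

```python
def divide_by_silence(samples, silence_thresh=0.02, min_len=3):
    """Hypothetical silence-based division: cut wherever the
    absolute amplitude drops below silence_thresh, keeping only
    segments of at least min_len samples."""
    segments, current = [], []
    for x in samples:
        if abs(x) < silence_thresh:
            if len(current) >= min_len:
                segments.append(current)
            current = []
        else:
            current.append(x)
    if len(current) >= min_len:
        segments.append(current)
    return segments

def divide_by_window(samples, window=4):
    """Fixed-duration audio-window division."""
    return [samples[i:i + window] for i in range(0, len(samples), window)]

signal = [0.5, 0.6, 0.4, 0.0, 0.0, 0.7, 0.8, 0.9, 0.0]
by_silence = divide_by_silence(signal)
by_window = divide_by_window(signal)
```

As the next paragraph notes, this first division may be coarse; the iteration is what refines it.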
It should be noted that the division of the source audio data in step 402 may be relatively coarse. Through the subsequent iterations of clustering, decision tree building, and model training, and by applying the Viterbi algorithm, increasingly accurate divisions can be obtained.
Then, in step 404, an audio class sequence for each source audio data is determined using a clustering algorithm, based on the plurality of segments obtained by the division in step 402. In one example, a Gaussian mixture model (GMM) is built using audio features extracted from the obtained segments. Once the model is determined, the distance between audio classes can be determined. Then, based on the constructed GMMs, hierarchical clustering is performed by the clustering algorithm using specific audio features (for example, time-domain or frequency-domain features) and the audio class distances, and the audio class sequence of the source audio data is finally determined.
According to the clustering algorithm and a predetermined clustering criterion, the clustering process can be stopped at a desired cluster level. In this example, the variables at the level at which clustering stops are defined as "audio classes", and the variables at each level below it are defined as "audio subclasses". Accordingly, a series of sequentially arranged audio classes constitutes an "audio class sequence". As mentioned above, it should be understood that the audio classes and audio subclasses obtained in step 404 may carry no semantic meaning.
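Hierarchical clustering that stops at a desired level can be illustrated with a toy bottom-up merge on one-dimensional segment features. The GMM-based audio class distance is replaced here by a simple distance between cluster means, which is an assumption for readability, not the patent's model.

```python
def agglomerate(points, target_classes):
    """Toy agglomerative clustering: repeatedly merge the two
    closest clusters until the desired level (number of audio
    classes) is reached. Intermediate levels would correspond to
    'audio subclasses' in the text."""
    clusters = [[p] for p in points]
    while len(clusters) > target_classes:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi = sum(clusters[i]) / len(clusters[i])
                mj = sum(clusters[j]) / len(clusters[j])
                d = abs(mi - mj)   # stand-in for the GMM class distance
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# one scalar feature per segment (e.g. a per-segment energy value)
features = [0.1, 0.2, 5.0, 5.1, 9.8]
classes = agglomerate(features, 3)
```

Stopping at `target_classes=3` plays the role of the predetermined clustering criterion that fixes the "audio class" level.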
Fig. 5 shows an example of the clustering process, in which each point in level L1 represents a GMM variable built from audio features extracted from multiple audio segments, and L2, L3, ..., Ln represent the audio clustering levels obtained by the clustering algorithm based on specific time-domain or frequency-domain audio features and audio class distances. Each point in Ln (for example, a, b, c, d, e, etc.) is defined as an audio class, while each point in L2 through Ln-1 is regarded as an audio subclass of the audio data.
Next, in step 406, a mode-based decision tree is built from the audio class sequences determined in step 404 for the plurality of source audio data. Fig. 6 shows one detailed implementation 600 of the mode-based decision tree building of step 406. First, at step 602, each audio class in the audio class sequences determined in step 404 (for example, a, b, c, d, e at level Ln in Fig. 5) is defined as a root node of the decision tree.
Then, in step 604, a mode question set is built based on the context, within the audio class sequence, of each audio class defined as a root node. The mode question set may be built according to a predetermined rule, for example so as to maximize the discriminability of the branches. In one example, the context of an audio class refers to the audio classes before and after it in the audio class sequence. In another example, the context of an audio class refers to one or more audio subclasses obtained for that audio class in the clustering process of step 404. The context of an audio class can, to some extent, reflect its background mode. For example, consider an audio class related to a train whistle: if the previous audio class in the sequence is related to broadcast announcements and the next audio class is related to the noise of a crowd, the background mode is likely a railway station; but if the previous audio class is related to gunshots and the next to cheering, it is more likely a movie scene mode such as "Railway Guerrilla".
Finally, in step 606, the audio classes in the audio class sequences are branched using the constructed mode question set, thereby building the leaf nodes of the decision tree. A "leaf node of the decision tree" as used herein refers to a node in the decision tree that has no child nodes; conversely, any node that has child nodes is defined as a "root node". It should be noted that the decision tree can be branched down to a predetermined node level; for example, the building of the decision tree may be terminated when the number of audio labels contained in each leaf node falls below a predetermined threshold.
Fig. 7 shows an example of the decision tree construction process, where audio class b is, for example, an audio class in the audio class sequence obtained by the clustering process in the example of Fig. 5. Assume that the audio class sequences obtained for the multiple source audio data by clustering contain four groups that include audio class b, as shown in Fig. 7: (a-b+c), (a-b+e), (d-b+a) and (d-b+c), where the symbol "-" denotes the audio class preceding b in the sequence and the symbol "+" denotes the audio class following b in the sequence. That is, (a-b+c) indicates that the audio class preceding b in the sequence is a and the audio class following b is c.
Using the context-based question set, audio class b is progressively branched downward into leaf nodes such as b1, b2, b3 and b4. For example, "does the context contain audio class a?" can first be selected as the question for branching audio class b, splitting off (d-b+c), which is defined as leaf node b1. Then, "is the previous audio class a?" can be selected as the question for further branching, splitting off (d-b+a), which is defined as leaf node b2. Next, "is the next audio class c?" can be selected as the question for further branching, distinguishing (a-b+e) from (a-b+c), which are defined as leaf nodes b3 and b4 respectively. At this point, the construction of the decision tree is complete.
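The branching walk-through above can be sketched in code. The questions and leaf assignments follow the Fig. 7 example; the `split` helper and the (prev, next) data layout are assumptions of this sketch, not the patent's implementation.

```python
# Illustrative sketch of the Fig. 7 branching: splitting the contexts of
# audio class b with yes/no context questions until each leaf holds one group.

contexts = [("a", "c"), ("a", "e"), ("d", "a"), ("d", "c")]  # (prev, next) of b

def split(items, question):
    """Partition items into (yes, no) lists according to a context question."""
    yes = [it for it in items if question(it)]
    no = [it for it in items if not question(it)]
    return yes, no

# Q1: does the context contain audio class a?  The "no" branch becomes b1.
with_a, b1 = split(contexts, lambda c: "a" in c)
# Q2: is the previous audio class a?  The "no" branch becomes b2.
prev_a, b2 = split(with_a, lambda c: c[0] == "a")
# Q3: is the next audio class c?  This separates b4 from b3.
b4, b3 = split(prev_a, lambda c: c[1] == "c")
```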
Returning to Fig. 4, next, in step 408, a segment marking model is trained for each leaf node on the decision tree. In one example, the segment marking model can include a hidden Markov model (HMM) and a duration model. Then, using the trained segment marking models, the audio marking sequence of each source audio data is obtained and the division of that source audio data is adjusted (step 410). Note that the "audio marking sequence" here is related to but distinct from the audio class sequence: it does not correspond to the event categories involved in the audio, but is merely the result of certain audio processing algorithms (for example, the Viterbi algorithm), serving the subsequent matching processing. In one embodiment of the invention, step 410 can be realized by the following operations: first, using the segment marking models trained in step 408, the audio class distances of the source audio data are determined; then, based on the trained segment marking models, Viterbi decoding is performed using the audio features extracted from the source audio data and the determined audio class distances; finally, according to the Viterbi decoding result, the audio marking sequence of the source audio data is obtained and the division of the source audio data is adjusted.
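As a rough illustration of the Viterbi-based marking in step 410, the following minimal decoder treats audio classes as states, per-segment log-likelihoods as emission scores, and pairwise transition scores standing in for the trained HMM/duration models and audio class distances. All names and values are invented for this sketch.

```python
# Minimal Viterbi sketch over audio segments: pick the best-scoring class
# path given per-segment emission scores and pairwise transition scores.

def viterbi(emission_logp, transition_logp, classes):
    """emission_logp: list of {class: log-score} dicts, one per segment."""
    best = {c: emission_logp[0][c] for c in classes}
    back = []
    for frame in emission_logp[1:]:
        ptr, nxt = {}, {}
        for c in classes:
            prev_c = max(classes, key=lambda p: best[p] + transition_logp[(p, c)])
            ptr[c] = prev_c
            nxt[c] = best[prev_c] + transition_logp[(prev_c, c)] + frame[c]
        back.append(ptr)
        best = nxt
    last = max(classes, key=lambda c: best[c])
    path = [last]
    for ptr in reversed(back):  # backtrack through the stored pointers
        path.append(ptr[path[-1]])
    return list(reversed(path))

classes = ["speech", "music"]
# Staying in the same class is cheap; switching is penalized.
trans = {(p, c): (-0.1 if p == c else -2.0) for p in classes for c in classes}
emissions = [{"speech": -0.2, "music": -3.0},
             {"speech": -0.3, "music": -2.5},
             {"speech": -2.8, "music": -0.2}]
path = viterbi(emissions, trans, classes)
```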
Next, the method proceeds to decision step 412, where it is determined whether a predetermined iteration condition is met. In one example, the predetermined iteration condition can include: the adjustment amount to the division of the source audio data is not less than a predetermined segment difference, and/or the number of iterations is less than a predetermined iteration count threshold.
If it is determined in step 412 that further iteration is needed, method 400 returns to step 404 to perform the clustering processing, decision tree construction processing and segment marking model training based on the segments readjusted in step 410. If it is determined in step 412 that the iteration can be exited, the audio marking sequences of the obtained audio data are output in step 414.
In one embodiment of the invention, before the audio data is divided in step 402, it may also be determined whether the source audio data is speech data (step 416). The source audio data contained in the audio database may be speech data or non-speech data. A support vector machine (SVM) method well known in the art can be used to distinguish speech from non-speech. Accurately distinguishing speech from non-speech aids the subsequent segmentation, clustering, decision tree construction and model training steps.
Returning now to method 200 of Fig. 2, after the audio marking sequence of each source audio data is obtained in step 202, method 200 proceeds to step 204. In step 204, the audio marking sequence of the target audio data is obtained. In one embodiment of the invention, Viterbi decoding can be performed on the target audio data based on, for example, the segment marking models trained in step 408 of Fig. 4, to obtain the audio marking sequence of the target audio data.
Next, in step 206, based on the audio marking sequence of the target audio data obtained in step 204 and the audio marking sequence of each source audio data obtained in step 202, the matching degree between the target audio data and the source audio data is determined according to a predetermined matching rule.
Fig. 8 shows one concrete implementation 800 of determining the matching degree between the target audio data and the source audio data in step 206, in which both the similarity between audio classes and the background mode are considered during matching, so as to retrieve and rank the source audio data related to the target audio data.
First, in step 802, the audio class distances between the audio classes related to the target audio data and those of the source audio data are determined. For example, the audio class distances can be determined based on the segment marking models trained in step 408 of Fig. 4. Then, in step 804, a sequence matching score is calculated by comparing the audio marking sequence of the target audio data with the audio marking sequence of the source audio data, based on the audio class distances determined in step 802. In one example, a dynamic time warping (DTW) algorithm can be used, with the audio class distances as weights, to calculate the similarity between the audio marking sequence of the target audio data and that of the source audio data, i.e., the sequence matching score.
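A plain DTW implementation of the step 804 comparison might look as follows, with a class-distance table as the local cost. The distance values in the usage lines are invented for illustration.

```python
# Hedged sketch of the DTW sequence comparison: aligning two audio marking
# sequences with a class-distance matrix as the local cost.

def dtw_score(seq_a, seq_b, class_dist):
    """Classic dynamic time warping; lower total cost = better match."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = class_dist[(seq_a[i - 1], seq_b[j - 1])]
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Toy 0/1 distance table over four invented classes.
dist = {(x, y): 0.0 if x == y else 1.0 for x in "abcd" for y in "abcd"}
same = dtw_score(list("abc"), list("abbc"), dist)  # repeated classes align freely
diff = dtw_score(list("abc"), list("abd"), dist)   # one substitution
```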
Then, in step 806, a count matching score is calculated by counting the number of each audio class in the audio class sequences of the target audio data and the source audio data. For example, the number of times each audio class occurs within a specific time period can be counted. Calculating the count matching score helps find similar background modes. Finally, in step 808, the sequence matching score calculated in step 804 and the count matching score calculated in step 806 are combined with their respective weight values, thereby determining the matching degree between the target audio data and the source audio data. Note that the weight values corresponding to the sequence matching score and the count matching score can be determined according to actual needs or based on empirical values. In one example, only one of sequence matching and count matching may be considered; for example, the matching degree between the target audio data and the source audio data can be determined based only on the sequence matching score.
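A minimal sketch of the count matching and weighted combination of steps 806-808, assuming histogram overlap as the count similarity and example weight values; the patent leaves both the similarity definition and the weights to empirical tuning.

```python
from collections import Counter

# Illustrative combination of the two scores: a count matching score from
# audio class histograms plus a sequence matching score, mixed with weights.
# Both the similarity definition and the weights are assumptions.

def count_match(seq_a, seq_b):
    """Histogram overlap of audio class counts, normalized to [0, 1]."""
    ca, cb = Counter(seq_a), Counter(seq_b)
    overlap = sum(min(ca[c], cb[c]) for c in ca)
    return overlap / max(len(seq_a), len(seq_b))

def matching_degree(seq_score, cnt_score, w_seq=0.6, w_cnt=0.4):
    """Weighted combination; weights would be tuned empirically."""
    return w_seq * seq_score + w_cnt * cnt_score
```

Setting `w_cnt=0` recovers the variant mentioned above in which only the sequence matching score is considered.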
Returning to Fig. 2, after the matching degree between the target audio data and the source audio data is determined in step 206, method 200 proceeds to step 208, i.e., the source audio data whose matching degree is higher than a predetermined matching degree threshold is output as the retrieval result. At this point, method 200 ends. In some embodiments, after the retrieval result is determined, the source audio data can also be added to the audio database to further train the segment marking models, for example in step 408 of Fig. 4.
Fig. 9 shows a functional block diagram of a mode-based audio retrieval system 900 according to an embodiment of the present invention. The functional modules of audio retrieval system 900 can be realized by hardware, software, or a combination of hardware and software implementing the principles of the invention. Those skilled in the art will understand that the functional modules depicted in Fig. 9 can be combined or divided into sub-modules so as to realize the principles of the invention described above. Accordingly, the description herein supports any possible combination, division, or further definition of the functional modules described herein.
Audio retrieval system 900 can automatically and iteratively perform audio class marking and retrieval based on background modes without manual marking, thereby providing more accurate and reasonable audio retrieval results. Audio retrieval system 900 can include a marking device 902, a target acquisition device 904, a matching degree determining device 906, and a search and output device 908.
Marking device 902 is configured to mark, based on modes, multiple source audio data contained, for example, in an audio database, to obtain the audio marking sequence of each source audio data. In one embodiment, marking device 902 can include a dividing device 912, a clustering device 914, a decision tree construction device 916, a model training device 918, a segment adjusting device 920 and an iteration condition judging device 922. Dividing device 912 is configured to divide each source audio data to obtain multiple segments. In one example, dividing device 912 can divide the source audio data by any one or any combination of the following: dividing according to silence in the source audio data; dividing the source audio data according to audio windows of a predetermined duration; and dividing the source audio data evenly over time. In one embodiment, dividing device 912 includes a speech recognition device configured to determine whether the source audio data is speech data, and a division executing device configured to divide the source audio data based on the result determined by the speech recognition device, to obtain multiple segments.
Clustering device 914 can be configured to determine the audio class sequence of each source audio data using a clustering algorithm, based on the obtained multiple segments. In one example, clustering device 914 includes: a first clustering sub-device, configured to build a GMM using audio features extracted from the obtained multiple segments; and a second clustering sub-device, configured to determine the audio class sequence of the source audio data based on the GMM built by the first clustering sub-device, using a clustering algorithm based on specific audio features and audio class distances.
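As a self-contained stand-in for the clustering performed by clustering device 914, the toy one-dimensional k-means below turns per-segment features into an audio class sequence; the patent instead builds a GMM and clusters using audio class distances, so this is only an analogy.

```python
# Toy stand-in for clustering device 914: k-means over one-dimensional
# segment features, producing an audio class label per segment.

def kmeans_1d(features, centers, iters=10):
    centers = list(centers)
    for _ in range(iters):
        # Assign each segment feature to its nearest center (its "audio class").
        labels = [min(range(len(centers)), key=lambda k: abs(f - centers[k]))
                  for f in features]
        # Recompute each center as the mean of its assigned features.
        for k in range(len(centers)):
            members = [f for f, lab in zip(features, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels, centers

features = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95]   # e.g. per-segment energies
labels, centers = kmeans_1d(features, [0.0, 1.0])
# labels is the audio class sequence for the six segments
```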
Decision tree construction device 916 can be configured to construct a decision tree based on modes, according to the audio class sequences determined by clustering device 914 for the multiple source audio data. In one example, decision tree construction device 916 includes: a first decision tree construction sub-device, configured to define the audio classes in the audio class sequences determined by clustering device 914 as root nodes of the decision tree; a second decision tree construction sub-device, configured to form a mode question set based on the context, in the audio class sequence, of the audio classes defined as root nodes by the first decision tree construction sub-device; and a third decision tree construction sub-device, configured to branch the audio classes in the determined audio class sequences based on the constructed mode question set, thereby building the leaf nodes of the decision tree.
Model training device 918 can be configured to train a segment marking model for each leaf node on the decision tree built by decision tree construction device 916. In one example, the segment marking model is, for example, an HMM together with a duration model.
Segment adjusting device 920 can be configured to use the segment marking models trained by model training device 918 to obtain the audio marking sequence of each source audio data and adjust the division of that source audio data. In one example, segment adjusting device 920 includes: a first segment adjusting sub-device, configured to determine the audio class distances of the source audio data using the segment marking models trained by model training device 918; a second segment adjusting sub-device, configured to perform Viterbi decoding based on the trained segment marking models, using the audio features extracted from the source audio data and the audio class distances determined by the first segment adjusting sub-device; and a third segment adjusting sub-device, configured to obtain the audio marking sequence of the source audio data according to the Viterbi decoding result obtained by the second segment adjusting sub-device, and to adjust the division of the source audio data.
Iteration condition judging device 922 can be configured to judge whether the predetermined iteration condition is met. In one example, the predetermined iteration condition can include: the adjustment amount to the division of the source audio data is not less than a predetermined segment difference, and/or the number of iterations is less than a predetermined iteration count threshold.
Target acquisition device 904 can be configured to obtain the audio marking sequence of the target audio data. In one embodiment, target acquisition device 904 can be configured as a device that performs Viterbi decoding on the target audio data based on the segment marking models trained by model training device 918, to obtain the audio marking sequence of the target audio data.
Matching degree determining device 906 can be configured to determine the matching degree between the target audio data and the source audio data according to a predetermined matching rule, based on the audio marking sequence of the target audio data obtained by target acquisition device 904 and the audio marking sequence of each source audio data in the audio database obtained by marking device 902.
In one embodiment, matching degree determining device 906 includes: an audio class similarity determining device, configured to determine the audio class distances between the audio classes related to the target audio data and those of the source audio data; a sequence comparing device, configured to calculate a sequence matching score by comparing the audio marking sequence of the target audio data with that of the source audio data, based on the audio class distances determined by the audio class similarity determining device; a count comparing device, configured to calculate a count matching score by counting the number of each audio class in the audio class sequences of the target audio data and the source audio data; and a matching degree calculating device, configured to calculate the matching degree between the target audio data and the source audio data by combining, with respective weight values, the sequence matching score calculated by the sequence comparing device and the count matching score calculated by the count comparing device.
Search and output device 908 can be configured to output, as the retrieval result, the source audio data in the audio database whose matching degree, as determined by matching degree determining device 906, is higher than the predetermined matching degree threshold.
Using the method and system of the present invention, audio retrieval can be performed automatically without manual marking.
Using the method and system of the present invention, audio class marking can be performed iteratively based on background modes, thereby providing more accurate and reasonable audio retrieval results.
Using the method and system of the present invention, audio class similarity can be taken into account and audio retrieval can be performed in combination with background modes.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of the systems, methods and computer program products of multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by combinations of special-purpose hardware and computer instructions.
Various embodiments of the present invention have been described above. The above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application or technological improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (20)
1. A mode-based audio retrieval method, comprising:
marking multiple source audio data based on modes, to obtain an audio marking sequence of each source audio data;
obtaining an audio marking sequence of target audio data;
determining a matching degree between the target audio data and the source audio data according to a predetermined matching rule, based on the audio marking sequence of the target audio data and the audio marking sequence of each source audio data; and
outputting source audio data whose matching degree is higher than a predetermined matching degree threshold, as a retrieval result;
wherein marking the multiple source audio data based on modes comprises:
(a) dividing each source audio data to obtain multiple segments;
(b) determining an audio class sequence of each source audio data using a clustering algorithm, based on the obtained multiple segments;
(c) constructing a decision tree based on modes, according to the audio class sequences determined for the multiple source audio data;
(d) training a segment marking model for each leaf node on the decision tree;
(e) obtaining the audio marking sequence of each source audio data and adjusting the division of the source audio data, using the trained segment marking models; and
(f) repeating operations (b) to (e) when a predetermined iteration condition is met.
2. The method according to claim 1, wherein dividing each source audio data comprises any one or more of the following:
dividing according to silence in the source audio data;
dividing the source audio data according to audio windows of a predetermined duration; and
dividing the source audio data evenly over time.
3. The method according to claim 1, wherein determining the audio class sequence of each source audio data using a clustering algorithm based on the obtained multiple segments comprises:
building a Gaussian mixture model (GMM) using audio features extracted from the obtained multiple segments; and
determining the audio class sequence of the source audio data based on the constructed GMM, using a clustering algorithm based on specific audio features and audio class distances.
4. The method according to claim 1, wherein constructing a decision tree based on modes according to the audio class sequences determined for the multiple source audio data comprises:
defining the audio classes in the determined audio class sequences as root nodes of the decision tree;
forming a mode question set based on the context, in the audio class sequence, of the audio classes defined as root nodes; and
branching the audio classes in the determined audio class sequences based on the constructed mode question set, thereby building the leaf nodes of the decision tree.
5. The method according to claim 3, wherein training a segment marking model for each leaf node on the decision tree comprises:
training a hidden Markov model (HMM) and a duration model for each leaf node on the decision tree.
6. The method according to claim 1, wherein obtaining the audio marking sequence of the source audio data and adjusting the division of the source audio data using the trained segment marking models comprises:
determining audio class distances of the source audio data using the trained segment marking models;
performing Viterbi decoding based on the trained segment marking models, using the audio features extracted from the source audio data and the determined audio class distances; and
obtaining the audio marking sequence of the source audio data according to the Viterbi decoding result, and adjusting the division of the source audio data.
7. The method according to claim 1, wherein dividing the source audio data to obtain multiple segments comprises:
determining whether the source audio data is speech data; and
dividing the source audio data based on the result of the determination, to obtain multiple segments.
8. The method according to claim 1, wherein the predetermined iteration condition comprises any one or more of the following:
the adjustment amount to the division of the source audio data is not less than a predetermined segment difference; and
the number of iterations is less than a predetermined iteration count threshold.
9. The method according to claim 1, wherein obtaining the audio marking sequence of the target audio data comprises:
performing Viterbi decoding on the target audio data based on the trained segment marking models, to obtain the audio marking sequence of the target audio data.
10. The method according to any one of claims 1 to 9, wherein determining the matching degree between the target audio data and the source audio data according to the predetermined matching rule comprises:
determining audio class distances between the audio classes related to the target audio data and those of the source audio data;
calculating a sequence matching score by comparing the audio marking sequence of the target audio data with the audio marking sequence of the source audio data, based on the determined audio class distances;
calculating a count matching score by counting the number of each audio class in the audio class sequences of the target audio data and the source audio data; and
calculating the matching degree between the target audio data and the source audio data by combining the calculated sequence matching score and count matching score with respective weight values.
11. A mode-based audio retrieval system, comprising:
a marking device, configured to mark multiple source audio data based on modes, to obtain an audio marking sequence of each source audio data;
a target acquisition device, configured to obtain an audio marking sequence of target audio data;
a matching degree determining device, configured to determine a matching degree between the target audio data and the source audio data according to a predetermined matching rule, based on the audio marking sequence of the target audio data obtained by the target acquisition device and the audio marking sequence of each source audio data obtained by the marking device; and
a search and output device, configured to output, as a retrieval result, source audio data whose matching degree, as determined by the matching degree determining device, is higher than a predetermined matching degree threshold;
wherein the marking device comprises:
a dividing device, configured to divide each source audio data to obtain multiple segments;
a clustering device, configured to determine an audio class sequence of each source audio data using a clustering algorithm, based on the obtained multiple segments;
a decision tree construction device, configured to construct a decision tree based on modes, according to the audio class sequences determined by the clustering device for the multiple source audio data;
a model training device, configured to train a segment marking model for each leaf node on the decision tree built by the decision tree construction device;
a segment adjusting device, configured to obtain the audio marking sequence of each source audio data and adjust the division of the source audio data, using the segment marking models trained by the model training device; and
an iteration condition judging device, configured to judge whether a predetermined iteration condition is met.
12. The system according to claim 11, wherein the dividing device divides each source audio data by any one or more of the following:
dividing according to silence in the source audio data;
dividing the source audio data according to audio windows of a predetermined duration; and
dividing the source audio data evenly over time.
13. The system according to claim 11, wherein the clustering device comprises:
a first clustering sub-device, configured to build a Gaussian mixture model (GMM) using audio features extracted from the obtained multiple segments; and
a second clustering sub-device, configured to determine the audio class sequence of the source audio data based on the GMM built by the first clustering sub-device, using a clustering algorithm based on specific audio features and audio class distances.
14. The system according to claim 11, wherein the decision tree construction device comprises:
a first decision tree construction sub-device, configured to define the audio classes in the audio class sequences determined by the clustering device as root nodes of the decision tree;
a second decision tree construction sub-device, configured to form a mode question set based on the context, in the audio class sequence, of the audio classes defined as root nodes by the first decision tree construction sub-device; and
a third decision tree construction sub-device, configured to branch the audio classes in the determined audio class sequences based on the constructed mode question set, thereby building the leaf nodes of the decision tree.
15. The system according to claim 13, wherein the model training device comprises: a device configured to train a hidden Markov model (HMM) and a duration model for each leaf node on the decision tree.
16. The system according to claim 11, wherein the segment adjusting device comprises:
a first segment adjusting sub-device, configured to determine audio class distances of the source audio data using the segment marking models trained by the model training device;
a second segment adjusting sub-device, configured to perform Viterbi decoding based on the trained segment marking models, using the audio features extracted from the source audio data and the audio class distances determined by the first segment adjusting sub-device; and
a third segment adjusting sub-device, configured to obtain the audio marking sequence of the source audio data according to the Viterbi decoding result obtained by the second segment adjusting sub-device, and to adjust the division of the source audio data.
17. The system according to claim 11, wherein the dividing device comprises:
a speech recognition device, configured to determine whether the source audio data is speech data; and
a division executing device, configured to divide the source audio data based on the result determined by the speech recognition device, to obtain multiple segments.
18. The system according to claim 11, wherein the predetermined iteration condition comprises any one or more of the following:
the adjustment amount to the division of the source audio data is not less than a predetermined segment difference; and
the number of iterations is less than a predetermined iteration count threshold.
19. The system according to claim 11, wherein the target acquisition device comprises: a device configured to perform Viterbi decoding on the target audio data based on the trained segment marking models, to obtain the audio marking sequence of the target audio data.
20. The system according to any one of claims 11 to 19, wherein the matching degree determining device comprises:
an audio class similarity determining device, configured to determine audio class distances between the audio classes related to the target audio data and those of the source audio data;
a sequence comparing device, configured to calculate a sequence matching score by comparing the audio marking sequence of the target audio data with the audio marking sequence of the source audio data, based on the audio class distances determined by the audio class similarity determining device;
a count comparing device, configured to calculate a count matching score by counting the number of each audio class in the audio class sequences of the target audio data and the source audio data; and
a matching degree calculating device, configured to calculate the matching degree between the target audio data and the source audio data by combining, with respective weight values, the sequence matching score calculated by the sequence comparing device and the count matching score calculated by the count comparing device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210505562.2A CN103853749B (en) | 2012-11-30 | 2012-11-30 | Mode-based audio retrieval method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210505562.2A CN103853749B (en) | 2012-11-30 | 2012-11-30 | Mode-based audio retrieval method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103853749A CN103853749A (en) | 2014-06-11 |
CN103853749B true CN103853749B (en) | 2017-04-26 |
Family
ID=50861416
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210505562.2A Active CN103853749B (en) | 2012-11-30 | 2012-11-30 | Mode-based audio retrieval method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103853749B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105335466A (en) * | 2015-09-25 | 2016-02-17 | 百度在线网络技术(北京)有限公司 | Audio data retrieval method and apparatus |
CN107293308B (en) * | 2016-04-01 | 2019-06-07 | 腾讯科技(深圳)有限公司 | A kind of audio-frequency processing method and device |
CN105955699A (en) * | 2016-06-14 | 2016-09-21 | 珠海格力电器股份有限公司 | Remote control equipment position determining method and device and terminal equipment |
CN107665240A (en) * | 2017-09-01 | 2018-02-06 | 北京雷石天地电子技术有限公司 | audio file clustering method and device |
CN110472190A (en) * | 2018-05-09 | 2019-11-19 | 北京京东尚科信息技术有限公司 | The method and apparatus for filling ordered sequence |
CN109965764A (en) * | 2019-04-18 | 2019-07-05 | 科大讯飞股份有限公司 | Closestool control method and closestool |
CN110399521B (en) * | 2019-06-21 | 2023-06-06 | 平安科技(深圳)有限公司 | Music retrieval method, system, computer device and computer readable storage medium |
CN110688414B (en) * | 2019-09-29 | 2022-07-22 | 京东方科技集团股份有限公司 | Method and device for processing time series data and computer readable storage medium |
CN111460215B (en) * | 2020-03-30 | 2021-08-24 | 腾讯科技(深圳)有限公司 | Audio data processing method and device, computer equipment and storage medium |
CN112015942B (en) * | 2020-08-28 | 2024-08-30 | 上海掌门科技有限公司 | Audio processing method and device |
CN113947855A (en) * | 2021-09-18 | 2022-01-18 | 中标慧安信息技术股份有限公司 | Intelligent building personnel safety alarm system based on voice recognition |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI220483B (en) * | 2002-10-17 | 2004-08-21 | Inst Information Industry | Creation method of search database for audio/video information and song search system |
WO2008126262A1 (en) * | 2007-03-30 | 2008-10-23 | Pioneer Corporation | Content explanation apparatus and method |
CN101364222A (en) * | 2008-09-02 | 2009-02-11 | 浙江大学 | Two-stage audio search method |
CN101477798A (en) * | 2009-02-17 | 2009-07-08 | 北京邮电大学 | Method for analyzing and extracting audio data of set scene |
CN102033927A (en) * | 2010-12-15 | 2011-04-27 | 哈尔滨工业大学 | Rapid audio searching method based on GPU (Graphic Processing Unit) |
2012
- 2012-11-30 CN CN201210505562.2A patent/CN103853749B/en active Active
Non-Patent Citations (1)
Title |
---|
Research on Audio Retrieval Technology Based on Feature Similarity; Pan Junlan; China Master's Theses Full-text Database, Information Science and Technology; 2011-12-15; Section 4.1.2 on page 18, pages 21-25 *
Also Published As
Publication number | Publication date |
---|---|
CN103853749A (en) | 2014-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103853749B (en) | Mode-based audio retrieval method and system | |
US10671666B2 (en) | Pattern based audio searching method and system | |
US10133538B2 (en) | Semi-supervised speaker diarization | |
CN110557589B (en) | System and method for integrating recorded content | |
CN111243602B (en) | Voiceprint recognition method based on gender, nationality and emotion information | |
CN110473566A (en) | Audio separation method, device, electronic equipment and computer readable storage medium | |
CN107785018B (en) | Multi-round interaction semantic understanding method and device | |
CN108255840B (en) | Song recommendation method and system | |
CN103500579B (en) | Audio recognition method, Apparatus and system | |
CN110189757A (en) | A kind of giant panda individual discrimination method, equipment and computer readable storage medium | |
CN106887225A (en) | Acoustic feature extracting method, device and terminal device based on convolutional neural networks | |
CN109859772A (en) | Emotion identification method, apparatus and computer readable storage medium | |
CN105224581B (en) | The method and apparatus of picture are presented when playing music | |
CN107679031B (en) | Advertisement and blog identification method based on stacking noise reduction self-coding machine | |
CN105741835A (en) | Audio information processing method and terminal | |
CN111462774B (en) | Music emotion credible classification method based on deep learning | |
Rumagit et al. | Model comparison in speech emotion recognition for Indonesian language | |
CN104240719A (en) | Feature extraction method and classification method for audios and related devices | |
CN109002529A (en) | Audio search method and device | |
Vrysis et al. | Mobile audio intelligence: From real time segmentation to crowd sourced semantics | |
JP2015001695A (en) | Voice recognition device, and voice recognition method and program | |
CN104239372B (en) | A kind of audio data classification method and device | |
CN106708890A (en) | Intelligent high fault-tolerant video identification system based on multimoding fusion and identification method thereof | |
CN110019556A (en) | A kind of topic news acquisition methods, device and its equipment | |
CN110708619B (en) | Word vector training method and device for intelligent equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |