CN203522960U - Multimedia playing device with functions of voice controlling and humming searching


Info

Publication number
CN203522960U
Authority
CN
China
Prior art keywords: module, voice, central processing unit, humming
Legal status: Expired - Fee Related
Application number
CN201320422658.2U
Other languages
Chinese (zh)
Inventor
赵欢
王飞
陈佐
干文洁
Current Assignee
Hunan University
Original Assignee
Hunan University
Priority date
Filing date
Publication date
Application filed by Hunan University
Priority to CN201320422658.2U
Application granted
Publication of CN203522960U
Anticipated expiration

Abstract

The utility model discloses a multimedia playing device with voice control and humming search functions. The multimedia playing device comprises a human-computer interaction module, a voice input module, a voice data processing module, a central processor, a media storage module, a playing and decoding module, an audio output module and a network interface module, wherein the voice input module is connected with the input end of the voice data processing module, the voice data processing module is connected with the central processor, the central processor is further connected with the human-computer interaction module, the media storage module and the network interface module, and the output end of the central processor is connected with the audio output module through the playing and decoding module. The multimedia playing device with voice control and humming search functions frees the user's hands, gives the user a good experience, is convenient to use, detects humming accurately, and has a wide application range.

Description

Multimedia playing apparatus with voice control and humming search functions
Technical field
The utility model relates to the field of multimedia equipment, and in particular to a multimedia playing apparatus with voice control and humming search functions.
Background technology
Prior-art music players offer only conventional manual control of audio files, such as play, pause and previous/next track, and digital audio resources can be searched only by having the user enter restrictive conditions such as the song title or singer. These restrictions mean that a user's hands are not free while operating the player, and that a song whose title the user cannot remember cannot be found at all, which greatly reduces the user experience.
Utility model content
The technical problem to be solved by the utility model is to provide a multimedia playing apparatus with voice control and humming search functions that frees the user's hands, provides a good user experience, is convenient to use, detects humming accurately, and has a wide application range.
To solve the above technical problem, the utility model adopts the following technical solution:
A multimedia playing apparatus with voice control and humming search functions comprises a human-computer interaction module, a voice input module, a voice data processing module, a central processing unit, a media storage module, a playback decoding module, an audio output module and a network interface module. The voice input module is connected to the input of the voice data processing module, the voice data processing module is connected to the central processing unit, the central processing unit is further connected to the human-computer interaction module, the media storage module and the network interface module respectively, and the output of the central processing unit is connected to the audio output module through the playback decoding module.
As further improvements of the above technical solution:
The human-computer interaction module is a touch display screen module.
The voice data processing module is a DSP processor.
The network interface module is one of a 3G interface module, a GPRS interface module and a WIFI interface module.
The utility model has the following advantages. The utility model comprises a human-computer interaction module, a voice input module, a central control module, a media storage module, a playback decoding module, an audio output module, a humming data processing module and a network interface module; the central control module is connected to the human-computer interaction module, the voice input module, the playback decoding module, the humming data processing module and the network interface module respectively; the humming data processing module is connected to the media storage module and the network interface module respectively; the data input of the playback decoding module is connected to the media storage module, and the audio data output of the playback decoding module is connected to the audio output module. The utility model combines voice control with humming search, so the user can control audio playback and manage resources in two ways: by touching the screen, or by issuing voice control commands. Voice control transmits recognition requests over the network and is therefore simple to implement, and the humming data processing module lets the user operate the device without manual control, freeing the user's hands, which is particularly suitable for occasions such as in-vehicle use or process work. The humming data processing module searches digital audio resources from a short melody hummed by the user and returns the searched audio resources over the network, so the user is no longer left unable to obtain an audio resource merely because the song title and singer have been forgotten. In summary, compared with a traditional audio playing apparatus, the utility model greatly improves adaptability to application scenarios and the user experience; it frees the user's hands, provides a good user experience, is convenient to use, detects humming accurately, and has a wide application range.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the method of the embodiment of the utility model.
Fig. 2 is a schematic structural block diagram of the embodiment of the utility model.
Fig. 3 is a schematic circuit diagram of the human-computer interaction module in the embodiment of the utility model.
Fig. 4 is a schematic circuit diagram of the voice input module, the playback decoding module and the audio output module in the embodiment of the utility model.
Fig. 5 is a schematic circuit diagram of the network interface module in the embodiment of the utility model.
Fig. 6 is a schematic structural block diagram of the voice data processing module in the embodiment of the utility model.
Fig. 7 is a schematic structural block diagram of the match search server in the embodiment of the utility model.
Fig. 8 is a schematic diagram of the working principle of the voice control mode in the embodiment of the utility model.
Fig. 9 is a schematic diagram of the working principle of the humming search mode in the embodiment of the utility model.
Reference numerals: 1, human-computer interaction module; 2, voice input module; 3, voice data processing module; 31, pre-processing module; 311, frame division sub-module; 312, windowing sub-module; 313, short-time energy calculation sub-module; 314, zero-crossing rate calculation sub-module; 315, endpoint decision sub-module; 316, speech enhancement sub-module; 32, humming data processing module; 321, short-time average magnitude difference calculation sub-module; 322, pitch sequence extraction sub-module; 323, note sequence conversion sub-module; 4, central processing unit; 5, media storage module; 6, playback decoding module; 7, audio output module; 8, network interface module; 9, match search server; 91, breadth-first search sub-module; 92, fine matching sub-module; 10, speech recognition server.
Detailed description of the embodiments
As shown in Fig. 1, the multimedia playing apparatus with voice control and humming search functions of the present embodiment comprises a human-computer interaction module 1, a voice input module 2, a voice data processing module 3, a central processing unit 4, a media storage module 5, a playback decoding module 6, an audio output module 7 and a network interface module 8. The voice input module 2 is connected to the input of the voice data processing module 3, the voice data processing module 3 is connected to the central processing unit 4, the central processing unit 4 is further connected to the human-computer interaction module 1, the media storage module 5 and the network interface module 8 respectively, and the output of the central processing unit 4 is connected to the audio output module 7 through the playback decoding module 6.
In the present embodiment, the human-computer interaction module 1 receives the user's selection of the working mode, which includes a voice control mode and a humming search mode. The voice input module 2 collects voice data. The voice data processing module 3 pre-processes the collected voice data and, in the humming search mode, further extracts a pitch sequence from the pre-processed voice data and converts it into a note sequence. In the voice control mode, the central processing unit 4 uploads the pre-processed voice data together with grammar rules to a speech recognition server 10 on the Internet for speech recognition, and performs playback control or resource management on the local multimedia resources according to the recognition result returned by the speech recognition server 10; in the humming search mode, the central processing unit 4 sends the note sequence to a match search server 9 on the Internet, the match search server 9 performs a matching search in a note feature database to find the multimedia resource identity information matching the note sequence, and the central processing unit 4 downloads the corresponding multimedia resource from the Internet according to the multimedia resource identity information and stores it in the media storage module. The media storage module 5 stores local multimedia resources; the playback decoding module 6 decodes the multimedia resource selected for playback; the audio output module 7 outputs the audio obtained after decoding; the network interface module 8 provides the central processing unit with Internet access to the speech recognition server and the match search server; and the match search server 9 performs the matching search in the note feature database and returns the matching multimedia resource identity information to the central processing unit 4. The input of the voice data processing module 3 is connected to the voice input module 2; the central processing unit 4 is connected to the human-computer interaction module 1, the voice data processing module 3 and the media storage module 5 respectively, and is connected to the speech recognition server 10 and the match search server 9 on the Internet through the network interface module 8; the output of the central processing unit 4 is connected to the audio output module 7 through the playback decoding module 6.
As shown in Fig. 2, the present embodiment works as follows: 1) the user selects a working mode through the human-computer interaction module 1, the voice input module 2 collects voice data, and the voice data is pre-processed by a dedicated voice data processing chip (the voice data processing module 3); if the selected working mode is the voice control mode, step 2) is executed, and if it is the humming search mode, step 3) is executed; 2) the central processing unit 4 uploads the pre-processed voice data together with grammar rules to the speech recognition server 10 on the Internet for speech recognition, and performs playback control or resource management on the local multimedia resources according to the returned recognition result; 3) the central processing unit 4 has the dedicated voice data processing chip (the voice data processing module 3) extract the pitch sequence of the pre-processed voice data and convert it into a note sequence, and sends the note sequence to the match search server 9 on the Internet; the match search server 9 performs a matching search in the note feature database to find the matching multimedia resource identity information, and the corresponding multimedia resource is downloaded from the Internet according to the multimedia resource identity information and stored in the local media storage module 5.
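The work process above can be condensed into a small dispatch routine. The following Python sketch only illustrates the control flow of Fig. 2 and is not the actual firmware: every function name in it (preprocess, recognize_speech, extract_note_sequence, search_by_melody) is a hypothetical stand-in for the corresponding module or server described above, and the function bodies are placeholders so that the sketch runs end to end.

def preprocess(raw_audio):                # voice data processing module 3 (placeholder body)
    return raw_audio

def recognize_speech(speech, grammar):    # speech recognition server 10 (placeholder body)
    return "play"

def extract_note_sequence(speech):        # humming data processing module 32 (placeholder body)
    return [69, 71, 72]

def search_by_melody(notes):              # match search server 9 (placeholder body)
    return ["resource-id-001"]

def handle_request(mode, raw_audio, grammar_rules):
    """Dispatch a recorded utterance according to the selected working mode (step 1)."""
    speech = preprocess(raw_audio)
    if mode == "voice_control":
        # Step 2: upload pre-processed speech plus grammar rules, then apply the
        # returned command to the local media library.
        return ("command", recognize_speech(speech, grammar_rules))
    else:  # "humming_search"
        # Step 3: extract notes and let the match search server look up resources.
        notes = extract_note_sequence(speech)
        return ("resources", search_by_melody(notes))

print(handle_request("humming_search", raw_audio=b"...", grammar_rules=None))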
The human-computer interaction module 1 provides system interface display and operation control. In the present embodiment it is implemented with a 4-wire resistive touch screen whose controller is TI's ADS7843 touch-screen decoding chip, which features low power consumption and high touch sensitivity. As shown in Fig. 3, the CS, DCLK, DIN, BUSY, PENIRQ, IN3 and IN4 pins of the ADS7843 touch-screen decoding chip are connected to the external central processing unit 4, and the X+, Y+, X- and Y- pins of the ADS7843 are connected to the XM, XP, YM and YP pins of the 4-wire resistive touch screen respectively, which serve as the output of the 4-wire resistive touch screen.
The voice input module 2, the playback decoding module 6 and the audio output module 7 are implemented with an ALC5621 chip and its auxiliary circuit. The ALC5621 chip has built-in voice input, playback decoding and audio output functions, so a single chip and its peripheral circuit realize the functions of all three modules, which is simple to implement and makes the circuit structure more compact. The external pin connections of the ALC5621 chip are shown in Fig. 4: in the peripheral interface connecting the ALC5621 chip to the external central processing unit 4, the MIC_1N, MIC_1P, MIC_2N and MIC_2P pins are connected to the two channels of MIC_P and MIC_M signals respectively; the EAR1_ON and EAR1_OP pins are connected to the EAR_M and EAR_P signal pins of REC respectively; the MIC_BIAS pin is connected to the MICBIA signal pin of the microphone MIC; and the LINE_ON and LINE_OP pins are connected to the MIC_P and MIC_M signal pins of the ALC5621 chip.
The central processing unit 4 is the core implementation unit of the system. In the present embodiment, the central processing unit 4 is an ARM microprocessor chip of the ARM Cortex-A8 architecture, specifically a SAMSUNG S5PV210, onto which an Android 2.0 embedded operating system has been ported.
The media storage module 5 is implemented with a K4T1G084QE-HCF7 chip.
The network interface module 8 handles data communication between the system and the outside world; optional implementations include a 3G interface module, a GPRS interface module and a WIFI interface module. The present embodiment adopts a WIFI interface module whose chip model is REALTEK 8188UM3. As shown in Fig. 5, the network interface module 8 is implemented with the REALTEK 8188UM3 WIFI chip, and pins such as SDCH_D1, SDCH_D0, USBDN and USBDP of the network interface module 8 are connected to the external central processing unit 4.
The match search server 9 is a server connected to the Internet with a built-in note feature database.
In the present embodiment, the voice data processing module 3 is implemented with a TI F28335 DSP chip; exploiting the strong multimedia computing capability of the DSP chip greatly increases the voice data processing speed of the whole system. As shown in Fig. 6, the voice data processing module 3 comprises a pre-processing module 31 for pre-processing the collected voice data and a humming data processing module 32 for extracting a pitch sequence and converting it into a note sequence. The pre-processing module 31 comprises:
a frame division sub-module 311 for dividing the voice data collected by the voice input module 2 into frames;
a windowing sub-module 312 for applying a Hamming window to each frame of the voice signal;
a short-time energy calculation sub-module 313 for calculating the short-time energy of each frame of the voice signal according to formula (1):
E n = Σ m = 0 255 x n 2 ( m ) - - - ( 1 )
In formula (1), E_n is the short-time energy of the voice signal of the n-th frame, and x_n(m) is the voice signal at the m-th sampling point of the n-th frame.
a zero-crossing rate calculation sub-module 314 for calculating the zero-crossing rate of each frame of the voice signal according to formula (2):
Z n = 1 2 Σ m = 0 255 | sgn [ x n ( m ) ] - sgn [ x n ( m - 1 ) ] | - - - ( 2 )
In formula (2), Z_n is the zero-crossing rate of the voice signal of the n-th frame, sgn[·] is the sign function satisfying the relation shown in formula (3), x_n(m) is the voice signal at the m-th sampling point of the n-th frame, x_n(m-1) is the voice signal at the (m-1)-th sampling point of the n-th frame, and |·| is the absolute value operator.
\mathrm{sgn}[x] = \begin{cases} 1, & x \geq 0 \\ -1, & x < 0 \end{cases}    (3)
In formula (3), x is the value of a voice sampling point.
an endpoint decision sub-module 315 for judging whether the short-time energy and zero-crossing rate of the current frame of the voice signal, relative to the three preceding frames, satisfy formula (4) or formula (5): if formula (4) is satisfied, the current frame is judged to be a start frame; if formula (5) is satisfied, the current frame is judged to be an end frame;
E_n \geq \alpha_{E\max} \ \text{and}\ Z_n \geq \alpha_{Z\max}    (4)
E_n \leq \alpha_{E\min} \ \text{and}\ Z_n \leq \alpha_{Z\min}    (5)
In formula (4), \alpha_{E\max} is the preset upper short-time energy decision threshold and \alpha_{Z\max} is the preset upper zero-crossing rate decision threshold; in formula (5), \alpha_{E\min} is the preset lower short-time energy decision threshold and \alpha_{Z\min} is the preset lower zero-crossing rate decision threshold. In the endpoint decision sub-module 315 of the present embodiment, the upper short-time energy decision threshold is the maximum short-time energy of the three frames preceding the current frame, the lower short-time energy decision threshold is the mean short-time energy of the three frames preceding the current frame, the upper zero-crossing rate decision threshold is 100, and the lower zero-crossing rate decision threshold is 70.
a speech enhancement sub-module 316 for extracting the effective voice signal according to the start frame and the end frame and performing speech enhancement on the effective voice signal.
The output of the voice input module 2 is connected to the frame division sub-module 311; the frame division sub-module 311 and the windowing sub-module 312 are connected in sequence; the input of the endpoint decision sub-module 315 is connected to the windowing sub-module 312 through the short-time energy calculation sub-module 313 and the zero-crossing rate calculation sub-module 314 respectively; the output of the endpoint decision sub-module 315 is connected to the speech enhancement sub-module 316; and the output of the speech enhancement sub-module 316 is connected to the humming data processing module 32 and to the central processing unit 4 respectively.
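A minimal NumPy sketch of the pre-processing module 31 is given below as an illustration of formulas (1)-(5), not as production firmware. It assumes 256-sample frames, as implied by the summation limits in formulas (1) and (2), and follows the threshold choice of the present embodiment: the energy thresholds are taken from the maximum and the mean short-time energy of the three preceding frames, and the zero-crossing-rate thresholds are fixed at 100 and 70.

import numpy as np

FRAME_LEN = 256

def split_frames(signal, frame_len=FRAME_LEN):
    # Frame division sub-module 311: cut the signal into non-overlapping frames.
    n_frames = len(signal) // frame_len
    return np.asarray(signal, dtype=float)[:n_frames * frame_len].reshape(n_frames, frame_len)

def short_time_energy(frame):
    return np.sum(frame ** 2)                                   # formula (1)

def zero_crossing_rate(frame):
    s = np.sign(frame)
    s[s == 0] = 1                                               # sgn[x] = 1 for x >= 0, formula (3)
    return 0.5 * np.sum(np.abs(s[1:] - s[:-1]))                 # formula (2)

def detect_endpoints(signal):
    frames = split_frames(signal) * np.hamming(FRAME_LEN)       # windowing sub-module 312
    E = np.array([short_time_energy(f) for f in frames])
    Z = np.array([zero_crossing_rate(f) for f in frames])
    start, end = None, None
    for n in range(3, len(frames)):
        e_max, e_min = E[n-3:n].max(), E[n-3:n].mean()          # thresholds from the 3 preceding frames
        if start is None and E[n] >= e_max and Z[n] >= 100:     # formula (4): start frame
            start = n
        elif start is not None and E[n] <= e_min and Z[n] <= 70:  # formula (5): end frame
            end = n
            break
    return start, end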
As shown in Fig. 6, in the present embodiment the humming data processing module 32 comprises:
a short-time average magnitude difference calculation sub-module 321 for calculating the short-time average magnitude difference function of each frame of voice data according to formula (6):
D(k) = \frac{1}{2a+1} \sum_{j=0}^{n-1} \left| \sum_{l=x_1}^{x_2} x(l) - \sum_{l=j-a}^{j+a} x(l) \right|, \quad x_1 = \mathrm{mod}(j+k, n) - a, \quad x_2 = \mathrm{mod}(j+k, n) + a    (6)
In formula (6), D(k) is the calculated short-time average magnitude difference function; a is the sample-rate factor; j is the index of the voice sampling points used to calculate D(k); n is the window size; x_1 is the lower limit and x_2 the upper limit of the summation of sampling-point amplitudes; x(l) is a frame of voice data, with l the index of a voice-signal sampling point; and k is the offset within the window, taking values between 0 and n.
Extracting the pitch sequence is a crucial link in the whole humming search system and directly determines the subsequent note extraction and melody feature extraction. For the pitch detection part of pitch-sequence extraction, the prior art generally uses the classical average magnitude difference function (AMDF), calculated as shown in formula (6-1):
D(k) = \frac{1}{n-k-1} \sum_{j=0}^{n-k-1} \left| x(j+k) - x(j) \right|    (6-1)
In formula (6-1), x(j) denotes a speech frame of length n and k denotes an offset, taking values between 0 and n; for each offset, the short-time average magnitude difference D(k) at that offset can be calculated. To improve the accuracy and robustness of pitch detection, the present embodiment instead calculates the short-time average magnitude difference of each frame with the modified average magnitude difference function (MAMDF) of formula (6), and the pitch sequence extraction sub-module 322 and note sequence conversion sub-module 323 described below complete the extraction of the pitch sequence. Regarding accuracy, unlike AMDF, the MAMDF of formula (6) uses a fixed calculation length when computing the short-time average magnitude difference, which overcomes the missed detections caused by the falling peak amplitude of the difference terms. Regarding robustness, formula (6) averages the magnitude differences over groups of samples rather than over single pairs of samples as in the prior art, which strengthens the periodic character of voiced signals; this noticeably reduces the influence of large fluctuations in the voice signal on pitch detection and also greatly reduces the error that noise introduces into pitch detection in low signal-to-noise environments, giving good detection results.
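For illustration, the two difference functions can be written in a few lines of NumPy. This is a sketch under stated assumptions: the group radius a (the "sample-rate factor") defaults to 2 here, and the circular indexing used to keep l = x1..x2 and l = j-a..j+a inside the frame is an interpretation of the mod(j+k, n) term in formula (6), not something the patent spells out.

import numpy as np

def amdf(x, k):
    # Classical short-time average magnitude difference function, formula (6-1).
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.sum(np.abs(x[k:] - x[:n - k])) / (n - k - 1)

def mamdf(x, k, a=2):
    # Grouped (modified) average magnitude difference function, formula (6).
    x = np.asarray(x, dtype=float)
    n = len(x)
    offsets = np.arange(-a, a + 1)                              # group of 2a+1 samples around a centre
    total = 0.0
    for j in range(n):
        centre = (j + k) % n                                    # mod(j+k, n)
        shifted = x.take(centre + offsets, mode='wrap').sum()   # sum over l = x1..x2
        local = x.take(j + offsets, mode='wrap').sum()          # sum over l = j-a..j+a
        total += abs(shifted - local)
    return total / (2 * a + 1)

With these two functions, the comparison in Table 1 amounts to estimating a pitch per frame with either amdf or mamdf and measuring how often the estimate is grossly wrong.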
Table 1: pitch detection error rate (GPE, %) of the present embodiment (MAMDF) compared with the prior art (AMDF), for clean speech and for noisy speech at different signal-to-noise ratios.
Method   Clean speech   10 dB    5 dB     0 dB     -5 dB    -10 dB
AMDF     7.07           10.97    15.02    22.87    35.61    52.4
MAMDF    5.58           7.62     9.53     13.14    20.88    34.47
As can be seen from Table 1, the detection error rate of the present embodiment (MAMDF) is clearly lower than that of the prior art (AMDF). The pitch detection of the prior art partly suffers from poor detection results and missed points, whereas the MAMDF of formula (6) in the present embodiment uses a fixed calculation length when computing the short-time average magnitude difference, which overcomes the missed detections caused by the falling peak amplitude of the difference terms.
a pitch sequence extraction sub-module 322 for calculating the pitch period of each frame of voice data according to formula (7) and converting the pitch period into a fundamental frequency, thereby obtaining the pitch sequence of the voice data:
TP = \arg\min_{k = TP_{\min}}^{TP_{\max}} D(k)    (7)
In formula (7), TP is the pitch period, TP_{\min} is a given lower limit, TP_{\max} is a given upper limit, and D(k) is the calculated short-time average magnitude difference function; k is the sampling-point position, within the interval between the given lower limit TP_{\min} and the given upper limit TP_{\max}, at which the short-time average magnitude difference D(k) is smallest.
a note sequence conversion sub-module 323 for converting the pitch sequence into a note sequence according to formula (8):
p = 69 + 12 \log_2(f / 440)    (8)
In formula (8), p is the converted note sequence and f is the input pitch sequence.
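The two sub-modules 322 and 323 can be sketched as follows. The 8 kHz sampling rate and the 50-400 Hz pitch search range are assumptions made only for the example (the patent does not state them), and diff_fn stands for a short-time average magnitude difference function such as the mamdf sketch above.

import numpy as np

FS = 8000                                  # assumed sampling rate, Hz
TP_MIN, TP_MAX = FS // 400, FS // 50       # lag range for pitches between 400 Hz and 50 Hz

def pitch_period(frame, diff_fn):
    # Formula (7): the lag k within [TP_MIN, TP_MAX] minimising the difference function.
    lags = np.arange(TP_MIN, TP_MAX + 1)
    d = np.array([diff_fn(frame, k) for k in lags])
    return int(lags[np.argmin(d)])

def to_note(f0):
    # Formula (8): fundamental frequency in Hz to a note number.
    return 69 + 12 * np.log2(f0 / 440.0)

def note_sequence(frames, diff_fn):
    # Pitch sequence extraction (322) followed by note conversion (323):
    # pitch period in samples -> fundamental frequency FS/TP -> note number.
    return [to_note(FS / pitch_period(frame, diff_fn)) for frame in frames]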
The short-time average magnitude difference calculation sub-module 321, the pitch sequence extraction sub-module 322 and the note sequence conversion sub-module 323 are connected in sequence; the input of the short-time average magnitude difference calculation sub-module 321 is connected to the speech enhancement sub-module 316, and the output of the note sequence conversion sub-module 323 is connected to the central processing unit 4.
In the present embodiment, the match search server 9 is a server connected to the Internet with a built-in note feature database. As shown in Fig. 7, the match search server 9 comprises a breadth-first search sub-module 91 and a fine matching sub-module 92.
The breadth-first search sub-module 91 traverses each feature sequence in the note feature database and matches the note sequence to be matched against the notes of the current feature sequence: each time a match point is obtained, the next match point is calculated according to formula (9) until all notes have been matched, while the number of deletion penalties and the number of insertion penalties incurred during matching are recorded; the resulting match points constitute the matching path of that feature sequence, the matching cost of the path is calculated according to formula (10), the feature-sequence matching paths are sorted by matching cost, and a specified number of feature-sequence matching paths are selected according to the ranking.
P_{next} = \arg\min \begin{cases} d(X_{i+1}, D_{j+1}) \\ d(X_i, D_{j+1}) + \alpha_1 \\ d(X_{i+1}, D_j) + \alpha_2 \end{cases}    (9)
In formula (9), P_{next} is the position of the next match point in the current feature sequence; d(X_{i+1}, D_{j+1}) is the pitch distance between the two notes X_{i+1} and D_{j+1}; \alpha_1 and \alpha_2 are constants, \alpha_1 being the deletion penalty factor and \alpha_2 the insertion penalty factor used in matching; X_i is the note with index i in the note sequence, the note sequence to be matched being written X_1 X_2 X_3 X_4 … X_n; and D_j is the note with index j in the feature sequence, the current feature sequence being written D_1 D_2 D_3 D_4 … D_m.
P = A_1 \alpha_1 + A_2 \alpha_2    (10)
In formula (10), P is the matching cost, A_1 is the number of deletion penalties incurred during matching, \alpha_1 is the deletion penalty factor, A_2 is the number of insertion penalties incurred during matching, and \alpha_2 is the insertion penalty factor.
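A compact sketch of this rough search stage follows. The pitch distance and the penalty factors ALPHA1/ALPHA2 are illustrative values, not taken from the patent; the three candidate moves correspond line by line to formula (9), and the returned value is the path cost of formula (10).

ALPHA1, ALPHA2 = 1.0, 1.0          # deletion / insertion penalty factors (illustrative)

def pitch_distance(a, b):
    return abs(a - b)              # pitch distance between two notes

def rough_match_cost(X, D):
    """Walk query X against feature sequence D and return the cost of formula (10)."""
    i, j = 0, 0
    deletions, insertions = 0, 0                                   # A1, A2 in formula (10)
    while i < len(X) - 1 and j < len(D) - 1:
        moves = {
            "match":  pitch_distance(X[i + 1], D[j + 1]),          # advance both sequences
            "delete": pitch_distance(X[i],     D[j + 1]) + ALPHA1, # advance only the feature sequence
            "insert": pitch_distance(X[i + 1], D[j])     + ALPHA2, # advance only the query
        }
        best = min(moves, key=moves.get)                           # formula (9)
        if best == "match":
            i, j = i + 1, j + 1
        elif best == "delete":
            j, deletions = j + 1, deletions + 1
        else:
            i, insertions = i + 1, insertions + 1
    return deletions * ALPHA1 + insertions * ALPHA2                # formula (10)

def rough_search(X, database, top_n=10):
    """Rank all feature sequences by matching cost and keep the best top_n candidates."""
    return sorted(database, key=lambda D: rough_match_cost(X, D))[:top_n]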
The fine matching sub-module 92 takes the specified number of feature-sequence matching paths, uses the DTW algorithm to calculate the distance between each feature-sequence matching path and the note sequence according to formula (11) to obtain a cost matrix, selects the feature-sequence matching path with the smallest distance from the specified number of paths, and performs a matching search in the note feature database to find the corresponding multimedia resource identity information:
D_{i,j} = \min \begin{cases} d(X_i, D_j) + D_{i-1,j-1} \\ d(X_i, D_j) + D_{i,j-1} + \alpha_1 \\ d(X_i, D_j) + D_{i-1,j} + \alpha_2 \end{cases}    (11)
In formula (11), d(X_i, D_j) is the pitch distance between the two notes X_i and D_j, and \alpha_1 and \alpha_2 are constants, \alpha_1 being the deletion penalty factor and \alpha_2 the insertion penalty factor used in matching. The output of the breadth-first search sub-module 91 is connected to the fine matching sub-module 92. The complexity of the traditional DTW algorithm is always O(N^2); in the present implementation, the breadth-first search and the fine matching together form a two-stage search in which only the fine-matching part applies the full O(N^2) matching, so the complexity of the algorithm is reduced, the matching efficiency is greatly improved and the matching response time is shortened, achieving a higher recognition rate at a lower computational cost.
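The fine matching stage is ordinary dynamic time warping over the cost recursion of formula (11); a short sketch, again with illustrative penalty factors, is given below. Running it only on the small candidate set returned by the rough search is what keeps the full O(N^2) computation confined to a few sequences.

import numpy as np

ALPHA1, ALPHA2 = 1.0, 1.0          # deletion / insertion penalty factors (illustrative)

def dtw_distance(X, D):
    """Fill the cost matrix of formula (11) and return the total alignment distance."""
    n, m = len(X), len(D)
    C = np.full((n + 1, m + 1), np.inf)
    C[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(X[i - 1] - D[j - 1])            # pitch distance d(X_i, D_j)
            C[i, j] = min(d + C[i - 1, j - 1],      # diagonal step
                          d + C[i, j - 1] + ALPHA1, # deletion step
                          d + C[i - 1, j] + ALPHA2) # insertion step
    return C[n, m]

def fine_match(X, candidates):
    """Pick, from the rough-search candidates, the feature sequence with the smallest DTW distance."""
    return min(candidates, key=lambda D: dtw_distance(X, D))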
The speech recognition server 10 of the present embodiment uses the speech recognition interface provided by iFlytek; the speech recognition interface provided by Google, or other speech recognition interfaces, may also be used as required.
When the present embodiment is in operation, the human-computer interaction module 1 lets the user perform routine control through buttons on the touch-screen UI. The conventional touch-screen controls in the present embodiment include playback controls: play, pause, fast-forward, rewind, next track and previous track; resource management functions: add, delete and sort; play-mode selection functions, including shuffle, sequential play and single-track repeat; a button for enabling voice control; and a button for enabling the humming search function. In addition, the human-computer interaction module 1 is also used to select between the voice control mode and the humming search mode.
As shown in Fig. 1, Fig. 2 and Fig. 8, in the voice control mode the user first sets the grammar rules and a monitoring service thread in the central processing unit 4. When the user records speech through the voice input module 2, a voice control request is created; after the voice input module 2 has sampled the voice data, the voice data is transmitted over the bus to the voice data processing module 3, where the pre-processing of the voice signal is completed. The voice data processing module 3 then judges, from the working-mode instruction sent by the central processing unit 4, whether to carry out humming-search feature extraction; in the voice control mode it does not, and it transmits the pre-processed voice data directly to the central processing unit 4. The central processing unit 4 uploads the pre-processed voice data together with the grammar rules to the speech recognition server 10 over the network, while the voice control request listens for the resource return message, and it executes the corresponding instruction according to the recognition result returned by the speech recognition server 10. The instructions include play, pause, fast-forward, rewind, next track, previous track, file add, file delete, file sort, shuffle, sequential play, single-track repeat and so on, so that the locally stored multimedia files are subjected to playback control or resource management. This is applicable to occasions in which it is inconvenient for the user to control the player manually, such as in-vehicle use.
As shown in Fig. 1, Fig. 2 and Fig. 9, in the humming search mode, when the user records speech through the voice input module 2, the sampled voice data is transmitted over the bus to the voice data processing module 3, where the pre-processing of the voice signal is completed. The voice data processing module 3 then judges, from the working-mode instruction sent by the central processing unit 4, whether to carry out humming-search feature extraction; in the humming search mode it does, extracting the pitch sequence from the pre-processed voice data, converting the pitch sequence into a note sequence and sending it to the central processing unit 4. The central processing unit 4 sends the note sequence to the match search server 9 on the Internet and listens for the resource return message; the match search server 9 uses the preset matching algorithm to perform a matching search in the note feature database, finds the matching multimedia resource identity information and returns it to the central processing unit 4; and the central processing unit 4 downloads the corresponding multimedia resource from the Internet according to the multimedia resource identity information and stores it in the local media storage module 5. In this way, when the user does not know information such as the song title or singer, the resource can be searched for and obtained merely by humming a section of the song's melody, which greatly improves the user experience.
The above are only preferred embodiments of the utility model, and the scope of protection of the utility model is not limited to the above embodiments; all technical solutions falling within the concept of the utility model belong to the scope of protection of the utility model. It should be pointed out that, for those skilled in the art, improvements and modifications made without departing from the principle of the utility model shall also be regarded as falling within the scope of protection of the utility model.

Claims (4)

1. A multimedia playing apparatus with voice control and humming search functions, characterized by comprising a human-computer interaction module (1), a voice input module (2), a voice data processing module (3), a central processing unit (4), a media storage module (5), a playback decoding module (6), an audio output module (7) and a network interface module (8), wherein the voice input module (2) is connected to the input of the voice data processing module (3), the voice data processing module (3) is connected to the central processing unit (4), the central processing unit (4) is further connected to the human-computer interaction module (1), the media storage module (5) and the network interface module (8) respectively, and the output of the central processing unit (4) is connected to the audio output module (7) through the playback decoding module (6).
2. The multimedia playing apparatus with voice control and humming search functions according to claim 1, characterized in that the human-computer interaction module (1) is a touch display screen module.
3. The multimedia playing apparatus with voice control and humming search functions according to claim 2, characterized in that the voice data processing module (3) is a DSP processor.
4. The multimedia playing apparatus with voice control and humming search functions according to claim 3, characterized in that the network interface module (8) is one of a 3G interface module, a GPRS interface module and a WIFI interface module.
CN201320422658.2U 2013-07-16 2013-07-16 Multimedia playing device with functions of voice controlling and humming searching Expired - Fee Related CN203522960U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201320422658.2U CN203522960U (en) 2013-07-16 2013-07-16 Multimedia playing device with functions of voice controlling and humming searching


Publications (1)

Publication Number Publication Date
CN203522960U true CN203522960U (en) 2014-04-02

Family

ID=50381826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201320422658.2U Expired - Fee Related CN203522960U (en) 2013-07-16 2013-07-16 Multimedia playing device with functions of voice controlling and humming searching

Country Status (1)

Country Link
CN (1) CN203522960U (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598541A (en) * 2014-12-29 2015-05-06 乐视网信息技术(北京)股份有限公司 Identification method and device for multimedia file
CN105565247A (en) * 2015-02-15 2016-05-11 屠尉 Auxiliary voice system of oiling machine
CN105244021A (en) * 2015-11-04 2016-01-13 厦门大学 Method for converting singing melody to MIDI (Musical Instrument Digital Interface) melody
CN105244021B (en) * 2015-11-04 2019-02-12 厦门大学 Conversion method of the humming melody to MIDI melody
CN106775570A (en) * 2017-02-21 2017-05-31 联想(北京)有限公司 Audio frequency apparatus, audio collection Play System and method including the audio frequency apparatus

Similar Documents

Publication Publication Date Title
CN103366784B (en) There is multi-medium play method and the device of Voice command and singing search function
CN102999161B (en) A kind of implementation method of voice wake-up module and application
CN103021409B (en) A kind of vice activation camera system
CN103440862B (en) A kind of method of voice and music synthesis, device and equipment
CN102111314B (en) Smart home voice control system and method based on Bluetooth transmission
CN203522960U (en) Multimedia playing device with functions of voice controlling and humming searching
CN101504834B (en) Humming type rhythm identification method based on hidden Markov model
CN103823867A (en) Humming type music retrieval method and system based on note modeling
CN104575504A (en) Method for personalized television voice wake-up by voiceprint and voice identification
CN108551686A (en) The extraction and analysis of audio characteristic data
US11295761B2 (en) Method for constructing voice detection model and voice endpoint detection system
CN101286317B (en) Speech recognition device, model training method and traffic information service platform
CN102847325B (en) Toy control method and system based on voice interaction of mobile communication terminal
CN101424924A (en) Sound control intelligent household control system
CN106503184A (en) Determine the method and device of the affiliated class of service of target text
CN103093316A (en) Method and device of bill generation
CN201181413Y (en) Acoustic control intelligent household control device
CN104123930A (en) Guttural identification method and device
CN107274892A (en) Method for distinguishing speek person and device
CN104091601A (en) Method and device for detecting music quality
CN1645363A (en) Portable realtime dialect inter-translationing device and method thereof
CN110889008B (en) Music recommendation method and device, computing device and storage medium
CN105608114B (en) A kind of music retrieval method and device
CN201600925U (en) Player with language search function
WO2023169258A1 (en) Audio detection method and apparatus, storage medium and electronic device

Legal Events

Date Code Title Description
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140402

Termination date: 20160716

CF01 Termination of patent right due to non-payment of annual fee