CN103347070B - Push method, terminal, server and the system of speech data - Google Patents

Publication number
CN103347070B
Authority
CN
China
Prior art keywords
background noise
characteristic vector
noise data
environmental form
speech data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310268905.2A
Other languages
Chinese (zh)
Other versions
CN103347070A (en)
Inventor
郭涛
蔡经伟
刘伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiaomi Inc
Original Assignee
Xiaomi Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaomi Inc filed Critical Xiaomi Inc
Priority to CN201310268905.2A priority Critical patent/CN103347070B/en
Publication of CN103347070A publication Critical patent/CN103347070A/en
Application granted granted Critical
Publication of CN103347070B publication Critical patent/CN103347070B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a method, terminal, server and system for pushing speech data, and belongs to the field of multimedia technology. The method includes: obtaining background noise data once every preset time, and extracting the characteristic vector of the background noise data; uploading the characteristic vector of the background noise data to a server, so that the server determines the environmental form corresponding to the characteristic vector according to a pre-stored corresponding relation between characteristic vectors and environmental forms, and pushes the speech data corresponding to that environmental form to the terminal; and receiving the speech data pushed by the server. Because the server determines the environmental form from the uploaded characteristic vector and pushes the matching speech data, speech data can be pushed to the user according to the external environment, which meets the user's listening needs at different times and places and improves the user experience.

Description

Method, terminal, server and system for pushing speech data
Technical field
The present invention relates to the field of multimedia technology, and in particular to a method, terminal, server and system for pushing speech data.
Background
With the rapid development of science and technology, more and more terminals capable of playing speech data have come into public view, for example MP3 (Moving Picture Experts Group Audio Layer III) players, mobile phones and tablet computers. By pressing a physical button of the terminal or a virtual button displayed on the terminal screen, the user can freely select speech data manually. However, when the user is in a crowded environment or is using a terminal without a screen, and is unwilling or unable to select speech data manually, how to push speech data automatically so as to improve the user experience and meet the user's listening needs has become a problem of growing concern to those skilled in the art.
Summary of the invention
The embodiments of the invention provide a method, terminal, server and system for pushing speech data. The technical solution is as follows:
In a first aspect, a method for pushing speech data is provided, the method including:
Obtaining background noise data once every preset time, and extracting the characteristic vector of the background noise data;
Uploading the characteristic vector of the background noise data to a server, so that the server determines the environmental form corresponding to the characteristic vector according to a pre-stored corresponding relation between characteristic vectors and environmental forms, and pushes the speech data corresponding to the environmental form to the terminal;
Receiving the speech data pushed by the server.
Preferably, obtaining background noise data once every preset time includes:
When background noise data are obtained for the first time, obtaining a segment of background noise data whose duration is a first predetermined time period;
When background noise data are obtained on any subsequent occasion, obtaining, every preset time, a segment of background noise data whose duration is a second predetermined time period;
Wherein the first predetermined time period is less than the second predetermined time period.
Preferably, extracting the characteristic vector of the background noise data includes:
Decoding the background noise data to obtain the voice signal of the background noise data;
Extracting the spectrum signature of the voice signal to obtain the characteristic vector of the voice signal.
Preferably, after the voice signal of the background noise data is obtained, the method further includes:
Performing a frequency-domain transform on the obtained voice signal of the background noise data;
In that case, extracting the spectrum signature of the voice signal includes:
Extracting the spectrum signature of the voice signal after the frequency-domain transform.
In a second aspect, a terminal is provided, the terminal including:
An acquisition module, for obtaining background noise data once every preset time;
An extraction module, for extracting the characteristic vector of the background noise data obtained by the acquisition module;
An uploading module, for uploading the characteristic vector of the background noise data extracted by the extraction module to a server, so that the server determines the environmental form corresponding to the characteristic vector according to a pre-stored corresponding relation between characteristic vectors and environmental forms, and pushes the speech data corresponding to the environmental form to the terminal;
A receiving module, for receiving the speech data pushed by the server.
Preferably, the acquisition module includes:
A first acquisition unit, for obtaining, when background noise data are obtained for the first time, a segment of background noise data whose duration is a first predetermined time period;
A second acquisition unit, for obtaining, on any subsequent occasion and every preset time, a segment of background noise data whose duration is a second predetermined time period;
Wherein the first predetermined time period is less than the second predetermined time period.
Preferably, the extraction module is used to decode the background noise data to obtain the voice signal of the background noise data, and to extract the spectrum signature of the voice signal to obtain the characteristic vector of the voice signal.
Preferably, the terminal further includes:
A conversion module, for performing a frequency-domain transform on the obtained voice signal of the background noise data;
The extraction module is then used to extract the spectrum signature of the voice signal after the frequency-domain transform.
In a third aspect, a method for pushing speech data is further provided, the method including:
Receiving the characteristic vector of background noise data uploaded by a terminal;
Determining the environmental form corresponding to the characteristic vector of the background noise data according to a pre-stored corresponding relation between characteristic vectors and environmental forms;
Pushing the speech data corresponding to the environmental form to the terminal.
Preferably, before the environmental form corresponding to the characteristic vector of the background noise data is determined according to the pre-stored corresponding relation between characteristic vectors and environmental forms, the method further includes:
Setting a mapping table of characteristic vectors and environmental forms, and storing the mapping table;
Determining the environmental form corresponding to the characteristic vector of the background noise data according to the pre-stored corresponding relation between characteristic vectors and environmental forms then includes:
Looking up the mapping table of characteristic vectors and environmental forms according to the characteristic vector of the background noise data, to obtain the environmental form corresponding to the characteristic vector of the background noise data.
Preferably, before the speech data corresponding to the environmental form is pushed to the terminal, the method further includes:
Setting a corresponding relation between environmental forms and speech data types;
Determining the speech data type corresponding to the environmental form according to the corresponding relation between environmental forms and speech data types;
Pushing the speech data corresponding to the environmental form to the terminal then includes:
Pushing speech data of the determined speech data type to the terminal.
In a fourth aspect, a server is provided, the server including:
A receiving module, for receiving the characteristic vector of background noise data uploaded by a terminal;
A first determining module, for determining the environmental form corresponding to the characteristic vector of the background noise data according to a pre-stored corresponding relation between characteristic vectors and environmental forms;
A pushing module, for pushing the speech data corresponding to the environmental form to the terminal.
Preferably, the server further includes:
A first setup module, for setting a mapping table of characteristic vectors and environmental forms;
A memory module, for storing the mapping table of characteristic vectors and environmental forms set by the first setup module;
The first determining module includes:
A searching unit, for looking up the mapping table of characteristic vectors and environmental forms according to the characteristic vector of the background noise data;
An acquiring unit, for obtaining the environmental form corresponding to the characteristic vector of the background noise data.
Preferably, the server further includes:
A second setup module, for setting a corresponding relation between environmental forms and speech data types;
A second determining module, for determining the speech data type corresponding to the environmental form according to the corresponding relation between environmental forms and speech data types set by the second setup module;
The pushing module is then used to push speech data of the determined speech data type to the terminal.
In a fifth aspect, a system for pushing speech data is provided, the system including a terminal and a server;
Wherein the terminal is the terminal described above;
And the server is the server described above.
The beneficial effects of the technical solution of the present invention are as follows:
The terminal obtains background noise data once every preset time, extracts the characteristic vector of the background noise data, and uploads the characteristic vector to the server. The server determines the environmental form corresponding to the characteristic vector according to the pre-stored corresponding relation between characteristic vectors and environmental forms, and pushes the speech data corresponding to the environmental form to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, which meets the user's listening needs at different times and places and improves the user experience.
Brief description of the drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for pushing speech data provided by Embodiment one of the present invention;
Fig. 2 is a flowchart of another method for pushing speech data provided by Embodiment one of the present invention;
Fig. 3 is a flowchart of a method for pushing speech data provided by Embodiment two of the present invention;
Fig. 4 is a schematic structural diagram of a terminal provided by Embodiment three of the present invention;
Fig. 5 is a schematic diagram of the internal structure of an acquisition module provided by Embodiment three of the present invention;
Fig. 6 is a schematic structural diagram of another terminal provided by Embodiment three of the present invention;
Fig. 7 is a schematic structural diagram of a server provided by Embodiment four of the present invention;
Fig. 8 is a schematic structural diagram of another server provided by Embodiment four of the present invention;
Fig. 9 is a schematic diagram of the internal structure of a first determining module provided by Embodiment four of the present invention;
Fig. 10 is a schematic structural diagram of yet another server provided by Embodiment four of the present invention;
Fig. 11 is a schematic structural diagram of a system for pushing speech data provided by Embodiment five of the present invention.
Detailed description
To make the objects, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment one
This embodiment of the invention provides a method for pushing speech data. Taking the terminal's perspective as an example and referring to Fig. 1, the method flow provided by this embodiment includes:
101: Obtain background noise data once every preset time, and extract the characteristic vector of the background noise data;
Preferably, obtaining background noise data once every preset time includes but is not limited to:
When background noise data are obtained for the first time, obtaining a segment of background noise data whose duration is a first predetermined time period;
When background noise data are obtained on any subsequent occasion, obtaining, every preset time, a segment of background noise data whose duration is a second predetermined time period;
Wherein the first predetermined time period is less than the second predetermined time period.
Preferably, extracting the characteristic vector of the background noise data includes but is not limited to:
Decoding the background noise data to obtain the voice signal of the background noise data;
Extracting the spectrum signature of the voice signal to obtain the characteristic vector of the voice signal.
Preferably, after the voice signal of the background noise data is obtained, the method further includes:
Performing a frequency-domain transform on the obtained voice signal of the background noise data;
Extracting the spectrum signature of the voice signal then includes but is not limited to:
Extracting the spectrum signature of the voice signal after the frequency-domain transform.
102: Upload the characteristic vector of the background noise data to the server, so that the server determines the environmental form corresponding to the characteristic vector according to the pre-stored corresponding relation between characteristic vectors and environmental forms, and pushes the speech data corresponding to the environmental form to the terminal;
103: Receive the speech data pushed by the server.
Taking the server's perspective as an example and referring to Fig. 2, the method flow provided by this embodiment includes:
201: Receive the characteristic vector of the background noise data uploaded by the terminal;
202: Determine the environmental form corresponding to the characteristic vector of the background noise data according to the pre-stored corresponding relation between characteristic vectors and environmental forms;
Preferably, before the environmental form corresponding to the characteristic vector of the background noise data is determined according to the pre-stored corresponding relation between characteristic vectors and environmental forms, the method further includes:
Setting a mapping table of characteristic vectors and environmental forms, and storing the mapping table;
Determining the environmental form corresponding to the characteristic vector of the background noise data according to the pre-stored corresponding relation then includes but is not limited to:
Looking up the mapping table of characteristic vectors and environmental forms according to the characteristic vector of the background noise data, to obtain the environmental form corresponding to the characteristic vector of the background noise data.
203: Push the speech data corresponding to the environmental form to the terminal.
Preferably, before the speech data corresponding to the environmental form is pushed to the terminal, the method further includes:
Setting a corresponding relation between environmental forms and speech data types;
Determining the speech data type corresponding to the environmental form according to the corresponding relation between environmental forms and speech data types;
Pushing the speech data corresponding to the environmental form to the terminal then includes but is not limited to:
Pushing speech data of the determined speech data type to the terminal.
In the method provided by this embodiment, the terminal obtains background noise data once every preset time, extracts the characteristic vector of the background noise data, and uploads the characteristic vector to the server. The server determines the environmental form corresponding to the characteristic vector according to the pre-stored corresponding relation between characteristic vectors and environmental forms, and pushes the speech data corresponding to the environmental form to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, which meets the user's listening needs at different times and places and improves the user experience.
Embodiment two
This embodiment of the invention provides a method for pushing speech data. Building on Embodiment one above, the way of pushing speech data provided by this embodiment is explained in detail below. Referring to Fig. 3, the method flow provided by this embodiment includes:
301: The terminal obtains background noise data once every preset time, and extracts the characteristic vector of the background noise data;
The preset time may be, for example, 10 minutes. It may also take other values, such as 5 minutes or 6 minutes; this embodiment does not specifically limit the length of the preset time.
In addition, obtaining background noise data once every preset time includes but is not limited to:
When background noise data are obtained for the first time, obtaining a segment of background noise data whose duration is a first predetermined time period;
When background noise data are obtained on any subsequent occasion, obtaining, every preset time, a segment of background noise data whose duration is a second predetermined time period;
Wherein the first predetermined time period is less than the second predetermined time period.
The above process is described in detail below with a specific example.
Suppose that background noise data are first obtained at 10:00:00, the preset time is 10 minutes, the first predetermined time period is 8 seconds and the second predetermined time period is 10 seconds. The terminal starts recording background noise data at 10:00:00 and stops at 10:00:08. At 10:10:00 it starts the second recording, which lasts 10 seconds and therefore stops at 10:10:10. Subsequent recordings follow the same pattern: every 10 minutes the terminal records 10 seconds of background noise data. After recording, the terminal stores the background noise data in a storage medium, for example internal memory or a memory card. The data may also be stored in other types of storage medium; this embodiment does not specifically limit the type of storage medium.
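By way of illustration only, the recording schedule in this example can be sketched in Python as follows. The function name `recording_windows` and the calendar date are hypothetical and chosen purely for the sketch; only the times of day, the 10-minute preset time and the 8-second and 10-second durations come from the example above.

```python
from datetime import datetime, timedelta

def recording_windows(first_start, count,
                      interval=timedelta(minutes=10),
                      first_len=timedelta(seconds=8),
                      later_len=timedelta(seconds=10)):
    """Return (start, stop) pairs for the first `count` recordings.

    The first recording uses the shorter first predetermined time period;
    every later recording uses the second predetermined time period.
    """
    windows = []
    for i in range(count):
        start = first_start + i * interval
        length = first_len if i == 0 else later_len
        windows.append((start, start + length))
    return windows

if __name__ == "__main__":
    # The date is arbitrary; only the time of day matters for the example.
    for start, stop in recording_windows(datetime(2000, 1, 1, 10, 0, 0), 3):
        print(start.time(), "->", stop.time())
```

A terminal implementation would trigger the microphone at each start time; the sketch only computes the windows.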
In addition, extracting the characteristic vector of the background noise data includes but is not limited to:
Decoding the background noise data to obtain the voice signal of the background noise data;
Extracting the spectrum signature of the voice signal to obtain the characteristic vector of the voice signal.
Preferably, after the voice signal of the background noise data is obtained, the method further includes:
Performing a frequency-domain transform on the obtained voice signal of the background noise data;
Extracting the spectrum signature of the voice signal then includes but is not limited to:
Extracting the spectrum signature of the voice signal after the frequency-domain transform.
When the spectrum signature of the voice signal is extracted, techniques such as MFCC (Mel Frequency Cepstral Coefficients), CWT (Continuous Wavelet Transform) or STFT (Short Time Fourier Transform) can be used. Which technique is used to extract the spectrum signature can be decided according to the circumstances; this embodiment does not specifically limit it.
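By way of illustration only, a much-simplified STFT-style feature extraction can be sketched in Python as follows. This is not the extraction method of this embodiment: the frame size, the number of bands, the non-overlapping rectangular frames and the function names are all assumptions made for the sketch, and a direct DFT is used only to keep the example self-contained (a practical implementation would use an FFT library and typically a window function).

```python
import cmath

def dft_magnitudes(frame):
    """Magnitude spectrum of one frame (bins 0..N/2) via a direct DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags

def spectral_feature_vector(signal, frame_size=64, bands=4):
    """Average per-band spectral energy over non-overlapping frames.

    Bins left over when frame_size / 2 is not divisible by `bands`
    are ignored in this sketch.
    """
    bins_per_band = (frame_size // 2) // bands
    if bins_per_band == 0:
        raise ValueError("too many bands for this frame size")
    frames = [signal[i:i + frame_size]
              for i in range(0, len(signal) - frame_size + 1, frame_size)]
    if not frames:
        raise ValueError("signal shorter than one frame")
    feature = [0.0] * bands
    for frame in frames:
        mags = dft_magnitudes(frame)[1:]  # drop the DC bin
        for b in range(bands):
            chunk = mags[b * bins_per_band:(b + 1) * bins_per_band]
            feature[b] += sum(m * m for m in chunk) / len(chunk)
    return [v / len(frames) for v in feature]
```

The resulting list plays the role of the characteristic vector: a low-frequency hum concentrates its energy in the first component, while broadband noise spreads it across all components.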
302: The terminal uploads the characteristic vector of the background noise data to the server;
In this step, after obtaining the characteristic vector of the background noise data, the terminal can upload it to the server directly, or, in order to reduce the network transmission load, compress and pack it before uploading. Which upload mode is used is not specifically limited by this embodiment.
303: The server receives the characteristic vector of the background noise data uploaded by the terminal;
If the server receives the characteristic vector of the background noise data directly, it can cache it as is. If the server receives a compressed package, it first decompresses the package to obtain the characteristic vector of the background noise data and then caches the characteristic vector in a storage medium. The storage medium may be internal memory or a hard disk, or another type of storage medium such as flash memory or an optical disc; this embodiment does not specifically limit the type of storage medium.
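By way of illustration only, the compress-then-upload and decompress-then-cache steps can be sketched in Python as follows. The wire format (a 32-bit count followed by little-endian 32-bit floats, compressed with zlib) and both function names are assumptions made for the sketch; this embodiment does not specify a packing format.

```python
import struct
import zlib

def pack_feature_vector(vector):
    """Terminal side: serialize the vector and compress it for upload."""
    payload = struct.pack("<I", len(vector))
    payload += struct.pack(f"<{len(vector)}f", *vector)
    return zlib.compress(payload)

def unpack_feature_vector(blob):
    """Server side: decompress the upload and recover the vector."""
    payload = zlib.decompress(blob)
    (count,) = struct.unpack_from("<I", payload, 0)
    return list(struct.unpack_from(f"<{count}f", payload, 4))
```

Note that 32-bit floats lose precision relative to Python floats, so values that are not exactly representable come back slightly rounded.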
304: The server determines the environmental form corresponding to the characteristic vector of the background noise data according to the pre-stored corresponding relation between characteristic vectors and environmental forms;
In this step, before the environmental form corresponding to the characteristic vector of the background noise data is determined according to the pre-stored corresponding relation, the method further includes:
Setting a mapping table of characteristic vectors and environmental forms, and storing the mapping table;
Determining the environmental form corresponding to the characteristic vector of the background noise data according to the pre-stored corresponding relation then includes but is not limited to:
Looking up the mapping table of characteristic vectors and environmental forms according to the characteristic vector of the background noise data, to obtain the environmental form corresponding to the characteristic vector of the background noise data.
The corresponding relation between characteristic vectors and environmental forms is usually set in one of the following two ways:
In the first way, the corresponding relation between characteristic vectors and environmental forms is set manually;
For example, when the characteristic vector is below a preset threshold, the environmental form is set to quiet environment; when the characteristic vector jitters randomly back and forth, the environmental form is set to outdoor irregular noisy environment; and when the characteristic vector fluctuates regularly, the environmental form is set to outdoor regular noisy environment. The preset threshold can be set as required; this embodiment does not specifically limit its size.
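By way of illustration only, one possible reading of these manual rules can be sketched in Python as follows. This embodiment does not define how "random jitter" and "regular fluctuation" are measured; the sketch assumes the characteristic vector is a sequence of values over time and uses the peak of the normalized autocorrelation as a regularity measure. The function name and both threshold values are hypothetical.

```python
def classify_by_rules(values, quiet_threshold=0.1, regularity_threshold=0.6):
    """Map a sequence of feature values to an environmental form."""
    if not values:
        raise ValueError("empty characteristic vector")
    # Rule 1: everything below the preset threshold means a quiet environment.
    if max(values) < quiet_threshold:
        return "quiet environment"
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    energy = sum(c * c for c in centered)
    if energy == 0:
        # A constant loud level has no jitter; treating it as regular
        # is an arbitrary choice made for this sketch.
        return "outdoor regular noisy environment"
    # Rules 2 and 3: a strong autocorrelation peak at some lag suggests
    # a repeating pattern; otherwise the fluctuation is treated as random.
    best = 0.0
    for lag in range(2, len(values) // 2 + 1):
        corr = sum(centered[i] * centered[i + lag]
                   for i in range(len(values) - lag)) / energy
        best = max(best, corr)
    if best >= regularity_threshold:
        return "outdoor regular noisy environment"
    return "outdoor irregular noisy environment"
```

For instance, a strictly alternating loud and soft pattern is classified as regular, while a single isolated spike is classified as irregular.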
In the second way, the corresponding relation between characteristic vectors and environmental forms is set using a machine learning method such as a support vector machine;
The server collects a sample set in advance, that is, a small amount of background noise data, and then labels each sample in the set, determining which background noise data correspond to quiet environment, which to outdoor irregular noisy environment and which to outdoor regular noisy environment. The server then computes the characteristic vector of each sample in the sample set and trains a classification model from these characteristic vectors. When a new sample arrives later, its characteristic vector only needs to be computed and fed into the classification model to obtain the corresponding environmental form automatically.
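By way of illustration only, the train-then-predict flow can be sketched in Python as follows. To keep the example self-contained, the sketch substitutes a nearest-centroid classifier for the support vector machine named above; it is a plainly simpler technique that stands in for the classification model only to show the flow. The function names and the sample data in the usage note are hypothetical.

```python
import math
from collections import defaultdict

def train_centroids(samples):
    """Build a model from labeled samples.

    `samples` is an iterable of (characteristic_vector, environmental_form).
    The model maps each environmental form to the mean of its vectors.
    """
    sums = {}
    counts = defaultdict(int)
    for vector, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def predict(model, vector):
    """Return the environmental form whose centroid is closest."""
    return min(model, key=lambda label: math.dist(model[label], vector))
```

With a model trained on a few labeled vectors per environmental form, `predict(model, new_vector)` plays the role of feeding a new sample into the classification model.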
It should be noted that, because the first way sets the corresponding relation between characteristic vectors and environmental forms manually, it is less accurate than the second way but simpler. Because the second way uses machine learning, it is more adaptable, and its accuracy increases as the sample set grows. Which way is used when performing the method provided by this embodiment can be decided according to the circumstances; this embodiment does not specifically limit it.
In addition, once the corresponding relation between characteristic vectors and environmental forms has been set in advance and recorded in the mapping table, the mapping table can be used directly whenever the method is subsequently performed. That is, this setting step need not be repeated each time the method provided by this embodiment is performed, and the mapping table is updated only when the corresponding relation changes.
305: The server pushes the speech data corresponding to the environmental form to the terminal;
In this step, before the server pushes the speech data corresponding to the environmental form to the terminal, the method further includes:
Setting a corresponding relation between environmental forms and speech data types;
Determining the speech data type corresponding to the environmental form according to the corresponding relation between environmental forms and speech data types;
Pushing the speech data corresponding to the environmental form to the terminal then includes but is not limited to:
Pushing speech data of the determined speech data type to the terminal.
When the corresponding relation between environmental forms and speech data types is set, a correspondence table such as Table 1 below can be used:
Table 1
Environmental form | Speech data type
Quiet environment | Light music
Outdoor irregular noisy environment | Rock, pop music
Outdoor regular noisy environment | Country, folk music
For example, when the environmental form corresponding to a certain characteristic vector is determined in step 304 above to be outdoor irregular noisy environment, it can be determined from Table 1 that the type of speech data to be pushed is preferably rock music or pop music.
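By way of illustration only, the lookup described in this example can be sketched in Python as follows. The dictionary name, the function name and the exact label strings are hypothetical; the style names are paraphrased from Table 1.

```python
# Table 1 expressed as a lookup from environmental form to speech data types.
SPEECH_TYPES_BY_ENVIRONMENT = {
    "quiet environment": ["light music"],
    "outdoor irregular noisy environment": ["rock", "pop"],
    "outdoor regular noisy environment": ["country", "folk"],
}

def speech_types_for(environmental_form):
    """Return the speech data types for an environmental form.

    An unknown environmental form yields an empty list rather than an error.
    """
    return list(SPEECH_TYPES_BY_ENVIRONMENT.get(environmental_form, []))
```

Returning a copy of the list keeps callers from modifying the table by accident.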
Preferably, when pushing speech data to the terminal, the server does not push according to the external environment alone; the method provided by this embodiment further includes a step of pushing relevant speech data according to the user's preferences. For example, in a quiet environment, the light music pushed by the server is not selected at random but according to the user's preferences, so that it is light music the user is likely to enjoy rather than an arbitrary piece. The method provided by this embodiment supports compiling statistics on the speech data the user has listened to, so as to analyze which types of light music the user likes. The specific implementation can follow existing statistical analysis methods and is not repeated here.
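By way of illustration only, a very simple preference ranking based on listening history can be sketched in Python as follows. This embodiment leaves the statistical method open; the sketch assumes each candidate piece carries a style tag and ranks candidates by how often the user has listened to that style. The function name, the track identifiers and the style tags are hypothetical.

```python
from collections import Counter

def rank_candidates(candidates, listening_history):
    """Order candidate pieces by the user's historical play counts.

    `candidates` is a list of (track_id, style) pairs already filtered
    to the speech data type chosen for the environment.
    `listening_history` is a list of styles the user has played.
    Ties keep their original order because sorted() is stable.
    """
    counts = Counter(listening_history)
    return sorted(candidates, key=lambda item: -counts[item[1]])
```

For example, with candidates tagged "piano", "guitar" and "strings" and a history containing two guitar plays and one piano play, the guitar piece is ranked first and the strings piece last.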
306: The terminal receives the speech data pushed by the server.
After receiving the speech data pushed by the server, the terminal stores the speech data in its own storage medium for subsequent playback. The storage medium may be internal memory or a memory card, or another type of storage medium such as flash memory or an optical disc; this embodiment does not specifically limit the type of storage medium.
In the method provided by this embodiment of the invention, background noise data are obtained once every preset time, the characteristic vector of the background noise data is extracted and uploaded to the server, and the server determines the environmental form corresponding to the characteristic vector according to the pre-stored corresponding relation between characteristic vectors and environmental forms and pushes the speech data corresponding to the environmental form to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, which meets the user's listening needs at different times and places and improves the user experience.
Embodiment three
This embodiment of the invention provides a terminal. Referring to Fig. 4, the terminal includes:
An acquisition module 41, for obtaining background noise data once every preset time;
An extraction module 42, for extracting the characteristic vector of the background noise data obtained by the acquisition module 41;
An uploading module 43, for uploading the characteristic vector of the background noise data extracted by the extraction module 42 to the server, so that the server determines the environmental form corresponding to the characteristic vector according to the pre-stored corresponding relation between characteristic vectors and environmental forms, and pushes the speech data corresponding to the environmental form to the terminal;
A receiving module 44, for receiving the speech data pushed by the server.
Preferably, referring to Fig. 5, the acquisition module 41 includes:
A first acquisition unit 411, configured to obtain, when background noise data is obtained for the first time, a segment of background noise data whose duration is a first preset duration;
A second acquisition unit 412, configured to obtain at preset intervals, when background noise data is obtained other than for the first time, a segment of background noise data whose duration is a second preset duration;
The first preset duration is shorter than the second preset duration.
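The two acquisition units can be pictured as one capture schedule in which only the first capture uses the shorter duration. The sketch below is an assumption-laden illustration: the function name `capture_schedule` and the concrete interval and durations are hypothetical, since this section does not give numeric values.

```python
def capture_schedule(n, interval_s, first_len_s, later_len_s):
    """Return a list of (start_time_s, duration_s) pairs for n captures.

    The first capture lasts first_len_s; every later capture lasts
    later_len_s. Captures start interval_s seconds apart.
    """
    if first_len_s >= later_len_s:
        raise ValueError("first duration must be shorter than later duration")
    schedule = []
    for i in range(n):
        duration = first_len_s if i == 0 else later_len_s
        schedule.append((i * interval_s, duration))
    return schedule


# Hypothetical values: capture every 600 s, 2 s at first, 10 s afterwards.
plan = capture_schedule(3, interval_s=600, first_len_s=2, later_len_s=10)
```

The function only computes the plan; actual recording and timing would depend on the terminal's audio API, which is outside this sketch.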
Preferably, the extraction module 42 is configured to decode the background noise data to obtain the sound signal of the background noise data, and to extract the spectral features of the sound signal to obtain the characteristic vector of the sound signal.
Preferably, referring to Fig. 6, the terminal further includes:
A conversion module 45, configured to perform a frequency-domain transform on the obtained sound signal of the background noise data;
The extraction module 42 is configured to extract the spectral features of the sound signal after the frequency-domain transform.
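A frequency-domain transform followed by spectral feature extraction could look like the sketch below. It assumes the decoded sound signal is already available as a list of samples, uses a plain discrete Fourier transform, and takes normalized band energies as the characteristic vector. The choice of band energies, the number of bands, and the function names are assumptions; this section does not say which spectral features are used.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of the first half of the DFT of a real-valued signal."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags


def band_energy_vector(samples, bands=4):
    """Split the spectrum into equal-width bands and return each band's
    share of the total energy. Leftover bins at the top are ignored when
    the bin count is not divisible by the number of bands."""
    mags = dft_magnitudes(samples)
    size = len(mags) // bands
    energies = [sum(m * m for m in mags[b * size:(b + 1) * size])
                for b in range(bands)]
    total = sum(energies) or 1.0  # avoid division by zero on silence
    return [e / total for e in energies]


# Synthetic test signal: a pure tone that falls in the lowest band.
n = 64
tone = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
vector = band_energy_vector(tone)
```

Normalizing by total energy makes the vector describe the shape of the spectrum rather than the recording volume, which is one plausible design choice for comparing environments.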
The terminal provided by this embodiment of the present invention obtains a segment of background noise data at preset intervals and extracts its characteristic vector; the characteristic vector is then uploaded to the server, which determines the corresponding environment type according to the prestored correspondence between characteristic vectors and environment types and pushes speech data corresponding to that environment type to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, which meets the user's listening needs at different times and places and improves the user experience.
Embodiment four
This embodiment of the present invention provides a server. Referring to Fig. 7, the server includes:
A receiving module 71, configured to receive the characteristic vector of the background noise data uploaded by the terminal;
A first determining module 72, configured to determine the environment type corresponding to the characteristic vector of the background noise data according to the prestored correspondence between characteristic vectors and environment types;
A pushing module 73, configured to push speech data corresponding to the environment type to the terminal.
Preferably, referring to Fig. 8, the server further includes:
A first setup module 74, configured to set a mapping table of characteristic vectors and environment types;
A memory module 75, configured to store the mapping table of characteristic vectors and environment types set by the first setup module 74;
Referring to Fig. 9, the first determining module 72 includes:
A searching unit 721, configured to search the mapping table of characteristic vectors and environment types according to the characteristic vector of the background noise data;
An acquiring unit 722, configured to obtain the environment type corresponding to the characteristic vector of the background noise data.
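The table lookup performed by the searching and acquiring units could be sketched as a nearest-neighbor search over stored vectors. Note the substitution: the claims state that the correspondence is obtained by support vector machine learning, whereas this sketch uses a much simpler nearest-neighbor match as a stand-in. The table contents, the environment labels, and the function name `lookup_environment` are all hypothetical.

```python
import math

# Hypothetical mapping table: (characteristic vector, environment type).
TABLE = [
    ((0.9, 0.1, 0.0, 0.0), "quiet"),
    ((0.3, 0.4, 0.2, 0.1), "street"),
    ((0.1, 0.2, 0.3, 0.4), "subway"),
]


def lookup_environment(vector, table=TABLE):
    """Return the environment type of the stored vector closest to
    `vector` by Euclidean distance (a stand-in for the learned model)."""
    best = min(table, key=lambda row: math.dist(vector, row[0]))
    return best[1]


env = lookup_environment((0.85, 0.1, 0.05, 0.0))
```

An exact-match table would rarely hit with real audio features, which is why some form of nearest match or trained classifier is needed in practice.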
Preferably, referring to Fig. 10, the server further includes:
A second setup module 76, configured to set the correspondence between environment types and speech data types;
A second determining module 77, configured to determine the speech data type corresponding to the environment type according to the correspondence between environment types and speech data types set by the second setup module 76;
The pushing module 73 is configured to push speech data corresponding to the speech data type to the terminal.
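The second setup and determining modules amount to a two-step lookup: environment type to speech data type, then speech data type to the items to push. The sketch below illustrates this under assumptions: the mapping contents, the track names, the function name `select_push`, and the fallback to a default type for unknown environments are all hypothetical and not taken from the source.

```python
# Hypothetical correspondence between environment types and speech data types.
ENV_TO_TYPE = {"quiet": "light music", "street": "pop", "subway": "rock"}

# Hypothetical catalog of speech data grouped by type.
LIBRARY = {
    "light music": ["track_a", "track_b"],
    "pop": ["track_c"],
    "rock": ["track_d"],
}


def select_push(env_type, default_type="light music"):
    """Return (speech data type, list of items to push) for an
    environment type. Unknown environments fall back to default_type,
    which is an assumption of this sketch."""
    data_type = ENV_TO_TYPE.get(env_type, default_type)
    return data_type, list(LIBRARY.get(data_type, []))


data_type, tracks = select_push("subway")
```

Keeping the two mappings separate means the catalog can change without touching the environment rules, and vice versa.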
The server provided by this embodiment of the present invention determines the environment type corresponding to a characteristic vector according to the prestored correspondence between characteristic vectors and environment types, and pushes speech data corresponding to that environment type to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, which meets the user's listening needs at different times and places and improves the user experience.
Embodiment five
This embodiment of the present invention provides a system for pushing speech data. Referring to Fig. 11, the system includes a terminal 1101 and a server 1102;
The terminal 1101 is the terminal described in Embodiment three;
The server 1102 is the server described in Embodiment four.
The system provided by this embodiment of the present invention determines the environment type corresponding to a characteristic vector according to the prestored correspondence between characteristic vectors and environment types, and pushes speech data corresponding to that environment type to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, which meets the user's listening needs at different times and places and improves the user experience.
Embodiment six
This embodiment of the present invention provides a device for pushing speech data. The device may include one or more of the following components: a processor for executing computer program instructions to carry out various processes and methods; random access memory (RAM) and read-only memory (ROM) for storing information and program instructions; memory for storing data and information; a database for storing tables, directories, or other data structures; I/O devices; interfaces; antennas; and the like.
In this embodiment of the present invention, the computer program instructions are stored in the memory in the form of one or more modules and are configured to be executed by the processor. The one or more modules have the following functions:
Obtaining a segment of background noise data at preset intervals, and extracting the characteristic vector of the background noise data;
Uploading the characteristic vector of the background noise data to the server, so that the server determines the environment type corresponding to the characteristic vector according to the prestored correspondence between characteristic vectors and environment types and pushes speech data corresponding to the environment type to the terminal;
Receiving the speech data pushed by the server.
Specific examples of the above functions have been described in detail in the method embodiments and are not repeated here.
In summary, the device provided by this embodiment of the present invention obtains a segment of background noise data at preset intervals and extracts its characteristic vector; the characteristic vector is then uploaded to the server, which determines the corresponding environment type according to the prestored correspondence between characteristic vectors and environment types and pushes speech data corresponding to that environment type to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, which meets the user's listening needs at different times and places and improves the user experience.
Embodiment seven
This embodiment of the present invention further provides a device for pushing speech data. The device may include one or more of the following components: a processor for executing computer program instructions to carry out various processes and methods; random access memory (RAM) and read-only memory (ROM) for storing information and program instructions; memory for storing data and information; a database for storing tables, directories, or other data structures; I/O devices; interfaces; antennas; and the like.
In this embodiment of the present invention, the computer program instructions are stored in the memory in the form of one or more modules and are configured to be executed by the processor. The one or more modules have the following functions:
Receiving the characteristic vector of the background noise data uploaded by the terminal;
Determining the environment type corresponding to the characteristic vector of the background noise data according to the prestored correspondence between characteristic vectors and environment types;
Pushing speech data corresponding to the environment type to the terminal.
Specific examples of the above functions have been described in detail in the method embodiments and are not repeated here.
In summary, the device provided by this embodiment of the present invention determines the environment type corresponding to a characteristic vector according to the prestored correspondence between characteristic vectors and environment types, and pushes speech data corresponding to that environment type to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, which meets the user's listening needs at different times and places and improves the user experience.
It should be noted that when the terminal, the server, and the system for pushing speech data provided by the above embodiments push speech data, the division into the functional modules described above is used only as an example. In practical applications, the above functions can be assigned to different functional modules as needed; that is, the internal structures of the terminal and the server can be divided into different functional modules to complete all or part of the functions described above. In addition, the terminal, the server, and the system for pushing speech data provided by the above embodiments belong to the same concept as the method embodiments for pushing speech data; for their specific implementation, refer to the method embodiments, which are not repeated here.
The above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (11)

1. A method for pushing speech data, characterized in that the method comprises:
obtaining a segment of background noise data at preset intervals, decoding the background noise data to obtain the sound signal of the background noise data, and extracting the spectral features of the sound signal to obtain the characteristic vector of the sound signal;
uploading the characteristic vector of the background noise data to a server, so that the server determines the environment type corresponding to the characteristic vector according to a prestored correspondence between characteristic vectors and environment types, and pushes to the terminal speech data corresponding to the environment type and to the user's preferred speech data type, wherein the preferred speech data type is obtained by the server from statistics on the speech data the user has listened to in the past, and the correspondence is obtained by machine learning using a support vector machine method;
receiving the speech data pushed by the server;
wherein obtaining a segment of background noise data at preset intervals comprises:
when background noise data is obtained for the first time, obtaining a segment of background noise data whose duration is a first preset duration;
when background noise data is obtained other than for the first time, obtaining at preset intervals a segment of background noise data whose duration is a second preset duration;
wherein the first preset duration is shorter than the second preset duration.
2. The method according to claim 1, characterized in that, after the sound signal of the background noise data is obtained, the method further comprises:
performing a frequency-domain transform on the obtained sound signal of the background noise data;
and extracting the spectral features of the sound signal comprises:
extracting the spectral features of the sound signal after the frequency-domain transform.
3. A terminal for pushing speech data, characterized in that the terminal comprises:
an acquisition module, configured to obtain a segment of background noise data at preset intervals;
an extraction module, configured to extract the characteristic vector of the background noise data obtained by the acquisition module;
an uploading module, configured to upload the characteristic vector of the background noise data extracted by the extraction module to a server, so that the server determines the environment type corresponding to the characteristic vector according to a prestored correspondence between characteristic vectors and environment types, and pushes to the terminal speech data corresponding to the environment type and to the user's preferred speech data type, wherein the preferred speech data type is obtained by the server from statistics on the speech data the user has listened to in the past, and the correspondence is obtained by machine learning using a support vector machine method;
wherein the acquisition module comprises:
a first acquisition unit, configured to obtain, when background noise data is obtained for the first time, a segment of background noise data whose duration is a first preset duration;
a second acquisition unit, configured to obtain at preset intervals, when background noise data is obtained other than for the first time, a segment of background noise data whose duration is a second preset duration;
wherein the first preset duration is shorter than the second preset duration;
the extraction module is configured to decode the background noise data to obtain the sound signal of the background noise data, and to extract the spectral features of the sound signal to obtain the characteristic vector of the sound signal.
4. The terminal according to claim 3, characterized in that the terminal further comprises:
a conversion module, configured to perform a frequency-domain transform on the obtained sound signal of the background noise data;
the extraction module is configured to extract the spectral features of the sound signal after the frequency-domain transform.
5. A method for pushing speech data, characterized in that the method comprises:
receiving the characteristic vector of background noise data uploaded by a terminal, wherein the terminal, when obtaining background noise data for the first time, obtains a segment of background noise data whose duration is a first preset duration and, when obtaining background noise data other than for the first time, obtains at preset intervals a segment of background noise data whose duration is a second preset duration, the first preset duration being shorter than the second preset duration, and the characteristic vector is obtained by the terminal by decoding the background noise data to obtain the sound signal of the background noise data and then extracting the spectral features of the sound signal;
determining the environment type corresponding to the characteristic vector of the background noise data according to a prestored correspondence between characteristic vectors and environment types, wherein the correspondence is obtained by machine learning using a support vector machine method;
pushing to the terminal speech data corresponding to the environment type and to the user's preferred speech data type, wherein the preferred speech data type is obtained from statistics on the speech data the user has listened to in the past.
6. The method according to claim 5, characterized in that, before the environment type corresponding to the characteristic vector of the background noise data is determined according to the prestored correspondence between characteristic vectors and environment types, the method further comprises:
setting a mapping table of characteristic vectors and environment types, and storing the mapping table of characteristic vectors and environment types;
and determining the environment type corresponding to the characteristic vector of the background noise data according to the prestored correspondence between characteristic vectors and environment types comprises:
searching the mapping table of characteristic vectors and environment types according to the characteristic vector of the background noise data, to obtain the environment type corresponding to the characteristic vector of the background noise data.
7. The method according to claim 5, characterized in that, before the speech data corresponding to the environment type is pushed to the terminal, the method further comprises:
setting a correspondence between environment types and speech data types;
determining the speech data type corresponding to the environment type according to the correspondence between environment types and speech data types;
and pushing the speech data corresponding to the environment type to the terminal comprises:
pushing speech data corresponding to the speech data type to the terminal.
8. A server, characterized in that the server comprises:
a receiving module, configured to receive the characteristic vector of background noise data uploaded by a terminal, wherein the terminal, when obtaining background noise data for the first time, obtains a segment of background noise data whose duration is a first preset duration and, when obtaining background noise data other than for the first time, obtains at preset intervals a segment of background noise data whose duration is a second preset duration, the first preset duration being shorter than the second preset duration, and the characteristic vector is obtained by the terminal by decoding the background noise data to obtain the sound signal of the background noise data and then extracting the spectral features of the sound signal;
a first determining module, configured to determine the environment type corresponding to the characteristic vector of the background noise data according to a prestored correspondence between characteristic vectors and environment types, wherein the correspondence is obtained by machine learning using a support vector machine method;
a pushing module, configured to push to the terminal speech data corresponding to the environment type and to the user's preferred speech data type, wherein the preferred speech data type is obtained from statistics on the speech data the user has listened to in the past.
9. The server according to claim 8, characterized in that the server further comprises:
a first setup module, configured to set a mapping table of characteristic vectors and environment types;
a memory module, configured to store the mapping table of characteristic vectors and environment types set by the first setup module;
the first determining module comprises:
a searching unit, configured to search the mapping table of characteristic vectors and environment types according to the characteristic vector of the background noise data;
an acquiring unit, configured to obtain the environment type corresponding to the characteristic vector of the background noise data.
10. The server according to claim 8, characterized in that the server further comprises:
a second setup module, configured to set a correspondence between environment types and speech data types;
a second determining module, configured to determine the speech data type corresponding to the environment type according to the correspondence between environment types and speech data types set by the second setup module;
the pushing module is configured to push speech data corresponding to the speech data type to the terminal.
11. A system for pushing speech data, characterized in that the system comprises a terminal and a server;
wherein the terminal is the terminal according to any one of claims 3 to 4;
the server is the server according to any one of claims 8 to 10.
CN201310268905.2A 2013-06-28 2013-06-28 Push method, terminal, server and the system of speech data Active CN103347070B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310268905.2A CN103347070B (en) 2013-06-28 2013-06-28 Push method, terminal, server and the system of speech data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310268905.2A CN103347070B (en) 2013-06-28 2013-06-28 Push method, terminal, server and the system of speech data

Publications (2)

Publication Number Publication Date
CN103347070A CN103347070A (en) 2013-10-09
CN103347070B true CN103347070B (en) 2017-08-01

Family

ID=49281844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310268905.2A Active CN103347070B (en) 2013-06-28 2013-06-28 Push method, terminal, server and the system of speech data

Country Status (1)

Country Link
CN (1) CN103347070B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101448340A (en) * 2007-11-26 2009-06-03 联想(北京)有限公司 Mobile terminal state detection method and system and mobile terminal
CN102024058A (en) * 2010-12-31 2011-04-20 万音达有限公司 Music recommendation method and system
CN102082799A (en) * 2011-01-26 2011-06-01 惠州市德赛西威汽车电子有限公司 Vehicle-carried multimedia service system accessing method and system thereof
CN102654860A (en) * 2011-03-01 2012-09-05 北京彩云在线技术开发有限公司 Personalized music recommendation method and system
CN102700482A (en) * 2012-06-01 2012-10-03 浙江吉利汽车研究院有限公司杭州分公司 System for changing in-car atmosphere by external environment
CN103024213A (en) * 2012-12-17 2013-04-03 江苏乐买到网络科技有限公司 Method and device for providing personalized information and service for users

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4122947B2 (en) * 2002-11-28 2008-07-23 ヤマハ株式会社 Music information distribution device
CN1301387C (en) * 2004-06-04 2007-02-21 广东科龙电器股份有限公司 Noise source identifying method for air-conditioner based on nervous network
US8161039B2 (en) * 2005-02-15 2012-04-17 Koninklijke Philips Electronics N.V. Automatic personal play list generation based on external factors such as weather, financial market, media sales or calendar data
CN102543119A (en) * 2011-12-31 2012-07-04 北京百纳威尔科技有限公司 Scene-based music playing processing method and music playing device
CN103067863B (en) * 2012-12-24 2016-12-28 宁波源丰消防设备有限公司 Vehicle mounted multimedia player method

Also Published As

Publication number Publication date
CN103347070A (en) 2013-10-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent for invention or patent application
CB02 Change of applicant information

Address after: 100102 Beijing Wangjing West Road, a volume of stone world building, A, block, floor 12

Applicant after: Xiaomi Technology Co., Ltd.

Address before: 100102 Beijing Wangjing West Road, a volume of stone world building, A, block, floor 12

Applicant before: Beijing Xiaomi Technology Co., Ltd.

C53 Correction of patent for invention or patent application
CB02 Change of applicant information

Address after: 100085 Beijing city Haidian District Qinghe Street No. 68 Huarun colorful city shopping center two floor 13

Applicant after: Xiaomi Technology Co., Ltd.

Address before: 100102 Beijing Wangjing West Road, a volume of stone world building, A, block, floor 12

Applicant before: Xiaomi Technology Co., Ltd.

COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100102 CHAOYANG, BEIJING TO: 100085 HAIDIAN, BEIJING

GR01 Patent grant