CN103347070B

CN103347070B - Method, terminal, server and system for pushing speech data

Info

Publication number: CN103347070B
Application number: CN201310268905.2A
Authority: CN
Inventors: 郭涛; 蔡经伟; 刘伟
Original assignee: Xiaomi Inc
Current assignee: Xiaomi Inc
Priority date: 2013-06-28
Filing date: 2013-06-28
Publication date: 2017-08-01
Anticipated expiration: 2033-06-28
Also published as: CN103347070A

Abstract

The invention discloses a method, terminal, server and system for pushing speech data, and belongs to the field of multimedia technology. The method includes: acquiring background noise data once every preset time interval and extracting a feature vector of the background noise data; uploading the feature vector to a server, so that the server determines the environment type corresponding to the feature vector according to a prestored correspondence between feature vectors and environment types and pushes speech data corresponding to that environment type to the terminal; and receiving the speech data pushed by the server. Because the terminal acquires background noise data, extracts its feature vector and uploads it to the server, which then determines the corresponding environment type and pushes matching speech data to the terminal, speech data can be pushed to the user according to the external environment. This meets the user's listening needs at different times and places and improves the user experience.

Description

Method, terminal, server and system for pushing speech data

Technical field

The present invention relates to the field of multimedia technology, and in particular to a method, terminal, server and system for pushing speech data.

Background

With the rapid development of science and technology, more and more terminals capable of playing speech data have come into wide use, for example MP3 (Moving Picture Experts Group Audio Layer III) players, mobile phones and tablet computers. A user can freely select speech data manually by pressing a physical button on the terminal or tapping a virtual button displayed on the terminal screen. However, when the user is in a crowded environment, or is using a terminal without a display screen, and is unwilling or unable to select speech data manually, the question of how to push speech data automatically, so as to improve the user experience and meet the user's listening needs, has received increasing attention from those skilled in the art.

Summary of the invention

Embodiments of the present invention provide a method, terminal, server and system for pushing speech data. The technical solutions are as follows:

According to a first aspect, a method for pushing speech data is provided, the method including:

acquiring background noise data once every preset time interval, and extracting a feature vector of the background noise data;

uploading the feature vector of the background noise data to a server, so that the server determines the environment type corresponding to the feature vector according to a prestored correspondence between feature vectors and environment types, and pushes speech data corresponding to the environment type to the terminal; and

receiving the speech data pushed by the server.

Preferably, acquiring background noise data once every preset time interval includes:

when background noise data is acquired for the first time, acquiring a segment of background noise data whose duration is a first preset time length;

when background noise data is not being acquired for the first time, acquiring, every preset time interval, a segment of background noise data whose duration is a second preset time length;

wherein the first preset time length is shorter than the second preset time length.

Preferably, extracting the feature vector of the background noise data includes:

decoding the background noise data to obtain an audio signal of the background noise data; and

extracting spectral features of the audio signal to obtain the feature vector of the audio signal.

Preferably, after the audio signal of the background noise data is obtained, the method further includes:

performing a frequency-domain transform on the obtained audio signal of the background noise data;

and extracting the spectral features of the audio signal includes:

extracting the spectral features of the audio signal after the frequency-domain transform.

According to a second aspect, a terminal is provided, the terminal including:

an acquisition module, configured to acquire background noise data once every preset time interval;

an extraction module, configured to extract a feature vector of the background noise data acquired by the acquisition module;

an uploading module, configured to upload the feature vector extracted by the extraction module to a server, so that the server determines the environment type corresponding to the feature vector according to a prestored correspondence between feature vectors and environment types, and pushes speech data corresponding to the environment type to the terminal; and

a receiving module, configured to receive the speech data pushed by the server.

Preferably, the acquisition module includes:

a first acquisition unit, configured to acquire, when background noise data is acquired for the first time, a segment of background noise data whose duration is the first preset time length; and

a second acquisition unit, configured to acquire, when background noise data is not being acquired for the first time, a segment of background noise data whose duration is the second preset time length every preset time interval.

Preferably, the extraction module is configured to decode the background noise data to obtain the audio signal of the background noise data, and to extract spectral features of the audio signal to obtain the feature vector of the audio signal.

Preferably, the terminal further includes:

a transform module, configured to perform a frequency-domain transform on the obtained audio signal of the background noise data;

and the extraction module is configured to extract the spectral features of the audio signal after the frequency-domain transform.

According to a third aspect, a method for pushing speech data is further provided, the method including:

receiving a feature vector of background noise data uploaded by a terminal;

determining the environment type corresponding to the feature vector of the background noise data according to a prestored correspondence between feature vectors and environment types; and

pushing speech data corresponding to the environment type to the terminal.

Preferably, before the environment type corresponding to the feature vector of the background noise data is determined according to the prestored correspondence between feature vectors and environment types, the method further includes:

setting a correspondence table between feature vectors and environment types, and storing the correspondence table;

and determining the environment type corresponding to the feature vector of the background noise data according to the prestored correspondence includes:

looking up the correspondence table between feature vectors and environment types according to the feature vector of the background noise data, to obtain the environment type corresponding to that feature vector.

Preferably, before the speech data corresponding to the environment type is pushed to the terminal, the method further includes:

setting a correspondence between environment types and speech data types; and

determining the speech data type corresponding to the environment type according to the correspondence between environment types and speech data types;

and pushing the speech data corresponding to the environment type to the terminal includes:

pushing speech data of that speech data type to the terminal.

According to a fourth aspect, a server is provided, the server including:

a receiving module, configured to receive a feature vector of background noise data uploaded by a terminal;

a first determining module, configured to determine the environment type corresponding to the feature vector of the background noise data according to a prestored correspondence between feature vectors and environment types; and

a pushing module, configured to push speech data corresponding to the environment type to the terminal.

Preferably, the server further includes:

a first setting module, configured to set a correspondence table between feature vectors and environment types; and

a storage module, configured to store the correspondence table set by the first setting module;

and the first determining module includes:

a lookup unit, configured to look up the correspondence table between feature vectors and environment types according to the feature vector of the background noise data; and

an obtaining unit, configured to obtain the environment type corresponding to the feature vector of the background noise data.

Preferably, the server further includes:

a second setting module, configured to set a correspondence between environment types and speech data types; and

a second determining module, configured to determine the speech data type corresponding to the environment type according to the correspondence set by the second setting module;

and the pushing module is configured to push speech data of that speech data type to the terminal.

According to a fifth aspect, a system for pushing speech data is provided, the system including a terminal and a server;

wherein the terminal is the terminal described above;

and the server is the server described above.

The beneficial effects of the technical solutions of the present invention are as follows:

The terminal acquires background noise data once every preset time interval, extracts a feature vector of the background noise data and uploads it to the server; the server determines the environment type corresponding to the feature vector according to the prestored correspondence between feature vectors and environment types, and pushes speech data corresponding to that environment type to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, meeting the user's listening needs at different times and places and improving the user experience.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; persons of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a method for pushing speech data according to Embodiment 1 of the present invention;

Fig. 2 is a flowchart of another method for pushing speech data according to Embodiment 1 of the present invention;

Fig. 3 is a flowchart of a method for pushing speech data according to Embodiment 2 of the present invention;

Fig. 4 is a schematic structural diagram of a terminal according to Embodiment 3 of the present invention;

Fig. 5 is a schematic diagram of the internal structure of an acquisition module according to Embodiment 3 of the present invention;

Fig. 6 is a schematic structural diagram of another terminal according to Embodiment 3 of the present invention;

Fig. 7 is a schematic structural diagram of a server according to Embodiment 4 of the present invention;

Fig. 8 is a schematic structural diagram of another server according to Embodiment 4 of the present invention;

Fig. 9 is a schematic diagram of the internal structure of a first determining module according to Embodiment 4 of the present invention;

Fig. 10 is a schematic structural diagram of yet another server according to Embodiment 4 of the present invention; and

Fig. 11 is a schematic structural diagram of a system for pushing speech data according to Embodiment 5 of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Embodiment 1

This embodiment of the present invention provides a method for pushing speech data. Taking the terminal as the entity that performs the method as an example, and referring to Fig. 1, the method flow provided by this embodiment includes:

101: Acquire background noise data once every preset time interval, and extract a feature vector of the background noise data;

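Preferably, acquiring background noise data once every preset time interval includes, but is not limited to:

when background noise data is acquired for the first time, acquiring a segment of background noise data whose duration is a first preset time length;

when background noise data is not being acquired for the first time, acquiring, every preset time interval, a segment of background noise data whose duration is a second preset time length;

wherein the first preset time length is shorter than the second preset time length.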

Preferably, extracting the feature vector of the background noise data includes, but is not limited to:

decoding the background noise data to obtain the audio signal of the background noise data;

extracting spectral features of the audio signal to obtain the feature vector of the audio signal.

Preferably, after the audio signal of the background noise data is obtained, the method further includes:

performing a frequency-domain transform on the obtained audio signal of the background noise data;

and extracting the spectral features of the audio signal includes, but is not limited to:

extracting the spectral features of the audio signal after the frequency-domain transform.

102: Upload the feature vector of the background noise data to the server, so that the server determines the environment type corresponding to the feature vector according to the prestored correspondence between feature vectors and environment types, and pushes speech data corresponding to the environment type to the terminal;

103: Receive the speech data pushed by the server.

Taking the server as the entity that performs the method as an example, and referring to Fig. 2, the method flow provided by this embodiment includes:

201: Receive the feature vector of the background noise data uploaded by the terminal;

202: Determine the environment type corresponding to the feature vector of the background noise data according to the prestored correspondence between feature vectors and environment types;

Preferably, before the environment type corresponding to the feature vector of the background noise data is determined according to the prestored correspondence, the method further includes:

setting a correspondence table between feature vectors and environment types, and storing the correspondence table;

and determining the environment type corresponding to the feature vector of the background noise data according to the prestored correspondence includes, but is not limited to:

looking up the correspondence table between feature vectors and environment types according to the feature vector of the background noise data, to obtain the environment type corresponding to that feature vector.

203: Push speech data corresponding to the environment type to the terminal.

Preferably, before the speech data corresponding to the environment type is pushed to the terminal, the method further includes:

setting a correspondence between environment types and speech data types;

determining the speech data type corresponding to the environment type according to the correspondence between environment types and speech data types;

and pushing the speech data corresponding to the environment type to the terminal includes, but is not limited to:

pushing speech data of that speech data type to the terminal.

In the method provided by this embodiment, the terminal acquires background noise data once every preset time interval, extracts its feature vector and uploads the feature vector to the server; the server determines the environment type corresponding to the feature vector according to the prestored correspondence between feature vectors and environment types, and pushes speech data corresponding to that environment type to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, meeting the user's listening needs at different times and places and improving the user experience.

Embodiment 2

This embodiment of the present invention provides a method for pushing speech data and, building on Embodiment 1, describes in detail how speech data is pushed. Referring to Fig. 3, the method flow provided by this embodiment includes:

301: The terminal acquires background noise data once every preset time interval, and extracts a feature vector of the background noise data;

The preset time interval may specifically be 10 minutes. Other values, such as 5 minutes or 6 minutes, may also be used; this embodiment does not specifically limit the length of the preset time interval.

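In addition, acquiring background noise data once every preset time interval includes, but is not limited to: when background noise data is acquired for the first time, acquiring a segment of background noise data whose duration is a first preset time length; and when background noise data is not being acquired for the first time, acquiring, every preset time interval, a segment of background noise data whose duration is a second preset time length, the first preset time length being shorter than the second preset time length.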

The above process is described in detail below with a specific example.

Assume that background noise data is first acquired at 10:00:00, the preset time interval is 10 minutes, the first preset time length is 8 seconds and the second preset time length is 10 seconds. The terminal then starts recording background noise data at 10:00:00 and stops when the clock shows 10:00:08. When the clock shows 10:10:00, the terminal starts the second recording, which lasts 10 seconds, i.e. it stops at 10:10:10. Subsequent recordings follow the same pattern: a 10-second segment of background noise data is recorded every 10 minutes. After recording, the terminal stores the background noise data in a storage medium, for example its memory or a memory card. The background noise data may also be stored in other types of storage media; this embodiment does not specifically limit the type of storage medium.
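The recording schedule in this example can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: the function name `recording_windows` and its parameters are hypothetical, and it assumes the preset interval is measured from the start of one recording to the start of the next, as in the 10:00:00 / 10:10:00 example.

```python
from datetime import datetime, timedelta


def recording_windows(first_start, interval_s=600, first_len_s=8,
                      later_len_s=10, count=3):
    """Return (begin, end) datetimes for `count` background-noise recordings.

    Hypothetical helper: the first recording uses the shorter first preset
    length; each later recording begins one preset interval after the
    previous one began and uses the second preset length.
    """
    if first_len_s >= later_len_s:
        raise ValueError("first preset length must be shorter than the second")
    windows = []
    for i in range(count):
        begin = first_start + timedelta(seconds=i * interval_s)
        length = first_len_s if i == 0 else later_len_s
        windows.append((begin, begin + timedelta(seconds=length)))
    return windows


if __name__ == "__main__":
    for begin, end in recording_windows(datetime(2013, 6, 28, 10, 0, 0)):
        print(begin.strftime("%H:%M:%S"), "->", end.strftime("%H:%M:%S"))
```

With the example's values this prints the windows 10:00:00 -> 10:00:08, 10:10:00 -> 10:10:10 and 10:20:00 -> 10:20:10.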

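In addition, extracting the feature vector of the background noise data includes, but is not limited to: decoding the background noise data to obtain its audio signal; optionally performing a frequency-domain transform on the audio signal; and extracting the spectral features of the (transformed) audio signal to obtain the feature vector of the audio signal.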

When extracting the spectral features of the audio signal, techniques such as MFCC (Mel-Frequency Cepstral Coefficients), CWT (Continuous Wavelet Transform) or STFT (Short-Time Fourier Transform) may be used. Which technique is used may depend on the circumstances; this embodiment does not specifically limit it.
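One possible reading of the STFT option is sketched below in pure Python: the decoded audio signal is cut into overlapping frames, each frame is windowed and transformed to the frequency domain, and the log energy in a few equal-width frequency bands is averaged into a feature vector. This is an illustrative assumption rather than the feature the patent specifies; the function name, frame length, hop size and band count are hypothetical, and MFCC would additionally apply a mel filter bank and a discrete cosine transform.

```python
import cmath
import math


def stft_band_features(samples, frame_len=256, hop=128, n_bands=8):
    """Average log band energies over Hann-windowed short-time frames.

    samples: decoded PCM samples as floats (the "audio signal").
    Returns a feature vector of n_bands floats, one per frequency band.
    """
    if len(samples) < frame_len:
        raise ValueError("need at least one full frame of samples")
    n_bins = frame_len // 2          # non-negative frequency bins, Nyquist dropped
    band_size = n_bins // n_bands    # equal-width bands; leftover top bins are ignored
    if band_size == 0:
        raise ValueError("n_bands must not exceed frame_len // 2")
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    totals = [0.0] * n_bands
    n_frames = 0
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        # Naive O(N^2) DFT keeps the sketch dependency-free; use an FFT in practice.
        power = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                         for n in range(frame_len))) ** 2
                 for k in range(n_bins)]
        for b in range(n_bands):
            band_energy = sum(power[b * band_size:(b + 1) * band_size])
            totals[b] += math.log10(band_energy + 1e-12)  # floor avoids log10(0)
        n_frames += 1
    return [t / n_frames for t in totals]
```

For example, a low-frequency tone concentrates its energy in the first band, so the first element of the returned vector is much larger than the last.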

302: The terminal uploads the feature vector of the background noise data to the server;

In this step, after obtaining the feature vector of the background noise data, the terminal may upload it to the server directly, or, to reduce the network transmission load, compress and package the feature vector before uploading it. This embodiment does not specifically limit which upload mode is used.

303: The server receives the feature vector of the background noise data uploaded by the terminal;

If the server receives an uncompressed feature vector, it can cache it directly; if it receives a compressed package, it first decompresses the package to obtain the feature vector and then caches the feature vector in a storage medium. The storage medium may specifically be memory or a hard disk; other types of storage media, such as flash memory or an optical disc, may also be used, and this embodiment does not specifically limit the type of storage medium.
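A minimal sketch of the optional compress-and-package path, together with the matching server-side decompression, is shown below. It assumes the feature vector is a list of floats and uses JSON and zlib from the Python standard library; the message layout (including the `"v"` version field) and the function names are hypothetical, since the patent does not specify a wire format.

```python
import json
import zlib


def pack_features(features):
    """Terminal side: serialize a feature vector to JSON and compress it."""
    payload = json.dumps({"v": 1, "features": features}, separators=(",", ":"))
    return zlib.compress(payload.encode("utf-8"))


def unpack_features(blob, compressed=True):
    """Server side: decompress if needed, then restore the feature vector."""
    raw = zlib.decompress(blob) if compressed else blob
    message = json.loads(raw.decode("utf-8"))
    return message["features"]
```

How the server knows whether a payload is compressed (here, the `compressed` flag) is also an assumption; a real protocol would typically signal this in a header.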

304: The server determines the environment type corresponding to the feature vector of the background noise data according to the prestored correspondence between feature vectors and environment types;

In this step, before the environment type corresponding to the feature vector of the background noise data is determined according to the prestored correspondence, the method further includes: setting the correspondence between feature vectors and environment types, and storing it.

The correspondence between feature vectors and environment types is usually set in one of the following two ways:

First way: the correspondence between feature vectors and environment types is set manually.

For example, when the feature vector is below a preset threshold, the environment type is set to a quiet environment; when the feature vector fluctuates randomly, the environment type is set to an outdoor irregular noisy environment; and when the feature vector fluctuates regularly, the environment type is set to an outdoor regular noisy environment. The preset threshold can be configured as needed; this embodiment does not specifically limit its value.
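The manual rules above can be sketched as follows. This is only an illustrative interpretation: it assumes the relevant part of the feature vector is a sequence of per-frame log energies, treats "fluctuates regularly" as a strong autocorrelation peak at some nonzero lag, and uses hypothetical threshold values and environment-type strings. The patent leaves the threshold configurable and does not define the fluctuation test.

```python
def classify_by_rules(levels, quiet_threshold=-3.0, periodicity_threshold=0.6):
    """Map a sequence of per-frame log energies to an environment type.

    Hypothetical rules: a low mean level means "quiet"; otherwise a strong
    normalized autocorrelation peak at some lag means the level fluctuates
    regularly, and anything else is treated as irregular.
    """
    if len(levels) < 4:
        raise ValueError("need at least four frames")
    mean = sum(levels) / len(levels)
    if mean < quiet_threshold:
        return "quiet"
    centered = [x - mean for x in levels]
    energy = sum(c * c for c in centered)
    if energy == 0.0:
        return "outdoor_regular_noisy"  # perfectly steady level, e.g. a constant hum
    best = max(
        sum(centered[i] * centered[i + lag] for i in range(len(centered) - lag)) / energy
        for lag in range(1, len(centered) // 2 + 1)
    )
    if best >= periodicity_threshold:
        return "outdoor_regular_noisy"
    return "outdoor_irregular_noisy"
```

A strictly alternating level such as `[0.0, 5.0] * 20` has a normalized autocorrelation of 0.95 at lag 2 and is classified as regular.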

Second way: the correspondence between feature vectors and environment types is set using a machine-learning method such as a support vector machine (SVM).

The server collects a sample set in advance, i.e. a small amount of background noise data, and labels each sample in the set, determining which background noise data corresponds to a quiet environment, which to an outdoor irregular noisy environment and which to an outdoor regular noisy environment. The server then computes the feature vector of each sample in the set and, using a classification algorithm, trains a classification model from these feature vectors. When a new sample arrives later, the server only needs to compute its feature vector and feed it into the classification model to obtain the corresponding environment type automatically.
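The train-then-classify flow can be illustrated with the dependency-free sketch below. It deliberately swaps in a nearest-centroid classifier, which is much simpler than the support vector machine named above; the function names and the tiny two-dimensional training set are hypothetical and serve only to show how labeled samples become a model that later assigns environment types to new feature vectors.

```python
import math


def train_centroids(samples):
    """Train a nearest-centroid model from (feature_vector, label) pairs.

    Returns {label: centroid}, where each centroid is the mean feature vector
    of that label's samples.
    """
    sums, counts = {}, {}
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(model, vec):
    """Return the label whose centroid is nearest to vec (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(model[label], vec))


if __name__ == "__main__":
    model = train_centroids([
        ([0.0, 0.0], "quiet"), ([0.0, 1.0], "quiet"),
        ([10.0, 10.0], "outdoor_irregular_noisy"),
        ([11.0, 9.0], "outdoor_irregular_noisy"),
    ])
    print(classify(model, [1.0, 0.0]))  # prints: quiet
```

Adding new labeled samples and retraining corresponds to the growing sample set described above; an SVM library would replace `train_centroids` and `classify` in practice.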

It should be noted that, because the first way sets the correspondence between feature vectors and environment types manually, its precision is lower than that of the second way, but it is simpler. Because the second way relies on machine self-learning, it adapts better, and its precision increases as the sample set grows. Which way is used when performing the method of this embodiment may depend on the circumstances; this embodiment does not specifically limit it.

In addition, once the correspondence between feature vectors and environment types has been set and recorded in a correspondence table, the table can be used directly in subsequent executions of the method. That is, this step need not be repeated each time the method is performed; the correspondence table is updated only when the correspondence changes.
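The reuse described here, building the correspondence table once and rebuilding it only when the correspondence is updated, can be sketched as a small cache. The class name, the integer version number used to detect updates, and the `build_table` callable are all hypothetical.

```python
class CorrespondenceTableCache:
    """Reuse a built feature-vector/environment-type table until it is updated.

    build_table is a hypothetical callable that builds the table for a given
    version (for example by training a classifier on the current sample set).
    """

    def __init__(self, build_table):
        self._build_table = build_table
        self._version = None
        self._table = None
        self.rebuilds = 0  # how many times the table was actually (re)built

    def get(self, version):
        """Return the table, rebuilding it only if `version` differs from the cached one."""
        if self._table is None or version != self._version:
            self._table = self._build_table(version)
            self._version = version
            self.rebuilds += 1
        return self._table
```

Repeated calls with an unchanged version return the cached table without rebuilding; bumping the version triggers exactly one rebuild.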

305: The server pushes speech data corresponding to the environment type to the terminal;

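In this step, before the server pushes the speech data corresponding to the environment type to the terminal, the method further includes:

setting a correspondence between environment types and speech data types;

determining the speech data type corresponding to the environment type according to the correspondence between environment types and speech data types;

and pushing the speech data corresponding to the environment type includes: pushing speech data of that speech data type to the terminal.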

When setting the correspondence between environment types and speech data types, a correspondence table such as Table 1 below can be used:

Table 1

| Environment type | Speech data type |
|---|---|
| Quiet environment | Light music |
| Outdoor irregular noisy environment | Rock music, pop music |
| Outdoor regular noisy environment | Country music, folk music |

For example, if step 304 determines that the environment type corresponding to a feature vector is an outdoor irregular noisy environment, Table 1 indicates that the pushed speech data should preferably be rock music or pop music.
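Step 305 combined with Table 1 amounts to two lookups: environment type to speech data types, then speech data types to concrete items. The sketch below assumes a hypothetical in-memory catalog whose entries have `title` and `genre` fields, and uses hypothetical string keys for the three environment types.

```python
# Table 1 as a lookup table (environment type -> preferred speech data types).
ENV_TO_SPEECH_TYPES = {
    "quiet": {"light music"},
    "outdoor_irregular_noisy": {"rock", "pop"},
    "outdoor_regular_noisy": {"country", "folk"},
}


def tracks_for_environment(env_type, catalog):
    """Return catalog entries whose genre matches the environment's speech data types.

    catalog: list of dicts with hypothetical "title" and "genre" keys.
    Unknown environment types yield an empty list.
    """
    wanted = ENV_TO_SPEECH_TYPES.get(env_type, set())
    return [track for track in catalog if track["genre"] in wanted]
```

Keeping the table as data rather than code lets the server update the environment-to-type correspondence without changing the selection logic.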

Preferably, when the server pushes speech data to the terminal, it not only pushes according to external environmental conditions; the method provided by this embodiment further includes pushing related speech data according to the user's preferences. For example, in a quiet environment, the light music pushed by the server is neither chosen at random nor arbitrary, but is light music the user is likely to enjoy. The method provided by this embodiment supports compiling statistics on the historical speech data the user has listened to, so as to determine which kinds of light music the user likes. This can be implemented with existing statistical analysis methods, which are not described in detail here.
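The patent leaves the preference statistics to existing methods. As one simple possibility, the sketch below counts how often each style appears in the user's listening history and ranks candidate tracks accordingly; the `style` field and the play-count heuristic are assumptions, not the patent's method.

```python
from collections import Counter


def rank_by_preference(candidates, history):
    """Order candidate tracks by how often the user played their style before.

    candidates, history: lists of dicts with a hypothetical "style" key
    (e.g. a sub-genre of light music). Ties keep the candidates' original order.
    """
    plays = Counter(track["style"] for track in history)  # missing styles count as 0
    # sorted() is stable, and reverse=True preserves that stability.
    return sorted(candidates, key=lambda track: plays[track["style"]], reverse=True)
```

In practice the candidates would be the environment-matched tracks, so the environment decides the genre and the history decides the order within it.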

306: The terminal receives the speech data pushed by the server.

After receiving the speech data pushed by the server, the terminal stores it in its own storage medium for subsequent playback. The storage medium may specifically be memory or a memory card; other types of storage media, such as flash memory or an optical disc, may also be used, and this embodiment does not specifically limit the type of storage medium.

In the method provided by this embodiment of the present invention, background noise data is acquired once every preset time interval, its feature vector is extracted and uploaded to the server, and the server determines the environment type corresponding to the feature vector according to the prestored correspondence between feature vectors and environment types and pushes speech data corresponding to that environment type to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, meeting the user's listening needs at different times and places and improving the user experience.

Embodiment 3

This embodiment of the present invention provides a terminal. Referring to Fig. 4, the terminal includes:

an acquisition module 41, configured to acquire background noise data once every preset time interval;

an extraction module 42, configured to extract a feature vector of the background noise data acquired by the acquisition module 41;

an uploading module 43, configured to upload the feature vector extracted by the extraction module 42 to a server, so that the server determines the environment type corresponding to the feature vector according to a prestored correspondence between feature vectors and environment types and pushes speech data corresponding to the environment type to the terminal; and

a receiving module 44, configured to receive the speech data pushed by the server.

Preferably, referring to Fig. 5, the acquisition module 41 includes:

a first acquisition unit 411, configured to acquire, when background noise data is acquired for the first time, a segment of background noise data whose duration is a first preset time length; and

a second acquisition unit 412, configured to acquire, when background noise data is not being acquired for the first time, a segment of background noise data whose duration is a second preset time length every preset time interval.

Preferably, the extraction module 42 is configured to decode the background noise data to obtain its audio signal, and to extract spectral features of the audio signal to obtain the feature vector of the audio signal.

Preferably, referring to Fig. 6, the terminal further includes:

a transform module 45, configured to perform a frequency-domain transform on the obtained audio signal of the background noise data;

and the extraction module 42 is configured to extract the spectral features of the audio signal after the frequency-domain transform.

The terminal provided by this embodiment of the present invention acquires background noise data once every preset time interval, extracts its feature vector and uploads the feature vector to the server; the server determines the environment type corresponding to the feature vector according to the prestored correspondence between feature vectors and environment types and pushes speech data corresponding to that environment type to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, meeting the user's listening needs at different times and places and improving the user experience.

Example IV

The embodiments of the invention provide a kind of server, referring to Fig. 7, the server includes：

Receiving module 71, the characteristic vector of the background noise data uploaded for receiving terminal；

First determining module 72, for determining background according to the characteristic vector and the corresponding relation of environmental form that prestore The corresponding environmental form of characteristic vector of noise data；

Pushing module 73, for pushing the speech data corresponding with environmental form to terminal.

It is preferred that referring to Fig. 8, the server also includes：

First setup module 74, the mapping table for setting characteristic vector and environmental form；

Memory module 75, characteristic vector and the mapping table of environmental form for the first setup module 74 to be set are entered Row storage；

Referring to Fig. 9, the first determining module 72 includes:

a lookup unit 721, configured to look up the correspondence table between feature vectors and environment types according to the feature vector of the background noise data; and

an acquiring unit 722, configured to obtain the environment type corresponding to the feature vector of the background noise data.
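
The lookup performed by units 721 and 722 can be sketched as follows. Because a measured feature vector will rarely equal a stored entry exactly, this sketch assumes that the correspondence table holds one prototype vector per environment type and returns the type of the nearest prototype by Euclidean distance. The table contents, the environment names and `lookup_environment` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical correspondence table: one prototype feature vector per environment type.
CORRESPONDENCE_TABLE: Dict[str, List[float]] = {
    "quiet_room": [0.1, 0.1, 0.0],
    "street": [0.9, 0.6, 0.3],
    "vehicle": [0.8, 0.2, 0.1],
}


def lookup_environment(feature: Sequence[float],
                       table: Dict[str, List[float]] = CORRESPONDENCE_TABLE) -> str:
    """Return the environment type whose stored vector is nearest to `feature`."""
    if not table:
        raise ValueError("correspondence table is empty")
    # math.dist (Python 3.8+) is the Euclidean distance; both vectors must have equal length.
    return min(table, key=lambda env: math.dist(feature, table[env]))
```

Nearest-prototype matching is one simple way to turn a table into a lookup for continuous vectors; a trained classifier is another.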

Preferably, referring to Fig. 10, the server further includes:

a second setting module 76, configured to set a correspondence between environment types and speech data types; and

a second determining module 77, configured to determine the speech data type corresponding to the environment type according to the correspondence between environment types and speech data types set by the second setting module 76;

in this case, the pushing module 73 is configured to push speech data corresponding to that speech data type to the terminal.
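
Modules 76, 77 and 73 place a second correspondence, from environment type to speech data type, in front of the push. The claims additionally require the pushed data to match the user's preferred speech data type, derived from statistics on the user's listening history. The sketch below assumes an in-memory catalog of `(title, type)` pairs and treats the most frequently played type as the preference. The mapping, the catalog format, the fallback rule and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical correspondence set by the second setting module: environment -> suitable types.
ENV_TO_TYPES: Dict[str, List[str]] = {
    "quiet_room": ["classical", "light_music"],
    "street": ["pop", "rock"],
    "vehicle": ["pop", "news"],
}


def preferred_type(history: Sequence[str]) -> Optional[str]:
    """Most frequently played speech data type; ties go to the type seen first."""
    if not history:
        return None
    return Counter(history).most_common(1)[0][0]


def select_for_push(env_type: str, catalog: Sequence[Tuple[str, str]],
                    history: Sequence[str], limit: int = 3) -> List[str]:
    """Titles whose type suits `env_type` and equals the user's preferred type."""
    allowed = set(ENV_TO_TYPES.get(env_type, []))
    pref = preferred_type(history)
    matches = [title for title, kind in catalog if kind in allowed and kind == pref]
    if not matches:
        # Fallback (an assumption, not from the patent): match on environment only.
        matches = [title for title, kind in catalog if kind in allowed]
    return matches[:limit]
```

With a history of `["rock", "pop", "rock"]`, a "street" environment yields only rock titles. A "vehicle" environment, whose types exclude rock, falls back to its pop and news titles.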

The server provided by this embodiment of the present invention determines the environment type corresponding to a received feature vector according to a pre-stored correspondence between feature vectors and environment types, and pushes speech data corresponding to that environment type to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, meeting the user's listening needs at different times and places and improving the user experience.
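
Claims 1, 5 and 8 state that the correspondence between feature vectors and environment types is obtained with a support vector machine (SVM). The patent gives no training details, so the following is only a sketch under stated assumptions: a linear SVM trained by Pegasos-style sub-gradient descent on the regularized hinge loss, one binary classifier per environment type (one-vs-rest), and the highest-scoring type chosen at prediction time. The hyperparameters and function names are hypothetical.

```python
from typing import Dict, List, Sequence


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 1000) -> List[float]:
    """Pegasos-style sub-gradient descent on the L2-regularized hinge loss.

    Labels must be +1 or -1. A constant 1.0 is appended to every input so the
    last weight acts as a bias (it is regularized along with the others).
    """
    w = [0.0] * (len(xs[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # deterministic cyclic order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)
            xb = list(x) + [1.0]
            violated = y * _dot(w, xb) < 1.0  # margin checked with the current weights
            w = [(1.0 - eta * lam) * wi for wi in w]  # step on the L2 term
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]  # step on the hinge term
    return w


def train_one_vs_rest(xs: Sequence[Sequence[float]], labels: Sequence[str],
                      **kwargs) -> Dict[str, List[float]]:
    """One binary SVM per environment type: that type (+1) versus all others (-1)."""
    return {c: train_linear_svm(xs, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in sorted(set(labels))}


def predict_environment(models: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Pick the environment type whose classifier gives the highest score."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: _dot(models[c], xb))
```

In practice a library implementation (for example scikit-learn's `SVC`) would normally replace the hand-written trainer. It is spelled out here only to show what "obtained through SVM machine learning" could mean mechanically.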

Embodiment five

This embodiment of the present invention provides a system for pushing speech data. Referring to Fig. 11, the system includes a terminal 1101 and a server 1102,

where the terminal 1101 is the terminal described in Embodiment three;

and the server 1102 is the server described in Embodiment four.
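
The patent does not specify how terminal 1101 and server 1102 exchange data. It states only that the terminal uploads a feature vector rather than the audio itself, and that the server pushes speech data back. A hypothetical JSON encoding of those two messages might look like this; every class and field name here is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Union


@dataclass
class FeatureUpload:
    """Terminal -> server: one extracted feature vector (hypothetical message shape)."""
    terminal_id: str
    feature_vector: List[float]


@dataclass
class PushMessage:
    """Server -> terminal: the environment type found and the speech data chosen for it."""
    environment_type: str
    items: List[str] = field(default_factory=list)


def encode(msg: Union[FeatureUpload, PushMessage]) -> str:
    # sort_keys gives a stable field order, which makes messages easy to compare and log.
    return json.dumps(asdict(msg), sort_keys=True)


def decode_upload(text: str) -> FeatureUpload:
    data = json.loads(text)
    return FeatureUpload(terminal_id=data["terminal_id"],
                         feature_vector=[float(v) for v in data["feature_vector"]])
```

Uploading only a short feature vector keeps each message small, and the server never receives raw audio from the user's surroundings.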

The system provided by this embodiment of the present invention determines the environment type corresponding to a feature vector according to a pre-stored correspondence between feature vectors and environment types, and pushes speech data corresponding to that environment type to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, meeting the user's listening needs at different times and places and improving the user experience.

Embodiment six

This embodiment of the present invention provides a device for pushing speech data. The device may include one or more of the following components: a processor for executing computer program instructions to carry out the various processes and methods; random access memory (RAM) and read-only memory (ROM) for storing information and program instructions; memory for storing data and information; databases for storing tables, directories, or other data structures; I/O devices; interfaces; antennas; and so on.

In this embodiment of the present invention, the computer program instructions are stored in the memory in the form of one or more modules and are configured to be executed by the processor. The one or more modules have the following functions:

receiving the speech data pushed by the server.

Examples of how these functions are implemented have been described in detail in the method embodiments and are not repeated here.

In summary, the device provided by this embodiment of the present invention acquires background noise data once every preset time interval, extracts a feature vector from the background noise data, and uploads the feature vector to a server. The server determines the environment type corresponding to the feature vector according to a pre-stored correspondence between feature vectors and environment types, and pushes speech data corresponding to that environment type to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, meeting the user's listening needs at different times and places and improving the user experience.

Embodiment seven

This embodiment of the present invention further provides a device for pushing speech data. The device may include one or more of the following components: a processor for executing computer program instructions to carry out the various processes and methods; random access memory (RAM) and read-only memory (ROM) for storing information and program instructions; memory for storing data and information; databases for storing tables, directories, or other data structures; I/O devices; interfaces; antennas; and so on.

In summary, the device provided by this embodiment of the present invention determines the environment type corresponding to a feature vector according to a pre-stored correspondence between feature vectors and environment types, and pushes speech data corresponding to that environment type to the terminal. Speech data can therefore be pushed to the user automatically according to the external environment, meeting the user's listening needs at different times and places and improving the user experience.

It should be noted that when the terminal, the server, and the system for pushing speech data provided in the above embodiments push speech data, the division into the functional modules described above is used only as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structures of the terminal and the server may be divided into different functional modules to complete all or some of the functions described above. In addition, the terminal, server, and system for pushing speech data provided in the above embodiments belong to the same concept as the method embodiments for pushing speech data. For their specific implementation, refer to the method embodiments, which are not repeated here.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.

The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for pushing speech data, characterized in that the method comprises:

acquiring background noise data once every preset time interval, decoding the background noise data to obtain a sound signal of the background noise data, and extracting spectral features of the sound signal to obtain a feature vector of the sound signal;

uploading the feature vector of the background noise data to a server, so that the server determines the environment type corresponding to the feature vector according to a pre-stored correspondence between feature vectors and environment types and pushes to the terminal speech data corresponding to both the environment type and the user's preferred speech data type, wherein the preferred speech data type is obtained by the server from statistics on the historical speech data the user has listened to, and the correspondence is obtained through machine learning using a support vector machine (SVM) machine learning method;

receiving the speech data pushed by the server;

wherein acquiring background noise data once every preset time interval comprises:

when background noise data is not being acquired for the first time, acquiring, once every preset time interval, a segment of background noise data whose duration is a second preset duration;

2. The method according to claim 1, characterized in that, after the sound signal of the background noise data is obtained, the method further comprises:

wherein extracting the spectral features of the sound signal comprises:

3. A terminal for pushing speech data, characterized in that the terminal comprises:

an acquisition module, configured to acquire background noise data once every preset time interval;

an uploading module, configured to upload the feature vector of the background noise data extracted by the extraction module to a server, so that the server determines the environment type corresponding to the feature vector according to a pre-stored correspondence between feature vectors and environment types and pushes to the terminal speech data corresponding to both the environment type and the user's preferred speech data type, wherein the preferred speech data type is obtained by the server from statistics on the historical speech data the user has listened to, and the correspondence is obtained through machine learning using a support vector machine (SVM) machine learning method;

wherein the acquisition module comprises:

a first acquisition unit, configured to acquire, when background noise data is acquired for the first time, a segment of background noise data whose duration is a first preset duration;

a second acquisition unit, configured to acquire, when background noise data is not being acquired for the first time, a segment of background noise data whose duration is a second preset duration once every preset time interval;

wherein the first preset duration is shorter than the second preset duration;

and the extraction module is configured to decode the background noise data to obtain the sound signal of the background noise data, and to extract the spectral features of the sound signal to obtain the feature vector of the sound signal.

4. The terminal according to claim 3, characterized in that the terminal further comprises:

5. A method for pushing speech data, characterized in that the method comprises:

receiving the feature vector of background noise data uploaded by a terminal, wherein, when acquiring background noise data for the first time, the terminal acquires a segment of background noise data whose duration is a first preset duration, and when not acquiring background noise data for the first time, the terminal acquires, once every preset time interval, a segment of background noise data whose duration is a second preset duration, the first preset duration being shorter than the second preset duration; and the feature vector is obtained by the terminal decoding the background noise data to obtain the sound signal of the background noise data and then extracting the spectral features of the sound signal;

determining the environment type corresponding to the feature vector of the background noise data according to a pre-stored correspondence between feature vectors and environment types, the correspondence being obtained through machine learning using a support vector machine (SVM) machine learning method;

pushing to the terminal speech data corresponding to both the environment type and the user's preferred speech data type, the preferred speech data type being obtained from statistics on the historical speech data the user has listened to.

6. The method according to claim 5, characterized in that, before determining the environment type corresponding to the feature vector of the background noise data according to the pre-stored correspondence between feature vectors and environment types, the method further comprises:

setting a correspondence table between feature vectors and environment types, and storing the correspondence table between feature vectors and environment types;

and determining the environment type corresponding to the feature vector of the background noise data according to the pre-stored correspondence between feature vectors and environment types comprises:

looking up the correspondence table between feature vectors and environment types according to the feature vector of the background noise data, to obtain the environment type corresponding to the feature vector of the background noise data.

7. The method according to claim 5, characterized in that, before pushing to the terminal the speech data corresponding to the environment type, the method further comprises:

setting a correspondence between environment types and speech data types;

determining the speech data type corresponding to the environment type according to the correspondence between environment types and speech data types;

pushing speech data corresponding to that speech data type to the terminal.

8. A server, characterized in that the server comprises:

a receiving module, configured to receive the feature vector of background noise data uploaded by a terminal, wherein, when acquiring background noise data for the first time, the terminal acquires a segment of background noise data whose duration is a first preset duration, and when not acquiring background noise data for the first time, the terminal acquires, once every preset time interval, a segment of background noise data whose duration is a second preset duration, the first preset duration being shorter than the second preset duration; and the feature vector is obtained by the terminal decoding the background noise data to obtain the sound signal of the background noise data and then extracting the spectral features of the sound signal;

a first determining module, configured to determine the environment type corresponding to the feature vector of the background noise data according to a pre-stored correspondence between feature vectors and environment types, the correspondence being obtained through machine learning using a support vector machine (SVM) machine learning method;

a pushing module, configured to push to the terminal speech data corresponding to both the environment type and the user's preferred speech data type, the preferred speech data type being obtained from statistics on the historical speech data the user has listened to.

9. The server according to claim 8, characterized in that the server further comprises:

a storage module, configured to store the correspondence table between feature vectors and environment types set by the first setting module;

the first determining module comprises:

a lookup unit, configured to look up the correspondence table between feature vectors and environment types according to the feature vector of the background noise data;

10. The server according to claim 8, characterized in that the server further comprises:

a second determining module, configured to determine the speech data type corresponding to the environment type according to the correspondence between environment types and speech data types set by the second setting module;

11. A system for pushing speech data, characterized in that the system comprises a terminal and a server;

wherein the terminal is the terminal according to any one of claims 3 to 4;

and the server is the server according to any one of claims 8 to 10.