CN108899036A - Method and device for processing voice data - Google Patents

Method and device for processing voice data

Info

Publication number
CN108899036A
CN108899036A (application CN201810549538.6A)
Authority
CN
China
Prior art keywords
user
information
tone
data
user intent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810549538.6A
Other languages
Chinese (zh)
Inventor
林凤绿
张驰
叶顺平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chumen Wenwen Information Technology Co Ltd
Original Assignee
Chumen Wenwen Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chumen Wenwen Information Technology Co Ltd filed Critical Chumen Wenwen Information Technology Co Ltd
Priority to CN201810549538.6A priority Critical patent/CN108899036A/en
Publication of CN108899036A publication Critical patent/CN108899036A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
        • G10: MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
                • G10L 17/00: Speaker identification or verification
                    • G10L 17/22: Interactive procedures; man-machine interfaces
        • G11: INFORMATION STORAGE
            • G11B: INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
                • G11B 20/00: Signal processing not specific to the method of recording or reproducing; circuits therefor
                    • G11B 20/10: Digital recording or reproducing
                        • G11B 20/10527: Audio or video recording; data buffering arrangements
                            • G11B 2020/10537: Audio or video recording
                                • G11B 2020/10546: Audio or video recording specifically adapted for audio data
            • G11C: STATIC STORES
                • G11C 7/00: Arrangements for writing information into, or reading information out from, a digital store
                    • G11C 7/20: Memory cell initialisation circuits, e.g. when powering up or down, memory clear, latent image memory
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06F: ELECTRIC DIGITAL DATA PROCESSING
                • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
                    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
                        • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
                            • G06F 3/0487: Interaction techniques based on GUIs using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
                                • G06F 3/0488: Interaction techniques based on GUIs using a touch-screen or digitiser, e.g. input of commands through traced gestures

Abstract

An embodiment of the present invention provides a method and device for processing voice data. The method includes: obtaining operation information from a first user; determining, based on the operation information, user intent information corresponding to the first user; if the user intent information indicates playing a voice message from a second user, obtaining, based on the user intent information, first voice message data to be played that corresponds to the user intent information, wherein the first voice message data is recorded by the second user; and playing the first voice message data. In this way, by identifying the user intent information, the voice message data to be played is obtained and played back, which enriches the functions of a smart audio device and improves its degree of intelligence.

Description

Method and device for processing voice data
Technical field
Embodiments of the present invention relate to the field of smart terminal applications, and in particular to a method and device for processing voice data.
Background art
With the rise of smart homes and the Internet of Things, smart audio devices such as smart speakers and wearable devices have developed considerably. Smart audio devices can not only interact with users but also provide voice playback functions.
At present, with the rapid development of the Internet, the voice playback function provided by smart audio devices mostly works by collecting voice data input by a user, searching the Internet for feedback information corresponding to that voice data, such as music on web pages or weather information, and playing the feedback information once it has been retrieved. However, the services provided by smart audio devices are mostly user-to-Internet interactive services. Such interaction is rather limited: it cannot provide message recording and playback services between multiple smart audio devices, nor can it provide stand-alone message recording and playback.
While using the above smart audio devices, the inventors found that existing smart audio devices have no voice mailbox function and cannot play voice messages recorded by users on other devices or on the current device. This leads to the technical problem that their functionality is limited and their degree of intelligence is low.
Summary of the invention
In view of this, embodiments of the present invention provide a method and device for processing voice data, the main purpose of which is to play voice messages recorded on other devices or on the current device by identifying user intent information, thereby improving the degree of intelligence of audio devices and enriching their functions.
To achieve the above objectives, embodiments of the present invention mainly provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a method for processing voice data. The method includes: obtaining operation information from a first user; determining, based on the operation information, user intent information corresponding to the first user; if the user intent information indicates playing a voice message from a second user, obtaining, based on the user intent information, first voice message data to be played that corresponds to the user intent information, wherein the first voice message data is recorded by the second user; and playing the first voice message data.
In a second aspect, an embodiment of the present invention provides a device for processing voice data. The device includes: an obtaining unit for obtaining operation information from a first user; a first determination unit for determining, based on the operation information, user intent information corresponding to the first user; an acquiring unit for obtaining, if the user intent information indicates playing a voice message from a second user and based on the user intent information, first voice message data to be played that corresponds to the user intent information, wherein the first voice message data is recorded by the second user; and a playback unit for playing the first voice message data.
In a third aspect, an embodiment of the present invention provides a storage medium. The storage medium includes a stored program; when the program runs, it controls the device on which the storage medium resides to execute the steps of the above method for processing voice data.
In a fourth aspect, an embodiment of the present invention provides a smart audio device. The smart audio device includes: at least one processor; and at least one memory and a bus connected to the processor, wherein the processor and the memory communicate with each other through the bus, and the processor is used to call program instructions in the memory to execute the steps of the above method for processing voice data.
With the method and device for processing voice data provided by embodiments of the present invention, after operation information from a first user is obtained, the user intent information corresponding to the first user can be determined from that operation information. Next, if the user intent information of the first user indicates playing a voice message from a second user, first voice message data to be played that corresponds to the user intent information is obtained based on the user intent information, wherein the first voice message data is recorded by the second user. Finally, the first voice message data can be played. In this way, by identifying user intent information and playing voice message data recorded on other audio devices or on the current audio device, message recording and playback services between multiple smart audio devices can be realized, as can stand-alone message recording and playback, which improves the degree of intelligence of audio devices and enriches their functions.
Brief description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a schematic structural diagram of the voice search system in Embodiment 1 of the present invention;
Fig. 2 is a first schematic flowchart of the method for processing voice data in Embodiment 1 of the present invention;
Fig. 3A is a second schematic flowchart of the method for processing voice data in Embodiment 1 of the present invention;
Fig. 3B is a third schematic flowchart of the method for processing voice data in Embodiment 1 of the present invention;
Fig. 4 is a schematic structural diagram of the device for processing voice data in Embodiment 2 of the present invention;
Fig. 5 is a schematic structural diagram of the smart audio device in Embodiment 3 of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present invention, it should be understood that the present invention may be implemented in various forms and should not be limited by the embodiments set forth here. On the contrary, these embodiments are provided so that the present invention will be thoroughly understood and the scope of the present invention can be fully conveyed to those skilled in the art.
Embodiment 1
An embodiment of the present invention provides a voice search system. Fig. 1 is a schematic structural diagram of the voice search system in Embodiment 1 of the present invention. Referring to Fig. 1, the voice search system includes: an overall control center (Controller) 101, an automatic speech recognition (ASR, Automatic Speech Recognition) service module 102, a question-and-answer (QA, Query Answer) service module 103, a dialogue management (DM, Dialogue Management) module 104, a client (Client) 105, and a text-to-speech (TTS, Text to Speech) service module 106.
The overall control center is used to determine, according to the voice operation information sent by the client and by calling the other service modules of the system, the user intent information corresponding to the operation information, and to search for the voice message data to be played that corresponds to the user intent information.
The ASR service module is used to perform speech recognition on the voice operation information sent by the overall control center, convert the voice operation information into a text recognition result, and send the text recognition result to the overall control center. The ASR service includes a streaming server module and a recognizer server module. The streaming server module mainly performs audio processing, such as audio decoding and sample-rate conversion, on the voice operation information sent by the overall control center. The recognizer server module mainly converts the processed voice data into text data and, during the conversion, returns speech feature information such as partial results, short pauses, silence, and the final result to the overall control center.
The QA service module is used, after receiving the text recognition result sent by the overall control center, to call the DM module through qa-api to perform semantic analysis on the text recognition result; the QA service is the portal service for natural language processing (NLP, Natural Language Processing).
The DM module is used for dialogue logic control: after obtaining the text recognition result sent by the overall control center, it performs semantic analysis on the text recognition result and determines the user intent information. The DM module is implemented by a query analysis (query-analysis) service module, a cache service (cache-server) module, and a natural language generation (NLG, Natural Language Generation) service module. The query-analysis service module is mainly used to complete semantic understanding, including the two functions of intent classification and entity word extraction; in practical applications, the query-analysis service module can be implemented by natural language understanding (NLU, Natural Language Understanding) technology. The cache-server module is used to query the required voice message data according to the user intent information and to store the query result so that the smart audio device where the client resides can play the voice message data; in practical applications, the cache-server module can, on the one hand, pre-store some data that changes little in order to improve retrieval speed, and on the other hand can also call an Internet search engine, such as onebox, to retrieve the required search results. The NLG service module is used to perform structured analysis, using NLG technology, on the various pieces of information in the search results found by the cache-server module, and to organize them into concise natural language according to the needs of the search, so that the user can conveniently listen to them.
The client is used to use the NLG data in the search result to initiate a request to the TTS service module to convert the NLG data in text format into voice data, which is then played on the smart audio device.
The TTS service module is used to convert text data into voice data.
In practical applications, the client is set in a smart audio device, and the smart audio device can be implemented in various forms. For example, the smart audio device described in the embodiments of the present invention may include smart home devices such as smart speakers, smart TVs, and smart set-top boxes, and portable devices such as smartphones, tablet computers, smartwatches, and smart bracelets. Of course, it may also be another type of audio device, which is not specifically limited in the embodiments of the present invention.
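The division of labor among the modules above can be sketched as a simple orchestration loop. The sketch below is illustrative only: every function body, message shape, and stored message is an assumption standing in for the real ASR, QA/DM, cache, and TTS services, which the patent does not specify at code level.

```python
# Illustrative sketch of the Controller orchestrating the ASR -> QA/DM -> TTS
# pipeline described above. All function bodies are stand-in assumptions.

def asr_service(audio: bytes) -> str:
    """Stand-in for the ASR module: audio -> text recognition result."""
    return audio.decode("utf-8")  # pretend the audio is already text

def dm_service(text: str) -> dict:
    """Stand-in for the QA/DM modules: text -> user intent information."""
    if "play" in text and "message" in text:
        return {"intent": "play_voice_message", "query": text}
    return {"intent": "unknown", "query": text}

def cache_server(intent: dict) -> str:
    """Stand-in for cache-server: look up the message matching the intent."""
    store = {"play_voice_message": "Mom says: dinner is at seven."}
    return store.get(intent["intent"], "No matching message found.")

def tts_service(text: str) -> bytes:
    """Stand-in for TTS: text -> audio bytes for the client to play."""
    return text.encode("utf-8")

def controller(audio: bytes) -> bytes:
    """Overall control center: call the other services in sequence."""
    text = asr_service(audio)
    intent = dm_service(text)
    result = cache_server(intent)
    return tts_service(result)

audio_out = controller(b"play my message")
print(audio_out.decode("utf-8"))
```

The point of the sketch is only the call order: the Controller never interprets audio or text itself; it forwards each intermediate result to the next service module, as the module descriptions above state.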
Further, in conjunction with the above voice search system, an embodiment of the present invention provides a method for processing voice data, which is applied to a smart audio device.
Fig. 2 is a first schematic flowchart of the method for processing voice data in Embodiment 1 of the present invention. Referring to Fig. 2, the method for processing voice data includes:
S201: Obtain operation information from a first user;
Specifically, depending on the type of operation performed by the first user, the above operation information may be voice operation information or touch operation information. Of course, it may also be another type of operation information, such as fingerprint operation information, which is not specifically limited in the embodiments of the present invention.
In practical applications, when the first user wants the smart audio device to play a voice message left by other users or by the first user himself, or wants to leave a message for other users or for himself, the first user can do so through voice interaction. For example, the first user may ask the smart audio device "Do I have any messages?", "Play my messages", "I want to leave a message", or "I want to create a voice reminder"; at this point, the smart audio device obtains the voice operation information from the first user. Alternatively, the first user can do so through touch operations. For example, the first user can press the play button or record button on the smart audio device, or press a play-message button or record-message button in the user interface of the smart audio device, to activate the voice playback function or the message recording function of the smart audio device and generate the corresponding operation information; at this point, the smart audio device obtains the touch operation information from the first user.
S202: Determine, based on the operation information, the user intent information corresponding to the first user;
Specifically, taking voice operation information as an example and referring to the voice search system, the following illustrates how the user intent information corresponding to the first user is determined from the operation information. After the smart audio device obtains the voice operation information from the first user, it sends the operation information to the overall control center. The overall control center calls the ASR service module, which converts the voice operation information into a text recognition result through speech recognition technology. The overall control center then sends the text recognition result to the QA service module, and the QA service module calls the DM module through qa-api to perform semantic understanding on the text recognition result. The DM module uses natural language understanding technology to perform semantic understanding on the text recognition result and determines the user intent information corresponding to the first user. In this way, the user intent information corresponding to the first user is obtained.
Application scenario 1: The user intent information indicates playing a voice message from a second user.
For example, when the above operation information is voice operation information: if the text recognition result corresponding to the voice operation information is "What messages do I have?", "Play my messages", or "Are there messages for me?", the corresponding user intent information is "play voice message"; if the text recognition result corresponding to the voice operation information is "Play A1's message to me" or "Is there a message from A1 for me?", the corresponding user intent information is "play the voice message from A1" or "play the voice message left by A1". In this case, it can be determined that the above user intent information indicates playing a voice message from a second user.
Application scenario 2: The user intent information indicates recording a voice message for a second user.
For example, when the above operation information is voice operation information: if the text recognition result corresponding to the voice operation information is "I want to leave a message", "I want to record a message", or "Create a voice reminder", the corresponding user intent information is "record voice message"; if the text recognition result is "Leave a message for A1", "Record a voice reminder for A1", or "I am B, I want to leave a message for A1", the corresponding user intent information is "record a voice message for A1" or "leave a voice message for A1". If the text recognition result corresponding to the voice operation information is "I am B, I want to leave a message for everyone", the corresponding user intent information is "B wants to leave a message". In this case, it can be determined that the above user intent information indicates recording a voice message for a second user.
In practical applications, when the obtained operation information is touch operation information, the function corresponding to the operation information can be determined as the user intent of the first user. For example, if the function corresponding to the touch operation information is playing messages, it can be determined that the user intent information of the first user is "play voice message"; if the function corresponding to the touch operation information is recording messages, it can be determined that the user intent information of the first user is "record voice message".
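The mapping from a text recognition result or a touch-operation function to user intent information, as described in the two application scenarios above, could be expressed as a small rule table. The phrase patterns, intent labels, and button names below are illustrative assumptions, not the patent's actual rules.

```python
import re

# Illustrative rule-based intent mapping for the two scenarios above.
# Patterns and intent labels are assumptions for the sake of the sketch.

PLAY_PATTERNS = ["what messages do i have", "play my messages", "play message"]
RECORD_PATTERNS = ["i want to leave a message", "record a message",
                   "create a voice reminder"]

def intent_from_text(text: str) -> str:
    """Map a text recognition result to a coarse user intent label."""
    t = text.lower()
    m = re.match(r"play (\w+)'s message", t)
    if m:  # an explicit message recorder, e.g. "Play A1's message to me"
        return f"play_voice_message:from={m.group(1)}"
    if any(p in t for p in PLAY_PATTERNS):
        return "play_voice_message"
    if any(p in t for p in RECORD_PATTERNS):
        return "record_voice_message"
    return "unknown"

def intent_from_touch(function: str) -> str:
    """Map a touch-operation function directly to a user intent label."""
    return {"play_button": "play_voice_message",
            "record_button": "record_voice_message"}.get(function, "unknown")

print(intent_from_text("Play A1's message to me"))  # play_voice_message:from=a1
print(intent_from_touch("record_button"))           # record_voice_message
```

Note the asymmetry the text describes: voice input needs semantic understanding (here crudely approximated by patterns), while a touch function maps to an intent directly.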
In a specific implementation, if it is determined that the user intent information of the first user indicates playing a voice message from a second user, S203 to S204 are executed.
S203: Obtain, based on the user intent information, first voice message data to be played that corresponds to the user intent information;
The first voice message data is recorded by a second user.
In practical applications, one smart audio device can be used by multiple users. For example, a family has four members: a mother, a father, an elder daughter, and a younger daughter; the smart speaker in the home then corresponds to four users.
Depending on the practical application scenario, the second user may be the same as the first user. For example, after the mother returns home, she can use the smart speaker in the home to play a reminder voice message she recorded for herself the previous day. The second user may also be different from the first user; for example, the mother can also use the smart speaker in the home to play the voice message her younger daughter left for her.
Of course, the second user may be one user or multiple users, such as two users or three users. This is not specifically limited in the embodiments of the present invention.
In a specific implementation, in order to obtain the first voice message data to be played, the above S203 may include the following steps:
Step 2031: Determine, based on the user intent information, the identification information corresponding to the first voice message data;
In practical applications, the identification information corresponding to the above first voice message data may be user identification information, such as the user identification of the message listener or the user identification of the message recorder, or recording time information. Of course, it may also be other information capable of identifying the voice message data, such as device identification information or a combination of several of the above pieces of information, which is not specifically limited in the embodiments of the present invention.
Specifically, the user identification information may be a user ID, a user nickname, a user name, or the like.
Step 2032: Determine, from a voice message data set, the voice message data whose label information matches the identification information as the first voice message data.
In practical applications, the voice message data set may be stored in the local storage space of the smart audio device, or in a shared storage space associated with multiple smart audio devices. Of course, it may also be stored in other external storage space, such as the storage space of a voice mailbox server, which is not specifically limited in the embodiments of the present invention.
For example, when the method for processing voice data is applied to a single smart audio device, such as multiple users exchanging voice messages through one smart speaker, the voice message data set can be stored in the local storage space of that smart speaker. When the method is applied to a voice message system, for instance one that includes a smart speaker and a smartwatch, where the first user uses the smart speaker, the second user uses the smartwatch, and the smart speaker and the smartwatch are each associated with a preset cloud shared storage space, the voice message data set can be stored in the cloud shared storage space.
In practical applications, to make it easy to quickly find the required voice message data, when the voice message data is stored, corresponding label information can be generated according to the recording time of the voice message, the recording device, the message recorder, the message listener, and the like.
In this way, after the identification information corresponding to the first voice message data is obtained through step 2031, the identification information can be matched against the label information of each piece of voice message data in the voice message data set. Finally, the voice message data in the data set whose label information matches the identification information is the required first voice message data to be played.
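The label generation at storage time and the matching of step 2032 can be sketched together as a tagged store plus a filter. The field names, the dataset contents, and the exact-match rule below are illustrative assumptions; the patent leaves the label format open.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

# Illustrative sketch of steps 2031/2032: each stored message carries label
# information (recorder, listener, device, time), and playback matches the
# identification information derived from the intent against those labels.

@dataclass
class VoiceMessage:
    audio: bytes
    recorder: str                 # message recorder's user identification
    listener: str                 # message listener's user identification
    device: str                   # recording device identification
    recorded_at: datetime = field(default_factory=datetime.now)

def find_messages(dataset: List[VoiceMessage],
                  recorder: Optional[str] = None,
                  listener: Optional[str] = None) -> List[VoiceMessage]:
    """Return messages whose labels match every identification field given;
    a None field means the intent did not constrain that label."""
    return [m for m in dataset
            if (recorder is None or m.recorder == recorder)
            and (listener is None or m.listener == listener)]

dataset = [
    VoiceMessage(b"...", recorder="Mom", listener="Dad", device="speaker-1"),
    VoiceMessage(b"...", recorder="Zhang San", listener="Li Si", device="watch-1"),
]

# "Play Mom's voice message to Dad" -> recorder=Mom, listener=Dad
hits = find_messages(dataset, recorder="Mom", listener="Dad")
print(len(hits))  # 1
```

The same `find_messages` call works whether `dataset` lives in local storage, in a shared cloud space, or on a voice mailbox server; only the storage backend behind it would differ.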
The following takes the case where the identification information of the first voice message data is at least one of the user identification of the message listener and the user identification of the message recorder as an example to illustrate how the identification information corresponding to the first voice message data is determined based on the user intent information corresponding to the first user.
In a specific implementation, the above step 2031 may include the following steps:
Step 2031a: Parse the user intent information, judge whether the user intent information meets a preset condition, and generate a judgment result;
Specifically, after the intent information of the first user is obtained: if the text structure of the user intent information matches "play A1's voice message to B1", the user intent information indicates both the user identification A1 of the message recorder and the user identification B1 of the message listener; if the text structure of the user intent information matches "play the voice message from A2" or "play the voice message left by A2", the user intent information indicates only the user identification A2 of the message recorder and does not indicate the user identification of the message listener; if the text structure of the user intent information matches "play the voice message for B2" or "play the voice message left for B2", the user intent information indicates only the user identification B2 of the message listener and does not indicate the user identification of the message recorder; if the text structure of the user intent information matches "play voice messages", the user intent information indicates neither the user identification of the message recorder nor the user identification of the message listener.
Of course, in practical applications, the above preset conditions may also take other forms and are not limited to those enumerated above, such as "play A1's voice message to B1", "play the voice message from A2", or "play voice messages"; they can be determined by those skilled in the art according to the actual situation in the specific implementation process, and are not specifically limited in the embodiments of the present invention.
Step 2031b: Obtain, based on the judgment result and according to a preset strategy, the user identification information corresponding to the first voice message data;
The user identification information is at least one of the first user identification information of the first user and the second user identification information of the second user.
Step 2031c: Determine the user identification information as the identification information.
In practical applications, different judgment results correspond to different preset strategies. Specifically, the above step 2031b may include, but is not limited to, the following three cases.
Case 1: The user intent information explicitly indicates the user identification of the message recorder, i.e., the second user identification information of the second user; the required user identification information is extracted directly from the user intent information.
In this case, the above step 2031b may include: if the judgment result shows that the user intent information meets a first preset condition, extracting the second user identification information from the user intent information.
Here, the first preset condition means that the user intent information includes the second user identification information of the second user.
For example, when the user intent information is "play Mom's voice message to Dad", Mom is the second user identification information of the second user, and Dad is the first user identification information of the first user; when the user intent information is "play Zhang San's voice message to me", Zhang San is the second user identification information of the second user.
Case 2: The user intent information explicitly indicates the user identification of the message listener, i.e., the first user identification information of the first user; the required user identification information is extracted directly from the user intent information.
In this case, the above step 2031b may include: if the judgment result shows that the user intent information meets a second preset condition, extracting the first user identification information from the user intent information.
Here, the second preset condition means that the user intent information includes the first user identification information of the first user.
For example, when the user intent information is "play Zhang San's voice message to Li Si", Zhang San is the second user identification information of the second user, and Li Si is the first user identification information of the first user; when the user intent information is "play the voice message for Wang Wu", Wang Wu is the first user identification information of the first user.
Situation three: The user intent information explicitly indicates neither the user identifier of the message listener nor that of the message recorder, so prompt information is shown in order to obtain the required user identification information.
In this case, the above step 2031b may include: if the judging result shows that the user intent information meets a third preset condition, showing the first user default prompt information corresponding to the user intent information, receiving response information from the first user, and obtaining the second user identification information and/or the first user identification information based on the response information.
Illustratively, when the user intent information is "play the voice message", the required user identifier cannot be extracted from the user intent information; default prompt information needs to be shown to the user, and the required identification information is obtained according to the user's response information.
In practical applications, the default prompt information may be a prompt message for obtaining the user identifier of the message listener, such as "May I ask who you are", or a prompt message for obtaining the identification information of the message recorder, such as "Whose voice message shall be played". Of course, it may also be a prompt message with other content, such as "Whose message to whom shall be played"; the embodiment of the present invention is not specifically limited here.
In practical applications, depending on the manner in which the intelligent audio device interacts with the user, the default prompt information may be shown in various ways. For example, the default prompt information may be announced by voice broadcast, or its content may be directly displayed on a display screen; of course, other ways are also possible, such as showing push buttons, drop-down menus, and the like on a user interface to let the user select the message listener and the message recorder.
Certainly, in practical applications, the above step 2031b may also be realized in other ways, which the embodiment of the present invention does not specifically limit.
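The three situations of step 2031b amount to a small intent-resolution routine: try to read the recorder and listener identifiers out of the utterance, and fall back to a prompt when neither is present. The following is a minimal sketch under stated assumptions — the phrase patterns, the `KNOWN_USERS` library, and the function names are all illustrative stand-ins, not the patent's actual implementation:

```python
# Minimal sketch of step 2031b: resolve recorder/listener identifiers
# from parsed user intent. All names and patterns here are illustrative
# assumptions, not taken from the original.
import re

KNOWN_USERS = {"Zhang San", "Li Si", "Wang Wu", "Mom", "Dad"}

def resolve_message_ids(intent_text, prompt_user):
    """Return (recorder_id, listener_id); prompt_user() handles situation three."""
    # Situation one (and two combined): "play <recorder>'s voice message to <listener>"
    m = re.match(r"play (.+)'s voice message to (.+)", intent_text)
    if m and m.group(1) in KNOWN_USERS:
        listener = m.group(2) if m.group(2) in KNOWN_USERS else None
        return m.group(1), listener          # first preset condition met
    # Situation two only: "play the voice message left for <listener>"
    m = re.match(r"play the voice message left for (.+)", intent_text)
    if m and m.group(1) in KNOWN_USERS:
        return None, m.group(1)              # second preset condition met
    # Situation three: neither identifier present -> show the default prompt
    return prompt_user("Whose voice message shall be played, and to whom?")

recorder, listener = resolve_message_ids(
    "play Zhang San's voice message to Li Si", lambda q: (None, None))
print(recorder, listener)  # Zhang San Li Si
```

In a deployed system the regular expressions would be replaced by the output of the semantic parser that produces the user intent information; only the three-way branching mirrors the text.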
S204: Play the first voice message data.
Specifically, after the first voice message data corresponding to the user intent information of the first user is obtained, the first voice message data can be played to the first user.
In practical applications, since multiple users may use the same device — for example, a family of four may all use one smart speaker — in order to avoid playback errors and achieve effective playback, it can first be determined, before each voice message is played, whether the obtained voice message data is addressed to the user currently operating the intelligent audio device.
In this case, in the specific implementation process, the above S204 may include: when the operation information is voice operation information, performing voiceprint recognition on the voice operation information to obtain the voiceprint feature of the first user; determining, according to the mapping relations between user voiceprint features and user identification information, the first user identification information corresponding to the voiceprint feature of the first user; matching the first user identification information against the message listener label of the first voice message data; and, if the match succeeds, playing the first voice message data.
Specifically, when the operation information is voice operation information, the voice operation information directly carries the user's voiceprint feature, and the voiceprint feature can uniquely identify the user; therefore, voiceprint recognition can be performed directly on the voice operation information to obtain the voiceprint feature of the first user. Next, the first user identification information corresponding to the voiceprint feature of the first user can be determined according to the mapping relations between user voiceprint features and user identification information. Finally, the first user identification information is matched against the message listener label of the first voice message data, and the matching result determines whether the first voice message data is addressed to the current first user. If the match succeeds, it shows that the first voice message data is exactly a message left for the first user, and the first user, as the message listener, may listen to it; at this point, the first voice message data can be played.
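The playback gate just described — resolve the operator's identity from a voiceprint, then check it against the message listener label — can be sketched as follows. The voiceprint features and the mapping table are stand-in assumptions; a real system would run an actual speaker-recognition model rather than a dictionary lookup:

```python
# Sketch of S204's matching step: play only if the current operator's
# resolved identity equals the message's listener label. Feature
# extraction is faked with a lookup; only the control flow follows the text.
VOICEPRINT_TO_USER = {"vp_dad": "Dad", "vp_mom": "Mom"}  # assumed mapping table

def extract_voiceprint(voice_operation):
    # Placeholder for real voiceprint recognition on the operation audio.
    return voice_operation["voiceprint"]

def try_play(voice_operation, message):
    feature = extract_voiceprint(voice_operation)
    first_user_id = VOICEPRINT_TO_USER.get(feature)
    if first_user_id is not None and first_user_id == message["listener_label"]:
        return f"playing message for {first_user_id}"   # match succeeded
    return "withheld"                                    # not this user's message

msg = {"listener_label": "Dad", "audio": b"..."}
print(try_play({"voiceprint": "vp_dad"}, msg))  # playing message for Dad
print(try_play({"voiceprint": "vp_mom"}, msg))  # withheld
```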
In addition, when the operation information is touch operation information, other biological features of the user, such as the user's fingerprint feature, may be collected while the user performs a touch operation on the intelligent audio device, so as to determine the user's identification information. Of course, preset identity-verification prompt information may also be shown to the first user before the first voice message data is played, so as to obtain the user identification information that the first user provides in response.
In another embodiment of the present invention, Fig. 3A is a second flow diagram of the processing method of voice data in embodiment one of the present invention. Referring to Fig. 3A, after the above steps S201 and S202 are executed, if it is determined that the user intent information of the first user indicates recording a voice message for the second user, the processing method of the voice data may further include:
S301: Collect second voice message data from the first user;
S302: Determine, according to the user intent information, the second user identification information corresponding to the second user;
In the specific implementation process, similarly to the process of determining the identification information of the first voice message data, determining the second user identification information corresponding to the second user according to the user intent information may cover, without being limited to, the following three modes.
Mode one: If the user intent information only indicates the user identifier of the message recorder, i.e., the first user identification information of the first user, the above S302 may include: extracting the first user identification information of the first user from the user intent information; and determining all user identification information in a preset user identification information library, other than the first user identification information, as the second user identification information of the second user.
As an example, assume that the preset user identification information library includes Zhang San, Li Si, and Wang Wu. If the user intent information is "Zhang San wants to leave a message", Li Si and Wang Wu can be determined as the second user identification information of the second user.
Mode two: If the user intent information explicitly indicates the user identifier of the message listener, i.e., the second user identification information of the second user, the above S302 may include: extracting the second user identification information of the second user from the user intent information.
For example, if the user intent information is "leave a message for Li Si", Li Si can be directly determined as the second user identification information of the second user.
Mode three: The user intent information explicitly indicates neither the user identifier of the message listener nor that of the message recorder, so prompt information can be shown through the intelligent audio device to obtain the required user identification information. The above S302 may include: showing the first user default prompt information corresponding to the user intent information, receiving response information from the first user, and obtaining the second user identification information and/or the first user identification information based on the response information.
For example, if the user intent information is "record a voice message", the required user identifier cannot be extracted directly from the user intent information; default prompt information needs to be shown to the user, so that the required identification information is obtained according to the user's response information.
In practical applications, the default prompt information may be a prompt message for obtaining the user identifier of the message listener, such as "Whom would you like to leave a message for", or a prompt message for obtaining the identification information of the message recorder, such as "May I ask who you are". Of course, it may also be a prompt message with other content, such as "Who is leaving a message for whom"; the embodiment of the present invention is not specifically limited here.
In practical applications, depending on the manner in which the intelligent audio device interacts with the user, the default prompt information may be shown in various ways. For example, the default prompt information may be announced by voice broadcast, or its content may be directly displayed on a display screen; of course, other ways are also possible, such as showing push buttons, drop-down menus, and the like on a user interface to let the user select the message listener and the message recorder.
Certainly, besides the embodiments listed above, in practical applications, the above S302 may also be realized in other ways, which the embodiment of the present invention does not specifically limit.
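Mode one's complement rule — everyone in the identifier library except the recorder becomes a listener — reduces to a one-line filter. A sketch under stated assumptions (the library contents mirror the Zhang San/Li Si/Wang Wu example above; the function name is illustrative):

```python
# Sketch of S302, mode one: when the intent names only the recorder,
# treat every other user in the preset identifier library as a listener.
USER_ID_LIBRARY = ["Zhang San", "Li Si", "Wang Wu"]  # preset library from the example

def listeners_for_recorder(recorder_id, library=USER_ID_LIBRARY):
    return [uid for uid in library if uid != recorder_id]

# "Zhang San wants to leave a message" -> Li Si and Wang Wu are listeners
print(listeners_for_recorder("Zhang San"))  # ['Li Si', 'Wang Wu']
```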
S303: Label the second user identification information as the message listener label corresponding to the second voice message data;
Specifically, to make it easy to quickly find the required voice message when playing messages, once the second user identification information of the second user is obtained, it can be labeled as the message listener label corresponding to the second voice message data.
S304: Store the labeled second voice message data.
In another embodiment of the present invention, in order to find the required voice message more accurately, referring to Fig. 3B, before S304 is executed, the above processing method of voice data may further include:
S305: Perform voiceprint recognition on the second voice message data to obtain the voiceprint feature of the first user;
S306: Determine, according to the mapping relations between user voiceprint features and user identification information, the first user identification information corresponding to the voiceprint feature of the first user;
S307: Label the first user identification information as the message recorder label corresponding to the second voice message data.
Specifically, to make it easy to find the required voice message more accurately when playing messages, once the first user identification information of the first user is obtained, it can be labeled as the message recorder label corresponding to the second voice message data.
After S307 is executed, S304 can be executed to store the labeled second voice message data.
Here, it should be noted that, in practical applications, only the message listener label may be marked, only the message recorder label may be marked, or both the message listener label and the message recorder label may be marked at the same time. Of course, besides labeling the second voice message data with user identification information, other information, such as recording time information and device identification information, may also be used to mark corresponding labels on the second voice message data.
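Steps S303–S307 amount to attaching metadata labels to the stored message. A small record type capturing the listener label, the optional recorder label, and the extra fields the text mentions (recording time, device identifier) might look like the sketch below; the field and function names are illustrative assumptions, not from the patent:

```python
# Sketch of the labeled message record produced by S303-S307 and stored by S304.
from dataclasses import dataclass, field
import time

@dataclass
class VoiceMessage:
    audio: bytes
    listener_label: str            # S303: who may hear it
    recorder_label: str = ""       # S307: who left it (optional)
    recorded_at: float = field(default_factory=time.time)  # extra label: recording time
    device_id: str = ""            # extra label: device identification

store = []

def save_message(audio, listener, recorder="", device_id=""):
    msg = VoiceMessage(audio, listener, recorder, device_id=device_id)
    store.append(msg)              # S304: store the labeled data
    return msg

m = save_message(b"...", listener="Li Si", recorder="Zhang San", device_id="speaker-1")
print(m.listener_label, m.recorder_label)  # Li Si Zhang San
```

Keeping both labels, as the text notes, lets playback filter either by listener (who is asking) or by recorder (whose message was requested).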
So far, the processing of the voice data is completed.
As can be seen from the above, with the technical solution provided by the embodiment of the present invention, after the operation information from the first user is obtained, the user intent information corresponding to the first user can be determined according to the operation information; next, if the user intent information of the first user indicates playing a voice message from the second user, the first voice message data to be played corresponding to the user intent information is obtained based on the user intent information, wherein the first voice message data is recorded by the second user; finally, the first voice message data can be played. In this way, by identifying user intent information, voice message data recorded on another audio device or on the current audio device can be played, so that message recording and playing services can be realized among multiple intelligent audio devices, and standalone message recording and playing services can also be realized; thus, the intelligence of the audio device is improved and its functions are enriched.
Embodiment two
Based on the same inventive concept, as an implementation of the above method, an embodiment of the present invention provides a processing device of voice data. This device embodiment corresponds to the foregoing method embodiment; for ease of reading, this device embodiment does not repeat the details of the foregoing method embodiment one by one, but it should be understood that the device in this embodiment can correspondingly realize the full content of the foregoing method embodiment.
Fig. 4 is a structural diagram of the processing device of voice data in embodiment two of the present invention. Referring to Fig. 4, the device 40 includes: an obtaining unit 401 for obtaining the operation information from the first user; a first determination unit 402 for determining, based on the operation information, the user intent information corresponding to the first user; an acquiring unit 403 for, if the user intent information indicates playing a voice message from the second user, obtaining, based on the user intent information, the first voice message data to be played corresponding to the user intent information, wherein the first voice message data is recorded by the second user; and a broadcast unit 404 for playing the first voice message data.
In an embodiment of the present invention, the acquiring unit is also used to determine, based on the user intent information, the identification information corresponding to the first voice message data, and to determine, from a voice message data set, the voice message data whose label information matches the identification information as the first voice message data.
In an embodiment of the present invention, the acquiring unit is also used to parse the user intent information, judge whether the user intent information meets a preset condition, and generate a judging result; to obtain, based on the judging result and according to a preset strategy, the user identification information corresponding to the first voice message data, wherein the user identification information is at least one of the first user identification information of the first user and the second user identification information of the second user; and to determine the user identification information as the identification information.
In an embodiment of the present invention, the acquiring unit is also used to: extract the second user identification information from the user intent information if the judging result shows that the user intent information meets the first preset condition; extract the first user identification information from the user intent information if the judging result shows that the user intent information meets the second preset condition; and, if the judging result shows that the user intent information meets the third preset condition, show the first user the default prompt information corresponding to the user intent information, receive the response information from the first user, and obtain the second user identification information and/or the first user identification information based on the response information.
In an embodiment of the present invention, the broadcast unit is used to: when the operation information is voice operation information, perform voiceprint recognition on the voice operation information to obtain the voiceprint feature of the first user; determine, according to the mapping relations between user voiceprint features and user identification information, the first user identification information corresponding to the voiceprint feature of the first user; match the first user identification information against the message listener label of the first voice message data; and play the first voice message data if the match succeeds.
In other embodiments of the present invention, the above device further includes: a collection unit for collecting the second voice message data from the first user if the user intent information indicates recording a voice message for the second user; a second determination unit for determining, according to the user intent information, the second user identification information corresponding to the second user; a first marking unit for labeling the second user identification information as the message listener label corresponding to the second voice message data; and a storage unit for storing the labeled second voice message data.
In another embodiment of the present invention, the above device further includes: a recognition unit for performing voiceprint recognition on the second voice message data to obtain the voiceprint feature of the first user; a third determination unit for determining, according to the mapping relations between user voiceprint features and user identification information, the first user identification information corresponding to the voiceprint feature of the first user; and a second marking unit for labeling the first user identification information as the message recorder label corresponding to the second voice message data.
The processing device of voice data introduced in this embodiment is a device that can execute the processing method of voice data in the embodiment of the present invention. Therefore, based on the processing method of voice data described in the embodiment of the present invention, those skilled in the art can understand the specific implementation of the processing device of voice data of this embodiment and its various variations, so how this device realizes the processing method of voice data in the embodiment of the present invention is not discussed in detail here. Any device adopted by those skilled in the art to implement the processing method of voice data in the embodiment of the present invention falls within the scope to be protected by this application.
In practical applications, the processing device of voice data can be applied in an intelligent audio device, and the intelligent audio device can be implemented in various forms. For example, the intelligent audio device described in the embodiment of the present invention may include smart home devices such as a smart speaker, a smart television, and a smart set-top box, as well as portable devices such as a smart phone, a tablet computer, a smart watch, and a smart bracelet. Of course, it may also be another type of audio device, which is not specifically limited in the embodiment of the present invention.
Embodiment three
Based on the same inventive concept, an embodiment of the present invention provides an intelligent audio device. Fig. 5 is a structural diagram of the intelligent audio device in embodiment three of the present invention. Referring to Fig. 5, the intelligent audio device 50 includes: at least one processor 51; and at least one memory 52 and a bus 53 connected with the processor 51; wherein the processor 51 and the memory 52 complete mutual communication through the bus 53, and the processor 51 is used to call the program instructions in the memory 52 to execute the following steps: obtaining the operation information from the first user; determining, based on the operation information, the user intent information corresponding to the first user; if the user intent information indicates playing a voice message from the second user, obtaining, based on the user intent information, the first voice message data to be played corresponding to the user intent information, wherein the first voice message data is recorded by the second user; and playing the first voice message data.
In an embodiment of the present invention, the above processor, when calling the program instructions, can also execute the following steps: determining, based on the user intent information, the identification information corresponding to the first voice message data; and determining, from the voice message data set, the voice message data whose label information matches the identification information as the first voice message data.
In an embodiment of the present invention, the above processor, when calling the program instructions, can also execute the following steps: parsing the user intent information, judging whether the user intent information meets a preset condition, and generating a judging result; obtaining, based on the judging result and according to a preset strategy, the user identification information corresponding to the first voice message data, wherein the user identification information is at least one of the first user identification information of the first user and the second user identification information of the second user; and determining the user identification information as the identification information.
In an embodiment of the present invention, the above processor, when calling the program instructions, can also execute the following steps: extracting the second user identification information from the user intent information if the judging result shows that the user intent information meets the first preset condition; extracting the first user identification information from the user intent information if the judging result shows that the user intent information meets the second preset condition; and, if the judging result shows that the user intent information meets the third preset condition, showing the first user the default prompt information corresponding to the user intent information, receiving the response information from the first user, and obtaining the second user identification information and/or the first user identification information based on the response information.
In an embodiment of the present invention, the above processor, when calling the program instructions, can also execute the following steps: when the operation information is voice operation information, performing voiceprint recognition on the voice operation information to obtain the voiceprint feature of the first user; determining, according to the mapping relations between user voiceprint features and user identification information, the first user identification information corresponding to the voiceprint feature of the first user; matching the first user identification information against the message listener label of the first voice message data; and playing the first voice message data if the match succeeds.
In an embodiment of the present invention, the above processor, when calling the program instructions, can also execute the following steps: if the user intent information indicates recording a voice message for the second user, collecting the second voice message data from the first user; determining, according to the user intent information, the second user identification information corresponding to the second user; labeling the second user identification information as the message listener label corresponding to the second voice message data; and storing the labeled second voice message data.
In an embodiment of the present invention, the above processor, when calling the program instructions, can also execute the following steps: performing voiceprint recognition on the second voice message data to obtain the voiceprint feature of the first user; determining, according to the mapping relations between user voiceprint features and user identification information, the first user identification information corresponding to the voiceprint feature of the first user; and labeling the first user identification information as the message recorder label corresponding to the second voice message data.
An embodiment of the present invention also provides a processor, the processor being used to run a program, wherein the program, when running, executes the processing method of the voice data in the above embodiment.
The above processor may be realized by a central processing unit (CPU), a micro processor unit (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like. The memory may include non-volatile memory in a computer-readable medium, and forms such as random access memory (RAM) and/or non-volatile memory, e.g., read-only memory (ROM) or flash memory (Flash RAM); the memory includes at least one storage chip.
Embodiment four
Based on the same inventive concept, this embodiment provides a storage medium, the storage medium storing one or more programs, and the one or more programs can be executed by one or more processors to realize the processing method of voice data in the above embodiment.
It should be understood by those skilled in the art that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, compact disc read-only memory (CD-ROM), optical memory, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can guide a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufacture including an instruction device, the instruction device realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; thus, the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPU), an input/output interface, a network interface, and memory.
The memory may include non-volatile memory in a computer-readable medium, and forms such as RAM and/or non-volatile memory, e.g., ROM or Flash RAM. The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology. The information can be computer-readable instructions, data structures, modules of a program, or other data. The computer-readable storage medium can be a memory such as ROM, programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), ferromagnetic random access memory (FRAM), flash memory, magnetic surface storage, an optical disc, or compact disc read-only memory (CD-ROM); it can also be flash memory or another memory technology, CD-ROM, a digital versatile disc (DVD) or other optical storage, a magnetic cassette, magnetic tape or magnetic disk storage or another magnetic storage device, or any other non-transmission medium that can be used to store information accessible by a computing device; it can also be any of various electronic devices including one or any combination of the above memories, such as a mobile phone, a computer, a tablet device, or a personal digital assistant. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device including a series of elements not only includes those elements, but also includes other elements that are not explicitly listed, or further includes elements intrinsic to such a process, method, commodity, or device. In the absence of more restrictions, an element limited by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device including the element.
It will be understood by those skilled in the art that the embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention can take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical memory, and the like) containing computer-usable program code.
The above are merely embodiments of the present invention and are not intended to limit the invention. To those skilled in the art, the invention may be variously modified and varied. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.

Claims (10)

1. A method for processing voice data, characterized in that the method comprises:
obtaining operation information from a first user;
determining, based on the operation information, user intent information corresponding to the first user;
if the user intent information indicates playing a voice message from a second user, obtaining, based on the user intent information, first voice message data to be played corresponding to the user intent information, wherein the first voice message data is recorded by the second user;
playing the first voice message data.
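The claim-1 flow (obtain operation information, determine intent, fetch the second user's message, play it) can be sketched as follows. This is an illustrative sketch only, not the patent's implementation; all names (`MessageStore`, `determine_intent`, `handle_operation`) and the toy intent parser are hypothetical.

```python
# Illustrative sketch of the claim-1 flow; all names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class VoiceMessage:
    listener_label: str   # identifier of the intended listener (claim 6's label)
    recorder_id: str      # identifier of the user who recorded it
    audio: bytes

@dataclass
class MessageStore:
    messages: list = field(default_factory=list)

    def find(self, identification_info):
        # Claim 2: select the stored data whose label matches the identification info.
        return [m for m in self.messages if m.listener_label == identification_info]

def determine_intent(operation_info):
    # Toy intent parser: "play ..." maps to a play-message intent.
    if "play" in operation_info["text"]:
        return {"action": "play_message", "listener_id": operation_info["user_id"]}
    return {"action": "unknown"}

def play(audio):
    pass  # stand-in for actual audio output

def handle_operation(operation_info, store):
    intent = determine_intent(operation_info)      # step 2: derive user intent
    if intent["action"] == "play_message":         # step 3: intent says "play"
        msgs = store.find(intent["listener_id"])
        for m in msgs:
            play(m.audio)                          # step 4: play the data
        return msgs
    return []
```

Here the lookup key is the first user's own identifier, i.e. the "second preset condition" branch of claim 4; other branches would key on the recorder instead.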
2. The method according to claim 1, characterized in that obtaining, based on the user intent information, the first voice message data to be played corresponding to the user intent information comprises:
determining, based on the user intent information, identification information corresponding to the first voice message data;
determining, from a voice message data set, the voice message data whose label information matches the identification information as the first voice message data.
3. The method according to claim 2, characterized in that determining, based on the user intent information, the identification information corresponding to the first voice message data comprises:
parsing the user intent information, judging whether the user intent information meets a preset condition, and generating a judgment result;
obtaining, based on the judgment result and according to a preset strategy, user identification information corresponding to the first voice message data, wherein the user identification information is at least one of first user identification information of the first user and second user identification information of the second user;
determining the user identification information as the identification information.
4. The method according to claim 3, characterized in that obtaining, based on the judgment result and according to the preset strategy, the user identification information corresponding to the first voice message data comprises:
if the judgment result indicates that the user intent information meets a first preset condition, extracting the second user identification information from the user intent information;
if the judgment result indicates that the user intent information meets a second preset condition, extracting the first user identification information from the user intent information;
if the judgment result indicates that the user intent information meets a third preset condition, presenting to the first user default prompt information corresponding to the user intent information, receiving response information from the first user, and obtaining, based on the response information, the second user identification information and/or the first user identification information.
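Claims 3 and 4 can be read as a three-branch strategy over the parsed intent: use the recorder's identifier if the intent names one, else the listener's, else prompt the first user. A minimal hypothetical sketch, assuming the intent is a dict and `prompt_user` is a callback supplied by the device:

```python
# Hypothetical illustration of the claim-3/4 preset-condition strategy.
def resolve_identification(intent, prompt_user):
    """Return the identification info used to look up the message.

    intent: dict that may carry 'recorder_id' (second user) and/or
    'listener_id' (first user); prompt_user: callback that shows a prompt
    and returns the first user's response as a dict.
    """
    # First preset condition: the intent names the recorder (second user).
    if "recorder_id" in intent:
        return intent["recorder_id"]
    # Second preset condition: the intent names the listener (first user).
    if "listener_id" in intent:
        return intent["listener_id"]
    # Third preset condition: neither is present, so show the default
    # prompt and derive the identifier from the response information.
    response = prompt_user("Whose message would you like to hear?")
    return response.get("recorder_id") or response.get("listener_id")
```

The three branches mirror claim 4's three judgment outcomes; what the preset conditions actually test (e.g. which slots the intent parser filled) is left open by the claims.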
5. The method according to claim 1, characterized in that playing the first voice message data comprises:
when the operation information is voice operation information, performing voiceprint recognition on the voice operation information to obtain a voiceprint feature of the first user;
determining, according to a mapping relationship between user voiceprint features and user identification information, the first user identification information corresponding to the voiceprint feature of the first user;
matching the first user identification information against a message-listener label of the first voice message data;
if the match succeeds, playing the first voice message data.
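Claim 5 gates playback on a voiceprint match: only the user whose voiceprint maps to the message's listener label gets to hear it. A minimal sketch, with `extract_voiceprint` standing in for the unspecified voiceprint-recognition step and the mapping table hypothetical:

```python
# Sketch of the claim-5 playback gate; the voiceprint extraction is a stand-in.
VOICEPRINT_TO_USER = {"vp-alice": "alice"}   # claim 5's voiceprint -> identity mapping

def extract_voiceprint(voice_operation):
    # Placeholder: a real system would run voiceprint recognition on the audio.
    return voice_operation["voiceprint"]

def maybe_play(voice_operation, message):
    vp = extract_voiceprint(voice_operation)
    user_id = VOICEPRINT_TO_USER.get(vp)           # map voiceprint to identity
    if user_id == message["listener_label"]:       # match against listener label
        return True   # a real device would play message["audio"] here
    return False
```

The effect is that a message recorded "for Alice" is refused to any speaker whose voiceprint does not resolve to Alice's identifier.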
6. The method according to claim 1, characterized in that after determining the user intent information corresponding to the first user based on the operation information, the method further comprises:
if the user intent information indicates recording a voice message for the second user, collecting second voice message data from the first user;
determining, according to the user intent information, the second user identification information corresponding to the second user;
labeling the second user identification information as the message-listener label corresponding to the second voice message data;
storing the labeled second voice message data.
7. The method according to claim 6, characterized in that before storing the labeled second voice message data, the method further comprises:
performing voiceprint recognition on the second voice message data to obtain the voiceprint feature of the first user;
determining, according to the mapping relationship between user voiceprint features and user identification information, the first user identification information corresponding to the voiceprint feature of the first user;
labeling the first user identification information as a message-recorder label corresponding to the second voice message data.
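Claims 6 and 7 describe the recording side: the new message is labeled with the intended listener's identifier (from the intent) and, before storage, with a recorder label derived from the speaker's voiceprint. A hypothetical sketch under the same assumptions as above (dict-shaped messages, a voiceprint-to-identity table):

```python
def record_message(intent, audio, voiceprint_to_user, voiceprint, store):
    """Label and store a new voice message per claims 6-7 (names hypothetical)."""
    message = {
        "audio": audio,
        # Claim 6: the message-listener label comes from the user intent.
        "listener_label": intent["listener_id"],
        # Claim 7: the message-recorder label comes from voiceprint recognition.
        "recorder_label": voiceprint_to_user.get(voiceprint),
    }
    store.append(message)   # claim 6, final step: store the labeled data
    return message
```

Labeling the recorder via voiceprint (rather than, say, a login) means the stored message identifies who actually spoke it, which is what lets claim 5's playback gate work later.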
8. An apparatus for processing voice data, characterized in that the apparatus comprises:
an obtaining unit, configured to obtain operation information from a first user;
a first determination unit, configured to determine, based on the operation information, user intent information corresponding to the first user;
an acquiring unit, configured to, if the user intent information indicates playing a voice message from a second user, obtain, based on the user intent information, first voice message data to be played corresponding to the user intent information, wherein the first voice message data is recorded by the second user;
a playback unit, configured to play the first voice message data.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, a device on which the storage medium resides is controlled to execute the steps of the method for processing voice data according to any one of claims 1 to 7.
10. An intelligent audio device, characterized in that the intelligent audio device comprises:
at least one processor;
and at least one memory and a bus connected to the processor;
wherein the processor and the memory communicate with each other via the bus; and the processor is configured to call program instructions in the memory to execute the steps of the method for processing voice data according to any one of claims 1 to 7.
CN201810549538.6A 2018-05-31 2018-05-31 A kind of processing method and processing device of voice data Pending CN108899036A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810549538.6A CN108899036A (en) 2018-05-31 2018-05-31 A kind of processing method and processing device of voice data


Publications (1)

Publication Number Publication Date
CN108899036A (en) 2018-11-27

Family

ID=64344022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810549538.6A Pending CN108899036A (en) 2018-05-31 2018-05-31 A kind of processing method and processing device of voice data

Country Status (1)

Country Link
CN (1) CN108899036A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101286344A (en) * 2008-05-27 2008-10-15 深圳华普数码有限公司 Word-leaving system
CN201312322Y (en) * 2008-11-14 2009-09-16 孟秀娟 Network audio sharing system
CN103916443A (en) * 2013-01-07 2014-07-09 上海博路信息技术有限公司 Method for sharing mobile phone sound recording capacity
CN106777099A (en) * 2016-12-14 2017-05-31 掌阅科技股份有限公司 The processing method of business speech data, device and terminal device
CN107644640A (en) * 2016-07-22 2018-01-30 佛山市顺德区美的电热电器制造有限公司 A kind of information processing method and home appliance
CN107977183A (en) * 2017-11-16 2018-05-01 百度在线网络技术(北京)有限公司 voice interactive method, device and equipment
CN108023737A (en) * 2010-04-15 2018-05-11 三星电子株式会社 Method and apparatus for transmitting digital content from from computer to mobile hand-held device


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020107360A1 (en) * 2018-11-30 2020-06-04 华为技术有限公司 Voice recognition method, device and system
CN111339348A (en) * 2018-12-19 2020-06-26 北京京东尚科信息技术有限公司 Information service method, device and system
CN109889643A (en) * 2019-03-29 2019-06-14 广东小天才科技有限公司 A kind of tone information broadcasting method and device and storage medium
CN109889644A (en) * 2019-03-29 2019-06-14 广东小天才科技有限公司 A kind of tone information listens to method and apparatus and storage medium
CN110413250A (en) * 2019-06-14 2019-11-05 华为技术有限公司 A kind of voice interactive method, apparatus and system
WO2020249091A1 (en) * 2019-06-14 2020-12-17 华为技术有限公司 Voice interaction method, apparatus, and system
CN112256947A (en) * 2019-07-05 2021-01-22 北京猎户星空科技有限公司 Method, device, system, equipment and medium for determining recommendation information
CN112256947B (en) * 2019-07-05 2024-01-26 北京猎户星空科技有限公司 Recommendation information determining method, device, system, equipment and medium
CN113470656A (en) * 2020-07-09 2021-10-01 青岛海信电子产业控股股份有限公司 Intelligent voice interaction device and voice message leaving method under target scene
CN112087669A (en) * 2020-08-07 2020-12-15 广州华多网络科技有限公司 Method and device for presenting virtual gift and electronic equipment

Similar Documents

Publication Publication Date Title
CN108899036A (en) A kind of processing method and processing device of voice data
US11564090B1 (en) Audio verification
CN105120304B (en) Information display method, apparatus and system
CN1910654B (en) Method and system for determining the topic of a conversation and obtaining and presenting related content
US20200126566A1 (en) Method and apparatus for voice interaction
CN108846054A (en) A kind of audio data continuous playing method and device
WO2021083071A1 (en) Method, device, and medium for speech conversion, file generation, broadcasting, and voice processing
JP2020525903A (en) Managing Privilege by Speaking for Voice Assistant System
CN107895578A (en) Voice interactive method and device
US11463772B1 (en) Selecting advertisements for media programs by matching brands to creators
US11250857B1 (en) Polling with a natural language interface
JP6783339B2 (en) Methods and devices for processing audio
US11580982B1 (en) Receiving voice samples from listeners of media programs
US11776541B2 (en) Communicating announcements
CN107342088B (en) Method, device and equipment for converting voice information
US10838954B1 (en) Identifying user content
CN107943914A (en) Voice information processing method and device
CN109346057A (en) A kind of speech processing system of intelligence toy for children
JP2000207170A (en) Device and method for processing information
CN112673641B (en) Inline response to video or voice messages
CN110379406A (en) Voice remark conversion method, system, medium and electronic equipment
CN110046242A (en) A kind of automatic answering device and method
CN112599130A (en) Intelligent conference system based on intelligent screen
JP6322125B2 (en) Speech recognition apparatus, speech recognition method, and speech recognition program
US11790913B2 (en) Information providing method, apparatus, and storage medium, that transmit related information to a remote terminal based on identification information received from the remote terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20181127