CN110113646A

CN110113646A - Intelligent interaction processing method, system and storage medium based on AI voice

Info

Publication number: CN110113646A
Application number: CN201910239885.3A
Authority: CN
Inventors: 周胜杰
Original assignee: Shenzhen Konka Electronic Technology Co Ltd
Current assignee: Shenzhen Konka Electronic Technology Co Ltd
Priority date: 2019-03-27
Filing date: 2019-03-27
Publication date: 2019-08-09
Anticipated expiration: 2039-03-27
Also published as: CN110113646B

Abstract

The invention discloses an intelligent interaction processing method, system and storage medium based on AI voice. The method comprises: connecting in advance, to a smart TV, a smart camera that has a far-field voice module with voiceprint recognition, so that the user interacts with the smart TV through the far-field voice module of the smart camera; the smart camera capturing in real time and obtaining the voice and image information of the user, and analyzing and processing that information using a pre-built AI home intelligent interaction scenario database corresponding to user behavior data; and the smart TV, according to the result of the analysis and processing, predicting the user's behavioral habits and carrying out a corresponding interaction response. The invention provides an AI-voice-based intelligent interaction processing method and system that facilitate intelligent recognition and interactive recommendation, giving the smart TV better intelligent interaction functions and making it more convenient to use.

Description

Intelligent interaction processing method, system and storage medium based on AI voice

Technical field

The present invention relates to the technical field of smart homes, and in particular to an intelligent interaction processing method, system and storage medium based on AI voice.

Background technique

With the progress of science and technology, intelligent consumer electronics are gradually becoming widespread. Voiceprint recognition, one of the AI voice technologies, is currently a relatively cutting-edge technology: it can identify the voice attributes of a speaker (such as gender and age) and can attribute speech to a speaker (that is, determine from the voiceprint which user said a given sentence).

Current applications of voiceprint recognition remain at an early stage. They are essentially limited to recognizing basic voiceprint attributes (for example male, female, elderly, child, and voiceprint ownership, i.e. whose voiceprint it is), and application-layer development of AI home scenarios based on voiceprint recognition technology is lacking.

Smart TVs of the prior art do not yet offer better intelligent interaction functions and are sometimes inconvenient for users.

Therefore, the prior art still needs to be improved and developed.

Summary of the invention

In view of the above deficiencies of the prior art, the purpose of the present invention is to provide an intelligent interaction processing method, system and storage medium based on AI voice that facilitate intelligent recognition and interactive recommendation, giving the smart TV better intelligent interaction functions and making it more convenient to use.

In order to achieve the above object, the present invention adopts the following technical solutions:

An intelligent interaction processing method based on AI voice, comprising the following steps:

A. connecting in advance, to a smart TV, a smart camera that has a far-field voice module with voiceprint recognition, for interacting with the smart TV through the far-field voice module of the smart camera;

B. the smart camera capturing in real time and obtaining the voice and image information of the user, and analyzing and processing that information using a pre-built AI home intelligent interaction scenario database corresponding to user behavior data;

C. the smart TV, according to the result of the analysis and processing, predicting the user's behavioral habits and carrying out a corresponding interaction response.

In the intelligent interaction processing method based on AI voice, step A further includes: A1. building in advance an AI home intelligent interaction scenario database corresponding to user behavior data.

In the intelligent interaction processing method based on AI voice, step B includes:

the smart camera being in a working state when the smart TV is powered on;

the smart camera capturing in real time and obtaining the voice and image information of the user, listening to the user's speech, and passing the recording of the user's speech on for AI home intelligent interaction processing;

the AI home intelligent interaction processing analyzing and processing the voice and image information of the user using the pre-built AI home intelligent interaction scenario database corresponding to user behavior data;

predicting according to the user's behavioral habits, and continuously learning and correcting according to the user's interactive behavior.

In the intelligent interaction processing method based on AI voice, the step in step B of analyzing and processing the voice and image information of the user using the pre-built AI home intelligent interaction scenario database corresponding to user behavior data includes:

performing semantic recognition of the voice command and scenario construction classification;

performing voiceprint attribute analysis of the current user, voiceprint emotional feature analysis, face recognition analysis, user home scene analysis, user mood analysis, and scene history record analysis;

intelligently creating user system big data, and analyzing and processing the user's voice commands by constructing AI home intelligent interaction scenarios.

In the intelligent interaction processing method based on AI voice, the step of performing semantic recognition of the voice command and scenario construction classification includes:

performing semantic recognition by decomposing the voice command: analyzing whether the user's utterance belongs to the instruction class or the scenario construction class;

the step of performing voiceprint attribute analysis of the current user includes:

performing voiceprint attribute recognition of the current user: which voiceprint users appeared at the same time;

the voiceprint emotional feature analysis includes: what scene the voiceprints appear in, what each person's voiceprint scene is, and what the overall scene is;

the face recognition analysis includes: who appeared with whom at the same time, what their expressions are, and what the time is;

the user home scene analysis is obtained from the smart camera's view and analyzed against predetermined templates;

the user mood analysis is performed through the voiceprint, voiceprint emotional features, facial expression and scene;

the scene history record analysis includes: which voiceprint scene combination gave rise to which processing event, when it occurred, and what interaction the user carried out afterwards, so that the user's next behavior is predicted through historical data analysis and some pre-processed output is produced.
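The scene history record analysis described above can be illustrated with a minimal sketch. It assumes that a "voiceprint scene combination" can be represented as a set of voiceprint identifiers and that prediction is a simple frequency count of past follow-up interactions; the class name `SceneHistory`, its methods and the interaction labels are hypothetical and are not taken from the patent, which does not specify a prediction algorithm. Timestamps ("when it occurred") are omitted for brevity.

```python
from collections import Counter, defaultdict
from typing import Iterable, Optional


class SceneHistory:
    """Stores what users did after each voiceprint combination appeared."""

    def __init__(self) -> None:
        # Maps a frozenset of voiceprint IDs to counts of follow-up interactions.
        self._follow_ups = defaultdict(Counter)

    def record(self, voiceprints: Iterable[str], interaction: str) -> None:
        """Log one interaction that followed the given voiceprint combination."""
        self._follow_ups[frozenset(voiceprints)][interaction] += 1

    def predict_next(self, voiceprints: Iterable[str]) -> Optional[str]:
        """Return the most frequent past follow-up, or None for an unseen combination."""
        counts = self._follow_ups.get(frozenset(voiceprints))
        if not counts:
            return None
        return counts.most_common(1)[0][0]


# Usage: two users who mostly played games together in the past.
history = SceneHistory()
history.record(["user_a", "user_b"], "play_game")
history.record(["user_a", "user_b"], "play_game")
history.record(["user_a", "user_b"], "watch_tv")
print(history.predict_next(["user_b", "user_a"]))
```

Keying the history on an unordered set means the same group of people is matched regardless of who speaks first, which fits the idea of a combined voiceprint scene.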

In the intelligent interaction processing method based on AI voice, step C includes:

the smart TV creating a user attribute record according to the result of the analysis and processing, and using the user's ID, voiceprint attribute and face attribute as identifiers of the user, so that the user can be located through any one of these three attributes;

when an unfamiliar voiceprint or face is detected, creating a user attribute record by default, and intelligently adding the voiceprint attribute corresponding to the user through subsequent interaction; conversely, if the user ID was first created from the voiceprint attribute, intelligently adding the user's face attribute through subsequent interaction;

after a user is successfully created, automatically creating a big-data table based on the user ID, the table recording the user's various behavior records, interaction records and exchange records;

predicting according to the user's behavioral habits, and continuously learning and correcting according to the user's interactive behavior;

after AI home intelligent interaction decomposition of the user's voice and image information, obtaining the user's pre-execution operation, or recommending the best interaction scene to the user and giving a corresponding prompt.
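The user attribute record in step C, located through any one of user ID, voiceprint attribute or face attribute and completed over later interactions, could look roughly like the sketch below. It assumes voiceprints and faces have already been reduced to opaque string keys by some upstream recognizer; `UserRecord`, `UserRegistry`, `find` and `observe` are hypothetical names, and conflict handling (a voiceprint and a face that match two different records) is deliberately left out.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional


@dataclass
class UserRecord:
    user_id: str
    voiceprint: Optional[str] = None  # opaque voiceprint key (assumed format)
    face: Optional[str] = None        # opaque face key (assumed format)


class UserRegistry:
    def __init__(self) -> None:
        self._ids = count(1)
        self._by_id: Dict[str, UserRecord] = {}
        self._by_voiceprint: Dict[str, UserRecord] = {}
        self._by_face: Dict[str, UserRecord] = {}

    def find(self, user_id=None, voiceprint=None, face=None) -> Optional[UserRecord]:
        """Locate a user through any one of the three identifiers."""
        if user_id is not None and user_id in self._by_id:
            return self._by_id[user_id]
        if voiceprint is not None and voiceprint in self._by_voiceprint:
            return self._by_voiceprint[voiceprint]
        if face is not None and face in self._by_face:
            return self._by_face[face]
        return None

    def observe(self, voiceprint=None, face=None) -> UserRecord:
        """Create a default record for an unfamiliar voiceprint or face,
        and fill in whichever attribute the existing record still lacks."""
        record = self.find(voiceprint=voiceprint, face=face)
        if record is None:
            record = UserRecord(user_id=f"user-{next(self._ids)}")
            self._by_id[record.user_id] = record
        if voiceprint is not None and record.voiceprint is None:
            record.voiceprint = voiceprint
            self._by_voiceprint[voiceprint] = record
        if face is not None and record.face is None:
            record.face = face
            self._by_face[face] = record
        return record
```

For example, `observe(voiceprint="vp-1")` creates a record from the voice alone, and a later `observe(voiceprint="vp-1", face="face-1")` attaches the face to the same record, after which `find(face="face-1")` returns it.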

An intelligent interaction processing system based on AI voice, comprising: a processor, a memory and a communication bus;

the memory stores an AI-voice-based intelligent interaction processing program executable by the processor;

the communication bus implements connection and communication between the processor and the memory;

the processor implements the following steps when executing the AI-voice-based intelligent interaction processing program:

In the intelligent interaction processing system based on AI voice, the processor further implements the following steps when executing the AI-voice-based intelligent interaction processing program:

A1. building in advance an AI home intelligent interaction scenario database corresponding to user behavior data;

intelligently creating user system big data, and analyzing and processing the user's voice commands by constructing AI home intelligent interaction scenarios;

the scene history record analysis includes: which voiceprint scene combination gave rise to which processing event, when it occurred, and what interaction the user carried out afterwards, so that the user's next behavior is predicted through historical data analysis and some pre-processed output is produced;

A storage medium, wherein the storage medium is a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of any of the AI-voice-based intelligent interaction processing methods described above.

Compared with the prior art, in the intelligent interaction processing method, system and storage medium based on AI voice provided by the present invention, the smart TV carries a smart camera that has a far-field voice module with voiceprint recognition, the user interacts with the TV through the far-field voice of the smart camera, and every voice interaction of the user is analyzed and processed by the AI home intelligent interaction system module. The content of the analysis includes: semantic recognition of the voice command (voice command decomposition, which classifies the command as an explicit instruction class or a scenario construction class; categories for new subdivided fields can be added as the analysis system is refined); the voiceprint attributes of the current user (voiceprint recognition (gender, age), voiceprint emotional features (excited, worried, flat, etc.), face recognition (user attributes, expression attributes), and user system association); user home scene analysis (one person, several people, the composition of the people present, and the home scene (party, dinner, leisure), obtained from the smart camera's view and analyzed against predetermined templates); user mood analysis (through voiceprint + voiceprint emotional features + facial expression + scene); and scene history record analysis (which voiceprint scene combination gave rise to which processing event, when it occurred, and what interaction the user carried out afterwards, so that the user's next behavior is predicted through historical data analysis and some pre-processed output is produced). User system big data is created intelligently (user ID, user attributes, user interaction records, and user association records, i.e. interactions between users), and the user's voice commands are further analyzed and processed by constructing AI home intelligent interaction scenarios, improving the scenario construction capability and emotional interaction capability of AI voice. All of the above data is stored in the cloud.

The present invention provides a deep emotional interaction experience for smart homes and AI voice intelligent interaction, improves the experience and appeal of the product, enhances the TV-centered home intelligence experience, and provides a companion-style home experience. The invention gives the smart TV better intelligent interaction functions and makes it more convenient to use.

Brief description of the drawings

Fig. 1 is the flow chart of the intelligent interaction processing method provided by the invention based on AI voice.

Fig. 2 is a functional block diagram of a preferred embodiment of the intelligent interaction processing system based on AI voice of the present invention.

Specific embodiment

To make the purpose, technical solutions and effects of the present invention clearer and more definite, the present invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.

Referring to Fig. 1, the intelligent interaction processing method based on AI voice provided by the present invention includes the following steps:

S100. connecting in advance, to a smart TV, a smart camera that has a far-field voice module with voiceprint recognition, for interacting with the smart TV through the far-field voice module of the smart camera;

In the embodiment of the present invention, a smart camera that has a far-field voice module with voiceprint recognition needs to be connected to the smart TV in advance, for interacting with the smart TV through the far-field voice module of the smart camera. The smart TV carries the smart camera, the user interacts with the TV through the far-field voice of the smart camera, and every voice interaction of the user is analyzed and processed by the AI home intelligent interaction system module.

Step S100 further includes: A1. building in advance an AI home intelligent interaction scenario database corresponding to user behavior data. For example, behavior data is built such that when the user says "is there anything fun to do", the corresponding recommendation, "games the user often plays or travel routes", is given to the user.
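As a rough illustration of step A1, the scenario database can be pictured as a lookup from a user utterance to stored recommendations. The sketch below assumes exact matching on a normalized utterance, which is far simpler than any real semantic matching; the dictionary `SCENARIO_DB`, the helper names, and the English rendering of the example utterance are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical scenario database: normalized utterance -> recommendations.
SCENARIO_DB: Dict[str, List[str]] = {
    "is there anything fun to do": ["frequently played games", "travel routes"],
}


def normalize(utterance: str) -> str:
    """Lowercase the utterance and drop punctuation and extra whitespace."""
    kept = "".join(ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def recommend(utterance: str) -> List[str]:
    """Return the recommendations stored for this utterance, or an empty list."""
    return SCENARIO_DB.get(normalize(utterance), [])


print(recommend("Is there anything fun to do?"))
```

In practice the keys would more plausibly be intents or scenario labels produced by the semantic recognition step rather than literal sentences.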

S200. the smart camera capturing in real time and obtaining the voice and image information of the user, and analyzing and processing that information using a pre-built AI home intelligent interaction scenario database corresponding to user behavior data.

Step S200 specifically includes:

Wherein, the step in step B of analyzing and processing the voice and image information of the user using the pre-built AI home intelligent interaction scenario database corresponding to user behavior data includes:

Wherein, the step of performing semantic recognition of the voice command and scenario construction classification includes:

In step S200, the user interacts with the TV through the far-field voice of the smart camera, and every voice interaction of the user is analyzed and processed by the AI home intelligent interaction system module. The content of the analysis includes: semantic recognition of the voice command (voice command decomposition, which classifies the command as an explicit instruction class or a scenario construction class; categories for new subdivided fields can be added as the analysis system is refined); the voiceprint attributes of the current user (voiceprint recognition (gender, age), voiceprint emotional features (excited, worried, flat, etc.), face recognition (user attributes, expression attributes), and user system association); user home scene analysis (one person, several people, the composition of the people present, and the home scene (party, dinner, leisure), obtained from the smart camera's view and analyzed against predetermined templates); user mood analysis (through voiceprint + voiceprint emotional features + facial expression + scene); and scene history record analysis (which voiceprint scene combination gave rise to which processing event, when it occurred, and what interaction the user carried out afterwards, so that the user's next behavior is predicted through historical data analysis and some pre-processed output is produced). User system big data is created intelligently (user ID, user attributes, user interaction records, and user association records, i.e. interactions between users), and the user's voice commands are further analyzed and processed by constructing AI home intelligent interaction scenarios, improving the scenario construction capability and emotional interaction capability of AI voice. All of the above data is stored in the cloud.
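The user mood analysis mentioned above combines voiceprint emotional features, facial expression and scene. One simple way to picture such a combination is weighted voting over per-source mood labels, as in the sketch below. The weights, the source names and the function `fuse_mood` are illustrative assumptions only; the patent does not state how the sources are combined.

```python
from collections import defaultdict
from typing import Dict

# Assumed weights for each evidence source; the values are illustrative only.
SOURCE_WEIGHTS: Dict[str, float] = {
    "voiceprint_emotion": 0.4,
    "facial_expression": 0.4,
    "scene": 0.2,
}


def fuse_mood(signals: Dict[str, str]) -> str:
    """Combine per-source mood labels into one label by weighted voting."""
    scores: Dict[str, float] = defaultdict(float)
    for source, label in signals.items():
        # Unknown sources contribute nothing to the vote.
        scores[label] += SOURCE_WEIGHTS.get(source, 0.0)
    if not scores:
        return "unknown"
    # On ties, max() returns the label that was seen first.
    return max(scores, key=scores.get)


mood = fuse_mood({
    "voiceprint_emotion": "excited",
    "facial_expression": "flat",
    "scene": "excited",
})
print(mood)
```

Here the voice and the scene agree, so their combined weight outvotes the facial expression.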

S300. the smart TV, according to the result of the analysis and processing, predicting the user's behavioral habits and carrying out a corresponding interaction response.

Step S300 specifically includes:

For example: user A and user B together issue a command to the camera, [What shall we do today?]. The AI home intelligent interaction system analyzes whether users A and B have previously appeared in front of the TV together. If so, it offers an interactive recollection of what they did before and gives suggestions and recommendations for today according to today's home scene. The suggestions and recommendations are diverse: they may be application data inside the TV (such as watching TV, playing games, or learning to cook) or operational data such as shopping (new product recommendations, shopping discounts) and tourism (travel recommendations). All of this is predicted from the user's behavioral habits and continuously learned and corrected from the user's interactive behavior, so that the AI home intelligent interaction system comes intelligently closer to what the user thinks and wants.

The present invention is described in further detail below by way of a concrete application embodiment:

S11. The smart TV carries a smart camera that has a far-field voice module with voiceprint recognition.

S12. The smart camera is in a working state when the smart TV is powered on.

S13. The smart camera listens to the user's speech and passes the recording of the user's speech to the AI home intelligent interaction system.

S14. The AI home intelligent interaction system analyzes and processes the user's speech. The content of the analysis includes: semantic recognition of the voice command (voice command decomposition): analyzing whether the user's utterance belongs to the instruction class (in the instruction class the intent of the utterance is very clear and the instruction can be executed without scene analysis, for example: "I want to watch Liu Dehua's movies", "I want to listen to Zhang Liangying's songs", "I want to eat braised pork in brown sauce") or the scenario construction class (for example: "The weather is too hot, what should I do?", "What would be good to do now?", "So boring", "What shall we eat at noon?").

Voiceprint attributes of the current user (voiceprint recognition: gender, age bracket, etc.): which voiceprint users appeared at the same time.

Voiceprint emotional features (excited, worried, flat, etc.): what scene the voiceprints appear in, what each person's voiceprint scene is, and what the overall scene is (the scene is derived by voiceprint analysis; default definitions: excited, warm, happy, lively, etc.).

Face recognition (user attributes, expression attributes) and user system association: who appeared with whom at the same time, what their expressions are, and what the time is.

User home scene analysis (one person, several people, the composition of the people present, and the home scene (party, dinner, leisure), obtained from the smart camera's view and analyzed against predetermined templates).

User mood analysis (through voiceprint + voiceprint emotional features + facial expression + scene).

Scene history record analysis (which voiceprint scene combination gave rise to which processing event, when it occurred, and what interaction the user carried out afterwards; through historical data analysis, the user's next behavior is predicted and some pre-processed output is produced).

The user's voice commands are further analyzed and processed by constructing AI home intelligent interaction scenarios, improving the scenario construction capability and emotional interaction capability of AI voice.
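The first analysis in step S14, deciding whether an utterance belongs to the instruction class or the scenario construction class, can be sketched with a few hand-written rules. This is only a toy illustration under the assumption that explicit commands start from recognizable cue phrases; the pattern list and the function `classify_utterance` are hypothetical, and a production system would more likely rely on a trained intent model.

```python
import re

# Assumed cue patterns for explicit commands (illustrative, not exhaustive).
INSTRUCTION_PATTERNS = [
    re.compile(r"\bi want to (watch|listen to|eat|play|see)\b"),
    re.compile(r"^(play|open|turn on|turn off)\b"),
]


def classify_utterance(text: str) -> str:
    """Return 'instruction' for explicit commands, otherwise 'scenario'."""
    lowered = text.lower().strip()
    if any(pattern.search(lowered) for pattern in INSTRUCTION_PATTERNS):
        return "instruction"
    # Anything without a clear command cue is left to scenario analysis.
    return "scenario"


for utterance in (
    "I want to watch Liu Dehua's movies",
    "The weather is too hot, what should I do?",
):
    print(utterance, "->", classify_utterance(utterance))
```

Defaulting to the scenario construction class means that unclear utterances are sent on to the richer scene, mood and history analyses instead of being executed directly.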

S15. When the smart camera detects user voice data and transfers it to the AI home intelligent interaction system, the system creates a user attribute record and uses the user's ID, voiceprint attribute and face attribute as identifiers of the user, so that the user can be located through any one of these three attributes.

S16. When the AI home intelligent interaction system detects an unfamiliar voiceprint or face, it creates a user attribute record by default and intelligently adds the voiceprint attribute corresponding to the user through subsequent interaction. Conversely, if the user ID was first created from the voiceprint attribute, the user's face attribute is added intelligently through subsequent interaction.

S17. After a user is successfully created, a big-data table based on the user ID is created automatically. The table records the user's various behavior records, interaction records, exchange records and so on (including the history of instructions sent by the user, the records of instruction execution, and the user's subsequent interactions with the executed instructions; the user's basic data are as listed in items 6, 7, 8, 9, 10 and 11, but the data records are not limited to those enumerated).

S18. After the voice sent by the user is decomposed by the AI home intelligent interaction system, the user's pre-execution operation is obtained, or the best interaction scene is recommended to the user.

For example: user A and user B together issue a command to the camera, [What shall we do today?]. The AI home intelligent interaction system analyzes whether users A and B have previously appeared in front of the TV together. If so, it offers an interactive recollection of what they did before and gives suggestions and recommendations for today according to today's home scene. The suggestions and recommendations are diverse: they may be application data inside the TV (such as watching TV, playing games, or learning to cook) or operational data such as shopping (new product recommendations, shopping discounts) and tourism (travel recommendations). All of this is predicted from the user's behavioral habits and continuously learned and corrected from the user's interactive behavior, so that the AI home intelligent interaction system comes intelligently closer to what the user thinks and wants.
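The per-user data table of step S17 can be sketched with an in-memory SQLite database. This is a minimal illustration under stated assumptions: it uses one shared table keyed by user ID instead of one table per user, and the table name, column names and the three record kinds (`instruction`, `execution`, `follow_up`, loosely following the examples given in S17) are hypothetical. The patent states only that the data resides in the cloud and does not name a storage technology.

```python
import sqlite3


def open_user_store() -> sqlite3.Connection:
    """Create an in-memory store with one table of per-user records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE user_records (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id TEXT NOT NULL,
            kind TEXT NOT NULL
                CHECK (kind IN ('instruction', 'execution', 'follow_up')),
            detail TEXT NOT NULL,
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def add_record(conn: sqlite3.Connection, user_id: str, kind: str, detail: str) -> None:
    """Append one record for the given user."""
    conn.execute(
        "INSERT INTO user_records (user_id, kind, detail) VALUES (?, ?, ?)",
        (user_id, kind, detail),
    )


def records_for(conn: sqlite3.Connection, user_id: str):
    """Return (kind, detail) rows for one user in insertion order."""
    rows = conn.execute(
        "SELECT kind, detail FROM user_records WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return rows.fetchall()


conn = open_user_store()
add_record(conn, "user-1", "instruction", "What shall we do today?")
add_record(conn, "user-1", "follow_up", "accepted game recommendation")
print(records_for(conn, "user-1"))
```

A single table with a `user_id` column keeps the sketch short; splitting the data into one table per user, as the text suggests, would change only the table naming and not the shape of the records.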

The present invention therefore provides an intelligent interaction processing method and system based on AI voice that facilitate intelligent recognition and interactive recommendation, giving the smart TV better intelligent interaction functions and making it more convenient to use.

As shown in Fig. 2, based on the above AI-voice-based intelligent interaction processing method, the present invention correspondingly provides an intelligent interaction processing system based on AI voice. The system may be a smart TV, desktop computer, notebook, palmtop computer, smart speaker or other smart device. The system includes a processor 10, a memory 20 and a display screen 30; the processor 10 is connected to the memory 20 through a communication bus 50, and the display screen 30 is connected to the processor 10 through the communication bus 50. Fig. 2 shows only some of the components of the system; it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead.

In some embodiments the memory 20 may be an internal storage unit of the AI-voice-based intelligent interaction processing system, such as the system's internal memory. In other embodiments the memory 20 may be an external storage device of the system, such as a plug-in USB flash drive, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card fitted to the system. Further, the memory 20 may include both an internal storage unit of the system and an external storage device. The memory 20 is used to store the application software installed on the system and various kinds of data, such as the program code installed on the system. The memory 20 may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 stores an AI-voice-based intelligent interaction processing method program 40, which can be executed by the processor 10 to implement the AI-voice-based intelligent interaction processing method of the present application.

In some embodiments the processor 10 may be a central processing unit (CPU), a microprocessor, a mobile phone baseband processor or another data processing chip, used to run the program code stored in the memory 20 or to process data, for example to execute the AI-voice-based intelligent interaction processing method.

In some embodiments the display screen 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, etc. The display screen 30 is used to display information in the AI-voice-based intelligent interaction processing system and to display a visual user interface.

In one embodiment, the following steps are implemented when the processor 10 executes the AI-voice-based intelligent interaction processing method program 40 in the memory 20:

C. the smart TV, according to the result of the analysis and processing, predicts the user's behavioral habits and carries out a corresponding interaction response, as detailed above.

The processor further implements the following steps when executing the AI-voice-based intelligent interaction processing program:

after AI home intelligent interaction decomposition of the user's voice and image information, obtaining the user's pre-execution operation, or recommending the best interaction scene to the user and giving a corresponding prompt, as detailed above.

Based on the above embodiments, the present invention also provides a computer-readable storage medium storing one or more programs, which can be executed by one or more processors to implement the steps of the AI-voice-based intelligent interaction processing method described in any of the above, as detailed above.

In conclusion, in the intelligent interaction processing method, system and storage medium based on AI voice provided by the present invention, the smart TV carries a smart camera that has a far-field voice module with voiceprint recognition, the user interacts with the TV through the far-field voice of the smart camera, and every voice interaction of the user is analyzed and processed by the AI home intelligent interaction system module. The content of the analysis includes: semantic recognition of the voice command (voice command decomposition, which classifies the command as an explicit instruction class or a scenario construction class; categories for new subdivided fields can be added as the analysis system is refined); the voiceprint attributes of the current user (voiceprint recognition (gender, age), voiceprint emotional features (excited, worried, flat, etc.), face recognition (user attributes, expression attributes), and user system association); user home scene analysis (one person, several people, the composition of the people present, and the home scene (party, dinner, leisure), obtained from the smart camera's view and analyzed against predetermined templates); user mood analysis (through voiceprint + voiceprint emotional features + facial expression + scene); and scene history record analysis (which voiceprint scene combination gave rise to which processing event, when it occurred, and what interaction the user carried out afterwards, so that the user's next behavior is predicted through historical data analysis and some pre-processed output is produced). User system big data is created intelligently (user ID, user attributes, user interaction records, and user association records, i.e. interactions between users), and the user's voice commands are further analyzed and processed by constructing AI home intelligent interaction scenarios, improving the scenario construction capability and emotional interaction capability of AI voice. All of the above data is stored in the cloud.

Of course, those of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be completed by a computer program instructing the relevant hardware (such as a processor or controller). The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a memory, a magnetic disk, an optical disc, etc.

It should be understood that the application of the present invention is not limited to the above examples. Those of ordinary skill in the art can make improvements or modifications based on the above description, and all such improvements and modifications shall fall within the scope of protection of the appended claims of the present invention.

Claims

1. An intelligent interaction processing method based on AI voice, characterized by comprising the following steps:

2. The intelligent interaction processing method based on AI voice according to claim 1, characterized in that step A further comprises: A1. building in advance an AI home intelligent interaction scenario database corresponding to user behavior data.

3. The intelligent interaction processing method based on AI voice according to claim 1, characterized in that step B comprises:

4. The intelligent interaction processing method based on AI voice according to claim 1, characterized in that the step in step B of analyzing and processing the voice and image information of the user using the pre-built AI home intelligent interaction scenario database corresponding to user behavior data comprises:

5. The intelligent interaction processing method based on AI voice according to claim 4, characterized in that the step of performing semantic recognition of the voice command and scenario construction classification comprises:

6. The intelligent interaction processing method based on AI voice according to claim 1, characterized in that step C comprises:

7. An intelligent interaction processing system based on AI voice, characterized by comprising: a processor, a memory and a communication bus;

8. The intelligent interaction processing system based on AI voice according to claim 7, characterized in that the processor further implements the following steps when executing the AI-voice-based intelligent interaction processing program:

9. The intelligent interaction processing system based on AI voice according to claim 7, characterized in that the processor further implements the following steps when executing the AI-voice-based intelligent interaction processing program:

10. A storage medium, characterized in that the storage medium is a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of the intelligent interaction processing method based on AI voice according to any one of claims 1-6.