CN110113646B - AI voice-based intelligent interactive processing method, system and storage medium - Google Patents

AI voice-based intelligent interactive processing method, system and storage medium

Info

Publication number
CN110113646B
CN110113646B CN201910239885.3A CN201910239885A
Authority
CN
China
Prior art keywords
user
intelligent
voiceprint
voice
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910239885.3A
Other languages
Chinese (zh)
Other versions
CN110113646A (en)
Inventor
周胜杰 (Zhou Shengjie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Konka Electronic Technology Co Ltd
Original Assignee
Shenzhen Konka Electronic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Konka Electronic Technology Co Ltd filed Critical Shenzhen Konka Electronic Technology Co Ltd
Priority to CN201910239885.3A priority Critical patent/CN110113646B/en
Publication of CN110113646A publication Critical patent/CN110113646A/en
Application granted granted Critical
Publication of CN110113646B publication Critical patent/CN110113646B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 Speech to text systems
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G10L2015/223 Execution procedure of a spoken command
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • H04N21/4223 Cameras
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213 Monitoring of end-user related data
    • H04N21/44218 Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/4508 Management of client data or end-user data
    • H04N21/4532 Management of client data or end-user data involving end-user characteristics, e.g. viewer profile, preferences
    • H04N21/466 Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4667 Processing of monitored end-user data, e.g. trend analysis based on the log file of viewer selections
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Child & Adolescent Psychology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses an AI voice-based intelligent interactive processing method, system, and storage medium. The method comprises the following steps: an intelligent camera with a far-field voice module and voiceprint recognition is connected to the smart television in advance, so that the user interacts with the smart television through the camera's far-field voice module; the intelligent camera captures the user's voice and image information in real time and analyzes it using a pre-built AI home intelligent interaction scene database corresponding to user behavior data; and the smart television predicts the user's behavior habits and makes a corresponding interactive response according to the analysis result. The invention provides an AI voice-based intelligent interactive processing method and system that facilitate intelligent recognition and interactive recommendation, giving the smart television a better intelligent interaction function and making it more convenient for users.

Description

AI voice-based intelligent interactive processing method, system and storage medium
Technical Field
The invention relates to the technical field of smart homes, and in particular to an AI voice-based intelligent interactive processing method, system, and storage medium.
Background
With the development of science and technology, intelligent consumer electronics are becoming popular. Voiceprint recognition, one of the AI speech technologies, is currently a leading technology that can identify a speaker's voice attributes (gender, age) and voiceprint affiliation (which user spoke a given sentence).
Current voiceprint recognition applications remain in their infancy: they are essentially limited to identifying basic voiceprint attributes (e.g., male/female, old/young, whose voiceprint it is) and lack application-level development of AI home scenarios based on voiceprint recognition technology.
The smart television in the prior art thus lacks a better intelligent interaction function and is sometimes inconvenient for users to use.
Accordingly, the prior art still needs improvement and development.
Disclosure of Invention
In view of the above defects of the prior art, the invention aims to provide an AI voice-based intelligent interactive processing method, system, and storage medium that facilitate intelligent recognition and interactive recommendation, thereby adding a better intelligent interaction function to the smart television and making it more convenient for users to use.
To achieve this purpose, the invention adopts the following technical scheme:
an AI voice-based intelligent interactive processing method comprises the following steps:
A. the method comprises the following steps that an intelligent camera with far-field voice module voiceprint recognition is connected and arranged on the intelligent television in advance and used for interacting with the intelligent television through the far-field voice module of the intelligent camera;
B. the intelligent camera shoots and acquires the voice image information of the user in real time, and analyzes and processes the voice image information of the user by utilizing an AI family intelligent interaction scene database which is constructed in advance and corresponds to the user behavior data;
C. and the intelligent television pre-judges the behavior habits of the user and carries out corresponding interactive response according to the analysis and processing result.
In the AI voice-based intelligent interactive processing method, step A further comprises: A1, pre-building an AI home intelligent interaction scene database corresponding to user behavior data.
In the AI voice-based intelligent interactive processing method, step B comprises:
when the smart television is turned on, the intelligent camera is in a working state;
the intelligent camera captures the user's voice and image information in real time, intercepts the user's speech, and records it for AI home intelligent interactive processing;
the AI home intelligent interactive processing analyzes the user's voice and image information using the pre-built AI home intelligent interaction scene database corresponding to user behavior data;
predictions are made according to the user's behavior habits, with continuous learning and correction according to the user's interactive behavior.
In the AI voice-based intelligent interactive processing method, analyzing the user's voice and image information in step B using the pre-built AI home intelligent interaction scene database corresponding to user behavior data comprises:
performing semantic recognition and scene construction on voice commands;
performing voiceprint attribute analysis, voiceprint emotion characteristic analysis, face recognition analysis, user family scene analysis, user emotion analysis, and scene history analysis for the current user;
intelligently creating user-system big data and analyzing the user's voice instructions by constructing an AI home intelligent interaction scene.
In the AI voice-based intelligent interactive processing method, performing semantic recognition and scene construction on voice commands comprises:
performing semantic recognition by decomposing the voice instruction: analyzing whether the user's utterance belongs to the instruction class or the scene construction class.
Performing voiceprint attribute analysis for the current user comprises:
identifying the voiceprint attributes of the current user: which voiceprint users have appeared at the same time.
The voiceprint emotion characteristic analysis covers: in what scene each voiceprint appears, what each person's voiceprint scene is, and what the combined scene is.
The face recognition analysis covers: who appeared together, with what expressions, and at what time.
The user family scene analysis is performed on the intelligent camera's framing according to preset templates.
The user emotion analysis is performed by combining voiceprints, voiceprint emotion characteristics, facial expressions, and scenes.
The scene history analysis covers: which voiceprint-scene combinations were accompanied by which processing events, when they occurred, and what interactions the user performed afterwards; historical data analysis is used to predict the user's next action and output some preprocessing.
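The user-emotion analysis described above combines voiceprint emotion characteristics, facial expression, and scene context. A minimal sketch of such a fusion is shown below; the emotion labels, scores, and weights are illustrative assumptions, not values specified in the patent:

```python
# Minimal sketch of fusing voiceprint emotion, facial expression, and scene
# context into one user-emotion estimate. Labels, scores, and weights are
# illustrative assumptions only.

EMOTIONS = ("excited", "worried", "calm")

def fuse_emotion(voiceprint_scores, face_scores, scene_bias, weights=(0.5, 0.3, 0.2)):
    """Weighted average of per-source emotion scores; returns the top label."""
    fused = {}
    for emotion in EMOTIONS:
        fused[emotion] = (weights[0] * voiceprint_scores.get(emotion, 0.0)
                          + weights[1] * face_scores.get(emotion, 0.0)
                          + weights[2] * scene_bias.get(emotion, 0.0))
    return max(fused, key=fused.get)

# Example: the voice sounds calm, but the face and a "party" scene template
# both lean toward excitement, so the fused estimate is "excited".
label = fuse_emotion(
    voiceprint_scores={"calm": 0.7, "excited": 0.2},
    face_scores={"excited": 0.8},
    scene_bias={"excited": 0.6},
)  # → "excited"
```

A real system would replace the hand-set scores with outputs of voiceprint and face-expression classifiers, but the fusion step itself can stay this simple.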
In the AI voice-based intelligent interactive processing method, step C comprises:
the smart television creates an attribute record for the user according to the analysis result, uses the user's ID, voiceprint attribute, and face attribute as the user's identification values, and can locate the user by any one of these three attributes;
when an unfamiliar voiceprint or face is detected, a default user attribute record is created, and the voiceprint attribute of the corresponding user is filled in through subsequent intelligent interaction; if the user's ID was first recorded with a voiceprint attribute, the user's face attribute is filled in through subsequent intelligent interaction;
after a user is successfully created, a big-data table based on the user ID is automatically created, recording the user's various behavior records and interaction records;
predictions are made according to the user's behavior habits, with continuous learning and correction according to the user's interactive behavior;
after the AI home intelligent interactive decomposition of the user's voice and image information, the user's pre-execution operation is obtained, or the user's best interaction scene is recommended with a corresponding prompt.
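The user record described in step C, identifiable by user ID, voiceprint attribute, or face attribute, with a default record created for strangers, can be sketched as a small registry. All field and method names here are hypothetical, chosen only to mirror the text above:

```python
# Sketch of the step-C user record: a user is located by ID, voiceprint, or
# face attribute, and an unknown voiceprint/face creates a default record
# that later interactions fill in. Field names are illustrative assumptions.
import itertools

class UserRegistry:
    def __init__(self):
        self._users = {}                  # user_id -> record
        self._ids = itertools.count(1)

    def locate(self, user_id=None, voiceprint=None, face=None):
        """Find a user by any one of the three identifying attributes."""
        for record in self._users.values():
            if (user_id == record["id"]
                    or (voiceprint and voiceprint == record["voiceprint"])
                    or (face and face == record["face"])):
                return record
        return None

    def observe(self, voiceprint=None, face=None):
        """Return the matching user, creating a default record for strangers."""
        record = self.locate(voiceprint=voiceprint, face=face)
        if record is None:
            record = {"id": next(self._ids), "voiceprint": voiceprint,
                      "face": face, "history": []}  # per-user big-data table
            self._users[record["id"]] = record
        else:
            # Subsequent interactions fill in whichever attribute was missing.
            record["voiceprint"] = record["voiceprint"] or voiceprint
            record["face"] = record["face"] or face
        return record
```

For example, a stranger first observed only by voiceprint gets a default record; when the same voiceprint later appears together with a face, the face attribute is added to the same record.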
An AI voice-based intelligent interactive processing system comprises: a processor, a memory, and a communication bus;
the memory stores an AI voice-based intelligent interactive processing program executable by the processor;
the communication bus realizes connection and communication between the processor and the memory;
when the processor executes the AI voice-based intelligent interactive processing program, the following steps are realized:
A. an intelligent camera with a far-field voice module and voiceprint recognition is connected to the smart television in advance, so that the user interacts with the smart television through the camera's far-field voice module;
B. the intelligent camera captures the user's voice and image information in real time and analyzes it using a pre-built AI home intelligent interaction scene database corresponding to user behavior data;
C. the smart television predicts the user's behavior habits and makes a corresponding interactive response according to the analysis result.
In the AI voice-based intelligent interactive processing system, when the processor executes the AI voice-based intelligent interactive processing program, the following steps are also realized:
A1, an AI home intelligent interaction scene database corresponding to user behavior data is pre-built;
when the smart television is turned on, the intelligent camera is in a working state;
the intelligent camera captures the user's voice and image information in real time, intercepts the user's speech, and records it for AI home intelligent interactive processing;
the AI home intelligent interactive processing analyzes the user's voice and image information using the pre-built AI home intelligent interaction scene database corresponding to user behavior data;
predictions are made according to the user's behavior habits, with continuous learning and correction according to the user's interactive behavior.
In the AI voice-based intelligent interactive processing system, when the processor executes the AI voice-based intelligent interactive processing program, the following steps are also realized:
performing semantic recognition and scene construction on voice commands;
performing voiceprint attribute analysis, voiceprint emotion characteristic analysis, face recognition analysis, user family scene analysis, user emotion analysis, and scene history analysis for the current user;
intelligently creating user-system big data and analyzing the user's voice instructions by constructing an AI home intelligent interaction scene;
performing semantic recognition by decomposing the voice instruction: analyzing whether the user's utterance belongs to the instruction class or the scene construction class;
performing voiceprint attribute analysis for the current user, which comprises:
identifying the voiceprint attributes of the current user: which voiceprint users have appeared at the same time;
the voiceprint emotion characteristic analysis covers: in what scene each voiceprint appears, what each person's voiceprint scene is, and what the combined scene is;
the face recognition analysis covers: who appeared together, with what expressions, and at what time;
the user family scene analysis is performed on the intelligent camera's framing according to preset templates;
the user emotion analysis is performed by combining voiceprints, voiceprint emotion characteristics, facial expressions, and scenes;
the scene history analysis covers: which voiceprint-scene combinations were accompanied by which processing events, when they occurred, and what interactions the user performed afterwards; historical data analysis is used to predict the user's next action and output some preprocessing;
the smart television creates an attribute record for the user according to the analysis result, uses the user's ID, voiceprint attribute, and face attribute as the user's identification values, and can locate the user by any one of these three attributes;
when an unfamiliar voiceprint or face is detected, a default user attribute record is created, and the voiceprint attribute of the corresponding user is filled in through subsequent intelligent interaction; if the user's ID was first recorded with a voiceprint attribute, the user's face attribute is filled in through subsequent intelligent interaction;
after a user is successfully created, a big-data table based on the user ID is automatically created, recording the user's various behavior records and interaction records;
predictions are made according to the user's behavior habits, with continuous learning and correction according to the user's interactive behavior;
after the AI home intelligent interactive decomposition of the user's voice and image information, the user's pre-execution operation is obtained, or the user's best interaction scene is recommended with a corresponding prompt.
A storage medium stores one or more programs executable by one or more processors to implement the steps of any one of the above AI voice-based intelligent interactive processing methods.
Compared with the prior art, in the AI voice-based intelligent interactive processing method, system, and storage medium provided by the invention, the smart television carries an intelligent camera with a far-field voice module and voiceprint recognition; the user interacts with the television through the camera's far-field voice, and every sentence of the user's voice interaction is analyzed by the AI home intelligent interaction system block. The analyzed content includes: semantic recognition of voice commands (voice command decomposition, which splits commands into a definite instruction class and a scene construction class; new field classifications can be added as the analysis system improves); the current user's voiceprint attributes (voiceprint recognition of gender and age, voiceprint emotion characteristics such as excitement, worry, or calm, face recognition of user and expression attributes, and user-system association); user family scene analysis (one person, multiple people, personnel combinations, and family scenes such as a party, dinner, or leisure, analyzed from the intelligent camera's framing according to preset templates); user emotion analysis (combining voiceprint, voiceprint emotion characteristics, facial expression, and scene); and scene history analysis (which voiceprint-scene combinations were accompanied by which processing events, what happened, and what interactions the user performed afterwards; historical data analysis is used to predict the user's next action and perform some preprocessing outputs). User-system big data (user ID, user attributes, user interaction records, and user-association records of interactions between users) are created intelligently, and the user's voice instructions are further analyzed by constructing an AI home intelligent interaction scene, which improves the scene construction capability and emotional interaction capability of AI voice. All the data mentioned above are stored in the cloud.
The invention provides a deep emotional interaction experience for smart-home and AI voice intelligent interaction, improves the experience and interest of the product, enhances the intelligent experience of the television-centered smart home, and offers a companion-like home experience. The invention adds a better intelligent interaction function to the smart television and makes it more convenient for users to use.
Drawings
Fig. 1 is a flowchart of an AI speech-based intelligent interactive processing method according to the present invention.
Fig. 2 is a functional block diagram of the AI voice-based intelligent interactive processing system according to a preferred embodiment of the invention.
Detailed Description
To make the objects, technical solutions, and effects of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
Referring to Fig. 1, the AI voice-based intelligent interactive processing method provided by the invention includes the following steps:
S100. An intelligent camera with a far-field voice module and voiceprint recognition is connected to the smart television in advance, so that the user interacts with the smart television through the camera's far-field voice module.
In the embodiment of the invention, the intelligent camera with the far-field voice module and voiceprint recognition must be connected to the smart television in advance, so that the user can interact with the smart television through the camera's far-field voice module. The smart television carries this intelligent camera; the user interacts with the television through the camera's far-field voice, and every sentence of the user's voice interaction is analyzed by the AI (artificial intelligence) home intelligent interaction system block.
Step S100 further includes: A1, pre-building an AI home intelligent interaction scene database corresponding to user behavior data. For example, when the user's speech contains the behavior data "what is fun", the system correspondingly recommends games the user frequently plays or travel items the user likes.
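The scene database of step A1 can be pictured as a mapping from recognized utterance patterns to recommendations, ranked against the user's recorded behavior data. The phrases and recommendation lists below are hypothetical examples, not entries defined by the patent:

```python
# Illustrative sketch of the pre-built AI home interaction scene database:
# a mapping from recognized utterances to recommendations, with items the
# user has interacted with before ranked first. All entries are made up.

SCENE_DB = {
    "what is fun": ["frequently played games", "favorite travel items"],
    "i am bored":  ["recently watched shows"],
}

def recommend(utterance: str, user_history: list) -> list:
    """Look up the utterance in the scene database, preferring candidates
    that already appear in the user's behavior history."""
    candidates = SCENE_DB.get(utterance.strip().lower(), [])
    # sorted() is stable: history matches (key False) come before the rest.
    return sorted(candidates, key=lambda c: c not in user_history)

print(recommend("What is fun", ["favorite travel items"]))
# → ['favorite travel items', 'frequently played games']
```

A production database would live in the cloud and be keyed on semantic intents rather than literal strings, but the lookup-then-rank shape stays the same.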
S200. The intelligent camera captures the user's voice and image information in real time and analyzes it using the pre-built AI home intelligent interaction scene database corresponding to user behavior data.
Step S200 specifically includes:
when the smart television is turned on, the intelligent camera is in a working state;
the intelligent camera captures the user's voice and image information in real time, intercepts the user's speech, and records it for AI home intelligent interactive processing;
the AI home intelligent interactive processing analyzes the user's voice and image information using the pre-built AI home intelligent interaction scene database corresponding to user behavior data;
predictions are made according to the user's behavior habits, with continuous learning and correction according to the user's interactive behavior.
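The capture-and-analyze cycle of step S200 can be sketched as a simple loop. Every object and method here (`camera`, `analyzer`, `tv`, and their calls) is a stand-in stub; the patent does not specify concrete device APIs:

```python
# Hedged sketch of the S200 loop: while the TV is on, the camera streams
# audio and video, speech segments are intercepted and recorded, and each
# segment is analyzed against the scene database before the TV responds.
# All objects here are hypothetical stubs, not a real device API.

def capture_loop(camera, analyzer, tv):
    while tv.is_on():                            # camera works while TV is on
        frame = camera.read_frame()              # real-time image capture
        audio = camera.read_audio_chunk()        # far-field microphone input
        segment = analyzer.detect_speech(audio)  # intercept the user's speech
        if segment is not None:
            analyzer.log(segment)                # record for later learning
            result = analyzer.process(segment, frame)  # scene-DB analysis
            tv.respond(result)                   # interactive response (step C)
```

In practice `detect_speech` would be a voice-activity detector on the far-field audio stream, and `process` would bundle the voiceprint, face, and scene analyses described below.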
Analyzing the user's voice and image information in step S200 using the pre-built AI home intelligent interaction scene database corresponding to user behavior data includes:
performing semantic recognition and scene construction on voice commands;
performing voiceprint attribute analysis, voiceprint emotion characteristic analysis, face recognition analysis, user family scene analysis, user emotion analysis, and scene history analysis for the current user;
intelligently creating user-system big data and analyzing the user's voice instructions by constructing an AI home intelligent interaction scene.
Performing semantic recognition and scene construction on voice commands comprises:
performing semantic recognition by decomposing the voice instruction: analyzing whether the user's utterance belongs to the instruction class or the scene construction class.
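The instruction-vs-scene-construction decomposition above can be sketched as a tiny classifier. A real system would use a trained semantic model; the keyword rule and cue words below are illustrative assumptions only:

```python
# Minimal sketch of the semantic-recognition decomposition: decide whether
# an utterance is an instruction (a direct command) or scene construction
# (context the system should remember). The cue words are made up.

INSTRUCTION_CUES = ("play", "open", "switch", "turn", "search")

def classify_utterance(text: str) -> str:
    words = text.lower().split()
    if any(cue in words for cue in INSTRUCTION_CUES):
        return "instruction"
    return "scene_construction"

print(classify_utterance("Play some music"))        # → instruction
print(classify_utterance("We are having a party"))  # → scene_construction
```

As the document notes, new field classifications could be added as the analysis system improves, which here would simply mean returning more classes.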
the step of performing voiceprint attribute analysis of the current user comprises:
performing voiceprint attribute identification of the current user: which voiceprint users have appeared at the same time;
the voiceprint emotional characteristic analysis comprises the following steps: what the scene of the voiceprint appears, what the voiceprint scene of each person is, what the comprehensive scene is;
the face recognition analysis comprises: who and who appeared at the same time, what the expression was, what the time was;
the user family scene analysis is analyzed according to a preset template through the view finding of the intelligent camera;
the emotion analysis of the user is carried out through voiceprints, voiceprint emotion characteristics, human face expressions and scenes;
the scene history analysis includes: and what processing event happens in what voiceprint scene combination, what interaction happens when the voiceprint scene combination happens and what interaction is carried out by the user after the voiceprint scene combination happens are used for predicting the next action of the user through historical data analysis and outputting some preprocessing.
In step S200, the user interacts with the television through the intelligent camera's far-field voice module, and every sentence of the user's voice interaction is analyzed and processed by the AI home intelligent interaction system. The analyzed content includes: semantic recognition of voice commands (voice command decomposition, which separates commands into a definite instruction class and a scene construction class; new classifications can be added as the analysis system improves); voiceprint attributes of the current user (voiceprint recognition of gender and age, voiceprint emotional characteristics such as excited, sad or calm, face recognition of user attributes and expression attributes, and user system association); user home scene analysis (one person, multiple persons, personnel combinations, and home scenes such as party, dinner or leisure, analyzed from the intelligent camera's framing according to preset templates); emotion analysis of the user (combining voiceprint, voiceprint emotional characteristics, facial expression and scene); and scene history analysis (which voiceprint-scene combinations have occurred, what events happened, when, and what interaction the user performed afterwards; historical data analysis is used to pre-judge the user's next behavior and produce some preprocessing output). User system big data (user ID, user attributes, user interaction records and user association records, i.e. interactions between users) are created intelligently, and the user's voice command is further analyzed by constructing an AI home intelligent interaction scene, improving the scene construction and emotional interaction capabilities of AI voice. All of the above data are stored in the cloud.
S300, the smart television pre-judges the behavior habits of the user and carries out corresponding interactive response according to the analysis processing result.
The step S300 specifically includes:
the smart television creates an attribute record of the user according to the analysis processing result, takes the ID, the voiceprint attribute and the face attribute of the user as the identification value of the user, and locates the user according to any one of the three attributes;
when an unfamiliar voiceprint or face is detected, an attribute record for the user is created by default, and the voiceprint attribute corresponding to that user is added intelligently through subsequent interaction; conversely, if the user ID was first recorded with the voiceprint attribute, the user's face attribute is added intelligently through subsequent interaction;
after the user is successfully created, a big data table based on the user ID is automatically created, and the data table records the user's various behavior records and interaction records;
pre-judging according to the behavior habits of the users, and continuously learning and correcting according to the interaction behaviors of the users;
and after carrying out AI family intelligent interactive decomposition on the voice image information of the user, obtaining the pre-execution operation of the user, or recommending the best interactive scene of the user and carrying out corresponding prompt.
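The identification scheme in the steps above — any one of user ID, voiceprint attribute or face attribute locates the user — can be sketched as a simple lookup. Exact equality matching is an assumption for illustration; a real system would score voiceprint and face similarity.

```python
def locate_user(users, user_id=None, voiceprint=None, face=None):
    """Locate a user record by any one of the three identification values
    (ID, voiceprint attribute, face attribute), as the method specifies.
    Matching is exact equality here, purely for illustration."""
    for u in users:
        if user_id is not None and u["id"] == user_id:
            return u
        if voiceprint is not None and u.get("voiceprint") == voiceprint:
            return u
        if face is not None and u.get("face") == face:
            return u
    return None  # unfamiliar user: caller creates a new attribute record
```

When the lookup returns None, the method's default behavior applies: a new attribute record is created and the missing voiceprint or face attribute is filled in through later interactions.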
For example: user A and user B issue the instruction [what shall we do today] to the camera. The AI home intelligent interaction system analyzes whether users A and B have appeared in front of the television together before. If they have, the system brings up interactive memories of things they have done together and gives current opinions and recommendations according to the present home scene. These opinions and recommendations are diverse and may draw on application data in the television (such as watching TV, playing games or cooking), shopping data (new-product recommendations, shopping discounts), travel data (trip recommendations) and other operational data. The recommendations are pre-judged from the users' behaviors and continuously learned and corrected from the users' interaction behaviors, so that the AI home intelligent interaction system comes ever closer to the users' habits and preferences.
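The A + B example can be sketched as a small recommendation routine: shared memories of the present user combination are surfaced first, followed by suggestions for the current home scene. The suggestion table and record format are illustrative assumptions, not part of the patent.

```python
def recommend(users_present, history, home_scene):
    """Sketch of the example above: if the present users have appeared in
    front of the television together before, surface their shared memories
    first, then add suggestions for the current home scene. The suggestion
    table and record format are illustrative assumptions."""
    key = frozenset(users_present)
    shared = [rec for rec in history if frozenset(rec["users"]) == key]
    scene_suggestions = {
        "party": ["play games", "shopping discounts"],
        "leisure": ["watch TV", "travel recommendations"],
    }
    recs = [f"replay: {rec['activity']}" for rec in shared]
    return recs + scene_suggestions.get(home_scene, [])
```

The continuous learning-and-correction loop described in the text would then adjust the suggestion table from how the users react to these recommendations.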
The invention is described in further detail below by way of a specific application example:
s11, the smart television is provided with an intelligent camera with a far-field voice module for voiceprint recognition.
S12, when the intelligent television is turned on, the intelligent camera is in a working state.
And S13, the intelligent camera monitors the user 'S speaking and transmits the user' S speaking record to the AI home intelligent interactive system.
S14, the AI home intelligent interaction system analyzes and processes the user's speech. The analyzed content includes semantic recognition of voice commands (voice command decomposition): analyzing whether the user's speech belongs to the instruction class (the user's intention is completely clear and the command can be executed without scene analysis, for example "I want to watch a Liu Dehua movie", "play a nice song" or "I want to eat braised pork") or the scene construction class (for example "the weather is so hot", "what are you doing now", "I'm bored" or "what shall we eat at noon").
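A toy version of this two-way decomposition can be written as a prefix match against cue phrases. The cue list is an illustrative assumption; a real system would use full semantic recognition rather than string matching.

```python
def classify_speech(text):
    """Toy classifier separating the two classes named above. The cue
    phrases are illustrative assumptions; a real system would use full
    semantic recognition rather than prefix matching."""
    instruction_cues = ("i want to watch", "i want to eat", "play", "listen to")
    t = text.lower()
    if any(t.startswith(cue) for cue in instruction_cues):
        return "instruction"        # intention clear, execute without scene analysis
    return "scene_construction"     # needs scene analysis before responding
```

Utterances falling into the scene construction class are the ones routed through the home scene, emotion and history analyses described in the following steps.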
Voiceprint attributes of the current user (voiceprint recognition: gender, age, etc.): which voiceprint users have appeared at the same time.
Voiceprint emotional characteristics (excited, sad, calm, etc.): in what scene the voiceprint appears, what each person's voiceprint scene is, and what the comprehensive scene is (the scene is analyzed from the voiceprints; default definitions: excited, warm, happy, heated, etc.).
Face recognition (user attributes, expression attributes, user system association): who appeared together with whom, what their expressions were, and what the time was.
User home scene analysis (one person, multiple persons, personnel combination, home scene such as party, dinner party or leisure, analyzed from the intelligent camera's framing according to the preset templates).
Emotion analysis of the user (through voiceprint + voiceprint emotional characteristics + facial expression + scene).
Scene history analysis (which voiceprint-scene combinations have occurred, what processing events occurred, when they occurred, and what interaction the user performed after the occurrence; the user's next behavior is pre-judged through historical data analysis and some preprocessing is output).
The voice instruction of the user is further analyzed and processed by constructing an AI family intelligent interaction scene, and the scene construction capability and the emotion interaction capability of AI voice are improved.
And S15, when the intelligent camera detects the voice data of the user and transmits the voice data to the AI home intelligent interactive system, the AI home intelligent interactive system creates an attribute record of the user, takes the ID, the voiceprint attribute and the face attribute of the user as the identification values of the user, and can locate the user through any one of the three attributes.
And S16, when the AI home intelligent interaction system detects an unfamiliar voiceprint or face, it creates an attribute record for the user by default, and the voiceprint attribute corresponding to that user is added intelligently through subsequent interaction. Conversely, if the user ID was first recorded with the voiceprint attribute, the user's face attribute is added intelligently through subsequent interaction.
And S17, after the user is successfully created, a big data table based on the user ID is automatically created. The data table records the user's various behavior records, interaction records and the like (including the history of instructions issued by the user and records of their execution, the user's subsequent interactions with those instructions, and so on; the user's basic data include, but are not limited to, the data records listed in items 6, 7, 8, 9, 10 and 11).
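The per-user table of step S17 can be sketched with a relational store. SQLite and this schema are illustrative assumptions; the text only says the records are kept in a big data table in the cloud, keyed by user ID.

```python
import sqlite3

def create_user_table(conn, user_id):
    """Create (if needed) a record store keyed by user ID and log the
    creation event. SQLite and this schema are illustrative assumptions;
    the patent only says records are kept in a big data table in the cloud."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS user_records (
            user_id TEXT NOT NULL,            -- identification value of the user
            ts      TEXT DEFAULT CURRENT_TIMESTAMP,
            kind    TEXT NOT NULL,            -- behavior / instruction / interaction
            detail  TEXT
        )""")
    conn.execute(
        "INSERT INTO user_records (user_id, kind, detail) VALUES (?, ?, ?)",
        (user_id, "created", ""),
    )
    conn.commit()
```

Later steps (instruction history, execution records, follow-up interactions) would append further rows under the same user ID, which is what makes the scene history analysis possible.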
And S18, decomposing the voice sent by the user through the AI family intelligent interactive system to obtain the pre-execution operation of the user or recommend the best interactive scene of the user.
Such as: user A and user B issue the instruction [what shall we do today] to the camera. The AI home intelligent interaction system analyzes whether users A and B have appeared in front of the television together before. If they have, the system brings up interactive memories of things they have done together and gives current opinions and recommendations according to the present home scene. These opinions and recommendations are diverse and may draw on application data in the television (such as watching TV, playing games or cooking), shopping data (new-product recommendations, shopping discounts), travel data (trip recommendations) and other operational data. The recommendations are pre-judged from the users' behaviors and continuously learned and corrected from the users' interaction behaviors, so that the AI home intelligent interaction system comes ever closer to the users' habits and preferences.
Therefore, the present invention provides an AI voice-based intelligent interactive processing method and system that facilitate intelligent recognition and interactive recommendation, adding a better intelligent interaction function to the smart television and making it more convenient for users.
As shown in fig. 2, based on the above AI voice-based intelligent interactive processing method, the present invention further provides an AI voice-based intelligent interactive processing system, which may be a smart television, a desktop computer, a notebook computer, a palmtop computer, or another intelligent device with a smart speaker. The AI voice-based intelligent interactive processing system comprises a processor 10, a memory 20 and a display screen 30, where the processor 10 is connected with the memory 20 through a communication bus 50, and the display screen 30 is connected with the processor 10 through the communication bus 50. FIG. 2 shows only some of the components of the AI voice-based intelligent interactive processing system, but it is to be understood that not all of the shown components are required, and more or fewer components may alternatively be implemented.
The memory 20 may be an internal storage unit of the AI voice-based intelligent interactive processing system in some embodiments, for example, a memory of the AI voice-based intelligent interactive processing system. The memory 20 may also be an external storage device of the AI voice-based intelligent interactive processing system in other embodiments, such as a plug-in USB flash drive, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the AI voice-based intelligent interactive processing system. Further, the memory 20 may include both an internal storage unit and an external storage device of the AI voice-based intelligent interactive processing system. The memory 20 is used for storing application software installed in the AI voice-based intelligent interactive processing system and various types of data, such as the program code of the AI voice-based intelligent interactive processing system. The memory 20 may also be used to temporarily store data that has been output or is to be output. In an embodiment, the memory 20 stores an AI voice-based intelligent interactive processing method program 40, and the AI voice-based intelligent interactive processing method program 40 can be executed by the processor 10 to implement the AI voice-based intelligent interactive processing method of the present application.
The processor 10 may be, in some embodiments, a Central Processing Unit (CPU), a microprocessor, a mobile phone baseband processor or other data Processing chip, and is configured to run program codes stored in the memory 20 or process data, such as executing the AI voice-based intelligent interactive Processing method.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like in some embodiments. The display screen 30 is used for displaying information in the AI voice-based intelligent interactive processing system and for displaying a visual user interface.
In one embodiment, when the processor 10 executes the AI voice-based intelligent interactive processing method program 40 in the memory 20, the following steps are implemented:
A. the method comprises the following steps that an intelligent camera with far-field voice module voiceprint recognition is connected and arranged on the intelligent television in advance and used for interacting with the intelligent television through the far-field voice module of the intelligent camera;
B. the intelligent camera shoots and acquires the voice image information of the user in real time, and analyzes and processes the voice image information of the user by utilizing an AI family intelligent interaction scene database which is constructed in advance and corresponds to the user behavior data;
C. and the smart television pre-judges the behavior habits of the user and performs corresponding interactive response according to the analysis processing result, which is specifically described above.
When the processor executes the AI-voice-based intelligent interactive processing program, the following steps are also realized:
a1, an AI family intelligent interaction scene database corresponding to the user behavior data is pre-constructed;
when the intelligent television is started, the intelligent camera is in a working state;
the intelligent camera shoots and acquires voice image information of a user in real time, intercepts the speaking voice of the user and records the speaking voice of the user for AI family intelligent interactive processing;
the AI family intelligent interaction processing utilizes an AI family intelligent interaction scene database which is constructed in advance and corresponds to the user behavior data to analyze and process the voice image information of the user;
and prejudging according to the behavior habits of the users, and continuously learning and correcting according to the interactive behaviors of the users.
When the processor executes the AI-voice-based intelligent interactive processing program, the following steps are also realized:
performing semantic recognition and scene construction of voice commands;
performing voiceprint attribute analysis, voiceprint emotion characteristic analysis, face recognition analysis, user family scene analysis, user emotion analysis and scene history analysis of the current user;
intelligently creating user system big data, and analyzing and processing a voice instruction of a user by constructing an AI home intelligent interaction scene;
performing semantic recognition of voice instruction decomposition: analyzing whether the speaking of the user belongs to an instruction class or a scene construction class;
the step of performing voiceprint attribute analysis of the current user comprises:
performing voiceprint attribute identification of the current user: which voiceprint users have appeared at the same time;
the voiceprint emotional characteristic analysis comprises the following steps: what the scene of the voiceprint appears, what the voiceprint scene of each person is, what the comprehensive scene is;
the face recognition analysis comprises: who and who appeared at the same time, what the expression was, what the time was;
the user family scene analysis is analyzed according to a preset template through the view finding of the intelligent camera;
the emotion analysis of the user is carried out through voiceprints, voiceprint emotion characteristics, human face expressions and scenes;
the scene history analysis includes: which voiceprint scene combinations have occurred with what processing event, when, what interaction the user has performed after the occurrence, for predicting the next action of the user through historical data analysis, and outputting some preprocessing;
the smart television creates an attribute record of the user according to the analysis processing result, takes the ID, the voiceprint attribute and the face attribute of the user as the identification value of the user, and locates the user according to any one of the three attributes;
when an unfamiliar voiceprint or face is detected, an attribute record for the user is created by default, and the voiceprint attribute corresponding to that user is added intelligently through subsequent interaction; conversely, if the user ID was first recorded with the voiceprint attribute, the user's face attribute is added intelligently through subsequent interaction;
after the user is successfully created, a big data table based on the user ID is automatically created, and the data table records the user's various behavior records and interaction records;
pre-judging according to the behavior habits of the users, and continuously learning and correcting according to the interaction behaviors of the users;
after performing AI home intelligent interactive decomposition on the voice image information of the user, obtaining a pre-execution operation of the user, or recommending the best interactive scene of the user and performing corresponding prompting, as described above.
Based on the foregoing embodiments, the present invention further provides a computer-readable storage medium, where one or more programs are stored, and the one or more programs are executable by one or more processors to implement the steps in the AI voice-based intelligent interactive processing method according to any item above, specifically as described above.
In summary, in the AI voice-based intelligent interactive processing method, system and storage medium provided by the present invention, an intelligent camera with a far-field voice module and voiceprint recognition is mounted on the smart television, and the user interacts with the television through the camera's far-field voice module. Every sentence of the user's voice interaction is analyzed and processed by the AI home intelligent interaction system. The analyzed content includes: semantic recognition of voice commands (voice command decomposition, separating commands into a definite instruction class and a scene construction class; new classifications can be added as the analysis system improves); voiceprint attributes of the current user (voiceprint recognition of gender and age, voiceprint emotional characteristics such as excited, sad or calm, face recognition of user attributes and expression attributes, and user system association); user home scene analysis (one person, multiple persons, personnel combinations, and home scenes such as party, dinner or leisure, analyzed from the intelligent camera's framing according to preset templates); emotion analysis of the user (combining voiceprint, voiceprint emotional characteristics, facial expression and scene); and scene history analysis (which voiceprint-scene combinations have occurred, what events happened, and what interaction the user performed afterwards; historical data analysis is used to pre-judge the user's next behavior and produce some preprocessing output). User system big data (user ID, user attributes, user interaction records and user association records, i.e. interactions between users) are created intelligently, and the user's voice command is further analyzed by constructing an AI home intelligent interaction scene, thereby
improving the scene construction and emotional interaction capabilities of AI voice. All of the above data are stored in the cloud.
The invention provides a deep emotional interaction experience for smart home and AI voice intelligent interaction, enhances the experience and appeal of the product, improves the intelligent experience of the television-centered household smart home, and provides a companionship-style home experience. The invention adds a better intelligent interaction function to the smart television and makes it more convenient for users.
Of course, it will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware (such as a processor, a controller, etc.), and the program may be stored in a computer readable storage medium, and when executed, the program may include the processes of the above method embodiments. The storage medium may be a memory, a magnetic disk, an optical disk, etc.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (4)

1. An intelligent interactive processing method based on AI voice is characterized by comprising the following steps:
A. the method comprises the following steps that an intelligent camera with far-field voice module voiceprint recognition is connected and arranged on the intelligent television in advance and used for interacting with the intelligent television through the far-field voice module of the intelligent camera;
B. the intelligent camera shoots and acquires the voice image information of the user in real time, and analyzes and processes the voice image information of the user by utilizing an AI family intelligent interaction scene database which is constructed in advance and corresponds to the user behavior data;
the step B comprises the following steps:
when the intelligent television is started, the intelligent camera is in a working state;
the intelligent camera shoots and acquires voice image information of a user in real time, intercepts the speaking voice of the user and records the speaking voice of the user for AI family intelligent interactive processing;
the AI family intelligent interaction processing utilizes an AI family intelligent interaction scene database which is constructed in advance and corresponds to the user behavior data to analyze and process the voice image information of the user;
pre-judging according to the behavior habits of the users, and continuously learning and correcting according to the interaction behaviors of the users;
the step B of analyzing and processing the voice image information of the user by utilizing an AI family intelligent interactive scene database which is pre-constructed and corresponds to the user behavior data comprises the following steps:
performing semantic recognition and scene construction of voice commands;
performing voiceprint attribute analysis, voiceprint emotion characteristic analysis, face recognition analysis, user family scene analysis, user emotion analysis and scene history analysis of the current user;
intelligently creating user system big data, and analyzing and processing a voice instruction of a user by constructing an AI home intelligent interaction scene;
the step of performing semantic recognition and scene construction of the voice instruction comprises the following steps:
performing semantic recognition of voice instruction decomposition: analyzing whether the speaking of the user belongs to an instruction class or a scene construction class;
the step of performing voiceprint attribute analysis of the current user comprises:
performing voiceprint attribute identification of the current user: which voiceprint users have appeared at the same time;
the voiceprint emotional characteristic analysis comprises the following steps: what the scene of the voiceprint appears, what the voiceprint scene of each person is, what the comprehensive scene is;
the face recognition analysis comprises: who and who appeared at the same time, what the expression was, what the time was;
the user family scene analysis is analyzed according to a preset template through the view finding of the intelligent camera;
the emotion analysis of the user is carried out through voiceprints, voiceprint emotion characteristics, human face expressions and scenes;
the scene history analysis includes: which voiceprint scene combinations have occurred with what processing event, when, what interaction the user has performed after the occurrence, for predicting the next action of the user through historical data analysis, and outputting some preprocessing;
C. the intelligent television pre-judges the behavior habits of the user and carries out corresponding interactive response according to the analysis and processing result;
the step C comprises the following steps:
the smart television creates an attribute record of the user according to the analysis processing result, takes the ID, the voiceprint attribute and the face attribute of the user as the identification value of the user, and locates the user according to any one of the three attributes;
when an unfamiliar voiceprint or face is detected, an attribute record for the user is created by default, and the voiceprint attribute corresponding to that user is added intelligently through subsequent interaction; conversely, if the user ID was first recorded with the voiceprint attribute, the user's face attribute is added intelligently through subsequent interaction;
after the user is successfully created, a big data table based on the user ID is automatically created, and the data table records the user's various behavior records and interaction records;
pre-judging according to the behavior habits of the users, and continuously learning and correcting according to the interaction behaviors of the users;
and after carrying out AI family intelligent interactive decomposition on the voice image information of the user, obtaining the pre-execution operation of the user, or recommending the best interactive scene of the user and carrying out corresponding prompt.
2. The AI voice-based intelligent interactive processing method according to claim 1, wherein the step a further comprises: and A1, constructing an AI family intelligent interaction scene database corresponding to the user behavior data in advance.
3. An intelligent interactive processing system based on AI speech, comprising:
a processor, a memory, and a communication bus;
the memory stores an AI voice based intelligent interactive processing program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
when the processor executes the AI voice-based intelligent interactive processing program, the following steps are realized:
A. the method comprises the following steps that an intelligent camera with far-field voice module voiceprint recognition is connected and arranged on the intelligent television in advance and used for interacting with the intelligent television through the far-field voice module of the intelligent camera;
B. the intelligent camera shoots and acquires the voice image information of the user in real time, and analyzes and processes the voice image information of the user by utilizing an AI family intelligent interaction scene database which is constructed in advance and corresponds to the user behavior data;
C. the intelligent television pre-judges the behavior habits of the user and carries out corresponding interactive response according to the analysis and processing result;
when the processor executes the AI voice-based intelligent interaction processing program, the following steps are also realized:
a1, an AI family intelligent interaction scene database corresponding to the user behavior data is pre-constructed;
when the intelligent television is started, the intelligent camera is in a working state;
the intelligent camera shoots and acquires voice image information of a user in real time, intercepts the speaking voice of the user and records the speaking voice of the user for AI family intelligent interactive processing;
the AI family intelligent interaction processing utilizes an AI family intelligent interaction scene database which is constructed in advance and corresponds to the user behavior data to analyze and process the voice image information of the user;
pre-judging according to the behavior habits of the users, and continuously learning and correcting according to the interaction behaviors of the users;
when the processor executes the AI voice-based intelligent interaction processing program, the following steps are also realized:
performing semantic recognition and scene construction of voice commands;
performing voiceprint attribute analysis, voiceprint emotion characteristic analysis, face recognition analysis, user family scene analysis, user emotion analysis and scene history analysis of the current user;
intelligently creating user system big data, and analyzing and processing a voice instruction of a user by constructing an AI home intelligent interaction scene;
performing semantic recognition of voice instruction decomposition: analyzing whether the speaking of the user belongs to an instruction class or a scene construction class;
the step of performing voiceprint attribute analysis of the current user comprises:
performing voiceprint attribute identification of the current user: which voiceprint users have appeared at the same time;
the voiceprint emotional characteristic analysis comprises the following steps: what the scene of the voiceprint appears, what the voiceprint scene of each person is, what the comprehensive scene is;
the face recognition analysis comprises: who and who appeared at the same time, what the expression was, what the time was;
the user family scene analysis is analyzed according to a preset template through the view finding of the intelligent camera;
the emotion analysis of the user is carried out through voiceprints, voiceprint emotion characteristics, human face expressions and scenes;
the scene history analysis includes: which voiceprint scene combinations have occurred with what processing event, when, what interaction the user has performed after the occurrence, for predicting the next action of the user through historical data analysis, and outputting some preprocessing;
the smart television creates an attribute record for the user according to the analysis result, takes the user's ID, voiceprint attribute and face attribute as the user's identification values, and locates the user by any one of these three attributes;
when an unfamiliar voiceprint or face is detected, a user attribute record is created by default, and the voiceprint attribute of the user corresponding to that voiceprint is supplemented intelligently through subsequent interactions; if the user ID was first recorded with the voiceprint attribute, the user's face attribute is supplemented intelligently through subsequent interactions;
after a user is successfully created, a big data table based on the user ID is automatically created, wherein the data table records the user's various behavior records and interaction records;
pre-judging is performed according to the user's behavior habits, with continuous learning and correction according to the user's interaction behaviors;
and after performing AI home intelligent interaction decomposition of the user's voice and image information, the user's pre-execution operation is obtained, or the user's best interaction scene is recommended with a corresponding prompt.
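The user attribute record described in the claim (one record per user, identifiable by any of ID, voiceprint attribute, or face attribute, with a default record created on an unfamiliar voiceprint or face and the missing attribute supplemented by later interactions) can be sketched as follows. This is an illustrative sketch only: all class, method and field names (`UserRecord`, `UserRegistry`, `observe`, `locate`) are assumptions, not identifiers from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class UserRecord:
    # The three identification values named in the claim; any one locates the user.
    user_id: str
    voiceprint: Optional[str] = None   # voiceprint attribute (e.g. an embedding key)
    face: Optional[str] = None         # face attribute
    history: List[dict] = field(default_factory=list)  # per-user "big data table"

class UserRegistry:
    def __init__(self) -> None:
        self._users: Dict[str, UserRecord] = {}
        self._by_voiceprint: Dict[str, str] = {}
        self._by_face: Dict[str, str] = {}

    def locate(self, user_id: Optional[str] = None,
               voiceprint: Optional[str] = None,
               face: Optional[str] = None) -> Optional[UserRecord]:
        """Locate a user by any one of the three identification values."""
        if user_id in self._users:
            return self._users[user_id]
        if voiceprint in self._by_voiceprint:
            return self._users[self._by_voiceprint[voiceprint]]
        if face in self._by_face:
            return self._users[self._by_face[face]]
        return None

    def observe(self, voiceprint: Optional[str] = None,
                face: Optional[str] = None) -> UserRecord:
        """On an unfamiliar voiceprint or face, create a default record;
        attributes missing at creation are supplemented by later observations."""
        user = self.locate(voiceprint=voiceprint, face=face)
        if user is None:
            user = UserRecord(user_id=f"user-{len(self._users) + 1}")
            self._users[user.user_id] = user
        # Supplement whichever identification value this interaction provides.
        if voiceprint and user.voiceprint is None:
            user.voiceprint = voiceprint
            self._by_voiceprint[voiceprint] = user.user_id
        if face and user.face is None:
            user.face = face
            self._by_face[face] = user.user_id
        return user
```

For example, a first observation carrying only a voiceprint creates a default record; a later observation carrying the same voiceprint plus a face resolves to the same record and fills in the face attribute, matching the supplement-through-subsequent-interaction behavior the claim describes.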
4. A storage medium, characterized in that the computer-readable storage medium stores one or more programs which are executable by one or more processors to implement the steps of the AI voice-based intelligent interactive processing method according to any one of claims 1-2.
CN201910239885.3A 2019-03-27 2019-03-27 AI voice-based intelligent interactive processing method, system and storage medium Active CN110113646B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910239885.3A CN110113646B (en) 2019-03-27 2019-03-27 AI voice-based intelligent interactive processing method, system and storage medium

Publications (2)

Publication Number Publication Date
CN110113646A CN110113646A (en) 2019-08-09
CN110113646B true CN110113646B (en) 2021-09-21

Family

ID=67484676

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910239885.3A Active CN110113646B (en) 2019-03-27 2019-03-27 AI voice-based intelligent interactive processing method, system and storage medium

Country Status (1)

Country Link
CN (1) CN110113646B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750773B (en) * 2019-09-16 2023-08-18 康佳集团股份有限公司 Image recognition method based on voiceprint attribute, intelligent terminal and storage medium
CN110931011A (en) * 2020-01-07 2020-03-27 杭州凯旗科技有限公司 AI intelligent voice interaction method applied to intelligent retail equipment
CN111326158A (en) * 2020-01-23 2020-06-23 深圳市安顺康医疗电子有限公司 Voice control method based on intelligent terminal
CN111324202A (en) * 2020-02-19 2020-06-23 中国第一汽车股份有限公司 Interaction method, device, equipment and storage medium
CN111901672A (en) * 2020-06-12 2020-11-06 深圳市京华信息技术有限公司 Artificial intelligence image processing method
CN111967380A (en) * 2020-08-16 2020-11-20 云知声智能科技股份有限公司 Content recommendation method and system
CN112203144A (en) * 2020-10-12 2021-01-08 广州欢网科技有限责任公司 Intelligent television program recommendation method and device and intelligent television
CN112261289B (en) * 2020-10-16 2022-08-26 海信视像科技股份有限公司 Display device and AI algorithm result acquisition method
CN112383748B (en) * 2020-11-02 2023-05-02 中国联合网络通信集团有限公司 Video information storage method and device
CN112397061B (en) * 2020-11-04 2023-10-27 中国平安人寿保险股份有限公司 Online interaction method, device, equipment and storage medium
CN112651334B (en) * 2020-12-25 2023-05-23 三星电子(中国)研发中心 Robot video interaction method and system
CN115689810B (en) * 2023-01-04 2023-04-04 深圳市人马互动科技有限公司 Data processing method based on man-machine conversation and related device
CN116453549A (en) * 2023-05-05 2023-07-18 广西牧哲科技有限公司 AI dialogue method based on virtual digital character and online virtual digital system
CN116913277B (en) * 2023-09-06 2023-11-21 北京惠朗时代科技有限公司 Voice interaction service system based on artificial intelligence

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104038836A (en) * 2014-06-03 2014-09-10 四川长虹电器股份有限公司 Television program intelligent pushing method
CN106682090A (en) * 2016-11-29 2017-05-17 上海智臻智能网络科技股份有限公司 Active interaction implementing device, active interaction implementing method and intelligent voice interaction equipment

Also Published As

Publication number Publication date
CN110113646A (en) 2019-08-09

Similar Documents

Publication Publication Date Title
CN110113646B (en) AI voice-based intelligent interactive processing method, system and storage medium
US10762299B1 (en) Conversational understanding
CN107632706B (en) Application data processing method and system of multi-modal virtual human
US10360265B1 (en) Using a voice communications device to answer unstructured questions
CN107481720B (en) Explicit voiceprint recognition method and device
US11132547B2 (en) Emotion recognition-based artwork recommendation method and device, medium, and electronic apparatus
CN107704169B (en) Virtual human state management method and system
CN112997171A (en) Analyzing web pages to facilitate automated navigation
CN108353103A (en) Subscriber terminal equipment and its method for recommendation response message
US20130143185A1 (en) Determining user emotional state
US10719695B2 (en) Method for pushing picture, mobile terminal, and storage medium
WO2021056837A1 (en) Customization platform and method for service quality evaluation product
US11392213B2 (en) Selective detection of visual cues for automated assistants
US20220234593A1 (en) Interaction method and apparatus for intelligent cockpit, device, and medium
US20180272240A1 (en) Modular interaction device for toys and other devices
Lopatovska et al. User recommendations for intelligent personal assistants
WO2017157174A1 (en) Information processing method, device, and terminal device
CN111797249A (en) Content pushing method, device and equipment
CN112867985A (en) Determining whether to automatically resume a first automated assistant session after interrupting suspension of a second session
CN111797304A (en) Content pushing method, device and equipment
CN109325173B (en) Reading content personalized recommendation method and system based on AI open platform
KR20220155601A (en) Voice-based selection of augmented reality content for detected objects
CN113703585A (en) Interaction method, interaction device, electronic equipment and storage medium
US20230376328A1 (en) Personalized user interface
CN112135170A (en) Display device, server and video recommendation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant