CN111209376A - AI digital robot operation method - Google Patents

AI digital robot operation method

Info

Publication number
CN111209376A
CN111209376A
Authority
CN
China
Prior art keywords
information
digital robot
digital
server
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010038388.XA
Other languages
Chinese (zh)
Inventor
石子星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Maye Technology Co Ltd
Original Assignee
Chengdu Maye Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Maye Technology Co Ltd filed Critical Chengdu Maye Technology Co Ltd
Priority to CN202010038388.XA priority Critical patent/CN111209376A/en
Publication of CN111209376A publication Critical patent/CN111209376A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N 3/008 Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Robotics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Remote Sensing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses an AI digital robot operation method. Portrait and voice information are acquired; the AI digital robot converts the acquired sound signal and sends the conversion result to a server; the server extracts the effective information from the conversion result and analyzes it. If the analysis result contains no address-related information, dialogue chat is executed; if it does contain address-related information, subway line guidance is executed. According to the information returned by the dialogue chat or/and the subway line guidance, the AI digital robot's background program compares it against the information stored in the database, calls the corresponding voice and action information, and hands it to the digital character, which presents the corresponding voice and action. After the voice and action have been presented, the method returns to information acquisition and re-assesses the environment.

Description

AI digital robot operation method
Technical Field
The invention relates to the field of AI technology and related fields, and in particular to an AI digital robot operation method.
Background
Artificial Intelligence, abbreviated in English as AI, is a new technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.
Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in ways similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing and expert systems, among others. Since the birth of artificial intelligence, its theories and technologies have matured steadily and its fields of application have kept expanding; it can be expected that the technological products artificial intelligence brings in the future will be "containers" of human intelligence. Artificial intelligence can simulate the information processes of human consciousness and thinking. Artificial intelligence is not human intelligence, but it can think like a human and may even exceed human intelligence.
Artificial intelligence is a challenging science, and those who work in it must understand computer science, psychology and philosophy. It is a very broad discipline composed of many fields such as machine learning and computer vision; in general, one of the main goals of artificial intelligence research is to enable machines to handle complex tasks that would normally require human intelligence. What counts as such "complex work", however, differs from era to era and from person to person.
Disclosure of Invention
The invention aims to provide an AI digital robot operation method in which the AI digital robot takes over the work of service personnel such as front-desk (reception), customer-service and consulting staff, thereby effectively saving labor cost.
The invention is realized by the following technical scheme: an AI digital robot operation method comprises the following specific steps:
1) acquiring portrait information and voice information;
2) the AI digital robot converts the acquired sound signals and sends the conversion result to a server;
3) the server extracts effective information from the conversion result and analyzes the effective information, if the analysis result does not contain address related information, the step 4) is executed, and if the analysis result contains the address related information, the step 5) is executed;
4) dialogue chat: the server hands the information to a language processing platform for semantic analysis, corresponding interaction is carried out according to the semantic-analysis result, and the interaction information is returned to the server and the AI digital robot;
5) subway line guidance: the server hands the information to a language processing platform for semantic analysis, line query and planning are carried out according to the semantic-analysis result, and the query and planning result is then returned to the AI digital robot;
6) according to the information returned in step 4) or/and step 5), the AI digital robot compares it, through its background program, against the information stored in the database, calls the corresponding voice information and action information, and hands them to the digital character, which presents the corresponding voice and action;
7) after the voice and action have been presented, the method returns to step 1) to re-assess the environment; an illustrative sketch of this overall loop follows the steps.
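Purely as an illustration of steps 1) to 7), the overall loop can be sketched as follows; every helper passed in (capture_voice, transcribe, parse_semantics, chat_reply, plan_subway_route, render) is a hypothetical placeholder for the corresponding module described above, not an interface disclosed by this application.

```python
from typing import Callable, Optional


def run_robot_loop(
    capture_voice: Callable[[], Optional[bytes]],
    transcribe: Callable[[bytes], str],
    parse_semantics: Callable[[str], dict],
    chat_reply: Callable[[dict], dict],
    plan_subway_route: Callable[[dict], dict],
    render: Callable[[dict], None],
) -> None:
    """Steps 1)-7): one pass of the loop per user interaction."""
    while True:
        audio = capture_voice()                 # step 1): microphone input, None if silent
        if audio is None:
            continue                            # no active sound: stay in the unmanned state
        text = transcribe(audio)                # step 2): speech-to-text result sent to the server
        parsed = parse_semantics(text)          # step 3): server-side semantic analysis
        if parsed.get("has_address"):
            reply = plan_subway_route(parsed)   # step 5): subway line guidance
        else:
            reply = chat_reply(parsed)          # step 4): dialogue chat
        render(reply)                           # step 6): matched voice and action are played
        # step 7): loop back to step 1) and re-assess the environment
```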
In order to further realize the invention, the following arrangement mode is adopted: the step 1) comprises the following steps:
1.1) portrait information acquisition:
1.1.1) after a camera on the AI digital robot captures a portrait picture, the captured portrait picture is passed to the background program of the AI digital robot;
1.1.2) the background program of the AI digital robot continuously tracks the captured portrait through a trained face key-point detector and a face recognition model;
1.1.3) the background program of the AI digital robot maps the real-world position of the captured person to the corresponding relative position for the virtual character, and aims the virtual character's line of sight at the captured person, so that the virtual character appears to watch the pedestrian (a dlib-based sketch of these sub-steps is given after this list);
1.2) sound information collection:
1.2.1) a microphone on the AI digital robot receives external sound;
1.2.2) capturing voice instruction information of a user from external sound;
1.2.3) then the captured voice command information is handed to the background program of the AI digital robot.
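The face-capture sub-steps 1.1.1) to 1.1.3) could, for example, be realized with dlib roughly as sketched below (the following embodiment names dlib explicitly); the landmark-model file name and the gaze mapping are assumptions made only for this sketch.

```python
# Minimal sketch of steps 1.1.1)-1.1.3): grab a frame, detect the nearest face
# with dlib, extract its key points, and derive a normalized head position for
# aiming the virtual character's gaze. The model path is an assumed file name.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file


def track_face_once(cap: cv2.VideoCapture):
    ok, frame = cap.read()                            # 1.1.1) one frame from the camera
    if not ok:
        return None
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)                         # 1.1.2) detect faces in the frame
    if not faces:
        return None
    face = max(faces, key=lambda r: r.width() * r.height())  # largest face = nearest person
    shape = predictor(gray, face)                     # 68 facial key points for tracking
    # 1.1.3) normalized head position (0..1) used to aim the avatar's line of sight
    cx = (face.left() + face.right()) / 2 / frame.shape[1]
    cy = (face.top() + face.bottom()) / 2 / frame.shape[0]
    return shape, (cx, cy)

# usage: cap = cv2.VideoCapture(0); result = track_face_once(cap)
```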
In order to further realize the invention, the following arrangement mode is adopted: the portrait picture adopts 1080P resolution, and the background program of the AI digital robot adopts a background program based on dlib.
In order to further realize the invention, the following arrangement mode is adopted: the microphone monitors external sound and compares it with the ambient noise; when the sound goes beyond a certain range an excitation function is activated, and when the excitation confidence reaches a certain value it is judged that active sound is present, after which capture of the voice information instruction begins.
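One possible reading of this excitation mechanism is a simple energy-based voice-activity check against a running noise floor, as sketched below; the thresholds, smoothing factor and hit count are illustrative assumptions, not values taken from the application.

```python
# Sketch of the "excitation" check: frame energy is compared against a tracked
# ambient-noise floor, and active speech is declared only after the confidence
# (consecutive hits) reaches a threshold. All numeric values are illustrative.
import numpy as np


def is_active_speech(frames, noise_floor=1e-4, ratio=4.0, min_hits=5, alpha=0.95):
    hits = 0
    for frame in frames:                          # frame: 1-D float array of samples
        energy = float(np.mean(frame ** 2))       # short-term energy of this frame
        if energy > ratio * noise_floor:          # excitation fires above the noise range
            hits += 1                             # accumulate excitation confidence
            if hits >= min_hits:                  # confidence reached: active sound present
                return True
        else:
            hits = 0
            noise_floor = alpha * noise_floor + (1 - alpha) * energy  # track ambient noise
    return False                                  # no active sound: remain silent / unmanned
```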
In order to further realize the invention, the following arrangement mode is adopted: when the AI digital robot cannot capture a voice information instruction, the situation is judged to be silent and the robot remains in the unmanned state.
In order to further realize the invention, the following arrangement mode is adopted: step 2) is specifically as follows: through its background program, the AI digital robot calls the iFLYTEK (Xunfei) speech-recognition API to convert the collected sound signal, receives the conversion result, and then sends the conversion result to the server through the background program.
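A minimal sketch of this step is shown below; transcribe_with_asr() stands in for the actual iFLYTEK speech-recognition call, and SERVER_URL is a placeholder endpoint, both assumptions made only for illustration.

```python
# Sketch of step 2): convert the recorded audio to text via the speech-recognition
# service and forward the transcript to the server for semantic analysis.
import requests

SERVER_URL = "http://example-robot-server/api/utterance"  # placeholder endpoint


def forward_transcript(audio_bytes: bytes, transcribe_with_asr) -> dict:
    text = transcribe_with_asr(audio_bytes)                 # ASR: sound signal -> text string
    resp = requests.post(SERVER_URL, json={"text": text}, timeout=5)
    resp.raise_for_status()
    return resp.json()                                      # server's processed result
```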
In order to further realize the invention, the following arrangement mode is adopted: the server extracting and analyzing the effective information from the conversion result is specifically as follows: after the server obtains the valid processing result returned by the iFLYTEK API, it extracts from it the character string of the voice information entered by the user and hands the string to the Baidu UNIT language processing platform to analyze the user's semantics.
In order to further realize the invention, the following arrangement mode is adopted: when the Baidu UNIT language processing platform analyzes the user's semantics and the parsed incoming text contains entries related to address queries, such as "how do I get to ...", the platform returns an answer containing address-related information; otherwise it returns a question-and-answer chat reply.
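The branching between dialogue chat (step 4) and subway line guidance (step 5) could be expressed as below; the intent labels and the shape of the parsed result are assumed examples, not the actual Baidu UNIT response schema.

```python
# Sketch of the server-side routing decision in step 3): send address-related
# queries to the guidance flow, everything else to the chat flow. The "intent"
# and "slots" keys are an assumed shape for the semantic-analysis result.
def route_parse_result(parsed: dict) -> str:
    address_intents = {"ROUTE_QUERY", "NAVIGATE"}   # illustrative intent labels
    has_destination = bool(parsed.get("slots", {}).get("destination"))
    if parsed.get("intent") in address_intents and has_destination:
        return "subway_guidance"                    # answer contains address-related info
    return "chat"                                   # otherwise: question-and-answer chat
```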
In order to further realize the invention, the following arrangement mode is adopted: the step 4) is specifically as follows:
4.1) analyzing user semantics by a Baidu UNIT language processing platform and judging the user semantics as dialogue chatting;
4.2) after step 4.1), the dialogue content is segmented into phrases according to its semantics;
4.3) the phrases obtained in step 4.2) are handed to a semantic analysis module trained on a large amount of data for analysis;
4.4) after step 4.3), the most suitable answer for the current conversation is returned to the server and the AI digital robot; a sketch of this answer selection is given after this list.
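A rough sketch of this selection is given below; segment() and score_answer() are placeholders for the platform's trained segmentation and semantic-analysis components, which the application does not specify.

```python
# Sketch of steps 4.2)-4.4): segment the utterance into phrases, score a set of
# candidate answers with a trained model, and return the best-scoring answer.
def best_chat_answer(utterance: str, candidates: list[str], segment, score_answer) -> str:
    if not candidates:
        return ""
    phrases = segment(utterance)                                        # 4.2) cut into phrases
    scored = [(score_answer(phrases, ans), ans) for ans in candidates]  # 4.3) analyse candidates
    return max(scored)[1]                                               # 4.4) most suitable answer
```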
In order to further realize the invention, the following arrangement mode is adopted: the step 5) is specifically as follows:
5.1) the Baidu UNIT language processing platform analyzes the user semantics and judges the user semantics as route guidance;
5.2) the conversation content is broken into sentences, each phrase is compared, the phrase that best matches the address information is extracted, and the phrase is returned to the background program;
5.3) the server starts the route-query process;
5.4) the server hands the destination entry returned by the Baidu UNIT language processing platform to the AMap (Gaode Map) API;
5.5) the AMap (Gaode Map) API computes at least one optimal route and returns it to the server;
5.6) after step 5.5), the server extracts the subway-line-related information from the returned result, sorts and packages it as the return result, and sends it to the AI digital robot; a sketch of this query flow is given after this list.
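Steps 5.3) to 5.6) could look roughly like the sketch below; the AMap (Gaode Map) web-service endpoints, parameters and response fields used here are assumptions recalled from that service's public documentation, not details disclosed in this application.

```python
# Sketch of the guidance flow: geocode the destination, request a transit route,
# and keep only the rail/subway segments for the robot to present.
import requests

AMAP_KEY = "YOUR_AMAP_KEY"  # placeholder credential


def subway_guidance(destination: str, origin_lnglat: str, city: str) -> list[dict]:
    geo = requests.get(
        "https://restapi.amap.com/v3/geocode/geo",               # assumed geocoding endpoint
        params={"key": AMAP_KEY, "address": destination, "city": city},
        timeout=5,
    ).json()
    dest_lnglat = geo["geocodes"][0]["location"]                  # 5.4) destination -> coordinates
    route = requests.get(
        "https://restapi.amap.com/v3/direction/transit/integrated",  # assumed transit endpoint
        params={"key": AMAP_KEY, "origin": origin_lnglat,
                "destination": dest_lnglat, "city": city},
        timeout=5,
    ).json()
    segments = route["route"]["transits"][0]["segments"]          # 5.5) first (best) route
    # 5.6) keep the legs that include rail/subway lines and package them for the robot
    return [seg["bus"] for seg in segments if seg.get("bus", {}).get("buslines")]
```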
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The invention provides service through a vivid, lifelike character image, which makes it friendlier and easier to approach in practical applications.
(2) In practical applications, voice interaction can be adopted, making communication between people and machines more convenient, especially for people with limited mobility and for the elderly.
(3) The AI digital robot can proactively greet customers and guide them.
(4) In many scenarios, simple posts with highly repetitive work and long working hours have already been replaced by machines, yet the work of physical service posts such as front-desk, customer-service and consulting staff is still completed manually.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples, but the embodiments of the present invention are not limited thereto.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments will be described clearly and completely with reference to the accompanying drawings; it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person skilled in the art without any inventive step on the basis of the embodiments of the present invention fall within the scope of the present invention. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention.
It is worth noting that, where this application needs to rely on technology that is well known or conventional in the field, the applicant may not have described that well-known or/and conventional technology specifically in the text; the fact that such technical means are not specifically set out herein should not be taken to mean that this application fails to comply with Article 26 of the Patent Law.
Example 1:
the invention provides an AI digital robot operation method, which specifically adopts the following arrangement and comprises the following steps:
1) acquiring portrait information and voice information;
2) the AI digital robot converts the acquired sound signals and sends the conversion result to a server;
3) the server extracts effective information from the conversion result and analyzes the effective information, if the analysis result does not contain address related information, the step 4) is executed, and if the analysis result contains the address related information, the step 5) is executed;
4) dialogue chat: the server hands the information to a language processing platform for semantic analysis, corresponding interaction is carried out according to the semantic-analysis result, and the interaction information is returned to the server and the AI digital robot;
5) subway line guidance: the server hands the information to a language processing platform for semantic analysis, line query and planning are carried out according to the semantic-analysis result, and the query and planning result is then returned to the AI digital robot;
6) according to the information returned in step 4) or/and step 5), the AI digital robot compares it, through its background program, against the information stored in the database, calls the corresponding voice information and action information, and hands them to the digital character, which presents the corresponding voice and action;
7) after the voice and action have been presented, the method returns to step 1) to re-assess the environment.
Example 2:
the present embodiment is further optimized based on the above embodiment, and the same parts as those in the foregoing technical solution will not be described herein again, and further to better implement the present invention, the following setting manner is particularly adopted: the step 1) comprises the following steps:
1.1) portrait information acquisition:
1.1.1) after a camera on the AI digital robot captures a portrait picture, transmitting the captured portrait picture to a background program of the AI digital robot;
1.1.2) a background program of the AI digital robot continuously tracks captured human images through a trained human face key point detector and a human face recognition model;
1.1.3) the background program of the AI digital robot maps the real-world position of the captured person to the corresponding relative position for the virtual character, and aims the virtual character's line of sight at the captured person, so that the virtual character appears to watch the pedestrian;
1.2) sound information collection:
1.2.1) a microphone on the AI digital robot receives external sound;
1.2.2) capturing voice instruction information of a user from external sound;
1.2.3) then the captured voice command information is handed to the background program of the AI digital robot.
Example 3:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again, and in order to further better implement the present invention, the following setting modes are particularly adopted: the portrait picture adopts 1080P resolution, and the background program of the AI digital robot adopts a background program based on dlib.
Example 4:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again, and in order to further better implement the present invention, the following setting modes are particularly adopted: the method comprises the steps of monitoring external sound through a microphone, comparing the external sound with environmental noise, activating an excitation function beyond a certain range, judging that active sound exists when excitation confidence reaches a certain value, and then starting capturing of a voice information instruction.
Example 5:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again, and in order to further better implement the present invention, the following setting modes are particularly adopted: and when the AI digital robot cannot capture the voice information instruction, judging the AI digital robot to be silent and keeping an unmanned state.
Example 6:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again; in order to further better implement the present invention, the following setting mode is particularly adopted: step 2) is specifically as follows: through its background program, the AI digital robot calls the iFLYTEK (Xunfei) speech-recognition API to convert the collected sound signal, receives the conversion result, and then sends the conversion result to the server through the background program.
Example 7:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again; in order to further better implement the present invention, the following setting mode is particularly adopted: the server extracting and analyzing the effective information from the conversion result is specifically as follows: after the server obtains the valid processing result returned by the iFLYTEK API, it extracts from it the character string of the voice information entered by the user and hands the string to the Baidu UNIT language processing platform to analyze the user's semantics.
Example 8:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again; in order to further better implement the present invention, the following setting mode is particularly adopted: when the Baidu UNIT language processing platform analyzes the user's semantics and the parsed incoming text message contains entries related to address queries, such as "how do I get to ...", the platform returns an answer containing address-related information; otherwise it returns a question-and-answer chat reply.
Example 9:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again, and in order to further better implement the present invention, the following setting modes are particularly adopted: the step 4) is specifically as follows:
4.1) analyzing user semantics by a Baidu UNIT language processing platform and judging the user semantics as dialogue chatting;
4.2) after step 4.1), the dialogue content is segmented into phrases according to its semantics;
4.3) the phrases obtained in step 4.2) are handed to a semantic analysis module trained on a large amount of data for analysis;
4.4) after step 4.3), the most suitable answer for the current conversation is returned to the server and the AI digital robot.
Example 10:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again, and in order to further better implement the present invention, the following setting modes are particularly adopted: the step 5) is specifically as follows:
5.1) the Baidu UNIT language processing platform analyzes the user semantics and judges the user semantics as route guidance;
5.2) carrying out sentence breaking on the conversation content, comparing each phrase, extracting the phrase which most accords with the address information, and returning the phrase to the background program;
5.3) the server starts a route query process;
5.4) the server hands the destination entry returned by the Baidu UNIT language processing platform to the AMap (Gaode Map) API;
5.5) the AMap (Gaode Map) API computes at least one optimal route and returns it to the server;
5.6) after the step 5.5), the server extracts the related information of the subway line from the returned result, and the information is sorted and packaged as the returned result to be sent to the AI digital robot.
Example 11:
the embodiment is further optimized on the basis of any one of the above embodiments and provides an AI digital robot operation method. In both the unmanned state and the working state, the AI digital robot continuously sends and receives data-less heartbeat packets so as to keep the link between the server and the robot alive. The operation method comprises the following steps:
1) acquiring portrait information:
1.1) after a camera on the AI digital robot captures a 1080P portrait picture, transmitting the captured portrait picture to a dlib-based background program of the AI digital robot;
1.2) continuously tracking the captured portrait by a backstage program based on dlib of the AI digital robot through a trained face key point detector and a face recognition model;
1.3) the background program of the AI digital robot maps the real-world position of the captured person to the corresponding relative position for the virtual character, and aims the virtual character's line of sight at the captured person, so that the virtual character appears to watch the pedestrian;
2) sound information acquisition:
2.1) a microphone on the AI digital robot receives external sound;
2.2) the microphone monitors external sound and compares it with the ambient noise; when the external sound exceeds a certain range an excitation function is activated, and when the excitation confidence reaches a certain value it is judged that active sound is present; after the background program of the AI digital robot judges that active sound is present, capture of the user's voice instruction information from the external sound begins; the excitation function, the algorithm and the required threshold values can be modified according to the actual situation and optimized later on the basis of feedback from actual use;
2.3) then delivering the captured voice instruction information to a dlib-based background program of the AI digital robot; when the AI digital robot cannot capture the voice information instruction, judging the AI digital robot to be silent, and keeping an unmanned state; the step 1) and the step 2) can be carried out simultaneously or not simultaneously.
3) Through its background program, the AI digital robot calls the iFLYTEK (Xunfei) speech-recognition API (application programming interface) to convert the collected sound signal, receives the conversion result, sends the conversion result to the server through the background program, and proceeds to the next processing step, i.e. step 4);
4) after the server obtains the valid processing result returned by the iFLYTEK API, it extracts from it the character string of the voice information entered by the user and hands the string to the Baidu UNIT language processing platform to analyze the user's semantics; when the parsed incoming text message contains entries related to address queries, such as "how do I get to ...", the Baidu UNIT language processing platform returns an answer containing address-related information, otherwise it returns a question-and-answer chat reply; depending on the result returned by the Baidu UNIT language processing platform, step 5) is executed if the result contains no address-related information, and step 6) is executed if it does;
5) dialogue chatting:
5.1) the Baidu UNIT language processing platform analyzes the user semantics and judges the user semantics as dialogue chatting;
5.2) after step 5.1), the conversation content is segmented into sentences and broken into phrases according to its semantics;
5.3) the phrases obtained in step 5.2) are handed to a semantic analysis module trained on a large amount of data for analysis;
5.4) after step 5.3), the most suitable answer for the current conversation is returned to the server and the AI digital robot, and the method proceeds to step 7);
6) subway line guides:
6.1) the Baidu UNIT language processing platform analyzes the user semantics and judges the user semantics as route guidance;
6.2) carrying out sentence segmentation on the conversation content, comparing each phrase, extracting the phrase which best meets the address information, and returning the phrase to a background program based on dlib;
6.3) the server starts a route query process;
6.4) the server hands the destination entry returned by the Baidu UNIT language processing platform to the AMap (Gaode Map) API;
6.5) the AMap (Gaode Map) API computes at least one optimal route and returns it to the server;
6.6) after step 6.5), the server extracts the subway-line-related information from the returned result, sorts and packages it as the return result, and sends it to the AI digital robot, i.e. proceeds to step 7);
7) after the AI digital robot receives the results of the step 5) and/or the step 6), comparing the information stored in the database by a dlib-based background program of the AI digital robot according to different results, calling corresponding voice information and action information, delivering the corresponding voice and action information to a digital character to display the corresponding voice and action, and entering a step 8);
8) after the voice and action have been displayed (played), the method returns to step 1) and step 2) to re-assess the environment.
Example 12:
the embodiment is further optimized on the basis of any one of the above embodiments, and an AI digital robot operation method, as shown in fig. 1, includes the following steps:
and (3) face recognition, wherein a camera arranged on the AI digital robot can capture a picture at the front part of the AI digital robot in a standby state, an object closest to the face is captured from the picture for recognition, and the object similar to the captured object can be recognized as the face when the object is found to exist in a face recognition model through algorithm comparison.
And (3) detecting the person, wherein a camera arranged on the AI digital robot can capture the picture at the front part of the AI digital robot in a standby state, the object closest to the human being is captured from the picture for recognition, and the captured object and the object with the shape similar to the human being are recognized as the existence of the person when the object is found through algorithm comparison in a face recognition model.
Voice input: a microphone arranged on the AI digital robot records external sound whenever the system needs it, and the recorded resource is used for subsequent processing.
Voice wake-up: the recorded sound resource is dumped to 16-bit PCM format, the dumped resource is fed into the wake-up processing procedure, which analyzes whether the sound can trigger a wake-up and returns the processing result for use by other processes.
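Only the 16-bit PCM dump mentioned here is shown concretely in the sketch below; check_wake_word() is a placeholder for whatever wake-word engine is actually used.

```python
# Sketch of the wake-up preprocessing: float microphone samples are dumped to
# 16-bit PCM bytes before being handed to the wake-word engine.
import numpy as np


def to_pcm16(samples: np.ndarray) -> bytes:
    clipped = np.clip(samples, -1.0, 1.0)                 # float samples in [-1, 1]
    return (clipped * 32767).astype(np.int16).tobytes()   # 16-bit little-endian PCM


def try_wake(samples: np.ndarray, check_wake_word) -> bool:
    pcm = to_pcm16(samples)
    return bool(check_wake_word(pcm))                     # True if the wake word was heard
```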
And voice conversation, namely recognizing and converting voice resources input by a user in a voice input process into a resource format which can be used for background program processing, sending the resources to a server for processing to obtain a processing result for analysis, and displaying the processing result through voice and action to carry out conversation.
And in an unmanned state, the AI digital robot maintains a standby state under various conditions that no person appears in the camera capturing area, the person appears but the staying time does not meet the condition, the person appears but the awakening voice is not captured, and the like, and the state is the unmanned state.
Person present: through the camera and the microphone, the AI digital robot judges, from the perspective of both image and sound, whether a user is about to use the AI digital robot.
And entering a dialogue, wherein the AI digital robot judges that the user starts to use the AI digital robot after the user passes through the portrait detection and voice awakening stages, and then starts to enter voice interaction.
Conversation recognized: during voice wake-up and voice input the AI digital robot processes the sound resources throughout; when all sound resources in the interaction meet the conditions for being converted into processable information, the conversation is recognized.
Communication with the server: the AI digital robot adopts the TCP/IP communication protocol to ensure that the information of each round of conversation is not lost and is correctly transmitted between the AI digital robot and the server.
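Together with the data-less heartbeat packets mentioned in Example 11, this TCP link could be maintained roughly as sketched below; the one-byte heartbeat frame and the five-second interval are illustrative assumptions, not values specified in this application.

```python
# Sketch of a keep-alive loop over the robot-server TCP connection: a small,
# data-less heartbeat frame is sent periodically and echoed back by the server,
# so either side can detect a broken link and re-establish it.
import socket
import time


def heartbeat_loop(host: str, port: int, interval: float = 5.0) -> None:
    with socket.create_connection((host, port), timeout=interval) as sock:
        while True:
            sock.sendall(b"\x00")              # heartbeat frame carrying no payload data
            if sock.recv(1) != b"\x00":        # server is expected to echo the frame
                raise ConnectionError("heartbeat lost, re-establish the link")
            time.sleep(interval)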
UNIT semantic recognition: semantic analysis is performed with the Baidu UNIT language processing platform. With the development of AI technologies and concepts, many products want to adopt a conversational mode of human-computer interaction. For most developers, however, building a dialogue system (dialogue skills, dialogue robots) is difficult, with high technical and data requirements. Baidu therefore opened up the natural-language understanding and interaction technology it has accumulated over many years and launched UNIT (Understanding and Interaction Technology), an intelligent dialogue customization and service platform, making industry-leading technical capabilities available to developers and lowering the development threshold for dialogue systems.
Whether the sentence is navigation: the intelligent dialogue customization and service platform Baidu UNIT (Understanding and Interaction Technology) analyzes each input sentence and compares the parse against the designed intents to identify the user's purpose; after the server obtains the corresponding purpose, it starts the corresponding processing procedure.
And acquiring the answer, analyzing and processing the obtained data by the server, and finishing the answer and sending the finished answer to the AI digital robot.
Extracting the destination entry: while processing a sentence, the Baidu intelligent dialogue customization and service platform UNIT (Understanding and Interaction Technology) decomposes the sentence and tags each word according to its meaning in the sentence, so the word containing the destination can be extracted by its tag and returned to the server as part of the processing result.
AMap (Gaode Map) API processing: after the server determines that the navigation flow applies, it sends the words extracted by the tags to the corresponding AMap (Gaode Map) API for processing; the processing converts the address information from text form into longitude and latitude, plans a route from those coordinates, and draws a navigation route map.
Extracting the most reasonable answer: during processing by the intelligent dialogue customization and service platform UNIT (Understanding and Interaction Technology), several answers are generated according to different preset conditions; in the normal use environment most answers carry a value measuring how well the answer fits the question, and the most reasonable answer can be extracted according to this value and returned to the server.
ARKit and the expressions of the virtual character displayed by the AI digital robot: the virtual character's expressions need to be rich enough to meet users' different requirements. Expressions can be made by hand by an animator, which gives detailed results but takes a long time, so ARKit is used to capture the face of a model and generate the corresponding expression data; the animator can then modify the captured facial-expression data as needed to cover more requirements.
Fine-tuning: the AI digital robots are fine-tuned differently according to the requirements of the actual application environment; the fine-tuned data of each AI digital robot are registered, recorded and stored in the database so that they can be called directly later.
Calling expressions/actions: according to the different results returned by the server, the AI digital robot adopts different expressions and actions to accompany the voice playback, which makes the voice-interaction process more natural.
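The lookup of per-robot voice and action assets could, for instance, be a simple keyed database query as sketched below; the table and column names are illustrative assumptions, not a schema disclosed in this application.

```python
# Sketch of fetching the fine-tuned voice/action clips registered for a given
# robot, keyed by the label of the reply returned by the server.
import sqlite3


def fetch_voice_and_action(db_path: str, robot_id: str, reply_label: str):
    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            "SELECT voice_clip, action_clip FROM robot_assets "
            "WHERE robot_id = ? AND reply_label = ?",
            (robot_id, reply_label),
        ).fetchone()
    return row  # (voice_clip, action_clip), or None if nothing matches
```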
The reply is spoken, and the robot returns to the unmanned state.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications and equivalent variations of the above embodiments according to the technical spirit of the present invention are included in the scope of the present invention.

Claims (10)

1. An AI digital robot operation method, characterized by comprising the following steps:
1) acquiring portrait information and voice information;
2) the AI digital robot converts the acquired sound signals and sends the conversion result to a server;
3) the server extracts effective information from the conversion result and analyzes the effective information, if the analysis result does not contain address related information, the step 4) is executed, and if the analysis result contains the address related information, the step 5) is executed;
4) dialogue chat: the server hands the information to a language processing platform for semantic analysis, corresponding interaction is carried out according to the semantic-analysis result, and the interaction information is returned to the server and the AI digital robot;
5) subway line guidance: the server hands the information to a language processing platform for semantic analysis, line query and planning are carried out according to the semantic-analysis result, and the query and planning result is then returned to the AI digital robot;
6) according to the information returned in step 4) or/and step 5), the AI digital robot compares it, through its background program, against the information stored in the database, calls the corresponding voice information and action information, and hands them to the digital character, which presents the corresponding voice and action;
7) after the voice and action have been presented, the method returns to step 1) to re-assess the environment.
2. The AI digital robot operating method according to claim 1, characterized in that: the step 1) comprises the following steps:
1.1) portrait information acquisition:
1.1.1) after a camera on the AI digital robot captures a portrait picture, transmitting the captured portrait picture to a background program of the AI digital robot;
1.1.2) a background program of the AI digital robot continuously tracks captured human images through a trained human face key point detector and a human face recognition model;
1.1.3) the background program of the AI digital robot maps the real-world position of the captured person to the corresponding relative position for the virtual character, and aims the virtual character's line of sight at the captured person, so that the virtual character appears to watch the pedestrian;
1.2) sound information collection:
1.2.1) a microphone on the AI digital robot receives external sound;
1.2.2) capturing voice instruction information of a user from external sound;
1.2.3) then the captured voice command information is handed to the background program of the AI digital robot.
3. The AI digital robot operating method according to claim 2, characterized in that: the portrait picture adopts 1080P resolution, and the background program of the AI digital robot adopts a background program based on dlib.
4. The AI digital robot operating method according to claim 2 or 3, characterized in that: the microphone monitors external sound and compares it with the ambient noise; when the sound goes beyond a certain range an excitation function is activated, and when the excitation confidence reaches a certain value it is judged that active sound is present, after which capture of the voice information instruction begins.
5. The AI digital robot operating method according to claim 4, wherein: when the AI digital robot cannot capture a voice information instruction, the situation is judged to be silent and the robot remains in the unmanned state.
6. The AI digital robot operating method according to any one of claims 1 to 3 or 5, wherein: step 2) is specifically as follows: through its background program, the AI digital robot calls the iFLYTEK (Xunfei) speech-recognition API to convert the collected sound signal, receives the conversion result, and then sends the conversion result to the server through the background program.
7. The AI digital robot operating method according to any one of claims 1 to 3 or 5, wherein: the server extracting and analyzing the effective information from the conversion result is specifically as follows: after the server obtains the valid processing result returned by the iFLYTEK API, it extracts from it the character string of the voice information entered by the user and hands the string to the Baidu UNIT language processing platform to analyze the user's semantics.
8. The AI digital robot operating method according to claim 7, wherein: when the Baidu UNIT language processing platform analyzes the user's semantics and the parsed incoming text message contains entries related to address queries, the platform returns an answer containing address-related information; otherwise it returns a question-and-answer chat reply.
9. The AI digital robot operation method according to any one of claims 1-3, 5 and 8, wherein: the step 4) is specifically as follows:
4.1) analyzing user semantics by a Baidu UNIT language processing platform and judging the user semantics as dialogue chatting;
4.2) after step 4.1), the dialogue content is segmented into phrases according to its semantics;
4.3) the phrases obtained in step 4.2) are handed to a semantic analysis module trained on a large amount of data for analysis;
4.4) after step 4.3), the most suitable answer for the current conversation is returned to the server and the AI digital robot.
10. The AI digital robot operating method according to any one of claims 1 to 3, 5, and 8, wherein: the step 5) is specifically as follows:
5.1) the Baidu UNIT language processing platform analyzes the user semantics and judges the user semantics as route guidance;
5.2) carrying out sentence breaking on the conversation content, comparing each phrase, extracting the phrase which most accords with the address information, and returning the phrase to the background program;
5.3) the server starts a route query process;
5.4) the server hands the destination entry returned by the Baidu UNIT language processing platform to the AMap (Gaode Map) API;
5.5) the AMap (Gaode Map) API computes at least one optimal route and returns it to the server;
5.6) after step 5.5), the server extracts the subway-line-related information from the returned result, sorts and packages it as the return result, and sends it to the AI digital robot.
CN202010038388.XA 2020-01-14 2020-01-14 AI digital robot operation method Pending CN111209376A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010038388.XA CN111209376A (en) 2020-01-14 2020-01-14 AI digital robot operation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010038388.XA CN111209376A (en) 2020-01-14 2020-01-14 AI digital robot operation method

Publications (1)

Publication Number Publication Date
CN111209376A true CN111209376A (en) 2020-05-29

Family

ID=70786661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010038388.XA Pending CN111209376A (en) 2020-01-14 2020-01-14 AI digital robot operation method

Country Status (1)

Country Link
CN (1) CN111209376A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112965593A (en) * 2021-02-25 2021-06-15 浙江百应科技有限公司 AI algorithm-based method and device for realizing multi-mode control digital human interaction

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150331711A1 (en) * 2014-05-19 2015-11-19 Qualcomm Incorporated Systems and methods for context-aware application control
CN105376294A (en) * 2014-08-12 2016-03-02 索尼公司 Method and system for providing information via an intelligent user interface
US20190206400A1 (en) * 2017-04-06 2019-07-04 AIBrain Corporation Context aware interactive robot
CN110288985A (en) * 2019-06-28 2019-09-27 北京猎户星空科技有限公司 Voice data processing method, device, electronic equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150331711A1 (en) * 2014-05-19 2015-11-19 Qualcomm Incorporated Systems and methods for context-aware application control
CN105376294A (en) * 2014-08-12 2016-03-02 索尼公司 Method and system for providing information via an intelligent user interface
US20190206400A1 (en) * 2017-04-06 2019-07-04 AIBrain Corporation Context aware interactive robot
CN110288985A (en) * 2019-06-28 2019-09-27 北京猎户星空科技有限公司 Voice data processing method, device, electronic equipment and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112965593A (en) * 2021-02-25 2021-06-15 浙江百应科技有限公司 AI algorithm-based method and device for realizing multi-mode control digital human interaction

Similar Documents

Publication Publication Date Title
CN108000526B (en) Dialogue interaction method and system for intelligent robot
CN108108340B (en) Dialogue interaction method and system for intelligent robot
EP3696729A1 (en) Method, apparatus, device and readable storage medium for image-based data processing
CN105843381B (en) Data processing method for realizing multi-modal interaction and multi-modal interaction system
CN106406806A (en) A control method and device for intelligent apparatuses
CN109710748B (en) Intelligent robot-oriented picture book reading interaction method and system
CN109429522A (en) Voice interactive method, apparatus and system
CN113835522A (en) Sign language video generation, translation and customer service method, device and readable medium
CN111046148A (en) Intelligent interaction system and intelligent customer service robot
CN112232066A (en) Teaching outline generation method and device, storage medium and electronic equipment
Patil et al. Guidance system for visually impaired people
CN111539408A (en) Intelligent point reading scheme based on photographing and object recognizing
CN111209376A (en) AI digital robot operation method
WO2022062195A1 (en) In-flight information assistance method and apparatus
CN113822187A (en) Sign language translation, customer service, communication method, device and readable medium
Pandey et al. Voice based Sign Language detection for dumb people communication using machine learning
CN113223520B (en) Voice interaction method, system and platform for software operation live-action semantic understanding
CN115171673A (en) Role portrait based communication auxiliary method and device and storage medium
CN114186041A (en) Answer output method
CN112581631A (en) Service guide platform system
CN112307186A (en) Question-answering service method, system, terminal device and medium based on emotion recognition
CN112784631A (en) Method for recognizing face emotion based on deep neural network
CN111062207A (en) Expression image processing method and device, computer storage medium and electronic equipment
CN117041495B (en) Expert remote auxiliary enabling system based on remote voice and video technology
CN114048319B (en) Humor text classification method, device, equipment and medium based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200529

RJ01 Rejection of invention patent application after publication