CN111209376A - AI digital robot operation method - Google Patents

AI digital robot operation method

Info

Publication number
CN111209376A
CN111209376A
Authority
CN
China
Prior art keywords
information
digital robot
digital
server
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010038388.XA
Other languages
Chinese (zh)
Inventor
石子星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Maye Technology Co Ltd
Original Assignee
Chengdu Maye Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Maye Technology Co Ltd filed Critical Chengdu Maye Technology Co Ltd
Priority to CN202010038388.XA priority Critical patent/CN111209376A/en
Publication of CN111209376A publication Critical patent/CN111209376A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N 3/008 Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Robotics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Remote Sensing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses an AI digital robot operation method. Portrait and voice information are acquired; the AI digital robot converts the acquired sound signal and sends the conversion result to a server; the server extracts the effective information from the conversion result and analyzes it. If the analysis result contains no address-related information, dialogue chat is executed; if it does contain address-related information, subway line guidance is executed. According to the information returned by the dialogue chat or/and the subway line guidance, the AI digital robot's background program compares it against the information stored in the database, calls the corresponding voice and action information, and hands it to the digital character, which presents the corresponding voice and action. After the voice and action have been presented, the method returns to information acquisition and re-assesses the environment.

Description

AI digital robot operation method
Technical Field
The invention relates to the field of AI technology and related fields, and in particular to an AI digital robot operation method.
Background
Artificial Intelligence, abbreviated in English as AI, is a new technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.
Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in ways similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing and expert systems, among others. Since the birth of artificial intelligence, its theories and technologies have matured steadily and its fields of application have kept expanding; it can be expected that the technological products artificial intelligence brings in the future will be "containers" of human intelligence. Artificial intelligence can simulate the information processes of human consciousness and thinking. Artificial intelligence is not human intelligence, but it can think like a human and may even exceed human intelligence.
Artificial intelligence is a challenging science, and those who work in it must understand computer science, psychology and philosophy. It is a very broad discipline composed of many fields such as machine learning and computer vision; in general, one of the main goals of artificial intelligence research is to enable machines to handle complex tasks that would normally require human intelligence. What counts as such "complex work", however, differs from era to era and from person to person.
Disclosure of Invention
The invention aims to provide an AI digital robot operation method in which the AI digital robot takes over the work of service personnel such as front-desk (reception), customer-service and consulting staff, thereby effectively saving labor cost.
The invention is realized by the following technical scheme: an AI digital robot operation method comprises the following specific steps:
1) acquiring portrait information and voice information;
2) the AI digital robot converts the acquired sound signals and sends the conversion result to a server;
3) the server extracts effective information from the conversion result and analyzes the effective information, if the analysis result does not contain address related information, the step 4) is executed, and if the analysis result contains the address related information, the step 5) is executed;
4) dialogue chat: the server hands the information to a language processing platform for semantic analysis, corresponding interaction is carried out according to the semantic-analysis result, and the interaction information is returned to the server and the AI digital robot;
5) subway line guidance: the server hands the information to a language processing platform for semantic analysis, line query and planning are carried out according to the semantic-analysis result, and the query and planning result is then returned to the AI digital robot;
6) according to the information returned in step 4) or/and step 5), the AI digital robot compares it, through its background program, against the information stored in the database, calls the corresponding voice information and action information, and hands them to the digital character, which presents the corresponding voice and action;
7) after the voice and action have been presented, the method returns to step 1) to re-assess the environment; an illustrative sketch of this overall loop follows the steps.
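Purely as an illustration of steps 1) to 7), the overall loop can be sketched as follows; every helper passed in (capture_voice, transcribe, parse_semantics, chat_reply, plan_subway_route, render) is a hypothetical placeholder for the corresponding module described above, not an interface disclosed by this application.

```python
from typing import Callable, Optional


def run_robot_loop(
    capture_voice: Callable[[], Optional[bytes]],
    transcribe: Callable[[bytes], str],
    parse_semantics: Callable[[str], dict],
    chat_reply: Callable[[dict], dict],
    plan_subway_route: Callable[[dict], dict],
    render: Callable[[dict], None],
) -> None:
    """Steps 1)-7): one pass of the loop per user interaction."""
    while True:
        audio = capture_voice()                 # step 1): microphone input, None if silent
        if audio is None:
            continue                            # no active sound: stay in the unmanned state
        text = transcribe(audio)                # step 2): speech-to-text result sent to the server
        parsed = parse_semantics(text)          # step 3): server-side semantic analysis
        if parsed.get("has_address"):
            reply = plan_subway_route(parsed)   # step 5): subway line guidance
        else:
            reply = chat_reply(parsed)          # step 4): dialogue chat
        render(reply)                           # step 6): matched voice and action are played
        # step 7): loop back to step 1) and re-assess the environment
```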
In order to further realize the invention, the following arrangement mode is adopted: the step 1) comprises the following steps:
1.1) portrait information acquisition:
1.1.1) after a camera on the AI digital robot captures a portrait picture, the captured portrait picture is passed to the background program of the AI digital robot;
1.1.2) the background program of the AI digital robot continuously tracks the captured portrait through a trained face key-point detector and a face recognition model;
1.1.3) the background program of the AI digital robot maps the real-world position of the captured person to the corresponding relative position for the virtual character, and aims the virtual character's line of sight at the captured person, so that the virtual character appears to watch the pedestrian (a dlib-based sketch of these sub-steps is given after this list);
1.2) sound information collection:
1.2.1) a microphone on the AI digital robot receives external sound;
1.2.2) capturing voice instruction information of a user from external sound;
1.2.3) then the captured voice command information is handed to the background program of the AI digital robot.
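The face-capture sub-steps 1.1.1) to 1.1.3) could, for example, be realized with dlib roughly as sketched below (the following embodiment names dlib explicitly); the landmark-model file name and the gaze mapping are assumptions made only for this sketch.

```python
# Minimal sketch of steps 1.1.1)-1.1.3): grab a frame, detect the nearest face
# with dlib, extract its key points, and derive a normalized head position for
# aiming the virtual character's gaze. The model path is an assumed file name.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file


def track_face_once(cap: cv2.VideoCapture):
    ok, frame = cap.read()                            # 1.1.1) one frame from the camera
    if not ok:
        return None
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)                         # 1.1.2) detect faces in the frame
    if not faces:
        return None
    face = max(faces, key=lambda r: r.width() * r.height())  # largest face = nearest person
    shape = predictor(gray, face)                     # 68 facial key points for tracking
    # 1.1.3) normalized head position (0..1) used to aim the avatar's line of sight
    cx = (face.left() + face.right()) / 2 / frame.shape[1]
    cy = (face.top() + face.bottom()) / 2 / frame.shape[0]
    return shape, (cx, cy)

# usage: cap = cv2.VideoCapture(0); result = track_face_once(cap)
```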
In order to further realize the invention, the following arrangement mode is adopted: the portrait picture adopts 1080P resolution, and the background program of the AI digital robot adopts a background program based on dlib.
In order to further realize the invention, the following arrangement mode is adopted: the microphone monitors external sound and compares it with the ambient noise; when the sound goes beyond a certain range an excitation function is activated, and when the excitation confidence reaches a certain value it is judged that active sound is present, after which capture of the voice information instruction begins.
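One possible reading of this excitation mechanism is a simple energy-based voice-activity check against a running noise floor, as sketched below; the thresholds, smoothing factor and hit count are illustrative assumptions, not values taken from the application.

```python
# Sketch of the "excitation" check: frame energy is compared against a tracked
# ambient-noise floor, and active speech is declared only after the confidence
# (consecutive hits) reaches a threshold. All numeric values are illustrative.
import numpy as np


def is_active_speech(frames, noise_floor=1e-4, ratio=4.0, min_hits=5, alpha=0.95):
    hits = 0
    for frame in frames:                          # frame: 1-D float array of samples
        energy = float(np.mean(frame ** 2))       # short-term energy of this frame
        if energy > ratio * noise_floor:          # excitation fires above the noise range
            hits += 1                             # accumulate excitation confidence
            if hits >= min_hits:                  # confidence reached: active sound present
                return True
        else:
            hits = 0
            noise_floor = alpha * noise_floor + (1 - alpha) * energy  # track ambient noise
    return False                                  # no active sound: remain silent / unmanned
```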
In order to further realize the invention, the following arrangement mode is adopted: when the AI digital robot cannot capture a voice information instruction, the situation is judged to be silent and the robot remains in the unmanned state.
In order to further realize the invention, the following arrangement mode is adopted: step 2) is specifically as follows: through its background program, the AI digital robot calls the iFLYTEK (Xunfei) speech-recognition API to convert the collected sound signal, receives the conversion result, and then sends the conversion result to the server through the background program.
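A minimal sketch of this step is shown below; transcribe_with_asr() stands in for the actual iFLYTEK speech-recognition call, and SERVER_URL is a placeholder endpoint, both assumptions made only for illustration.

```python
# Sketch of step 2): convert the recorded audio to text via the speech-recognition
# service and forward the transcript to the server for semantic analysis.
import requests

SERVER_URL = "http://example-robot-server/api/utterance"  # placeholder endpoint


def forward_transcript(audio_bytes: bytes, transcribe_with_asr) -> dict:
    text = transcribe_with_asr(audio_bytes)                 # ASR: sound signal -> text string
    resp = requests.post(SERVER_URL, json={"text": text}, timeout=5)
    resp.raise_for_status()
    return resp.json()                                      # server's processed result
```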
In order to further realize the invention, the following arrangement mode is adopted: the server extracting and analyzing the effective information from the conversion result is specifically as follows: after the server obtains the valid processing result returned by the iFLYTEK API, it extracts from it the character string of the voice information entered by the user and hands the string to the Baidu UNIT language processing platform to analyze the user's semantics.
In order to further realize the invention, the following arrangement mode is adopted: when the Baidu UNIT language processing platform analyzes the user's semantics and the parsed incoming text contains entries related to address queries, such as "how do I get to ...", the platform returns an answer containing address-related information; otherwise it returns a question-and-answer chat reply.
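The branching between dialogue chat (step 4) and subway line guidance (step 5) could be expressed as below; the intent labels and the shape of the parsed result are assumed examples, not the actual Baidu UNIT response schema.

```python
# Sketch of the server-side routing decision in step 3): send address-related
# queries to the guidance flow, everything else to the chat flow. The "intent"
# and "slots" keys are an assumed shape for the semantic-analysis result.
def route_parse_result(parsed: dict) -> str:
    address_intents = {"ROUTE_QUERY", "NAVIGATE"}   # illustrative intent labels
    has_destination = bool(parsed.get("slots", {}).get("destination"))
    if parsed.get("intent") in address_intents and has_destination:
        return "subway_guidance"                    # answer contains address-related info
    return "chat"                                   # otherwise: question-and-answer chat
```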
In order to further realize the invention, the following arrangement mode is adopted: the step 4) is specifically as follows:
4.1) analyzing user semantics by a Baidu UNIT language processing platform and judging the user semantics as dialogue chatting;
4.2) after step 4.1), the dialogue content is segmented into phrases according to its semantics;
4.3) the phrases obtained in step 4.2) are handed to a semantic analysis module trained on a large amount of data for analysis;
4.4) after step 4.3), the most suitable answer for the current conversation is returned to the server and the AI digital robot; a sketch of this answer selection is given after this list.
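A rough sketch of this selection is given below; segment() and score_answer() are placeholders for the platform's trained segmentation and semantic-analysis components, which the application does not specify.

```python
# Sketch of steps 4.2)-4.4): segment the utterance into phrases, score a set of
# candidate answers with a trained model, and return the best-scoring answer.
def best_chat_answer(utterance: str, candidates: list[str], segment, score_answer) -> str:
    if not candidates:
        return ""
    phrases = segment(utterance)                                        # 4.2) cut into phrases
    scored = [(score_answer(phrases, ans), ans) for ans in candidates]  # 4.3) analyse candidates
    return max(scored)[1]                                               # 4.4) most suitable answer
```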
In order to further realize the invention, the following arrangement mode is adopted: the step 5) is specifically as follows:
5.1) the Baidu UNIT language processing platform analyzes the user semantics and judges the user semantics as route guidance;
5.2) the conversation content is broken into sentences, each phrase is compared, the phrase that best matches the address information is extracted, and the phrase is returned to the background program;
5.3) the server starts the route-query process;
5.4) the server hands the destination entry returned by the Baidu UNIT language processing platform to the AMap (Gaode Map) API;
5.5) the AMap (Gaode Map) API computes at least one optimal route and returns it to the server;
5.6) after step 5.5), the server extracts the subway-line-related information from the returned result, sorts and packages it as the return result, and sends it to the AI digital robot; a sketch of this query flow is given after this list.
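Steps 5.3) to 5.6) could look roughly like the sketch below; the AMap (Gaode Map) web-service endpoints, parameters and response fields used here are assumptions recalled from that service's public documentation, not details disclosed in this application.

```python
# Sketch of the guidance flow: geocode the destination, request a transit route,
# and keep only the rail/subway segments for the robot to present.
import requests

AMAP_KEY = "YOUR_AMAP_KEY"  # placeholder credential


def subway_guidance(destination: str, origin_lnglat: str, city: str) -> list[dict]:
    geo = requests.get(
        "https://restapi.amap.com/v3/geocode/geo",               # assumed geocoding endpoint
        params={"key": AMAP_KEY, "address": destination, "city": city},
        timeout=5,
    ).json()
    dest_lnglat = geo["geocodes"][0]["location"]                  # 5.4) destination -> coordinates
    route = requests.get(
        "https://restapi.amap.com/v3/direction/transit/integrated",  # assumed transit endpoint
        params={"key": AMAP_KEY, "origin": origin_lnglat,
                "destination": dest_lnglat, "city": city},
        timeout=5,
    ).json()
    segments = route["route"]["transits"][0]["segments"]          # 5.5) first (best) route
    # 5.6) keep the legs that include rail/subway lines and package them for the robot
    return [seg["bus"] for seg in segments if seg.get("bus", {}).get("buslines")]
```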
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The invention provides service through a vivid, lifelike character image, which makes it friendlier and easier to approach in practical applications.
(2) In practical applications, voice interaction can be adopted, making communication between people and machines more convenient, especially for people with limited mobility and for the elderly.
(3) The AI digital robot can proactively greet customers and guide them.
(4) In many scenarios, simple posts with highly repetitive work and long working hours have already been replaced by machines, yet the work of physical service posts such as front-desk, customer-service and consulting staff is still completed manually.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples, but the embodiments of the present invention are not limited thereto.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments will be described clearly and completely with reference to the accompanying drawings; it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person skilled in the art without any inventive step on the basis of the embodiments of the present invention fall within the scope of the present invention. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention.
It is worth noting that, where this application needs to rely on technology that is well known or conventional in the field, the applicant may not have described that well-known or/and conventional technology specifically in the text; the fact that such technical means are not specifically set out herein should not be taken to mean that this application fails to comply with Article 26 of the Patent Law.
Example 1:
the invention provides an AI digital robot operation method, which specifically adopts the following arrangement and comprises the following steps:
1) acquiring portrait information and voice information;
2) the AI digital robot converts the acquired sound signals and sends the conversion result to a server;
3) the server extracts effective information from the conversion result and analyzes the effective information, if the analysis result does not contain address related information, the step 4) is executed, and if the analysis result contains the address related information, the step 5) is executed;
4) dialogue chat: the server hands the information to a language processing platform for semantic analysis, corresponding interaction is carried out according to the semantic-analysis result, and the interaction information is returned to the server and the AI digital robot;
5) subway line guidance: the server hands the information to a language processing platform for semantic analysis, line query and planning are carried out according to the semantic-analysis result, and the query and planning result is then returned to the AI digital robot;
6) according to the information returned in step 4) or/and step 5), the AI digital robot compares it, through its background program, against the information stored in the database, calls the corresponding voice information and action information, and hands them to the digital character, which presents the corresponding voice and action;
7) after the voice and action have been presented, the method returns to step 1) to re-assess the environment.
Example 2:
the present embodiment is further optimized based on the above embodiment, and the same parts as those in the foregoing technical solution will not be described herein again, and further to better implement the present invention, the following setting manner is particularly adopted: the step 1) comprises the following steps:
1.1) portrait information acquisition:
1.1.1) after a camera on the AI digital robot captures a portrait picture, transmitting the captured portrait picture to a background program of the AI digital robot;
1.1.2) a background program of the AI digital robot continuously tracks captured human images through a trained human face key point detector and a human face recognition model;
1.1.3) the background program of the AI digital robot maps the real-world position of the captured person to the corresponding relative position for the virtual character, and aims the virtual character's line of sight at the captured person, so that the virtual character appears to watch the pedestrian;
1.2) sound information collection:
1.2.1) a microphone on the AI digital robot receives external sound;
1.2.2) capturing voice instruction information of a user from external sound;
1.2.3) then the captured voice command information is handed to the background program of the AI digital robot.
Example 3:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again, and in order to further better implement the present invention, the following setting modes are particularly adopted: the portrait picture adopts 1080P resolution, and the background program of the AI digital robot adopts a background program based on dlib.
Example 4:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again, and in order to further better implement the present invention, the following setting modes are particularly adopted: the method comprises the steps of monitoring external sound through a microphone, comparing the external sound with environmental noise, activating an excitation function beyond a certain range, judging that active sound exists when excitation confidence reaches a certain value, and then starting capturing of a voice information instruction.
Example 5:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again, and in order to further better implement the present invention, the following setting modes are particularly adopted: and when the AI digital robot cannot capture the voice information instruction, judging the AI digital robot to be silent and keeping an unmanned state.
Example 6:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again; in order to further better implement the present invention, the following setting mode is particularly adopted: step 2) is specifically as follows: through its background program, the AI digital robot calls the iFLYTEK (Xunfei) speech-recognition API to convert the collected sound signal, receives the conversion result, and then sends the conversion result to the server through the background program.
Example 7:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again; in order to further better implement the present invention, the following setting mode is particularly adopted: the server extracting and analyzing the effective information from the conversion result is specifically as follows: after the server obtains the valid processing result returned by the iFLYTEK API, it extracts from it the character string of the voice information entered by the user and hands the string to the Baidu UNIT language processing platform to analyze the user's semantics.
Example 8:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again; in order to further better implement the present invention, the following setting mode is particularly adopted: when the Baidu UNIT language processing platform analyzes the user's semantics and the parsed incoming text message contains entries related to address queries, such as "how do I get to ...", the platform returns an answer containing address-related information; otherwise it returns a question-and-answer chat reply.
Example 9:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again, and in order to further better implement the present invention, the following setting modes are particularly adopted: the step 4) is specifically as follows:
4.1) analyzing user semantics by a Baidu UNIT language processing platform and judging the user semantics as dialogue chatting;
4.2) after step 4.1), the dialogue content is segmented into phrases according to its semantics;
4.3) the phrases obtained in step 4.2) are handed to a semantic analysis module trained on a large amount of data for analysis;
4.4) after step 4.3), the most suitable answer for the current conversation is returned to the server and the AI digital robot.
Example 10:
the present embodiment is further optimized based on any of the above embodiments, and the same parts as those in the foregoing technical solutions will not be described herein again, and in order to further better implement the present invention, the following setting modes are particularly adopted: the step 5) is specifically as follows:
5.1) the Baidu UNIT language processing platform analyzes the user semantics and judges the user semantics as route guidance;
5.2) carrying out sentence breaking on the conversation content, comparing each phrase, extracting the phrase which most accords with the address information, and returning the phrase to the background program;
5.3) the server starts a route query process;
5.4) the server hands the destination entry returned by the Baidu UNIT language processing platform to the AMap (Gaode Map) API;
5.5) the AMap (Gaode Map) API computes at least one optimal route and returns it to the server;
5.6) after the step 5.5), the server extracts the related information of the subway line from the returned result, and the information is sorted and packaged as the returned result to be sent to the AI digital robot.
Example 11:
the embodiment is further optimized on the basis of any one of the above embodiments and provides an AI digital robot operation method. In both the unmanned state and the working state, the AI digital robot continuously sends and receives data-less heartbeat packets so as to keep the link between the server and the robot alive. The operation method comprises the following steps:
1) acquiring portrait information:
1.1) after a camera on the AI digital robot captures a 1080P portrait picture, transmitting the captured portrait picture to a dlib-based background program of the AI digital robot;
1.2) continuously tracking the captured portrait by a backstage program based on dlib of the AI digital robot through a trained face key point detector and a face recognition model;
1.3) the background program of the AI digital robot maps the real-world position of the captured person to the corresponding relative position for the virtual character, and aims the virtual character's line of sight at the captured person, so that the virtual character appears to watch the pedestrian;
2) sound information acquisition:
2.1) a microphone on the AI digital robot receives external sound;
2.2) the microphone monitors external sound and compares it with the ambient noise; when the external sound exceeds a certain range an excitation function is activated, and when the excitation confidence reaches a certain value it is judged that active sound is present; after the background program of the AI digital robot judges that active sound is present, capture of the user's voice instruction information from the external sound begins; the excitation function, the algorithm and the required threshold values can be modified according to the actual situation and optimized later on the basis of feedback from actual use;
2.3) then delivering the captured voice instruction information to a dlib-based background program of the AI digital robot; when the AI digital robot cannot capture the voice information instruction, judging the AI digital robot to be silent, and keeping an unmanned state; the step 1) and the step 2) can be carried out simultaneously or not simultaneously.
3) Through its background program, the AI digital robot calls the iFLYTEK (Xunfei) speech-recognition API (application programming interface) to convert the collected sound signal, receives the conversion result, sends the conversion result to the server through the background program, and proceeds to the next processing step, i.e. step 4);
4) after the server obtains the valid processing result returned by the iFLYTEK API, it extracts from it the character string of the voice information entered by the user and hands the string to the Baidu UNIT language processing platform to analyze the user's semantics; when the parsed incoming text message contains entries related to address queries, such as "how do I get to ...", the Baidu UNIT language processing platform returns an answer containing address-related information, otherwise it returns a question-and-answer chat reply; depending on the result returned by the Baidu UNIT language processing platform, step 5) is executed if the result contains no address-related information, and step 6) is executed if it does;
5) dialogue chatting:
5.1) the Baidu UNIT language processing platform analyzes the user semantics and judges the user semantics as dialogue chatting;
5.2) after step 5.1), the conversation content is segmented into sentences and broken into phrases according to its semantics;
5.3) the phrases obtained in step 5.2) are handed to a semantic analysis module trained on a large amount of data for analysis;
5.4) after step 5.3), the most suitable answer for the current conversation is returned to the server and the AI digital robot, and the method proceeds to step 7);
6) subway line guides:
6.1) the Baidu UNIT language processing platform analyzes the user semantics and judges the user semantics as route guidance;
6.2) carrying out sentence segmentation on the conversation content, comparing each phrase, extracting the phrase which best meets the address information, and returning the phrase to a background program based on dlib;
6.3) the server starts a route query process;
6.4) the server hands the destination entry returned by the Baidu UNIT language processing platform to the AMap (Gaode Map) API;
6.5) the AMap (Gaode Map) API computes at least one optimal route and returns it to the server;
6.6) after step 6.5), the server extracts the subway-line-related information from the returned result, sorts and packages it as the return result, and sends it to the AI digital robot, i.e. proceeds to step 7);
7) after the AI digital robot receives the results of the step 5) and/or the step 6), comparing the information stored in the database by a dlib-based background program of the AI digital robot according to different results, calling corresponding voice information and action information, delivering the corresponding voice and action information to a digital character to display the corresponding voice and action, and entering a step 8);
8) after the voice and action have been displayed (played), the method returns to step 1) and step 2) to re-assess the environment.
Example 12:
the embodiment is further optimized on the basis of any one of the above embodiments, and an AI digital robot operation method, as shown in fig. 1, includes the following steps:
and (3) face recognition, wherein a camera arranged on the AI digital robot can capture a picture at the front part of the AI digital robot in a standby state, an object closest to the face is captured from the picture for recognition, and the object similar to the captured object can be recognized as the face when the object is found to exist in a face recognition model through algorithm comparison.
And (3) detecting the person, wherein a camera arranged on the AI digital robot can capture the picture at the front part of the AI digital robot in a standby state, the object closest to the human being is captured from the picture for recognition, and the captured object and the object with the shape similar to the human being are recognized as the existence of the person when the object is found through algorithm comparison in a face recognition model.
Voice input: a microphone arranged on the AI digital robot records external sound whenever the system needs it, and the recorded resource is used for subsequent processing.
Voice wake-up: the recorded sound resource is dumped to 16-bit PCM format, the dumped resource is fed into the wake-up processing procedure, which analyzes whether the sound can trigger a wake-up and returns the processing result for use by other processes.
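Only the 16-bit PCM dump mentioned here is shown concretely in the sketch below; check_wake_word() is a placeholder for whatever wake-word engine is actually used.

```python
# Sketch of the wake-up preprocessing: float microphone samples are dumped to
# 16-bit PCM bytes before being handed to the wake-word engine.
import numpy as np


def to_pcm16(samples: np.ndarray) -> bytes:
    clipped = np.clip(samples, -1.0, 1.0)                 # float samples in [-1, 1]
    return (clipped * 32767).astype(np.int16).tobytes()   # 16-bit little-endian PCM


def try_wake(samples: np.ndarray, check_wake_word) -> bool:
    pcm = to_pcm16(samples)
    return bool(check_wake_word(pcm))                     # True if the wake word was heard
```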
And voice conversation, namely recognizing and converting voice resources input by a user in a voice input process into a resource format which can be used for background program processing, sending the resources to a server for processing to obtain a processing result for analysis, and displaying the processing result through voice and action to carry out conversation.
And in an unmanned state, the AI digital robot maintains a standby state under various conditions that no person appears in the camera capturing area, the person appears but the staying time does not meet the condition, the person appears but the awakening voice is not captured, and the like, and the state is the unmanned state.
Person present: through the camera and the microphone, the AI digital robot judges, from the perspective of both image and sound, whether a user is about to use the AI digital robot.
And entering a dialogue, wherein the AI digital robot judges that the user starts to use the AI digital robot after the user passes through the portrait detection and voice awakening stages, and then starts to enter voice interaction.
Conversation recognized: during voice wake-up and voice input the AI digital robot processes the sound resources throughout; when all sound resources in the interaction meet the conditions for being converted into processable information, the conversation is recognized.
Communication with the server: the AI digital robot adopts the TCP/IP communication protocol to ensure that the information of each round of conversation is not lost and is correctly transmitted between the AI digital robot and the server.
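Together with the data-less heartbeat packets mentioned in Example 11, this TCP link could be maintained roughly as sketched below; the one-byte heartbeat frame and the five-second interval are illustrative assumptions, not values specified in this application.

```python
# Sketch of a keep-alive loop over the robot-server TCP connection: a small,
# data-less heartbeat frame is sent periodically and echoed back by the server,
# so either side can detect a broken link and re-establish it.
import socket
import time


def heartbeat_loop(host: str, port: int, interval: float = 5.0) -> None:
    with socket.create_connection((host, port), timeout=interval) as sock:
        while True:
            sock.sendall(b"\x00")              # heartbeat frame carrying no payload data
            if sock.recv(1) != b"\x00":        # server is expected to echo the frame
                raise ConnectionError("heartbeat lost, re-establish the link")
            time.sleep(interval)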
UNIT semantic recognition: semantic analysis is performed with the Baidu UNIT language processing platform. With the development of AI technologies and concepts, many products want to adopt a conversational mode of human-computer interaction. For most developers, however, building a dialogue system (dialogue skills, dialogue robots) is difficult, with high technical and data requirements. Baidu therefore opened up the natural-language understanding and interaction technology it has accumulated over many years and launched UNIT (Understanding and Interaction Technology), an intelligent dialogue customization and service platform, making industry-leading technical capabilities available to developers and lowering the development threshold for dialogue systems.
Whether the sentence is navigation: the intelligent dialogue customization and service platform Baidu UNIT (Understanding and Interaction Technology) analyzes each input sentence and compares the parse against the designed intents to identify the user's purpose; after the server obtains the corresponding purpose, it starts the corresponding processing procedure.
And acquiring the answer, analyzing and processing the obtained data by the server, and finishing the answer and sending the finished answer to the AI digital robot.
Extracting the destination entry: while processing a sentence, the Baidu intelligent dialogue customization and service platform UNIT (Understanding and Interaction Technology) decomposes the sentence and tags each word according to its meaning in the sentence, so the word containing the destination can be extracted by its tag and returned to the server as part of the processing result.
AMap (Gaode Map) API processing: after the server determines that the navigation flow applies, it sends the words extracted by the tags to the corresponding AMap (Gaode Map) API for processing; the processing converts the address information from text form into longitude and latitude, plans a route from those coordinates, and draws a navigation route map.
Extracting the most reasonable answer: during processing by the intelligent dialogue customization and service platform UNIT (Understanding and Interaction Technology), several answers are generated according to different preset conditions; in the normal use environment most answers carry a value measuring how well the answer fits the question, and the most reasonable answer can be extracted according to this value and returned to the server.
ARKit and the expressions of the virtual character displayed by the AI digital robot: the virtual character's expressions need to be rich enough to meet users' different requirements. Expressions can be made by hand by an animator, which gives detailed results but takes a long time, so ARKit is used to capture the face of a model and generate the corresponding expression data; the animator can then modify the captured facial-expression data as needed to cover more requirements.
Fine-tuning: the AI digital robots are fine-tuned differently according to the requirements of the actual application environment; the fine-tuned data of each AI digital robot are registered, recorded and stored in the database so that they can be called directly later.
Calling expressions/actions: according to the different results returned by the server, the AI digital robot adopts different expressions and actions to accompany the voice playback, which makes the voice-interaction process more natural.
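The lookup of per-robot voice and action assets could, for instance, be a simple keyed database query as sketched below; the table and column names are illustrative assumptions, not a schema disclosed in this application.

```python
# Sketch of fetching the fine-tuned voice/action clips registered for a given
# robot, keyed by the label of the reply returned by the server.
import sqlite3


def fetch_voice_and_action(db_path: str, robot_id: str, reply_label: str):
    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            "SELECT voice_clip, action_clip FROM robot_assets "
            "WHERE robot_id = ? AND reply_label = ?",
            (robot_id, reply_label),
        ).fetchone()
    return row  # (voice_clip, action_clip), or None if nothing matches
```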
The reply is spoken, and the robot returns to the unmanned state.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications and equivalent variations of the above embodiments according to the technical spirit of the present invention are included in the scope of the present invention.

Claims (10)

1. An AI digital robot operation method, characterized by comprising the following steps:
1) acquiring portrait information and voice information;
2) the AI digital robot converts the acquired sound signals and sends the conversion result to a server;
3) the server extracts effective information from the conversion result and analyzes the effective information, if the analysis result does not contain address related information, the step 4) is executed, and if the analysis result contains the address related information, the step 5) is executed;
4) dialogue chat: the server hands the information to a language processing platform for semantic analysis, corresponding interaction is carried out according to the semantic-analysis result, and the interaction information is returned to the server and the AI digital robot;
5) subway line guidance: the server hands the information to a language processing platform for semantic analysis, line query and planning are carried out according to the semantic-analysis result, and the query and planning result is then returned to the AI digital robot;
6) according to the information returned in step 4) or/and step 5), the AI digital robot compares it, through its background program, against the information stored in the database, calls the corresponding voice information and action information, and hands them to the digital character, which presents the corresponding voice and action;
7) after the voice and action have been presented, the method returns to step 1) to re-assess the environment.
2. The AI digital robot operating method according to claim 1, characterized in that: the step 1) comprises the following steps:
1.1) portrait information acquisition:
1.1.1) after a camera on the AI digital robot captures a portrait picture, transmitting the captured portrait picture to a background program of the AI digital robot;
1.1.2) a background program of the AI digital robot continuously tracks captured human images through a trained human face key point detector and a human face recognition model;
1.1.3) the background program of the AI digital robot maps the real-world position of the captured person to the corresponding relative position for the virtual character, and aims the virtual character's line of sight at the captured person, so that the virtual character appears to watch the pedestrian;
1.2) sound information collection:
1.2.1) a microphone on the AI digital robot receives external sound;
1.2.2) capturing voice instruction information of a user from external sound;
1.2.3) then the captured voice command information is handed to the background program of the AI digital robot.
3. The AI digital robot operating method according to claim 2, characterized in that: the portrait picture adopts 1080P resolution, and the background program of the AI digital robot adopts a background program based on dlib.
4. The AI digital robot operating method according to claim 2 or 3, characterized in that: the microphone monitors external sound and compares it with the ambient noise; when the sound goes beyond a certain range an excitation function is activated, and when the excitation confidence reaches a certain value it is judged that active sound is present, after which capture of the voice information instruction begins.
5. The AI digital robot operating method according to claim 4, wherein: when the AI digital robot cannot capture a voice information instruction, the situation is judged to be silent and the robot remains in the unmanned state.
6. The AI digital robot operating method according to any one of claims 1 to 3 or 5, wherein: step 2) is specifically as follows: through its background program, the AI digital robot calls the iFLYTEK (Xunfei) speech-recognition API to convert the collected sound signal, receives the conversion result, and then sends the conversion result to the server through the background program.
7. The AI digital robot operating method according to any one of claims 1 to 3 or 5, wherein: the server extracting and analyzing the effective information from the conversion result is specifically as follows: after the server obtains the valid processing result returned by the iFLYTEK API, it extracts from it the character string of the voice information entered by the user and hands the string to the Baidu UNIT language processing platform to analyze the user's semantics.
8. The AI digital robot operating method according to claim 7, wherein: when the Baidu UNIT language processing platform analyzes the user's semantics and the parsed incoming text message contains entries related to address queries, the platform returns an answer containing address-related information; otherwise it returns a question-and-answer chat reply.
9. The AI digital robot operation method according to any one of claims 1-3, 5 and 8, wherein: the step 4) is specifically as follows:
4.1) analyzing user semantics by a Baidu UNIT language processing platform and judging the user semantics as dialogue chatting;
4.2) after step 4.1), the dialogue content is segmented into phrases according to its semantics;
4.3) the phrases obtained in step 4.2) are handed to a semantic analysis module trained on a large amount of data for analysis;
4.4) after step 4.3), the most suitable answer for the current conversation is returned to the server and the AI digital robot.
10. The AI digital robot operating method according to any one of claims 1 to 3, 5, and 8, wherein: the step 5) is specifically as follows:
5.1) the Baidu UNIT language processing platform analyzes the user semantics and judges the user semantics as route guidance;
5.2) carrying out sentence breaking on the conversation content, comparing each phrase, extracting the phrase which most accords with the address information, and returning the phrase to the background program;
5.3) the server starts a route query process;
5.4) the server hands the destination entry returned by the Baidu UNIT language processing platform to the AMap (Gaode Map) API;
5.5) the AMap (Gaode Map) API computes at least one optimal route and returns it to the server;
5.6) after step 5.5), the server extracts the subway-line-related information from the returned result, sorts and packages it as the return result, and sends it to the AI digital robot.
CN202010038388.XA 2020-01-14 2020-01-14 AI digital robot operation method Pending CN111209376A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010038388.XA CN111209376A (en) 2020-01-14 2020-01-14 AI digital robot operation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010038388.XA CN111209376A (en) 2020-01-14 2020-01-14 AI digital robot operation method

Publications (1)

Publication Number Publication Date
CN111209376A true CN111209376A (en) 2020-05-29

Family

ID=70786661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010038388.XA Pending CN111209376A (en) 2020-01-14 2020-01-14 AI digital robot operation method

Country Status (1)

Country Link
CN (1) CN111209376A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112965593A (en) * 2021-02-25 2021-06-15 浙江百应科技有限公司 AI algorithm-based method and device for realizing multi-mode control digital human interaction

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150331711A1 (en) * 2014-05-19 2015-11-19 Qualcomm Incorporated Systems and methods for context-aware application control
CN105376294A (en) * 2014-08-12 2016-03-02 索尼公司 Method and system for providing information via an intelligent user interface
US20190206400A1 (en) * 2017-04-06 2019-07-04 AIBrain Corporation Context aware interactive robot
CN110288985A (en) * 2019-06-28 2019-09-27 北京猎户星空科技有限公司 Voice data processing method, device, electronic equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150331711A1 (en) * 2014-05-19 2015-11-19 Qualcomm Incorporated Systems and methods for context-aware application control
CN105376294A (en) * 2014-08-12 2016-03-02 索尼公司 Method and system for providing information via an intelligent user interface
US20190206400A1 (en) * 2017-04-06 2019-07-04 AIBrain Corporation Context aware interactive robot
CN110288985A (en) * 2019-06-28 2019-09-27 北京猎户星空科技有限公司 Voice data processing method, device, electronic equipment and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112965593A (en) * 2021-02-25 2021-06-15 浙江百应科技有限公司 AI algorithm-based method and device for realizing multi-mode control digital human interaction

Similar Documents

Publication Publication Date Title
CN108000526B (en) Dialogue interaction method and system for intelligent robot
CN108108340B (en) Dialogue interaction method and system for intelligent robot
EP3696729A1 (en) Method, apparatus, device and readable storage medium for image-based data processing
CN105843381B (en) Data processing method for realizing multi-modal interaction and multi-modal interaction system
CN106406806A (en) A control method and device for intelligent apparatuses
CN109710748B (en) Intelligent robot-oriented picture book reading interaction method and system
CN109429522A (en) Voice interactive method, apparatus and system
CN113835522A (en) Sign language video generation, translation and customer service method, device and readable medium
CN111046148A (en) Intelligent interaction system and intelligent customer service robot
CN112232066A (en) Teaching outline generation method and device, storage medium and electronic equipment
Patil et al. Guidance system for visually impaired people
CN111539408A (en) Intelligent point reading scheme based on photographing and object recognizing
CN111209376A (en) AI digital robot operation method
WO2022062195A1 (en) In-flight information assistance method and apparatus
CN113822187A (en) Sign language translation, customer service, communication method, device and readable medium
Pandey et al. Voice based Sign Language detection for dumb people communication using machine learning
CN113223520B (en) Voice interaction method, system and platform for software operation live-action semantic understanding
CN115171673A (en) Role portrait based communication auxiliary method and device and storage medium
CN114186041A (en) Answer output method
CN112581631A (en) Service guide platform system
CN112307186A (en) Question-answering service method, system, terminal device and medium based on emotion recognition
CN112784631A (en) Method for recognizing face emotion based on deep neural network
CN111062207A (en) Expression image processing method and device, computer storage medium and electronic equipment
CN117041495B (en) Expert remote auxiliary enabling system based on remote voice and video technology
CN114048319B (en) Humor text classification method, device, equipment and medium based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200529

RJ01 Rejection of invention patent application after publication