CN113742460A - Method and device for generating virtual role - Google Patents

Method and device for generating virtual role

Info

Publication number
CN113742460A
Authority
CN
China
Prior art keywords
semantic
voice
data
role
virtual
Prior art date
Legal status
Granted
Application number
CN202010466955.1A
Other languages
Chinese (zh)
Other versions
CN113742460B (en)
Inventor
潘邵武
卢惠莉
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202010466955.1A
Priority to PCT/CN2021/082911 (WO2021238371A1)
Publication of CN113742460A
Application granted
Publication of CN113742460B
Legal status: Active

Classifications

    • G06F 16/332, G06F 16/3329 — Information retrieval; query formulation; natural language query formulation or dialogue systems
    • G06F 40/30 — Handling natural language data; semantic analysis
    • G06N 3/02, G06N 3/04, G06N 3/045 — Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N 3/08 — Neural networks; learning methods
    • G10L 15/22 — Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
    • H04W 4/14 — Services for wireless communication networks; short messaging services, e.g. SMS or USSD


Abstract

The application provides a method and a device for generating virtual roles, relating to the technical field of artificial intelligence (AI). The method comprises: acquiring first semantic data and first voice semantic annotation data of a first virtual role to be generated; generating, based on the first voice semantic annotation data, a second voice instruction corresponding to the first semantic data to obtain second voice semantic annotation data; and training based on the second voice semantic annotation data to obtain the first virtual role. The first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction; the second voice semantic annotation data comprises the second voice instruction and the first semantic data used for annotating the second voice instruction. The technical solution provided by the application can reduce the period and cost of generating a virtual role, improve the agility and extensibility of processing AI services, and facilitate personalized processing of AI services.

Description

Method and device for generating virtual role
Technical Field
The present application relates to the technical field of Artificial Intelligence (AI), and in particular, to a method and an apparatus for generating a virtual character.
Background
With the continuous development of AI technologies, AI services such as voice assistants, subtitle generation, voice input, chat robots, customer service robots, and spoken language evaluation are increasingly widely used. These AI services can receive and recognize voice instructions from users based on voice semantic recognition algorithms, thereby providing users with services such as interactive dialogue, information query, and device control.
In the prior art, a large amount of voice data is collected and labeled according to the functional field to which the AI service to be processed belongs, so as to obtain voice semantic annotation data, and the virtual role for the AI service is then obtained by training on that annotation data.
However, a large amount of voice data needs to be collected and labeled for each virtual character before it can be trained, so the period for generating a virtual character is long, the cost is high, the agility and extensibility of the service are poor, and personalized processing of AI services is difficult to achieve.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for generating a virtual role, so as to reduce the period and cost for generating the virtual role, improve the agility and extensibility for processing an AI service, and facilitate implementation of personalized processing of the AI service.
In order to achieve the above object, in a first aspect, an embodiment of the present application provides a method for generating a virtual role, including:
acquiring first semantic data and first voice semantic annotation data of a first virtual role to be generated;
generating a second voice instruction corresponding to the first semantic data based on the first voice semantic annotation data to obtain second voice semantic annotation data;
training to obtain the first virtual role based on the second voice semantic labeling data;
the first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction; the second voice semantic annotation data comprises the second voice instruction and the first semantic data used for annotating the second voice instruction; the first semantic data comprises first vertical domain information, first intention information and first word slot information; the second semantic data comprises second vertical domain information, second intention information and second word slot information.
The semantic data may indicate semantics of the voice instruction, including vertical domain information, intention information, and word slot information of the AI service indicated by the voice instruction. The vertical domain information is used for indicating a function domain to which the voice command belongs, the intention information is used for indicating an operation type of the voice command, and the word slot information is used for indicating an operation parameter of the voice command.
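To make the structure of the semantic data concrete, the following minimal sketch shows how one piece of voice semantic annotation data might be represented as a Python dictionary; the field names and the example instruction are illustrative assumptions, not prescribed by this application.

```python
# Illustrative representation of one piece of voice semantic annotation data.
# The keys "domain", "intent" and "slots" stand for the vertical domain,
# intention and word slot information described above; the values are examples.
sample_annotation = {
    "voice_instruction": "turn on the rice cooker",
    "semantic_data": {
        "domain": "smart_home",               # vertical domain: functional field
        "intent": "device_control.turn_on",   # intention: operation type
        "slots": {"device": "rice cooker"},   # word slots: operation parameters
    },
}
```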
The first vertical domain information, the first intention information, and the first word slot information may be different from or partially identical to the second vertical domain information, the second intention information, and the second word slot information, respectively.
It should be further noted that there may be a plurality of first voice instructions, second voice instructions, items of first semantic data and items of second semantic data; the number of second voice instructions may be greater than the number of first voice instructions, and the amount of first semantic data may be greater than the amount of second semantic data.
In the step of acquiring the first semantic data and the first voice semantic annotation data of the first virtual character to be generated, when end-side deployment is adopted, the terminal may receive the first semantic data and the first voice semantic annotation data submitted by a user, or may obtain stored first semantic data and first voice semantic annotation data from its own storage medium. When cloud-side deployment, end-cloud-system cooperative deployment or terminal distributed deployment is adopted, the terminal may acquire the first semantic data and the first voice semantic annotation data from a cloud server or from at least one other terminal. Of course, in practical applications the first semantic data and the first voice semantic annotation data may also be obtained in other manners, and the manner of obtaining them is not specifically limited in the embodiments of the present application.
In the step of generating the second voice instruction corresponding to the first semantic data based on the first voice semantic annotation data to obtain the second voice semantic annotation data, that is, generating the second voice semantic annotation data based on the first voice semantic annotation data and the first semantic data, when cloud-side deployment, end-cloud-system cooperative deployment or terminal distributed deployment is adopted, the terminal may send the first voice semantic annotation data and the first semantic data to the cloud server or another terminal, and then receive the second voice instruction corresponding to the first semantic data, or directly the second voice semantic annotation data, returned by the cloud server or the other terminal; in other words, the second voice semantic annotation data may be generated by the cloud server or by another terminal.
In the step of training to obtain the first virtual role based on the second voice semantic annotation data, when cloud-side deployment, end-cloud-system cooperative deployment or terminal distributed deployment is adopted, the terminal may send the second voice semantic annotation data to the cloud server or another terminal and receive the first virtual role returned by it, so that the first virtual role is generated through training on the cloud server or the other terminal.
In this embodiment of the application, first semantic data and first voice semantic annotation data of a first virtual character may be obtained. The first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction; the first semantic data comprises first vertical domain information, first intention information and first word slot information; the second semantic data comprises second vertical domain information, second intention information and second word slot information. Since the vertical domain information indicates the functional field to which a voice instruction belongs, the intention information indicates the operation type of the voice instruction, and the word slot information indicates the operation parameters of the voice instruction, a second voice instruction corresponding to the first semantic data can be generated based on the first voice semantic annotation data to obtain second voice semantic annotation data, which comprises the second voice instruction and the first semantic data used for annotating it. The first virtual character can then be trained based on the second voice semantic annotation data. Because the second voice semantic annotation data can be generated from the first voice semantic annotation data and the first semantic data, only a small amount of first voice semantic annotation data needs to be collected when generating a new virtual role, from which a large amount of second voice semantic annotation data can be generated. This greatly reduces the number of voice instructions, or the amount of first voice semantic annotation data, that must be collected and annotated in advance, so that new virtual roles can be expanded quickly and efficiently, the period and cost of generating a virtual role are reduced, users can conveniently customize personalized virtual roles in time according to their needs, and the agility and extensibility of AI services are improved.
Secondly, because new virtual roles can more easily be generated by expansion for different AI services, a corresponding virtual role can be generated for AI services with different functions in different fields, and each virtual role can process its AI services accurately and reliably, thereby alleviating the contradiction between the functional breadth of virtual roles and their response accuracy.
Optionally, the generating, based on the first voice semantic annotation data, a second voice instruction corresponding to the first semantic data includes:
based on the first semantic data, searching a second virtual role associated with the first virtual role;
and if the second virtual role is not found, generating a second voice instruction corresponding to the first semantic data based on the first voice semantic labeling data.
In the step of searching for the second virtual role associated with the first virtual role based on the first semantic data, when cloud-side deployment, end-cloud-system cooperative deployment or terminal distributed deployment is adopted, the terminal may send the first semantic data to the cloud server or another terminal and receive the determination result from it, so that whether a second virtual role associated with the first virtual role exists is determined by the cloud server or the other terminal based on the first semantic data.
Optionally, the generating, based on the first voice semantic annotation data, a second voice instruction corresponding to the first semantic data includes:
performing tuning training on a preset generative adversarial network (GAN) based on the first voice semantic annotation data;
and generating a second voice instruction corresponding to the first semantic data based on the preset GAN after tuning training.
A GAN comprises a generating network and a discriminating network: the generating network is used to generate "fake data", and the discriminating network is used to determine whether input data is "fake data" produced by the generating network or natural "real data"; through the interplay of the two networks, the "fake data" generated by the generating network can be made as close to the "real data" as possible. In the embodiment of the present application, when generating the second voice semantic annotation data, the second voice instruction corresponding to the first semantic data can be generated by the GAN from a small amount of real voice semantic annotation data (i.e. the first voice semantic annotation data), so that a large amount of voice semantic annotation data (i.e. the second voice semantic annotation data) is obtained for training the first virtual character. This greatly reduces the amount of voice semantic annotation data that must be collected in advance for generating a new virtual character and lowers the acquisition cost.
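The following is a deliberately simplified sketch, in PyTorch, of how such a conditional GAN could be set up: the generating network maps an encoded piece of semantic data plus random noise to a token sequence representing a voice instruction, and the discriminating network scores sequences as real or generated. All dimensions, the vocabulary and the training step are illustrative assumptions; the application does not prescribe a particular GAN architecture.

```python
# Hedged sketch of a conditional GAN for expanding voice semantic annotation data.
import torch
import torch.nn as nn

VOCAB, SEM_CLASSES, NOISE_DIM, SEQ_LEN = 5000, 32, 64, 16

class Generator(nn.Module):
    """Maps (semantic-data id, noise) to logits over a token sequence."""
    def __init__(self):
        super().__init__()
        self.sem_emb = nn.Embedding(SEM_CLASSES, 64)
        self.rnn = nn.GRU(64 + NOISE_DIM, 128, batch_first=True)
        self.out = nn.Linear(128, VOCAB)

    def forward(self, sem_id, noise):
        cond = torch.cat([self.sem_emb(sem_id), noise], dim=-1)
        cond = cond.unsqueeze(1).repeat(1, SEQ_LEN, 1)   # condition every step
        h, _ = self.rnn(cond)
        return self.out(h)                               # (batch, SEQ_LEN, VOCAB)

class Discriminator(nn.Module):
    """Scores a (soft) one-hot token sequence as real or generated."""
    def __init__(self):
        super().__init__()
        self.tok_proj = nn.Linear(VOCAB, 64)
        self.rnn = nn.GRU(64, 128, batch_first=True)
        self.out = nn.Linear(128, 1)

    def forward(self, token_probs):
        h, _ = self.rnn(self.tok_proj(token_probs))
        return self.out(h[:, -1])

gen, disc = Generator(), Discriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def tuning_step(real_onehot, sem_id):
    """One fine-tuning step on a small batch of real annotated instructions.

    real_onehot: (batch, SEQ_LEN, VOCAB) one-hot tokens of first voice instructions.
    sem_id:      (batch,) ids encoding the corresponding semantic data.
    """
    batch = sem_id.size(0)
    noise = torch.randn(batch, NOISE_DIM)
    fake = torch.softmax(gen(sem_id, noise), dim=-1)
    # Discriminating network: distinguish real instructions from generated ones.
    d_loss = bce(disc(real_onehot), torch.ones(batch, 1)) + \
             bce(disc(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generating network: make generated instructions indistinguishable from real ones.
    g_loss = bce(disc(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

After fine-tuning on the first voice semantic annotation data, the generator can be sampled repeatedly with different noise vectors for each item of first semantic data, and the resulting instructions paired with that semantic data to form the second voice semantic annotation data.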
Optionally, before the tuning training of the preset GAN based on the first speech semantic labeling data, the method further includes:
acquiring third voice semantic annotation data, wherein the third voice semantic annotation data comprises a third voice instruction, third semantic data used for annotating the third voice instruction, fourth semantic data and a fourth voice instruction used for annotating the fourth semantic data;
and training to obtain the preset GAN based on the third voice semantic annotation data.
The third semantic data may include third vertical domain information, third intention information, and third word slot information, and the fourth semantic data may include fourth vertical domain information, fourth intention information, and fourth word slot information.
By training the preset GAN in advance, the preset GAN can be given strong semantic generalization capability, so that the second voice instruction corresponding to the first semantic data can be generated from only a small amount of first voice semantic annotation data.
Optionally, the training to obtain the first virtual character based on the second speech semantic annotation data includes:
training to obtain a Natural Language Understanding (NLU) model of the first virtual character based on the second speech semantic labeling data.
Optionally, the NLU model includes a long short term memory network (LSTM).
Among the models integrated in the AI platform, such as automatic speech recognition (ASR), NLU, dialogue management (DM), natural language generation (NLG) and text to speech (TTS), the NLU performs word segmentation, part-of-speech tagging, keyword extraction and similar processing on the text produced by ASR to obtain machine-understandable, structured semantic representation data. The processing of the NLU is therefore closely related to the specific content indicated by a voice instruction and directly affects the accuracy of the terminal's response to that instruction, whereas the other algorithm models are not sensitive to this specific content; that is, the algorithm models other than the NLU can be shared by different virtual roles. Therefore, when the first virtual character is generated, only the NLU model of the first virtual character needs to be trained, so that a new virtual character can be obtained quickly.
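As one way to picture the NLU model described here, the sketch below uses an LSTM encoder with two heads: a sentence-level head for the vertical domain / intention and a token-level head for word slots. The layer sizes and label counts are assumptions for illustration only.

```python
# Hedged sketch of an LSTM-based NLU model for joint intent and slot prediction.
import torch
import torch.nn as nn

class NLUModel(nn.Module):
    def __init__(self, vocab=5000, emb=128, hidden=128,
                 n_intents=20, n_slot_tags=40):
        super().__init__()
        # Basic language feature extraction: embedding + bidirectional LSTM.
        self.embedding = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        # Semantic data extraction: intent head (sentence) and slot head (per token).
        self.intent_head = nn.Linear(2 * hidden, n_intents)
        self.slot_head = nn.Linear(2 * hidden, n_slot_tags)

    def forward(self, token_ids):
        feats, _ = self.encoder(self.embedding(token_ids))
        intent_logits = self.intent_head(feats[:, -1])  # last step summarizes the sentence
        slot_logits = self.slot_head(feats)             # one slot tag per token
        return intent_logits, slot_logits

# Example: run one padded utterance of 8 token ids through the model.
model = NLUModel()
intent_logits, slot_logits = model(torch.randint(0, 5000, (1, 8)))
```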
Optionally, the method further comprises:
when a role awakening instruction is received, role indicating information is obtained, and the role indicating information is used for indicating a third virtual role to be awakened;
determining the third virtual role matched with the role indication information in at least one existing virtual role, wherein the at least one virtual role is obtained by dividing according to at least one preset dimension;
loading role resources of the third virtual role;
and processing Artificial Intelligence (AI) business based on the third virtual role.
In a second aspect, an embodiment of the present application provides a method for generating a virtual role, including:
acquiring first semantic data and first voice semantic annotation data of a first virtual role to be generated;
based on the first semantic data, searching a second virtual role associated with the first virtual role;
if the second virtual role is found, performing transfer learning training on the second virtual role based on the first voice semantic annotation data to obtain the first virtual role;
the first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction; the first semantic data comprises first vertical domain information, first intention information and first word slot information; the second semantic data includes second vertical domain information, second intention information, and second word slot information.
In the step of performing transfer learning training on the second virtual role based on the first voice semantic annotation data to obtain the first virtual role, when cloud-side deployment, end-cloud-system cooperative deployment or terminal distributed deployment is adopted, the terminal may send the first voice semantic annotation data (and the second virtual role) to the cloud server or another terminal and receive the first virtual role returned by it, so that the transfer learning on the second virtual role is performed by the cloud server or the other terminal.
In this embodiment of the application, first semantic data and first voice semantic annotation data of a first virtual character may be obtained. The first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction; the first semantic data comprises first vertical domain information, first intention information and first word slot information; the second semantic data comprises second vertical domain information, second intention information and second word slot information. Since the vertical domain information indicates the functional field to which a voice instruction belongs, the intention information indicates the operation type of the voice instruction, and the word slot information indicates the operation parameters of the voice instruction, a second virtual role associated with the first virtual role can first be searched for; if the second virtual role is found, transfer learning training is performed on the second virtual role based on the first voice semantic annotation data to obtain the first virtual role. In this way, the number of voice instructions, or the amount of first voice semantic annotation data, that must be collected and annotated in advance for generating a virtual role can be greatly reduced, new virtual roles can be expanded quickly and efficiently, the period and cost of generating a virtual role are reduced, users can conveniently customize personalized virtual roles in time according to their needs, and the agility and extensibility of AI services are improved.
Optionally, the NLU model of the first virtual character includes a basic language feature extraction layer and a semantic data extraction layer, and the performing transfer learning training on the second virtual character based on the first voice semantic annotation data to obtain the first virtual character includes:
acquiring an NLU model of the second virtual role;
setting the network parameters of the basic language feature extraction layer in the NLU model of the second virtual role as constants;
and training network parameters in the semantic data extraction layer in the NLU model of the second virtual role based on the first voice semantic labeling data to obtain the NLU model of the first virtual role.
The NLU model of the first virtual role comprises a basic language feature extraction layer and a trained semantic data extraction layer.
Because the basic language feature extraction layer of the NLU model extracts basic features of the text information, such as the association between each character and its context, the basic language feature extraction layers of the NLU models of different virtual roles can be the same; the semantic data extraction layer then further extracts the vertical domain information, intention information and word slot information on the basis of those basic text features. Therefore, only the semantic data extraction layer of the NLU model needs to be trained, which can be done with a small amount of voice semantic annotation data, so that a new virtual role can be expanded rapidly while only a small amount of voice semantic annotation data is required.
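A minimal sketch of this transfer-learning step is given below, reusing the `NLUModel` sketch from above: the NLU model of the associated second virtual role is copied, its basic language feature extraction layer is frozen, and only the semantic data extraction layer is trained on the small first set of voice semantic annotation data. The batch format and hyper-parameters are assumptions.

```python
# Hedged sketch: adapt an existing role's NLU model by training only its
# semantic data extraction layer (intent and slot heads).
import copy
import torch

def transfer_learn(source_nlu_model, annotated_batches, epochs=3, lr=1e-3):
    new_model = copy.deepcopy(source_nlu_model)
    # Freeze the basic language feature extraction layer (shared across roles).
    for p in new_model.embedding.parameters():
        p.requires_grad = False
    for p in new_model.encoder.parameters():
        p.requires_grad = False
    # Train only the semantic data extraction layer.
    trainable = [p for p in new_model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(trainable, lr=lr)
    ce = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for token_ids, intent_label, slot_labels in annotated_batches:
            intent_logits, slot_logits = new_model(token_ids)
            loss = ce(intent_logits, intent_label) + \
                   ce(slot_logits.flatten(0, 1), slot_labels.flatten())
            opt.zero_grad(); loss.backward(); opt.step()
    return new_model  # NLU model of the first virtual role
```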
Optionally, the method further comprises:
and storing the NLU model of the first virtual role and the first semantic data to a role resource library.
The newly generated virtual role is stored so that it can be awakened subsequently to process the corresponding AI service. When cloud-side deployment, end-cloud-system cooperative deployment or terminal distributed deployment is adopted, the terminal may send the first virtual role to the cloud server or another terminal, so that the first virtual role is imported into a role resource library located on the cloud server or the other terminal.
Optionally, the searching for the second virtual role associated with the first virtual role based on the first semantic data includes:
acquiring fifth semantic data of at least one existing virtual role;
determining role similarity between the at least one virtual role and the first virtual role respectively based on the first semantic data and the fifth semantic data;
and searching a second virtual role associated with the first virtual role according to the role similarity between the at least one virtual role and the first virtual role.
The role similarity between the second virtual role and the first virtual role can be larger than a preset similarity threshold.
It should be noted that the fifth semantic data includes fifth vertical domain information, fifth intention information, and fifth word slot information.
When the first virtual character and the second virtual character are similar (adjacent or close), the voice instructions that a user sends to each of them are similar in function and syntax. For example, "play music" and "play video", or "look up the encyclopedia" and "look up information", share the same verb (play or look up) and the same vertical domain information (device control or information query); the only difference is the object being played or looked up. Therefore, the second virtual character associated with the first virtual character can be found accurately through the similarity.
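The application does not prescribe a specific similarity measure; as one plausible illustration, the sketch below computes a Jaccard-style overlap between the (vertical domain, intention, word slot) triples of the first semantic data and the fifth semantic data of each existing role, and returns the best match above a preset threshold.

```python
# Hedged sketch of looking up an associated virtual role by semantic similarity.
def semantic_keys(semantic_data_list):
    # Each item follows the sample_annotation["semantic_data"] layout shown earlier.
    return {(d["domain"], d["intent"], slot)
            for d in semantic_data_list for slot in d["slots"]}

def find_related_role(role_library, first_semantic_data, threshold=0.5):
    """role_library maps role name -> that role's (fifth) semantic data."""
    target = semantic_keys(first_semantic_data)
    best_role, best_sim = None, 0.0
    for role_name, fifth_semantic_data in role_library.items():
        existing = semantic_keys(fifth_semantic_data)
        sim = len(target & existing) / max(len(target | existing), 1)
        if sim > best_sim:
            best_role, best_sim = role_name, sim
    # Only a role above the preset similarity threshold counts as associated.
    return best_role if best_sim > threshold else None
```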
Optionally, the method further comprises:
when a role awakening instruction is received, role indicating information is obtained, and the role indicating information is used for indicating a third virtual role to be awakened;
determining the third virtual role matched with the role indication information in at least one existing virtual role, wherein the at least one virtual role is obtained by dividing according to at least one preset dimension;
loading role resources of the third virtual role;
and processing the AI business based on the third virtual role.
Wherein the preset dimension comprises a functional field, an occupation, an identity, a title, an age, a content provider, a service platform or a role attribute. Of course, in practical applications the preset dimension may also include more or fewer dimensions, and the preset dimension is not specifically limited in the embodiments of the present application.
The terminal may comprise a plurality of virtual roles divided according to one or more preset dimensions such as functional field, occupation, identity, title, age, content provider, service platform or role attribute, so that AI services in many aspects can be processed and the functional breadth of the virtual roles is significantly improved. When a role awakening instruction is received, role indication information can be obtained, a matching third virtual role is determined among the virtual roles currently included according to the role indication information, the role resources of the third virtual role are loaded, and the AI service is processed based on the third virtual role. Because the third virtual role is unlikely to interpret a voice instruction ambiguously, the AI service can be processed accurately, and the accuracy of responding to voice instructions is significantly improved.
Optionally, the processing an AI service based on the third virtual role includes:
receiving a fourth voice instruction;
generating response control information corresponding to the fourth voice instruction based on the role resource;
and executing a response task based on the response control information.
In a third aspect, an embodiment of the present application provides a method for generating a virtual role, including:
acquiring first voice semantic annotation data of a first virtual role, wherein the first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction;
based on the second semantic data, searching for a second virtual role associated with the first virtual role;
and if the second virtual role is found, performing transfer learning training on the second virtual role based on the first voice semantic annotation data to obtain the first virtual role.
Optionally, the method further comprises:
if the second virtual role is not found, acquiring first semantic data of the first virtual role;
generating a second voice instruction corresponding to the first semantic data based on the first voice semantic annotation data to obtain second voice semantic annotation data, wherein the second voice semantic annotation data comprises the second voice instruction and the first semantic data used for annotating the second voice instruction;
and training to obtain the first virtual role based on the second voice semantic labeling data.
Wherein the first semantic data comprises first vertical domain information, first intention information and first word slot information; the second semantic data includes second vertical domain information, second intention information, and second word slot information.
In this embodiment of the application, the first semantic data of the first virtual character need not be acquired at first; instead, the second semantic data in the first voice semantic annotation data is used to determine whether a second virtual character associated with the first virtual character exists. If such a second virtual character exists, the first virtual character can be generated without acquiring the first semantic data at all, which further reduces the data required for generating a new virtual character and the cost of acquiring data.
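Putting the pieces together, the following sketch outlines the combined flow of this aspect: first try to find an associated role using only the second semantic data carried in the first voice semantic annotation data; only if none is found are the first semantic data acquired and the GAN-based expansion and training used. `find_related_role` and `transfer_learn` refer to the earlier sketches (with suitable data preparation); `to_batches`, `acquire_first_semantic_data`, `gan.generate` and `train_nlu` are assumed helpers introduced only for illustration.

```python
# Hedged sketch of the combined role-generation flow.
def build_first_role(first_annotations, role_library, gan):
    """role_library: name -> role resources with .semantic_data and .nlu_model."""
    # The second semantic data annotates the first voice instructions.
    second_semantic_data = [a["semantic_data"] for a in first_annotations]
    related = find_related_role(
        {name: role.semantic_data for name, role in role_library.items()},
        second_semantic_data)
    if related is not None:
        # Associated role found: adapt its NLU model with the small annotated set.
        return transfer_learn(role_library[related].nlu_model,
                              to_batches(first_annotations))  # to_batches: assumed helper
    # No associated role: acquire first semantic data and expand with the preset GAN.
    first_semantic_data = acquire_first_semantic_data()
    second_instructions = gan.generate(first_semantic_data)
    second_annotations = [{"voice_instruction": v, "semantic_data": s}
                          for v, s in zip(second_instructions, first_semantic_data)]
    return train_nlu(second_annotations)
```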
In a fourth aspect, an embodiment of the present application provides a method for processing an AI service, including:
when a role awakening instruction is received, role indicating information is obtained, and the role indicating information is used for indicating a third virtual role to be awakened;
determining the third virtual role matched with the role indication information in at least one existing virtual role, wherein the at least one virtual role is obtained by dividing according to at least one preset dimension;
loading role resources of the third virtual role;
and processing the AI business based on the third virtual role.
In this embodiment of the application, the terminal may include a plurality of virtual roles divided according to at least one preset dimension, which ensures that AI services in many aspects can be processed and significantly improves the functional breadth of the virtual roles. When a role awakening instruction is received, role indication information can be obtained, a matching third virtual role is determined among the virtual roles currently included according to the role indication information, the role resources of the third virtual role are loaded, and the AI service is processed based on the third virtual role. Because the third virtual role is unlikely to interpret a voice instruction ambiguously, the AI service can be processed accurately, and the accuracy of responding to voice instructions is significantly improved.
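As an illustration of this wake-up flow, the sketch below keeps a simple role manager that matches the role indication information against the existing roles, loads the matched role's resources and then routes voice instructions to it. The class and member names (`nlu_model`, `execute`) are assumptions for illustration, not names from this application.

```python
# Hedged sketch of waking up and using one of several virtual roles.
class VirtualRoleManager:
    def __init__(self, role_library):
        self.role_library = role_library   # e.g. {"kitchen butler": role_resources}
        self.active_role = None

    def on_role_wakeup(self, role_indication):
        # Determine the third virtual role matching the role indication information.
        matched = next((name for name in self.role_library
                        if role_indication in name), None)
        if matched is None:
            return False
        self.active_role = self.role_library[matched]   # load the role resources
        return True

    def on_voice_instruction(self, instruction_text):
        if self.active_role is None:
            return None
        # Generate response control information with the active role's NLU model,
        # then execute the response task.
        intent, slots = self.active_role.nlu_model.parse(instruction_text)
        return self.active_role.execute(intent, slots)
```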
Optionally, the processing an AI service based on the third virtual role includes:
receiving a fourth voice instruction;
generating response control information corresponding to the fourth voice instruction based on the role resource;
and executing a response task based on the response control information.
In a fifth aspect, an embodiment of the present application provides an apparatus for generating a virtual character, including:
the acquisition module is used for acquiring first semantic data and first voice semantic annotation data of a first virtual role to be generated;
the generating module is used for generating a second voice instruction corresponding to the first semantic data based on the first voice semantic labeling data to obtain second voice semantic labeling data;
the training module is used for training to obtain the first virtual role based on the second voice semantic labeling data;
the first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction; the second voice semantic annotation data comprises a second voice instruction and the first semantic data used for annotating the second voice instruction; the first semantic data comprises first vertical domain information, first intention information and first word slot information; the second semantic data includes second vertical domain information, second intention information, and second word slot information.
Optionally, the generating module is further configured to:
based on the first semantic data, searching a second virtual role associated with the first virtual role;
and if the second virtual role is not found, generating a second voice instruction corresponding to the first semantic data based on the first voice semantic labeling data.
Optionally, the generating module is further configured to perform tuning training on a preset GAN based on the first speech semantic annotation data; and generating a second voice instruction corresponding to the first semantic data based on the preset GAN after tuning training.
Optionally, the obtaining module is further configured to obtain third voice semantic labeling data, where the third voice semantic labeling data includes a third voice instruction, third semantic data used for labeling the third voice instruction, fourth semantic data, and a fourth voice instruction used for labeling the fourth semantic data;
the training module is further configured to train to obtain the preset GAN based on the third speech semantic labeling data.
Optionally, the obtaining module is further configured to obtain role indication information when a role wake-up instruction is received, where the role indication information is used to indicate a third virtual role to be woken up;
further comprising:
the determining module is used for determining the third virtual role matched with the role indication information in at least one existing virtual role, wherein the at least one virtual role is obtained by dividing according to at least one preset dimension;
the loading module is used for loading role resources of the third virtual role;
and the processing module is used for processing the AI service based on the third virtual role.
In a sixth aspect, an embodiment of the present application provides an apparatus for generating a virtual character, including:
the acquisition module is used for acquiring first semantic data and first voice semantic annotation data of a first virtual role to be generated;
a searching module, configured to search, based on the first semantic data, a second virtual role associated with the first virtual role;
the training module is used for performing transfer learning training on the second virtual character based on the first voice semantic annotation data to obtain the first virtual character if the second virtual character is found;
the first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction; the first semantic data comprises first vertical domain information, first intention information and first word slot information; the second semantic data includes second vertical domain information, second intention information, and second word slot information.
Optionally, the NLU model of the first virtual character includes a basic language feature extraction layer and a semantic data extraction layer, and the training module is further configured to:
acquiring an NLU model of the second virtual role;
setting the network parameters of the basic language feature extraction layer in the NLU model of the second virtual role as constants;
and training network parameters in the semantic data extraction layer in the NLU model of the second virtual role based on the first voice semantic labeling data to obtain the NLU model of the first virtual role.
Optionally, the method further comprises:
and the storage module is used for storing the NLU model of the first virtual role and the first semantic data to a role resource library.
Optionally, the lookup module is further configured to:
acquiring fifth semantic data of at least one existing virtual role;
determining role similarity between the at least one virtual role and the first virtual role respectively based on the first semantic data and the fifth semantic data;
and searching a second virtual role associated with the first virtual role according to the role similarity between the at least one virtual role and the first virtual role.
Optionally, the obtaining module is further configured to obtain role indication information when a role wake-up instruction is received, where the role indication information is used to indicate a third virtual role to be woken up;
further comprising:
the determining module is used for determining the third virtual role matched with the role indication information in at least one existing virtual role, wherein the at least one virtual role is obtained by dividing according to at least one preset dimension;
the loading module is used for loading role resources of the third virtual role;
and the processing module is used for processing the AI service based on the third virtual role.
Optionally, the processing module is further configured to:
receiving a fourth voice instruction;
generating response control information corresponding to the fourth voice instruction based on the role resource;
and executing a response task based on the response control information.
In a seventh aspect, an embodiment of the present application provides an apparatus for generating a virtual role, including:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring first voice semantic annotation data of a first virtual role to be generated, and the first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction;
a searching module, configured to search, based on the second semantic data, a second virtual role associated with the first virtual role;
and the training module is used for performing transfer learning training on the second virtual role based on the first voice semantic annotation data to obtain the first virtual role if the second virtual role is found.
Optionally, the obtaining module is further configured to obtain first semantic data of the first virtual character if the second virtual character is not found;
the training module is further configured to train to obtain the first virtual character based on the second speech semantic labeling data;
further comprising:
and the generating module is used for generating a second voice instruction corresponding to the first semantic data based on the first voice semantic labeling data to obtain second voice semantic labeling data, wherein the second voice semantic labeling data comprise the second voice instruction and the first semantic data used for labeling the second voice instruction.
In an eighth aspect, an embodiment of the present application provides an apparatus for processing an AI service, including:
the device comprises an acquisition module, a display module and a control module, wherein the acquisition module is used for acquiring role indication information when receiving a role awakening instruction, and the role indication information is used for indicating a third virtual role to be awakened;
the determining module is used for determining the third virtual role matched with the role indication information in at least one existing virtual role, wherein the at least one virtual role is obtained by dividing according to at least one preset dimension;
the loading module is used for loading role resources of the third virtual role;
and the processing module is used for processing the artificial intelligence AI business based on the third virtual role.
Optionally, the processing module is further configured to:
receiving a fourth voice instruction;
generating response control information corresponding to the fourth voice instruction based on the role resource;
and executing a response task based on the response control information.
In a ninth aspect, an embodiment of the present application provides a terminal, including: a memory for storing a computer program and a processor; the processor is configured to perform the method of any of the first to fourth aspects described above when the computer program is invoked.
In a tenth aspect, an embodiment of the present application provides a chip system, where the chip system includes a processor, the processor is coupled with a memory, and the processor executes a computer program stored in the memory to implement the method of any one of the first to fourth aspects.
The chip system can be a single chip or a chip module consisting of a plurality of chips.
In an eleventh aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method of any one of the first to fourth aspects.
In a twelfth aspect, embodiments of the present application provide a computer program product, which, when run on a terminal, causes the terminal to perform the method of any one of the first to fourth aspects.
It is understood that, the beneficial effects of the fifth aspect to the twelfth aspect can be referred to the relevant description of the first aspect to the fourth aspect, and are not described herein again.
Drawings
Fig. 1 is a block diagram of a virtual role system according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating a voice assistant interactive session according to an embodiment of the present application;
fig. 3 is a block diagram of another virtual role system provided in the embodiment of the present application;
fig. 4 is a block diagram of another virtual role system provided in the embodiment of the present application;
fig. 5 is a block diagram of another virtual role system provided in the embodiment of the present application;
fig. 6 is a block diagram of another virtual role system provided in the embodiment of the present application;
fig. 7 is a flowchart of a method for generating a virtual character according to an embodiment of the present application;
fig. 8 is a schematic diagram illustrating a role migration learning according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an NLU model according to an embodiment of the present application;
FIG. 10 is a schematic diagram illustrating a principle of generating semantic annotation data of speech according to an embodiment of the present application;
FIG. 11 is a flowchart of another method for generating virtual roles provided by embodiments of the present application;
fig. 12 is a flowchart of a method for processing an AI service according to an embodiment of the present application;
FIG. 13 is a schematic diagram of a UI interface provided by an embodiment of the application;
FIG. 14 is a schematic view of another UI interface provided by embodiments of the application;
FIG. 15 is a schematic view of another UI interface provided by embodiments of the application;
FIG. 16 is a schematic view of another UI interface provided by embodiments of the application;
FIG. 17 is a schematic view of another UI interface provided by embodiments of the application;
FIG. 18 is a schematic view of another UI interface provided by embodiments of the application;
FIG. 19 is a schematic diagram of another UI interface provided by embodiments of the application;
fig. 20 is a schematic structural diagram of an apparatus for generating a virtual character according to an embodiment of the present application;
fig. 21 is a schematic structural diagram of an apparatus for generating a virtual character according to an embodiment of the present application;
fig. 22 is a schematic structural diagram of an apparatus for generating a virtual character according to an embodiment of the present application;
fig. 23 is a schematic structural diagram of an apparatus for processing an AI service according to an embodiment of the present application;
fig. 24 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 25 is a schematic structural diagram of another terminal provided in the embodiment of the present application;
fig. 26 is a block diagram of a software structure of a terminal according to an embodiment of the present application.
Detailed Description
In order to facilitate understanding of the technical solutions in the embodiments of the present application, an application scenario of the embodiments of the present application is first described below.
In order to facilitate understanding of technical solutions in the embodiments of the present application, some terms referred to in the embodiments of the present application are first explained below:
the virtual role can be a collection of programs for handling at least one AI service. In practical application, different virtual roles may be generated by dividing according to at least one preset dimension, for example, the virtual roles may be divided according to at least one preset dimension such as a function field, a job, an identity, a title, an age, a content provider, a service platform, or a role attribute. Of course, in practical applications, the preset dimension may also include other dimensions more or less, for example, a manufacturer of the virtual character may also be included, and the preset dimension is not specifically limited in this embodiment of the application.
It should be noted that the AI service may include a voice assistant, subtitle generation, voice input, a chat robot, a customer service robot, or spoken language evaluation; of course, in practical applications the AI service may also include other AI services, and the type of AI service is not specifically limited in the embodiments of the present application.
The voice assistant is an application program constructed based on AI, and helps a user to complete operations such as information query, equipment control, text input and the like by performing instant question-answer type voice interaction with the user by means of a voice semantic recognition algorithm.
For example, virtual characters can be classified by functional field into medical and health, educational counseling, sports and fitness, news and information, travel, and smart home; by occupation or identity into doctor, teacher, coach, secretary, butler and police officer; by device location into kitchen, bathroom, living room, bedroom, balcony and entrance door; by device function into rice cooker, ventilation equipment, television, curtain, washing machine and door lock; by identity or title into uncle, aunt, elder brother, elder sister, grandfather or grandmother; and by character attribute into literary girl, fashionable woman, science-loving senior, housewife, tech enthusiast and gaming expert.
It should be noted that dimensions corresponding to different virtual characters may overlap or repeat, for example, two virtual characters, i.e., a teacher and a primary school teacher, may exist simultaneously.
It should also be noted that, to facilitate interaction with the user, a virtual character may also have a character image that can be displayed on a display screen or by projection, and the character image may resemble its real-world counterpart. For example, a teenager-oriented virtual character installed in a smart watch and named "Snail" may be woken up when it is detected that the user says "Snail", and may then be shown on the smart watch's display screen as a lifelike snail image.
In the prior art, when virtual roles are generated, a large number of voice instructions are collected in advance for each virtual role to be generated and labeled to obtain voice semantic annotation data, and the NLU algorithm model of the virtual role is then trained on this large amount of voice semantic annotation data, so that the virtual role is generated. However, because a large number of voice instructions must be collected and labeled for every virtual character, the period for generating a virtual character is long, the cost is high, the agility and extensibility of AI-service processing are poor, and personalized processing of AI services is difficult to achieve.
To solve this technical problem, the present application provides a method for generating a virtual character. First semantic data and first voice semantic annotation data of a first virtual character to be generated may be obtained, where the first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction, the first semantic data comprises first vertical domain information, first intention information and first word slot information, and the second semantic data comprises second vertical domain information, second intention information and second word slot information. Since the vertical domain information indicates the functional field to which a voice instruction belongs, the intention information indicates the operation type of the voice instruction, and the word slot information indicates the operation parameters of the voice instruction, a second voice instruction corresponding to the first semantic data can be generated based on the first voice semantic annotation data to obtain second voice semantic annotation data, and the first virtual role can be trained based on the second voice semantic annotation data. Because the second voice semantic annotation data can be generated from the first voice semantic annotation data and the first semantic data, only a small amount of first voice semantic annotation data needs to be collected when generating a new virtual role, from which a large amount of second voice semantic annotation data can be generated. This greatly reduces the number of voice instructions, or the amount of first voice semantic annotation data, that must be collected and annotated in advance, so that new virtual roles can be expanded quickly and efficiently, the period and cost of generating a virtual role are reduced, users can conveniently customize personalized virtual roles in time according to their needs, and the agility and extensibility of AI services are improved.
In addition, in the prior art, as AI technologies develop, users place higher and higher requirements on virtual characters such as voice assistants. On the one hand, users expect a virtual character to support as wide a range of questions, skills and scenarios as possible, i.e. to be "all-capable"; on the other hand, they expect it to respond to voice instructions as accurately as possible, i.e. to be "always right". These two requirements can conflict: the more questions, skills and scenarios a virtual character supports, the more complex the functional fields of the AI services it must handle, and the harder it is for the virtual character to understand the user's voice instructions accurately. For example, taking a voice assistant as an example, if a user asks "what is a chocolate cyst", the voice assistant may return a query result for "chocolate" rather than "chocolate cyst". As another example, if the user's voice instruction is "turn on the rice cooker", the voice assistant may open a shopping link for rice cookers instead of controlling the kitchen rice cooker through the internet of things (IoT). As yet another example, if the user asks about the "top speed of a jaguar", the voice assistant may not be able to tell whether "jaguar" refers to the animal or the car and may therefore respond incorrectly. Furthermore, some terminals face users of different ages, educational backgrounds, languages and content preferences, and these differences further increase the possibility that the voice assistant confuses the semantics of voice instructions.
To address this technical problem, on the one hand, the method for generating virtual roles provided by this application makes it easier to expand and generate new virtual roles for different AI services, so that corresponding virtual roles can be generated for AI services with different functions in different fields; each virtual role can then process its AI services accurately and reliably, which alleviates the contradiction between the functional breadth of virtual roles and their response accuracy. On the other hand, this application also provides a method for processing AI services: the terminal may include a plurality of virtual roles divided according to at least one dimension (such as at least one of functional field, occupation, identity, title, age, content provider, service platform or role attribute), which ensures that AI services in many aspects can be processed and significantly improves the functional breadth of the virtual roles. When a role awakening instruction is received, role indication information can be obtained, a matching third virtual role is determined among the virtual roles currently included according to the role indication information, the role resources of the third virtual role are obtained and loaded, and the AI service is processed based on the third virtual role. Because the third virtual role is unlikely to interpret a voice instruction ambiguously, the AI service can be processed accurately, and the accuracy of responding to voice instructions is significantly improved.
The technical solution of the present application will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Referring to fig. 1, a block diagram of a virtual role system 100 according to the present application is shown. The system includes a device input module 110, a base resource library 120, an AI platform 130, a role selection module 140, a role resource library 150, a role construction module 160, a device output module 170, and a task management and business logic module 180.
The device input module 110 may be configured to collect information such as voice instructions, operation instructions, contextual interaction information, and context information input by a user, and to control input or sensing peripherals of the terminal device such as a microphone, an inertial sensor, a touch display screen, keys, a keyboard, a mouse, and a camera, as well as terminal device software modules or data information such as the User Interface (UI), user portrait, schedule, communication records, short message content, mail content, contextual model, and device operation history.
The base resource pool 120 may include general resources such as a voice semantic algorithm, User Experience (UX), service access, and the like, which are required for supporting the virtual role system to complete basic service functions such as voice wakeup, system setup, and the like.
The AI platform 130 may integrate algorithms such as voice wakeup, ASR, NLU, DM, NLG, and TTS to control each virtual role to execute the cascaded processing procedure, and may also integrate context awareness (CA) algorithms for perceiving the user state and contextual model, as well as the software libraries and AI running frameworks (such as Caffe, TensorFlow, PyTorch and the like) on which these algorithms run.
Voice wakeup may mean that the terminal receives and detects a specific user voice instruction (e.g., a wake-up word) when the screen is locked or the virtual role is in a dormant state, and activates the virtual role to enter a state of waiting for voice instruction input.
It should be noted that, for the received voice information, the voice information may be preprocessed by hardware or software means by using audio signal processing algorithms such as reverberation cancellation, echo cancellation, blind source separation, beam forming, and the like.
ASR can convert speech information into corresponding text information and perform normalization, error correction, and conversion of the spoken text into written form.
The NLU can perform word segmentation, part-of-speech tagging, keyword extraction and other processing on the text information output by the ASR, so as to obtain machine-understandable structured semantic representation data. The NLU in a voice assistant can be used for recognizing the intention type of a voice instruction and extracting its keyword information. For example, if the user inputs "order a flight ticket to Beijing tomorrow", the intention classification result is "order a flight ticket", and the slot extraction result is "time: tomorrow, destination: Beijing". Because of the diversity and flexibility of natural language, the same expression can have completely different semantics in different contexts, so the NLU is an important component of the virtual role, and the accuracy of the NLU in intention classification and word slot extraction directly determines whether the virtual role can accurately respond to the user's voice instruction.
The DM may determine, according to the semantic representation data output by the NLU and the dialogue state, which service/platform should be accessed, which feedback operation should be taken, or what response information should be replied.
The NLG may convert the system response action determined by the DM into natural language text understandable by a human.
The TTS may convert the natural language text generated by the NLG into playable response speech and output the response speech.
It should be noted that, for different AI services, the AI platform 130 of the virtual role may include at least one of the ASR, NLU, DM, NLG, and TTS algorithms. For example, when the AI service is a voice assistant, referring to fig. 2, the AI platform 130 corresponding to the virtual role may include an ASR module 220, an NLU module 230, a DM module 240, an NLG module 250, and a TTS module 260, which are sequentially cascaded; when the AI service is subtitle generation or voice input, the AI platform 130 corresponding to the virtual role may include an ASR module 220; when the AI service is a chat robot or a customer service robot, the AI platform 130 corresponding to the virtual role may include an NLU module 230, a DM module 240, and an NLG module 250; when the AI service is spoken language assessment, the AI platform 130 corresponding to the virtual role may include an ASR module 220 and an NLU module 230. Of course, in practical applications, the AI platform 130 corresponding to the virtual role may also include more or fewer algorithms for different AI services.
Referring to fig. 2, a schematic diagram of a voice assistant interactive session according to an embodiment of the present application is shown. Taking a weather query as an example, the user 210 sends a voice instruction "please tell me the weather of city A tomorrow"; the ASR module 220 converts the voice instruction into text information; the NLU module 230 recognizes the text information, the intention classification result is "query weather", and the slot extraction result is "time: tomorrow, area: city A"; the DM module 240 determines, according to the intention type and the keyword information, that the accessed service platform is a weather query platform, and queries that the weather of city A tomorrow is "sunny, 32°C"; the NLG module 250 generates the response text "the weather in city A tomorrow is sunny, with an average temperature of 32°C" according to the query result; the TTS module 260 converts this text into voice information, which may then be played through a speaker as the response to the voice instruction.
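For ease of understanding, the following non-limiting Python sketch illustrates the cascaded processing flow described above for the weather-query dialogue; the module interfaces (transcribe, parse, decide, render, synthesize, play) are assumptions introduced for illustration only and are not part of the embodiment.

```python
def handle_voice_instruction(audio, asr, nlu, dm, nlg, tts, speaker):
    """One pass through the cascaded modules for the weather-query example."""
    text = asr.transcribe(audio)             # "please tell me the weather of city A tomorrow"
    semantics = nlu.parse(text)               # intention: query weather; slots: time=tomorrow, area=city A
    action = dm.decide(semantics)             # choose the weather query platform and fetch the result
    reply_text = nlg.render(action)           # "the weather in city A tomorrow is sunny, ..."
    reply_audio = tts.synthesize(reply_text)  # convert the response text into speech
    speaker.play(reply_audio)                 # voice broadcast as the response to the instruction
    return reply_text
```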
The ASR, NLU, DM, NLG, TTS, and other models may be implemented by machine learning models such as a Recurrent Neural Network (RNN), LSTM, or Transformer.
The role selection module 140 may detect the user instruction collected by the device input module 110, and select the most appropriate virtual role according to the role indication information, and the detection and analysis process may be implemented by processing and recognizing one or more of a wakeup word, a voice instruction, a UI operation, a user state, a contextual model, and the like.
The role indication information may be used to indicate the virtual role to be woken up, and may include at least one of information such as a wake-up word, an identity name, and a user identity.
The role resource library 150 may include resources such as voice semantic algorithm, UX, service access, etc. required for supporting the operation of any virtual role and the execution of AI service of the corresponding vertical domain, and includes role resources of one or more virtual roles.
The character building module 160 may provide a user customized interface for the virtual character, and train and generate an NLU model of the virtual character customized by the user according to the relevant data (such as the speech semantic annotation data) input by the user and required for training the NLU.
The device output module 170 may call terminal peripherals such as a speaker/loudspeaker, a touch display screen, and a vibration motor according to the response control information, and externally perform feedback response operations such as voice broadcasting, text response, information refreshing, device control, and the like.
The task management and service logic module 180 may perform task scheduling and task management according to the response control information output by the virtual role decision.
The virtual role system 100 can interact with the user through at least one of the above functional modules to implement different functions. For example, the user may interact through the device input module 110 and the role construction module 160 to extend and construct a new virtual role.
It should be noted that, in practical applications, the virtual role system 100 shown in fig. 1 may include more or fewer functional modules, and the virtual role system 100 may adopt a plurality of deployment manners, such as end-side deployment, cloud-side collaborative deployment, and terminal distributed deployment, which will be described below.
Mode one, end-side deployment
Referring to fig. 3, a block diagram of a virtual role system 100 according to an embodiment of the present disclosure is shown. On the basis of fig. 1, the virtual role system 100 further includes an application management framework 190, a system service/information platform 191, and an intelligent brain 192; the AI platform 130 includes a wake-up module 270, an ASR module 220, an NLU module 230, a DM module 240, an NLG module 250, a TTS module 260, and a CA module 280; the role resource library 150 further comprises a trigger component library 151, a voice semantic algorithm library 152, a service resource library 153 and a UX resource library 154; the role construction module 160 further includes a word slot information base 161, an intention information base 162, a data generation module 163, and a role migration module 164; the base resource library 120 also includes the word slot information base 161 and a base role model 121.
The application management framework 190 may be used to fully or partially call the peripheral systems or devices of the virtual role, for example controlling, via the device output module 170, terminal peripherals such as a speaker/sound box, a touch display screen, and a vibration motor.
The system service/information platform 191 may include system services such as simulated click, Uniform Resource Locator (URL) connection access, system Application Programming Interface (API), IoT control, and the like, which are carried by the terminal, and information platforms such as third-party voice service, third-party Content Provider (CP) encyclopedia query or atomization service, IoT control, and the like.
The intelligent brain 192, or resource manager, may be used to select at least one of the corresponding terminal peripherals, services and terminals according to the user's needs or habits.
The trigger component library 151 may include at least one of a software library, a model, an algorithm, and a policy for virtual role detection, recognition and decision selection, such as a wake-up word detection algorithm, a voiceprint recognition algorithm, a role decision algorithm, and the like.
The speech semantic algorithm library 152 may include speech semantic processing resources in each functional field of the virtual role, including end-to-end algorithms or partial processing links such as ASR, NLU, DM, NLG, TTS, and the speech semantic processing resources may be packaged in the form of model files, parameter tables, configuration files, software libraries, service interfaces, and the like.
The service resource library 153 may include service response resources of each function domain of the virtual character, such as at least one of a device function library and an IoT device function library.
The UX resource library 154 may include at least one of UX resources and avatars corresponding to the virtual character.
The word slot information base 161 may include word slot information corresponding to each vertical domain information of the virtual character.
The intention information base 162 may include intention information corresponding to each vertical domain information of the virtual character.
The data generating module 163 and the role migration module 164 may be respectively configured to generate a large amount of voice semantic labeling data according to a small amount of acquired voice semantic labeling data and semantic data, and implement migration learning of virtual roles.
The voice semantic annotation data and the semantic data can correspond to AI services to be processed by the first virtual role; the semantic data may indicate semantics of the voice instruction, including vertical domain information, intention information, and word slot information of the AI service indicated by the voice instruction. The vertical domain information is used for indicating a function domain to which the voice command belongs, the intention information is used for indicating an operation type of the voice command, and the word slot information is used for indicating an operation parameter of the voice command.
For example, if the content of a certain voice instruction is "please play song of Zhang San", the corresponding vertical domain information may be device control, the intention information may be playing music, and the word slot information may be Zhang San.
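For ease of understanding, the following non-limiting sketch shows one possible way to represent such semantic data in Python; the field names and values are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class SemanticData:
    vertical_domain: str                                       # function domain of the voice instruction
    intention: str                                             # operation type
    word_slots: Dict[str, str] = field(default_factory=dict)   # operation parameters

# "Please play song of Zhang San" could be annotated as:
example = SemanticData(
    vertical_domain="device control",
    intention="play music",
    word_slots={"singer": "Zhang San"},
)
```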
Mode two, cloud-side deployment
Referring to fig. 4, a block diagram of another virtual role system 100 according to an embodiment of the present application is shown. Compared with the end-side deployment, the virtual role system 100 comprises an end-side subsystem 300 and a cloud-side subsystem 400, and the end-side subsystem 300 and the cloud-side subsystem 400 are interactively cooperated to complete the same functions as the end-side deployment.
The end-side subsystem 300 includes a device input module 110, an AI platform 130, a role selection module 140, a base resource library 120, a device output module 170, an application management framework 190, and an intelligent brain 192. The AI platform 130 includes a wake-up module 270, a CA module 280, and a cloud access module 310.
The cloud access module 310 may be configured to enable the end-side subsystem 300 to submit various cloud service requests required for processing the AI service to the cloud-side subsystem 400, and read a processing result of the cloud-side subsystem 400 for the cloud service request.
The cloud-side subsystem 400 includes an AI platform 130, a role resource library 150, a role construction module 160, a task management and business logic module 180, an application management framework 190, a terminal access module 410, and a service access selection module 420. The AI platform 130 includes an ASR module 220, an NLU module 230, a DM module 240, an NLG module 250, a TTS module 260, a wake-up module 270, and a CA module 280; the character resource library 150 includes a trigger component library 151, a speech semantic algorithm library 152, a service resource library 153, and an UX resource library 154.
The terminal access module 410 may monitor and analyze various cloud service requests submitted by various terminals.
The service access selection module 420 may call a corresponding service function from the AI platform 130 according to each cloud service request accessed via the terminal access module 410.
Compared with the end-side deployment mode, the cloud-side deployment mode migrates the main components of the virtual role system, such as the AI platform 130, the role resource library 150 and the role construction module 160, to a server, where they are called by a plurality of end-side subsystems 300 in the form of cloud services to execute the method for generating a virtual role and/or the method for processing an AI service provided by the embodiments of the present application. This makes full use of the powerful data processing capability of the cloud-side server, improves the reliability of processing AI services, and facilitates the management, control and maintenance of the virtual role system.
Mode three, end-cloud collaborative deployment
Referring to fig. 5, a block diagram of another virtual role system 100 according to an embodiment of the present application is shown. The virtual role system 100 includes an end-side subsystem 300, a cloud-side subsystem 400, and a third-party service system 500.
The end-side subsystem 300 includes a device input module 110, a base resource library 120, an AI platform 130, a role selection module 140, a role resource library 150, a device output module 170, a task management and business logic module 180, an application management framework 190, and an intelligent brain 192.
The AI platform 130 of the end-side subsystem 300 includes an ASR module 220, an NLU module 230, a DM module 240, an NLG module 250, a TTS module 260, a wake-up module 270, and a CA module 280; the character resource library 150 includes a trigger component library 151, a speech semantic algorithm library 152, a service resource library 153, and an UX resource library 154.
The cloud-side subsystem 400 includes an AI platform 130, a role resource library 150, a role construction module 160, a task management and business logic module 180, an application management framework 190, a terminal access module 410, and a service access selection module 420.
The AI platform 130 of the cloud-side subsystem 400 includes an ASR module 220, an NLU module 230, a DM module 240, an NLG module 250, and a TTS module 260; the character resource library 150 includes a trigger component library 151, a speech semantic algorithm library 152, a service resource library 153, and an UX resource library 154.
The third-party service system 500 includes a voice semantic service 510, an encyclopedia/search service 520, an atomization service 530, an IoT/vendor platform 540, and a personalized AI service 550. Of course, in practical applications, the third-party service system 500 may include more or fewer services.
The DM module in the cloud-side subsystem 400 may interact with the third-party service system 500 when determining that the third-party service needs to be called based on the voice instruction of the user, so as to obtain a processing result of the third-party service, for example, if the voice instruction of the user is to query weather, the DM module may determine that the weather condition needs to be queried from a weather platform of the third party.
Compared with the cloud-side deployment in mode two, in end-cloud collaborative deployment both the end-side subsystem 300 and the cloud-side subsystem 400 include a relatively complete set of components/modules, so that each can independently complete the method for generating a virtual role and/or the method for processing an AI service provided in the embodiments of the present application. However, since the cloud server may include more computing resources and storage resources than the terminal, the AI platform 130 and the role resource library 150 of the cloud-side subsystem 400 may include more data and/or more accurate algorithms than those of the end-side subsystem 300, so that a wider range of session scenarios and service skills can be supported, and the reliability of processing AI services is higher. In practical applications, when processing each AI service, the end-side subsystem 300 may process it locally, and if a certain step is difficult to process effectively (for example, the step fails to be executed), the step may be handed over to the cloud-side subsystem 400; or, in another possible manner, the end-side subsystem 300 and the cloud-side subsystem 400 may process the AI service at the same time, the processing results of the two are compared, and the result that is more reliable and/or returned faster is fed back to the user. Therefore, end-cloud collaborative deployment can simultaneously utilize the rich information resources of the cloud server and the idle computing resources of the terminal, which provides high flexibility and ensures the reliability of processing AI services.
Mode four, terminal distributed deployment
Referring to fig. 6, a block diagram of another virtual role system 100 according to an embodiment of the present application is shown. The end-side subsystem 300 in each terminal may include a device input module 110, a base resource library 120, an AI platform 130, a role selection module 140, a role resource library 150, a device output module 170, a task management and business logic module 180, an application management framework 190, and an intelligent brain 192. The AI platform 130 includes an ASR module 220, an NLU module 230, a DM module 240, an NLG module 250, a TTS module 260, a wake-up module 270, and a CA module 280; the role resource library 150 further includes a role resource discovery/access module 193 on the basis of the trigger component library 151, the voice semantic algorithm library 152, the service resource library 153 and the UX resource library 154, and the role resource discovery/access module 193 can be used for discovering, invoking and copying role resources in other terminals.
Each terminal may be connected through a wireless or wired network such as Wi-Fi (a wireless local area network based on the IEEE 802.11 standard). Each terminal may include different virtual roles, and its end-side subsystem 300 may include role resources of different virtual roles, or part of the role resources of the same virtual role. For example, the end-side subsystem 300 of terminal A may include the role resources of "doctor" and the end-side subsystem 300 of terminal B may include the role resources of "teacher"; alternatively, the end-side subsystem 300 of terminal A may include the ASR model of "doctor" and the end-side subsystem 300 of terminal B may include the NLU model of "doctor". The multiple terminals may cooperate to perform the method for generating a virtual role and/or the method for processing an AI service provided in this embodiment, where the cooperation mode may include each performing at least one step, or performing one step cooperatively; this embodiment of the application does not specifically limit the cooperation mode. It can be seen that terminal distributed deployment allows resources in multiple terminals to be shared, achieving cross-device resource complementation, providing high flexibility, and ensuring the reliability of AI service processing.
Fig. 7 is a flowchart of a method for generating a virtual character according to an embodiment of the present disclosure. It should be noted that the method may be applied to a terminal, to interaction between the terminal and a cloud server, or to interaction between terminals; at least one step in the following method may be performed by the terminal independently, by the cloud server or another terminal, or by the terminal in cooperation with the cloud server or another terminal. The method is not limited by fig. 7 or the specific sequence described below; it should be understood that, in other embodiments, the sequence of some steps in the method may be interchanged according to actual needs, or some steps may be omitted or deleted. The method comprises the following steps:
S701, acquiring first semantic data and first voice semantic annotation data of a first virtual role to be generated.
To train and generate a new virtual character (i.e., the first virtual character), first semantic data and first voice semantic annotation data of the first virtual character may be obtained.
The first voice semantic annotation data may be obtained by the terminal collecting a plurality of first voice instructions in advance and receiving the second semantic data with which the user annotates these first voice instructions, and may comprise the first voice instructions and the second semantic data used for annotating them. The second voice semantic annotation data may include the second voice instruction and the first semantic data for annotating the second voice instruction. The first semantic data comprises first vertical domain information, first intention information and first word slot information; the second semantic data includes second vertical domain information, second intention information, and second word slot information.
It should be noted that the number of the first voice instruction, the second voice instruction, the first semantic data, and the second semantic data may be multiple, the number of the second voice instruction may be greater than the number of the first voice instruction, and the number of the first semantic data may be greater than the number of the second semantic data.
It should be further noted that the first semantic data may be different from or partially identical to the second semantic data, that is, the first vertical domain information, the first intention information, and the first word slot information may be different from or partially identical to the second vertical domain information, the second intention information, and the second word slot information, respectively.
Optionally, when a role extension instruction of a user is received, a role extension program is started to obtain first semantic data and first voice semantic annotation data of the first virtual role.
The role extension instruction may be used to indicate extension to generate a new virtual role, and may be triggered by the user performing a preset operation, where the preset operation may include a voice input operation, a text input operation, a key operation or a touch operation. For example, the user may input "enter role extension mode" by voice, and the terminal may determine that the role extension instruction is triggered when detecting that the content of the user's voice input includes "enter role extension mode"; alternatively, the user may touch a role extension button on the touch screen, and the terminal may determine that the role extension instruction is triggered when a touch operation on the role extension button is detected.
When end-side deployment is adopted, the terminal may receive the first semantic data and the first voice semantic annotation data submitted by the user when acquiring them, or may obtain the stored first semantic data and first voice semantic annotation data from a storage medium of the terminal. When cloud-side deployment, end-cloud collaborative deployment or terminal distributed deployment is adopted, the terminal may acquire the first semantic data and the first voice semantic annotation data from the cloud server or at least one other terminal. Of course, in practical applications, the first semantic data and the first voice semantic annotation data of the first virtual character may also be obtained in other manners, and the manner of obtaining them is not specifically limited in the embodiments of the present application.
It should be noted that, in order to improve flexibility and reliability of obtaining the first semantic data and the first voice semantic annotation data, the manner of obtaining the first semantic data and the first voice semantic annotation data may be different, and the manner of obtaining the first vertical domain information, the first intention information, and the first word slot information in the first semantic data may also be different.
Take the first semantic data as an example. In a possible mode, when the first vertical domain information is obtained, a plurality of vertical domain information to be selected can be provided for a user, then the vertical domain information selected by the user is determined to be the first vertical domain information, and meanwhile, the first intention information and the first word slot information submitted by the user are received. In another possible manner, the terminal may obtain at least one vertical domain information, at least one intention information, and at least one word slot information, and then obtain one vertical domain information, one intention information, and one word slot information from the at least one vertical domain information, the at least one intention information, and the at least one word slot information, respectively, so as to obtain the first vertical domain information, the first intention information, and the first word slot information. In another alternative, the terminal may obtain at least one intention information and at least one word slot information, obtain one intention information and one word slot information from the at least one intention information and the at least one word slot information, respectively, to obtain a first intention information and a first word slot information, and then determine a first vertical domain information based on the first intention information.
The terminal may obtain at least one piece of intention information from a preset intention information base in a random sampling manner and the like, obtain at least one piece of word slot information from a preset word slot information base, or obtain at least one piece of intention information and at least one piece of word slot information from other pre-constructed databases.
It should be noted that, a preset intention information base and a preset word slot information base may be set in advance for a specific virtual character, where the preset intention information base may include at least one intention information, and the preset word slot information base may include at least one word slot information.
For example, the terminal acquires M pieces of intention information and N pieces of word slot information, where each intention information may be associated with one vertical domain information, and then the terminal may obtain M × N pieces of first semantic data by combining the vertical domain information, the intention information, and the word slot information.
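For ease of understanding, the following non-limiting Python sketch shows how the M x N pieces of first semantic data could be enumerated by combining the intention information and the word slot information, assuming each piece of intention information is associated with one piece of vertical domain information; all names and values are illustrative assumptions.

```python
from itertools import product

def build_first_semantic_data(intention_info, word_slot_info, domain_of_intention):
    """Combine each intention with each word slot; the associated vertical
    domain information is looked up from the intention (M x N results)."""
    return [
        {"vertical_domain": domain_of_intention[intention],
         "intention": intention,
         "word_slot": slot}
        for intention, slot in product(intention_info, word_slot_info)
    ]

# Example with M = 2 pieces of intention information and N = 2 pieces of
# word slot information, yielding 4 pieces of first semantic data.
first_semantic_data = build_first_semantic_data(
    intention_info=["play music", "play video"],
    word_slot_info=["Zhang San", "movie A"],
    domain_of_intention={"play music": "device control", "play video": "device control"},
)
```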
The first vertical domain information associated with the first intention information can be acquired from the association relationship between the preset vertical domain information and the intention information.
It should be noted that, the intention information submitted by the user and the associated vertical domain information may be received in advance, or the vertical domain information associated with the intention information may be determined in a machine learning manner, and then the intention information and the vertical domain information may be stored in the association relationship between the vertical domain information and the intention information.
Of course, in practical application, the vertical domain information associated with the intention information may also be determined in other manners, and the manner of determining the vertical domain information associated with the intention information is not particularly limited in this embodiment of the application.
S702, judging whether a second virtual role related to the first virtual role exists or not based on the first semantic data. If so, S703 is performed, otherwise S704 is performed.
A second virtual role associated with the first virtual role may be searched based on the first semantic data, and if the second virtual role is found, it may be determined that the second virtual role exists, otherwise, it may be determined that the second virtual role does not exist.
Alternatively, since the semantic data of a virtual character can indicate the semantics of voice instructions, it can reflect the character functions (i.e., the AI services processed) that the user expects the virtual character to implement. When the first virtual character and a second virtual character are similar (adjacent or close), the voice instructions sent by the user to the first virtual character and to the second virtual character are similar in function and syntax, such as "playing music" and "playing video", or "finding encyclopedia" and "finding information": the verbs are all playing or finding, the corresponding vertical domain information is device control or information query in both cases, and the difference lies only in the object being played or found. Therefore, whether the first virtual character is associated with an existing virtual character can be accurately judged according to the semantic data of the first virtual character and of each existing virtual character. Thus, fifth semantic data of at least one existing virtual character can be acquired, the character similarity between the at least one virtual character and the first virtual character is determined based on the first semantic data and the fifth semantic data, and a second virtual character associated with the first virtual character is searched for according to that character similarity, where the character similarity between the second virtual character and the first virtual character may be greater than a preset similarity threshold.
The fifth semantic data may include fifth vertical domain information, fifth intention information, and fifth word slot information.
The fifth semantic data of any virtual character may be acquired from the character resource library 150, at least one of the first vertical domain information, the first intention information, and the first word slot information is compared with at least one of the fifth vertical domain information, the fifth intention information, and the fifth word slot information to obtain at least one of vertical domain similarity, intention similarity, and word slot similarity, and the character similarity between the first semantic data and the fifth semantic data is determined based on at least one of the vertical domain similarity, the intention similarity, and the word slot similarity. For example, the product of the preset vertical domain weight and the vertical domain similarity, the product of the preset intention weight and the intention similarity, and the product of the preset word slot weight and the word slot similarity are accumulated to obtain the role similarity.
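For ease of understanding, the following non-limiting Python sketch shows the weighted accumulation described above; the weights, the threshold and the similarity functions are illustrative assumptions.

```python
def role_similarity(first, fifth, domain_sim, intent_sim, slot_sim,
                    w_domain=0.4, w_intent=0.4, w_slot=0.2):
    """Accumulate the weighted vertical domain, intention and word slot similarities."""
    return (w_domain * domain_sim(first["vertical_domain"], fifth["vertical_domain"])
            + w_intent * intent_sim(first["intention"], fifth["intention"])
            + w_slot * slot_sim(first["word_slot"], fifth["word_slot"]))

PRESET_SIMILARITY_THRESHOLD = 0.8  # assumed value

def is_associated(first, fifth, **sims):
    """A candidate counts as the associated second virtual role when the
    role similarity exceeds the preset similarity threshold."""
    return role_similarity(first, fifth, **sims) > PRESET_SIMILARITY_THRESHOLD
```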
It should be noted that the role similarity, the vertical domain similarity, the intention similarity, and the word slot similarity may be used to respectively describe the similarity between two virtual roles, between two pieces of vertical domain information, between two pieces of intention information, and between two pieces of word slot information.
It should be further noted that the preset similarity threshold, the preset vertical domain weight, the preset intention weight, and the preset word slot weight may be set in advance.
Alternatively, the vertical domain similarity, the intention similarity or the word slot similarity may be determined by table look-up or by machine learning. Take the vertical domain similarity as an example. If it is determined by table look-up, the vertical domain similarity between two pieces of vertical domain information can be queried from a preset similar vertical domain information table, where the preset similar vertical domain information table can be determined in advance, for example by collecting a plurality of pieces of vertical domain information, determining the similarity between every two pieces of vertical domain information, and storing the similarity between any two pieces of vertical domain information in the table. If the vertical domain similarity is determined by machine learning, the two pieces of vertical domain information can be input into a preset similarity discrimination model, and the vertical domain similarity between them is determined by that model; the preset similarity discrimination model may comprise a machine learning model and may be obtained by acquiring a plurality of first training samples in advance, each first training sample comprising two pieces of vertical domain information and carrying an annotated vertical domain similarity, and training the model with these first training samples. Of course, in practical applications, the vertical domain similarity, the intention similarity, and the word slot similarity may also be determined in other manners, and the manner of determining them is not specifically limited in this embodiment of the application.
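For ease of understanding, the following non-limiting Python sketch shows the table look-up manner for the vertical domain similarity; the table entries are illustrative assumptions (the machine learning manner would instead feed the two pieces of vertical domain information into a preset similarity discrimination model).

```python
PRESET_SIMILAR_VERTICAL_DOMAIN_TABLE = {
    frozenset({"device control", "IoT control"}): 0.9,        # assumed entries
    frozenset({"device control", "information query"}): 0.3,
}

def vertical_domain_similarity(a, b):
    """Query the preset similar vertical domain information table."""
    if a == b:
        return 1.0
    return PRESET_SIMILAR_VERTICAL_DOMAIN_TABLE.get(frozenset({a, b}), 0.0)
```

A function such as this could be passed in as the domain_sim argument of the role similarity sketch given earlier.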
It should be noted that, in the embodiment of the present application, the virtual character similar to the first virtual character is obtained through the character similarity, and is used as the second virtual character associated with the first virtual character, but it is understood that, in practical applications, the second virtual character is not limited to the virtual character similar to the first virtual character.
In addition, when cloud-side deployment, end-cloud collaborative deployment, or terminal distributed deployment is adopted, the terminal may send the first semantic data to the cloud server or another terminal and receive a determination result from it, so that whether a second virtual role associated with the first virtual role exists is determined by the cloud server or another terminal based on the first semantic data.
S703, the first virtual character is generated by performing Transfer Learning (TL) on the second virtual character.
Transfer learning is a research field of machine learning in which the solution model of an existing problem is used to solve other problems associated with it, for example using a car classification algorithm to implement or improve a truck classification algorithm. Therefore, when a second virtual role associated with the first virtual role exists, the second virtual role can be obtained, and transfer learning training can be performed on the second virtual role based on the first voice semantic annotation data to obtain the first virtual role, so that the required voice semantic annotation data can be significantly reduced and the efficiency of generating the first virtual role significantly improved.
Optionally, as can be seen from the foregoing, among the algorithm models included in the AI platform, such as ASR, NLU, DM, NLG, and TTS, the NLU is used to perform word segmentation, part-of-speech tagging, keyword extraction, and the like on the text output by the ASR, so as to obtain machine-understandable, structured semantic representation data. That is, the processing of the NLU is closely related to the specific content indicated by the voice instruction and directly affects the accuracy of the terminal's response to the voice instruction, whereas the other algorithm models are insensitive to the specific content indicated by the voice instruction; in other words, for different virtual roles, the algorithm models other than the NLU can be shared. Therefore, when the first virtual role is generated, transfer learning can be performed on the NLU model of the second virtual role to obtain the NLU model of the first virtual role.
Optionally, the NLU model of the first virtual character may include a basic language feature extraction layer as a network front stage and a semantic data extraction layer as a network rear stage. The basic language feature extraction layer can be used for extracting basic features in the character information, such as association between each character and a context and the like, and for NLU models of different virtual roles, the basic language feature extraction layers can be the same; the semantic data extraction layer can further extract and obtain vertical domain information, intention information and word slot information on the basis of the basic characteristics of the extracted character information based on the AI service processed by the virtual role.
Fig. 8 is a schematic diagram of a role migration learning principle provided in the embodiment of the present application. When the NLU model of the second virtual character is obtained, migration training may be performed on a semantic data extraction layer (i.e., a network post stage) of the NLU model based on the first speech semantic annotation data, where the NLU model generated by the training is the NLU model of the first virtual character, and the NLU model of the first virtual character may extract corresponding vertical domain information, intention information, and word slot information from the speech instruction belonging to the AI service processed by the first virtual character.
Fig. 9 is a schematic structural diagram of an NLU model according to an embodiment of the present disclosure. As can be seen from fig. 9, the NLU model includes 8 LSTM networks, where the first two LSTM networks are basic language feature extraction layers, the last six LSTM networks are semantic data extraction layers, the third and fourth layers are vertical domain network layers, the fifth and sixth layers are intention network layers, and the seventh and eighth layers are word slot network layers. The vertical domain network layer and the intention network layer may extract vertical domain information and intention information from the input text information based on intention information included in a preset intention information base, where one possible manner may be to search the intention information included in the preset intention information base from the text information, and determine vertical domain information associated with the intention information according to an association relationship between the preset vertical domain information and the intention information. The word slot network layer may extract word slot information from the input text information based on the word slot information included in the preset word slot information base in a manner similar to the extraction of the intention information.
Referring to fig. 9, the text information input into the NLU model is "play song of Zhang San". The first two LSTM layers extract the basic language features of the text information. On this basis, the vertical domain network layer extracts the vertical domain information as device control, the intention network layer extracts the intention information as playing music, and the word slot network layer extracts the word slot information as Zhang San.
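For ease of understanding, the following non-limiting PyTorch sketch outlines the structure described above (two basic language feature extraction LSTM layers followed by two-layer vertical domain, intention and word slot networks); the layer sizes and names are illustrative assumptions.

```python
import torch
from torch import nn

class NLUModel(nn.Module):
    """Sketch of the 8-layer LSTM NLU structure described with fig. 9."""
    def __init__(self, vocab_size=10000, n_domains=10, n_intents=50, n_slot_tags=100, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # Network front stage: basic language feature extraction (layers 1-2)
        self.base = nn.LSTM(dim, dim, num_layers=2, batch_first=True)
        # Network rear stage: semantic data extraction (layers 3-8)
        self.heads = nn.ModuleDict({
            "domain": nn.LSTM(dim, dim, num_layers=2, batch_first=True),
            "intent": nn.LSTM(dim, dim, num_layers=2, batch_first=True),
            "slot":   nn.LSTM(dim, dim, num_layers=2, batch_first=True),
        })
        self.domain_out = nn.Linear(dim, n_domains)   # sentence-level prediction
        self.intent_out = nn.Linear(dim, n_intents)   # sentence-level prediction
        self.slot_out = nn.Linear(dim, n_slot_tags)   # per-token prediction

    def forward(self, token_ids):
        feats, _ = self.base(self.embed(token_ids))
        d, _ = self.heads["domain"](feats)
        i, _ = self.heads["intent"](feats)
        s, _ = self.heads["slot"](feats)
        return (self.domain_out(d[:, -1]),   # vertical domain logits
                self.intent_out(i[:, -1]),   # intention logits
                self.slot_out(s))            # word slot tag logits per token
```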
Alternatively, since the basic language feature extraction layers can be the same for the NLU models of different virtual roles, a small amount of voice semantic annotation data can be used to perform supervised training only on the semantic data extraction layer of the NLU model. Specifically, the NLU model of the second virtual role is acquired; the network parameters of the basic language feature extraction layer in the NLU model of the second virtual role are set as constants, so as to freeze the network parameters of the basic language feature extraction layer; and then, based on the first voice semantic annotation data, the network parameters of the semantic data extraction layer in the NLU model of the second virtual role are trained to obtain the NLU model of the first virtual role. The NLU model of the first virtual role thus comprises the trained semantic data extraction layer and a basic language feature extraction layer that is the same as that of the NLU model of the second virtual role. That is, extension of a new virtual role can be rapidly achieved on the basis of only a small amount of voice semantic annotation data.
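For ease of understanding, the following non-limiting PyTorch sketch shows the freezing-and-training step described above, assuming an NLU model structured like the sketch given after fig. 9 (with embed/base as the basic language feature extraction part and a forward pass returning domain, intention and slot logits); the hyper-parameters and the data loader format are illustrative assumptions.

```python
import torch
from torch import nn

def transfer_train(nlu_model, loader, epochs=3, lr=1e-3):
    # Freeze the basic language feature extraction layers (set their parameters as constants)
    for module in (nlu_model.embed, nlu_model.base):
        for p in module.parameters():
            p.requires_grad = False
    # Train only the semantic data extraction layers on the small amount of
    # first voice semantic annotation data supplied by `loader`
    trainable = [p for p in nlu_model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for token_ids, domain_y, intent_y, slot_y in loader:
            domain_logits, intent_logits, slot_logits = nlu_model(token_ids)
            loss = (loss_fn(domain_logits, domain_y)
                    + loss_fn(intent_logits, intent_y)
                    + loss_fn(slot_logits.transpose(1, 2), slot_y))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return nlu_model
```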
For example, the AI service processed by the second virtual character is video playing, the corresponding vertical domain information includes device control, the intention information includes commonly used semantic sentence patterns and keywords such as playing and pausing, and if the AI service processed by the first virtual character is video playing, the vertical domain information and the intention information may be the same, and only the keywords (such as movie name and director name) for video playing in the word slot information need to be replaced by the keywords (such as song name and singer name) for audio playing.
In addition, when cloud-side deployment, end-cloud collaborative deployment or terminal distributed deployment is adopted, the terminal can send the first voice semantic annotation data (and the second virtual role) to the cloud server or another terminal, and can also receive the first virtual role sent by the cloud server or another terminal, so that transfer learning on the second virtual role is performed by the cloud server or another terminal.
S704, generating second voice semantic annotation data based on the first voice semantic annotation data and the first semantic data.
If there is no second virtual role associated with the first virtual role, the first virtual role cannot easily be generated by transfer learning. However, because the first voice semantic annotation data and the first semantic data of the first virtual role have been obtained, and the first voice semantic annotation data comprises the first voice instruction and the second semantic data used for annotating the first voice instruction, a large amount of second voice semantic annotation data can be quickly generated according to the relationship between the semantic data and the voice instructions, so that sufficient voice semantic annotation data is obtained for training and generating the first virtual role.
Optionally, a generative adversarial network (GAN) may include a generating network and a discriminating network, where the generating network is used to generate "false data", the discriminating network is used to judge whether the input data is "false data" generated by the generating network or natural "true data", and through the interplay of the two networks the "false data" generated by the generating network can be made as close to the "true data" as possible. Therefore, in the embodiment of the present application, when generating the second voice semantic annotation data, the voice data corresponding to the first semantic data may be generated through a GAN according to a small amount of real voice semantic annotation data (i.e., the first voice semantic annotation data), so as to obtain a large amount of second voice semantic annotation data and train and generate the first virtual character. This greatly reduces the amount of voice semantic annotation data that needs to be collected in advance for generating a new virtual character and reduces the acquisition cost.
The preset GAN can be tuned and trained based on the first voice semantic labeling data, so that the preset GAN learns to obtain a relation between a voice instruction and labeled semantic data, then second voice instructions corresponding to the first semantic data are generated based on the tuned and trained preset GAN, so that second voice semantic labeling data are obtained, and the generated second voice semantic labeling data can comprise the second voice instructions and the first semantic data for labeling the second voice instructions.
Fig. 10 is a schematic diagram of a principle of generating voice semantic annotation data according to an embodiment of the present application. The generating network 1010 generates a corresponding voice instruction according to the input word slot information and intention information (where the associated vertical domain information can be determined from the intention information). The discriminating network 1020 analyzes the generated voice instruction to obtain word slot information, intention information and vertical domain information, which are then compared with the word slot information, intention information and vertical domain information input into the generating network 1010. In this way, supervised training of the generating network 1010 and the discriminating network 1020 is realized, and the generated voice instruction is made as close as possible to a voice instruction that a user would input in a real scenario.
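For ease of understanding, the following non-limiting Python sketch shows one tuning step of the interplay described with fig. 10; all interfaces are illustrative assumptions, and tokenization and decoding details are omitted.

```python
def gan_tuning_step(generator, discriminator, semantic_batch, optimizer, semantic_loss_fn):
    """One supervised tuning step: semantics -> generated instruction -> recovered semantics."""
    instruction_repr = generator(semantic_batch)         # semantic data -> voice instruction representation
    recovered = discriminator(instruction_repr)          # voice instruction -> recovered semantic data
    loss = semantic_loss_fn(recovered, semantic_batch)   # compare with the semantics fed to the generator
    optimizer.zero_grad()                                 # optimizer covers both networks' parameters
    loss.backward()
    optimizer.step()
    return loss.item()
```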
It should be noted that the preset GAN may be constructed by using pre-training models such as Bidirectional Encoder Representations from Transformers (BERT), generative pre-training (GPT), GPT-2, and the like.
Optionally, third speech semantic annotation data may be obtained in advance, and a preset GAN is obtained based on training of the third speech semantic annotation data, so that the preset GAN has a strong semantic generalization capability, and it is also ensured that a second speech instruction corresponding to the first semantic data can be generated based on a small amount of first speech semantic annotation data through the preset GAN.
The third voice semantic annotation data comprises a third voice instruction, third semantic data used for annotating the third voice instruction, fourth semantic data and a fourth voice instruction used for annotating the fourth semantic data; the third semantic data may include third vertical domain information, third intention information, and third word slot information, and the fourth semantic data includes fourth vertical domain information, fourth intention information, and fourth word slot information. Therefore, the generating network in the GAN can be trained through the fourth semantic data and the fourth voice instruction for labeling the fourth semantic data, so that the generating network can generate a corresponding voice instruction according to the input semantic data, and the discriminating network of the GAN can be trained through the third voice instruction and the third semantic data for labeling the third voice instruction, so that the discriminating network can extract the corresponding semantic data from the input voice instruction.
In addition, when cloud-side deployment, end-cloud collaborative deployment or terminal distributed deployment is adopted, the terminal may send the first voice semantic annotation data and the first semantic data to the cloud server or another terminal, and may also receive the second voice instruction or the second voice semantic annotation data sent by the cloud server or another terminal, so that the second voice semantic annotation data is generated by the cloud server or another terminal.
S705, training to obtain a first virtual role based on the second voice semantic annotation data.
When a large amount of second voice semantic labeling data is generated and obtained, the first virtual character can be obtained based on the second voice semantic labeling data training.
Optionally, the NLU model of the first virtual role may be obtained through training based on the second voice semantic annotation data.
In addition, in another optional embodiment of the present application, the first virtual character may also be obtained by training based on the first voice semantic labeling data and the second voice semantic labeling data, that is, the obtained first semantic data and the first voice semantic labeling data are fully utilized, so that the first virtual character is obtained by training more voice semantic labeling data, and the accuracy of generating the first virtual character is further improved.
In addition, when cloud-side deployment, end-cloud collaborative deployment or terminal distributed deployment is adopted, the terminal can send the second voice semantic annotation data to the cloud server or another terminal, and can also receive the first virtual role sent by the cloud server or another terminal, so that the first virtual role is generated through training by the cloud server or another terminal.
S706, the first virtual role is imported into the role resource library.
When a new virtual role is generated, the virtual role can be stored, so that the virtual role can be conveniently awakened and corresponding AI services can be processed subsequently.
The NLU model and the first semantic data of the first virtual character may be stored in a character resource library.
When the first semantic data is stored in the role resource library, the word slot information may be stored in the word slot information library, and the intention information may be stored in the intention information library. Of course, the vertical domain information can also be stored in the vertical domain information base.
In addition, when cloud-side deployment, end-cloud collaborative deployment or terminal distributed deployment is adopted, the terminal can send the first virtual role to the cloud server or another terminal, so that the first virtual role is imported into a role resource library located in the cloud server or another terminal.
S707, judging whether the role extension is finished; if so, the process ends, and if not, the process returns to S701.
The role extension can be determined to end when the first virtual role is imported into the role resource repository. Of course, in practical applications, it may also be determined whether the current role extension is ended in other manners, for example, if no user operation is received within a first preset time period after the first virtual role is introduced into the role resource library, it may be determined that the role extension is ended. The method for determining whether the role extension is ended is not particularly limited in the embodiment of the present application.
If the current role expansion is not finished, other role expansion data and voice semantic annotation data submitted by the user can be continuously received, so that more first virtual roles are continuously generated.
S707 may be omitted, that is, when the first virtual character is imported into the character repository, whether the character extension is finished is not determined.
In this embodiment of the application, first semantic data and first voice semantic annotation data of a first virtual character may be obtained, where the first voice semantic annotation data includes a first voice instruction and second semantic data used for annotating the first voice instruction, the first semantic data includes first vertical domain information, first intention information, and first word slot information, and the second semantic data includes second vertical domain information, second intention information, and second word slot information. The vertical domain information is used for indicating the function field to which a voice instruction belongs, the intention information is used for indicating the operation type of the voice instruction, and the word slot information is used for indicating the operation parameters of the voice instruction. Therefore, a second voice instruction corresponding to the first semantic data can be generated based on the first voice semantic annotation data to obtain second voice semantic annotation data, where the second voice semantic annotation data comprises the second voice instruction and the first semantic data used for annotating the second voice instruction. The first virtual character can then be trained based on the second voice semantic annotation data. Because the second voice semantic annotation data can be generated based on the first voice semantic annotation data and the first semantic data, only a small amount of first voice semantic annotation data needs to be collected when a new virtual character is generated, and a large amount of second voice semantic annotation data can then be generated from it, so that the amount of voice instructions or first voice semantic annotation data collected in advance for generating the virtual character can be greatly reduced, the generation of new virtual characters can be extended rapidly and efficiently, the period and cost of generating a virtual character are reduced, a user can conveniently customize a personalized virtual character in time according to requirements, and the agility and extensibility of the AI service are improved.
In addition, because new virtual roles can more easily be expanded for different AI services, corresponding virtual roles can be generated for AI services with different functions in different fields, and each virtual role can process its AI service accurately and reliably, thereby easing the trade-off between the functional breadth of virtual roles and their response accuracy.
In addition, it can be understood that any role to be generated may be generated through S704-S705, while a first virtual role for which a second virtual role can be found may be generated through S703 to further shorten the generation period and improve efficiency. Therefore, in practical applications, S702 may be skipped when generating a role; that is, without determining whether a second virtual role associated with the role to be generated currently exists, S701 and S704-S707 may be executed directly to generate the first virtual role.
Fig. 11 is a flowchart of a method for generating a virtual role according to an embodiment of the present disclosure. It should be noted that the method may be applied to a terminal, to interaction between the terminal and a cloud server, or to interaction between terminals; at least one step in the following method may be performed by the terminal independently, by the cloud server or another terminal, or by the terminal in cooperation with the cloud server or another terminal. The method is not limited to fig. 11 or to the specific sequence described below; it should be understood that, in other embodiments, the sequence of some steps may be interchanged according to actual needs, or some steps may be omitted or deleted. The method comprises the following steps:
S1101, obtain first voice semantic annotation data of a first virtual role to be generated.
The first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction.
It should be noted that the manner of acquiring the first speech semantic annotation data in S1101 may be the same as the manner of acquiring the first speech semantic annotation data in S301, and is not described herein again.
S1102, determine, based on the second semantic data, whether a second virtual role associated with the first virtual role exists; if so, execute S1103, otherwise execute S1104.
Since the first voice semantic annotation data already contains semantic data (the second semantic data), whether a second virtual role associated with the first virtual role exists can be searched for using the second semantic data included in the first voice semantic annotation data.
It should be noted that, the manner of determining whether the second virtual character associated with the first virtual character exists based on the second semantic data may be the same as the manner of determining whether the second virtual character associated with the first virtual character exists based on the first semantic data, and details are not repeated here.
S1103, generate the first virtual role by performing transfer learning on the second virtual role.
It should be noted that the manner of generating the first virtual role by performing transfer learning on the second virtual role in S1103 may be the same as the manner of generating the first virtual role by performing transfer learning on the second virtual role in S703, and details are not repeated here.
S1104, first semantic data of the first virtual role is obtained.
It should be noted that the manner of obtaining the first semantic data of the first virtual character in S1104 may be the same as the manner of obtaining the first semantic data of the first virtual character in S701, and details are not repeated here.
S1105, generating second voice semantic annotation data based on the first voice semantic annotation data and the first semantic data.
S1106, training to obtain the first virtual role based on the second voice semantic annotation data.
S1107, import the first virtual role into the role resource library.
S1108, judging whether the role expansion is finished, if so, finishing, otherwise, returning to S1101.
It should be noted that the execution manners of S1105 to S1108 may be the same as the execution manners of S704 to S707, and are not described again here.
In the embodiment of the present application, the first semantic data of the first virtual role need not be obtained first; instead, the second semantic data in the first voice semantic annotation data can be used to determine whether a second virtual role associated with the first virtual role exists. If such a second virtual role exists, the first virtual role can be generated without obtaining the first semantic data at all. Therefore, on the basis of the beneficial effects of the method for generating a virtual role provided in fig. 7, the data required to generate a new virtual role can be further reduced, as can the cost of collecting that data.
Secondly, because new virtual roles can more easily be expanded for different AI services, corresponding virtual roles can be generated for AI services with different functions in different fields, and each virtual role can process its AI service accurately and reliably, thereby easing the trade-off between the functional breadth of virtual roles and their response accuracy.
The above has explained how virtual roles are generated; the following explains how existing virtual roles are used.
Fig. 12 is a flowchart of a method for processing an AI service according to an embodiment of the present disclosure. It should be noted that the method may be applied to a terminal, to interaction between the terminal and a cloud server, or to interaction between terminals; at least one step in the following method may be performed by the terminal independently, by the cloud server or another terminal, or by the terminal in cooperation with the cloud server or another terminal. The method is not limited to fig. 12 or to the specific sequence described below; it should be understood that, in other embodiments, the sequence of some steps may be interchanged according to actual needs, or some steps may be omitted or deleted. The method comprises the following steps:
S1201, when a role wake-up instruction is received, obtain role indication information, where the role indication information is used to indicate a third virtual role to be woken up.
The terminal can receive a role wake-up instruction so that the corresponding AI service is processed through a specific role. Because different AI services are processed through different virtual roles, each virtual role only needs to understand the user's voice instructions in the scenario corresponding to its own AI service, which reduces semantic confusion and improves the accuracy of responding to voice instructions.
The third virtual role can be a virtual role that the user wishes to wake up.
The role wake-up instruction is used by the user to wake up a virtual role, and may include a fifth voice instruction or a UI control operation instruction.
The role wake-up command may be received through the device input module 110 in the virtual role system 100, and the role indication information may be obtained through the AI platform 130.
S1202, determine a third virtual role matching the role indication information.
The third virtual role may be determined among at least one existing virtual role in at least one manner. If two or more manners are used, the third virtual role may be determined when the results of the various manners (or more than half of them) point to the same virtual role.
In one manner, the role indication information may include at least one of a wake-up word and a name: when the role wake-up instruction includes a fifth voice instruction, the text corresponding to the fifth voice instruction may be extracted, and it is detected whether the text contains the wake-up word or the name of any virtual role; if so, that virtual role may be determined as the third virtual role. In another manner, the role indication information may include a user voiceprint: when the role wake-up instruction includes a fifth voice instruction, the voiceprint corresponding to the fifth voice instruction may be extracted, and it is detected whether it is the same as the voiceprint associated with any virtual role; if so, that virtual role may be determined as the third virtual role. In another manner, the role indication information may include AI service information: when the role wake-up instruction includes a fifth voice instruction, the corresponding text may be extracted, AI service information (such as weather query or song playing) may be detected from it, and the virtual role corresponding to that AI service information may be taken as the third virtual role. In another manner, the role indication information may include an operation parameter of a UI operation instruction (for example, a click position): when the role wake-up instruction includes a UI operation instruction, the virtual role corresponding to the UI operation instruction may be determined as the third virtual role based on that operation parameter. In another manner, the role indication information may include context mode information (for example, at least one of location information, weather information, temperature information, and time information): the current context mode information may be acquired, and the virtual role corresponding to it may be taken as the third virtual role.
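As a rough illustration of S1202, the sketch below combines several of the matching manners just described and applies the agreement rule from the preceding paragraph; the event fields and role attributes are assumptions made for illustration, not the patent's data model.

```python
def match_third_role(wake_event, roles):
    """Determine the third virtual role from a role wake-up event using several of
    the manners described above; field names are hypothetical."""
    votes = []
    text = wake_event.get("text", "")                    # text of the fifth voice instruction
    for role in roles:
        if role["wake_word"] in text or role["name"] in text:
            votes.append(role["name"])                   # wake-up word / name match
    if "voiceprint" in wake_event:                       # voiceprint bound to a role
        votes += [r["name"] for r in roles
                  if r.get("bound_voiceprint") == wake_event["voiceprint"]]
    if "service" in wake_event:                          # e.g. "weather query", "play song"
        votes += [r["name"] for r in roles if wake_event["service"] in r["services"]]
    if "click_target" in wake_event:                     # UI operation instruction
        votes += [r["name"] for r in roles if r["button_id"] == wake_event["click_target"]]
    if not votes:
        return None
    best = max(set(votes), key=votes.count)
    # accept only when more than half of the determination results agree, cf. above
    return best if votes.count(best) * 2 > len(votes) else None
```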
Fig. 13 is a schematic view of a UI interface provided in an embodiment of the present application. In this interface, the user inputs a fifth voice instruction; the terminal extracts the text of the fifth voice instruction as "Xiao Yi doctor", detects that "Xiao Yi doctor" is the wake-up word of the virtual role "Xiao Yi doctor", and therefore determines "Xiao Yi doctor" as the third virtual role, generates the response text "Xiao Yi doctor is on the way", and broadcasts the response text by voice.
Please refer to fig. 14 and 15, which are schematic diagrams of another UI interface provided in the embodiments of the present application. In the interface of fig. 14, the user is prompted by text and voice to tap an icon and select the Xiao Yi professional role needed; three role buttons are displayed below the interface, each corresponding to one virtual role, and a voice button is also displayed so that the user can conveniently choose how to issue the role wake-up instruction. When a tap on the role button corresponding to "Xiao Yi doctor" is received, "Xiao Yi doctor" is determined as the third virtual role, and the response text "Xiao Yi doctor will provide professional health guidance for you" is generated, as shown in fig. 15.
Please refer to fig. 16 and 17, which are schematic diagrams of another UI interface provided in the embodiments of the present application. In the interface shown in fig. 16, the terminal detects that the user has searched for "Xiao Yi" in the search box, and finds and displays the role icons of four roles such as "Xiao Yi XX", "Xiao Yi chef", "Xiao Yi doctor", and "Xiao Yi teacher". When a tap on the role icon corresponding to "Xiao Yi teacher" is received, "Xiao Yi teacher" may be determined as the third virtual role, and the response text "Xiao Yi teacher helps you grow; the latest learning resources have been loaded" is generated, as shown in fig. 17.
The third virtual character matching the character indication information can be determined by the character selection module 140 in the virtual character system 100.
In addition, when cloud-side deployment, device-cloud cooperative deployment, or distributed deployment across terminals is adopted, the terminal may send the acquired role indication information to the cloud server or to another terminal, and may also obtain the determined third virtual role from the cloud server or the other terminal.
S1203, loading the role resource of the third virtual role.
After the third virtual role that the user wishes to wake up is determined, its role resources can be obtained and loaded, so that corresponding services can subsequently be provided to the user through the third virtual role.
It should be noted that if role resources of another virtual role are currently loaded, that is, an AI service is currently being processed through that virtual role, the loaded role resources may be replaced with the role resources of the third virtual role, thereby switching the virtual role that is processing the AI service.
The role resources of the third virtual role can be obtained and loaded from the role resource library 150 through the AI platform 130 and the application management framework 190 in the virtual role system 100.
In addition, when cloud-side deployment, device-cloud cooperative deployment, or distributed deployment across terminals is adopted, the terminal can obtain and load the role resources of the third virtual role from the cloud server or another terminal.
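A minimal sketch of the loading and switching behavior of S1203 might look as follows, assuming a simple local role resource library with an optional remote (cloud server or peer terminal) fallback; all class and method names are hypothetical, not the patent's API.

```python
class RoleResourceLoader:
    """Sketch of loading role resources for S1203 under the assumptions above."""
    def __init__(self, repository, remote=None):
        self.repository = repository      # local role resource library
        self.remote = remote              # optional cloud server / peer terminal
        self.active = None                # role whose resources are currently loaded

    def load(self, role_name):
        resources = self.repository.get(role_name)
        if resources is None and self.remote is not None:
            resources = self.remote.fetch(role_name)   # cloud / distributed deployment
        if resources is None:
            raise KeyError(f"no role resources for {role_name}")
        if self.active is not None:
            self.unload(self.active)                   # switch away from the current role
        self.active = role_name
        return resources

    def unload(self, role_name):
        # release models (NLU, DM, etc.) held for the previously active role
        self.repository.release(role_name)
```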
Through the above S1201-S1203, the user has already woken up the third virtual role, and in the next step, the corresponding AI service may be processed based on the third virtual role.
S1204, receiving a fourth voice command.
The fourth voice instruction may be a voice instruction issued by the user for a service that needs to be acquired. For example, the fourth voice command may be "play song of zhang san", "inquire tomorrow weather", and "turn on rice cooker".
It should be noted that the fifth voice command and the fourth voice command may be the same voice command, or may be obtained in S1201 at the same time. For example, "a small art chef tells me the recipe of the eggplant braised in soy", wherein the "small art chef" can be used as a wakeup word indicating the virtual role of "the small art chef", and the "the recipe telling me the eggplant braised in soy" can be used as a service required to be acquired from the "small art chef".
Wherein, the fourth voice command can be received through the device input module 110 in the aforementioned avatar system 100.
S1205, generate response control information corresponding to the fourth voice instruction based on the role resources.
The response control information may be used to indicate at least one task generated for the fourth voice instruction, such as generating text/voice information as a response, controlling the specified device, invoking a third-party service for information query, and the like.
The response control information corresponding to the fourth voice command may be generated by the AI platform 130 in the virtual role system 100, and the response control information may be obtained by performing a cascade process on the fourth voice command through the ASR module 220, the NLU module 230, and the DM module 240.
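Under the assumption of generic ASR/NLU/DM module interfaces (the concrete interfaces of modules 220, 230, and 240 are not specified here), the cascade of S1205 can be sketched as:

```python
def generate_response_control(audio, asr, nlu, dm):
    """Cascade sketch for S1205: ASR transcribes the fourth voice instruction, NLU
    extracts semantic data (vertical domain / intent / word slots) with the loaded
    role's model, and DM maps the semantics to response control information."""
    text = asr.transcribe(audio)              # speech recognition
    semantics = nlu.parse(text)               # role-specific language understanding
    return dm.decide(semantics)               # dialogue management -> response control info
```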
In addition, when cloud-side deployment, device-cloud cooperative deployment, or distributed deployment across terminals is adopted, the terminal may send the fourth voice instruction to the cloud server or another terminal, and may also receive response control information corresponding to the fourth voice instruction from the cloud server or the other terminal.
Optionally, because the terminal may generate the response control information corresponding to the fourth voice instruction locally, through the cloud server, or through another terminal, it may obtain multiple pieces of response control information for the same fourth voice instruction. In this case, the terminal may select one of them according to a preset selection policy and perform the subsequent steps with it.
The preset selection policy may be set in advance; for example, the response control information that arrives first after the fourth voice instruction is received may be selected, or the most appropriate piece of response control information may be selected by means such as machine learning.
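A possible reading of the preset selection policy, with hypothetical candidate and scorer shapes, is sketched below; it is not the patent's concrete policy.

```python
def select_response_control(candidates, policy="first", scorer=None):
    """Pick among response control information generated locally, by the cloud
    server, or by another terminal. 'candidates' is assumed to be a list of
    (arrival_time, info) tuples; names are illustrative."""
    if policy == "first":                         # take the first result received
        return min(candidates, key=lambda c: c[0])[1]
    if policy == "best" and scorer is not None:   # e.g. a learned suitability score
        return max(candidates, key=lambda c: scorer(c[1]))[1]
    raise ValueError("unknown selection policy")
```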
S1206, execute the response task based on the response control information.
By executing at least one response task, the service requested by the user through the fourth voice instruction can be completed.
Referring to fig. 18 and 19, schematic diagrams of another UI interface provided in the embodiments of the present application are shown. In fig. 18, the user wakes up the virtual role "Xiao Yi doctor" and asks about "viral influenza" by voice; the terminal generates relevant suggestions for this question by searching, and feeds back the text/voice information "For viral influenza, it is advised to take ammonium chloride and ambroxol as directed by a doctor, and to ensure sufficient sleep and a light diet" to the user. In fig. 19, the user wakes up the virtual role "Xiao Yi teacher" and asks about "viral cold" by voice; the terminal generates relevant knowledge for the question by searching, and feeds back the text/voice information "An upper respiratory tract infection caused by a virus. Basic knowledge points: viruses can be classified into DNA viruses and RNA viruses, and are a non-cellular life form consisting of a nucleic acid molecule and a protein shell" to the user. Comparing fig. 18 with fig. 19, different virtual roles belong to different fields and interpret the same voice instruction from different professional perspectives, so each voice instruction can be responded to accurately, which improves the accuracy of the AI service being processed. Moreover, the more virtual roles the terminal has, the more technical fields can be covered and the more finely each virtual role's function field can be defined, which both broadens the scope of the AI services (that is, the functional breadth of the virtual roles) and improves the accuracy of AI service processing.
The task management and service logic module 180 in the virtual character system 100 can organize and manage tasks according to the response control information, and invoke the peripheral systems or devices such as the device output module 170, the smart brain 192, the system service/information platform 191, and the like through the application management framework 190 to execute each task.
In addition, when cloud-side deployment, device-cloud cooperative deployment, or distributed deployment across terminals is adopted, the terminal may send the response control information to the cloud server or another terminal, so that the cloud server or the other terminal executes the corresponding response task based on the response control information.
S1207, determine whether the interaction with the user has ended; if so, end the procedure; otherwise, return to S1201.
If no other voice instruction or operation of the user is received within a second preset time after the response task is executed, it may be determined that the interaction with the user is ended.
It should be noted that, it may be determined, through the device input module 110 in the virtual character system 100, whether another voice instruction or operation of the user is received within a second preset time period after the response task is executed.
It should be noted that the second preset time period may be determined by setting in advance.
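The end-of-interaction check of S1207 can be pictured as a simple timeout test; the function and its arguments are illustrative assumptions, not the patent's implementation.

```python
import time

def interaction_ended(last_response_time, last_input_time, second_preset_period):
    """The interaction is treated as ended if no further voice instruction or user
    operation arrives within the second preset time period after the response task
    is executed (all names are illustrative)."""
    no_new_input = last_input_time <= last_response_time
    return no_new_input and (time.time() - last_response_time) >= second_preset_period
```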
In the embodiment of the present application, the terminal may include multiple virtual roles divided according to at least one preset dimension, which ensures that AI services in many areas can be processed and significantly improves the functional breadth of the virtual roles. When a role wake-up instruction is received, role indication information can be obtained, a matching third virtual role can be determined among the currently available virtual roles according to the role indication information, the role resources of the third virtual role can be loaded, and the AI service can be processed based on the third virtual role. Because the third virtual role is unlikely to misinterpret a voice instruction in its own field, it can process the AI service accurately, and the accuracy of responding to voice instructions is significantly improved.
It should be noted that there is no sequential restriction between expanding new virtual roles and using existing ones; for example, a user may start creating a new virtual role while using an existing virtual role, or various virtual roles may be generated first and the corresponding AI services then processed using them.
Based on the same inventive concept, as an implementation of the foregoing methods, an embodiment of the present application provides an apparatus for generating a virtual role and an apparatus for processing an AI service. These apparatus embodiments correspond to the foregoing method embodiments; for ease of reading, details already described in the method embodiments are not repeated one by one here, but it should be clear that the apparatuses in these embodiments can implement all the contents of the foregoing method embodiments.
Referring to fig. 20, a schematic structural diagram of an apparatus 2000 for generating a virtual role according to an embodiment of the present application is shown. As shown in fig. 20, the apparatus of this embodiment includes:
an obtaining module 2010, configured to obtain first semantic data and first voice semantic labeling data of a first virtual role to be generated;
the generating module 2020 is configured to generate, based on the first voice semantic annotation data, a second voice instruction corresponding to the first semantic data to obtain second voice semantic annotation data;
a training module 2030, configured to train to obtain the first virtual character based on the second speech semantic labeling data;
the first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction; the second voice semantic annotation data comprises a second voice instruction and first semantic data used for annotating the second voice instruction; the first semantic data comprises first vertical domain information, first intention information and first word slot information; the second semantic data includes second vertical domain information, second intention information, and second word slot information.
Optionally, the generating module is further configured to:
based on the first semantic data, searching a second virtual role associated with the first virtual role;
and if the second virtual role is not found, generating a second voice instruction corresponding to the first semantic data based on the first voice semantic labeling data.
Optionally, the generating module is further configured to perform tuning training on a preset GAN based on the first speech semantic annotation data; and generating a second voice instruction corresponding to the first semantic data based on the preset GAN after tuning training.
Optionally, the obtaining module is further configured to obtain third voice semantic annotation data, where the third voice semantic annotation data includes a third voice instruction, third semantic data used for annotating the third voice instruction, fourth semantic data, and a fourth voice instruction used for annotating the fourth semantic data;
the training module is further configured to train to obtain the preset GAN based on the third speech semantic labeling data.
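The optional GAN path above can be pictured roughly as conditional generation: a preset generator is first trained on third voice semantic annotation data, tuned on the small first set, and then sampled once per first semantic data item. The sketch below uses a generic, hypothetical GAN interface rather than any specific library, and is only an outline of that idea.

```python
def tune_and_generate(preset_gan, first_annotations, first_semantics, steps=100):
    """Tuning + generation sketch for the optional GAN path; interfaces are assumed."""
    for _ in range(steps):
        batch = preset_gan.sample_batch(first_annotations)        # real (semantics, text) pairs
        fake = [preset_gan.generator(sem) for sem, _ in batch]    # generated instructions
        preset_gan.discriminator_step(batch, fake)                # distinguish real vs. generated
        preset_gan.generator_step([sem for sem, _ in batch])      # train to fool the discriminator
    # second voice instructions labeled with the first semantic data
    return [(preset_gan.generator(sem), sem) for sem in first_semantics]
```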
Optionally, the obtaining module is further configured to obtain role indication information when a role wake-up instruction is received, where the role indication information is used to indicate a third virtual role to be woken up;
further comprising:
the determining module is used for determining the third virtual role matched with the role indication information in at least one existing virtual role, wherein the at least one virtual role is obtained by dividing according to at least one preset dimension;
the loading module is used for loading the role resources of the third virtual role;
and the processing module is used for processing the AI service based on the third virtual role.
The apparatus 2000 for generating a virtual role provided in this embodiment may perform the method embodiment shown in fig. 7; its implementation principle and technical effects are similar and are not described here again.
Referring to fig. 21, a schematic structural diagram of an apparatus 2100 for generating a virtual role according to an embodiment of the present disclosure is shown. As shown in fig. 21, the apparatus of this embodiment includes:
the obtaining module 2110 is used for obtaining first semantic data and first voice semantic annotation data of a first virtual role to be generated;
a searching module 2120, configured to search, based on the first semantic data, a second virtual role associated with the first virtual role;
a training module 2130, configured to perform transfer learning training on the second virtual character based on the first voice semantic labeling data if the second virtual character is found, to obtain the first virtual character;
the first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction; the first semantic data comprises first vertical domain information, first intention information and first word slot information; the second semantic data includes second vertical domain information, second intention information, and second word slot information.
Optionally, the NLU model of the first virtual character includes a basic language feature extraction layer and a semantic data extraction layer, and the training module is further configured to:
acquiring an NLU model of the second virtual role;
setting the network parameters of the basic language feature extraction layer in the NLU model of the second virtual role as constants;
and training network parameters in the semantic data extraction layer in the NLU model of the second virtual role based on the first voice semantic labeling data to obtain the NLU model of the first virtual role.
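A minimal sketch of this freeze-and-retrain step, assuming a PyTorch NLU model with separately named feature_layer and semantic_layer sub-modules and pre-tokenized training batches (all of which are assumptions for illustration, not the patent's model architecture), might look like this:

```python
import torch
import torch.nn as nn

def transfer_train_nlu(second_role_nlu: nn.Module, first_annotations, epochs=5, lr=1e-3):
    """Hold the basic language feature extraction layer constant and retrain only the
    semantic data extraction layer on the first voice semantic annotation data."""
    for p in second_role_nlu.feature_layer.parameters():      # basic language features, frozen
        p.requires_grad = False
    optimizer = torch.optim.Adam(second_role_nlu.semantic_layer.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for features, labels in first_annotations:            # assumed pre-tokenized batches
            optimizer.zero_grad()
            logits = second_role_nlu(features)
            loss = loss_fn(logits, labels)
            loss.backward()
            optimizer.step()
    return second_role_nlu                                     # now serves as the first role's NLU model
```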
Optionally, the method further comprises:
and the storage module is used for storing the NLU model of the first virtual role and the first semantic data to a role resource library.
Optionally, the lookup module is further configured to:
acquiring fifth semantic data of at least one existing virtual role;
determining role similarity between the at least one virtual role and the first virtual role respectively based on the first semantic data and the fifth semantic data;
and searching a second virtual role associated with the first virtual role according to the role similarity between the at least one virtual role and the first virtual role.
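One way to picture this similarity-based lookup, with an assumed set-overlap similarity over vertical domain, intent, and word slot information and an arbitrary threshold (both assumptions made for illustration), is:

```python
def find_associated_role(first_semantics, existing_roles, threshold=0.5):
    """Compare the first semantic data with the fifth semantic data of existing roles
    and return the most similar role if it is similar enough."""
    def as_set(sem):
        return {("domain", sem.vertical_domain), ("intent", sem.intent),
                *(("slot", s) for s in sem.word_slots)}
    def similarity(a, b):
        sa, sb = as_set(a), as_set(b)
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0
    best_role, best_score = None, 0.0
    for role in existing_roles:
        score = max(similarity(first_semantics, sem) for sem in role.fifth_semantics)
        if score > best_score:
            best_role, best_score = role, score
    return best_role if best_score >= threshold else None
```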
Optionally, the obtaining module is further configured to obtain role indication information when a role wake-up instruction is received, where the role indication information is used to indicate a third virtual role to be woken up;
further comprising:
the determining module is used for determining the third virtual role matched with the role indication information in at least one existing virtual role, wherein the at least one virtual role is obtained by division according to at least one preset dimension;
the loading module is used for loading the role resources of the third virtual role;
and the processing module is used for processing the AI service based on the third virtual role.
Optionally, the processing module is further configured to:
receiving a fourth voice instruction;
generating response control information corresponding to the fourth voice instruction based on the role resource;
based on the response control information, a response task is executed.
The apparatus 2100 for generating a virtual role provided in this embodiment may perform the method embodiment shown in fig. 7, and its implementation principle and technical effect are similar, which are not described herein again.
Referring to fig. 22, a schematic structural diagram of an apparatus 2200 for generating a virtual role according to an embodiment of the present application is shown. As shown in fig. 22, the apparatus of this embodiment includes:
an obtaining module 2210, configured to obtain first voice semantic labeling data of a first virtual character to be generated, where the first voice semantic labeling data includes a first voice instruction and second semantic data for labeling the first voice instruction;
a searching module 2220, configured to search, based on the second semantic data, for a second virtual role associated with the first virtual role;
a training module 2230, configured to perform transfer learning training on the second virtual character based on the first voice semantic labeling data if the second virtual character is found, so as to obtain the first virtual character.
Optionally, the obtaining module is further configured to obtain first semantic data of the first virtual character if the second virtual character is not found;
the training module is further used for training to obtain the first virtual role based on the second voice semantic labeling data;
further comprising:
and the generating module is used for generating a second voice instruction corresponding to the first semantic data based on the first voice semantic annotation data to obtain second voice semantic annotation data, wherein the second voice semantic annotation data comprises the second voice instruction and the first semantic data used for annotating the second voice instruction.
The apparatus 2200 for generating a virtual role provided in this embodiment may execute the method embodiment shown in fig. 11, and its implementation principle and technical effect are similar, which are not described herein again.
Referring to fig. 23, a schematic structural diagram of an apparatus 2300 for processing an AI service according to an embodiment of the present application is shown. As shown in fig. 23, the apparatus of this embodiment includes:
an obtaining module 2310, configured to obtain role indication information when a role wake-up instruction is received, where the role indication information is used to indicate a third virtual role;
a determining module 2320, configured to determine, in at least one existing virtual role, the third virtual role matched with the role indication information, where the at least one virtual role is obtained by dividing according to at least one preset dimension;
a loading module 2330 for loading role resources of the third virtual role;
a processing module 2340, configured to process the AI service based on the third virtual role.
Optionally, the processing module is further configured to:
receiving a fourth voice instruction;
generating response control information corresponding to the fourth voice instruction based on the role resource;
based on the response control information, a response task is executed.
The apparatus 2300 for processing an AI service provided in this embodiment may perform the method embodiment shown in fig. 12; its implementation principle and technical effects are similar and are not described here again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Based on the same inventive concept, the embodiment of the application also provides a terminal. Fig. 24 is a schematic structural diagram of a terminal according to an embodiment of the present application, and as shown in fig. 24, the terminal according to the embodiment includes: a memory 2410 and a processor 2420, the memory 2410 for storing computer programs; the processor 2420 is adapted to perform the methods of the above-described method embodiments when the computer program is called.
The terminal provided in this embodiment may execute the above method embodiment, and the implementation principle and the technical effect are similar, which are not described herein again.
Based on the same inventive concept, the embodiment of the application also provides a chip system. The chip system comprises a processor coupled to a memory, the processor executing a computer program stored in the memory to implement the method of the first aspect or any of the embodiments of the first aspect.
The chip system can be a single chip or a chip module consisting of a plurality of chips.
Embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the method described in the above method embodiments.
The embodiment of the present application further provides a computer program product; when the computer program product runs on a terminal, the terminal implements the method described in the above method embodiments.
Fig. 25 is a schematic structural diagram of a terminal 2500 provided in the present application. The terminal 2500 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identification Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the terminal 2500. In other embodiments of the present application, terminal 2500 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller may be, among other things, the neural center and command center of the terminal 2500. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. The repeated accesses are reduced, reducing the latency of the processor 110, and thus increasing the efficiency of the system.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The I2C interface is a bi-directional synchronous serial bus that includes a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, etc. through different I2C bus interfaces, respectively. For example: the processor 110 may be coupled to the touch sensor 180K via an I2C interface, such that the processor 110 and the touch sensor 180K communicate via an I2C bus interface to implement the touch functionality of the terminal 2500.
The I2S interface may be used for audio communication. In some embodiments, processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 via an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may communicate audio signals to the wireless communication module 160 via the I2S interface, enabling answering of calls via a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled by a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to implement a function of answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communications. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a bluetooth headset.
MIPI interfaces may be used to connect processor 110 with peripheral devices such as display screen 194, camera 193, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 110 and camera 193 communicate over a CSI interface to implement the capture functionality of terminal 2500. Processor 110 and display screen 194 communicate via a DSI interface to implement the display function of terminal 2500.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the terminal 2500, to transmit data between the terminal 2500 and peripheral devices, or to connect earphones and play audio through them. The interface may also be used to connect other electronic devices, such as AR devices.
It should be understood that the interface connection relationship between the modules illustrated in the embodiment of the present application is only an exemplary illustration, and does not limit the structure of the terminal 2500. In other embodiments of the present application, the terminal 2500 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger via the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive a wireless charging input through a wireless charging coil of the terminal 2500. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In some other embodiments, the power management module 141 may also be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same device.
The wireless communication function of the terminal 2500 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in terminal 2500 can be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied on the terminal 2500. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110.
The wireless communication module 160 may provide a solution for wireless communication applied to the terminal 2500, including Wireless Local Area Networks (WLANs) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.
In some embodiments, antenna 1 of the terminal 2500 is coupled to the mobile communication module 150 and antenna 2 is coupled to the wireless communication module 160, so that the terminal 2500 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
Terminal 2500 implements the display function via the GPU, display screen 194, and application processor, etc. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the terminal 2500 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The terminal 2500 may implement a photographing function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, terminal 2500 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used to process digital signals, including digital image signals and other digital signals. For example, when the terminal 2500 performs frequency point selection, the digital signal processor can perform a Fourier transform on the frequency point energy.
Video codecs are used to compress or decompress digital video. Terminal 2500 may support one or more video codecs. In this way, terminal 2500 can play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. The NPU can realize applications such as intelligent recognition of the terminal 2500, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capability of the terminal 2500. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example, saving files such as music and videos on the external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the terminal 2500 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (e.g., audio data, a phonebook, etc.) created during use of the terminal 2500, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like.
The terminal 2500 can implement an audio function through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "horn", is used to convert the audio electrical signal into an acoustic signal. The terminal 2500 can listen to music through the speaker 170A or listen to a hands-free call.
The receiver 170B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. When the terminal 2500 answers a call or voice information, it can answer a voice by placing the receiver 170B close to the human ear.
The microphone 170C, also referred to as a "microphone," is used to convert sound signals into electrical signals. When making a call or transmitting voice information, the user can input a voice signal to the microphone 170C by speaking the user's mouth near the microphone 170C. The terminal 2500 may be provided with at least one microphone 170C. In other embodiments, the terminal 2500 may be provided with two microphones 170C to achieve a noise reduction function in addition to collecting sound signals. In other embodiments, three, four or more microphones 170C may be further disposed on the terminal 2500 to achieve sound signal collection, noise reduction, sound source identification, directional recording function, and the like.
The headphone interface 170D is used to connect a wired headphone. The headset interface 170D may be the USB interface 130, or may be a 3.5mm open mobile electronic device platform (OMTP) standard interface, a cellular telecommunications industry association (cellular telecommunications industry association of the USA, CTIA) standard interface.
The pressure sensor 180A is used for sensing a pressure signal, and converting the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A can be of a wide variety, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a sensor comprising at least two parallel plates having an electrically conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes. The terminal 2500 determines the intensity of the pressure according to the change in capacitance. When a touch operation is applied to the display screen 194, the terminal 2500 detects the intensity of the touch operation based on the pressure sensor 180A. The terminal 2500 may also calculate the touched position from the detection signal of the pressure sensor 180A. In some embodiments, the touch operations that are applied to the same touch position but different touch operation intensities may correspond to different operation instructions. For example: and when the touch operation with the touch operation intensity smaller than the first pressure threshold value acts on the short message application icon, executing an instruction for viewing the short message. And when the touch operation with the touch operation intensity larger than or equal to the first pressure threshold value acts on the short message application icon, executing an instruction of newly building the short message.
The gyro sensor 180B may be used to determine the motion posture of the terminal 2500. In some embodiments, the angular velocity of the terminal 2500 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180B. The gyro sensor 180B may be used for image stabilization during photographing. Illustratively, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the terminal 2500, calculates the distance that the lens module needs to compensate for according to the shake angle, and lets the lens counteract the shake of the terminal 2500 through reverse movement, thereby achieving image stabilization. The gyro sensor 180B may also be used for navigation and motion-sensing game scenarios.
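The following is a rough, non-authoritative sketch of the anti-shake calculation outlined above: the angular velocity is accumulated over the exposure time into a shake angle, which is then converted into a lens compensation distance. The small-angle projection model and parameter names are assumptions, not the device's actual formula.

```python
# Illustrative sketch of optical image stabilization: integrate angular
# velocity into a shake angle, then project it into a lens shift.
import math

def lens_compensation_mm(angular_velocity_rad_s: float, exposure_s: float,
                         focal_length_mm: float) -> float:
    """Distance the lens module should move to counteract the detected shake."""
    shake_angle = angular_velocity_rad_s * exposure_s   # radians over the exposure
    # Projection model: image shift is approximately focal length * tan(angle).
    return focal_length_mm * math.tan(shake_angle)
```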
The air pressure sensor 180C is used to measure air pressure. In some embodiments, the terminal 2500 calculates altitude from the air pressure value measured by the air pressure sensor 180C, to assist positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The terminal 2500 can detect the opening and closing of a flip leather case using the magnetic sensor 180D. In some embodiments, when the terminal 2500 is a flip phone, the terminal 2500 may detect the opening and closing of the flip cover according to the magnetic sensor 180D, and then set features such as automatic unlocking upon flipping open according to the detected open or closed state of the leather case or flip cover.
The acceleration sensor 180E may detect the magnitude of acceleration of the terminal 2500 in various directions (typically along three axes). The magnitude and direction of gravity may be detected when the terminal 2500 is stationary. The acceleration sensor 180E can also be used to recognize the posture of the terminal, and is applied to landscape/portrait switching, pedometers and other applications.
The distance sensor 180F is used to measure distance. The terminal 2500 may measure distance by infrared or laser. In some embodiments, in a photographing scenario, the terminal 2500 may use the distance sensor 180F to measure distance to achieve fast focusing.
The proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode. The light emitting diode may be an infrared light emitting diode. The terminal 2500 emits infrared light outward through the light emitting diode and uses the photodiode to detect infrared light reflected from a nearby object. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 2500; when insufficient reflected light is detected, the terminal 2500 can determine that there is no object nearby. The terminal 2500 can use the proximity light sensor 180G to detect that the user holds the terminal 2500 close to the ear during a call, so as to automatically turn off the screen and save power. The proximity light sensor 180G may also be used in a leather case mode or a pocket mode to automatically unlock and lock the screen.
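A toy sketch of the proximity decision described above is given below; the threshold value and function names are invented for illustration and do not correspond to the device's real interfaces.

```python
# Hypothetical example: decide whether an object is near based on the amount
# of reflected infrared light, and turn the screen off during a call.
REFLECTED_LIGHT_THRESHOLD = 120  # arbitrary ADC counts, assumed for illustration

def object_is_near(reflected_light: int) -> bool:
    """Sufficient reflected infrared light means an object is close by."""
    return reflected_light >= REFLECTED_LIGHT_THRESHOLD

def should_turn_off_screen(reflected_light: int, in_call: bool) -> bool:
    """Turn the screen off when the user holds the terminal to the ear in a call."""
    return in_call and object_is_near(reflected_light)
```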
The ambient light sensor 180L is used to sense the ambient light level. The terminal 2500 may adaptively adjust the brightness of the display screen 194 according to the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the terminal 2500 is in a pocket to prevent accidental touches.
The fingerprint sensor 180H is used to collect a fingerprint. The terminal 2500 can use the collected fingerprint characteristics to implement fingerprint unlocking, application-lock access, fingerprint-based photographing, fingerprint-based call answering, and the like.
The temperature sensor 180J is used to detect temperature. In some embodiments, the terminal 2500 implements a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the terminal 2500 reduces the performance of a processor located near the temperature sensor 180J, in order to reduce power consumption and implement thermal protection. In other embodiments, the terminal 2500 heats the battery 142 when the temperature is below another threshold, to avoid an abnormal shutdown of the terminal 2500 caused by low temperature. In still other embodiments, the terminal 2500 boosts the output voltage of the battery 142 when the temperature is below yet another threshold, also to avoid an abnormal shutdown caused by low temperature.
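The temperature processing strategy above can be sketched as a simple threshold policy; the specific temperature values and action names below are assumptions for illustration only and combine the alternative embodiments into one function.

```python
# Hypothetical thermal policy mirroring the thresholds described above.
def thermal_policy(temp_c: float) -> str:
    """Return the action to take for a reported temperature, in Celsius."""
    if temp_c > 45.0:      # above the high-temperature threshold
        return "throttle_nearby_processor"
    if temp_c < -10.0:     # below the lowest threshold
        return "boost_battery_output_voltage"
    if temp_c < 0.0:       # below the low-temperature threshold
        return "heat_battery"
    return "no_action"
```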
The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194; together they form a touchscreen, also called a "touch-controlled screen". The touch sensor 180K is used to detect a touch operation acting on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on a surface of the terminal 2500 at a position different from that of the display screen 194.
The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire the vibration signal of a bone block vibrated by the human vocal part. The bone conduction sensor 180M may also contact the human pulse to receive a blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be disposed in a headset to form a bone conduction headset. The audio module 170 may parse out a voice signal based on the vibration signal, acquired by the bone conduction sensor 180M, of the bone block vibrated by the vocal part, so as to implement a voice function. The application processor may parse heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 180M, so as to implement a heart rate detection function.
The keys 190 include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys. The terminal 2500 may receive key input and generate key signal input related to user settings and function control of the terminal 2500.
The motor 191 may generate a vibration prompt. The motor 191 may be used for incoming call vibration prompts as well as for touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also produce different vibration feedback effects for touch operations acting on different areas of the display screen 194. Different application scenarios (such as time reminders, receiving information, alarm clocks, games, etc.) may also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The SIM card interface 195 is used to connect a SIM card. A SIM card can be attached to or detached from the terminal 2500 by being inserted into or pulled out of the SIM card interface 195. The terminal 2500 can support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 may support a Nano SIM card, a Micro SIM card, a SIM card, and the like. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards, and with external memory cards. The terminal 2500 interacts with the network through the SIM card to implement functions such as telephony and data communication. In some embodiments, the terminal 2500 employs an eSIM, namely an embedded SIM card. The eSIM card can be embedded in the terminal 2500 and cannot be separated from the terminal 2500.
The software system of the terminal 2500 may adopt a hierarchical architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. In the embodiment of the present application, a software structure of the terminal 2500 is exemplarily described by taking an Android system with a layered architecture as an example.
Fig. 26 is a block diagram of a software configuration of the terminal 2500 according to the embodiment of the present application.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, an Android runtime and system libraries, and a kernel layer.
The application layer may include a series of application packages.
As shown in fig. 26, the application packages may include camera, gallery, calendar, phone call, map, navigation, WLAN, bluetooth, music, video, short message, etc. applications.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 26, the application framework layers may include a window manager, content provider, view system, phone manager, resource manager, notification manager, and the like.
The window manager is used to manage window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, and the like.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The phone manager is used to provide communication functions of the terminal 2500, for example, management of call states (connected, hung up, and the like).
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages that disappear automatically after a short stay without requiring user interaction. For example, the notification manager is used to notify that a download is complete, to give message alerts, and so on. The notification manager may also present notifications that appear in the top status bar of the system in the form of a chart or scrolling text, such as notifications of applications running in the background, or notifications that appear on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is played, the electronic device vibrates, or an indicator light blinks.
The android runtime comprises a core library and a virtual machine. The android runtime is responsible for scheduling and management of the android system.
The core library comprises two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), media libraries (media libraries), three-dimensional graphics processing libraries (e.g., openGL ES), 2D graphics engines (e.g., SGL), and the like.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, and the like.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer at least includes a display driver, a camera driver, an audio driver, and a sensor driver.
The following describes an exemplary workflow of software and hardware of the terminal 2500 in connection with capturing a photo scene.
When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including information such as the touch coordinates and the timestamp of the touch operation). The raw input event is stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking as an example that the touch operation is a tap and the control corresponding to the tap is the camera application icon, the camera application calls an interface of the application framework layer to start the camera application, then starts the camera driver by calling the kernel layer, and captures a still image or video through the camera 193.
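A simplified, framework-agnostic sketch of this event flow is shown below; the type and function names are illustrative stand-ins and are not Android APIs.

```python
# Illustrative only: hardware interrupt -> kernel raw input event ->
# framework dispatch -> application launch, mirroring the flow above.
from dataclasses import dataclass
import time

@dataclass
class RawInputEvent:
    x: float
    y: float
    timestamp: float

def kernel_handle_interrupt(x: float, y: float) -> RawInputEvent:
    """Kernel layer packages the touch operation into a raw input event."""
    return RawInputEvent(x=x, y=y, timestamp=time.time())

def framework_dispatch(event: RawInputEvent, hit_test) -> None:
    """Framework layer identifies the control under the touch and reacts."""
    control = hit_test(event.x, event.y)
    if control == "camera_app_icon":
        start_camera_app()

def start_camera_app() -> None:
    """Stand-in for launching the camera application, which would then start
    the camera driver via the kernel layer and capture through the camera."""
    print("camera application started")

if __name__ == "__main__":
    event = kernel_handle_interrupt(120.0, 640.0)
    framework_dispatch(event, hit_test=lambda x, y: "camera_app_icon")
```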
The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such an understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments described above can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable storage medium may include at least: any entity or apparatus capable of carrying computer program code to the photographing apparatus/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example, a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, according to legislation and patent practice, a computer-readable medium may not be an electrical carrier signal or a telecommunication signal.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/device and method may be implemented in other ways. For example, the above-described apparatus/device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to" determining "or" in response to detecting ". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (14)

1. A method for generating a virtual character, comprising:
acquiring first semantic data and first voice semantic annotation data of a first virtual role to be generated;
generating a second voice instruction corresponding to the first semantic data based on the first voice semantic annotation data to obtain second voice semantic annotation data;
training to obtain the first virtual role based on the second voice semantic annotation data;
the first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction; the second voice semantic annotation data comprises the second voice instruction and the first semantic data used for annotating the second voice instruction; the first semantic data comprises first vertical domain information, first intention information and first word slot information; the second semantic data includes second vertical domain information, second intention information, and second word slot information.
2. The method of claim 1, wherein generating a second voice instruction corresponding to the first semantic data based on the first voice semantic annotation data comprises:
based on the first semantic data, searching a second virtual role associated with the first virtual role;
and if the second virtual role is not found, generating a second voice instruction corresponding to the first semantic data based on the first voice semantic annotation data.
3. The method according to claim 1 or 2, wherein the generating a second voice instruction corresponding to the first semantic data based on the first voice semantic annotation data comprises:
performing tuning training on a preset generative adversarial network (GAN) based on the first voice semantic annotation data;
and generating a second voice instruction corresponding to the first semantic data based on the preset GAN after tuning training.
4. The method of claim 3, wherein before the tuning training of the preset GAN based on the first voice semantic annotation data, the method further comprises:
acquiring third voice semantic annotation data, wherein the third voice semantic annotation data comprises a third voice instruction, third semantic data used for annotating the third voice instruction, fourth semantic data and a fourth voice instruction used for annotating the fourth semantic data;
and training to obtain the preset GAN based on the third voice semantic annotation data.
5. The method of any of claims 1-4, further comprising:
acquiring role indication information when a role wake-up instruction is received, wherein the role indication information is used to indicate a third virtual role to be woken up;
determining the third virtual role matched with the role indication information in at least one existing virtual role, wherein the at least one virtual role is obtained by dividing according to at least one preset dimension;
loading role resources of the third virtual role;
and processing an artificial intelligence (AI) service based on the third virtual role.
6. A method for generating a virtual character, the method comprising:
acquiring first semantic data and first voice semantic annotation data of a first virtual role to be generated;
based on the first semantic data, searching a second virtual role associated with the first virtual role;
if the second virtual role is found, performing transfer learning training on the second virtual role based on the first voice semantic annotation data to obtain the first virtual role;
the first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction; the first semantic data comprises first vertical domain information, first intention information and first word slot information; the second semantic data includes second vertical domain information, second intention information, and second word slot information.
7. The method of claim 6, wherein a natural language understanding (NLU) model of the first virtual role comprises a basic language feature extraction layer and a semantic data extraction layer, and the performing the transfer learning training on the second virtual role based on the first voice semantic annotation data to obtain the first virtual role comprises:
acquiring an NLU model of the second virtual role;
setting the network parameters of the basic language feature extraction layer in the NLU model of the second virtual role as constants;
and training network parameters in the semantic data extraction layer in the NLU model of the second virtual role based on the first voice semantic annotation data to obtain the NLU model of the first virtual role.
8. The method of claim 6 or 7, further comprising:
and storing the NLU model of the first virtual role and the first semantic data to a role resource library.
9. The method of any one of claims 6-8, wherein the searching a second virtual role associated with the first virtual role based on the first semantic data comprises:
acquiring fifth semantic data of at least one existing virtual role;
determining role similarity between the at least one virtual role and the first virtual role respectively based on the first semantic data and the fifth semantic data;
and searching a second virtual role associated with the first virtual role according to the role similarity between the at least one virtual role and the first virtual role.
10. The method according to any one of claims 6-9, further comprising:
acquiring role indication information when a role wake-up instruction is received, wherein the role indication information is used to indicate a third virtual role to be woken up;
determining the third virtual role matched with the role indication information in at least one existing virtual role, wherein the at least one virtual role is obtained by dividing according to at least one preset dimension;
loading role resources of the third virtual role;
and processing an artificial intelligence (AI) service based on the third virtual role.
11. An apparatus for generating a virtual character, comprising:
the acquisition module is used for acquiring first semantic data and first voice semantic annotation data of a first virtual role to be generated;
the generating module is used for generating a second voice instruction corresponding to the first semantic data based on the first voice semantic annotation data to obtain second voice semantic annotation data;
the training module is used for training to obtain the first virtual role based on the second voice semantic annotation data;
the first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction; the second voice semantic annotation data comprises a second voice instruction and the first semantic data used for annotating the second voice instruction; the first semantic data comprises first vertical domain information, first intention information and first word slot information; the second semantic data includes second vertical domain information, second intention information, and second word slot information.
12. An apparatus for generating a virtual character, comprising:
the acquisition module is used for acquiring first semantic data and first voice semantic annotation data of a first virtual role to be generated;
a searching module, configured to search, based on the first semantic data, a second virtual role associated with the first virtual role;
the training module is used for performing transfer learning training on the second virtual role based on the first voice semantic annotation data to obtain the first virtual role if the second virtual role is found;
the first voice semantic annotation data comprises a first voice instruction and second semantic data used for annotating the first voice instruction; the first semantic data comprises first vertical domain information, first intention information and first word slot information; the second semantic data includes second vertical domain information, second intention information, and second word slot information.
13. A terminal, comprising: a memory for storing a computer program and a processor; the processor is adapted to perform the method of any of claims 1-5 or the method of any of claims 6-10 when the computer program is invoked.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-5 or the method according to any one of claims 6-10.
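Purely as an illustrative aid to the claims above, and not as the claimed implementation, the following Python sketch strings together the two branches recited in claims 1-7: searching for an associated second virtual role by semantic similarity, taking the transfer-learning branch when one is found, and otherwise generating second voice instructions for the first semantic data before training. All names, the similarity heuristic, and the threshold are hypothetical.

```python
# Hypothetical sketch of the claimed flow; the training and generation steps
# are injected as callables because the claims do not fix their internals.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class SemanticData:
    vertical_domain: str   # e.g. "music"
    intent: str            # e.g. "play_song"
    word_slots: List[str]  # e.g. ["song_name", "artist"]

# A voice-semantic annotation pair: a voice instruction plus the semantic
# data used to annotate it.
Annotation = Tuple[str, SemanticData]

def role_similarity(a: SemanticData, b: SemanticData) -> float:
    """Toy similarity over vertical domain, intent and word slots (claim 9)."""
    score = float(a.vertical_domain == b.vertical_domain)
    score += float(a.intent == b.intent)
    overlap = len(set(a.word_slots) & set(b.word_slots))
    return score + overlap / max(len(a.word_slots), 1)

def generate_virtual_role(
    first_semantic: SemanticData,
    first_annotations: List[Annotation],
    role_library: List[Tuple[str, SemanticData]],          # existing roles
    transfer_train: Callable[[str, List[Annotation]], str],
    gan_generate: Callable[[List[Annotation], SemanticData], List[str]],
    train_from_scratch: Callable[[List[Annotation]], str],
    threshold: float = 2.0,
) -> str:
    """End-to-end flow combining the two claimed branches."""
    # 1. Search for a second virtual role associated with the first one
    #    (claims 2 and 6) using semantic similarity.
    best: Optional[str] = None
    best_score = 0.0
    for role_id, semantics in role_library:
        score = role_similarity(first_semantic, semantics)
        if score > best_score:
            best, best_score = role_id, score

    if best is not None and best_score >= threshold:
        # 2a. Associated role found: transfer-learning branch (claims 6-7).
        return transfer_train(best, first_annotations)

    # 2b. No associated role: generate second voice instructions for the
    #     first semantic data with a tuned GAN (claims 1-4), then train.
    second_instructions = gan_generate(first_annotations, first_semantic)
    second_annotations = [(inst, first_semantic) for inst in second_instructions]
    return train_from_scratch(second_annotations)
```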
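The layer-freezing step of claim 7 can likewise be sketched as follows, assuming PyTorch as the training framework (the claims do not name one); the module structure, layer choices, and hyperparameters are illustrative assumptions.

```python
# Hypothetical NLU model split into a basic language feature extraction layer
# and a semantic data extraction head, with the base frozen for transfer
# learning as described in claim 7.
import torch
import torch.nn as nn

class NLUModel(nn.Module):
    def __init__(self, vocab_size: int, hidden: int, num_labels: int):
        super().__init__()
        # Basic language feature extraction layer (shared across roles).
        self.base = nn.Sequential(
            nn.Embedding(vocab_size, hidden),
            nn.LSTM(hidden, hidden, batch_first=True),
        )
        # Semantic data extraction layer (role-specific head).
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.base[0](token_ids)
        features, _ = self.base[1](embedded)
        return self.head(features)

def prepare_transfer(model: NLUModel) -> torch.optim.Optimizer:
    """Freeze the base layer (treated as constants) and train only the head."""
    for param in model.base.parameters():
        param.requires_grad = False
    return torch.optim.Adam(model.head.parameters(), lr=1e-3)
```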
CN202010466955.1A 2020-05-28 2020-05-28 Method and device for generating virtual roles Active CN113742460B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010466955.1A CN113742460B (en) 2020-05-28 2020-05-28 Method and device for generating virtual roles
PCT/CN2021/082911 WO2021238371A1 (en) 2020-05-28 2021-03-25 Method and apparatus for generating virtual character

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010466955.1A CN113742460B (en) 2020-05-28 2020-05-28 Method and device for generating virtual roles

Publications (2)

Publication Number Publication Date
CN113742460A true CN113742460A (en) 2021-12-03
CN113742460B CN113742460B (en) 2024-03-29

Family

ID=78724145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010466955.1A Active CN113742460B (en) 2020-05-28 2020-05-28 Method and device for generating virtual roles

Country Status (2)

Country Link
CN (1) CN113742460B (en)
WO (1) WO2021238371A1 (en)


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10810371B2 (en) * 2017-04-06 2020-10-20 AIBrain Corporation Adaptive, interactive, and cognitive reasoner of an autonomous robotic system
CN109559748B (en) * 2018-12-21 2019-09-24 出门问问信息科技有限公司 A kind of method for recognizing semantics, device, smart machine and storage medium
CN109753565A (en) * 2018-12-27 2019-05-14 厦门智融合科技有限公司 Intellectual Property intelligent service method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108519816A (en) * 2018-03-26 2018-09-11 广东欧珀移动通信有限公司 Information processing method, device, storage medium and electronic equipment
CN110310636A (en) * 2019-06-24 2019-10-08 歌尔股份有限公司 Interaction control method, device, equipment and audio frequency apparatus
CN110688008A (en) * 2019-09-27 2020-01-14 贵州小爱机器人科技有限公司 Virtual image interaction method and device
CN110992947A (en) * 2019-11-12 2020-04-10 北京字节跳动网络技术有限公司 Voice-based interaction method, device, medium and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925181A (en) * 2022-04-28 2022-08-19 支付宝(杭州)信息技术有限公司 Data processing method and device, computer storage medium and terminal
CN117708347A (en) * 2023-12-14 2024-03-15 北京英视睿达科技股份有限公司 Method and system for outputting multi-mode result by large model based on API (application program interface) endpoint

Also Published As

Publication number Publication date
CN113742460B (en) 2024-03-29
WO2021238371A1 (en) 2021-12-02

Similar Documents

Publication Publication Date Title
RU2766255C1 (en) Voice control method and electronic device
CN110111787B (en) Semantic parsing method and server
CN110134316B (en) Model training method, emotion recognition method, and related device and equipment
CN110910872B (en) Voice interaction method and device
CN112567457B (en) Voice detection method, prediction model training method, device, equipment and medium
CN110138959B (en) Method for displaying prompt of human-computer interaction instruction and electronic equipment
CN110825469A (en) Voice assistant display method and device
CN111061912A (en) Method for processing video file and electronic equipment
US20220214894A1 (en) Command execution method, apparatus, and device
WO2021244457A1 (en) Video generation method and related apparatus
WO2022052776A1 (en) Human-computer interaction method, and electronic device and system
CN112214636A (en) Audio file recommendation method and device, electronic equipment and readable storage medium
CN111970401B (en) Call content processing method, electronic equipment and storage medium
CN111881315A (en) Image information input method, electronic device, and computer-readable storage medium
WO2022100221A1 (en) Retrieval processing method and apparatus, and storage medium
WO2020239001A1 (en) Humming recognition method and related device
CN113806473A (en) Intention recognition method and electronic equipment
CN111835904A (en) Method for starting application based on context awareness and user portrait and electronic equipment
WO2021238371A1 (en) Method and apparatus for generating virtual character
CN112740148A (en) Method for inputting information into input box and electronic equipment
CN114691839A (en) Intention slot position identification method
CN113380240B (en) Voice interaction method and electronic equipment
WO2022033432A1 (en) Content recommendation method, electronic device and server
CN115437601A (en) Image sorting method, electronic device, program product, and medium
CN114822543A (en) Lip language identification method, sample labeling method, model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant