CN116501285B - AI dialogue processing method based on virtual digital image interaction and digitizing system - Google Patents


Info

Publication number
CN116501285B
CN116501285B (application CN202310500502.XA)
Authority
CN
China
Prior art keywords
data
audio
dialogue
virtual digital
historical
Prior art date
Legal status: Active (assumed; Google has not performed a legal analysis)
Application number
CN202310500502.XA
Other languages
Chinese (zh)
Other versions
CN116501285A (en)
Inventor
李春智 (Li Chunzhi)
袁杰 (Yuan Jie)
Current Assignee (the listed assignee may be inaccurate)
Juyin Digital Media Beijing Co ltd
Original Assignee
Zhuyu Future Technology Beijing Co ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Zhuyu Future Technology Beijing Co ltd filed Critical Zhuyu Future Technology Beijing Co ltd
Priority to CN202310500502.XA priority Critical patent/CN116501285B/en
Publication of CN116501285A publication Critical patent/CN116501285A/en
Application granted granted Critical
Publication of CN116501285B publication Critical patent/CN116501285B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G06F 3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367: Ontology
    • G06F 16/60: Information retrieval of audio data
    • G06F 16/61: Indexing; Data structures therefor; Storage structures
    • G06F 16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/686: Retrieval using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an AI dialogue processing method and a digitizing system based on virtual digital image interaction, and relates to the technical field of artificial intelligence. In the invention, a data analysis operation is performed on the members of a data distribution knowledge graph to determine a summary data description vector for each piece of audio summary data; a comparison analysis operation is performed on the summary data description vectors to analyze data association relationship information; dialogue audio data to be processed is determined, and an association expansion operation is performed on it, based on the data association relationship information between the audio summary data corresponding to the dialogue audio data to be processed and the plurality of audio summary data, so as to obtain expanded dialogue audio data; an audio session management operation of the virtual digital person is then performed based on the dialogue audio data to be processed and the expanded dialogue audio data. On this basis, the reliability of dialogue processing can be improved to some extent.

Description

AI dialogue processing method based on virtual digital image interaction and digitizing system
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an AI dialogue processing method and a digitizing system based on virtual digital image interaction.
Background
Against the background of the continuous development and maturation of internet technology, session interaction based on virtual digital images (i.e., virtual digital persons) is applied in a growing number of scenarios. In some of these scenarios, dialogue data must be analyzed so that dialogue management operations and the like can be performed on the basis of the analysis results. However, the conventional technology suffers from low reliability of dialogue processing (dialogue management).
Disclosure of Invention
In view of the above, the present invention aims to provide an AI dialogue processing method and a digitizing system based on virtual digital image interaction, so as to improve the reliability of dialogue processing to a certain extent.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical scheme:
an AI dialogue processing method based on virtual digital image interaction, the AI dialogue processing method comprising:
extracting a plurality of historical dialogue audio data, a plurality of audio summary data and a plurality of virtual digital person description data, wherein each historical dialogue audio data was formed historically by an audio session operation of the corresponding virtual digital person, each audio summary data provides a summary description of the corresponding historical dialogue audio data, and each virtual digital person description data provides an attribute description of the corresponding virtual digital person;
determining a corresponding data distribution knowledge graph according to the plurality of historical dialogue audio data, the plurality of audio summary data and the plurality of virtual digital person description data, wherein the data distribution knowledge graph reflects the correlations between the audio summary data and both the historical dialogue audio data and the virtual digital person description data, the correlations among the historical dialogue audio data, and the correlations among the audio summary data;
performing a data analysis operation on the graph members of the data distribution knowledge graph to determine a summary data description vector of each piece of audio summary data in the plurality of audio summary data;
performing a comparison analysis operation on the summary data description vectors to analyze the data association relationship information among the plurality of audio summary data;
determining one historical dialogue audio data from the plurality of historical dialogue audio data and marking it as the dialogue audio data to be processed, and performing an association expansion operation on the dialogue audio data to be processed within the plurality of historical dialogue audio data, based on the data association relationship information between the audio summary data corresponding to the dialogue audio data to be processed and the plurality of audio summary data, so as to obtain each expanded dialogue audio data corresponding to the dialogue audio data to be processed;
and performing an audio session management operation of the virtual digital person based on the dialogue audio data to be processed and the expanded dialogue audio data.
In some preferred embodiments, in the above AI dialogue processing method based on virtual digital image interaction, the step of determining the corresponding data distribution knowledge graph according to the plurality of historical dialogue audio data, the plurality of audio summary data and the plurality of virtual digital person description data includes:
performing a mapping operation based on the correlation relationships between each of the plurality of audio summary data and both the plurality of historical dialogue audio data and the plurality of virtual digital person description data, so as to form a corresponding first local knowledge graph;
performing a mapping operation according to the plurality of virtual digital person description data and the plurality of audio summary data to form a second local knowledge graph reflecting the correlations among the plurality of historical dialogue audio data;
performing a mapping operation according to the plurality of historical dialogue audio data to form a third local knowledge graph reflecting the correlations among the plurality of audio summary data;
and merging the first local knowledge graph, the second local knowledge graph and the third local knowledge graph to form a corresponding data distribution knowledge graph.
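As an illustrative sketch only (the patent prescribes no concrete data structures, and every name below is invented for illustration), each local knowledge graph can be held as a dictionary of importance-weighted edges, and the merging step realized as a union of the three edge sets:

```python
# Minimal sketch: a local knowledge graph as a {(node_a, node_b): importance}
# dict, and the merge step as a union of the three edge sets. All structures
# and node labels are illustrative assumptions.
def merge_graphs(*local_graphs):
    """Union of several weighted-edge dicts; if an edge appears in more than
    one local graph, keep the larger importance (one possible merge policy)."""
    merged = {}
    for graph in local_graphs:
        for edge, weight in graph.items():
            merged[edge] = max(weight, merged.get(edge, 0.0))
    return merged

first = {("persona:p1", "summary:s1"): 2.0}
second = {("dialogue:d1", "dialogue:d2"): 1.5}
third = {("summary:s1", "summary:s2"): 3.0, ("persona:p1", "summary:s1"): 1.0}
merged = merge_graphs(first, second, third)
print(len(merged))  # 3 distinct edges after the union
```

Keeping the larger weight on duplicate edges is only one plausible reading; summing the weights would be an equally defensible merge policy.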
In some preferred embodiments, in the above AI dialogue processing method based on virtual digital image interaction, the step of performing a mapping operation based on the correlation relationships between each of the plurality of audio summary data and both the plurality of historical dialogue audio data and the plurality of virtual digital person description data, to form a corresponding first local knowledge graph, includes:
determining, among the plurality of audio summary data, the audio summary data having a correlation with each virtual digital person description data, and determining, among the plurality of audio summary data, the audio summary data having a correlation with each historical dialogue audio data;
connecting each virtual digital person description data with the corresponding correlated audio summary data based on a first map line, and configuring an importance parameter for the first map line based on the data repetition number of the corresponding correlated audio summary data, so as to form a corresponding first importance-carrying map line;
connecting each historical dialogue audio data with the corresponding correlated audio summary data based on a second map line, and marking a first characterization parameter of predetermined reference importance as the importance parameter of the second map line, thereby forming a second importance-carrying map line;
and combining the first importance-carrying map line and the second importance-carrying map line to form the corresponding first local knowledge graph.
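A minimal sketch of this first local graph, under the assumptions that the "data repetition number" is a simple count and the "first characterization parameter" is a fixed reference importance (both representations, and all names, are ours, not the patent's):

```python
# Sketch of the first local graph: persona->summary edges weighted by the
# repetition count of the correlated summary data, and dialogue->summary edges
# carrying a fixed reference importance. Illustrative only.
REFERENCE_IMPORTANCE = 1.0  # stand-in for the "first characterization parameter"

def build_first_local_graph(persona_to_summaries, dialogue_to_summary, repetition):
    edges = {}
    for persona, summaries in persona_to_summaries.items():
        for s in summaries:
            # importance from the "data repetition number" of the summary data
            edges[(persona, s)] = float(repetition.get(s, 1))
    for dialogue, s in dialogue_to_summary.items():
        edges[(dialogue, s)] = REFERENCE_IMPORTANCE
    return edges

edges = build_first_local_graph(
    {"p1": ["s1", "s2"]}, {"d1": "s1"}, {"s1": 3, "s2": 1}
)
print(edges[("p1", "s1")], edges[("d1", "s1")])  # 3.0 1.0
```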
In some preferred embodiments, in the above AI dialogue processing method based on virtual digital image interaction, the step of performing a mapping operation according to the plurality of virtual digital person description data and the plurality of audio summary data, to form a second local knowledge graph reflecting the correlations among the plurality of historical dialogue audio data, includes:
determining, based on each virtual digital person description data, at least two first historical dialogue audio data having a correlation among the plurality of historical dialogue audio data, and determining, based on each audio summary data, at least two second historical dialogue audio data having a correlation among the plurality of historical dialogue audio data;
performing a connecting operation on the at least two first historical dialogue audio data based on a third map line, and configuring an importance parameter for the third map line based on a second characterization parameter of predetermined reference importance and the audio summary data respectively corresponding to the at least two first historical dialogue audio data, so as to form a third importance-carrying map line;
connecting the at least two second historical dialogue audio data based on a fourth map line, and configuring an importance parameter for the fourth map line based on the virtual digital person description data and the audio summary data respectively corresponding to the at least two second historical dialogue audio data, so as to form a fourth importance-carrying map line;
and determining the corresponding second local knowledge graph based on the third importance-carrying map line and the fourth importance-carrying map line.
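A hedged sketch of the second local graph: dialogues are connected whenever they share a virtual digital person or an audio summary, with a simple additive stand-in importance (the patent does not specify the exact importance configuration, so the weighting below is an assumption):

```python
# Sketch of the second local graph: historical dialogues are linked when they
# share a persona or a summary; each shared relation adds a fixed reference
# importance to the edge. Structures and weights are illustrative.
from itertools import combinations

def build_second_local_graph(persona_to_dialogues, summary_to_dialogues,
                             reference_importance=1.0):
    edges = {}
    for dialogues in persona_to_dialogues.values():
        for a, b in combinations(sorted(dialogues), 2):
            edges[(a, b)] = edges.get((a, b), 0.0) + reference_importance
    for dialogues in summary_to_dialogues.values():
        for a, b in combinations(sorted(dialogues), 2):
            edges[(a, b)] = edges.get((a, b), 0.0) + reference_importance
    return edges

edges = build_second_local_graph({"p1": ["d1", "d2"]}, {"s1": ["d1", "d2", "d3"]})
print(edges[("d1", "d2")])  # linked through both a persona and a summary: 2.0
```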
In some preferred embodiments, in the above AI dialogue processing method based on virtual digital image interaction, the step of performing a mapping operation according to the plurality of historical dialogue audio data, to form a third local knowledge graph reflecting the correlations among the plurality of audio summary data, includes:
determining, based on each historical dialogue audio data, at least two third audio summary data having a correlation among the plurality of audio summary data;
connecting the at least two third audio summary data based on a fifth map line, and configuring an importance parameter for the fifth map line based on the collinear (co-occurrence) parameter of the at least two third audio summary data, so as to form a fifth importance-carrying map line;
and determining the corresponding third local knowledge graph based on the fifth importance-carrying map line.
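One plausible reading of the "collinear parameter" is a co-occurrence count: two audio summary data are linked the more strongly, the more historical dialogues refer to both. A sketch under that assumption (names invented):

```python
# Sketch of the third local graph: two summary data are connected when the same
# historical dialogue refers to both, with edge importance equal to their
# co-occurrence count (our reading of the "collinear parameter").
from collections import Counter
from itertools import combinations

def build_third_local_graph(dialogue_to_summaries):
    cooccur = Counter()
    for summaries in dialogue_to_summaries.values():
        for a, b in combinations(sorted(set(summaries)), 2):
            cooccur[(a, b)] += 1
    return dict(cooccur)

edges = build_third_local_graph({
    "d1": ["s1", "s2"], "d2": ["s1", "s2", "s3"], "d3": ["s2", "s3"],
})
print(edges[("s1", "s2")], edges[("s2", "s3")])  # 2 2
```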
In some preferred embodiments, in the above AI dialogue processing method based on virtual digital image interaction, the step of performing a data analysis operation on the graph members of the data distribution knowledge graph to determine the summary data description vector of each of the plurality of audio summary data includes:
marking in turn each of the plurality of graph members included in the data distribution knowledge graph as an initial graph member, and performing an extraction operation on the graph members, thereby forming a graph member extraction link corresponding to each of the plurality of graph members, wherein each graph member belongs to one of the plurality of historical dialogue audio data, the plurality of audio summary data and the plurality of virtual digital person description data;
performing a feature mining operation on each graph member extraction link to output the graph member description vector corresponding to each of the plurality of graph members;
and determining, from the graph member description vectors respectively corresponding to the plurality of graph members, the summary data description vectors corresponding to the plurality of audio summary data.
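This member-extraction-then-feature-mining procedure resembles walk-based graph embedding (e.g. DeepWalk). A deliberately crude sketch, with invented structures, that builds one walk ("extraction link") per member and derives a co-occurrence-count description vector; a real system would more likely feed the walks to a sequence model:

```python
# Hedged sketch: per-member walks over the graph, then a crude description
# vector from walk visit counts. Everything here is an illustrative assumption.
import random

def extract_walk(adjacency, start, length=4, seed=0):
    """One 'graph member extraction link': a short walk starting at `start`."""
    rng = random.Random(seed)
    walk, node = [start], start
    for _ in range(length - 1):
        neighbours = adjacency.get(node, [])
        if not neighbours:
            break
        node = rng.choice(neighbours)
        walk.append(node)
    return walk

def describe_members(adjacency):
    """Stand-in 'feature mining': count walk visits per member."""
    nodes = sorted(adjacency)
    index = {n: i for i, n in enumerate(nodes)}
    vectors = {}
    for n in nodes:
        vec = [0.0] * len(nodes)
        for visited in extract_walk(adjacency, n):
            vec[index[visited]] += 1.0
        vectors[n] = vec
    return vectors

adj = {"s1": ["d1"], "d1": ["s1", "p1"], "p1": ["d1"]}
vectors = describe_members(adj)
print(len(vectors), len(vectors["s1"]))  # one 3-dim vector per member
```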
In some preferred embodiments, in the above AI dialogue processing method based on virtual digital image interaction, the step of extracting a plurality of historical dialogue audio data, a plurality of audio summary data and a plurality of virtual digital person description data includes:
extracting historical interaction session data, and determining a first historical dialogue audio data set, a first audio summary data set and a first virtual digital person description data set based on the historical interaction session data;
performing a screening operation on the first audio summary data in the first audio summary data set, and marking the first audio summary data that remain after the screening operation, i.e., those other than the first audio summary data whose number of corresponding virtual digital person description data is smaller than a preset first reference number, so as to obtain the plurality of audio summary data;
marking in turn, in the first virtual digital person description data set, first virtual digital person description data, second virtual digital person description data and third virtual digital person description data, wherein the first virtual digital person description data is virtual digital person description data whose number of corresponding historical dialogue audio data is smaller than a preset second reference number, the second virtual digital person description data is virtual digital person description data whose number of data types of corresponding historical dialogue audio data exceeds a preset third reference number, and the third virtual digital person description data is virtual digital person description data whose number of corresponding audio summary data exceeds a preset fourth reference number;
obtaining the plurality of virtual digital person description data based on the virtual digital person description data in the first virtual digital person description data set other than the first, second and third virtual digital person description data;
and obtaining the plurality of historical dialogue audio data based on the historical dialogue audio data in the first historical dialogue audio data set other than the historical dialogue audio data respectively corresponding to the first, second and third virtual digital person description data.
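The screening rules above amount to simple count-threshold filters. A sketch in which the field names and threshold values are illustrative only (the patent leaves the reference numbers unspecified):

```python
# Sketch of the extraction-step screening: drop summary data referenced by too
# few personas, and drop personas with too few dialogues or too many summaries.
# All field names and thresholds are invented for illustration.
def screen(summaries, personas, min_personas=2, min_dialogues=2, max_summaries=5):
    kept_summaries = [
        s for s in summaries if s["persona_count"] >= min_personas
    ]
    kept_personas = [
        p for p in personas
        if p["dialogue_count"] >= min_dialogues
        and p["summary_count"] <= max_summaries
    ]
    return kept_summaries, kept_personas

summaries = [{"id": "s1", "persona_count": 3}, {"id": "s2", "persona_count": 1}]
personas = [{"id": "p1", "dialogue_count": 4, "summary_count": 2},
            {"id": "p2", "dialogue_count": 1, "summary_count": 2}]
ks, kp = screen(summaries, personas)
print([s["id"] for s in ks], [p["id"] for p in kp])  # ['s1'] ['p1']
```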
In some preferred embodiments, in the above AI dialogue processing method based on virtual digital image interaction, the step of determining one historical dialogue audio data from the plurality of historical dialogue audio data to mark as the dialogue audio data to be processed, and performing an association expansion operation on it based on the data association relationship information between the audio summary data corresponding to the dialogue audio data to be processed and the plurality of audio summary data, to obtain each expanded dialogue audio data corresponding to the dialogue audio data to be processed, includes:
determining, based on received dialogue abnormality request information, one corresponding historical dialogue audio data among the plurality of historical dialogue audio data, and marking it as the dialogue audio data to be processed;
determining, based on the data association relationship information among the plurality of audio summary data, each audio summary data among the plurality of audio summary data that is associated with the audio summary data corresponding to the dialogue audio data to be processed, wherein the data association relationship information between associated audio summary data satisfies a preset association relationship condition;
and marking, among the plurality of historical dialogue audio data, the historical dialogue audio data corresponding to each associated audio summary data as the expanded dialogue audio data corresponding to the dialogue audio data to be processed.
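A minimal sketch of this expansion step, assuming the association relationship information is a pairwise score and the "preset association relationship condition" is a threshold (both assumptions, as are all names):

```python
# Sketch of association expansion: summaries whose association score with the
# target summary meets the condition contribute their dialogues as expanded
# dialogue audio data. Scores and threshold are illustrative.
def expand(target_summary, association, dialogue_of_summary, condition=0.7):
    expanded = []
    for (a, b), score in association.items():
        if score < condition:
            continue
        if a == target_summary:
            expanded.append(dialogue_of_summary[b])
        elif b == target_summary:
            expanded.append(dialogue_of_summary[a])
    return expanded

association = {("s1", "s2"): 0.9, ("s1", "s3"): 0.2, ("s2", "s3"): 0.8}
dialogues = {"s1": "d1", "s2": "d2", "s3": "d3"}
print(expand("s1", association, dialogues))  # only s2 passes the condition
```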
In some preferred embodiments, in the above AI dialogue processing method based on virtual digital image interaction, the step of performing the audio session management operation of the virtual digital person based on the dialogue audio data to be processed and the expanded dialogue audio data includes:
performing a feature mining operation on the dialogue audio data to be processed to form a first audio feature description vector corresponding to it, and performing a feature mining operation on each expanded dialogue audio data to form a second audio feature description vector corresponding to that expanded dialogue audio data;
performing a focusing feature analysis operation on the first audio feature description vector based on the second audio feature description vectors to form corresponding focusing audio feature description vectors, wherein the number of focusing audio feature description vectors is equal to the number of second audio feature description vectors;
performing an aggregation operation on the first audio feature description vector and each focusing audio feature description vector to form a corresponding aggregated audio feature description vector;
and evaluating, based on the aggregated audio feature description vector, the audio session abnormality information corresponding to the dialogue audio data to be processed, and performing, based on the audio session abnormality information, an audio session monitoring operation on the virtual digital person corresponding to the dialogue audio data to be processed, wherein the audio session monitoring operation at least comprises increasing the abnormality monitoring frequency of the audio session operation of that virtual digital person, or reducing the operating frequency of the audio session operation of that virtual digital person.
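The "focusing feature analysis operation" is not specified in detail; one plausible reading is an attention-style gating of the first vector by each second vector, followed by mean aggregation and a stand-in anomaly score. A sketch under those assumptions:

```python
# Hedged sketch of the management step: one focusing vector per expanded
# (second) vector via a softmax gate, then element-wise mean aggregation.
# The attention form and the anomaly score are our assumptions.
from math import exp

def focus(first_vec, second_vec):
    """Gate the first vector element-wise by a softmax over the component-wise
    products with the second vector (one reading of 'focusing')."""
    scores = [a * b for a, b in zip(first_vec, second_vec)]
    peak = max(scores)
    weights = [exp(s - peak) for s in scores]
    total = sum(weights)
    return [a * w / total for a, w in zip(first_vec, weights)]

def aggregate(first_vec, focused_vecs):
    """Element-wise mean of the first vector and all focusing vectors."""
    vecs = [first_vec] + focused_vecs
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(first_vec))]

first = [1.0, 0.5, 0.0]
seconds = [[0.8, 0.4, 0.1], [0.2, 0.9, 0.3]]
focused = [focus(first, s) for s in seconds]       # one per second vector
aggregated = aggregate(first, focused)
anomaly_score = sum(aggregated) / len(aggregated)  # stand-in for the evaluator
print(len(focused), len(aggregated))  # 2 focusing vectors, one 3-dim aggregate
```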
The embodiment of the invention also provides a digitizing system, which comprises a processor and a memory, wherein the memory is used for storing a computer program, and the processor is used for executing the computer program to realize the AI dialogue processing method.
The AI dialogue processing method and the digitizing system based on virtual digital image interaction provided by the embodiment of the invention first extract a plurality of historical dialogue audio data, a plurality of audio summary data and a plurality of virtual digital person description data; determine a corresponding data distribution knowledge graph according to these data; perform a data analysis operation on the graph members of the data distribution knowledge graph to determine the summary data description vector of each piece of audio summary data; perform a comparison analysis operation on the summary data description vectors to analyze the data association relationship information among the plurality of audio summary data; determine one historical dialogue audio data from the plurality of historical dialogue audio data, mark it as the dialogue audio data to be processed, and perform an association expansion operation on it based on the data association relationship information between the audio summary data corresponding to the dialogue audio data to be processed and the plurality of audio summary data, so as to obtain each expanded dialogue audio data corresponding to it; and perform an audio session management operation of the virtual digital person based on the dialogue audio data to be processed and the expanded dialogue audio data.
On this basis, before the audio session management operation of the virtual digital person, an association expansion operation is performed on the dialogue audio data to be processed to obtain each expanded dialogue audio data corresponding to it. During the audio session management operation, the operation therefore relies not only on the dialogue audio data to be processed but also on the corresponding expanded dialogue audio data, so that its basis is more sufficient. The reliability of dialogue processing can thus be improved to a certain extent, alleviating the low-reliability problem of the prior art.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
Fig. 1 is a block diagram of a digitizing system according to an embodiment of the invention.
Fig. 2 is a flowchart illustrating steps involved in an AI dialogue processing method based on an avatar interaction according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of each module included in the AI dialogue processing device based on the interaction of the virtual digital images according to the embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, an embodiment of the present invention provides a digitizing system. Wherein the digitizing system may include a memory and a processor.
In detail, the memory and the processor are electrically connected directly or indirectly to realize transmission or interaction of data. For example, electrical connection may be made to each other via one or more communication buses or signal lines. The memory may store at least one software functional module (computer program) that may exist in the form of software or firmware. The processor may be configured to execute an executable computer program stored in the memory, so as to implement the AI dialogue processing method based on the virtual digital image interaction provided by the embodiment of the present invention.
It is to be appreciated that in some embodiments, the memory may be, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
It will be appreciated that in some embodiments, the processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), a System on Chip (SoC), etc.; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
It will be appreciated that in some embodiments, the digitizing system may be a server with data processing capabilities.
Referring to fig. 2, the embodiment of the invention further provides an AI dialogue processing method based on virtual digital image interaction, which can be applied to the above digitizing system; that is, the method steps defined by the flow of the AI dialogue processing method can be implemented by the digitizing system.
The specific flow shown in fig. 2 will be described in detail.
Step S110 extracts a plurality of historical dialog audio data, a plurality of audio summary data, and a plurality of virtual digital person description data.
In an embodiment of the invention, the digitizing system may extract a plurality of historical dialogue audio data, a plurality of audio summary data and a plurality of virtual digital person description data. Each historical dialogue audio data was formed historically by an audio session operation of the corresponding virtual digital person and may comprise at least one audio frame. Each audio summary data provides a summary description of the corresponding historical dialogue audio data and may be summary audio or summary text. Each virtual digital person description data provides an attribute description of the corresponding virtual digital person, that is, it is attribute data such as the identity of that virtual digital person, and may be text data. For example, the historical dialogue audio data may be converted into corresponding text data prior to processing.
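An illustrative data model for the three extracted data types (the patent fixes no concrete fields, so every field below is an assumption):

```python
# Hypothetical data model for the three data types named in step S110.
from dataclasses import dataclass, field
from typing import List

@dataclass
class VirtualDigitalPerson:
    persona_id: str
    attributes: dict          # identity and other attribute description data

@dataclass
class AudioSummary:
    summary_id: str
    text: str                 # summary text (or a reference to summary audio)

@dataclass
class HistoricalDialogueAudio:
    dialogue_id: str
    persona_id: str           # the virtual digital person that produced it
    summary_id: str           # the summary describing it
    frames: List[bytes] = field(default_factory=list)  # at least one audio frame

d = HistoricalDialogueAudio("d1", "p1", "s1", [b"\x00\x01"])
print(d.summary_id, len(d.frames))  # s1 1
```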
Step S120, determining a corresponding data distribution knowledge graph according to the plurality of historical dialogue audio data, the plurality of audio summary data and the plurality of virtual digital person description data.
In the embodiment of the invention, the digitizing system may determine the corresponding data distribution knowledge graph according to the plurality of historical dialogue audio data, the plurality of audio summary data and the plurality of virtual digital person description data. The data distribution knowledge graph reflects the correlations between the audio summary data and both the historical dialogue audio data and the virtual digital person description data, the correlations among the historical dialogue audio data, and the correlations among the audio summary data. That is, the plurality of historical dialogue audio data, audio summary data and virtual digital person description data are subjected to a mapping operation to form the corresponding data distribution knowledge graph.
Step S130, performing a data analysis operation on the graph members of the data distribution knowledge graph to determine the summary data description vector of each piece of audio summary data in the plurality of audio summary data.
In the embodiment of the invention, the digitizing system may perform a data analysis operation on the graph members of the data distribution knowledge graph to determine the summary data description vector of each of the plurality of audio summary data.
Step S140, performing a comparison analysis operation on the summary data description vectors to analyze data association relationship information among the plurality of audio summary data.
In the embodiment of the invention, the digitizing system may perform a comparison analysis operation on the summary data description vectors to analyze the data association relationship information among the plurality of audio summary data. Illustratively, the cosine similarity between the summary data description vectors of each pair of audio summary data may be determined, and the data association relationship information between those audio summary data may be determined accordingly; for example, the data association relationship information may have a positive correlation correspondence with the cosine similarity.
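As a minimal sketch of the comparison analysis just described — assuming the association degree is simply taken equal to the cosine similarity, and with all function names illustrative rather than from the patent:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two description vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def association_matrix(vectors):
    # Pairwise data association degrees, positively correlated with
    # cosine similarity (here: identical to it, the simplest choice).
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
```

Any monotonically increasing mapping of the similarity would equally satisfy the "positive correlation correspondence" stated above.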
Step S150, determining one piece of historical dialog audio data from the plurality of historical dialog audio data and marking it as the dialog audio data to be processed, and performing an association expansion operation on the dialog audio data to be processed among the plurality of historical dialog audio data, based on the data association relationship information between the audio summary data corresponding to the dialog audio data to be processed and the plurality of audio summary data, so as to obtain each piece of expanded dialog audio data corresponding to the dialog audio data to be processed.
In the embodiment of the present invention, the digitizing system may determine one piece of historical dialog audio data from the plurality of historical dialog audio data and mark it as the dialog audio data to be processed, and may then perform the association expansion operation on the dialog audio data to be processed among the plurality of historical dialog audio data, based on the data association relationship information between the audio summary data corresponding to the dialog audio data to be processed and the plurality of audio summary data, so as to obtain each piece of expanded dialog audio data corresponding to the dialog audio data to be processed. In this way, association supplementation or association reinforcement of the dialog audio data to be processed can be achieved.
Step S160, performing the audio session management operation of the virtual digital person based on the dialog audio data to be processed and the expanded dialog audio data.
In the embodiment of the invention, the digitizing system may perform the audio session management operation of the virtual digital person based on the dialog audio data to be processed and the expanded dialog audio data, that is, perform the audio session management operation of the virtual digital image.
Based on the above, before the audio session management operation of the virtual digital person is performed, the association expansion operation is performed on the dialog audio data to be processed to obtain each piece of expanded dialog audio data corresponding to it. Thus, in the course of the audio session management operation, the operation relies not only on the dialog audio data to be processed but also on the corresponding expanded dialog audio data, so its basis is more sufficient. The reliability of the dialog processing can therefore be improved to a certain extent, which alleviates the problem of low reliability in the prior art.
It will be appreciated that in some embodiments, the step S110 described above may further include the sub-steps described below:
extracting historical interaction session data, and determining a first historical dialog audio data set, a first audio summary data set and a first virtual digital person description data set based on the historical interaction session data, wherein the historical interaction session data may be extracted from a database of a corresponding interactive session platform, or may be the historical interaction session data of the most recent time period;
performing a screening operation on the first audio summary data in the first audio summary data set, for example, screening out repeated audio summary data, and then marking, in the screened first audio summary data set, the plurality of first audio summary data other than those whose number of corresponding virtual digital person description data is smaller than a preset first reference number, so as to obtain the plurality of audio summary data, wherein the first reference number may be configured according to actual requirements and is not specifically limited herein;
in the first virtual digital person description data set, sequentially marking first virtual digital person description data, second virtual digital person description data and third virtual digital person description data, wherein the first virtual digital person description data is the virtual digital person description data whose number is smaller than a preset second reference number, the second virtual digital person description data is the virtual digital person description data whose number of data types corresponding to the historical dialog audio data exceeds a preset third reference number, and the third virtual digital person description data is the virtual digital person description data whose number of corresponding audio summary data exceeds a preset fourth reference number; the second reference number, the third reference number and the fourth reference number may be configured according to actual requirements and are not specifically limited herein; in addition, a data type may refer to the field to which the audio content relates, that is, the field is regarded as the corresponding data type;
obtaining the plurality of virtual digital person description data based on the virtual digital person description data in the first virtual digital person description data set other than the first virtual digital person description data, the second virtual digital person description data and the third virtual digital person description data;
and obtaining the plurality of historical dialog audio data based on the historical dialog audio data in the first historical dialog audio data set other than the historical dialog audio data corresponding to the first virtual digital person description data, the historical dialog audio data corresponding to the second virtual digital person description data, and the historical dialog audio data corresponding to the third virtual digital person description data.
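The deduplication and threshold screening in the sub-steps above can be sketched as follows — a hypothetical helper, with the threshold value and all names illustrative rather than specified by the patent:

```python
def screen_summary_data(summary_items, desc_count, first_reference_number=2):
    """Screen a first audio summary data set: drop repeated items, then keep
    only those whose number of corresponding virtual-digital-person description
    data is not smaller than the first reference number (value illustrative)."""
    deduped = list(dict.fromkeys(summary_items))  # order-preserving dedup
    return [s for s in deduped if desc_count.get(s, 0) >= first_reference_number]
```

The same keep-if-count-meets-threshold pattern would apply to the second, third and fourth reference numbers used for the description data set.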
It will be appreciated that in some embodiments, the step S120 described above may further include the sub-steps described below:
performing a mapping operation based on the correlation relationships between each piece of audio summary data in the plurality of audio summary data and the plurality of historical dialog audio data and the plurality of virtual digital person description data, so as to form a corresponding first local knowledge graph;
performing a mapping operation according to the plurality of virtual digital person description data and the plurality of audio summary data, so as to form a second local knowledge graph for reflecting the correlations between the plurality of historical dialog audio data;
performing a mapping operation according to the plurality of historical dialog audio data, so as to form a third local knowledge graph for reflecting the correlations between the plurality of audio summary data;
and combining the first local knowledge graph, the second local knowledge graph and the third local knowledge graph to form the corresponding data distribution knowledge graph; that is, the data distribution knowledge graph may comprise all the contents of the first local knowledge graph, the second local knowledge graph and the third local knowledge graph.
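The combining step above amounts to taking the union of the graph lines of the three local graphs. A minimal sketch, assuming each local graph is represented as a mapping from a pair of graph members to an importance parameter (this representation is an assumption, not from the patent):

```python
def combine_local_graphs(*local_graphs):
    """Merge local knowledge graphs, each given as
    {(member_u, member_v): importance}, into one data-distribution
    knowledge graph containing all of their graph lines."""
    combined = {}
    for graph in local_graphs:
        # Later graphs overwrite duplicate lines; the three local graphs
        # connect disjoint member-type pairs, so overlaps are not expected.
        combined.update(graph)
    return combined
```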
It will be appreciated that, in some embodiments, the step of performing a mapping operation based on the correlation relationships between each piece of audio summary data and the plurality of historical dialog audio data and the plurality of virtual digital person description data to form a corresponding first local knowledge graph may further include the sub-steps described below:
determining, from the plurality of audio summary data, the audio summary data having a correlation relationship with each piece of virtual digital person description data, and determining, from the plurality of audio summary data, the audio summary data having a correlation relationship with each piece of historical dialog audio data; for example, a piece of audio summary data has a correlation relationship with the historical dialog audio data that it summarizes, and also has a correlation relationship with the virtual digital person description data of the virtual digital person corresponding to that historical dialog audio data;
performing a connection operation, based on a first graph line, on each piece of virtual digital person description data and the corresponding audio summary data having a correlation relationship with it, that is, configuring the first graph line between the graph member corresponding to the virtual digital person description data and the graph member corresponding to that audio summary data so as to realize the connection operation, and configuring an importance parameter for the first graph line based on the data repetition number of the corresponding audio summary data, so as to form a corresponding first importance-carrying graph line; for example, the data repetition number may have a positive correlation correspondence with the corresponding importance parameter;
performing a connection operation, based on a second graph line, on each piece of historical dialog audio data and the corresponding audio summary data having a correlation relationship with it, that is, configuring the second graph line between the graph member corresponding to the historical dialog audio data and the graph member corresponding to that audio summary data so as to realize the connection operation, and marking a predetermined reference importance first characterization parameter as the importance parameter of the second graph line, so as to form a second importance-carrying graph line, wherein the reference importance first characterization parameter may be configured according to actual requirements, such as a value of 0.9, 0.85 or 0.75;
and combining the first importance-carrying graph line and the second importance-carrying graph line to form the corresponding first local knowledge graph.
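For the first graph line, the only constraint stated above is that the importance parameter rises with the data repetition number. One hypothetical realization, with base, scale and cap values chosen purely for illustration:

```python
def first_line_importance(repetition_count, base=0.5, scale=0.1, cap=1.0):
    """Importance parameter of a first graph line: positively correlated
    with the data repetition number of the connected audio summary data
    (base/scale/cap are illustrative, not specified by the patent)."""
    return min(cap, base + scale * repetition_count)
```

A second graph line, by contrast, simply carries the fixed reference importance first characterization parameter (e.g. 0.9).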
It will be appreciated that in some embodiments, the step of performing a mapping operation according to the plurality of virtual digital person description data and the plurality of audio summary data to form a second local knowledge-graph reflecting a correlation between the plurality of historical dialog audio data may further include the sub-steps described as follows:
determining, based on each piece of virtual digital person description data, at least two first historical dialog audio data having a correlation relationship among the plurality of historical dialog audio data, for example, determining two historical dialog audio data corresponding to the same virtual digital person description data as at least two first historical dialog audio data having a correlation relationship, and determining, based on each piece of audio summary data, at least two second historical dialog audio data having a correlation relationship among the plurality of historical dialog audio data, for example, determining two historical dialog audio data corresponding to the same audio summary data as at least two second historical dialog audio data having a correlation relationship;
performing a connection operation on the at least two first historical dialog audio data based on a third graph line, that is, configuring the third graph line between the graph members corresponding to the two first historical dialog audio data so as to realize the connection operation of the graph members, and configuring an importance parameter for the third graph line based on a predetermined reference importance second characterization parameter and the audio summary data corresponding to the at least two first historical dialog audio data, so as to form a third importance-carrying graph line; for example, the data similarity between the audio summary data respectively corresponding to the two first historical dialog audio data may be calculated, and the data similarity and the reference importance second characterization parameter may then be weighted and summed to obtain the importance parameter of the corresponding third graph line, wherein the reference importance second characterization parameter may be configured according to actual requirements;
performing a connection operation on the at least two second historical dialog audio data based on a fourth graph line, that is, configuring the fourth graph line between the graph members corresponding to the two second historical dialog audio data so as to realize the connection operation of the graph members, and configuring an importance parameter for the fourth graph line based on the virtual digital person description data and the audio summary data respectively corresponding to the at least two second historical dialog audio data, so as to form a fourth importance-carrying graph line; for example, on the one hand, the data similarity between the virtual digital person description data respectively corresponding to the two second historical dialog audio data may be calculated, and on the other hand, the data similarity between the audio summary data respectively corresponding to the two second historical dialog audio data may be calculated; a weighted summation operation may then be performed on the two data similarities to form the importance parameter of the fourth graph line between the corresponding two graph members;
and determining the corresponding second local knowledge graph based on the third importance-carrying graph line and the fourth importance-carrying graph line.
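The weighted summations for the third and fourth graph lines described above can be sketched directly; the weights are illustrative assumptions, since the patent only requires a weighted sum:

```python
def third_line_importance(summary_similarity, reference_param,
                          w_sim=0.6, w_ref=0.4):
    """Weighted sum of the summary-data similarity and the reference
    importance second characterization parameter (weights illustrative)."""
    return w_sim * summary_similarity + w_ref * reference_param

def fourth_line_importance(desc_similarity, summary_similarity,
                           w_desc=0.5, w_sum=0.5):
    """Weighted sum of the description-data similarity and the
    summary-data similarity (weights illustrative)."""
    return w_desc * desc_similarity + w_sum * summary_similarity
```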
It will be appreciated that in some embodiments, the step of performing a mapping operation according to the plurality of historical dialog audio data to form a third local knowledge-graph for reflecting the correlation between the plurality of audio summary data may further include the sub-steps described as follows:
determining, based on each piece of historical dialog audio data, at least two third audio summary data having a correlation relationship among the plurality of audio summary data; illustratively, in the process of determining the audio summary data, a plurality of different audio summary data may be configured for one piece of historical dialog audio data, for example, different audio summary data may be formed through configuration operations performed by different management users, or the device may perform summary data extraction on the historical dialog audio data based on different strategies to form different audio summary data, so that two third audio summary data corresponding to the same historical dialog audio data may be determined as at least two third audio summary data having a correlation relationship; alternatively, in other embodiments, the audio summary data respectively corresponding to two historical dialog audio data of the same virtual digital person may be determined as at least two third audio summary data having a correlation relationship;
performing a connection operation on the at least two third audio summary data based on a fifth graph line, that is, configuring the fifth graph line between the graph members corresponding to the two third audio summary data, and configuring an importance parameter for the fifth graph line based on a co-occurrence parameter of the at least two third audio summary data, so as to form a fifth importance-carrying graph line; for example, the number occupation ratio of the two third audio summary data serving as audio summary data of one and the same historical dialog audio data may first be determined, and the corresponding importance parameter may then be determined based on the number occupation ratio, the importance parameter having a positive correlation correspondence with the number occupation ratio; the number occupation ratio may be equal to the ratio between a first numerical value and a second numerical value, where the first numerical value may refer to the number of historical dialog audio data simultaneously having both of the two third audio summary data, and the second numerical value may refer to the number of historical dialog audio data having at least one of the two third audio summary data; alternatively, the corresponding importance parameter may be determined based on an average value;
and determining the corresponding third local knowledge graph based on the fifth importance-carrying graph line.
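The number occupation ratio described above is simply the co-occurrence count over the union count (a Jaccard-style ratio). A minimal sketch, names illustrative:

```python
def fifth_line_importance(n_both, n_either):
    """Number occupation ratio for a fifth graph line: historical dialog
    audio data having BOTH third audio summary data (first numerical value)
    over those having AT LEAST ONE of them (second numerical value)."""
    return n_both / n_either if n_either else 0.0
```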
It will be appreciated that in some embodiments, the step S130 described above may further include the sub-steps described below:
sequentially traversing the plurality of graph members included in the data distribution knowledge graph, taking each graph member as an initial graph member, so as to form a graph member sampling link corresponding to each of the plurality of graph members, wherein a graph member is any one of the plurality of historical dialog audio data, the plurality of audio summary data and the plurality of virtual digital person description data, and the graph member sampling link may be used to reflect the other graph members having correlation relationships with that graph member; in addition, in the process of performing the graph member sampling operation, the corresponding graph lines are used as the traversal paths, and the probability of traversing to the next graph member may have a positive correlation correspondence with the importance parameter of the corresponding graph line;
performing a feature mining operation on the graph member sampling links to output a graph member description vector corresponding to each graph member in the plurality of graph members, for example, for each graph member, performing the feature mining operation on the graph member sampling link corresponding to that graph member to form its graph member description vector;
and determining, from the graph member description vectors respectively corresponding to the plurality of graph members, the summary data description vectors respectively corresponding to the audio summary data, that is, determining the graph member description vector of the graph member corresponding to a piece of audio summary data as the summary data description vector of that audio summary data.
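The importance-weighted traversal in the sub-steps above reads like a weighted random walk over the graph, where the transition probability to a neighbour is proportional to the importance parameter of the connecting graph line. A minimal sketch under that assumption (representation and names illustrative):

```python
import random

def sample_link(graph, start, steps, rng=None):
    """Sample a graph member sampling link starting from one graph member.

    `graph` maps each member to {neighbour: importance}; the probability of
    traversing to a neighbour is proportional to the graph line's importance
    parameter, as described in the sub-steps above."""
    rng = rng or random.Random(0)  # fixed seed only for reproducibility here
    link, node = [start], start
    for _ in range(steps):
        neighbours = graph.get(node)
        if not neighbours:  # dead end: no graph lines to traverse
            break
        nodes = list(neighbours)
        weights = [neighbours[n] for n in nodes]
        node = rng.choices(nodes, weights=weights, k=1)[0]
        link.append(node)
    return link
```

Each resulting link is then fed to the feature mining network to obtain the graph member description vector.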
It will be appreciated that, in some embodiments, the step of performing a feature mining operation on the graph member sampling links to output a graph member description vector corresponding to each of the plurality of graph members may further include the following sub-steps:
performing a feature mining operation on the graph member sampling links to form mining feature description vectors corresponding to the plurality of graph members, wherein the feature mining operation may be realized through a corresponding feature mining neural network, and the feature mining neural network may be a convolutional neural network, so that the feature mining operation is realized by performing convolution operations and the graph member sampling links can be represented in vector form;
performing a feature mining operation on the data screened from the audio summary data among the plurality of graph members to form corresponding summary feature description vectors, and performing a feature mining operation on the data screened from the key audio frames of the historical dialog audio data among the plurality of graph members to form corresponding key feature description vectors, wherein the key audio frames may be obtained through a key frame identification operation, or the first audio frame may be taken as the key audio frame, and the key frame identification operation may be realized through a corresponding neural network formed by training; in addition, in the process of forming the summary feature description vectors, key data may be extracted from the audio summary data; for example, where the audio summary data is text data, keywords may be extracted from it, and the feature mining operation may then be performed on the extracted keywords to form the corresponding summary feature description vector;
performing a superposition operation, such as weighted superposition, on the mining feature description vector and the summary feature description vector of each piece of audio summary data among the plurality of graph members, to form the graph member description vector of the audio summary data among the plurality of graph members;
performing a superposition operation, such as weighted superposition, on the mining feature description vector and the key feature description vector of each piece of historical dialog audio data among the plurality of graph members, to form the graph member description vector of the historical dialog audio data among the plurality of graph members;
and, for the graph members other than the audio summary data and the historical dialog audio data, that is, the virtual digital person description data, marking their mining feature description vectors directly as the graph member description vectors of the virtual digital person description data among the plurality of graph members.
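The weighted superposition mentioned above is an element-wise weighted sum of two equal-length vectors; a minimal sketch with illustrative equal weights:

```python
def superpose(mined_vec, extra_vec, w_mined=0.5, w_extra=0.5):
    """Weighted superposition of a mining feature description vector with a
    summary (or key) feature description vector (weights illustrative)."""
    return [w_mined * m + w_extra * e for m, e in zip(mined_vec, extra_vec)]
```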
It will be appreciated that in some embodiments, the step S150 described above may further include the sub-steps described below:
determining, based on received dialog anomaly request information, one corresponding piece of historical dialog audio data among the plurality of historical dialog audio data, and marking it as the dialog audio data to be processed; for example, the dialog anomaly request information may be parsed, and the dialog audio data to be processed may be determined according to the parsing result; illustratively, the dialog anomaly request information may be identification data of historical dialog audio data marked as suspected anomalous during manual auditing, or may be data reported by other users for anomalies in the historical dialog audio data;
determining, based on the data association relationship information among the plurality of audio summary data, each piece of audio summary data associated with the audio summary data corresponding to the dialog audio data to be processed, wherein the data association relationship information between associated audio summary data satisfies a pre-configured association relationship condition; for example, each piece of audio summary data whose association degree, as represented by the data association relationship information, is greater than a preset association degree may be screened out;
and marking, among the plurality of historical dialog audio data, the historical dialog audio data corresponding to each piece of audio summary data associated with the audio summary data corresponding to the dialog audio data to be processed, as the expanded dialog audio data corresponding to the dialog audio data to be processed.
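The screening condition above — association degree greater than a preset threshold — can be sketched as follows, assuming a one-to-one pairing between summary data and historical dialog audio data and an association matrix as in the earlier comparison analysis (names and threshold illustrative):

```python
def expand_dialog_audio(pending_idx, association, history_ids, threshold=0.7):
    """Return expanded dialog audio data: the historical dialog audio data
    whose summary data's association degree with the pending item exceeds
    the preset association degree (threshold value illustrative)."""
    return [history_ids[j]
            for j, degree in enumerate(association[pending_idx])
            if j != pending_idx and degree > threshold]
```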
It will be appreciated that in some embodiments, the step S160 described above may further include the sub-steps described below:
performing a feature mining operation on the dialog audio data to be processed to form a first audio feature description vector corresponding to it, and performing a feature mining operation on each piece of expanded dialog audio data to form a second audio feature description vector corresponding to it, that is, representing the features or key information in the dialog audio data to be processed and the expanded dialog audio data in vector form;
performing a focusing feature analysis operation on the first audio feature description vector based on the second audio feature description vectors to form corresponding focused audio feature description vectors, wherein the number of focused audio feature description vectors is equal to the number of second audio feature description vectors; that is, for each second audio feature description vector, the focusing feature analysis operation may be performed on the first audio feature description vector based on that second audio feature description vector, to form the focused audio feature description vector corresponding to it. The focusing feature analysis operation may refer to: performing mapping operations on the second audio feature description vector based on a first mapping matrix and a second mapping matrix formed in advance through a training operation of a neural network, to form a first mapping vector and a second mapping vector; performing a mapping operation on the first audio feature description vector based on a third mapping matrix formed in advance through a training operation of the neural network, to form a third mapping vector; calculating the product between the third mapping vector and the transpose of the first mapping vector to determine a similarity coefficient; and weighting the second mapping vector based on the similarity coefficient to obtain the corresponding focused audio feature description vector;
aggregating the first audio feature description vector and each of the focused audio feature description vectors to form a corresponding aggregated audio feature description vector, for example, stacking the first audio feature description vector and each focused audio feature description vector;
and evaluating, based on the aggregated audio feature description vector, the audio session anomaly information corresponding to the dialog audio data to be processed; for example, anomaly evaluation may be performed on the aggregated audio feature description vector based on a trained neural network, so as to obtain the corresponding audio session anomaly information, such as whether an anomaly exists, the degree value of the anomaly, or the type of the anomaly; and performing, based on the audio session anomaly information, an audio session monitoring operation on the virtual digital person corresponding to the dialog audio data to be processed, wherein the audio session monitoring operation includes at least increasing the anomaly monitoring frequency of the audio session operations of that virtual digital person, or decreasing the operation frequency of the audio session operations of that virtual digital person (i.e., decreasing the dialog frequency of the virtual digital person); for example, the higher the anomaly degree represented by the audio session anomaly information, the higher the anomaly monitoring frequency and the lower the operation frequency may be.
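The focusing feature analysis in the steps above reads like query-key-value attention: the second vector is mapped to a key and a value, the first vector to a query, and the query-key product gives the similarity coefficient that weights the value. A minimal single-vector sketch under that interpretation — the mapping matrices here are illustrative placeholders for the trained ones:

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def focus(first_vec, second_vec, Wk, Wv, Wq):
    """Focusing feature analysis sketch: Wk/Wv map the second audio feature
    description vector to the first and second mapping vectors (key, value);
    Wq maps the first audio feature description vector to the third mapping
    vector (query). The scaled query-key product is the similarity
    coefficient, which weights the value to give the focused vector."""
    key = matvec(Wk, second_vec)
    value = matvec(Wv, second_vec)
    query = matvec(Wq, first_vec)
    score = sum(q * k for q, k in zip(query, key)) / math.sqrt(len(key))
    return [score * v for v in value]
```

The 1/sqrt(d) scaling is a common stabilization choice and is an assumption here, not something the patent specifies.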
Referring to fig. 3, an embodiment of the invention further provides an AI dialogue processing device based on virtual digital image interaction, which may be applied to the digitizing system. The AI dialogue processing device based on virtual digital image interaction may comprise:
a data extraction module, configured to extract a plurality of historical dialog audio data, a plurality of audio summary data and a plurality of virtual digital person description data, wherein the historical dialog audio data is formed by audio session operations historically performed on the basis of the corresponding virtual digital person, the audio summary data is used to provide a summary description of the corresponding historical dialog audio data, and the virtual digital person description data is used to provide an attribute description of the corresponding virtual digital person;
a knowledge graph determining module, configured to determine a corresponding data distribution knowledge graph according to the plurality of historical dialog audio data, the plurality of audio summary data and the plurality of virtual digital person description data, wherein the data distribution knowledge graph is used to reflect the correlations among the historical dialog audio data, the audio summary data and the virtual digital person description data, the correlations among the historical dialog audio data themselves, and the correlations among the audio summary data themselves;
a data analysis module, configured to perform a data analysis operation on the graph members of the data distribution knowledge graph and determine a summary data description vector for each piece of audio summary data in the plurality of audio summary data;
a comparison analysis module, configured to perform a comparison analysis operation on the summary data description vectors so as to analyze the data association relationship information among the plurality of audio summary data;
an association expansion module, configured to determine one piece of historical dialog audio data from the plurality of historical dialog audio data and mark it as the dialog audio data to be processed, and to perform an association expansion operation on the dialog audio data to be processed among the plurality of historical dialog audio data based on the data association relationship information between the audio summary data corresponding to the dialog audio data to be processed and the plurality of audio summary data, so as to obtain each piece of expanded dialog audio data corresponding to the dialog audio data to be processed;
and an audio session management module, configured to perform the audio session management operation of the virtual digital person based on the dialog audio data to be processed and the expanded dialog audio data.
In summary, the AI dialogue processing method and digitizing system based on virtual digital image interaction provided by the invention may first extract a plurality of historical dialog audio data, a plurality of audio summary data and a plurality of virtual digital person description data; determine a corresponding data distribution knowledge graph according to the plurality of historical dialog audio data, the plurality of audio summary data and the plurality of virtual digital person description data; perform a data analysis operation on the graph members of the data distribution knowledge graph to determine a summary data description vector for each piece of audio summary data; perform a comparison analysis operation on the summary data description vectors to analyze the data association relationship information among the plurality of audio summary data; determine one piece of historical dialog audio data from the plurality of historical dialog audio data, mark it as the dialog audio data to be processed, and perform an association expansion operation on it among the plurality of historical dialog audio data, based on the data association relationship information between the audio summary data corresponding to the dialog audio data to be processed and the plurality of audio summary data, so as to obtain each piece of expanded dialog audio data corresponding to it; and perform the audio session management operation of the virtual digital person based on the dialog audio data to be processed and the expanded dialog audio data.
Based on the above, before the audio session management operation of the virtual digital person is performed, the association expansion operation is performed on the dialog audio data to be processed to obtain each piece of expanded dialog audio data corresponding to it. Thus, in the course of the audio session management operation, the operation relies not only on the dialog audio data to be processed but also on the corresponding expanded dialog audio data, so its basis is more sufficient. The reliability of the dialog processing can therefore be improved to a certain extent, which alleviates the problem of low reliability in the prior art.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (10)

1. An AI dialogue processing method based on virtual digital image interaction, which is characterized by comprising the following steps:
extracting a plurality of historical dialogue audio data, a plurality of audio summary data and a plurality of virtual digital person description data, wherein each piece of historical dialogue audio data is formed by an audio session historically conducted with the corresponding virtual digital person, each piece of audio summary data provides a summary description of the corresponding historical dialogue audio data, and each piece of virtual digital person description data provides an attribute description of the corresponding virtual digital person;
determining a corresponding data distribution knowledge graph according to the plurality of historical dialogue audio data, the plurality of audio summary data and the plurality of virtual digital person description data, wherein the data distribution knowledge graph reflects the correlations among the plurality of historical dialogue audio data, the plurality of audio summary data and the plurality of virtual digital person description data;
performing a data analysis operation on the graph members of the data distribution knowledge graph to determine a summary data description vector for each piece of audio summary data in the plurality of audio summary data;
performing a comparison analysis operation on the summary data description vectors to analyze the data association relation information among the plurality of audio summary data;
determining one piece of historical dialogue audio data from the plurality of historical dialogue audio data and marking it as the dialogue audio data to be processed, and performing an association expansion operation on the dialogue audio data to be processed within the plurality of historical dialogue audio data, based on the data association relation information between the audio summary data corresponding to the dialogue audio data to be processed and the plurality of audio summary data, so as to obtain each piece of expanded dialogue audio data corresponding to the dialogue audio data to be processed;
and performing an audio session management operation of the virtual digital person based on the dialogue audio data to be processed and the expanded dialogue audio data.
2. The AI dialogue processing method of claim 1, wherein the step of determining the corresponding data distribution knowledge graph according to the plurality of historical dialogue audio data, the plurality of audio summary data and the plurality of virtual digital person description data comprises:
performing a mapping operation based on the correlations between the plurality of audio summary data and each of the plurality of historical dialogue audio data and the plurality of virtual digital person description data, so as to form a corresponding first local knowledge graph;
performing a mapping operation according to the plurality of virtual digital person description data and the plurality of audio summary data, so as to form a second local knowledge graph reflecting the correlations among the plurality of historical dialogue audio data;
performing a mapping operation according to the plurality of historical dialogue audio data, so as to form a third local knowledge graph reflecting the correlations among the plurality of audio summary data;
and merging the first local knowledge graph, the second local knowledge graph and the third local knowledge graph to form a corresponding data distribution knowledge graph.
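The final merging step of this claim can be pictured, purely as an illustration, by combining three partial edge sets into one graph. The max-merge rule for a map line that appears in more than one partial graph is an assumption made for this sketch; the patent does not specify how conflicting importance parameters are resolved.

```python
def merge_graphs(*partial_graphs):
    """Merge partial knowledge graphs given as {(node_a, node_b): importance} dicts,
    keeping the larger importance when the same map line appears twice."""
    merged = {}
    for graph in partial_graphs:
        for edge, importance in graph.items():
            merged[edge] = max(merged.get(edge, float("-inf")), importance)
    return merged

# hypothetical partial graphs over persona / dialogue / summary nodes
first_local  = {("persona1", "summary1"): 3.0, ("dialog1", "summary1"): 1.0}
second_local = {("dialog1", "dialog2"): 2.0}
third_local  = {("summary1", "summary2"): 1.0, ("dialog1", "summary1"): 0.5}

data_distribution_graph = merge_graphs(first_local, second_local, third_local)
print(data_distribution_graph[("dialog1", "summary1")])  # prints 1.0
```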
3. The AI dialogue processing method of claim 2, wherein the step of performing a mapping operation based on the correlations between the plurality of audio summary data and each of the plurality of historical dialogue audio data and the plurality of virtual digital person description data, so as to form a corresponding first local knowledge graph, comprises:
determining, among the plurality of audio summary data, the audio summary data correlated with each piece of virtual digital person description data, and determining, among the plurality of audio summary data, the audio summary data correlated with each piece of historical dialogue audio data;
connecting each piece of virtual digital person description data with its correlated audio summary data via a first map line, and configuring an importance parameter for the first map line based on the data repetition count of the correlated audio summary data, so as to form a corresponding first importance-carrying map line;
connecting each piece of historical dialogue audio data with its correlated audio summary data via a second map line, and marking a first characterization parameter of a predetermined reference importance as the importance parameter of the second map line, thereby forming a second importance-carrying map line;
and combining the first importance-carrying map lines and the second importance-carrying map lines to form a corresponding first local knowledge graph.
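The two kinds of map lines in this claim could be realized, for instance, as below: persona-to-summary edges weighted by a repetition count, and dialogue-to-summary edges carrying a fixed reference importance. The weighting scheme and all identifiers are illustrative assumptions only.

```python
def build_first_local_graph(persona_to_summaries, dialog_to_summary,
                            repetition_count, reference_importance=1.0):
    edges = {}
    # first map lines: virtual-digital-person description <-> correlated summary,
    # importance taken from the summary's data repetition count
    for persona, summaries in persona_to_summaries.items():
        for summary in summaries:
            edges[(persona, summary)] = repetition_count.get(summary, 1)
    # second map lines: historical dialogue <-> its summary,
    # importance fixed to the predetermined reference importance
    for dialog, summary in dialog_to_summary.items():
        edges[(dialog, summary)] = reference_importance
    return edges

graph = build_first_local_graph(
    {"persona1": ["summary1", "summary2"]},
    {"dialog1": "summary1"},
    {"summary1": 4, "summary2": 2},
)
print(graph[("persona1", "summary1")])  # prints 4
```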
4. The AI dialogue processing method of claim 2, wherein the step of performing a mapping operation according to the plurality of virtual digital person description data and the plurality of audio summary data, so as to form a second local knowledge graph reflecting the correlations among the plurality of historical dialogue audio data, comprises:
determining, based on each piece of virtual digital person description data, at least two correlated first historical dialogue audio data among the plurality of historical dialogue audio data, and determining, based on each piece of audio summary data, at least two correlated second historical dialogue audio data among the plurality of historical dialogue audio data;
connecting the at least two first historical dialogue audio data via a third map line, and configuring an importance parameter for the third map line based on a second characterization parameter of the predetermined reference importance and the audio summary data respectively corresponding to the at least two first historical dialogue audio data, so as to form a third importance-carrying map line;
connecting the at least two second historical dialogue audio data via a fourth map line, and configuring an importance parameter for the fourth map line based on the virtual digital person description data and the audio summary data respectively corresponding to the at least two second historical dialogue audio data, so as to form a fourth importance-carrying map line;
and determining a corresponding second local knowledge graph based on the third importance-carrying map lines and the fourth importance-carrying map lines.
5. The AI dialogue processing method based on virtual digital image interaction of claim 2, wherein the step of performing a mapping operation according to the plurality of historical dialogue audio data, so as to form a third local knowledge graph reflecting the correlations among the plurality of audio summary data, comprises:
determining, based on each piece of historical dialogue audio data, at least two correlated third audio summary data among the plurality of audio summary data;
connecting the at least two third audio summary data via a fifth map line, and configuring an importance parameter for the fifth map line based on the co-occurrence parameters of the at least two third audio summary data, so as to form a fifth importance-carrying map line;
and determining a corresponding third local knowledge graph based on the fifth importance-carrying map lines.
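One plausible reading of the co-occurrence parameter in this claim is the number of historical dialogues in which two summaries appear together. A sketch under that assumption (function and variable names are hypothetical):

```python
from collections import Counter
from itertools import combinations

def build_third_local_graph(dialog_to_summaries):
    """Fifth map lines: connect summaries that share a dialogue,
    weighted by how many dialogues they co-occur in."""
    cooccurrence = Counter()
    for summaries in dialog_to_summaries.values():
        for a, b in combinations(sorted(set(summaries)), 2):
            cooccurrence[(a, b)] += 1
    return dict(cooccurrence)

graph = build_third_local_graph({
    "dialog1": ["s1", "s2"],
    "dialog2": ["s1", "s2", "s3"],
})
print(graph[("s1", "s2")])  # prints 2
```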
6. The AI dialogue processing method based on virtual digital image interaction of claim 1, wherein the step of performing a data analysis operation on the graph members of the data distribution knowledge graph, so as to determine a summary data description vector for each piece of audio summary data in the plurality of audio summary data, comprises:
sequentially marking each of the plurality of graph members included in the data distribution knowledge graph as an initial graph member, and performing a sampling operation on the graph members, thereby forming a graph member sampling link corresponding to each of the plurality of graph members, wherein each graph member belongs to one of the plurality of historical dialogue audio data, the plurality of audio summary data and the plurality of virtual digital person description data;
performing a feature mining operation on each graph member sampling link to output a graph member description vector corresponding to each of the plurality of graph members;
and determining, from the graph member description vectors respectively corresponding to the plurality of graph members, the summary data description vectors corresponding to the plurality of audio summary data.
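The sampling-link-plus-feature-mining pipeline in this claim resembles random-walk graph embedding schemes such as DeepWalk. The sketch below substitutes a much simpler co-visit count for the learned embedding; everything here (names, walk length, walk count) is an illustrative stand-in, not the claimed feature mining operation.

```python
import random

def sample_link(graph, start, length, rng):
    """A graph member sampling link: a fixed-length walk starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph.get(walk[-1], [])
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk

def description_vectors(graph, nodes, length=4, n_walks=20, seed=0):
    """Crude graph member description vectors: co-visit counts over sampled links."""
    rng = random.Random(seed)
    index = {node: i for i, node in enumerate(nodes)}
    vectors = {node: [0] * len(nodes) for node in nodes}
    for start in nodes:
        for _ in range(n_walks):
            for visited in sample_link(graph, start, length, rng):
                vectors[start][index[visited]] += 1
    return vectors

graph = {"dialog1": ["summary1"], "summary1": ["dialog1", "persona1"],
         "persona1": ["summary1"]}
vecs = description_vectors(graph, ["dialog1", "summary1", "persona1"])
```

The summary data description vectors would then be the subset of these vectors whose graph members are audio summary data.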
7. The AI dialogue processing method based on virtual digital image interaction of claim 1, wherein the step of extracting the plurality of historical dialogue audio data, the plurality of audio summary data and the plurality of virtual digital person description data comprises:
extracting historical interaction session data, and determining a first historical dialogue audio data set, a first audio summary data set and a first virtual digital person description data set based on the historical interaction session data;
screening the first audio summary data in the first audio summary data set, and marking, after the screening operation, the first audio summary data remaining in the first audio summary data set, excluding the first audio summary data whose number of corresponding virtual digital person description data is smaller than a preset first reference number, so as to obtain the plurality of audio summary data;
sequentially marking first virtual digital person description data, second virtual digital person description data and third virtual digital person description data in the first virtual digital person description data set, wherein the first virtual digital person description data is virtual digital person description data whose number of corresponding historical dialogue audio data is smaller than a preset second reference number, the second virtual digital person description data is virtual digital person description data whose number of data types of corresponding historical dialogue audio data exceeds a preset third reference number, and the third virtual digital person description data is virtual digital person description data whose number of corresponding audio summary data exceeds a preset fourth reference number;
obtaining the plurality of virtual digital person description data based on the virtual digital person description data in the first virtual digital person description data set other than the first, second and third virtual digital person description data;
and obtaining the plurality of historical dialogue audio data based on the historical dialogue audio data in the first historical dialogue audio data set other than the historical dialogue audio data corresponding to the first, second and third virtual digital person description data.
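The three screening rules of this claim amount to simple count thresholds over per-persona statistics. A sketch with hypothetical names and threshold values (the patent leaves the actual reference numbers unspecified):

```python
def filter_personas(persona_to_dialogs, persona_to_dialog_types, persona_to_summaries,
                    min_dialogs=2, max_dialog_types=3, max_summaries=5):
    """Drop personas that match any of the three exclusion rules of claim 7."""
    kept = []
    for persona, dialogs in persona_to_dialogs.items():
        too_few_dialogs = len(dialogs) < min_dialogs                               # first rule
        too_many_types = len(persona_to_dialog_types[persona]) > max_dialog_types  # second rule
        too_many_summaries = len(persona_to_summaries[persona]) > max_summaries    # third rule
        if not (too_few_dialogs or too_many_types or too_many_summaries):
            kept.append(persona)
    return kept

kept = filter_personas(
    {"p1": ["d1", "d2"], "p2": ["d3"]},
    {"p1": {"voice"}, "p2": {"voice"}},
    {"p1": ["s1"], "p2": ["s2"]},
)
print(kept)  # prints ['p1']  (p2 is excluded: too few dialogues)
```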
8. The AI dialogue processing method of any one of claims 1-7, wherein the step of determining one piece of historical dialogue audio data among the plurality of historical dialogue audio data, marking it as the dialogue audio data to be processed, and performing an association expansion operation on the dialogue audio data to be processed within the plurality of historical dialogue audio data, based on the data association relation information between the audio summary data corresponding to the dialogue audio data to be processed and the plurality of audio summary data, so as to obtain each piece of expanded dialogue audio data corresponding to the dialogue audio data to be processed, comprises:
determining, based on received dialogue abnormality request information, one corresponding piece of historical dialogue audio data among the plurality of historical dialogue audio data, and marking it as the dialogue audio data to be processed;
determining, based on the data association relation information among the plurality of audio summary data, each piece of audio summary data associated with the audio summary data corresponding to the dialogue audio data to be processed, wherein the data association relation information between associated audio summary data satisfies a preset association relation condition;
and marking, among the plurality of historical dialogue audio data, the historical dialogue audio data corresponding to each piece of audio summary data associated with the audio summary data corresponding to the dialogue audio data to be processed, as the expanded dialogue audio data corresponding to the dialogue audio data to be processed.
9. The AI dialogue processing method based on virtual digital image interaction of any one of claims 1-7, wherein the step of performing an audio session management operation of the virtual digital person based on the dialogue audio data to be processed and the expanded dialogue audio data comprises:
performing a feature mining operation on the dialogue audio data to be processed to form a first audio feature description vector corresponding to the dialogue audio data to be processed, and performing a feature mining operation on each piece of expanded dialogue audio data to form a second audio feature description vector corresponding to that expanded dialogue audio data;
performing a focusing feature analysis operation on the first audio feature description vector based on the second audio feature description vectors to form corresponding focused audio feature description vectors, wherein the number of focused audio feature description vectors equals the number of second audio feature description vectors;
performing an aggregation operation on the first audio feature description vector and each focused audio feature description vector to form a corresponding aggregated audio feature description vector;
and evaluating, based on the aggregated audio feature description vector, the audio session abnormality information corresponding to the dialogue audio data to be processed, and performing, based on the audio session abnormality information, an audio session monitoring operation on the virtual digital person corresponding to the dialogue audio data to be processed, wherein the audio session monitoring operation at least comprises increasing the abnormality monitoring frequency of the audio session operation of that virtual digital person, or reducing the operation frequency of the audio session operation of that virtual digital person.
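The focusing feature analysis of this claim reads like an attention mechanism: one focused vector per expanded-dialogue vector, then pooled with the original. The gate-by-dot-product form below is only one plausible instantiation chosen for brevity; the patent does not disclose the actual operation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def focused_vectors(first_vec, second_vecs):
    """One focused audio feature vector per second (expanded-dialogue) vector,
    gated by its similarity to the first vector."""
    return [sigmoid(first_vec @ v) * v for v in second_vecs]

def aggregate(first_vec, second_vecs):
    """Pool the first vector with every focused vector into one description."""
    return np.mean([first_vec] + focused_vectors(first_vec, second_vecs), axis=0)

first = np.array([1.0, 0.0])                       # dialogue to be processed
seconds = [np.array([0.5, 0.5]), np.array([0.0, 1.0])]  # expanded dialogues
print(len(focused_vectors(first, seconds)))  # prints 2
print(aggregate(first, seconds).shape)       # prints (2,)
```

An anomaly score for the monitoring step could then be computed from the aggregated vector, e.g. by a downstream classifier.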
10. A digitizing system comprising a processor and a memory for storing a computer program, the processor being adapted to execute the computer program to implement the method of any of claims 1-9.
CN202310500502.XA 2023-05-06 2023-05-06 AI dialogue processing method based on virtual digital image interaction and digitizing system Active CN116501285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310500502.XA CN116501285B (en) 2023-05-06 2023-05-06 AI dialogue processing method based on virtual digital image interaction and digitizing system


Publications (2)

Publication Number Publication Date
CN116501285A CN116501285A (en) 2023-07-28
CN116501285B true CN116501285B (en) 2024-01-05

Family

ID=87326302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310500502.XA Active CN116501285B (en) 2023-05-06 2023-05-06 AI dialogue processing method based on virtual digital image interaction and digitizing system

Country Status (1)

Country Link
CN (1) CN116501285B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046238A (en) * 2019-03-29 2019-07-23 华为技术有限公司 Talk with exchange method, graphic user interface, terminal device and the network equipment
CN115577037A (en) * 2022-10-17 2023-01-06 广州氪洛维生物科技有限公司 Data visualization interactive processing method and system based on artificial intelligence
WO2023031938A1 (en) * 2021-09-05 2023-03-09 Velotix Ltd System and method for managing data access requests



Similar Documents

Publication Publication Date Title
CN112884092B (en) AI model generation method, electronic device, and storage medium
CN110888911A (en) Sample data processing method and device, computer equipment and storage medium
CN114328106A (en) Log data processing method, device, equipment and storage medium
CN113110961B (en) Equipment abnormality detection method and device, computer equipment and readable storage medium
CN116310914B (en) Unmanned aerial vehicle monitoring method and system based on artificial intelligence
CN114151293A (en) Fault early warning method, system, equipment and storage medium of fan variable pitch system
CN116501285B (en) AI dialogue processing method based on virtual digital image interaction and digitizing system
CN116681350A (en) Intelligent factory fault detection method and system
CN116186221A (en) Big data analysis method and system applied to online dialogue platform
CN115687674A (en) Big data demand analysis method and system serving smart cloud service platform
CN110210048B (en) Method and device for establishing patrol analysis model
CN116680323B (en) User demand mining method and system based on big data security platform
CN115687792B (en) Big data acquisition method and system for online internet service
CN117349531A (en) User information recommendation method and system based on smart home
CN117041121B (en) Internet of Things anomaly monitoring method and system based on data mining
CN117422302A (en) Information prediction method and system based on wind control model
CN117236617B (en) Enterprise business management method and system
CN115906170B (en) Security protection method and AI system applied to storage cluster
CN116186402A (en) Product public opinion analysis method and system for big data platform
CN116662574A (en) Big data acquisition method and system for anti-fraud AI prediction model
CN117315423A (en) User early warning method and system based on intelligent building
CN117216587A (en) User matching method and system based on smart home
CN117220939A (en) Security monitoring method and system for Internet equipment
CN117058084A (en) Rail damage detection method and system
CN117148732A (en) Equipment management method and system for smart home

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231025

Address after: 250000 No. 80-1 Huayuan Road, Licheng District, Jinan City, Shandong Province

Applicant after: Li Chunzhi

Address before: No.1 Huaxin Road, Licheng District, Jinan City, Shandong Province, 250000

Applicant before: Shandong Xingfu Information Technology Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20231215

Address after: Room 120, 1st Floor, Building 8, No. 7 Sanjianfang Nanli, Chaoyang District, Beijing, 100000

Applicant after: Zhuyu Future Technology (Beijing) Co.,Ltd.

Address before: 250000 No. 80-1 Huayuan Road, Licheng District, Jinan City, Shandong Province

Applicant before: Li Chunzhi

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240313

Address after: Room 2, Room 901, West District, 9th Floor, No. 100 Balizhuang West Lane, Chaoyang District, Beijing, 100000

Patentee after: Juyin Digital Media (Beijing) Co.,Ltd.

Country or region after: China

Address before: Room 120, 1st Floor, Building 8, No. 7 Sanjianfang Nanli, Chaoyang District, Beijing, 100000

Patentee before: Zhuyu Future Technology (Beijing) Co.,Ltd.

Country or region before: China