CN114974253A - Natural language interpretation method and device based on character image and storage medium - Google Patents
- Publication number
- CN114974253A (application number CN202210553460.1A)
- Authority
- CN
- China
- Prior art keywords
- interaction
- user
- data
- behavior
- interpretation model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
- G10L17/00—Speaker identification or verification techniques
Abstract
Description
Technical Field
The present invention relates to the technical field of speech recognition, and in particular to a natural language interpretation method, device and storage medium based on person portraits.
Background Art
With the continuous development of computer technology, human-computer interaction has become increasingly diverse and intelligent. At present, more and more interactive platforms adopt voice interaction, which improves users' interaction efficiency, makes interaction more engaging, and has become an important mode of human-computer interaction. For example, a self-service voice customer-service system first poses a question to the user by voice, and the user then answers by voice. Other examples include navigation and shopping systems that require users to issue voice commands to control the displayed content. In all of these scenarios, the user's speech must be recognized accurately in order to give correct feedback.
However, existing speech recognition and language interpretation methods are not very accurate, and they cannot perform targeted interpretation and recognition based on person portraits.
Summary of the Invention
Based on the above problems, the present invention proposes a natural language interpretation method, device and storage medium based on person portraits. By determining a first interaction behavior interpretation model corresponding to a first interaction scene, and using the first interaction behavior interpretation model to correct a first keyword in a first recognition result of first voice data, the method improves both the efficiency and the accuracy of speech recognition and language interpretation.
In view of this, one aspect of the present invention provides a natural language interpretation method based on person portraits, including:

receiving first voice data;

determining a first interaction scene to which the first voice data belongs;

selecting a first interaction behavior interpretation model corresponding to the first interaction scene;

performing speech recognition on the first voice data to obtain a first recognition result;

using the first interaction behavior interpretation model to correct first keywords in the first recognition result that satisfy a preset condition;

wherein the first interaction behavior interpretation model contains an association between interaction scene information and person portrait information.
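Taken together, the steps above form a simple pipeline. The following minimal Python sketch illustrates them; every name here (`scene_detector`, `model_registry`, `asr`, the stub interpretation model) is a hypothetical stand-in, not terminology from the patent:

```python
# Hypothetical sketch of the claimed method steps; component names are
# illustrative assumptions, not the patent's terminology.
class StubInterpretationModel:
    """Stands in for a first interaction behavior interpretation model."""
    def __init__(self, corrections):
        self.corrections = corrections  # persona-derived keyword fixes

    def correct(self, keyword):
        return self.corrections.get(keyword, keyword)

def interpret(voice_data, scene_detector, model_registry, asr, is_keyword):
    scene = scene_detector(voice_data)   # determine first interaction scene
    model = model_registry[scene]        # select matching interpretation model
    text = asr(voice_data)               # first recognition result
    for word in [w for w in text.split() if is_keyword(w)]:
        text = text.replace(word, model.correct(word))  # persona-based correction
    return text

result = interpret(
    voice_data=b"fake-audio",
    scene_detector=lambda v: "company_meeting",
    model_registry={"company_meeting": StubInterpretationModel({"quater": "quarter"})},
    asr=lambda v: "the quater report",
    is_keyword=lambda w: w == "quater",
)
```

The stubs stand in for whichever scene detector, model store and recognizer an implementation actually uses.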
Optionally, after the step of determining the first interaction scene to which the first voice data belongs, the method further includes:

according to the first interaction scene, prompting the user to make a first action and/or to speak first text data and/or to input first selection data;

collecting the first action data and/or the first text data and/or the first selection data;

extracting first key information from the first action data and/or the first text data and/or the first selection data.
Optionally, the step of selecting the first interaction behavior interpretation model corresponding to the first interaction scene includes:

based on a pre-established correspondence between key information and interaction behavior interpretation models, selecting the interaction behavior interpretation model that matches the first key information;

determining the interaction behavior interpretation model that matches the first key information to be the first interaction behavior interpretation model corresponding to the first interaction scene.
Optionally, the step of using the first interaction behavior interpretation model to correct the first keywords in the first recognition result that satisfy the preset condition includes:

extracting, from the first recognition result, the first keywords that satisfy the preset condition;

extracting a first person portrait from the first interaction behavior interpretation model;

correcting the first keywords according to the first person portrait.
Optionally, the step of determining the first interaction scene to which the first voice data belongs includes:

extracting first attribute information from the first voice data;

determining the first interaction scene according to the first attribute information.

Optionally, the first attribute information includes: the collection tool, collection method, collection time, collection location, number of persons, and semantic environment of the first voice data.
Optionally, after the step of using the first interaction behavior interpretation model to correct the first keywords in the first recognition result that satisfy the preset condition, the method further includes:

outputting a first correction result for the first keywords in the form of speech or text;

receiving the user's evaluation feedback on the correction result;

when the evaluation feedback is positive, raising the priority of the first interaction behavior interpretation model for the first interaction scene;

when the evaluation feedback is negative, lowering the priority of the first interaction behavior interpretation model for the first interaction scene.
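Under the assumption of a per-scene priority table keyed by scene and model identifier (a data layout the patent does not specify), the feedback step above might be sketched as:

```python
# Illustrative feedback handling: positive evaluation raises the model's
# priority for the scene, negative evaluation lowers it. The priority-table
# layout and model identifiers are assumptions.
priorities = {("company_meeting", "model_A"): 5}

def apply_feedback(priorities, scene, model_id, feedback):
    key = (scene, model_id)
    delta = 1 if feedback > 0 else -1 if feedback < 0 else 0
    priorities[key] = priorities.get(key, 0) + delta

apply_feedback(priorities, "company_meeting", "model_A", feedback=+1)  # raise
apply_feedback(priorities, "company_meeting", "model_B", feedback=-1)  # lower
```

A real system would presumably also bound or decay these scores, which the patent leaves open.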
Optionally, before the step of receiving the first voice data, the method further includes:

determining a relationship between a first user and a second user in a user group, and generating a first relationship label from the unique identity labels of the first user and the second user;

acquiring first interaction behavior data between the first user and the second user;

according to the first interaction behavior data and the first relationship label, constructing a first person portrait of the first user, a second person portrait of the second user, and an interaction behavior database between the first user and the second user;

repeating the above operations until person portraits have been established for all users in their different roles and interaction behavior databases have been established between the different person portraits, and extracting key information from the interaction behavior databases;

inputting the data in the interaction behavior databases into a trained neural network to obtain multiple interaction behavior interpretation models;

establishing the correspondence between the key information and the multiple interaction behavior interpretation models.
Another aspect of the present invention provides a natural language interpretation device based on person portraits, including: a voice receiving module, a processing module, a speech recognition module and a result correction module;

the voice receiving module is configured to receive first voice data;

the processing module is configured to determine a first interaction scene to which the first voice data belongs, and to select a first interaction behavior interpretation model corresponding to the first interaction scene;

the speech recognition module is configured to perform speech recognition on the first voice data to obtain a first recognition result;

the result correction module is configured to use the first interaction behavior interpretation model to correct first keywords in the first recognition result that satisfy a preset condition;

wherein the first interaction behavior interpretation model contains an association between interaction scene information and person portrait information.
A third aspect of the present invention provides a computer-readable storage medium storing at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by a processor to implement the natural language interpretation method based on person portraits described in any of the foregoing.
With the technical solution of the present invention, the natural language interpretation method based on person portraits includes: receiving first voice data; determining a first interaction scene to which the first voice data belongs; selecting a first interaction behavior interpretation model corresponding to the first interaction scene; performing speech recognition on the first voice data to obtain a first recognition result; and using the first interaction behavior interpretation model to correct first keywords in the first recognition result that satisfy a preset condition. By determining the first interaction behavior interpretation model corresponding to the first interaction scene, and using it to correct the first keywords in the first recognition result of the first voice data, the method improves both the efficiency and the accuracy of speech recognition and language interpretation.
Brief Description of the Drawings
Fig. 1 is a flowchart of a natural language interpretation method based on person portraits provided by an embodiment of the present invention;

Fig. 2 is a flowchart of the steps after determining the first interaction scene to which the first voice data belongs, in another embodiment of the present invention;

Fig. 3 is a flowchart of the step of selecting the first interaction behavior interpretation model corresponding to the first interaction scene, in another embodiment of the present invention;

Fig. 4 is a flowchart of the step of using the first interaction behavior interpretation model to correct the first keywords in the first recognition result that satisfy the preset condition, in another embodiment;

Fig. 5 is a flowchart of the steps after using the first interaction behavior interpretation model to correct the first keywords in the first recognition result that satisfy the preset condition, in another embodiment;

Fig. 6 is a flowchart of a method for constructing interaction behavior interpretation models in another embodiment;

Fig. 7 is a schematic block diagram of a natural language interpretation device based on person portraits provided by an embodiment of the present invention.
Detailed Description of the Embodiments
In order that the above objects, features and advantages of the present invention can be understood more clearly, the present invention is further described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with each other.
Many specific details are set forth in the following description to facilitate a full understanding of the present invention. However, the present invention can also be implemented in ways other than those described herein; therefore, the protection scope of the present invention is not limited by the specific embodiments disclosed below.
The terms "first", "second" and the like in the description, the claims and the above drawings of the present application are used to distinguish different objects, not to describe a particular order. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or other steps or units inherent to the process, method, product or device.
Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to a separate or alternative embodiment mutually exclusive of other embodiments. It is understood, explicitly and implicitly, by those skilled in the art that the embodiments described herein may be combined with other embodiments.
A natural language interpretation method, device and storage medium based on person portraits according to some embodiments of the present invention are described below with reference to Figs. 1 to 7.
As shown in Fig. 1, an embodiment of the present invention provides a natural language interpretation method based on person portraits, including:

receiving first voice data;

determining a first interaction scene to which the first voice data belongs;

selecting a first interaction behavior interpretation model corresponding to the first interaction scene;

performing speech recognition on the first voice data to obtain a first recognition result;

using the first interaction behavior interpretation model to correct first keywords in the first recognition result that satisfy a preset condition;

wherein the first interaction behavior interpretation model contains an association between interaction scene information and person portrait information.
It can be understood that the natural language interpretation method based on person portraits provided by the embodiments of the present invention can be applied to smart terminals such as smartphones, computers and smart TVs, and can also be used in intercom devices, robots, access control systems and the like.
In the embodiments of the present invention, the first voice data may be acquired through a voice collection unit (such as a microphone), or may be acquired from a server or a smart terminal through a communication network. While the first voice data is being collected, information about the scene in which the speech occurs is stored as first attribute information of the first voice data.
It should be noted that, after the first voice data is received, the first interaction scene to which it belongs can be determined from the first attribute information it carries. For example, if the first attribute information includes the collection location, the building corresponding to its coordinates (such as home, company or shopping mall) can be determined. Assuming the collection location is a company, this can be combined with other first attribute information, such as the collection time (e.g. Monday at 10 a.m.) and the number of persons (e.g. five persons, which can be determined from voiceprint features), to conclude that the interaction scene to which the first voice data belongs is a "company meeting". Depending on the actual application, interaction scenes may include, but are not limited to: family chat, work discussion, shopping, and gatherings with friends.
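The scene inference just described can be pictured as a rule lookup over the attribute information. The rules, thresholds and scene names below are illustrative assumptions, not the patent's actual logic:

```python
def infer_scene(attrs):
    """Toy rules mapping first attribute information to an interaction scene."""
    if attrs.get("location") == "company" and attrs.get("people", 0) >= 3:
        return "company_meeting"   # workplace + several voiceprints
    if attrs.get("location") == "home":
        return "family_chat"
    if attrs.get("location") == "mall":
        return "shopping"
    return "unknown"

scene = infer_scene({"location": "company", "time": "Mon 10:00", "people": 5})
```

A deployed system might instead learn this mapping, but the attribute-to-scene lookup is the essential shape.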
Further, the first interaction behavior interpretation model corresponding to the first interaction scene is selected. The first interaction behavior interpretation model contains an association between interaction scene information and person portrait information, so the first recognition result can be interpreted or corrected in a targeted manner by determining the corresponding person portrait information.
In the embodiments of the present invention, to perform speech recognition on the first voice data, a speech recognition module may segment the first voice data by voiceprint, by a preset time length, or by a preset file size. The resulting speech segments are queued in the chronological order in which the speech occurred, and a speech recognition algorithm converts each segment in the queue into corresponding text information. The text information is then fused in chronological order and adjusted according to context to obtain the first recognition result.
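The segment, queue, transcribe and fuse flow above can be sketched as follows; `transcribe` is a stand-in for whatever speech recognition call an implementation uses, and the segment fields are assumptions:

```python
def recognize(segments, transcribe):
    """Queue speech segments chronologically, transcribe each, fuse the text."""
    ordered = sorted(segments, key=lambda s: s["start"])      # chronological queue
    return " ".join(transcribe(s["audio"]) for s in ordered)  # fuse in order

# Segments arrive out of order; the fused text follows speech time.
first_result = recognize(
    [{"start": 2.0, "audio": "seg-b"}, {"start": 0.0, "audio": "seg-a"}],
    transcribe=lambda a: {"seg-a": "hello", "seg-b": "world"}[a],
)
```

The context-based adjustment mentioned in the text would happen as a further pass over the fused string.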
For the first keywords in the first recognition result that satisfy a preset condition (for example, whose frequency of occurrence and/or error-proneness falls within a preset range), such as terms with local characteristics, industry jargon or technical terms, the first interaction behavior interpretation model is used to interpret or correct them to obtain a first interpretation/correction result, for example by drawing on the person portrait information it contains and the person's characteristics.
With the technical solution of this embodiment, the natural language interpretation method based on person portraits includes: receiving first voice data; determining a first interaction scene to which the first voice data belongs; selecting a first interaction behavior interpretation model corresponding to the first interaction scene; performing speech recognition on the first voice data to obtain a first recognition result; and using the first interaction behavior interpretation model to correct first keywords in the first recognition result that satisfy a preset condition. By determining the first interaction behavior interpretation model corresponding to the first interaction scene, and using it to interpret/correct the first keywords in the first recognition result of the first voice data, the method improves both the efficiency and the accuracy of speech recognition and language interpretation.
As shown in Fig. 2, in some possible implementations of the present invention, after the step of determining the first interaction scene to which the first voice data belongs, the method further includes:

according to the first interaction scene, prompting the user to make a first action and/or to speak first text data and/or to input first selection data;

collecting the first action data and/or the first text data and/or the first selection data;

extracting first key information from the first action data and/or the first text data and/or the first selection data.
It should be noted that, in order to further clarify the characteristics of the user's interaction behavior in the first interaction scene so that the optimal interaction behavior interpretation model can be selected, in the embodiments of the present invention some interaction events are constructed according to the first interaction scene to obtain the user's interaction behavior in that scene. For example, an instruction is issued prompting the user to make a first action, and/or a piece of first text data is provided with an instruction prompting the user to read it aloud, and/or some options are provided with an instruction prompting the user to input first selection data. The first action data and/or the first text data and/or the first selection data are collected, and first key information is extracted from them. The first key information may be a specific action, a particular tone, a particular pronunciation or a particular preference, among others; the embodiments of the present invention place no restriction on this.
It can be understood that, based on the first interaction scene, the more interaction events are constructed and the more comprehensively the event types are covered, the more interaction behavior data is obtained, and the more accurately the interaction behavior interpretation model can subsequently be selected.
As shown in Fig. 3, in some possible implementations of the present invention, the step of selecting the first interaction behavior interpretation model corresponding to the first interaction scene includes:

based on a pre-established correspondence between key information and interaction behavior interpretation models, selecting the interaction behavior interpretation model that matches the first key information;

determining the interaction behavior interpretation model that matches the first key information to be the first interaction behavior interpretation model corresponding to the first interaction scene.
It can be understood that, in the embodiments of the present invention, the correspondence between key information and interaction behavior interpretation models is established by analyzing historical interaction behavior data from different interaction scenes and processing it with a neural network. As explained above for the first key information, key information may be a specific action, a particular tone, a particular pronunciation or a particular preference. Based on the correspondence between key information and interaction behavior interpretation models, the interaction behavior interpretation model that matches the first key information is selected as the first interaction behavior interpretation model corresponding to the first interaction scene. This scheme allows the first interaction behavior interpretation model to be selected quickly and accurately, improving execution efficiency and user experience.
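At its simplest, the pre-established correspondence is a lookup table from key information to a model identifier. The keys and model names below are invented for illustration:

```python
# Hypothetical key-information -> interpretation-model correspondence,
# standing in for whatever the neural-network analysis actually produces.
KEY_INFO_TO_MODEL = {
    "pronunciation:sichuan_accent": "model_sichuan",
    "preference:finance_jargon": "model_finance",
}

def select_model(first_key_info, default="model_general"):
    """Pick the interpretation model matching the first key information."""
    return KEY_INFO_TO_MODEL.get(first_key_info, default)
```

The fast selection the text claims follows from this being a constant-time lookup once the table is built.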
As shown in Fig. 4, in some possible implementations of the present invention, the step of using the first interaction behavior interpretation model to correct the first keywords in the first recognition result that satisfy the preset condition includes:

extracting, from the first recognition result, the first keywords that satisfy the preset condition;

extracting a first person portrait from the first interaction behavior interpretation model;

correcting the first keywords according to the first person portrait.
It can be understood that, in the embodiments of the present invention, the first recognition result may be text information. The first keywords that satisfy a preset condition (for example, whose frequency of occurrence and/or error-proneness falls within a preset range), such as terms with local characteristics, industry jargon or technical terms, are extracted from the first recognition result. A first person portrait is then extracted from the first interaction behavior interpretation model, and the person tags it contains (such as habitual residence, industry, accent characteristics, gender and interpersonal relationships) are used to comprehensively analyze the first recognition result; when there is an error, the first keywords are interpreted/corrected to obtain a first interpretation/correction result. In this embodiment, using person portraits to analyze the first recognition result in a targeted manner and to correct the first keywords greatly improves recognition accuracy.
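One way to realize the portrait-tag-driven correction, under the assumption that each industry tag selects a substitution lexicon of likely mis-recognitions (all entries invented):

```python
# Illustrative persona-based correction: the portrait's industry tag selects
# a lexicon of error-prone phrases. Lexicon contents are made-up examples.
INDUSTRY_LEXICONS = {
    "medical": {"staff infection": "staph infection"},
    "finance": {"he'd fund": "hedge fund"},
}

def correct_keywords(text, portrait):
    """Apply the substitution lexicon chosen by the person portrait's tags."""
    lexicon = INDUSTRY_LEXICONS.get(portrait.get("industry"), {})
    for wrong, right in lexicon.items():
        text = text.replace(wrong, right)
    return text

corrected = correct_keywords("the staff infection spread", {"industry": "medical"})
```

Other tags (accent, residence, relationships) would select further lexicons in the same way.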
In some possible embodiments of the present invention, the step of determining the first interaction scene to which the first voice data belongs includes:
extracting first attribute information from the first voice data;
determining the first interaction scene according to the first attribute information.
It can be understood that, as described above, while the first voice data is being collected, information about the scene in which the speech occurs is saved as first attribute information of the first voice data. Specifically, the sound data is packaged together with the first attribute information to form the first voice data, or the data format of the sound data is modified by adding a section that records the first attribute information, forming the first voice data.
The first attribute information includes: the collection tool of the first voice data (such as a mobile phone, a drone, an office robot, a smart camera, etc.), the collection method (such as direct collection by the device, or collection through other devices connected over a network), the collection time (such as 6 a.m., 9 a.m., 3 p.m., 8 p.m.), the collection location (such as a company, home, shopping mall, hospital, school, etc.), the number of persons present, and the semantic environment (mainly the preceding and following utterances and the context in which expressions are produced and understood).
In this embodiment of the present invention, recording information about the scene in which the speech occurs provides an additional reference dimension for subsequent speech recognition, improving both its efficiency and accuracy.
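As a sketch of how attribute information might be packed with the audio and mapped to a scene: the field names, the packet layout, and the scene rules below are all illustrative assumptions; the patent only requires that some such attributes accompany the sound data:

```python
# Assumed sketch: the voice data packet carries scene attributes
# alongside the raw audio, and a simple rule table turns those
# attributes into an interaction-scene label.

def determine_scene(packet):
    """Map first attribute information to a first interaction scene (assumed rules)."""
    loc, hour = packet["location"], packet["hour"]
    if loc == "company" and 9 <= hour < 18:
        return "work-meeting"
    if loc == "home":
        return "family-chat"
    return "general"

packet = {
    "audio": b"\x00\x01",        # raw sound data (placeholder bytes)
    "device": "office-robot",    # collection tool
    "location": "company",       # collection location
    "hour": 9,                   # collection time
    "participants": 3,           # number of persons present
}
print(determine_scene(packet))   # → "work-meeting"
```

The returned scene label is what the later steps use to index into the set of interaction behavior interpretation models.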
As shown in FIG. 5, in some possible embodiments of the present invention, after the step of using the first interaction behavior interpretation model to correct the first keyword that meets the preset condition in the first recognition result, the method further includes:
outputting a first correction result for the first keyword in the form of voice or text;
receiving the user's evaluation feedback on the correction result;
when the evaluation feedback is a positive value, raising the priority of the first interaction behavior interpretation model with respect to the first interaction scene;
when the evaluation feedback is a negative value, lowering the priority of the first interaction behavior interpretation model with respect to the first interaction scene.
It can be understood that, to improve recognition efficiency, this embodiment of the present invention sets up a feedback mechanism: the first interpretation/correction result for the first keyword is output in the form of voice or text, and an interactive interface is provided for the user to evaluate the recognition result; the user's evaluation feedback on the correction result is received, and the priority of the first interaction behavior interpretation model with respect to the first interaction scene is adjusted accordingly, based on the positive or negative value the feedback represents.
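The priority adjustment can be sketched as a score table keyed by (scene, model). The step size, the data layout, and the selection rule are assumptions for illustration; the patent only requires that positive feedback raise and negative feedback lower a model's priority for the scene:

```python
# Assumed sketch of the feedback mechanism: per-(scene, model) priority
# scores, nudged up on positive and down on negative user feedback.
from collections import defaultdict

priority = defaultdict(float)

def apply_feedback(scene, model_id, feedback_value, step=1.0):
    """Positive feedback raises, negative feedback lowers, the model's priority."""
    priority[(scene, model_id)] += step if feedback_value > 0 else -step

def pick_model(scene, candidates):
    """Choose the highest-priority interpretation model for this scene."""
    return max(candidates, key=lambda m: priority[(scene, m)])

apply_feedback("family-chat", "model-A", +1)
apply_feedback("family-chat", "model-B", -1)
print(pick_model("family-chat", ["model-A", "model-B"]))   # → "model-A"
```

Over many interactions this makes the model that users rate well the default choice for its scene, which is the effect the feedback mechanism is after.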
As shown in FIG. 6, in some possible embodiments of the present invention, before the step of receiving the first voice data, the method further includes:
S1. Determining the relationship between a first user and a second user from a user group, and generating a first relationship label from the unique identity labels of the first user and the second user;
It can be understood that each user has a unique identity label. The relationship between users may be a role relationship such as parent, child, spouse, friend, colleague, or another basic relationship. A user's role label can be constructed from the unique identity label through preset rules, for example by appending a role field to the unique identity label, and the first relationship label can be constructed by fusing the role labels of the first user and the second user. In this step, any two different users are selected at random as the first user and the second user.
S2. Acquiring first interaction behavior data between the first user and the second user;
In this step, interaction behaviors include chatting, discussion, teaching, commands, and the like. The first interaction behavior data between the first user and the second user is extracted from the interaction behavior data among multiple users/roles (such as voice, actions, text, geographic location, distance between persons, number of simultaneous participants, background noise, etc.).
S3. Constructing, based on the first interaction behavior data and the first relationship label, a first persona portrait of the first user, a second persona portrait of the second user, and an interaction behavior database between the first user and the second user;
In this step, persona portraits are built from aspects such as word choice, emotion, age, gender, education stage, accent, and hobbies. Based on the persona portraits and the necessary technical means (keyword recognition, emotion recognition, attitude analysis, etc.), a pairwise interaction behavior database is established between personas; the database contains a relationship label for each segment of interaction behavior (constructed in the same way as the first relationship label described above);
S4. Repeating the foregoing operations until a persona portrait has been established for every user in each of their roles and an interaction behavior database has been established between every pair of persona portraits, and extracting key information from the interaction behavior database;
In this step, the key information, like the first key information described above, may be a specific action, a distinctive tone, a particular pronunciation, a particular preference, or the like.
S5. Inputting the data in the interaction behavior database into a trained neural network to obtain multiple interaction behavior interpretation models;
S6. Establishing a correspondence between the key information and the multiple interaction behavior interpretation models.
This embodiment takes the role relationships between users as its basis and the interaction behavior data as its analytical grounds, and uses a pre-trained neural network to generate the interaction behavior interpretation models. The method is simple to execute, and the generated interpretation models also provide accurate interpretation results.
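Steps S1 and S3 can be sketched as follows. The label format (identity tag plus appended role field, fused with a separator) and the record fields are illustrative assumptions; the patent leaves the preset fusion rule open:

```python
# Assumed sketch of S1/S3: fuse two users' unique identity labels and
# role fields into a relationship label, then file interaction behavior
# records under that label in the pairwise database.

def relationship_label(uid_a, role_a, uid_b, role_b):
    """S1: fuse the two role labels (unique id + role field) into one label."""
    return f"{uid_a}:{role_a}|{uid_b}:{role_b}"

interaction_db = {}

def record_interaction(label, data):
    """S3: store a segment of interaction behavior data under its relationship label."""
    interaction_db.setdefault(label, []).append(data)

label = relationship_label("u001", "parent", "u002", "child")
record_interaction(label, {"type": "teaching", "text": "do your homework"})
print(label)                       # → "u001:parent|u002:child"
print(len(interaction_db[label]))  # → 1
```

Iterating this over all user pairs (S4) yields the database that S5 feeds into the trained neural network.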
In some possible embodiments of the present invention, a verification mechanism is established: multiple interaction behavior interpretation models are set up for the same interaction scene, the first keyword in the first recognition result of the first voice data is interpreted/corrected by each of them, and the user selects the most correct interpretation/correction result. Specifically, after the first interaction behavior interpretation model has been used to interpret/correct the first keyword that meets the preset condition in the first recognition result and the first interpretation/correction result has been obtained, the method further includes:
selecting a second interaction behavior interpretation model corresponding to the first interaction scene;
using the second interaction behavior interpretation model to interpret/correct the first keyword that meets the preset condition in the first recognition result, to obtain a second interpretation/correction result;
presenting the first interpretation/correction result and the second interpretation/correction result to the user, having the user select the better one, and adjusting the preference levels of the first interaction behavior interpretation model and the second interaction behavior interpretation model according to the user's selection.
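A minimal sketch of this verification loop, with two stand-in models and a preference counter; the model callables and the reward scheme are assumptions, since the patent only specifies that the user's choice adjusts the models' preference levels:

```python
# Assumed sketch: run the same keyword through two interpretation
# models, present both results, and let the user's choice raise the
# chosen model's preference.

def verify(keyword, models, user_choice_index, preference):
    """Collect each model's result, reward the user's pick, return it."""
    results = [m(keyword) for m in models]
    preference[user_choice_index] += 1
    return results[user_choice_index]

preference = [0, 0]
model_1 = lambda k: k.upper()   # stand-in first interpretation model
model_2 = lambda k: k + "?"     # stand-in second interpretation model

best = verify("hello", [model_1, model_2], 0, preference)
print(best, preference)          # → HELLO [1, 0]
```

Combined with the feedback mechanism described earlier, repeated choices shift future model selection toward whichever model users validate most often.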
As shown in FIG. 7, another embodiment of the present invention provides a natural language interpretation apparatus 700 based on persona portraits, including: a voice receiving module 701, a processing module 702, a speech recognition module 703, and a result correction module 704;
the voice receiving module 701 is configured to receive first voice data;
the processing module 702 is configured to determine the first interaction scene to which the first voice data belongs, and to select a first interaction behavior interpretation model corresponding to the first interaction scene;
the speech recognition module 703 is configured to perform speech recognition on the first voice data to obtain a first recognition result;
the result correction module 704 is configured to use the first interaction behavior interpretation model to correct the first keyword that meets the preset condition in the first recognition result;
wherein the first interaction behavior interpretation model contains an association between interaction scene information and persona portrait information.
For the operating method of the apparatus provided in this embodiment, refer to the foregoing method embodiments; details are not repeated here.
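The way the four modules of apparatus 700 chain together can be sketched as a small pipeline class. The class, the callables injected into it, and the toy recognizer are all assumptions made for illustration, not the patent's hardware design:

```python
# Assumed sketch of apparatus 700: receive voice data, determine the
# scene and pick a model (module 702), recognize (module 703), then
# correct the result with the chosen model (module 704).

class NLDevice:
    def __init__(self, recognizer, scene_of, model_for):
        self.recognizer = recognizer   # speech recognition module 703
        self.scene_of = scene_of       # scene determination, module 702
        self.model_for = model_for     # model selection, module 702

    def run(self, voice_data):
        scene = self.scene_of(voice_data)
        model = self.model_for(scene)
        return model(self.recognizer(voice_data))   # correction, module 704

dev = NLDevice(
    recognizer=lambda v: v.decode(),   # toy stand-in for real ASR
    scene_of=lambda v: "demo",
    model_for=lambda s: str.title,     # toy "interpretation model"
)
print(dev.run(b"hello world"))         # → Hello World
```

Swapping any injected callable for a real implementation does not change the module wiring, which is the point of the four-module decomposition.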
FIG. 7 is a schematic diagram of the hardware composition of the apparatus in this embodiment. It can be understood that FIG. 7 shows only a simplified design of the apparatus. In practical applications, the apparatus may also include other necessary elements, including but not limited to any number of input/output systems, processors, controllers, and memories; any apparatus capable of implementing the natural language interpretation method of the embodiments of the present application falls within the protection scope of the present application.
Another embodiment of the present invention provides a computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, where the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement any of the methods described above.
It should be noted that, for brevity of description, each of the foregoing method embodiments is expressed as a series of action combinations; however, those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application, some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in a given embodiment, refer to the relevant descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through certain interfaces, apparatuses, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those of ordinary skill in the art will understand that all or some of the steps in the methods of the above embodiments may be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the present application have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present application; the descriptions of the above embodiments are only intended to help understand the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art, following the idea of the present application, may make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.
Although the present invention is disclosed above, it is not limited thereto. Any person skilled in the art may readily conceive of changes or substitutions without departing from the spirit and scope of the present invention, and may make various alterations and modifications, including combinations of the different functions and implementation steps described above, and including both software and hardware implementations, all of which fall within the protection scope of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210553460.1A CN114974253A (en) | 2022-05-20 | 2022-05-20 | Natural language interpretation method and device based on character image and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114974253A true CN114974253A (en) | 2022-08-30 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115617169A (en) * | 2022-10-11 | 2023-01-17 | 深圳琪乐科技有限公司 | Voice control robot and robot control method based on role relationship |
CN116129926A (en) * | 2023-04-19 | 2023-05-16 | 北京北信源软件股份有限公司 | Natural language interaction information processing method for intelligent equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109658928A (en) * | 2018-12-06 | 2019-04-19 | 山东大学 | A kind of home-services robot cloud multi-modal dialog method, apparatus and system |
CN110364146A (en) * | 2019-08-23 | 2019-10-22 | 腾讯科技(深圳)有限公司 | Audio recognition method, device, speech recognition apparatus and storage medium |
CN111368609A (en) * | 2018-12-26 | 2020-07-03 | 深圳Tcl新技术有限公司 | Voice interaction method based on emotion engine technology, intelligent terminal and storage medium |
CN112927686A (en) * | 2021-01-20 | 2021-06-08 | 北京奇艺世纪科技有限公司 | Voice recommendation language display method, device and system and electronic equipment |
JP2021110756A (en) * | 2020-01-13 | 2021-08-02 | モビリティー・アジア・スマート・テクノロジー・カンパニー・リミテッド | Devices and methods for recommending information to users during navigation |
Legal Events
Date | Code | Title | Description |
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |