CN116795957A - Dialogue information acquisition method and device, storage medium and electronic device

Info

Publication number
CN116795957A
Authority
CN
China
Prior art keywords
information
dialogue
multimedia information
multimedia
dialogue information
Prior art date
Legal status
Pending
Application number
CN202211676142.0A
Other languages
Chinese (zh)
Inventor
陈秀龙
Current Assignee
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Haier Uplus Intelligent Technology Beijing Co Ltd
Original Assignee
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Haier Uplus Intelligent Technology Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Qingdao Haier Technology Co Ltd, Haier Smart Home Co Ltd, Haier Uplus Intelligent Technology Beijing Co Ltd
Priority to CN202211676142.0A
Publication of CN116795957A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a method and a device for acquiring dialogue information, a storage medium and an electronic device, and relates to the technical field of smart homes. The method for acquiring dialogue information comprises the following steps: performing semantic analysis on received first dialogue information to determine a semantic analysis result corresponding to the first dialogue information, wherein the semantic analysis result indicates whether a first keyword with a target part of speech and/or a second keyword representing first multimedia information exists in the first dialogue information; determining, according to the semantic analysis result, historical dialogue information within a target range corresponding to the first dialogue information; when the first multimedia information exists in the historical dialogue information, performing multimedia information analysis on the first multimedia information to obtain a multimedia information analysis result; and determining second dialogue information corresponding to the first dialogue information according to the multimedia information analysis result and the first dialogue information.

Description

Dialogue information acquisition method and device, storage medium and electronic device
Technical Field
The application relates to the technical field of smart homes, and in particular to a method and a device for acquiring dialogue information, a storage medium and an electronic device.
Background
Current smart home dialogue systems are built mainly on single-modality text and cannot answer cross-modal questions, such as questions a user asks about a video or an image. For example, the user sends a photo of a model 328 air conditioner and asks the system, "What is the model of this air conditioner?" Faced with such a multi-modal question, a current system cannot answer because it cannot combine information from different modalities at the same time.
In the related art, text-based coreference resolution methods are mainly used. One approach builds a deep learning model for coreference resolution, such as a translation model, from information such as the first keyword in the user's question combined with the text of the dialogue context, and at prediction time directly translates the first keyword into the corresponding content in the preceding text. Another approach performs syntactic analysis on that content and then constructs rules that replace the first keyword in the current user question with the corresponding noun or subject from the preceding text. However, both the deep learning model approach and rule-based approaches such as syntactic analysis work on the text modality alone; they cannot analyze picture content that appeared earlier in the dialogue, so the reference cannot be resolved.
For the problem in the related art that a user's dialogue information cannot be determined by combining information from different modalities, no effective solution has yet been proposed.
Disclosure of Invention
The embodiment of the application provides a method and a device for acquiring dialogue information, a storage medium and an electronic device, which are used for at least solving the problems that the dialogue information of a user cannot be determined by combining different modal information in the related technology.
According to an embodiment of the present application, there is provided a method for acquiring dialogue information, including: carrying out semantic analysis on the received first dialogue information, and determining a semantic analysis result corresponding to the first dialogue information, wherein the semantic analysis result is used for indicating whether a first keyword with a target part of speech and/or a second keyword representing first multimedia information exist in the first dialogue information; determining historical dialogue information in a target range corresponding to the first dialogue information according to the semantic analysis result; under the condition that the first multimedia information exists in the historical dialogue information, carrying out multimedia information analysis on the first multimedia information to obtain a multimedia information analysis result; and determining second dialogue information corresponding to the first dialogue information according to the multimedia information analysis result and the first dialogue information.
In an exemplary embodiment, determining historical dialogue information within a target range corresponding to the first dialogue information according to the semantic analysis result includes: determining historical dialogue information in a first range corresponding to the first dialogue information when the first keyword and/or the second keyword exist in the first dialogue information, wherein the target range comprises the first range; and determining historical dialogue information in a second range corresponding to the first dialogue information when the first keyword is present in the first dialogue information and the second keyword is not present, wherein the target range comprises the second range, the first range being greater than the second range.
In an exemplary embodiment, the multimedia information parsing of the first multimedia information is performed to obtain a multimedia information parsing result, including: determining whether a keyword for indicating a first object exists in the first dialogue information, wherein the first object at least comprises one of the following: text, objects, users; and under the condition that the first dialogue information contains the keyword for indicating the first object, carrying out multimedia information analysis on the first multimedia information according to the analysis mode corresponding to the first object to obtain a multimedia information analysis result.
In an exemplary embodiment, performing multimedia information analysis on the first multimedia information according to the analysis mode corresponding to the first object to obtain the multimedia information analysis result includes at least one of the following: when the first object is text, analyzing the first multimedia information by text recognition to determine text information in the first multimedia information, wherein the multimedia information analysis result comprises the text information; when the first object is an object, analyzing the first multimedia information by object recognition to determine object information in the first multimedia information, wherein the multimedia information analysis result comprises the object information; and when the first object is a user, analyzing the first multimedia information by human body recognition to determine user information in the first multimedia information, wherein the multimedia information analysis result comprises the user information.
In an exemplary embodiment, performing multimedia information analysis on the first multimedia information to obtain a multimedia information analysis result includes: performing multimedia information analysis on the first multimedia information by an analysis mode to obtain a second object in the first multimedia information and object information of the second object, wherein the analysis mode comprises at least one of the following: text recognition, object recognition, human body recognition; the second object comprises at least one of the following: text, objects, users; and the multimedia information analysis result comprises the object information.
In an exemplary embodiment, determining second dialogue information corresponding to the first dialogue information according to the multimedia information analysis result and the first dialogue information includes: determining, according to the multimedia information analysis result, a noun corresponding to the first keyword and/or the second keyword in the first dialogue information; and replacing the first keyword and/or the second keyword in the first dialogue information with the noun, and taking the replaced first dialogue information as the second dialogue information.
In an exemplary embodiment, before performing multimedia information parsing on the first multimedia information to obtain a multimedia information parsing result, the method further includes: determining whether the first multimedia information exists in the historical dialogue information; acquiring third dialogue information input by a second object under the condition that the first multimedia information does not exist in the historical dialogue information, wherein the input time of the third dialogue information is later than that of the first dialogue information; and sending prompt information for indicating that the first dialogue information is incomplete to the second object under the condition that the first multimedia information does not exist in the third dialogue information.
According to another embodiment of the present application, there is also provided a device for acquiring dialogue information, including: the first analysis module is used for carrying out semantic analysis on the received first dialogue information and determining a semantic analysis result corresponding to the first dialogue information, wherein the semantic analysis result is used for indicating whether a first keyword with a target part of speech and/or a second keyword representing first multimedia information exist in the first dialogue information; the first determining module is used for determining historical dialogue information in a target range corresponding to the first dialogue information according to the semantic analysis result; the second analysis module is used for carrying out multimedia information analysis on the first multimedia information under the condition that the first multimedia information exists in the historical dialogue information to obtain a multimedia information analysis result; and the second determining module is used for determining second dialogue information corresponding to the first dialogue information according to the multimedia information analysis result and the first dialogue information.
According to still another aspect of the embodiments of the present application, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to execute the above-described method for acquiring dialogue information when running.
According to still another aspect of the embodiments of the present application, there is further provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the method for acquiring dialogue information through the computer program.
In the embodiment of the application, semantic analysis is carried out on received first dialogue information, and a semantic analysis result corresponding to the first dialogue information is determined, wherein the semantic analysis result is used for indicating whether a first keyword with target part of speech and/or a second keyword representing first multimedia information exist in the first dialogue information; determining historical dialogue information in a target range corresponding to the first dialogue information according to the semantic analysis result; under the condition that the first multimedia information exists in the historical dialogue information, carrying out multimedia information analysis on the first multimedia information to obtain a multimedia information analysis result; determining second dialogue information corresponding to the first dialogue information according to the multimedia information analysis result and the first dialogue information; by adopting the technical scheme, the problems that dialogue information of a user cannot be determined by combining different modal information are solved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of a hardware environment of a method for acquiring dialogue information according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for acquiring dialogue information according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a method for acquiring dialogue information according to an embodiment of the present application;
FIG. 4 is a structural block diagram (I) of a dialogue information acquisition device according to an embodiment of the present application;
FIG. 5 is a structural block diagram (II) of a dialogue information acquisition device according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, a technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the application, a method for acquiring dialogue information is provided. The method is widely applicable to whole-house intelligent digital control scenarios such as smart home, smart home device ecosystems, and intelligent house ecosystems. Optionally, in this embodiment, the method may be applied to a hardware environment constituted by the terminal device 102 and the server 104 as shown in fig. 1. As shown in fig. 1, the server 104 is connected to the terminal device 102 through a network and may provide services (such as application services) for the terminal or for a client installed on the terminal. A database may be set up on the server or independently of it to provide data storage services for the server 104, and cloud computing and/or edge computing services may be configured on the server or independently of it to provide data computing services for the server 104.
The network may include, but is not limited to, at least one of: a wired network, a wireless network. The wired network may include, but is not limited to, at least one of: a wide area network, a metropolitan area network, a local area network. The wireless network may include, but is not limited to, at least one of: WIFI (Wireless Fidelity), Bluetooth. The terminal device 102 may be, but is not limited to, a PC, a mobile phone, a tablet computer, a smart air conditioner, a smart range hood, a smart refrigerator, a smart oven, a smart cooking range, a smart washing machine, a smart water heater, a smart washing device, a smart dishwasher, a smart projection device, a smart television, a smart clothes hanger, a smart curtain, a smart video device, a smart socket, a smart speaker, a smart fresh air device, a smart kitchen and toilet device, a smart bathroom device, a smart sweeping robot, a smart window cleaning robot, a smart mopping robot, a smart air purifying device, a smart steam box, a smart microwave oven, a smart kitchen appliance, a smart purifier, a smart water dispenser, a smart door lock, and the like.
In this embodiment, a method for obtaining dialogue information is provided and applied to a computer terminal, and fig. 2 is a flowchart of a method for obtaining dialogue information according to an embodiment of the present application, where the flowchart includes the following steps:
Step S202, carrying out semantic analysis on received first dialogue information, and determining a semantic analysis result corresponding to the first dialogue information, wherein the semantic analysis result is used for indicating whether a first keyword with a target part of speech and/or a second keyword representing first multimedia information exist in the first dialogue information;
For example, the first dialogue information may be "Who is the child in this picture?", "What is the function of that air conditioner?", or the like.
It should be noted that, when the first dialogue information is "Who is the child in this picture?", both the first keyword and the second keyword exist in the first dialogue information; when the first dialogue information is "Who is this girl?", the first keyword exists in the first dialogue information but the second keyword does not.
It should be noted that the first keyword may be understood as a demonstrative pronoun, including but not limited to: this, that; the first keyword may also be understood as a personal pronoun, including but not limited to: you, me, he, it; the second keyword may be understood as a multimodal information keyword, including but not limited to: picture, photo, video, voice.
Step S204, according to the semantic analysis result, determining historical dialogue information in a target range corresponding to the first dialogue information;
It should be noted that, in the embodiment of the present invention, the historical dialogue information may be the historical dialogue record of the most recent round with the first object, the historical dialogue records of the most recent five rounds with the first object, or all historical dialogue records with the first object, which is not limited in the embodiment of the present invention.
Step S206, under the condition that the first multimedia information exists in the history dialogue information, carrying out multimedia information analysis on the first multimedia information to obtain a multimedia information analysis result;
Step S208, determining second dialogue information corresponding to the first dialogue information according to the multimedia information analysis result and the first dialogue information.
Through the steps, semantic analysis is carried out on the received first dialogue information, and a semantic analysis result corresponding to the first dialogue information is determined, wherein the semantic analysis result is used for indicating whether a first keyword with a target part of speech and/or a second keyword representing first multimedia information exist in the first dialogue information; determining historical dialogue information in a target range corresponding to the first dialogue information according to the semantic analysis result; under the condition that the first multimedia information exists in the historical dialogue information, carrying out multimedia information analysis on the first multimedia information to obtain a multimedia information analysis result; according to the multimedia information analysis result and the first dialogue information, the second dialogue information corresponding to the first dialogue information is determined, the problems that dialogue information of a user cannot be determined by combining different modal information in the related technology are solved, and the embodiment of the invention combines context information and multi-modal multi-round states to help the user to perform cross-modal reference resolution, so that multi-modal multi-round dialogue is realized.
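The keyword check of step S202 can be sketched as follows. This is a minimal illustration, not the application's implementation: the word lists are taken from the English renderings of the examples above, the names `FIRST_KEYWORDS`, `SECOND_KEYWORDS`, `SemanticResult` and `analyze` are hypothetical, and a real system would more likely rely on a part-of-speech tagger than on fixed word sets.

```python
import re
from dataclasses import dataclass

# Assumed word lists, based on the examples in the description.
FIRST_KEYWORDS = {"this", "that", "you", "me", "he", "it"}
SECOND_KEYWORDS = {
    "picture", "pictures", "photo", "photos",
    "video", "videos", "voice", "voices",
}


@dataclass(frozen=True)
class SemanticResult:
    """Semantic analysis result: which kinds of keyword were found."""
    has_first_keyword: bool   # demonstrative or personal pronoun present
    has_second_keyword: bool  # multimedia keyword present


def analyze(utterance: str) -> SemanticResult:
    """Tokenize the utterance and test it against both keyword sets."""
    tokens = set(re.findall(r"[a-z]+", utterance.lower()))
    return SemanticResult(
        has_first_keyword=bool(tokens & FIRST_KEYWORDS),
        has_second_keyword=bool(tokens & SECOND_KEYWORDS),
    )
```

With these assumed lists, `analyze("Who is this girl?")` reports a first keyword but no second keyword, which matches the case described for step S202.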
In an exemplary embodiment, determining historical dialogue information within a target range corresponding to the first dialogue information according to the semantic analysis result includes: determining historical dialogue information in a first range corresponding to the first dialogue information when the first keyword and/or the second keyword exist in the first dialogue information, wherein the target range comprises the first range; and determining historical dialogue information in a second range corresponding to the first dialogue information when the first keyword is present in the first dialogue information and the second keyword is not present, wherein the target range comprises the second range, the first range being greater than the second range.
It should be noted that, when both the first keyword and the second keyword exist in the first dialogue information, or when the first keyword does not exist but the second keyword does, the first multimedia information is highly likely to be present in the historical dialogue record with the first object, so a larger range of historical dialogue records is obtained; when the first keyword exists in the first dialogue information and the second keyword does not, the first multimedia information is less likely to be present in the historical dialogue record with the first object, so a smaller range of historical dialogue records is obtained.
It should be noted that, in the embodiment of the present invention, the historical dialogue information in the second range may be the historical dialogue record of the most recent round with the first object, and the historical dialogue information in the first range may be the historical dialogue records of the most recent five rounds with the first object, or all historical dialogue records with the first object.
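The range selection described above can be sketched as a small function. The function name `select_history` and its parameters are hypothetical; the defaults of five rounds and one round follow the example values in the text, and returning an empty list when neither keyword is present is an assumption, since the description does not cover that case.

```python
from typing import List, Optional, Sequence


def select_history(
    history: Sequence[str],
    has_first_keyword: bool,
    has_second_keyword: bool,
    first_range: Optional[int] = 5,  # larger range; None means all rounds
    second_range: int = 1,           # smaller range
) -> List[str]:
    """Return the slice of history (oldest first) to search for multimedia."""
    if has_second_keyword:
        # A multimedia keyword makes earlier media likely: search widely.
        if first_range is None:
            return list(history)
        return list(history[-first_range:])
    if has_first_keyword:
        # Only a pronoun: media is less likely, so search the latest rounds.
        return list(history[-second_range:])
    # Assumption: with no keyword there is nothing to resolve.
    return []
```

Passing `first_range=None` corresponds to the variant in which all historical dialogue records are searched.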
In an exemplary embodiment, the multimedia information parsing of the first multimedia information is performed to obtain a multimedia information parsing result, including: determining whether a keyword for indicating a first object exists in the first dialogue information, wherein the first object at least comprises one of the following: text, objects, users; and under the condition that the first dialogue information contains the keyword for indicating the first object, carrying out multimedia information analysis on the first multimedia information according to the analysis mode corresponding to the first object to obtain a multimedia information analysis result.
Specifically, performing multimedia information analysis on the first multimedia information according to the analysis mode corresponding to the first object to obtain the multimedia information analysis result includes at least one of the following: when the first object is text, analyzing the first multimedia information by text recognition to determine text information in the first multimedia information, wherein the multimedia information analysis result comprises the text information; when the first object is an object, analyzing the first multimedia information by object recognition to determine object information in the first multimedia information, wherein the multimedia information analysis result comprises the object information; and when the first object is a user, analyzing the first multimedia information by human body recognition to determine user information in the first multimedia information, wherein the multimedia information analysis result comprises the user information.
It should be noted that image understanding covers many tasks, including object detection, text recognition (i.e., OCR), face recognition, and human body detection, so the part of speech of the first keyword to be resolved in the first dialogue information is used to narrow the scope of image understanding. For example, the user sends a photo of a model 328 air conditioner and asks the system, "What is the model of this air conditioner?" Here "this" is a demonstrative modifying "air conditioner", which is a home appliance, so object recognition should be applied to the image; it identifies a home-appliance air conditioner and recognizes the model as 328. This improves image recognition efficiency.
For example, when the first dialogue information is "What does the text in the picture say?", this indicates that text exists in the first multimedia information, so the picture is analyzed by text recognition to obtain the text in the picture.
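Choosing the analysis mode from a keyword in the first dialogue information can be sketched as a lookup followed by a dispatch. Everything named here is hypothetical: the keyword table `OBJECT_TYPE_KEYWORDS` is an invented example, and the recognizers are passed in as plain callables so that no particular OCR or detection library is implied.

```python
import re
from typing import Callable, Dict, Optional

# Assumed mapping from object type to words that indicate it.
OBJECT_TYPE_KEYWORDS = {
    "text": {"text", "word", "words", "says"},
    "object": {"air conditioner", "refrigerator", "appliance"},
    "user": {"child", "girl", "person", "who"},
}

Recognizer = Callable[[bytes], str]


def pick_object_type(utterance: str) -> Optional[str]:
    """Return 'text', 'object' or 'user' if a matching keyword is found."""
    lowered = utterance.lower()
    for object_type, keywords in OBJECT_TYPE_KEYWORDS.items():
        if any(re.search(rf"\b{re.escape(k)}\b", lowered) for k in keywords):
            return object_type
    return None


def parse_media(
    utterance: str, media: bytes, recognizers: Dict[str, Recognizer]
) -> Optional[str]:
    """Run only the recognizer that matches the object type in the question."""
    object_type = pick_object_type(utterance)
    if object_type is None:
        return None
    return recognizers[object_type](media)
```

Running a single recognizer instead of all of them is what narrows the scope of image understanding in this sketch.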
In an exemplary embodiment, performing multimedia information analysis on the first multimedia information to obtain a multimedia information analysis result includes: performing multimedia information analysis on the first multimedia information by an analysis mode to obtain a second object in the first multimedia information and object information of the second object, wherein the analysis mode comprises at least one of the following: text recognition, object recognition, human body recognition; the second object comprises at least one of the following: text, objects, users; and the multimedia information analysis result comprises the object information.
In the embodiment of the invention, when a user sends multimedia information such as a picture or a video, the received multimedia information is analyzed through detection methods such as object detection, text recognition (i.e., OCR), face recognition, and human body detection, and the analysis results are stored in a database; when a result is later needed, it is obtained directly from the database.
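The analyze-on-receipt variant can be sketched with an in-memory SQLite database standing in for the database mentioned above. The class `AnalysisStore`, its method names, the table layout and the media identifiers are all assumptions made for illustration; the detectors are injected as callables rather than tied to a specific library.

```python
import json
import sqlite3
from typing import Callable, Dict, Optional

Detector = Callable[[bytes], str]


class AnalysisStore:
    """Runs every detector when media arrives and caches the results."""

    def __init__(self, detectors: Dict[str, Detector]) -> None:
        self._detectors = detectors
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE results (media_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )

    def on_media_received(self, media_id: str, media: bytes) -> None:
        """Analyze the media with all detectors and store the results."""
        results = {name: detect(media) for name, detect in self._detectors.items()}
        self._db.execute(
            "INSERT OR REPLACE INTO results VALUES (?, ?)",
            (media_id, json.dumps(results)),
        )
        self._db.commit()

    def lookup(self, media_id: str) -> Optional[Dict[str, str]]:
        """Fetch stored results without re-running any detector."""
        row = self._db.execute(
            "SELECT payload FROM results WHERE media_id = ?", (media_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None
```

The trade-off in this sketch is extra work at upload time in exchange for a direct database read when a later question refers to the media.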
In an exemplary embodiment, determining second dialogue information corresponding to the first dialogue information according to the multimedia information analysis result and the first dialogue information includes: determining, according to the multimedia information analysis result, a noun corresponding to the first keyword and/or the second keyword in the first dialogue information; and replacing the first keyword and/or the second keyword in the first dialogue information with the noun, and taking the replaced first dialogue information as the second dialogue information.
For example, the user sends a photo of a model 328 air conditioner and asks the system, "What is the model of this air conditioner?" Here "this" is a demonstrative modifying "air conditioner", which is a home appliance, so object recognition should be applied to the image. Object detection identifies a home-appliance air conditioner and recognizes the model as 328, and "this air conditioner" in the user's question is then replaced with "the model 328 air conditioner".
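The replacement step can be sketched as a single substitution. The function name `resolve_reference` is hypothetical, and the sketch assumes the phrase to replace and the noun from the analysis result have already been determined.

```python
import re


def resolve_reference(utterance: str, keyword: str, noun: str) -> str:
    """Replace the first whole-word occurrence of keyword with noun."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", flags=re.IGNORECASE)
    # A function replacement keeps any backslashes in noun literal.
    return pattern.sub(lambda _match: noun, utterance, count=1)
```

For instance, replacing "that air conditioner" with "the model 328 air conditioner" in "What is the function of that air conditioner?" gives a question that a text-only dialogue system can answer without seeing the picture.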
In an exemplary embodiment, before performing multimedia information parsing on the first multimedia information to obtain a multimedia information parsing result, the method further includes: determining whether the first multimedia information exists in the historical dialogue information; acquiring third dialogue information input by a second object under the condition that the first multimedia information does not exist in the historical dialogue information, wherein the input time of the third dialogue information is later than that of the first dialogue information; and sending prompt information for indicating that the first dialogue information is incomplete to the second object under the condition that the first multimedia information does not exist in the third dialogue information.
That is, the user may send the dialogue information first and the multimedia information afterwards. Therefore, when the first multimedia information does not exist in the historical dialogue record, third dialogue information whose input time is later than that of the first dialogue information may be acquired, and the noun corresponding to the first keyword is determined according to second multimedia information in the third dialogue information. When multimedia information exists in neither the third dialogue information nor the historical dialogue information, prompt information indicating that the first dialogue information is incomplete is sent to the second object, so that the second object can supplement the first dialogue information.
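The fallback order described above (historical dialogue first, then a later third dialogue, then a prompt) might look like the sketch below. The `Turn` data class, the function name, and the prompt wording are hypothetical and serve only to make the control flow concrete.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Turn:
    text: str = ""
    media_id: Optional[str] = None  # set when the turn carries a picture or video


INCOMPLETE_PROMPT = "Your message seems incomplete. Please send the picture you mean."


def find_media_or_prompt(
    history: Sequence[Turn], third_turn: Optional[Turn]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (media_id, None) if media is found, else (None, prompt)."""
    # 1. Look for multimedia in the historical dialogue, most recent turn first.
    for turn in reversed(history):
        if turn.media_id is not None:
            return turn.media_id, None
    # 2. Otherwise check the third dialogue, which arrives after the first one.
    if third_turn is not None and third_turn.media_id is not None:
        return third_turn.media_id, None
    # 3. No multimedia anywhere: tell the user the first dialogue is incomplete.
    return None, INCOMPLETE_PROMPT
```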
In order to better understand the process of the method for acquiring dialogue information, the following description is given with reference to the implementation flow of an alternative embodiment; this flow is not intended to limit the technical solution of the embodiments of the present application.
In this embodiment, a method for obtaining dialogue information is provided. Fig. 3 is a schematic diagram of a method for obtaining dialogue information according to an embodiment of the present application. As shown in Fig. 3, the method specifically includes the following steps:
Step S301: acquiring a text dialogue input by a user;
Step S302: performing part-of-speech and syntactic analysis on the text dialogue;
Step S303: triggering a multi-modal reference resolution function in the case that a first keyword exists in the text dialogue;
The first keywords include demonstrative first keywords and personal-pronoun first keywords. The demonstrative first keywords include, but are not limited to: this, that. The personal-pronoun first keywords include, but are not limited to: you, me, he, it. The second keywords include, but are not limited to: picture, photo, and the like. The user may also omit keywords such as "picture" or "photo" and instead send a photo directly and then start a dialogue, which also triggers the multi-modal reference resolution function.
Step S304: acquiring a historical dialogue record in the case that the multi-modal reference resolution function is triggered;
It should be noted that the picture may have been sent in the immediately preceding round or several rounds earlier. When multi-modal trigger words exist, 5 rounds of dialogue records are acquired; when the user's current round contains no multi-modal trigger word, the dialogue record of the last round is acquired. When there is no multi-modal trigger word and the last round is not a picture, the user is prompted that the dialogue is incomplete.
Step S305: performing image content understanding in the case that an image exists in the historical dialogue record;
Because image understanding covers a wide range of tasks, including object detection, character recognition in images (i.e., OCR), face recognition, and human body detection, the part of speech of the first keyword to be resolved in the current text dialogue is used to narrow the scope of image understanding. For example, the user sends a photo of an air conditioner whose model is 328 and asks the system, "What modes does this air conditioner have?" Here "this" is a demonstrative pronoun modifying "air conditioner", which is a home appliance, so object recognition should be performed on the image; image understanding recognizes that the item is a home-appliance air conditioner and that its model is 328. Alternatively, when the user sends a picture, the picture is parsed by object detection, character recognition in the image (i.e., OCR), face recognition, and human body detection, and the parsing result is stored in the dialogue record.
Step S306: replacing the first keyword at the corresponding position in the text dialogue with the noun corresponding to the picture, according to the parsing result.
It should be noted that, for the sake of syntactic structure and sentence fluency, duplicate words in the sentence also need to be removed. For example, the head word "air conditioner" modified by "this" is de-duplicated against the "air conditioner 328" recognized from the image, and appropriate post-processing is performed. The multi-modal reference resolution is then complete.
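The de-duplication in the post-processing step can be sketched as below. This is only one possible reading: it assumes the syntactic analysis has already supplied the keyword and its head word, and the function name `resolve_with_dedup` and the plain substring matching are hypothetical simplifications.

```python
def resolve_with_dedup(sentence: str, keyword: str, head: str, recognized: str) -> str:
    """Substitute the recognized noun and avoid repeating the head word.

    keyword:    the demonstrative to resolve, e.g. "this"
    head:       the head word it modifies, e.g. "air conditioner"
    recognized: the noun obtained from the image, e.g. "air conditioner 328"
    """
    phrase = f"{keyword} {head}"
    if head.lower() in recognized.lower() and phrase in sentence:
        # The recognized noun already names the head word, so replace the
        # whole "keyword + head" phrase instead of the keyword alone.
        return sentence.replace(phrase, recognized, 1)
    # Otherwise only the keyword is replaced and the head word is kept.
    return sentence.replace(keyword, recognized, 1)


question = "What modes does this air conditioner have?"
print(resolve_with_dedup(question, "this", "air conditioner", "air conditioner 328"))
# What modes does air conditioner 328 have?
```

Without the first branch, replacing only "this" would yield "air conditioner 328 air conditioner", which is the kind of repetition the post-processing is meant to remove.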
According to the embodiment of the invention, part-of-speech and syntactic analysis are combined, and multi-modal reference resolution is triggered according to the second keywords; the recognition scope of the image content is narrowed according to the part-of-speech and syntactic analysis; and finally, the resolved sentence is post-processed by means of syntactic analysis. By combining context information with the multi-modal, multi-round state, the embodiment of the invention helps the user perform cross-modal reference resolution and thus realizes multi-modal, multi-round dialogue. Meanwhile, grammatical knowledge narrows the scope of multi-modal tasks such as image understanding. Because image recognition is slow and occupies a large amount of resources, combining grammatical information can greatly improve efficiency. Triggering only on multi-modal trigger words, or when the latest round is a picture, reduces meaningless triggering of multi-modal reference resolution and improves system efficiency. Finally, post-processing after reference resolution in combination with grammar rules makes the resolved result more fluent. Compared with a purely deep-learning-based method, this method is low in cost and easier to implement.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method of the various embodiments of the present application.
Fig. 4 is a structural block diagram (I) of a dialogue information acquisition device according to an embodiment of the present application. As shown in Fig. 4, the device includes:
a first parsing module 42, configured to perform semantic parsing on the received first dialogue information and determine a semantic parsing result corresponding to the first dialogue information, where the semantic parsing result is used to indicate whether a first keyword with a target part of speech and/or a second keyword that characterizes first multimedia information exists in the first dialogue information;
a first determining module 44, configured to determine, according to the semantic analysis result, historical dialogue information in a target range corresponding to the first dialogue information;
a second parsing module 46, configured to parse the first multimedia information to obtain a multimedia information parsing result when the first multimedia information exists in the historical dialogue information;
and a second determining module 48, configured to determine second dialogue information corresponding to the first dialogue information according to the multimedia information analysis result and the first dialogue information.
With the above device, semantic analysis is performed on the received first dialogue information, and a semantic analysis result corresponding to the first dialogue information is determined, where the semantic analysis result is used to indicate whether a first keyword with a target part of speech and/or a second keyword representing first multimedia information exists in the first dialogue information; historical dialogue information in a target range corresponding to the first dialogue information is determined according to the semantic analysis result; when the first multimedia information exists in the historical dialogue information, multimedia information analysis is performed on the first multimedia information to obtain a multimedia information analysis result; and second dialogue information corresponding to the first dialogue information is determined according to the multimedia information analysis result and the first dialogue information. This solves the problem in the related art that the dialogue information of a user cannot be determined by combining information of different modalities. By combining context information with the multi-modal, multi-round state, the embodiment of the invention helps the user perform cross-modal reference resolution and thus realizes multi-modal, multi-round dialogue.
In an exemplary embodiment, the first determining module is configured to determine, in a case where the first keyword and/or the second keyword exist in the first dialogue information, historical dialogue information in a first range corresponding to the first dialogue information, where the target range includes: the first range; and to determine, in a case where the first keyword exists in the first dialogue information and the second keyword does not, historical dialogue information in a second range corresponding to the first dialogue information, where the target range includes: the second range, the first range being greater than the second range.
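One way to read the range selection is sketched below. Several points are assumptions rather than statements of the embodiment: the more specific condition (first keyword only) is checked first so that the two cases do not overlap; the first range is taken as 5 rounds and the second range as 1 round, following the figures given in the flow description; the English keyword sets are illustrative; and a return value of 0 for messages with neither keyword is a choice made here, since that case is not covered by the paragraph above.

```python
import re
from typing import List

# Illustrative keyword sets (assumed English equivalents of the examples).
FIRST_KEYWORDS = {"this", "that", "you", "me", "he", "it"}
SECOND_KEYWORDS = {"picture", "photo"}

FIRST_RANGE = 5   # rounds fetched when a second keyword is present
SECOND_RANGE = 1  # rounds fetched when only a first keyword is present


def target_range(first_dialogue: str) -> int:
    """Return how many rounds of historical dialogue should be fetched."""
    words = set(re.findall(r"[a-z]+", first_dialogue.lower()))
    has_first = bool(words & FIRST_KEYWORDS)
    has_second = bool(words & SECOND_KEYWORDS)
    if has_first and not has_second:
        return SECOND_RANGE
    if has_first or has_second:
        return FIRST_RANGE
    return 0  # assumption: no keyword, so no history lookup is triggered here


def history_in_range(history: List[str], rounds: int) -> List[str]:
    """Return the most recent `rounds` entries of the history."""
    # Guard against rounds == 0, because history[-0:] would return everything.
    return history[-rounds:] if rounds > 0 else []
```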
In an exemplary embodiment, the second parsing module is configured to determine whether a keyword for indicating a first object exists in the first dialog information, where the first object includes at least one of the following: text, objects, users; and under the condition that the first dialogue information contains the keyword for indicating the first object, carrying out multimedia information analysis on the first multimedia information according to the analysis mode corresponding to the first object to obtain a multimedia information analysis result.
In an exemplary embodiment, the second parsing module is further configured to perform at least one of the following: in the case that the first object is text, parsing the first multimedia information by text recognition and determining text information in the first multimedia information, where the multimedia information analysis result includes: the text information; in the case that the first object is an object, parsing the first multimedia information by object recognition and determining object information in the first multimedia information, where the multimedia information analysis result includes: the object information; and in the case that the first object is a user, parsing the first multimedia information by human body recognition and determining user information in the first multimedia information, where the multimedia information analysis result includes: the user information.
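The mapping from the first object to a parsing mode lends itself to a dispatch table, sketched below. The three recognizer functions are hypothetical stubs that return fixed data; a real system would call its own text, object, and human body recognition components, and the dictionary keys are names chosen for this illustration.

```python
from typing import Callable, Dict, List


def recognize_text(media: bytes) -> dict:
    """Hypothetical text recognition stub."""
    return {"text_info": ["328"]}


def recognize_objects(media: bytes) -> dict:
    """Hypothetical object recognition stub."""
    return {"object_info": ["air conditioner"]}


def recognize_people(media: bytes) -> dict:
    """Hypothetical human body recognition stub."""
    return {"user_info": ["1 person"]}


# First-object type -> parsing mode that handles it.
PARSERS: Dict[str, Callable[[bytes], dict]] = {
    "text": recognize_text,
    "object": recognize_objects,
    "user": recognize_people,
}


def parse_for(first_objects: List[str], media: bytes) -> dict:
    """Run only the parsing modes that the first dialogue actually calls for."""
    result: dict = {}
    for kind in first_objects:
        parser = PARSERS.get(kind)
        if parser is None:
            continue  # unknown type: skip rather than run every recognizer
        result.update(parser(media))
    return result
```

Running only the recognizers named by the dialogue is what keeps the image understanding scope narrow, which matters because the recognizers are the expensive part.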
In an exemplary embodiment, the second parsing module is configured to parse the first multimedia information in a parsing mode to obtain a second object in the first multimedia information and object information of the second object, where the parsing mode at least includes one of the following: the second object at least includes one of the following: text, objects, and users, and the multimedia information analysis result includes: the object information.
In an exemplary embodiment, the second determining module is configured to determine, according to the multimedia information parsing result, a noun corresponding to the first keyword and/or the second keyword in the first dialogue information; and replacing the first keyword and/or the second keyword in the first dialogue information with the noun, and acquiring the replaced first dialogue information as the second dialogue information.
In an exemplary embodiment, Fig. 5 is a structural block diagram (II) of a dialogue information acquisition device according to an embodiment of the present application. As shown in Fig. 5, the device further includes a sending module 52. Before multimedia information analysis is performed on the first multimedia information to obtain a multimedia information analysis result: the first determining module is further configured to determine whether the first multimedia information exists in the historical dialogue information; the first parsing module is further configured to acquire third dialogue information input by a second object in the case that the first multimedia information does not exist in the historical dialogue information, where the input time of the third dialogue information is later than that of the first dialogue information; and the sending module 52 is configured to send, to the second object, prompt information indicating that the first dialogue information is incomplete in the case that the first multimedia information does not exist in the third dialogue information.
An embodiment of the present application also provides a storage medium including a stored program, wherein the program executes the method of any one of the above.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store program code for performing the steps of:
S1, carrying out semantic analysis on received first dialogue information, and determining a semantic analysis result corresponding to the first dialogue information, wherein the semantic analysis result is used for indicating whether a first keyword with a target part of speech and/or a second keyword representing first multimedia information exist in the first dialogue information;
S2, determining historical dialogue information in a target range corresponding to the first dialogue information according to the semantic analysis result;
S3, under the condition that the first multimedia information exists in the historical dialogue information, carrying out multimedia information analysis on the first multimedia information to obtain a multimedia information analysis result;
S4, determining second dialogue information corresponding to the first dialogue information according to the multimedia information analysis result and the first dialogue information.
An embodiment of the application also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1, carrying out semantic analysis on received first dialogue information, and determining a semantic analysis result corresponding to the first dialogue information, wherein the semantic analysis result is used for indicating whether a first keyword with a target part of speech and/or a second keyword representing first multimedia information exist in the first dialogue information;
S2, determining historical dialogue information in a target range corresponding to the first dialogue information according to the semantic analysis result;
S3, under the condition that the first multimedia information exists in the historical dialogue information, carrying out multimedia information analysis on the first multimedia information to obtain a multimedia information analysis result;
S4, determining second dialogue information corresponding to the first dialogue information according to the multimedia information analysis result and the first dialogue information.
Alternatively, in the present embodiment, the storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.
Alternatively, specific examples in this embodiment may refer to examples described in the foregoing embodiments and optional implementations, and this embodiment is not described herein.
It will be appreciated by those skilled in the art that the modules or steps of the application described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may alternatively be implemented in program code executable by computing devices, so that they may be stored in a memory device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than that shown or described, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps within them may be fabricated into a single integrated circuit module for implementation. Thus, the present application is not limited to any specific combination of hardware and software.
The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application, which are intended to be comprehended within the scope of the present application.

Claims (10)

1. A method for acquiring dialogue information, comprising:
carrying out semantic analysis on the received first dialogue information, and determining a semantic analysis result corresponding to the first dialogue information, wherein the semantic analysis result is used for indicating whether a first keyword with a target part of speech and/or a second keyword representing first multimedia information exist in the first dialogue information;
determining historical dialogue information in a target range corresponding to the first dialogue information according to the semantic analysis result;
under the condition that the first multimedia information exists in the historical dialogue information, carrying out multimedia information analysis on the first multimedia information to obtain a multimedia information analysis result;
and determining second dialogue information corresponding to the first dialogue information according to the multimedia information analysis result and the first dialogue information.
2. The method according to claim 1, wherein determining historical dialogue information within a target range corresponding to the first dialogue information based on the semantic analysis result, comprises:
determining historical dialogue information in a first range corresponding to the first dialogue information when the first keyword and/or the second keyword exist in the first dialogue information, wherein the target range comprises: the first range;
determining historical dialogue information in a second range corresponding to the first dialogue information when the first keyword is present in the first dialogue information and the second keyword is not present, wherein the target range comprises: the second range, the first range being greater than the second range.
3. The method for obtaining dialogue information according to claim 1, wherein the step of performing multimedia information analysis on the first multimedia information to obtain a multimedia information analysis result includes:
determining whether a keyword for indicating a first object exists in the first dialogue information, wherein the first object at least comprises one of the following: text, objects, users;
and under the condition that the first dialogue information contains the keyword for indicating the first object, carrying out multimedia information analysis on the first multimedia information according to the analysis mode corresponding to the first object to obtain the multimedia information analysis result.
4. The method for obtaining dialogue information according to claim 3, wherein the multimedia information analysis is performed on the first multimedia information according to the analysis mode corresponding to the first object, so as to obtain the multimedia information analysis result, and the method at least comprises one of the following steps:
under the condition that the first object is text, analyzing the first multimedia information in a text recognition mode, and determining the text information in the first multimedia information, wherein the multimedia information analysis result comprises: the text information;
under the condition that the first object is an object, analyzing the first multimedia information in an object recognition mode, and determining object information in the first multimedia information, wherein the multimedia information analysis result comprises: the object information;
and under the condition that the first object is a user, analyzing the first multimedia information in a human body recognition mode, and determining the user information in the first multimedia information, wherein the multimedia information analysis result comprises: the user information.
5. The method for obtaining dialogue information according to any one of claims 1 to 4, wherein the step of performing multimedia information analysis on the first multimedia information to obtain a multimedia information analysis result includes:
performing multimedia information analysis on the first multimedia information by an analysis mode, and determining a second object in the first multimedia information and object information of the second object, wherein the analysis mode at least comprises one of the following: the second object at least comprises one of the following: text, objects, and users, wherein the multimedia information analysis result comprises: the object information.
6. The method for obtaining dialogue information according to any one of claims 1 to 4, wherein determining second dialogue information corresponding to the first dialogue information according to the multimedia information analysis result and the first dialogue information includes:
determining nouns corresponding to the first keywords and/or the second keywords in the first dialogue information according to the multimedia information analysis result;
and replacing the first keyword and/or the second keyword in the first dialogue information with the noun, and acquiring the replaced first dialogue information as the second dialogue information.
7. The method for obtaining dialogue information according to any one of claims 1 to 4, wherein before performing multimedia information analysis on the first multimedia information to obtain a multimedia information analysis result, the method further comprises:
determining whether the first multimedia information exists in the historical dialogue information;
acquiring third dialogue information input by a second object under the condition that the first multimedia information does not exist in the historical dialogue information, wherein the input time of the third dialogue information is later than that of the first dialogue information;
and sending prompt information for indicating that the first dialogue information is incomplete to the second object under the condition that the first multimedia information does not exist in the third dialogue information.
8. A dialogue information acquisition apparatus, comprising:
the first analysis module is used for carrying out semantic analysis on the received first dialogue information and determining a semantic analysis result corresponding to the first dialogue information, wherein the semantic analysis result is used for indicating whether a first keyword with a target part of speech and/or a second keyword representing first multimedia information exist in the first dialogue information;
the first determining module is used for determining historical dialogue information in a target range corresponding to the first dialogue information according to the semantic analysis result;
the second analysis module is used for carrying out multimedia information analysis on the first multimedia information under the condition that the first multimedia information exists in the historical dialogue information to obtain a multimedia information analysis result;
and the second determining module is used for determining second dialogue information corresponding to the first dialogue information according to the multimedia information analysis result and the first dialogue information.
9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program when run performs the method of any of the preceding claims 1 to 7.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1 to 7 by means of the computer program.
CN202211676142.0A 2022-12-26 2022-12-26 Dialogue information acquisition method and device, storage medium and electronic device Pending CN116795957A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211676142.0A CN116795957A (en) 2022-12-26 2022-12-26 Dialogue information acquisition method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211676142.0A CN116795957A (en) 2022-12-26 2022-12-26 Dialogue information acquisition method and device, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN116795957A true CN116795957A (en) 2023-09-22

Family

ID=88048675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211676142.0A Pending CN116795957A (en) 2022-12-26 2022-12-26 Dialogue information acquisition method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN116795957A (en)

Similar Documents

Publication Publication Date Title
US11948556B2 (en) Detection and/or enrollment of hot commands to trigger responsive action by automated assistant
KR102437944B1 (en) Voice wake-up method and device
CN104142964B (en) The method and device of information matches
EP3631793B1 (en) Dynamic and/or context-specific hot words to invoke automated assistant
CN107909998B (en) Voice instruction processing method and device, computer equipment and storage medium
CN109429522A (en) Voice interactive method, apparatus and system
CN105551488A (en) Voice control method and system
EP3633947A1 (en) Electronic device and control method therefor
CN109377995B (en) Method and device for controlling equipment
CN108766431B (en) Automatic awakening method based on voice recognition and electronic equipment
CN110717337A (en) Information processing method, device, computing equipment and storage medium
CN108595406B (en) User state reminding method and device, electronic equipment and storage medium
CN111832308A (en) Method and device for processing consistency of voice recognition text
CN106294321B (en) A kind of the dialogue method for digging and device of specific area
CN114064943A (en) Conference management method, conference management device, storage medium and electronic equipment
US20230326369A1 (en) Method and apparatus for generating sign language video, computer device, and storage medium
CN116994565A (en) Intelligent voice assistant and voice control method thereof
CN117253478A (en) Voice interaction method and related device
CN116795957A (en) Dialogue information acquisition method and device, storage medium and electronic device
US20210166685A1 (en) Speech processing apparatus and speech processing method
CN112331203A (en) Intelligent household equipment control method and device, electronic equipment and storage medium
CN114363664A (en) Method and device for generating video collection title
CN110535749A (en) Talk with method for pushing, device, electronic equipment and storage medium
CN109524002A (en) Intelligent voice recognition method and device
CN108710707A (en) Show method, apparatus, terminal device and the system of disadvantaged group's user information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination