CN109582763B - Answering system and method in moving picture expert group media Internet of things environment - Google Patents

Answering system and method in moving picture expert group media Internet of things environment

Info

Publication number
CN109582763B
CN109582763B (application CN201811129983.3A)
Authority
CN
China
Prior art keywords
information
speech
question
iomt
mpeg
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811129983.3A
Other languages
Chinese (zh)
Other versions
CN109582763A (en)
Inventor
崔美兰
金珉湖
金铉基
柳志熙
裵倞万
裵容秦
李炯直
林秀钟
林俊浩
蒋明吉
许桢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electronics and Telecommunications Research Institute ETRI
Original Assignee
Electronics and Telecommunications Research Institute ETRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020180097020A external-priority patent/KR102479026B1/en
Application filed by Electronics and Telecommunications Research Institute ETRI filed Critical Electronics and Telecommunications Research Institute ETRI
Publication of CN109582763A publication Critical patent/CN109582763A/en
Application granted granted Critical
Publication of CN109582763B publication Critical patent/CN109582763B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to an answering system and method in a moving picture experts group (MPEG) media internet of things (IoMT) environment. The answering system in the MPEG IoMT environment comprises: an internet of things (IoT) terminal that receives utterance information as input, transmits it, and receives and presents answer result information; and an utterance analysis server that performs utterance analysis on the utterance information provided from the IoT terminal according to the MPEG IoMT data format and, after performing question answering using the analyzed information and the answering server, provides the answer result information to the IoT terminal.

Description

Answering system and method in moving picture expert group media Internet of things environment
Technical Field
The invention provides a system and a method for performing question answering in an MPEG (Moving Picture Experts Group) IoMT (Internet of Media Things) environment, and relates to device operation and information delivery that satisfy diverse user requirements, and to an apparatus and a method for accurately detecting the answer a questioner expects.
Background
The conventional answering technology finds answers relying only on the question text directly input by a questioner, and therefore has difficulty addressing the diverse requirements of users.
Recently, although IoT (Internet of Things) devices, including wearable devices, have come to market in large numbers, answering systems remain limited to handling only simple questions.
To resolve this inconvenience, it is necessary for the device to analyze the questioner's utterance in advance in order to grasp the questioner's intention.
In this regard, in order to implement multimedia technology in IoT environments, standards are being developed in the MPEG (Moving Picture Experts Group) IoMT (Internet of Media Things) group, with an attempt to include a question answering user interface therein.
To this end, techniques are being studied that analyze the utterance content of a user and can perform processing based on that content on an appropriate IoT device.
Disclosure of Invention
The invention provides an answering system and method in an MPEG IoMT environment that can process queries for question- and instruction-related utterances of various forms input through various devices in an IoT environment.
The objects of the present invention are not limited to the above-mentioned objects, and other objects not mentioned can be understood by those skilled in the art from the following description.
An answering system in an MPEG IoMT environment according to an embodiment of the present invention for achieving the above object includes: an IoT terminal that receives utterance information as input, transmits it, and receives and presents answer result information; and a speech analysis server that performs utterance analysis on the utterance information transmitted from the IoT terminal according to the MPEG IoMT data format and, after performing question answering using the analyzed information and the answering server, provides answer result information to the IoT terminal.
Preferably, the MPEG IoMT data format contains information about the type of the user's question and information about the language in which the user's question is expressed.
Further, the information related to the type of the user's question preferably includes information indicating the subject of the question, information indicating the focus of the question, and information indicating the meaning or purpose of the question.
The focus information of the question is classified according to a classification system of "when, where, what, who, why, and how", and the information on the meaning or purpose of the question is classified according to a classification system of instruction request, vocabulary request, meaning request, information request, and method request.
On the other hand, the MPEG IoMT data format includes question domain information expressed as a string.
Moreover, an IoT terminal in accordance with an embodiment of the present invention comprises: an input unit for inputting speech information provided by a user; a communication unit configured to transmit the input speech information to the speech analysis server and receive answer result information from the speech analysis server; and an output unit configured to output the answer result information received from the speech analysis server.
The input unit includes a microphone to which speech information of the user is input.
The input unit includes a Query window (Query Interface) providing unit that outputs a user Interface for inputting text-form speech information on a screen.
The input unit includes a camera for acquiring speech information in an image form.
On the other hand, the output unit further includes a screen output unit for outputting answer result information to a screen.
The output unit further includes a voice output unit for outputting the answer result information by voice.
In another aspect, the speech analysis server includes: a communication unit configured to perform data communication with the IoT terminal and the answering server; a speech recognition unit configured to recognize the speech of the utterance information provided from the IoT terminal; an utterance analysis unit that performs utterance analysis on the speech-recognized utterance information according to the MPEG IoMT data format; and an answer calling unit that sends a query to the answering server using the utterance-analyzed information in the MPEG IoMT data format.
The speech analysis server may further include a speech synthesis unit for converting the answer result information in the form of text into speech.
Also, an embodiment of the present invention further includes an utterance information judging unit that judges whether the analyzed utterance information is information for a query request or information for a device control instruction and, if it is a device control instruction, delivers the utterance information to the IoT terminal that transmitted it so that the corresponding device control instruction is executed.
On the other hand, the speech recognition unit performs a language processing process of morphological analysis, named entity analysis, and syntactic analysis on the utterance information.
The answering server performs query analysis using the MPEG IoMT data format of the information received from the speech analysis server and provides the answer result information, i.e., the result of the query analysis, to the speech analysis server.
When there are a plurality of answer results, the answering server transmits, to the speech analysis server, list information set according to answer likelihood information for the answer results.
The answering method in the MPEG IoMT environment of the embodiment of the invention comprises the following steps: a step in which the utterance analysis server performs utterance analysis on utterance information transmitted from the IoT terminal according to the MPEG IoMT data format; a step in which the utterance analysis server performs question answering between the analyzed information and the answering server; and a step in which the utterance analysis server provides answer result information to the IoT terminal.
Wherein, the data format of the MPEG IoMT includes: information about the type of user question; and information about in which language the user's question is presented.
The information about the type of the question by the user includes information indicating the subject of the question, information indicating the focus of the question, and information indicating the meaning or purpose of the question.
Thus, according to an embodiment of the present invention, a user utterance provided from an IoT terminal is analyzed according to the MPEG IoMT data format and a question is issued based on it, thereby having the effect that a question answering service using the user's utterance can be provided in the MPEG IoMT environment as well.
Drawings
Fig. 1 is a block diagram illustrating structural blocks of an answering system in an MPEG IoMT environment according to an embodiment of the present invention.
Fig. 2 is a block diagram illustrating structural blocks of the IoT terminal shown in fig. 1 for illustrating the present invention.
Fig. 3 is a block diagram illustrating a block structure of the utterance analysis server 200 shown in fig. 1.
Fig. 4 is a reference diagram for explaining the speech recognition data format applied to the speech processing unit shown in fig. 3.
Fig. 5 is a reference diagram for explaining a speech recognition data format utilized in the speech analysis server shown in fig. 1.
Fig. 6 is a reference diagram for explaining an IoMT query analysis packet format utilized in the utterance analysis server shown in fig. 1.
Fig. 7 is a reference diagram for explaining a first example of speech analysis in the speech analysis server shown in fig. 1.
Fig. 8 is a reference diagram for explaining a second example of the utterance analysis in the utterance analysis server shown in fig. 1.
Fig. 9 is a reference diagram for explaining a "Qfocus classification hierarchy" when performing utterance analysis in the utterance analysis server shown in fig. 1.
Fig. 10 is a reference diagram for explaining the "QSemantics CS classification hierarchy" when performing utterance analysis in the utterance analysis server shown in fig. 1.
Fig. 11 is a reference diagram for explaining a speech synthesis data format utilized in the speech analysis server shown in fig. 1.
Fig. 12 is a block diagram showing the structure for using APIs in the utterance analysis server shown in fig. 1.
Fig. 13 is a flowchart for explaining an answering method in the MPEG IoMT environment according to an embodiment of the present invention.
Description of the reference numerals
100: ioT terminal 200: speech analysis server
210: the first communication unit 220: speech processing unit
230: the speech analysis unit 240: speech information judging unit
250: answering calling unit 260: second communication part
300: answering server
Detailed Description
The advantages and features of the present invention, and the manner in which they are achieved, will become more apparent from the embodiments described in detail below with reference to the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be embodied in various forms; these embodiments are provided so that this disclosure will be complete and will fully convey the scope of the invention to those skilled in the art, and the invention is defined by the claims below. Meanwhile, the terminology used in this specification is for describing the embodiments only and is not intended to limit the invention. In this specification, singular forms also include plural forms unless specifically mentioned otherwise. As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more elements, steps, actions, and/or components other than those mentioned.
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Fig. 1 is a block diagram illustrating the structural blocks of an answering system in an MPEG (Moving Picture Experts Group, hereinafter referred to as "MPEG") IoMT (Internet of Media Things, hereinafter referred to as "IoMT") environment according to one embodiment of the present invention.
As shown in fig. 1, the answering system in the MPEG IoMT environment according to an embodiment of the present invention includes an IoT (Internet of Things (internet of things), hereinafter referred to as "IoT") terminal 100, an utterance analysis server 200, and an answering server 300.
The IoT terminal 100 receives utterance information provided by a user as input, delivers it to the utterance analysis server, and provides the answer result information received from the utterance analysis server to the user.
The IoT terminal 100 may be any device used in an IoT environment, including wearable devices, and may include various sensors and controllers.
On the other hand, when providing utterance information including a user's query information to the utterance analysis server 200, the IoT terminal 100 can provide device information together with sensing information.
Fig. 2 is a block diagram illustrating structural blocks of the IoT terminal shown in fig. 1 for illustrating the present invention.
As shown in fig. 2, the IoT terminal 100 includes an input unit 110, a communication unit 120, and an output unit 130.
The input unit 110 receives utterance information of a user. In the present embodiment, the input unit 110 is preferably a microphone that receives utterance information. However, the input unit 110 may also include one or more of the following: a query window that outputs on a screen a user interface for receiving utterance information in text form; and a camera for acquiring utterance information in image form.
Further, the communication unit 120 transmits the input utterance information to the utterance analysis server 200 and receives answer result information from the utterance analysis server 200. The information exchanged through the communication unit 120 may include data such as voice, text, and images, a device control instruction among the utterance analysis results, the question text of a user question, and an answer candidate list as answer result information.
The output unit 130 outputs the answer result information supplied from the utterance analysis server 200. In the present embodiment, the output unit 130 may include at least one of the following: a screen output unit for outputting answer result information on a screen through a user interface; and a voice output unit for outputting the answer result information by voice.
The utterance analysis server 200 performs utterance analysis on the utterance information provided from the IoT terminal 100 according to the MPEG IoMT data format and, after performing question answering using the analyzed information and the answering server 300, provides the answer result information to the IoT terminal 100.
Fig. 3 is a block diagram illustrating the block structure of the utterance analysis server 200 shown in fig. 1. As shown in fig. 3, the utterance analysis server 200 includes a first communication unit 210, a speech processing unit 220, an utterance analysis unit 230, an utterance information judging unit 240, an answer calling unit 250, a second communication unit 260, and a speech synthesis unit 270.
The first communication unit 210 communicates with the IoT terminal 100.
Also, the speech processing unit 220 recognizes the speech of the utterance information provided from the IoT terminal 100.
Fig. 4 is a reference diagram for explaining the speech recognition data format applied to the speech processing unit shown in fig. 3.
For this, as shown in fig. 4 and table 1 below, the speech processing unit 220 uses a speech recognition data format formed of a "SpeechRecognitionType" field providing a descriptive summary related to speech recognition and a "SpeechText" field describing the resulting text of the speech recognition. At this time, the utterance information may undergo general language processing such as morphological analysis, named entity analysis, and syntactic analysis.
TABLE 1
Fig. 5 is a reference diagram for explaining an example of the speech recognition data format utilized in the speech analysis server shown in fig. 1.
For example, in the case where the analyzed data is a speech recognition result and the text output from the user's voice is "Please turn to Channel 7", as shown in fig. 5, the speech processing unit 220 can see that the "xsi:type" attribute is included in the "SpeechRecognitionType" field and "Please turn to Channel 7" is included in the "SpeechText" field.
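By way of a non-normative illustration, the speech recognition result described above can be sketched in Python as follows; only the field names "SpeechRecognitionType" and "SpeechText" come from the text, while the class, attribute, and variable names are assumptions:
from dataclasses import dataclass

@dataclass
class SpeechRecognitionResult:
    # Stand-in for the SpeechRecognitionType data format: xsi_type mirrors
    # the "xsi:type" attribute of fig. 5, speech_text the SpeechText field.
    xsi_type: str
    speech_text: str

# Example mirroring fig. 5: the user's voice recognized as text.
result = SpeechRecognitionResult(
    xsi_type="SpeechRecognitionType",
    speech_text="Please turn to Channel 7",
)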
As shown in fig. 6 and table 2, the utterance analysis unit 230 performs utterance analysis on the speech-recognized utterance information based on the MPEG IoMT data format.
The utterance analysis, i.e., the query analysis, is expressed as a "QueryAnalysisType", which is an extension of the basic data analysis type used in MPEG IoMT and is composed of two elements.
One element, the analyzed "AnalyzedQuery", is of "UserQueryType", and the other, a language element, indicates in which "Language" the user's question is expressed. That is, the two elements represent information related to the analyzed question.
"UserQueryType" (user question type) consists of 3 elements and 1 attribute.
The first element, "QTopic" (question topic), is expressed in the form of a string, and the second element is "QFocus" (focus of the question).
As shown in fig. 9 and table 3, "QFocus" (focus of the question) is classified and expressed in advance according to the Qfocus classification system.
The third element is the meaning or purpose of the question, and lastly, as an attribute of the question, there is "QDomain" (question domain), which expresses the domain of the question as a string. That is, when a user's question is analyzed, the analysis result is expressed divided into the subject, focus, meaning, and domain of the question, and this expression format is delivered to the appropriate module of a server or terminal so that the desired action can be performed.
TABLE 2
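A minimal sketch of the structure just described, assuming Python renderings of the element names given in the text (the class and field names themselves are hypothetical):
from dataclasses import dataclass

@dataclass
class UserQuery:
    # UserQueryType: three elements and one attribute.
    q_topic: str      # QTopic: subject of the question, a string
    q_focus: str      # QFocus: term ID from the Qfocus CS (table 3)
    q_semantics: str  # meaning or purpose, from the QSemantics CS (table 4)
    q_domain: str     # QDomain attribute: domain of the question, a string

@dataclass
class QueryAnalysis:
    # QueryAnalysisType: composed of two elements.
    analyzed_query: UserQuery  # the AnalyzedQuery element
    language: str              # the Language element, e.g. "en-us"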
In the Qfocus classification system, as shown in table 3 below, the user's question is indicated as corresponding to one of the question types in 5W1H. These types, i.e., "when, where, what, who, why, how", can be expressed in binary.
TABLE 3
Binary representation Terminology ID of Qfocus CS
0000 What_question
0001 Where_question
0010 When_question
0011 Who_question
0100 Why_question
0101 How_question
0110~1111 Reserved
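A small sketch of decoding the 4-bit Qfocus CS codes; the mapping is taken from table 3, while the function and constant names are illustrative:
QFOCUS_CS = {
    0b0000: "What_question",
    0b0001: "Where_question",
    0b0010: "When_question",
    0b0011: "Who_question",
    0b0100: "Why_question",
    0b0101: "How_question",
}

def decode_qfocus(code: int) -> str:
    # Codes 0110 through 1111 are reserved in table 3.
    return QFOCUS_CS.get(code, "Reserved")

assert decode_qfocus(0b0011) == "Who_question"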
Further, as shown in table 4 below, in the "QSemantics CS" classification system, question purposes such as "instruction request, vocabulary request, meaning request, information request, method request" can be expressed in binary.
TABLE 4
For example, as shown in fig. 7, for the user question "Who is the author of King Lear?", the query analysis result is an "AnalyzedQuery" with the language "en-us", from which it is known that the domain of the query is "Literature", the topic of the query is "King Lear", the query focus is "Who", and the purpose of the query is "Request_for_Information".
That is, for the first question, "Who is the author of King Lear", the language is analyzed as English, the subject of the question as King Lear, the focus as "who", and the meaning and purpose of the question as an "information request", and it can be seen that the analysis result is properly contained in the format.
Looking at the second example, as shown in fig. 8, for the user question "How do you make Kimchi?", the query analysis result is an "AnalyzedQuery" with the language "en-us", from which it is known that the domain of the query is "Cooking", the subject of the query is "Kimchi", the query focus is "How", and the purpose of the query is "Request_for_Method".
That is, for the second question example, "How do you make Kimchi?", the question is likewise analyzed as English, the domain of the question as "cooking", the subject of the question as "Kimchi", the focus of the question as "how", and the purpose of the question as a "method request"; this is contained in the format and shared among modules.
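Using the sketch types from the earlier listing, the two worked examples might be populated as follows; term identifiers such as "Request_for_Information" are assumptions about the exact CS terminology IDs:
# First example (fig. 7): "Who is the author of King Lear?"
king_lear = QueryAnalysis(
    analyzed_query=UserQuery(
        q_topic="King Lear",
        q_focus="Who_question",
        q_semantics="Request_for_Information",
        q_domain="Literature",
    ),
    language="en-us",
)

# Second example (fig. 8): "How do you make Kimchi?"
kimchi = QueryAnalysis(
    analyzed_query=UserQuery(
        q_topic="Kimchi",
        q_focus="How_question",
        q_semantics="Request_for_Method",
        q_domain="Cooking",
    ),
    language="en-us",
)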
Further, the utterance information judging unit 240 judges whether the analyzed utterance information is information for a query request or information for a device control instruction. If the analyzed utterance information is a device control instruction, the utterance information judging unit 240 delivers the utterance information to the corresponding IoT terminal 100 so that the corresponding device control instruction is executed.
When the analyzed utterance information is query information, the answer calling unit 250 transmits the utterance-analyzed information in the MPEG IoMT data format to the answering server 300 through the second communication unit 260 and issues a query. The second communication unit 260 communicates with the answering server 300.
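The routing just described might be sketched as follows; keying the device-control test off the instruction-request category of the QSemantics CS, as well as the peer objects and their methods, are assumptions:
def route_utterance(analysis, terminal, answering_server):
    # Device control instructions go back to the originating IoT terminal
    # (unit 240); query requests are forwarded to the answering server
    # (unit 250).
    if analysis.analyzed_query.q_semantics == "Request_for_Instruction":
        terminal.execute_control(analysis)   # hypothetical method
        return None
    return answering_server.query(analysis)  # hypothetical method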
On the other hand, the speech synthesis unit 270 transmits the answer result information received from the answering server 300 to the IoT terminal 100. At this time, when the answer result information received from the answering server 300 is text, as shown in fig. 11, the answer execution result in text form may be converted into speech using a speech synthesis data format and transmitted to the IoT terminal 100 through the first communication unit 210.
As shown in table 5, the speech synthesis data format is formed of the following fields: a SpeechSynthesisType field providing an abstract description of the speech synthesis that can be performed in the speech synthesis unit, a TextInput field describing the text input to be synthesized in the speech synthesis process, an OutputSpeechFeature field representing speech output characteristics, such as gender, tone, and speech rate, reflected in the speech output, and a Language field representing the language of the input text.
TABLE 5
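A non-normative sketch of the speech synthesis data format of table 5; the field names SpeechSynthesisType, TextInput, OutputSpeechFeature, and Language come from the text, and the Python rendering is an assumption:
from dataclasses import dataclass

@dataclass
class OutputSpeechFeature:
    # Speech output characteristics reflected in the synthesized voice.
    gender: str         # e.g. "female"
    tone: str           # e.g. "neutral"
    speech_rate: float  # e.g. 1.0 for normal speed

@dataclass
class SpeechSynthesisRequest:
    # Stand-in for the SpeechSynthesisType data format.
    text_input: str                      # TextInput: text to be synthesized
    output_feature: OutputSpeechFeature  # OutputSpeechFeature field
    language: str                        # Language field, e.g. "en-us"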
When the speech analysis server 200 requests a query using the MPEG IoMT data, the answering server 300 analyzes the query using the query analysis information included in the MPEG IoMT data and transmits the answer result information of the analysis to the speech analysis server 200.
Thus, according to an embodiment of the present invention, a user utterance provided from an IoT terminal is analyzed, and utterance analysis is performed according to the MPEG IoMT data format to provide answers, so that an answering service using the user's utterance can also be provided in the MPEG IoMT environment.
The utterance analysis server 200 of an embodiment of the present invention may further include a location information search unit (not shown) that compares the location information of the terminal transmitted from the IoT terminal 100 with point of interest (POI) information stored in a database to identify the location of the terminal user.
Fig. 12 is a block diagram showing a structural block for using a token in the utterance analysis server shown in fig. 1.
As shown in fig. 12, the speech analysis server 200 according to an embodiment of the present invention may further include a speech recognition API (Application Programming Interface) processing unit 281, a speech synthesis API processing unit 282, and a query analysis API processing unit 283, which are used in MPEG IoMT.
As shown in table 6, the speech recognition API processing unit 281 uses an API packet format that uses an IoMT speech recognizer class extending the MAnalyzer class.
TABLE 6
As shown in table 7, the speech synthesis API processing unit 282 uses an API packet format that uses an IoMT speech synthesizer class extending the MAnalyzer class.
TABLE 7
As shown in table 8, the query analysis API processing unit 283 uses an API packet format that uses an IoMT query analyzer class extending the MAnalyzer class.
TABLE 8
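The relationship between the MAnalyzer class and the three extended IoMT classes named above might be sketched as follows; the analyze method and all bodies are hypothetical:
class MAnalyzer:
    # Base analyzer class named in the patent; the interface is assumed.
    def analyze(self, data):
        raise NotImplementedError

class IoMTSpeechRecognizer(MAnalyzer):
    def analyze(self, audio):
        ...  # would return a SpeechRecognitionResult (table 6)

class IoMTSpeechSynthesizer(MAnalyzer):
    def analyze(self, request):
        ...  # would return synthesized audio for the request (table 7)

class IoMTQueryAnalyzer(MAnalyzer):
    def analyze(self, text):
        ...  # would return a QueryAnalysis (table 8)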
Thus, the utterance analysis server can transact whenever a service such as query analysis, speech recognition, or speech synthesis is provided in the MPEG IoMT environment.
Next, referring to fig. 13, an answering method in the MPEG IoMT environment according to an embodiment of the present invention is described.
Fig. 13 shows the answer processing method in the MPEG IoMT environment, which is preferably performed by the utterance analysis server.
First, the utterance analysis server 200 receives input of utterance information transmitted from the IoT terminal 100 (S100).
Then, the utterance analysis server 200 performs utterance analysis on the input utterance information according to the MPEG IoMT data format (S200). The data format of the MPEG IoMT includes information about the question type of the user and information about the language in which the question of the user is expressed.
The information about the type of the user question includes information indicating the subject of the question, information indicating the focus of the question, and information indicating the meaning or purpose of the question.
The focus information of the question is classified according to a classification system of "when, where, what, who, why, and how", and the meaning or purpose information of the question is classified according to a classification system of instruction request, vocabulary request, meaning request, information request, and method request.
On the other hand, the data format of the MPEG IoMT may include question domain information expressed in a string.
At this time, the utterance analysis server 200 determines whether the utterance analysis result relates to a query (S300).
In the judging step S300, if the utterance relates to a query (yes), the utterance analysis server 200 performs question answering with the answering server using the utterance-analyzed information (S400).
After that, the utterance analysis server 200 provides the answer result information to the IoT terminal (S500).
On the other hand, in the judging step S300, if the utterance relates to device control (no), the utterance analysis content is provided to the IoT terminal 100 (S600).
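Steps S100 through S600 can be summarized in a short sketch in which every helper name is hypothetical:
def answering_method(utterance, terminal, answering_server, analyze, is_query):
    # S100-S200: receive the utterance information and perform utterance
    # analysis according to the MPEG IoMT data format.
    analysis = analyze(utterance)
    # S300: judge whether the analysis result concerns a query.
    if is_query(analysis):
        # S400: perform question answering with the answering server.
        answer = answering_server.query(analysis)
        # S500: provide the answer result information to the IoT terminal.
        terminal.deliver(answer)
    else:
        # S600: device control, so deliver the analysis to the IoT terminal.
        terminal.execute_control(analysis)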
Thus, according to an embodiment of the present invention, a user utterance provided from an IoT terminal is analyzed, utterance analysis is performed on it according to the MPEG IoMT data format, and an answer is provided, thereby having the effect that an answering service using the user's utterance can also be provided in an MPEG IoMT environment.
While the structure of the present invention has been described in detail with reference to the drawings, this is merely an example, and it is needless to say that various modifications and alterations can be made by those skilled in the art without departing from the technical spirit of the present invention. Accordingly, the scope of the present invention should not be limited to the above-described embodiments, but should be defined by the description of the claims.

Claims (18)

1. A system for answering questions in a moving picture experts group (MPEG) media internet of things (IoMT) environment, comprising:
an internet of things (IoT) terminal for inputting and transmitting utterance information and for receiving and providing answer result information;
a speech analysis server for performing utterance analysis on the utterance information transmitted from the IoT terminal according to the MPEG IoMT data format, and providing answer result information to the IoT terminal after performing question answering using the utterance-analyzed information and the answering server; and
an answering server for performing query analysis using the MPEG IoMT data format of the information received from the speech analysis server and providing the speech analysis server with answer result information as a result of the query analysis, and, when a plurality of answer results exist, transmitting list information set according to answer likelihood information for the answer results to the speech analysis server,
wherein, the above-mentioned speech analysis server includes:
a communication unit configured to perform data communication with the IoT terminal and the answering server;
a speech recognition unit configured to recognize the speech of the utterance information provided from the IoT terminal;
an utterance analysis unit that performs utterance analysis on the utterance information subjected to speech recognition according to an MPEG IoMT data format;
an answer calling unit that sends a query to the answering server using the utterance-analyzed information in the MPEG IoMT data format; and
an utterance information judging unit configured to judge whether the analyzed utterance information is information for a query request or information for a device control instruction and, if the analyzed utterance information is a device control instruction, to deliver the utterance information to the IoT terminal that transmitted it, so that the corresponding device control instruction is executed.
2. The system according to claim 1, wherein the data format of the MPEG IoMT includes information about a type of a question of a user and information about a language in which the question of the user is expressed.
3. The answering system in an MPEG IoMT environment according to claim 2, wherein the information about the type of the question of the user includes information indicating a subject of the question, information indicating a focus of the question, and information indicating a meaning or purpose of the question.
4. The answering system in an MPEG IoMT environment according to claim 2, wherein the focus information of the questions is classified into a classification hierarchy of "when, where, what, who, why, how".
5. The answering system in an MPEG IoMT environment according to claim 2, wherein the information of meaning or purpose of the question is classified into a classification system of instruction request, vocabulary request, meaning request, information request and method request.
6. The system according to claim 2, wherein the data format of the MPEG IoMT includes question field information expressed in a string.
7. The answering system in an MPEG IoMT environment according to claim 1, wherein the IoT terminal comprises:
an input unit for inputting speech information provided by a user;
a communication unit configured to transmit the input speech information to the speech analysis server and receive answer result information from the speech analysis server; and
and an output unit configured to output the answering result information received from the speech analysis server.
8. The answering system in an MPEG IoMT environment according to claim 7, wherein the input section includes an inquiry window providing section that outputs a user interface for inputting speech information in a text form on a screen.
9. The system according to claim 7, wherein the input unit includes a camera for acquiring speech information of an image form.
10. The system according to claim 7, wherein the input unit includes a microphone to which speech information of a user is input.
11. The system according to claim 7, wherein the output unit further comprises a screen output unit for outputting the answer result information to a screen.
12. The system according to claim 7, wherein the output unit further comprises a voice output unit for outputting the answer result information by voice.
13. The answering system in an MPEG IoMT environment according to claim 1, wherein the speech analysis server further comprises a speech synthesis section for converting answer result information in a text form into speech.
14. The answering system in an MPEG IoMT environment according to claim 1, wherein the speech recognition unit performs a language processing process of morphological analysis, named entity analysis, and syntactic analysis on the utterance information.
15. A method for answering questions in a moving picture experts group (MPEG) media internet of things (IoMT) environment, comprising:
the utterance analysis server performs utterance analysis on utterance information transmitted from the internet of things IoT terminal according to the MPEG IoMT data format;
the speech analysis server judging whether the analyzed speech information is information for a query request or information for a device control instruction;
if the utterance information is information for a query request, the utterance analysis server performs an answer between the utterance analyzed information and an answer server and the utterance analysis server provides answer result information to an IoT terminal; and
if the utterance information is information for a device control instruction, the utterance analysis server delivers the utterance information to the IoT terminal that transmitted the utterance information, so as to execute the corresponding device control instruction,
wherein the speech analysis server performing speech analysis on speech information transmitted from the internet of things IoT terminal according to the MPEG IoMT data format comprises:
recognizing a voice of utterance information provided from the IoT terminal; and
according to the MPEG IoMT data format, the above-mentioned utterance information subjected to speech recognition is subjected to utterance analysis,
and wherein the performing an answer between the speech analysis server and the answering server using the information after the speech analysis includes:
the speech analysis server sends a query to the answering server by using the information after speech analysis in the data format of the MPEG IoMT;
the answering server performs query analysis using the MPEG IoMT data format of the information received from the speech analysis server and provides the answer result information, i.e., the result of the query analysis, to the speech analysis server, and, when there are a plurality of answer results, transmits list information set according to answer likelihood information for the answer results to the speech analysis server.
16. The method for answering questions in an MPEG IoMT environment of claim 15, wherein the data format of the MPEG IoMT comprises:
information about the type of user question; and
information about in which language the user's question is presented.
17. The method according to claim 16, wherein the information about the type of the question of the user includes information indicating a subject of the question, information indicating a focus of the question, and information indicating a meaning or purpose of the question.
18. The method for answering a question in an MPEG IoMT environment of claim 17, wherein,
the focus information of the question is classified into a classification hierarchy of "when, where, what, who, why, how",
the meaning or destination information of the question is classified into a classification system of instruction request, vocabulary request, meaning request, information request and method request.
CN201811129983.3A 2017-09-27 2018-09-27 Answering system and method in moving picture expert group media Internet of things environment Active CN109582763B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2017-0125232 2017-09-27
KR20170125232 2017-09-27
KR1020180097020A KR102479026B1 (en) 2017-09-27 2018-08-20 QUERY AND RESPONSE SYSTEM AND METHOD IN MPEG IoMT ENVIRONMENT
KR10-2018-0097020 2018-08-20

Publications (2)

Publication Number Publication Date
CN109582763A CN109582763A (en) 2019-04-05
CN109582763B true CN109582763B (en) 2023-08-22

Family

ID=65919920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811129983.3A Active CN109582763B (en) 2017-09-27 2018-09-27 Answering system and method in moving picture expert group media Internet of things environment

Country Status (1)

Country Link
CN (1) CN109582763B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080095180A (en) * 2007-04-23 2008-10-28 한국전자통신연구원 Method and apparatus for retrieving multimedia contents
CN101187990A (en) * 2007-12-14 2008-05-28 华南理工大学 A session robotic system
KR20130108173A (en) * 2012-03-22 2013-10-02 진삼순 Question answering system using speech recognition by radio wire communication and its application method thereof
WO2016175354A1 (en) * 2015-04-29 2016-11-03 주식회사 아카인텔리전스 Artificial intelligence conversation device and method
CN104821109A (en) * 2015-05-26 2015-08-05 北京云江科技有限公司 Online question answering system based on image information and voice information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design and implementation of an answering system in the hybrid mode of a digital campus; Kang Jinhui; Journal of Wuhan University of Technology (Information & Management Engineering Edition); 2009-12-15 (No. 06); full text *

Also Published As

Publication number Publication date
CN109582763A (en) 2019-04-05

Similar Documents

Publication Publication Date Title
KR20180025121A (en) Method and apparatus for inputting information
US20190221208A1 (en) Method, user interface, and device for audio-based emoji input
CN111651497B (en) User tag mining method and device, storage medium and electronic equipment
CN110288995B (en) Interaction method and device based on voice recognition, storage medium and electronic equipment
KR20010034113A (en) Intelligent human/computer interface system
CN110827803A (en) Method, device and equipment for constructing dialect pronunciation dictionary and readable storage medium
CN116595148B (en) Method and system for realizing dialogue flow by using large language model
US20140358543A1 (en) Linked-work assistance apparatus, method and program
KR20200104544A (en) Method of real time intent recognition
CN108710653B (en) On-demand method, device and system for reading book
US20190341059A1 (en) Automatically identifying speakers in real-time through media processing with dialog understanding supported by ai techniques
CN116150339A (en) Dialogue method, dialogue device, dialogue equipment and dialogue storage medium
WO2021066399A1 (en) Realistic artificial intelligence-based voice assistant system using relationship setting
CN109582763B (en) Answering system and method in moving picture expert group media Internet of things environment
JPH11203295A (en) Information providing device and its method
WO2020199590A1 (en) Mood detection analysis method and related device
CN116629236A (en) Backlog extraction method, device, equipment and storage medium
KR102479026B1 (en) QUERY AND RESPONSE SYSTEM AND METHOD IN MPEG IoMT ENVIRONMENT
US20050288933A1 (en) Information input method and apparatus
US20060015340A1 (en) Operating system and method
CN110717020B (en) Voice question-answering method, device, computer equipment and storage medium
CN114860910A (en) Intelligent dialogue method and system
CN114528851A (en) Reply statement determination method and device, electronic equipment and storage medium
JP4808763B2 (en) Audio information collecting apparatus, method and program thereof
CN114297380A (en) Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant