CN112541070A - Method and device for excavating slot position updating corpus, electronic equipment and storage medium - Google Patents

Info

Publication number
CN112541070A
Authority
CN
China
Prior art keywords
slot
corpus
template
updating
update
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011559712.9A
Other languages
Chinese (zh)
Other versions
CN112541070B (en)
Inventor
于振龙
李和瀚
孙辉丰
孙叔琦
常月
李婷婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011559712.9A
Publication of CN112541070A
Application granted
Publication of CN112541070B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a method and a device for mining a slot update corpus, an electronic device and a storage medium, and relates to the field of computers, in particular to artificial intelligence technologies such as natural language processing and deep learning. The specific implementation scheme is as follows: acquiring query statements and the natural language recognition result corresponding to each query statement from a historical interaction log, wherein the natural language recognition result comprises the word slot name to which each word slot value in the query statement belongs; replacing the word slot value in each query statement with the word slot name to which the word slot value belongs to generate a reference template set; splitting each reference template in the reference template set to obtain a slot update template set; and traversing a preset corpus set based on the slot update template set to obtain, from the corpus set, slot update corpora that match slot update templates in the slot update template set. In the method, the whole mining process of the slot update corpus is automated and requires no manual participation, which greatly reduces labor cost.

Description

Method and device for excavating slot position updating corpus, electronic equipment and storage medium
Technical Field
The application relates to the field of computers, in particular to artificial intelligence technologies such as natural language processing and deep learning, and more specifically to a method and a device for mining a slot update corpus, an electronic device and a storage medium.
Background
In the man-machine conversation system, the quality and the quantity of the man-machine conversation training corpora determine the effect of man-machine interaction understanding to a great extent.
How to obtain the corpus from the interactive log with lower cost is an urgent problem to be solved.
Disclosure of Invention
The application provides a method and a device for mining a slot updating corpus, electronic equipment and a storage medium.
According to an aspect of the present application, there is provided a method for mining a slot update corpus, including:
acquiring all query sentences and natural language identification results corresponding to all the query sentences from a historical interactive log, wherein the natural language identification results comprise word slot names to which word slot values in the query sentences belong;
replacing the word slot value in each query statement with the word slot name to which the word slot value belongs to generate a reference template set;
splitting each reference template in the reference template set to obtain a slot position updating template set;
traversing a preset corpus based on the slot updating template set so as to obtain slot updating corpus matched with the slot updating template in the slot updating template set from the corpus.
According to another aspect of the present application, there is provided an apparatus for mining a slot update corpus, including:
a first obtaining module, configured to obtain query statements and the natural language recognition result corresponding to each query statement from a historical interaction log, wherein the natural language recognition result comprises the word slot name to which each word slot value in the query statement belongs;
a first generating module, configured to replace the word slot value in each query statement with the word slot name to which the word slot value belongs, so as to generate a reference template set;
a second obtaining module, configured to split each reference template in the reference template set to obtain a slot update template set;
and a third obtaining module, configured to traverse a preset corpus set based on the slot update template set, so as to obtain, from the corpus set, slot update corpora that match slot update templates in the slot update template set.
According to another aspect of the present application, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method for mining the slot update corpus according to the embodiment of the above aspect.
According to another aspect of the present application, there is provided a non-transitory computer-readable storage medium storing thereon a computer program for causing a computer to execute the mining method of slot update corpus according to the embodiment of the above aspect.
According to another aspect of the present application, there is provided a computer program product, including a computer program, where the computer program is executed by a processor to implement the method for mining a slot update corpus according to an embodiment of the foregoing aspect.
Other effects of the above optional implementations will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
Fig. 1 is a schematic flowchart of a mining method for slot update corpus according to an embodiment of the present application;
Fig. 2 is a schematic diagram of generating a reference template according to an embodiment of the present application;
Fig. 3 is a schematic diagram of obtaining slot update templates according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of another mining method for slot update corpus according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of another mining method for slot update corpus according to an embodiment of the present application;
Fig. 6 is a schematic diagram of obtaining slot update training data according to an embodiment of the present application;
Fig. 7 is a schematic flowchart of another mining method for slot update corpus according to an embodiment of the present application;
Fig. 8 is a schematic flowchart of another mining method for slot update corpus according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a mining apparatus for slot update corpus according to an embodiment of the present application;
Fig. 10 is a block diagram of an electronic device for implementing the mining method for slot update corpus according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The following describes a method, an apparatus, an electronic device, and a storage medium for mining a slot update corpus according to an embodiment of the present application with reference to the drawings.
Artificial intelligence is the discipline that studies how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking and planning), and it involves technologies at both the hardware level and the software level. Artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, deep learning, big data processing, knowledge graph technology and the like.
Natural language processing is an important direction in the fields of computer science and artificial intelligence, and the contents of natural language processing research include, but are not limited to, the following branch fields: text classification, information extraction, automatic summarization, intelligent question answering, topic recommendation, machine translation, subject word recognition, knowledge base construction, deep text representation, named entity recognition, text generation, text analysis (lexical, syntactic, grammatical, etc.), speech recognition and synthesis, and the like.
Deep learning is a newer research direction in the field of machine learning. Deep learning learns the intrinsic regularities and representation levels of sample data, and the information obtained during learning is very helpful for interpreting data such as text, images and sounds. Its ultimate goal is to enable machines to analyze and learn like humans, and to recognize data such as text, images and sounds.
Fig. 1 is a schematic flowchart of a mining method for slot update corpuses according to an embodiment of the present application.
The mining method for the slot update corpus in the embodiment of the present application can be executed by the apparatus provided in the embodiment of the present application, and the apparatus can be configured in an electronic device.
As shown in fig. 1, the mining method of the slot update corpus includes:
Step 101, obtaining query statements and the natural language recognition result corresponding to each query statement from a historical interaction log, wherein the natural language recognition result comprises the word slot name to which each word slot value in the query statement belongs.
In the historical logs of human-computer interaction, each log entry records the user's query statement, the dialogue system's entity recognition result for the query statement, the recognition result produced for the query statement by the dialogue system's natural language understanding module, and the like; the recognition results include word slot names, word slot values, and the like. Therefore, the query statements and the natural language recognition result corresponding to each query statement can be obtained from the historical interaction log, wherein the natural language recognition result can comprise the word slot name to which each word slot value in the query statement belongs.
A word slot is typically a query condition of an intention, such as the time and place in a weather intention, or the date and destination in a ticket-booking intention. Word slots may be used as conditions to manage the dialog logic.
For example, the query statement "what is the weather like in a certain place today" contains two word slots, whose word slot names are [place] and [time]; the word slot value of [place] is "a certain place" and the word slot value of [time] is "today".
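The log contents described for step 101 can be pictured as simple records. The following is a minimal sketch only: the class names `NluResult` and `LogRecord`, the helper `load_queries`, the field layout, and the sample values are all hypothetical and are not specified by the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NluResult:
    """Hypothetical natural language recognition result for one query."""
    intent: str
    # Maps word slot name -> word slot value, e.g. {"time": "today"}.
    slots: Dict[str, str] = field(default_factory=dict)


@dataclass
class LogRecord:
    """Hypothetical entry of the historical interaction log."""
    query: str
    nlu: NluResult


def load_queries(log: List[LogRecord]) -> List[Tuple[str, Dict[str, str]]]:
    """Return (query statement, slot name -> slot value) pairs from the log."""
    return [(record.query, record.nlu.slots) for record in log]


# Illustrative, made-up log entry.
log = [
    LogRecord(
        query="what is the weather like in Beijing today",
        nlu=NluResult(intent="weather", slots={"place": "Beijing", "time": "today"}),
    )
]
pairs = load_queries(log)
```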
Step 102, replacing the word slot value in each query statement with the word slot name to which the word slot value belongs to generate a reference template set.
In this embodiment, for each query statement, each word slot value in the query statement is replaced with the word slot name to which that word slot value belongs, so as to obtain the reference template corresponding to the query statement. The reference template set is then generated from the plurality of reference templates. Each reference template also carries the intention of its query statement.
Fig. 2 is a schematic diagram of generating a reference template according to an embodiment of the present application. In fig. 2, the query statement is "how are the hard seats from Beijing to Shanghai"; the word slot value of the word slot name [origin] is "Beijing", the word slot value of [destination] is "Shanghai", and the word slot value of [type] is "hard seats". Replacing each word slot value in the query statement with the word slot name to which it belongs generates the reference template "how [type] from [origin] to [destination]: check train tickets", where "check train tickets" is the intention of the query statement corresponding to the reference template.
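The replacement in step 102 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the application's implementation: the function name `build_reference_template` is hypothetical, the query is an English paraphrase of the fig. 2 example, and plain string replacement stands in for whatever span information the natural language understanding module actually provides.

```python
def build_reference_template(query, slots, intent):
    """Replace each word slot value in `query` with its bracketed slot name.

    `slots` maps word slot name -> word slot value (assumed NLU output).
    Returns the template text together with the intention it carries.
    """
    template = query
    # Replace longer values first so a short value cannot clobber part of a
    # longer one. str.replace substitutes every occurrence, which is a
    # simplification compared with using exact character spans.
    for name, value in sorted(slots.items(), key=lambda kv: -len(kv[1])):
        template = template.replace(value, f"[{name}]")
    return template, intent


query = "how is the hard seat from Beijing to Shanghai"
slots = {"origin": "Beijing", "destination": "Shanghai", "type": "hard seat"}
template, intent = build_reference_template(query, slots, "check train tickets")
print(template)  # how is the [type] from [origin] to [destination]
```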
Step 103, splitting each reference template in the reference template set to obtain a slot update template set.
In practical applications, a query statement input by the user may express only word slot information; because no user intention is explicitly expressed, the dialog state only updates the word slot. In this embodiment, each reference template in the reference template set may be split, and the slot update templates obtained by splitting all the reference templates form the slot update template set.
During splitting, word segmentation may be performed on the part of the reference template that corresponds to the query statement to obtain a plurality of words, and a plurality of slot update templates are obtained based on these words. Each resulting slot update template may include one or more word slot names.
It should be noted that, because the goal is to obtain slot update corpora, a slot update template is a fragment and does not form a complete sentence.
Fig. 3 is a schematic diagram of obtaining slot update templates according to an embodiment of the present application. In fig. 3, the reference template "how [type] from [origin] to [destination]: check train tickets" is split to obtain a plurality of slot update templates: [origin], [origin] to [origin], to [destination], to [type], [type] of [destination], and [type] of [origin].
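One possible way to split a reference template is sketched below. The application does not specify the splitting rule beyond word segmentation, so this sketch makes several assumptions: whitespace tokenization stands in for word segmentation, fragments are contiguous token spans, and the function name `split_reference_template` and its `max_slots` parameter are hypothetical.

```python
import re

SLOT = re.compile(r"\[[^\]]+\]")            # a bracketed word slot name
TOKEN = re.compile(r"\[[^\]]+\]|\S+")       # a slot name or a plain word


def split_reference_template(template, max_slots=None):
    """Return contiguous fragments of `template` that contain a slot name.

    The full template is skipped because a slot update template should not
    form a complete sentence. `max_slots=1` keeps only fragments with
    exactly one word slot name.
    """
    tokens = TOKEN.findall(template)
    fragments = []
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            span = tokens[start:end]
            if len(span) == len(tokens):
                continue  # the whole template is not a fragment
            n_slots = sum(1 for tok in span if SLOT.fullmatch(tok))
            if n_slots >= 1 and (max_slots is None or n_slots <= max_slots):
                fragments.append(" ".join(span))
    return fragments


reference = "how [type] from [origin] to [destination]"
frags = split_reference_template(reference, max_slots=1)
```

With `max_slots=1` the result includes fragments such as "to [destination]" and "how [type]", and excludes fragments such as "[origin] to [destination]" that carry two word slot names.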
Step 104, traversing a preset corpus set based on the slot update template set, so as to obtain, from the corpus set, slot update corpora that match slot update templates in the slot update template set.
In this embodiment, after the slot update template set is obtained, the slot update templates in the slot update template set may be used to obtain slot update corpora from the preset corpus set.
Specifically, each corpus in the corpus set may be matched against the slot update templates in the slot update template set; if the matching degree between a corpus and any slot update template is greater than a preset threshold, that corpus is a slot update corpus.
In the embodiment of the application, query statements and the natural language recognition result corresponding to each query statement are obtained from the historical interaction log, the natural language recognition result comprising the word slot name to which each word slot value in the query statement belongs. The word slot value in each query statement is replaced with the word slot name to which it belongs to generate the reference template set, each reference template in the reference template set is split to obtain the slot update template set, and the preset corpus set is traversed based on the slot update template set to obtain the slot update corpora that match slot update templates in the set. The whole mining process of the slot update corpus is thereby automated and requires no manual participation, which greatly reduces labor cost.
Because the whole mining process requires no manual participation, the method can be applied to large-scale real dialogue interaction logs. Moreover, as the scale of the real logs grows, the slot update templates accumulated by the method become more diverse, and the ability to handle long-tail data becomes stronger.
In order to improve the accuracy of mining the slot update corpus, in an embodiment of the application, the natural language recognition result may further include the intention corresponding to each query statement. When the slot update template set is obtained, each reference template is split into slot update templates that contain only one word slot name, and the slot update templates are then filtered based on intention. Fig. 4 is a schematic flowchart of another mining method for slot update corpus according to an embodiment of the present application, which is described below with reference to fig. 4.
As shown in fig. 4, the splitting each reference template in the reference template set to obtain the slot update template set includes:
step 401, splitting each reference template in the reference template set to obtain a plurality of slot update templates including only one word slot name.
In this embodiment, when each reference template is split, word segmentation may be performed on the part of the reference template that corresponds to the query statement to obtain a plurality of words. When slot update templates are then obtained based on these words, each slot update template contains only one word slot name. For example, the slot update templates in fig. 3 each contain one word slot name.
Step 402, removing the slot update templates that correspond only to a user intention, to obtain the slot update template set.
A slot update template that can be used to obtain slot update corpora must not correspond only to a user intention. On this basis, the intention of each slot update template can be determined and the slot update templates that correspond only to a user intention removed, so that none of the remaining slot update templates corresponds only to a user intention. That is, the slot update templates obtained by splitting are filtered so that the slot update template set contains no slot update template that corresponds only to a user intention.
In fig. 3, for example, the slot update template "how [type]" corresponds only to a user intention and is therefore filtered out.
In this embodiment, the slot update template corresponding to only the user intention is removed, so that the slot update template in the slot update template set does not include the slot update template corresponding to only the user intention, and the accuracy of the slot update corpus obtained from the corpus set based on the slot update template set is higher.
In the embodiment of the application, the natural language recognition result further includes an intention corresponding to each query statement, and when each reference template in the reference template set is split to obtain the slot updating template set, each reference template in the reference template set can be split to obtain a plurality of slot updating templates only including a word slot name, and the slot updating templates only corresponding to the user intention are removed to obtain the slot updating template set. Therefore, the plurality of slot updating templates only containing one word slot name are obtained, and the slot updating templates only corresponding to the intentions of the user are filtered, so that the accuracy of the slot updating templates is ensured, and the accuracy of the slot updating corpus obtained based on the slot updating templates is improved.
In order to further improve the accuracy of mining the slot update corpus, in an embodiment of the present application, when slot update templates are removed to obtain the slot update template set, whether a slot update template is itself a reference template in the reference template set may be used as a removal condition.
Specifically, the intention corresponding to a query statement may include at least one of a user intention and a slot update intention. For example, the intention corresponding to the query statement shown in fig. 2 includes the user intention "check train tickets" and also includes a slot update intention. As another example, the intention corresponding to the query statement "Shanghai, then" is only a slot update intention.
The slot update intention means that the query statement does not express an explicit user intention and only updates the word slot. That is, the slot update intention refers to an intention to update the word slot.
When slot update templates that correspond only to a user intention are removed, each slot update template can be matched against the reference templates in the reference template set. If a slot update template is identical to a reference template in the reference template set and the intention corresponding to that reference template is only a user intention, the slot update template is removed, so that every slot update template in the resulting set corresponds to a slot update intention.
For example, in fig. 3, the slot update template "how [type]" exists in the reference template set, and a lookup finds that its corresponding intention is a seat query; the slot update template "how [type]" is therefore removed.
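The removal condition described above can be sketched as a lookup against the reference template set. This is a minimal sketch under assumptions: the labels `"user"` and `"slot_update"`, the function name `filter_slot_update_templates`, and the mapping from reference template to the set of intention kinds observed for it are all hypothetical representations chosen for illustration.

```python
USER_INTENT = "user"
SLOT_UPDATE_INTENT = "slot_update"


def filter_slot_update_templates(fragments, reference_intents):
    """Drop fragments that are reference templates with only a user intention.

    `reference_intents` maps each reference template text to the set of
    intention kinds recorded for it (an assumed representation).
    """
    kept = []
    for fragment in fragments:
        kinds = reference_intents.get(fragment)
        if kinds is not None and kinds == {USER_INTENT}:
            continue  # a complete request such as a seat query, not a slot update
        kept.append(fragment)
    return kept


reference_intents = {
    "how [type] from [origin] to [destination]": {USER_INTENT, SLOT_UPDATE_INTENT},
    "how [type]": {USER_INTENT},            # e.g. a seat query
    "to [destination]": {SLOT_UPDATE_INTENT},
}
fragments = ["how [type]", "to [destination]", "from [origin]"]
kept = filter_slot_update_templates(fragments, reference_intents)
print(kept)  # ['to [destination]', 'from [origin]']
```

Fragments that never appear as reference templates are kept in this sketch, since the condition in the text only removes fragments that are found in the reference template set.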
In the embodiment of the application, the intention corresponding to each query statement includes at least one of a user intention and a slot update intention. When slot update templates that correspond only to a user intention are removed, a slot update template is removed if it is a reference template in the reference template set and the intention corresponding to that reference template is only a user intention. Using membership in the reference template set as an additional removal condition ensures that the slot update templates in the resulting slot update template set correspond to a slot update intention, which further improves the accuracy of the slot update templates and therefore the accuracy of mining the slot update corpus.
In an embodiment of the present application, when the slot update corpus is obtained from the corpus set by using the slot update template set, a candidate template may be generated according to each corpus, and the slot update corpus may be obtained by using the candidate template and the slot update template set. Fig. 5 is a schematic flow chart of another slot update corpus mining method according to an embodiment of the present application.
As shown in fig. 5, the mining method of the slot update corpus includes:
step 501, obtaining all query sentences and natural language identification results corresponding to all the query sentences from the historical interaction log, wherein the natural language identification results comprise the word slot names to which the word slot values in the query sentences belong.
Step 502, the word slot value in each query statement is replaced by the word slot name to which it belongs to generate a reference template set.
Step 503, splitting each reference template in the reference template set to obtain the slot updating template set.
In this embodiment, steps 501 to 503 are similar to steps 101 to 103, and therefore are not described herein again.
Step 504, each corpus in the corpus is analyzed to determine entities contained in each corpus and candidate word slot names corresponding to the entities.
In this embodiment, natural language understanding may be performed on each corpus in the corpus set to determine the entities contained in each corpus and the candidate word slot names corresponding to the entities. If the corpora in the corpus set come from interaction logs, the entities contained in a corpus can be obtained from the interaction log, because the log records the recognition result for the corpus, and the candidate word slot names are then determined from those entities.
Fig. 6 is a schematic diagram of obtaining slot update training data according to an embodiment of the present application. In fig. 6, the corpus is "to Beijing", and the entity recognition result is: the entity is "Beijing", and the candidate word slot names corresponding to the entity "Beijing" are [origin] and [destination].
Step 505, replacing the entities in each corpus with the corresponding candidate word slot names to generate candidate templates.
In this embodiment, for each corpus, each candidate word slot name may be used to replace the entity in the corpus, so as to generate a candidate template. If an entity in the corpus corresponds to multiple candidate word slot names, multiple candidate templates corresponding to the corpus can be generated.
For example, in fig. 6, the corpus is "to Beijing", and the candidate word slot names corresponding to the entity "Beijing" in the corpus are [origin] and [destination]. Replacing the entity "Beijing" with [origin] and [destination] respectively yields two candidate templates, "to [origin]" and "to [destination]".
Step 506, under the condition that the slot position updating template set comprises the slot position updating template matched with any candidate template, determining the corpus corresponding to any candidate template as the slot position updating corpus.
For each corpus, each of its candidate templates is matched against the slot update templates in the slot update template set. If the slot update template set contains a slot update template that matches any candidate template, the candidate template is covered by the slot update template set, and the corpus corresponding to that candidate template is determined to be a slot update corpus.
For example, in fig. 6, the two candidate templates "to [origin]" and "to [destination]" are matched against the slot update templates in the slot update template set. If the candidate template "to [destination]" is found in the slot update template set, the corpus "to Beijing" corresponding to "to [destination]" is taken as a slot update corpus.
In this embodiment, the slot update corpus can be obtained from the corpus set by using the candidate templates of the corpus and the slot update template set.
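Steps 504 to 506 can be sketched as follows. The sketch assumes exact string equality as the matching criterion and uses a small made-up lexicon in place of real entity recognition; the names `ENTITY_LEXICON`, `analyze`, `candidate_templates` and `mine_slot_update_corpus` are hypothetical.

```python
# Made-up lexicon standing in for entity recognition: entity -> candidate slot names.
ENTITY_LEXICON = {"Beijing": ["origin", "destination"]}


def analyze(corpus):
    """Return the entities found in `corpus` with their candidate slot names."""
    return {ent: names for ent, names in ENTITY_LEXICON.items() if ent in corpus}


def candidate_templates(corpus, entity_slots):
    """Yield one candidate template per (entity, candidate slot name) pair."""
    for entity, slot_names in entity_slots.items():
        for name in slot_names:
            yield corpus.replace(entity, f"[{name}]")


def mine_slot_update_corpus(corpora, slot_update_templates):
    """Return (corpus, matched template) pairs for corpora that match the set."""
    mined = []
    for corpus in corpora:
        for candidate in candidate_templates(corpus, analyze(corpus)):
            if candidate in slot_update_templates:  # exact match, an assumption
                mined.append((corpus, candidate))
                break  # one matching candidate is enough for this corpus
    return mined


templates = {"to [destination]", "from [origin]"}
mined = mine_slot_update_corpus(["to Beijing", "hello there"], templates)
print(mined)  # [('to Beijing', 'to [destination]')]
```

Keeping the slot update templates in a set makes each membership check constant time on average, which matters when the corpus set is a large interaction log.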
In the embodiment of the application, when traversing a preset corpus based on a slot updating template set to obtain slot updating corpora matched with the slot updating template in the slot updating template set from the corpus, each corpus in the corpus can be analyzed to determine entities contained in each corpus and candidate word slot names corresponding to the entities, the entities in each corpus are respectively replaced by the corresponding candidate word slot names to generate candidate templates, and under the condition that the slot updating template set contains a slot updating template matched with any candidate template, the corpus corresponding to any candidate template is determined to be the slot updating corpus. Therefore, the entities in the corpus are replaced by the candidate word slot names to generate the candidate templates, and if any candidate template corresponding to the corpus is contained in the slot updating template set, the corpus is used as the slot updating corpus, so that the slot updating corpus is obtained from the corpus set by using the slot updating template set.
In an embodiment of the present application, after the slot update corpus is obtained, it may be labeled to obtain slot update training data, and a slot update intention recognition model may be obtained by training on that data. Fig. 7 is a schematic flow chart of another slot update corpus mining method according to an embodiment of the present application.
As shown in fig. 7, after determining that the corpus corresponding to any candidate template is a slot update corpus, the method may further include:
Step 701, labeling the slot update corpus according to the slot update template matched with any candidate template, to obtain slot update training data.
In this embodiment, after determining that the corpus corresponding to any candidate template is the slot update corpus, the slot update corpus may be labeled according to an intention corresponding to the slot update template matched with any candidate template, a word slot name included in the slot update template, and the like, so as to obtain slot update training data. Specifically, the intention corresponding to the slot updating corpus may be labeled as a slot updating intention, and the word slot value in the slot updating corpus and the word slot name to which the word slot value belongs are labeled.
For example, in fig. 6, after determining that the candidate template "to [destination]" matches the slot update template "to [destination]", the corpus "to Beijing" may be labeled according to the slot update template "to [destination]", so as to obtain the slot update training data "to Beijing (intention: slot update; word slot: destination = Beijing)".
Step 702, the slot updating training data is utilized to train the initial intention recognition model so as to generate a slot updating intention recognition model.
In this embodiment, the corpus in the slot update training data may be input into the initial intention recognition model. The parameters of the initial intention recognition model are adjusted according to the differences between the predicted intention, predicted word slot name, predicted word slot value, and the like output by the model and the labeled intention, word slot name, and word slot value. Training then continues on the adjusted model until a slot update intention recognition model that meets the requirements is generated.
To improve model performance, the initial intention recognition model may be trained using deep learning.
In the embodiment of the application, after determining that the corpus corresponding to any candidate template is a slot update corpus, the slot update corpus can be labeled according to the slot update template matched with that candidate template to obtain slot update training data, and the initial intention recognition model is trained with the slot update training data to generate a slot update intention recognition model. Labeling the slot update corpus with the matched slot update template thus mines slot update training data automatically, without manual participation, which greatly reduces labor cost, and the labeled slot update corpus can then be used for model training to obtain a slot update intention recognition model.
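The labeling of step 701 can be sketched as below. This is only an illustration under stated assumptions: the record layout, the intent label `"slot_update"`, and the function name `label_corpus` are hypothetical, and word slot names are assumed to be identifier-like so that they can serve as regular-expression group names. Model training (step 702) is not shown.

```python
import re

# Matches a word slot placeholder such as "[destination]" in a template.
SLOT_PATTERN = re.compile(r"\[([A-Za-z_]\w*)\]")

def label_corpus(corpus, matched_template):
    """Label a corpus with a slot update intent and its word slot values."""
    # Convert the template into a regex: literal text is escaped,
    # and each [slot] becomes a named capture group.
    parts = []
    pos = 0
    for m in SLOT_PATTERN.finditer(matched_template):
        parts.append(re.escape(matched_template[pos:m.start()]))
        parts.append(f"(?P<{m.group(1)}>.+)")
        pos = m.end()
    parts.append(re.escape(matched_template[pos:]))
    match = re.fullmatch("".join(parts), corpus)
    if match is None:
        return None  # the corpus does not fit this template
    return {"text": corpus, "intent": "slot_update", "slots": match.groupdict()}

print(label_corpus("to Beijing", "to [destination]"))
```

For the fig. 6 example this produces a record whose intent is the slot update intent and whose word slot "destination" has the value "Beijing".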
In order to improve the efficiency of obtaining the slot update corpus, in an embodiment of the present application, when the slot update corpus matched with a slot update template in the slot update template set is obtained from the corpus set, a template tree corresponding to the slot update template set may be constructed, and the slot update corpus may be obtained from the corpus set by using the template tree. Fig. 8 is a schematic flow chart of another slot update corpus mining method according to an embodiment of the present application.
As shown in fig. 8, the traversing the preset corpus based on the slot update template set to obtain the slot update corpus matched with the slot update template in the slot update template set from the corpus includes:
Step 801, determining associated non-word slot names among the slot update templates in the slot update template set.
In this embodiment, each slot update template may be matched to determine the associated non-word slot name between each slot update template.
For example, in fig. 3, the non-word slot name associated between the slot update templates "[origin]" and "from [origin]" is "from"; the non-word slot name associated between "[origin]" and "[origin] to" is "to".
Step 802, constructing a template tree corresponding to the slot update template set according to the associated non-word slot names among the slot update templates.
In this embodiment, a template tree corresponding to the slot update template set may be constructed according to the associated non-word slot names between the slot update templates. And the non-word slot names and the word slot names are nodes in the template tree.
Step 803, traversing the preset corpus based on the template tree, and determining the slot update corpus included in the corpus according to the matching relation between each corpus in the corpus and the nodes in the template tree.
In this embodiment, each corpus may be analyzed, an entity of each corpus and a candidate word slot name corresponding to the entity are determined, the entity in each corpus is replaced with the corresponding candidate word slot name, so as to generate a candidate template, and the obtained candidate template is represented by a regular expression. After the regular expression of the candidate template corresponding to the corpus is obtained, the regular expression can be fuzzy matched in the template tree, if the regular expression is matched, the information of the matched leaf node is taken out, a candidate template is obtained, and the corpus corresponding to the candidate template is the slot position updating corpus.
For example, in fig. 6, the two candidate templates "to [origin]" and "to [destination]" generated from the corpus "to Beijing" are jointly represented as the regular expression "to ([origin] | [destination])". This regular expression is fuzzy matched in the template tree; it matches a leaf node, and the information of the matched leaf node is taken out to obtain the template "to [destination]". The corpus "to Beijing" is therefore a slot update corpus.
In the embodiment of the application, when traversing the preset corpus is performed based on the slot updating template set to obtain the slot updating corpus matched with the slot updating template in the slot updating template set from the corpus, the associated non-word slot name between each slot updating template in the slot updating template set can be determined, the template tree corresponding to the slot updating template set is constructed according to the associated non-word slot name between each slot updating template, and the preset corpus is traversed based on the template tree to determine the slot updating corpus included in the corpus according to the matching relationship between each corpus in the corpus and the node in the template tree. Therefore, the template tree corresponding to the slot updating template set is constructed, the corpus set is traversed by the template tree, the slot updating corpus is obtained from the corpus set, and the efficiency of obtaining the slot updating corpus is improved.
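One way to picture steps 801 to 803 is a prefix tree whose nodes are non-word slot names and word slot names. The sketch below is an assumption-laden simplification: it builds a plain token trie rather than deriving the tree from associated non-word slot names, and it represents a corpus's candidate templates as a list of per-position alternatives instead of a regular expression. The function names and the `"$template"` end marker are hypothetical.

```python
def build_template_tree(templates):
    """Build a trie whose nodes are non-word slot names and word slot names."""
    root = {}
    for template in templates:
        node = root
        for token in template.split():
            node = node.setdefault(token, {})
        node["$template"] = template  # end marker storing the full template
    return root

def match_in_tree(tree, positions):
    """Find a template consistent with the allowed tokens at every position.

    positions: list of sets, one set of allowed tokens per corpus position.
    Returns the matched template, or None when no path fits.
    """
    def walk(node, i):
        if i == len(positions):
            return node.get("$template")
        for token in sorted(positions[i]):
            child = node.get(token)
            if child is not None:
                found = walk(child, i + 1)
                if found is not None:
                    return found
        return None
    return walk(tree, 0)

tree = build_template_tree(["[origin]", "from [origin]", "[origin] to", "to [destination]"])
# "to Beijing": the word "to", then an entity that may be an origin or a destination.
print(match_in_tree(tree, [{"to"}, {"[origin]", "[destination]"}]))
```

Because all candidate templates of a corpus are checked in a single walk down the tree, the corpus does not need to be compared against every slot update template one by one.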
In an embodiment of the present application, when determining the slot update corpus included in the corpus set according to the matching relationship between each corpus in the corpus set and the nodes in the template tree, either of the following approaches may be used.
As a possible implementation manner, each corpus is matched with a node in the template tree, and if a non-word slot name in any corpus is matched with a first node in the template tree and a candidate word slot name corresponding to an entity behind the non-word slot name contains a word slot name sub-node of the first node, any corpus is determined as a slot position updating corpus.
For example, the non-word slot name "to" in the corpus "to Beijing" matches the first node "to" in the template tree, and the candidate word slot names "origin" and "destination" corresponding to the entity "Beijing" that follows the non-word slot name "to" include "destination", which is a word slot name child node of the first node "to". The corpus is therefore determined to be a slot update corpus.
In this way, it can be determined whether a corpus in which the non-word-slot word comes before the entity is a slot update corpus.
Or, as another possible implementation manner, each corpus is matched with a node in the template tree, and if a candidate word slot name corresponding to an entity in any corpus is matched with a second node in the template tree and a non-word slot name located behind the entity is matched with a child node of the second node, it is determined that any corpus is a slot position update corpus.
For example, in the corpus "Shanghai to", the entity "Shanghai" has the candidate word slot names "origin" and "destination". The candidate word slot name "origin" matches the second node "origin" in the template tree, and the non-word slot name "to" that follows the entity "Shanghai" matches the child node "to" of the second node "origin". The corpus is therefore determined to be a slot update corpus.
In this way, it can be determined whether a corpus in which the entity comes before the non-word-slot word is a slot update corpus.
In the embodiment of the application, when the preset corpus is traversed based on the template tree to determine the slot update corpus included in the corpus according to the matching relation between each corpus and the nodes in the template tree, either of the two approaches above can be used. The slot update corpus is thus obtained from the corpus set based on the template tree, which improves the efficiency of obtaining the slot update corpus.
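The two matching approaches can be sketched as two small predicates over a template tree. The nested-dictionary tree `TEMPLATE_TREE` and the function names are hypothetical, chosen only to mirror the "to Beijing" and "Shanghai to" examples.

```python
# Hypothetical template tree: each key is a node, each value holds its child nodes.
TEMPLATE_TREE = {
    "to": {"[destination]": {}},
    "from": {"[origin]": {}},
    "[origin]": {"to": {}},
}

def matches_word_then_entity(tree, word, candidate_slots):
    """First approach: the non-word-slot word matches a first node, and one of the
    entity's candidate word slot names is a word slot name child of that node."""
    children = tree.get(word, {})
    return any(f"[{slot}]" in children for slot in candidate_slots)

def matches_entity_then_word(tree, candidate_slots, word):
    """Second approach: a candidate word slot name matches a second node, and the
    non-word-slot word after the entity is a child of that node."""
    return any(word in tree.get(f"[{slot}]", {}) for slot in candidate_slots)

# "to Beijing": word "to", then entity "Beijing" (origin or destination).
print(matches_word_then_entity(TEMPLATE_TREE, "to", ["origin", "destination"]))
# "Shanghai to": entity "Shanghai" (origin or destination), then word "to".
print(matches_entity_then_word(TEMPLATE_TREE, ["origin", "destination"], "to"))
```

Both calls return `True` for the example tree, while a corpus such as "to" followed by an entity that can only be an origin would not match the first approach.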
In order to implement the foregoing embodiment, an embodiment of the present application further provides a mining device for slot update corpus. Fig. 9 is a schematic structural diagram of a mining device for slot update corpus according to an embodiment of the present application.
As shown in fig. 9, the mining device 900 for the slot update corpus includes: a first obtaining module 910, a first generating module 920, a second obtaining module 930, and a third obtaining module 940.
A first obtaining module 910, configured to obtain, from a history interaction log, each query statement and a natural language identification result corresponding to each query statement, where the natural language identification result includes a word slot name to which a word slot value in the query statement belongs;
a first generating module 920, configured to replace the word slot value in each query statement with the word slot name to which the word slot value belongs, so as to generate a reference template set;
a second obtaining module 930, configured to split each reference template in the reference template set to obtain a slot update template set;
a third obtaining module 940, configured to traverse the preset corpus based on the slot update template set, so as to obtain a slot update corpus matched with the slot update template in the slot update template set from the corpus.
In a possible implementation manner of this embodiment of the application, the natural language recognition result further includes an intention corresponding to each query statement, and the second obtaining module 930 includes:
the first acquisition unit is used for splitting each reference template in the reference template set to acquire a plurality of slot position updating templates only containing one word slot name;
and the second acquisition unit is used for removing the slot position updating template only corresponding to the user intention so as to acquire the slot position updating template set.
In a possible implementation manner of the embodiment of the present application, the intention corresponding to each query statement includes at least one of a user intention and a slot update intention, and the second obtaining unit is configured to:
and under the condition that any slot position updating template is one reference template in the reference template set and the corresponding intention of the reference template is only the intention of the user, removing any slot position updating template.
In a possible implementation manner of this embodiment of the application, the third obtaining module 940 includes:
the first determining unit is used for analyzing each corpus in the corpus set to determine entities contained in each corpus and candidate word slot names corresponding to the entities;
the generating unit is used for replacing the entity in each corpus with the corresponding candidate word slot name respectively to generate a candidate template;
and the second determining unit is used for determining the corpus corresponding to any candidate template as the slot position updating corpus under the condition that the slot position updating template set contains the slot position updating template matched with any candidate template.
In a possible implementation manner of the embodiment of the present application, the apparatus further includes:
the fourth acquisition module is used for marking the slot updating corpus according to the slot updating template matched with any candidate template so as to acquire slot updating training data;
and the second generation module is used for utilizing the slot position updating training data to train the initial intention recognition model so as to generate a slot position updating intention recognition model.
In a possible implementation manner of this embodiment of the application, the third obtaining module 940 includes:
a third determining unit, configured to determine a non-word slot name associated between each slot update template in the slot update template set;
the construction unit is used for constructing a template tree corresponding to the slot position updating template set according to the non-word slot names associated among the slot position updating templates;
and the fourth determining unit is used for traversing the preset corpus set based on the template tree so as to determine the slot position updating corpus included in the corpus set according to the matching relation between each corpus in the corpus set and the node in the template tree.
In a possible implementation manner of the embodiment of the present application, the fourth determining unit is configured to:
under the condition that a non-word slot name in any corpus is matched with a first node in a template tree and a candidate word slot name corresponding to an entity behind the non-word slot name contains a word slot name sub-node of the first node, determining any corpus as a slot position updating corpus; or,
and under the condition that a candidate word slot name corresponding to an entity in any corpus is matched with a second node in the template tree and a non-word slot name behind the entity is matched with a child node of the second node, determining any corpus as a slot position updating corpus.
It should be noted that the explanation of the embodiment of the method for mining the slot update corpus is also applicable to the mining apparatus of the slot update corpus of this embodiment, and therefore, the details are not repeated herein.
The slot update corpus mining device according to the embodiment of the application obtains each query statement and the natural language identification result corresponding to each query statement from a history interaction log, where the natural language identification result includes the word slot name to which a word slot value in the query statement belongs. The word slot value in each query statement is replaced with the word slot name to which it belongs to generate a reference template set, and each reference template in the reference template set is split to obtain a slot update template set. A preset corpus is then traversed based on the slot update template set to obtain, from the corpus, the slot update corpus matched with a slot update template in the slot update template set. Because the reference templates, the slot update template set, and the slot update corpus are all derived in this way, the whole mining process of the slot update corpus is automated, no manual participation is needed, and labor cost is greatly reduced.
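The first three modules can be pictured with the sketch below. It is a rough illustration under explicit assumptions: the log record layout `(query, slots, intents)`, the intent labels `"user"` and `"slot_update"`, whitespace tokenization, and in particular the splitting rule (each word slot alone, with its preceding word, and with its following word) are all hypothetical and chosen only to reproduce templates such as "[origin]", "from [origin]", and "[origin] to".

```python
import re

def reference_template(query, slots):
    """Replace each word slot value in the query with its word slot name."""
    template = query
    for name, value in slots.items():
        template = template.replace(value, f"[{name}]")
    return template

def split_reference_template(template):
    """Split a reference template into pieces that contain exactly one word slot.

    Assumed rule: the slot alone, with its preceding word, and with its following word.
    """
    tokens = template.split()
    pieces = set()
    for i, tok in enumerate(tokens):
        if not re.fullmatch(r"\[\w+\]", tok):
            continue  # only word slot placeholders start a piece
        pieces.add(tok)
        if i > 0 and not tokens[i - 1].startswith("["):
            pieces.add(f"{tokens[i - 1]} {tok}")
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("["):
            pieces.add(f"{tok} {tokens[i + 1]}")
    return pieces

def build_slot_update_templates(log):
    """log: iterable of (query, slots, intents) records from the interaction history."""
    references = {}
    for query, slots, intents in log:
        references.setdefault(reference_template(query, slots), set()).update(intents)
    pieces = set()
    for template in references:
        pieces |= split_reference_template(template)
    # Remove pieces that are themselves reference templates with only a user intent.
    return {t for t in pieces if not (t in references and references[t] == {"user"})}

log = [
    ("from Shanghai to Beijing", {"origin": "Shanghai", "destination": "Beijing"}, {"user"}),
    ("Beijing", {"destination": "Beijing"}, {"user"}),
]
print(sorted(build_slot_update_templates(log)))
```

The resulting set can then be handed to the corpus traversal performed by the third obtaining module.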
According to embodiments of the present application, there is also provided an electronic device, a readable storage medium, and a computer program product.
FIG. 10 shows a schematic block diagram of an example electronic device 1000 that may be used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 10, the device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes in accordance with a computer program stored in a ROM (Read-Only Memory) 1002 or a computer program loaded from a storage unit 1008 into a RAM (Random Access Memory) 1003. In the RAM 1003, various programs and data necessary for the operation of the device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An I/O (Input/Output) interface 1005 is also connected to the bus 1004.
A number of components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and a communication unit 1009 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1009 allows the device 1000 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1001 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), various dedicated AI (Artificial Intelligence) computing chips, various computing units running machine learning model algorithms, a DSP (Digital Signal Processor), and any suitable processor, controller, microcontroller, and the like. The computing unit 1001 executes the methods and processes described above, such as the mining method of the slot update corpus. For example, in some embodiments, the mining method of the slot update corpus may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the above-described mining method of the slot update corpus may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the mining method of the slot update corpus in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, FPGAs (Field Programmable Gate Arrays), ASICs (Application-Specific Integrated Circuits), ASSPs (Application-Specific Standard Products), SOCs (Systems On a Chip), CPLDs (Complex Programmable Logic Devices), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this application, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an EPROM (Erasable Programmable Read-Only Memory) or flash memory, an optical fiber, a CD-ROM (Compact Disc Read-Only Memory), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: LAN (Local Area Network), WAN (Wide Area Network), internet, and blockchain Network.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server may be a cloud Server, which is also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service expansibility in a conventional physical host and a VPS (Virtual Private Server). The server may also be a server of a distributed system, or a server incorporating a blockchain.
According to an embodiment of the present application, there is also provided a computer program product; when instructions in the computer program product are executed by a processor, the mining method for the slot update corpus provided in the foregoing embodiments of the present application is performed.
The technical scheme of the embodiment of the application relates to artificial intelligence fields such as natural language processing and deep learning. A reference template is obtained by replacing a word slot value in a query statement with the word slot name to which the word slot value belongs, the reference template is split to obtain a slot update template set, and slot update corpus is obtained from the corpus set by using the slot update template set. The whole mining process of the slot update corpus is thereby automated, no manual participation is needed, and labor cost is greatly reduced.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (17)

1. A method for mining a slot updating corpus comprises the following steps:
acquiring all query sentences and natural language identification results corresponding to all the query sentences from a historical interactive log, wherein the natural language identification results comprise word slot names to which word slot values in the query sentences belong;
replacing the word slot value in each query statement with the word slot name to which the word slot value belongs to generate a reference template set;
splitting each reference template in the reference template set to obtain a slot position updating template set;
traversing a preset corpus based on the slot updating template set so as to obtain slot updating corpus matched with the slot updating template in the slot updating template set from the corpus.
2. The method of claim 1, wherein the natural language recognition result further includes an intent corresponding to each query statement, and splitting each reference template in the set of reference templates to obtain a set of slot update templates includes:
splitting each reference template in the reference template set to obtain a plurality of slot position updating templates only containing one word slot name;
and removing the slot updating template only corresponding to the user intention so as to obtain the slot updating template set.
3. The method of claim 2, wherein the intent corresponding to each of the query statements includes at least one of a user intent and a slot update intent, the removing slot update templates corresponding only to the user intent comprising:
and under the condition that any slot position updating template is one reference template in the reference template set and the corresponding intention of the reference template is only the intention of a user, removing any slot position updating template.
4. The method of claim 1, wherein traversing a preset corpus based on the slot update template set to obtain a slot update corpus from the corpus that matches a slot update template in the slot update template set comprises:
analyzing each corpus in the corpus set to determine an entity contained in each corpus and a candidate word slot name corresponding to the entity;
respectively replacing the entities in each corpus with the corresponding candidate word slot names to generate candidate templates;
and under the condition that the slot updating template set comprises a slot updating template matched with any candidate template, determining the corpus corresponding to any candidate template as a slot updating corpus.
5. The method according to claim 4, wherein after determining that the corpus corresponding to any one of the candidate templates is a slot update corpus, the method further comprises:
marking the slot updating corpus according to the slot updating template matched with any candidate template to acquire slot updating training data;
and training the initial intention recognition model by utilizing the slot position updating training data to generate a slot position updating intention recognition model.
6. The method according to any one of claims 1-5, wherein traversing the preset corpus set based on the slot update template set, to obtain from the corpus set the slot update corpora matching slot update templates in the slot update template set, comprises:
determining the non-word-slot names that are associated among the slot update templates in the slot update template set;
constructing a template tree corresponding to the slot update template set according to the associated non-word-slot names; and
traversing the preset corpus set based on the template tree, to determine the slot update corpora included in the corpus set according to the matching relationship between each corpus in the corpus set and the nodes in the template tree.
7. The method according to claim 6, wherein determining the slot update corpora included in the corpus set according to the matching relationship between each corpus in the corpus set and the nodes in the template tree comprises:
in a case where a non-word-slot name in any corpus matches a first node in the template tree, and the candidate word slot names corresponding to the entity following that non-word-slot name include a word-slot-name child node of the first node, determining that corpus to be a slot update corpus; or
in a case where a candidate word slot name corresponding to an entity in any corpus matches a second node in the template tree, and the non-word-slot name following that entity matches a child node of the second node, determining that corpus to be a slot update corpus.
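The tree-based matching of claims 6 and 7 can be illustrated with a small Python sketch. It assumes, purely for illustration, that every slot update template consists of exactly one piece of non-word-slot text and one bracketed word slot name, so the template tree has only two levels, and that each corpus has already been analysed into a list whose items are either literal text (`str`) or a set of candidate word slot names for an entity. The names `tokenize`, `build_tree`, and `is_slot_update` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed notation: a word slot name is written in brackets, e.g. "[city]".
SLOT = re.compile(r"(\[[^\[\]]+\])")

def tokenize(template):
    """Split a template into literal text and slot-name tokens."""
    return [p.strip() for p in SLOT.split(template) if p.strip()]

def build_tree(templates):
    """Map each first token (parent node) to the set of tokens that follow it."""
    tree = defaultdict(set)
    for template in templates:
        tokens = tokenize(template)
        if len(tokens) == 2:  # simplification: two-token templates only
            tree[tokens[0]].add(tokens[1])
    return tree

def is_slot_update(items, tree):
    """Check the two conditions of claim 7 on every adjacent pair of items."""
    for head, tail in zip(items, items[1:]):
        if isinstance(head, str) and not isinstance(tail, str):
            # Literal text matches a node; the entity's candidates hit a child.
            if tree.get(head, set()) & tail:
                return True
        elif not isinstance(head, str) and isinstance(tail, str):
            # A candidate slot name matches a node; the literal text is a child.
            if any(tail in tree.get(name, set()) for name in head):
                return True
    return False

tree = build_tree({"on [date]", "what about [city]", "[city] instead"})
```

Sharing parent nodes means a literal prefix such as `on` is looked up once, however many templates begin with it. Checking adjacent pairs anywhere in the sentence is a simplification of this sketch, not a statement of how the claimed traversal proceeds.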
8. An apparatus for mining slot update corpora, comprising:
a first obtaining module, configured to obtain, from a historical interaction log, query statements and the natural language identification result corresponding to each query statement, wherein the natural language identification result includes the word slot name to which each word slot value in the query statement belongs;
a first generation module, configured to replace the word slot values in each query statement with the word slot names to which they belong, to generate a reference template set;
a second obtaining module, configured to split each reference template in the reference template set, to obtain a slot update template set; and
a third obtaining module, configured to traverse a preset corpus set based on the slot update template set, to obtain from the corpus set the slot update corpora matching slot update templates in the slot update template set.
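The work of the first generation module, replacing word slot values with word slot names, can be sketched in Python as follows. The log format (a query string paired with a value-to-name mapping), the bracket notation for slot names, and the function names `to_reference_template` and `build_reference_templates` are all assumptions made for illustration.

```python
import re

def to_reference_template(query, slots):
    """Replace every word slot value in the query with its bracketed slot name.

    `slots` maps a slot value found in the query to its word slot name.
    """
    if not slots:
        return query
    # Longest values first, so a short value cannot match inside a longer one.
    values = sorted(slots, key=len, reverse=True)
    pattern = "|".join(re.escape(v) for v in values)
    # A single pass over the query avoids re-replacing inserted slot names.
    return re.sub(pattern, lambda m: f"[{slots[m.group(0)]}]", query)

def build_reference_templates(log):
    """Collapse (query, slots) pairs from an interaction log into a template set."""
    return {to_reference_template(query, slots) for query, slots in log}

log = [
    ("fly to Paris", {"Paris": "city"}),
    ("fly to Rome", {"Rome": "city"}),
    ("book a flight to Paris on Friday", {"Paris": "city", "Friday": "date"}),
]
templates = build_reference_templates(log)
```

Because the result is a set, queries that differ only in their slot values collapse into one reference template.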
9. The apparatus according to claim 8, wherein the natural language identification result further includes the intention corresponding to each query statement, and the second obtaining module comprises:
a first obtaining unit, configured to split each reference template in the reference template set, to obtain a plurality of slot update templates each including only one word slot name; and
a second obtaining unit, configured to remove the slot update templates that correspond only to a user intention, to obtain the slot update template set.
10. The apparatus according to claim 9, wherein the intention corresponding to each query statement includes at least one of a user intention and a slot update intention, and the second obtaining unit is configured to:
in a case where any slot update template is itself a reference template in the reference template set and the intention corresponding to that reference template is only a user intention, remove that slot update template.
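The split-then-filter behaviour of claims 9 and 10 can be sketched in Python. The claims do not spell out the splitting rule, so the rule used here is a hypothetical one: each piece runs from the end of the previous slot up to and including the next slot, and trailing text stays with the last slot. The intention labels `user` and `slot_update`, the bracket notation, and the function names are likewise assumptions.

```python
import re

# Assumed notation: a word slot name is written in brackets, e.g. "[city]".
SLOT = re.compile(r"\[[^\[\]]+\]")

def split_template(reference):
    """Split a reference template into pieces holding exactly one slot name each."""
    pieces, start = [], 0
    matches = list(SLOT.finditer(reference))
    for i, m in enumerate(matches):
        # The last piece also takes any text that follows the final slot.
        end = len(reference) if i == len(matches) - 1 else m.end()
        pieces.append(reference[start:end].strip())
        start = end
    return pieces

def build_slot_update_set(reference_intents):
    """Split all reference templates, then drop user-intention-only pieces.

    `reference_intents` maps each reference template to its set of intentions.
    """
    candidates = set()
    for reference in reference_intents:
        candidates.update(split_template(reference))
    return {
        t for t in candidates
        # Claim 10: remove a piece that is itself a reference template whose
        # only intention is the user intention.
        if not (t in reference_intents and reference_intents[t] == {"user"})
    }

refs = {
    "book a flight to [city] on [date]": {"user"},
    "book a flight to [city]": {"user"},
    "on [date]": {"slot_update"},
}
slot_update_templates = build_slot_update_set(refs)
```

Here `book a flight to [city]` is dropped because it appears in the log as a complete request carrying only a user intention, while `on [date]` is kept.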
11. The apparatus according to claim 8, wherein the third obtaining module comprises:
a first determining unit, configured to analyze each corpus in the corpus set, to determine the entities included in each corpus and the candidate word slot names corresponding to the entities;
a generating unit, configured to replace the entities in each corpus with the corresponding candidate word slot names, to generate candidate templates; and
a second determining unit, configured to determine, in a case where the slot update template set includes a slot update template matching any one of the candidate templates, that the corpus corresponding to that candidate template is a slot update corpus.
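The three units of claim 11 can be sketched in Python under simplifying assumptions: entity analysis is reduced to a lookup in a hypothetical lexicon that maps an entity string to its candidate word slot names, slot names are written in brackets, and "matching" means exact string equality. The names `candidate_templates` and `mine` are illustrative only.

```python
import re
from itertools import product

def candidate_templates(sentence, lexicon):
    """Generate every template obtained by replacing entities with slot names."""
    if not lexicon:
        return {sentence}
    entities = sorted(lexicon, key=len, reverse=True)  # prefer longer entities
    pattern = re.compile("|".join(re.escape(e) for e in entities))
    matches = list(pattern.finditer(sentence))
    templates = set()
    # One combination per choice of candidate slot name for each entity.
    for names in product(*(lexicon[m.group(0)] for m in matches)):
        out, pos = [], 0
        for m, name in zip(matches, names):
            out.append(sentence[pos:m.start()])
            out.append(f"[{name}]")
            pos = m.end()
        out.append(sentence[pos:])
        templates.add("".join(out))
    return templates

def mine(corpus, lexicon, slot_update_templates):
    """Keep each sentence with at least one candidate template in the set."""
    mined = []
    for sentence in corpus:
        hits = candidate_templates(sentence, lexicon) & slot_update_templates
        if hits:
            mined.append((sentence, sorted(hits)))
    return mined

lexicon = {"Monday": ["date"], "Paris": ["city", "person"]}
known = {"on [date]", "what about [city]"}
result = mine(["on Monday", "what about Paris", "hello there"], lexicon, known)
```

An ambiguous entity such as `Paris` yields several candidate templates, and the sentence is kept if any one of them is found in the slot update template set.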
12. The apparatus according to claim 11, further comprising:
a fourth obtaining module, configured to label the slot update corpus according to the slot update template matched with the candidate template, to obtain slot update training data; and
a second generation module, configured to train an initial intention recognition model using the slot update training data, to generate a slot update intention recognition model.
13. The apparatus according to any one of claims 8-12, wherein the third obtaining module comprises:
a third determining unit, configured to determine the non-word-slot names that are associated among the slot update templates in the slot update template set;
a construction unit, configured to construct a template tree corresponding to the slot update template set according to the associated non-word-slot names; and
a fourth determining unit, configured to traverse the preset corpus set based on the template tree, to determine the slot update corpora included in the corpus set according to the matching relationship between each corpus in the corpus set and the nodes in the template tree.
14. The apparatus according to claim 13, wherein the fourth determining unit is configured to:
in a case where a non-word-slot name in any corpus matches a first node in the template tree, and the candidate word slot names corresponding to the entity following that non-word-slot name include a word-slot-name child node of the first node, determine that corpus to be a slot update corpus; or
in a case where a candidate word slot name corresponding to an entity in any corpus matches a second node in the template tree, and the non-word-slot name following that entity matches a child node of the second node, determine that corpus to be a slot update corpus.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method for mining slot update corpora according to any one of claims 1-7.
16. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions cause a computer to perform the method for mining slot update corpora according to any one of claims 1-7.
17. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method for mining slot update corpora according to any one of claims 1-7.
CN202011559712.9A 2020-12-25 2020-12-25 Mining method and device for slot updating corpus, electronic equipment and storage medium Active CN112541070B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011559712.9A CN112541070B (en) 2020-12-25 2020-12-25 Mining method and device for slot updating corpus, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112541070A (en) 2021-03-23
CN112541070B (en) 2024-03-22

Family

ID=75018111

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011559712.9A Active CN112541070B (en) 2020-12-25 2020-12-25 Mining method and device for slot updating corpus, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112541070B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016197227A (en) * 2015-04-02 2016-11-24 パナソニックIpマネジメント株式会社 Interaction method, interaction program, and interaction system
KR20160141682A (en) * 2016-08-03 2016-12-09 라인 가부시키가이샤 Apparatus for providing service based messenger and method using the same
US20190065507A1 (en) * 2017-08-22 2019-02-28 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for information processing
CN109446307A (en) * 2018-10-16 2019-03-08 浪潮软件股份有限公司 A kind of method for realizing dialogue management in Intelligent dialogue
WO2020052405A1 (en) * 2018-09-10 2020-03-19 腾讯科技(深圳)有限公司 Corpus annotation set generation method and apparatus, electronic device, and storage medium
CN111966781A (en) * 2020-06-28 2020-11-20 北京百度网讯科技有限公司 Data query interaction method and device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
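张俊驰; 胡婕; 刘梦赤: "Paraphrase-based Chinese natural language interface", 计算机应用, no. 05, 10 May 2016 (2016-05-10) *
盖森; 刘建忠; 熊伟; 张心悦; 李江鹏: "A rule-matching model applying edit distance in natural language spatial queries", 测绘科学技术学报, no. 04, 15 August 2015 (2015-08-15) *
陈睿: "Research on key technologies of graph-structure-based intelligent voice interaction", 中国优秀硕士学位论文全文数据库, no. 7, 15 July 2020 (2020-07-15) *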

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111160A (en) * 2021-04-22 2021-07-13 中国平安人寿保险股份有限公司 Synonym matching method, device, equipment and storage medium
CN113553843A (en) * 2021-06-24 2021-10-26 青岛海尔科技有限公司 Skill creation method and device
CN113553843B (en) * 2021-06-24 2023-12-19 青岛海尔科技有限公司 Skill creation method and device
CN113626468A (en) * 2021-08-12 2021-11-09 平安科技(深圳)有限公司 SQL statement generation method, device, equipment and storage medium based on artificial intelligence
WO2023015841A1 (en) * 2021-08-12 2023-02-16 平安科技(深圳)有限公司 Sql statement generation method, apparatus, and device based on artificial intelligence, and storage medium
CN113626468B (en) * 2021-08-12 2024-03-01 平安科技(深圳)有限公司 SQL sentence generation method, device and equipment based on artificial intelligence and storage medium
CN114298001A (en) * 2021-11-29 2022-04-08 腾讯科技(深圳)有限公司 Corpus template generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112541070B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
CN110765759B (en) Intention recognition method and device
CN112541070B (en) Mining method and device for slot updating corpus, electronic equipment and storage medium
CN113220836A (en) Training method and device of sequence labeling model, electronic equipment and storage medium
US20220358292A1 (en) Method and apparatus for recognizing entity, electronic device and storage medium
CN110555205A (en) Negative semantic recognition method and device, electronic equipment and storage medium
CN114548110A (en) Semantic understanding method and device, electronic equipment and storage medium
CN114281968B (en) Model training and corpus generation method, device, equipment and storage medium
CN113553412A (en) Question and answer processing method and device, electronic equipment and storage medium
CN113722493A (en) Data processing method, device, storage medium and program product for text classification
CN113836925A (en) Training method and device for pre-training language model, electronic equipment and storage medium
CN112466289A (en) Voice instruction recognition method and device, voice equipment and storage medium
CN115688920A (en) Knowledge extraction method, model training method, device, equipment and medium
CN114579104A (en) Data analysis scene generation method, device, equipment and storage medium
CN112784591A (en) Data processing method and device, electronic equipment and storage medium
CN111861596A (en) Text classification method and device
CN112466277A (en) Rhythm model training method and device, electronic equipment and storage medium
CN112560425A (en) Template generation method and device, electronic equipment and storage medium
CN114722159B (en) Multi-source heterogeneous data processing method and system for numerical control machine tool manufacturing resources
CN114141236B (en) Language model updating method and device, electronic equipment and storage medium
CN115600592A (en) Method, device, equipment and medium for extracting key information of text content
CN115292506A (en) Knowledge graph ontology construction method and device applied to office field
CN114781386A (en) Method and device for acquiring text error correction training corpus and electronic equipment
CN114416941A (en) Generation method and device of dialogue knowledge point determination model fusing knowledge graph
CN114416990A (en) Object relationship network construction method and device and electronic equipment
CN113641724A (en) Knowledge tag mining method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant