CN111552821A - Legal intention searching method, legal intention searching device and electronic equipment - Google Patents


Info

Publication number
CN111552821A
Authority
CN
China
Prior art keywords
legal
intention
intent
training
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010407792.XA
Other languages
Chinese (zh)
Other versions
CN111552821B (en)
Inventor
李东海
黄晓宏
张斌琦
李喻瑜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huayu Yuandian Information Services Co ltd
Original Assignee
Beijing Huayu Yuandian Information Services Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huayu Yuandian Information Services Co ltd filed Critical Beijing Huayu Yuandian Information Services Co ltd
Priority to CN202010407792.XA priority Critical patent/CN111552821B/en
Publication of CN111552821A publication Critical patent/CN111552821A/en
Application granted granted Critical
Publication of CN111552821B publication Critical patent/CN111552821B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Tourism & Hospitality (AREA)
  • Evolutionary Biology (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Animal Behavior & Ethology (AREA)
  • Technology Law (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)

Abstract

A legal intention search method, a legal intention search apparatus, and an electronic device are disclosed. The legal intention search method comprises the following steps: performing knowledge injection and sentence-tree conversion on a query request, based on a legal knowledge graph, through a knowledge layer of a legal intention classifier; converting the sentence tree into an embedded representation through an embedding layer of the legal intention classifier; controlling the visibility of words in the query request and words from the legal knowledge graph through a visible layer of the legal intention classifier; and controlling a self-attention region in a transformer model according to the visibility information, through a mask-transformer layer of the legal intention classifier, to obtain a legal intention search result. In this way, a legal intention classifier with a specific architecture is constructed by fusing the legal knowledge graph with a pre-trained language model, improving the accuracy of legal intention recognition and thereby optimizing search results.

Description

Legal intention searching method, legal intention searching device and electronic equipment
Technical Field
The present application relates to the field of search technologies, and in particular, to a legal intention search method, a legal intention search device, and an electronic device.
Background
With the development of internet technology, people can retrieve the information they need through a network using a search engine. Search engines for vertically subdivided fields have also sprung up like bamboo shoots after rain, for example legal, financial, and patent search engines. A vertical search engine is a professional search that makes information positioning more accurate by performing professional, in-depth analysis, mining, filtering, and screening of the content of a specific professional field or industry. In fact, the vertical search engine is a subdivision and extension of the general search engine: a specialized information retrieval service provided in a targeted manner for a specific field, a specific group, or a specific requirement, so as to meet users' personalized information search needs. For example, in the legal scenario, terms such as "endorsement", "book certificate", and "third person" have semantics obviously different from those in general scenarios.
For example, in a legal search engine, when a user enters "apply a proposal" in the input box, the user's intention may be to query the constituent elements of that concept rather than to perform a simple keyword query. Meanwhile, the legal domain contains a large amount of professional data and knowledge content (e.g., cases, laws and regulations, journal documents, etc.), business domains (e.g., criminal, civil, or commercial), and professional relationships (e.g., infringement relations); a user initiating a search may have different specific directions, may focus on only one specific category, or may focus on several different categories.
Therefore, the search engine needs to understand the conditions of the user's search and accurately determine the intention category and intention content of the search, so that the returned results come closer to the content the user requires. Here, search intention recognition means determining, for any given query string, which category of intention information the query string belongs to.
Existing technical solutions for intention search have defects, and their accuracy of intention recognition is not high; a more optimized intention search engine for specific fields is therefore needed.
Disclosure of Invention
The present application is proposed to solve the above technical problems. Embodiments of the application provide a legal intention search method, a legal intention search apparatus, and an electronic device, which construct a legal intention classifier with a specific architecture by fusing a legal knowledge graph with a pre-trained language model, so as to improve the accuracy of legal intention recognition and thereby optimize search results.
According to an aspect of the present application, there is provided a legal intention search method including:
performing knowledge injection and sentence tree conversion on the query request based on a legal knowledge graph through a knowledge layer of a legal intention classifier;
transforming the sentence tree into an embedded representation through an embedding layer of the legal intent classifier;
controlling the visibility of words in the query request and words from the legal knowledge graph through a visible layer of the legal intention classifier; and
controlling a self-attention region in a transformer model according to the visibility information, through a mask-transformer layer of the legal intention classifier, to obtain a legal intention search result.
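The four steps above resemble a pipeline in which visibility information restricts which tokens may attend to one another. The following is a minimal sketch, not the patent's implementation: it uses toy NumPy arrays, a single attention head, and identity Q/K/V projections purely to show how a boolean "visible matrix" can control the self-attention region of a mask-transformer layer.

```python
import numpy as np

def masked_self_attention(x, visible):
    """Single-head self-attention where token i may only attend to token j
    when visible[i, j] is True. x: (n, d) embeddings; visible: (n, n) bools.
    Identity Q/K/V projections keep the sketch minimal."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                  # (n, n) attention logits
    scores = np.where(visible, scores, -1e9)       # hide invisible positions
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ x                             # contextualized output

# Three tokens: the injected knowledge token (index 2) is visible only to
# the entity token it annotates (index 1), not to the rest of the query.
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
visible = np.array([[True,  True,  False],
                    [True,  True,  True],
                    [False, True,  True]])
out = masked_self_attention(x, visible)
print(out.shape)  # (3, 2)
```

Because position (0, 2) is invisible, changing the injected token's embedding leaves token 0's output untouched; the knowledge influences only the entity it attaches to.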
In the legal intention search method, the legal intention classifier is obtained by training, and the training process includes: performing entity recognition and relationship extraction on acquired corpora related to the legal field to generate a legal knowledge graph; pre-training a pre-trained language model of the legal intention classifier based on the legal knowledge graph and pre-training corpora; and performing classification training on the pre-trained language model based on a training corpus to generate the legal intention classifier.
In the legal intention search method, pre-training the pre-trained language model based on a legal knowledge graph and pre-training corpora includes: performing knowledge injection and sentence-tree conversion on sentences in the pre-training corpus, based on the legal knowledge graph, through a knowledge layer of the pre-trained language model; converting the sentence tree into an embedded representation through an embedding layer of the pre-trained language model; setting the visibility of words in a given sentence and of newly added words, including words from the legal knowledge graph, through a visible layer of the pre-trained language model; and setting a self-attention region in a transformer model according to the visibility information, through a mask-transformer layer of the pre-trained language model, to obtain a pre-training legal intention search result.
In the legal intention search method, converting the sentence tree into an embedded representation through the embedding layer of the pre-trained language model includes: performing word embedding processing on the sentence tree to generate a word embedding representation; performing position embedding processing on the word embedding representation to generate a position embedding representation; and performing segment embedding processing on the position embedding representation to generate the embedded representation.
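The three embedding steps above can be sketched as the element-wise sum of three lookup tables. With a sentence tree, an injected knowledge token may reuse the position index of the entity it attaches to (a "soft position"). All table sizes, token ids, and positions below are illustrative, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MAX_POS, SEGMENTS, DIM = 100, 32, 2, 8

# Three lookup tables, randomly initialized for the sketch.
tok_table = rng.normal(size=(VOCAB, DIM))
pos_table = rng.normal(size=(MAX_POS, DIM))
seg_table = rng.normal(size=(SEGMENTS, DIM))

def embed(token_ids, positions, segment_ids):
    """Embedding layer: the final representation is the element-wise sum of
    word, position, and segment embeddings. Injected knowledge tokens can
    repeat a position index, following their anchor in the sentence tree."""
    return tok_table[token_ids] + pos_table[positions] + seg_table[segment_ids]

# A 4-token query plus one injected knowledge token sharing soft position 2.
token_ids   = [5, 17, 42, 9, 77]
positions   = [0, 1, 2, 3, 2]    # note the repeated soft position
segment_ids = [0, 0, 0, 0, 0]
e = embed(token_ids, positions, segment_ids)
print(e.shape)  # (5, 8)
```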
In the above legal intention search method, setting the visibility of words in a given sentence and of words from the legal knowledge graph through the visible layer of the pre-trained language model includes: in response to a newly added word being a predicate, setting the predicate to be invisible; and in response to a newly added word being a word from the legal knowledge graph that represents an entity, setting the newly added word to be visible.
In the legal intention search method, before performing knowledge injection and sentence-tree conversion on the query request based on the legal knowledge graph, the method further includes: performing dictionary matching and/or rule matching on the content of the query request.
In the above legal intent search method, performing dictionary matching and/or rule matching on the content in the query request includes: performing dictionary generation by using the training corpus to obtain a legal dictionary; and performing dictionary matching on the content in the query request by using the legal dictionary.
In the above legal intent search method, performing dictionary matching and/or rule matching on the content in the query request includes: using the training corpus to extract rules to obtain legal rules; and performing rule matching on the content in the query request by using the legal rule.
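The dictionary matching and rule matching steps above can be sketched as follows. The dictionary entries, tags, and regular expression are hypothetical placeholders; in the patent, the actual legal dictionary and legal rules are derived from the training corpus.

```python
import re

# Hypothetical legal dictionary and rule patterns (illustrative only).
LEGAL_DICT = {"endorsement": "negotiable-instrument", "tort": "civil-liability"}
LEGAL_RULES = [
    (re.compile(r"article\s+\d+", re.IGNORECASE), "statute-citation"),
]

def match_query(query):
    """Return (dictionary hits, rule hits) for a raw query string."""
    dict_hits = [tag for term, tag in LEGAL_DICT.items() if term in query]
    rule_hits = [tag for pattern, tag in LEGAL_RULES if pattern.search(query)]
    return dict_hits, rule_hits

d, r = match_query("liability under article 1165 for endorsement disputes")
print(d, r)  # ['negotiable-instrument'] ['statute-citation']
```

Hits from either matcher can then be passed downstream as extra signals before knowledge injection.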
In the above legal intention search method, controlling, by the mask-transformer layer of the legal intention classifier, the self-attention region in the transformer model according to the visibility information to obtain a legal intention search result comprises: controlling, by the mask-transformer layer of the legal intention classifier, the self-attention region in the transformer model according to the visibility information to obtain an initial legal intention result; and performing intent classification on the initial legal intention result by Softmax to obtain the legal intention search result.
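The final Softmax classification step described above can be sketched as mapping classifier logits to a probability distribution over intent classes. The intent labels and logit values are illustrative assumptions, since the patent does not enumerate concrete classes.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

# Hypothetical intent labels for a legal search engine.
INTENTS = ["case-law", "statute", "journal-article", "contract-template"]

def classify_intent(logits):
    """Pick the most probable intent and return it with the distribution."""
    probs = softmax(np.asarray(logits, dtype=float))
    return INTENTS[int(np.argmax(probs))], probs

label, probs = classify_intent([2.1, 0.3, -1.0, 0.5])
print(label)  # case-law
```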
In the legal intent search method described above, the intent classification includes a plurality of layers of classification results, one or more of which are selected or specified by a user.
According to another aspect of the present application, there is provided a legal intention search apparatus including:
a sentence-tree conversion unit for performing knowledge injection and sentence-tree conversion on the query request, based on a legal knowledge graph, through a knowledge layer of the legal intention classifier;
an embedding processing unit that converts the sentence tree into an embedded representation through an embedding layer of the legal intention classifier;
a visual control unit for controlling the visibility of words in the query request and words from the legal knowledge graph through a visible layer of the legal intention classifier; and
an intention classification unit for controlling a self-attention region in a transformer model according to the visibility information, through a mask-transformer layer of the legal intention classifier, to obtain a legal intention search result.
The above legal intention search apparatus further comprises a training unit for: performing entity recognition and relationship extraction on acquired corpora related to the legal field to generate a legal knowledge graph; pre-training a pre-trained language model of the legal intention classifier based on the legal knowledge graph and pre-training corpora; and performing classification training on the pre-trained language model based on a training corpus to generate the legal intention classifier.
In the legal intention search apparatus, the training unit is further configured to: perform knowledge injection and sentence-tree conversion on sentences in the pre-training corpus, based on the legal knowledge graph, through a knowledge layer of the pre-trained language model; convert the sentence tree into an embedded representation through an embedding layer of the pre-trained language model; set the visibility of words in a given sentence and of words from the legal knowledge graph through a visible layer of the pre-trained language model; and set a self-attention region in a transformer model according to the visibility information, through a mask-transformer layer of the pre-trained language model, to obtain a pre-training legal intention search result.
In the above legal intention search apparatus, the embedding processing unit is further configured to: perform word embedding processing on the sentence tree to generate a word embedding representation; perform position embedding processing on the word embedding representation to generate a position embedding representation; and perform segment embedding processing on the position embedding representation to generate the embedded representation.
In the above legal intention search apparatus, the visual control unit is further configured to: in response to a newly added word being a predicate, set the predicate to be invisible; and in response to a newly added word being a word from the legal knowledge graph that represents an entity, set the newly added word to be visible.
In the above legal intent search apparatus, the apparatus further comprises a matching unit, configured to perform dictionary matching and/or rule matching on the content in the query request.
In the above legal intention search apparatus, the matching unit is further configured to perform dictionary generation using the corpus to obtain a legal dictionary; and performing dictionary matching on the content in the query request by using the legal dictionary.
In the legal intention searching apparatus, the matching unit is further configured to perform rule extraction using the corpus to obtain legal rules; and performing rule matching on the content in the query request by using the legal rule.
In the above legal intention search apparatus, the intention classification unit is further configured to: control, through the mask-transformer layer of the legal intention classifier, the self-attention region in the transformer model according to the visibility information to obtain an initial legal intention result; and perform intent classification on the initial legal intention result by Softmax to obtain the legal intention search result.
In the above-described legal intention search apparatus, the intention classification includes a plurality of layers of classification results, one or more layers of the plurality of layers of classification results being selected or designated by a user.
According to still another aspect of the present application, there is provided an electronic apparatus including: a processor; and a memory having stored therein computer program instructions that, when executed by the processor, cause the processor to perform the legal intent search method as described above.
According to yet another aspect of the present application, there is provided a computer readable medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the legal intent search method as described above.
According to the legal intention search method, search apparatus, and electronic device of the present application, a legal intention classifier with a specific architecture is constructed by fusing a legal knowledge graph with a pre-trained language model, so that the accuracy of legal intention recognition is improved to optimize search results.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 illustrates a flow chart of a legal intent search method according to an embodiment of the application.
FIG. 2 illustrates a specific example of the processing of a query request by the legal intent classifier according to an embodiment of the present application.
FIG. 3 illustrates a specific example of legal intent recognition by the legal intent classifier according to an embodiment of the application.
FIG. 4 illustrates a general framework diagram of the legal intent classifier in combination with other legal intent recognition approaches in accordance with embodiments of the application.
FIG. 5 illustrates a block diagram of a legal intent search apparatus in accordance with an embodiment of the present application.
FIG. 6 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Summary of the application
As described above, some solutions for intention search exist, but they have defects and their accuracy of intention recognition is not high, so a more optimized intention search engine for specific fields is needed.
Specifically, in conventional search intention recognition, web pages are often labeled by category manually, and intention recognition then relies on these manual labels. However, manually labeling the web pages of every category is too costly, the number of manually labeled results is often limited, and pages with low click rates are likely to have unknown categories, so the accuracy of intention recognition is not high.
Meanwhile, some patents relate to search intention recognition. Chinese patent CN106951422A discloses a method, apparatus, computer device, and storage medium for searching unstructured documents, aiming to solve the prior-art problem that, when legal documents are retrieved, unstructured documents matching the search condition text cannot be accurately retrieved. The solution first parses the search condition text acquired from a client and obtains at least one first entity text included in it, together with a first entity type corresponding to each first entity text; it then performs intention recognition on the search condition text according to a preset intention recognition method and obtains the search intentions corresponding to the search condition text, including the search intention corresponding to each first entity text; next, according to the search intention, it constructs all the first entity texts and their corresponding first entity types into a search expression; then it matches the search expression against a document database in which at least one unstructured document is stored, generating a search result comprising each unstructured document that matches the search expression; and finally it presents the search result to the client.
In addition, Chinese patent CN110674259 discloses an intention understanding method and apparatus, in which the method comprises: first, recognizing the target word-slot labels corresponding to the constituent words in a target text and generating a target generalized text of the target text according to the target word-slot labels; then matching the target generalized text against a plurality of preset generalization templates and determining candidate generalization templates according to the matching degree; then calculating the semantic similarity between the target generalized text and the candidate generalization templates and determining the target generalization template according to the semantic similarity; and finally acquiring the template intention of the target generalization template and generating an intention understanding result of the target text according to the template intention, the target word-slot labels, and the constituent words corresponding to the target word-slot labels.
In recent years, the two technical fields of pre-trained language models and knowledge graphs have developed vigorously, providing new possibilities for implementing search intention recognition in specific fields.
In the last two years, and especially in 2019, pre-trained language models have established a new NLP (Natural Language Processing) paradigm: pre-train on a large-scale text corpus, then fine-tune on the small dataset of a specific task, thereby reducing the difficulty of individual NLP tasks. The essence of the pre-training concept is that model parameters are no longer randomly initialized but are first trained on some task (e.g., language modeling). Pre-training belongs to the category of transfer learning; "pre-trained language model" mainly refers to an unsupervised training task (sometimes also called self-learning or self-supervision), and the transfer paradigm mainly comprises feature ensembling and model fine-tuning.
As pre-trained language models such as ELMo, GPT, and BERT obtained SOTA results on NLP tasks, a series of new methods emerged, such as MASS, UNILM, ERNIE 1.0, ERNIE (THU), MT-DNN, ERNIE 2.0, SpanBERT, RoBERTa, XLNet, XLM, etc.
Pre-trained language models have driven the progress of NLP technology, largely on the basis of the BERT model, and a series of BERT variants have therefore been born. For example, the joint laboratory of Harbin Institute of Technology and iFLYTEK released the Chinese pre-trained language model BERT-wwm on June 20, 2019, which introduced the whole word mask (wwm) on the basis of the original BERT-base; masking is actually applied after word segmentation.
Fusing human knowledge is one of the research directions of artificial intelligence. Knowledge representation and reasoning, inspired by human problem solving, represents knowledge so that intelligent systems can obtain the ability to solve complex tasks. In recent years, knowledge graphs, as a form of structured human knowledge, have received widespread attention in academia and industry. A knowledge graph is a structured representation of facts consisting of entities, relationships, and semantic descriptions: entities may be real-world objects or abstract concepts, relationships represent associations between entities, and the semantic descriptions of entities and their relationships contain well-defined types and properties. Property graphs, in which nodes and relationships carry attributes, are widely used.
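The fact-based view described above can be sketched as a small set of (head, relation, tail) triples; the entities and relations below are invented for illustration and are not taken from the patent.

```python
# A knowledge graph as (head, relation, tail) fact triples (illustrative).
triples = {
    ("endorsement", "regulated_by", "negotiable_instruments_law"),
    ("endorsement", "type_of", "commercial_act"),
    ("third_party", "defined_in", "civil_procedure_law"),
}

def facts_about(graph, entity):
    """Return the sorted (relation, tail) facts whose head is `entity` --
    the kind of material a knowledge layer could inject next to a query term."""
    return sorted((r, t) for h, r, t in graph if h == entity)

print(facts_about(triples, "endorsement"))
# [('regulated_by', 'negotiable_instruments_law'), ('type_of', 'commercial_act')]
```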
Combining an excellent pre-trained language model with a knowledge graph is one approach to solving many natural language processing problems, as exemplified by the ERNIE model proposed by Baidu and the ERNIE model proposed by Tsinghua University; the two models share the same name but differ substantially.
The ERNIE model proposed by Tsinghua University was developed jointly by Tsinghua University and Huawei's Noah's Ark Lab to introduce knowledge graphs for enhancing semantic representation; its technical core is adding an entity-alignment task on top of the original BERT model. This ERNIE model comprises two encoders: a T-encoder and a K-encoder. The K-encoder works only during pre-training, and only the T-encoder is used in the subsequent fine-tuning phase; the model therefore focuses on introducing the entity-alignment task.
Specifically, a sequence is composed of w1, w2, …, wn, and the entities e1, e2, …, em corresponding to this sequence come from a knowledge graph. Since an entity may involve more than one word, e.g., e1 = Bob Dylan, whose mention in the sequence consists of two tokens, w1 = Bob and w2 = Dylan, alignment maps each knowledge-graph entity to the first token of its mention in the sequence, i.e., e1 positionally corresponds to w1.
The function of the T-encoder is to encode the sequence; its structure is similar to the BERT-base model but with 6 layers. The K-encoder aggregates the knowledge-graph entities with the sequence, where the knowledge-graph entities are embedded by a specific embedding method (such as TransE).
In the ERNIE model proposed by Tsinghua University, the entity-alignment task is constructed as follows: an entity in a word–entity alignment is first randomly masked, and the model then predicts the entity corresponding to that position. This is essentially consistent with the MLM (Masked Language Model) task and belongs to denoising autoencoding.
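The masking scheme just described (randomly hiding a token–entity alignment and asking the model to predict the hidden entity) can be sketched as follows; the entity names and the `[ENT_MASK]` token are illustrative assumptions, not ERNIE's actual vocabulary.

```python
import random

def mask_entity_alignments(alignments, mask_rate, seed=0):
    """Sketch of the denoising-autoencoding objective: randomly hide some
    token->entity alignments; the model must predict each hidden entity.
    alignments maps a token index (first token of the mention) to an entity."""
    rng = random.Random(seed)
    masked, targets = {}, {}
    for tok, ent in sorted(alignments.items()):
        if rng.random() < mask_rate:
            masked[tok] = "[ENT_MASK]"   # alignment hidden from the model
            targets[tok] = ent           # what the model should predict
        else:
            masked[tok] = ent
    return masked, targets

# 'Bob Dylan' spans two tokens but is aligned only at its first token (0).
alignments = {0: "Bob_Dylan", 4: "Blowin_in_the_Wind"}
masked, targets = mask_entity_alignments(alignments, mask_rate=1.0)
print(sorted(targets.values()))  # ['Blowin_in_the_Wind', 'Bob_Dylan']
```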
Besides the newly added entity-alignment task, the model also provides a new pre-training scheme built on the two tasks of entity typing and relation extraction, and the pre-training process introduces special tokens to indicate the roles of other tokens.
The ERNIE model proposed by Baidu, although sharing its name with Tsinghua's model, is not implemented in the same way; it mainly improves the MLM task of the BERT model. BERT masks single tokens, but in a sentence many words exist as phrases or entities; if the correlation among the words within a phrase or entity is ignored and all words are treated as independent, syntactic and semantic information cannot be well expressed. Baidu's ERNIE therefore introduces three masking granularities, masking words, entities, and phrases respectively. In addition, Baidu's ERNIE incorporates dialogue corpora, enriching the corpus types, and defines a task for dialogue corpora similar to NSP (Next Sentence Prediction). Specifically, it constructs a dialogue language model (DLM) task that randomly generates some fake multi-turn QR (query–response) pairs and lets the model predict whether the current multi-turn dialogue is real or fake.
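The difference between token-level masking and entity/phrase-level masking can be sketched as whole-span replacement: an entity or phrase is never only partially masked. The example sentence and spans below are illustrative.

```python
def apply_span_mask(tokens, spans):
    """Whole-span masking: every token inside each given span is replaced,
    so an entity or phrase is masked as a unit, never partially."""
    out = list(tokens)
    for span in spans:
        for i in span:
            out[i] = "[MASK]"
    return out

tokens = ["the", "harry", "potter", "series", "by", "j", "k", "rowling"]
# Token-level vs entity-level masking of the same sentence:
print(apply_span_mask(tokens, [[1]]))     # masks one subword of the entity
print(apply_span_mask(tokens, [[1, 2]]))  # masks the whole entity span
```

Under entity-level masking, the model must recover "harry potter" from context as a unit, which is the motivation the paragraph above describes.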
Compared with the BERT model, the Baidu ERNIE model improves by 1-2% on a number of tasks, and experiments show that the DLM task brings gains on NLI tasks.
It can therefore be seen that when combining a pre-trained language model with a knowledge graph, the architecture of the pre-trained language model needs to be matched to the domain characteristics of the knowledge graph in order to obtain a model that achieves the intended effect.
Based on this line of technical development, the basic idea of the present application is to fuse a legal knowledge graph with a pre-trained language model to construct a legal intent classifier with a specific architecture, so as to obtain an optimized legal intent search result through that classifier based on the legal knowledge graph.
Based on this, the legal intent search method, legal intent search apparatus, and electronic device provided by the present application first perform knowledge injection and sentence-tree conversion on a query request through the knowledge layer of a legal intent classifier, based on a legal knowledge graph; then convert the sentence tree into an embedded representation through the embedding layer of the legal intent classifier; then control the degree of visibility of the words in the query request and the words from the legal knowledge graph through the visible layer of the legal intent classifier; and finally control the self-attention region in the transformer model according to the visibility information through the mask transformer layer of the legal intent classifier, to obtain a legal intent search result.
Therefore, the legal intent search method, legal intent search apparatus, and electronic device provided by the present application combine the legal knowledge graph and the pre-trained language model through a legal intent classifier with a specific architecture, thereby improving the recognition accuracy of legal intent and optimizing the search results.
It should be noted that the above basic concept of the present application can equally be applied, based on the knowledge graphs of other domains, to the construction of search intent classifier models in other specific vertical domains, such as the financial domain, the intellectual property domain, etc.
Having described the general principles of the present application, various non-limiting embodiments of the present application will now be described with reference to the accompanying drawings.
Exemplary method
FIG. 1 illustrates a flow chart of a legal intent search method according to an embodiment of the application.
As shown in fig. 1, a legal intent search method according to an embodiment of the present application includes: S110, performing knowledge injection and sentence-tree conversion on the query request based on a legal knowledge graph through the knowledge layer of a legal intent classifier; S120, converting the sentence tree into an embedded representation through the embedding layer of the legal intent classifier; S130, controlling the degree of visibility of the words in the query request and the words from the legal knowledge graph through the visible layer of the legal intent classifier; and S140, controlling the self-attention region in the transformer model according to the visibility information through the mask transformer layer of the legal intent classifier, to obtain a legal intent search result.
That is, the legal intent classifier according to the embodiment of the present application includes four layers, namely a knowledge layer, an embedding layer, a visible layer, and a mask transformer layer; it fuses a knowledge graph and a pre-trained language model to construct a specific legal intent classifier, so as to improve the recognition accuracy of legal intent and optimize the search results.
In step S110, the query request is subjected to knowledge injection and sentence-tree conversion based on the legal knowledge graph through the knowledge layer of the legal intent classifier. The query request comprises the content input by the user; the knowledge layer of the legal intent classifier performs knowledge injection on that content and converts it into a sentence tree (a data structure). Here, knowledge injection means embedding words from the legal knowledge graph into the content input by the user, i.e., adding new words to it. For example, given a sentence S = {w0, w1, w2, w3, ..., wn}, the sentence tree output by the knowledge layer is st = {w0, w1, ..., wi{(ri0, wi0), ..., (rik, wik)}, ..., wn}, where {(wi, ri0, wi0), ..., (wi, rik, wik)} are the entity-relation triples corresponding to the word wi, taken from a knowledge graph (here, the legal knowledge graph).
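The knowledge-injection step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the toy knowledge graph, token names, and function names are assumptions introduced only to show the sentence-tree data structure.

```python
# Toy legal knowledge graph: entity word -> list of (relation, object word) pairs.
LEGAL_KG = {
    "court":   [("is_a", "institution")],
    "lawsuit": [("is_a", "legal_action")],
}

def build_sentence_tree(tokens, kg):
    """Attach to each token its (relation, word) branches found in the graph."""
    return [(tok, kg.get(tok, [])) for tok in tokens]

tree = build_sentence_tree(["dad", "took", "me", "to", "court"], LEGAL_KG)
# tokens absent from the graph keep an empty branch list;
# "court" now carries the injected branch [("is_a", "institution")]
```

Each tree node pairs an original word with the triples injected for it, matching the st = {..., wi{(ri0, wi0), ...}, ...} structure described above.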
In step S120, the sentence tree is converted into an embedded representation through the embedding layer of the legal intent classifier; that is, the embedding layer of the legal intent classifier converts the sentence tree into an embedded representation. In the embodiment of the present application, the embedding layer includes three parts: a word embedding part, a position embedding part, and a segment embedding part. The word embedding part performs word embedding on the sentence tree to generate a word embedding representation; the position embedding part performs position embedding on the word embedding representation to generate a position embedding representation; and the segment embedding part performs segment embedding on the position embedding representation to generate the embedded representation.
Specifically, in the embodiment of the present application, the word embedding part may use the word embedding layer of the BERT-wwm model, with [CLS] as the classification token; in this part, each word introduced from the legal knowledge graph is inserted after the original word it relates to.
Fig. 2 illustrates a specific example of the processing of a query request by the legal intent classifier according to the embodiment of the present application. As shown in fig. 2, the original content input by the user is a sentence to the effect of "Dad took me to file a case today"; after processing by the word embedding part, the words introduced from the legal knowledge graph through is_a relations are inserted after the original words they attach to. Obviously, this insertion may cause the sentence to deviate from its original meaning; in the embodiment of the present application, this problem is solved by the position embedding part. Specifically, the position embedding part (for example, a soft-position embedding layer) keeps the position indices of the original words unchanged, while the positions of the newly added words from the legal knowledge graph are introduced as shown in fig. 2. In one example, the segment embedding part may adopt the segment embedding of the standard BERT model, i.e., treat all words as belonging to the same segment.
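The soft-position scheme can be sketched as below. This is an assumption modeled on K-BERT-style soft positions, not the patent's code: original words keep consecutive indices, while each injected branch continues from the index of the word it hangs off.

```python
def soft_positions(tree):
    """tree: list of (token, [(relation, word), ...]) from the knowledge layer.
    Returns the flattened token list and its soft-position indices."""
    flat, pos, p = [], [], 0
    for tok, branches in tree:
        flat.append(tok)
        pos.append(p)
        for rel, word in branches:
            flat += [rel, word]
            pos += [p + 1, p + 2]   # the branch continues after its head word
        p += 1                       # the next original word keeps the main order

    return flat, pos

flat, pos = soft_positions([("dad", []), ("court", [("is_a", "institution")])])
# flat: ["dad", "court", "is_a", "institution"], pos: [0, 1, 2, 3]
```

Because the main-trunk counter `p` ignores injected words, the original sentence's word order survives even though new tokens were interleaved.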
It should be noted that, in the embodiment of the present application, entity-relation processing covers other relations besides the is_a of the above example, for example part_of, position, location, and the like, which the present application does not limit.
In step S130, the degree of visibility of the words in the query request and the words from the legal knowledge graph is controlled through the visible layer of the legal intent classifier. In the embodiment of the application, in order to prevent the original sentence structure from being damaged, the visible layer is provided to control this degree of visibility. In one possible implementation of the present application, the visibility rule is as follows: when a newly added word is a predicate (i.e., a relation word), it can see only the two entities it connects; if the newly added word is the second entity of an is_a relation, its visible range is that of the first word of its branch (the word representing the first entity); and the visible range of an original word is its original range plus the words introduced for it from the legal knowledge graph. That is, in response to a newly added word being a predicate, the predicate is set to be invisible to the rest of the sentence; and in response to a newly added word being a word from the legal knowledge graph that represents an entity, it is set to be visible to the word it attaches to.
As shown in fig. 2, through the visible layer, a visible matrix reflecting the degree of visibility between the words in the query request and the words from the legal knowledge graph may be obtained, wherein each position in the matrix corresponds to a pair of such words, rendered as a white circle or a black circle to indicate invisible or visible.
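A visible matrix of this kind can be built from the sentence tree as sketched below. This is a simplified assumption (one branch per word, with every injected word treated as visible only within its own branch and to its head word); it is meant to illustrate the matrix, not reproduce the patent's exact rule.

```python
import numpy as np

def visible_matrix(tree):
    """0 marks a mutually visible pair; -inf marks an invisible pair
    (to be added to the attention scores later)."""
    kinds = []                          # (is_original_word, index_of_head_word)
    for idx, (tok, branches) in enumerate(tree):
        kinds.append((True, idx))       # original (trunk) word
        for rel, word in branches:
            kinds.append((False, idx))  # injected relation word
            kinds.append((False, idx))  # injected entity word
    n = len(kinds)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            trunk_i, head_i = kinds[i]
            trunk_j, head_j = kinds[j]
            # visible: both original words, or same head (word and its branch)
            if not ((trunk_i and trunk_j) or head_i == head_j):
                M[i, j] = -np.inf
    return M

M = visible_matrix([("dad", []), ("court", [("is_a", "institution")])])
```

The resulting 0 / -inf matrix is exactly the form the mask transformer layer consumes in step S140.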
In step S140, the self-attention region in the transformer model is controlled by the mask transformer layer of the legal intent classifier according to the visibility information, to obtain a legal intent search result. Here, the mask transformer layer controls the region of self-attention in the transformer model according to the information from the visible layer.
In particular, the mask transformer layer of the legal intent classifier according to embodiments of the present application is similar to the Transformer model used in the field of natural language processing, which replaced traditional RNNs and CNNs for machine translation.
In the field of natural language processing, CNN convolutions are not well suited to serialized text, whereas RNNs cannot be parallelized and easily run up against memory limits (for example, a sentence of 50 words already occupies a large amount of memory).
The Transformer model usually includes an encoder and a decoder, each containing self-attention layers with a multi-head feature, and finally incorporates, through position encoding, the position information that self-attention alone does not consider.
Here, the self-attention layer with the multi-head feature divides the vector of each word into h parts, and each part is used separately when computing attention similarity. Because words are mapped into a high-dimensional space as vectors, each subspace can learn different characteristics, and the results learned in adjacent subspaces are more similar, so matching subspace by subspace is more reasonable than matching whole vectors. For example, for a word vector of size 512 with h = 8, attention is computed over every 64 dimensions, so the learned result is more fine-grained.
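The multi-head split described above amounts to a simple reshape; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def split_heads(x, h=8):
    """Split (n_words, d_model) into (h, n_words, d_model // h):
    each of the h heads attends over its own 64-dim slice when d_model = 512."""
    n, d = x.shape
    return x.reshape(n, h, d // h).transpose(1, 0, 2)

heads = split_heads(np.ones((5, 512)))   # 5 words, 8 heads of 64 dims each
```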
The self-attention mechanism enables the word at each position to be encoded directly against every other word in the sentence, regardless of direction and distance. For example, through the self-attention mechanism, each word in a sentence can be connected by an edge to the other words in the sentence; the darker the edge, the stronger the association, and darker edges tend to connect to words whose meaning is ambiguous on its own, such as "law", "application", or "opinion".
Position encoding introduces the word-order information of the sentence: for example, "you owe me a million, to be repaid tomorrow" and "I owe you a million, to be repaid tomorrow" differ greatly in meaning, so word order must be taken into account through position encoding.
Specifically, the mask transformer layer may directly train position embedding vectors to retain position information: a vector is randomly initialized for each position and trained with the model, finally yielding position embedding vectors that contain position information; these can then be combined with the word embedding vectors, for example by direct concatenation.
For the visible matrix shown in fig. 2, mutually visible positions may be taken as 0 and mutually invisible positions as minus infinity, and the resulting matrix M is then introduced into the softmax function that computes self-attention, as follows:

Q(i+1), K(i+1), V(i+1) = h(i)Wq, h(i)Wk, h(i)Wv

S(i+1) = softmax((Q(i+1)(K(i+1))T + M) / sqrt(dk))

h(i+1) = S(i+1)V(i+1)

where h(i) is the hidden state at the i-th mask-self-attention layer, Wq, Wk, and Wv are trainable projection matrices, and dk is the dimension of the key vectors used as a scaling factor.
Compared with the self-attention layer of the standard BERT model, in the embodiment of the present application the final embedding vector additionally depends on the visible matrix M. That is, if two words are mutually invisible, the attention coefficient S[i, j] between them will be 0, so the hidden states h of the two words do not influence each other. In this way, the structure information of the sentence tree is input to the mask transformer layer, thereby realizing visibility control between words and finally influencing the generated embedding vectors.
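One mask-self-attention step with M folded into the softmax can be sketched as follows. The weights here are toy random matrices for illustration; the point is that a -inf entry in M drives the corresponding attention weight to exactly 0.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mask_self_attention(h, Wq, Wk, Wv, M):
    """S = softmax((Q K^T + M) / sqrt(d_k)); h' = S V.
    M holds 0 for visible pairs and -inf for invisible pairs."""
    Q, K, V = h @ Wq, h @ Wk, h @ Wv
    S = softmax((Q @ K.T + M) / np.sqrt(K.shape[-1]))
    return S, S @ V

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))                    # 4 words, hidden size 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
M = np.zeros((4, 4))
M[0, 2] = M[2, 0] = -np.inf                    # words 0 and 2 mutually invisible
S, h_next = mask_self_attention(h, Wq, Wk, Wv, M)
```

Since exp(-inf) = 0, invisible pairs contribute nothing: S[0, 2] and S[2, 0] are exactly zero, while each attention row still sums to one.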
In the embodiment of the application, controlling the self-attention region in the transformer model according to the visibility information through the mask transformer layer of the legal intent classifier to obtain a legal intent search result includes: controlling the self-attention region in the transformer model according to the visibility information through the mask transformer layer to obtain an initial legal intent result, such as the embedding vector described above; and performing intent classification on the initial legal intent result through Softmax to obtain the legal intent search result. Here, the intent classification may include multiple layers of classification results, one or more of which are selected or designated by the user. The reason is that, in an actual legal search scenario, the search intent of the user may be neither directional nor uniquely clear, so multi-layer classification may be necessary. For example, when "Zhang San" is input, the best intent classification should match "person name", but at the same time multiple subjects may be involved, such as judges, attorneys, or students, leaving further selection to the user or further classification based on specific needs. As another example, if "burglary" is input to initiate a search, the user may be interested in cases on such matters, related laws and regulations, articles, and the like. Based on these application scenarios, it is necessary to continuously optimize the model for specific business scenarios and requirements, combining the user's current input with the actual intents reflected by past user behavior, so as to ensure that multiple possible intent categories can be output.
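Returning several plausible intent categories instead of a single one can be sketched as a Softmax followed by a top-k selection. The label set and logits below are illustrative assumptions, not the patent's taxonomy.

```python
import numpy as np

def top_k_intents(logits, labels, k=3):
    """Softmax over intent logits; return the k most probable intent classes
    so the user can choose among several plausible interpretations."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    order = np.argsort(p)[::-1][:k]
    return [(labels[i], float(p[i])) for i in order]

labels = ["person_name", "judge", "lawyer", "case_type"]
ranked = top_k_intents(np.array([2.0, 0.5, 0.2, -1.0]), labels)
# ranked[0] is the best match ("person_name"); the rest are offered as alternatives
```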
FIG. 3 illustrates a specific example of legal intent recognition by the legal intent classifier according to an embodiment of the application. As shown in fig. 3, the content of the user's query request is the example sentence of fig. 2; the query request is converted into a sentence tree by the knowledge layer, the sentence tree is converted into an embedded representation by the embedding layer, the visible layer controls the degree of visibility of the words in the query request and the words from the legal knowledge graph, and the mask transformer layer then controls the self-attention region in the transformer model according to the visibility information to obtain the legal intent classification result.
It will be appreciated by those of ordinary skill in the art that the legal intent classifier is trained before it is used for intent recognition. In an embodiment of the present application, the training process includes: first, performing entity recognition and relation extraction on the collected corpora related to the legal field to generate a legal knowledge graph; then, pre-training the pre-trained language model of the legal intent classifier based on the legal knowledge graph and pre-training corpora; and then performing classification training on the pre-trained language model based on the training corpus to generate the legal intent classifier.
In a specific example of the present application, the pre-trained language model may be a model optimized on the basis of the BERT-wwm model, or any other pre-trained language model that satisfies the requirements; the present application is not limited in this respect.
That is, in the embodiment of the present application, the training process of the legal intent classifier includes: first, obtaining corpora in the legal field, such as judgment documents, and performing entity recognition and relation extraction on them to obtain a legal-domain knowledge graph; then, training the fused legal knowledge graph and pre-trained language model to obtain the legal intent classifier.
It is worth mentioning that, in the embodiments of the present application, legal search intent recognition may also be combined with other recognition means, such as conventional dictionary matching and rule matching. For example, in one possible implementation, before knowledge injection and sentence-tree conversion are performed on the query request based on the legal knowledge graph, the method further includes performing dictionary matching and/or rule matching on the content of the query request. Specifically, dictionary matching the content of the query request includes: generating a dictionary from the training corpus to obtain a legal dictionary; and performing dictionary matching on the content of the query request using the legal dictionary. Rule matching the content of the query request includes: extracting rules from the training corpus to obtain legal rules; and performing rule matching on the content of the query request using the legal rules. It should be noted that, in the embodiment of the present application, legal intent recognition may be performed on the query request in the order of dictionary matching, rule matching, and the legal intent classifier, where once an earlier step succeeds, the subsequent recognition is not performed; of course, the three kinds of legal intent search results may also be output simultaneously to provide more choices for the user, which the present application does not limit.
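The cascade just described can be sketched as below. The toy dictionary, rules, and intent labels are assumptions for illustration only; recognition stops at the first stage that fires, with the classifier as the fallback.

```python
import re

LEGAL_DICT = {"divorce": "civil_case", "theft": "criminal_case"}   # toy legal dictionary
LEGAL_RULES = [(re.compile(r"article \d+"), "statute_lookup")]     # toy matching rules

def recognize_intent(query, classifier_fn):
    """Dictionary match first, then rule match, then the trained classifier."""
    for word, intent in LEGAL_DICT.items():
        if word in query:
            return intent, "dictionary"
    for pattern, intent in LEGAL_RULES:
        if pattern.search(query):
            return intent, "rule"
    return classifier_fn(query), "classifier"
```

The second element of the result records which stage answered, which is useful when all three kinds of results are to be shown to the user side by side.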
FIG. 4 illustrates a general framework diagram of the legal intent classifier combined with other legal intent recognition approaches in accordance with embodiments of the application. As shown in fig. 4, the overall framework performs search intent recognition in the legal field and mainly includes the following processes: first, collecting a training corpus for search intent recognition in the legal field; then, based on that corpus, semi-automatically generating a legal-domain dictionary and search intent matching rules; then, training the legal intent classifier based on the legal knowledge graph, the pre-trained language model, and the training corpus; and then, in the application system, receiving the user's query request and performing search intent recognition sequentially through dictionary matching, rule matching, and the legal intent classifier, where once an earlier step succeeds, the subsequent recognition is not performed.
In the above, although the legal intention search method is taken as an example, it should be understood by those skilled in the art that the above basic concept of the present application can also be applied to the construction of search intention classifier models in other specific vertical fields, such as the financial field, the intellectual property field, etc.
Exemplary devices
FIG. 5 illustrates a block diagram of a legal intent search apparatus in accordance with an embodiment of the present application.
As shown in fig. 5, the legal intent search apparatus 200 according to the embodiment of the present application includes: a sentence tree conversion unit 210 for performing knowledge injection and sentence-tree conversion on the query request based on the legal knowledge graph through the knowledge layer of the legal intent classifier; an embedding processing unit 220 for converting the sentence tree into an embedded representation through the embedding layer of the legal intent classifier; a visual control unit 230 for controlling the degree of visibility of the words in the query request and the words from the legal knowledge graph through the visible layer of the legal intent classifier; and an intent classification unit 240 for controlling the self-attention region in the transformer model according to the visibility information through the mask transformer layer of the legal intent classifier to obtain the legal intent search result.
In one example, in the above legal intent search apparatus 200, further comprising a training unit 250 for: performing entity identification and relationship extraction on the obtained linguistic data related to the legal field to generate a legal knowledge map; pre-training a pre-training language model of the legal intention classifier based on a legal knowledge graph and pre-training corpora; and carrying out classification training on the pre-trained language model based on the training corpus to generate the legal intention classifier.
In one example, in the above legal intention searching apparatus 200, the training unit 250 is further configured to perform knowledge injection and sentence tree transformation on the sentences in the pre-training corpus based on a legal knowledge graph through a knowledge layer of the pre-training language model; converting the sentence tree into an embedded representation through an embedding layer of the pre-training language model; setting visualization degrees of words in a given sentence and words from a legal knowledge graph through a visualization layer of the pre-trained language model, and setting a self-attention area in a conversion model according to visualization degree information through a mask converter of the pre-trained language model to obtain a pre-trained legal intention search result.
In one example, in the above legal intent search apparatus 200, the embedding processing unit 220 is further configured to: perform word embedding on the sentence tree to generate a word embedding representation; perform position embedding on the word embedding representation to generate a position embedding representation; and perform segment embedding on the position embedding representation to generate the embedded representation.
In one example, in the above legal intent search apparatus 200, the visual control unit 230 is further configured to: in response to a newly added word being a predicate, set the predicate to be invisible; and in response to a newly added word being a word from the legal knowledge graph that represents an entity, set the newly added word to be visible.
In one example, in the above-mentioned legal intention searching apparatus 200, further comprising a matching unit 260 for performing dictionary matching and/or rule matching on the content in the query request.
In one example, in the above legal intention searching apparatus 200, the matching unit 260 is further configured to perform dictionary generation using the corpus to obtain a legal dictionary; and performing dictionary matching on the content in the query request by using the legal dictionary.
In one example, in the above legal intention searching apparatus 200, the matching unit 260 is further configured to perform rule extraction using a corpus to obtain legal rules; and performing rule matching on the content in the query request by using the legal rule.
In one example, in the above legal intention searching apparatus 200, the intention classifying unit 240 is further configured to control a self-attention area in a conversion model according to the visualization degree information through a mask converter layer of the legal intention classifier to obtain an initial legal intention result; and intent classifying the initial legal intent result by Softmax to obtain the legal intent search result.
In one example, in the above-described legal intent search apparatus 200, the intent classification includes a plurality of layers of classification results, one or more of which are selected or designated by a user.
Here, it can be understood by those skilled in the art that the specific functions and operations of the respective units and modules in the above-described legal intention search apparatus 200 have been described in detail in the above description of the legal intention search method with reference to fig. 1 to 4, and thus, a repetitive description thereof will be omitted.
As described above, the legal intention search apparatus 200 according to the embodiment of the present application can be implemented in various terminal devices, such as a large-screen smart device, or a computer independent of a large-screen smart device. In one example, the legal intention search apparatus 200 according to an embodiment of the present application may be integrated into a terminal device as one software module and/or hardware module. For example, the legal intention search means 200 may be a software module in the operating system of the terminal device, or may be an application developed for the terminal device; of course, the legal intent search apparatus 200 can also be one of many hardware modules of the terminal device.
Alternatively, in another example, the legal intention search apparatus 200 and the terminal device may be separate devices, and the legal intention search apparatus 200 may be connected to the terminal device through a wired and/or wireless network and transmit the interactive information in an agreed data format.
Exemplary electronic device
Next, an electronic apparatus according to an embodiment of the present application is described with reference to fig. 6.
FIG. 6 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
As shown in fig. 6, the electronic device 10 includes one or more processors 11 and memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
Memory 12 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 11 to implement the legal intent search method and/or other desired functionality of the various embodiments of the present application described above. Various contents such as intention classification results, legal knowledge maps, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 13 may include, for example, a keyboard, a mouse, and the like.
The output device 14 can output various information including a legal intention search result and the like to the outside. The output devices 14 may include, for example, a display, speakers, a printer, and a communication network and its connected remote output devices, among others.
Of course, for simplicity, only some of the components of the electronic device 10 relevant to the present application are shown in fig. 6, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the methods and apparatus described above, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform steps in a legal intent search method according to various embodiments of the present application as described in the "exemplary methods" section of this specification, supra.
The computer program product may be written with program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in a legal intent search method according to various embodiments of the present application, as described in the "exemplary methods" section above of this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that connections, arrangements, or configurations must be made in the manner shown in the block diagrams. As those skilled in the art will appreciate, these devices, apparatuses, and systems may be connected, arranged, or configured in any manner. Words such as "including," "comprising," and "having" are open-ended words meaning "including but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to."
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (12)

1. A legal intent search method, comprising:
performing knowledge injection and sentence tree conversion on a query request based on a legal knowledge graph through a knowledge layer of a legal intent classifier;
converting the sentence tree into an embedded representation through an embedding layer of the legal intent classifier;
controlling a degree of visibility of words in the query request and words from the legal knowledge graph through a visible layer of the legal intent classifier; and
controlling a self-attention region in a transformer model according to the visibility degree information through a mask transformer layer of the legal intent classifier, to obtain a legal intent search result.
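The knowledge-injection step of claim 1 resembles the K-BERT approach cited in the non-patent literature: entities in the query are matched against a knowledge graph and expanded into a sentence tree by attaching triples as branches. As a rough, illustrative sketch only (the toy triple store, tokenization, and function names below are invented for this example, not taken from the patent):

```python
# Toy knowledge-injection sketch: attach (relation, object) branches from a
# small legal triple store to entities found in the query.
# The triples and whitespace tokenization are invented for illustration only.
LEGAL_KG = {
    "breach of contract": [("governed_by", "contract law")],
    "damages": [("type_of", "civil remedy")],
}

def inject_knowledge(tokens):
    """Return a sentence tree as a list of (token_or_entity, branches) pairs."""
    tree = []
    i = 0
    while i < len(tokens):
        matched = None
        # Greedily try the longest multi-word entity starting at position i.
        for span in range(len(tokens), i, -1):
            phrase = " ".join(tokens[i:span])
            if phrase in LEGAL_KG:
                matched = (phrase, span)
                break
        if matched:
            phrase, span = matched
            tree.append((phrase, LEGAL_KG[phrase]))  # entity with KG branches
            i = span
        else:
            tree.append((tokens[i], []))             # plain word, no branch
            i += 1
    return tree

tree = inject_knowledge("claim damages for breach of contract".split())
```

Each entity node carries its knowledge-graph branches, while ordinary words carry none; the later layers then decide how visible the injected branch words are to the rest of the sentence.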
2. The legal intent search method of claim 1, wherein the legal intent classifier is obtained by training, the training process comprising:
performing entity recognition and relationship extraction on obtained corpora related to the legal field to generate a legal knowledge graph;
pre-training a pre-training language model of the legal intent classifier based on the legal knowledge graph and a pre-training corpus; and
performing classification training on the pre-trained language model based on a training corpus to generate the legal intent classifier.
3. The legal intent search method of claim 2, wherein pre-training the pre-training language model based on the legal knowledge graph and the pre-training corpus comprises:
performing knowledge injection and sentence tree conversion on sentences in the pre-training corpus based on the legal knowledge graph through a knowledge layer of the pre-training language model;
converting the sentence tree into an embedded representation through an embedding layer of the pre-training language model;
setting a degree of visibility of words in a given sentence and of newly added words, including words from the legal knowledge graph, through a visible layer of the pre-training language model; and
setting a self-attention region in a transformer model according to the visibility degree information through a mask transformer of the pre-training language model, to obtain a pre-training legal intent search result.
4. The legal intent search method of claim 3, wherein converting the sentence tree into an embedded representation by an embedding layer of the pre-trained language model comprises:
performing word embedding processing on the sentence tree to generate a word-embedded representation;
performing position embedding processing on the word-embedded representation to generate a position-embedded representation; and
performing segment embedding processing on the position-embedded representation to generate the embedded representation.
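The three-stage embedding of claim 4 mirrors BERT-style input embeddings: token, position, and segment embeddings summed element-wise per token. A minimal NumPy sketch under that assumption (the vocabulary size, dimensions, and random weights are arbitrary placeholders; K-BERT additionally uses "soft" positions so that injected branch tokens share positions with the trunk, which is omitted here for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MAX_LEN, SEGMENTS, DIM = 100, 16, 2, 8

# Learned lookup tables; here just random placeholders.
word_emb = rng.normal(size=(VOCAB, DIM))
pos_emb = rng.normal(size=(MAX_LEN, DIM))
seg_emb = rng.normal(size=(SEGMENTS, DIM))

def embed(token_ids, segment_ids):
    """Sum word, position, and segment embeddings for each token."""
    positions = np.arange(len(token_ids))
    return word_emb[token_ids] + pos_emb[positions] + seg_emb[segment_ids]

x = embed([5, 7, 9], [0, 0, 1])  # three tokens, last one in segment 1
```

The result is one vector per token that carries identity, order, and segment membership simultaneously, which is what the downstream attention layers consume.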
5. The legal intent search method of claim 3, wherein controlling the degree of visibility of the words in the query request and the words from the legal knowledge graph through the visible layer of the legal intent classifier comprises:
in response to a newly added word being a predicate, setting the predicate to be invisible; and
in response to a newly added word being a word from the legal knowledge graph that represents an entity, setting the newly added word to be visible.
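The visibility rule of claim 5 can be encoded as a visible matrix in the style of K-BERT, where entry (i, j) states whether token i may attend to token j: injected predicate words are hidden from other tokens, injected entity words remain visible. A toy construction under that reading (the tagging scheme and the simplification that a hidden predicate still sees itself are invented for this sketch):

```python
import numpy as np

def visible_matrix(tags):
    """Build a boolean attention mask from per-token tags.

    tags[i] is one of "orig" (word from the query), "entity" (injected
    knowledge-graph entity word), or "predicate" (injected predicate word).
    Predicates are visible only to themselves; entity words stay visible.
    """
    n = len(tags)
    m = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            if tags[j] == "predicate":
                m[i, j] = (i == j)   # predicate hidden from every other token
            elif tags[i] == "orig" or tags[j] == "orig" or i == j:
                m[i, j] = True       # original and entity words interconnect
    return m

m = visible_matrix(["orig", "orig", "predicate", "entity"])
```

Such a matrix is exactly what a masked self-attention layer consumes: wherever the entry is False, the attention score is suppressed.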
6. The legal intent search method of claim 1, further comprising, prior to performing the knowledge injection and sentence tree conversion on the query request based on the legal knowledge graph:
performing dictionary matching and/or rule matching on content in the query request.
7. The legal intent search method of claim 6, wherein dictionary-matching and/or rule-matching the content in the query request comprises:
generating a legal dictionary from the training corpus; and
performing dictionary matching on the content in the query request using the legal dictionary.
8. The legal intent search method of claim 6, wherein dictionary-matching and/or rule-matching the content in the query request comprises:
performing rule extraction on the training corpus to obtain legal rules; and
performing rule matching on the content in the query request using the legal rules.
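Claims 7 and 8 describe lightweight pre-filters that run before the classifier: a dictionary lookup and a set of extracted rules. A combined sketch of both steps (the dictionary entries, regular-expression rules, and labels below are illustrative inventions, not content from the patent):

```python
import re

# Hypothetical dictionary and rules; in the patent these would be derived
# from the training corpus rather than hand-written.
LEGAL_DICT = {"alimony": "family_law", "patent": "ip_law"}
LEGAL_RULES = [
    (re.compile(r"how (do i|to) sue\b"), "litigation_intent"),
]

def pre_match(query):
    """Return ("dict"|"rule", label) on a match, or None to fall through
    to the knowledge-injection pipeline of claim 1."""
    q = query.lower()
    for term, label in LEGAL_DICT.items():
        if term in q:
            return ("dict", label)
    for pattern, label in LEGAL_RULES:
        if pattern.search(q):
            return ("rule", label)
    return None

hit = pre_match("How do I sue my landlord?")
```

Queries the pre-filter resolves never reach the heavier classifier, which is the usual motivation for such a cascade.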
9. The legal intent search method of claim 1, wherein controlling, through the mask transformer layer of the legal intent classifier, the self-attention region in the transformer model according to the visibility degree information to obtain the legal intent search result comprises:
controlling, through the mask transformer layer of the legal intent classifier, the self-attention region in the transformer model according to the visibility degree information to obtain an initial legal intent result; and
performing intent classification on the initial legal intent result via Softmax to obtain the legal intent search result.
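The final step of claim 9, masked self-attention followed by Softmax intent classification, can be sketched in a few lines. This is a single-head, single-layer toy with random weights, assuming Q = K = V for brevity; the real mask transformer stacks many such layers with learned projections:

```python
import numpy as np

rng = np.random.default_rng(1)

def masked_attention(x, visible):
    """Self-attention where invisible positions receive -inf scores.

    The diagonal of `visible` should be True so every row has at least
    one attendable position.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                # Q = K = V = x for brevity
    scores = np.where(visible, scores, -np.inf)  # apply the visible matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def classify(x, n_intents=3):
    """Mean-pool the token representations, then Softmax over intent logits."""
    w = rng.normal(size=(x.shape[-1], n_intents))  # placeholder classifier head
    logits = x.mean(axis=0) @ w
    e = np.exp(logits - logits.max())
    return e / e.sum()

x = rng.normal(size=(4, 8))                # 4 tokens, dimension 8
visible = np.ones((4, 4), dtype=bool)      # fully visible for the demo
probs = classify(masked_attention(x, visible))
```

The Softmax output is a probability distribution over intent labels; the highest-probability label becomes the legal intent search result.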
10. The legal intent search method of claim 9, wherein the intent classification comprises multiple levels of classification results, one or more of which are selected or specified by a user.
11. A legal intent search apparatus, comprising:
a sentence tree conversion unit, configured to perform knowledge injection and sentence tree conversion on a query request based on a legal knowledge graph through a knowledge layer of a legal intent classifier;
an embedding processing unit, configured to convert the sentence tree into an embedded representation through an embedding layer of the legal intent classifier;
a visibility control unit, configured to control a degree of visibility of words in the query request and words from the legal knowledge graph through a visible layer of the legal intent classifier; and
an intent classification unit, configured to control a self-attention region in a transformer model according to the visibility degree information through a mask transformer layer of the legal intent classifier, to obtain a legal intent search result.
12. An electronic device, comprising:
a processor; and
a memory having stored therein computer program instructions which, when executed by the processor, cause the processor to perform the legal intent search method of any one of claims 1-10.
CN202010407792.XA 2020-05-14 2020-05-14 Legal intention searching method, legal intention searching device and electronic equipment Active CN111552821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010407792.XA CN111552821B (en) 2020-05-14 2020-05-14 Legal intention searching method, legal intention searching device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010407792.XA CN111552821B (en) 2020-05-14 2020-05-14 Legal intention searching method, legal intention searching device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111552821A true CN111552821A (en) 2020-08-18
CN111552821B CN111552821B (en) 2022-03-01

Family

ID=72002755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010407792.XA Active CN111552821B (en) 2020-05-14 2020-05-14 Legal intention searching method, legal intention searching device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111552821B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112015900A (en) * 2020-09-07 2020-12-01 平安科技(深圳)有限公司 Medical attribute knowledge graph construction method, device, equipment and medium
CN112035645A (en) * 2020-09-01 2020-12-04 平安科技(深圳)有限公司 Data query method and system
CN112487202A (en) * 2020-11-27 2021-03-12 厦门理工学院 Chinese medical named entity recognition method and device fusing knowledge map and BERT
CN112487154A (en) * 2020-12-24 2021-03-12 武汉烽火众智数字技术有限责任公司 Intelligent search method based on natural language
CN113190667A (en) * 2021-05-12 2021-07-30 北京律联东方文化传播有限公司 Legal data query method, device, equipment and storage medium
CN113254671A (en) * 2021-06-22 2021-08-13 平安科技(深圳)有限公司 Atlas optimization method, device, equipment and medium based on query analysis
CN113377969A (en) * 2021-08-16 2021-09-10 中航信移动科技有限公司 Intention recognition data processing system
CN113627161A (en) * 2021-08-09 2021-11-09 杭州网易云音乐科技有限公司 Data processing method and device, storage medium and electronic equipment
CN113792540A (en) * 2021-09-18 2021-12-14 平安科技(深圳)有限公司 Intention recognition model updating method and related equipment
EP3964978A1 (en) * 2020-09-02 2022-03-09 Tata Consultancy Services Limited Method and system for retrieval of prior court cases using witness testimonies
CN114330312A (en) * 2021-11-03 2022-04-12 腾讯科技(深圳)有限公司 Title text processing method, apparatus, storage medium, and program

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100250497A1 (en) * 2007-01-05 2010-09-30 Redlich Ron M Electromagnetic pulse (EMP) hardened information infrastructure with extractor, cloud dispersal, secure storage, content analysis and classification and method therefor
CN108073569A (en) * 2017-06-21 2018-05-25 北京华宇元典信息服务有限公司 A kind of law cognitive approach, device and medium based on multi-layer various dimensions semantic understanding
CN109145153A (en) * 2018-07-02 2019-01-04 北京奇艺世纪科技有限公司 It is intended to recognition methods and the device of classification
CN109522465A (en) * 2018-10-22 2019-03-26 国家电网公司 The semantic searching method and device of knowledge based map
CN109992671A (en) * 2019-04-10 2019-07-09 出门问问信息科技有限公司 Intension recognizing method, device, equipment and storage medium
CN110059193A (en) * 2019-06-21 2019-07-26 南京擎盾信息科技有限公司 Legal advice system based on law semanteme part and document big data statistical analysis
CN110348024A (en) * 2019-07-23 2019-10-18 天津汇智星源信息技术有限公司 Intelligent identifying system based on legal knowledge map
CN110377715A (en) * 2019-07-23 2019-10-25 天津汇智星源信息技术有限公司 Reasoning type accurate intelligent answering method based on legal knowledge map
US20190341058A1 (en) * 2018-05-06 2019-11-07 Microsoft Technology Licensing, Llc Joint neural network for speaker recognition


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEIJIE LIU et al.: "K-BERT: Enabling Language Representation with Knowledge Graph", Proceedings of the AAAI Conference on Artificial Intelligence *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112035645A (en) * 2020-09-01 2020-12-04 平安科技(深圳)有限公司 Data query method and system
EP3964978A1 (en) * 2020-09-02 2022-03-09 Tata Consultancy Services Limited Method and system for retrieval of prior court cases using witness testimonies
CN112015900B (en) * 2020-09-07 2024-05-03 平安科技(深圳)有限公司 Medical attribute knowledge graph construction method, device, equipment and medium
CN112015900A (en) * 2020-09-07 2020-12-01 平安科技(深圳)有限公司 Medical attribute knowledge graph construction method, device, equipment and medium
CN112487202A (en) * 2020-11-27 2021-03-12 厦门理工学院 Chinese medical named entity recognition method and device fusing knowledge map and BERT
CN112487202B (en) * 2020-11-27 2022-05-06 厦门理工学院 Chinese medical named entity recognition method and device fusing knowledge map and BERT
CN112487154A (en) * 2020-12-24 2021-03-12 武汉烽火众智数字技术有限责任公司 Intelligent search method based on natural language
CN113190667A (en) * 2021-05-12 2021-07-30 北京律联东方文化传播有限公司 Legal data query method, device, equipment and storage medium
CN113254671A (en) * 2021-06-22 2021-08-13 平安科技(深圳)有限公司 Atlas optimization method, device, equipment and medium based on query analysis
CN113254671B (en) * 2021-06-22 2021-09-28 平安科技(深圳)有限公司 Atlas optimization method, device, equipment and medium based on query analysis
CN113627161A (en) * 2021-08-09 2021-11-09 杭州网易云音乐科技有限公司 Data processing method and device, storage medium and electronic equipment
CN113377969B (en) * 2021-08-16 2021-11-09 中航信移动科技有限公司 Intention recognition data processing system
CN113377969A (en) * 2021-08-16 2021-09-10 中航信移动科技有限公司 Intention recognition data processing system
CN113792540A (en) * 2021-09-18 2021-12-14 平安科技(深圳)有限公司 Intention recognition model updating method and related equipment
CN113792540B (en) * 2021-09-18 2024-03-22 平安科技(深圳)有限公司 Method for updating intention recognition model and related equipment
CN114330312A (en) * 2021-11-03 2022-04-12 腾讯科技(深圳)有限公司 Title text processing method, apparatus, storage medium, and program

Also Published As

Publication number Publication date
CN111552821B (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN111552821B (en) Legal intention searching method, legal intention searching device and electronic equipment
WO2021082953A1 (en) Machine reading understanding method and apparatus, storage medium, and device
CN108959482B (en) Single-round dialogue data classification method and device based on deep learning and electronic equipment
Zhu et al. Knowledge-based question answering by tree-to-sequence learning
CN111401077B (en) Language model processing method and device and computer equipment
US8452772B1 (en) Methods, systems, and articles of manufacture for addressing popular topics in a socials sphere
Kumar et al. Automating reading comprehension by generating question and answer pairs
CN110390049B (en) Automatic answer generation method for software development questions
CN113032568A (en) Query intention identification method based on bert + bilstm + crf and combined sentence pattern analysis
CN112926345A (en) Multi-feature fusion neural machine translation error detection method based on data enhancement training
CN116127090B (en) Aviation system knowledge graph construction method based on fusion and semi-supervision information extraction
Jian et al. Lstm-based attentional embedding for English machine translation
CN115630145A (en) Multi-granularity emotion-based conversation recommendation method and system
Sharma et al. Improving visual question answering by combining scene-text information
Wu et al. Connective Prediction for Implicit Discourse Relation Recognition via Knowledge Distillation
CN114330483A (en) Data processing method, model training method, device, equipment and storage medium
CN113377844A (en) Dialogue type data fuzzy retrieval method and device facing large relational database
Jia et al. Semantic association enhancement transformer with relative position for image captioning
Yang et al. Research on AI-assisted grading of math questions based on deep learning
Zhou et al. A novel MRC framework for evidence extracts in judgment documents
CN115712713A (en) Text matching method, device and system and storage medium
CN115759254A (en) Question-answering method, system and medium based on knowledge-enhanced generative language model
CN115129859A (en) Intention recognition method, intention recognition device, electronic device and storage medium
CN113569124A (en) Medical title matching method, device, equipment and storage medium
Sha et al. A Prompt-Based Representation Individual Enhancement Method for Chinese Idiom Reading Comprehension

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant