CN116303917A - Data retrieval method, device and equipment - Google Patents

Data retrieval method, device and equipment

Info

Publication number
CN116303917A
Authority
CN
China
Prior art keywords
fact
event
behavior data
data
document information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211095738.1A
Other languages
Chinese (zh)
Inventor
魏扬威
都金涛
祝慧佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202211095738.1A
Publication of CN116303917A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Tourism & Hospitality (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Technology Law (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the specification discloses a data retrieval method, a device and equipment, wherein the method comprises the following steps: acquiring a query request of a target event, wherein the query request comprises fact behavior data of the target event; acquiring first index information matched with the fact behavior data of the target event from index information generated by the fact behavior data and evidence information in a pre-established retrieval database based on the fact behavior data of the target event; acquiring document information of historical events corresponding to the first index information, determining matching features of the document information of each historical event and fact behavior data of a target event, and carrying out pooling processing and/or convolution processing on the matching features to obtain processed data; and carrying out fusion processing on the processed data to obtain the document information of the target historical event, which is matched with the fact behavior data of the target event, in the document information of the historical event.

Description

Data retrieval method, device and equipment
Technical Field
The present document relates to the field of computer technologies, and in particular, to a method, an apparatus, and a device for retrieving data.
Background
Semantic matching of text has important practical applications, particularly for judicial cases. New cases may arise every day, a judicial case library accumulates a massive number of historical judicial cases, and legal departments often need to retrieve historical cases similar to a current event to assist in its trial. On the one hand, a convenient and effective similar-case retrieval system can markedly improve the case-handling efficiency of legal departments and reduce related judicial errors. On the other hand, if historical cases cannot be retrieved effectively, the legal department can only rely on existing experience, which makes events harder to handle, costs time and labor, and increases the probability of judicial errors. A technical solution is therefore needed that retrieves historical cases more effectively and lets legal departments handle cases more efficiently.
Disclosure of Invention
The embodiment of the specification aims to provide a technical scheme which is more effective in searching historical cases and can enable the law department to handle cases more efficiently.
In order to achieve the above technical solution, the embodiments of the present specification are implemented as follows:
The embodiment of the specification provides a data retrieval method, which comprises the following steps: acquiring a query request of a target event, wherein the query request comprises fact behavior data of the target event; acquiring, based on the fact behavior data of the target event, first index information matched with the fact behavior data of the target event from index information in a pre-established retrieval database, the index information being generated from fact behavior data and evidence information; acquiring document information of the historical events corresponding to the first index information, determining matching features between the document information of each historical event and the fact behavior data of the target event, and performing pooling processing and/or convolution processing on the matching features to obtain processed data, wherein the pooling processing comprises transverse pooling processing and longitudinal pooling processing, or the pooling processing comprises longitudinal pooling processing; and performing fusion processing on the processed data to obtain, from the document information of the historical events, the document information of a target historical event matched with the fact behavior data of the target event.
The embodiment of the specification provides a data retrieval device, which comprises: a query request module, which acquires a query request of a target event, wherein the query request comprises fact behavior data of the target event; an index acquisition module, which acquires, based on the fact behavior data of the target event, first index information matched with the fact behavior data of the target event from index information in a pre-established retrieval database, the index information being generated from fact behavior data and evidence information; an information processing module, which acquires document information of the historical events corresponding to the first index information, determines matching features between the document information of each historical event and the fact behavior data of the target event, and performs pooling processing and/or convolution processing on the matching features to obtain processed data, wherein the pooling processing comprises transverse pooling processing and longitudinal pooling processing, or the pooling processing comprises longitudinal pooling processing; and a retrieval output module, which performs fusion processing on the processed data to obtain, from the document information of the historical events, the document information of a target historical event matched with the fact behavior data of the target event.
The embodiment of the specification provides data retrieval equipment, which comprises: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to: acquire a query request of a target event, wherein the query request comprises fact behavior data of the target event; acquire, based on the fact behavior data of the target event, first index information matched with the fact behavior data of the target event from index information in a pre-established retrieval database, the index information being generated from fact behavior data and evidence information; acquire document information of the historical events corresponding to the first index information, determine matching features between the document information of each historical event and the fact behavior data of the target event, and perform pooling processing and/or convolution processing on the matching features to obtain processed data, wherein the pooling processing comprises transverse pooling processing and longitudinal pooling processing, or the pooling processing comprises longitudinal pooling processing; and perform fusion processing on the processed data to obtain, from the document information of the historical events, the document information of a target historical event matched with the fact behavior data of the target event.
The present description also provides a storage medium for storing computer-executable instructions that, when executed by a processor, implement the following: acquiring a query request of a target event, wherein the query request comprises fact behavior data of the target event; acquiring, based on the fact behavior data of the target event, first index information matched with the fact behavior data of the target event from index information in a pre-established retrieval database, the index information being generated from fact behavior data and evidence information; acquiring document information of the historical events corresponding to the first index information, determining matching features between the document information of each historical event and the fact behavior data of the target event, and performing pooling processing and/or convolution processing on the matching features to obtain processed data, wherein the pooling processing comprises transverse pooling processing and longitudinal pooling processing, or the pooling processing comprises longitudinal pooling processing; and performing fusion processing on the processed data to obtain, from the document information of the historical events, the document information of a target historical event matched with the fact behavior data of the target event.
Drawings
In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some of the embodiments described in the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram illustrating an embodiment of a method for retrieving data according to the present disclosure;
FIG. 2 is a schematic diagram of a data retrieval process according to the present disclosure;
FIG. 3 is a diagram illustrating another embodiment of a method for retrieving data according to the present disclosure;
FIG. 4 is a schematic diagram of another data retrieval process according to the present disclosure;
FIG. 5 is a diagram of an embodiment of a data retrieval device according to the present disclosure;
FIG. 6 is an embodiment of data retrieval equipment according to the present disclosure.
Detailed Description
The embodiment of the specification provides a data retrieval method, device and equipment.
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
Example 1
As shown in FIG. 1, the embodiment of the present disclosure provides a data retrieval method. The execution subject of the method may be a terminal device or a server. The terminal device may be a mobile terminal such as a mobile phone or a tablet computer, a computer device such as a notebook or desktop computer, or an IoT device (for example, a smart watch or an in-vehicle device). The server may be a single server or a server cluster formed by a plurality of servers, and may be a back-end server of a service such as a legal service. This embodiment is described in detail with a server as the example; execution by a terminal device follows the same description and is not repeated here. The method specifically comprises the following steps:
in step S102, a query request of the target event is acquired, where the query request includes fact behavior data of the target event.
The target event may be any event, for example an event related to intellectual property and competition disputes, or an event related to contract disputes; it may be set according to the actual situation, which is not limited in the embodiments of the present specification. The fact behavior data may be data describing a fact behavior, that is, a behavior in which the actor has no intention to establish, change, or extinguish a civil legal relationship, but which produces legal effects directly under the provisions of law. A fact behavior does not take a declaration of intent as an essential element: the legal effect arises whenever the actor's objective behavior satisfies the statutory constituent elements, and the actor is not required to have the corresponding capacity for civil conduct. Examples of fact behaviors include first possession (occupation), processing, management of another's affairs without mandate, picking up lost property, discovering buried property, and the like; the corresponding data are fact behavior data.
In implementation, semantic matching of text has important practical applications, particularly for judicial cases. New cases may arise every day, a judicial case library accumulates a massive number of historical judicial cases, and legal departments often need to retrieve historical cases similar to a current event to assist in its trial. On the one hand, a convenient and effective similar-case retrieval system can markedly improve the case-handling efficiency of legal departments and reduce related judicial errors. On the other hand, if historical cases cannot be retrieved effectively, the legal department can only rely on existing experience, which makes events harder to handle, costs time and labor, and increases the probability of judicial errors. A technical solution is therefore needed that retrieves historical cases more effectively and lets legal departments handle cases more efficiently.
In practical application, text matching may be performed in several ways, and the BERT-based BERT-PLI algorithm is one of the main ones. Its model structure is shown in FIG. 2. In the first part of the processing, a query request of a target event is acquired, the query request including fact behavior data of the target event; based on that data, historical document data matching the fact behavior data of the target event is acquired from a pre-established retrieval database through a corresponding algorithm, and the data features corresponding to the acquired historical document data can be determined. In the second part of the processing, the model constructed based on the BERT-PLI algorithm can be fine-tuned with manually annotated data, but such data is very limited, so the model cannot learn judicial domain knowledge well. In the third part of the processing, when handling the interaction map, the model applies transverse max pooling (MaxPooling) and then captures the front-to-back logical relationship of the fact behavior data of the target event through an RNN; however, the pooling causes the logical relationship of the fact behavior data of the target event itself to be lost.
The embodiment of the present specification provides a technical solution that can be implemented, and specifically can be seen in the following:
When a legal department receives a litigation request submitted by a requesting party, staff of the legal department can analyze the request, determine the target event corresponding to it, and obtain the fact behavior data of the target event. The legal department may maintain a database of documents related to legal litigation, in which a staff member can search for documents similar to the target event. To do so, the staff member can query the database through a terminal device. Specifically, the staff member can open a query page on the terminal device; the query page may include a query information input box, a query result output box, a confirm button, and the like. The staff member can enter the fact behavior data of the target event in the query information input box and, once the input is complete, click the confirm button on the query page. The terminal device can then acquire the fact behavior data entered by the staff member, generate a query request of the target event from it, and send the query request to the server, so that the server acquires the query request of the target event.
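The exchange between the terminal device and the server described above can be sketched as follows. This is only a minimal illustration, assuming a JSON request body; the class name `QueryRequest`, the field names, and both helper functions are hypothetical and are not specified in this description.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class QueryRequest:
    """Query request of a target event (field names are hypothetical)."""
    event_id: str
    fact_behavior_data: str


def build_query_request(event_id: str, fact_text: str) -> str:
    """Terminal side: wrap the text typed into the input box as a JSON body."""
    text = fact_text.strip()
    if not text:
        raise ValueError("fact behavior data must not be empty")
    return json.dumps(asdict(QueryRequest(event_id, text)), ensure_ascii=False)


def parse_query_request(body: str) -> QueryRequest:
    """Server side: recover the fact behavior data from the request body."""
    return QueryRequest(**json.loads(body))
```

The server would then pass `fact_behavior_data` on to the index lookup of step S104.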
In step S104, based on the fact behavior data of the target event, first index information matching the fact behavior data of the target event is acquired from index information in a pre-established retrieval database, the index information being generated from fact behavior data and evidence information.
The evidence information may be information that, under procedural law, establishes the facts of a case (or event). Evidence is the core of litigation: in the trial of any event (or case), the original facts of the event must be reconstructed through the evidence and the chain of evidence it forms, and the evidence must exist objectively. Any material that can be used to prove the facts of a case is evidence, which may include, for example: physical evidence, documentary evidence, witness testimony, statements of victims, statements and defenses of criminal suspects and defendants, expert opinions, records of inquests, examinations, identifications, and investigative experiments, audiovisual materials, electronic data, and the like. Evidence must be verified before it can serve as the basis for a verdict. The index information may be information used as an index for retrieving judicial documents (i.e., document information of historical events); through the index information, the corresponding judicial document can be found quickly.
In implementation, to allow judicial documents (i.e., document information of historical events) to be retrieved quickly, the document information of historical events may be preprocessed. If the entire document information of a historical event were indexed, it would contain a large amount of content irrelevant to the fact behavior data of a target event, and retrieval efficiency would be low. To effectively improve index quality and indexing speed, the fact behavior data alleged by the prosecutor and the evidence information may be extracted from the document information of each historical event, the index information of that document information may be constructed from the extracted fact behavior data and evidence information, and the index information may be stored in the retrieval database in correspondence with the document information of the corresponding historical event.
After the fact behavior data of the target event is obtained in the above manner, it may be split into a number of different keywords, and the keywords may be searched in the index information in the retrieval database (constructed from the fact behavior data and evidence information in the document information of historical events) to obtain the index information matching the keywords, which may be used as the first index information. Alternatively, the fact behavior data of the target event may be searched as a whole in that index information to obtain the index information matching it, which may be used as the first index information. The approach may be set according to the actual situation, which is not limited in the embodiments of the present specification.
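The index construction and keyword lookup described above can be illustrated with a small inverted index. This is a sketch under stated assumptions: the function names, the `facts` and `evidence` field names, the whitespace-style tokenizer, and the ranking by number of shared keywords are all illustrative choices that this description does not prescribe.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word split; a real system would use a proper word segmenter."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each keyword in the extracted facts and evidence to document ids.

    `documents` maps doc_id -> {"facts": str, "evidence": str}; only these
    two extracted fields are indexed, not the whole document.
    """
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for token in tokenize(fields["facts"] + " " + fields["evidence"]):
            index[token].add(doc_id)
    return index


def first_index_matches(index, target_facts, top_k=10):
    """Return doc ids ranked by how many query keywords they share."""
    hits = defaultdict(int)
    for token in set(tokenize(target_facts)):
        for doc_id in index.get(token, ()):
            hits[doc_id] += 1
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked[:top_k]]
```

Indexing only the extracted fields keeps the index small, which is the motivation given above for not indexing entire documents.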
In step S106, the document information of the history event corresponding to the first index information is obtained, and the matching feature of the document information of each history event and the fact behavior data of the target event is determined, and pooling processing and/or convolution processing are performed on the matching feature, so as to obtain processed data, where the pooling processing includes horizontal pooling processing and vertical pooling processing, or the pooling processing includes vertical pooling processing.
The document information of a historical event may be information of a judicial document. A judicial document may be a special-purpose document produced and used by bodies responsible for investigation, prosecution, trial, notarization, and the like at each stage and step of handling various events. It may include documents with legal effect, such as judgments and rulings, as well as documents that do not directly have legal effect but help ensure that the law is carried out, such as a judgment document and the like.
In implementation, the document information of the history event corresponding to the first index information may be searched from the corresponding relation stored in the search database, so that the document information of the history event corresponding to the first index information may be obtained.
Because the document information of a historical event and the fact behavior data of the target event may both be long, it may be difficult to determine the embedded features of the information or data in one pass. Word segmentation may therefore be performed separately on the document information of each historical event and on the fact behavior data of the target event, yielding the characters, words, and the like corresponding to each. Embedded features are then obtained for the characters and words of the document information of each historical event and for those of the fact behavior data of the target event. The embedded features belonging to the document information of each historical event are fused to obtain the embedded features of that document information, and the embedded features belonging to the fact behavior data of the target event are fused to obtain the embedded features of that data. From these two sets of embedded features, the matching features between the document information of each historical event and the fact behavior data of the target event can be determined.
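As a rough illustration of how matching features might be derived from embedded features, the sketch below splits both texts into segments, embeds each segment, and compares every pair. It is an assumption-laden toy: the bag-of-words `embed` function stands in for the embeddings a language model would produce, and the cosine-similarity matrix is one possible form of matching feature, not necessarily the one intended here. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def segment(text):
    """Split text into sentence-like segments (stand-in for real segmentation)."""
    return [s.strip() for s in re.split(r"[.;!?]", text) if s.strip()]


def embed(segment_text):
    """Toy embedding: a bag-of-words count vector, not a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", segment_text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def matching_features(target_facts, document_text):
    """Rows follow target-event segments, columns follow document segments."""
    q = [embed(s) for s in segment(target_facts)]
    d = [embed(s) for s in segment(document_text)]
    return [[cosine(qi, dj) for dj in d] for qi in q]
```

Keeping one row per target-event segment and one column per document segment preserves the order of both texts, which the later pooling and convolution steps rely on.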
The fact behavior data of the target event carries not only its front-to-back logical relationship but also the logical relationship of the fact behavior data itself. Pooling processing and/or convolution processing may therefore be performed on the matching features to obtain processed data, where the pooling processing comprises transverse pooling processing and longitudinal pooling processing, or comprises longitudinal pooling processing. Specifically, transverse pooling may be applied to the matching features, longitudinal pooling may be applied to the matching features, and the matching features may be input into a convolutional neural network model, yielding the processed data produced by each of these processing modes.
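The three processing modes can be sketched on a small matching-feature matrix. The sketch assumes that rows correspond to segments of the target event and columns to segments of a historical document, that "transverse" means pooling along each row and "longitudinal" along each column, and that max pooling is used; the fixed kernel stands in for the learned filters of a convolutional neural network. All names are hypothetical.

```python
def transverse_max_pool(matrix):
    """Max over each row: one value per target-event segment."""
    return [max(row) for row in matrix]


def longitudinal_max_pool(matrix):
    """Max over each column: one value per document segment."""
    return [max(col) for col in zip(*matrix)]


def conv2d_valid(matrix, kernel):
    """Plain 2-D cross-correlation without padding, as CNN layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(matrix) - kh + 1
    out_w = len(matrix[0]) - kw + 1
    return [
        [
            sum(matrix[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def process(matrix, kernel):
    """Apply all three processing modes and return them side by side."""
    return {
        "transverse": transverse_max_pool(matrix),
        "longitudinal": longitudinal_max_pool(matrix),
        "convolution": conv2d_valid(matrix, kernel),
    }
```

With a diagonal kernel such as `[[1, 0], [0, 1]]`, the convolution responds when consecutive segments of one text match consecutive segments of the other, which is one way order information on both sides could be retained.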
In step S108, the processed data is subjected to fusion processing, so as to obtain document information of the target historical event, which is matched with the fact behavior data of the target event, in the document information of the historical event.
In implementation, the processed data obtained through the above processing modes can be fused, so that the matching is understood from different angles, and the acquired document information of the historical events can be ranked so that the document information of highly relevant historical events comes first. In this way, the document information of the target historical event that matches the fact behavior data of the target event can be obtained from the document information of the historical events.
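Fusion and ranking might look like the following sketch. The fusion rule is an assumption: this description does not specify how the processed data are combined, so a weighted sum of the mean of each view is used purely for illustration, with made-up weights that a trained model would instead learn. The dictionary keys and function names are hypothetical.

```python
def fuse(processed, weights=(0.4, 0.4, 0.2)):
    """Fuse the three views into one relevance score (illustrative weights)."""
    def mean(values):
        values = list(values)
        return sum(values) / len(values) if values else 0.0

    conv_flat = [v for row in processed["convolution"] for v in row]
    views = (processed["transverse"], processed["longitudinal"], conv_flat)
    return sum(w * mean(v) for w, v in zip(weights, views))


def rank_documents(processed_by_doc, top_k=3):
    """Sort candidate documents so the most relevant come first."""
    scored = [(fuse(p), doc_id) for doc_id, p in processed_by_doc.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in scored[:top_k]]
```

The documents at the head of the returned list play the role of the document information of the target historical event.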
The embodiment of the specification provides a data retrieval method. A query request of a target event is acquired, the query request comprising fact behavior data of the target event. Based on the fact behavior data of the target event, first index information matched with it is acquired from index information in a pre-established retrieval database, the index information being generated from fact behavior data and evidence information. Document information of the historical events corresponding to the first index information is acquired, matching features between the document information of each historical event and the fact behavior data of the target event are determined, and pooling processing and/or convolution processing is performed on the matching features to obtain processed data, wherein the pooling processing comprises transverse pooling processing and longitudinal pooling processing, or the pooling processing comprises longitudinal pooling processing. The processed data is fused to obtain, from the document information of the historical events, the document information of the target historical event matched with the fact behavior data of the target event. Because pooling processing and/or convolution processing is used, the logical relationships can be captured both from the side of the fact behavior data of the target event and from the side of the document information of the historical events. In addition, further pretraining on massive unlabeled document information of historical events makes full use of the document content to learn judicial knowledge. Together, these measures can greatly improve retrieval efficiency and performance in similar-case retrieval scenarios.
Example two
As shown in FIG. 3, the embodiments of this specification provide a data retrieval method. The method may be executed by a terminal device or a server. The terminal device may be a mobile terminal device such as a mobile phone or a tablet computer, a computer device such as a notebook or desktop computer, or an IoT device (for example, a smart watch or an in-vehicle device). The server may be a single server or a server cluster formed by a plurality of servers, and may be a back-end server of, for example, a legal service. This embodiment is described in detail using a server as the executing entity; execution by a terminal device can be understood by reference to the following description and is not repeated here. The method may specifically include the following steps:
In step S302, sample data for training a language model is acquired, the sample data including document information of a target historical event.
The language model may be any of various language models, for example a language model built on the Transformer model, an XLNet language model, an ERNIE language model, or a BERT model, and may be chosen according to the actual situation. In this embodiment, the language model may be a BERT model. The document information of the target historical event may be the document information of any historical event, in particular the judicial document information of a historical event; it may be set according to the actual situation, and the embodiments of this specification do not limit it.
In implementation, the sample data may be obtained in various manners, for example, document information of a certain number of historical events may be randomly selected from the search database, or document information of a certain number of historical events may be crawled from the internet through a web crawler, etc., which may be specifically set according to actual situations, and the embodiment of the present specification is not limited to this.
In step S304, the language model is trained on the sample data. During training, part of the data in the sample data is randomly removed and the language model predicts the removed part; also during training, two sentences from the sample data are input into the language model, which predicts whether one sentence is the next sentence of the other. Training continues until the language model converges, yielding the trained language model.
In implementation, fine-tuning the language model with only a small amount of manually labeled sample data yields a limited improvement. To let the language model better learn the knowledge contained in the document information of historical events, the language-model pre-training tasks MLM (Masked Language Model) and NSP (Next Sentence Prediction) can be applied to massive document information of historical events. With the MLM task, part of the data in the input sample data is randomly removed (masked) during training and then predicted by the language model; because the result encoded by the language model can then incorporate contextual information at the same time, a deeper language model can be trained. With the NSP task, which can be added during training, two sentences from the sample data are input into the language model, which predicts whether one sentence is the next sentence of the other. This further pre-training (Further Pretrain) lets the language model learn more knowledge relevant to the judicial field and makes it better suited to fields such as judicial documents.
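By way of a non-limiting illustration, the two pre-training tasks described above can be sketched as data-preparation routines. The masking ratio of 0.15, the `[MASK]` placeholder, the 50/50 split between positive and negative sentence pairs, and the function names are assumptions made for this sketch and are not specified by this embodiment; the model and its training loop are omitted.

```python
import random

MASK = "[MASK]"  # assumed placeholder token

def mask_tokens(tokens, mask_prob=0.15, rng=random):
    """Randomly replace tokens with MASK; return the masked tokens and the labels to predict."""
    masked, labels = [], {}
    for pos, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels[pos] = tok  # the language model must recover this token
        else:
            masked.append(tok)
    return masked, labels

def make_nsp_pair(sentences, idx, rng=random):
    """Build one NSP example from sentence idx: (sentence_a, sentence_b, is_next).

    Requires idx < len(sentences) - 1 and at least three sentences.
    """
    first = sentences[idx]
    if rng.random() < 0.5:
        return first, sentences[idx + 1], True
    # Negative example: any sentence other than the first one or its true successor.
    candidates = [k for k in range(len(sentences)) if k not in (idx, idx + 1)]
    return first, sentences[rng.choice(candidates)], False
```

Each masked position becomes a prediction target for the MLM task, and each returned triple is one labeled example for the NSP task.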
In step S306, document information of a plurality of different history events is acquired.
The document information of the historical event may be information of a judicial document of the historical event.
In step S308, the fact behavior data and the evidence information contained in the document information of each historical event are extracted from the document information of the plurality of different historical events, respectively, and index information for the document information of the corresponding historical event is generated based on the extracted fact behavior data and evidence information.
In step S310, the index information is stored in the search database in association with the document information of the corresponding history event.
Through the processing of steps S306 to S310, the fact behavior data and the evidence information can be extracted from the document information of a historical event and the index information generated from them. Compared with the entire document information of the historical event, the index information generated in this way has less content and a higher proportion of useful content, so index quality and indexing speed can be effectively improved.
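As a minimal sketch of steps S306 to S310, the following assumes that the fact behavior data and the evidence information have already been extracted into hypothetical `facts` and `evidence` fields of each document record; how the extraction itself is done is not specified here. The field names and the dictionary-based "retrieval database" are illustrative assumptions only.

```python
def build_index_entry(document):
    """Join the extracted fact and evidence text of one document into its index text."""
    return " ".join([document["facts"], document["evidence"]])

def build_retrieval_database(documents):
    """Store each index entry together with the full document it was generated from."""
    database = {}
    for doc_id, document in documents.items():
        database[doc_id] = {
            "index": build_index_entry(document),   # short text used for recall
            "document": document["full_text"],      # full document information
        }
    return database
```

Keeping the index text separate from the full text means the coarse recall stage only has to scan the shorter index entries.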
In step S312, a query request of the target event is acquired, and the query request includes fact behavior data of the target event.
In step S314, the similarity between the fact behavior data of the target event and each index information generated from the fact behavior data and the evidence information in the retrieval database is determined based on the BM25 algorithm.
The BM25 (Best Match 25) algorithm is based on the probabilistic retrieval model. Its calculation may include the following three parts: the importance of each word in the fact behavior data of the target event; the relevance between each word in the fact behavior data of the target event and the document information of the historical event; and the relevance between each word in the fact behavior data of the target event and the fact behavior data of the target event itself.
In implementation, the BM25 algorithm can be used to search the retrieval database with the fact behavior data of the target event: the similarity is calculated between the fact behavior data of the target event and the fact behavior data and evidence information of each historical event, and document information of historical events is recalled accordingly. Specifically, the index information corresponding to the document information of one historical event may be selected (for example, at random) from the retrieval database, and the BM25 algorithm used to calculate the similarity between the fact behavior data of the target event and that index information. Repeating this for the remaining index information yields the similarity between the fact behavior data of the target event and each piece of index information in the retrieval database.
In step S316, based on the determined similarity, first index information matching the fact behavior data of the target event is acquired from the index information generated from fact behavior data and evidence information in the pre-established retrieval database.
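The three-part calculation described above can be sketched as follows. This is one common formulation of BM25, not necessarily the exact variant used in this embodiment; the parameter values `k1`, `b` and `k2`, the smoothed inverse document frequency, and the assumption that the texts are already tokenized into word lists are all choices made for illustration. The index entries are assumed to be non-empty.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, index_docs, k1=1.5, b=0.75, k2=1.0):
    """Score every tokenized index entry against the tokenized query with BM25."""
    n_docs = len(index_docs)
    avg_len = sum(len(doc) for doc in index_docs) / n_docs
    doc_freq = Counter()
    for doc in index_docs:
        doc_freq.update(set(doc))  # number of entries containing each word
    query_tf = Counter(query_tokens)
    scores = []
    for doc in index_docs:
        tf = Counter(doc)
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term, qf in query_tf.items():
            if term not in tf:
                continue
            # Part 1: importance of the word (smoothed inverse document frequency).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Part 2: relevance between the word and the index entry.
            doc_rel = tf[term] * (k1 + 1) / (tf[term] + length_norm)
            # Part 3: relevance between the word and the query itself.
            query_rel = qf * (k2 + 1) / (qf + k2)
            score += idf * doc_rel * query_rel
        scores.append(score)
    return scores
```

The returned list holds one similarity per index entry, in the same order as `index_docs`.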
In implementation, the correspondence between the index information and the document information of the historical event in the search database may be specifically shown in table 1 below.
TABLE 1
Index information    Document information of historical events
Index 1              Document information A
Index 2              Document information B
Index 3              Document information C
After the similarity between the fact behavior data of the target event and each index information in the search database is calculated in the above manner, the obtained similarity can be compared with a preset similarity threshold value, index information with the similarity larger than the preset similarity threshold value is obtained, and the obtained index information can be used as first index information matched with the fact behavior data of the target event. For example, the similarity between the fact behavior data of the target event and the index 1 is calculated, the similarity between the fact behavior data of the target event and the index 2 is calculated, the similarity between the fact behavior data of the target event and the index 3 is calculated, the above 3 similarities may be compared with a preset similarity threshold, and if the similarity between the fact behavior data of the target event and the index 2 is greater than the preset similarity threshold, and the similarity between the fact behavior data of the target event and the index 3 is greater than the preset similarity threshold, the index 2 and the index 3 may be regarded as first index information matching with the fact behavior data of the target event. By the method, the index information matched with the fact behavior data of the target event can be retrieved from the retrieval database in a coarse granularity mode.
In step S318, the document information of the history event corresponding to the first index information is obtained, and the fact behavior data of the target event and the document information of the history event are respectively subjected to sentence segmentation to obtain a first sentence corresponding to the fact behavior data of the target event and a second sentence corresponding to the document information of each history event.
In implementation, after the first index information is obtained in the above manner, the document information of the historical events corresponding to the first index information can be obtained based on the correspondence shown in Table 1. Continuing the example of step S316, where the first index information is index 2 and index 3, a lookup in Table 1 shows that the document information of the historical event corresponding to index 2 is document information B, and that corresponding to index 3 is document information C.
In order to accurately match the fact behavior data of the target event against the document information of the historical events, and so obtain the document information of the historical events that better matches the fact behavior data of the target event, five processes can be executed: matching feature determination, horizontal pooling processing, longitudinal pooling processing, two-dimensional convolution processing, and feature fusion processing. The input is the fact behavior data of the target event together with the fact behavior data and evidence information in the document information of the historical event, and the output is the similarity between the document information of the historical event and the fact behavior data of the target event. The matching features may be features extracted from the fact behavior data of the target event and the document information of the historical event using the BERT model. Horizontal pooling processing, longitudinal pooling processing and two-dimensional convolution processing are then performed so as to interpret the matching features from different angles; finally, the output features of each angle are fused and a matching value is output.
This embodiment describes the above five processes in detail. In the scenario of retrieving document information of historical events, both the fact behavior data of the target event and the document information of the historical events are relatively long, while the maximum text length that the BERT model can process is limited, so in some cases the data and information are difficult to input into the BERT model directly. The fact behavior data of the target event and the document information of the historical events can therefore each be split into sentences, yielding first sentences corresponding to the fact behavior data of the target event and second sentences corresponding to the document information of each historical event.
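The threshold comparison and the Table 1 lookup described above can be illustrated with the following sketch. The similarity values and the threshold of 0.5 in the usage note are made-up numbers chosen only to reproduce the index 2 / index 3 example; the function names are hypothetical.

```python
# Correspondence between index information and document information, as in Table 1.
TABLE_1 = {
    "Index 1": "Document information A",
    "Index 2": "Document information B",
    "Index 3": "Document information C",
}

def select_first_index(similarities, threshold):
    """Keep the index keys whose similarity is greater than the preset threshold."""
    return [key for key, sim in similarities.items() if sim > threshold]

def lookup_documents(index_keys, index_to_document):
    """Return the document information stored for each selected index key."""
    return [index_to_document[key] for key in index_keys]
```

For instance, with hypothetical similarities `{"Index 1": 0.2, "Index 2": 0.7, "Index 3": 0.9}` and a threshold of 0.5, `select_first_index` keeps index 2 and index 3, and `lookup_documents` then returns document information B and C.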
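A minimal sketch of the sentence segmentation follows. The set of sentence terminators (Chinese full stop, exclamation and question marks, semicolons, and their ASCII counterparts except the period) is an assumption made for this sketch; the embodiment does not prescribe a particular segmentation rule.

```python
import re

# A run of non-terminator characters, optionally followed by one terminator.
_SENTENCE = re.compile(r"[^。！？!?；;]+[。！？!?；;]?")

def split_sentences(text):
    """Split text into sentences, keeping each terminator with its sentence."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]
```

The same function can be applied to the fact behavior data of the target event (giving the first sentences) and to the document information of each historical event (giving the second sentences).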
In step S320, a first sentence corresponding to the fact behavior data of the target event and a second sentence corresponding to the document information of each history event are respectively input into a pre-trained language model, so as to obtain a matching feature of the first sentence and the second sentence, and the matching feature of the first sentence and the second sentence is determined as a matching feature of the document information of each history event and the fact behavior data of the target event.
In implementation, each pair consisting of one first sentence (from the fact behavior data of the target event) and one second sentence (from the document information of a historical event) can be input into the BERT model, and the vector corresponding to [CLS] taken as the similarity feature, that is, the matching feature of the first sentence and the second sentence. These matching features can be determined as the matching features between the document information of each historical event and the fact behavior data of the target event. Let i and j denote the numbers of sentences contained in the document information of the historical event and in the fact behavior data of the target event, respectively. The above then yields a matching feature of size i×j×h, which can be regarded as an i×j matrix in which each element is an h-dimensional vector: the similarity feature, obtained through the BERT model, between one sentence in the document information of the historical event and one sentence in the fact behavior data of the target event.
It should be noted that the processing in steps S318 and S320 is only one possible approach; in practical applications, various other approaches may be used. An optional approach is as follows: the fact behavior data of the target event and the document information of the historical events are input into a pre-trained language model to obtain the matching features between the document information of each historical event and the fact behavior data of the target event.
The language model may be a BERT model, in which case the lengths of the fact behavior data of the target event and of the document information of the historical event are smaller than the maximum text processing length of the BERT model. The language model may also be a model other than the BERT model, in which case those lengths are smaller than the maximum text processing length of that language model; if the language model imposes no limit on the maximum text processing length, it can be used directly to obtain the matching features between the document information of each historical event and the fact behavior data of the target event.
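The construction of the i×j×h matching feature can be sketched as below. The pair encoder here is a toy stand-in with h = 3 that only compares characters; it is not the BERT model and is included solely so the sketch runs without any model. In a real system, `encode_pair` would return the [CLS] vector of the sentence pair. Which text supplies the rows and which the columns is left to the caller.

```python
def toy_pair_encoder(sentence_a, sentence_b):
    """Toy stand-in for the [CLS] vector of a sentence pair (h = 3), for illustration only."""
    chars_a, chars_b = set(sentence_a), set(sentence_b)
    union = chars_a | chars_b
    jaccard = len(chars_a & chars_b) / len(union) if union else 0.0
    longest = max(len(sentence_a), len(sentence_b))
    length_ratio = min(len(sentence_a), len(sentence_b)) / longest if longest else 0.0
    exact = 1.0 if sentence_a == sentence_b else 0.0
    return [jaccard, length_ratio, exact]

def matching_features(row_sentences, col_sentences, encode_pair=toy_pair_encoder):
    """Return a grid with one h-dimensional vector per (row sentence, column sentence) pair."""
    return [[encode_pair(r, c) for c in col_sentences] for r in row_sentences]
```

The result is a nested list of shape len(row_sentences) × len(col_sentences) × h, which is the form the pooling and convolution steps operate on.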
If the language model is constructed through the BERT model, the above-mentioned process of inputting the fact behavior data of the target event and the document information of the history event into the pre-trained language model to obtain the matching feature of the document information of each history event and the fact behavior data of the target event may include: if the lengths of the fact behavior data of the target event and the document information of the historical event are smaller than a preset length threshold, inputting the fact behavior data of the target event and the document information of the historical event into a pre-trained language model, and obtaining matching features of the document information of each historical event and the fact behavior data of the target event.
The preset length threshold may be the maximum text processing length of the BERT model, for example 512, or may be set according to the actual situation, which is not limited in the embodiments of this specification.
In step S322, based on the matching features, pooling is performed along the longitudinal direction of the matrix corresponding to the matching features, and the pooling result is input into a pre-constructed RNN model and attention mechanism to obtain processed data.
In implementation, as shown in FIG. 4, the i×j×h matching feature can be pooled along the i direction (that is, the longitudinal direction of the matrix corresponding to the matching feature) to obtain a j×h result: a sequence of length j, where j is the number of sentences in the fact behavior data of the target event, and each element is the h-dimensional matching feature between that sentence and the document information of the historical event. An RNN can then be attached to capture the logical relationship between preceding and following sentences in the fact behavior data of the target event, and the j×h result can be compressed to h dimensions.
In step S324, based on the matching features, pooling is performed along the transverse direction of the matrix corresponding to the matching features, and the pooling result is input into a pre-constructed RNN model and attention mechanism to obtain processed data.
In implementation, as shown in FIG. 4, the i×j×h matching feature can be pooled along the j direction (that is, the transverse direction of the matrix corresponding to the matching feature) to obtain an i×h result: a sequence of length i, where i is the number of sentences in the document information of the historical event, and each element is the h-dimensional matching feature between that sentence and the fact behavior data of the target event. An RNN can then be attached to capture the logical relationship between preceding and following sentences in the document information of the historical event, and the i×h result can be compressed to h dimensions.
In step S326, the matching features are input into a pre-constructed convolutional neural network model, local features in the matching features are captured through the convolutional neural network model, and the matching features are compressed layer by layer to data of a preset number of dimensions to obtain processed data.
In implementation, as shown in FIG. 4, for the i×j×h matching feature, a convolutional neural network model may be used to capture local features, compressing the matching feature to h dimensions layer by layer.
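The choice between the whole-text approach and the sentence-splitting approach can be sketched as a simple length check. Checking each text separately and measuring length with `len` (characters) are assumptions made for this sketch; a real tokenizer-based count could be passed in as `count_tokens`, and a pair encoder would in practice also need the combined input to fit.

```python
def choose_matching_path(query_text, document_text, count_tokens=len, max_length=512):
    """Return "whole_text" if both inputs are under the threshold, else "sentence_split"."""
    if count_tokens(query_text) < max_length and count_tokens(document_text) < max_length:
        return "whole_text"
    return "sentence_split"
```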
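The pooling along the i direction can be sketched as follows. The embodiment does not state which pooling operator is used; max pooling is assumed here, and the RNN and attention stages that follow are omitted.

```python
def pool_longitudinal(features):
    """Max-pool an i x j x h grid along i, giving j vectors of dimension h."""
    n_rows, n_cols, h = len(features), len(features[0]), len(features[0][0])
    return [
        [max(features[r][c][k] for r in range(n_rows)) for k in range(h)]
        for c in range(n_cols)
    ]
```

Each of the j output vectors summarizes how one column's sentence matches against all row sentences.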
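The pooling along the j direction and the subsequent compression to h dimensions can be sketched as below. Max pooling is assumed, the RNN is omitted for brevity, and the attention step is reduced to a softmax-weighted sum driven by a hypothetical learned vector `query_vector`; none of these choices is prescribed by the embodiment.

```python
import math

def pool_transverse(features):
    """Max-pool an i x j x h grid along j, giving i vectors of dimension h."""
    h = len(features[0][0])
    return [[max(cell[k] for cell in row) for k in range(h)] for row in features]

def attention_compress(sequence, query_vector):
    """Compress a sequence of h-dim vectors into one h-dim vector by softmax attention."""
    scores = [sum(q * x for q, x in zip(query_vector, vec)) for vec in sequence]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    h = len(sequence[0])
    return [sum(w / total * vec[k] for w, vec in zip(weights, sequence)) for k in range(h)]
```

With an all-zero `query_vector` the attention weights are uniform and the result is simply the mean of the sequence.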
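The two-dimensional convolution over the matching feature can be sketched as a single convolution layer followed by global max pooling. The layer structure, the sharing of one kernel across all h channels, the ReLU activation and the global pooling used to reach h dimensions are all simplifying assumptions of this sketch; the embodiment only states that local features are captured and compressed layer by layer. The grid is assumed to be at least as large as the kernel.

```python
def conv2d_valid(features, kernel):
    """Slide one k x k kernel over each channel of an i x j x h grid (stride 1, no padding), then ReLU."""
    k = len(kernel)
    n_rows, n_cols, h = len(features), len(features[0]), len(features[0][0])
    out = []
    for r in range(n_rows - k + 1):
        out_row = []
        for c in range(n_cols - k + 1):
            cell = []
            for ch in range(h):
                acc = sum(
                    kernel[dr][dc] * features[r + dr][c + dc][ch]
                    for dr in range(k)
                    for dc in range(k)
                )
                cell.append(max(acc, 0.0))  # ReLU
            out_row.append(cell)
        out.append(out_row)
    return out

def global_max_pool(features):
    """Reduce a grid of h-dim vectors to a single h-dim vector by taking the maximum per channel."""
    h = len(features[0][0])
    return [max(cell[ch] for row in features for cell in row) for ch in range(h)]
```

Applying `conv2d_valid` and then `global_max_pool` turns an i×j×h matching feature into one h-dimensional vector that reflects its strongest local pattern.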
In step S328, the processed data obtained by the horizontal pooling process, the processed data obtained by the vertical pooling process, and the processed data obtained by the convolution process are spliced to obtain spliced data.
In implementation, the three h-dimensional vectors obtained by the horizontal pooling processing, the longitudinal pooling processing and the two-dimensional convolution processing can be spliced into one 3h-dimensional vector. This vector encodes: an understanding of the document information of the historical event from the perspective of the fact behavior data of the target event; the logical relationship between preceding and following sentences in the fact behavior data of the target event; and an understanding of the fact behavior data of the target event from the perspective of the document information of the historical event.
In step S330, the spliced data is input into the softmax function for classification, so as to obtain the document information of the target historical event matched with the fact behavior data of the target event in the document information of the historical event.
In implementation, the 3h-dimensional vector can be fed into the softmax function for classification, and the output value can be used as the similarity, so as to obtain, from the document information of the historical events, the document information of the target historical event that matches the fact behavior data of the target event.
The embodiments of this specification provide a data retrieval method. A query request for a target event is acquired, the query request including fact behavior data of the target event. Based on the fact behavior data of the target event, first index information matching that data is then acquired from the index information in a pre-established retrieval database, the index information having been generated from fact behavior data and evidence information. The document information of the historical events corresponding to the first index information is acquired, and the matching features between the document information of each historical event and the fact behavior data of the target event are determined. Pooling processing and/or convolution processing is performed on the matching features to obtain processed data, where the pooling processing includes horizontal pooling processing and longitudinal pooling processing, or includes longitudinal pooling processing. Fusion processing is performed on the processed data to obtain, from the document information of the historical events, the document information of the target historical event that matches the fact behavior data of the target event. In this way, by using pooling processing and/or convolution processing, the document information of the historical event can be understood from the side of the fact behavior data of the target event, the logical relationship between preceding and following sentences in the fact behavior data of the target event can be captured, the fact behavior data of the target event can be understood from the side of the document information of the historical event, and the logical relationship between preceding and following sentences in the document information of the historical event can be captured. In addition, further pre-training (Further Pretrain) on massive unlabeled document information of historical events makes full use of the content of that document information to learn judicial knowledge. Through the above processing, the retrieval efficiency and performance in similar-case retrieval scenarios can be greatly improved.
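The splicing and classification steps can be sketched as follows. The linear layer placed before the softmax, its `weights` and `bias` parameters, and the convention that class 1 means "match" are assumptions made for this sketch; the embodiment only states that the spliced data is input into the softmax function for classification.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def match_probability(vec_horizontal, vec_longitudinal, vec_conv, weights, bias):
    """Splice three h-dim vectors, apply a 2-class linear layer, and return P(match)."""
    fused = vec_horizontal + vec_longitudinal + vec_conv  # 3h-dimensional spliced vector
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, bias)]
    return softmax(logits)[1]  # class 1 is assumed to mean "match"
```

The returned probability can be used as the similarity by which the candidate document information of historical events is ranked or filtered.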
Example three
Based on the same concept as the data retrieval method provided above, the embodiments of this specification further provide a data retrieval apparatus, as shown in FIG. 5.
The data retrieval device comprises: a query request module 501, an index acquisition module 502, an information processing module 503, and a retrieval output module 504, wherein:
a query request module 501, configured to obtain a query request of a target event, where the query request includes fact behavior data of the target event;
the index obtaining module 502 obtains first index information matched with the fact behavior data of the target event from index information generated by the fact behavior data and evidence information in a pre-established retrieval database based on the fact behavior data of the target event;
the information processing module 503 obtains the document information of the historical event corresponding to the first index information, determines the matching feature of the document information of each historical event and the fact behavior data of the target event, and performs pooling processing and/or convolution processing on the matching feature to obtain processed data, where the pooling processing includes horizontal pooling processing and vertical pooling processing, or the pooling processing includes vertical pooling processing;
and the retrieval output module 504 performs fusion processing on the processed data to obtain document information of a target historical event matched with the fact behavior data of the target event in the document information of the historical event.
In an embodiment of the present disclosure, the apparatus further includes:
the information acquisition module acquires document information of a plurality of different historical events;
the index generation module is used for respectively extracting fact behavior data and evidence information contained in the document information of each historical event from the document information of the plurality of different historical events and generating index information of the document information of the corresponding historical event based on the extracted fact behavior data and evidence information;
and the storage module is used for storing the index information and the corresponding document information of the historical event into the retrieval database correspondingly.
In the embodiment of the present disclosure, the index obtaining module 502 includes:
a similarity determining unit that determines, based on a BM25 algorithm, a similarity between the fact behavior data of the target event and each index information generated from the fact behavior data and the evidence information in the retrieval database;
and an index acquisition unit for acquiring first index information matched with the fact behavior data of the target event from index information generated by the fact behavior data and the evidence information in a pre-established retrieval database based on the determined similarity.
In this embodiment of the present disclosure, the information processing module 503 inputs the fact behavior data of the target event and the document information of the history event into a pre-trained language model, so as to obtain a matching feature of the document information of each history event and the fact behavior data of the target event.
In this embodiment of the present disclosure, the language model is constructed by using a BERT model, and the information processing module 503 inputs the fact behavior data of the target event and the document information of the history event into a pre-trained language model to obtain matching features of the document information of each history event and the fact behavior data of the target event if the lengths of the fact behavior data of the target event and the document information of the history event are smaller than a preset length threshold.
In the embodiment of the present disclosure, the information processing module 503 includes:
the sentence segmentation unit is used for respectively carrying out sentence segmentation on the fact behavior data of the target event and the document information of the historical event to respectively obtain a first sentence corresponding to the fact behavior data of the target event and a second sentence corresponding to the document information of each historical event;
The information processing unit is used for respectively inputting a first sentence corresponding to the fact behavior data of the target event and a second sentence corresponding to the document information of each historical event into a pre-trained language model to obtain matching characteristics of the first sentence and the second sentence, and determining the matching characteristics of the first sentence and the second sentence as the matching characteristics of the document information of each historical event and the fact behavior data of the target event.
In an embodiment of the present disclosure, the apparatus further includes:
the sample acquisition module is used for acquiring sample data for training the language model, wherein the sample data comprises document information of a target historical event;
the model training module is used for carrying out model training on the language model based on the sample data, removing part of data in the sample data randomly in the process of carrying out model training, predicting the part of data removed in the sample data through the language model, inputting two sentences in the sample data into the language model in the process of carrying out model training, and predicting whether one sentence is the next sentence of the other sentence through the language model until the language model converges to obtain the trained language model.
In this embodiment of the present disclosure, the pooling processing includes a longitudinal pooling processing, and the information processing module 503 performs pooling on a direction representing a longitudinal direction in a matrix corresponding to the matching feature based on the matching feature, and inputs a pooling result into a pre-constructed RNN model and an attention mechanism, so as to obtain processed data.
In this embodiment of the present disclosure, the information processing module 503 inputs the matching features into a convolutional neural network model that is constructed in advance, captures local features in the matching features through the convolutional neural network model, and compresses the matching features to data of a preset number of dimensions on a network layer-by-network layer basis, so as to obtain processed data.
In the embodiment of the present specification, the processed data is obtained through horizontal pooling processing, longitudinal pooling processing and convolution processing,
the retrieval output module 504 includes:
a splicing unit for splicing the processed data obtained by the transverse pooling treatment, the processed data obtained by the longitudinal pooling treatment and the processed data obtained by the convolution treatment to obtain spliced data;
and the retrieval output unit inputs the spliced data into a softmax function for classification to obtain the document information of the target historical event, which is matched with the fact behavior data of the target event, in the document information of the historical event.
In the embodiment of the present disclosure, the document information of the historical event is information of a judicial document of the historical event.
The embodiments of this specification provide a data retrieval apparatus. A query request for a target event is acquired, the query request including fact behavior data of the target event. Based on the fact behavior data of the target event, first index information matching that data is then acquired from the index information in a pre-established retrieval database, the index information having been generated from fact behavior data and evidence information. The document information of the historical events corresponding to the first index information is acquired, and the matching features between the document information of each historical event and the fact behavior data of the target event are determined. Pooling processing and/or convolution processing is performed on the matching features to obtain processed data, where the pooling processing includes horizontal pooling processing and longitudinal pooling processing, or includes longitudinal pooling processing. Fusion processing is performed on the processed data to obtain, from the document information of the historical events, the document information of the target historical event that matches the fact behavior data of the target event. In this way, by using pooling processing and/or convolution processing, the document information of the historical event can be understood from the side of the fact behavior data of the target event, the logical relationship between preceding and following sentences in the fact behavior data of the target event can be captured, the fact behavior data of the target event can be understood from the side of the document information of the historical event, and the logical relationship between preceding and following sentences in the document information of the historical event can be captured. In addition, further pre-training (Further Pretrain) on massive unlabeled document information of historical events makes full use of the content of that document information to learn judicial knowledge. Through the above processing, the retrieval efficiency and performance in similar-case retrieval scenarios can be greatly improved.
Example IV
Based on the same concept as the data retrieval apparatus provided in the above embodiments, the embodiments of this specification further provide a data retrieval device, as shown in fig. 6.
The data retrieval device may be the terminal device, the server, or the like described in the above embodiments.
Data retrieval devices may differ considerably depending on configuration or performance. The device may include one or more processors 601 and a memory 602, and the memory 602 may store one or more application programs or data. The memory 602 may be transient storage or persistent storage. An application program stored in the memory 602 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the data retrieval device. Further, the processor 601 may be configured to communicate with the memory 602 and to execute, on the data retrieval device, the series of computer-executable instructions in the memory 602. The data retrieval device may also include one or more power supplies 603, one or more wired or wireless network interfaces 604, one or more input/output interfaces 605, and one or more keyboards 606.
Specifically, in this embodiment, the data retrieval device includes a memory and one or more programs, where the one or more programs are stored in the memory and may include one or more modules, each module may include a series of computer-executable instructions for the data retrieval device, and the one or more programs are configured to be executed by the one or more processors and include computer-executable instructions for:
acquiring a query request of a target event, wherein the query request comprises fact behavior data of the target event;
acquiring first index information matched with the fact behavior data of the target event from index information generated by the fact behavior data and evidence information in a pre-established retrieval database based on the fact behavior data of the target event;
acquiring document information of historical events corresponding to the first index information, determining matching features of the document information of each historical event and fact behavior data of the target event, and carrying out pooling processing and/or convolution processing on the matching features to obtain processed data, wherein the pooling processing comprises transverse pooling processing and longitudinal pooling processing, or the pooling processing comprises longitudinal pooling processing;
and carrying out fusion processing on the processed data to obtain, from the document information of the historical events, the document information of the target historical event that matches the fact behavior data of the target event.
In this embodiment of the present specification, the following is further included:
acquiring document information of a plurality of different historical events;
extracting fact behavior data and evidence information contained in the document information of each historical event from the document information of the plurality of different historical events respectively, and generating index information of the document information of the corresponding historical event based on the extracted fact behavior data and evidence information;
and storing the index information and the corresponding document information of the historical event into the retrieval database correspondingly.
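The index-building steps above can be sketched as follows. This is a minimal illustration, not the method of the specification: it assumes each historical document has already been split into hypothetical `facts`, `evidence`, and `text` fields (the specification does not say how extraction is done), uses a simple regular-expression tokenizer in place of a real word segmenter, and the two sample records are invented for the example.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercased word tokens; Chinese judicial text would need a word segmenter instead.
    return re.findall(r"\w+", text.lower())

def build_index_entry(facts, evidence):
    # Index information is generated from fact behavior data plus evidence information.
    tokens = tokenize(facts) + tokenize(evidence)
    return {"tokens": tokens, "term_counts": Counter(tokens)}

def build_retrieval_database(documents):
    # Store each index entry together with the document it was generated from.
    database = {}
    for doc in documents:
        database[doc["id"]] = {
            "index": build_index_entry(doc["facts"], doc["evidence"]),
            "document": doc["text"],
        }
    return database

# Invented sample records, for illustration only.
documents = [
    {"id": "case-1", "facts": "The defendant stole a phone.",
     "evidence": "Surveillance video; phone recovered.",
     "text": "Full text of judgment 1 (placeholder)."},
    {"id": "case-2", "facts": "The borrower failed to repay a loan.",
     "evidence": "Signed loan contract.",
     "text": "Full text of judgment 2 (placeholder)."},
]
database = build_retrieval_database(documents)
```

Keeping the index entry and the full document under the same key is what lets a later stage go from matched index information back to the corresponding document information.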
In this embodiment of the present disclosure, the obtaining, based on the fact behavior data of the target event, from index information generated from the fact behavior data and evidence information in a pre-established search database, first index information matching with the fact behavior data of the target event includes:
determining similarity between the fact behavior data of the target event and index information generated by the fact behavior data and evidence information in the search database based on a BM25 algorithm;
and acquiring first index information matched with the fact behavior data of the target event from index information generated by the fact behavior data and the evidence information in a pre-established retrieval database based on the determined similarity.
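The BM25-based recall step can be illustrated with a small pure-Python scorer. This is a sketch under stated assumptions: the parameters `k1=1.5` and `b=0.75` are common defaults rather than values from the specification, the IDF uses the non-negative `log(1 + ...)` variant, and the tokenized index entries and the `top_k` helper are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, indexed_docs, k1=1.5, b=0.75):
    """Score every indexed document against the query with Okapi BM25."""
    n_docs = len(indexed_docs)
    avg_len = sum(len(tokens) for tokens in indexed_docs.values()) / n_docs
    # Document frequency: number of index entries containing each term.
    doc_freq = Counter()
    for tokens in indexed_docs.values():
        doc_freq.update(set(tokens))
    scores = {}
    for doc_id, tokens in indexed_docs.items():
        term_counts = Counter(tokens)
        score = 0.0
        for term in query_tokens:
            if term not in term_counts:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_counts[term] * (k1 + 1) / (term_counts[term] + length_norm)
        scores[doc_id] = score
    return scores

def top_k(scores, k):
    # Identifiers of the k highest-scoring index entries (the "first index information").
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Invented tokenized index entries, for illustration only.
indexed_docs = {
    "a": ["theft", "of", "phone"],
    "b": ["loan", "contract", "dispute"],
    "c": ["theft", "of", "car", "at", "night"],
}
scores = bm25_scores(["theft", "phone"], indexed_docs)
candidates = top_k(scores, 2)
```

The recalled candidates would then be passed to the finer-grained matching stage, which is more expensive per document and therefore benefits from a short candidate list.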
In this embodiment of the present disclosure, the determining a matching feature between the document information of each of the historical events and the fact behavior data of the target event includes:
and inputting the fact behavior data of the target event and the document information of the historical event into a pre-trained language model to obtain the matching characteristics of the document information of each historical event and the fact behavior data of the target event.
In this embodiment of the present disclosure, the language model is constructed by a BERT model, and the inputting the fact behavior data of the target event and the document information of the history event into a pre-trained language model, to obtain matching features of the document information of each history event and the fact behavior data of the target event, includes:
if the lengths of the fact behavior data of the target event and the document information of the historical event are smaller than a preset length threshold, inputting the fact behavior data of the target event and the document information of the historical event into a pre-trained language model, and obtaining matching features of the document information of each historical event and the fact behavior data of the target event.
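The length check can be pictured as a small routing function. This is only a sketch: the threshold of 512, measuring length in characters, checking each text separately, and the fallback to sentence-level matching for longer inputs are all assumptions made for illustration, and `select_input_mode` is a hypothetical name.

```python
def select_input_mode(fact_text, document_text, length_threshold=512):
    """Decide whether both texts can be fed to the language model whole."""
    # Both texts must be strictly shorter than the preset threshold.
    if len(fact_text) < length_threshold and len(document_text) < length_threshold:
        return "whole_text"
    # Assumed fallback: longer inputs are handled sentence by sentence.
    return "sentence_level"
```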
In this embodiment of the present disclosure, the determining a matching feature between the document information of each of the historical events and the fact behavior data of the target event includes:
respectively carrying out sentence segmentation on the fact behavior data of the target event and the document information of the historical event to respectively obtain a first sentence corresponding to the fact behavior data of the target event and a second sentence corresponding to the document information of each historical event;
respectively inputting a first sentence corresponding to the fact behavior data of the target event and a second sentence corresponding to the document information of each historical event into a pre-trained language model to obtain matching features of the first sentence and the second sentence, and determining the matching features of the first sentence and the second sentence as the matching features of the document information of each historical event and the fact behavior data of the target event.
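The sentence-level matching above produces one matching feature per pair of first and second sentences, which can be arranged as a matrix. The sketch below shows that arrangement only. It replaces the pre-trained language model with a hypothetical token-overlap scorer (`overlap_score`) so that it runs without any model, and it reduces each matching feature to a single number, whereas a BERT-based model would typically yield a vector per pair.

```python
import re

def split_sentences(text):
    # Split on Western and Chinese sentence-ending punctuation.
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [part.strip() for part in parts if part.strip()]

def overlap_score(first, second):
    # Stand-in for the language model: Jaccard overlap of word sets.
    a = set(re.findall(r"\w+", first.lower()))
    b = set(re.findall(r"\w+", second.lower()))
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def matching_matrix(fact_text, document_text, scorer=overlap_score):
    # Rows: first sentences (target event); columns: second sentences (historical document).
    first_sentences = split_sentences(fact_text)
    second_sentences = split_sentences(document_text)
    return [[scorer(f, s) for s in second_sentences] for f in first_sentences]

# Invented texts, for illustration only.
matrix = matching_matrix(
    "He stole a phone. He ran away.",
    "The defendant stole a phone. The court sentenced him. He ran away.",
)
```

With this layout, each row describes how one sentence of the query relates to every sentence of the document, and each column does the reverse, which is what the later transverse and longitudinal pooling operate on.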
In this embodiment of the present specification, the following is further included:
acquiring sample data for training the language model, wherein the sample data comprises document information of a target historical event;
model training is carried out on the language model based on the sample data, partial data in the sample data are randomly removed in the process of model training, the removed partial data in the sample data are predicted through the language model, two sentences in the sample data are input into the language model in the process of model training, and whether one sentence is the next sentence of the other sentence is predicted through the language model until the language model converges, so that the trained language model is obtained.
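The two training tasks described above correspond to masked-token prediction and next-sentence prediction. The sketch below prepares training examples for both tasks and stops short of training a model. It is a simplification under stated assumptions: every selected token is replaced by a `[MASK]` placeholder, the masking probability and the 50/50 split between true and random next sentences are illustrative choices, and the function names are hypothetical.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob, rng):
    """Randomly hide tokens; labels keep the hidden originals, None elsewhere."""
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels

def next_sentence_pairs(sentences, rng):
    """Build (first, second, is_next) examples from consecutive sentences."""
    pairs = []
    for i in range(len(sentences) - 1):
        others = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
        if others and rng.random() < 0.5:
            # Negative example: a sentence that does not follow the first one.
            pairs.append((sentences[i], rng.choice(others), 0))
        else:
            # Positive example: the true next sentence.
            pairs.append((sentences[i], sentences[i + 1], 1))
    return pairs
```

A model trained on these examples would be asked to recover the hidden tokens from `labels` and to predict the `is_next` flag, and training would continue until the loss converges.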
In this embodiment of the present disclosure, the pooling processing includes longitudinal pooling processing, where the pooling processing is performed on the matching feature to obtain processed data, and includes:
based on the matching features, pooling is carried out on the longitudinal direction indicated in the matrix corresponding to the matching features, and the pooling result is input into a pre-constructed RNN model and an attention mechanism to obtain processed data.
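A toy version of the pooling branch is sketched below. Several assumptions are made for illustration: the matching features are a matrix of scalars, longitudinal pooling is read as max pooling along the vertical axis (one value per column) and transverse pooling as max pooling along the horizontal axis (one value per row), the RNN is a single-unit recurrence with fixed hypothetical weights `w_in` and `w_rec`, and the attention step is a softmax over the hidden states. A trained model would learn these weights and operate on vectors.

```python
import math

def longitudinal_max_pool(matrix):
    # Collapse the vertical axis: keep the largest value in each column.
    return [max(column) for column in zip(*matrix)]

def transverse_max_pool(matrix):
    # Collapse the horizontal axis: keep the largest value in each row.
    return [max(row) for row in matrix]

def simple_rnn(sequence, w_in=1.0, w_rec=0.5):
    # Single-unit recurrence: each state depends on the input and the previous state.
    hidden, states = 0.0, []
    for value in sequence:
        hidden = math.tanh(w_in * value + w_rec * hidden)
        states.append(hidden)
    return states

def attention_pool(states):
    # Softmax weights over the states, then a weighted sum of the states.
    exps = [math.exp(state) for state in states]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sum(w * s for w, s in zip(weights, states)), weights

# Invented matching matrix, for illustration only.
matrix = [[0.1, 0.9, 0.2],
          [0.4, 0.3, 0.8]]
pooled = longitudinal_max_pool(matrix)
states = simple_rnn(pooled)
summary, weights = attention_pool(states)
```

Running the pooled sequence through a recurrence is what lets the order of the sentences influence the result, rather than only their individual best match scores.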
In this embodiment of the present disclosure, the convolving the matching feature to obtain processed data includes:
inputting the matching features into a pre-constructed convolutional neural network model, capturing local features in the matching features through the convolutional neural network model, and compressing the matching features layer by layer through the network into data of a preset number of dimensions to obtain processed data.
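The convolution branch can be illustrated with a hand-written 2D convolution followed by global max pooling, so that each kernel contributes one output dimension. This is a sketch only: the kernels are fixed hypothetical values rather than learned weights, a single convolution layer stands in for the network, and, as in most deep learning frameworks, the operation is technically a cross-correlation.

```python
def conv2d_valid(matrix, kernel):
    """Slide the kernel over the matrix (no padding) and apply ReLU."""
    rows, cols = len(matrix), len(matrix[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for i in range(rows - k_rows + 1):
        output_row = []
        for j in range(cols - k_cols + 1):
            acc = sum(
                matrix[i + di][j + dj] * kernel[di][dj]
                for di in range(k_rows)
                for dj in range(k_cols)
            )
            output_row.append(max(acc, 0.0))
        output.append(output_row)
    return output

def compress(matrix, kernels):
    # One number per kernel: the strongest local response anywhere in the matrix.
    return [max(max(row) for row in conv2d_valid(matrix, kernel)) for kernel in kernels]

# Invented matching matrix and kernels, for illustration only.
matrix = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],  # responds to matches along the main diagonal
    [[0.0, 1.0], [1.0, 0.0]],  # responds to matches along the anti-diagonal
]
features = compress(matrix, kernels)
```

A diagonal pattern in the matching matrix means consecutive query sentences match consecutive document sentences, which is the kind of local structure a convolution can pick up and plain pooling cannot.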
In this embodiment of the present specification, the processed data is obtained through transverse pooling processing, longitudinal pooling processing, and convolution processing, and the carrying out fusion processing on the processed data to obtain the document information of the target historical event, which is matched with the fact behavior data of the target event, in the document information of the historical event includes:
splicing the processed data obtained through the transverse pooling processing, the processed data obtained through the longitudinal pooling processing, and the processed data obtained through the convolution processing to obtain spliced data;
and inputting the spliced data into a softmax function for classification to obtain the document information of the target historical event matched with the fact behavior data of the target event in the document information of the historical event.
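The fusion step can be sketched as concatenation followed by a linear layer and a softmax. The two-class reading (class 0 for "no match", class 1 for "match"), the linear layer in front of the softmax, and the weight and bias values are assumptions for illustration; in a trained system they would be learned, and the candidate documents with the highest match probability would be returned.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_and_classify(transverse, longitudinal, conv, weights, bias):
    # Splice the three branches into one feature vector.
    spliced = list(transverse) + list(longitudinal) + list(conv)
    # One logit per class from a linear layer, then softmax probabilities.
    logits = [
        sum(w * x for w, x in zip(row, spliced)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)

# Invented branch outputs and weights, for illustration only.
transverse = [0.9, 0.8]
longitudinal = [0.4, 0.9, 0.8]
conv = [2.0, 1.0]
weights = [[-0.1] * 7, [0.1] * 7]
bias = [0.0, 0.0]
probabilities = fuse_and_classify(transverse, longitudinal, conv, weights, bias)
```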
In the embodiment of the present disclosure, the document information of the historical event is information of a judicial document of the historical event.
The embodiments of this specification provide a data retrieval device. A query request of a target event is acquired, the query request including fact behavior data of the target event. Based on the fact behavior data of the target event, first index information matching that data is acquired from the index information in a pre-established retrieval database, the index information being generated from fact behavior data and evidence information. Document information of the historical events corresponding to the first index information is then acquired, matching features between the document information of each historical event and the fact behavior data of the target event are determined, and pooling processing and/or convolution processing is performed on the matching features to obtain processed data, wherein the pooling processing includes transverse pooling processing and longitudinal pooling processing, or the pooling processing includes longitudinal pooling processing. The processed data is fused to obtain, from the document information of the historical events, the document information of the target historical event that matches the fact behavior data of the target event. Matching the document information of the historical events against the fact behavior data of the target event by pooling processing and/or convolution processing has the following effects. On the one hand, the contextual logical relationship of the document information of the historical event can be understood from the side of the fact behavior data of the target event while the contextual logical relationship of the fact behavior data is captured. On the other hand, the contextual logical relationship of the fact behavior data of the target event can be understood from the side of the document information of the historical event while the contextual logical relationship of the document information is captured. The convolution processing additionally matches the two locally. The fact behavior data of the target event and the document information of the historical event can therefore be understood in both directions, and their respective linguistic logical relationships can be learned. In addition, further pre-training on massive unlabeled document information of historical events makes full use of the content of that document information to learn judicial knowledge. Through the above processing, the retrieval efficiency and performance of similar-case retrieval scenarios can be greatly improved.
Example five
Further, based on the methods shown in fig. 1 to fig. 4, one or more embodiments of this specification further provide a storage medium for storing computer-executable instruction information. In a specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and the computer-executable instruction information stored in the storage medium, when executed by a processor, can implement the following flow:
acquiring a query request of a target event, wherein the query request comprises fact behavior data of the target event;
acquiring first index information matched with the fact behavior data of the target event from index information generated by the fact behavior data and evidence information in a pre-established retrieval database based on the fact behavior data of the target event;
acquiring document information of historical events corresponding to the first index information, determining matching features of the document information of each historical event and fact behavior data of the target event, and carrying out pooling processing and/or convolution processing on the matching features to obtain processed data, wherein the pooling processing comprises transverse pooling processing and longitudinal pooling processing, or the pooling processing comprises longitudinal pooling processing;
and carrying out fusion processing on the processed data to obtain, from the document information of the historical events, the document information of the target historical event that matches the fact behavior data of the target event.
In this embodiment of the present specification, the following is further included:
acquiring document information of a plurality of different historical events;
extracting fact behavior data and evidence information contained in the document information of each historical event from the document information of the plurality of different historical events respectively, and generating index information of the document information of the corresponding historical event based on the extracted fact behavior data and evidence information;
and storing the index information and the corresponding document information of the historical event into the retrieval database correspondingly.
In this embodiment of the present disclosure, the obtaining, based on the fact behavior data of the target event, from index information generated from the fact behavior data and evidence information in a pre-established search database, first index information matching with the fact behavior data of the target event includes:
determining similarity between the fact behavior data of the target event and index information generated by the fact behavior data and evidence information in the search database based on a BM25 algorithm;
and acquiring first index information matched with the fact behavior data of the target event from index information generated by the fact behavior data and the evidence information in a pre-established retrieval database based on the determined similarity.
In this embodiment of the present disclosure, the determining a matching feature between the document information of each of the historical events and the fact behavior data of the target event includes:
and inputting the fact behavior data of the target event and the document information of the historical event into a pre-trained language model to obtain the matching characteristics of the document information of each historical event and the fact behavior data of the target event.
In this embodiment of the present disclosure, the language model is constructed by a BERT model, and the inputting the fact behavior data of the target event and the document information of the history event into a pre-trained language model, to obtain matching features of the document information of each history event and the fact behavior data of the target event, includes:
if the lengths of the fact behavior data of the target event and the document information of the historical event are smaller than a preset length threshold, inputting the fact behavior data of the target event and the document information of the historical event into a pre-trained language model, and obtaining matching features of the document information of each historical event and the fact behavior data of the target event.
In this embodiment of the present disclosure, the determining a matching feature between the document information of each of the historical events and the fact behavior data of the target event includes:
respectively carrying out sentence segmentation on the fact behavior data of the target event and the document information of the historical event to respectively obtain a first sentence corresponding to the fact behavior data of the target event and a second sentence corresponding to the document information of each historical event;
respectively inputting a first sentence corresponding to the fact behavior data of the target event and a second sentence corresponding to the document information of each historical event into a pre-trained language model to obtain matching features of the first sentence and the second sentence, and determining the matching features of the first sentence and the second sentence as the matching features of the document information of each historical event and the fact behavior data of the target event.
In this embodiment of the present specification, the following is further included:
acquiring sample data for training the language model, wherein the sample data comprises document information of a target historical event;
model training is carried out on the language model based on the sample data, partial data in the sample data are randomly removed in the process of model training, the removed partial data in the sample data are predicted through the language model, two sentences in the sample data are input into the language model in the process of model training, and whether one sentence is the next sentence of the other sentence is predicted through the language model until the language model converges, so that the trained language model is obtained.
In this embodiment of the present disclosure, the pooling processing includes longitudinal pooling processing, where the pooling processing is performed on the matching feature to obtain processed data, and includes:
based on the matching features, pooling is carried out on the longitudinal direction indicated in the matrix corresponding to the matching features, and the pooling result is input into a pre-constructed RNN model and an attention mechanism to obtain processed data.
In this embodiment of the present disclosure, the convolving the matching feature to obtain processed data includes:
inputting the matching features into a pre-constructed convolutional neural network model, capturing local features in the matching features through the convolutional neural network model, and compressing the matching features layer by layer through the network into data of a preset number of dimensions to obtain processed data.
In this embodiment of the present specification, the processed data is obtained through transverse pooling processing, longitudinal pooling processing, and convolution processing, and the carrying out fusion processing on the processed data to obtain the document information of the target historical event, which is matched with the fact behavior data of the target event, in the document information of the historical event includes:
splicing the processed data obtained through the transverse pooling processing, the processed data obtained through the longitudinal pooling processing, and the processed data obtained through the convolution processing to obtain spliced data;
and inputting the spliced data into a softmax function for classification to obtain the document information of the target historical event matched with the fact behavior data of the target event in the document information of the historical event.
In the embodiment of the present disclosure, the document information of the historical event is information of a judicial document of the historical event.
The embodiments of this specification provide a storage medium. The document information of the historical events to be matched is matched against the fact behavior data of the target event by pooling processing and/or convolution processing. On the one hand, the contextual logical relationship of the document information of the historical event can be understood from the side of the fact behavior data of the target event while the contextual logical relationship of the fact behavior data is captured. On the other hand, the contextual logical relationship of the fact behavior data of the target event can be understood from the side of the document information of the historical event while the contextual logical relationship of the document information is captured. The convolution processing additionally matches the two locally, so that the fact behavior data of the target event and the document information of the historical event can be understood in both directions and their respective linguistic logical relationships can be learned.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, improvements to many of today's method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a single PLD, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logic method flow can be readily obtained merely by lightly programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. Alternatively, the means for performing the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided by function into various units. Of course, when implementing one or more embodiments of this specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present description are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
One or more embodiments of the present specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present description may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner: identical and similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described only briefly; for the relevant details, refer to the corresponding parts of the description of the method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the present disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (14)

1. A method of retrieving data, the method comprising:
acquiring a query request of a target event, wherein the query request comprises fact behavior data of the target event;
acquiring first index information matched with the fact behavior data of the target event from index information generated by the fact behavior data and evidence information in a pre-established retrieval database based on the fact behavior data of the target event;
acquiring document information of historical events corresponding to the first index information, determining matching features of the document information of each historical event and fact behavior data of the target event, and carrying out pooling processing and/or convolution processing on the matching features to obtain processed data, wherein the pooling processing comprises transverse pooling processing and longitudinal pooling processing, or the pooling processing comprises longitudinal pooling processing;
and carrying out fusion processing on the processed data to obtain the document information of the target historical event, which is matched with the fact behavior data of the target event, in the document information of the historical event.
2. The method of claim 1, the method further comprising:
acquiring document information of a plurality of different historical events;
extracting fact behavior data and evidence information contained in the document information of each historical event from the document information of the plurality of different historical events respectively, and generating index information of the document information of the corresponding historical event based on the extracted fact behavior data and evidence information;
and storing the index information and the corresponding document information of the historical event into the retrieval database correspondingly.
3. The method according to claim 1, wherein the obtaining, based on the fact behavior data of the target event, first index information matching the fact behavior data of the target event from index information generated from the fact behavior data and evidence information in a pre-established retrieval database includes:
determining similarity between the fact behavior data of the target event and index information generated by the fact behavior data and evidence information in the search database based on a BM25 algorithm;
and acquiring first index information matched with the fact behavior data of the target event from index information generated by the fact behavior data and the evidence information in a pre-established retrieval database based on the determined similarity.
4. The method of claim 1, wherein the determining matching features of the document information of each of the historical events and the fact behavior data of the target event comprises:
inputting the fact behavior data of the target event and the document information of the historical event into a pre-trained language model to obtain the matching features of the document information of each historical event and the fact behavior data of the target event.
5. The method of claim 4, wherein the language model is constructed from a BERT model, and the inputting the fact behavior data of the target event and the document information of the historical event into a pre-trained language model to obtain matching features of the document information of each historical event and the fact behavior data of the target event comprises:
if the lengths of the fact behavior data of the target event and the document information of the historical event are smaller than a preset length threshold, inputting the fact behavior data of the target event and the document information of the historical event into a pre-trained language model, and obtaining matching features of the document information of each historical event and the fact behavior data of the target event.
6. The method of claim 1, wherein the determining matching features of the document information of each of the historical events and the fact behavior data of the target event comprises:
respectively carrying out sentence segmentation on the fact behavior data of the target event and the document information of the historical event to respectively obtain a first sentence corresponding to the fact behavior data of the target event and a second sentence corresponding to the document information of each historical event;
respectively inputting a first sentence corresponding to the fact behavior data of the target event and a second sentence corresponding to the document information of each historical event into a pre-trained language model to obtain matching features of the first sentence and the second sentence, and determining the matching features of the first sentence and the second sentence as the matching features of the document information of each historical event and the fact behavior data of the target event.
7. The method of claim 1, the method further comprising:
acquiring sample data for training the language model, wherein the sample data comprises document information of a target historical event;
model training is carried out on the language model based on the sample data, partial data in the sample data are randomly removed in the process of model training, the removed partial data in the sample data are predicted through the language model, two sentences in the sample data are input into the language model in the process of model training, and whether one sentence is the next sentence of the other sentence is predicted through the language model until the language model converges, so that the trained language model is obtained.
8. The method of claim 1, the pooling comprising a longitudinal pooling, the pooling the matching features to obtain processed data, comprising:
based on the matching features, pooling is carried out on the longitudinal direction indicated in the matrix corresponding to the matching features, and the pooling result is input into a pre-constructed RNN model and an attention mechanism to obtain processed data.
9. The method of claim 1, the convolving the matching features to obtain processed data, comprising:
inputting the matching features into a pre-constructed convolutional neural network model, capturing local features in the matching features through the convolutional neural network model, and compressing the matching features to data of a preset number of dimensions by network layers to obtain processed data.
10. The method of claim 1, wherein the processed data is obtained through transverse pooling processing, longitudinal pooling processing, and convolution processing,
and the carrying out fusion processing on the processed data to obtain the document information of the target historical event matched with the fact behavior data of the target event in the document information of the historical event comprises:
splicing the processed data obtained through the transverse pooling process, the processed data obtained through the longitudinal pooling process and the processed data obtained through the convolution process to obtain spliced data;
and inputting the spliced data into a softmax function for classification to obtain the document information of the target historical event matched with the fact behavior data of the target event in the document information of the historical event.
11. The method of any of claims 1-10, wherein the document information of the historical event is judicial document information of the historical event.
12. A device for retrieving data, the device comprising:
the query request module is used for acquiring a query request of a target event, wherein the query request comprises fact behavior data of the target event;
the index acquisition module is used for acquiring first index information matched with the fact behavior data of the target event from index information generated by the fact behavior data and evidence information in a pre-established retrieval database based on the fact behavior data of the target event;
the information processing module is used for acquiring the document information of the historical event corresponding to the first index information, determining the matching characteristic of the document information of each historical event and the fact behavior data of the target event, and carrying out pooling processing and/or convolution processing on the matching characteristic to obtain processed data, wherein the pooling processing comprises transverse pooling processing and longitudinal pooling processing, or the pooling processing comprises longitudinal pooling processing;
the retrieval output module is used for carrying out fusion processing on the processed data to obtain the document information of the target historical event matched with the fact behavior data of the target event in the document information of the historical event.
13. A retrieval device of data, the retrieval device of data comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring a query request of a target event, wherein the query request comprises fact behavior data of the target event;
acquiring first index information matched with the fact behavior data of the target event from index information generated by the fact behavior data and evidence information in a pre-established retrieval database based on the fact behavior data of the target event;
acquiring document information of historical events corresponding to the first index information, determining matching features of the document information of each historical event and fact behavior data of the target event, and carrying out pooling processing and/or convolution processing on the matching features to obtain processed data, wherein the pooling processing comprises transverse pooling processing and longitudinal pooling processing, or the pooling processing comprises longitudinal pooling processing;
and carrying out fusion processing on the processed data to obtain the document information of the target historical event, which is matched with the fact behavior data of the target event, in the document information of the historical event.
14. A storage medium for storing computer executable instructions that when executed by a processor implement the following:
acquiring a query request of a target event, wherein the query request comprises fact behavior data of the target event;
acquiring first index information matched with the fact behavior data of the target event from index information generated by the fact behavior data and evidence information in a pre-established retrieval database based on the fact behavior data of the target event;
acquiring document information of historical events corresponding to the first index information, determining matching features of the document information of each historical event and fact behavior data of the target event, and carrying out pooling processing and/or convolution processing on the matching features to obtain processed data, wherein the pooling processing comprises transverse pooling processing and longitudinal pooling processing, or the pooling processing comprises longitudinal pooling processing;
and carrying out fusion processing on the processed data to obtain the document information of the target historical event, which is matched with the fact behavior data of the target event, in the document information of the historical event.
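
As an illustration only, the first-stage recall recited in claim 3 (scoring index information against the fact behavior data of the target event with BM25) might be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names `bm25_scores` and `recall_top_k`, the parameters `k1` and `b`, the pre-tokenized input, and the toy index entries are all hypothetical and do not come from the specification.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized index entry against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of entries that contain each term.
    df = Counter(term for doc in docs_tokens for term in set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def recall_top_k(query_tokens, index_entries, k=2):
    """Return the ids of the k index entries with the highest BM25 score."""
    ids = list(index_entries)
    scores = bm25_scores(query_tokens, [index_entries[i] for i in ids])
    ranked = sorted(zip(ids, scores), key=lambda p: p[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical index information: tokens taken from the fact behavior data
# and evidence information of three historical events (assumed, for illustration).
index_entries = {
    "case_a": ["loan", "contract", "default", "bank", "statement"],
    "case_b": ["traffic", "accident", "injury", "police", "report"],
    "case_c": ["loan", "guarantee", "dispute", "transfer", "record"],
}
query = ["loan", "default", "contract"]  # fact behavior data of the target event
top = recall_top_k(query, index_entries, k=2)
print(top)
```

In such a sketch, the returned ids would play the role of the "first index information"; the document information behind them would then be passed to the language-model matching, pooling, convolution and fusion steps recited in the other claims, which are not shown here.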
CN202211095738.1A 2022-09-08 2022-09-08 Data retrieval method, device and equipment Pending CN116303917A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211095738.1A CN116303917A (en) 2022-09-08 2022-09-08 Data retrieval method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211095738.1A CN116303917A (en) 2022-09-08 2022-09-08 Data retrieval method, device and equipment

Publications (1)

Publication Number Publication Date
CN116303917A true CN116303917A (en) 2023-06-23

Family

ID=86776735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211095738.1A Pending CN116303917A (en) 2022-09-08 2022-09-08 Data retrieval method, device and equipment

Country Status (1)

Country Link
CN (1) CN116303917A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination