WO2020211393A1

WO2020211393A1 - Written judgment information retrieval method and device, computer apparatus, and storage medium

Info

Publication number: WO2020211393A1
Application number: PCT/CN2019/122888
Authority: WO
Inventors: 杨凤鑫; 徐国强; 邱寒
Original assignee: 深圳壹账通智能科技有限公司
Priority date: 2019-04-16
Filing date: 2019-12-04
Publication date: 2020-10-22
Also published as: CN110134761A

Abstract

A written judgment information retrieval method, comprising: obtaining information to undergo retrieval, and performing semantic-based word segmentation on the information; extracting focus terms from a semantic segmentation result, and extracting factor indexes from the semantic segmentation result so as to obtain factor vectors; inputting the focus terms and the factor vectors as features into a preset semantic hash vector model, reading codes in a coding layer of the preset semantic hash vector model, and compressing the codes into hash values; searching, according to the hash values, a written judgment database for a similar written judgment, and generating a set of target written judgments to be selected; and performing similarity matching with respect to the information and each written judgment in the set, so as to obtain a target written judgment.

Description

Judgment document information retrieval method, device, computer equipment and storage medium

Cross references to related applications

This application claims priority to Chinese patent application No. 201910303290X, filed with the Chinese Patent Office on April 16, 2019 and entitled "Judgment Document Information Retrieval Method, Device, Computer Equipment and Storage Medium", the entire contents of which are incorporated herein by reference.

Technical field

This application relates to a method, device, computer equipment and storage medium for searching judgment document information.

Background

With the development of science and technology, a large amount of data is flooding into people's lives. How to retrieve the required data from such massive data has become a problem.

Take judgment documents as an example. Over time, the accumulated judgment documents have become a massive body of data, and retrieving the currently required information from it is difficult for users. Conventional retrieval methods include information index retrieval and semantic information retrieval. Information index retrieval is based on inverted indexing, keyword matching, etc., and its results are inaccurate; semantic information retrieval is more accurate, but it requires a large amount of data processing and its retrieval speed is slow.

Summary of the invention

According to various embodiments disclosed in the present application, a method, device, computer equipment, and storage medium for searching judgment document information are provided.

A method for searching judgment document information, including:

Obtain the information to be retrieved and perform semantic-based word splitting on the information to be retrieved;

Extracting the focus words in the semantic splitting result, and extracting the factor index of the semantic splitting result to obtain a factor vector, where the factor index is an index that affects the judgment result in the judgment document;

Input the focus word and the factor vector as features into a preset semantic hash vector model, read the code of the coding layer in the preset semantic hash vector model, and compress the code into a hash value;

According to the hash value, search for similar judgment documents in the judgment document database to generate a set of target judgment documents to be selected, and the judgment document database stores data used to characterize the correspondence between the hash value and the judgment document; and

The information to be retrieved is matched with each judgment document in the set of target judgment documents to be selected to obtain the target judgment document.

A judgment document information retrieval device, including:

The word splitting module is used to obtain the information to be retrieved and perform semantic-based word splitting on the retrieved information;

The factor extraction module is used to extract the focus words in the semantic split result, and perform factor index extraction on the semantic split result to obtain a factor vector, where the factor index is an index that affects the judgment result in a judgment document;

The encoding compression module is used to input the focus word and the factor vector as features into a preset semantic hash vector model, read the encoding of the encoding layer in the preset semantic hash vector model, and compress the encoding into a hash value;

The search module is configured to search for similar judgment documents in the judgment document database according to the hash value to generate a set of target judgment documents to be selected, where the judgment document database stores data used to characterize the correspondence between the hash value and the judgment document; and

The similarity matching module is used to perform similarity matching between the information to be retrieved and the judgment documents in the set of target judgment documents to be selected to obtain the target judgment document.

A computer device, including a memory and one or more processors, where the memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to execute the following steps:

One or more non-volatile computer-readable storage media storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors execute the following steps:

Obtain the information to be retrieved, and perform semantic-based word splitting on the information to be retrieved;

The details of one or more embodiments of the application are set forth in the following drawings and description. Other features and advantages of this application will become apparent from the description, drawings and claims.

Description of the drawings

In order to more clearly describe the technical solutions in the embodiments of the present application, the following will briefly introduce the drawings needed in the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.

Fig. 1 is a schematic flow chart of a method for searching judgment document information according to one or more embodiments.

Fig. 2 is a schematic flowchart of a method for retrieving judgment document information in another embodiment.

Fig. 3 is a block diagram of a judgment document information retrieval device according to one or more embodiments.

Fig. 4 is a block diagram of a computer device according to one or more embodiments.

Detailed description

In order to make the technical solutions and advantages of the present application clearer, the following further describes the present application in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the application, and not used to limit the application.

As shown in Figure 1, a method for searching judgment document information includes:

S100: Obtain information to be retrieved, and perform semantic-based word splitting on the information to be retrieved.

Semantic-based word splitting refers to splitting the information to be retrieved into independent words based on the meaning of the words. The information to be retrieved can be a part of a judgment document, such as a paragraph, a sentence, or the judgment result; it can also be a key part of the judgment document, such as the judgment result and the names of the parties involved. For example, if the information to be retrieved is "the victim was beaten by a tool such as an iron rod, an axe, etc., and the victim was chopped while helping a certain person to resist", the semantic-based word splitting results are: hold, iron rod, axe, tool, Yumou, execution, beating, victim, Youmou, help, Yumoujia, when resisting, being hacked.
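The application does not name a particular segmenter for step S100. As a minimal sketch only, the following uses greedy longest-match splitting against a vocabulary so that a multi-word unit such as "iron rod" stays together. The function name `split_words`, the `VOCAB` set, and the `max_len` parameter are illustrative assumptions, not part of the application.

```python
def split_words(text, vocabulary, max_len=3):
    """Greedy longest-match splitting of `text` into vocabulary phrases.

    Hypothetical stand-in for the semantic-based splitter of step S100.
    """
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    result, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            # A single token is always accepted as a fallback.
            if n == 1 or phrase in vocabulary:
                result.append(phrase)
                i += n
                break
    return result


# Illustrative vocabulary (assumed, not taken from the application).
VOCAB = {"iron rod", "axe", "victim", "beaten"}
words = split_words("The victim was beaten with an iron rod, an axe", VOCAB)
```

A production system would more likely rely on a trained segmentation model; the dictionary lookup here only illustrates the input and output shape of the step.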

S200: Extract the focus words in the semantic split result, and perform factor index extraction on the semantic split result to obtain a factor vector. The factor index is an index that affects the judgment result in the judgment document.

Focus words are generally key words that characterize the main content of the entire judgment document. A focus word set can be constructed from historical experience data, and the focus words are obtained by matching the preset focus word set against the word splitting result. For example, the focus words can be beating, slashing, serious, minor, knife, lethal, etc. Factor indexes are the indexes that influence (determine) the judgment result of the entire judgment document, such as whether the act was for profit, whether there was infringement, and whether there was intentional injury. The selection of factor indexes can also be based on analysis of historical experience data. Generally, because judgment documents share the same format and manner of description, the body and conclusion parts of the judgment document are selected as the data to be analyzed, factor indexes are selected from them, and each factor index is qualitatively judged to obtain the factor vector, for example, for profit: yes; lethal: no. Furthermore, historical judgment document samples can be analyzed to construct a factor index system. The factor index system adopts a tree structure: it is divided into multiple major factor indexes, and multiple minor factor indexes are assigned under each major factor index.
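The tree-structured factor index system described above could be represented, for illustration, as a mapping from major factor indexes to their minor factor indexes. The category names below are assumptions chosen to echo the examples in the text; the application does not list an actual index system.

```python
# Hypothetical two-level factor index system: major index -> minor indexes.
FACTOR_INDEX_TREE = {
    "intent": ["intentional injury", "for profit"],
    "liability": ["infringement", "joint liability"],
}


def flatten_factor_indexes(tree):
    """Return (major, minor) pairs in a stable order.

    A fixed order lets each minor index occupy a fixed position
    in the factor vector.
    """
    return [(major, minor) for major, minors in tree.items() for minor in minors]


pairs = flatten_factor_indexes(FACTOR_INDEX_TREE)
```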

S300: Input the focus word and factor vector as features into the preset semantic hash vector model, read the code of the coding layer in the preset semantic hash vector model, and compress the code into a hash value.

The preset semantic hash vector model is a pre-built model, which can be obtained by training a semantic hash model on historical data. Specifically, it can be obtained by training a deep neural network model on historical data. Step S300 can be understood as a data compression process: the large amount of data input into the preset semantic hash vector model is compressed into a hash value by reading the code in the coding layer. For example, suppose that step S200 obtains 10,000 focus words and 50 factor vectors, so that 10,050 features are input into the preset semantic hash vector model. Through the coding and compression of step S300, these features can be compressed into a 16-dimensional or 32-dimensional hash value. The amount of data is greatly reduced, which facilitates subsequent processing.
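To make the compression in step S300 concrete, the sketch below reads the activations of a single coding layer and thresholds them into bits. It is an illustration under stated assumptions: the weights are random with a fixed seed and merely stand in for a trained model, and the names `make_encoder`, `encode`, and `to_hash` as well as the 0.5 threshold are hypothetical.

```python
import math
import random


def make_encoder(n_features, n_bits, seed=0):
    """Stand-in for a trained coding layer: random weights, fixed seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_bits)]


def encode(features, weights):
    """Read the coding-layer activations (sigmoid of a linear map)."""
    codes = []
    for row in weights:
        z = sum(w * x for w, x in zip(row, features))
        codes.append(1.0 / (1.0 + math.exp(-z)))
    return codes


def to_hash(codes, threshold=0.5):
    """Compress real-valued codes into a list of 0/1 bits."""
    return [1 if c >= threshold else 0 for c in codes]


# Eight illustrative binary features compressed into a 16-bit hash.
features = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
weights = make_encoder(len(features), n_bits=16)
hash_bits = to_hash(encode(features, weights))
```

With a trained model, similar inputs would tend to produce codes that differ in only a few bits, which is what makes the hash usable for the lookup in step S400.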

S400: According to the hash value, search for similar judgment documents in the judgment document database to generate a set of target judgment documents to be selected, and the judgment document database stores data used to characterize the correspondence between the hash value and the judgment document.

The judgment document database is a pre-built database that stores a large number of judgment documents together with the hash value corresponding to each judgment document. Since the hash value is generated from the input features, and the input features accurately represent the information to be retrieved, judgment documents similar to the information to be retrieved can be found in the judgment document database based on the hash value. In addition, because the data has been compressed, many similar judgment documents may be found, and these are aggregated into a set of target judgment documents to be selected. That is, according to the hash value obtained for the information to be retrieved in step S300, similar judgment documents are searched for in the massive data of the database to obtain the set of target judgment documents to be selected. Continuing with the above example, in step S300, 10,050 features are compressed into a 16-dimensional hash value. According to the 16-dimensional hash value, a search in the judgment document database can find 1,000 similar judgment documents. It should be pointed out that a similar judgment document can be a complete judgment document or a part of a judgment document. In an embodiment, the information to be retrieved is "the first-instance civil judgment of Zhong Fengjian, Chen Dexiang, and Zhang Haiyuan motor vehicle traffic accident liability dispute". In step S400, it is divided into 1000 vectors and input into the preset semantic hash vector model to obtain a 32-dimensional hash value, specifically [0 0 0 1 1...0 1 1 1]. According to the 32-dimensional hash value, the set of target judgment documents to be selected found in the judgment document database includes: [0 0 0 1 1... 0 1 1] Jiang Xueqin and Taiping Property Insurance Co., Ltd. Yichang Center Branch, Shi Lei, first-instance civil judgment on a motor vehicle traffic accident liability dispute; [0 0 1 1... 0 1 0 1] Zhang Han and Xiamen Jinyuan Financial Guarantee Co., Ltd., civil ruling on an application for retrial of a general loan contract dispute. In this way, the original data can be greatly compressed by means of the hash value, and relatively similar candidate information can be quickly found in the massive data.
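The application does not state how closeness between hash values is measured in step S400. One common choice, assumed here purely for illustration, is Hamming distance: documents whose stored hash differs from the query hash in at most a few bits form the candidate set. The database contents, the document identifiers, and the `max_distance` parameter are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def find_candidates(query_hash, database, max_distance=2):
    """Return ids of documents whose hash is within `max_distance` bits,
    nearest first."""
    hits = [(hamming(query_hash, h), doc_id) for h, doc_id in database]
    return [doc_id for d, doc_id in sorted(hits) if d <= max_distance]


# Illustrative (hash, document id) pairs; 8 bits for readability.
DATABASE = [
    ((0, 0, 0, 1, 1, 0, 1, 1), "doc-A"),
    ((0, 0, 1, 1, 0, 1, 0, 1), "doc-B"),
    ((1, 1, 1, 0, 0, 1, 0, 0), "doc-C"),
]
candidates = find_candidates((0, 0, 0, 1, 1, 0, 1, 0), DATABASE)
```

A linear scan is shown for clarity; at the scale the text describes, the stored hashes would more plausibly be indexed so that near matches can be located without visiting every record.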

S500: Perform similarity matching between the information to be retrieved and the judgment documents in the set of target judgment documents to be selected to obtain the target judgment document.

Similarity matching is performed between the information to be retrieved and each judgment document in the set of target judgment documents to be selected, and the text with the highest matching degree, or with a matching degree greater than a preset threshold, is selected as the target judgment document. Since the set of target judgment documents to be selected is much smaller than the original database, performing similarity matching only against this set greatly reduces the amount of data processing while preserving retrieval accuracy, so the target judgment document is retrieved efficiently and accurately.
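The two selection rules in step S500 (highest matching degree, or matching degree above a preset threshold) can be sketched as follows. Jaccard similarity over word sets is used only as a simple stand-in, since the application leaves the similarity measure to a preset model; `select_targets` and the sample data are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two word collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def select_targets(query_words, candidates, threshold=None):
    """Pick the best candidate, or all candidates above `threshold`."""
    scored = sorted(
        ((jaccard(query_words, words), doc_id) for doc_id, words in candidates.items()),
        reverse=True,
    )
    if threshold is None:
        # Rule 1: the single candidate with the highest matching degree.
        return [scored[0][1]] if scored else []
    # Rule 2: every candidate whose matching degree exceeds the threshold.
    return [doc_id for score, doc_id in scored if score > threshold]


# Hypothetical candidate set produced by the hash lookup.
CANDIDATES = {
    "doc-A": ["beating", "iron rod", "victim", "axe"],
    "doc-B": ["loan", "contract", "guarantee"],
}
best = select_targets(["victim", "beating", "axe"], CANDIDATES)
```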

In the above judgment document information retrieval method, the information to be retrieved is obtained and semantic-based word splitting is performed on it; the focus words in the semantic split result are extracted, and factor index extraction is performed on the semantic split result to obtain the factor vector; the focus words and the factor vector are input as features into the preset semantic hash vector model, the code of the coding layer in the model is read, and the code is compressed into a hash value; similar judgment documents are searched for in the judgment document database according to the hash value to generate a set of target judgment documents to be selected; and similarity matching is performed between the information to be retrieved and each judgment document in the set to obtain the target judgment document. In the whole process, the information to be retrieved and the data in the judgment document database are compressed into hash values. In the first stage, positioning is performed according to the hash value to find the set of target judgment documents to be selected; in the second stage, similarity matching is used to find the target judgment document within that set. Hash value compression significantly reduces the amount of data processing, and the combination of hash value compression and similarity matching ensures both the efficiency and the accuracy of retrieval.

In one of the embodiments, performing factor index extraction on the semantic split result to obtain a factor vector includes: extracting the factor indexes associated with the semantic split result; and qualitatively judging the extracted factor indexes according to the semantic split result to obtain the factor vector.

Factor indexes are used to influence the final judgment result, such as whether the act constitutes a crime, whether joint liability is borne, whether there was illegal appropriation, and whether the act was for profit. These indexes can be preset based on analysis of historical judgment texts. Since a judgment document has a fixed format, its judgment result section states the factual basis of the judgment result. Factor indexes can be extracted from this conventional factual basis and then qualitatively judged to determine whether the situation corresponding to each factor index exists, which yields the factor vector. It can be understood that the factor vector includes two parts: the factor index and the qualitative judgment result. For example, if the factor indexes are whether the act constitutes a crime, whether joint liability is borne, whether there was illegal appropriation, and whether the act was for profit, then after qualitative judgment the factor vector is: does not constitute a crime, bears joint liability, illegal appropriation, for profit.
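As a rough illustration of pairing each factor index with a qualitative judgment, the sketch below marks an index as present when any of its cue words appears in the split result. The cue words and the rule itself are assumptions; the application does not specify how the qualitative judgment is carried out.

```python
# Hypothetical cue words per factor index (not from the application).
FACTOR_CUES = {
    "constitutes a crime": {"crime", "convicted"},
    "joint liability": {"jointly", "joint liability"},
    "for profit": {"profit", "sold"},
}


def factor_vector(words, cues=FACTOR_CUES):
    """Map each factor index to a yes/no qualitative judgment."""
    present = set(words)
    return {index: bool(present & cue_words) for index, cue_words in cues.items()}


def as_bits(vector, order):
    """Turn the judgments into a numeric vector in a fixed index order."""
    return [1 if vector[name] else 0 for name in order]


vec = factor_vector(["defendant", "sold", "goods", "jointly"])
bits = as_bits(vec, list(FACTOR_CUES))
```

The dictionary form keeps both parts the text mentions (the index and its judgment), while `as_bits` gives the numeric form that can be fed to the hash model as features.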

In one of the embodiments, extracting the focus words in the semantic splitting result includes: obtaining a focus word set; and extracting the focus words in the semantic splitting result according to the focus word set.

The focus word set can be constructed in advance. For example, analysis of historical data shows which words in judgment documents are focus words. A focus word is generally a word that appears multiple times in a judgment document and can be determined based on word frequency, such as beating, gun, knife, or slash. Further, the focus word set can be generated in the following manner: obtaining historical judgment document samples; randomly selecting a single historical judgment document sample, and extracting the words whose word frequency in the selected sample is greater than a preset word frequency threshold to obtain a candidate word set; obtaining the word frequency of each word of the candidate word set in the other historical judgment document samples and recording it as the inverse word frequency; and calculating, for each word in the candidate word set, the product of its word frequency and the corresponding inverse word frequency, and selecting the words whose product is greater than a preset threshold to generate the focus word set.

In practical applications, high-frequency words are extracted from historical judgment document samples, the word frequency of each high-frequency word in a single judgment document and its inverse word frequency in other judgment documents are obtained, the product of word frequency and inverse word frequency is calculated, and the words whose product is greater than a preset value are selected as a subset of the focus word set. The "other" judgment documents mentioned above can be all judgment documents except the currently selected one, or a single randomly selected judgment document used as the statistical sample for the inverse word frequency. For example, judgment document sample 1 and judgment document sample 2 are extracted from the historical judgment documents; the word frequency of each word in sample 1 is counted to obtain the high-frequency words A, B, and C; the word frequencies of A, B, and C in sample 2 are taken as their inverse word frequencies; the product of word frequency and inverse word frequency is calculated; the words with the larger products are selected as focus words; and the above operation is repeated to finally generate the focus word set. In this implementation, the focus word set takes both word frequency and inverse word frequency into account. Inverse word frequency accounts for the fact that some words, such as certain modal particles, may have a high word frequency in a single judgment document but not in other judgment documents; excluding the interference of these words allows the focus word set to be constructed accurately.
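The focus word procedure described above can be sketched with raw counts. Note that, following the text, the "inverse word frequency" here is simply the word's frequency in the other samples, not the logarithmic inverse document frequency of standard TF-IDF. The function name, the thresholds, and the sample documents are illustrative assumptions.

```python
from collections import Counter


def build_focus_words(selected_doc, other_docs, tf_threshold=1, product_threshold=3):
    """Select focus words from one sample, checked against the other samples."""
    tf = Counter(selected_doc)
    # Candidate words: frequency in the selected sample above the threshold.
    candidates = {w: c for w, c in tf.items() if c > tf_threshold}
    # "Inverse word frequency" as described: frequency in the other samples.
    other_tf = Counter(w for doc in other_docs for w in doc)
    return {w for w, c in candidates.items() if c * other_tf[w] > product_threshold}


# Two hypothetical, already-split judgment document samples.
doc1 = ["beating", "knife", "beating", "knife", "alas", "alas", "court"]
doc2 = ["knife", "beating", "beating", "knife", "injury"]
focus = build_focus_words(doc1, [doc2])

# Extracting focus words from a split result is then a membership test.
extracted = [w for w in ["victim", "knife", "alas", "beating"] if w in focus]
```

In this toy data, "alas" is frequent in the first sample but absent from the second, so its product is zero and it is excluded, which mirrors the filtering effect the text attributes to the inverse word frequency.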

As shown in FIG. 2, in one of the embodiments, before step S200, the method further includes:

S120: Perform modal particle removal and company name cleaning on the words obtained by semantic splitting.

Company names can be identified by named entity recognition based on a database. The database stores common company names and grammar-based regular modal particles. During data cleaning, each split word is looked up in the database, and any word found there is filtered out. For example, the information to be retrieved is as follows: "The defendant Zhu was in front of Meishang Furniture Factory in UNK Community, Jiangning District, Nanjing. He had a dispute with Yu Jia because of driving problems, and Zhu gathered others to the workshop of Meishang Furniture Factory." Based on entity recognition, the words "Meishang Furniture Factory" are identified as a company name and are filtered out. In this embodiment, the split words are cleaned to remove unnecessary or worthless words, which significantly reduces the amount of data processed in the next step and improves the processing efficiency of the entire solution.
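The lookup-and-filter cleaning of step S120 can be illustrated with two in-memory sets standing in for the database of company names and modal particles. The set contents and the function name `clean_words` are assumptions made for the example.

```python
# Hypothetical stand-ins for the database tables described in the text.
COMPANY_NAMES = {"meishang furniture factory"}
MODAL_PARTICLES = {"ah", "oh", "well"}


def clean_words(words, company_names=COMPANY_NAMES, modal_particles=MODAL_PARTICLES):
    """Drop every split word that is found in either lookup set."""
    blocked = company_names | modal_particles
    return [w for w in words if w.lower() not in blocked]


cleaned = clean_words(
    ["defendant", "Meishang Furniture Factory", "dispute", "oh", "workshop"]
)
```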

As shown in FIG. 2, in one of the embodiments, step S500 includes:

S520: Input the information to be retrieved and the set of target judgment documents to be selected into the preset similarity matching model.

S540: Obtain the similarity between the information to be retrieved and each judgment document in the set of target judgment documents to be selected.

S560: Select the judgment document with the highest similarity as the target judgment document.

The similarity matching model is a pre-built model that can accurately identify the similarity between input data. In this embodiment, the similarity matching model method is adopted to quickly and accurately determine the target judgment document, which brings convenience to the user.

It should be understood that, although the steps in the flowcharts of FIGS. 1-2 are displayed in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they can be executed in other orders. Moreover, at least some of the steps in FIGS. 1-2 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times, and they are not necessarily executed sequentially, but can be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.

As shown in Fig. 3, a judgment document information retrieval device, the device includes:

The word splitting module 100 is used to obtain the information to be retrieved, and perform semantic-based word splitting on the information to be retrieved;

The factor extraction module 200 is used to extract the focus words in the semantic splitting result, and extract the factor index of the semantic splitting result to obtain a factor vector, the factor index is an index that affects the judgment result in the judgment document;

The encoding compression module 300 is used to input the focus words and factor vectors as features into the preset semantic hash vector model, read the encoding of the encoding layer in the preset semantic hash vector model, and compress the encoding into a hash value;

The searching module 400 is configured to search for similar judgment documents in the judgment document database according to the hash value to generate a set of target judgment documents to be selected, and the judgment document database stores data used to characterize the correspondence between the hash value and the judgment document;

The similarity matching module 500 is used to perform similarity matching between the information to be retrieved and each judgment document in the set of target judgment documents to be selected to obtain the target judgment document.

In the aforementioned judgment document information retrieval device, the word splitting module 100 obtains the information to be retrieved and performs semantic-based word splitting on it; the factor extraction module 200 extracts the focus words in the semantic split result and performs factor index extraction on the semantic split result to obtain the factor vector; the encoding compression module 300 inputs the focus words and the factor vector as features into the preset semantic hash vector model, reads the encoding of the encoding layer in the model, and compresses the encoding into a hash value; the search module 400 searches for similar judgment documents in the judgment document database according to the hash value to generate a set of target judgment documents to be selected; and the similarity matching module 500 performs similarity matching between the information to be retrieved and each judgment document in the set to obtain the target judgment document. In the whole process, the information to be retrieved and the data in the judgment document database are compressed into hash values. In the first stage, positioning is performed according to the hash value to find the set of target judgment documents to be selected; in the second stage, similarity matching is used to find the target judgment document within that set. Hash value compression significantly reduces the amount of data processing, and the combination of hash value compression and similarity matching ensures both the efficiency and the accuracy of retrieval.

In one of the embodiments, the factor extraction module 200 is also used to extract the factor indexes associated with the semantic split result, and to qualitatively judge the extracted factor indexes according to the semantic split result to obtain the factor vector.

In one of the embodiments, the factor extraction module is also used to obtain the focus word set, and to extract the focus words in the semantic split result according to the focus word set.

In one of the embodiments, the factor extraction module is also used to: obtain historical judgment document samples; randomly select a single historical judgment document sample, and extract the words whose word frequency in the selected sample is greater than a preset word frequency threshold to obtain a candidate word set; obtain the word frequency of each word of the candidate word set in the other historical judgment document samples and record it as the inverse word frequency; and calculate, for each word in the candidate word set, the product of its word frequency and the corresponding inverse word frequency, and select the words whose product is greater than a preset threshold to generate the focus word set.

In one of the embodiments, the above-mentioned judgment document information retrieval device further includes a cleaning module, which is used to clean the semantically separated words to remove modal particles and enterprise names.

In one of the embodiments, the similarity matching module 500 is further configured to: input the information to be retrieved and the set of target judgment documents to be selected into the preset similarity matching model; obtain the similarity between the information to be retrieved and each judgment document in the set of target judgment documents to be selected; and select the judgment document with the highest similarity as the target judgment document.

For the specific limitations of the judgment document information retrieval device, please refer to the above limitations on the judgment document information retrieval method, which will not be repeated here. Each module in the above judgment document information retrieval device can be implemented in whole or in part by software, hardware, or a combination thereof. The foregoing modules may be embedded in, or independent of, the processor in the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the foregoing modules.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure diagram may be as shown in FIG. 4. The computer equipment includes a processor, a memory, a network interface and a database connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The computer equipment database is used to store data. The network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer program is executed by the processor, a method for searching judgment document information is realized.

Those skilled in the art can understand that the structure shown in FIG. 4 is only a block diagram of the part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution is applied. A specific computer device may include more or fewer parts than shown in the figure, combine some parts, or have a different arrangement of parts.

A computer device includes a memory and one or more processors. The memory stores computer-readable instructions. When the computer-readable instructions are executed by the one or more processors, the one or more processors implement the steps of the judgment document information retrieval method provided in any of the embodiments of the present application.

One or more non-volatile computer-readable storage media store computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the judgment document information retrieval method provided in any of the embodiments of the present application.

A person of ordinary skill in the art can understand that all or part of the processes in the above-mentioned method embodiments can be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium, and when executed, they may include the processes of the above-mentioned method embodiments. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. To keep the description concise, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope described in this specification.

The above-mentioned embodiments express only several implementations of the present application. Although their description is relatively specific and detailed, it should not be understood as a limitation on the scope of the patent. It should be pointed out that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims

A method for retrieving judgment document information, comprising:

Obtaining the information to be retrieved, and performing semantic-based word splitting on the information to be retrieved;

Extracting the focus words in the semantic splitting result, and extracting the factor index of the semantic splitting result to obtain a factor vector, where the factor index is an index that affects the judgment result in the judgment document;

Inputting the focus words and the factor vector as features into a preset semantic hash vector model, reading the code of the coding layer in the preset semantic hash vector model, and compressing the code into a hash value;

Searching for similar judgment documents in the judgment document database according to the hash value to generate a set of target judgment documents to be selected, where the judgment document database stores data used to characterize the correspondence between the hash value and the judgment document; and

Performing similarity matching between the information to be retrieved and each judgment document in the set of target judgment documents to be selected to obtain the target judgment document.
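As a non-limiting illustration, the code compression and hash-based search steps of claim 1 can be sketched as follows. The sketch assumes that the coding-layer output is already available as a list of activations in [0, 1]; the 0.5 binarization threshold, the Hamming-distance radius, the in-memory dictionary standing in for the judgment document database, and the names `compress_code` and `candidate_documents` are all assumptions made for the example and are not specified by the claim.

```python
from typing import Dict, List, Sequence


def compress_code(activations: Sequence[float], threshold: float = 0.5) -> int:
    """Binarize coding-layer activations and pack the bits into one integer."""
    value = 0
    for a in activations:
        value = (value << 1) | (1 if a >= threshold else 0)
    return value


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash values."""
    return bin(a ^ b).count("1")


def candidate_documents(query_hash: int,
                        index: Dict[int, List[str]],
                        radius: int = 1) -> List[str]:
    """Collect documents whose stored hash is within `radius` bits of the query.

    `index` is a hypothetical stand-in for the judgment document database:
    it maps a hash value to the identifiers of the documents stored under it.
    """
    result: List[str] = []
    for stored_hash, doc_ids in index.items():
        if hamming(query_hash, stored_hash) <= radius:
            result.extend(doc_ids)
    return sorted(result)


# Illustrative data only.
index = {0b1011: ["doc_a"], 0b1010: ["doc_b"], 0b0100: ["doc_c"]}
query_hash = compress_code([0.9, 0.2, 0.7, 0.8])
candidates = candidate_documents(query_hash, index)
print(bin(query_hash), candidates)
```

The resulting candidate list plays the role of the set of target judgment documents to be selected, which the final step then narrows down by similarity matching.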
The method according to claim 1, wherein the extracting the factor index of the semantic splitting result to obtain a factor vector comprises:

Extracting related factor indexes in the semantic splitting result; and

Performing qualitative judgment on the extracted factor indexes according to the semantic splitting result to obtain the factor vector.
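As a non-limiting illustration, one possible reading of the qualitative judgment in claim 2 is that each factor index is marked as affirmed, negated, or not mentioned. The sketch below assumes that reading; the factor names, trigger words, negation words, and the encoding 1 / -1 / 0 are hypothetical and are not taken from the claim.

```python
from typing import Dict, List

# Hypothetical factor indexes and their trigger words, for illustration only.
FACTOR_TRIGGERS: Dict[str, str] = {
    "written_contract": "contract",
    "overdue_payment": "overdue",
    "guarantee": "guarantor",
}
# Hypothetical negation words used for the qualitative judgment.
NEGATIONS = {"no", "not", "without"}


def factor_vector(words: List[str]) -> List[int]:
    """Return one entry per factor: 1 affirmed, -1 negated, 0 not mentioned."""
    vector: List[int] = []
    for trigger in FACTOR_TRIGGERS.values():
        if trigger not in words:
            vector.append(0)
            continue
        pos = words.index(trigger)
        # A factor counts as negated when a negation word directly precedes it.
        negated = pos > 0 and words[pos - 1] in NEGATIONS
        vector.append(-1 if negated else 1)
    return vector


# A toy semantic splitting result.
words = ["loan", "contract", "signed", "payment", "overdue", "no", "guarantor"]
vec = factor_vector(words)
print(vec)
```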
The method according to claim 2, wherein the extracting related factor indexes in the semantic splitting result comprises:

Obtaining main body part data and conclusion part data of historical judgment documents;

Selecting factor indexes based on the main body part data and the conclusion part data to construct a factor index set; and

Extracting the related factor indexes from the factor index set according to the semantic splitting result.
The method according to claim 1, wherein the extracting the focus words in the semantic splitting result comprises:

Obtaining a focus word set; and

Extracting the focus words in the semantic splitting result according to the focus word set.
The method according to claim 4, wherein the obtaining a focus word set comprises:

Obtaining samples of historical judgment documents;

Randomly selecting a single historical judgment document sample, and extracting the words whose word frequency is greater than a preset word frequency threshold in the selected sample to obtain a candidate word set;

Obtaining the word frequency of each word in the candidate word set in the other historical judgment document samples, and recording it as the inverse word frequency; and

Calculating, for each word in the candidate word set, the product of its word frequency and the corresponding inverse word frequency, and selecting the words whose product is greater than a preset threshold to generate the focus word set.
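As a non-limiting illustration, the focus word set construction of claim 5 can be sketched as follows. The claim does not give a formula for the inverse word frequency; the sketch assumes the conventional smoothed inverse document frequency log(N / (1 + df)), where N is the number of samples and df is the number of other samples containing the word. The function name, the parameter names, and the optional `chosen_index` argument (added only so that the example is reproducible) are assumptions.

```python
import math
import random
from collections import Counter
from typing import List, Optional, Set


def focus_word_set(samples: List[List[str]],
                   tf_threshold: int,
                   score_threshold: float,
                   chosen_index: Optional[int] = None) -> Set[str]:
    """Build a focus word set from tokenized historical judgment documents."""
    if chosen_index is None:
        # Randomly select a single historical judgment document sample.
        chosen_index = random.randrange(len(samples))
    counts = Counter(samples[chosen_index])
    # Candidate words: word frequency above the preset word frequency threshold.
    candidates = {w: tf for w, tf in counts.items() if tf > tf_threshold}
    others = [set(doc) for i, doc in enumerate(samples) if i != chosen_index]
    focus: Set[str] = set()
    for word, tf in candidates.items():
        df = sum(1 for doc in others if word in doc)
        idf = math.log(len(samples) / (1 + df))  # assumed formula, see lead-in
        if tf * idf > score_threshold:
            focus.add(word)
    return focus


# Illustrative tokenized samples only.
samples = [
    ["court", "court", "loan", "loan", "loan", "the", "the"],
    ["court", "the", "lease"],
    ["court", "the", "injury"],
]
focus = focus_word_set(samples, tf_threshold=1, score_threshold=1.0, chosen_index=0)
print(focus)
```

In this toy data, "court" and "the" occur in every sample and therefore score zero, while "loan" is frequent in the selected sample and absent from the others, so it is the only word retained.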
The method according to claim 1, wherein before the extracting the focus words in the semantic splitting result and extracting the factor index of the semantic splitting result to obtain a factor vector, the method further comprises:

Removing modal particles and company names from the semantically split words.
The method according to claim 6, wherein the removing modal particles and company names from the semantically split words comprises:

Obtaining a preset database in which company names and grammar-based modal particles are stored; and

Searching and filtering the semantically split words according to the preset database, and removing the modal particles and company names from the semantically split words.
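As a non-limiting illustration, the cleaning step of claim 7 can be sketched as a lookup against two word lists. The two sets below stand in for the preset database; their entries and the function name `clean_words` are hypothetical.

```python
from typing import Iterable, List, Set

# Stand-ins for the preset database; the entries are illustrative only.
MODAL_PARTICLES: Set[str] = {"吗", "呢", "吧", "啊"}
COMPANY_NAMES: Set[str] = {"甲公司", "乙公司"}


def clean_words(words: Iterable[str]) -> List[str]:
    """Drop modal particles and company names, keeping the original order."""
    blocked = MODAL_PARTICLES | COMPANY_NAMES
    return [w for w in words if w not in blocked]


# A toy semantic splitting result.
cleaned = clean_words(["甲公司", "拖欠", "货款", "吗"])
print(cleaned)
```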
The method according to claim 1, wherein the performing similarity matching between the information to be retrieved and each judgment document in the set of target judgment documents to be selected to obtain the target judgment document comprises:

Inputting the information to be retrieved and the set of target judgment documents to be selected into a preset similarity matching model;

Acquiring the similarity between the information to be retrieved and each subset in the set of target judgment documents to be selected; and

Selecting the subset with the highest similarity as the target judgment document.
The method according to claim 1, wherein the performing similarity matching between the information to be retrieved and each judgment document in the set of target judgment documents to be selected to obtain the target judgment document comprises:

Performing similarity matching between the information to be retrieved and each judgment document in the set of target judgment documents to be selected; and

Selecting the judgment document with the highest matching degree, or with a matching degree greater than a preset threshold, as the target judgment document.
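As a non-limiting illustration, the two selection rules of claim 9 (highest matching degree, or matching degree above a preset threshold) can be sketched as follows. The claims do not specify the similarity measure; cosine similarity over word counts is used here purely as an assumed stand-in, and the names `cosine` and `select_targets` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def cosine(a: List[str], b: List[str]) -> float:
    """Cosine similarity between two bags of words (assumed similarity measure)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def select_targets(query: List[str],
                   candidates: Dict[str, List[str]],
                   threshold: Optional[float] = None) -> List[Tuple[str, float]]:
    """Return the best match, or every match above `threshold` if one is given."""
    scored = sorted(((doc_id, cosine(query, words))
                     for doc_id, words in candidates.items()),
                    key=lambda item: item[1], reverse=True)
    if threshold is None:
        return scored[:1]
    return [item for item in scored if item[1] > threshold]


# Illustrative data only.
query = ["loan", "overdue", "interest"]
candidates = {
    "doc_a": ["loan", "overdue", "guarantor"],
    "doc_b": ["lease", "rent", "overdue"],
    "doc_c": ["injury", "compensation"],
}
best = select_targets(query, candidates)
above = select_targets(query, candidates, threshold=0.3)
print(best, above)
```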
A judgment document information retrieval device, comprising:

A word splitting module, used to obtain the information to be retrieved and perform semantic-based word splitting on the information to be retrieved;

A factor extraction module, used to extract the focus words in the semantic splitting result, and extract the factor index of the semantic splitting result to obtain a factor vector, where the factor index is an index that affects the judgment result in the judgment document;

An encoding compression module, used to input the focus words and the factor vector as features into a preset semantic hash vector model, read the code of the coding layer in the preset semantic hash vector model, and compress the code into a hash value;

A search module, used to search for similar judgment documents in the judgment document database according to the hash value to generate a set of target judgment documents to be selected, where the judgment document database stores data used to characterize the correspondence between the hash value and the judgment document; and

A similarity matching module, used to perform similarity matching between the information to be retrieved and each judgment document in the set of target judgment documents to be selected to obtain the target judgment document.
The device according to claim 10, wherein the factor extraction module is further used to extract related factor indexes in the semantic splitting result; and perform qualitative judgment on the extracted factor indexes according to the semantic splitting result to obtain the factor vector.
The device according to claim 10, wherein the factor extraction module is further used to: obtain samples of historical judgment documents; randomly select a single historical judgment document sample, and extract the words whose word frequency is greater than a preset word frequency threshold in the selected sample to obtain a candidate word set; obtain the word frequency of each word in the candidate word set in the other historical judgment document samples, and record it as the inverse word frequency; calculate, for each word in the candidate word set, the product of its word frequency and the corresponding inverse word frequency, and select the words whose product is greater than a preset threshold to generate a focus word set; and extract the focus words in the semantic splitting result according to the focus word set.
The device according to claim 10, wherein the device further comprises a cleaning module, which is used to clean the semantically separated words to remove modal particles and company names.
The device according to claim 10, wherein the similarity matching module is further used to input the information to be retrieved and the set of target judgment documents to be selected into a preset similarity matching model; obtain the similarity between the information to be retrieved and each subset in the set of target judgment documents to be selected; and select the subset with the highest similarity as the target judgment document.
A computer device, including a memory and one or more processors, where the memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:

Obtaining the information to be retrieved, and performing semantic-based word splitting on the information to be retrieved;

Extracting the focus words in the semantic splitting result, and extracting the factor index of the semantic splitting result to obtain a factor vector, where the factor index is an index that affects the judgment result in the judgment document;

Inputting the focus words and the factor vector as features into a preset semantic hash vector model, reading the code of the coding layer in the preset semantic hash vector model, and compressing the code into a hash value;

Searching for similar judgment documents in the judgment document database according to the hash value to generate a set of target judgment documents to be selected, where the judgment document database stores data used to characterize the correspondence between the hash value and the judgment document; and

Performing similarity matching between the information to be retrieved and each judgment document in the set of target judgment documents to be selected to obtain the target judgment document.
The computer device according to claim 15, wherein the one or more processors further perform the following steps when executing the computer-readable instructions:

Extracting related factor indexes in the semantic splitting result; and

Performing qualitative judgment on the extracted factor indexes according to the semantic splitting result to obtain the factor vector.
The computer device according to claim 15, wherein the one or more processors further perform the following steps when executing the computer-readable instructions:

Obtaining samples of historical judgment documents;

Randomly selecting a single historical judgment document sample, and extracting the words whose word frequency is greater than a preset word frequency threshold in the selected sample to obtain a candidate word set;

Obtaining the word frequency of each word in the candidate word set in the other historical judgment document samples, and recording it as the inverse word frequency;

Calculating, for each word in the candidate word set, the product of its word frequency and the corresponding inverse word frequency, and selecting the words whose product is greater than a preset threshold to generate a focus word set; and

Extracting the focus words in the semantic splitting result according to the focus word set.
One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:

Obtaining the information to be retrieved, and performing semantic-based word splitting on the information to be retrieved;

Extracting the focus words in the semantic splitting result, and extracting the factor index of the semantic splitting result to obtain a factor vector, where the factor index is an index that affects the judgment result in the judgment document;

Inputting the focus words and the factor vector as features into a preset semantic hash vector model, reading the code of the coding layer in the preset semantic hash vector model, and compressing the code into a hash value;

Searching for similar judgment documents in the judgment document database according to the hash value to generate a set of target judgment documents to be selected, where the judgment document database stores data used to characterize the correspondence between the hash value and the judgment document; and

Performing similarity matching between the information to be retrieved and each judgment document in the set of target judgment documents to be selected to obtain the target judgment document.
The storage medium according to claim 18, wherein the following steps are further performed when the computer-readable instructions are executed by the one or more processors:

Extracting related factor indexes in the semantic splitting result; and

Performing qualitative judgment on the extracted factor indexes according to the semantic splitting result to obtain the factor vector.
The storage medium according to claim 18, wherein the following steps are further performed when the computer-readable instructions are executed by the one or more processors:

Obtaining samples of historical judgment documents;

Randomly selecting a single historical judgment document sample, and extracting the words whose word frequency is greater than a preset word frequency threshold in the selected sample to obtain a candidate word set;

Obtaining the word frequency of each word in the candidate word set in the other historical judgment document samples, and recording it as the inverse word frequency;

Calculating, for each word in the candidate word set, the product of its word frequency and the corresponding inverse word frequency, and selecting the words whose product is greater than a preset threshold to generate a focus word set; and

Extracting the focus words in the semantic splitting result according to the focus word set.