CN112148702A - File retrieval method and equipment - Google Patents


Info

Publication number
CN112148702A
Authority
CN
China
Prior art keywords
retrieval
legal
text
knowledge
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011010296.7A
Other languages
Chinese (zh)
Other versions
CN112148702B (en)
Inventor
朱弘煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Zhitong Consulting Co Ltd Shanghai Branch
Original Assignee
Ping An Zhitong Consulting Co Ltd Shanghai Branch
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Zhitong Consulting Co Ltd Shanghai Branch
Priority to CN202011010296.7A
Publication of CN112148702A
Application granted
Publication of CN112148702B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Technology Law (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application belongs to the technical field of artificial intelligence and provides a file retrieval method and device. The method comprises: receiving a retrieval request that comprises a target text and a retrieval type; generating a text vector for the target text based on a preset legal knowledge graph; selecting a retrieval model associated with the retrieval type, and generating a retrieval language segment associated with the retrieval request based on the retrieval model and the text vector; and selecting, from a document database, the target legal documents that match the retrieval language segment to generate a retrieval result. Because the target text is semantically analyzed through the legal knowledge graph and a corresponding text vector is extracted, the user can describe the question in natural language without having to think of suitable keywords, which makes the retrieval question easier to formulate.

Description

File retrieval method and equipment
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a file retrieval method and device.
Background
As legal knowledge becomes more widespread, the public comes into contact with legal cases more and more often, and users may look up specific cases for work or out of personal interest. However, because legal cases are so numerous, screening them manually greatly increases the time needed to find a case and makes case selection harder. How to provide an efficient means of retrieving legal cases has therefore become an urgent problem.
Existing legal case retrieval mainly relies on keyword search, which checks whether the keywords entered by the user appear in a text. Because users' legal knowledge is limited, they often cannot express the right keywords accurately, so keyword search makes retrieval harder. Moreover, the keywords are treated independently of one another during retrieval, so the results often contain many legal documents with little relevance to the retrieval request, which lowers search efficiency.
Disclosure of Invention
In view of this, embodiments of the present application provide a file retrieval method and device to solve the problem that existing file retrieval technology relies mainly on keyword search, which makes retrieval harder and lowers search efficiency.
A first aspect of an embodiment of the present application provides a method for file retrieval, including:
receiving a retrieval request; the retrieval request comprises a target text and a retrieval type;
generating a text vector about the target text based on a preset legal knowledge graph;
selecting a retrieval model associated with the retrieval type, and generating a retrieval language segment associated with the retrieval request based on the retrieval model and the text vector;
and selecting the target legal document matched with the retrieval language segment from a document database to generate a retrieval result.
A second aspect of an embodiment of the present application provides an apparatus for retrieving a file, including:
a retrieval request receiving unit, configured to receive a retrieval request, the retrieval request comprising a target text and a retrieval type;
a text vector generating unit, configured to generate a text vector for the target text based on a preset legal knowledge graph;
a retrieval language segment generating unit, configured to select a retrieval model associated with the retrieval type, and generate a retrieval language segment associated with the retrieval request based on the retrieval model and the text vector;
and a retrieval result output unit, configured to select the target legal document matching the retrieval language segment from the document database and generate a retrieval result.
A third aspect of embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the first aspect when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the first aspect.
The file retrieval method and device provided by the embodiments of the present application have the following advantages:
After receiving a retrieval request initiated by a user, the embodiment of the application imports the target text contained in the request into a pre-established legal knowledge graph to obtain a text vector associated with the target text, determines the associated retrieval model based on the retrieval type, imports the text vector into that model to generate the corresponding retrieval language segment, and uses the retrieval language segment to determine the target legal documents and generate a retrieval result, thereby retrieving documents accurately. Compared with existing file retrieval technology, on the one hand, the target text is semantically analyzed through the legal knowledge graph and a corresponding text vector is extracted, so the user can describe the question in natural language without having to think of suitable keywords, which makes the retrieval question easier to formulate; on the other hand, a dedicated retrieval model is configured for each retrieval type, so the retrieval language segment is more accurate, far fewer low-relevance documents are returned, and retrieval efficiency improves.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of an implementation of a file retrieval method according to a first embodiment of the present application;
FIG. 2 is a flowchart of a specific implementation of S102 of a file retrieval method according to a second embodiment of the present application;
FIG. 3 is a flowchart of a specific implementation of a file retrieval method according to a third embodiment of the present application;
FIG. 4 is a schematic diagram of a legal knowledge graph provided by an embodiment of the present application;
FIG. 5 is a flowchart of a specific implementation of S302 of a file retrieval method according to a fourth embodiment of the present application;
FIG. 6 is a flowchart of a specific implementation of S103 of a file retrieval method according to a fifth embodiment of the present application;
FIG. 7 is a flowchart of a specific implementation of S103 of a file retrieval method according to a sixth embodiment of the present application;
FIG. 8 is a flowchart of a specific implementation of S104 of a file retrieval method according to a seventh embodiment of the present application;
FIG. 9 is a block diagram of a file retrieval device according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a terminal device according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
After receiving a retrieval request initiated by a user, the embodiment of the application imports the target text contained in the request into a pre-established legal knowledge graph to obtain a text vector associated with the target text, determines the associated retrieval model based on the retrieval type, imports the text vector into that model to generate the corresponding retrieval language segment, and uses the retrieval language segment to determine the target legal documents and generate a retrieval result, thereby retrieving documents accurately. This solves the problems of existing legal case retrieval, which mainly relies on keyword search to check whether the keywords entered by the user appear in a text: because users' legal knowledge is limited, they often cannot express the right keywords accurately, so keyword search makes retrieval harder; and because the keywords are treated independently of one another during retrieval, the results often contain many legal documents with little relevance to the retrieval request, which lowers search efficiency.
In the embodiments of the present application, the process is executed by a terminal device. Terminal devices include, but are not limited to, servers, computers, smartphones, tablets, and other devices capable of performing file retrieval. FIG. 1 shows a flowchart of an implementation of the file retrieval method according to the first embodiment of the present application, detailed as follows:
In S101, a retrieval request is received; the retrieval request includes a target text and a retrieval type.
In this embodiment, the terminal device may receive a retrieval request initiated by a user. When a user needs to retrieve legal documents, the user can generate a retrieval request on a local user terminal and send it to the terminal device, which responds to the request. In a possible implementation, the terminal device is a document database server that stores a plurality of legal documents. According to the retrieval request initiated by the user, the server generates a retrieval result from the legal documents in the document database that are associated with the request and feeds the result back to the user terminal in response to the user's query. In this case, a client program corresponding to the database server may be installed on the user terminal. The client program generates a retrieval page on the user terminal; the user enters the information for the retrieval request on that page and generates the request by clicking a control on the page, such as an "initiate retrieval" or "start retrieval" button; the client program then sends the generated request to the document database server to start the retrieval process.
In a possible implementation, the user terminals that initiate retrieval requests and the terminal device that provides the retrieval service may form a blockchain system, that is, each user terminal and the terminal device acts as a blockchain node. Retrieval requests can be stored on the blockchain nodes, and the generated retrieval request records are stored in the blockchain network so that the records are difficult to tamper with. Blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer. Through the blockchain system, each user terminal can query the retrieval requests initiated by other user terminals, which enables replay and reuse of retrieval tasks and avoids initiating the same retrieval request repeatedly.
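The tamper-resistance property described above can be illustrated with a minimal hash chain. This is only a sketch: it omits the distributed storage, peer-to-peer transmission, and consensus parts of a real blockchain, and the function names and the fields of the request record are hypothetical, not taken from the application.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(record, prev_hash):
    """Hash a retrieval request record together with the previous block's hash."""
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, record):
    """Append a record to the chain, linking it to the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({"record": record, "prev": prev_hash,
                  "hash": block_hash(record, prev_hash)})


def is_valid(chain):
    """Recompute every hash; any edited record breaks the chain."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["record"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, {"target_text": "trademark dispute", "retrieval_type": "similarity"})
append_block(chain, {"target_text": "division of property", "retrieval_type": "question_answer"})
```

Because each block's hash covers the previous hash, changing an earlier record invalidates every later block, which is what makes stored request records hard to alter unnoticed.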
In this embodiment, the retrieval request includes a target text related to the legal documents to be retrieved and a retrieval type for the current retrieval operation. The target text may contain retrieval keywords, for example one or more independent keywords related to legal knowledge such as "criminal law", "civil law", or "marriage law". The target text may also be legal knowledge, a legal question, or a legal passage described in natural language, for example: "Before marriage, a couple jointly contributed 30 thousand as the down payment on a property, and afterwards the couple jointly bore the monthly mortgage payments; are there legal provisions on how ownership of the house is divided?" Unlike conventional search, the target text is not limited to retrieval keywords; it may be a sentence or paragraph written in natural language, or a legal document to be searched against, which gives the retrieval operation more freedom. The retrieval type defines the rule by which target legal documents are retrieved in this retrieval operation.
In one possible implementation, the retrieval types include, but are not limited to, similarity retrieval, correlation retrieval, and question-answer retrieval. Similarity retrieval selects from the document database, as target legal documents, the legal documents whose content is similar or identical to that of the target text. Correlation retrieval selects, as target legal documents, the legal documents related to the legal content of the target text: a single case generates many legal documents between filing and closing, such as complaints, legal evidence, and judgments, and may even go through several rounds of litigation, each of which produces its own legal documents, so correlation retrieval can be used to retrieve the legal documents related to the target text. Question-answer retrieval selects, as target legal documents, the legal documents that answer the legal question posed by the target text; here the target text is a legal question, and the terminal device can semantically analyze it, determine the legal provisions it involves and the corresponding legal answers, and extract the target legal documents from the document database based on those provisions and answers.
In a possible implementation, if the retrieval request does not include a retrieval type, S101 may further include: the terminal device semantically analyzes the target text in the retrieval request uploaded by the user, extracts the knowledge tags of the target text, and determines the retrieval type from the association relationships among the knowledge tags. Specifically, the knowledge tags may be extracted at a coarse granularity, for example at the granularity of text paragraphs, so that a knowledge tag is determined for each paragraph. For example, if the paragraphs of the target text stand in a question-answer relationship, the retrieval type may be the "answer search" type; if the knowledge tags of the paragraphs are the same or similar, the paragraphs can be identified as belonging to the same text, and the retrieval type is identified as the "text recommendation" type.
In S102, a text vector for the target text is generated based on a preset legal knowledge graph.
In this embodiment, the terminal device may store a legal knowledge graph in advance. The graph may be downloaded from a cloud server, which may generate it from a plurality of standard legal texts: for example, the legal entities contained in standard legal texts such as the criminal law, the civil law, and the constitution are identified, and association relationships between different legal entities are established based on how often the entities co-occur and where they occur, thereby constructing the legal knowledge graph. In a possible implementation, the legal knowledge graph may instead be constructed from all existing legal documents in the document database: the terminal device likewise identifies the legal entities contained in the existing legal documents and establishes association relationships between them based on their co-occurrence counts and occurrence positions.
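As a rough illustration of establishing associations from co-occurrence, the sketch below counts how often two known legal entities appear in the same sentence and keeps the pairs that reach a minimum count. It is an assumption-laden simplification: the entity list is given rather than recognized automatically, sentences are split naively on periods, occurrence positions are ignored, and `build_cooccurrence_graph` and `min_count` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(texts, entities, min_count=1):
    """Return {(entity_a, entity_b): count} for entities sharing a sentence."""
    edge_counts = Counter()
    for text in texts:
        # Naive sentence split; a real system would use a proper segmenter.
        for sentence in text.lower().split("."):
            present = sorted(e for e in entities if e in sentence)
            for a, b in combinations(present, 2):
                edge_counts[(a, b)] += 1
    return {edge: n for edge, n in edge_counts.items() if n >= min_count}


texts = [
    "Intellectual property includes trademark rights. "
    "A trademark dispute involves a litigator.",
    "The litigator registered a trademark.",
]
entities = ["intellectual property", "trademark", "litigator"]
graph = build_cooccurrence_graph(texts, entities)
```

Each key of `graph` is an undirected edge between two knowledge nodes, and the count could serve as an edge weight when deciding which associations to keep.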
In this embodiment, the legal knowledge graph stored on the terminal device contains a plurality of knowledge nodes, each of which may correspond to one legal entity. For example, the legal entities may be "intellectual property", "trademark", and "litigator", and relationships exist between different legal entities: "intellectual property" includes "trademark", which is an inclusion relationship. The terminal device can create a knowledge node for each legal entity and generate the legal knowledge graph from the association relationships among the knowledge nodes.
In this embodiment, the storage module of the terminal device may store a plurality of legal documents. These may include standard legal texts that define legal provisions, such as the criminal law, the civil law, and the constitution, as well as all intermediate texts generated while handling legal cases and the judgment results of those cases, such as prosecution documents, evidence of answers, and judgments. The terminal device can download legal documents from the internet or receive them as user uploads, assign a case identifier to each historical case, and store the documents in a local storage module or on a cloud server. In a possible implementation, to store historical cases more efficiently, the terminal device may run a duplicate check on all legal documents before storing them: it calculates the duplication rate between every pair of legal documents, identifies two documents as belonging to the same case if their duplication rate exceeds a preset duplication threshold, and merges the legal documents whose duplication rates exceed that threshold. This reduces the amount of duplicated data on the storage device and improves the storage efficiency of the database.
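The duplicate check can be sketched as follows. The application does not say how the duplication rate is computed, so this example assumes Jaccard similarity over three-word shingles, and it "merges" duplicates simply by keeping the first copy; the function names and the 0.8 threshold are illustrative only.

```python
def shingles(text, k=3):
    """Set of k-word windows; short texts yield a single (shorter) window."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def duplication_rate(a, b, k=3):
    """Jaccard similarity of the two shingle sets, in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


def deduplicate(documents, threshold=0.8):
    """Keep a document only if it is not a near-duplicate of one already kept."""
    kept = []
    for doc in documents:
        if all(duplication_rate(doc, other) <= threshold for other in kept):
            kept.append(doc)
    return kept


documents = [
    "the court ruled for the plaintiff in this case",
    "the court ruled for the plaintiff in this case",
    "a contract dispute over unpaid rent",
]
unique_documents = deduplicate(documents)
```

Comparing every new document against all kept ones is quadratic in the number of documents; a large database would likely need an index such as MinHash, which is outside the scope of this sketch.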
In this embodiment, the terminal device may semantically analyze the target text, determine which legal entities of the legal knowledge graph appear in it, and generate the text vector from information such as the positions and numbers of occurrences of those entities.
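One possible reading of this step is sketched below: for each legal entity in the graph, the vector records how many times the entity occurs in the target text and the relative position of its first occurrence. The exact layout of the vector is not given in the application, so the two-values-per-entity encoding, the use of -1.0 for absent entities, and the name `text_vector` are all assumptions.

```python
def text_vector(target_text, entities):
    """Encode, per entity, [occurrence count, relative first position]."""
    text = target_text.lower()
    length = max(len(text), 1)  # avoid dividing by zero on empty input
    vector = []
    for entity in entities:
        count = text.count(entity)
        first = text.find(entity)
        # Relative position in [0, 1); -1.0 marks an entity that does not occur.
        position = first / length if first >= 0 else -1.0
        vector.extend([float(count), position])
    return vector


vector = text_vector("Trademark law protects a trademark owner",
                     ["trademark", "litigator"])
```

Plain substring matching stands in here for the semantic analysis the text describes; a real system would also need to handle synonyms and inflected forms.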
In S103, a retrieval model associated with the retrieval type is selected, and a retrieval language segment associated with the retrieval request is generated based on the retrieval model and the text vector.
In this embodiment, different retrieval types call for different retrieval rules, that is, the correspondence between the target text and the target legal documents to be retrieved differs from type to type. The terminal device therefore needs to determine the retrieval model that corresponds to the retrieval type. The text vector captures the semantics of the target text, and retrieving target legal documents from the document database requires a retrieval language segment derived from those semantics, so the text vector is imported into the retrieval model associated with the retrieval type, which outputs the associated retrieval language segment.
In a possible implementation, the terminal device may store a retrieval model library and configure an associated retrieval type for each retrieval model in it. The terminal device can then extract the corresponding retrieval model from the library according to the retrieval type of the retrieval request. The retrieval models may be downloaded from a cloud server; in this case, the cloud server can update each retrieval model at a preset interval and send the updated models to each terminal device.
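The lookup from retrieval type to retrieval model can be pictured as a small registry. In this sketch the class `ModelLibrary`, the type names, and the stand-in models are hypothetical; the real retrieval models are trained models whose internals the text does not specify.

```python
from typing import Callable, Dict, List

# A retrieval model maps a text vector to a retrieval language segment.
RetrievalModel = Callable[[List[float]], str]


class ModelLibrary:
    """Maps each retrieval type to the model configured for it."""

    def __init__(self) -> None:
        self._models: Dict[str, RetrievalModel] = {}

    def register(self, retrieval_type: str, model: RetrievalModel) -> None:
        # Registering the same type again replaces the model, e.g. after
        # an updated model has been pushed from the cloud server.
        self._models[retrieval_type] = model

    def select(self, retrieval_type: str) -> RetrievalModel:
        try:
            return self._models[retrieval_type]
        except KeyError:
            raise ValueError(
                f"no retrieval model for type {retrieval_type!r}") from None


library = ModelLibrary()
# Stand-in models that only label their output; real models would be trained.
library.register("similarity", lambda vector: "similarity-segment")
library.register("question_answer", lambda vector: "qa-segment")
segment = library.select("similarity")([2.0, 0.0])
```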
In this embodiment, the retrieval language segment contains retrieval keywords and the association relationships between them. If the segment contains several retrieval keywords, it defines the logical relationships between them, such as AND, OR, and NOT. For example, the retrieval language segment may be {criminal law AND (economic fraud OR economic crime) AND penalty}, so that legal documents whose content or document tags contain the retrieval keywords are selected from the document database as target legal documents.
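A retrieval language segment of this kind can be evaluated against a document with a small recursive matcher. The nested-tuple representation and the function `matches` are assumptions made for illustration; the application does not specify how the segment is encoded.

```python
def matches(segment, text):
    """Evaluate a keyword or an (operator, operand, ...) tuple against text."""
    if isinstance(segment, str):
        return segment in text  # a bare keyword matches by substring
    operator, *operands = segment
    if operator == "AND":
        return all(matches(op, text) for op in operands)
    if operator == "OR":
        return any(matches(op, text) for op in operands)
    if operator == "NOT":
        return not matches(operands[0], text)
    raise ValueError(f"unknown operator: {operator}")


# {criminal law AND (economic fraud OR economic crime) AND penalty}
segment = ("AND", "criminal law",
           ("OR", "economic fraud", "economic crime"),
           "penalty")

hit = matches(segment, "criminal law sets the penalty for economic fraud")
miss = matches(segment, "civil law on economic crime and penalty")
```

The second text fails because it lacks "criminal law", even though it satisfies the OR group and contains "penalty".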
In S104, the target legal documents matching the retrieval language segment are selected from the document database, and a retrieval result is generated.
In this embodiment, the terminal device may match the retrieval keywords in the retrieval language segment against the document tags and document contents of the legal documents in the document database, select the target legal documents that match the retrieval request based on the matching result, and generate the retrieval result from the document identifier and storage index of each target legal document, so that the user can download the corresponding legal documents from the result. The retrieval result may also include a summary of each target legal document so that the user can see what the document contains. Document identifiers include, but are not limited to, document names, document numbers, and other symbols that uniquely identify each document.
In a possible implementation, if the retrieval language segment contains several retrieval keywords, the terminal device may determine the display order of the legal documents in the retrieval result from the number of retrieval keywords each legal document contains: the more keywords a document matches, the earlier it is displayed. If several legal documents contain the same number of retrieval keywords, their display order can be determined by the number of times the retrieval keywords occur in each document, with documents that have more occurrences displayed earlier.
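The ordering rule described above amounts to sorting by a two-part key: distinct keywords matched first, total occurrences second. The sketch below assumes documents are held in a dictionary from identifier to text and uses plain substring counting; `rank_documents` is a hypothetical name.

```python
def rank_documents(documents, keywords):
    """Order document ids by (distinct keywords matched, total occurrences)."""

    def score(item):
        _, text = item
        matched = sum(1 for kw in keywords if kw in text)
        occurrences = sum(text.count(kw) for kw in keywords)
        return (matched, occurrences)

    # Drop documents that match no keyword at all.
    hits = [item for item in documents.items() if score(item)[0] > 0]
    return [doc_id for doc_id, _ in sorted(hits, key=score, reverse=True)]


documents = {
    "A": "penalty penalty penalty",
    "B": "criminal law penalty",
    "C": "criminal law criminal law penalty",
    "D": "unrelated text",
}
ranking = rank_documents(documents, ["criminal law", "penalty"])
```

Document A has the most raw occurrences but matches only one keyword, so it ranks below B and C, which match both.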
In one possible implementation manner, after S104, the method may further include: and acquiring an operation record fed back by the user based on the retrieval result, wherein the operation record comprises corresponding selection operation of the user from all target legal documents provided by the retrieval result, or the retrieval operation of readjusting the target text to initiate a new retrieval request. And determining the retrieval accuracy of the retrieval result based on the operation records of the user, and adjusting the retrieval model based on the retrieval accuracy, the operation records and the text vector so that the output retrieval language fragment better meets the requirement of the retrieval request. Specifically, if the operation record includes a target legal document selected by the user from the search result, the operation record may be used as a training output according to a document tag corresponding to the target legal document, and the text vector corresponding to the search request may be used as a training input to train the search model corresponding to the search request.
As can be seen from the above, in the file retrieval method provided in the embodiment of the present application, after receiving a retrieval request initiated by a user, a target text included in the retrieval request may be imported into a pre-established legal knowledge graph to obtain a text vector associated with the target text, and an associated retrieval model is determined based on a retrieval type, the text vector is imported into the retrieval model to generate a corresponding retrieval segment, and the target legal file corresponding to the retrieval request is determined by the retrieval segment to generate a retrieval result, thereby achieving the purpose of accurate retrieval of a file. Compared with the existing file retrieval technology, the method and the device have the advantages that semantic analysis is carried out on the target text through the legal knowledge map, the corresponding text vector is extracted, a user can describe the problem to be searched through a natural language without thinking the corresponding key word, and therefore the description difficulty of the retrieval problem can be reduced; on the other hand, the corresponding retrieval model is configured according to different retrieval types, so that the retrieval language segments are more accurate, the number of files with low relevancy is greatly reduced, the retrieval efficiency is improved, and the purpose of accurate retrieval is realized.
Fig. 2 shows a flowchart of a specific implementation of the method S102 for file retrieval according to the second embodiment of the present application. Referring to fig. 2, with respect to the embodiment described in fig. 1, in the method for retrieving a file provided by this embodiment, S102 includes: s1021 to S1024 are described in detail as follows:
further, the generating a text vector about the target text based on a preset legal knowledge graph comprises:
in S1021, obtaining preset partition granularity information; the partition granularity information comprises N partition levels; and N is a positive integer not less than 1.
In this embodiment, the terminal device may perform semantic analysis on the target text at a plurality of different granularities, so the terminal device is preconfigured with a plurality of granularities for dividing the text, for example chapter, section, paragraph, sentence, and language segment. The terminal device may encapsulate the granularity levels to be used to generate the above division granularity information, where the division granularity information defines at least one division level, and each division level corresponds to one division granularity.
Optionally, the division levels may be continuous division levels. In this case the division granularity information may be configured with an initial division level and a number of levels; because the levels are continuous, the remaining division levels can be determined from the division level table as the levels that follow the initial division level. For example, table 1 shows a division level table provided in an embodiment of the present application. Referring to table 1, the table includes five division levels, namely chapter, section, paragraph, sentence, and language segment. If the initial division level in the division granularity information is paragraph and the number of levels is 3, the division granularity information specifies that, when the text vector is subsequently generated, the target text is divided at three levels: paragraph, sentence, and language segment.
TABLE 1 (division level table; the original image is not reproduced here)
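As a rough sketch of how continuous division levels could be resolved from an initial level and a level count, consider the following. The list `LEVEL_TABLE` and the function name are hypothetical; the order of the five levels follows the description of table 1, with the finest level named "language segment" as in the three-level example.

```python
from typing import List

# Assumed contents of the division level table, ordered from coarse to fine.
LEVEL_TABLE: List[str] = ["chapter", "section", "paragraph", "sentence", "language segment"]


def resolve_levels(initial_level: str, level_count: int) -> List[str]:
    """Return `level_count` consecutive levels starting at `initial_level`."""
    start = LEVEL_TABLE.index(initial_level)  # raises ValueError for an unknown level
    levels = LEVEL_TABLE[start:start + level_count]
    if len(levels) < level_count:
        raise ValueError("not enough finer levels after the initial level")
    return levels
```

With an initial level of paragraph and three levels, this yields paragraph, sentence, and language segment, matching the example in the text.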
Alternatively, the division levels may be discontinuous. In this case, the division granularity information may specify the number of division levels it contains and configure a corresponding division granularity for each level, and the terminal device divides the target text according to each division granularity in the division granularity information.
In a possible implementation manner, the terminal device may perform preliminary analysis on the target text, determine a text type of the target text, and determine the corresponding division granularity information based on the text type. For example, if the target text is a legal article, and the corresponding text type is an article type, partition granularity information of 3 layers of partition levels can be configured; if the target text is a legal periodical which contains a plurality of legal articles, and the corresponding text type is a book type, the division granularity information containing 5 layers of division levels can be configured.
In S1022, based on the nth division level, dividing the target text into a plurality of class-n information segments (information segments of the nth type), and based on the legal knowledge graph, determining the text labels corresponding to each class-n information segment; the initial value of n is 1.
In this embodiment, after determining the division granularity information, the terminal device can determine the number of division levels required for the target text and divide the target text from coarse to fine granularity accordingly. The smaller the level number of a division level, the coarser the corresponding granularity; conversely, the larger the level number, the finer the granularity. For example, if the division granularity information includes three division levels, namely paragraph, sentence, and language segment, then paragraph is the first division level with the coarsest granularity, sentence is the second division level, and language segment is the third division level with the finest granularity. On this basis, the terminal device performs the division operation on the target text cyclically in the order of the division levels, and extracts the text labels corresponding to the information segments obtained at each granularity.
In this embodiment, the terminal device first divides the target text based on the first division level (the one with the smallest level number) to obtain a plurality of class-1 information segments. The attributes of each information segment match the granularity of the division level to which it belongs. For example, if the first division level is paragraph, the class-1 information segments are the paragraphs obtained by dividing the text; if the first division level is chapter, the class-1 information segments are the chapters obtained by dividing the text.
In this embodiment, after dividing the target text into a plurality of information segments, the terminal device may extract the legal knowledge tags corresponding to each information segment through the legal knowledge graph and associate those tags with the class-n information segments. One or more legal knowledge tags may be extracted for a segment. The number of legal knowledge tags associated with different information segments may be the same or different, which is not limited here.
In S1023, if n is smaller than N, identifying the class-n information segments as target texts, increasing the value of n, and returning to the step of dividing the target text into a plurality of class-n information segments based on the nth division level and determining the text labels corresponding to the class-n information segments based on the legal knowledge graph.
In this embodiment, after completing the division at the current level, the terminal device may further divide each information segment based on the next division level and extract the legal knowledge tags corresponding to the information segments at that finer granularity. Legal knowledge tags are thus extracted from top to bottom and from coarse to fine, which improves the accuracy of the text vector and helps the terminal device determine the semantics of the target text. Accordingly, the terminal device determines whether the current level number n has reached the total number of levels N: if n is less than N, the operation of S1023 is performed; if n is greater than or equal to N, the operation of S1024 is performed.
In this embodiment, when the terminal device determines that the current level number has not reached the total number of levels, it re-identifies the class-n information segments obtained by the current division as target texts and returns to the previous step to divide them, thereby obtaining class-(n+1) information segments of finer granularity and identifying the legal knowledge tags corresponding to them. These operations are performed in a loop until the legal knowledge tags of the class-N information segments of the target text have been determined.
For example, suppose n is 1 when S1023 is triggered, that is, the class-1 information segments have been obtained by division and the legal knowledge tags corresponding to each class-1 information segment have been determined. The class-1 information segments are then re-identified as target texts and S1022 is executed again. If the class-1 information segments were obtained at paragraph granularity, each paragraph is further divided at sentence granularity, yielding a plurality of class-2 information segments for each paragraph, namely the sentences belonging to that paragraph, and the corresponding legal knowledge tags are extracted for each sentence.
In S1024, if n is greater than or equal to N, the text vector is generated based on all text labels.
In this embodiment, when the terminal device detects that the current level number n is equal to the total number of levels N, the information segments of the finest granularity have been obtained, and a text vector for the target text can be generated from the text labels corresponding to all the identified information segments.
In the embodiment of the application, the label extraction is carried out on the target text based on different partition granularities, so that the text labels corresponding to different partition levels can be obtained, the text vector corresponding to the target text is formed, and the accuracy of the text vector is improved.
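The loop over S1022 to S1024 can be sketched as below. This is an illustrative approximation only: the two splitters (blank-line paragraphs, punctuation-based sentences), the toy tag set standing in for the legal knowledge graph, and the representation of the text vector as a mapping from level number to tags are all assumptions, since the source does not fix a concrete vector format.

```python
import re
from typing import Callable, Dict, List

# Hypothetical splitters, one per division level, ordered from coarse to fine.
SPLITTERS: List[Callable[[str], List[str]]] = [
    lambda text: [p for p in text.split("\n\n") if p.strip()],                # level 1: paragraphs
    lambda text: [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()],  # level 2: sentences
]

# Toy stand-in for the legal knowledge graph: a flat set of known tags.
KNOWN_TAGS = {"contract", "breach", "damages"}


def extract_tags(segment: str) -> List[str]:
    """Return the known tags that literally occur in the segment."""
    lowered = segment.lower()
    return sorted(tag for tag in KNOWN_TAGS if tag in lowered)


def build_text_vector(target_text: str) -> Dict[int, List[str]]:
    """Divide from level n = 1 up to N and collect the tags found at each level."""
    total_levels = len(SPLITTERS)  # N
    current_texts = [target_text]
    labels_by_level: Dict[int, List[str]] = {}
    n = 1
    while True:
        # S1022: divide every current target text at level n and tag the segments.
        segments = [seg for text in current_texts for seg in SPLITTERS[n - 1](text)]
        labels_by_level[n] = sorted({tag for seg in segments for tag in extract_tags(seg)})
        if n >= total_levels:
            return labels_by_level  # S1024: finest level reached
        current_texts = segments    # S1023: class-n segments become the target texts
        n += 1
```

A real system would presumably keep the tags per segment rather than merging them per level; the merge here only keeps the sketch short.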
Fig. 3 is a flowchart illustrating a specific implementation of a file retrieval method according to a third embodiment of the present application. Referring to fig. 3, relative to the embodiment described in fig. 1, in the file retrieval method provided by this embodiment, before the generating a text vector about the target text based on a preset legal knowledge graph, the method further includes S301 to S305, which are described in detail as follows:
further, before the generating a text vector about the target text based on the preset legal knowledge graph, the method further includes:
in S301, legal knowledge tags used to construct the legal knowledge graph are obtained.
In this embodiment, the legal knowledge tags may be determined through annotation by legal experts, or by performing semantic analysis on existing standard legal texts, for example by taking the text in key areas such as the title and section headings of a standard text as knowledge tags and determining the legal meaning of each knowledge tag from the text under the corresponding section heading or title.
For example, the terminal device decomposes and organizes the relevant departmental laws according to the legal theory system, summarizes the core knowledge tags of the legal provisions, and refines elements of each knowledge tag such as its meaning, characteristics, and the situations in which the law applies. Content such as legal provisions, typical cases, and mainstream academic viewpoints is then organized according to the knowledge tags.
In S302, based on all existing legal documents in the document database, determining an association relationship between the legal knowledge tags and an association type corresponding to the association relationship; the association type is used for representing the applicable scene of the association relation.
In this embodiment, the terminal device may mark each legal knowledge tag in each existing legal document and obtain the language segments in which a plurality of legal knowledge tags occur together, that is, co-occurrence segments. From a co-occurrence segment, the terminal device determines the connecting words between the two or more co-occurring legal knowledge tags and thereby determines the association relationship between those tags. In a possible implementation, the terminal device may count the number of times an association relationship occurs across all existing legal documents and determine the confidence of the association relationship based on that count. The existing legal documents include but are not limited to: legal provisions, guiding cases, academic viewpoints, and the like.
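A minimal sketch of counting co-occurring tag pairs and deriving a confidence from the counts is given below. The substring matching and the normalization of each count by the total number of pair occurrences are assumptions made for illustration; the source only states that the confidence is based on the number of occurrences.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def cooccurring_pairs(segments: List[str], tags: List[str]) -> Counter:
    """Count how often each pair of tags appears in the same language segment."""
    counts: Counter = Counter()
    for segment in segments:
        present = sorted(tag for tag in tags if tag in segment)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def pair_confidence(counts: Counter) -> Dict[Tuple[str, str], float]:
    """Assumed confidence: each pair's share of all counted co-occurrences."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in counts.items()}
```

For instance, if "lessee" and "rent" share two segments while the other pairs share one each, the pair ("lessee", "rent") receives the highest confidence.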
For example, in organizing the General Provisions of the Civil Law of the People's Republic of China, starting from the node "civil legal relationship", the knowledge tags are summarized by coverage along the chain civil subject - subject qualification - capacity for civil rights - capacity for civil rights of natural persons - from birth to death, forming superior-subordinate triple relationships, and Articles 13, 14, and 15 of the General Provisions of the Civil Law of the People's Republic of China are attached to form a complete knowledge chain.
In this embodiment, each association relationship corresponds to one association type, and the association type may be determined based on the text type of the existing legal text from which the association relationship was extracted. For example, if the legal text from which the association relationship was extracted is a statute, the association type may be the legal knowledge type; if that text is litigation-related, the association type may be the trial knowledge type.
In S303, a knowledge sub-graph of the association type is constructed based on the association relationship between all the legal knowledge tags of the same association type.
In this embodiment, after determining the association relationships of all legal knowledge tags and the association type of each association relationship, the terminal device may configure a corresponding knowledge sub-graph for each association type. Specifically, the association relationships matching the association type are selected from all association relationships, and the corresponding knowledge sub-graph is constructed from the legal knowledge tags contained in the selected association relationships. For example, the association types may include a legal knowledge type and a trial knowledge type. The terminal device may select all association relationships of the legal knowledge type and construct the knowledge sub-graph of the legal knowledge type from those relationships and the legal knowledge tags they contain; the knowledge sub-graph of the trial knowledge type can be constructed in the same way. Knowledge sub-graphs of different association types are thereby obtained, which improves the consistency of the legal knowledge tags within the knowledge graph, improves the semantic analysis capability, and improves the accuracy of subsequent file retrieval based on the legal knowledge graph.
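Grouping association relationships into one knowledge sub-graph per association type might look like the following sketch. The `Association` tuple and the representation of a sub-graph as a list of (head, relation, tail) triples are hypothetical choices, not something the source prescribes.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Association(NamedTuple):
    """Hypothetical association relationship between two legal knowledge tags."""
    head: str
    relation: str
    tail: str
    association_type: str  # e.g. "legal knowledge" or "trial knowledge"


def build_subgraphs(associations: List[Association]) -> Dict[str, List[Tuple[str, str, str]]]:
    """Collect the triples of each association type into its own sub-graph."""
    subgraphs: Dict[str, List[Tuple[str, str, str]]] = defaultdict(list)
    for assoc in associations:
        subgraphs[assoc.association_type].append((assoc.head, assoc.relation, assoc.tail))
    return dict(subgraphs)
```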
In S304, extracting the core legal tags of each standard legal text, and marking, in the knowledge sub-graph corresponding to each association type, the associated knowledge tags matching the core legal tags; the associated knowledge tags belonging to the same core legal tag correspond to the same legal entity.
In this embodiment, each standard legal text may be associated with at least one core legal tag, which defines the legal content that the standard legal text mainly describes. For example, the standard legal text "Criminal Law" contains a plurality of legal entities, but the legal knowledge it mainly describes is the legal concept "criminal law", so the legal knowledge tag "criminal law" may be identified as the core legal tag of that standard legal text. For legal texts that describe real cases, the core legal tags may be configured according to the legal provisions involved in the case. In other words, core legal tags are obtained by extracting legal concepts and/or legal rules, so as to determine the legal knowledge involved in each standard legal text.
In this embodiment, the terminal device may mark, in each knowledge sub-graph, the associated knowledge tags matching the core legal tag, where an associated knowledge tag and the core legal tag correspond to the same legal knowledge and either have the same name or are aliases of each other.
In S305, according to the associated knowledge tags belonging to the same core legal tag, establishing association relationships between the knowledge sub-graphs of the plurality of association types, and generating the legal knowledge graph.
In this embodiment, the terminal device may establish an association relationship between different knowledge sub-maps based on the associated knowledge tags of the core legal tags on the knowledge sub-maps of each associated type, thereby implementing the fusion of a plurality of knowledge sub-maps. Because the associated knowledge tags existing on different knowledge sub-maps belong to the same core legal tag, namely the corresponding legal entities are the same, the relationship between the different knowledge sub-maps can be established based on the core legal tag, and the legal knowledge map is generated.
Illustratively, fig. 4 shows a schematic diagram of a legal knowledge graph provided by an embodiment of the present application. Referring to fig. 4, the legal knowledge graph includes two knowledge sub-graphs, namely a knowledge sub-graph of the legal knowledge type and a knowledge sub-graph of the trial knowledge type. Both knowledge sub-graphs contain the core legal tag "criminal law", so an association relationship between the two sub-graphs can be established through the associated knowledge tags that correspond to the legal entity "criminal law" on each of them, thereby fusing knowledge sub-graphs of different association types.
In the embodiment of the application, different knowledge sub-maps are constructed based on different association types, and fusion between the different knowledge sub-maps is realized based on the core legal label, so that the knowledge coverage range and the knowledge combing capability of the legal knowledge map are improved, and the semantic analysis capability is improved.
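The fusion of sub-graphs through a shared core legal tag can be sketched as below. Representing each sub-graph as a set of tag names, and supplying aliases as an explicit set, are assumptions for illustration; the function only produces the cross-graph links that would bind the sub-graphs together.

```python
from typing import Dict, List, Set, Tuple


def fuse_subgraphs(subgraphs: Dict[str, Set[str]],
                   core_tag: str,
                   aliases: Set[str]) -> List[Tuple[str, str, str, str]]:
    """Return links (type_a, tag_a, type_b, tag_b) between nodes matching the core tag."""
    names = {core_tag} | aliases
    # Associated knowledge tags: nodes whose name equals the core tag or one of its aliases.
    matches = [(graph_type, tag)
               for graph_type, tags in sorted(subgraphs.items())
               for tag in sorted(tags) if tag in names]
    links = []
    for i, (type_a, tag_a) in enumerate(matches):
        for type_b, tag_b in matches[i + 1:]:
            if type_a != type_b:  # only link nodes that sit on different sub-graphs
                links.append((type_a, tag_a, type_b, tag_b))
    return links
```

In the fig. 4 scenario, the node for "criminal law" on the legal knowledge sub-graph would be linked to the matching node on the trial knowledge sub-graph.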
Fig. 5 shows a flowchart of a specific implementation of S302 of the file retrieval method according to a fourth embodiment of the present application. Referring to fig. 5, relative to the embodiment shown in fig. 3, in the file retrieval method provided by this embodiment, S302 includes S3021 to S3023, which are described in detail as follows:
further, the determining an association relationship between the legal knowledge tags and an association type corresponding to the association relationship based on all existing legal documents in the document database includes:
in S3021, semantic analysis is performed on the existing legal document, and a text type corresponding to the existing legal document is determined.
In this embodiment, the terminal device may perform semantic analysis on an existing legal document in the document database and determine the text tags corresponding to the existing legal document, where a text tag defines an outline of the text content of the existing legal document, and then determine the text type corresponding to the existing legal document based on all the configured text tags. The text types include, but are not limited to: a legal provision text type, a regulation text type, a judgment text type, an evidence material text type, and the like.
In S3022, if any language segment of the existing legal text contains a plurality of legal knowledge tags, determining the association relationship between the plurality of legal knowledge tags based on the other characters in that language segment; the other characters are the characters in the language segment other than the plurality of legal knowledge tags.
In S3023, the association type corresponding to the association relationship is determined according to the text type.
In this embodiment, in addition to determining the text type of the existing legal text, the terminal device may determine whether the existing legal text contains a plurality of the legal knowledge tags obtained above. If so, it extracts the co-occurrence segments containing a plurality of legal knowledge tags from the existing legal text, determines the association relationship between the tags based on the connecting words between them in the co-occurrence segment and the specific semantics of the segment, and configures the association type of the resulting association relationship according to the text type.
The text type determines the scenario in which an association relationship applies. If the text type is a statute, the association relationship explains the relationship that the law defines between different legal knowledge tags; it belongs to the legal knowledge level, applies to legal knowledge scenarios, and its association type may be the legal knowledge type. If the text type is case evidence, the association relationship describes an evidentiary relationship in a case; it belongs to the trial knowledge level, applies to trial scenarios, and its association type may be the trial knowledge type.
For example, the association relationships include, but are not limited to: 1. litigation claim - support relationship - basis of the right of claim; 2. facts asserted in defense - support relationship - basis of the defense; 3. facts alleged in the complaint - adversarial relationship - facts asserted in defense; 4. claimed event - proof relationship - evidence; 5. facts asserted in defense - proof relationship - evidence; 6. event - proof relationship - evidence; 7. evidence - pointing relationship - evidence review rule; 8. resolution - generating relationship - dispute focus; 9. dispute focus - generating relationship - reasons for judgment; 10. reasons for judgment - support relationship - basis of judgment (law).
In this embodiment of the application, the association type of an association relationship is determined from the text type of the existing legal file from which the relationship was extracted, so that association types are configured automatically, which raises the degree of automation of constructing the legal knowledge graph and improves construction efficiency.
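S3022 and S3023 can be illustrated with the sketch below, which takes the characters between two co-occurring tags as the connecting words and maps the text type to an association type. The mapping table, the text type names, and the purely positional extraction are assumptions; the source also relies on the semantics of the segment, which this sketch does not model.

```python
from typing import Optional, Tuple

# Assumed mapping from the text type of the source file to the association type.
ASSOCIATION_TYPE_BY_TEXT_TYPE = {
    "statute": "legal knowledge",
    "case evidence": "trial knowledge",
}


def extract_association(segment: str, tag_a: str, tag_b: str,
                        text_type: str) -> Optional[Tuple[str, str, str, str]]:
    """Return (first tag, connecting words, second tag, association type), or None."""
    pos_a, pos_b = segment.find(tag_a), segment.find(tag_b)
    if pos_a < 0 or pos_b < 0:
        return None  # the two tags do not co-occur in this segment
    if pos_a > pos_b:  # order the tags by their position in the segment
        tag_a, tag_b, pos_a, pos_b = tag_b, tag_a, pos_b, pos_a
    connecting = segment[pos_a + len(tag_a):pos_b].strip()  # the "other characters"
    association_type = ASSOCIATION_TYPE_BY_TEXT_TYPE.get(text_type, "unknown")
    return (tag_a, connecting, tag_b, association_type)
```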
Fig. 6 shows a flowchart of a specific implementation of the method S103 for file retrieval according to a fifth embodiment of the present application. Referring to fig. 6, with respect to the embodiments described in fig. 1 to 5, a method S103 for retrieving a file provided by this embodiment includes: s601 to S604 are specifically detailed as follows:
further, the selecting a retrieval model associated with the retrieval type, and generating a retrieval corpus associated with the retrieval request based on the retrieval model and the text vector includes:
in S601, if the search type is an association search, the text vector is imported into a preset keyword search model to obtain a search keyword corresponding to the target text.
In this embodiment, when the retrieval type of the retrieval request is specifically associated retrieval, the terminal device may select a keyword retrieval model corresponding to the associated retrieval, import the text vector into the keyword retrieval model, and determine a retrieval keyword corresponding to the target text, so as to select an existing legal document similar to or related to the content of the target text from the document database as the target legal document. The text vector can be used for representing semantic content of the target text, and the text vector is led into the keyword retrieval model, so that retrieval keywords related to the target text semantics can be generated, and the accuracy of the retrieval keywords is improved.
In S602, a fuzzy keyword corresponding to each search keyword is generated according to a preset fuzzy search algorithm.
In this embodiment, the terminal device may further be configured with a fuzzy search algorithm; importing each search keyword into the fuzzy search algorithm outputs fuzzy keywords that have an association relationship with that search keyword. A fuzzy keyword may be a different keyword corresponding to the same or a similar legal entity as the search keyword, or a keyword corresponding to another knowledge tag on the legal knowledge graph that is strongly associated with the legal knowledge tag of the search keyword.
In S603, a target search range of the target text is determined according to the search keyword and the fuzzy keyword.
In this embodiment, the terminal device may determine a reference retrieval range corresponding to the target text according to the association relationship between the retrieval keywords, and add an extended retrieval range corresponding to each fuzzy keyword in the reference retrieval range to obtain a target retrieval range corresponding to the target text, thereby extending the retrieval range and improving the accuracy of the retrieval operation.
In S604, the search corpus corresponding to the target search range is generated.
In this embodiment, after determining a target retrieval range corresponding to a target text, the terminal device may describe the target retrieval range through a corresponding retrieval language, and generate a corresponding retrieval language segment.
In this embodiment of the application, when an association search is performed, the search keywords can be determined from the text vector and the fuzzy keywords can be determined from the search keywords, which expands the search range and improves the accuracy of the search operation.
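Steps S602 to S604 can be sketched as follows, assuming the search keywords have already been produced by the keyword retrieval model. The lookup table standing in for the fuzzy search algorithm and the boolean `AND`/`OR` syntax of the generated retrieval language segment are hypothetical; the source does not specify the retrieval language.

```python
from typing import Dict, List

# Hypothetical related-term table standing in for the fuzzy search algorithm.
FUZZY_TABLE: Dict[str, List[str]] = {
    "lease": ["tenancy", "rental"],
    "breach": ["default"],
}


def fuzzy_keywords(keywords: List[str]) -> Dict[str, List[str]]:
    """S602: map each search keyword to its fuzzy keywords (possibly none)."""
    return {kw: FUZZY_TABLE.get(kw, []) for kw in keywords}


def build_retrieval_segment(keywords: List[str]) -> str:
    """S603/S604: widen each keyword with its fuzzy keywords and emit one query string."""
    fuzzy = fuzzy_keywords(keywords)
    groups = []
    for kw in keywords:
        terms = [kw] + fuzzy[kw]  # reference range plus extended range
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)
```

Each keyword keeps the query anchored to the target text, while the `OR` alternatives extend the range to related terms.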
Fig. 7 shows a flowchart of a specific implementation of the method S103 for retrieving a file according to a sixth embodiment of the present application. Referring to fig. 7, with respect to any one of the embodiments in fig. 1 to 5, in the method for retrieving a file provided in this embodiment, S103 includes: S701-S703 are detailed as follows:
further, the selecting a retrieval model associated with the retrieval type, and generating a retrieval corpus associated with the retrieval request based on the retrieval model and the text vector includes:
in S701, if the retrieval type is question-answer retrieval, importing the text vector into a preset question retrieval model to obtain a question list associated with the target text; at least one legal issue is included within the questioning list.
In this embodiment, when the search type of the search request is question-answer search, the terminal device may select the question retrieval model corresponding to question-answer search, import the text vector into the question retrieval model, and determine the question list corresponding to the target text, that is, determine the legal questions corresponding to the target text. Because the purpose of this search is to find answers to a legal question, the text vector first needs to be converted into a corresponding question list. If the text vector corresponds to a plurality of legal questions, the question retrieval model may output each of these legal questions together with its question confidence, and the legal questions are added to the question list in order of question confidence, where a higher question confidence indicates a stronger correlation between the text vector and the legal question.
In S702, the answer language segments and legal fields corresponding to the respective legal questions are acquired.
In this embodiment, the terminal device may pre-establish an answer speech segment corresponding to each legal question, that is, an explanation sentence for answering the legal question, and configure a corresponding legal field for each legal question, where the legal field may define a corresponding law. The terminal device can query the pre-established corresponding relation to obtain an answer speech segment corresponding to the legal question and the legal field.
In S703, the search corpus is generated based on the answer corpus and the legal domain.
In this embodiment, the terminal device may configure a corresponding search keyword based on the answer corpus and the legal field, and generate a search corpus corresponding to the target text based on the search keyword.
In the embodiment of the application, when the question and answer retrieval is carried out, the question list can be determined according to the text vector, and the answer language segments and the legal fields corresponding to all legal questions in the question list are obtained, so that the answer to the legal questions is realized, the files related to the answer are searched from the legal files to serve as the target legal files, the retrieval accuracy is improved, and different retrieval requirements are met.
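The question-answer path S701 to S703 can be illustrated roughly as below. The question bank with placeholder questions and answers, the overlap-based confidence score, and the output format are all assumptions made for this sketch; a real question retrieval model would be trained rather than table-driven.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical pre-established correspondence: question -> tags, answer segment, legal field.
QUESTION_BANK: Dict[str, Dict] = {
    "question A": {"tags": {"lease", "termination"}, "answer": "answer segment A", "field": "contract law"},
    "question B": {"tags": {"lease", "rent"}, "answer": "answer segment B", "field": "contract law"},
    "question C": {"tags": {"theft"}, "answer": "answer segment C", "field": "criminal law"},
}


def question_list(text_vector: Set[str], min_confidence: float = 0.5) -> List[Tuple[str, float]]:
    """S701: rank questions by an assumed confidence (share of question tags in the vector)."""
    scored = []
    for question, entry in QUESTION_BANK.items():
        confidence = len(entry["tags"] & text_vector) / len(entry["tags"])
        if confidence >= min_confidence:
            scored.append((question, confidence))
    return sorted(scored, key=lambda item: -item[1])


def retrieval_segments(text_vector: Set[str]) -> List[str]:
    """S702/S703: look up answer segment and legal field, then build retrieval segments."""
    return [f'{QUESTION_BANK[q]["field"]}: {QUESTION_BANK[q]["answer"]}'
            for q, _ in question_list(text_vector)]
```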
Illustratively, table 2 shows a comparative schematic table of different search types provided by an embodiment of the present application. Referring to table 2, the terminal device may determine different retrieval types according to different input target texts, and respond to corresponding retrieval requests by using different retrieval models, so as to achieve the purpose of accurate retrieval and meet different retrieval requirements.
TABLE 2 (comparison of retrieval types; the original image is not reproduced here)
Fig. 8 shows a flowchart of a specific implementation S104 of a method for file retrieval according to a seventh embodiment of the present application. Referring to fig. 8, with respect to any one of the embodiments in fig. 1 to fig. 5, in the method for retrieving a file provided in this embodiment, S104 includes: S1041-S1042, detailed as follows:
further, the selecting a target legal document matched with the search corpus from a document database and generating a search result includes:
in S1041, a file tag and a file type of each existing legal file in the file database are obtained.
In this embodiment, the file database may configure a corresponding file tag and set an associated file type for each existing legal file. The terminal equipment can acquire the information related to each existing legal document, so that the whole existing legal document does not need to be searched in full text, and the retrieval efficiency is improved.
In S1042, if the file tag and the file type are matched with the search corpus, the existing legal document is identified as a target legal document.
In this embodiment, the terminal device may determine whether the retrieval language segment contains the file tags and the file type of an existing legal file and, if so, determine whether that existing legal file is a target legal file based on how many of its file tags and file types are matched, so as to extract the target legal files from the file database.
In the embodiment of the application, the target legal document is identified by extracting the document tag and the document type of the existing legal document and matching the document tag and the document type with the retrieval language segment, so that the retrieval efficiency is improved.
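A minimal sketch of S1041 and S1042 follows: each file is represented only by its tags and type, and a file is selected when its type and enough of its tags appear in the retrieval language segment. The `LegalFile` class, the substring test, the `min_tag_hits` threshold, and the ranking by number of matched tags are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LegalFile:
    """Hypothetical metadata stored for an existing legal file."""
    name: str
    tags: List[str]
    file_type: str


def select_target_files(files: List[LegalFile], retrieval_segment: str,
                        min_tag_hits: int = 1) -> List[str]:
    """Return names of files whose type and tags match the segment, best match first."""
    segment = retrieval_segment.lower()
    ranked = []
    for f in files:
        hits = sum(1 for tag in f.tags if tag.lower() in segment)
        if f.file_type.lower() in segment and hits >= min_tag_hits:
            ranked.append((hits, f.name))
    ranked.sort(key=lambda item: (-item[0], item[1]))  # more matched tags first, then by name
    return [name for _, name in ranked]
```

Because only tags and types are compared, the body of each file never has to be scanned, which is the efficiency gain the text describes.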
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 9 shows a block diagram of a file retrieval apparatus according to an embodiment of the present application, where the file retrieval apparatus includes units for executing steps in the corresponding embodiment of fig. 1. Please refer to fig. 9 and fig. 1 for a related description of the embodiment. For convenience of explanation, only the portions related to the present embodiment are shown.
Referring to fig. 9, the apparatus for file retrieval includes:
a retrieval request receiving unit 91 for receiving a retrieval request; the retrieval request comprises a target text and a retrieval type;
a text vector generating unit 92, configured to generate a text vector for the target text based on a preset legal knowledge graph;
a retrieval language segment generating unit 93, configured to select a retrieval model associated with the retrieval type, and generate a retrieval language segment associated with the retrieval request based on the retrieval model and the text vector;
and a retrieval result output unit 94, configured to select the target legal document matched with the retrieval language segment from the document database and generate a retrieval result.
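As a rough sketch of how the four units might be chained, the following wires three caller-supplied functions together and dispatches on the retrieval type. The function make_retriever, the type names "associated" and "question_answer", and the toy stand-ins for each unit are hypothetical and serve only to show the data flow from unit 91 to unit 94.

```python
from typing import Callable, Dict, List


def make_retriever(
    vectorize: Callable[[str], List[str]],
    models: Dict[str, Callable[[List[str]], str]],
    select: Callable[[str], List[str]],
) -> Callable[[str, str], List[str]]:
    """Compose the units: request (91) -> vector (92) -> segment (93) -> result (94)."""

    def retrieve(target_text: str, retrieval_type: str) -> List[str]:
        vector = vectorize(target_text)            # text vector generating unit
        if retrieval_type not in models:
            raise ValueError(f"unsupported retrieval type: {retrieval_type}")
        segment = models[retrieval_type](vector)   # retrieval model chosen by type
        return select(segment)                     # retrieval result output unit

    return retrieve


# Toy stand-ins for each unit, for demonstration only.
retrieve = make_retriever(
    vectorize=lambda text: text.lower().split(),
    models={
        "associated": lambda vec: " ".join(vec),
        "question_answer": lambda vec: "answer: " + " ".join(vec),
    },
    select=lambda segment: [
        doc for doc in ("overtime pay rules", "trademark guide") if segment in doc
    ],
)
print(retrieve("Overtime pay", "associated"))
```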
Optionally, the text vector generating unit 92 includes:
a division granularity information acquisition unit, configured to acquire preset division granularity information; the division granularity information comprises N division levels; N is a positive integer not less than 1;
a text label acquisition unit, configured to divide the target text into a plurality of type-n information segments based on the nth division level, and determine the text label corresponding to each type-n information segment based on the legal knowledge graph; the initial value of n is 1;
a cycle triggering unit, configured to, if n is smaller than N, identify the type-n information segments as the target text, increase the value of n, and return to the operation of dividing the target text into a plurality of type-n information segments based on the nth division level and determining the text label corresponding to each type-n information segment based on the legal knowledge graph;
and a text label packaging unit, configured to generate the text vector based on all the text labels if n is greater than or equal to N.
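The level-by-level division performed by these units can be sketched as a loop over n = 1..N. In the sketch below, the two division levels (paragraphs, then sentences), the LABELS lookup table standing in for the legal knowledge graph, and the count-based text vector are all assumptions chosen for illustration; the embodiment does not fix any of them.

```python
import re

# Hypothetical division levels: level 1 splits into paragraphs, level 2 into sentences.
SPLITTERS = [
    lambda text: [p for p in text.split("\n") if p.strip()],
    lambda text: [s.strip() for s in re.split(r"[.;]", text) if s.strip()],
]

# Hypothetical stand-in for the legal knowledge graph: trigger phrase -> text label.
LABELS = {"contract": "CONTRACT_LAW", "overtime": "LABOR_LAW", "trademark": "IP_LAW"}


def label_segment(segment):
    """Return the labels whose trigger phrase occurs in the segment."""
    lowered = segment.lower()
    return [label for phrase, label in LABELS.items() if phrase in lowered]


def text_labels(target_text, splitters=SPLITTERS):
    """Divide level by level (n = 1..N), collecting labels at every level."""
    labels = []
    current = [target_text]              # texts to be divided at level n
    n, total_levels = 1, len(splitters)  # total_levels plays the role of N
    while True:
        segments = [seg for text in current for seg in splitters[n - 1](text)]
        for seg in segments:
            labels.extend(label_segment(seg))
        if n >= total_levels:
            break
        current = segments               # type-n segments become the next target text
        n += 1
    return labels


def text_vector(target_text):
    """Assumed encoding: one count per label, in sorted label order."""
    vocab = sorted(set(LABELS.values()))
    labels = text_labels(target_text)
    return [labels.count(v) for v in vocab]


sample = "The contract was breached.\nOvertime was unpaid; the contract says otherwise."
print(text_vector(sample))
```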
Optionally, the apparatus for file retrieval further includes:
a legal knowledge tag acquisition unit, configured to acquire the legal knowledge tags used for constructing the legal knowledge graph;
an association relation acquisition unit, configured to determine the association relations between the legal knowledge tags, and the association type corresponding to each association relation, based on all existing legal documents in the document database;
a knowledge sub-graph establishing unit, configured to establish a knowledge sub-graph for each association type based on the association relations among all the legal knowledge tags of that association type;
an associated knowledge tag identification unit, configured to extract the core legal tags of each standard legal text and mark, in the knowledge sub-graph corresponding to each association type, the associated knowledge tags matched with the core legal tags;
and a knowledge sub-graph fusion unit, configured to establish association relations among the knowledge sub-graphs of the plurality of association types according to the associated knowledge tags belonging to the same core legal tag, and generate the legal knowledge graph.
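The construction and fusion of the knowledge sub-graphs can be sketched as follows, under simplifying assumptions: association relations are given as (tag, tag, association type) triples, each sub-graph is an adjacency map, and the hypothetical core_of table records which core legal tag an associated knowledge tag belongs to. None of these data structures is prescribed by the embodiment.

```python
from collections import defaultdict

# Hypothetical association triples: (tag_a, tag_b, association_type).
relations = [
    ("labor contract", "overtime pay", "statute"),
    ("labor contract", "severance", "statute"),
    ("employment agreement", "unpaid wages", "case"),
]

# Hypothetical mapping from an associated knowledge tag to its core legal tag.
core_of = {"labor contract": "LABOR_CONTRACT", "employment agreement": "LABOR_CONTRACT"}


def build_subgraphs(relations):
    """Build one undirected adjacency map per association type."""
    subgraphs = defaultdict(lambda: defaultdict(set))
    for a, b, assoc_type in relations:
        subgraphs[assoc_type][a].add(b)
        subgraphs[assoc_type][b].add(a)
    return subgraphs


def fuse(subgraphs, core_of):
    """Link nodes of different sub-graphs that share the same core legal tag."""
    by_core = defaultdict(set)
    for assoc_type, graph in subgraphs.items():
        for tag in graph:
            if tag in core_of:
                by_core[core_of[tag]].add((assoc_type, tag))
    cross_edges = set()
    for nodes in by_core.values():
        ordered = sorted(nodes)
        for i, left in enumerate(ordered):
            for right in ordered[i + 1:]:
                if left[0] != right[0]:      # only connect across sub-graphs
                    cross_edges.add((left, right))
    return cross_edges


subgraphs = build_subgraphs(relations)
print(fuse(subgraphs, core_of))
```

The cross edges returned by fuse are what tie the per-type sub-graphs into a single graph in this sketch.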
Optionally, the association relationship obtaining unit includes:
a text type identification unit, configured to perform semantic analysis on the existing legal document and determine the text type corresponding to the existing legal document;
and an association type determining unit, configured to, if the existing legal document contains a plurality of legal knowledge tags, configure the association relations among the legal knowledge tags, and the association type corresponding to each association relation, based on the text type of the existing legal document.
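One possible reading of this step, following the wording of claim 4, is sketched below: when a language segment contains several legal knowledge tags, the characters left over after removing the tags describe the relation, and the text type decides the association type. The TEXT_TYPE_TO_ASSOCIATION table and the plain substring matching are assumptions; the semantic analysis itself is not modeled.

```python
from itertools import combinations

# Hypothetical mapping from text type to association type.
TEXT_TYPE_TO_ASSOCIATION = {"statute": "statutory", "judgment": "case-based"}


def extract_associations(segment, text_type, known_tags):
    """Return (tag_a, tag_b, relation, association_type) tuples for one segment."""
    present = [tag for tag in known_tags if tag in segment]
    if len(present) < 2:
        return []                       # a relation needs at least two tags
    remainder = segment
    for tag in present:
        remainder = remainder.replace(tag, " ")
    relation = " ".join(remainder.split())   # the "other characters" of the segment
    assoc_type = TEXT_TYPE_TO_ASSOCIATION.get(text_type, "unknown")
    return [(a, b, relation, assoc_type) for a, b in combinations(present, 2)]


tags = ["labor contract", "overtime pay"]
print(extract_associations("overtime pay arises from labor contract", "statute", tags))
```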
Optionally, the retrieval language segment generating unit 93 includes:
a retrieval keyword extraction unit, configured to, if the retrieval type is associated retrieval, import the text vector into a preset keyword retrieval model to obtain the retrieval keywords corresponding to the target text;
a fuzzy keyword determining unit, configured to generate fuzzy keywords corresponding to the retrieval keywords according to a preset fuzzy search algorithm;
a target retrieval range determining unit, configured to determine a target retrieval range of the target text according to the retrieval keywords and the fuzzy keywords;
and a first retrieval language segment configuration unit, configured to generate the retrieval language segment corresponding to the target retrieval range.
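A minimal sketch of the associated-retrieval branch is given below. The preset fuzzy search algorithm is not named in the embodiment, so Python's difflib.get_close_matches is swapped in purely as a stand-in; the VOCABULARY list, the 0.7 cutoff, and the OR-joined output format are likewise assumptions. The keyword retrieval model is not modeled, and the retrieval keywords are taken as given.

```python
import difflib

# Hypothetical vocabulary of indexed terms in the document database.
VOCABULARY = ["overtime pay", "overtime wages", "severance pay", "trademark"]


def fuzzy_keywords(keywords, vocabulary=VOCABULARY, cutoff=0.7):
    """Return vocabulary terms that are close to, but not equal to, a retrieval keyword."""
    fuzzy = []
    for kw in keywords:
        for match in difflib.get_close_matches(kw, vocabulary, n=3, cutoff=cutoff):
            if match not in keywords and match not in fuzzy:
                fuzzy.append(match)
    return fuzzy


def retrieval_segment(keywords):
    """Assumed format: the retrieval range is the OR of exact and fuzzy keywords."""
    scope = list(keywords) + fuzzy_keywords(keywords)
    return " OR ".join(f'"{term}"' for term in scope)


print(retrieval_segment(["overtime pay"]))
```

Raising the cutoff narrows the target retrieval range; lowering it admits looser matches at the cost of more low-relevance results.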
Optionally, the retrieval language segment generating unit 93 includes:
a question list determining unit, configured to, if the retrieval type is question-answer retrieval, import the text vector into a preset question retrieval model to obtain a question list associated with the target text; the question list contains at least one legal question;
a legal question answering unit, configured to acquire the answer language segment and the legal field corresponding to each legal question;
and a second retrieval language segment configuration unit, configured to generate the retrieval language segment based on the answer language segments and the legal fields.
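The question-answer branch can be sketched in the same spirit. The QA_TABLE lookup, the substring-based stand-in for the question retrieval model, and the bracketed output format are hypothetical; a real question retrieval model would replace question_list.

```python
# Hypothetical table: legal question -> (answer language segment, legal field).
QA_TABLE = {
    "Is overtime pay mandatory?": ("employer shall pay overtime wages", "labor law"),
    "Can a trademark be transferred?": (
        "a registered trademark may be assigned",
        "intellectual property law",
    ),
}


def question_list(labels):
    """Stand-in for the question retrieval model: keep questions mentioning any label."""
    return [q for q in QA_TABLE if any(label in q.lower() for label in labels)]


def qa_retrieval_segment(labels):
    """Combine each answer segment with its legal field into one retrieval segment."""
    parts = []
    for question in question_list(labels):
        answer, legal_field = QA_TABLE[question]
        parts.append(f"[{legal_field}] {answer}")
    return "; ".join(parts)


print(qa_retrieval_segment(["overtime"]))
```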
Optionally, the retrieval result output unit 94 includes:
an existing legal document information acquisition unit, configured to acquire the document tag and the document type of each existing legal document in the document database;
and a target legal document selecting unit, configured to identify the existing legal document as the target legal document if the document tag and the document type match the retrieval language segment.
Therefore, the file retrieval apparatus provided by this embodiment of the application can perform semantic analysis on the target text through the legal knowledge graph and extract the corresponding text vector, so that a user can describe the problem to be searched in natural language without having to work out the corresponding keywords, which makes retrieval queries easier to formulate. In addition, a corresponding retrieval model is configured for each retrieval type, so that the retrieval language segments are more accurate, the number of low-relevance files returned is greatly reduced, retrieval efficiency is improved, and accurate retrieval is achieved.
Fig. 10 is a schematic diagram of a terminal device according to another embodiment of the present application. As shown in fig. 10, the terminal device 10 of this embodiment includes: a processor 100, a memory 101, and a computer program 102, such as a file retrieval program, stored in the memory 101 and executable on the processor 100. The processor 100 executes the computer program 102 to implement the steps in the above embodiments of the file retrieval method, such as S101 to S105 shown in fig. 1. Alternatively, the processor 100, when executing the computer program 102, implements the functions of the units in the above apparatus embodiments, such as the functions of the units 91 to 94 shown in fig. 9.
Illustratively, the computer program 102 may be divided into one or more units, which are stored in the memory 101 and executed by the processor 100 to implement the present application. The one or more units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 102 in the terminal device 10. For example, the computer program 102 may be divided into a retrieval request receiving unit, a text vector generating unit, a retrieval language segment generating unit, and a retrieval result output unit, each of which functions as described above.
The terminal device 10 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 100 and the memory 101. Those skilled in the art will appreciate that fig. 10 is merely an example of the terminal device 10 and does not constitute a limitation on it; the terminal device 10 may include more or fewer components than shown, combine some components, or use different components. For example, the terminal device may also include input-output devices, network access devices, buses, and the like.
The processor 100 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 101 may be an internal storage unit of the terminal device 10, such as a hard disk or an internal memory of the terminal device 10. The memory 101 may also be an external storage device of the terminal device 10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card provided on the terminal device 10. Further, the memory 101 may include both an internal storage unit and an external storage device of the terminal device 10. The memory 101 is used for storing the computer program and other programs and data required by the terminal device. The memory 101 may also be used to temporarily store data that has been output or is to be output.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method of document retrieval, comprising:
receiving a retrieval request; the retrieval request comprises a target text and a retrieval type;
generating a text vector about the target text based on a preset legal knowledge graph;
selecting a retrieval model associated with the retrieval type, and generating a retrieval language segment associated with the retrieval request based on the retrieval model and the text vector;
and selecting the target legal document matched with the retrieval language segment from a document database to generate a retrieval result.
2. The method of claim 1, wherein generating a text vector for the target text based on a preset legal knowledge graph comprises:
acquiring preset division granularity information; the division granularity information comprises N division levels; N is a positive integer not less than 1;
dividing the target text into a plurality of type-n information segments based on the nth division level, and determining the text label corresponding to each type-n information segment based on the legal knowledge graph; the initial value of n is 1;
if n is smaller than N, identifying the type-n information segments as the target text, increasing the value of n, and returning to the step of dividing the target text into a plurality of type-n information segments based on the nth division level and determining the text label corresponding to each type-n information segment based on the legal knowledge graph;
and if n is greater than or equal to N, generating the text vector based on all the text labels.
3. The method of claim 1, further comprising, before generating the text vector for the target text based on the preset legal knowledge graph, the steps of:
acquiring legal knowledge tags for constructing the legal knowledge graph;
determining association relations among the legal knowledge tags and association types corresponding to the association relations based on all existing legal documents in the document database; the association type is used for representing an applicable scene of the association relation;
constructing a knowledge sub-graph of the association type based on the association relationship among all the legal knowledge tags of the same association type;
extracting core legal tags of each standard legal text, and marking, in the knowledge sub-graph corresponding to each association type, the associated knowledge tags matched with the core legal tags;
establishing association relations among the knowledge sub-graphs of a plurality of association types according to the associated knowledge tags belonging to the same core legal tag, and generating the legal knowledge graph; the legal entities corresponding to the associated knowledge tags belonging to the same core legal tag are the same.
4. The method of claim 3, wherein determining the association between legal knowledge tags and the association type corresponding to the association based on all existing legal documents in the document database comprises:
performing semantic analysis on an existing legal document, and determining a text type corresponding to the existing legal document;
if any language segment of the existing legal text contains a plurality of legal knowledge tags, determining the association relationship among the legal knowledge tags based on other characters in the language segment of the existing legal text; the other characters are characters except the plurality of legal knowledge tags in the word segment of the existing legal text;
and determining the association type corresponding to the association relation according to the text type.
5. The method according to any one of claims 1 to 4, wherein the selecting a retrieval model associated with the retrieval type and generating a retrieval language segment associated with the retrieval request based on the retrieval model and the text vector comprises:
if the retrieval type is associated retrieval, importing the text vector into a preset keyword retrieval model to obtain a retrieval keyword corresponding to the target text;
generating fuzzy keywords corresponding to the retrieval keywords according to a preset fuzzy search algorithm;
determining a target retrieval range of the target text according to the retrieval keywords and the fuzzy keywords;
and generating the retrieval language segment corresponding to the target retrieval range.
6. The method according to any one of claims 1 to 4, wherein the selecting a retrieval model associated with the retrieval type and generating a retrieval language segment associated with the retrieval request based on the retrieval model and the text vector comprises:
if the retrieval type is question-answer retrieval, importing the text vector into a preset question retrieval model to obtain a question list associated with the target text; at least one legal question is contained in the question list;
acquiring answer language segments and legal fields corresponding to the legal questions;
and generating the retrieval language segment based on the answer language segment and the legal field.
7. The method according to any one of claims 1 to 4, wherein the selecting the target legal document matched with the retrieval language segment from the document database and generating the retrieval result comprises:
acquiring the file tag and the file type of each existing legal file in the document database;
and if the file tag and the file type are matched with the retrieval language segment, identifying the existing legal file as a target legal file.
8. An apparatus for document retrieval, comprising:
a retrieval request receiving unit for receiving a retrieval request; the retrieval request comprises a target text and a retrieval type;
a text vector generating unit, configured to generate a text vector for the target text based on a preset legal knowledge graph;
a retrieval language segment generating unit, configured to select a retrieval model associated with the retrieval type, and generate a retrieval language segment associated with the retrieval request based on the retrieval model and the text vector;
and a retrieval result output unit, configured to select the target legal document matched with the retrieval language segment from the document database and generate a retrieval result.
9. A terminal device, characterized in that the terminal device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011010296.7A CN112148702B (en) 2020-09-23 2020-09-23 File retrieval method and device

Publications (2)

Publication Number Publication Date
CN112148702A (en) 2020-12-29
CN112148702B (en) 2024-06-21

Family

ID=73896256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011010296.7A Active CN112148702B (en) 2020-09-23 2020-09-23 File retrieval method and device

Country Status (1)

Country Link
CN (1) CN112148702B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334178A (en) * 2019-03-28 2019-10-15 平安科技(深圳)有限公司 Data retrieval method, device, equipment and readable storage medium storing program for executing
CN111143521A (en) * 2019-10-28 2020-05-12 广州恒巨信息科技有限公司 Method, system and device for retrieving legal items based on knowledge graph and storage medium
CN111241241A (en) * 2020-01-08 2020-06-05 平安科技(深圳)有限公司 Case retrieval method, device and equipment based on knowledge graph and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113190692A (en) * 2021-05-28 2021-07-30 山东顺势教育科技有限公司 Self-adaptive retrieval method, system and device for knowledge graph
CN113190692B (en) * 2021-05-28 2022-06-24 山东顺势教育科技有限公司 Self-adaptive retrieval method, system and device for knowledge graph
CN113590845A (en) * 2021-08-09 2021-11-02 平安国际智慧城市科技股份有限公司 Knowledge graph-based document retrieval method and device, electronic equipment and medium
CN113590845B (en) * 2021-08-09 2024-06-25 深圳平安智慧医健科技有限公司 Knowledge graph-based document retrieval method and device, electronic equipment and medium
CN113779230A (en) * 2021-09-15 2021-12-10 广州网律互联网科技有限公司 Law recommendation method, system and equipment based on law understanding
CN113779230B (en) * 2021-09-15 2024-03-19 广州网律互联网科技有限公司 Legal recommendation method, system and equipment based on legal understanding
CN117725235A (en) * 2023-12-25 2024-03-19 武汉百智诚远科技有限公司 Legal knowledge enhancement retrieval system and method based on artificial intelligence algorithm
CN117725235B (en) * 2023-12-25 2024-04-30 武汉百智诚远科技有限公司 Legal knowledge enhancement retrieval system and method based on artificial intelligence algorithm

Also Published As

Publication number Publication date
CN112148702B (en) 2024-06-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant