CN117093604B

CN117093604B - Search information generation method, apparatus, electronic device, and computer-readable medium

Info

Publication number: CN117093604B
Application number: CN202311362817.9A
Authority: CN
Inventors: 梁文杰; 苑博文; 徐崚峰; 陈辉华; 刘殿兴; 岳丰; 方兴; 王伟; 代慧明; 张俊灵; 夏熙城; 孙少卿; 蒋文东; 张泉运
Original assignee: Citic Securities Co ltd
Current assignee: Citic Securities Co ltd
Priority date: 2023-10-20
Filing date: 2023-10-20
Publication date: 2024-02-02
Anticipated expiration: 2043-10-20
Also published as: CN117093604A

Abstract

Embodiments of the present disclosure disclose a search information generation method, apparatus, electronic device, and computer-readable medium. One embodiment of the method includes the following steps: acquiring a target search statement; performing sentence recognition on the target search statement to generate a sentence recognition tag set; determining graph nodes in a pre-constructed graph database that are associated with the sentence recognition tags in the sentence recognition tag set, so as to generate associated node information; performing fine-grained recall on the associated node information in the associated node information set according to the sentence recognition tag set; determining regulation search information corresponding to the target search statement according to the recalled associated node information sequence; and sending the regulation search information to an information display client for display. This implementation enables effective retrieval of regulation content, that is, it generates accurate regulation search information, reduces the number of searches, and thereby indirectly reduces server resource and communication costs.

Description

Search information generation method, apparatus, electronic device, and computer-readable medium

Technical Field

Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a search information generation method, apparatus, electronic device, and computer-readable medium.

Background

Regulatory information refers to normative legal documents, such as laws and regulations. In the securities and finance field, many kinds of regulatory information govern business conduct, so efficient and accurate retrieval across these kinds of regulatory information is particularly important. At present, regulatory information is generally searched by fuzzy matching on the regulation name.

However, this approach typically has the following technical problems:

First, fuzzy search by regulation name can only find regulatory information whose name matches the query and cannot effectively search the content of regulations. When matching regulation content is not found, users search repeatedly, which increases the number of searches and indirectly increases server resource and communication costs;

Second, different items of regulatory information may have membership and reference relationships. Fuzzy search by regulation name cannot effectively retrieve the full set of regulatory information linked by such relationships, which may increase the difficulty and number of searches and indirectly increase server resource and communication costs;

Third, besides the membership and reference relationships among items of regulatory information, the applicable scenarios and applicable objects of regulatory information may also have membership and reference relationships. When a conventional relational database is used to store such data, data maintenance and retrieval costs increase.

The above information disclosed in this background section is only for enhancement of understanding of the background of the inventive concept and, therefore, may contain information that does not form the prior art that is already known to those of ordinary skill in the art in this country.

Disclosure of Invention

The disclosure is in part intended to introduce concepts in a simplified form that are further described below in the detailed description. The disclosure is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Some embodiments of the present disclosure propose a search information generation method, apparatus, electronic device, and computer readable medium to solve one or more of the technical problems mentioned in the background section above.

In a first aspect, some embodiments of the present disclosure provide a search information generation method, including: obtaining a target search statement, where the target search statement is a search statement for regulatory information; performing sentence recognition on the target search statement to generate a sentence recognition tag set; determining graph nodes in a pre-constructed graph database that are associated with the sentence recognition tags in the sentence recognition tag set, so as to generate associated node information and obtain an associated node information set; performing fine-grained recall on the associated node information in the associated node information set according to the sentence recognition tag set to obtain a recalled associated node information sequence; determining regulation search information corresponding to the target search statement according to the recalled associated node information sequence; and sending the regulation search information to an information display client for display, where the information display client is the client that sent the target search statement.

In a second aspect, some embodiments of the present disclosure provide a search information generation apparatus, including: an acquisition unit configured to acquire a target search statement, where the target search statement is a search statement for regulatory information; a sentence recognition unit configured to perform sentence recognition on the target search statement to generate a sentence recognition tag set; a first determining unit configured to determine graph nodes in a pre-constructed graph database that are associated with the sentence recognition tags in the sentence recognition tag set, so as to generate associated node information and obtain an associated node information set; a fine-grained recall unit configured to perform fine-grained recall on the associated node information in the associated node information set according to the sentence recognition tag set to obtain a recalled associated node information sequence; a second determining unit configured to determine regulation search information corresponding to the target search statement according to the recalled associated node information sequence; and a sending unit configured to send the regulation search information to an information display client for display, where the information display client is the client that sent the target search statement.

In a third aspect, some embodiments of the present disclosure provide an electronic device including: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect above.

In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect above.

The above embodiments of the present disclosure have the following beneficial effects: the search information generation method of some embodiments of the present disclosure enables effective retrieval of regulation content, that is, it generates accurate regulation search information, reduces the number of searches, and indirectly reduces server resource and communication costs. Specifically, regulation content cannot be searched efficiently, and the number of searches and server resource and communication costs increase, because fuzzy search by regulation name can only find regulatory information whose name matches the query and cannot effectively search regulation content; when matching regulation content is not found, users search again, which increases the number of searches and indirectly increases server resource and communication costs. To address this, the search information generation method of some embodiments of the present disclosure first acquires a target search statement, where the target search statement is a search statement for regulatory information. Second, sentence recognition is performed on the target search statement to generate a sentence recognition tag set, which removes invalid content from the statement and extracts the valid information. Next, graph nodes in the pre-constructed graph database that are associated with the sentence recognition tags in the sentence recognition tag set are determined so as to generate associated node information and obtain an associated node information set; that is, the associated node information related to the sentence recognition tags is obtained from the graph database. In practice, a graph database stores the relationships between items of information better than a traditional relational database does, which benefits relationship-based retrieval.
Then, fine-grained recall is performed on the associated node information in the associated node information set according to the sentence recognition tag set to obtain a recalled associated node information sequence. In practice, the associated node information set often contains a large number of recalled items, which limits recall precision; fine-grained recall removes associated node information with a low degree of association, improving precision by reducing the recall volume. Regulation search information corresponding to the target search statement is then determined according to the recalled associated node information sequence, that is, the regulation content corresponding to the recalled associated node information is obtained to generate the regulation search information. Finally, the regulation search information is sent to an information display client for display, where the information display client is the client that sent the target search statement. In this way, regulation content is searched effectively, that is, accurate regulation search information is generated, the number of searches is reduced, and server resource and communication costs are indirectly reduced.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.

FIG. 1 is a flowchart of some embodiments of a search information generation method according to the present disclosure;

FIG. 2 is a schematic structural diagram of some embodiments of a search information generation apparatus according to the present disclosure;

FIG. 3 is a schematic structural diagram of an electronic device suitable for implementing some embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.

It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.

It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.

It should be noted that the modifiers "a/one" and "a plurality of" used in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will understand that they should be read as "one or more" unless the context clearly indicates otherwise.

The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Referring to FIG. 1, a flow 100 of some embodiments of a search information generation method according to the present disclosure is shown. The search information generation method includes the following steps:

Step 101, obtaining a target search statement.

In some embodiments, the execution subject (e.g., a computing device) of the search information generation method may obtain the target search statement through a wired or wireless connection. The target search statement may be a search statement for regulatory information. In practice, regulatory information may refer to normative legal documents, for example, laws and regulations. The target search statement may be a natural-language statement, for example, "What requirements do central enterprises have for equity incentive objects?". In practice, the execution subject may acquire the target search statement entered by a user through a question-answering interface.

It should be noted that the wireless connection may include, but is not limited to, 3G/4G/5G, WiFi, Bluetooth, WiMAX, ZigBee, UWB (ultra-wideband), and other wireless connections now known or developed in the future.

The computing device may be hardware or software. When the computing device is hardware, it may be implemented as a distributed cluster of multiple servers or terminal devices, or as a single server or terminal device. When the computing device is software, it may be installed in the hardware devices listed above, and may be implemented as multiple pieces of software or software modules (for example, for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here. It should be understood that there may be any number of computing devices, as required by the implementation.

Step 102, performing sentence recognition on the target search statement to generate a sentence recognition tag set.

In some embodiments, the execution subject may perform sentence recognition on the target search statement to generate a sentence recognition tag set. Each sentence recognition tag in the set may be a search tag used for searching regulatory information.

As an example, the execution subject may perform sentence recognition on the target search statement through a tag recognition model to generate the sentence recognition tag set. Because the length of the target search statement is not fixed, the tag recognition model may use a recurrent neural network as its backbone network to extract sentence features of the target search statement. The tag recognition model may further include a tag classification model, which may be a fully connected layer; the execution subject may input the sentence features into the tag classification model to generate the sentence recognition tag set. For example, if the target search statement is "What requirements does a central state-owned enterprise have for equity incentive objects?", the resulting sentence recognition tag set may be ["central state-owned enterprise", "equity incentive", "incentive object"].
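The tag recognition model above can be sketched as follows. This is a minimal illustration, not the disclosed model: it assumes a plain Elman RNN as the recurrent backbone (the text only says "recurrent neural network"), uses the final hidden state as the sentence feature, and treats the fully connected tag classification layer as a multi-label classifier with a per-tag sigmoid and a hypothetical 0.5 threshold. The functions `rnn_encode` and `classify_tags`, the dimensions, and the random weights are all hypothetical; a real model would be trained and would use learned token embeddings.

```python
import math
import random


def rnn_encode(token_vectors, w_xh, w_hh, b_h):
    """Elman RNN backbone: fold a variable-length sequence into its final hidden state."""
    hidden_size = len(b_h)
    h = [0.0] * hidden_size
    for x in token_vectors:
        # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the comprehension reads the previous h.
        h = [math.tanh(sum(w * xi for w, xi in zip(w_xh[j], x))
                       + sum(w * hk for w, hk in zip(w_hh[j], h))
                       + b_h[j])
             for j in range(hidden_size)]
    return h


def classify_tags(feature, w_out, b_out, tag_names, threshold=0.5):
    """Tag classification model: one fully connected layer with a sigmoid per tag (multi-label)."""
    tags = []
    for name, row, bias in zip(tag_names, w_out, b_out):
        logit = sum(w * f for w, f in zip(row, feature)) + bias
        if 1.0 / (1.0 + math.exp(-logit)) > threshold:
            tags.append(name)
    return tags


def rand_matrix(rows, cols):
    """Random weights for illustration only (an untrained model)."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Usage with untrained weights: the printed tags are meaningless until the model is trained.
random.seed(0)
EMB, HID = 4, 3
TAGS = ["central state-owned enterprise", "equity incentive", "incentive object"]
statement = rand_matrix(7, EMB)  # stand-in for the embeddings of 7 tokens
feature = rnn_encode(statement, rand_matrix(HID, EMB), rand_matrix(HID, HID), [0.0] * HID)
print(classify_tags(feature, rand_matrix(len(TAGS), HID), [0.0] * len(TAGS), TAGS))
```

Because the recurrence consumes one token at a time, the same weights handle statements of any length, which is the reason the text gives for using a recurrent backbone.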

In some optional implementations of some embodiments, performing sentence recognition on the target search statement to generate the sentence recognition tag set may include the following steps:

Step one, performing fine-grained word segmentation on the target search statement to obtain a candidate word sequence.

The length (in characters) of each candidate word in the candidate word sequence may be less than or equal to a target length, for example, 3. In practice, the execution subject may perform fine-grained word segmentation on the target search statement using JIEBA word segmentation, restricting the length of the words in the vocabulary used by JIEBA so that every candidate word in the resulting candidate word sequence is no longer than the target length.

As an example, if the target search statement is "What requirements do central enterprises have for equity incentive objects?", the resulting candidate word sequence may be ["central enterprise", "for", "equity", "incentive", "object", "have", "what", "requirement", "?"].
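The length-restricted segmentation in step one can be illustrated with a short sketch. JIEBA itself builds a word graph from its prefix dictionary and uses an HMM for unknown words; to stay self-contained, this sketch swaps in simple forward maximum matching, which is a different segmentation algorithm. It keeps the idea described above: vocabulary entries longer than the target length (3 characters here) are discarded, so "股权激励" (equity incentive) can only come out as "股权" + "激励". The vocabulary and the function name `fine_grained_segment` are hypothetical.

```python
def fine_grained_segment(sentence, vocabulary, target_length=3):
    """Forward maximum matching restricted to words of length <= target_length.

    Vocabulary entries longer than target_length are dropped first, mirroring the
    idea of restricting the segmenter's word list. Characters not covered by any
    remaining vocabulary word fall back to single-character tokens.
    """
    vocab = {w for w in vocabulary if len(w) <= target_length}
    words, i = [], 0
    while i < len(sentence):
        # Try the longest allowed piece first, then shorter ones.
        for size in range(min(target_length, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


# "What requirements do central enterprises have for equity incentive objects?"
vocabulary = {"央企", "股权", "激励", "对象", "什么", "要求", "股权激励"}
print(fine_grained_segment("央企对股权激励对象有什么要求？", vocabulary))
# ['央企', '对', '股权', '激励', '对象', '有', '什么', '要求', '？']
```

The nine tokens correspond one-to-one to the English candidate word sequence in the example above.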

Step two, performing stop word filtering on the candidate word sequence according to a stop word list to obtain a filtered word sequence.

The stop word list may be a pre-maintained list of stop words, which may include, for example, punctuation marks. In practice, for each candidate word in the candidate word sequence, the execution subject may determine whether the candidate word is in the stop word list and, if it is, remove it from the candidate word sequence.

As an example, for the candidate word sequence ["central enterprise", "for", "equity", "incentive", "object", "have", "what", "requirement", "?"], the filtered word sequence may be ["central enterprise", "for", "equity", "incentive", "object", "have", "what", "requirement"].

Step three, determining part-of-speech information for each filtered word in the filtered word sequence.

The part-of-speech information characterizes the part of speech of the filtered word, for example, "noun" or "verb". In practice, the execution subject may determine the part-of-speech information of the filtered words through an LSTM (long short-term memory) model combined with a CRF (conditional random field) model.
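An LSTM + CRF tagger of the kind named in step three typically works as follows: the LSTM assigns each word a score for every part-of-speech tag (emission scores), and the CRF adds learned tag-to-tag transition scores and picks the best whole sequence with Viterbi decoding. The sketch below shows only that decoding step and assumes the emission and transition scores are already available; the numbers, tag names, and `viterbi_decode` are hypothetical. With these scores a per-word argmax would choose noun, noun, but the transition penalty makes verb, noun the best sequence.

```python
def viterbi_decode(emissions, transitions, tags):
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]   score of tag j at position t (here assumed to come from the LSTM)
    transitions[i][j] score of tag i being followed by tag j (learned by the CRF)
    """
    n = len(tags)
    score = list(emissions[0])  # best score of any path ending in each tag so far
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    last = max(range(n), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[j] for j in path]


# Hypothetical scores for a two-word input and two tags.
TAGS = ["noun", "verb"]
emissions = [[1.0, 0.9], [1.0, 0.0]]
transitions = [[-5.0, 0.0], [0.0, 0.0]]  # strongly penalise noun -> noun
print(viterbi_decode(emissions, transitions, TAGS))  # ['verb', 'noun']
```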

Step four, selecting, from the filtered word sequence, the filtered words whose part-of-speech information matches target part-of-speech information as target words, to obtain a target word sequence.

The target part-of-speech information may characterize nouns.

As an example, if the filtered word sequence is ["central enterprise", "for", "equity", "incentive", "object", "have", "what", "requirement"], the resulting target word sequence may be ["central enterprise", "equity", "incentive", "object", "requirement"].
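Steps two and four are both order-preserving filters and can be sketched as below, reproducing the example sequences above. The stop word list and the part-of-speech lookup (a hypothetical stand-in for the output of step three) are made up for illustration; using a set and a dictionary keeps each membership check O(1).

```python
def filter_stop_words(candidate_words, stop_words):
    """Step two: drop candidate words found in the stop word list, keeping order."""
    return [w for w in candidate_words if w not in stop_words]


def select_target_words(filtered_words, part_of_speech, target_pos="noun"):
    """Step four: keep filtered words whose part of speech matches the target part of speech."""
    return [w for w in filtered_words if part_of_speech.get(w) == target_pos]


STOP_WORDS = {"?", "？", ",", "，", "。"}  # hypothetical stop word list (punctuation only)
# Hypothetical output of step three (the part-of-speech tagger).
PART_OF_SPEECH = {
    "central enterprise": "noun", "for": "preposition", "equity": "noun",
    "incentive": "noun", "object": "noun", "have": "verb",
    "what": "pronoun", "requirement": "noun",
}

candidates = ["central enterprise", "for", "equity", "incentive", "object",
              "have", "what", "requirement", "?"]
filtered = filter_stop_words(candidates, STOP_WORDS)
print(select_target_words(filtered, PART_OF_SPEECH))
# ['central enterprise', 'equity', 'incentive', 'object', 'requirement']
```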

Step five, for each target word in the target word sequence, performing the following first processing step:

First sub-step, determining the word sense feature corresponding to the target word as a first target word sense feature.

The execution subject may encode the target word with a Word2Vec model to obtain the first target word sense feature.

Second sub-step, determining the word sense similarity between the first target word sense feature and each word sense feature in a word sense feature library to generate word sense similarity values, obtaining a word sense similarity value sequence.

The word sense feature library may be a pre-maintained library that stores the word sense features of nouns related to regulatory information. These word sense features may be generated in the same way as in the first sub-step, which is not repeated here. In practice, the execution subject may use the cosine similarity between the first target word sense feature and each word sense feature in the library as the word sense similarity value.

Third sub-step, removing the target word from the target word sequence in response to determining that the word sense similarity value sequence contains no value greater than a preset similarity value.

In practice, the preset similarity value may be 0.5.
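The first processing step (sub-steps one to three) can be sketched as below. The `embed` dictionary is a hypothetical stand-in for the Word2Vec encoder, the two-dimensional vectors and library entries are made up for illustration, and the threshold is the preset similarity value of 0.5 given above. The function also returns each remaining word's similarity value sequence, since step six needs its maximum.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def first_processing_step(target_words, embed, feature_library, preset_similarity=0.5):
    """Keep target words that are similar enough to at least one library feature.

    Returns the remaining target words and, for each one, its similarity value
    sequence (one value per library feature, in library order).
    """
    remaining, similarity_sequences = [], {}
    for word in target_words:
        first_feature = embed[word]  # first sub-step (embed stands in for Word2Vec)
        sims = [cosine_similarity(first_feature, f)  # second sub-step
                for f in feature_library.values()]
        if any(s > preset_similarity for s in sims):  # third sub-step: otherwise drop the word
            remaining.append(word)
            similarity_sequences[word] = sims
    return remaining, similarity_sequences


# Hypothetical 2-D word sense features.
feature_library = {"central state-owned enterprise": [1.0, 0.0], "equity incentive": [0.0, 1.0]}
embed = {"central enterprise": [0.9, 0.1], "equity": [0.1, 0.9], "requirement": [-1.0, 0.0]}
remaining, sequences = first_processing_step(
    ["central enterprise", "equity", "requirement"], embed, feature_library)
print(remaining)  # ['central enterprise', 'equity']
```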

Step six, for each remaining target word in the target word sequence obtained after the removal in step five (the remaining target word sequence), performing the following second processing step:

First sub-step, splicing, in order, the remaining target word with each of its adjacent words in the remaining target word sequence to generate spliced words, obtaining a spliced word sequence.

As an example, the remaining target word sequence may be ["central enterprise", "equity", "incentive", "object"]. For the remaining target word "incentive", the execution subject may splice the preceding word "equity" with "incentive" to obtain the spliced word "equity incentive", and splice "incentive" with the following word "object" to obtain the spliced word "incentive object". The spliced word sequence corresponding to "incentive" is therefore ["equity incentive", "incentive object"].

Second sub-step, determining a second target word sense feature.

The second target word sense feature is the word sense feature in the word sense feature library whose word sense similarity value, in the word sense similarity value sequence of the remaining target word, satisfies a first screening condition. The first screening condition may be that the word sense similarity value equals the maximum value in that word sense similarity value sequence; in other words, the second target word sense feature is the library feature most similar to the remaining target word.

Third sub-step, determining the word sense feature of each spliced word in the spliced word sequence to obtain a spliced word sense feature sequence.

The execution subject may encode each spliced word with the Word2Vec model to generate its word sense feature.

Fourth sub-step, determining each spliced word whose word sense feature in the spliced word sense feature sequence satisfies a second screening condition as a sentence recognition tag in the sentence recognition tag set.

The second screening condition is: the word sense similarity value between the spliced word sense feature and the second target word sense feature is greater than the word sense similarity value corresponding to the second target word sense feature in the word sense similarity value sequence, that is, greater than the similarity between the remaining target word itself and the second target word sense feature.

As an example, the resulting sentence recognition tags may be "equity incentive" and "incentive object".

Fifth sub-step, in response to determining that no spliced word sense feature in the spliced word sense feature sequence satisfies the second screening condition, updating the remaining target word with the word tag corresponding to the second target word sense feature to generate an updated word, and determining the updated word as a sentence recognition tag in the sentence recognition tag set.

As an example, if the remaining target word is "central enterprise" and the word tag corresponding to the second target word sense feature is "central state-owned enterprise", the updated word is "central state-owned enterprise".
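A minimal sketch of the second processing step, using the same kind of stand-ins as before: `embed` replaces the Word2Vec encoder, spliced words it does not know are skipped (an assumption of this sketch), the library entries and three-dimensional vectors are made up, and duplicate tags produced by neighbouring words are merged, which the text does not specify. The function name `second_processing_step` is hypothetical.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def second_processing_step(words, embed, library, joiner=" "):
    """Turn the remaining target words into sentence recognition tags.

    words:   remaining target word sequence, in statement order
    embed:   stand-in for Word2Vec (word -> vector); unknown spliced words are skipped
    library: word sense feature library (word tag -> vector)
    joiner:  " " for these English glosses; Chinese words would be joined with ""
    """
    tags = []
    for i, word in enumerate(words):
        # First sub-step: splice with the preceding and following words.
        spliced = []
        if i > 0:
            spliced.append(words[i - 1] + joiner + word)
        if i + 1 < len(words):
            spliced.append(word + joiner + words[i + 1])
        # Second sub-step: the library feature most similar to the word itself.
        label, second_feature = max(
            library.items(), key=lambda item: cosine_similarity(embed[word], item[1]))
        best_similarity = cosine_similarity(embed[word], second_feature)
        # Third and fourth sub-steps: keep spliced words that beat that similarity.
        matched = [s for s in spliced
                   if s in embed and cosine_similarity(embed[s], second_feature) > best_similarity]
        # Fifth sub-step: otherwise use the library's word tag as the updated word.
        tags.extend(matched if matched else [label])
    return list(dict.fromkeys(tags))  # merge duplicates, keeping first occurrence


library = {
    "central state-owned enterprise": [1.0, 0.0, 0.0],
    "equity incentive": [0.0, 1.0, 0.0],
    "incentive object": [0.0, 0.0, 1.0],
}
embed = {
    "central enterprise": [0.9, 0.1, 0.0], "equity": [0.2, 0.8, 0.1],
    "incentive": [0.0, 0.6, 0.8], "object": [0.1, 0.2, 0.9],
    "equity incentive": [0.0, 1.0, 0.1], "incentive object": [0.0, 0.1, 1.0],
}
print(second_processing_step(["central enterprise", "equity", "incentive", "object"], embed, library))
# ['central state-owned enterprise', 'equity incentive', 'incentive object']
```

Here "central enterprise" has no useful splice and is replaced by its closest library tag, while "equity", "incentive", and "object" are each absorbed into a spliced word that is closer to the library feature than the single word was.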

The above optional implementation is an inventive point of the present disclosure. It solves the second technical problem mentioned in the background: "different items of regulatory information may have membership and reference relationships; fuzzy search by regulation name cannot effectively retrieve the full set of regulatory information linked by such relationships, which may increase the difficulty and number of searches and indirectly increase server resource and communication costs." To this end, the present disclosure first performs fine-grained word segmentation on the target search statement to obtain a candidate word sequence. Conventional word segmentation is limited by its vocabulary and can produce poor results; segmenting at a fine granularity avoids the poor results of conventional coarse-grained segmentation. Second, stop word filtering and part-of-speech-based filtering remove useless words. Next, words that are still irrelevant are removed by computing the word sense similarity between the first target word sense feature and each word sense feature in the word sense feature library.
Then, because fine-grained segmentation may split what is actually a single sentence recognition tag into several remaining target words, the present disclosure splices adjacent remaining target words in order and determines whether each spliced word is an actual sentence recognition tag by comparing the similarity between its word sense feature and the second target word sense feature. In this way, sentence recognition tags can be extracted effectively from the target search statement. Finally, combined with the graph database of this application, this enables retrieval of the full set of regulatory information linked by membership and reference relationships, greatly improving retrieval accuracy, avoiding the extra searches caused by inaccurate retrieval, and indirectly reducing server resource and communication costs.

Step 103, determining graph nodes in the pre-constructed graph database that are associated with the sentence recognition tags in the sentence recognition tag set, so as to generate associated node information and obtain an associated node information set.

In some embodiments, the execution subject may determine graph nodes in the pre-constructed graph database that are associated with the sentence recognition tags in the sentence recognition tag set, so as to generate associated node information and obtain an associated node information set. The graph database may be a database that stores regulatory information in a graph structure. Each item of associated node information in the associated node information set may characterize a graph node that has an association relationship with a sentence recognition tag in the sentence recognition tag set.

In some optional implementations of some embodiments, determining the graph nodes in the pre-constructed graph database that are associated with the sentence recognition tags, so as to generate associated node information and obtain the associated node information set, may include the following steps:

Step one, reading a graph node structure diagram corresponding to the graph database.

The graph node structure diagram is a high-dimensional structure diagram of the graph database that represents the relationships between graph nodes; it is a three-dimensional directed graph. Graph nodes with a membership relationship in the graph node structure diagram are connected by a directed edge group, which may include a first directed edge for forward traversal between the graph nodes and a second directed edge for reverse traversal between them.

In practice, a directed graph is generally stored as linked lists and its nodes are traversed through pointers. When only one-way directed edges exist, nodes can be reached only by following pointers from head to tail in the direction of the edges, which makes traversal inefficient. When a membership relationship exists between graph nodes, providing both the first and the second directed edges also enables reverse traversal, which greatly improves traversal efficiency.

The graph node types in the graph node structure diagram include: an applicable object information node type, an applicable scenario information node type, an applicable regulation information node type, and an applicable regulation clause information node type.

In practice, different items of regulatory information often correspond to different applicable objects and applicable scenarios, and contain specific regulation clauses for those objects and scenarios. Dividing graph nodes into these four types therefore allows the membership relationships between graph nodes to be determined effectively.
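The graph node structure described above can be sketched as adjacency lists in which each membership link is stored twice, once as a forward edge and once as a reverse edge, so both the parents and the children of a node are found with a dictionary lookup rather than by following pointers from the head of a list. This sketch does not model the "three-dimensional" aspect of the structure diagram; the class, the node names, and the direction chosen for membership (parent to child) are hypothetical.

```python
from enum import Enum


class NodeType(Enum):
    APPLICABLE_OBJECT = "applicable object information"
    APPLICABLE_SCENARIO = "applicable scenario information"
    APPLICABLE_REGULATION = "applicable regulation information"
    APPLICABLE_CLAUSE = "applicable regulation clause information"


class GraphNodeStructure:
    """Adjacency lists where every membership link is a directed edge group:
    a first (forward) edge parent -> child and a second (reverse) edge child -> parent."""

    def __init__(self):
        self.node_type = {}
        self.forward = {}  # first directed edges
        self.reverse = {}  # second directed edges

    def add_node(self, name, node_type):
        self.node_type[name] = node_type
        self.forward.setdefault(name, [])
        self.reverse.setdefault(name, [])

    def add_membership(self, parent, child):
        """Record that child belongs to parent in both directions (both nodes must exist)."""
        self.forward[parent].append(child)
        self.reverse[child].append(parent)

    def children(self, name):
        return list(self.forward.get(name, []))

    def parents(self, name):
        return list(self.reverse.get(name, []))


g = GraphNodeStructure()
g.add_node("central state-owned enterprise", NodeType.APPLICABLE_OBJECT)
g.add_node("Regulation A", NodeType.APPLICABLE_REGULATION)
g.add_node("Regulation A, Article 1", NodeType.APPLICABLE_CLAUSE)
g.add_membership("central state-owned enterprise", "Regulation A")
g.add_membership("Regulation A", "Regulation A, Article 1")
print(g.parents("Regulation A, Article 1"))  # ['Regulation A']
```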

Step two, determining the graph nodes in the graph node structure diagram that correspond to the sentence recognition tags in the sentence recognition tag set as target graph nodes, to obtain a target graph node set.

A target graph node may be a graph node that has a directed-edge connection with the graph node corresponding to a sentence recognition tag. In practice, for each sentence recognition tag in the sentence recognition tag set, the execution subject may traverse the graph node structure diagram to obtain at least one target graph node corresponding to that sentence recognition tag.

For each target graph node in the target graph node set, the following third processing step is performed:

In the first sub-step, in response to determining that the graph node type of the target graph node is the applicable regulation clause information node type, the target graph node is determined as a candidate graph node.

In practice, when the graph node type of the target graph node is the applicable regulation clause information node type, the target graph node corresponds to a specific regulation clause related to the sentence identification tag set, so the execution body can directly determine it as a candidate graph node.

In the second sub-step, in response to determining that the graph node type of the target graph node is not the applicable regulation clause information node type, a radial traversal of the graph node structure diagram centered on the target graph node is performed to obtain at least one candidate graph node.

When the graph node type of the target graph node is not the applicable regulation clause information node type, that is, when it is the applicable object information node type, the applicable scene information node type, or the applicable regulation information node type, the execution body may perform a radial traversal of the graph node structure diagram centered on the target graph node to obtain at least one candidate graph node. In practice, several graph nodes may be connected to the target graph node, so K threads may be created to traverse the graph in parallel outward from the target graph node, where K equals the out-degree of the target graph node, that is, the number of graph nodes connected to it.
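The third processing step for a single target graph node might be sketched as below, assuming the graph node structure diagram is available as a plain dictionary of forward edges plus a dictionary of node types. Following the description, one worker thread is started per outgoing edge (K equals the out-degree), and every node reached on a branch is kept as a candidate; the breadth-first branch walk, the type label `CLAUSE`, and the function names are illustrative choices, not the disclosed implementation.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

CLAUSE = "applicable_regulation_clause"  # hypothetical type label for clause nodes


def bfs(graph, start, blocked):
    """Breadth-first walk over forward edges from start, never stepping back into blocked."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt != blocked:
                seen.add(nxt)
                queue.append(nxt)
    return order


def candidates_for(target, graph, node_type):
    """Third processing step for one target graph node."""
    # First sub-step: a clause node is already a candidate graph node.
    if node_type[target] == CLAUSE:
        return [target]
    neighbours = graph.get(target, [])
    if not neighbours:
        return []
    # Second sub-step: K = out-degree worker threads, one radial branch per outgoing edge.
    with ThreadPoolExecutor(max_workers=len(neighbours)) as pool:
        branches = list(pool.map(lambda start: bfs(graph, start, target), neighbours))
    # Merge the branches, keeping first occurrences so the result order is deterministic.
    return list(dict.fromkeys(node for branch in branches for node in branch))
```

In CPython, threads mainly pay off when each branch waits on I/O, such as queries to a remote graph database; for a purely in-memory walk like this one they add little beyond mirroring the K-branch structure.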

Optionally, performing the radial traversal of the graph node structure diagram centered on the target graph node to obtain at least one candidate graph node, in response to determining that the graph node type of the target graph node is not the applicable regulation clause information node type, may include the following steps:

Step 1: taking the target graph node as the center, perform a radial connected-subgraph traversal of the graph node structure diagram through a pre-trained graph neural network model to determine a connected subgraph.

The graph neural network model may be an attention-based graph neural network model. In practice, the graph neural network model may include a graph feature extraction layer set and an attention mechanism layer, where the graph feature extraction layers in the set correspond to different graph feature receptive fields. That is, for each graph node in the graph node structure diagram, the graph feature extraction layers extract features of the node under different receptive fields. The attention mechanism layer automatically learns the weight of each graph feature extraction layer in the set. In addition, regularization term constraints are imposed on the graph feature extraction layers: because the features extracted by different layers are correlated to some degree, the regularization terms drive the layers to learn parameters in mutually close directions. Compared with a conventional graph neural network model, combining the features of a graph node under different receptive fields allows the connected subgraph to be determined better.
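The structure of such a multi-receptive-field model can be illustrated with a forward pass only, in plain Python. This is a sketch under stated assumptions: layer k averages the features of all nodes within k hops (its receptive field), applies its own weight matrix and a ReLU; an attention vector scores each layer's output and a softmax turns the scores into layer weights; and the regularization term is taken to be the sum of squared Frobenius distances between layer weight matrices, which pulls the layers toward similar parameters. Training, and how the embeddings are then used to delimit the connected subgraph, are not shown because the disclosure does not specify them.

```python
import math


def k_hop_mean(adj, feats, node, k):
    """Mean feature over all nodes within k hops of node (its receptive field)."""
    reached, frontier = {node}, {node}
    for _ in range(k):
        frontier = {n for f in frontier for n in adj.get(f, [])} - reached
        reached |= frontier
    dim = len(feats[node])
    return [sum(feats[n][i] for n in reached) / len(reached) for i in range(dim)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embed(adj, feats, node, layer_weights, attention_vector):
    """One extraction layer per receptive field (1, 2, ... hops), fused by attention."""
    outputs = []
    for hops, weights in enumerate(layer_weights, start=1):
        aggregated = k_hop_mean(adj, feats, node, hops)
        outputs.append([max(0.0, v) for v in matvec(weights, aggregated)])  # ReLU
    scores = [sum(a * h for a, h in zip(attention_vector, out)) for out in outputs]
    alphas = softmax(scores)  # learned layer weights
    dim = len(outputs[0])
    fused = [sum(alpha * out[i] for alpha, out in zip(alphas, outputs)) for i in range(dim)]
    return fused, alphas


def layer_similarity_penalty(layer_weights):
    """Sum of squared Frobenius distances between every pair of layer weight matrices."""
    penalty = 0.0
    for i in range(len(layer_weights)):
        for j in range(i + 1, len(layer_weights)):
            penalty += sum(
                (a - b) ** 2
                for row_a, row_b in zip(layer_weights[i], layer_weights[j])
                for a, b in zip(row_a, row_b)
            )
    return penalty
```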

Step 2: determine the graph nodes in the connected subgraph as candidate graph nodes, yielding at least one candidate graph node.

In the third step, the obtained candidate graph node set is grouped to obtain a candidate graph node group set.

Membership relationships exist among the candidate graph nodes within each candidate graph node group. In practice, the execution body may determine a plurality of candidate graph nodes that have membership relationships as one candidate graph node group. Specifically, the execution body can determine the candidate graph node groups through data lineage analysis.
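The disclosure groups candidates through data lineage analysis without giving details. As a stand-in, the sketch below assumes the membership relationships are available as (upper, lower) node pairs and groups candidates that are linked directly or transitively by them, using a union-find structure; the function name and the example pairs are hypothetical.

```python
def group_by_membership(candidates, membership_edges):
    """Group candidate nodes linked, directly or transitively, by membership edges."""
    parent = {node: node for node in candidates}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for upper, lower in membership_edges:
        # Only edges between two candidates can merge groups.
        if upper in parent and lower in parent:
            parent[find(upper)] = find(lower)

    groups = {}
    for node in candidates:
        groups.setdefault(find(node), []).append(node)
    return list(groups.values())
```

Groups come out in the order their first member appears in `candidates`, and a candidate with no membership edge forms a group of its own.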

In the fourth step, a regulation vector corresponding to each candidate graph node group in the candidate graph node group set is generated as associated node information, yielding the associated node information set.

As an example, the execution body may one-hot encode the regulation information, regulation clause information, applicable scene, and applicable object corresponding to the candidate graph node group to obtain the regulation vector as the associated node information.
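One possible reading of this one-hot example is sketched below: each of the four fields gets its own one-hot block over the values observed across all groups, a group marks every value it contains (so a group spanning two clauses yields a multi-hot block), and the blocks are concatenated into the regulation vector. Field names, vocabularies, and example values are assumptions for illustration.

```python
FIELDS = ("regulation", "clause", "scene", "object")  # hypothetical field names


def build_vocabularies(groups):
    """Collect, per field, the sorted set of values seen across all candidate groups."""
    return {field: sorted({v for g in groups for v in g.get(field, [])}) for field in FIELDS}


def regulation_vector(group, vocabularies):
    """Concatenate one (multi-)hot block per field; 1 marks a value present in the group."""
    vector = []
    for field in FIELDS:
        present = set(group.get(field, []))
        vector.extend(1 if value in present else 0 for value in vocabularies[field])
    return vector


g1 = {"regulation": ["Reg A"], "clause": ["Art. 3"], "scene": ["equity incentive"], "object": ["SOE"]}
g2 = {"regulation": ["Reg B"], "clause": ["Art. 7"], "scene": ["equity incentive"], "object": ["listed company"]}
vocab = build_vocabularies([g1, g2])
print(regulation_vector(g1, vocab))  # [1, 0, 1, 0, 1, 1, 0]
```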

As an inventive point of the present disclosure, the above "in some optional implementations of some embodiments" solves the third technical problem mentioned in the background art, namely: "membership and reference relationships may exist among pieces of regulatory information, and further membership and reference relationships exist between regulatory information and its applicable scenes and applicable objects, so that storing the data in a conventional relational database increases both the maintenance cost and the retrieval cost of the data." In practice, a conventional relational database typically builds multiple data tables for data with multiple membership and reference relationships and connects the tables through foreign keys and similar mechanisms. As these relationships grow more complex and the data volume increases, the join queries required by the relational database consume considerable server resources. Accordingly, the present disclosure stores regulatory information, applicable scenes, applicable objects, and the like in a graph database. Specifically, in the step of determining the associated node information set, the present disclosure first obtains a graph node structure diagram, that is, a high-dimensional structure diagram of the graph database that represents the relationships between graph nodes. This avoids reading the data directly, which reduces the amount of data read and the data loading pressure on the server.
In the graph node structure diagram, different regulations often correspond to different applicable objects and applicable scenes, and contain specific regulation clauses for them, so dividing the graph nodes into the applicable object information node type, the applicable scene information node type, the applicable regulation information node type, and the applicable regulation clause information node type allows the membership relationships between graph nodes to be determined effectively. In other words, storing regulation information, regulation clause information, applicable objects, and applicable scenes in a three-dimensional directed graph represents the membership and reference relationships between graph nodes more clearly. In addition, because a directed graph is generally stored as a linked list and traversed through pointers, traversal along one-way directed edges alone, from head to tail in the edge direction, is inefficient; when a membership relationship exists between graph nodes, setting both the first directed edge and the second directed edge enables reverse graph traversal and greatly improves traversal efficiency. Next, the graph node in the graph node structure diagram corresponding to each sentence identification tag in the sentence identification tag set is determined as a target graph node, yielding a target graph node set, which provides the starting points for graph node traversal. Then, for each target graph node in the target graph node set, the following third processing step is performed. In the first sub-step, in response to determining that the graph node type of the target graph node is the applicable regulation clause information node type, the target graph node is determined as a candidate graph node.
In practice, a target graph node of the applicable regulation clause information node type represents a clause that cannot be decomposed further, so it can be determined directly as a candidate graph node. In the second sub-step, in response to determining that the graph node type of the target graph node is not the applicable regulation clause information node type, a radial traversal of the graph node structure diagram centered on the target graph node is performed to obtain at least one candidate graph node. In practice, when the target graph node is of the applicable scene information node type, there may be at least one regulation clause applicable to that scene. Meanwhile, because several graph nodes may be connected to the target graph node, radial traversal covers them more comprehensively and quickly. In the third step, the obtained candidate graph node set is grouped into a candidate graph node group set, in which the candidate graph nodes within each group have membership relationships, so that graph nodes with membership relationships are merged. In the fourth step, a regulation vector corresponding to each candidate graph node group in the candidate graph node group set is generated as associated node information, yielding the associated node information set. Combining the graph database with this optimized graph node structure diagram reduces the maintenance cost and retrieval cost of the database.

Step 104: perform fine-grained recall on the associated node information in the associated node information set according to the sentence identification tag set, to obtain a recalled associated node information sequence.

In some embodiments, the execution body may perform fine-grained recall on the associated node information in the associated node information set according to the sentence identification tag set, to obtain a recalled associated node information sequence. In practice, each piece of associated node information corresponds to a graph node associated with at least one sentence identification tag in the sentence identification tag set; that is, it is not constrained by every tag in the set, so the associated node information set has a large recall volume. Using each sentence identification tag in the set as a constraint for fine-grained recall therefore effectively improves recall accuracy and reduces the recall volume.

In some optional implementations of some embodiments, performing fine-grained recall on the associated node information in the associated node information set according to the sentence identification tag set to obtain a recalled associated node information sequence may include the following steps:

In the first step, the tag feature corresponding to each sentence identification tag in the sentence identification tag set is generated, yielding a tag feature set.

The execution body may vector-encode each sentence identification tag through a Word2Vec model to generate its tag feature, yielding the tag feature set.

As an example, the sentence identification tag set may be ["central state-owned enterprise", "equity incentive", "incentive object"]. The tag "central state-owned enterprise" may correspond to tag feature A, the tag "equity incentive" to tag feature B, and the tag "incentive object" to tag feature C.

In the second step, the tag features in the tag feature set are randomly combined at a preset granularity to generate a tag feature group set.

In practice, the preset granularity may specify that at least two tag features are combined at random. The preset granularity may be adjusted according to the number of tag features in the tag feature set, so as to limit the number of tag feature groups obtained.

As an example, the tag feature group set may be [["tag feature A", "tag feature B"], ["tag feature A", "tag feature B", "tag feature C"], ["tag feature A", "tag feature C"]].
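The grouping at a preset granularity can be sketched with `itertools.combinations`, assuming the granularity is a minimum group size of two and that the "random" part means sampling a fixed number of groups from all combinations when too many are produced; `tag_feature_groups`, `min_size`, `max_groups`, and `seed` are hypothetical names.

```python
import itertools
import random


def tag_feature_groups(tag_features, min_size=2, max_groups=None, seed=None):
    """Combine at least min_size tag features per group; optionally sample max_groups."""
    groups = [
        list(combo)
        for size in range(min_size, len(tag_features) + 1)
        for combo in itertools.combinations(tag_features, size)
    ]
    # Cap the number of groups by random sampling (assumed meaning of "random").
    if max_groups is not None and max_groups < len(groups):
        groups = random.Random(seed).sample(groups, max_groups)
    return groups
```

For three tag features A, B, and C, the full enumeration yields four groups (AB, AC, BC, ABC); sampling three of them gives a set of the same shape as the example above.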

In the third step, for each piece of associated node information in the associated node information set, the following fourth processing step is performed:

In the first sub-step, the association degree value between each tag feature group in the tag feature group set and the associated node information is determined, yielding an association degree value group; across all associated node information, this yields an association degree value group set.

First, the execution body may concatenate the tag features in the tag feature group to obtain a concatenated feature; for example, tag feature vector A and tag feature vector B are concatenated to obtain concatenated feature A. Second, the execution body may determine the feature similarity between the concatenated feature and the associated node information as the association degree value, for example by computing their cosine similarity.
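A sketch of the association degree computation follows, assuming tag features and the associated node information are plain float vectors and that the node vector has the same length as the concatenated feature. The disclosure does not say how vectors of different lengths are reconciled, so the sketch raises an error rather than guessing; the function names are hypothetical.

```python
import math


def concatenate(features):
    """Splice the tag feature vectors of one group end to end."""
    return [value for feature in features for value in feature]


def cosine_similarity(u, v):
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    # Treat a zero vector as having no similarity instead of dividing by zero.
    return 0.0 if norm == 0 else sum(x * y for x, y in zip(u, v)) / norm


def association_value(group_features, node_vector):
    """Association degree value of one tag feature group with one associated node."""
    return cosine_similarity(concatenate(group_features), node_vector)
```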

In the second sub-step, in response to determining that the resulting association degree values include one greater than a preset association degree value, the associated node information is determined as candidate associated node information.

The preset association degree value may be 0.85.

In the fourth step, the candidate associated node information in the candidate associated node information set is sorted in descending order of the corresponding target association degree values to generate the recalled associated node information sequence.
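The second sub-step and the fourth step can be combined as below. The sketch assumes the "target association degree value" of a candidate is the largest value in its association degree value group, which the disclosure does not state explicitly; the 0.85 threshold comes from the text, and the function name is hypothetical.

```python
def recall_sequence(association_values, threshold=0.85):
    """Keep nodes with any value above threshold, sorted by their best value, descending."""
    # association_values maps a node identifier to its association degree value group.
    candidates = {
        node: max(values)
        for node, values in association_values.items()
        if values and max(values) > threshold
    }
    return sorted(candidates, key=candidates.get, reverse=True)
```

Python's sort is stable, so candidates with equal target values keep their input order.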

Step 105: determine the regulation retrieval information corresponding to the target search statement according to the recalled associated node information sequence.

The regulation retrieval information may be the retrieved regulation information and/or regulation clause information corresponding to the target search statement. In practice, the execution body may determine, as the regulation retrieval information, the regulation information or regulation clause information in the graph database that corresponds to the recalled associated node information in the recalled associated node information sequence.

Optionally, the regulation retrieval information includes a sub-regulation retrieval information set, where each piece of sub-regulation retrieval information may be the regulation information and/or regulation clause information corresponding to one piece of recalled associated node information.

In some optional implementations of some embodiments, determining the regulation retrieval information corresponding to the target search statement according to the recalled associated node information sequence may include the following step:

For each piece of recalled associated node information in the recalled associated node information sequence, the regulation information and/or regulation clause information corresponding to it is determined as sub-regulation retrieval information in the sub-regulation retrieval information set.

Step 106: send the regulation retrieval information to an information display client for information display.

In some embodiments, the execution body may send the regulation retrieval information to the information display client through a wired or wireless connection for information display. The information display client is the client that sent the target search statement.

With further reference to fig. 2, as an implementation of the method shown in the above figures, the present disclosure provides some embodiments of a search information generation apparatus, which correspond to those method embodiments shown in fig. 1, and which are particularly applicable to various electronic devices.

As shown in fig. 2, the search information generation apparatus 200 of some embodiments includes: an acquisition unit 201, a sentence recognition unit 202, a first determination unit 203, a fine-grained recall unit 204, a second determination unit 205, and a transmission unit 206. The acquisition unit 201 is configured to acquire a target search statement, where the target search statement is a search statement for regulatory information; the sentence recognition unit 202 is configured to perform sentence recognition on the target search statement to generate a sentence identification tag set; the first determination unit 203 is configured to determine graph nodes in a pre-constructed graph database that are associated with the sentence identification tags in the sentence identification tag set so as to generate associated node information, obtaining an associated node information set; the fine-grained recall unit 204 is configured to perform fine-grained recall on the associated node information in the associated node information set according to the sentence identification tag set, to obtain a recalled associated node information sequence; the second determination unit 205 is configured to determine regulation retrieval information corresponding to the target search statement according to the recalled associated node information sequence; and the transmission unit 206 is configured to transmit the regulation retrieval information to an information display client for information display, where the information display client is the client that sent the target search statement.

It will be appreciated that the units described in the search information generation apparatus 200 correspond to the respective steps of the method described with reference to fig. 1. Thus, the operations, features, and resulting advantages described above for the method apply equally to the search information generation apparatus 200 and the units it contains, and are not repeated here.

Referring now to fig. 3, a schematic diagram of an electronic device (e.g., computing device) 300 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 3 is merely an example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.

As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various suitable actions and processes in accordance with programs stored in a read-only memory 302 or programs loaded from a storage 308 into a random access memory 303. In the random access memory 303, various programs and data necessary for the operation of the electronic device 300 are also stored. The processing means 301, the read only memory 302 and the random access memory 303 are connected to each other by a bus 304. An input/output interface 305 is also connected to the bus 304.

In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 3 may represent one device or a plurality of devices as needed.

In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communications device 309, or from storage device 308, or from read only memory 302. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing means 301.

It should be noted that, the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

In some implementations, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.

The computer readable medium may be contained in the electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain a target search statement, where the target search statement is a search statement for regulatory information; perform sentence recognition on the target search statement to generate a sentence identification tag set; determine graph nodes in a pre-constructed graph database that are associated with the sentence identification tags in the sentence identification tag set so as to generate associated node information, obtaining an associated node information set; perform fine-grained recall on the associated node information in the associated node information set according to the sentence identification tag set to obtain a recalled associated node information sequence; determine regulation retrieval information corresponding to the target search statement according to the recalled associated node information sequence; and send the regulation retrieval information to an information display client for information display, where the information display client is the client that sent the target search statement.

Computer program code for carrying out operations of some embodiments of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in some embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described, for example, as: a processor including an acquisition unit, a sentence recognition unit, a first determination unit, a fine-grained recall unit, a second determination unit, and a transmission unit. The names of these units do not, in some cases, limit the units themselves; for example, the first determination unit may also be described as "a unit for determining, in a pre-constructed graph database, graph nodes associated with the sentence identification tags in the sentence identification tag set, so as to generate associated node information and obtain an associated node information set".

The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.

The foregoing description is only of preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by mutually substituting the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims

1. A search information generation method, comprising:

obtaining a target search statement, wherein the target search statement is a search statement for regulatory information;

performing sentence recognition on the target search statement to generate a sentence identification tag set;

determining graph nodes in a pre-constructed graph database which are associated with sentence identification tags in the sentence identification tag set so as to generate associated node information, and obtaining an associated node information set;

performing fine-grained recall on the associated node information in the associated node information set according to the sentence identification tag set to obtain a recalled associated node information sequence;

determining regulation retrieval information corresponding to the target search statement according to the recalled associated node information sequence;

transmitting the regulation retrieval information to an information display client for information display, wherein the information display client is a client transmitting the target search statement, wherein

the performing sentence recognition on the target search statement to generate a sentence identification tag set comprises:

performing fine-grained word segmentation on the target search statement to obtain a candidate word sequence;

performing stop word filtering on the candidate word sequence according to a stop word list to obtain a filtered word sequence;

determining part-of-speech information of each filtered word in the filtered word sequence;

screening out, from the filtered word sequence, the filtered words whose part-of-speech information matches target part-of-speech information as target words, to obtain a target word sequence;

for each target word in the sequence of target words, performing the following first processing step:

determining the word sense feature corresponding to the target word as a first target word sense feature;

determining the word sense similarity between the first target word sense feature and each word sense feature in a word sense feature library to generate a word sense similarity value, obtaining a word sense similarity value sequence;

in response to determining that no word sense similarity value greater than a preset similarity value exists in the word sense similarity value sequence, eliminating the target word from the target word sequence;

for each post-removal target word in the post-removal target word sequence (the target word sequence obtained after the removal), performing the following second processing step:

splicing the post-removal target word, in sequence order, with each post-removal target word adjacent to it in the post-removal target word sequence to generate spliced words, thereby obtaining a spliced word sequence;

determining a second target word sense feature, wherein the second target word sense feature is the word sense feature in the word sense feature library whose word sense similarity value, in the word sense similarity value sequence corresponding to the post-removal target word, satisfies a first screening condition;

determining the word sense feature of each spliced word in the spliced word sequence to obtain a spliced word sense feature sequence;

determining, as a sentence identification tag in the sentence identification tag set, each spliced word whose spliced word sense feature in the spliced word sense feature sequence satisfies a second screening condition, wherein the second screening condition is that the word sense similarity value between the spliced word sense feature and the second target word sense feature is greater than the word sense similarity value corresponding to the second target word sense feature in the word sense similarity value sequence of the post-removal target word;

in response to determining that no spliced word sense feature in the spliced word sense feature sequence satisfies the second screening condition, updating the post-removal target word with the word tag corresponding to the second target word sense feature to generate an updated word, and determining the updated word as a sentence identification tag in the sentence identification tag set; wherein

the determining graph nodes in the pre-constructed graph database associated with the sentence identification tags in the sentence identification tag set to generate associated node information, thereby obtaining the associated node information set, comprises:

reading a graph node structure diagram corresponding to the graph database, wherein: the graph node structure diagram is a three-dimensional directed graph; a directed edge group exists between graph nodes having a membership relationship in the graph node structure diagram, the directed edge group comprising a first directed edge used for forward traversal between the graph nodes and a second directed edge used for reverse traversal between the graph nodes; and the graph node types of the graph nodes in the graph node structure diagram comprise an applicable object information node type, an applicable scenario information node type, an applicable regulation information node type, and an applicable regulation clause information node type;

determining, in the graph node structure diagram, the graph node corresponding to each sentence identification tag in the sentence identification tag set as a target graph node, to obtain a target graph node set;

in response to determining that the graph node type of a target graph node is the applicable regulation clause information node type, determining the target graph node as a candidate graph node;

in response to determining that the graph node type of the target graph node is not the applicable regulation clause information node type, performing radial traversal on the graph node structure diagram centered on the target graph node to obtain at least one candidate graph node;

grouping the nodes of the obtained candidate graph node set to obtain a candidate graph node group set, wherein the candidate graph nodes in each candidate graph node group have membership relationships with one another; and

generating a regulation vector corresponding to each candidate graph node group in the candidate graph node group set as associated node information, to obtain the associated node information set.
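For illustration only, the sentence recognition procedure of claim 1 (stop-word filtering, part-of-speech screening, the first processing step and the second processing step) can be sketched in Python as below. This is a minimal sketch under assumptions the claim does not fix, not the patented implementation: segmentation and part-of-speech tagging are assumed to have already produced `(word, pos)` pairs; `embed` is a hypothetical function returning a word sense feature vector; the word sense feature library is modelled as a dict from word tag to vector; cosine similarity stands in for the word sense similarity; the first screening condition is assumed to select the most similar library feature; "adjacent" is taken to mean the immediately preceding and following words; and the word update is assumed to replace the word with the library tag itself.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, assumed here as the word sense similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recognize_tags(
    tokens: List[Tuple[str, str]],   # (word, part_of_speech), assumed pre-segmented
    stop_words: set,
    target_pos: set,
    embed: Callable[[str], Vector],  # hypothetical word -> word sense feature
    library: Dict[str, Vector],      # hypothetical word tag -> word sense feature
    threshold: float,                # the preset similarity value
    joiner: str = "",                # "" suits Chinese; use " " for English words
) -> List[str]:
    # Stop-word filtering followed by part-of-speech screening.
    targets = [w for w, pos in tokens if w not in stop_words and pos in target_pos]

    # First processing step: keep a word only if some library feature is similar enough.
    kept: List[Tuple[str, Dict[str, float]]] = []
    for word in targets:
        sims = {tag: cosine(embed(word), vec) for tag, vec in library.items()}
        if any(value > threshold for value in sims.values()):
            kept.append((word, sims))

    # Second processing step, run on each post-removal target word.
    tags: List[str] = []
    for i, (word, sims) in enumerate(kept):
        # First screening condition (assumed): the most similar library feature.
        best_tag = max(sims, key=sims.get)
        best_vec = library[best_tag]
        # Splice with the immediate neighbours, preserving sequence order.
        spliced = []
        if i > 0:
            spliced.append(kept[i - 1][0] + joiner + word)
        if i + 1 < len(kept):
            spliced.append(word + joiner + kept[i + 1][0])
        # Second screening condition: the spliced word is closer to the best
        # library feature than the single word was.
        winners = [s for s in spliced if cosine(embed(s), best_vec) > sims[best_tag]]
        # Fallback word update (assumed): use the library tag itself.
        for tag in winners or [best_tag]:
            if tag not in tags:  # the result is a tag set, so skip duplicates
                tags.append(tag)
    return tags
```

Returning a list rather than a Python `set` keeps the tag order deterministic, which is convenient if later steps depend on the order in which tags were produced.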

2. The method of claim 1, wherein the regulation retrieval information comprises a sub-regulation retrieval information set; and

the determining regulation retrieval information corresponding to the target search statement according to the recalled associated node information sequence comprises:

for each piece of recalled associated node information in the recalled associated node information sequence, determining the regulation information and/or regulation clause information corresponding to the recalled associated node information as sub-regulation retrieval information in the sub-regulation retrieval information set.

3. The method of claim 2, wherein the performing, in response to determining that the graph node type of the target graph node is not the applicable regulation clause information node type, radial traversal on the graph node structure diagram centered on the target graph node to obtain at least one candidate graph node comprises:

performing, with the target graph node as the center, radial connected-subgraph traversal on the graph node structure diagram by means of a pre-trained graph neural network model to determine a connected subgraph; and

determining the graph nodes in the connected subgraph as candidate graph nodes to obtain the at least one candidate graph node.
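The candidate-node logic of claims 1 and 3 can be illustrated with a small in-memory graph. In this hedged sketch a plain breadth-first search bounded by an assumed hop `radius` is substituted for the pre-trained graph neural network of claim 3; each membership relation is stored as the forward/reverse directed edge pair described in claim 1; and the node-type label strings, the `RegulationGraph` class and all function names are hypothetical. Grouping treats candidates linked through direct membership relations as one group; generation of the regulation vector for each group is not shown.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical label for the applicable regulation clause information node type.
CLAUSE = "applicable_regulation_clause"


class RegulationGraph:
    """Directed graph in which each membership relation is a pair of directed edges."""

    def __init__(self) -> None:
        self.node_type: Dict[str, str] = {}
        self.forward: Dict[str, Set[str]] = {}  # first directed edge: parent -> child
        self.reverse: Dict[str, Set[str]] = {}  # second directed edge: child -> parent

    def add_node(self, node: str, node_type: str) -> None:
        self.node_type[node] = node_type
        self.forward.setdefault(node, set())
        self.reverse.setdefault(node, set())

    def add_membership(self, parent: str, child: str) -> None:
        # Both nodes must already have been added with add_node.
        self.forward[parent].add(child)
        self.reverse[child].add(parent)

    def neighbours(self, node: str) -> Set[str]:
        # Radial traversal follows both forward and reverse edges.
        return self.forward[node] | self.reverse[node]


def candidate_nodes(graph: RegulationGraph, target: str, radius: int) -> Set[str]:
    """Clause nodes are candidates themselves; other nodes are expanded radially."""
    if graph.node_type[target] == CLAUSE:
        return {target}
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == radius:  # assumed hop limit in place of the GNN
            continue
        for nxt in graph.neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


def group_by_membership(graph: RegulationGraph, candidates: Set[str]) -> List[Set[str]]:
    """Split candidates into groups connected through direct membership relations."""
    unvisited = set(candidates)
    groups: List[Set[str]] = []
    while unvisited:
        start = unvisited.pop()
        group, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in graph.neighbours(node) & unvisited:
                unvisited.discard(nxt)
                group.add(nxt)
                stack.append(nxt)
        groups.append(group)
    return groups
```

Storing both edge directions makes "forward" (object to clause) and "reverse" (clause to object) traversal equally cheap, which is the practical reason a structure like the one in claim 1 keeps a directed edge group rather than a single edge.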

4. The method of claim 3, wherein the performing fine-grained recall on the associated node information in the associated node information set according to the sentence identification tag set to obtain a recalled associated node information sequence comprises:

generating a tag feature corresponding to each sentence identification tag in the sentence identification tag set to obtain a tag feature set;

randomly combining the tag features in the tag feature set at a preset granularity to generate a tag feature group set;

for each piece of associated node information in the associated node information set, performing the following fourth processing step:

determining a relevance value between the associated node information and each tag feature group in the tag feature group set to obtain a relevance value group, thereby obtaining a relevance value group set;

in response to determining that the relevance value group contains a relevance value greater than a preset relevance value, determining the associated node information as candidate associated node information; and

sorting the candidate associated node information in the obtained candidate associated node information set in descending order of the target relevance value corresponding to each piece of candidate associated node information, to generate the recalled associated node information sequence.
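The fine-grained recall of claim 4 can be sketched as follows, again only under assumptions that the claim leaves open: tag features and regulation vectors are plain numeric vectors; exhaustive combinations of size `granularity` (via `itertools.combinations`) replace the random combination so that the output is reproducible; a tag feature group is represented by the mean of its member vectors; cosine similarity serves as the relevance value; and the target relevance value of a candidate is taken to be its highest relevance value. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, assumed here as the relevance value."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Represent a tag feature group by the element-wise mean (an assumption)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def fine_grained_recall(
    node_vectors: Dict[str, Vector],  # associated node id -> regulation vector
    tag_features: List[Vector],       # one feature per sentence identification tag
    granularity: int,                 # the preset combination size
    min_relevance: float,             # the preset relevance value
) -> List[Tuple[str, float]]:
    # Tag feature group set: every combination of `granularity` tag features.
    groups = [mean_vector(combo) for combo in combinations(tag_features, granularity)]
    candidates: List[Tuple[str, float]] = []
    for node, vector in node_vectors.items():
        # Fourth processing step: one relevance value per tag feature group.
        relevance = [cosine(vector, group) for group in groups]
        if any(value > min_relevance for value in relevance):
            # Target relevance value (assumed): the best-matching group.
            candidates.append((node, max(relevance)))
    # Descending order of target relevance value gives the recalled sequence.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates
```

Exhaustive combination grows combinatorially with the number of tags, so for larger tag sets sampling a fixed number of combinations, which is closer to the claim's "random combination", may be preferable.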

5. A search information generation apparatus comprising:

an acquisition unit configured to acquire a target search statement, wherein the target search statement is a search statement for regulatory information;

a sentence recognition unit configured to perform sentence recognition on the target search statement to generate a sentence identification tag set;

a first determining unit configured to determine graph nodes in a pre-built graph database that are associated with sentence identification tags in the sentence identification tag set, so as to generate associated node information and obtain an associated node information set;

a fine-grained recall unit configured to perform fine-grained recall on the associated node information in the associated node information set according to the sentence identification tag set to obtain a recalled associated node information sequence;

a second determining unit configured to determine regulation retrieval information corresponding to the target search statement according to the recalled associated node information sequence;

a transmitting unit configured to transmit the regulation retrieval information to an information display client for information display, wherein the information display client is the client that transmitted the target search statement, wherein,

6. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon;

the one or more programs, when executed by the one or more processors, causing the one or more processors to implement the method of any one of claims 1 to 4.

7. A computer readable medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1 to 4.