CN113535936A

CN113535936A - Deep learning-based rules and regulations retrieval method and system

Info

Publication number: CN113535936A
Application number: CN202110686425.2A
Authority: CN
Inventors: 彭艳宏; 杨攀; 柯旭
Original assignee: Hangzhou Chuling Data Technology Co ltd
Current assignee: Hangzhou Chuling Data Technology Co ltd
Priority date: 2021-06-21
Filing date: 2021-06-21
Publication date: 2021-10-22
Anticipated expiration: 2041-06-21
Also published as: CN113535936B

Abstract

The invention discloses a deep learning-based rules and regulations retrieval method and system, wherein the method comprises the following steps: 1. acquiring a query text input by a user; 2. acquiring the target word segments of the query text and their attributes; 3. constructing a rules and regulations database; 4. searching the rules and regulations database according to the target word segments and their attributes, and calculating a word-segmentation-based matching degree X_n; 5. calculating a semantic-based matching degree Y_n; 6. calculating a composite matching degree Z_n from X_n and Y_n; 7. obtaining a plurality of retrieval results ranked in descending order according to the composite matching degree Z_n, the attributes of the target word segments of the query text, and the specific hierarchical relations in the rules and regulations. Based on deep learning, the method implements a Chinese text word segmentation model, a Chinese text dependency parsing model, an OCR character recognition model and an ESIM text similarity calculation model, and achieves fast and accurate retrieval of rules and regulations.

Description

Deep learning-based rules and regulations retrieval method and system

Technical Field

The invention relates to the technical field of computers, and in particular to a deep learning-based rules and regulations retrieval method and system.

Background

Current rules and regulations (national laws and regulations, provincial regulations and enterprise rules) are so numerous that it is difficult for an ordinary person to become familiar with them or to apply them quickly in a given situation. Existing general-purpose search engines are not specifically optimized for retrieving rules and regulations, deviate in semantic analysis, and give poor retrieval results; specifically, they lack a professional, comprehensive rules and regulations database and lack retrieval matching at the semantic level. Developing an intelligent retrieval method and system that handles a query word or sentence on the basis of an existing regulation library and deep learning is therefore of great practical significance and application value.

Disclosure of Invention

In view of the above, the present invention provides a deep learning-based rules and regulations retrieval method and system, which aims to solve the technical problems that it is difficult to accurately obtain the specific content of the corresponding regulation from keywords through a general-purpose search engine, and that the retrieved results are poorly relevant.

To achieve the above purpose, in a first aspect the invention provides a deep learning-based rules and regulations retrieval method, which comprises the following specific steps:

S1, acquiring a query text provided by a user, and inputting the query text into a Chinese text word segmentation model to obtain each target word segment in the query text; inputting each target word segment into a Chinese text dependency parsing model to obtain the part of speech and the attribute of each target word segment; and screening the target word segments according to their parts of speech and attributes.

S2, searching the rules and regulations database to obtain a plurality of search results, calculating the word-segmentation-based matching degree X_n of each search result, and screening out the N search results that meet the requirements.

2-1, retrieving a plurality of preliminary search results according to the original query text and the target word segments screened out in step S1. Each preliminary search result includes a document-content part and a document-title part. The document-content part is the specific content of the search result. The document-title part is the title or subtitle of the paragraph to which the search result belongs. Each preliminary search result is input into the Chinese text word segmentation model and the Chinese text dependency parsing model described in step S1 to obtain the target word segments in each preliminary search result together with their parts of speech and attributes.

2-2, inputting the target word segments of the query text screened out in step S1 and the target word segments extracted from the document-content part of each preliminary search result into an unsupervised matching algorithm to obtain the basic matching degree A_n between the query text and each preliminary search result;

inputting the target word segments of the query text screened out in step S1 and the target word segments extracted from the document-title part of each preliminary search result into a Jaccard similarity matching algorithm to obtain the additional matching degree B_n between the query text and each preliminary search result.
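
The Jaccard matching of step 2-2 can be sketched as follows. This is a minimal illustration, assuming the query and the document-title part have already been segmented into word lists; the function name and the sample words are hypothetical and do not come from the source.

```python
def jaccard_similarity(query_tokens, title_tokens):
    """Jaccard similarity |Q ∩ T| / |Q ∪ T| over the two sets of word segments."""
    q, t = set(query_tokens), set(title_tokens)
    union = q | t
    if not union:
        # Two empty inputs share nothing; treat the similarity as 0.
        return 0.0
    return len(q & t) / len(union)


# Hypothetical word segments of a query and of a document-title part.
query_tokens = ["员工", "请假", "天数"]
title_tokens = ["员工", "请假", "管理", "规定"]
b_n = jaccard_similarity(query_tokens, title_tokens)  # 2 shared / 5 distinct
```

Because titles are short, a set-based overlap measure of this kind is cheap to compute for every preliminary search result.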

2-3, calculating the word-segmentation-based matching degree between the query text and each preliminary search result as X_n = c·A_n + (0.5 - c)·B_n, where c is a first weight coefficient with a value range of 0 to 0.5. A plurality of word-segmentation-based search results are screened out according to X_n.

S3, using a Bert-ESIM text similarity calculation model to calculate the matching degree Y_n, based on complete semantics, between the query text and each word-segmentation-based search result screened out in step S2. The Bert-ESIM text similarity calculation model comprises an improved ESIM network, in which a cosine similarity calculator replaces the Softmax component and a Bert Chinese text feature extractor replaces the input encoder.

S4, calculating the composite matching degree between each of the N search results and the query text as Z_n = d·X_n + (0.5 - d)·Y_n, where d is a second weight coefficient with a value range of 0 to 0.5. The N search results are sorted by composite matching degree Z_n in descending order and output.
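
The composite scoring and ranking of step S4 can be sketched as below. This is an illustration only: the dictionary layout, the function name and the sample scores are assumptions, and the value d = 0.25 is an arbitrary choice within the stated range of 0 to 0.5.

```python
def rank_by_composite(results, d=0.25):
    """Attach Z_n = d*X_n + (0.5 - d)*Y_n to each result and sort descending.

    `results` is a list of dicts holding a word-segmentation-based score "x"
    and a semantic score "y" (hypothetical field names).
    """
    if not 0.0 <= d <= 0.5:
        raise ValueError("d must lie in the range 0 to 0.5")
    scored = [dict(r, z=d * r["x"] + (0.5 - d) * r["y"]) for r in results]
    return sorted(scored, key=lambda r: r["z"], reverse=True)


# Hypothetical scores for three search results.
results = [
    {"id": "A", "x": 0.40, "y": 0.90},
    {"id": "B", "x": 0.45, "y": 0.30},
    {"id": "C", "x": 0.10, "y": 0.95},
]
ranked = rank_by_composite(results, d=0.25)
```

A smaller d shifts weight toward the semantic score Y_n, and a larger d toward the word-segmentation-based score X_n.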

Preferably, the attributes of a target word segment include subject, predicate, object and complement. The parts of speech of a target word segment include noun, verb, adjective, adverb, conjunction, entity word, preposition, quantifier, person name, place name and time.

Preferably, in step S1, target word segments that are a subject, a predicate, an object, an entity word, a time, a place or a quantifier are retained.
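
The screening rule of step S1 can be sketched as a simple filter. The `Token` class, the label names and the sample sentence below are hypothetical; a real segmentation or dependency parsing model would emit its own tag set, which would need to be mapped onto these categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    pos: str   # part of speech, e.g. "noun", "time", "preposition"
    role: str  # syntactic attribute, e.g. "subject", "object", "adverbial"


# Assumed label names for the categories that step S1 retains.
KEEP_ROLES = {"subject", "predicate", "object"}
KEEP_POS = {"entity", "time", "place", "quantifier"}


def screen_tokens(tokens):
    """Keep word segments whose attribute or part of speech is on the keep list."""
    return [t for t in tokens if t.role in KEEP_ROLES or t.pos in KEEP_POS]


tokens = [
    Token("员工", "noun", "subject"),
    Token("在", "preposition", "adverbial"),
    Token("2021年", "time", "adverbial"),
    Token("申请", "verb", "predicate"),
    Token("年假", "noun", "object"),
]
kept = screen_tokens(tokens)
```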

Preferably, the Chinese text word segmentation model adopts a combined network of a multi-layer Bi-GRU network and a CRF network. The model is trained on Chinese word segmentation data sets including cwb2-data, the People's Daily data set, SIGHAN Bakeoff 2005 and the Microsoft Research Asia (MSRA) data set. The input of the model is a Chinese text, and the output is each target word segment in the Chinese text together with its attribute and part of speech.

Preferably, the Chinese text dependency parsing model adopts a combined network of a two-layer Bi-LSTM network and an MLP network. The model is trained on Chinese dependency parsing data sets including SemEval-2016, CoNLL, Penn Treebank and the Baidu open-source data set. The input of the model is the target word segments, and the output is the part of speech and the attribute of each target word segment in the query text.

Preferably, in step 2-1, the target word segments extracted from each preliminary search result that are prepositions, function words or pronouns are filtered out.

Preferably, the rules and regulations database described in step S2 includes rules and regulations data obtained by scanning physical rule books, and laws and regulations obtained by web crawlers. Scanning the local physical rule books yields unstructured picture data, which an OCR character recognition model converts into structured rules and regulations data. The OCR character recognition model consists of a text detection model and a text recognition model; the backbone network of the text detection model is MobileNet-small-50, and the text recognition model adopts a combined network of a two-layer Bi-LSTM network and a CTC network. The OCR character recognition model uses ICDAR2019-LSVT, ICDAR2017-RCTW-17, Chinese street view character recognition, Chinese document character recognition and ICDAR2019-ArT as its training set and test set. The input of the OCR character recognition model is a picture, and the output is the text content in the picture and the coordinates of the text.

Preferably, the Bert-ESIM text similarity calculation model adopts Chinese text matching data sets including CCKS2018, Chinese SNLI MultiNLI, LCQMC, OCNLI and XNLI as a training set and a test set.

Preferably, the Bert-ESIM text similarity calculation model includes a Transformer model, a Bert model, an ESIM model and a cosine similarity calculator. Its input is a text pair, and its output is the matching degree Y_n of the text pair based on complete semantics.

The method for acquiring the Bert-ESIM text similarity calculation model specifically comprises the following steps:

First, loading the per-layer weight parameters of a Bert model.

Second, initializing each layer of the Bert model with these weight parameters to obtain the Bert Chinese text feature extractor.

Third, replacing the input encoder in the ESIM network with the Bert Chinese text feature extractor.

Fourth, replacing the Softmax component in the ESIM network with a cosine similarity calculator to obtain a Bert-ESIM semi-pre-trained network.

Fifth, fine-tuning, training and testing the Bert-ESIM semi-pre-trained network with the training set and the test set to obtain the Bert-ESIM text similarity calculation model.
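
The effect of replacing the Softmax component with a cosine similarity calculator can be sketched as follows: both texts pass through the same feature extractor, and the two resulting vectors are compared directly. The `toy_encoder` below is only a stand-in (character counts) so that the sketch runs without any model weights; it is not the Bert Chinese text feature extractor, and all names are hypothetical.

```python
import math
from collections import Counter


def toy_encoder(text):
    """Stand-in feature extractor: a sparse vector of character counts."""
    return Counter(text)


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_match(query, content, encoder=toy_encoder):
    """Encode both texts with the same extractor, then compare by cosine."""
    return cosine(encoder(query), encoder(content))


y_same = semantic_match("员工请假", "员工请假")
y_diff = semantic_match("员工请假", "设备采购")
```

In a real system the `encoder` argument would be the fine-tuned feature extractor, and the score would reflect meaning rather than shared characters.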

In a second aspect, the invention provides a deep learning-based rules and regulations retrieval system, which comprises a query text receiving module, a regulation document uploading and processing module, a regulation text splitting and warehousing module, a crawler module, an algorithm module, and a regulation retrieval and display module.

The query text receiving module is used for receiving the query text input by the user and performing basic processing on it. The basic processing comprises segmenting the query text into words and obtaining the part of speech and the attribute of each word segment.

The regulation document uploading and processing module is used for receiving and processing regulation documents of different structures uploaded by a user.

The regulation text splitting and warehousing module is used for splitting structured regulation texts into chapters and sections, integrating the text information of each natural paragraph, and storing the standardized text in the database.

The crawler module is used for collecting the texts of laws and regulations published on the Internet.

The algorithm module is used for analyzing the query text, acquiring detailed information about the retrieved text, and converting unstructured data into structured text. The algorithm module comprises a Chinese text word segmentation algorithm, a Chinese text dependency parsing algorithm, an OCR character recognition algorithm, a Bert-ESIM text similarity calculation algorithm, a BM25 algorithm and a TF-IDF algorithm.

The regulation retrieval and display module is used for integrating the query text receiving module and the algorithm module to obtain the required retrieval results, and for displaying the retrieval results, ranked in descending order, to the user as a Web page.

The invention has the following beneficial effects:

1. The invention introduces an ESIM network when calculating text similarity. In the improved ESIM network, a cosine similarity calculator replaces the Softmax component, and a Bert Chinese text feature extractor replaces the input encoder of the original network. The core idea of the cosine similarity calculator is to measure the similarity of two vectors by the cosine of the angle θ between them. The Softmax component, by contrast, is usually used for multi-class classification and needs the two vectors fused into a single input vector, which weakens the difference between them; the cosine similarity calculator therefore computes text similarity better. Compared with the input encoder of the original network, the Bert Chinese text feature extractor can, through transfer learning, take model parameters trained on a large amount of Chinese text data as its initial parameters, after which the whole Bert-ESIM model is fine-tuned on a self-built rules and regulations data set. In this way, the more complex encoder (Bert) improves feature extraction while the training time and computation time of the model remain under control.

2. The invention provides data support for retrieving rules and regulations through a self-built database, and also provides a document uploading interface so that users (enterprises, public institutions and the like) can upload their internal rules and regulations, which improves the pertinence and matching rate of retrieval. Based on deep learning, a Chinese text word segmentation model, a Chinese text dependency parsing model, an OCR character recognition model and a Bert-ESIM text similarity calculation model are implemented; these provide a way to convert unstructured data into structured text, supply detailed information about the query text for subsequent retrieval, and match the query text against the search results on the basis of both word segmentation and semantics, which improves the matching quality and ultimately achieves intelligent retrieval of rules and regulations.
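
For reference, the cosine measure named in the first beneficial effect can be written out explicitly. The symbols u and v are assumed names for the feature vectors of the two texts; the source does not name them.

```latex
\cos\theta
  = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
  = \frac{\sum_{i} u_i v_i}{\sqrt{\sum_{i} u_i^{2}} \, \sqrt{\sum_{i} v_i^{2}}}
```

The value depends only on the angle between the two vectors and not on their lengths, which is why the two text encodings can be compared directly without first being fused into one vector.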

Drawings

Fig. 1 is a schematic flow chart of the rules and regulations retrieval method provided in embodiment 1 of the present invention;

Fig. 2 is a schematic block diagram of the structure of the rules and regulations retrieval system provided in embodiment 2 of the present invention.

Detailed Description

In order to make the purpose, technical solution and system structure of the present invention more clearly understood, the present invention will be further described in detail with reference to the accompanying drawings and embodiments. The specific embodiments described herein are merely illustrative of the invention and the scope of the invention is not limited to the following.

Example 1

As shown in fig. 1, the present embodiment provides a deep learning-based rules and regulations retrieval method, which aims to solve the technical problem that it is difficult to accurately obtain the specific content of the corresponding regulation from keywords using a general-purpose search engine, and which specifically includes the following steps:

S1, acquiring a query text provided by a user, and inputting the query text into a Chinese text word segmentation model and a Chinese text dependency parsing model to obtain each target word segment in the query text together with its part of speech and attribute. The attributes of a target word segment include subject, predicate, object and complement. The parts of speech include noun, verb, adjective, adverb, conjunction, entity word, preposition, quantifier, person name, place name and time. The target word segments are screened according to their parts of speech and attributes, and those that are a subject, a predicate, an object, an entity word, a time, a place or a quantifier are retained.

The Chinese text word segmentation model adopts a combined network of a multi-layer (three-layer) Bi-GRU network and a CRF network. The model is trained on Chinese word segmentation data sets including cwb2-data, the People's Daily data set, SIGHAN Bakeoff 2005 and the Microsoft Research Asia (MSRA) data set. The input of the model is ordinary Chinese text, and the output is each target word segment in the Chinese text together with its attribute and part of speech (namely noun, verb, adjective, adverb, conjunction, entity word, preposition, quantifier, person name, place name or time).

The Chinese text dependency parsing model adopts a combined network of a two-layer Bi-LSTM network and an MLP network. The model is trained on Chinese dependency parsing data sets including SemEval-2016, CoNLL, Penn Treebank and the Baidu open-source data set. The input of the model is the target word segments (obtained by segmenting the Chinese text with the Chinese text word segmentation model), and the output is the part of speech and the attribute of each target word segment within its sentence.

S2, searching a self-built rules and regulations database according to the original query text and the target word segments screened out in step S1 to obtain N search results and the word-segmentation-based matching degree X_n between each search result and the query text and its target word segments, where N is less than or equal to 100. The specific process is as follows:

2-1, retrieving a plurality of preliminary search results according to the original query text and the target word segments screened out in step S1. Each preliminary search result includes a document-content part (i.e., the specific content of the retrieved document) and a document-title part (i.e., the subtitle of the paragraph to which the retrieved document belongs).

The document-content part and the document-title part of each preliminary search result are separately input into the Chinese text word segmentation model described in step S1 to obtain the target word segments in each preliminary search result together with their parts of speech and attributes. Target word segments that are prepositions, function words or pronouns are then filtered out.

2-2, inputting the target word segments of the query text (query) screened out in step S1 and the target word segments extracted from the document-content part of each preliminary search result into a traditional unsupervised matching algorithm, BM25 or TF-IDF (the vocabulary of the TF-IDF algorithm is built from the self-built rules and regulations database), to obtain the basic matching degree A_n between the query and each preliminary search result (document);

inputting the target word segments of the query text (query) screened out in step S1 and the target word segments extracted from the document-title part of each preliminary search result into a Jaccard similarity matching algorithm to obtain the additional matching degree B_n between the query and each preliminary search result (document).
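
As an illustration of the unsupervised matching in step 2-2, the following is a minimal Okapi BM25 sketch over pre-segmented texts. The parameter values k1 = 1.5 and b = 0.75 are common defaults rather than values from the source, the IDF uses the non-negative log(1 + ...) variant, and the sample documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each segmented document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation: longer documents need more hits to score equally.
            norm = tf[term] + k1 * (1.0 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Hypothetical document-content parts, already segmented.
docs = [
    ["员工", "请假", "审批", "流程"],
    ["设备", "采购", "审批"],
    ["员工", "考勤", "请假", "请假"],
]
query = ["员工", "请假"]
scores = bm25_scores(query, docs)
```

BM25 scores are not bounded by 1, so an implementation may wish to normalise A_n before combining it with B_n; the source does not state whether this is done.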

2-3, from the basic matching degree A_n and the additional matching degree B_n calculated in step 2-2, calculating the word-segmentation-based matching degree between the query and each preliminary search result (document) by weighting: X_n = c·A_n + (0.5 - c)·B_n, where c is a first weight coefficient with a value range of 0 to 0.5, whose specific value is chosen according to the emphasis and requirements of the actual retrieval. The preliminary search results are sorted by X_n in descending order, and the N best word-segmentation-based search results (documents) are screened out, where N is less than or equal to 100.
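
Step 2-3 can be sketched as a weighted sum followed by a top-N selection. The function name, the choice c = 0.3 and the sample scores are assumptions for illustration; the scores are taken to lie between 0 and 1 here, which the source does not require.

```python
import heapq


def top_n_by_token_score(a_scores, b_scores, c=0.3, top_n=100):
    """Compute X_n = c*A_n + (0.5 - c)*B_n and keep the top_n best results.

    Returns (index, X_n) pairs sorted from the highest score to the lowest.
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("c must lie in the range 0 to 0.5")
    if len(a_scores) != len(b_scores):
        raise ValueError("A_n and B_n must cover the same search results")
    x = [c * a + (0.5 - c) * b for a, b in zip(a_scores, b_scores)]
    best = heapq.nlargest(top_n, range(len(x)), key=x.__getitem__)
    return [(i, x[i]) for i in best]


# Hypothetical basic (A_n) and additional (B_n) matching degrees.
a_scores = [0.8, 0.2, 0.5]
b_scores = [0.1, 0.9, 0.5]
top = top_n_by_token_score(a_scores, b_scores, c=0.3, top_n=2)
```

Limiting the candidates at this stage keeps the more expensive semantic comparison of step S3 to at most N text pairs.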

The self-built rules and regulations database includes rules and regulations data obtained by scanning local physical rule books, and public laws and regulations obtained by a web crawler. Scanning the local physical rule books yields a large amount of unstructured picture data, which an OCR character recognition model converts into structured rules and regulations data. The OCR character recognition model consists of a text detection model and a text recognition model; the backbone network of the text detection model is MobileNet-small-50, and the text recognition model adopts a combined network of a two-layer Bi-LSTM network and a CTC network. The OCR character recognition model is obtained by using ICDAR2019-LSVT, ICDAR2017-RCTW-17, Chinese street view character recognition, Chinese document character recognition, ICDAR2019-ArT and some synthesized data as its training set and test set. The input of the OCR character recognition model is a picture, and the output is the text content in the picture and the coordinates of the text.

S3, using the Bert-ESIM model to calculate the text similarity (short text to long text) between the original query text (query) input by the user and the document-content part of each of the N word-segmentation-based search results screened out in step S2, thereby obtaining the matching degree Y_n, based on complete semantics, between each of the N search results and the original query text. The specific process is as follows:

The backbone network of the Bert-ESIM text similarity calculation model consists of a Transformer model, a Bert model, an ESIM model and a cosine similarity calculator. The model uses open-source Chinese text matching data sets, including CCKS2018, Chinese SNLI MultiNLI, LCQMC, OCNLI and XNLI, as its training set and test set; a usable model is obtained after training and testing. The input of the model is a text pair consisting of the query text (query) and the document-content part of a search result, and the output is the matching degree Y_n, based on complete semantics, between the query and the document-content part of the search result.

The per-layer weight parameters of a Bert model trained on a large amount of Chinese text are loaded.

The Bert Chinese text feature extractor then replaces the input encoding part of the ESIM network.

In the Bert-ESIM text similarity calculation model, the basic component for feature extraction is the Transformer component (mainly an Encoder-Decoder structure). Twelve Transformer layers form the Bert Chinese text feature extractor; Bert then replaces the input encoding part of the original ESIM, and finally a cosine similarity calculator replaces the Softmax component. Because the Bert network is complex and has many parameters, transfer learning is adopted in the specific implementation, after which the whole Bert-ESIM network is fine-tuned and trained on the Chinese text matching data sets to achieve the best result.

S4, calculating the composite matching degree between each of the N word-segmentation-based search results (documents) and the query text as Z_n = d·X_n + (0.5 - d)·Y_n, where d is a second weight coefficient with a value range of 0 to 0.5, whose specific value is determined according to the emphasis and requirements of the actual retrieval. The N search results are sorted by composite matching degree Z_n in descending order, returned to the Web front end in that order, and displayed to the user.

Example 2

As shown in fig. 2, a deep learning-based rules and regulations retrieval system comprises:

Query text receiving module: used for receiving the query text input by a user and performing basic processing on it. The basic processing comprises performing word segmentation and dependency parsing on the query text to obtain its target word segments and their parts of speech and attributes.

Regulation document uploading and processing module: used for receiving and processing regulation documents of different structures (TXT, PDF, pictures and the like) uploaded by a user. The module also converts unstructured data (such as pictures) into structured text data through an OCR character recognition interface.

Regulation text splitting and warehousing module: used for splitting the structured regulation text into chapters and sections, integrating the text information of each natural paragraph (the content of the paragraph, the chapter to which it belongs, and the subtitle of the nearest section or chapter), and storing the standardized text in the database.

Crawler module: used for collecting the texts of laws and regulations published on the Internet, providing a data source for building the rules and regulations database. The module mainly collects data from certain specific websites to obtain the corresponding laws and regulations data.

Algorithm module: used for analyzing the query text, acquiring detailed information about the retrieved text, and converting unstructured data into structured text. It comprises a Chinese text word segmentation algorithm, a Chinese text dependency parsing algorithm, an OCR character recognition algorithm, a Bert-ESIM text similarity calculation algorithm, a BM25 algorithm and a TF-IDF algorithm.

Regulation retrieval and display module: used for integrating the query text receiving module and the algorithm module to obtain the required retrieval results, and for displaying the retrieval results, ranked in descending order, to the user as a Web page.
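
The work of the text splitting and warehousing module can be sketched as follows: each natural paragraph is stored with the chapter it belongs to and the nearest preceding chapter or section heading. The heading pattern (第X章 for chapters, 第X节 for sections), the record field names and the sample text are all assumptions; real regulation documents may use other heading conventions.

```python
import re

# Assumed heading convention: "第X章 ..." for chapters, "第X节 ..." for sections.
HEADING = re.compile(r"^第[一二三四五六七八九十百零0-9]+([章节])")


def split_regulation(text):
    """Return one record per natural paragraph with its chapter and nearest heading."""
    records = []
    chapter = None
    subtitle = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = HEADING.match(line)
        if m:
            subtitle = line          # nearest chapter or section heading
            if m.group(1) == "章":
                chapter = line       # a new chapter starts here
            continue
        records.append({"chapter": chapter, "subtitle": subtitle, "content": line})
    return records


sample = "\n".join([
    "第一章 总则",
    "为规范考勤管理,制定本制度。",
    "第二节 请假",
    "员工请假须提前申请。",
])
records = split_regulation(sample)
```

Records of this shape line up with the document-title and document-content parts used during retrieval, with `subtitle` playing the role of the title.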

The foregoing describes the preferred embodiments of the invention. It is to be understood that the invention is not limited to the precise form disclosed herein, and that various other combinations, modifications and environments may be used within the scope of the concept disclosed herein, whether as described above or as apparent to those skilled in the relevant art. Modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A deep learning-based rules and regulations retrieval method, characterized by comprising: S1, acquiring a query text provided by a user, and inputting the query text into a Chinese text word segmentation model to obtain each target word segment in the query text; inputting each target word segment into a Chinese text dependency parsing model to obtain the part of speech and the attribute of each target word segment; screening the target word segments according to their parts of speech and attributes;

S2, searching the rules and regulations database to obtain a plurality of search results, calculating the word-segmentation-based matching degree X_n of each search result, and then screening out the N search results that meet the requirements;

2-1, retrieving a plurality of preliminary search results according to the original query text and the target word segments screened out in step S1; each preliminary search result comprises a document-content part and a document-title part; the document-content part is the specific content of the search result; the document-title part is the title or subtitle of the paragraph to which the search result belongs; inputting each preliminary search result into the Chinese text word segmentation model and the Chinese text dependency parsing model of step S1 to obtain the target word segments in each preliminary search result together with their parts of speech and attributes;

2-2, inputting the target word segments of the query text screened out in step S1 and the target word segments extracted from the document-content part of each preliminary search result into an unsupervised matching algorithm to obtain the basic matching degree A_n between the query text and each preliminary search result;

inputting the target word segments of the query text screened out in step S1 and the target word segments extracted from the document-title part of each preliminary search result into a Jaccard similarity matching algorithm to obtain the additional matching degree B_n between the query text and each preliminary search result;

2-3, calculating the word-segmentation-based matching degree between the query text and each preliminary search result as X_n = c·A_n + (0.5 - c)·B_n, where c is a first weight coefficient with a value range of 0 to 0.5; screening out a plurality of word-segmentation-based search results according to X_n;

S3, using a Bert-ESIM text similarity calculation model to calculate the matching degree Y_n, based on complete semantics, between the query text and each word-segmentation-based search result screened out in step S2; the Bert-ESIM text similarity calculation model comprises an improved ESIM network, in which a cosine similarity calculator replaces the Softmax component and a Bert Chinese text feature extractor replaces the input encoder;

S4, calculating the composite matching degree between each of the N search results and the query text as Z_n = d·X_n + (0.5 - d)·Y_n, where d is a second weight coefficient with a value range of 0 to 0.5; sorting the N search results by composite matching degree Z_n in descending order and outputting them.

2. The deep learning-based rules and regulations retrieval method of claim 1, wherein: the attributes of a target word segment comprise subject, predicate, object, attributive, adverbial and complement; the parts of speech of a target word segment comprise noun, verb, adjective, adverb, conjunction, entity word, preposition, quantifier, person name, place name and time.

3. The deep learning-based rules and regulations retrieval method of claim 1, wherein: in step S1, target word segments that are a subject, a predicate, an object, an entity word, a time, a place or a quantifier are retained.

4. The deep learning-based rules and regulations retrieval method of claim 1, wherein: the Chinese text word segmentation model adopts a combined network of a multi-layer Bi-GRU network and a CRF network; the model is trained on Chinese word segmentation data sets comprising cwb2-data, the People's Daily data set, SIGHAN Bakeoff 2005 and the Microsoft Research Asia (MSRA) data set; the input of the model is a Chinese text, and the output is each target word segment in the Chinese text together with its attribute and part of speech.

5. The deep learning-based regulation retrieval method of claim 1, wherein: the Chinese text dependency syntax analysis model adopts a combined network of a double-layer Bi-LSTM network and an MLP network; the Chinese text dependency syntax analysis model is obtained by training on Chinese dependency syntax analysis data sets comprising SemEval-2016, CoNLL, Penn Treebank and a Baidu open-source data set; the input of the Chinese text dependency syntax analysis model is the target word segments, and the output is the part of speech and attribute of each target word segment in the query text.

6. The deep learning-based regulation retrieval method of claim 1, wherein: in step 2-1, prepositions, function words and pronouns among the target word segments extracted from each preliminary retrieval result are filtered out.

7. The deep learning-based regulation retrieval method of claim 1, wherein: the regulation database described in step S2 comprises regulation data obtained by scanning physical regulation books, and laws and regulations obtained by a web crawler; the local physical regulation books yield unstructured picture data after scanning; the unstructured picture data is converted into structured regulation data by an OCR character recognition model; the OCR character recognition model is composed of a text detection model and a text recognition model, wherein the backbone network of the text detection model adopts MobileNet-small-50, and the text recognition model adopts a combined network of a double-layer Bi-LSTM network and a CTC network; the OCR character recognition model takes ICDAR2019-LSVT, ICDAR2017-RCTW-17, Chinese street view character recognition, Chinese document character recognition and ICDAR2019-ArT as a training set and a test set; the input of the OCR character recognition model is a picture, and the output is the character content in the picture and the coordinates of the characters.
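To illustrate how a CTC output layer such as the one named in claim 7 can be turned into text, the sketch below applies best-path (greedy) CTC decoding: take the highest-scoring class per frame, merge consecutive repeats, and drop blanks. The claims do not state which decoding strategy is used, so greedy decoding, the blank index 0, the function name and the toy scores are all assumptions made for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], charset: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    decoded: List[str] = []
    previous = BLANK
    for scores in frame_scores:
        best = max(range(len(scores)), key=lambda k: scores[k])
        if best != BLANK and best != previous:
            # Class 0 is the blank, so class k maps to charset[k - 1].
            decoded.append(charset[best - 1])
        previous = best
    return "".join(decoded)


# Toy per-frame scores over the classes (blank, 法, 规).
frames = [
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
text = ctc_greedy_decode(frames, charset="法规")
```

The blank frame between the two runs of 法 is what allows a genuinely repeated character to survive the merge step.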

8. The deep learning-based regulation retrieval method of claim 1, wherein: the Bert-ESIM text similarity calculation model adopts Chinese text matching data sets comprising CCKS2018, Chinese SNLI MultiNLI, LCQMC, OCNLI and XNLI as a training set and a test set.

9. The deep learning-based regulation retrieval method of claim 1, wherein: the Bert-ESIM text similarity calculation model comprises a Transformer model, a Bert model, an ESIM model and a cosine similarity calculator; the input of the Bert-ESIM text similarity calculation model is a text pair, and the output is the matching degree Y_n of the text pair based on complete semantics;

calling each layer of weight parameters of the Bert model;

initializing each layer of weight parameters in the Bert model to obtain a Bert Chinese text feature extractor;

replacing an input encoder in the ESIM network by adopting a Bert Chinese text feature extractor;

replacing a Softmax component in the ESIM network by adopting a cosine similarity calculator to obtain a Bert-ESIM semi-pre-training network;
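The cosine similarity calculator that takes the place of the Softmax component can be sketched in isolation. The example assumes the network has already reduced each text of the pair to a fixed-length feature vector; how those vectors are produced, and the function name `cosine_similarity`, are not specified by the claims and are assumptions here.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical pooled features for a query text and one retrieval result.
y_n = cosine_similarity([0.2, 0.7, 0.1], [0.25, 0.6, 0.05])
```

Unlike a Softmax classifier head, which yields class probabilities, this produces a single continuous score per text pair, which is the form a matching degree Y_n requires.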

10. A deep learning-based regulation retrieval system, comprising a query text receiving module, a regulation document uploading and processing module, a regulation text splitting and warehousing module, a crawler module, an algorithm module, and a regulation retrieval and display module; characterized in that: the query text receiving module is used for receiving a query text input by a user and performing basic processing on the query text; the basic processing comprises segmenting the query text into words and acquiring the part of speech and attribute of each word segment;

the regulation document uploading and processing module is used for receiving and processing regulation documents of different structures uploaded by the user;

the regulation text splitting and warehousing module is used for splitting structured regulation texts into chapters and sections, integrating the text information of each natural paragraph, and warehousing the resulting standardized texts;

the crawler module is used for collecting texts of laws and regulations published on the Internet;

the algorithm module is used for analyzing the query text, acquiring detailed information of the retrieval text and converting unstructured data into a structured text; the algorithm module comprises a Chinese text word segmentation algorithm, a Chinese text dependency syntax analysis algorithm, an OCR character recognition algorithm, a Bert-ESIM text similarity calculation algorithm, a BM25 algorithm and a TF-IDF algorithm;