CN115544211A - Method for foreign-related law indexing and industry risk assessment in foreign trade

Info

Publication number: CN115544211A
Application number: CN202211335205.6A
Authority: CN
Inventors: 车流畅; 徐祎涵; 裴兆斌; 冉令博; 韩炎津; 刘亚芳; 张菁芃; 张睿涵; 王旭; 韩雪
Original assignee: Shenyang Normal University
Current assignee: Shenyang Normal University
Priority date: 2022-10-28
Filing date: 2022-10-28
Publication date: 2022-12-30

Abstract

The invention provides a method for foreign-related law indexing and industry risk assessment in foreign trade, comprising two parts: foreign-related law indexing and industry risk assessment. The foreign-related law indexing part comprises: acquiring foreign-related legal document data, preprocessing the data, constructing an index similarity network of the foreign-related legal documents, and ranking the documents by citation similarity to produce a list of related legal documents and cases that meet the index requirements. The industry risk assessment part comprises: identifying the factors that influence legal risk, selecting a plurality of them as legal risk factors, establishing a hierarchical legal risk factor set, constructing a judgment matrix from that set, determining evaluation weights from the judgment matrix, establishing a fuzzy comprehensive evaluation matrix, and evaluating the legal risk level. The method can help foreign trade enterprises quickly find the causes of legal risk and evaluate the legal risk they face.

Description

Method for foreign-related law indexing and industry risk assessment in foreign trade

Technical Field

The invention relates to the field of computers, in particular to a method and a system for indexing data, and more particularly to a method for foreign-related law indexing and industry risk assessment in foreign trade.

Background

Foreign trade connects the domestic and international markets and plays an important role in building a new development pattern. China has continued to promote innovation in its foreign trade system: trade in goods has grown by leaps, the trade structure has been steadily optimized, international markets have kept expanding, and foreign trade has made an important contribution to economic and social development. The most important factors in foreign trade are cost and safety, so an enterprise that goes abroad without understanding foreign-related law cannot go far. When a foreign trade enterprise does not consider problems from a legal perspective and neglects overseas intellectual property management, organizational management, and human resource management, legal risks arise easily. When investment disputes occur, or when problems such as environmental protection, intellectual property, labor, and contract management arise during operation, it is particularly important to be able to quickly look up the relevant legal documents and cases of the host country. In Anglo-American (common law) systems in particular, legal questions are resolved by reference to case law, so legal reasoning and decision making depend heavily on information stored in text documents. The foreign trade legal services industry is still at an early stage. Although it has developed over a long period, it still follows the traditional model in which enterprises retain professional lawyers as legal counsel, with little change or development, and the industry faces problems and risks such as fragmentation and dispersed scale.

Many online legal databases provide convenient access to such legal documents. These databases let users search by legal terms, but such searches require queries to be formulated very precisely using domain-specific terminology. With the growth of online legal databases, legal information retrieval has become central to many statute and case lookup processes. A large portion of this online legal data is unstructured text. The legal domain is highly complex, and the lookup process relies heavily on legal experts' interpretation of knowledge. The legal field stores a huge amount of information in the form of texts and documents, which can be categorized under headings such as court records, judgments, and statements. These documents are a valuable repository of information about legal interpretation, and foreign trade work involving foreign-related law must refer to them. Because the legal knowledge contained in legal documents is complex, traditional document lookup has limited effectiveness. It is necessary to establish the relevance and similarity between two cases as set out in the various legal documents. It is therefore necessary to improve foreign-related legal risk management in foreign trade, to look up the relevant statutes and cases, to identify the factors that influence legal risk, and then to evaluate foreign-related legal risk in foreign trade.

Disclosure of Invention

The invention provides a method for foreign law indexing and industry risk assessment for external trade, which is realized by adopting the following technical scheme in order to solve the technical problems.

The invention discloses a method for external trade foreign-involved law indexing and industry risk assessment, which comprises the following steps: external trade foreign law indexing S1 and industry risk assessment S2; wherein,

the foreign-related law indexing step S1 comprises:

S11, collecting foreign-related legal document data;

S12, preprocessing the foreign-related legal document data;

S13, constructing a vector space model from the preprocessed foreign-related legal document data;

S14, constructing an index similarity network of the foreign-related legal documents;

S15, producing a list of related legal documents and cases that meet the index requirements, ranked by citation similarity;

the industry legal risk assessment step S2 comprises:

S21, identifying the factors that influence legal risk;

S22, selecting a plurality of factors as legal risk factors and establishing a hierarchical legal risk factor set;

S23, constructing a judgment matrix from the established legal risk factor set;

S24, determining evaluation weights from the constructed judgment matrix;

and S25, establishing a fuzzy comprehensive evaluation matrix and evaluating the legal risk level.

Further, the data of the foreign-involved legal documents in step S11 includes a query set and a corresponding document set, and the query set and the document set are subdivided into separate file sets.

Further, the preprocessing techniques in step S12 include: tokenization, stop-word removal, punctuation removal, and stemming.

Further, legal risk factors include internal and external influencing factors.

Further, the survey is completed by k experts, and the relative importance of each legal risk factor index is scored using a ratio scale method.

Further, the overall legal risk level is evaluated based on the maximum of all elements in the evaluation matrix.

The method for foreign-related law indexing and industry risk assessment in foreign trade rests on the observation that citation similarity is closer to human experts' assessment of the similarity of legal documents. One-to-one links are not taken as the only indicator of similarity; whether a path exists from one node to another is also considered when determining similarity. Citation network analysis can thus be used effectively to estimate similarity, the relationships among legal concepts can be understood through citation links, and the network can be analyzed further by applying link analysis algorithms. In addition, the method addresses the multi-criteria evaluation problem in a fuzzy environment: the evaluation method identifies the legal risks of foreign trade enterprises, helps them quickly find the causes of legal risk, and evaluates their legal risk.

Drawings

FIG. 1 is a flow chart of a method for external trade foreign law indexing and industry risk assessment according to the present invention.

FIG. 2 is a schematic diagram of a structure of a target document vector for external trade foreign law indexing according to the present invention.

FIG. 3 is a risk factor diagram of a method for external trade foreign law indexing and industry risk assessment according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The invention provides a method for external trade foreign-involved law indexing and industry risk assessment, which is realized by adopting the following technical scheme in order to solve the technical problems.

A method for external trade foreign law indexing and industry risk assessment comprises the following steps: external trade foreign law indexing S1 and industry risk assessment S2; wherein,

S11, collecting foreign-related legal document data:

Judgments in legal cases and legal documents from the relevant countries form a data corpus comprising a query set and a corresponding document set, and the query set and the document set are subdivided into separate document sets.

S12, preprocessing the data of the foreign legal documents:

The foreign-related legal document data set consists of a query set and a document set. Each query file is split into individual queries so that similarity can be measured for each query separately.

Foreign-related legal documents are completely unstructured, so language preprocessing is required to convert the unstructured data into suitable structured information. The preprocessing techniques comprise tokenization, stop-word removal, punctuation removal, and stemming; case judgments are stemmed and normalized. A term-document matrix is then constructed from the resulting term scores. The citation data of each case judgment is recorded separately.
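The preprocessing steps named above can be illustrated with a minimal Python sketch. It is not the patented implementation: the stop-word list is a tiny hypothetical sample, and `simple_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import string

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "by", "for"}

# Map every punctuation character to a space so hyphenated words split cleanly.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def simple_stem(token: str) -> str:
    """Crude suffix stripping (assumption: stands in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list:
    """Punctuation removal, tokenization, stop-word removal, then stemming."""
    tokens = text.translate(_PUNCT_TABLE).lower().split()
    return [simple_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("The court dismissed the claims, citing earlier rulings.")
print(tokens)
```

The output of `preprocess` is a flat list of normalized terms, which is the form the later term-counting steps expect.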

Backed by a comprehensive database, the system provides users with self-service retrieval of relevant information such as judicial cases, policies and regulations, and legal provisions. It integrates the legal information relevant to the foreign trade industry, collects information accurately and comprehensively, and improves retrieval efficiency.

S13, constructing a vector space model based on the preprocessed foreign-related legal document data:

When information in foreign-related legal documents is retrieved using unstructured documents and queries, the documents most similar to a given user query are found and returned in order of similarity. In the vector space model, the text of the foreign-related legal documents and of the queries is converted into numeric vectors.

The vector space model comprises three model types: a bag-of-words model, a document vector model, and a word vector model. The bag-of-words model provides numerical statistics that reflect how important certain terms in the corpus are to a foreign-related legal document. Document indexing in the vector space takes the form of multi-label learning: the input space is a feature space $\chi$ of the foreign-related legal documents D, and the output space is the power set $2^{\gamma}$ of a finite set $\gamma$ of n numeric vectors. Given a training corpus, the task is to learn a function that predicts the numeric vectors of unseen documents.

Word embedding is a dense representation of words as numeric vectors that reveals many hidden relationships between words.

In the term frequency-inverse document frequency (TF-IDF) approach, each term or phrase in the vocabulary or corpus is represented by a distinct, orthogonal dimension. The frequency of a term in a text is measured and multiplied by the logarithm of the inverse document frequency of that term across the whole corpus. A plain TF-IDF representation is difficult to modify, for example when new documents that introduce additional dimensions are added. To address this, an improved TF-IDF method is provided that adds an index and adjusts the weights according to how long a word has been in use along the corpus time axis.

The term frequency tf(t, d) is the number of times a term t from the vocabulary appears in a foreign-related legal document d. It is a counting function:

$$\mathrm{tf}(t, d) = \sum_{x \in d} \mathrm{fr}(x, t)$$

where fr(x, t) is defined as:

$$\mathrm{fr}(x, t) = \begin{cases} 1, & x = t \\ 0, & \text{otherwise} \end{cases}$$

Since foreign-related legal documents differ in length, the term frequency is normalized by dividing it by the length of the document, that is, the total number of terms in the document:

$$\mathrm{TF}(t, d) = \frac{C_t}{D_l}$$

where $C_t$ is the count of term t in the document and $D_l$ is the length of the foreign-related legal document.

The inverse document frequency IDF(t, D) measures how much information a term carries across the set D of foreign-related legal documents, indicating whether the term is rare or common in the corpus. It is the logarithm of the ratio of the total number of documents to the number of documents containing the term:

$$\mathrm{IDF}(t, D) = \log_e \frac{C_d}{C_{dt}}$$

where $C_d$ is the total number of foreign-related legal documents and $C_{dt} = |\{d \in D : t \in d\}|$ is the number of documents containing t. In practice, 1 is added to $C_{dt}$ in the denominator to avoid a division-by-zero error.

Thus, the combined weight is:

$$\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \times \mathrm{idf}(t, D)$$
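The definitions above can be checked on a toy corpus. The following is a minimal Python sketch, assuming documents are already tokenized lists; the function names and sample corpus are illustrative, and the add-one term in the denominator follows the division-by-zero remark in the text.

```python
import math


def tf(term, doc_tokens):
    """Length-normalized term frequency: count of term / document length."""
    return doc_tokens.count(term) / len(doc_tokens)


def idf(term, corpus):
    """Natural-log inverse document frequency with 1 added to the denominator."""
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / (1 + containing))


def tf_idf(term, doc_tokens, corpus):
    return tf(term, doc_tokens) * idf(term, corpus)


# Hypothetical four-document corpus of already preprocessed tokens.
corpus = [
    ["contract", "breach", "damages"],
    ["patent", "infringement", "damages"],
    ["labor", "contract", "dispute"],
    ["customs", "tariff", "dispute"],
]
print(tf_idf("patent", corpus[1], corpus))
```

Note that with the add-one adjustment a term occurring in every document gets a slightly negative IDF; whether that is acceptable depends on how the scores are used downstream.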

Foreign-related legal documents are represented by sets of words that form a term-document matrix. Preprocessing also includes stemming and lemmatization to form terms, and each document is modeled by counting the occurrences of each term. The bag-of-words model represents a text without regard to word order or syntax.

In the term-document matrix, each row represents a term and each column represents a foreign-related legal document. The entry $w_{ij}$ is the number of times term i appears in document j. For example, $w_{3,11} = 29$ indicates that term 3 appears 29 times in the 11th document of the set. If the input is a collection of n foreign-related legal documents containing w distinct terms, the bag-of-words representation is a w × n matrix.
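As a small illustration of the matrix layout just described (rows are terms, columns are documents), here is a hedged Python sketch. The helper name `term_document_matrix` and the sample documents are hypothetical.

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term i in document j."""
    vocab = sorted({token for doc in docs for token in doc})
    counts = [Counter(doc) for doc in docs]
    # Counter returns 0 for missing terms, so absent terms need no special case.
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


docs = [["contract", "breach", "contract"], ["patent", "breach"]]
vocab, matrix = term_document_matrix(docs)
for term, row in zip(vocab, matrix):
    print(term, row)
```

A dense list of lists is used only for readability; a real corpus would normally need a sparse representation.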

The word vector model is a family of related models for generating word embeddings, in which a neural network is trained to reconstruct the linguistic context of words. The model produces a vector space of considerable dimensionality from a large text corpus, and each distinct term in the corpus is assigned a corresponding vector in that space. The word vectors are positioned so that words sharing common contexts in the corpus lie close to one another in the space. The main purpose of the word vector model is to learn a distributed representation of each target term from its context. Each embedding dimension represents a latent feature of the word, and cosine similarity can be used to compare vectors. When the model is initialized, words whose frequency falls below a minimum count (for example, below 20) are discarded.

Because the continuous bag-of-words (CBOW) algorithm is suited to larger data sets, the model is trained with CBOW. CBOW works as follows: given a context, specified by a single word or a group of words, it predicts the probability of a word. The model predicts the target word for the given context words within a sliding window. The network consists of an input layer, a hidden layer, and an output layer. The input to the neural network is a one-hot encoded vector $\{x_1, \dots, x_V\}$ for the given context word, in which exactly one component is 1 and all others are 0. W denotes the V × N weight matrix between the input layer and the hidden layer. Each row of W is the N-dimensional vector $v_w$ of the corresponding input word; row i of W is written $v_w^{T}$. Thus, given a context word with $x_k = 1$ and $x_{k'} = 0$ for $k' \neq k$, the following is obtained:

$$h = W^{T} x = W_{(k,\cdot)}^{T} = v_{w_I}$$

where $w_I$ is the input word, represented by the vector $v_{w_I}$. In other words, the k-th row of the matrix W is copied to h. When the model includes a scalar bias, the weighted sum of the input layer plus the bias is passed to the hidden layer.

From the hidden layer to the output layer there is a different weight matrix $W' = \{w'_{ij}\}$ of size N × V. N is the dimensionality of the word vectors; it is a free hyperparameter of the network and equals the number of neurons in the hidden layer. No non-linear activation function is applied between the layers, so the weighted input is passed on directly as the hidden activation h. A score $u_j$ is computed for each word in the training vocabulary as the dot product of h and the hidden-to-output weights $v'_{w_j}$, the j-th column of $W'$:

$$u_j = {v'_{w_j}}^{T} h$$

The output of the output layer is then computed. The output $y_j$ is obtained by passing $u_j$ through a softmax function:

$$p(w_j \mid w_I) = y_j = \frac{\exp(u_j)}{\sum_{j'=1}^{V} \exp(u_{j'})}$$

Combining the above formulas gives:

$$p(w_j \mid w_I) = \frac{\exp\left({v'_{w_j}}^{T} v_{w_I}\right)}{\sum_{j'=1}^{V} \exp\left({v'_{w_{j'}}}^{T} v_{w_I}\right)}$$

The steps above are the forward pass. They are followed by a backpropagation step that computes the loss function and learns the weight matrices. The weights in all layers, that is, the input layer, the hidden layer, and the output layer, are updated by computing the errors and repeatedly readjusting the weights. For each word pair, the loss is minimized by maximum likelihood estimation in the form of a cross-entropy loss. CBOW minimizes the average negative log probability of each target word $w_t$ given its context, where T is the number of training words:

$$-\frac{1}{T} \sum_{t=1}^{T} \log p\left(w_t \mid \mathrm{context}(w_t)\right)$$

Specifically, the dimension of the feature vector is set to 200. After the vocabulary and the training input data are constructed, learned word vector representations are produced for the test set documents.
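The forward pass of the continuous bag-of-words model described above (select a row of the input matrix, score every vocabulary word, apply softmax) can be sketched in plain Python. This is an illustrative sketch only: the tiny sizes, random weights, and function names are assumptions, and averaging the rows for several context words is the usual CBOW convention rather than something the text spells out.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbow_forward(context_ids, W_in, W_out):
    """W_in is V x N (input to hidden); W_out is N x V (hidden to output)."""
    n_hidden = len(W_in[0])
    vocab_size = len(W_out[0])
    # Hidden layer: mean of the W_in rows picked out by the one-hot context words.
    h = [sum(W_in[k][i] for k in context_ids) / len(context_ids)
         for i in range(n_hidden)]
    # Score u_j: dot product of h with column j of W_out.
    u = [sum(h[i] * W_out[i][j] for i in range(n_hidden))
         for j in range(vocab_size)]
    return softmax(u)


def negative_log_likelihood(context_ids, target_id, W_in, W_out):
    """Cross-entropy loss for one (context, target) training pair."""
    return -math.log(cbow_forward(context_ids, W_in, W_out)[target_id])


random.seed(0)
V, N = 5, 3  # hypothetical vocabulary size and embedding dimension
W_in = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(N)]
y = cbow_forward([1, 3], W_in, W_out)
print(y, negative_log_likelihood([1, 3], 2, W_in, W_out))
```

Training would repeat this forward pass and update both matrices by backpropagation; that step is omitted here.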

The document vector model is an unsupervised algorithm that generates vectors for foreign-related legal documents; it is a variant of the word vector model that creates vectors for words. The vectors it generates are used to find similarity between foreign-related legal documents. The model randomly samples consecutive words from a paragraph and predicts the center word of the sampled set, taking the context words and the paragraph id as input. Similar foreign-related legal documents can then be identified in the vector space. The training objective of the document vector model is simply to maximize the probability of predicting the target word given the context words and the foreign-related legal document as context.

Here, $W = [w_1, w_2, w_3, \dots, w_T]$ is the sequence of training words, and T is the number of training words. Correspondingly, $D = [d_1, d_2, d_3, \dots, d_T]$ is a sequence of documents. $w_t$ is the target vector corresponding to $x_{i+3}$ in FIG. 2, that is, $w_t := x_{i+3}$.

The goal of training the continuous bag-of-words model is to minimize a loss function, associated with a classifier, with respect to the word embeddings and the classifier parameters, so that neighboring words can be predicted from one another. The model according to FIG. 2 minimizes the following average negative log probability:

The continuous bag-of-words training minimizes this objective function and estimates the loss with noise-contrastive estimation, which uses a logistic regression classifier to distinguish target words from noise samples. A sample is drawn from a true distribution consisting of the true class and a number of noise class labels. Noise-contrastive estimation takes the input word set $w_I$ and aims to predict the output word $w^{u}$. N other words are sampled from the noise sample distribution, which is denoted Q.

The loss function is:

To classify location-sensitive foreign-related legal documents in the vector space, the document (or paragraph) vectors are trained on words using differential training. To generate a location-agnostic numeric vector for a foreign-related legal document, a set of words from a specific context and from a general context is trained. Both general words (index words that do not describe the nature of the document) and specific words (index words that do describe the nature of the document) are considered. This joint objective is represented by the following formula:

The probability of each foreign-related legal document feature vector is then generated using tuning parameters determined by multithreaded gradient computation and critical-section weight updates:

With differential training, different emphasis is given to words drawn from the specific and the general contexts. Location-sensitive foreign-related legal documents that are very similar to one another are then grouped using measures such as cosine similarity, which yields a vector space classification scheme. The model is extended by training and attaching index words to each foreign-related legal document feature vector, so that a user can view the index words associated with each document vector in the vector space.

S14, constructing an index similarity network of the foreign-involved legal documents:

The cosine and Jaccard similarities between the set of foreign-related legal documents and the set of queries are measured. The cosine similarity is the cosine of the angle between the query vector and the document vector, as shown in the following formula:

$$\cos(q, d) = \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert}$$

The numerator is the dot product of the query vector q and the document vector d, and the denominator is the product of the Euclidean lengths of the query vector q and the foreign-related legal document vector d.

The Jaccard coefficient is the size of the intersection of the foreign-related legal document and the query divided by the size of their union, as shown by the following equation:

$$J(q, d) = \frac{|q \cap d|}{|q \cup d|}$$

The set of foreign-related legal documents is then built into a network, and the information obtained in preprocessing is represented by appropriate nodes and edges. Each node represents a legal document or a case, and an edge between two nodes represents the relevance between them. Edge weights are very important in citation network analysis, so each edge is assigned a weight as a measure of relevance; the similarity values are used as the edge weights when the network is constructed. In the resulting network, the existence of a direct link or a path from one node to another indicates similarity.
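The two similarity measures and the weighted network can be sketched in Python as follows. This is an illustration under stated assumptions: `build_network` and its `threshold` cutoff are hypothetical (the text only says similarity values become edge weights), and edges are treated as undirected.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two token sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(vectors, threshold=0.5):
    """vectors maps document id to vector; returns {(id_a, id_b): weight}."""
    edges = {}
    for (i, u), (j, v) in combinations(vectors.items(), 2):
        weight = cosine(u, v)
        if weight >= threshold:  # hypothetical cutoff to keep the graph sparse
            edges[(i, j)] = weight
    return edges


vectors = {"A": [1.0, 0.0], "B": [1.0, 0.1], "C": [0.0, 1.0]}
print(build_network(vectors))
print(jaccard(["contract", "breach"], ["breach", "damages"]))
```

Without a cutoff every pair of documents with non-zero similarity would be linked, so some threshold or top-k rule is usually needed in practice.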

S15, producing a list of related legal documents and cases that meet the index requirements, ranked by citation similarity.
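One possible reading of this ranking step is sketched below in Python: documents are sorted by their similarity to the query, and each result is flagged if it is connected to the top hit by a direct link or a path in the similarity network. The function names, the undirected treatment of edges, and the flagging rule are assumptions for illustration, not the claimed procedure.

```python
from collections import deque


def reachable(edges, start):
    """Breadth-first search: all nodes connected to start by a link or a path."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen


def rank(query_scores, edges, top_k=3):
    """Return (doc_id, score, linked_to_top_hit) tuples, best score first."""
    ranked = sorted(query_scores, key=lambda d: query_scores[d], reverse=True)
    related = reachable(edges, ranked[0]) if ranked else set()
    return [(d, query_scores[d], d in related) for d in ranked[:top_k]]


# Hypothetical query similarities and network edges.
scores = {"A": 0.9, "B": 0.4, "C": 0.7, "D": 0.1}
edges = {("A", "B"): 0.8, ("B", "D"): 0.6}
print(rank(scores, edges))
```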

The industry legal risk assessment S2 comprises establishing a hierarchical legal risk assessment model with a comprehensive evaluation index system:

S21, identifying the factors that influence legal risk;

Relying on the database for data support, an artificial intelligence system performs intelligent risk assessment and provides risk avoidance suggestions. After completing the required data, the user enters the risk evaluation interface; the system automatically analyzes the risk and presents intuitive risk exit and avoidance suggestions for the user's reference.

In the risk assessment stage, the foreign trade enterprise first identifies the factors that influence legal risk. There is no uniform standard for measuring legal risk, and different measurement methods can be adopted for different purposes.

In the legal risk assessment process, literature review and theoretical analysis can be used, and experts are invited to complete a legal risk survey, from which a plurality of factors are selected as the legal risk factor set. Let B = {B1, B2} be the legal risk evaluation set of a foreign trade enterprise, where B1 is the set of external influencing factors and B2 is the set of internal influencing factors. B1 = {C1, C2} comprises the local industry environment abroad C1 and the legal and regulatory environment C2. B2 = {C3, C4} comprises intellectual property management C3 and personnel management C4.

Here C1 = {D1, D2, D3, D4}, C2 = {D5, D6, D7, D8}, C3 = {D9, D10, D11, D12}, and C4 = {D13, D14, D15, D16}, which respectively cover relevant factors such as competition among enterprises, the judicial environment, supervision mechanisms, the human resource management system, and the establishment and implementation of the intellectual property system.

Based on the established legal risk factor set, the assessment proceeds as follows:

1) Constructing a judgment matrix: the judgment matrix expresses the relative importance of each pair of indices with respect to the criterion at the level above, and it is used to determine the weights. Let the judgment matrix be $Q = (\alpha_{ij})_{n \times n}$, where $\alpha_{ij} > 0$, $\alpha_{ij} = 1/\alpha_{ji}$, and n is the number of indices at the same level.

The judgment matrix can be written as:

$$Q = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn} \end{pmatrix}$$

According to the actual situation and the evaluation requirements, k experts complete the survey, and the relative importance of each index is scored using a ratio scale method.
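The reciprocal structure of the judgment matrix can be illustrated with a short Python sketch. It assumes the ratio scale is the common 1 to 9 pairwise comparison scale, which the text does not state explicitly; the helper name and sample scores are hypothetical.

```python
def judgment_matrix(n, upper):
    """Build an n x n reciprocal matrix from scores given for pairs (i, j), i < j.

    upper maps (i, j) to alpha_ij; alpha_ji is filled in as 1 / alpha_ij and
    the diagonal is 1, so alpha_ij > 0 and alpha_ij = 1 / alpha_ji always hold.
    """
    Q = [[1.0] * n for _ in range(n)]
    for (i, j), value in upper.items():
        Q[i][j] = float(value)
        Q[j][i] = 1.0 / value
    return Q


# Hypothetical scores from one expert for three indices.
Q = judgment_matrix(3, {(0, 1): 3, (0, 2): 5, (1, 2): 2})
for row in Q:
    print(row)
```

Only the n(n-1)/2 comparisons above the diagonal need to be collected from each expert; the rest of the matrix follows from reciprocity.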

2) Determining the evaluation weights: if the scores are given by k experts, let $(\alpha_{ij})_s$ denote the judgment of $\alpha_{ij}$ given by the s-th expert, for s = 1, ..., k. The geometric mean of the scores for each index can be calculated as:

$$\alpha'_{ij} = \left( \prod_{s=1}^{k} (\alpha_{ij})_s \right)^{1/k}$$

After the geometric means $\alpha'_{ij}$ are calculated, the weight of each index can be described as:

The eigenvector W of the judgment matrix is described as:

$$W = (w_1, w_2, \dots, w_n)$$
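The aggregation of expert scores and the derivation of a weight vector can be sketched in Python. The element-wise geometric mean follows the text; the weighting rule shown (normalized geometric mean of each row, often called the root method) is an assumption, since the text does not preserve its exact weight formula.

```python
import math


def aggregate(expert_matrices):
    """Element-wise geometric mean of k experts' judgment matrices."""
    k = len(expert_matrices)
    n = len(expert_matrices[0])
    return [[math.prod(m[i][j] for m in expert_matrices) ** (1.0 / k)
             for j in range(n)] for i in range(n)]


def weights(Q):
    """Assumed root method: normalized geometric mean of each row of Q."""
    n = len(Q)
    roots = [math.prod(row) ** (1.0 / n) for row in Q]
    total = sum(roots)
    return [r / total for r in roots]


# Two hypothetical experts comparing two indices.
expert_1 = [[1.0, 2.0], [0.5, 1.0]]
expert_2 = [[1.0, 8.0], [0.125, 1.0]]
combined = aggregate([expert_1, expert_2])
print(combined, weights(combined))
```

The geometric mean is a natural choice here because it preserves reciprocity: the aggregated entry for (j, i) is still the inverse of the entry for (i, j).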

The weights are verified by a consistency check based on matrix theory, where $\lambda_{\max}$ is the largest eigenvalue, $(QW)_i$ is the i-th component of the vector QW, and CI is the consistency index. The consistency check of the fuzzy judgment matrix is described as follows:

$$\lambda_{\max} = \frac{1}{n} \sum_{i=1}^{n} \frac{(QW)_i}{w_i}, \qquad CI = \frac{\lambda_{\max} - n}{n - 1}$$

Because random consistency differs with the order of the matrix, a consistency ratio CR is introduced as the consistency evaluation index:

$$CR = \frac{CI}{RI}$$

where RI is the random consistency index of the judgment matrix. When CR < 0.1, the consistency of the judgment matrix is acceptable. If CR ≥ 0.1, the judgment matrix must be adjusted until acceptable consistency is reached.

The random consistency index RI of the judgment matrix is as follows:

n	1	2	3	4	5	6	7	8	9
RI	0	0	0.58	0.9	1.12	1.24	1.32	1.41	1.45
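The consistency check can be sketched in Python using the RI values from the table above. The function name and the sample matrix are illustrative; the estimate of the largest eigenvalue is the standard average of (QW)_i / w_i, taken here as an assumption about the intended formula.

```python
# Random consistency index by matrix order n, as tabulated above.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.9, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def consistency(Q, W):
    """Return (lambda_max, CI, CR) for judgment matrix Q and weight vector W."""
    n = len(Q)
    QW = [sum(Q[i][j] * W[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(QW[i] / W[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    # RI is 0 for n <= 2, where a reciprocal matrix is always consistent.
    cr = ci / RI[n] if RI[n] else 0.0
    return lambda_max, ci, cr


# A perfectly consistent 3 x 3 example: index 1 is twice index 2, four times index 3.
Q = [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0], [0.25, 0.5, 1.0]]
W = [4 / 7, 2 / 7, 1 / 7]
lambda_max, ci, cr = consistency(Q, W)
print(lambda_max, ci, cr, "acceptable" if cr < 0.1 else "adjust the matrix")
```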

3) Establishing the fuzzy comprehensive evaluation matrix: an evaluation set V is established for judging the risk level of each index, expressed as:

$$V = \{v_1, v_2, v_3, v_4, v_5\}$$

where $v_1$, $v_2$, $v_3$, $v_4$, and $v_5$ denote low risk, relatively low risk, medium risk, relatively high risk, and high risk, respectively. The risk level of each index is judged with the help of experts, and these judgments form the fuzzy membership matrix R.

The fuzzy comprehensive evaluation matrix U is then established, described as:

$$U = W * R$$

The overall legal risk level is assessed according to the maximum of all elements in the evaluation matrix.
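The final evaluation step can be sketched in Python. The sketch assumes that the product of W and R is the ordinary weighted sum (one common fuzzy composition operator) and that the fifth level is "high risk"; the weights and membership values are made-up sample data.

```python
LEVELS = ["low", "relatively low", "medium", "relatively high", "high"]


def fuzzy_evaluate(W, R):
    """Compute U = W * R and pick the level with the largest membership.

    W holds the weights of n indices; R is the n x 5 fuzzy membership matrix,
    where R[i][j] is the degree to which index i belongs to risk level j.
    """
    U = [sum(W[i] * R[i][j] for i in range(len(W))) for j in range(len(R[0]))]
    best = max(range(len(U)), key=U.__getitem__)
    return U, LEVELS[best]


# Hypothetical weights for two indices and expert membership judgments.
W = [0.6, 0.4]
R = [
    [0.1, 0.2, 0.4, 0.2, 0.1],
    [0.0, 0.1, 0.3, 0.4, 0.2],
]
U, level = fuzzy_evaluate(W, R)
print(U, level)
```

Choosing the level with the largest element of U is the maximum membership principle mentioned in the text; it ignores how close the runner-up levels are, so borderline cases may deserve a closer look.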

The method for foreign-related law indexing and industry risk assessment in foreign trade rests on the observation that citation similarity is closer to human experts' assessment of the similarity of legal documents. One-to-one links are not taken as the only indicator of similarity; whether a path exists from one node to another is also considered when determining similarity. Citation network analysis can thus be used effectively to estimate similarity, the relationships among legal concepts can be understood through citation links, and the network can be analyzed further by applying link analysis algorithms. In addition, the method addresses the multi-criteria evaluation problem in a fuzzy environment: the evaluation method identifies the legal risks of foreign trade enterprises, helps them quickly find the causes of legal risk, and evaluates their legal risk.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method for external trade foreign law indexing and industry risk assessment is characterized by comprising the following steps: external trade foreign law indexing S1 and industry risk assessment S2; wherein,

the foreign-related law indexing step S1 comprises:

S11, collecting foreign-related legal document data;

S12, preprocessing the foreign-related legal document data;

the industry legal risk assessment step S2 comprises:

S21, identifying the factors that influence legal risk;

2. The method for foreign-involved legal indexing and industry risk assessment according to claim 1, wherein the foreign-involved legal document data in step S11 comprises query sets and corresponding document sets, and the query sets and the document sets are subdivided into separate document sets.

3. The method for foreign-related law indexing and industry risk assessment in foreign trade according to claim 1, wherein the preprocessing techniques in step S12 include: tokenization, stop-word removal, punctuation removal, and stemming.

4. The method for external trade foreign law indexing and industry risk assessment according to claim 1, wherein legal risk factors include internal and external influence factors.

5. The method for foreign-related law indexing and industry risk assessment in foreign trade according to claim 1, wherein k experts complete the survey and score the relative importance of each legal risk factor index using a ratio scale method.

6. The method for external trade foreign law indexing and industry risk assessment according to claim 1, wherein the overall legal risk level is assessed based on the maximum value of all elements in the evaluation matrix.