CN114881028A - Case similarity matching method and device, computer equipment and storage medium
- Publication number: CN114881028A
- Application number: CN202210646944.0A
- Authority: CN (China)
- Prior art keywords: text, case, case text, processing, word
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking (G PHYSICS / G06 COMPUTING; CALCULATING OR COUNTING / G06F ELECTRIC DIGITAL DATA PROCESSING / G06F40/00 Handling natural language data / G06F40/20 Natural language analysis / G06F40/279 Recognition of textual entities)
- G06F40/216: Parsing using statistical methods (G06F40/00 Handling natural language data / G06F40/20 Natural language analysis / G06F40/205 Parsing)
- G06F40/30: Semantic analysis (G06F40/00 Handling natural language data)
Abstract
The embodiment of the invention discloses a case similarity matching method, a case similarity matching device, computer equipment and a storage medium, wherein the method comprises the following steps: acquiring case judgment texts in a case database; collecting stop words and special noun words from the case judgment texts and generating a stop word library and a special word library; selecting a first case text and a second case text which need to be subjected to similarity matching from the case judgment texts; inputting the first case text and the second case text into a twin network for processing to obtain a similarity probability value of the first case text and the second case text; and if the similarity probability value of the first case text and the second case text meets the set similarity threshold, judging that the first case text and the second case text are similar cases. The invention improves the effectiveness and accuracy of case similarity matching.
Description
Technical Field
The invention relates to the technical field of data retrieval, in particular to a case similarity matching method, a case similarity matching device, computer equipment and a storage medium.
Background
As courts try more cases, the number of case judgment documents in case databases keeps growing, and case analysis often requires finding two or more similar cases in the database for comparison. At present, case similarity search is carried out in the following ways.
The first method is to extract cases from a case database, extract the person and vehicle attribute elements from each case object model, add the extracted attribute elements to a corresponding person and vehicle array comparison container, calculate the similarity of the person and vehicle attributes in each array to be compared, record at least two attribute element objects with the maximum similarity and the corresponding similarity values in a similarity mapping table as key-value pairs, and finally sort and display the person and vehicle attribute elements of each case object model by similarity according to the mapping table. This method matches similarity only on the key-value pairs of attributes, ignores a large amount of information unrelated to people and vehicles, seriously loses case information from other fields, and seriously degrades the case matching effect.
The second method extracts three blocks of a document, namely the case facts, the dispute focus and the judgment result, using the document layout and keywords as constraint conditions together with an automatic extraction algorithm. Based on a domain word list, a topic model extracts the subject words of every document block to obtain the subject word block and non-subject word block of each block, and a feature inverted index is constructed from the subject words of all blocks and the feature words among the non-subject words. The inverted feature index is mapped into a feature vector, the similarity between the query statement and each document in the document data set is calculated with a topic similarity model, the documents are ranked by similarity, and the ranking result is output to complete document retrieval. This method judges similarity only on subject words, omits the body of the text, and has difficulty subdividing and distinguishing cases with similar subjects.
The third method performs word2vec word vector pre-training on legal texts, represents keywords as word vectors, and calculates the similarity between different cases with cosine similarity. After several cases associated with the case at hand are obtained, their judgment results are found based on a keyword extraction technique, a reasonable judgment result range for the case is given intelligently, and an intelligent early warning is issued in time when the difference between the actual judgment result and the recommended range is too large. This method compares similarity on vectors generated by the word2vec technique; however, word representations trained by word2vec cannot represent the same word differently in different contexts and cannot effectively express professional vocabulary, so the similarity matching effect is poor and the practicability is low.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, provides a case similarity matching method, a case similarity matching device, computer equipment and a storage medium, and can improve the effectiveness and accuracy of case similarity matching.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, a case similarity matching method includes:
acquiring case judgment text in a case database;
collecting stop words and special noun words from case judgment text and generating a stop word library and a special word library;
selecting a first case text and a second case text which need to be subjected to similarity matching from case judgment text;
inputting the first case text and the second case text into a twin network for processing to obtain a similar probability value of the first case text and the second case text;
and if the similarity probability values of the first case text and the second case text meet the set similarity threshold, judging that the first case text and the second case text are similar cases.
The further technical scheme is as follows: the first case text and the second case text are input into a twin network to be processed, so that similarity probability values of the first case text and the second case text are obtained, and the twin network comprises a network model of a text vector based on ERNIE, a network model of a text vector based on a WordGCN graph and a network model of a text vector based on subject words.
The further technical scheme is as follows: the inputting the first case text and the second case text into the twin network for processing to obtain the similar probability value of the first case text and the second case text comprises the following steps:
inputting the first case text and the second case text into a network model of a text vector based on ERNIE for processing to obtain first processing characteristics of the first case text and the second case text;
inputting the first case text and the second case text into a network model of a text vector based on a WordGCN graph for processing to obtain second processing characteristics of the first case text and the second case text;
inputting the first case text and the second case text into a network model based on text vectors of subject words for processing to obtain third processing characteristics of the first case text and the second case text;
performing concat merging processing on the first processing characteristics and the second processing characteristics of the first case text and the second case text to obtain merging characteristics of the first case text and the second case text;
inputting the combined features of the first case text and the second case text into a fully connected layer for processing to obtain fully connected layer processing features of the first case text and the second case text;
carrying out multiplication operation on the fully connected layer processing characteristics of the first case text and the second case text and the third processing characteristics of the first case text and the second case text to obtain text semantic representation characteristics of the first case text and the second case text;
carrying out full-connection layer and activation function processing on text semantic representation characteristics of the first case text and the second case text to obtain text abstract semantic representations of the first case text and the second case text;
and processing the text abstract semantic representations of the first case text and the second case text through a matrix of a full connection layer with the dimension of 1 and a sigmoid activation function to obtain the similar probability values of the first case text and the second case text.
The further technical scheme is as follows: the step of inputting the first case text and the second case text into a network model based on an ERNIE text vector to be processed so as to obtain first processing characteristics of the first case text and the second case text comprises the following steps:
sentence segmentation is carried out according to sentence breaking symbols of text contents in the first case text and the second case text;
performing word segmentation on the sentence by a word segmentation tool in combination with a stop word vocabulary base and a special word vocabulary base to obtain word segmentation data;
processing the word segmentation data based on the MLM through ERNIE to obtain a word vector of each word;
summing the word vectors of each word in each sentence to obtain the feature vectors of the sentence vectors;
and carrying out concat fusion on the feature vectors of all sentence vectors of the text content through Bi-LSTM to obtain the first processing features of the first case text and the second case text.
The further technical scheme is as follows: the step of inputting the first case text and the second case text into a network model of a text vector based on a WordGCN graph for processing to obtain second processing characteristics of the first case text and the second case text comprises the following steps:
coding words of the first case text and the second case text through the relation between words in a sentence level and a corpus level in a WordGCN model to obtain word vectors;
constructing a statement vector according to the word vector;
and inputting the sentence vector into the Bi-GRU for processing to obtain the second processing characteristics of the first case text and the second case text.
The further technical scheme is as follows: the step of inputting the first case text and the second case text into a network model based on text vectors of subject words for processing to obtain third processing characteristics of the first case text and the second case text comprises the following steps:
filtering stop words in the texts of the first case text and the second case text through a stop word vocabulary library;
extracting the subject terms of the filtered text;
recording the position index and the importance degree corresponding to the extracted subject term;
extracting proper nouns in the texts of the first case text and the second case text through a proper word vocabulary library;
recording the position index and the importance degree corresponding to the extracted proper nouns;
and adding the importance degree of the subject term and the importance degree of the proper noun to obtain a third processing characteristic.
The further technical scheme is as follows: and extracting the subject words of the filtered text, and extracting the subject words of the text by combining a BERTOPic model and an LDA model.
In a second aspect, a case similarity matching device comprises an acquisition unit, a generation unit, a selection unit, a processing unit and a judgment unit;
the acquisition unit is used for acquiring case judgment text in a case database;
the generating unit is used for collecting the stop words and special noun words from the case judgment text and generating a stop word library and a special word library;
the selecting unit is used for selecting a first case text and a second case text which need to be subjected to similarity matching from case judgment text;
the processing unit is used for inputting the first case text and the second case text into the twin network for processing to obtain the similar probability values of the first case text and the second case text;
and the judging unit is used for judging that the first case text and the second case text are similar cases if the similarity probability value of the first case text and the second case text meets the set similarity threshold.
In a third aspect, a computer device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the case similarity matching method steps as described above.
In a fourth aspect, a computer-readable storage medium, said storage medium storing a computer program, said computer program comprising program instructions which, when executed by a processor, cause said processor to perform the case similarity matching method steps as described above.
Compared with the prior art, the invention has the beneficial effects that: by adopting the ERNIE technology, the meanings of polysemous words in the text can be well understood in combination with the context, case similarity analysis is carried out on the basis of the twin neural network, and key feature analysis is carried out on nouns in the case-specific field in combination with the attention mechanism, so that the model's understanding of the case content is increased, and the effectiveness and accuracy of case similarity matching are improved.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical means of the present invention more clearly understood, the present invention may be implemented according to the content of the description, and in order to make the above and other objects, features, and advantages of the present invention more apparent, the following detailed description will be given of preferred embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart of a case similarity matching method according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a case similarity matching apparatus according to an embodiment of the present invention;
fig. 3 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a case similarity matching method according to an embodiment of the present invention.
As shown in fig. 1, the case similarity matching method includes the following steps: S10-S50.
And S10, acquiring case judgment text in the case database.
In the present embodiment, the case database refers to a database for court case decisions.
And S20, collecting the stop words and the special noun words from the case judgment text and generating a stop word library and a special word library.
In the present embodiment, some words fall out of use as laws and regulations are revised, and new terminology appears as technology develops. Therefore, stop words and special noun words are collected from the case judgment texts, and a stop word library and a special word library are established and generated for them, which facilitates the subsequent text similarity analysis.
S30, selecting a first case text and a second case text which need to be subjected to similarity matching from the case judgment text.
Before the selection, a selection condition may be set, for example, the similarity matching text is selected according to the content related to the case description and the case conclusion.
It should be noted that the first case text and the second case text may refer not only to similarity matching between one first case text and one second case text, but also to similarity matching between one first case text and a plurality of second case texts. Matching one first case text against one second case text tells whether the two cases are similar; matching the first case text against a plurality of second case texts gives the degree of similarity between each second case text and the first case text, and sorting by that degree makes it convenient for a person to analyze the cases, as sketched below.
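As an illustration only, the one-to-many ranking can be sketched as follows; model.similarity is a hypothetical stand-in for the twin network processing of step S40, not an interface defined by this embodiment.

```python
# Hypothetical sketch: rank many candidate (second) case texts against one
# first case text by the twin network's similarity probability value.
def rank_similar_cases(model, first_text, second_texts):
    scored = [(text, model.similarity(first_text, text)) for text in second_texts]
    # Sort by similarity probability, most similar first, for human review.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```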
And S40, inputting the first case text and the second case text into the twin network for processing to obtain the similar probability values of the first case text and the second case text.
In this embodiment, the twin network is divided into two identical and parallel left and right subnetworks, each of which includes a network model of ERNIE-based text vectors, a network model of WordGCN-based text vectors, and a network model of subject word-based text vectors. The first case text may be processed through the left subnetwork and the second case text may be processed through the right subnetwork.
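The twin structure can be sketched in PyTorch as below. The three encoders and the per-branch fusion head are abstracted as submodules; the point illustrated is that both texts pass through the same weights. How the two branch outputs are combined before the final dimension-1 layer is not spelled out in the text, so the absolute difference used here is an assumption, as are all identifiers.

```python
import torch
import torch.nn as nn

class TwinCaseNetwork(nn.Module):
    """Twin network sketch: both case texts pass through the SAME encoders
    and fusion layers, making the left and right subnetworks identical."""

    def __init__(self, ernie_encoder, graph_encoder, topic_encoder, branch_head, dim=64):
        super().__init__()
        self.ernie_encoder = ernie_encoder  # first processing features (S401)
        self.graph_encoder = graph_encoder  # second processing features (S402)
        self.topic_encoder = topic_encoder  # third processing features (S403)
        self.branch_head = branch_head      # S404-S407: yields embedding_abs-doc
        # S408: fully connected layer of output dimension 1 plus sigmoid.
        self.out = nn.Linear(dim, 1)

    def forward(self, first_text, second_text):
        left = self.branch_head(self.ernie_encoder(first_text),
                                self.graph_encoder(first_text),
                                self.topic_encoder(first_text))
        right = self.branch_head(self.ernie_encoder(second_text),
                                 self.graph_encoder(second_text),
                                 self.topic_encoder(second_text))
        # Combining the two abstract representations by absolute difference
        # is this sketch's assumption, not stated in the text.
        return torch.sigmoid(self.out(torch.abs(left - right)))
```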
In this embodiment, step S40 specifically includes the following steps: S401-S408.
S401, inputting the first case text and the second case text into a network model based on an ERNIE text vector for processing to obtain first processing characteristics of the first case text and the second case text.
The ERNIE-based text vector network model is used for training all cases through the ERNIE model to obtain a word vector of each word. The training mode adopts a fine-tuning form to carry out word vector training.
In an embodiment, step S401 specifically includes the following steps: S4011-S4015.
S4011, sentence segmentation is carried out according to sentence breaking symbols of text contents in the first case text and the second case text.
S4012, segmenting words of the sentences by combining the stop word vocabulary base and the special word vocabulary base through a segmentation tool to obtain segmentation data.
S4013, processing the word data based on MLM through ERNIE to obtain a word vector of each word.
S4014, summing the word vectors of each word in each sentence to obtain the feature vectors of the sentence vectors.
S4015, performing concat fusion on the feature vectors of all sentence vectors of the text content through Bi-LSTM to obtain first processing features of the first case text and the second case text.
In steps S4011 to S4015, in this embodiment, the text is first split into sentences based on the sentence break symbols in the text, and each sentence is segmented by a word segmentation tool. The sequence length within a sentence is selected as 16 words; a text sequence exceeding this length is truncated, and one shorter than the sequence length is completed by padding. It should be noted that the analysis of proper nouns requires word segmentation in combination with the proper noun vocabulary library, to avoid splitting a proper noun internally. For the positional encoding in the Transformer mechanism in ERNIE, the even dimensions of the encoding adopt PE(pos, 2i) = sin(pos / 10000^(2i/d)) and the odd dimensions adopt PE(pos, 2i+1) = cos(pos / 10000^(2i/d)), where pos represents the position of a word in the sentence, i indexes the word vector dimensions, and the dimension range 0 to 64 is selected.
The word vector of each word can be obtained by training the ERNIE model based on the MLM (Masked Language Model). A sentence has a plurality of words; the feature vector of the sentence vector is obtained by adding the word vectors, and the text vector is obtained by concat fusion of all the sentence vectors. The length of the text vector is 500, that is, 500 sentences are selected and sentences beyond that length are truncated, so that a sentence vector embedding_ernie-sentence with length 500 and dimension 64, constructed from the word vectors, is obtained. embedding_ernie-sentence is input into the Bi-LSTM to obtain the text vector embedding_ernie-doc.
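Under the assumption that ernie is any pretrained, MLM-trained encoder module mapping a batch of 16-token sentences to 64-dimensional token vectors, the construction of embedding_ernie-sentence and embedding_ernie-doc can be sketched as follows; all identifiers are illustrative.

```python
import torch.nn as nn

class ErnieDocEncoder(nn.Module):
    """Sketch of S4011-S4015: per-word vectors from an MLM-trained ERNIE
    encoder are summed into sentence vectors, and a Bi-LSTM fuses up to
    500 sentence vectors into the text vector embedding_ernie-doc."""

    def __init__(self, ernie, dim=64, max_sents=500):
        super().__init__()
        self.ernie = ernie            # assumed: (num_sents, 16) ids -> (num_sents, 16, dim)
        self.max_sents = max_sents
        self.bilstm = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)

    def forward(self, sent_token_ids):
        sent_token_ids = sent_token_ids[: self.max_sents]  # truncate to 500 sentences
        word_vecs = self.ernie(sent_token_ids)             # word vector of each word
        sent_vecs = word_vecs.sum(dim=1)                   # sum words -> embedding_ernie-sentence
        fused, _ = self.bilstm(sent_vecs.unsqueeze(0))     # fuse all sentence vectors
        return fused.squeeze(0)                            # embedding_ernie-doc, (500, 2*dim)
```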
S402, inputting the first case text and the second case text into a network model of a text vector based on a WordGCN graph for processing to obtain second processing characteristics of the first case text and the second case text.
In an embodiment, step S402 specifically includes the following steps: s4021 to S4023.
S4021, coding words of the first case text and the second case text through relations between words in a sentence level and a corpus level in a WordGCN model to obtain word vectors.
S4022, constructing a statement vector according to the word vector.
S4023, inputting the statement vector into the Bi-GRU for processing to obtain the second processing characteristics of the first case text and the second case text.
In steps S4021 to S4023, in this embodiment, a WordGCN model is selected to characterize the sentence model. Words are encoded by the relationship between words at the sentence level and the corpus level. A sentence is encoded by word segmentation with a fixed sentence length: the sentence length is set to 16 words, longer sentences are truncated, shorter ones are filled by padding, and the sentence vector is obtained by adding word vectors. The length of the text vector is selected as 500, and sentences beyond that length are truncated, so that a sentence vector embedding_words with length 500 and dimension 64, constructed from the word vectors, is obtained. embedding_words is input into the Bi-GRU to obtain the text vector embedding_graph-doc.
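Assuming WordGCN training has already produced a 64-dimensional word embedding table (the graph construction itself is outside this sketch), the second branch can be written analogously:

```python
import torch.nn as nn

class GraphDocEncoder(nn.Module):
    """Sketch of S4021-S4023: WordGCN-trained word vectors are summed into
    sentence vectors (16 words per sentence, padded or truncated), and a
    Bi-GRU turns up to 500 sentence vectors into embedding_graph-doc."""

    def __init__(self, wordgcn_embeddings: nn.Embedding, dim=64, max_sents=500):
        super().__init__()
        self.emb = wordgcn_embeddings   # table holding WordGCN-trained vectors
        self.max_sents = max_sents
        self.bigru = nn.GRU(dim, dim, batch_first=True, bidirectional=True)

    def forward(self, sent_token_ids):                   # (num_sents, 16) word ids
        sent_token_ids = sent_token_ids[: self.max_sents]
        sent_vecs = self.emb(sent_token_ids).sum(dim=1)  # add word vectors -> sentence vector
        fused, _ = self.bigru(sent_vecs.unsqueeze(0))
        return fused.squeeze(0)                          # embedding_graph-doc, (500, 2*dim)
```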
S403, inputting the first case text and the second case text into a network model based on text vectors of subject words for processing to obtain third processing characteristics of the first case text and the second case text.
In an embodiment, step S403 specifically includes the following steps: S4031-S4036.
S4031, stop words in the texts of the first case text and the second case text are filtered through a stop word vocabulary library.
And S4032, extracting the subject terms of the filtered text.
And S4033, recording the position index and the importance degree corresponding to the extracted subject term.
S4034, proper nouns in the texts of the first case text and the second case text are extracted through the proper word vocabulary library.
S4035, recording the position index and the importance degree corresponding to the extracted proper noun.
S4036, add the importance of the subject term and the importance of the proper noun to obtain a third processing feature.
In steps S4031 to S4036, in this embodiment, stop words in a text are first filtered through the stop word vocabulary library, and a subject word model is applied to the segmented records to extract the subject words of the case. The subject word model adopts BERTopic, which internally uses c-tf-idf and extracts keywords based on the number of case categories rather than the number of documents; at the same time, an LDA model is used to extract subject words from the case content. For the subject words extracted by the two methods, the corresponding position indexes and word importance degrees are recorded, with the importance degree ranging from 0 to 1. Then the proper nouns in the case text are extracted through the case proper word library, their corresponding position indexes are recorded, and the importance degree of proper nouns is uniformly set to 0.5. A special word position index is generated as the union of the subject word position index and the proper noun position index. The importance degrees are added to obtain the attention weight matrix embedding_attention, whose size is 16 x 500.
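A plain-Python sketch of this attention weight matrix, assuming the topic extraction (BERTopic plus LDA) has already produced an importance score per subject word; every identifier below is illustrative rather than the patent's.

```python
import numpy as np

def build_attention_matrix(token_grid, subject_word_scores, proper_nouns,
                           sent_len=16, max_sents=500):
    """token_grid: the tokenized text, a list of sentences of words.
    subject_word_scores: word -> importance in [0, 1] from BERTopic/LDA.
    Proper nouns from the case proper word library contribute a flat 0.5;
    scores landing on the same position add up, giving a 16 x 500 matrix."""
    weights = np.zeros((sent_len, max_sents), dtype=np.float32)  # embedding_attention
    for s, sentence in enumerate(token_grid[:max_sents]):
        for t, word in enumerate(sentence[:sent_len]):
            weights[t, s] += subject_word_scores.get(word, 0.0)
            if word in proper_nouns:
                weights[t, s] += 0.5
    return weights
```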
S404, performing concat merging processing on the first processing characteristics and the second processing characteristics of the first case text and the second case text to obtain the merging characteristics of the first case text and the second case text.
In this embodiment, embedding_ernie-doc and embedding_graph-doc are concatenated to obtain the feature embedding_merge.
S405, inputting the combined features of the first case text and the second case text into a fully connected layer for processing to obtain the fully connected layer processing features of the first case text and the second case text.
In this embodiment, embedding_merge is input into the fully connected layer to obtain the feature embedding_merge-fc.
S406, multiplying the fully connected layer processing characteristics of the first case text and the second case text with the third processing characteristics of the first case text and the second case text to obtain the text semantic representation characteristics of the first case text and the second case text.
In this embodiment, the embedding_merge-fc matrix and the attention weight matrix embedding_attention are multiplied to obtain the final text semantic representation embedding_doc.
S407, carrying out full connection layer and activation function processing on the text semantic representation characteristics of the first case text and the second case text to obtain text abstract semantic representations of the first case text and the second case text.
In this embodiment, the semantic representation embedding_doc is passed through a fully connected layer and an activation function, followed by another fully connected layer, to obtain the text abstract semantic representation embedding_abs-doc.
S408, processing the text abstract semantic representations of the first case text and the second case text through a matrix of a full connection layer with the dimension of 1 and a sigmoid activation function to obtain the similar probability values of the first case text and the second case text.
In this embodiment, the text abstract semantic representations embedding_abs-doc obtained through the left and right subnetworks are passed through a fully connected layer matrix with dimension 1 connected to a sigmoid activation function, finally outputting a similarity probability value from 0 to 1.
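Putting S404 to S407 together for one branch gives the sketch below. The text does not pin down how the 16 x 500 attention matrix meets the sentence-level features, nor the sizes of the intermediate fully connected layers, so collapsing the matrix to one weight per sentence and mean-pooling over sentences are this sketch's assumptions (the numpy matrix from the previous sketch would be passed in as a torch tensor).

```python
import torch
import torch.nn as nn

class BranchHead(nn.Module):
    """One subnetwork's fusion: concat the ERNIE and WordGCN text vectors,
    apply a fully connected layer, weight by the topic attention matrix,
    then a fully connected layer with activation and one more fully
    connected layer to produce embedding_abs-doc."""

    def __init__(self, dim=64):
        super().__init__()
        self.fc_merge = nn.Linear(4 * dim, dim)                      # concat of two (500, 2*dim) inputs
        self.fc_act = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())  # fully connected + activation
        self.fc_abs = nn.Linear(dim, dim)                            # final fully connected layer

    def forward(self, e_ernie, e_graph, attention):
        merged = torch.cat([e_ernie, e_graph], dim=-1)     # embedding_merge, (500, 4*dim)
        fc = self.fc_merge(merged)                         # embedding_merge-fc, (500, dim)
        sent_weights = attention.sum(dim=0).unsqueeze(-1)  # (500, 1), assumption: sum over word positions
        doc = fc * sent_weights                            # embedding_doc, attention-weighted
        pooled = doc.mean(dim=0)                           # pool sentences (assumption)
        return self.fc_abs(self.fc_act(pooled))            # embedding_abs-doc, (dim,)
```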
And S50, if the similarity probability value of the first case text and the second case text meets the set similarity threshold, judging that the first case text and the second case text are similar cases.
In this embodiment, the similarity threshold is 0.5. If the similarity probability value of the second case text to the first case text is greater than 0.5, the case texts of the two cases are considered similar; if it is less than 0.5, the case contents of the two cases are considered dissimilar.
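In code, the decision of step S50 reduces to a threshold check on the network output; twin_network refers to the hypothetical sketch above.

```python
def is_similar_case(twin_network, first_text_tensors, second_text_tensors, threshold=0.5):
    # S50: the pair is judged a similar case iff the similarity probability
    # value from the twin network exceeds the set threshold (0.5 here).
    probability = twin_network(first_text_tensors, second_text_tensors).item()
    return probability > threshold
```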
By adopting the ERNIE technology, the meanings of polysemous words in the text can be well understood in combination with the context, case similarity analysis is carried out on the basis of the twin neural network, and key feature analysis is carried out on nouns in the case-specific field in combination with the attention mechanism, so that the model's understanding of the case content is increased, and the effectiveness and accuracy of case similarity matching are improved.
Fig. 2 is a schematic block diagram of a case similarity matching apparatus 100 according to an embodiment of the present invention. Corresponding to the case similarity matching method described above, the embodiment of the present invention further provides a case similarity matching apparatus 100. The case similarity matching apparatus 100 includes a unit for performing the above-described case similarity matching method, and may be configured in a server.
As shown in fig. 2, the case similarity matching apparatus 100 includes an obtaining unit 110, a generating unit 120, a selecting unit 130, a processing unit 140, and a determining unit 150.
An obtaining unit 110, configured to obtain case decision texts in a case database.
In this embodiment, case database refers to a database for court case decisions.
And the generating unit 120 is used for collecting the stop words and the special noun words from the case decision text and generating a stop word library and a special word library.
In the present embodiment, some words are disabled due to the revision of legal laws and the like, and in addition, the following new terminology appears with the development of technology. Therefore, the stop words and the special noun words can be collected from the case judgment text, and a stop word library and a special word library can be established and generated for the case judgment text, so that the subsequent text similarity analysis is facilitated.
A selecting unit 130, configured to select a first case text and a second case text that need to be subjected to similarity matching from case decision texts.
Before the selection, a selection condition may be set, for example, the similarity matching text is selected according to the content related to the case description and the case conclusion.
It should be noted that the first case text and the second case text may not only refer to similarity matching between one first case text and one second case text, but also refer to similarity matching between one first case text and a plurality of second case texts. If the similarity matching is carried out on one first case text and one second case text, whether the two cases are similar or not can be obtained, if the similarity matching is carried out on the first case text and the second case texts, the similarity degree between the second case texts and the first case text can be obtained, and the cases can be conveniently analyzed by a person by sequencing according to the similarity degree.
The processing unit 140 is configured to input the first case text and the second case text into the twin network for processing, so as to obtain a similar probability value of the first case text and the second case text.
In this embodiment, the twin network is divided into two identical and parallel left and right subnetworks, each of which includes a network model of ERNIE-based text vectors, a network model of WordGCN-based text vectors, and a network model of subject word-based text vectors. The first case text can be processed through the left sub-network and the second case text can be processed through the right sub-network.
In one embodiment, the processing unit 140 includes a first processing module, a second processing module, a third processing module, a merging module, a fourth processing module, an operation module, a fifth processing module, and a sixth processing module.
The first processing module is used for inputting the first case text and the second case text into a network model of a text vector based on ERNIE for processing so as to obtain first processing characteristics of the first case text and the second case text.
The ERNIE-based text vector network model is used for training all cases through the ERNIE model to obtain a word vector of each word. The training mode adopts a fine-tuning form to carry out word vector training.
In one embodiment, the first processing module includes a segmentation submodule, a word segmentation submodule, a first processing submodule, an operation submodule, and a fusion submodule.
And the segmentation submodule is used for carrying out sentence segmentation according to the sentence breaking symbols of the text contents in the first case text and the second case text.
And the word segmentation submodule is used for segmenting words of the sentences by combining the stop word vocabulary library and the special word vocabulary library through a word segmentation tool so as to obtain word segmentation data.
And the first processing sub-module is used for processing the word data based on the MLM through ERNIE to obtain a word vector of each word.
And the operation submodule is used for summing the word vectors of each word in each sentence so as to obtain the feature vectors of the sentence vectors.
And the fusion submodule is used for performing concat fusion on the feature vectors of all the sentence vectors of the text content through Bi-LSTM so as to obtain the first processing features of the first case text and the second case text.
For the segmentation submodule, the word segmentation submodule, the processing submodule, the operation submodule and the fusion submodule, in this embodiment, the text is first split into sentences based on the sentence break symbols in the text, and each sentence is segmented by a word segmentation tool. The sequence length within a sentence is selected as 16 words; a text sequence exceeding this length is truncated, and one shorter than the sequence length is completed by padding. It should be noted that the analysis of proper nouns requires word segmentation in combination with the proper noun vocabulary library, to avoid splitting a proper noun internally. For the positional encoding in the Transformer mechanism in ERNIE, the even dimensions of the encoding adopt PE(pos, 2i) = sin(pos / 10000^(2i/d)) and the odd dimensions adopt PE(pos, 2i+1) = cos(pos / 10000^(2i/d)), where pos represents the position of a word in the sentence, i indexes the word vector dimensions, and the dimension range 0 to 64 is selected.
The word vector of each word can be obtained by training the ERNIE model based on the MLM (Masked Language Model). A sentence has a plurality of words; the feature vector of the sentence vector is obtained by adding the word vectors, and the text vector is obtained by concat fusion of all the sentence vectors. The length of the text vector is 500, that is, 500 sentences are selected and sentences beyond that length are truncated, so that a sentence vector embedding_ernie-sentence with length 500 and dimension 64, constructed from the word vectors, is obtained. embedding_ernie-sentence is input into the Bi-LSTM to obtain the text vector embedding_ernie-doc.
And the second processing module is used for inputting the first case text and the second case text into a network model of a text vector based on a WordGCN graph for processing to obtain second processing characteristics of the first case text and the second case text.
In an embodiment, the second processing module includes an encoding sub-module, a construction sub-module, and a second processing sub-module.
And the coding submodule is used for coding the words of the first case text and the second case text through the relation between the words in the sentence level and the corpus level in the WordGCN model so as to obtain word vectors.
And the construction submodule is used for constructing the statement vector according to the word vector.
And the second processing submodule is used for inputting the statement vector into the Bi-GRU for processing so as to obtain a second processing characteristic of the first case text and the second case text.
For the encoding submodule, the construction submodule and the second processing submodule, in this embodiment, a WordGCN model is selected to characterize the sentence model. Words are encoded by the relationship between words at the sentence level and the corpus level. A sentence is encoded by word segmentation with a fixed sentence length: the sentence length is set to 16 words, longer sentences are truncated, shorter ones are filled by padding, and the sentence vector is still obtained by adding word vectors. The length of the text vector is selected as 500, and sentences beyond that length are truncated, so that a sentence vector embedding_words with length 500 and dimension 64, constructed from the word vectors, is obtained. embedding_words is input into the Bi-GRU to obtain the text vector embedding_graph-doc.
And the third processing module is used for inputting the first case text and the second case text into the network model of the text vector based on the subject word for processing so as to obtain third processing characteristics of the first case text and the second case text.
In an embodiment, the third processing module includes a filtering sub-module, a first extraction sub-module, a first recording sub-module, a second extraction sub-module, a second recording sub-module, and a calculation sub-module.
And the filtering submodule is used for filtering the stop words in the texts of the first case text and the second case text through the stop word vocabulary library.
And the first extraction submodule is used for extracting the subject terms of the filtered text.
And the first recording submodule is used for recording the position index and the importance degree corresponding to the extracted subject term.
And the second extraction submodule is used for extracting the proper nouns in the texts of the first case text and the second case text through the proper word vocabulary library.
And the second recording submodule is used for recording the position index and the importance degree corresponding to the extracted proper noun.
And the calculation submodule is used for adding the importance degree of the subject term and the importance degree of the proper noun to obtain a third processing characteristic.
For the filtering submodule, the first extraction submodule, the first recording submodule, the second extraction submodule, the second recording submodule and the calculation submodule, in this embodiment, stop words in the text are first filtered through the stop word vocabulary library, and a subject word model is applied to the segmented records to extract the subject words of the case. The subject word model adopts BERTopic, which internally uses c-tf-idf and extracts keywords based on the number of case categories rather than the number of documents; at the same time, an LDA model is used to extract subject words from the case content. For the subject words extracted by the two methods, the corresponding position indexes and word importance degrees are recorded, with the importance degree ranging from 0 to 1. Then the proper nouns in the case text are extracted through the case proper word library, their corresponding position indexes are recorded, and the importance degree of proper nouns is uniformly set to 0.5. A special word position index is generated as the union of the subject word position index and the proper noun position index. The importance degrees are added to obtain the attention weight matrix embedding_attention, whose size is 16 x 500.
And the merging module is used for performing concat merging processing on the first processing characteristics and the second processing characteristics of the first case text and the second case text so as to obtain the merging characteristics of the first case text and the second case text.
In this embodiment, embedding_ernie-doc and embedding_graph-doc are concatenated to obtain the feature embedding_merge.
And the fourth processing module is used for inputting the combined characteristics of the first case text and the second case text into the fully connected layer for processing so as to obtain the fully connected layer processing characteristics of the first case text and the second case text.
In this embodiment, embedding_merge is input into the fully connected layer to obtain the feature embedding_merge-fc.
And the operation module is used for performing multiplication operation on the fully-connected layer processing characteristics of the first case text and the second case text and the third processing characteristics of the first case text and the second case text to obtain the text semantic representation characteristics of the first case text and the second case text.
In this embodiment, the embedding_merge-fc matrix and the attention weight matrix embedding_attention are multiplied to obtain the final text semantic representation embedding_doc.
And the fifth processing module is used for carrying out full connection layer and activation function processing on the text semantic representation characteristics of the first case text and the second case text so as to obtain text abstract semantic representations of the first case text and the second case text.
In this embodiment, the semantic representation embedding_doc is passed through a fully connected layer and an activation function, followed by another fully connected layer, to obtain the text abstract semantic representation embedding_abs-doc.
And the sixth processing module is used for processing the text abstract semantic representations of the first case text and the second case text through a matrix of a full connection layer with the dimension of 1 and a sigmoid activation function so as to obtain the similar probability values of the first case text and the second case text.
In this embodiment, the text abstract semantic representations embedding_abs-doc obtained through the left and right subnetworks are passed through a fully connected layer matrix with dimension 1 connected to a sigmoid activation function, finally outputting a similarity probability value from 0 to 1.
The determining unit 150 is configured to determine that the first case text and the second case text are similar cases if the similarity probability values of the first case text and the second case text meet the set similarity threshold.
In this embodiment, the similarity threshold is 0.5. If the similarity probability value between the second case text and the first case text is greater than 0.5, the case texts of the two cases are considered similar; if it is less than 0.5, the case contents of the two cases are considered dissimilar.
By adopting the ERNIE technology, the meanings of polysemous words in the text can be well understood in combination with the context, case similarity analysis is carried out on the basis of the twin neural network, and key feature analysis is carried out on nouns in the case-specific field in combination with the attention mechanism, so that the model's understanding of the case content is increased, and the effectiveness and accuracy of case similarity matching are improved.
The case similarity matching apparatus 100 may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 3.
Referring to fig. 3, fig. 3 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 700 may be a server, where the server may be an independent server or a server cluster composed of a plurality of servers.
As shown in fig. 3, the computer device includes a memory, a processor and a computer program stored in the memory and executable on the processor, and the processor executes the computer program to implement the steps of the case similarity matching method.
The computer device 700 may be a terminal or a server. The computer device 700 includes a processor 720, memory, and a network interface 750, which are connected by a system bus 710, where the memory may include non-volatile storage media 730 and internal memory 740.
The non-volatile storage medium 730 may store an operating system 731 and computer programs 732. The computer program 732, when executed, may cause the processor 720 to perform any of a variety of case similarity matching methods.
The processor 720 is used to provide computing and control capabilities, supporting the operation of the overall computer device 700.
The internal memory 740 provides an environment for the execution of the computer program 732 in the non-volatile storage medium 730, and when the computer program 732 is executed by the processor 720, the processor 720 can be caused to execute any case similarity matching method.
The network interface 750 is used for network communication such as sending assigned tasks and the like. Those skilled in the art will appreciate that the configuration shown in fig. 3 is a block diagram of only a portion of the configuration relevant to the present teachings and is not intended to limit the computing device 700 to which the present teachings may be applied, and that a particular computing device 700 may include more or less components than those shown, or may combine certain components, or have a different arrangement of components. Wherein the processor 720 is configured to execute the program code stored in the memory to perform the following steps:
acquiring case judgment text in a case database;
collecting stop words and special noun words from case judgment text and generating a stop word library and a special word library;
selecting a first case text and a second case text which need to be subjected to similarity matching from case judgment text;
inputting the first case text and the second case text into a twin network for processing to obtain a similar probability value of the first case text and the second case text;
and if the similarity probability values of the first case text and the second case text meet the set similarity threshold, judging that the first case text and the second case text are similar cases.
In one embodiment: the first case text and the second case text are input into a twin network to be processed, so that the similarity probability value of the first case text and the second case text is obtained, and the twin network comprises a network model of a text vector based on ERNIE, a network model of a text vector based on a WordGCN diagram and a network model of a text vector based on subject words.
In one embodiment: the inputting the first case text and the second case text into the twin network for processing to obtain the similar probability value of the first case text and the second case text comprises the following steps:
inputting the first case text and the second case text into a network model of a text vector based on ERNIE for processing to obtain first processing characteristics of the first case text and the second case text;
inputting the first case text and the second case text into a network model of a text vector based on a WordGCN graph for processing to obtain second processing characteristics of the first case text and the second case text;
inputting the first case text and the second case text into a network model based on text vectors of subject words for processing to obtain third processing characteristics of the first case text and the second case text;
performing concat merging processing on the first processing characteristics and the second processing characteristics of the first case text and the second case text to obtain merging characteristics of the first case text and the second case text;
inputting the combined features of the first case text and the second case text into a fully connected layer for processing to obtain fully connected layer processing features of the first case text and the second case text;
carrying out multiplication operation on the fully connected layer processing characteristics of the first case text and the second case text and the third processing characteristics of the first case text and the second case text to obtain text semantic representation characteristics of the first case text and the second case text;
carrying out full-connection layer and activation function processing on text semantic representation characteristics of the first case text and the second case text to obtain text abstract semantic representations of the first case text and the second case text;
and processing the text abstract semantic representations of the first case text and the second case text by a fully-connected layer matrix with the dimension of 1 and a sigmoid activation function to obtain the similar probability values of the first case text and the second case text.
In one embodiment: the inputting the first case text and the second case text into a network model of an ERNIE-based text vector for processing to obtain first processing characteristics of the first case text and the second case text comprises the following steps:
sentence segmentation is carried out according to sentence breaking symbols of text contents in the first case text and the second case text;
performing word segmentation on the sentence by a word segmentation tool in combination with a stop word vocabulary base and a special word vocabulary base to obtain word segmentation data;
processing the word segmentation data based on the MLM through ERNIE to obtain a word vector of each word;
summing the word vectors of each word in each sentence to obtain the feature vectors of the sentence vectors;
and carrying out concat fusion on the feature vectors of all sentence vectors of the text content through Bi-LSTM to obtain the first processing features of the first case text and the second case text.
In one embodiment: the step of inputting the first case text and the second case text into a network model of a text vector based on a WordGCN image for processing to obtain second processing characteristics of the first case text and the second case text comprises the following steps:
coding words of the first case text and the second case text through the relation between words in a sentence level and a corpus level in a WordGCN model to obtain word vectors;
constructing a statement vector according to the word vector;
and inputting the sentence vector into the Bi-GRU for processing to obtain the second processing characteristics of the first case text and the second case text.
In one embodiment: the step of inputting the first case text and the second case text into a network model based on text vectors of subject words for processing to obtain third processing characteristics of the first case text and the second case text comprises the following steps:
filtering stop words in the texts of the first case text and the second case text through a stop word vocabulary library;
extracting the subject terms of the filtered text;
recording the position index and the importance degree corresponding to the extracted subject term;
extracting proper nouns in the texts of the first case text and the second case text through a proper word vocabulary library;
recording the position index and the importance degree corresponding to the extracted proper nouns;
and adding the importance degree of the subject term and the importance degree of the proper noun to obtain a third processing characteristic.
In one embodiment: and extracting the subject term of the filtered text by combining a BERTOP model with an LDA model.
It should be understood that, in the embodiment of the present Application, the Processor 720 may be a Central Processing Unit (CPU), and the Processor 720 may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Those skilled in the art will appreciate that the configuration of computer device 700 depicted in FIG. 3 is not intended to be limiting of computer device 700 and may include more or less components than those shown, or some components in combination, or a different arrangement of components.
In another embodiment of the present invention, a computer-readable storage medium is provided. The computer readable storage medium may be a non-volatile computer readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the case similarity matching method disclosed by the embodiments of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described again here. Those of ordinary skill in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of the two; to illustrate this interchangeability of hardware and software clearly, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Claims (10)
1. A case similarity matching method, characterized by comprising the following steps:
acquiring case judgment texts from a case database;
collecting stop words and proper nouns from the case judgment texts and generating a stop word library and a special word library;
selecting, from the case judgment texts, a first case text and a second case text that need to be subjected to similarity matching;
inputting the first case text and the second case text into a twin network for processing to obtain a similar probability value of the first case text and the second case text;
and if the similar probability value of the first case text and the second case text meets the set similarity threshold, judging that the first case text and the second case text are similar cases, as sketched below.
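A minimal sketch of the overall flow of claim 1; `twin_network` stands in for the three-branch model detailed in claims 2 to 7, and the `case_db` interface, the `select_pair` helper, and the 0.8 threshold are all hypothetical.

```python
SIMILARITY_THRESHOLD = 0.8  # the "set similarity threshold" (assumed value)

def match_cases(case_db, twin_network, select_pair):
    texts = case_db.all_judgment_texts()          # acquire case judgment texts
    first_text, second_text = select_pair(texts)  # select the pair to compare
    prob = twin_network(first_text, second_text)  # similar probability value
    return prob >= SIMILARITY_THRESHOLD           # judged similar if threshold met
```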
2. The case similarity matching method according to claim 1, wherein the twin network into which the first case text and the second case text are input to obtain the similar probability value comprises an ERNIE-based text vector network model, a WordGCN-based text vector network model and a subject-word-based text vector network model.
3. The case similarity matching method according to claim 2, wherein the inputting the first case text and the second case text into the twin network for processing to obtain the similar probability value of the first case text and the second case text comprises:
inputting the first case text and the second case text into the ERNIE-based text vector network model for processing to obtain first processing features of the first case text and the second case text;
inputting the first case text and the second case text into the WordGCN graph-based text vector network model for processing to obtain second processing features of the first case text and the second case text;
inputting the first case text and the second case text into the subject-word-based text vector network model for processing to obtain third processing features of the first case text and the second case text;
performing concat merging on the first processing features and the second processing features of the first case text and the second case text to obtain merged features of the first case text and the second case text;
inputting the merged features of the first case text and the second case text into a fully connected layer for processing to obtain fully connected layer processing features of the first case text and the second case text;
multiplying the fully connected layer processing features of the first case text and the second case text by the third processing features of the first case text and the second case text to obtain text semantic representation features of the first case text and the second case text;
processing the text semantic representation features of the first case text and the second case text through a fully connected layer and an activation function to obtain text abstract semantic representations of the first case text and the second case text;
and processing the text abstract semantic representations of the first case text and the second case text through a fully connected layer with a dimension-1 matrix and a sigmoid activation function to obtain the similar probability value of the first case text and the second case text, as sketched below.
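A PyTorch sketch of this fusion head, following the claim's order of operations: concat, fully connected layer, element-wise multiplication with the keyword feature, fully connected layer with activation, then a dimension-1 fully connected layer with sigmoid. All layer widths and the ReLU activation are illustrative assumptions, and the third processing feature is assumed to have been projected to the same width so the multiplication is well defined.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.fc1 = nn.Linear(2 * dim, dim)  # after concat of the two branch features
        self.fc2 = nn.Linear(dim, dim)      # fully connected layer before activation
        self.act = nn.ReLU()                # activation function (assumed ReLU)
        self.out = nn.Linear(dim, 1)        # fully connected layer, dimension-1 matrix

    def forward(self, first_feat, second_feat, third_feat):
        merged = torch.cat([first_feat, second_feat], dim=-1)  # concat merging
        fc_feat = self.fc1(merged)
        semantic = fc_feat * third_feat     # multiply with the keyword feature
        abstract = self.act(self.fc2(semantic))
        return torch.sigmoid(self.out(abstract))  # similar probability value
```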
4. The case similarity matching method according to claim 3, wherein the step of inputting the first case text and the second case text into the ERNIE-based text vector network model for processing to obtain the first processing features of the first case text and the second case text comprises:
performing sentence segmentation according to the sentence-break symbols of the text content in the first case text and the second case text;
performing word segmentation on the sentences with a word segmentation tool in combination with the stop word vocabulary library and the special word vocabulary library to obtain word segmentation data;
processing the word segmentation data through ERNIE based on MLM to obtain a word vector for each word;
summing the word vectors of all words in each sentence to obtain the feature vector of each sentence, namely the sentence vector;
and performing concat fusion on the feature vectors of all sentence vectors of the text content through a Bi-LSTM to obtain the first processing features of the first case text and the second case text.
5. The case similarity matching method according to claim 3, wherein the step of inputting the first case text and the second case text into the WordGCN graph-based text vector network model for processing to obtain the second processing features of the first case text and the second case text comprises:
encoding the words of the first case text and the second case text through the sentence-level and corpus-level relations between words in the WordGCN model to obtain word vectors;
constructing sentence vectors from the word vectors;
and inputting the sentence vectors into a Bi-GRU for processing to obtain the second processing features of the first case text and the second case text.
6. The case similarity matching method according to claim 3, wherein the step of inputting the first case text and the second case text into the subject-word-based text vector network model for processing to obtain the third processing features of the first case text and the second case text comprises:
filtering the stop words in the texts of the first case text and the second case text through the stop word vocabulary library;
extracting the subject words of the filtered texts;
recording the position index and the importance degree corresponding to each extracted subject word;
extracting the proper nouns in the texts of the first case text and the second case text through the special word vocabulary library;
recording the position index and the importance degree corresponding to each extracted proper noun;
and adding the importance degrees of the subject words and the importance degrees of the proper nouns to obtain the third processing features.
7. The case similarity matching method according to claim 6, wherein the subject words of the filtered text are extracted by combining a BERTopic model with an LDA model.
8. A case similarity matching device, characterized by comprising an acquisition unit, a generation unit, a selection unit, a processing unit and a judgment unit;
the acquisition unit is used for acquiring case judgment texts from a case database;
the generation unit is used for collecting stop words and proper nouns from the case judgment texts and generating a stop word library and a special word library;
the selection unit is used for selecting, from the case judgment texts, a first case text and a second case text that need to be subjected to similarity matching;
the processing unit is used for inputting the first case text and the second case text into a twin network for processing to obtain the similar probability value of the first case text and the second case text;
and the judgment unit is used for judging that the first case text and the second case text are similar cases if the similar probability value of the first case text and the second case text meets the set similarity threshold, as sketched below.
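An illustrative sketch of the five units of claim 8 as one Python class; the method names and the `case_db`/`twin_network` interfaces are assumptions mapped one-to-one from the claim wording, with the vocabulary-collection logic elided.

```python
class CaseSimilarityMatchingDevice:
    def __init__(self, case_db, twin_network, threshold=0.8):
        self.case_db = case_db
        self.twin_network = twin_network
        self.threshold = threshold  # set similarity threshold (assumed value)

    def acquire(self):
        # Acquisition unit: case judgment texts from the case database.
        return self.case_db.all_judgment_texts()

    def generate_vocabularies(self, texts):
        # Generation unit: build the stop word and special word libraries.
        stop_words, special_words = set(), set()
        ...  # collection logic omitted
        return stop_words, special_words

    def select_pair(self, texts):
        # Selection unit: the two case texts to be matched.
        return texts[0], texts[1]

    def process(self, first_text, second_text):
        # Processing unit: twin network yields the similar probability value.
        return self.twin_network(first_text, second_text)

    def judge(self, prob):
        # Judgment unit: threshold comparison.
        return prob >= self.threshold
```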
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the case similarity matching method steps according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the case similarity matching method steps according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210646944.0A CN114881028A (en) | 2022-06-08 | 2022-06-08 | Case similarity matching method and device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210646944.0A CN114881028A (en) | 2022-06-08 | 2022-06-08 | Case similarity matching method and device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114881028A true CN114881028A (en) | 2022-08-09 |
Family
ID=82682212
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210646944.0A Pending CN114881028A (en) | 2022-06-08 | 2022-06-08 | Case similarity matching method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114881028A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110413988A (en) * | 2019-06-17 | 2019-11-05 | 平安科技(深圳)有限公司 | Method, apparatus, server and the storage medium of text information matching measurement |
CN110580281A (en) * | 2019-09-11 | 2019-12-17 | 江苏鸿信系统集成有限公司 | similar case matching method based on semantic similarity |
CN110717332A (en) * | 2019-07-26 | 2020-01-21 | 昆明理工大学 | News and case similarity calculation method based on asymmetric twin network |
CN111737954A (en) * | 2020-06-12 | 2020-10-02 | 百度在线网络技术(北京)有限公司 | Text similarity determination method, device, equipment and medium |
CN112329429A (en) * | 2020-11-30 | 2021-02-05 | 北京百度网讯科技有限公司 | Text similarity learning method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||