CN115080924B - Software license clause extraction method based on natural language understanding

Publication number
CN115080924B
Authority
CN
China
Prior art keywords
license
clause
text data
sentence
software
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210875400.1A
Other languages
Chinese (zh)
Other versions
CN115080924A (en)
Inventor
徐思涵
高雅
范玲玲
刘哲理
王志煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-07-25
Filing date
2022-07-25
Publication date
2022-11-15

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/105 Arrangements for software license management or administration, e.g. for managing licenses at corporate level
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/253 Grammatical analysis; Style critique
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/30 Semantic analysis


Abstract

The invention discloses a software license clause extraction method based on natural language understanding. First, a license text data set is constructed, and the words, phrases, or sentences that relate to rights clauses and obligation clauses are labeled; the labeled data set is then used to train a license clause extraction model that automatically extracts license clauses from license text. Next, for a given piece of open source software, all license text data it contains is extracted and denoised, and the trained clause extraction model is used to extract the license clauses involved. Finally, the attitude polarity of each extracted clause is determined by syntactic analysis. The method helps software developers understand and choose licenses quickly, and helps software projects avoid the legal and economic risks caused by improper license use.

Description

Software license clause extraction method based on natural language understanding
Technical Field
The invention belongs to the technical field of software systems and artificial intelligence, and particularly relates to a software license clause extraction method based on natural language understanding.
Background
In recent years, as software development technology has advanced and demand for software has grown, millions of applications have appeared in the software market. Under intense market competition, developers often rely on third-party open source software for common functionality, so that they can shorten development cycles and concentrate on their own distinctive features. The widespread use of open source software gave rise to open source software licenses. A license is a statement issued by the software owner that restricts or imposes requirements on actions such as modifying, sharing, and releasing the software when others use it, and it carries legal force. Used properly, licenses benefit the software market; used improperly, they can lead to complex software copyright disputes and legal risk.
In practice, developers mostly care about the key clauses of a license and the polarity of each clause, so semantic understanding of licenses and extraction of their key information are essential. Moreover, because developers' needs vary, custom licenses keep appearing. If license clauses and their attitude polarities could be extracted automatically, new custom licenses could be handled as well, developers' varied licensing needs could be met, and licenses would be easier to understand and choose.
In addition, almost every open source project depends on other open source code or modules. When an imported module carries its own license, its clauses may be incompatible with the project's overall license, and the resulting legal and economic risks cannot be ignored. Ensuring license compatibility requires developers to understand every clause of the licenses involved, so automatic extraction of license clauses is necessary.
Existing research has several shortcomings. Most earlier work focused on identifying license names and versions, which is essentially pattern matching; it recognizes only a number of popular licenses and cannot cope with the large number of newly created custom licenses. For license clause extraction, earlier work mainly modeled licenses by manual summarization and does not transfer to new licenses. Some researchers have proposed automatic clause extraction techniques, but their accuracy is low and their usability still needs to be verified.
Disclosure of Invention
The invention provides a software license clause extraction method based on natural language understanding. Compared with traditional methods, it is not limited to a fixed set of licenses, and it extracts software license clauses, and performs semantic understanding on that basis, more accurately, more comprehensively, and with a higher degree of automation.
The invention is implemented through the following technical solution:
A software license clause extraction method based on natural language understanding, comprising the following steps:
Step S1: construct a license text data set and label the license clauses in it;
Step S2: train a license clause extraction model on the data set labeled in step S1, so that license clauses can be extracted from license text automatically;
Step S3: for a given piece of open source software, extract all license text data it contains;
Step S4: denoise the license text data obtained in step S3;
Step S5: use the clause extraction model built in step S2 to extract the license clauses involved in the license text data processed in step S4;
Step S6: for each extracted license clause, determine its attitude polarity by syntactic analysis.
In the above technical solution, in step S1, the words, phrases, or sentences in the license text data set that relate to rights clauses (such as Distribute, Modify, and Commercial Use) and to obligation clauses (such as Include Copyright, Disclose Source, and State Changes) are labeled manually. Each label indicates which clause the word, phrase, or sentence belongs to and whether it lies at the beginning, middle, or end of that clause, or that it belongs to no clause.
In the above technical solution, step S2 includes the following steps:
Step S21: perform word embedding with a GloVe pre-trained model, converting natural language vocabulary into the corresponding vector representations;
Step S22: extract contextual features with a bidirectional long short-term memory network model;
Step S23: apply a conditional random field algorithm to the predictions output by the bidirectional long short-term memory network model, computing the state transition information between predicted categories and, together with the existing category labels, solving for the conditional probability distribution of the clause sequence to obtain a valid clause prediction;
Step S24: repeat the above steps until the accuracy of the model meets the requirement, then stop training to obtain the final clause extraction model.
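The role of the conditional random field in step S23 can be illustrated with a small decoding sketch. It is only an illustration under stated assumptions, not the patented implementation: the per-token scores stand in for the output of the bidirectional long short-term memory network, the transition scores stand in for learned state transition information, and the function name `viterbi_decode`, the label names, and all numbers are hypothetical.

```python
from typing import Dict, List


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[str, Dict[str, float]],
                   labels: List[str]) -> List[str]:
    """Return the label sequence with the highest total score.

    emissions[t][y]   : score given to label y at position t (network output)
    transitions[p][y] : score for moving from label p to label y (CRF part)
    """
    if not emissions:
        return []
    # best[t][y] is the best score of any path that ends in label y at position t.
    best = [{y: emissions[0][y] for y in labels}]
    back = []  # back[t - 1][y] is the predecessor of y on that best path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical two-token example with a single clause type.
LABELS = ["O", "B-Distribute", "I-Distribute"]
TRANSITIONS = {p: {y: 0.0 for y in LABELS} for p in LABELS}
TRANSITIONS["O"]["I-Distribute"] = -10.0  # a "middle" tag should not follow "O"
EMISSIONS = [
    {"O": 1.0, "B-Distribute": 0.8, "I-Distribute": 0.0},
    {"O": 0.2, "B-Distribute": 0.0, "I-Distribute": 1.0},
]
print(viterbi_decode(EMISSIONS, TRANSITIONS, LABELS))
# prints ['B-Distribute', 'I-Distribute']
```

Choosing the best label at each position independently would give "O" followed by "I-Distribute", which is not a valid clause span; the transition scores steer the decoder to a consistent sequence.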
In the above technical solution, step S3 includes the following steps:
Step S31: traverse every text file in the given open source software, find all dedicated license files, and obtain the license text data they contain;
Step S32: traverse every code file in the given open source software, find the names of all third-party packages that the code imports, and look up the license name and license text that each third-party package declares;
Step S33: traverse every code file in the given open source software and obtain the license-related content declared within the code by parsing code comments; if a license is declared by name only, obtain its full text from the license name;
Step S34: pool all license text data obtained from the open source software in steps S31 to S33 as the analysis data for subsequent license clause extraction.
In the above technical solution, the denoising in step S4 removes punctuation marks, numbers, spaces, line feeds, and tab characters from the license text data obtained in step S3.
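A minimal sketch of this denoising step in Python follows. It assumes regular expressions are used and that runs of whitespace are collapsed to a single space rather than deleted outright, so that word boundaries survive; the function name `denoise` and the exact patterns are illustrative assumptions, not taken from the patent.

```python
import re

# Anything that is neither a word character nor whitespace (punctuation and
# symbols), plus digits and underscores.
_PUNCT_OR_DIGIT = re.compile(r"[^\w\s]|[\d_]")
# Runs of spaces, line feeds, and tab characters.
_WHITESPACE = re.compile(r"\s+")


def denoise(text: str) -> str:
    """Replace punctuation and digits with spaces, then collapse whitespace."""
    text = _PUNCT_OR_DIGIT.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()


print(denoise("Permission is hereby granted,\n\tfree of charge (v2.0)."))
# prints: Permission is hereby granted free of charge v
```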
In the above technical solution, step S6 includes the following steps:
Step S61: for each extracted license clause, find the sentence that contains it and parse the sentence with a probabilistic context-free grammar method to obtain its syntax tree;
Step S62: based on the syntax tree of the sentence containing each clause, determine the part of speech of every word that belongs to the clause; if a word is a verb, a modal verb, or a preposition, add it to the key word set of the clause;
Step S63: based on the syntax tree of the sentence containing each clause, find the part of the sentence that lies outside the clause and syntactically serves as the main clause governing it, and determine the part of speech of every word in that part; if a word is a verb, a modal verb, or a preposition, add it to the key word set of the clause;
Step S64: traverse the key word set of the clause and look for negative words to judge whether the attitude polarity of the clause is negative; if it is not negative, look for words expressing necessity, and regard the attitude polarity as necessary if any such word appears; if no such word appears, regard the attitude polarity as permitted. The attitude polarity of the clause is obtained in this way.
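Steps S62 and S63 amount to a walk over the parse tree. The sketch below is illustrative only. It assumes the parser emits a Penn Treebank style bracketed string, and that verbs, modal verbs, and prepositions carry tags beginning with VB, MD, and IN; all function names are hypothetical. For brevity it collects words from the whole sentence instead of separating the license clause from its main clause.

```python
from typing import List, Tuple

KEY_TAG_PREFIXES = ("VB", "MD", "IN")  # verbs, modal verbs, prepositions


def parse_bracketed(text: str):
    """Parse '(S (NP ...) (VP ...))' into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word under a part-of-speech tag
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read()


def tagged_words(tree) -> List[Tuple[str, str]]:
    """Return (word, part-of-speech tag) pairs in sentence order."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if not isinstance(child, str):
            pairs.extend(tagged_words(child))
    return pairs


def key_words(tree) -> List[str]:
    """Keep only the verbs, modal verbs, and prepositions of the sentence."""
    return [word.lower() for word, tag in tagged_words(tree)
            if tag.startswith(KEY_TAG_PREFIXES)]


# Hypothetical parse of "Licensee may distribute copies under this license".
EXAMPLE = ("(S (NP (NN Licensee)) (VP (MD may) (VP (VB distribute) "
           "(NP (NNS copies)) (PP (IN under) (NP (DT this) (NN license))))))")
print(key_words(parse_bracketed(EXAMPLE)))
# prints ['may', 'distribute', 'under']
```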
The invention also provides a computer-readable storage medium storing a computer program which, when executed, implements the steps of the method described above.
The advantages and beneficial effects of the invention are:
(1) Building on natural language understanding, license clauses are extracted with named entity recognition technology, and the attitude polarity of each clause is determined by syntactic analysis, achieving automatic semantic understanding and key information extraction for software licenses.
(2) The method helps software developers understand and choose licenses quickly, and helps software projects avoid the legal and economic risks caused by improper license use.
(3) Experiments on open source software systems show that the method improves accuracy over existing work on open source license clause extraction, and can provide more accurate, more comprehensive, and more automated license semantic understanding for software development.
Drawings
Fig. 1 is a basic flowchart of a software license term extraction method based on natural language understanding proposed by the present invention.
For a person skilled in the art, other relevant figures can be obtained from the above figures without inventive effort.
Detailed Description
To help those skilled in the art better understand the technical solution of the invention, it is further described below with reference to specific embodiments.
Referring to Fig. 1, a software license clause extraction method based on natural language understanding includes the following steps:
Step S1: construct a license text data set and label the license clauses in it, specifically as follows:
Step S11: collect as many kinds of license text as possible (this embodiment collects 400 licenses, covering both open source licenses that are popular in the open source community and custom licenses found in the code of open source projects).
Step S12: manually label the license clauses in the collected license texts, for training the supervised learning model in step S2. Specifically, the words, phrases, or sentences in the license text that relate to rights clauses (such as Distribute, Modify, and Commercial Use) and to obligation clauses (such as Include Copyright, Disclose Source, and State Changes) are labeled manually. Each label indicates which clause the word, phrase, or sentence belongs to and whether it lies at the beginning, middle, or end of that clause, or that it belongs to no clause.
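The labeling scheme of step S12 can be sketched as token-level tags. The example below is a hypothetical illustration: the tag prefixes B-, I-, and E- for beginning, middle, and end, the tag O for tokens outside any clause, the clause name `IncludeCopyright`, and the function `label_tokens` are all assumptions rather than the notation actually used in the patent. A single-token span is tagged as a beginning here.

```python
from typing import List, Tuple


def label_tokens(tokens: List[str],
                 spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn annotated spans into one tag per token.

    Each span is (start index, end index exclusive, clause name).
    """
    labels = ["O"] * len(tokens)  # default: the token belongs to no clause
    for start, end, clause in spans:
        for i in range(start, end):
            if i == start:
                labels[i] = f"B-{clause}"  # beginning of the clause
            elif i == end - 1:
                labels[i] = f"E-{clause}"  # end of the clause
            else:
                labels[i] = f"I-{clause}"  # middle of the clause
    return labels


TOKENS = ["You", "must", "retain", "the", "copyright", "notice"]
SPANS = [(2, 6, "IncludeCopyright")]
for token, label in zip(TOKENS, label_tokens(TOKENS, SPANS)):
    print(f"{token}\t{label}")
```

Sequences in this form are the usual input for training a named entity recognition model of the kind described in step S2.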
Step S2: using the license text data set labeled in step S1, train a clause extraction model under supervision, based on named entity recognition technology, so that clauses can be extracted from license text automatically. This specifically includes the following steps:
Step S21: perform word embedding with a GloVe pre-trained model, that is, use the language knowledge latent in the pre-trained model to convert natural language vocabulary into the corresponding vector representations.
Step S22: extract contextual features with a bidirectional long short-term memory network (Bi-LSTM) model. Recurrent neural networks (RNNs) are well suited to prediction problems whose input is sequential data. The long short-term memory network (LSTM) extends the RNN with structures such as a forget gate, an input gate, an output gate, and a hidden state, which mitigate the vanishing gradients, exploding gradients, and poor handling of long-range dependencies found in RNNs. The Bi-LSTM consists of two independent LSTMs, one reading the sequence forward and one reading it backward, so the features obtained at each time step carry information from both the past and the future, which gives comparatively better efficiency and performance in text feature extraction.
Step S23: apply a conditional random field (CRF) algorithm to the predictions output by the Bi-LSTM model, computing the state transition information between predicted categories (that is, the feature functions of the CRF) and, together with the existing category labels, solving for the conditional probability distribution of the clause sequence to obtain a valid clause prediction.
Step S24: repeat the above steps until the accuracy of the model has not improved for 1000 consecutive rounds of training, then stop training to obtain the final clause extraction model.
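The stopping rule in step S24 can be sketched as a patience counter. This is only an illustration: `train_step` and `evaluate` are hypothetical callables standing in for one round of model training and for an accuracy measurement, and the `max_rounds` safety cap is an added assumption.

```python
from typing import Callable


def train_with_patience(train_step: Callable[[], None],
                        evaluate: Callable[[], float],
                        patience: int = 1000,
                        max_rounds: int = 1_000_000) -> float:
    """Train until accuracy has not improved for `patience` consecutive rounds."""
    best_accuracy = float("-inf")
    rounds_without_gain = 0
    for _ in range(max_rounds):
        train_step()
        accuracy = evaluate()
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            rounds_without_gain = 0
        else:
            rounds_without_gain += 1
            if rounds_without_gain >= patience:
                break  # no improvement within the patience window
    return best_accuracy
```

In practice the model weights from the best round would usually be saved as well, so that the final clause extraction model is the best one seen rather than the last one trained.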
Step S3: for the given open source software, extract all license text data it contains, specifically as follows:
Step S31: traverse every text file in the given open source software, find all dedicated license files, and obtain the license text data they contain.
Step S32: traverse every code file in the given open source software, obtain the names of all third-party packages that the code imports by regular expression matching, and look up the license name and license text that each third-party package declares.
Step S33: traverse every code file in the given open source software and obtain the license-related content declared within the code by parsing code comments. If a license is declared by name only, its full text must be obtained from the license name (a database of known licenses can be built in advance and queried by license name).
Step S34: pool all license text data obtained from the open source software in steps S31 to S33 as the analysis data for subsequent license clause extraction.
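Steps S31 to S34 can be sketched for a Python code base as follows. Everything specific here is an assumption made for illustration: the license file name pattern, the import pattern, the use of `SPDX-License-Identifier` comment tags as the form of name-only declaration, and the tiny `KNOWN_LICENSES` table standing in for the pre-built license database. Looking up the licenses of third-party packages (the query in step S32) is left out; the sketch only returns the package names.

```python
import os
import re
from typing import Dict, List, Set

LICENSE_FILE = re.compile(r"^(LICEN[CS]E|COPYING|NOTICE)(\..*)?$", re.IGNORECASE)
PY_IMPORT = re.compile(r"^\s*(?:from|import)\s+([A-Za-z_]\w*)", re.MULTILINE)
SPDX_TAG = re.compile(r"#.*?SPDX-License-Identifier:\s*([\w.+-]+)")

# Hypothetical stand-in for the database of known licenses (step S33).
KNOWN_LICENSES: Dict[str, str] = {"MIT": "<full text of the MIT license>"}


def collect_license_data(root: str) -> Dict[str, List[str]]:
    texts: List[str] = []
    packages: Set[str] = set()
    declared: Set[str] = set()
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            if LICENSE_FILE.match(name):
                # S31: dedicated license files
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    texts.append(handle.read())
            elif name.endswith(".py"):
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    source = handle.read()
                # S32: imported top-level names (standard library modules included)
                packages.update(PY_IMPORT.findall(source))
                # S33: licenses declared by name inside comments
                declared.update(SPDX_TAG.findall(source))
    # S33: resolve name-only declarations against the known-license table
    texts.extend(KNOWN_LICENSES[name] for name in sorted(declared)
                 if name in KNOWN_LICENSES)
    # S34: pool everything for later clause extraction
    return {"texts": texts,
            "packages": sorted(packages),
            "declared": sorted(declared)}
```

Calling `collect_license_data("path/to/project")` on a checked-out repository returns the pooled license texts together with the package names that still need a license lookup.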
Step S4: preprocess the obtained license text; specifically, use regular expression matching to remove punctuation marks, numbers, redundant spaces, line feeds, tab characters, and the like from the text.
Step S5: apply the clause extraction model obtained in step S2 to all license text that was extracted from the given open source software and preprocessed in step S4, extracting the clauses it involves.
Step S6: for each extracted license clause, determine its attitude polarity by syntactic analysis, specifically as follows:
Step S61: for each extracted license clause, find the sentence that contains it and parse the sentence with a probabilistic context-free grammar (PCFG) method to obtain its syntax tree.
Step S62: based on the syntax tree of the sentence containing each clause, determine the part of speech of every word that belongs to the clause; if a word is a verb, a modal verb, or a preposition, add it to the key word set of the clause.
Step S63: based on the syntax tree of the sentence containing each clause, find the part of the sentence that lies outside the clause and syntactically serves as the main clause governing it, and determine the part of speech of every word in that part; if a word is a verb, a modal verb, or a preposition, add it to the key word set of the clause.
Step S64: traverse the key word set of the clause and look for negative words; following the principle that two negatives make a positive, judge whether the attitude polarity of the clause is negative (Cannot). If it is not negative, look for words expressing necessity, and regard the attitude polarity as necessary (Must) if any such word appears. If no such word appears, regard the attitude polarity as permitted (Can). The attitude polarity of the clause is obtained in this way.
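The decision rule of step S64 can be sketched in a few lines. The two word lists below are hypothetical examples, since the patent does not enumerate its negative or necessity words, and treating an odd number of negative words as negative is one plausible reading of the double-negation principle.

```python
from typing import Iterable

# Hypothetical word lists; a real system would need curated lexicons.
NEGATIVE_WORDS = {"not", "no", "cannot", "never", "without", "prohibit"}
NECESSITY_WORDS = {"must", "shall", "require", "required"}


def attitude_polarity(key_words: Iterable[str]) -> str:
    """Return 'Cannot', 'Must', or 'Can' for one clause's key word set."""
    words = [word.lower() for word in key_words]
    negatives = sum(1 for word in words if word in NEGATIVE_WORDS)
    if negatives % 2 == 1:  # an even number of negatives cancels out
        return "Cannot"
    if any(word in NECESSITY_WORDS for word in words):
        return "Must"
    return "Can"


print(attitude_polarity(["may", "not", "distribute"]))    # prints Cannot
print(attitude_polarity(["must", "retain", "in"]))        # prints Must
print(attitude_polarity(["may", "distribute", "under"]))  # prints Can
```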
The invention has been described above by way of example. It should be understood that any simple alteration, modification, or other equivalent substitution that a person skilled in the art can make without inventive effort falls within the scope of protection of the invention.

Claims (7)

1. A software license clause extraction method based on natural language understanding, characterized by comprising the following steps:
Step S1: construct a license text data set and label the license clauses in it;
Step S2: train a license clause extraction model on the data set labeled in step S1, so that license clauses can be extracted from license text automatically;
Step S3: for a given piece of open source software, extract all license text data it contains;
Step S4: denoise the license text data obtained in step S3;
Step S5: use the clause extraction model built in step S2 to extract the license clauses involved in the license text data processed in step S4;
Step S6: for each extracted license clause, determine its attitude polarity by syntactic analysis; step S6 includes the following steps:
Step S61: for each extracted license clause, find the sentence that contains it and parse the sentence with a probabilistic context-free grammar method to obtain its syntax tree;
Step S62: based on the syntax tree of the sentence containing each clause, determine the part of speech of every word that belongs to the clause; if a word is a verb, a modal verb, or a preposition, add it to the key word set of the clause;
Step S63: based on the syntax tree of the sentence containing each clause, find the part of the sentence that lies outside the clause and syntactically serves as the main clause governing it, and determine the part of speech of every word in that part; if a word is a verb, a modal verb, or a preposition, add it to the key word set of the clause;
Step S64: traverse the key word set of the clause and look for negative words to judge whether the attitude polarity of the clause is negative; if it is not negative, look for words expressing necessity, and regard the attitude polarity as necessary if any such word appears; if no such word appears, regard the attitude polarity as permitted.
2. The software license clause extraction method based on natural language understanding according to claim 1, characterized in that: in step S1, the words, phrases, or sentences in the license text data set that relate to rights clauses and obligation clauses are labeled accordingly, and each label indicates which clause the word, phrase, or sentence belongs to and whether it lies at the beginning, middle, or end of that clause, or that it belongs to no clause.
3. The software license clause extraction method based on natural language understanding according to claim 2, characterized in that: in step S1, the rights clauses comprise the Distribute, Modify, and Commercial Use clauses, and the obligation clauses comprise the Include Copyright, Disclose Source, and State Changes clauses.
4. The software license clause extraction method based on natural language understanding according to claim 1, characterized in that step S2 includes the following steps:
Step S21: perform word embedding with a GloVe pre-trained model, converting natural language vocabulary into the corresponding vector representations;
Step S22: extract contextual features with a bidirectional long short-term memory network model;
Step S23: apply a conditional random field algorithm to the predictions output by the bidirectional long short-term memory network model, computing the state transition information between predicted categories and, together with the existing category labels, solving for the conditional probability distribution of the clause sequence to obtain a valid clause prediction;
Step S24: repeat the above steps until the accuracy of the model meets the requirement, then stop training to obtain the final clause extraction model.
5. The software license clause extraction method based on natural language understanding according to claim 1, characterized in that step S3 includes the following steps:
Step S31: traverse every text file in the given open source software, find all dedicated license files, and obtain the license text data they contain;
Step S32: traverse every code file in the given open source software, find the names of all third-party packages that the code imports, and look up the license name and license text that each third-party package declares;
Step S33: traverse every code file in the given open source software and obtain the license-related content declared within the code by parsing code comments; if a license is declared by name only, obtain its full text from the license name;
Step S34: pool all license text data obtained from the open source software in steps S31 to S33 as the analysis data for subsequent license clause extraction.
6. The software license clause extraction method based on natural language understanding according to claim 1, characterized in that: the denoising in step S4 removes punctuation marks, numbers, spaces, line feeds, and tab characters from the license text data obtained in step S3.
7. A computer-readable storage medium, characterized in that it stores a computer program which, when executed, implements the steps of the method according to any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210875400.1A CN115080924B (en) 2022-07-25 2022-07-25 Software license clause extraction method based on natural language understanding

Publications (2)

Publication Number Publication Date
CN115080924A 2022-09-20
CN115080924B 2022-11-15

Family

ID=83243686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210875400.1A Active CN115080924B (en) 2022-07-25 2022-07-25 Software license clause extraction method based on natural language understanding

Country Status (1)

Country Link
CN (1) CN115080924B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934254A (en) * 2017-02-15 2017-07-07 中国银联股份有限公司 The analysis method and device of a kind of licensing of increasing income
CN106951743A (en) * 2017-03-22 2017-07-14 上海英慕软件科技有限公司 A kind of software code infringement detection method
CN107783960A (en) * 2017-10-23 2018-03-09 百度在线网络技术(北京)有限公司 Method, apparatus and equipment for Extracting Information
CN109062904A (en) * 2018-08-23 2018-12-21 上海互教教育科技有限公司 Logical predicate extracting method and device
CN109063421A (en) * 2018-06-28 2018-12-21 东南大学 A kind of analysis of open source licensing compliance and conflicting detection method
CN109154939A (en) * 2016-04-08 2019-01-04 培生教育公司 The system and method generated for automated content polymerization
CN109933664A (en) * 2019-03-12 2019-06-25 中南大学 A kind of fine granularity mood analysis improved method based on emotion word insertion
CN110674639A (en) * 2019-09-24 2020-01-10 拾音智能科技有限公司 Natural language understanding method based on pre-training model
CN111753089A (en) * 2020-06-28 2020-10-09 深圳壹账通智能科技有限公司 Topic clustering method and device, electronic equipment and storage medium
CN112084309A (en) * 2020-09-17 2020-12-15 北京中科微澜科技有限公司 License selection method and system based on open source software map
CN112364165A (en) * 2020-11-12 2021-02-12 上海犇众信息技术有限公司 Automatic classification method based on Chinese privacy policy terms
CN113128227A (en) * 2020-01-14 2021-07-16 普天信息技术有限公司 Entity extraction method and device
CN113268714A (en) * 2021-06-03 2021-08-17 西南大学 Automatic extraction method for license terms of open source software
CN114254653A (en) * 2021-12-23 2022-03-29 深圳供电局有限公司 Scientific and technological project text semantic extraction and representation analysis method
CN114417851A (en) * 2021-12-03 2022-04-29 重庆邮电大学 Emotion analysis method based on keyword weighted information

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8521652B2 (en) * 2006-05-19 2013-08-27 Hewlett-Packard Development Company, L.P. Discovering licenses in software files
US9110883B2 (en) * 2011-04-01 2015-08-18 Rima Ghannam System for natural language understanding
US9798802B2 (en) * 2012-03-23 2017-10-24 Avast Software B.V. Systems and methods for extraction of policy information
US10503769B2 (en) * 2015-07-06 2019-12-10 Rima Ghannam System for natural language understanding
JP6558852B2 (en) * 2015-11-06 2019-08-14 日本電信電話株式会社 Clause identification apparatus, method, and program
US20200285716A1 (en) * 2019-03-07 2020-09-10 International Business Machines Corporation Detection and monitoring of software license terms and conditions
CN110609983B (en) * 2019-08-19 2023-06-09 广州利科科技有限公司 Structured decomposition method for policy file
CN110705265A (en) * 2019-08-27 2020-01-17 阿里巴巴集团控股有限公司 Contract clause risk identification method and device
US20210081841A1 (en) * 2019-09-12 2021-03-18 Viani Systems, Inc. Visually creating and monitoring machine learning models

Also Published As

Publication number Publication date
CN115080924A (en) 2022-09-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant