CN116975040A - Dangerous chemical information management method, device, equipment and readable storage medium - Google Patents
- Publication number
- CN116975040A (application CN202311024861.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- dangerous chemical
- dangerous
- target
- labeling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
- G06Q50/265—Personal security, identity or safety
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Tourism & Hospitality (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Computer Security & Cryptography (AREA)
- Educational Administration (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Quality & Reliability (AREA)
- Development Economics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a dangerous chemical information management method, device, equipment and readable storage medium. The method comprises the following steps: determining a target data source, capturing and collecting the information on each dangerous chemical in the target data source, and, after data cleaning and data quality control processing, collating it to generate initial dangerous chemical data; preprocessing the initial dangerous chemical data to generate dangerous chemical labeling data; selecting a target extraction mode matched with the dangerous chemical labeling data, and performing feature extraction on the dangerous chemical labeling data with the target extraction mode to obtain dangerous chemical numerical features; and acquiring a pre-constructed and trained target dangerous chemical analysis model matched with the user management requirement, and using the model to extract and summarize knowledge from the dangerous chemical numerical features to generate a visual analysis result. The application addresses the problems of data quality and inconsistency, removes complicated redundant information, and greatly improves working efficiency and accuracy when applied to actual working scenarios.
Description
Technical Field
The present application relates to the field of information processing, and more particularly, to a method, apparatus, device, and readable storage medium for managing hazardous chemical substance information.
Background
Hazardous chemicals are flammable, explosive, toxic, harmful, radioactive or otherwise dangerous chemicals that readily cause casualties and property damage during transportation, loading, unloading, storage and preservation, and therefore require special protection. Owing to the particularity and complexity of the industry, the hazardous chemicals industry faces the following information management difficulties:
First, the hazardous chemicals industry involves a large amount of complex information and regulations, such as hazardous materials classification, storage requirements and transportation requirements, and because of the dangers involved this information is complicated and redundant.
Second, since information in the hazardous chemicals industry may come from multiple sources and in multiple formats, the quality and consistency of the data are not guaranteed, which makes information collection and aggregation difficult.
Third, the hazardous chemicals industry is a specialized technical industry with its own specific terminology and rules.
In view of this, the application provides an information management scheme for the hazardous chemicals industry to solve the above problems.
Disclosure of Invention
In view of the above, the application provides a dangerous chemical information management method, device, equipment and readable storage medium, which address the problems of data quality and inconsistency, remove complicated redundant information, handle the specialized language of the dangerous chemical field, produce complete and professional analysis results, and greatly improve working efficiency and accuracy when applied to actual working scenarios.
A dangerous chemical information management method comprises the following steps:
determining a target data source, capturing and collecting the information on each dangerous chemical in the target data source, and, after data cleaning and data quality control processing, collating it to generate initial dangerous chemical data;
preprocessing the initial dangerous chemical data to generate dangerous chemical labeling data;
selecting a target extraction mode matched with the dangerous chemical labeling data from a plurality of preset feature extraction modes, and extracting features of the dangerous chemical labeling data by utilizing the target extraction mode to obtain dangerous chemical numerical features capable of expressing semantic and structural information;
and acquiring a target dangerous chemical analysis model which is pre-constructed and trained and matched with the user management requirement, and carrying out knowledge extraction and summarization on the numerical characteristics of the dangerous chemical by utilizing the target dangerous chemical analysis model to generate a visual analysis result.
Optionally, the determining a target data source, capturing and collecting the information on each dangerous chemical in the target data source, and, after data cleaning and data quality control processing, collating it to generate initial dangerous chemical data includes:
determining a target data source, and capturing and collecting the information on each dangerous chemical in the target data source using a corresponding data capture tool and capture form to obtain captured dangerous chemical data;
sequentially performing denoising, data screening, data result standardization and missing data processing on the captured dangerous chemical data to clean the data;
and verifying the cleaned data and checking its consistency, and, after the verification and the check pass, collating the data to generate the initial dangerous chemical data.
Optionally, preprocessing the initial hazardous chemical substance data to generate hazardous chemical substance labeling data, including:
performing word segmentation on the initial dangerous chemical data, and performing part-of-speech tagging on the segmented text to generate first labeling data;
identifying and labeling named entities in the first labeling data, and generating second labeling data;
performing text normalization and text vectorization on the second annotation data to generate third annotation data;
and labeling entity relationship of the third labeling data by extracting the relationship among the named entities, and generating dangerous chemical labeling data.
Optionally, before the entity relationship labeling is performed on the third labeling data by extracting the relationship between the named entities, the method further includes:
and carrying out synonym replacement and normalization processing on each named entity in the third labeling data.
Optionally, before the entity relationship labeling is performed on the third labeling data by extracting the relationship between the named entities, the method further includes:
and performing stop word filtering processing on the third annotation data according to the stop word set by the user.
Optionally, the preset feature extraction modes include bag-of-words feature extraction, word embedding feature extraction, TF-IDF feature extraction, N-gram feature extraction, topic modeling feature extraction, grammatical and syntactic feature extraction, entity feature extraction and text structure feature extraction.
Optionally, the process of constructing and training to obtain the target dangerous chemical analysis model matched with the user management requirement includes:
determining a matched analysis model framework based on the user management requirements and the data types of the dangerous chemical numerical characteristics, and constructing an initial dangerous chemical analysis model;
training the initial dangerous chemical analysis model using training data, and evaluating the performance of the initial dangerous chemical analysis model using test data after every preset number of training iterations;
and stopping training and generating the target dangerous chemical analysis model when the performance evaluation result meets the preset condition.
A dangerous chemical information management device comprises a memory and a processor;
The memory is used for storing programs;
the processor is configured to execute the program to implement each step of the hazardous chemical substance information management method according to any one of the above.
A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the hazardous chemical information management method according to any one of the above.
According to the above technical scheme, the dangerous chemical information management method, device, equipment and readable storage medium provided by the embodiments of the application determine a target data source, capture and collect the information on each dangerous chemical in the target data source, and generate initial dangerous chemical data after data cleaning and data quality control processing. The initial dangerous chemical data is then preprocessed to generate dangerous chemical labeling data. A target extraction mode matched with the dangerous chemical labeling data is selected from a plurality of preset feature extraction modes and used to perform feature extraction on the dangerous chemical labeling data, yielding dangerous chemical numerical features capable of expressing semantic and structural information. Finally, a pre-constructed and trained target dangerous chemical analysis model matched with the user management requirement is acquired and used to extract and summarize knowledge from the dangerous chemical numerical features, generating a visual analysis result.
The application captures and collects the information on each dangerous chemical in each designated target data source and applies data cleaning, data quality control and preprocessing specific to the dangerous chemical industry, which addresses the problems of data quality and inconsistency and removes complicated redundant information. After feature extraction is performed on the dangerous chemical labeling data, a model trained and optimized for the dangerous chemical industry extracts and summarizes knowledge from the dangerous chemical numerical features and generates a visual analysis result. This handles the specialized language of the dangerous chemical field and produces complete, professional analysis results, greatly improving working efficiency and accuracy in actual working scenarios.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a method for managing information of hazardous chemical substances according to the present application;
FIG. 2 is a block diagram of a dangerous chemical information management device according to the present application;
FIG. 3 is a block diagram of the hardware structure of a dangerous chemical information management device disclosed by the application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, distributed computing environments that include any of the above systems or devices, and the like.
The embodiment of the application provides a dangerous chemical information management method, which can be applied to various dangerous chemical information management platforms or systems, and can also be applied to various computer terminals or intelligent terminals, wherein an execution subject can be a processor or a server of the computer terminal or the intelligent terminal.
The technical scheme of the application is described in detail below.
FIG. 1 is a schematic diagram of a dangerous chemical information management method disclosed in the present application. As shown in FIG. 1, the method may include:
Step S1, determining a target data source, capturing and collecting the information on each dangerous chemical in the target data source, and, after data cleaning and data quality control processing, collating it to generate initial dangerous chemical data.
Specifically, the data sources from which dangerous chemical information is to be captured and collected, i.e., the target data sources, may be determined according to the information collection requirements. Dangerous chemical information such as regulations, documents, reports, papers and news may be collected from a variety of internal and external sources. The obtained dangerous chemical information may then be cleaned to remove irrelevant or erroneous data and to unify the data structure, and data quality control processing ensures the accuracy and integrity of the data.
Step S2, preprocessing the initial dangerous chemical data to generate dangerous chemical labeling data.
Specifically, in the hazardous chemicals industry, the collected text data may contain a large amount of noise, irrelevant information or redundant content, and in order to ensure the accuracy and reliability of subsequent text analysis and information extraction, the initial hazardous chemicals data needs to be preprocessed. Preprocessing may include processes of word segmentation, part-of-speech tagging, entity recognition, etc., to better understand text content.
Step S3, selecting a target extraction mode matched with the dangerous chemical labeling data from a plurality of preset feature extraction modes, and performing feature extraction on the dangerous chemical labeling data with the target extraction mode to obtain dangerous chemical numerical features capable of expressing semantic and structural information.
Specifically, in the hazardous chemicals industry, the dangerous chemical labeling data needs to be converted into a feature representation that a machine learning model can process; feature extraction aims to extract, from the original text data, numerical features capable of expressing semantic and structural information. One or more extraction modes matched with the dangerous chemical labeling data may be selected from the plurality of preset feature extraction modes as the target extraction mode and used to extract features from the dangerous chemical labeling data.
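As an illustration of one of the preset feature extraction modes (TF-IDF), the following minimal sketch turns tokenized text into numerical features. It is not taken from the application: the function name `tf_idf`, the sample documents and the unsmoothed weighting formula (term frequency multiplied by the logarithm of the inverse document frequency) are assumptions chosen for brevity.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (hypothetical helper)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Weight = relative term frequency * log(inverse document frequency).
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Illustrative, already segmented documents (assumed sample data).
docs = [
    ["methanol", "flammable", "liquid"],
    ["chlorine", "toxic", "gas"],
    ["methanol", "toxic", "liquid"],
]
vectors = tf_idf(docs)
```

In this sample, "flammable" occurs in only one document and therefore receives a higher weight in the first document than "methanol", which occurs in two.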
Step S4, acquiring a pre-constructed and trained target dangerous chemical analysis model matched with the user management requirement, and using the target dangerous chemical analysis model to extract and summarize knowledge from the dangerous chemical numerical features to generate a visual analysis result.
Specifically, a target dangerous chemical analysis model matched with the user management requirement is built and trained in advance, and then knowledge extraction and summarization can be carried out on the numerical characteristics of the dangerous chemicals by using the target dangerous chemical analysis model to obtain a visual analysis result.
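The training procedure outlined in the disclosure (train, evaluate on test data after every preset number of training iterations, stop once the evaluation meets a preset condition) can be sketched as follows. The application does not prescribe a model here; a small logistic regression trained by stochastic gradient descent stands in as an assumed example, and the names `train`, `eval_every` and `target` as well as the sample data are hypothetical.

```python
import math

def predict(weights, x):
    # weights[0] is the bias term; the remaining entries pair with the features.
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def accuracy(weights, data):
    hits = sum((predict(weights, x) >= 0.5) == bool(y) for x, y in data)
    return hits / len(data)

def train(train_data, test_data, eval_every=10, target=0.95,
          max_epochs=1000, lr=0.5):
    """Train, evaluating every `eval_every` epochs and stopping at `target`."""
    weights = [0.0] * (len(train_data[0][0]) + 1)
    for epoch in range(1, max_epochs + 1):
        for x, y in train_data:
            err = predict(weights, x) - y  # gradient of the log loss w.r.t. z
            weights[0] -= lr * err
            for j, xi in enumerate(x):
                weights[j + 1] -= lr * err * xi
        # Performance evaluation after every preset number of training epochs.
        if epoch % eval_every == 0 and accuracy(weights, test_data) >= target:
            return weights, epoch
    return weights, max_epochs

# Assumed toy numerical features with binary labels.
train_data = [([0.0, 0.1], 0), ([0.2, 0.0], 0), ([0.9, 1.0], 1), ([1.0, 0.8], 1)]
test_data = [([0.1, 0.1], 0), ([0.9, 0.9], 1)]
weights, stopped_at = train(train_data, test_data)
```

Evaluating only at fixed intervals keeps the evaluation cost low; the interval and the stopping threshold would in practice be chosen according to the user management requirement.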
In some embodiments of the present application, the process in step S1 of determining a target data source, capturing and collecting the information on each dangerous chemical in the target data source, and generating the initial dangerous chemical data after data cleaning and data quality control processing is described. The process may specifically include:
Step S11, determining a target data source, and capturing and collecting the information on each dangerous chemical in the target data source using a corresponding data capture tool and capture form to obtain captured dangerous chemical data.
Specifically, during data capture, appropriate methods and tools should be used according to the data source, i.e., the information on each dangerous chemical in the target data source is captured and collected using the corresponding data capture tool and capture form; for example, data may be captured by manual collection, web page crawling, document downloading and other means. The collected data is sorted, categorized and organized for subsequent cleaning and processing. Folders, databases or other structured tools may be used to organize the data.
Step S12, sequentially performing denoising, data screening, data result standardization and missing data processing on the captured dangerous chemical data to clean the data.
Specifically, in the present application the data cleaning includes, but is not limited to, denoising, data screening, data result standardization and missing data processing. The main purpose of data cleaning is to remove irrelevant or erroneous data from the captured dangerous chemical data and to produce a unified data format that facilitates subsequent data preprocessing and data analysis. The individual cleaning processes are described below.
Denoising: during the data collection process, some irrelevant or erroneous data may be collected, and the erroneous, incomplete or repeated data is removed by noise filtering, so as to maintain the accuracy and consistency of the data.
Data screening: the data is filtered according to the specific requirements and focus of the user, and specific data is removed or retained. For example, only information related to hazardous chemical classifications, storage requirements and transportation requirements is retained.
Data result normalization processing: the text data is normalized, including removing special characters, stop words and HTML markup and converting letter case. This helps to improve the accuracy of subsequent text processing and analysis.
Data result standardization processing: if the data is from a different source, format and structure differences may exist. The data can be integrated and standardized to have a consistent format and structure, so that the subsequent data analysis and processing are convenient.
Missing data processing: missing information, such as records lacking key fields, may be present in the data, and missing value problems in the data may be handled by employing methods that fill, interpolate, or delete the missing data.
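The cleaning processes above can be sketched as a single pass over the captured records. This is a minimal sketch under stated assumptions, not the application's implementation: the function `clean_records`, the field names, the keyword list and the fill value "unknown" are hypothetical, and the steps are reordered so that duplicates are detected after the format has been standardized.

```python
import html
import re

def clean_records(records, keywords, required=("name",), default="unknown"):
    """Clean raw records; every field name and default here is hypothetical."""
    seen, cleaned = set(), []
    for rec in records:
        # Standardization: lower-case keys, strip HTML tags, collapse whitespace.
        norm = {}
        for key, value in rec.items():
            text = html.unescape(re.sub(r"<[^>]+>", " ", str(value or "")))
            norm[key.lower()] = re.sub(r"\s+", " ", text).strip()
        # Missing data: delete records lacking a required field, fill the rest.
        if any(not norm.get(field) for field in required):
            continue
        norm = {key: value or default for key, value in norm.items()}
        # Denoising: drop exact duplicates of a record that was already seen.
        fingerprint = tuple(sorted(norm.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        # Screening: keep only records mentioning a keyword of interest.
        blob = " ".join(norm.values()).lower()
        if any(word in blob for word in keywords):
            cleaned.append(norm)
    return cleaned

# Assumed sample of captured records.
records = [
    {"Name": "Methanol", "Storage": "<p>Cool, ventilated area</p>"},
    {"Name": "Methanol", "Storage": "Cool, ventilated area"},
    {"Name": "", "Storage": "Dry area"},
    {"Name": "Office chair", "Storage": None},
]
cleaned = clean_records(records, keywords=["ventilated", "flammable"])
```

With this sample, only one Methanol record survives: the second is a duplicate once the HTML markup is stripped, the third lacks the required name, and the fourth matches no keyword.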
Step S13, verifying the cleaned data and checking its consistency, and, after the verification and the check pass, collating the data to generate the initial dangerous chemical data.
Specifically, after data cleaning is completed, the cleaned data may be verified and checked for consistency, and the initial dangerous chemical data is generated by collating the data once the verification and the check have passed. Verifying the cleaned data ensures its accuracy and integrity; the data may be verified by comparing it with other reliable data sources or by checking whether certain attributes of the data match expectations. The data consistency check examines whether the data contains inconsistencies, e.g., different expressions of the same fact or contradictory information. Consistency checking and correction are performed on the data by means of rules or algorithms.
In addition, data monitoring may be set up: by establishing a data monitoring mechanism, the quality and accuracy of the data can be checked periodically, and any problems found can be corrected and optimized.
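A rule-based consistency check of the kind described above might look like the following sketch, which flags entries that carry contradictory values for the same attribute. The function name `find_conflicts`, the field names `name` and `hazard_class` and the sample records are assumptions made for illustration only.

```python
from collections import defaultdict

def find_conflicts(records, key="name", fields=("hazard_class",)):
    """Map each key to the fields that carry more than one distinct value."""
    seen = defaultdict(lambda: defaultdict(set))
    for rec in records:
        name = rec[key].strip().lower()
        for field in fields:
            value = rec.get(field)
            if value:
                # Compare case-insensitively so spelling variants do not count.
                seen[name][field].add(value.strip().lower())
    return {
        name: {f: sorted(vals) for f, vals in by_field.items() if len(vals) > 1}
        for name, by_field in seen.items()
        if any(len(vals) > 1 for vals in by_field.values())
    }

# Assumed sample records: two sources disagree about Methanol.
records = [
    {"name": "Methanol", "hazard_class": "Flammable liquid"},
    {"name": "methanol ", "hazard_class": "flammable liquid"},
    {"name": "Methanol", "hazard_class": "Toxic gas"},
    {"name": "Chlorine", "hazard_class": "Toxic gas"},
]
conflicts = find_conflicts(records)
```

Entries reported in `conflicts` would then be corrected by rule or passed to manual review before the initial dangerous chemical data is generated.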
In some embodiments of the present application, the process of preprocessing the initial hazardous chemical substance data and generating hazardous chemical substance labeling data in step S2 may specifically include:
Step S21: performing word segmentation on the initial dangerous chemical data, and performing part-of-speech tagging on the segmented text to generate first labeling data.
Specifically, word segmentation of the initial dangerous chemical data splits the text into words. For Chinese, a continuous text stream is segmented into meaningful words or word sequences; common Chinese word segmentation algorithms include dictionary-based forward maximum matching, reverse maximum matching, and bidirectional maximum matching. For English, the text is split into single words at spaces and punctuation marks. Part-of-speech tagging assigns to each word of the segmented text its part of speech in the sentence, e.g., noun, verb, or adjective. Common part-of-speech tagging algorithms include Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs).
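As an illustration of dictionary-based forward maximum matching, the sketch below scans the text from left to right and, at each position, takes the longest dictionary word that matches. The function name and the fallback to a single character when nothing matches are assumptions; unspaced Latin text stands in for Chinese text in the example.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment `text` greedily, always preferring the longest dictionary match."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)  # single characters are the fallback
                i += size
                break
    return tokens


words = {"hazard", "hazardous", "chemical", "storage"}
print(forward_max_match("hazardouschemicalstorage", words))
```

Reverse maximum matching applies the same idea from the end of the text, and bidirectional matching runs both and keeps the segmentation judged better by some rule, such as fewer tokens.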
Step S22: identifying and labeling the named entities in the first labeling data to generate second labeling data.
Specifically, named entities in the text, such as person names, place names, organization names and the like, can be identified and labeled by using machine learning technology and models, so that entity relationships and contexts in the text can be better understood in subsequent data processing and dividing processes.
Step S23: performing text normalization and text vectorization on the second labeling data to generate third labeling data.
Specifically, text normalization helps remove noise and redundant information from the text; it includes case conversion and the removal of stop words (e.g., common articles, conjunctions, and prepositions), punctuation marks, and the like.
Text vectorization converts text into a vector representation that can be processed by machine learning algorithms. Common text vectorization methods include the bag-of-words (BoW) model and word embedding techniques such as Word2Vec and GloVe.
Step S24: performing entity relationship labeling on the third labeling data by extracting the relationships among the named entities, to generate dangerous chemical labeling data.
Specifically, labeling the entity relationships of the third labeling data by extracting the relationships among the named entities is entity relationship extraction: the relationships and connections between entities are extracted by analyzing the semantics and relations in the text. A deep learning model, such as a neural-network-based relationship extraction model, can be used to extract the relationships between entities from the text automatically.
Optionally, in order to further improve the consistency and accuracy of information extraction, reduce the dimensionality and complexity of the text, and improve the extraction effect, synonym replacement or stop word filtering may be performed before entity relationship labeling is performed on the third labeling data to generate the dangerous chemical labeling data, that is:
(1) Before entity relationship labeling is performed on the third labeling data by extracting the relationships among the named entities to generate the dangerous chemical labeling data, synonym replacement and normalization processing is performed on each named entity in the third labeling data.
Replacing or normalizing the synonyms in the text reduces the diversity of words in the text and improves the consistency and accuracy of information extraction.
(2) Before entity relationship labeling is performed on the third labeling data by extracting the relationships among the named entities to generate the dangerous chemical labeling data, stop word filtering is performed on the third labeling data according to the stop words set by the user.
Removing common stop words filters out words that do not contribute to information extraction, which reduces the dimensionality and complexity of the text and improves the extraction effect.
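Both optional steps can be sketched in a few lines of Python. The synonym table and the stop-word set below are illustrative assumptions; in practice they would be supplied by a domain dictionary and by the user, respectively.

```python
def normalize_synonyms(tokens, synonym_map):
    """Replace each token with its canonical name when one is defined."""
    return [synonym_map.get(token, token) for token in tokens]


def filter_stop_words(tokens, stop_words):
    """Drop tokens that appear in the user-defined stop-word set."""
    return [token for token in tokens if token not in stop_words]


# Hypothetical inputs: two common names mapped to one canonical entity name.
synonyms = {"caustic soda": "sodium hydroxide", "lye": "sodium hydroxide"}
stops = {"the", "is", "with"}
tokens = ["the", "caustic soda", "is", "stored", "with", "lye"]
print(filter_stop_words(normalize_synonyms(tokens, synonyms), stops))
```

After both steps the two mentions collapse to a single entity name, so a later relationship extractor sees one entity rather than two.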
In some embodiments of the present application, the preset feature extraction modes may include bag-of-words model feature extraction, word embedding feature extraction, TF-IDF feature extraction, N-gram feature extraction, topic modeling feature extraction, grammar and syntax feature extraction, entity feature extraction, and text structure feature extraction.
These feature extraction modes are described below.
Bag-of-words model feature extraction:
The bag-of-words model converts text into a numerical matrix. It treats a text as a collection of words: each text is represented as a vector containing word frequencies or the presence or absence of words, regardless of word order and context.
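A minimal bag-of-words sketch, assuming the documents are already tokenized, is shown below. The helper names are hypothetical; the `binary` flag switches between word frequencies and presence/absence, the two representations mentioned above.

```python
from collections import Counter


def build_vocabulary(documents):
    """Sorted list of all distinct tokens across the tokenized documents."""
    return sorted({token for doc in documents for token in doc})


def bag_of_words(doc, vocabulary, binary=False):
    """Vector of word counts (or 0/1 presence flags) in vocabulary order."""
    counts = Counter(doc)
    if binary:
        return [1 if counts[token] else 0 for token in vocabulary]
    return [counts[token] for token in vocabulary]
```

Stacking the vectors of all documents row by row yields the document-term matrix.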
Word embedding feature extraction:
Word embedding is a technique that maps words into a continuous vector space that can capture semantic relationships and context information between words. Common word embedding algorithms include Word2Vec and GloVe. Word embeddings may be obtained from a pre-trained model, or may be trained within the current task using a neural network model.
TF-IDF feature extraction:
TF-IDF is a statistical method for measuring the importance of a word in a text: the frequency of the word in the text is combined with its inverse document frequency over the whole data set to obtain a value representing the importance of the word.
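The computation can be sketched as follows, assuming tokenized documents and the plain, unsmoothed weighting tf × ln(N / df). This weighting is an assumption chosen for clarity; libraries commonly apply smoothed or normalized variants, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {token: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each token occurs.
    doc_freq = Counter(token for doc in documents for token in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            token: (count / total) * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return weights
```

A token that occurs in every document receives weight zero, which reflects that it does not help distinguish one text from another.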
N-gram feature extraction:
An N-gram takes N consecutive words as one feature. By extracting sequences of N consecutive words, richer context information can be captured. Common values of N are 2 and 3.
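Extracting N-grams from a token list is a sliding window of width N, as in this short sketch (the function name is an assumption):

```python
def ngrams(tokens, n):
    """All runs of n consecutive tokens, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(ngrams(["store", "away", "from", "heat"], 2))
```

A text shorter than N yields no N-grams, so short records contribute nothing for larger N.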
Topic modeling feature extraction:
Topic modeling is a method for discovering hidden topics in a collection of documents. By mapping the words in a text to the topic space, the topic distribution of the text can be characterized. A common topic modeling algorithm is Latent Dirichlet Allocation (LDA).
Grammar and syntax feature extraction:
In addition to word-level features, grammatical and syntactic features of the text may be extracted. For example, part-of-speech tagging and syntactic analysis techniques may be used to extract syntactic structure information, such as noun phrases, verb phrases, and the like.
Entity feature extraction:
In the named entity recognition task, contextual features of an entity, such as the preceding and following words and their parts of speech, may be extracted and used in the recognition and relationship extraction tasks.
Text structure feature extraction:
Text structure feature extraction includes extracting the length and structure of the text. For example, the character length, number of words, number of sentences, and number of paragraphs of a text may reflect its complexity and organization.
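The structural counts listed above can be computed with simple string operations. The sketch below is deliberately naive and rests on stated assumptions: words are separated by whitespace, sentences end at `.`, `!` or `?`, and paragraphs are separated by blank lines. Chinese text would need a segmenter and Chinese punctuation instead.

```python
import re


def structure_features(text):
    """Character, word, sentence and paragraph counts of a text."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Naive rule: a decimal such as "98.5" would be split as well.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "char_length": len(text),
        "word_count": len(text.split()),
        "sentence_count": len(sentences),
        "paragraph_count": len(paragraphs),
    }
```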
In some embodiments of the present application, a process of constructing and training to obtain a target dangerous chemical analysis model matched with a user management requirement is described, which specifically may include:
(1) Determining a matching analysis model architecture based on the user management requirements and the data type of the dangerous chemical numerical features, and constructing an initial dangerous chemical analysis model.
Specifically, an appropriate analysis model architecture is selected according to the requirements of the task, namely the user management requirements, and the characteristics of the data, namely the data type of the dangerous chemical numerical features. In knowledge extraction, commonly used models include traditional machine learning models (e.g., support vector machines, naive Bayes, etc.) and deep learning models (e.g., convolutional neural networks, recurrent neural networks, attention mechanisms, etc.). Different model architectures may be selected for different user management requirements and data types. For example, convolutional or recurrent neural network models may be used for text classification tasks, conditional random field models may be used for named entity recognition tasks, and so on.
(2) Training the initial dangerous chemical analysis model with training data, and evaluating the performance of the initial dangerous chemical analysis model with test data after every preset number of training iterations.
Specifically, labeled training data is needed before model development and training begin. The training data is a data set containing input texts and corresponding labels, where the labels may be for classification, named entity recognition, relationship extraction, and the like. The training data should be representative and cover the various possible input conditions, and should be appropriately balanced to avoid class imbalance.
Before model training, feature engineering is performed on the input text to convert it into numerical features that a machine learning model can process, using techniques such as the bag-of-words model, TF-IDF, and word embedding. The aim of feature engineering is to select and design the most representative features so that the semantic and structural information of the text is fully expressed; proper feature engineering can improve the performance and generalization capability of the model.
The initial dangerous chemical analysis model is then trained with the prepared training data: input samples and their labels are fed into the model, and the parameters of the model are updated by an optimization algorithm so that the model gradually fits the relationship between input and output. Appropriate regularization and overfitting prevention are applied during training to avoid degraded performance on unseen data.
The process of model evaluation and optimization can be further divided into the following steps:
(1) Evaluation index selection: first, an appropriate evaluation index needs to be selected to measure the performance of the knowledge extraction model. Common evaluation indexes include accuracy, recall, precision, F1 value, and the like. The choice of evaluation index depends on the specific requirements of the task and the characteristics of the data.
(2) Result evaluation: for some tasks, such as classification tasks, the output of the model is a class label. In this case, a confusion matrix may be used to evaluate the results and to calculate indexes such as accuracy and recall. For tasks such as named entity recognition and relationship extraction, the output of the model is a sequence labeling. In this case, sequence labeling evaluation indexes, such as BIO-based accuracy, recall, and F1 value, may be used.
(3) Error analysis: during result evaluation, error analysis is required to understand in which cases the model performs poorly, so that the model can be optimized. Error analysis may include examining the misclassified cases and false-positive cases of the model, and counting and analyzing the various error types.
(4) Model optimization: the model may be optimized based on the error analysis. Common model optimization methods include:
Adjusting model parameters: tuning the hyperparameters of the model, such as the learning rate and regularization parameters, to obtain better performance;
Adding training data: improving the generalization capability of the model by increasing the diversity and quantity of the training data;
Feature engineering optimization: improving the performance of the model by adjusting the feature representation, feature selection, feature extraction, and the like;
Model integration: combining the prediction results of a plurality of models through an ensemble learning method, such as random forests or gradient boosting trees, to improve the performance of the model.
(5) Difficult sample handling: based on the results of the error analysis, the problem of poor performance of the model on difficult samples can be addressed. Strategies such as increasing the number of samples of a particular class, adjusting sample weights, etc. may be employed to improve the performance of the model on difficult samples.
(6) Iterative optimization: result evaluation and optimization is an iterative process. After optimizing the model, re-evaluation and analysis are needed to be carried out on the optimized model, so that the performance of the model is further optimized. Iterative optimization may be performed multiple times until a satisfactory result is achieved.
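For a classification task, the precision, recall and F1 value named in the evaluation steps above follow directly from the confusion-matrix counts of one class. The sketch below assumes gold and predicted labels are given as parallel lists; the function name and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

The false-positive and false-negative pairs counted here are also the cases an error analysis would inspect first.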
(3) Stopping training and generating the target dangerous chemical analysis model when the performance evaluation result meets the preset condition.
Specifically, during training, the performance of the initial dangerous chemical analysis model is evaluated with test data after every preset number of training iterations; the evaluation indexes, such as accuracy, recall, and F1 value, can be selected according to the specific requirements of the task. The performance of the model can be improved by adjusting the model parameters, adjusting the feature representation, increasing or reducing the amount of data, and the like. Methods such as cross-validation can be used to evaluate the model more comprehensively and to reduce the error caused by the randomness of data partitioning. When the performance evaluation result meets the preset condition, training is stopped and the target dangerous chemical analysis model is generated.
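The control flow of steps (2) and (3), evaluating after every preset number of training iterations and stopping once a preset condition is met, can be sketched independently of any particular model. Here `train_step` and `evaluate` are hypothetical callables supplied by the caller, and the preset condition is assumed to be a score threshold; the application does not fix either.

```python
def train_until_target(train_step, evaluate, eval_every, target_score, max_steps):
    """Train, evaluate every `eval_every` steps, stop once the score is reached.

    Returns the step at which training stopped and the evaluation history.
    """
    history = []
    for step in range(1, max_steps + 1):
        train_step()                      # one parameter update on training data
        if step % eval_every == 0:
            score = evaluate()            # performance on the test data
            history.append((step, score))
            if score >= target_score:     # preset condition met
                return step, history
    return max_steps, history             # budget exhausted without meeting it
```

The `max_steps` bound is an added safeguard so that the loop terminates even if the condition is never met.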
In practical application of the application, after the trained target dangerous chemical analysis model is obtained, the target dangerous chemical analysis model is required to be deployed and integrated.
Before deploying the target dangerous chemical analysis model, a suitable deployment environment needs to be prepared, including server hardware, an operating system, and the related software and libraries. Depending on actual requirements, the model can be deployed on a local server, on a cloud server, or in a container. Before deployment, the target dangerous chemical analysis model needs to be packaged and exported. Packaging bundles the model, its preprocessing logic, and other dependencies into a unit that can run independently. The model may be exported in a common model export format (e.g., HDF5, ONNX, etc.) so that it can be loaded and used in the deployment environment.
A common way in deployment is to make model calls and services through APIs (application program interfaces). An API needs to be developed that includes logic to receive input data requests, call the model to reason, process model outputs, and return results. Prior to formal deployment, verification and testing is required to ensure that the model is working properly in the deployment environment and is able to handle the expected inputs and outputs. Verification and testing may include functional testing, performance testing, stability testing, and the like, using some sample data sets.
In addition, the target hazardous chemical analysis model typically needs to be integrated with other systems or processes to achieve comprehensive application. The integration function comprises integration with front-end applications, databases, processing flows and the like to realize functions of data input and output, result storage and display and the like.
In an actual deployment, continuous performance monitoring and optimization are required to ensure the stability and usability of the model. The model can be monitored and analyzed with performance monitoring tools and logging, and optimized in time, for example by adjusting the system configuration to improve performance.
The dangerous chemical information management device provided by the embodiment of the application is described below, and the dangerous chemical information management device described below and the dangerous chemical information management method described above can be correspondingly referred to each other.
Referring to fig. 2, fig. 2 is a block diagram of a dangerous chemical information management device according to an embodiment of the present application.
As shown in fig. 2, the hazardous chemical substance information management device may include:
the data capturing unit 110 is configured to determine a target data source, capture and collect each piece of hazardous chemical substance information in the target data source by using a corresponding data capturing tool and capturing form, and, after data cleaning and data quality control processing, collate the result to generate initial hazardous chemical substance data;
the data processing unit 120 is configured to perform preprocessing on the initial hazardous chemical substance data to generate hazardous chemical substance labeling data;
the feature extraction unit 130 is configured to select a target extraction mode matched with the hazardous chemical substance labeling data from a plurality of preset feature extraction modes, and perform feature extraction on the hazardous chemical substance labeling data by using the target extraction mode to obtain a hazardous chemical substance numerical feature capable of expressing semantic and structural information;
the knowledge extraction unit 140 is configured to obtain a target dangerous chemical analysis model that is pre-constructed and trained and matches with a user management requirement, and perform knowledge extraction and summarization on the numerical characteristics of the dangerous chemical by using the target dangerous chemical analysis model, so as to generate a visual analysis result.
According to the technical scheme, the method, the device, the equipment and the readable storage medium for managing the dangerous chemical information provided by the embodiment of the application are characterized in that the target data source is determined, the information of each dangerous chemical in the target data source is captured and collected, and the initial dangerous chemical data is generated after data cleaning and data quality control processing. And then preprocessing the initial hazardous chemical substance data to generate hazardous chemical substance labeling data. And selecting a target extraction mode matched with the dangerous chemical labeling data from a plurality of preset feature extraction modes, and carrying out feature extraction on the dangerous chemical labeling data by utilizing the target extraction mode to obtain dangerous chemical numerical features capable of expressing semantic and structural information. And finally, acquiring a target dangerous chemical analysis model which is pre-constructed and trained and is matched with the user management requirement, and carrying out knowledge extraction and summarization on the numerical characteristics of the dangerous chemicals by utilizing the target dangerous chemical analysis model to generate a visual analysis result.
According to the application, the information of each dangerous chemical in each designated target data source is captured and collected, and specific data cleaning, data quality control and preprocessing are used for dangerous chemical industries, so that the problems of data quality and non-uniformity are solved, and complex redundant information is removed. After feature extraction is performed on the dangerous chemical labeling data, knowledge extraction and summarization are performed on the numerical features of the dangerous chemicals by using a model which is trained and optimized for the dangerous chemical industry, a visual analysis result is generated, the professional language problem in the dangerous chemical field is solved, a perfect and professional analysis result is built, and the method is applied to an actual working scene, so that the working efficiency and accuracy are greatly improved.
Optionally, the data grabbing unit executes the process of determining a target data source, grabbing and collecting each piece of hazardous chemical substance information in the target data source, and generating initial hazardous chemical substance data through data cleaning and data quality control processing after finishing, and may include:
determining a target data source, and grabbing and collecting each piece of dangerous chemical information in the target data source by using a corresponding data grabbing tool and grabbing form to obtain grabbing dangerous chemical data;
sequentially carrying out data cleaning on the grabbing dangerous chemical data through denoising, data screening, data result standardization and missing data processing;
and verifying the cleaned data and checking its consistency, and, after the verification and the check pass, collating the data to generate initial dangerous chemical data.
Optionally, the data processing unit performs preprocessing on the initial hazardous chemical substance data to generate the process of hazardous chemical substance labeling data, which may include:
performing word segmentation on the initial dangerous chemical data, and performing part-of-speech tagging on the segmented text to generate first labeling data;
identifying and labeling named entities in the first labeling data, and generating second labeling data;
Performing text normalization and text vectorization on the second annotation data to generate third annotation data;
and labeling entity relationship of the third labeling data by extracting the relationship among the named entities, and generating dangerous chemical labeling data.
Optionally, the data processing unit may be further configured to perform synonym replacement and normalization processing on each named entity in the third labeling data before entity relationship labeling is performed on the third labeling data by extracting the relationships among the named entities to generate the dangerous chemical labeling data.
Optionally, the data processing unit may be further configured to perform stop word filtering on the third labeling data according to the stop words set by the user before entity relationship labeling is performed on the third labeling data by extracting the relationships among the named entities to generate the dangerous chemical labeling data.
Optionally, the preset feature extraction modes include bag-of-words model feature extraction, word embedding feature extraction, TF-IDF feature extraction, N-gram feature extraction, topic modeling feature extraction, grammar and syntax feature extraction, entity feature extraction, and text structure feature extraction.
Optionally, the process of constructing and training to obtain the target dangerous chemical analysis model matched with the user management requirement may include:
determining a matched analysis model framework based on the user management requirements and the data types of the dangerous chemical numerical characteristics, and constructing an initial dangerous chemical analysis model;
training the initial dangerous chemical analysis model with training data, and evaluating the performance of the initial dangerous chemical analysis model with test data after every preset number of training iterations;
and stopping training and generating a target dangerous chemical analysis model when the performance evaluation result accords with the preset condition.
The dangerous chemical information management device provided by the embodiment of the application can be applied to dangerous chemical information management equipment. Optionally, fig. 3 shows a hardware configuration block diagram of the hazardous chemical substance information management apparatus, and referring to fig. 3, the hardware configuration of the hazardous chemical substance information management apparatus may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiment of the application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete the communication with each other through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention, etc.;
the memory 3 may comprise a high-speed RAM memory, and may further comprise a non-volatile memory (non-volatile memory) or the like, such as at least one magnetic disk memory;
wherein the memory stores a program, the processor is operable to invoke the program stored in the memory, the program operable to:
determining a target data source, grabbing and collecting information of each dangerous chemical in the target data source, and after data cleaning and data quality control processing, finishing to generate initial dangerous chemical data;
preprocessing the initial dangerous chemical data to generate dangerous chemical labeling data;
selecting a target extraction mode matched with the dangerous chemical labeling data from a plurality of preset feature extraction modes, and extracting features of the dangerous chemical labeling data by utilizing the target extraction mode to obtain dangerous chemical numerical features capable of expressing semantic and structural information;
and acquiring a target dangerous chemical analysis model which is pre-constructed and trained and matched with the user management requirement, and carrying out knowledge extraction and summarization on the numerical characteristics of the dangerous chemical by utilizing the target dangerous chemical analysis model to generate a visual analysis result.
Alternatively, the refinement function and the extension function of the program may be described with reference to the above.
The embodiment of the present application also provides a readable storage medium storing a program adapted to be executed by a processor, the program being configured to:
determining a target data source, grabbing and collecting information of each dangerous chemical in the target data source, and after data cleaning and data quality control processing, finishing to generate initial dangerous chemical data;
preprocessing the initial dangerous chemical data to generate dangerous chemical labeling data;
selecting a target extraction mode matched with the dangerous chemical labeling data from a plurality of preset feature extraction modes, and extracting features of the dangerous chemical labeling data by utilizing the target extraction mode to obtain dangerous chemical numerical features capable of expressing semantic and structural information;
and acquiring a target dangerous chemical analysis model which is pre-constructed and trained and matched with the user management requirement, and carrying out knowledge extraction and summarization on the numerical characteristics of the dangerous chemical by utilizing the target dangerous chemical analysis model to generate a visual analysis result.
Alternatively, the refinement function and the extension function of the program may be described with reference to the above.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. The dangerous chemical information management method is characterized by comprising the following steps of:
determining a target data source, grabbing and collecting information of each dangerous chemical in the target data source, and after data cleaning and data quality control processing, finishing to generate initial dangerous chemical data;
preprocessing the initial dangerous chemical data to generate dangerous chemical labeling data;
selecting a target extraction mode matched with the dangerous chemical labeling data from a plurality of preset feature extraction modes, and extracting features of the dangerous chemical labeling data by utilizing the target extraction mode to obtain dangerous chemical numerical features capable of expressing semantic and structural information;
and acquiring a target dangerous chemical analysis model which is pre-constructed and trained and matched with the user management requirement, and carrying out knowledge extraction and summarization on the numerical characteristics of the dangerous chemical by utilizing the target dangerous chemical analysis model to generate a visual analysis result.
2. The method of claim 1, wherein the determining a target data source, grabbing and collecting each piece of hazardous chemical substance information in the target data source, and performing data cleaning and data quality control processing and post-finishing to generate initial hazardous chemical substance data comprises:
Determining a target data source, and grabbing and collecting each piece of dangerous chemical information in the target data source by using a corresponding data grabbing tool and grabbing form to obtain grabbing dangerous chemical data;
sequentially carrying out data cleaning on the grabbing dangerous chemical data through denoising, data screening, data result standardization and missing data processing;
and verifying the cleaned data and checking its consistency, and, after the verification and the check pass, collating the data to generate initial dangerous chemical data.
3. The method of claim 1, wherein preprocessing the initial hazardous chemical data to generate hazardous chemical labeling data comprises:
performing word segmentation on the initial dangerous chemical data, and performing part-of-speech tagging on the segmented text to generate first labeling data;
identifying and labeling named entities in the first labeling data, and generating second labeling data;
performing text normalization and text vectorization on the second annotation data to generate third annotation data;
and labeling entity relationship of the third labeling data by extracting the relationship among the named entities, and generating dangerous chemical labeling data.
4. The method of claim 3, further comprising, before performing entity relationship labeling on the third labeling data by extracting the relationships among the named entities to generate the dangerous chemical labeling data:
performing synonym replacement and normalization processing on each named entity in the third labeling data.
5. The method of claim 3, further comprising, before performing entity relationship labeling on the third labeling data by extracting the relationships among the named entities to generate the dangerous chemical labeling data:
performing stop word filtering on the third labeling data according to stop words set by the user.
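The optional steps of claims 4 and 5 amount to a lookup and a filter. The sketch below assumes a hand-built synonym table and a user-supplied stop word collection; both the table contents and the function names are illustrative only.

```python
# Hypothetical synonym table mapping aliases to one canonical entity name.
SYNONYMS = {
    "benzol": "benzene",
    "phenyl hydride": "benzene",
    "oil of vitriol": "sulfuric acid",
}


def normalize_entity(name, synonyms=SYNONYMS):
    """Lowercase, collapse whitespace, then replace known aliases."""
    key = " ".join(name.lower().split())
    return synonyms.get(key, key)


def filter_stop_words(tokens, stop_words):
    """Drop tokens that appear in the user-defined stop word collection."""
    stop = {word.lower() for word in stop_words}
    return [tok for tok in tokens if tok.lower() not in stop]
```

Running both steps before relation labeling means that aliases such as "benzol" and "benzene" resolve to one entity, so relations are not split across spellings.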
6. The method of claim 1, wherein the plurality of preset feature extraction modes comprises bag-of-words model feature extraction, word embedding feature extraction, TF-IDF feature extraction, N-gram feature extraction, topic modeling feature extraction, grammar and syntax feature extraction, entity feature extraction, and text structure feature extraction.
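Three of the listed modes (bag-of-words, N-gram, and TF-IDF) are simple enough to sketch directly. The registry and the `select_mode` heuristic below are assumptions made for illustration; the claims do not state how the target extraction mode is matched to the labeling data. The TF-IDF variant shown is the plain unsmoothed form, term frequency multiplied by log(N / document frequency).

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Raw term counts per tokenized document."""
    return [Counter(doc) for doc in docs]


def ngrams(docs, n=2):
    """Counts of contiguous n-token sequences per document."""
    return [
        Counter(tuple(doc[i:i + n]) for i in range(len(doc) - n + 1))
        for doc in docs
    ]


def tf_idf(docs):
    """Term frequency times log(N / document frequency), without smoothing."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [
        {term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


# Registry of preset modes; the keys are hypothetical names.
EXTRACTORS = {"bow": bag_of_words, "ngram": ngrams, "tfidf": tf_idf}


def select_mode(docs):
    """Hypothetical rule: IDF is uninformative for one document, so fall back to counts."""
    return "tfidf" if len(docs) > 1 else "bow"


def extract_features(docs):
    """Pick a target mode for the data, then apply it."""
    mode = select_mode(docs)
    return mode, EXTRACTORS[mode](docs)
```

With two documents that share the token "benzene", its TF-IDF weight is zero in both, while a token unique to one document receives a positive weight, which is the behavior that makes TF-IDF useful for distinguishing records.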
7. The method of claim 1, wherein the process of constructing and training a target dangerous chemical analysis model that matches the user management requirement comprises:
determining a matched analysis model framework based on the user management requirement and the data type of the dangerous chemical numerical features, and constructing an initial dangerous chemical analysis model;
training the initial dangerous chemical analysis model using training data, and evaluating the performance of the initial dangerous chemical analysis model using test data after every preset number of training iterations;
and stopping training and generating the target dangerous chemical analysis model when the performance evaluation result meets a preset condition.
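The train, evaluate periodically, and stop loop of claim 7 can be sketched with a perceptron standing in for the analysis model. The model choice, the accuracy metric, the threshold, and parameter names such as `eval_every` and `target_accuracy` are assumptions; the claim leaves the framework and the preset condition open.

```python
def predict(weights, bias, features):
    """Binary prediction from a linear score."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def accuracy(weights, bias, data):
    """Fraction of (features, label) pairs predicted correctly."""
    return sum(predict(weights, bias, x) == y for x, y in data) / len(data)


def train(train_data, test_data, eval_every=2, target_accuracy=0.95,
          max_epochs=100, learning_rate=0.1):
    """Perceptron training, evaluated on test data every `eval_every` epochs."""
    weights = [0.0] * len(train_data[0][0])
    bias = 0.0
    history = []
    for epoch in range(1, max_epochs + 1):
        for features, label in train_data:
            error = label - predict(weights, bias, features)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
        if epoch % eval_every == 0:
            score = accuracy(weights, bias, test_data)
            history.append((epoch, score))
            if score >= target_accuracy:  # preset condition met: stop training
                break
    return weights, bias, history
```

The returned `history` records each evaluation as an `(epoch, accuracy)` pair, so the point at which the preset condition was met can be inspected after training.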
8. A dangerous chemical information management device, comprising:
a data grabbing unit, used for determining a target data source, grabbing and collecting each piece of dangerous chemical information in the target data source by using a corresponding data grabbing tool and grabbing form, and generating initial dangerous chemical data after data cleaning and data quality control processing;
a data processing unit, used for preprocessing the initial dangerous chemical data to generate dangerous chemical labeling data;
a feature extraction unit, used for selecting, from a plurality of preset feature extraction modes, a target extraction mode matched with the dangerous chemical labeling data, and performing feature extraction on the dangerous chemical labeling data using the target extraction mode to obtain dangerous chemical numerical features capable of expressing semantic and structural information;
and a knowledge extraction unit, used for acquiring a pre-constructed and trained target dangerous chemical analysis model that matches the user management requirement, and performing knowledge extraction and summarization on the dangerous chemical numerical features using the target dangerous chemical analysis model to generate a visual analysis result.
9. Dangerous chemical information management equipment, comprising a memory and a processor;
the memory is used for storing a program;
the processor is configured to execute the program to implement the steps of the dangerous chemical information management method according to any one of claims 1 to 7.
10. A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the dangerous chemical information management method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311024861.9A CN116975040A (en) | 2023-08-14 | 2023-08-14 | Dangerous chemical information management method, device, equipment and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311024861.9A CN116975040A (en) | 2023-08-14 | 2023-08-14 | Dangerous chemical information management method, device, equipment and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116975040A (en) | 2023-10-31 |
Family
ID=88484955
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311024861.9A Pending CN116975040A (en) | 2023-08-14 | 2023-08-14 | Dangerous chemical information management method, device, equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116975040A (en) |
2023
- 2023-08-14 CN CN202311024861.9A patent/CN116975040A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shah et al. | Resolving ambiguities in natural language software requirements: a comprehensive survey | |
US10540439B2 (en) | Systems and methods for identifying evidentiary information | |
US20120303661A1 (en) | Systems and methods for information extraction using contextual pattern discovery | |
US9754083B2 (en) | Automatic creation of clinical study reports | |
CN110543356A (en) | abnormal task detection method, device and equipment and computer storage medium | |
Zhou et al. | Recognizing software bug-specific named entity in software bug repository | |
Ciurumelea et al. | Suggesting comment completions for python using neural language models | |
CN113011156A (en) | Quality inspection method, device and medium for audit text and electronic equipment | |
Spyns et al. | Lexically evaluating ontology triples generated automatically from texts | |
CN112181490A (en) | Method, device, equipment and medium for identifying function category in function point evaluation method | |
Schraagen et al. | Extraction of semantic relations in noisy user-generated law enforcement data | |
CN113761875B (en) | Event extraction method and device, electronic equipment and storage medium | |
CN111460137B (en) | Method, equipment and medium for identifying micro-service focus based on topic model | |
WO2021080735A1 (en) | Automated exception featurization and search | |
Kramer et al. | Improvement of a naive Bayes sentiment classifier using MRS-based features | |
Panthum et al. | Generating functional requirements based on classification of mobile application user reviews | |
CN116400910A (en) | Code performance optimization method based on API substitution | |
Zhu et al. | A N-gram based approach to auto-extracting topics from research articles | |
CN116975040A (en) | Dangerous chemical information management method, device, equipment and readable storage medium | |
Arganese et al. | Nuts and bolts of extracting variability models from natural language requirements documents | |
US20210216721A1 (en) | System and method to quantify subject-specific sentiment | |
Butcher | Contract Information Extraction Using Machine Learning | |
US20240331815A1 (en) | Named-entity recognition of protected health information | |
CN112836477B (en) | Method and device for generating code annotation document, electronic equipment and storage medium | |
US20240346247A1 (en) | Artificial intelligence based log mask prediction for communications system testing | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||