CN116663534A - Text data statistical analysis system and method based on natural language processing - Google Patents
- Publication number
- CN116663534A (application number CN202310961991.9A)
- Authority
- CN
- China
- Prior art keywords
- natural language
- data
- module
- language data
- management system
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Animal Behavior & Ethology (AREA)
- Probability & Statistics with Applications (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a text data statistical analysis system and method based on natural language processing. The system comprises a natural language data screening system, a natural language data management system, a natural language data analysis system and a natural language chart visualization management system. The natural language data screening system comprises a natural language data preprocessing module, a natural language data screening module and an information acquisition module. The natural language chart visualization management system uses the statistical results and the user's screening conditions to generate a directed graph automatically. In this graph, tables and keywords are the nodes and ordered pairs are the edges. The semantic matching module uses natural language processing technology to judge whether a natural language data query result matches the question the user wants analyzed. The invention builds a knowledge base with natural language processing and knowledge graph technologies. It then uses intelligent data analysis and visualization technologies to perform statistical analysis of text data and data mining of text documents.
Description
Technical Field
The invention relates to the technical field of natural language analysis, and in particular to a text data statistical analysis system and method based on natural language processing.
Background
With the continuous development of artificial intelligence technology, natural language semantic parsing and interaction technologies are receiving increasing attention. Existing dialogue systems rely on their own industry-specific corpora or on relatively fixed templates, and they cannot intelligently manage data or perform statistical calculations on it.
In large data centers, statistics are compiled by people reading the text manually, which is very time-consuming. The number of text documents is growing explosively, so manual work can no longer meet the demands of text data analysis. Enterprise text documents are also stored in a scattered, discrete manner. As a result, a large amount of important information may be lost without ever being mined, which wastes data resources.
Therefore, the problem of statistically analyzing enterprise text document data needs to be solved, so that key information can be extracted to guide enterprise production and operation.
Disclosure of Invention
In view of the above problems, the invention provides a text data statistical analysis system and method based on natural language processing.
To solve these problems, the invention adopts the following technical solution:
a text data statistical analysis system and method based on natural language processing, comprising a natural language data screening system, a natural language data management system, a natural language data analysis system and a natural language chart visualization management system;
the natural language data screening system comprises a natural language data preprocessing module, a natural language data screening module and an information acquisition module;
the natural language data management system comprises a natural language data construction module, a natural language data configuration module and a description semantic coding module;
the natural language data analysis system comprises a natural language understanding module, a natural language query module, a semantic matching module and a tag semantic coding module;
the natural language chart visualization management system generates and visually displays data charts: it provides data chart generation templates and reads the statistical results according to the chart template; from those statistical results and the user's screening conditions, it automatically generates a directed graph whose nodes are tables and keywords and whose edges are ordered pairs;
the semantic matching module uses natural language processing technology to judge whether a natural language data query result matches the question the user wants analyzed, and matched data are included in the statistics.
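The matching judgment can be pictured with a minimal Python sketch. It assumes that the question and each query result have already been encoded as numeric feature vectors. Cosine similarity, the 0.8 threshold and the names `cosine_similarity` and `select_matches` are all hypothetical: the patent does not say how a match is scored.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def select_matches(question_vec, results, threshold=0.8):
    """Keep only the query results whose vector is close enough to the question vector.

    `results` is a list of (record, vector) pairs; `threshold` is an assumed
    cut-off, not a value taken from the patent. Kept records go on to statistics.
    """
    return [record for record, vec in results
            if cosine_similarity(question_vec, vec) >= threshold]
```

For example, with the question vector `[1.0, 0.0]`, a result vector `[1.0, 1.0]` scores about 0.707. It is therefore dropped at the default threshold and kept at 0.7.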
Preferably, the natural language data preprocessing module preprocesses document texts, including corpus importing, format conversion and corpus cleaning.
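The three preprocessing stages can be illustrated with a minimal sketch that uses only the Python standard library. The concrete choices are assumptions, because the patent does not define what each stage does:

- format conversion is UTF-8 decoding followed by NFKC normalization;
- cleaning strips markup tags, collapses whitespace, and drops empty or duplicate documents.

```python
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")   # markup tags such as <p> or </div>
WS_RE = re.compile(r"\s+")        # runs of whitespace

def convert_format(raw: bytes, encoding: str = "utf-8") -> str:
    """Format conversion: decode bytes, then NFKC-normalize (e.g. full-width -> ASCII letters)."""
    return unicodedata.normalize("NFKC", raw.decode(encoding, errors="replace"))

def clean_text(text: str) -> str:
    """Corpus cleaning: remove markup tags and collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip()

def preprocess_corpus(raw_docs):
    """Corpus importing: pass every raw document through conversion and cleaning,
    dropping documents that end up empty or duplicate an earlier one."""
    seen = set()
    cleaned = []
    for raw in raw_docs:
        doc = clean_text(convert_format(raw))
        if doc and doc not in seen:
            seen.add(doc)
            cleaned.append(doc)
    return cleaned
```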
Preferably, the natural language data screening module uses natural language processing technology to extract document information from documents and supplies knowledge data to the natural language data management system. The information acquisition module acquires the text description of the data that a visitor needs to access and call, and defines the visitor's identity tag information.
Preferably, the natural language data construction module defines the fields and labels of the natural language data.
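The field and label definitions can be sketched as a small in-memory store. The same store also covers the label mapping and the add/delete/modify operations handled by the configuration module. Everything in the sketch is hypothetical:

- the class names `DomainSchema` and `KnowledgeStore`;
- reading "field" as a subject domain;
- the flat record layout.

These stand in for a real knowledge graph backend, which the patent does not specify.

```python
from dataclasses import dataclass, field

@dataclass
class DomainSchema:
    """Hypothetical schema: one field (subject domain) and the labels allowed in it."""
    domain: str
    labels: set = field(default_factory=set)

class KnowledgeStore:
    """Minimal in-memory store mapping data records to schema labels (add/delete/modify)."""

    def __init__(self, schema: DomainSchema):
        self.schema = schema
        self.records = {}  # record id -> (label, value)

    def add(self, record_id, label, value):
        # Only labels defined by the construction step may be used
        if label not in self.schema.labels:
            raise ValueError(f"label {label!r} is not defined for domain {self.schema.domain!r}")
        if record_id in self.records:
            raise KeyError(f"record {record_id!r} already exists")
        self.records[record_id] = (label, value)

    def modify(self, record_id, value):
        label, _ = self.records[record_id]  # raises KeyError if the record does not exist
        self.records[record_id] = (label, value)

    def delete(self, record_id):
        del self.records[record_id]

    def by_label(self, label):
        """Return {record id: value} for every record mapped to `label`."""
        return {rid: v for rid, (lab, v) in self.records.items() if lab == label}
```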
Preferably, the natural language data configuration module configures natural language data and establishes a mapping between the data and knowledge graph labels, providing a data source for the subsequent natural language data analysis system. The natural language data management system provides a visual function for adding, deleting and modifying natural language data. The description semantic coding module processes the text description of the data that the visitor needs to access and call, and then obtains a resource-description semantic feature vector through a semantic encoder that includes an embedding layer.
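A semantic encoder with an embedding layer turns a token sequence into a fixed-length feature vector. The patent describes the encoder only as including an embedding layer. This sketch therefore assumes a few things that are not in the source:

- whitespace tokens;
- a randomly initialised embedding table, seeded for reproducibility;
- mean pooling.

`EmbeddingEncoder` and its parameters are hypothetical. A trained model would replace the random weights. The same encoder could also serve the tag semantic coding module, which segments identity tags into words before encoding.

```python
import random

class EmbeddingEncoder:
    """Toy semantic encoder: an embedding lookup layer followed by mean pooling.

    Vocabulary, dimension and random initialisation are illustrative assumptions.
    """

    def __init__(self, vocabulary, dim=8, seed=0):
        rng = random.Random(seed)
        # Row 0 is reserved for unknown tokens; known tokens start at row 1
        self.index = {tok: i + 1 for i, tok in enumerate(vocabulary)}
        self.table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                      for _ in range(len(vocabulary) + 1)]
        self.dim = dim

    def encode(self, tokens):
        """Return the mean of the token embeddings as the semantic feature vector."""
        if not tokens:
            return [0.0] * self.dim
        rows = [self.table[self.index.get(tok, 0)] for tok in tokens]
        return [sum(col) / len(rows) for col in zip(*rows)]
```

For example, `EmbeddingEncoder(["sales", "report"]).encode("sales report".split())` returns a list of 8 floats. The same seed always yields the same vector.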
Preferably, the natural language understanding module is connected to a user interaction interface and provides a user question description template. The user enters the question to be analyzed in the interface according to the template, and the question is then semantically extracted by template-based and deep-learning-based natural language processing.
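Of the two extraction approaches, only the template-based one fits in a few lines. The following sketch matches a question against regular-expression templates and returns an intent plus the extracted slots. The template wording, intents and slot names are hypothetical. In the described system, a deep-learning model would handle questions that the templates miss.

```python
import re

# Hypothetical question templates; the patent does not specify their wording.
TEMPLATES = {
    "sum": re.compile(
        r"^total (?P<metric>[\w ]+?) by (?P<dimension>[\w ]+?)(?: in (?P<period>\d{4}))?\??$",
        re.IGNORECASE),
    "diff": re.compile(
        r"^difference in (?P<metric>[\w ]+?) between (?P<a>\w+) and (?P<b>\w+)\??$",
        re.IGNORECASE),
}

def parse_question(question: str):
    """Match a user question against the templates; return (intent, slots) or None."""
    q = question.strip()
    for intent, pattern in TEMPLATES.items():
        m = pattern.match(q)
        if m:
            # Drop optional slots that were not filled in
            slots = {k: v for k, v in m.groupdict().items() if v is not None}
            return intent, slots
    return None
```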
Preferably, the natural language query module uses graph algorithms to query the knowledge graph data and compute statistics on it.
Preferably, the natural language query module includes a natural language algorithm, namely a basic graph algorithm of the search class. Natural language data queries are performed with this algorithm, and the query results are passed to the semantic matching module for judgment. The tag semantic coding module performs word segmentation on the visitor's identity tag information and then obtains an identity-tag semantic feature vector through the semantic encoder that includes an embedding layer.
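Breadth-first search is one plausible instance of a "basic graph algorithm of the search class"; the patent does not name the algorithm. The sketch assumes the knowledge graph is held as an adjacency dictionary. It returns every node reachable from a query entity within a hop limit, together with its distance. `bfs_query` and `max_depth` are hypothetical names.

```python
from collections import deque

def bfs_query(graph, start, max_depth=2):
    """Breadth-first search from `start`.

    `graph` maps each node to a list of its neighbours. Returns {node: hop distance}
    for every node reachable within `max_depth` hops (including `start` itself).
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist
```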
Preferably, the natural language query module further includes a natural language calculation function covering basic statistical calculations such as summation and difference. The statistical results are called by the natural language chart visualization management system.
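The summation and difference calculations could look like the following sketch. It assumes that query results arrive as a list of dictionaries. `sum_by` and `difference` are hypothetical helpers. The returned `{group: total}` dictionary is one simple shape a chart template could consume, for example as (label, value) pairs via `.items()`.

```python
from collections import defaultdict

def sum_by(records, key, value):
    """Sum `value` over records grouped by `key`; returns {group: total} ordered by group."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key]] += rec[value]
    return dict(sorted(totals.items()))

def difference(totals, a, b):
    """Difference between two group totals (a - b); a missing group counts as 0."""
    return totals.get(a, 0.0) - totals.get(b, 0.0)
```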
A method for the text data statistical analysis system based on natural language processing, comprising the following steps:
S1, constructing a natural language data screening system comprising a natural language data preprocessing module, a natural language data screening module and an information acquisition module; constructing a natural language data management system comprising a natural language data construction module and a natural language data configuration module; constructing a natural language data analysis system comprising a natural language understanding module, a natural language query module and a semantic matching module; and constructing a natural language chart visualization management system;
S2, completing the natural language data management system: defining the fields and tag data of the natural language data, acquiring the text description of the data that the visitor needs to access and call, and defining the visitor's identity tag information;
S3, uploading the natural language data to the natural language data preprocessing module, which performs corpus importing, format conversion and corpus cleaning on it;
S4, annotating the natural language data, then automatically extracting the annotated data and importing it into the knowledge graph to provide a data source for the subsequent natural language data analysis system, with the natural language data management system providing a visual function for adding, deleting and modifying the natural language data;
S5, entering the question to be analyzed into the question description template of the natural language understanding module, and extracting its semantics by template-based and deep-learning-based natural language processing;
S6, generating the directed graph, specifically as follows: the directed graph is denoted $\langle M, N \rangle$. The vertex set is $M = \{k_1, \dots, k_n\} \cup \{t_1, \dots, t_m\}$, where $k_i$ is the $i$-th user requirement keyword or entity ($i = 1, \dots, n$) and $t_r$ is the $r$-th data information table ($r = 1, \dots, m$). The edge set is $N = \{\langle k_i, t_r \rangle \mid t_r \text{ is a data information table associated with } k_i\}$. When a keyword corresponds to several fields of the same data information table, the field with the maximum similarity is selected and associated with that keyword.
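The graph construction in the last step can be sketched in Python as follows. The vertices are the user keywords plus the table names. Each edge links a keyword to a table and records the single most similar field of that table. Several parts are assumptions, since the patent only states that the most similar field is selected:

- the similarity measure, a character-level `difflib.SequenceMatcher` ratio on names;
- the 0.6 threshold;
- the function names.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Stand-in similarity measure (character-level ratio); the patent does not name one
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def build_keyword_table_graph(keywords, tables, min_sim=0.6):
    """Build the vertex set and edge set of the keyword -> table directed graph.

    `tables` maps a table name to its list of field names. A directed edge
    (keyword, table) is added when the table's best-matching field reaches
    `min_sim`; if several fields qualify, only the most similar one is kept.
    Returns (vertices, edges), where edges maps (keyword, table) -> chosen field.
    """
    vertices = set(keywords) | set(tables)
    edges = {}
    for kw in keywords:
        for table, fields in tables.items():
            best = max(fields, key=lambda f: similarity(kw, f), default=None)
            if best is not None and similarity(kw, best) >= min_sim:
                edges[(kw, table)] = best
    return vertices, edges
```

With a table whose fields are `revenue_total` and `revenue`, the keyword `revenue` links to that table through the exact-match field `revenue`. The partial match scores lower, so it is discarded.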
The beneficial effects of the invention are as follows:
1. A knowledge base is built with natural language processing and knowledge graph technologies, and text data statistical analysis and text document data mining are then carried out with intelligent data analysis and visualization technologies. Natural language processing, knowledge graph and graph algorithm technologies provide unified data management and association analysis for text documents of the same type. The knowledge graph can be expanded and updated, and the data analysis results are updated accordingly.
2. The text description of the data that the visitor needs to access and call is acquired. A natural language processing semantic understanding model then performs adaptive semantic understanding on that text description and on the visitor's identity tag information. Text document data analysis results are generated automatically and displayed visually as charts, which makes them easier to read.
3. Self-learning transfer is added for different industry domains, so that domain-specific vocabulary can be learned and query results are more industry-specific and more practical.
Drawings
FIG. 1 is a system block diagram of the present invention;
FIG. 2 is a flow chart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the scope of protection of the invention.
Referring to FIG. 1, a text data statistical analysis system and method based on natural language processing includes a natural language data screening system, a natural language data management system, a natural language data analysis system and a natural language chart visualization management system;
the natural language data screening system comprises a natural language data preprocessing module, a natural language data screening module and an information acquisition module;
the natural language data management system comprises a natural language data construction module, a natural language data configuration module and a description semantic coding module;
the natural language data analysis system comprises a natural language understanding module, a natural language query module, a semantic matching module and a tag semantic coding module;
the natural language chart visualization management system generates and visually displays data charts: it provides data chart generation templates and reads the statistical results according to the chart template; from those statistical results and the user's screening conditions, it automatically generates a directed graph whose nodes are tables and keywords and whose edges are ordered pairs;
the semantic matching module uses natural language processing technology to judge whether a natural language data query result matches the question the user wants analyzed, and matched data are included in the statistics.
Further, the natural language data preprocessing module preprocesses document texts, including corpus importing, format conversion and corpus cleaning.
Further, the natural language data screening module uses natural language processing technology to extract document information and supplies knowledge data to the natural language data management system. The information acquisition module acquires the text description of the data that a visitor needs to access and call, and defines the visitor's identity tag information.
Further, the natural language data construction module defines the fields and labels of the natural language data.
Further, the natural language data configuration module configures natural language data and establishes a mapping between the data and knowledge graph labels, providing a data source for the subsequent natural language data analysis system. The natural language data management system provides a visual function for adding, deleting and modifying natural language data. The description semantic coding module processes the text description of the data that the visitor needs to access and call, and obtains a resource-description semantic feature vector through a semantic encoder that includes an embedding layer.
Further, the natural language understanding module is connected to the user interaction interface and provides a user question description template. The user enters the question to be analyzed in the interface according to the template, and the question is then semantically extracted by template-based and deep-learning-based natural language processing.
Further, the natural language query module uses graph algorithms to query the knowledge graph data and compute statistics on it.
Further, the natural language query module includes a natural language algorithm, namely a basic graph algorithm of the search class. Natural language data queries are performed with this algorithm, and the query results are judged by the semantic matching module. The tag semantic coding module performs word segmentation on the visitor's identity tag information and then obtains an identity-tag semantic feature vector through a semantic encoder that includes an embedding layer.
Furthermore, the natural language query module includes a natural language calculation function covering basic statistical calculations such as summation and difference. The statistical results are called by the natural language chart visualization management system.
Referring to FIG. 2, a method for the text data statistical analysis system based on natural language processing includes the following steps:
S1, constructing a natural language data screening system comprising a natural language data preprocessing module, a natural language data screening module and an information acquisition module; constructing a natural language data management system comprising a natural language data construction module and a natural language data configuration module; constructing a natural language data analysis system comprising a natural language understanding module, a natural language query module and a semantic matching module; and constructing a natural language chart visualization management system;
S2, completing the natural language data management system: defining the fields and tag data of the natural language data, acquiring the text description of the data that the visitor needs to access and call, and defining the visitor's identity tag information;
S3, uploading the natural language data to the natural language data preprocessing module, which performs corpus importing, format conversion and corpus cleaning on it;
S4, annotating the natural language data, then automatically extracting the annotated data and importing it into the knowledge graph to provide a data source for the subsequent natural language data analysis system, with the natural language data management system providing a visual function for adding, deleting and modifying the natural language data;
S5, entering the question to be analyzed into the question description template of the natural language understanding module, and extracting its semantics by template-based and deep-learning-based natural language processing;
S6, generating the directed graph, specifically as follows: the directed graph is denoted $\langle M, N \rangle$. The vertex set is $M = \{k_1, \dots, k_n\} \cup \{t_1, \dots, t_m\}$, where $k_i$ is the $i$-th user requirement keyword or entity ($i = 1, \dots, n$) and $t_r$ is the $r$-th data information table ($r = 1, \dots, m$). The edge set is $N = \{\langle k_i, t_r \rangle \mid t_r \text{ is a data information table associated with } k_i\}$. When a keyword corresponds to several fields of the same data information table, the field with the maximum similarity is selected and associated with that keyword.
In summary, the invention builds a knowledge base with natural language processing and knowledge graph technologies. It then uses intelligent data analysis and visualization technologies to perform text data statistical analysis and text document data mining. Natural language processing, knowledge graph and graph algorithm technologies provide unified data management and association analysis for text documents of the same type. The knowledge graph can be expanded and updated, and the data analysis results are updated accordingly. The invention also acquires the text description of the data that the visitor needs to access and call. A natural language processing semantic understanding model performs adaptive semantic understanding on that description and on the visitor's identity tag information. The text document data analysis results are then displayed intuitively, which makes them easier to read. Finally, self-learning transfer is added for different industry domains so that domain-specific vocabulary can be learned, making query results more industry-specific and more practical.
The formulas in the invention are dimensionless and are evaluated numerically. They were obtained by collecting a large amount of data and running software simulations, so as to approximate actual conditions as closely as possible. The preset proportionality coefficients in the formulas are either set by those skilled in the art according to actual conditions or obtained by simulation on large amounts of data.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.
Claims (10)
1. A text data statistical analysis system based on natural language processing, characterized by comprising a natural language data screening system, a natural language data management system, a natural language data analysis system and a natural language chart visualization management system;
the natural language data screening system comprises a natural language data preprocessing module, a natural language data screening module and an information acquisition module;
the natural language data management system comprises a natural language data construction module, a natural language data configuration module and a description semantic coding module;
the natural language data analysis system comprises a natural language understanding module, a natural language query module, a semantic matching module and a tag semantic coding module;
the natural language chart visualization management system is used for generating and visually displaying data charts; it provides data chart generation templates and reads the statistical results according to the chart template; from those statistical results and the user's screening conditions, it automatically generates a directed graph whose nodes are tables and keywords and whose edges are ordered pairs;
the semantic matching module uses natural language processing technology to judge whether a natural language data query result matches the question the user wants analyzed, and matched data are included in the statistics.
2. The system of claim 1, wherein the language data preprocessing module is used for preprocessing document text, including corpus importing, format conversion and corpus cleaning.
3. The system of claim 2, wherein the natural language data filtering module extracts document information from the document by using a natural language processing technology, provides knowledge data for the natural language data management system, and the information obtaining module is used for obtaining text descriptions of materials required to be accessed and called by the visitor and defining identity tag information of the visitor.
4. A text data statistical analysis system based on natural language processing according to claim 3, wherein the natural language data construction module is used for defining the field and the label of the natural language.
5. The system of claim 4, wherein the natural language data configuration module is configured to configure natural language data, and establish a mapping relationship between the data and a map label, to provide a data source for a subsequent natural language data analysis system, and the natural language data management system provides a visual function to perform addition, deletion and modification of the natural language data, and the description semantic coding module processes the text description of the data required to be accessed and invoked by the visitor, and then obtains a resource description semantic feature vector through a semantic encoder including an embedded layer.
6. The system of claim 5, wherein the natural language understanding module is connected to a user interaction interface to provide a user question description template, and a user can input a question to be analyzed according to the template in the user interaction interface, and the question to be analyzed by the user is semantically extracted by a natural language processing technology based on the template and deep learning.
7. The system of claim 6, wherein the natural language query module uses graph algorithms to perform queries and statistics on the knowledge-graph data.
8. The system of claim 7, wherein the natural language query module comprises a natural language algorithm technology, namely basic graph algorithms of the search class; natural language data queries are performed with this technology, and the query results are passed to the semantic matching module for judgment; and the tag semantic coding module performs word segmentation on the visitor's identity tag information and then obtains an identity tag semantic feature vector through the semantic encoder comprising an embedding layer.
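Claim 8 only says that the query module uses a basic graph search algorithm. Breadth-first search is one such algorithm; the sketch below, an assumption rather than the patent's implementation, finds the shortest chain of relations between two knowledge-graph nodes stored as an adjacency dictionary. `bfs_path` and the node names in the example are made up for illustration.

```python
from collections import deque
from typing import Optional


def bfs_path(graph, start, goal) -> Optional[list]:
    """Breadth-first search: shortest path (fewest edges) from start to goal, or None.

    `graph` maps each node to the list of nodes its outgoing edges point to.
    """
    if start == goal:
        return [start]
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                if nxt == goal:
                    # Walk parent links back to the start to rebuild the path.
                    path = [nxt]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(nxt)
    return None


graph = {"ACME": ["Revenue2022", "Finance"], "Finance": ["Report"], "Revenue2022": ["Report"]}
print(bfs_path(graph, "ACME", "Report"))  # ['ACME', 'Revenue2022', 'Report']
```

Because BFS explores nodes in order of hop count, the first time the goal is reached the path is guaranteed to have the fewest edges.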
9. The system of claim 8, wherein the natural language query module further comprises a natural language calculation function covering basic statistical calculations such as summation and difference, and the statistical results are called by the natural language chart visual management system.
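Claim 9 limits the calculation function to basic statistics of the summation and difference type. A minimal sketch, assuming query results arrive as rows of dictionaries; the helper names and the field names `year` and `amount` are hypothetical. The output is a plain dictionary that a charting layer could consume directly.

```python
from collections import defaultdict


def sum_by(records, key, value):
    """Summation: total of the `value` field for each distinct `key` field."""
    totals = defaultdict(float)
    for row in records:
        totals[row[key]] += row[value]
    return dict(totals)


def difference(totals, a, b):
    """Difference: totals[a] - totals[b], treating missing keys as 0."""
    return totals.get(a, 0.0) - totals.get(b, 0.0)


rows = [{"year": 2022, "amount": 10}, {"year": 2022, "amount": 5}, {"year": 2023, "amount": 20}]
totals = sum_by(rows, "year", "amount")
print(totals)                          # {2022: 15.0, 2023: 20.0}
print(difference(totals, 2023, 2022))  # 5.0
```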
10. A method for a natural language processing based text data statistical analysis system as claimed in any one of claims 1 to 9, comprising the steps of:
S1. Constructing a natural language data screening system, which comprises a natural language data preprocessing module, a natural language data screening module and an information acquisition module; constructing a natural language data management system, which comprises a natural language data construction module and a natural language data configuration module; constructing a natural language data analysis system, which comprises a natural language understanding module, a natural language query module and a semantic matching module; and constructing a natural language chart visual management system;
S2. Constructing the complete natural language data management system, completing the definition of the fields and label data of the natural language data, acquiring the text description of the data that the visitor needs to access and invoke, and defining the visitor's identity tag information;
S3. Uploading the natural language data to the corpus preprocessing module, then performing corpus import, format conversion and corpus cleaning on it;
S4. Labeling the natural language data; after labeling, the data is automatically extracted and imported into a knowledge graph, providing a data source for the subsequent natural language data analysis system, and the natural language data management system provides a visual interface for adding, deleting and modifying the natural language data;
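Step S4 imports labeled data into a knowledge graph that users can add to, delete from and modify. The in-memory triple store below is a hypothetical stand-in for whatever graph database an implementation would use; `TripleStore`, `import_labeled` and the `(entity, label, value)` record shape are assumptions made for illustration.

```python
class TripleStore:
    """Hypothetical in-memory knowledge graph of (subject, relation, object) triples,
    supporting the add / delete / modify operations named in step S4."""

    def __init__(self):
        self.triples = set()

    def add(self, s, r, o):
        self.triples.add((s, r, o))

    def delete(self, s, r, o):
        self.triples.discard((s, r, o))  # no error if the triple is absent

    def modify(self, old, new):
        """Replace one triple with another; return False if `old` is not stored."""
        if old not in self.triples:
            return False
        self.triples.remove(old)
        self.triples.add(new)
        return True

    def objects(self, s, r):
        """All objects linked to subject `s` by relation `r`, sorted for stable output."""
        return sorted(o for (s2, r2, o) in self.triples if s2 == s and r2 == r)


def import_labeled(store, labeled_records):
    """Import annotated records, each assumed to be an (entity, label, value) tuple."""
    for entity, label, value in labeled_records:
        store.add(entity, label, value)
```

A set of tuples makes duplicate imports harmless; a production system would add indexes per subject and relation instead of scanning.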
S5. Entering the question to be analyzed into the question description template of the question understanding module, and extracting its semantics through natural language processing technology based on the template and deep learning;
S6. Generating a directed graph as follows: the directed graph is $\langle M, N \rangle$, where the vertex set is $M = \{k_1, \dots, k_n\} \cup \{t_1, \dots, t_m\}$; $k_i$ is the $i$-th user-demand keyword or entity, $i = 1, \dots, n$, and $t_r$ denotes the $r$-th data information table, $r = 1, \dots, m$. The edge set is defined as $N = \{\langle k_i, t_r \rangle \mid i = 1, 2, 3, \dots, n;\ r = 1, \dots, m;\ t_r \text{ is a data information table associated with } k_i\}$. When a keyword corresponds to several fields of the same data information table, the field with the maximum similarity is selected for that keyword.
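A sketch of step S6 under stated assumptions: keywords and table names form the vertex set, an edge joins a keyword to a table when some field of that table is similar enough, and only the most similar field is recorded for each (keyword, table) pair. String similarity from Python's `difflib.SequenceMatcher` and the 0.6 threshold are stand-ins; the patent does not say how similarity is measured, and `build_directed_graph` is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for a real semantic similarity measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_directed_graph(keywords, tables, threshold=0.6):
    """Build <M, N>: M = keywords plus table names; N = keyword -> table edges.

    `tables` maps each table name to its field names. For each (keyword, table)
    pair only the most similar field is kept, and the edge is added only if that
    best similarity reaches `threshold`. Returns (vertices, {edge: best_field}).
    """
    vertices = set(keywords) | set(tables)
    edges = {}
    for kw in keywords:
        for table, fields in tables.items():
            if not fields:
                continue
            best_field = max(fields, key=lambda f: similarity(kw, f))
            if similarity(kw, best_field) >= threshold:
                edges[(kw, table)] = best_field
    return vertices, edges


tables = {"sales": ["revenue_total", "region_name", "date"], "staff": ["name", "salary"]}
_, edges = build_directed_graph(["revenue", "region"], tables)
print(edges)  # {('revenue', 'sales'): 'revenue_total', ('region', 'sales'): 'region_name'}
```

Recording the winning field on each edge keeps the information needed to translate a keyword into a concrete column when the query is executed.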
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310961991.9A CN116663534A (en) | 2023-08-02 | 2023-08-02 | Text data statistical analysis system and method based on natural language processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310961991.9A CN116663534A (en) | 2023-08-02 | 2023-08-02 | Text data statistical analysis system and method based on natural language processing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116663534A (en) | 2023-08-29 |
Family ID: 87724694
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310961991.9A Pending CN116663534A (en) | 2023-08-02 | 2023-08-02 | Text data statistical analysis system and method based on natural language processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116663534A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1265209A (en) * | 1997-07-22 | 2000-08-30 | Microsoft Corporation (微软公司) | System for processing textual inputs natural language processing techniques |
CN111709235A (en) * | 2020-05-28 | 2020-09-25 | Shanghai Power Equipment Research Institute Co., Ltd. (上海发电设备成套设计研究院有限责任公司) | Text data statistical analysis system and method based on natural language processing |
CN112214997A (en) * | 2020-10-09 | 2021-01-12 | Shenzhen OneConnect Smart Technology Co., Ltd. (深圳壹账通智能科技有限公司) | Voice information recording method and device, electronic equipment and storage medium |
CN113779211A (en) * | 2021-08-06 | 2021-12-10 | Huazhong University of Science and Technology (华中科技大学) | Intelligent question-answer reasoning method and system based on natural language entity relationship |
CN114490970A (en) * | 2021-12-30 | 2022-05-13 | Tsinghua University (清华大学) | Question-answer type data visualization method and system supporting natural language interaction |
CN116341518A (en) * | 2023-03-10 | 2023-06-27 | Hangzhou Turing Shuke Information Technology Co., Ltd. (杭州图灵数科信息技术有限公司) | Data processing method and system for big data statistical analysis |
- 2023-08-02 CN CN202310961991.9A patent/CN116663534A/en active Pending
Non-Patent Citations (1)
Title |
---|
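Chen Xiaoyin (陈小茵): "Design of an Automatic Question-Answering System Based on Natural Language" (基于自然语言的自动答疑系统设计), Journal of Nanjing Radio & TV University (南京广播电视大学学报), no. 04 * |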
Similar Documents
Publication | Title |
---|---|
CN111709235B (en) | Text data statistical analysis system and method based on natural language processing | |
CN110990590A (en) | Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning | |
CN112989841B (en) | Semi-supervised learning method for emergency news identification and classification | |
CN110929149A (en) | Industrial equipment fault maintenance recommendation method and system | |
CN112417891B (en) | Text relation automatic labeling method based on open type information extraction | |
CN110795932B (en) | Geological report text information extraction method based on geological ontology | |
CN108287911A (en) | A kind of Relation extraction method based on about fasciculation remote supervisory | |
CN111143571B (en) | Entity labeling model training method, entity labeling method and device | |
CN114647741A (en) | Process automatic decision and reasoning method, device, computer equipment and storage medium | |
CN112836509A (en) | Expert system knowledge base construction method and system | |
CN113934909A (en) | Financial event extraction method based on pre-training language and deep learning model | |
CN116245177B (en) | Geographic environment knowledge graph automatic construction method and system and readable storage medium | |
CN108763192B (en) | Entity relation extraction method and device for text processing | |
CN113505242A (en) | Method and system for automatically embedding knowledge graph | |
CN114218333A (en) | Geological knowledge map construction method and device, electronic equipment and storage medium | |
CN114911893A (en) | Method and system for automatically constructing knowledge base based on knowledge graph | |
CN113901224A (en) | Knowledge distillation-based secret-related text recognition model training method, system and device | |
CN111737498A (en) | Domain knowledge base establishing method applied to discrete manufacturing production process | |
CN117473054A (en) | Knowledge graph-based general intelligent question-answering method and device | |
CN115113919B (en) | Software scale measurement intelligent informatization system based on BERT model and Web technology | |
CN115878818A (en) | Geographic knowledge graph construction method and device, terminal and storage medium | |
CN116663534A (en) | Text data statistical analysis system and method based on natural language processing | |
CN114840657A (en) | API knowledge graph self-adaptive construction and intelligent question-answering method based on mixed mode | |
CN114842301A (en) | Semi-supervised training method of image annotation model | |
CN114372148A (en) | Data processing method based on knowledge graph technology and terminal equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||