CN111581394B - Large-scale knowledge topography drawing method - Google Patents
- Publication number
- CN111581394B (application number CN202010368399.4A)
- Authority
- CN
- China
- Prior art keywords
- value
- pixel point
- documents
- values
- topography
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Processing Or Creating Images (AREA)
- Image Generation (AREA)
Abstract
The invention discloses a large-scale knowledge topographic map drawing method, a relatively simple way to draw knowledge topographic maps for large-scale text data. The TSNE algorithm is used for the planar layout of documents. A density function for the pixel points of the plane, taking the degree of document aggregation as a parameter, is established to map the color of each point on the plane. Pixel colors are then set in HSV mode and divided into several fixed levels to render the knowledge topographic map. The method has good visual expressiveness and is simple to implement.
Description
Technical Field
The invention relates to a text analysis method, belongs to the field of information processing, and particularly relates to a large-scale knowledge topography drawing method.
Background
A knowledge topographic map visualizes text data as a contour map similar to those used in geographic information systems, using shades of color to show how much data there is and how the data are related. Some documents call it a landscape map or a topic map; the names and forms of expression differ, but the basic idea is the same. Knowledge topographic maps are mainly applied to the analysis of text data, such as patent texts, papers, and network texts from microblogs, WeChat and the like, to reveal the knowledge content expressed in the text.
A knowledge topographic map can be drawn from the subject words in the text. For example, Chinese patent application CN106021228A (publication date 2016-10-12) discloses a method and system for performing text analysis by using a knowledge topographic map, in which the map is drawn from subject words extracted from the text. Other technical schemes use document distances for document clustering, knowledge extraction and graphic drawing; a typical example is the Canadian Rui Wei Ann Innovation patent map, whose algorithm is complex. In terms of graphic form, there are heat maps, topographic maps, rainbow rabbits, weather maps and so on.
When there is a large amount of text data and many subject words (for example, tens of thousands of documents or subject words), it is very difficult to draw a knowledge topographic map whose structure is clear and accurate enough to reveal the text content. In CN106021228A, once the number of subject words exceeds 1000, the limitations of the layout algorithm greatly reduce the readability of the drawn map; moreover, the map shows only the structural features of the text and cannot reflect the relationships between the entities to which the texts belong. The Canadian Rui Wei Ann Innovation patent map displays relatively more nodes and can reflect the relationships between the entities to which the texts belong, but its algorithm is complex and technically difficult to implement.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a large-scale knowledge topographic map drawing method that has good visual expressiveness and is simple to implement.
To achieve this purpose, the invention adopts the following technical scheme:
A large-scale knowledge topographic map drawing method comprises the following steps:
S1, acquiring the subject words of each document by a word segmentation technique, and establishing a subject word matrix from the subject words of all the documents;
S2, inputting the subject word matrix established in step S1 into the TSNE algorithm, which uses the matrix to map the documents onto a two-dimensional plane, each document being represented by a spherical node in the plane; the entities to which the documents belong are distinguished by node color, and spherical nodes corresponding to documents belonging to the same entity have the same color;
S3, constructing a density function of the pixel points based on the number and coordinates of the spherical nodes in a unit area of the two-dimensional plane;
the coordinates of the spherical nodes corresponding to the N documents are recorded as (x_i, y_i), i = 1, …, N; the average of the two-dimensional Euclidean distances between the spherical nodes corresponding to the documents is [formula image not reproduced]; for a pixel point P with coordinates (x, y), the density function of the pixel point is defined as follows:
[formula image not reproduced]
wherein N_p is the number of spherical nodes covered by a unit area centered on the pixel point P; the values of α and β determine the gradient effect of the topographic map: the larger the values of α and β, the smaller the gradient of the topographic map and the more pronounced its mountain effect;
S4, calculating the color values of all pixel points and rendering the topographic map;
S4.1, normalizing the density function of the pixel points obtained in step S3 so that its value is a floating-point number between 0 and 1;
S4.2, establishing an HSV mode palette in which the values of H and V are fixed;
S4.3, establishing a one-to-one mapping between the normalized density value of each pixel point and the HSV mode palette: denoting the normalized density value of a pixel point as q, the corresponding HSV color on the palette has the fixed values of H and V and an S value of q × 100%;
S4.4, dividing the range of the S value of the pixel color into M level values, and adjusting the S value of each pixel point calculated in step S4.3 to the corresponding level value; the formula is as follows:
[formula image not reproduced]
M ≥ 3.
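Further, in step S1, the subject word matrix is as follows:
$$
\begin{array}{c|ccc}
 & \mathrm{Keyword}_1 & \cdots & \mathrm{Keyword}_l \\
\hline
D_1 & e_{11} & \cdots & e_{1l} \\
\vdots & \vdots & \ddots & \vdots \\
D_N & e_{N1} & \cdots & e_{Nl}
\end{array}
$$
wherein Keyword_1, …, Keyword_l represent the l extracted subject words; D_1, …, D_N represent the N documents; and e_ij represents the Keyword_j contained in document D_i, i = 1, …, N, j = 1, …, l.
Further, in step S4.1, the density function of the pixel points is normalized using the following formula:
$$
\mathrm{Density}_{norm}(P) = \frac{\mathrm{Density}(P)}{\mathrm{Density}_{max}}
$$
wherein Density_max is the maximum of the density values of all the pixel points.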
The invention has the following beneficial effects: it provides a relatively simple method for drawing knowledge topographic maps for large-scale text data. The TSNE algorithm is used for the planar layout of documents; a density function for the pixel points of the plane, taking the degree of document aggregation as a parameter, is established to map the colors of points on the plane; pixel colors are then set in HSV mode and divided into several fixed levels to render the knowledge topographic map. The method has good visual expressiveness and is simple to implement.
Drawings
FIG. 1 is a schematic diagram of the relationship between the number N of all document nodes and the number N_p of document nodes in a unit area centered on a pixel point P in an embodiment of the present invention;
FIG. 2 is the knowledge topographic map rendered for 5000 nodes in an embodiment of the invention;
FIG. 3 is the knowledge topographic map rendered for 15000 nodes in an embodiment of the invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings, and it should be noted that, while the present embodiment provides a detailed implementation and a specific operation process on the premise of the present technical solution, the protection scope of the present invention is not limited to the present embodiment.
The embodiment provides a large-scale knowledge topography drawing method, which comprises the following steps:
S1, acquiring the subject words of each document by a word segmentation technique, and establishing a subject word matrix from the subject words of all the documents, wherein the subject word matrix is as follows:
$$
\begin{array}{c|ccc}
 & \mathrm{Keyword}_1 & \cdots & \mathrm{Keyword}_l \\
\hline
D_1 & e_{11} & \cdots & e_{1l} \\
\vdots & \vdots & \ddots & \vdots \\
D_N & e_{N1} & \cdots & e_{Nl}
\end{array}
$$
wherein Keyword_1, …, Keyword_l represent the l extracted subject words; D_1, …, D_N represent the N documents; and e_ij represents the Keyword_j contained in document D_i, i = 1, …, N, j = 1, …, l;
S2, inputting the subject word matrix established in step S1 into the TSNE algorithm, which uses the matrix to map the documents onto a two-dimensional plane, each document being represented by a spherical node in the plane; the entity to which each document belongs (for example, an institution, an author or a country) is distinguished by node color, and spherical nodes corresponding to documents belonging to the same entity have the same color.
Unlike the prior art CN106021228A, the nodes on the two-dimensional plane in the method of this embodiment are document nodes, not subject word nodes. The planar layout algorithm adopted is the TSNE algorithm, rather than the Fruchterman-Reingold layout algorithm or the VOS mapping algorithm. The advantage of the TSNE algorithm is that it still presents a clear structure for large-scale document sets.
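The subject word matrix of step S1 can be illustrated with a minimal Python sketch. It assumes the documents have already been segmented into word lists, and it assumes that each entry e_ij is an occurrence count; the text does not state whether the entries are counts or presence flags. The function name `build_subject_word_matrix` and the sample documents are hypothetical.

```python
from collections import Counter


def build_subject_word_matrix(docs):
    """Return (keywords, matrix) for a list of segmented documents.

    Assumption: matrix[i][j] is the number of times keywords[j]
    occurs in docs[i]; the patent text does not fix this choice.
    """
    # Column order: all distinct subject words, sorted for reproducibility.
    keywords = sorted({word for doc in docs for word in doc})
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        # Counter returns 0 for words that do not occur in the document.
        matrix.append([counts[k] for k in keywords])
    return keywords, matrix


# Hypothetical, already segmented documents D_1..D_3.
docs = [
    ["map", "layout", "map"],
    ["layout", "color"],
    ["color", "density", "map"],
]
keywords, matrix = build_subject_word_matrix(docs)
for row in matrix:
    print(row)
```

A matrix of this shape (N rows, l columns) is what step S2 passes to the TSNE algorithm; any existing t-SNE implementation that accepts a row-per-sample matrix could be used for that step.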
S3, constructing a density function of the pixel points based on the number and coordinates of the spherical nodes in a unit area of the two-dimensional plane:
After the coordinates of each spherical node on the two-dimensional plane are determined, the spherical nodes are drawn on the computer screen; to achieve the rendering effect of a topographic map, the color of each pixel point must also be determined. For this purpose, a density function is established to map the color value of each pixel point;
the coordinates of the spherical nodes corresponding to the N documents are recorded as (x_i, y_i), i = 1, …, N; the average of the two-dimensional Euclidean distances between the spherical nodes corresponding to the documents is [formula image not reproduced]; for a pixel point P with coordinates (x, y), the density function of the pixel point is defined as follows:
[formula image not reproduced]
wherein N_p is the number of spherical nodes covered by a unit area centered on the pixel point P (rather than the total number of spherical nodes), which speeds up execution of the algorithm and saves running time; the values of α and β determine the gradient effect of the topographic map: the larger the values of α and β, the smaller the gradient of the topographic map and the more pronounced its mountain effect. FIG. 1 shows the relationship between N and N_p.
The unit area is generally 10 pixels by 10 pixels, 100 pixels by 100 pixels, etc.
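The two inputs that step S3 names, the average Euclidean distance between nodes and the count N_p of nodes inside the unit area around a pixel, can be sketched as follows. This is not the density function itself, whose formula is not available in this text. The sketch assumes that the average is taken over all unordered pairs of distinct nodes and that the unit area is an axis-aligned square with inclusive borders; both are assumptions, and the function names are hypothetical.

```python
import math
from itertools import combinations


def mean_pairwise_distance(nodes):
    """Average Euclidean distance over all unordered pairs of nodes
    (assumed averaging convention)."""
    pairs = list(combinations(nodes, 2))
    if not pairs:
        raise ValueError("at least two nodes are required")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def count_nodes_in_unit_area(nodes, px, py, side):
    """N_p: number of nodes inside a side x side square centered on (px, py)."""
    half = side / 2
    return sum(
        1 for x, y in nodes
        if abs(x - px) <= half and abs(y - py) <= half
    )


# Hypothetical node coordinates produced by the layout step.
nodes = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (50.0, 50.0)]
d_bar = mean_pairwise_distance(nodes)
n_p = count_nodes_in_unit_area(nodes, px=3, py=4, side=10)
print(d_bar, n_p)
```

Counting only the nodes in the local square, instead of visiting all N nodes for every pixel, is what the text credits with the shorter running time.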
S4, calculating the color values of all pixel points and rendering the topographic map;
S4.1, normalizing the density function of the pixel points obtained in step S3 so that its value is a floating-point number between 0 and 1;
the following conversion can be adopted:
$$
\mathrm{Density}_{norm}(P) = \frac{\mathrm{Density}(P)}{\mathrm{Density}_{max}}
$$
wherein Density_max is the maximum of the density values of all the pixel points;
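A minimal sketch of the normalization in step S4.1, assuming that each density value is divided by Density_max, the only quantity the text defines for this step. The function name `normalize_density`, the grid layout (a list of rows) and the handling of an all-zero grid are assumptions.

```python
def normalize_density(density):
    """Scale a 2D grid of non-negative density values into [0, 1]
    by dividing by the grid maximum (assumed convention)."""
    d_max = max(max(row) for row in density)
    if d_max == 0:
        # Assumption: an empty map stays at zero instead of dividing by zero.
        return [[0.0 for _ in row] for row in density]
    return [[value / d_max for value in row] for row in density]


# Hypothetical 2 x 2 grid of raw density values.
grid = [[0.0, 2.0], [4.0, 8.0]]
q = normalize_density(grid)
print(q)  # [[0.0, 0.25], [0.5, 1.0]]
```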
S4.2, establishing an HSV mode palette in which the values of H and V are fixed; HSV refers to hue (H), saturation (S) and value (V, brightness);
the values of H and V are not particularly limited and can correspond to any color, such as the H and V values of blue or green; these two values determine the overall color of the topographic map;
S4.3, establishing a one-to-one mapping between the normalized density value of each pixel point and the HSV mode palette:
denoting the normalized density value of a pixel point as q, the corresponding HSV color on the palette has the fixed values of H and V and an S value of q × 100%;
for example, if the normalized density value of a pixel point is 0.1, the corresponding color on the HSV mode palette has the fixed values of H and V and an S value of 10%.
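The mapping of steps S4.2 and S4.3 can be sketched with Python's standard `colorsys` module, which expects H, S and V as floats in [0, 1]. The fixed hue of 0.5 (cyan) and value of 1.0 are arbitrary choices made for this illustration, since the text allows any H and V; the function name `density_to_rgb` is hypothetical.

```python
import colorsys


def density_to_rgb(q, hue=0.5, value=1.0):
    """Map a normalized density q in [0, 1] to an 8-bit RGB triple.

    H and V stay fixed; the saturation S is set to q (q x 100%).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    r, g, b = colorsys.hsv_to_rgb(hue, q, value)
    return tuple(round(channel * 255) for channel in (r, g, b))


print(density_to_rgb(0.0))  # zero density: no saturation, (255, 255, 255)
print(density_to_rgb(1.0))  # maximum density: fully saturated, (0, 255, 255)
print(density_to_rgb(0.1))  # the q = 0.1 case from the text, S = 10%
```

With V fixed at its maximum, sparse regions fade toward white and dense regions toward the pure hue, so the saturation alone carries the density information.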
S4.4, dividing the range of the S value of the pixel color into M level values, and adjusting the S value of each pixel point calculated in step S4.3 to the corresponding level value; the formula is as follows:
[formula image not reproduced]
M ≥ 3;
the purpose of the division is to give the final knowledge topographic map a layered appearance.
For example, if the S value is divided into 10 levels, the S value in the HSV color of each pixel point is adjusted according to the following formula:
[formula image not reproduced]
for example, when the S value in the HSV color of a pixel point is 11%, it is adjusted to 20% according to the above formula, so that the adjusted map achieves a more layered topographic effect.
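The level adjustment of step S4.4 can be sketched under one assumption: each S value is rounded up to the next multiple of 1/M. That rule is inferred only from the worked example (11% becomes 20% with 10 levels) and may differ from the original formula at the level boundaries. Keeping S = 0 at zero and the small tolerance against floating-point noise are further assumptions, and the function name `quantize_saturation` is hypothetical.

```python
import math


def quantize_saturation(s, levels=10):
    """Snap a saturation s in [0, 1] up to the next multiple of 1/levels."""
    if levels < 3:
        raise ValueError("the text requires M >= 3")
    if s <= 0:
        return 0.0  # assumption: empty background stays unsaturated
    # The tolerance keeps exact level values (e.g. 0.3) on their own level.
    step = math.ceil(s * levels - 1e-9)
    return min(step, levels) / levels


print(quantize_saturation(0.11))  # 0.2, matching the 11% -> 20% example
print(quantize_saturation(0.20))  # 0.2, already on a level
print(quantize_saturation(0.05, levels=4))  # 0.25
```

Reducing the continuous saturation to a few discrete steps is what produces the banded, contour-like look of the rendered map.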
FIG. 2 and FIG. 3 are the knowledge topographic maps rendered for 5000 and 15000 nodes, respectively, in which the numerals are the numbers of the corresponding documents.
Various modifications and variations of the present invention will be apparent to those skilled in the art in light of the foregoing teachings and are intended to be included within the scope of the following claims.
Claims (2)
1. A large-scale knowledge topographic map drawing method, characterized by comprising the following steps:
S1, acquiring the subject words of each document by a word segmentation technique, and establishing a subject word matrix from the subject words of all the documents;
S2, inputting the subject word matrix established in step S1 into the TSNE algorithm, which uses the matrix to map the documents onto a two-dimensional plane, each document being represented by a spherical node in the plane; the entities to which the documents belong are distinguished by node color, and spherical nodes corresponding to documents belonging to the same entity have the same color;
S3, constructing a density function of the pixel points based on the number and coordinates of the spherical nodes in a unit area of the two-dimensional plane;
the coordinates of the spherical nodes corresponding to the N documents are recorded as (x_i, y_i), i = 1, …, N; the average of the two-dimensional Euclidean distances between the spherical nodes corresponding to the documents is [formula image not reproduced]; for a pixel point P with coordinates (x, y), the density function of the pixel point is defined as follows:
[formula image not reproduced]
wherein N_p is the number of spherical nodes covered by a unit area centered on the pixel point P; the values of α and β determine the gradient effect of the topographic map: the larger the values of α and β, the smaller the gradient of the topographic map and the more pronounced its mountain effect;
S4, calculating the color values of all pixel points and rendering the topographic map;
S4.1, normalizing the density function of the pixel points obtained in step S3 so that its value is a floating-point number between 0 and 1;
S4.2, establishing an HSV mode palette in which the values of H and V are fixed;
S4.3, establishing a one-to-one mapping between the normalized density value of each pixel point and the HSV mode palette: denoting the normalized density value of a pixel point as q, the corresponding HSV color on the palette has the fixed values of H and V and an S value of q × 100%;
S4.4, dividing the range of the S value of the pixel color into M level values, and adjusting the S value of each pixel point calculated in step S4.3 to the corresponding level value; the formula is as follows:
[formula image not reproduced]
M ≥ 3;
in step S1, the subject word matrix is as follows:
$$
\begin{array}{c|ccc}
 & \mathrm{Keyword}_1 & \cdots & \mathrm{Keyword}_l \\
\hline
D_1 & e_{11} & \cdots & e_{1l} \\
\vdots & \vdots & \ddots & \vdots \\
D_N & e_{N1} & \cdots & e_{Nl}
\end{array}
$$
wherein Keyword_1, …, Keyword_l represent the l extracted subject words; D_1, …, D_N represent the N documents; and e_ik represents the Keyword_k contained in document D_i, i = 1, …, N, k = 1, …, l.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010368399.4A CN111581394B (en) | 2020-04-30 | 2020-04-30 | Large-scale knowledge topography drawing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010368399.4A CN111581394B (en) | 2020-04-30 | 2020-04-30 | Large-scale knowledge topography drawing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111581394A (en) | 2020-08-25 |
CN111581394B (en) | 2023-06-23 |
Family
ID=72111948
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010368399.4A (Active) CN111581394B (en) | 2020-04-30 | 2020-04-30 | Large-scale knowledge topography drawing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111581394B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106021228A (en) * | 2016-05-18 | 2016-10-12 | 德稻全球创新网络(北京)有限公司 | Method and system for performing text analysis by utilizing knowledge topographic map |
CN107766412A (en) * | 2017-09-05 | 2018-03-06 | 华南师范大学 | A kind of methods, systems and devices for establishing thematic map |
CN108628991A (en) * | 2018-04-28 | 2018-10-09 | 上海久誉软件系统有限公司 | The analysis and visualization system that rail traffic failure influences passenger flow |
CN109952612A (en) * | 2016-11-08 | 2019-06-28 | 赛卢拉研究公司 | Method for express spectra classification |
CN110019766A (en) * | 2017-09-25 | 2019-07-16 | 腾讯科技(深圳)有限公司 | Methods of exhibiting, device, mobile terminal and the readable storage medium storing program for executing of knowledge mapping |
CN110780873A (en) * | 2019-09-06 | 2020-02-11 | 平安普惠企业管理有限公司 | Interface color adaptation method and device, computer equipment and storage medium |
CN110866126A (en) * | 2019-11-22 | 2020-03-06 | 福建工程学院 | College online public opinion risk assessment method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180366013A1 (en) * | 2014-08-28 | 2018-12-20 | Ideaphora India Private Limited | System and method for providing an interactive visual learning environment for creation, presentation, sharing, organizing and analysis of knowledge on subject matter |
Non-Patent Citations (3)
Title |
---|
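Hsiang-Yun Wu. Focus+context metro map layout and annotation. SCCG '16: Proceedings of the 32nd Spring Conference on Computer Graphics. 2016, 41-47. * |
Liu Yuqin et al. A simple method for drawing technology topic maps. 图书情报工作 (Library and Information Service), Vol. 61, No. 13, 125-132. * |
Cheng Yanyun et al. Clustering method and credibility detection based on a graph data model. 系统仿真学报 (Journal of System Simulation), 2102-2108. * |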
Also Published As
Publication number | Publication date |
---|---|
CN111581394A (en) | 2020-08-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||