CN113688209A - Text semantic network construction method by adjusting dependency relationship of keywords

Info

Publication number: CN113688209A
Application number: CN202111019581.XA
Authority: CN
Inventors: 韦胜; 姚秀利
Original assignee: Jiangsu Urban Planning And Design Institute Co ltd
Current assignee: Jiangsu Urban Planning And Design Institute Co ltd
Priority date: 2021-09-01
Filing date: 2021-09-01
Publication date: 2021-11-23
Anticipated expiration: 2041-09-01
Also published as: CN113688209B

Abstract

The invention discloses a method for constructing text semantic networks by adjusting the dependency relationships of keywords, and relates to the technical fields of urban planning, tourism planning, natural language processing and urban traffic. First, high-frequency keywords are extracted from the comment text data to be analyzed. Second, a group of high-frequency keywords with a high dependency relationship is selected from the high-frequency keywords. Third, the comment text data to be analyzed is traversed and connection record relations among the high-frequency keywords are established in several different modes. Finally, different text semantic networks are constructed and the contents of different research topics are visualized. By setting the dependency relationships of the keywords in the comment data, the method extracts the important semantic content and reduces the interference of secondary content with the topic under study.

Description

Text semantic network construction method by adjusting dependency relationship of keywords

Technical Field

The invention relates to the technical field of urban planning, tourism planning, urban traffic and complex network modeling, in particular to a text semantic network construction method by adjusting the dependency relationship of keywords.

Background

Constructing text semantic networks from natural-language text has wide practical value. In fields such as urban planning, tourism planning and traffic planning, the demand for analyzing social comment data with text semantic networks is steadily growing. A text semantic network here means a complex network in which keywords are connected according to their co-occurrence in social comment sentences; index analysis from complex network theory is then used to reveal latent semantic connections among the keywords. In practice, keyword co-occurrence is relatively easy to process, the algorithm runs quickly, and the results are readily accepted by users of the algorithm and of its output. However, problems have gradually emerged as the technique is used, and it needs further improvement to meet key requirements in urban planning, tourism planning, traffic planning and similar industries.

The problem with the prior art is as follows: keywords that are only weakly related, or that belong to very different types, are often treated as strongly connected merely because they co-occur in a comment sentence. For example, social media comments on Suzhou high-speed railway station may contain the keyword "Suzhou Station", but they usually also contain many keywords describing the station's basic features, such as "square", "waiting room" and "subway". Comments on Shanghai Station are similar. When there are many such keywords and "Shanghai Station" appears only a few times in the comments on Suzhou Station, the connection between "Suzhou Station" and "Shanghai Station" is not prominent, because words such as "square", "waiting room" and "subway" occur far more often in the comments.

This raises an intuitive and very important problem: the core keywords occupy a low position among the nodes of the semantic network, which weakens their expression in the semantic network visualization and defeats the intention of highlighting a particular topic.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a text semantic network construction method by adjusting the dependency relationship of keywords. Meanwhile, the scheme of the invention provides a processing technical route of semantic relation among different types of keywords, and achieves various display and analysis effects.

In order to solve the technical problems, the invention adopts the following technical scheme:

the text semantic network construction method by adjusting the dependency relationship of the keywords provided by the invention comprises the following steps:

step 1, extracting high-frequency keywords from a group of comment text data to be analyzed;

step 2, extracting a group of high-frequency keywords R with high dependency relationship from the high-frequency keywords, wherein the high-frequency keywords without high dependency relationship in the high-frequency keywords are recorded as other high-frequency keywords R'; wherein, the high dependency relationship is a preset incidence relationship;

step 3, traversing comment text data to be analyzed, and establishing a connection record relation R1 between high-frequency keywords;

traversing comment text data to be analyzed, and establishing a new connection record relation R2 for the high-frequency keywords according to R;

traversing comment text data to be analyzed, and establishing a new connection recording relation R3 between R and R';

wherein,

the specific steps of traversing the comment text data to be analyzed and establishing the connection record relation R1 between high-frequency keywords are as follows:

step I, the comment text data to be analyzed comprises a plurality of comments; all the comments are traversed, one comment at a time, and all high-frequency keywords contained in the current comment are identified in each traversal;

step II, establishing a connection record relation between every two of the high-frequency keywords identified in the comment in step I;

step III, if the two identified high-frequency keywords already co-occurred in a different comment during an earlier traversal, increasing the number of connection records between the two high-frequency keywords by 1;

step IV, finally, recording the connection record relations among all the high-frequency keywords as R1;

traversing comment text data to be analyzed, and establishing a new connection record relation R2 for the high-frequency keywords according to R; the method comprises the following specific steps:

step A, in each traversal process: identifying all high-frequency keywords contained in each comment;

step B, aiming at any two identified different high-frequency keywords, if only one high-frequency keyword belongs to the group R, a connection record relation is not established between the two high-frequency keywords;

aiming at any two identified different high-frequency keywords, if the two high-frequency keywords both belong to R or neither belong to R, establishing a connection record relationship between every two high-frequency keywords;

step C, if the two identified high-frequency keywords already co-occurred in a different comment during an earlier traversal and satisfy the condition for establishing a connection in step B, increasing the number of connection records between the two high-frequency keywords by 1;

step D, finally, recording the connection record relations among all the high-frequency keywords as R2;

traversing the comment text data to be analyzed, and establishing a new connection record relation R3 between R and R', the specific steps are as follows:

step a, in each traversal process: identifying all high-frequency keywords contained in each comment;

step b, for any two identified high-frequency keywords, if only one of them belongs to R and the other does not belong to R, establishing a connection record relation between the two high-frequency keywords;

for any two identified high-frequency keywords, if both belong to R or neither belongs to R, no connection record relation is established between the two high-frequency keywords;

step c, if the two identified high-frequency keywords already co-occurred in a different comment during an earlier traversal and satisfy the condition for establishing a connection in step b, increasing the number of connection records between the two high-frequency keywords by 1;

step d, finally, recording the connection record relations among all the high-frequency keywords as R3;

step 4, constructing three text semantic networks by using a complex network theory;

4.1, based on complex network theory, constructing a text semantic network with the high-frequency keywords as its nodes and R1 as its edge connection rule, and recording this text semantic network as NET1;

4.2, based on complex network theory, constructing a text semantic network with the high-frequency keywords as its nodes and R2 as its edge connection rule, and recording this text semantic network as NET2;

4.3, based on complex network theory, constructing a text semantic network with the high-frequency keywords as its nodes and R3 as its edge connection rule, and recording this text semantic network as NET3.
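The three connection rules differ only in which keyword pairs are allowed to receive a connection record. A minimal sketch of that decision is given here; the function name `pair_allowed` and the string labels for the rules are illustrative assumptions, not names used by the invention.

```python
def pair_allowed(a, b, R, rule):
    """Return True if keywords a and b may receive a connection record.

    rule "R1": every pair of distinct high-frequency keywords.
    rule "R2": only pairs on the same side (both in R, or both outside R).
    rule "R3": only pairs with exactly one keyword in R.
    """
    if a == b:
        return False
    exactly_one_in_R = (a in R) != (b in R)
    if rule == "R1":
        return True
    if rule == "R2":
        return not exactly_one_in_R
    if rule == "R3":
        return exactly_one_in_R
    raise ValueError(f"unknown rule: {rule}")


# Illustrative focus group R; all other keywords form R'
R = {"Suzhou Station", "Shanghai Station"}
print(pair_allowed("Suzhou Station", "Shanghai Station", R, "R2"))  # True
print(pair_allowed("Suzhou Station", "subway", R, "R2"))            # False
print(pair_allowed("Suzhou Station", "subway", R, "R3"))            # True
```

Under this reading, every pair accepted by R1 is accepted by exactly one of R2 and R3, so NET2 and NET3 together carry the same edges as NET1.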

As a further optimization scheme of the text semantic network construction method for adjusting the dependency relationship of the keywords, the step 4 is followed by a step 5,

step 5, utilizing three text semantic networks to respectively carry out index calculation on edges and nodes of the text semantic networks and visualize the incidence relation of the text semantic networks;

step 5.1, carrying out level index calculation on the network nodes on the text semantic network NET 1;

step 5.2, carrying out community detection division on the network nodes for the text semantic network NET 2;

step 5.3, setting the display size of the text semantic network nodes according to the level index calculation result of step 5.1, and displaying the nodes in groups according to the community detection result of step 5.2;

step 5.4, carrying out level index calculation on the network nodes of the text semantic network NET3, and visualizing the connection strength among the nodes.
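The grouping in steps 5.2 and 5.3 can be illustrated with a small sketch. The source does not name a community detection algorithm, so connected components are used here as a coarse stand-in (an assumption): because R2 records no connection between a keyword in R and a keyword outside R, the focus group can never share a component with the other keywords in NET2. The record values below are hypothetical.

```python
from collections import defaultdict

def components(nodes, edges):
    """Assign a group id to each node; nodes joined by a path share an id."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    group = {}
    next_id = 0
    for start in sorted(nodes):      # sorted for reproducible group ids
        if start in group:
            continue
        group[start] = next_id
        stack = [start]
        while stack:                 # depth-first walk of one component
            node = stack.pop()
            for nb in adj[node]:
                if nb not in group:
                    group[nb] = next_id
                    stack.append(nb)
        next_id += 1
    return group

# Hypothetical R2 records: pairs inside R, and pairs inside R', only
r2 = {
    ("Shanghai Station", "Suzhou Station"): 2,
    ("subway", "waiting room"): 3,
    ("square", "subway"): 1,
}
nodes = {k for pair in r2 for k in pair}
groups = components(nodes, r2)       # iterating the dict yields the pairs
print(groups["Suzhou Station"] == groups["Shanghai Station"])  # True
print(groups["Suzhou Station"] == groups["subway"])            # False
```

A modularity-based method could split each side further into finer communities; the sketch only shows why the focus group stays visually separate.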

As a further optimization scheme of the text semantic network construction method by adjusting the dependency relationship of the keywords, the step 1 is as follows:

step 1.1, extracting keywords from comment text data based on a word segmentation library and a stop word library;

step 1.2, screening out a group of high-frequency keywords from the keywords extracted in step 1.1.
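Steps 1.1 and 1.2 can be sketched as follows. This is a simplified illustration under stated assumptions: a regular-expression tokenizer stands in for the word segmentation library, `STOP_WORDS` is a hypothetical stop word list, and "high-frequency" is taken to mean the `top_n` most frequent words, a threshold the source does not specify.

```python
import re
from collections import Counter

# Hypothetical stop word library
STOP_WORDS = {"the", "a", "is", "to", "and", "of", "from", "it"}

def tokenize(comment):
    """Stand-in for a word segmentation library: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", comment.lower())

def high_frequency_keywords(comments, top_n):
    """Step 1.1: extract keywords; step 1.2: keep the top_n most frequent."""
    counts = Counter()
    for comment in comments:
        counts.update(t for t in tokenize(comment) if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]

comments = [
    "The subway to the station is fast",
    "Station square and subway",
    "The waiting room of the station",
]
print(high_frequency_keywords(comments, 2))  # ['station', 'subway']
```

For Chinese comment text, the tokenizer would be replaced by a real segmentation library; the counting and screening logic stays the same.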

As a further optimization scheme of the text semantic network construction method by adjusting the dependency relationship of the keywords, the sequence of the two high-frequency keywords appearing in the comments in the step C is not used as a judgment basis for a new edge connection record.

As a further optimization scheme of the text semantic network construction method by adjusting the dependency relationship of the keywords, all the text semantic networks constructed in the step 5 are multidirectional weighting complex networks.

As a further optimization scheme of the text semantic network construction method by adjusting the dependency relationship of the keywords, the level indexes in the step 5.1 and the step 5.4 comprise a centrality index and a weighted centrality index.

Compared with the prior art, the invention adopting the technical scheme has the following technical effects:

(1) the invention provides a text semantic network construction method by adjusting the dependency relationship of key words, which mainly solves the problem that the expression of core key words in a text semantic network is not prominent;

(2) the invention provides a text semantic network construction method by adjusting the dependency relationship of keywords based on a natural language processing technology and a complex network theory, realizes semantic network expression among various keywords, and achieves the purposes of multi-scheme selection and visual display of certain key topic keywords.

Drawings

FIG. 1 is a schematic overall flow diagram of the present invention.

FIG. 2 is a diagram illustrating a set of high frequency keywords having high dependency relationships.

FIG. 3 is a schematic flow diagram of recording the connections between high-frequency keywords for R1.

FIG. 4 is a schematic flow diagram of recording the connections between high-frequency keywords for R2.

FIG. 5 is a schematic flow diagram of recording the connections between high-frequency keywords for R3.

FIG. 6 is a schematic diagram of a visualization scheme for a text semantic network; wherein (a) is a visualization scheme 1, and (b) is a visualization scheme 2.

FIG. 7 is a diagram illustrating the visualization effect of high frequency keyword grouping with high dependency relationship.

FIG. 8 is a diagram illustrating the connection relationship between a high-frequency keyword having a high dependency relationship and other high-frequency keywords.

Detailed Description

The technical scheme of the invention is further explained in detail by combining the attached drawings:

In practical applications, a user often needs to analyze the relationships within one particular group of the comment keywords; that is, the user already has in mind, or wants to focus on, the semantic relationships of this group of keywords in the text semantic analysis. The other keywords are nevertheless not deleted or disregarded: they are kept and observed together with this group in the analysis. In other words, this group of keywords must be highlighted while the other keywords are retained. The scheme provided by the invention is therefore to maintain the node size of the specific keywords in the semantic network and to highlight the connection relationships among them. Two text semantic networks, NET1 and NET2, are used together to achieve this effect. To compensate for the fact that these two networks do not express the relationships between the keywords of interest and the other keywords, the invention further constructs a third text semantic network, NET3.

In conclusion, the scheme of the invention innovatively solves an important problem and an urgent need in practice through 3 text semantic networks, and provides a better solution for the practice and technical research in the fields of city planning, tourism planning, city traffic and complex network modeling.

In view of the above problems and needs, the solution of the present invention is as follows: while allowing the whole calculation to be automated, the dependency relationships of the keywords are adjusted so that the connection level between the keywords of interest and the other keywords is reduced, which in turn raises the connection level among the keywords that are important or under observation. Meanwhile, all keywords can still be displayed together in the visualization.

The invention provides the following embodiment: a text semantic network construction method by adjusting the dependency relationship of keywords, which highlights different research topics by adjusting the connection relationships among high-frequency keywords, and specifically comprises the following steps:

step 1) extracting high-frequency keywords from a group of comment text data to be analyzed with reference to the attached figure 1;

step 1.1) extracting keywords from the comment text data based on the common word segmentation library and the stop word library;

step 1.2) a group of high-frequency keywords is screened out from the keywords extracted in the step 1.1.

Step 2) extracting a group of high-frequency keywords with high dependency relationship from the high-frequency keywords, and recording the high-frequency keywords as R, referring to the attached figure 2; high-frequency keywords without high dependency relationship in the high-frequency keywords are marked as other high-frequency keywords R'; the high dependency relationship is a preset incidence relationship;

step 3) referring to FIG. 3, traversing the comment text data to be analyzed and establishing connection record relations among the high-frequency keywords;

step 3.1) in each traversal process: identifying all high-frequency keywords contained in each comment;

step 3.2) establishing a connection record relation between every two of the identified high-frequency keywords;

step 3.3) if the two identified high-frequency keywords already co-occurred in a different comment sentence during an earlier traversal, increasing the number of connection records between the two high-frequency keywords by 1;

step 3.4) finally, recording the connection record relations among all the high-frequency keywords as R1;
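The traversal that produces R1 can be sketched as a co-occurrence count. Several details are assumptions for illustration: keywords are located by substring match, each pair is counted at most once per comment, and the pair is stored in sorted order so that the order of appearance in the comment plays no role. The comments and keywords are made up.

```python
from collections import Counter
from itertools import combinations

def build_r1(comments, keywords):
    """Count, for every keyword pair, the number of comments containing both."""
    records = Counter()
    for comment in comments:
        # Identify the high-frequency keywords in this comment
        found = sorted(k for k in keywords if k in comment)
        # Connect every two identified keywords; a repeat adds 1
        for a, b in combinations(found, 2):
            records[(a, b)] += 1
    return records

keywords = {"Suzhou Station", "Shanghai Station", "subway", "square"}
comments = [
    "Suzhou Station has a subway under the square",
    "Took the subway from Suzhou Station",
    "Shanghai Station to Suzhou Station in half an hour",
]
r1 = build_r1(comments, keywords)
print(r1[("Suzhou Station", "subway")])            # 2
print(r1[("Shanghai Station", "Suzhou Station")])  # 1
```

The first time a pair co-occurs its record is created with a count of 1, and each later comment containing both keywords increases the count by 1, which matches the reading of steps 3.2 and 3.3 adopted here.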

step 4) traversing the comment text data to be analyzed, carrying out screening processing on the high-frequency key words R with high dependency relationship, and establishing a new connection record relationship, referring to the attached figure 4;

step 4.1) in each traversal process: identifying all high-frequency keywords contained in each comment;

step 4.2) aiming at any two identified different high-frequency keywords, if only one high-frequency keyword belongs to R, a connection record relation is not established between the two high-frequency keywords;

step 4.3) aiming at any two identified different high-frequency keywords, if both the two high-frequency keywords belong to R or neither of the two high-frequency keywords belongs to R, establishing a connection record relationship between every two high-frequency keywords; the sequence of the two high-frequency keywords appearing in the comment is not used as a judgment basis for a new edge connection record;

step 4.4) if the two identified high-frequency keywords already co-occurred in a different comment sentence during an earlier traversal and satisfy the condition for establishing a connection in steps 4.2 and 4.3, increasing the number of connection records between the two high-frequency keywords by 1;

step 4.5) finally, recording the connection record relations among all the high-frequency keywords as R2;

step 5) referring to fig. 5, traversing the comment text data to be analyzed, not establishing connection records inside the high-frequency keywords R with high dependency relationship, not establishing connection records inside other high-frequency keywords, and establishing new connection record relationships only between the high-frequency keywords R with high dependency relationship and other high-frequency keywords R', and the specific steps are as follows:

step 5.1) in each traversal process: identifying all high-frequency keywords contained in each comment;

step 5.2) aiming at the identified high-frequency keywords, if only one high-frequency keyword belongs to R and the other high-frequency word does not belong to R, establishing a connection record relationship between the two high-frequency keywords;

step 5.3) aiming at the identified high-frequency keywords, if two high-frequency keywords both belong to R or the two high-frequency keywords do not belong to R, a connection record relation is not established between the two high-frequency keywords;

step 5.4) if the two identified high-frequency keywords already co-occurred in a different comment sentence during an earlier traversal and satisfy the condition for establishing a connection in steps 5.2 and 5.3, increasing the number of connection records between the two high-frequency keywords by 1;

step 5.5) finally, recording the connection record relations among all the high-frequency keywords as R3;

The text semantic networks constructed by this method are all undirected weighted complex networks.
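A useful consequence of the rules for R2 and R3 is that they split the pairs of R1 into two disjoint parts: a pair either lies on one side (both in R or both outside R) or straddles the two sides. The sketch below checks this on made-up data; `build_records` and its `keep` argument are hypothetical names, and substring matching is again an assumption.

```python
from collections import Counter
from itertools import combinations

def build_records(comments, keywords, keep):
    """Count co-occurring keyword pairs for which keep(a, b) is True."""
    records = Counter()
    for comment in comments:
        found = sorted(k for k in keywords if k in comment)
        for a, b in combinations(found, 2):
            if keep(a, b):
                records[(a, b)] += 1
    return records

keywords = {"Suzhou Station", "Shanghai Station", "subway", "square"}
R = {"Suzhou Station", "Shanghai Station"}   # illustrative focus group
comments = [
    "Suzhou Station has a subway under the square",
    "Took the subway from Suzhou Station",
    "Shanghai Station to Suzhou Station in half an hour",
]

r1 = build_records(comments, keywords, lambda a, b: True)
r2 = build_records(comments, keywords, lambda a, b: (a in R) == (b in R))
r3 = build_records(comments, keywords, lambda a, b: (a in R) != (b in R))

print(sorted(r2))     # pairs within R and pairs within R'
print(sorted(r3))     # pairs between R and R'
print(r2 + r3 == r1)  # True: R2 and R3 partition R1
```

In this example the station-to-station connection survives in R2 without competition from "subway" and "square", while R3 keeps exactly the station-to-facility connections that R2 drops.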

Step 6) constructing three text semantic networks by using a complex network theory;

step 6.1) based on complex network theory, constructing a text semantic network with the high-frequency keywords as its nodes and R1 as its edge connection rule, and recording this text semantic network as NET1;

step 6.2) based on complex network theory, constructing a text semantic network with the high-frequency keywords as its nodes and R2 as its edge connection rule, and recording this text semantic network as NET2;

step 6.3) based on complex network theory, constructing a text semantic network with the high-frequency keywords as its nodes and R3 as its edge connection rule, and recording this text semantic network as NET3;
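As an illustration of step 6), an undirected weighted network can be held as a nested dictionary in which each connection record becomes a symmetric edge whose weight is the record count. The representation and the name `build_network` are assumptions made for this sketch, and the R3 records are hypothetical.

```python
def build_network(keywords, records):
    """Undirected weighted network: net[a][b] == net[b][a] == record count."""
    net = {k: {} for k in keywords}   # every high-frequency keyword is a node
    for (a, b), weight in records.items():
        net[a][b] = weight
        net[b][a] = weight            # same weight in both directions
    return net

keywords = ["Suzhou Station", "Shanghai Station", "subway", "square"]
# Hypothetical R3 records: only pairs with exactly one keyword in R
r3 = {("Suzhou Station", "subway"): 2, ("Suzhou Station", "square"): 1}
net3 = build_network(keywords, r3)
print(net3["subway"])            # {'Suzhou Station': 2}
print(net3["Shanghai Station"])  # {}
```

Because all high-frequency keywords are kept as nodes, a keyword with no permitted connection under a given rule remains in the network as an isolated node rather than disappearing.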

step 7) referring to the attached figure 6, index calculation is respectively carried out on the edges and the nodes of the text semantic network by using three text semantic networks, and the incidence relation of the text semantic networks is visualized;

step 7.1) carrying out level index calculation on the network nodes on the text semantic network NET 1;

step 7.2) carrying out community detection division on the network nodes on the text semantic network NET 2;

step 7.3) referring to FIG. 7, setting the display size of the text semantic network nodes using the level index calculation result of step 7.1, and displaying the nodes in groups using the community detection result of step 7.2;

step 7.4) carrying out level index calculation on the network nodes of the text semantic network NET3.

Wherein, the level indexes in step 7.1 and step 7.4 include a centrality index and a weighted centrality index.
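The source does not say which centrality measures are meant. One common reading, assumed here, is degree centrality (the share of other nodes a keyword is connected to) and weighted degree (the sum of connection counts on a keyword's edges). The sketch computes both on a small hypothetical NET1 stored as a nested dictionary.

```python
def degree_centrality(net):
    """Number of neighbours divided by the number of other nodes."""
    n = len(net)
    if n < 2:
        return {node: 0.0 for node in net}
    return {node: len(nbrs) / (n - 1) for node, nbrs in net.items()}

def weighted_degree(net):
    """Sum of edge weights (connection record counts) at each node."""
    return {node: sum(nbrs.values()) for node, nbrs in net.items()}

# Hypothetical NET1: net[a][b] is the connection record count of (a, b)
net1 = {
    "Suzhou Station": {"Shanghai Station": 1, "subway": 2, "square": 1},
    "Shanghai Station": {"Suzhou Station": 1},
    "subway": {"Suzhou Station": 2, "square": 1},
    "square": {"Suzhou Station": 1, "subway": 1},
}
print(degree_centrality(net1)["Suzhou Station"])  # 1.0
print(weighted_degree(net1)["Suzhou Station"])    # 4
print(weighted_degree(net1)["subway"])            # 3
```

Either value can then drive the node display size in step 7.3; other centrality measures from complex network theory could be substituted without changing the rest of the method.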

Referring to fig. 8, the text semantic network NET3 can better reflect the connection relationship between the high-frequency keywords with high dependency relationship and other high-frequency keywords.

The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all should be considered as belonging to the protection scope of the invention.

Claims

1. A text semantic network construction method through adjusting keyword dependency relationship is characterized by comprising the following steps:

wherein,

2. The method for constructing the text semantic network by adjusting the dependency relationship of the keywords according to claim 1, wherein step 4 is followed by step 5,

3. The method for constructing the text semantic network by adjusting the dependency relationship of the keywords according to claim 1, wherein the step 1 is as follows:

4. The method for constructing a text semantic network by adjusting dependency relationship of keywords according to claim 1, wherein the appearance sequence of the two high-frequency keywords in the comment in step C is not used as a judgment basis for a new edge connection record.

5. The method for constructing a text semantic network by adjusting dependency relationship of keywords according to claim 1, wherein the text semantic networks constructed in step 5 are all undirected weighted complex networks.

6. The method for constructing a text semantic network by adjusting dependency relationships of keywords according to claim 2, wherein the level class indexes in step 5.1 and step 5.4 include a centrality index and a weighted centrality.