CN115827882A - Knowledge graph construction method based on multi-mode tourism big data

Info

Publication number: CN115827882A
Application number: CN202111088394.7A
Authority: CN
Inventors: 任桐炜; 黄蕾; 于凡; 赵志翔
Original assignee: Nanjing Research Institute Of Nanjing University; Nanjing University
Current assignee: Nanjing Research Institute Of Nanjing University; Nanjing University
Priority date: 2021-09-16
Filing date: 2021-09-16
Publication date: 2023-03-21

Abstract

A knowledge graph construction method based on multi-modal tourism big data extracts entities, and the relations among them, from multi-modal tourism data. First, data are acquired: semi-structured city and scenic spot data and unstructured travel note data are obtained from a tourism vertical website, and unstructured tourism videos are obtained from a video website. Next, the data are preprocessed: text analysis is performed on the travel note text, object recognition on the travel note pictures, object tracking and scene text recognition on the videos, and text analysis on the recognized scene text. Entities are then extracted from the analyzed travel note text, the analyzed video scene text, the picture objects and the video objects. Finally, semantic relations between the entities are mined according to structural relations and syntactic dependency relations, so as to construct the tourism knowledge graph. The invention uses Internet tourism big data to construct the knowledge graph, can effectively manage and use data of multiple modalities, and supports tourism services such as retrieval and recommendation.

Description

Knowledge graph construction method based on multi-mode tourism big data

Technical Field

The invention belongs to the field of multimedia computing, relates to the semantic analysis of text, pictures and video, and in particular relates to a knowledge graph construction method based on multi-modal tourism big data.

Background

A knowledge graph describes concepts, entities and events in the objective world and the relationships between them. Current domain knowledge graphs are mostly used in knowledge-intensive fields such as finance, law and medicine, and are difficult to apply to fields such as tourism, where information is scattered and knowledge is broad and sparse. Constructing a tourism knowledge graph from the massive multi-modal data on the Internet helps to provide vivid, intelligent and credible tourism services, improve user experience, and promote the development of the tourism industry.

Tourism big data comes from complex sources, is huge in volume and varied in modality, and is difficult to acquire and manage effectively. Existing knowledge graphs for the tourism field draw on a narrow range of sources, such as encyclopedia and interactive encyclopedia sites. Such data are relatively templated and uniform in content. However, tourism-related data contributed by individuals on Internet platforms show that people, especially young people, are no longer drawn only to conventional, classic scenic spots; they are more willing to experience different ways of life and to discover different tourism resources. This information is widely scattered across the Internet and is rarely published on templated introduction or question-and-answer platforms such as encyclopedia and interactive encyclopedia sites, so the information provided by encyclopedia data falls far short of real tourism information. In addition, existing tourism knowledge graphs are limited in modality: most consist of text data, with little image or video data, even though images and videos can provide very rich information. More and more information on the Internet is now presented as images and video, which are no longer merely a supplement to text and may even replace it; and with the development of portable devices, trips are more easily recorded in pictures and video. Therefore, existing text-only tourism knowledge graphs cannot meet the application requirements of the tourism market.

Disclosure of Invention

The problem to be solved by the invention is to provide a solution for knowledge discovery in multi-modal tourism big data, by constructing a multi-modal tourism knowledge graph that supports tourism service applications such as retrieval and recommendation.

The technical scheme of the invention is as follows: a knowledge graph construction method based on multi-modal tourism big data performs entity extraction and heterogeneous relation mining on multi-modal tourism data so as to construct a knowledge graph for retrieval and display, wherein the multi-modal tourism big data comprises the text, picture and video modalities, and the construction method comprises the following steps:

obtaining multi-modal tourism data, namely obtaining semi-structured city and scenic spot data and unstructured travel note data from a tourism vertical website, and obtaining unstructured tourism videos from a video website;

data preprocessing, namely performing text recognition and analysis and object recognition and tracking on the travel note data and the tourism videos;

entity extraction, namely extracting semantic entities from the travel note data and the videos based on the preprocessed data;

relation mining, namely mining the relations between the extracted entities according to data structure relations and syntactic dependency relations, thereby constructing the tourism knowledge graph.

Preferably, the method comprises the following steps:

1) Constructing an ontology base according to the tourism vertical website, and defining the entity types, including city, scenic spot, place, time, activity and other entities;

2) Acquiring semi-structured and unstructured multi-modal data from a tourism vertical website and a video website;

3) Preprocessing the data acquired in step 2): performing word segmentation, part-of-speech analysis and dependency analysis on the text in the travel note data, performing object recognition on the pictures in the travel note data, performing object tracking and recognition and scene text recognition on the videos, and performing word segmentation, part-of-speech analysis and dependency analysis on the recognized scene text;

4) Extracting semantic entities, in combination with the semi-structured data, from the analyzed travel note text, the scene text recognized in the videos, the objects recognized in the travel note pictures and the objects tracked in the videos;

5) Mining the relations between the entities obtained in step 4) to form the knowledge graph.
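As an illustration of the ontology defined in step 1), the six entity types can be represented as a small data model. This is a minimal sketch rather than the patented implementation: the class names, the `modality` field, the `belongs_to` label and the sample entities are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    """The six entity types named in step 1); identifiers are illustrative."""
    CITY = "city"
    SCENIC_SPOT = "scenic_spot"
    PLACE = "place"
    TIME = "time"
    ACTIVITY = "activity"
    OTHER = "other"


@dataclass(frozen=True)
class Entity:
    name: str
    type: EntityType
    modality: str  # assumed field: "text", "picture" or "video"


@dataclass(frozen=True)
class Relation:
    head: Entity
    label: str  # hypothetical relation label, e.g. "belongs_to"
    tail: Entity


# Illustrative sample data, not taken from the patent.
nanjing = Entity("Nanjing", EntityType.CITY, "text")
lake = Entity("Xuanwu Lake", EntityType.SCENIC_SPOT, "text")
edge = Relation(lake, "belongs_to", nanjing)
print(edge.head.name, edge.label, edge.tail.name)
```

A knowledge graph in this representation is simply a collection of `Relation` objects; a production system would more likely store them in a graph database.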

The method first obtains multi-modal tourism big data, then performs text analysis on the text, object recognition on the pictures, and text recognition, text analysis and object tracking on the videos, in order to extract entities and mine the relations among them. It thereby provides a solution for constructing a multi-modal tourism knowledge graph, enables effective management and use of data in multiple modalities, and supports tourism services such as retrieval and recommendation.

The invention analyzes and mines unstructured multi-modal tourism data in text, picture and video form and constructs a multi-modal tourism knowledge graph. It applies word segmentation, part-of-speech tagging and dependency parsing to text, object recognition to pictures, and object tracking and scene text recognition to videos. Most existing tourism knowledge graph construction methods focus on semi-structured scenic spot introduction data from encyclopedia sites, interactive encyclopedia sites and travel websites. Some methods perform entity extraction and relation mining on the descriptive text of scenic spot introductions, but they concentrate on scenic spots and their attributes; they cannot adequately analyze and mine everyday, personalized tourism big data such as personal travel notes and travel logs, nor construct a knowledge graph from such data. Moreover, existing construction methods are limited in modality: most handle text only, and a few merely link scenic spot pictures to the corresponding scenic spot entities according to the mapping in the data source.

The beneficial effects of the invention are as follows: after semi-structured city-to-scenic-spot mapping data, unstructured travel note text and picture data, and unstructured video data are obtained, entities are extracted and the relations among them are mined using text analysis, picture object recognition, video object tracking and video scene text recognition, and the tourism knowledge graph is constructed accordingly. The method addresses the complex sources and varied modalities of tourism big data, and has good generality and practicability.

Drawings

FIG. 1 is a schematic representation of a source of semi-structured data obtained by the present invention.

FIG. 2 is a schematic diagram of the sources of the travel note data obtained by the present invention.

FIG. 3 is a schematic diagram of a video data source obtained by the present invention.

FIG. 4 is a schematic diagram of a knowledge graph construction process of the present invention.

Detailed Description

The invention provides a knowledge graph construction method based on multi-modal tourism big data, which extracts entities and the relations among them from tourism data in the text, picture and video modalities. The method first acquires data: semi-structured city and scenic spot data are obtained from a tourism vertical website, as shown in FIG. 1; unstructured travel note data are obtained from a tourism vertical website, as shown in FIG. 2; and unstructured tourism videos are obtained from a video website, as shown in FIG. 3. The data are then preprocessed: word segmentation, part-of-speech analysis and dependency analysis are performed on the travel note text; object recognition is performed on the travel note pictures; object tracking and scene text recognition are performed on the videos; and word segmentation, part-of-speech analysis and dependency analysis are performed on the recognized scene text. Entities are then extracted from the analyzed travel note text, the analyzed video scene text, the picture objects and the video objects. Finally, the relations between the entities are mined according to data structure relations and syntactic dependency relations, so as to construct the tourism knowledge graph.

As shown in FIG. 4, the knowledge graph construction for multi-modal tourism data according to the invention comprises the following steps:

obtaining multi-modal tourism data, namely obtaining semi-structured city and scenic spot data and unstructured travel note data from a tourism vertical website, and obtaining unstructured tourism videos from a video website;

The following describes the implementation of the entity extraction and heterogeneous relationship mining of the present invention.

1) Constructing an ontology base according to the tourism vertical website, wherein the ontology base defines the entity types, including city, scenic spot, place, time, activity and other entities;

2) Obtaining semi-structured and unstructured multi-modal data from tourism vertical websites and video websites:

2.1) Obtaining semi-structured data on cities and their corresponding scenic spots from a tourism vertical website;

2.2) Obtaining unstructured travel note data from the tourism vertical website, and recording the chapter structure of the text according to the HTML tags;

2.3) Obtaining unstructured video data from a video website.
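Step 2.2) records the chapter structure of a travel note from its HTML tags. The following is a minimal sketch of one way this could be done with Python's standard `html.parser` module. The tag conventions (`h2` for chapter headings, `p` for paragraphs, `img` for pictures) and the sample page are assumptions made for the example; a real travel website would need its own tag mapping.

```python
from html.parser import HTMLParser


class TravelNoteParser(HTMLParser):
    """Collects chapters as {"title", "items"}, assuming h2/p/img markup."""

    def __init__(self):
        super().__init__()
        self.chapters = []
        self._tag = None   # tag whose text is currently being buffered
        self._buf = []

    def _current(self):
        # Content that appears before the first heading goes into an untitled chapter.
        if not self.chapters:
            self.chapters.append({"title": "", "items": []})
        return self.chapters[-1]

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.chapters.append({"title": "", "items": []})
            self._tag, self._buf = "h2", []
        elif tag == "p":
            self._tag, self._buf = "p", []
        elif tag == "img":
            src = dict(attrs).get("src") or ""
            self._current()["items"].append(("picture", src))

    def handle_data(self, data):
        if self._tag in ("h2", "p"):
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = "".join(self._buf).strip()
        if tag == "h2":
            self.chapters[-1]["title"] = text
        elif text:
            self._current()["items"].append(("text", text))
        self._tag = None


# Hypothetical travel note fragment.
html = """
<h2>Day 1</h2>
<p>We arrived in Nanjing in the morning.</p>
<img src="lake.jpg">
<h2>Day 2</h2>
<p>We walked along the city wall.</p>
"""
parser = TravelNoteParser()
parser.feed(html)
for chapter in parser.chapters:
    print(chapter["title"], chapter["items"])
```

Keeping text and pictures in one ordered list per chapter preserves the reading order that the later entity extraction steps rely on.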

3) Preprocessing the data acquired in step 2) by text recognition and analysis and object recognition and tracking:

3.1) Performing shot segmentation on the video using the shot detection software ShotDetect;

3.2) Sampling a frame every 0.5 seconds from each shot segmented in step 3.1), and recognizing the scene text in each frame using the text recognition software PaddleOCR;

3.3) De-duplicating the text recognized in step 3.2) within each shot, and storing it shot by shot to obtain the scene text;

3.4) Performing multi-class multi-object tracking on the video using the tracker CenterTrack;

3.5) Saving the object class and object bounding box in each frame from the tracking result of step 3.4);

3.6) Performing object recognition on the travel note pictures using Mask R-CNN;

3.7) Saving the object class and object bounding box from the recognition result of step 3.6);

3.8) Splitting each paragraph of each chapter of the travel note text into sentences;

3.9) Performing word segmentation on the sentences obtained in step 3.8);

3.10) Performing part-of-speech analysis based on the word segmentation result of step 3.9);

3.11) Performing named entity recognition based on the word segmentation result of step 3.9);

3.12) Performing dependency parsing based on the word segmentation result of step 3.9);

3.13) Splitting the scene text of each shot of the video into sentences;

3.14) Performing word segmentation on the sentences obtained in step 3.13);

3.15) Performing part-of-speech analysis based on the word segmentation result of step 3.14);

3.16) Performing named entity recognition based on the word segmentation result of step 3.14);

3.17) Performing dependency parsing based on the word segmentation result of step 3.14).
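Steps 3.2) and 3.3) can be illustrated without calling the detection or recognition software itself. The sketch below assumes that shot boundaries (from ShotDetect) and per-frame recognized text lines (from PaddleOCR) are already available as plain Python values; the function names are hypothetical, and exact string matching is only one possible de-duplication rule, since the source does not state how duplicates are detected.

```python
def sample_times(shot_start, shot_end, interval=0.5):
    """Timestamps in seconds at which frames are taken inside one shot."""
    times = []
    k = 0
    while shot_start + k * interval < shot_end:
        times.append(round(shot_start + k * interval, 3))
        k += 1
    return times


def dedupe_shot_text(frame_texts):
    """Merge the OCR lines of all frames in one shot, dropping exact repeats.

    frame_texts is a list with one entry per sampled frame, each entry being
    the list of text lines recognized in that frame. First-seen order is kept.
    """
    seen = set()
    unique = []
    for lines in frame_texts:
        for line in lines:
            key = line.strip()
            if key and key not in seen:
                seen.add(key)
                unique.append(key)
    return unique


# A shot from 10.0 s to 11.6 s is sampled at 0.5 s intervals.
times = sample_times(10.0, 11.6)
print(times)

# Stand-in OCR output for three sampled frames of the same shot.
shot_text = dedupe_shot_text([
    ["West Lake", "Ticket Office"],
    ["West Lake"],
    ["Ticket Office", "Exit"],
])
print(shot_text)
```

Because scene text usually persists across neighbouring frames, de-duplicating per shot keeps the stored scene text short before sentence splitting and word segmentation.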

4) Extracting semantic entities from the analyzed data obtained in step 3):

4.1) Constructing the mapping between the obtained cities and their corresponding scenic spots from the semi-structured data;

4.2) Taking each picture and each sentence of the travel note text as a semantic unit, in the order in which they appear in the travel note;

4.3) From the named entities tagged as place names in each sentence of step 4.2), selecting those that match the city-to-scenic-spot mapping, extracting them as city entities and scenic spot entities, and recording them as the nearest city or the nearest scenic spot;

4.4) From the named entities tagged as place names in each sentence of step 4.2), selecting those that do not match the city-to-scenic-spot mapping, and extracting them as place entities;

4.5) Selecting combinations of adjacent time words from each sentence of step 4.2) and extracting them as time entities;

4.6) Selecting verbs from each sentence of step 4.2) and extracting them as activity entities;

4.7) Selecting, from each sentence of step 4.2), the non-place nouns that have a dependency relation with verbs and prepositions, together with the objects recognized in the pictures, and extracting them as other entities;

4.8) Taking each shot of the video and each sentence of its recognized text as a semantic unit, in the time order of the video shots;

4.9) From the named entities tagged as place names in each sentence of step 4.8), selecting those that match the city-to-scenic-spot mapping, extracting them as city entities and scenic spot entities, and recording them as the nearest city or the nearest scenic spot;

4.10) From the named entities tagged as place names in each sentence of step 4.8), selecting those that do not match the city-to-scenic-spot mapping, and extracting them as place entities;

4.11) Selecting combinations of adjacent time words from each sentence of step 4.8) and extracting them as time entities;

4.12) Selecting verbs from each sentence of step 4.8) and extracting them as activity entities;

4.13) Selecting, from each sentence of step 4.8), the non-place nouns that have a dependency relation with verbs and prepositions, together with the objects tracked in the shot, and extracting them as other entities;

4.14) Computing the Levenshtein distance between the entities extracted in steps 4.2) to 4.13), and merging similar entities according to a set distance threshold.
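The merging in step 4.14) can be sketched as follows. The Levenshtein distance is computed with the standard dynamic-programming recurrence. The grouping strategy (each name joins the first group whose representative is within the threshold) and the threshold value of 1 are assumptions, since the source only specifies "a set distance threshold".

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def merge_similar(names, threshold=1):
    """Greedy grouping: compare each name with the first member of every group."""
    groups = []
    for name in names:
        for group in groups:
            if levenshtein(name, group[0]) <= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Illustrative entity names with an OCR-style spacing variant.
groups = merge_similar(["West Lake", "WestLake", "Xuanwu Lake", "West Lake"])
print(groups)
```

Greedy grouping depends on input order; a clustering method over the full distance matrix would be an alternative when entity names are numerous and noisy.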

5) Mining the relations between the entities obtained in step 4) from the analyzed data obtained in step 3), and forming the knowledge graph:

5.1) Constructing an affiliation relation between each extracted scenic spot entity and the nearest city entity;

5.2) Constructing an affiliation relation between each extracted place entity and the nearest city entity;

5.3) Constructing an affiliation relation between each extracted place entity and the nearest scenic spot entity;

5.4) Constructing an occurrence relation between each extracted activity entity and the nearest scenic spot entity;

5.5) Constructing an occurrence relation between the extracted activity entities and place entities according to the dependency relations;

5.6) Constructing an occurrence-time relation between the extracted activity entities and time entities according to the dependency relations;

5.7) Constructing an occurrence-time relation between the extracted activity entities and time entities according to the dependency relations;

5.8) Constructing a position-proximity relation between the extracted scenic spot entities and place entities according to keywords and the dependency relations;

5.9) Constructing utilization, arrival and departure relations between the extracted other entities and the extracted place, scenic spot and city entities according to keywords and the dependency relations;

5.10) Constructing relations between the extracted other entities based on the dependency relations or the semantic order.
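The structural relations of steps 5.1) to 5.4) depend on the "nearest" city and scenic spot recorded while reading the semantic units in order. A minimal sketch of that bookkeeping is shown below. The relation labels `belongs_to` and `occurs_at`, the tuple representation and the sample entities are assumptions; the dependency-based relations of steps 5.5) to 5.10) are not covered because they require parser output.

```python
def mine_structural_relations(units):
    """units: semantic units in reading order, each a list of (name, type).

    Returns (head, label, tail) triples for steps 5.1) to 5.4).
    """
    nearest_city = None
    nearest_spot = None
    triples = []
    for entities in units:
        for name, etype in entities:
            if etype == "city":
                nearest_city = name
            elif etype == "scenic_spot":
                nearest_spot = name
                if nearest_city:
                    triples.append((name, "belongs_to", nearest_city))  # 5.1)
            elif etype == "place":
                if nearest_city:
                    triples.append((name, "belongs_to", nearest_city))  # 5.2)
                if nearest_spot:
                    triples.append((name, "belongs_to", nearest_spot))  # 5.3)
            elif etype == "activity" and nearest_spot:
                triples.append((name, "occurs_at", nearest_spot))       # 5.4)
    return triples


# Illustrative units: three consecutive sentences of a travel note.
units = [
    [("Nanjing", "city")],
    [("Xuanwu Lake", "scenic_spot"), ("boating", "activity")],
    [("lakeside teahouse", "place")],
]
triples = mine_structural_relations(units)
for triple in triples:
    print(triple)
```

Entities are processed in the order given, so an activity listed before a scenic spot in the same unit attaches to the previously seen scenic spot; this is a simplification of the sketch.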

The knowledge graph constructed by the invention integrates unstructured data, which helps to support tourism service applications such as retrieval and recommendation more comprehensively when the data volume is massive.

Claims

1. A knowledge graph construction method based on multi-modal tourism big data, characterized in that entity extraction and heterogeneous relation mining are performed on multi-modal tourism data so as to construct a knowledge graph for retrieval and display, the modalities comprising text, pictures and video, and the construction method comprising the following steps:

relation mining, namely mining the relations between the extracted entities according to data structure relations and syntactic dependency relations, so as to construct the tourism knowledge graph.

2. The knowledge graph construction method based on the multi-modal tourist big data as claimed in claim 1, characterized by comprising the following steps:

3) Preprocessing the data acquired in step 2): performing word segmentation, part-of-speech analysis and dependency analysis on the text in the travel note data, performing object recognition on the pictures in the travel note data, performing object tracking and recognition and scene text recognition on the videos, and performing word segmentation, part-of-speech analysis and dependency analysis on the recognized scene text;

4) Extracting semantic entities, in combination with the semi-structured data, from the analyzed travel note text, the scene text recognized in the videos, the objects recognized in the travel note pictures and the objects tracked in the videos;

3. The knowledge graph construction method based on multi-modal tourism big data according to claim 1 or 2, characterized in that the multi-modal data acquisition specifically comprises:

1) Acquiring semi-structured data on cities and their corresponding scenic spots from a tourism vertical website;

2) Acquiring unstructured travel note data from a tourism vertical website, and recording the chapter structure of the text according to the HTML tags;

3) Acquiring unstructured tourism video data from a video website.

4. The knowledge graph construction method based on multi-modal tourism big data according to claim 1 or 2, characterized in that, in the data preprocessing, picture object recognition is performed on the travel note data as follows:

1) Performing object recognition on the travel note pictures using Mask R-CNN;

2) Saving the object class and object bounding box from the object recognition result.

5. The knowledge graph construction method based on multi-modal tourism big data according to claim 1 or 2, characterized in that, in the data preprocessing, the scene text of the video is obtained as follows:

1) Performing shot segmentation on the video using ShotDetect;

2) Sampling a frame every 0.5 seconds from each segmented shot, and recognizing the scene text using PaddleOCR;

3) De-duplicating the scene text recognized within each shot, and storing it shot by shot to obtain the scene text.

6. The knowledge graph construction method based on multi-modal tourism big data according to claim 1 or 2, characterized in that, in the data preprocessing, object recognition and tracking on the video specifically comprises the following steps:

1) Performing multi-class multi-object tracking on the video using CenterTrack;

2) Saving the object class and object bounding box in each frame from the tracking result.

7. The knowledge graph construction method based on multi-modal tourism big data according to claim 1 or 2, characterized in that, in the data preprocessing, the text recognition and analysis comprises word segmentation, part-of-speech analysis and dependency analysis, specifically as follows:

1) Splitting each paragraph of each chapter of the travel note text into sentences, or splitting the scene text of each shot of the video into sentences;

2) Performing word segmentation on the resulting sentences;

3) Performing part-of-speech analysis based on the word segmentation result;

4) Performing named entity recognition based on the word segmentation result;

5) Performing dependency parsing based on the word segmentation result.

8. The knowledge graph construction method based on multi-modal tourism big data according to claim 2, characterized in that, in the entity extraction, semantic entities are extracted from the travel note data as follows:

1) Constructing the mapping between the obtained cities and their corresponding scenic spots from the semi-structured data;

2) Taking each picture and each sentence of the text as a semantic unit, in the order in which they appear in the travel note data;

3) From the named entities tagged as place names in each sentence of the travel note text, selecting those that match the city-to-scenic-spot mapping, extracting them as city entities and scenic spot entities, and recording them as the nearest city or the nearest scenic spot;

4) From the named entities tagged as place names in each sentence of the travel note text, selecting those that do not match the city-to-scenic-spot mapping, and extracting them as place entities;

5) Selecting combinations of adjacent time words from each sentence of the travel note text and extracting them as time entities;

6) Selecting verbs from each sentence of the travel note text and extracting them as activity entities;

7) Selecting, from each sentence of the travel note text, the non-place nouns that have a dependency relation with verbs and prepositions, together with the objects recognized in the travel note pictures, and extracting them as other entities;

8) Calculating the Levenshtein distance between the entities extracted in steps 3) to 7), and merging similar entities according to a set threshold.

9. The knowledge graph construction method based on multi-modal tourism big data according to claim 2, characterized in that, in the entity extraction, semantic entities are extracted from the video as follows:

2) Taking each shot of the video and each recognized sentence in the corresponding shot as a semantic unit, in the time order of the video shots;

3) From the named entities tagged as place names in each recognized sentence, selecting those that match the city-to-scenic-spot mapping, extracting them as city entities and scenic spot entities, and recording them as the nearest city or the nearest scenic spot;

4) From the named entities tagged as place names in each recognized sentence, selecting those that do not match the city-to-scenic-spot mapping, and extracting them as place entities;

5) Selecting combinations of adjacent time words from each recognized sentence and extracting them as time entities;

6) Selecting verbs from each recognized sentence and extracting them as activity entities;

7) Selecting, from each recognized sentence, the non-place nouns that have a dependency relation with verbs and prepositions, together with the objects tracked in the shot, and extracting them as other entities;

8) Calculating the Levenshtein distance between the entities extracted in steps 3) to 7), and merging similar entities according to a set threshold.

10. The knowledge graph construction method based on multi-modal tourism big data according to claim 8 or 9, characterized in that the entity relation mining is heterogeneous relation mining, specifically:

1) Constructing a relation between each extracted scenic spot entity and the nearest city entity;

2) Constructing a relation between each extracted place entity and the nearest city entity;

3) Constructing a relation between each extracted place entity and the nearest scenic spot entity;

4) Constructing an occurrence relation between each extracted activity entity and the nearest scenic spot entity;

5) Constructing an occurrence relation between the extracted activity entities and place entities according to the dependency relations;

6) Constructing an occurrence-time relation between the extracted activity entities and time entities according to the dependency relations;

7) Constructing an occurrence-time relation between the extracted activity entities and time entities according to the dependency relations;

8) Constructing a position-proximity relation between the extracted scenic spot entities and place entities according to keywords and the dependency relations;

9) Constructing utilization, arrival and departure relations between the extracted other entities and the extracted place, scenic spot and city entities according to keywords and the dependency relations;

10) Constructing relations between the extracted other entities based on the dependency relations or the semantic order.