CN110532544B

CN110532544B - Method and system for constructing a low-resource language tourism field knowledge base

Info

Publication number: CN110532544B
Application number: CN201910650742.1A
Authority: CN
Inventors: 赵小兵; 冯小兰
Original assignee: Minzu University of China
Current assignee: Minzu University of China
Priority date: 2019-07-18
Filing date: 2019-07-18
Publication date: 2023-03-24
Anticipated expiration: 2039-07-18
Also published as: CN110532544A

Abstract

The invention provides a method and a system for constructing a tourism field knowledge base for a low-resource language, and relates to the field of computers. The method comprises constructing a Chinese tourism field knowledge base containing a plurality of knowledge triples, constructing a Chinese-to-low-resource-language dictionary for the tourism field, and translating the triples in the Chinese tourism field knowledge base into the low-resource language by means of that dictionary, thereby constructing the low-resource language tourism field knowledge base. Because the triples are built from resource-rich Chinese tourism corpora, a comprehensive Chinese tourism field knowledge base is obtained first and is then migrated to the low-resource language. This addresses the technical problem that comprehensive scenic spot knowledge in the low-resource language is difficult to obtain directly because tourism corpora in that language are scarce on the network, achieves the aim of migrating knowledge from a resource-rich language to a low-resource language, and facilitates intelligent services such as tourism information services in the low-resource language.

Description

Low-resource character tourism field knowledge base construction method and system

Technical Field

The invention relates to the technical field of computers, in particular to a method and a system for constructing a low-resource character tourism field knowledge base.

Background

Tourism has become one of the most important forms of leisure and recreation. With the rapid development of the internet, more and more Chinese travel websites have emerged, providing tourists with rich travel information. Chinese travel websites carry a large amount of information, and their scenic spot introductions are long and cover many kinds of information. By contrast, intelligent tourism information services have not yet been realized for low-resource languages. How to build a knowledge base for a low-resource language with the help of a resource-rich language has therefore become an important research topic in natural language processing.

However, because tourism corpora in low-resource languages are scarce on the network, comprehensive scenic spot knowledge in a low-resource language is not easy to obtain directly, which makes constructing such a knowledge base difficult.

Disclosure of Invention

(I) Technical problem to be solved

In view of the deficiencies of the prior art, the invention provides a method for constructing a low-resource language tourism field knowledge base, which addresses the technical problems that comprehensive scenic spot knowledge in a low-resource language is difficult to obtain directly, and that the knowledge base is therefore difficult to construct, because tourism corpora in the low-resource language are scarce on the network.

(II) Technical scheme

In order to realize the purpose, the invention is realized by the following technical scheme:

the invention discloses a method for constructing a low-resource character tourism field knowledge base, which is executed by a computer and comprises the following steps of:

S1, constructing a Chinese tourism field knowledge base containing a plurality of knowledge triples;

S2, obtaining a Chinese corpus and a low-resource language corpus, and preprocessing both corpora;

S3, obtaining Chinese word vectors X from the preprocessed Chinese corpus, and obtaining low-resource language word vectors Y from the preprocessed low-resource language corpus;

S4, obtaining, based on the MUSE model, a linear mapping matrix T from the Chinese word vectors X to the low-resource language word vectors Y, and multiplying the linear mapping matrix T by the Chinese word vector matrix X to obtain the vectors U that result from mapping X into the space of Y;

S5, for each Chinese word, finding among the low-resource language word vectors Y the k words whose vectors have the closest cosine distance to that word's vector in U, and taking these k words as the translation candidate set for the Chinese word, so as to construct a Chinese-to-low-resource-language dictionary for the tourism field;

S6, translating the knowledge triples in the Chinese tourism field knowledge base into the low-resource language based on the Chinese-to-low-resource-language dictionary for the tourism field, and constructing the low-resource language tourism field knowledge base.

Preferably, S1 specifically includes:

S101, acquiring a text corpus of Chinese tourism texts;

S102, training on the text corpus based on a Word2Vec model to obtain a word vector model and a part-of-speech vector model;

S103, obtaining a position vector of each word in a sentence based on the sentences in the text corpus;

S104, obtaining word vectors based on the word vector model, obtaining part-of-speech vectors based on the part-of-speech vector model, adding the word vectors and the part-of-speech vectors as matrices to obtain word vectors fused with the part-of-speech vectors, and then fusing the position vectors to obtain multi-feature word vectors;

S105, inputting the multi-feature word vectors into a relation extraction model to obtain a probability distribution over entity relations;

S106, determining the entity relation between two entities according to the probability distribution, constructing knowledge triples based on the entity relation, storing the triples in structured form in a database, and thereby constructing the Chinese tourism field knowledge base.

Preferably, the step S101 specifically includes:

obtaining tourism texts by crawler technology and preprocessing them, the preprocessed tourism texts forming the text corpus, wherein the preprocessing comprises sentence segmentation, word segmentation and part-of-speech tagging.

Preferably, the step S103 specifically includes:

for each word $s_i$ in a sentence word sequence $S = [s_1, s_2, \ldots, s_l]$ of length $l$, the relative distances to a first entity $e_1$ and a second entity $e_2$ are $i - i_1$ and $i - i_2$, the first entity $e_1$ and the second entity $e_2$ both being the target entities, wherein $i$ is the index of the current word in the sentence, $i_1$ and $i_2$ are the indices of the first entity $e_1$ and the second entity $e_2$ respectively, and a negative value indicates that the current word precedes the entity word; a $2l \times d$ position vector matrix is then obtained by Word2vec initialization, wherein $d$ is the dimension of the position vector; the position vector of each word in the sentence is represented as $pv_i = [pv_{i1}, pv_{i2}]$, where $pv_{i1}$ and $pv_{i2}$ are the vector representations of the relative distances from the $i$-th word in the sentence to entity $e_1$ and entity $e_2$ respectively.

Preferably, the step S104 specifically includes:

performing matrix addition on the word vectors $N = \{N_1, N_2, \ldots, N_l\}$ and the part-of-speech vectors $V = \{V_1, V_2, \ldots, V_l\}$ obtained from the trained word vector model and part-of-speech vector model, to obtain word vectors fused with the part-of-speech vectors, represented as $W = \alpha N + (1 - \alpha) V$ with $\alpha = 0.5$, that is, $W_2 = 0.5\,(N + V)$ after the part-of-speech vectors are fused; and then fusing the position vectors to obtain the multi-feature word vectors $W_3 = [W_2, PV]$, wherein $PV = \{pv_1, pv_2, \ldots, pv_l\}$.

Preferably, the step S105 specifically includes:

taking the multi-feature word vectors W3 as the input of the relation extraction model, and processing them with a bidirectional LSTM (BLSTM) to obtain the semantic information of the word sequence in the sentence in both the forward and the backward direction; the BLSTM output for the i-th word is calculated by the formula:

connecting the output of the BLSTM layer with a softmax classifier to obtain probability distribution of entity relations;

the entity relations comprise 9 types: position relations, establishing relations, creating relations, belonging relations, proximity relations, correlation relations, inclusion relations, equivalence relations and attribution relations.

Preferably, the step S106 specifically includes:

and judging the entity relationship between two entities according to the probability distribution, constructing the triple knowledge < the first entity e1, the second entity e2 and the entity relationship > according to the entity relationship between the two entities, and acquiring a plurality of triple knowledge by processing the text corpus so as to construct a Chinese tourism field knowledge base.

Preferably, the step S2 specifically includes:

a Chinese corpus and a low-resource language corpus are obtained by crawler technology, useless information is removed so that only the text of each article is retained, and word segmentation and stop-word removal are then carried out.

Preferably, the step S3 specifically includes:

and training the preprocessed Chinese corpus by adopting a fastText word vector model to obtain a corresponding Chinese word vector X, and training the preprocessed low-resource text corpus by adopting a fastText word vector model to obtain a corresponding low-resource text word vector Y.

The invention also provides a low-resource character tourism field knowledge base construction system, which comprises a computer, wherein the computer comprises:

at least one memory unit;

at least one processing unit;

wherein the at least one memory unit has stored therein at least one instruction that is loaded and executed by the at least one processing unit to perform the steps of:

S3, obtaining Chinese word vectors X from the preprocessed Chinese corpus, and obtaining low-resource language word vectors Y from the preprocessed low-resource language corpus;

S4, obtaining, based on the MUSE model, a linear mapping matrix T from the Chinese word vectors X to the low-resource language word vectors Y, and multiplying the linear mapping matrix T by the Chinese word vector matrix X to obtain the vectors U that result from mapping X into the space of Y;

S5, for each Chinese word, finding among the low-resource language word vectors Y the k words whose vectors have the closest cosine distance to that word's vector in U, and taking these k words as the translation candidate set for the Chinese word, so as to construct a Chinese-to-low-resource-language dictionary for the tourism field;

(III) Advantageous effects

The invention provides a method and a system for constructing a low-resource character tourism field knowledge base. Compared with the prior art, the method has the following beneficial effects:

The method constructs a Chinese tourism field knowledge base containing a plurality of knowledge triples and a Chinese-to-low-resource-language dictionary for the tourism field, and translates the triples in the Chinese tourism field knowledge base into the low-resource language by means of that dictionary, thereby constructing the low-resource language tourism field knowledge base. Because the triples are built from resource-rich Chinese tourism corpora, a comprehensive Chinese tourism field knowledge base is obtained first and is then migrated to the low-resource language. This addresses the technical problem that comprehensive scenic spot knowledge in the low-resource language is difficult to obtain directly because tourism corpora in that language are scarce on the network, achieves the aim of migrating knowledge from a resource-rich language to a low-resource language, and facilitates intelligent services such as tourism information services in the low-resource language.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a block diagram of a method for constructing a low-resource language tourism field knowledge base according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating a method for calculating a distance from each word in a sentence to a target entity according to an embodiment of the present invention;

FIG. 3 is a block diagram of a relationship extraction model in an embodiment of the invention;

FIG. 4 is a schematic structural diagram of the word vector mapping in an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are clearly and completely described, and it is obvious that the described embodiments are a part of the embodiments of the present invention, but not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the application provides a method and a system for constructing a low-resource character tourism field knowledge base, so that the technical problems that comprehensive scenic spot knowledge contents of low-resource characters are not easy to directly obtain and the knowledge base is difficult to construct due to the fact that the low-resource character tourism field in a network is deficient in language materials are solved, and the aim of migrating rich language knowledge to the low-resource language field is achieved.

In order to solve the technical problems, the general idea of the embodiment of the application is as follows:

In the embodiment of the invention, a Chinese tourism field knowledge base containing a plurality of knowledge triples is constructed, and a Chinese-to-low-resource-language dictionary for the tourism field is constructed; the triples in the Chinese tourism field knowledge base are then translated into the low-resource language by means of that dictionary, so as to construct the low-resource language tourism field knowledge base. Because the triples are built from resource-rich Chinese tourism corpora, a comprehensive Chinese tourism field knowledge base is obtained first and is then migrated to the low-resource language. This addresses the technical problem that comprehensive scenic spot knowledge in the low-resource language is difficult to obtain directly because tourism corpora in that language are scarce on the network, achieves the aim of migrating knowledge from a resource-rich language to a low-resource language, and facilitates intelligent services such as tourism information services in the low-resource language.

In order to better understand the technical scheme, the technical scheme is described in detail in the following with reference to the attached drawings of the specification and specific embodiments.

The embodiment of the invention provides a method for constructing a low-resource character tourism field knowledge base, which is executed by a computer and comprises the following steps of:

S4, obtaining, based on the MUSE model, a linear mapping matrix T from the Chinese word vectors X to the low-resource language word vectors Y, and multiplying the linear mapping matrix T by the Chinese word vector matrix X to obtain the vectors U that result from mapping X into the space of Y;

S5, for each Chinese word, finding among the low-resource language word vectors Y the k words whose vectors have the closest cosine distance to that word's vector in U, and taking these k words as the translation candidate set for the Chinese word, so as to construct a Chinese-to-low-resource-language dictionary for the tourism field;

The steps are described in detail below.

Note that Tibetan is used below as the example of a low-resource language.

S1, constructing a Chinese travel field knowledge base containing a plurality of triple knowledge. The method specifically comprises the following steps:

S101, acquiring a text corpus of Chinese tourism texts. In the specific implementation process, tourism texts are obtained by crawler technology and preprocessed, and the preprocessed texts form the text corpus; the preprocessing comprises sentence segmentation, word segmentation and part-of-speech tagging. Here, part of speech refers to categories such as noun, verb and adverb. In the embodiment of the present invention these categories are subdivided further; for example, names are divided into person names, place names, organization names, transliterated names and so on.

For example: The Lacumin temple is located in Jiangzi County in the Shigatse region.

The word segmentation result is as follows: The Lacumin temple is located in Jiangzi County in the Shigatse region.

The word segmentation and part-of-speech tagging result is: the Lacumin temple/na is located in/v Shigatse region/ns Jiangzi County/ns county/s ./wj

Wherein: ns represents a place name; na represents a scenic spot name; v represents a verb; s represents a locative word; wj represents punctuation.
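The tagged output above can be consumed programmatically. The following Python sketch is only an illustration: the function names are hypothetical, the token strings are placeholders rather than real corpus data, and treating the tags `na` and `ns` as entity candidates is an assumption, not something the text states.

```python
from typing import List, Tuple

# Tag meanings as listed in the text.
TAG_NAMES = {
    "ns": "place name",
    "na": "scenic spot name",
    "v": "verb",
    "s": "locative word",
    "wj": "punctuation",
}

# Assumption: scenic spot names and place names are the entity candidates.
ENTITY_TAGS = {"na", "ns"}


def parse_tagged(tokens: List[str]) -> List[Tuple[str, str]]:
    """Split each 'word/tag' token into a (word, tag) pair at the last slash."""
    pairs = []
    for token in tokens:
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def entity_candidates(pairs: List[Tuple[str, str]]) -> List[str]:
    """Keep only the words whose tag marks a possible entity."""
    return [word for word, tag in pairs if tag in ENTITY_TAGS]


# Placeholder tokens that mirror the tag sequence na, v, ns, ns, s, wj.
tagged = ["Temple_A/na", "located_in/v", "Region_B/ns", "County_C/ns", "within/s", "./wj"]
pairs = parse_tagged(tagged)
entities = entity_candidates(pairs)
```

Splitting at the last slash keeps words that themselves contain a slash intact.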

S103, obtaining a position vector of each word in the sentence based on the distance from each word in the sentence in the text corpus to a target entity, wherein the target entity refers to the first entity e1 and the second entity e2.

In the specific implementation process, the distance from each word in a sentence to the target entities is calculated, which helps distinguish different relation instances within the same sentence. Specifically, for each word $s_i$ in a sentence word sequence $S = [s_1, s_2, \ldots, s_l]$ of length $l$, the relative distances to the first entity $e_1$ and the second entity $e_2$ are $i - i_1$ and $i - i_2$, where $i$ is the index of the current word in the sentence, $i_1$ and $i_2$ are the indices of $e_1$ and $e_2$ respectively, and a negative value indicates that the current word precedes the entity word. As shown in FIG. 2, the sentence "Qiaga Qude Temple was built at the end of the 16th century and belongs to the Gelug school." yields a word sequence of length 8 after word segmentation, in which the word immediately after the first entity has a relative distance of 1 to the first entity $e_1$ (Qiaga Qude Temple) and of $-5$ to the second entity $e_2$ (the Gelug school). A $2l \times d$ position vector matrix is then obtained by Word2vec initialization, where $d$ is the dimension of the position vector. Finally, the position vector of each word in the sentence is $pv_i = [pv_{i1}, pv_{i2}]$, where $pv_{i1}$ and $pv_{i2}$ are the vector representations of the relative distances from the $i$-th word in the sentence to entity $e_1$ and entity $e_2$ respectively.
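The relative-distance calculation can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, word indices are 1-based, and the entity positions used below (first and seventh word) are an assumption chosen to be consistent with the offsets 1 and -5 in the eight-word example. The row-index convention for the position vector matrix is likewise an assumption.

```python
from typing import List, Tuple


def relative_positions(length: int, i1: int, i2: int) -> List[Tuple[int, int]]:
    """For each 1-based word index i in a sentence of `length` words,
    return (i - i1, i - i2): the offsets to entities e1 and e2.
    A negative offset means the word precedes the entity."""
    return [(i - i1, i - i2) for i in range(1, length + 1)]


def position_row(offset: int, length: int) -> int:
    """Map an offset in [-(length - 1), length - 1] to a non-negative row
    index, so that a table with 2 * length rows can hold every offset."""
    if abs(offset) >= length:
        raise ValueError("offset out of range for this sentence length")
    return offset + length - 1


# Eight-word sentence; e1 assumed at index 1 and e2 at index 7.
offsets = relative_positions(8, i1=1, i2=7)
second_word = offsets[1]  # the word immediately after e1
rows = [(position_row(a, 8), position_row(b, 8)) for a, b in offsets]
```

Each word thus yields two row indices, one per entity, which select the two distance vectors that make up its position vector.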

S104, word vectors are obtained based on the word vector model, part-of-speech vectors are obtained based on the part-of-speech vector model, the word vectors and the part-of-speech vectors are added as matrices to obtain word vectors fused with the part-of-speech vectors, and the position vectors are then fused to obtain multi-feature word vectors.

In a specific implementation process, the word vectors $N = \{N_1, N_2, \ldots, N_l\}$ and the part-of-speech vectors $V = \{V_1, V_2, \ldots, V_l\}$ are obtained from the trained word vector model and part-of-speech vector model, and N and V are added as matrices to obtain the part-of-speech-fused word vector representation $W = \alpha N + (1 - \alpha) V$, where $0 \le \alpha \le 1$. When $\alpha = 1$, $W_1 = N$, that is, the word vector without part-of-speech information; with $\alpha = 0.5$, $W_2 = 0.5\,(N + V)$ is the word vector after the part-of-speech vectors are fused. The position vectors are then fused to obtain the multi-feature word vectors $W_3 = [W_2, PV]$, where $PV = \{pv_1, pv_2, \ldots, pv_l\}$.
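A minimal Python sketch of the two fusion steps for a single word follows. The function names and the toy vectors are hypothetical. The sketch assumes that the word vector and the part-of-speech vector have the same dimension, which the weighted addition requires, and that fusing the position vector means concatenation.

```python
from typing import List

Vector = List[float]


def fuse_part_of_speech(n: Vector, v: Vector, alpha: float = 0.5) -> Vector:
    """Weighted element-wise sum W = alpha * N + (1 - alpha) * V."""
    if len(n) != len(v):
        raise ValueError("word and part-of-speech vectors must have equal length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(n, v)]


def fuse_position(w2: Vector, pv: Vector) -> Vector:
    """Concatenation W3 = [W2, PV]."""
    return w2 + pv


n = [1.0, 0.0]             # toy word vector
v = [0.0, 1.0]             # toy part-of-speech vector
pv = [0.5, -0.5, 1.0, 0.0]  # toy position vector (two distance vectors, d = 2)

w1 = fuse_part_of_speech(n, v, alpha=1.0)  # no part-of-speech information
w2 = fuse_part_of_speech(n, v)             # alpha = 0.5
w3 = fuse_position(w2, pv)
```

With alpha = 0.5 the two sources contribute equally, and the concatenation leaves the fused vector with the word dimension plus 2d entries.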

In a specific implementation process, the relation extraction model comprises a BLSTM layer, a fully connected layer and a softmax classifier, as shown in FIG. 3. The multi-feature word vectors are taken as the input of the relation extraction model and processed by a bidirectional LSTM to obtain the semantic information of the word sequence in the sentence in both the forward and the backward direction. The BLSTM output for the i-th word is calculated by the formula:

connecting the output of the BLSTM layer with a softmax classifier to obtain the probability distribution of the entity relationship;
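The formula referred to above is not reproduced in this text. A common way to write the BLSTM output and the softmax step is given here as an assumption, not as the patent's own notation: $\overrightarrow{h_i}$ and $\overleftarrow{h_i}$ denote the forward and backward hidden states for the i-th word, $\oplus$ denotes concatenation, $h$ is the sentence representation passed on by the fully connected layer, $r$ is an entity relation type, and $W_o$ and $b_o$ are hypothetical classifier parameters.

```latex
\[
h_i = \left[\, \overrightarrow{h_i} \oplus \overleftarrow{h_i} \,\right],
\qquad
p(r \mid S) = \mathrm{softmax}\left( W_o\, h + b_o \right)
\]
```

Under this formulation the predicted entity relation is the type $r$ with the highest probability $p(r \mid S)$.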

The method specifically comprises the following steps: the entity relationship between the two entities is determined according to the probability distribution, and the triple knowledge < the first entity e1, the second entity e2 and the entity relationship > is constructed according to the entity relationship between the two entities, for example, in a sentence "the temple of the north grotto is a cave temple of the Gansu province", the triple knowledge < the temple of the north grotto, the Gansu province and the position relationship > can be constructed. And acquiring a plurality of triple knowledge by processing the text corpus so as to construct a Chinese tourism field knowledge base.
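Storing the triples in a database can be sketched with Python's built-in `sqlite3` module. The table layout, the function names and the choice of SQLite are assumptions made for illustration; the text only states that the triples are stored in structured form in a database.

```python
import sqlite3
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (first entity e1, second entity e2, entity relation)


def build_knowledge_base(triples: Iterable[Triple]) -> sqlite3.Connection:
    """Create an in-memory triple table and insert the triples, skipping duplicates."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples ("
        "e1 TEXT NOT NULL, e2 TEXT NOT NULL, relation TEXT NOT NULL, "
        "UNIQUE (e1, e2, relation))"
    )
    conn.executemany("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", triples)
    conn.commit()
    return conn


def relations_of(conn: sqlite3.Connection, entity: str) -> List[Tuple[str, str]]:
    """Return (e2, relation) pairs for every triple whose first entity is `entity`."""
    cursor = conn.execute(
        "SELECT e2, relation FROM triples WHERE e1 = ? ORDER BY e2", (entity,)
    )
    return cursor.fetchall()


kb = build_knowledge_base([
    ("North Grotto Temple", "Gansu Province", "position relation"),
    ("North Grotto Temple", "Gansu Province", "position relation"),  # duplicate, ignored
])
found = relations_of(kb, "North Grotto Temple")
```

The uniqueness constraint keeps the knowledge base free of repeated triples when the same fact is extracted from several sentences.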

S2, obtaining a Chinese corpus and a Tibetan corpus, and preprocessing both corpora.

In the specific implementation process, the Chinese corpus and the Tibetan corpus are obtained by crawler technology, useless information is removed so that only the text of each article is retained, and preprocessing such as word segmentation and stop-word removal is then performed.
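The cleaning and stop-word steps can be sketched as follows. This is a simplified illustration with hypothetical function names: markup is stripped with a regular expression, and whitespace splitting stands in for a real Chinese or Tibetan word segmenter, which the text does not name.

```python
import re
from html import unescape
from typing import Iterable, List, Set

TAG_RE = re.compile(r"<[^>]+>")


def extract_text(page: str) -> str:
    """Remove markup and collapse whitespace, keeping only the article text."""
    text = unescape(TAG_RE.sub(" ", page))
    return re.sub(r"\s+", " ", text).strip()


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Drop every token that appears in the stop-word list."""
    return [token for token in tokens if token not in stop_words]


page = "<div><p>the temple is located in the county</p></div>"
text = extract_text(page)
tokens = text.split()  # stand-in for a real word segmenter
filtered = remove_stop_words(tokens, {"the", "is", "in"})
```

In practice the stop-word list and the segmenter would be specific to each language.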

S3, acquiring a corresponding Chinese word vector X based on the preprocessed Chinese corpus, and acquiring a corresponding Tibetan word vector Y based on the preprocessed Tibetan corpus;

in a specific implementation process, the preprocessed Chinese corpus is trained by adopting a fastText word vector model to obtain a corresponding Chinese word vector X, and the preprocessed Tibetan corpus is trained by adopting a fastText word vector model to obtain a corresponding Tibetan word vector Y;

S4, obtaining, based on the MUSE model, a linear mapping matrix T from the Chinese word vectors X to the Tibetan word vectors Y, and multiplying T by the Chinese word vector matrix X to obtain the vectors U that result from mapping X into the Tibetan word vector space;

S5, for each Chinese word, finding among the Tibetan word vectors Y the k Tibetan words whose vectors have the closest cosine distance to that word's vector in U, and taking these k words as the translation candidate set for the Chinese word, so as to construct a Chinese-Tibetan dictionary for the tourism field.

For example, as shown in FIG. 4, U (U = TX) denotes the Chinese word vectors after mapping into the Tibetan word vector space, Y denotes the Tibetan word vector space, x is a Chinese word, and $V_x$ is the word vector corresponding to x. The k Tibetan words in the Tibetan word vector space Y whose vectors are closest in cosine distance to $V_x$ are found and taken as the candidate set y1, y2, ..., yk for translating the Chinese word x into Tibetan. For example, when k = 5, after Chinese-Tibetan cross-lingual word vector mapping of the Chinese word "sun county", the five closest Tibetan words (Tibetan script not reproduced here) are, in order, those glossed as "heliostat county", "day zong", "sun-day", "Japanese Kate" and "middle side", and the Tibetan translation of "sun county" can be selected from this candidate set.
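The mapping and candidate search of steps S4 and S5 can be sketched in plain Python. This is an illustration under stated assumptions: the matrix T below is hand-written rather than learned by MUSE, the target words are placeholders instead of real Tibetan words, and all function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def apply_mapping(t: List[Vector], x: Vector) -> Vector:
    """Compute u = T x, mapping a source word vector into the target space."""
    return [sum(t_ij * x_j for t_ij, x_j in zip(row, x)) for row in t]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def top_k_candidates(u: Vector, targets: Dict[str, Vector], k: int) -> List[str]:
    """Return the k target words whose vectors are closest to u by cosine."""
    ranked = sorted(targets, key=lambda w: cosine(u, targets[w]), reverse=True)
    return ranked[:k]


t = [[0.0, 1.0], [1.0, 0.0]]  # toy mapping that swaps the two coordinates
x = [1.0, 0.0]                # toy Chinese word vector
targets = {                   # placeholder target-language vocabulary
    "target_a": [0.0, 1.0],
    "target_b": [1.0, 1.0],
    "target_c": [1.0, 0.0],
}

u = apply_mapping(t, x)
candidates = top_k_candidates(u, targets, k=2)
```

Repeating the search for every Chinese word in the tourism vocabulary yields one candidate list per word, which together form the dictionary.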

S6, translating the knowledge triples in the Chinese tourism field knowledge base into Tibetan based on the Chinese-Tibetan dictionary for the tourism field, and constructing a Tibetan tourism field knowledge base.
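Step S6 can be sketched as a dictionary lookup over each triple. Several choices here are assumptions rather than statements from the text: the first candidate in each translation set is used, all three elements of a triple are translated, and triples with an untranslatable element are set aside instead of being stored partially translated. The dictionary entries are placeholder strings, not real Tibetan.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (first entity e1, second entity e2, entity relation)


def translate_triples(
    triples: List[Triple], dictionary: Dict[str, List[str]]
) -> Tuple[List[Triple], List[Triple]]:
    """Translate every element of each triple with its first dictionary candidate.
    Returns (translated triples, triples skipped for lack of a dictionary entry)."""
    translated: List[Triple] = []
    skipped: List[Triple] = []
    for triple in triples:
        candidates = [dictionary.get(term) for term in triple]
        if all(candidates):  # every term has a non-empty candidate list
            e1, e2, relation = (c[0] for c in candidates)
            translated.append((e1, e2, relation))
        else:
            skipped.append(triple)
    return translated, skipped


dictionary = {
    "North Grotto Temple": ["bo_term_1", "bo_term_1b"],
    "Gansu Province": ["bo_term_2"],
    "position relation": ["bo_term_3"],
}
source_triples = [
    ("North Grotto Temple", "Gansu Province", "position relation"),
    ("Unlisted Temple", "Gansu Province", "position relation"),
]
translated, skipped = translate_triples(source_triples, dictionary)
```

Keeping the skipped triples separate makes it easy to see which dictionary entries are still missing.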

The embodiment of the invention also provides a low-resource character tourism field knowledge base construction system, which comprises a computer, wherein the computer comprises:

at least one memory unit;

at least one processing unit;

S2, obtaining a Chinese corpus and a low-resource language corpus, and preprocessing both corpora;

In summary, compared with the prior art, the method has the following beneficial effects:

It should be noted that, through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A low-resource character tourism field knowledge base construction method is characterized in that the method is executed by a computer and comprises the following steps:

s1, constructing a Chinese tourism field knowledge base containing a plurality of triple knowledge, comprising the following steps:

s101, acquiring a text corpus of Chinese travel texts;

s102, training a text corpus based on a Word2Vec model to obtain a Word vector model and a part-of-speech vector model;

s106, judging the entity relationship between two entities according to the probability distribution of the entity relationship, constructing triple knowledge based on the entity relationship, structurally storing the entity triple knowledge in a database form, and constructing a Chinese tourism field knowledge base;

s3, acquiring a corresponding Chinese word vector X based on the preprocessed Chinese corpus, and acquiring a corresponding low-resource word vector Y based on the preprocessed low-resource word corpus;

S4, obtaining, based on the MUSE model, a linear mapping matrix T from the Chinese word vectors X to the low-resource language word vectors Y, and multiplying the linear mapping matrix T by the Chinese word vectors X to obtain the vectors U that result from mapping X into the space of Y;

S5, for each Chinese word, finding among the low-resource language word vectors Y the k words whose vectors have the closest cosine distance to that word's vector in U, and taking these k words as the translation candidate set for the Chinese word, so as to construct a Chinese-to-low-resource-language dictionary for the tourism field;

2. The method for constructing a low-resource literal tourism domain knowledge base as claimed in claim 1, wherein the step S101 is specifically:

the method comprises the steps of obtaining a tourism text through a crawler technology, preprocessing the tourism text, forming a text corpus by the preprocessed tourism text, wherein the preprocessing comprises sentence segmentation, word segmentation and part-of-speech tagging.

3. The method for constructing a low-resource literal tourist domain knowledge base according to claim 2, wherein said step S103 is specifically:

for each word $s_i$ in a sentence word sequence $S = [s_1, s_2, \ldots, s_l]$ of length $l$, the relative distances to a first entity $e_1$ and a second entity $e_2$ are $i - i_1$ and $i - i_2$, the first entity $e_1$ and the second entity $e_2$ both being target entities, wherein $i$ is the index of the current word in the sentence, $i_1$ and $i_2$ are the indices of the first entity $e_1$ and the second entity $e_2$ respectively, and a negative value indicates that the current word precedes the entity word; a $2l \times d$ position vector matrix is then obtained by Word2vec initialization, wherein $d$ is the dimension of the position vector; the position vector of each word in the sentence is represented as $pv_i = [pv_{i1}, pv_{i2}]$, where $pv_{i1}$ and $pv_{i2}$ are the vector representations of the relative distances from the $i$-th word in the sentence to entity $e_1$ and entity $e_2$ respectively.

4. The method for constructing a low-resource literal tourist domain knowledge base according to claim 3, wherein said step S104 specifically comprises:

5. The method for constructing a low-resource literal tourist domain knowledge base according to claim 4, wherein said step S105 specifically comprises:

the entity relations comprise 9 types, including position relations, establishing relations, creating relations, belonged relations, proximity relations, related relations, inclusion relations, equivalence relations and attribute relations.

6. The method for constructing a low-resource literal tourism domain knowledge base as claimed in claim 5, wherein the step S106 is specifically:

7. The method for constructing a low-resource literal tourist domain knowledge base according to claim 1, wherein the step S2 is specifically:

8. The method for constructing a low-resource literal tourism domain knowledge base as claimed in claim 1, wherein said step S3 is specifically:

9. A low-resource literal tourism domain knowledge base construction system, characterized in that the system comprises a computer, the computer comprises:

at least one memory unit;

at least one processing unit;

s1, constructing a Chinese tourism field knowledge base containing a plurality of triple knowledge, comprising the following steps: s101, acquiring a text corpus of Chinese travel texts;

S5, for each Chinese word, finding among the low-resource language word vectors Y the k words whose vectors have the closest cosine distance to that word's vector in U, and taking these k words as the translation candidate set for the Chinese word, so as to construct a Chinese-to-low-resource-language dictionary for the tourism field;