CN116719910A

CN116719910A - Text query method and system based on artificial intelligence technology

Info

Publication number: CN116719910A
Application number: CN202310987246.1A
Authority: CN
Inventors: 王祥凯; 张凯; 刘晓旭
Original assignee: Shandong Zhengyun Information Technology Co ltd
Current assignee: Shandong Zhengyun Information Technology Co ltd
Priority date: 2023-08-08
Filing date: 2023-08-08
Publication date: 2023-09-08

Abstract

The application discloses a text query method and a text query system based on artificial intelligence technology, relating to the field of text query. The method comprises the following steps: S1, collecting data of related texts; S2, automatically analyzing the collected foreign-language text; S3, analyzing the event type of the input Chinese query sentence. By automatically analyzing the collected foreign-language text, the method and system can effectively improve the model's event information extraction. A query auxiliary sentence is constructed through a bilingual mapping dictionary, and a shared encoder shortens the distance between the language of the query sentence and the language of the retrieved documents. An interactive attention mechanism then semantically aligns the query sentence with the query auxiliary sentence. Finally, an information fusion module fuses the event information, which improves the retrieval model's understanding of text semantics. Analyzing the two languages separately and then fusing them can ensure the accuracy of foreign-language document queries.

Description

Text query method and system based on artificial intelligence technology

Technical Field

The application relates to a text query technology, in particular to a text query method and a text query system based on an artificial intelligence technology.

Background

With the spread and development of communication and computer technology, more and more users rely on search engines for retrieval. Based on a natural language understanding platform, a search engine can understand human natural language to a certain extent, extract key content from it, and use that content for searching, so that the text as understood by the search engine matches, to a high degree, the query text the user intends to search for. A natural language understanding platform that can accurately understand the user's query text is therefore important.

The prior art includes natural language understanding platforms, for example a training mode for a natural language understanding system based on a crowdsourcing mechanism. Such a system provides a collaborative, interactive platform on which multiple developers jointly provide training data for training natural language understanding tasks.

However, in the process of implementing the application, the inventors worked with foreign-language documents and therefore needed to query within foreign-language text. Existing approaches to foreign-language text query mostly translate the query requirement first and then search the foreign-language text, so differences in translation easily introduce errors into the query.

Disclosure of Invention

The application aims to provide a text query method and a text query system based on artificial intelligence technology, so as to solve the problem in the prior art that foreign-language text queries are mostly performed by first translating the query requirement and then searching the foreign-language text, so that differences in translation easily introduce errors.

In order to achieve the above object, the present application provides the following technical solutions: a text query method and system based on artificial intelligence technology includes the following steps:

S1, collecting data of related texts;

S2, automatically analyzing the collected foreign-language text;

S3, analyzing the event type of the input Chinese query sentence;

S4, generating a corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;

S5, obtaining a cross-language feature representation of the query;

S6, fusing the event type information of the query sentence with the foreign-language retrieval document, and fusing the event role information of the foreign-language retrieval document with the query sentence;

S7, calculating the matching score of each retrieved document;

S8, displaying the matching query results for the text.

Further, the specific method for automatically analyzing the collected foreign text in the step S2 is as follows:

A1, vectorizing the text in two different ways, through a pre-trained BERT model and through character embedding;

A2, encoding each sentence of the text through a Bi-LSTM network to obtain a sentence-level context encoding;

A3, encoding through a Transformer encoder;

A4, carrying out joint learning on the encoder's vector representation of the text and the knowledge graph information constructed based on the information, to obtain a text vector representation integrated with the knowledge graph information;

A5, carrying out joint learning on the vector representation integrated with the knowledge graph information and the output features of the document-level encoding module, to obtain a document-level representation integrated with the knowledge graph information;

A6, enabling the model to further grasp key text information through a gating fusion mechanism;

A7, performing event role recognition and inference through a CRF layer to complete identification of the event role information.
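Step A6 names a gating fusion mechanism without stating its form. The following is a minimal NumPy sketch under an assumed, commonly used form: a sigmoid gate computed from the concatenated inputs, then a convex combination of the two feature matrices. The function name `gated_fusion` and the parameters `W` and `b` are hypothetical and are not taken from the application.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(doc_repr, sent_repr, W, b):
    """Fuse two (seq_len, dim) feature matrices with a learned gate.

    Assumed form (not given in the source):
        gate  = sigmoid([doc_repr; sent_repr] @ W + b)
        fused = gate * doc_repr + (1 - gate) * sent_repr
    """
    concat = np.concatenate([doc_repr, sent_repr], axis=-1)  # (seq_len, 2*dim)
    gate = sigmoid(concat @ W + b)                           # (seq_len, dim)
    return gate * doc_repr + (1.0 - gate) * sent_repr

# Toy inputs: random features stand in for encoder outputs.
rng = np.random.default_rng(0)
seq_len, dim = 4, 8
doc_repr = rng.normal(size=(seq_len, dim))    # document-level features
sent_repr = rng.normal(size=(seq_len, dim))   # sentence-level features
W = rng.normal(size=(2 * dim, dim)) * 0.1     # hypothetical gate weights
b = np.zeros(dim)
fused = gated_fusion(doc_repr, sent_repr, W, b)
```

Because the gate lies strictly between 0 and 1, each fused value lies between the two input values at that position, so the gate decides per feature how much of each source to keep.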

Further, the specific method for performing event type analysis on the input Chinese query sentence in step S3 is as follows:

B1, first encoding the foreign-language sentence through an encoder to obtain a foreign-language hidden-layer vector representation and a foreign-language sentence-level vector representation;

B2, fusing the Chinese word-level vectors with the foreign-language sentence-level vector;

B3, obtaining the Chinese hidden-layer vectors and the Chinese sentence-level vector through a shared encoder strategy;

B4, carrying out joint learning on the foreign-language hidden-layer vectors and the Chinese hidden-layer vectors through a cross attention network, to obtain a Chinese vector representation fused with foreign-language word-level information;

B5, carrying out joint learning on the Chinese vector representation and the Chinese dependency syntax information through a syntactic graph convolution module, to obtain a vector representation fused with the dependency syntax information;

B6, realizing a Chinese semantic representation based on foreign-language event type information through a type-aware network, thereby completing Chinese event detection.
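Step B5 combines token vectors with dependency syntax through graph convolution. The following is a minimal NumPy sketch of one graph-convolution layer over a dependency graph, assuming the widely used symmetric normalization with self-loops; the application does not specify the layer, and the names `dependency_adjacency`, `gcn_layer`, and the toy edges are hypothetical.

```python
import numpy as np

def dependency_adjacency(num_tokens, edges):
    """Symmetric adjacency matrix with self-loops from (head, dependent) index pairs."""
    A = np.eye(num_tokens)
    for head, dep in edges:
        A[head, dep] = 1.0
        A[dep, head] = 1.0
    return A

def gcn_layer(H, A, W):
    """One graph-convolution layer: ReLU(D^-1/2 A D^-1/2 H W)."""
    deg = A.sum(axis=1)                      # >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_norm = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

# Toy sentence of 4 tokens; token 1 is the root of a small dependency tree.
edges = [(1, 0), (1, 3), (3, 2)]
A = dependency_adjacency(4, edges)

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 6))        # stand-in for the Chinese vector representation
W = rng.normal(size=(6, 6)) * 0.1  # hypothetical layer weights
H_out = gcn_layer(H, A, W)
```

Each output row mixes a token's own vector with those of its syntactic neighbors, which is one way to obtain a representation fused with dependency information.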

Further, the specific method for generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence in step S4 is as follows:

C1, given a Chinese query sentence, a document representation, an event type T and the text event roles, constructing a query auxiliary sentence through a bilingual mapping dictionary;

C2, representing the data as an N-dimensional vector, as shown in the following formula:

；

wherein E_QV, E_QC, E_D and E_R respectively represent the embedded representations of the query sentence, the query auxiliary sentence, the foreign-language document, and the event role information and event type information;

C3, carrying out feature extraction on the query sentence and the query auxiliary sentence through a Transformer encoder, as shown in the following formula:

；

wherein the two outputs respectively represent the contextual feature representations of the Chinese query and of the foreign-language query auxiliary sentence, and sharing the encoder parameters reduces the distance between the semantic spaces of Chinese and the foreign language;

C4, extracting the features of the foreign-language document, as shown in the following formula:

；

wherein the output represents the context-encoded feature sequence of the foreign-language document;

C5, extracting the event role information in the text, as shown in the following formula:

；

wherein the output represents a context-encoded feature sequence of the foreign-language document.
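Step C1 builds the query auxiliary sentence by looking up query words in a bilingual mapping dictionary. The following is a minimal sketch of that lookup, assuming the Chinese query has already been segmented into words. The toy dictionary uses English as a stand-in foreign language, and the function name `build_auxiliary_sentence` and the `keep_unknown` option are hypothetical.

```python
def build_auxiliary_sentence(query_tokens, bilingual_dict, keep_unknown=True):
    """Map each segmented Chinese query word to its foreign-language entry.

    Words missing from the dictionary are kept as-is when keep_unknown is True
    and dropped otherwise (an illustrative choice, not specified in the source).
    """
    aux = []
    for tok in query_tokens:
        if tok in bilingual_dict:
            aux.append(bilingual_dict[tok])
        elif keep_unknown:
            aux.append(tok)
    return aux

# Toy bilingual mapping dictionary (hypothetical entries).
toy_dict = {"地震": "earthquake", "发生": "occur", "救援": "rescue"}
query = ["地震", "发生", "后", "救援"]

aux_keep = build_auxiliary_sentence(query, toy_dict)
aux_drop = build_auxiliary_sentence(query, toy_dict, keep_unknown=False)
# aux_keep -> ["earthquake", "occur", "后", "rescue"]
# aux_drop -> ["earthquake", "occur", "rescue"]
```

The query and the auxiliary sentence would then both be passed through the shared encoder described in step C3.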

Further, the specific method for obtaining the cross-language feature representation of the query in the step S5 is as follows:

D1, using the feature representation of the foreign-language auxiliary sentence to construct a representation of the j-th word of the Chinese sentence, as shown in the following formula:

；

D2, applying the Softmax function to the corresponding matching scores m_{j,i} to obtain the attention weights α_{j,i}, as shown in the following formula:

；

D3, calculating the matching score from the feature vectors h_i^c and h_j^v, as shown in the following formula:

；

wherein W ∈ R^{n×n} and b ∈ R are attention parameters learned during text training.
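Steps D1 to D3 describe an attention computation: a matching score with parameters W ∈ R^{n×n} and b ∈ R, a Softmax over the scores, and a weighted combination of auxiliary-sentence features. The following is a minimal NumPy sketch under stated assumptions: the score is taken to be bilinear, rows index Chinese words and columns index auxiliary-sentence words, and the Softmax normalizes over the auxiliary words. The application's exact formulas may differ, and the names `cross_language_attention`, `H_c` and `H_v` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_language_attention(H_c, H_v, W, b):
    """Align Chinese query features with foreign-language auxiliary features.

    H_c: (n_c, n) Chinese word features; H_v: (n_v, n) auxiliary word features.
    Assumed bilinear score: scores[j, i] = H_c[j] @ W @ H_v[i] + b.
    """
    scores = H_c @ W @ H_v.T + b       # (n_c, n_v) matching scores
    alpha = softmax(scores, axis=1)    # attention weights over auxiliary words
    aligned = alpha @ H_v              # (n_c, n) cross-language representation
    return aligned, alpha

rng = np.random.default_rng(0)
n_c, n_v, n = 3, 4, 5
H_c = rng.normal(size=(n_c, n))
H_v = rng.normal(size=(n_v, n))
W = rng.normal(size=(n, n)) * 0.1   # hypothetical attention matrix
b = 0.0                             # hypothetical scalar bias
aligned, alpha = cross_language_attention(H_c, H_v, W, b)
```

Each row of `alpha` sums to 1, so every Chinese word receives a weighted average of the auxiliary-sentence features.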

Further, the specific method in step S6 is as follows:

E1, filtering useless features out of the event role feature information: a sigmoid activation function serves as the gating state, the gate is multiplied element-wise with the event role features, and the result is passed through a tanh activation function, so that the gate unit screens the features;

E2, reinforcing key event information in the document using an attention mechanism: an embedding vector t is obtained according to the event type, and each feature in the feature representation of the text is scored through t^T to perceive the important information in the text, as shown in the following formula:

；

E3, obtaining the Chinese document representation after evaluation, as shown in the following formula:

；

wherein the two quantities in the formula are, respectively, an attention vector and the vector matrix of the Chinese document.
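Steps E1 to E3 can be sketched numerically. This is a minimal NumPy sketch under stated assumptions rather than the application's own formulas: the gate is assumed to be a sigmoid of a linear map of the role features, the attention weights are assumed to be a Softmax over the scores t^T h_i, and the evaluated document representation is assumed to be the token features reweighted by those weights. All names (`gate_filter`, `event_type_attention`, `W_g`, `b_g`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gate_filter(role_feats, W_g, b_g):
    """E1 (assumed form): sigmoid gate, element-wise product, then tanh."""
    gate = sigmoid(role_feats @ W_g + b_g)   # gating state in (0, 1)
    return np.tanh(gate * role_feats)        # screened event role features

def event_type_attention(H, t):
    """E2/E3 (assumed form): score each token feature with t, then reweight."""
    weights = softmax(H @ t)                 # (seq_len,) attention vector
    return weights[:, None] * H, weights     # reweighted document features

rng = np.random.default_rng(0)
seq_len, dim = 5, 6
role_feats = rng.normal(size=(seq_len, dim))  # stand-in event role features
W_g = rng.normal(size=(dim, dim)) * 0.1       # hypothetical gate weights
b_g = np.zeros(dim)
filtered = gate_filter(role_feats, W_g, b_g)

H = rng.normal(size=(seq_len, dim))           # stand-in document token features
t = rng.normal(size=dim)                      # stand-in event type embedding
doc_repr, weights = event_type_attention(H, t)
```

Tokens whose features align with the event type embedding receive larger weights, which is one way to emphasize event-relevant parts of the document.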

Further, the specific method for calculating the matching score of the retrieved document in step S7, performed by the interaction sequencing module, is as follows:

The query representation and the feature representation of the document are obtained from the event information fusion module, and the score is then calculated by maximum similarity (MaxSim). From these two representations, a Score between the query and the document can be calculated as the sum, over the token representations of the query sentence, of the maximum similarity between each query token representation and the token representations of the document, as shown in the following formula:

。
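The MaxSim scoring described above can be sketched as follows. This is a minimal NumPy sketch, assuming cosine similarity between token representations (the application does not name the similarity function) and assuming no token vector is all zeros; the name `maxsim_score` and the toy matrices are hypothetical.

```python
import numpy as np

def maxsim_score(Q, D):
    """Sum over query tokens of the maximum similarity to any document token.

    Q: (n_q, dim) query token representations.
    D: (n_d, dim) document token representations.
    Cosine similarity is an assumption made for this sketch.
    """
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    sim = Qn @ Dn.T                      # (n_q, n_d) pairwise similarities
    return float(sim.max(axis=1).sum())  # best document match per query token

Q = np.array([[1.0, 0.0], [0.0, 1.0]])
D_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # contains exact matches
D_b = np.array([[1.0, 1.0]])                           # only a partial match

score_a = maxsim_score(Q, D_a)
score_b = maxsim_score(Q, D_b)
ranking = sorted([("D_a", score_a), ("D_b", score_b)], key=lambda p: -p[1])
```

Documents are then ranked by descending score; in this toy case `D_a` ranks above `D_b` because both query tokens find an exact match in it.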

a text query system based on artificial intelligence technology, comprising:

a data acquisition module, used for collecting foreign-language text;

a foreign-language text analysis module, used for automatically analyzing the collected foreign-language text;

a Chinese query sentence input module, used for inputting Chinese query sentences;

a Chinese query sentence analysis module, used for performing event type analysis on the input Chinese query sentence;

an embedded characterization module, used for generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;

a cross-language characterization module, used for obtaining the cross-language feature representation of the query;

an event information fusion module, used for fusing the event type information of the query sentence with the foreign-language retrieval document and fusing the event role information of the foreign-language retrieval document with the query sentence;

an interaction sequencing module, used for calculating the matching score of each retrieved document;

a result display module, used for displaying the matching query results for the foreign-language text.

Compared with the prior art, the text query method and system based on artificial intelligence technology can effectively improve the model's event information extraction by automatically analyzing the collected foreign-language text. A query auxiliary sentence is constructed through a bilingual mapping dictionary, and a shared encoder shortens the distance between the language of the query sentence and the language of the retrieved documents. An interactive attention mechanism semantically aligns the query sentence with the query auxiliary sentence. Finally, an information fusion module fuses the event information, which improves the retrieval model's understanding of text semantics. Analyzing the two languages separately and then fusing them can ensure the accuracy of foreign-language document queries.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.

Fig. 1 is a schematic diagram of an overall structure according to an embodiment of the present application.

Detailed Description

In order to make the technical scheme of the present application better understood by those skilled in the art, the present application will be further described in detail with reference to the accompanying drawings.

Embodiment one:

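Referring to Fig. 1, a text query method based on artificial intelligence technology includes the following steps:

S1, collecting data of related texts;

S2, automatically analyzing the collected foreign-language text;

S3, analyzing the event type of the input Chinese query sentence;

S4, generating a corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;

S5, obtaining a cross-language feature representation of the query;

S6, fusing the event type information of the query sentence with the foreign-language retrieval document, and fusing the event role information of the foreign-language retrieval document with the query sentence;

S7, calculating the matching score of each retrieved document;

S8, displaying the matching query results for the text.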

Automatically analyzing the collected foreign-language text can effectively improve the model's event information extraction. A query auxiliary sentence is constructed through a bilingual mapping dictionary, and a shared encoder shortens the distance between the language of the query sentence and the language of the retrieved documents. An interactive attention mechanism semantically aligns the query sentence with the query auxiliary sentence. Finally, an information fusion module fuses the event information, which improves the retrieval model's understanding of text semantics. Analyzing the two languages separately and then fusing them can ensure the accuracy of foreign-language document queries.

The specific method for automatically analyzing the collected foreign language text in the step S2 is as follows:

A3, encoding through a Transformer encoder;

The system comprises an embedded representation fusion module, an encoder module, a graph convolution module and a fusion reasoning module.

1. Embedded representation fusion module: the text is vectorized in two different ways, through a pre-trained BERT model and through character embedding. Using two different representations gives the text a richer representation.

2. Encoder module: mainly comprises a sentence-level encoder module and a document-level encoder module.

1) Sentence-level encoder module: each sentence of the text is encoded by a Bi-LSTM network. Sentence-level context encoding lets the model learn fine-grained semantic information.

2) Document-level encoder module: encoding through a Transformer encoder lets the model learn cross-sentence semantic information and capture the deeper semantics of the text.

3. Graph convolution module: joint learning is carried out on the encoder's vector representation of the text and the knowledge graph information constructed based on the information, to obtain a text vector representation integrated with the knowledge graph information.

4. Fusion reasoning module: joint learning is carried out on the vector representation integrated with the knowledge graph information and the output features of the document-level encoding module, to obtain a document-level representation integrated with the knowledge graph information. This representation is then combined with the output of the sentence-level encoding module through a gating fusion mechanism, so that the model further grasps key text information. Finally, event role recognition and inference are performed through a CRF layer to complete identification of the event role information.
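The final CRF layer chooses the best label sequence for the whole sentence rather than labeling each token independently. The following is a minimal NumPy sketch of Viterbi decoding, the standard inference procedure for a linear-chain CRF. The label set, the scores, and the name `viterbi_decode` are hypothetical and only illustrate how a transition penalty changes the decoded event role tags.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Best label path for a linear-chain CRF.

    emissions:   (T, K) per-token label scores.
    transitions: (K, K) with transitions[i, j] = score of moving from label i to j.
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions    # (previous label, current label)
        backptr[t] = cand.argmax(axis=0)       # best previous label for each current label
        score = cand.max(axis=0) + emissions[t]
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):              # follow back-pointers to recover the path
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

labels = ["O", "B-ROLE", "I-ROLE"]             # hypothetical tag set
emissions = np.array([[2.0, 1.0, 0.0],
                      [0.0, 1.0, 3.0],
                      [0.0, 0.0, 2.0]])
transitions = np.zeros((3, 3))
transitions[0, 2] = -100.0                     # strongly penalize O -> I-ROLE

path = viterbi_decode(emissions, transitions)
tags = [labels[i] for i in path]
```

Labeling each token by its highest emission alone would give O, I-ROLE, I-ROLE. With the transition penalty, decoding returns B-ROLE, I-ROLE, I-ROLE, a well-formed span.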

The specific method for performing event type analysis on the input Chinese query sentence in step S3 is as follows:

Bilingual information fusion module: mainly comprises two network modules, a shared encoder network and a cross attention network. The shared encoder network first encodes the foreign-language sentence through an encoder to obtain the foreign-language hidden-layer vector representation and the foreign-language sentence-level vector representation, then fuses the Chinese word-level vectors with the foreign-language sentence-level vector, and then obtains the Chinese hidden-layer vectors and the Chinese sentence-level vector through the shared encoder strategy. The cross attention network carries out joint learning on the foreign-language hidden-layer vectors and the Chinese hidden-layer vectors to obtain a Chinese vector representation fused with foreign-language word-level information. The syntactic graph convolution module carries out joint learning on the Chinese vector representation and the Chinese dependency syntax information to obtain a vector representation fused with the dependency syntax information. Finally, the event type-aware network in the event detector realizes a Chinese semantic representation based on foreign-language event type information, thereby completing Chinese event detection.

The specific method for generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence in step S4 is as follows:

C1, given a Chinese query sentence, a document representation, an event type T and the text event roles, constructing a query auxiliary sentence through a bilingual mapping dictionary;

；

The specific method for obtaining the cross-language feature representation of the query in the step S5 is as follows:

D1, using the feature representation of the foreign-language auxiliary sentence to construct a representation of the j-th word of the Chinese sentence, as shown in the following formula:

；

The specific method of step S6 is as follows:

；

The specific method for calculating the matching score of the search document in the step S7 is as follows:

。

Embodiment two:

A text query system based on artificial intelligence technology, comprising:

a data acquisition module, used for collecting foreign-language text;

The foreign-language text analysis module automatically analyzes the collected foreign-language text; the Chinese query sentence analysis module performs event type analysis on the input Chinese query sentence; the embedded characterization module generates the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence; the cross-language characterization module obtains the cross-language feature representation of the query; the event information fusion module fuses the event type information of the query sentence with the foreign-language retrieval document and fuses the event role information of the foreign-language retrieval document with the query sentence; the interaction sequencing module calculates the matching score of each retrieved document; and the result display module displays the matching query results for the foreign-language text.

Working principle: in use, the foreign-language text analysis module automatically analyzes the collected foreign-language text, and the Chinese query sentence analysis module performs event type analysis on the input Chinese query sentence. The embedded characterization module then generates the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence, and the cross-language characterization module obtains the cross-language feature representation of the query. The event information fusion module fuses the event type information of the query sentence with the foreign-language retrieval document and fuses the event role information of the foreign-language retrieval document with the query sentence. The interaction sequencing module calculates the matching score of each retrieved document, and the result display module displays the matching query results for the foreign-language text.

While certain exemplary embodiments of the present application have been described above by way of illustration only, it will be apparent to those of ordinary skill in the art that modifications may be made to the described embodiments in various different ways without departing from the spirit and scope of the application. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive of the scope of the application, which is defined by the appended claims.

Claims

1. The text query method based on the artificial intelligence technology is characterized by comprising the following steps of:

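S1, collecting data of related texts;

S2, automatically analyzing the collected foreign-language text;

S3, analyzing the event type of the input Chinese query sentence;

S4, generating a corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;

S5, obtaining a cross-language feature representation of the query;

S6, fusing the event type information of the query sentence with the foreign-language retrieval document, and fusing the event role information of the foreign-language retrieval document with the query sentence;

S7, calculating the matching score of each retrieved document;

S8, displaying the matching query results for the text.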

2. The text query method based on artificial intelligence technology according to claim 1, wherein the specific method for automatically analyzing the collected foreign text in step S2 is as follows:

A3, encoding through a Transformer encoder;

3. The text query method based on artificial intelligence technology according to claim 2, wherein the specific method for performing event type analysis on the input Chinese query sentence in step S3 is as follows:

4. The text query method based on artificial intelligence technology according to claim 1, wherein the specific method for generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence in step S4 is as follows:

；

wherein the two outputs respectively represent the contextual feature representations of the Chinese query and of the foreign-language query auxiliary sentence, and sharing the encoder parameters reduces the distance between the semantic spaces of Chinese and the foreign language;

；

5. The text query method based on artificial intelligence technology according to claim 4, wherein the specific method for obtaining the cross-language feature representation of the query in step S5 is as follows:

；

6. The text query method based on artificial intelligence technology according to claim 5, wherein the specific method of step S6 is as follows:

；

7. The text query method based on artificial intelligence technology according to claim 6, wherein the specific method for calculating the matching score of the retrieved document in step S7 is as follows:

。

8. A text query system based on artificial intelligence technology, adapted for use with the text query method based on artificial intelligence technology as claimed in any one of claims 1 to 7, comprising:

a data acquisition module, used for collecting foreign-language text;