CN116719910A - Text query method and system based on artificial intelligence technology - Google Patents
- Publication number
- CN116719910A (application CN202310987246.1A)
- Authority
- CN
- China
- Prior art keywords
- query
- text
- information
- chinese
- representation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/338—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/098—Distributed learning, e.g. federated learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a text query method and system based on artificial intelligence technology, relating to the field of text query. The method comprises the following steps: S1, collecting data of related texts; S2, automatically analyzing the collected foreign-language text; S3, performing event type analysis on the input Chinese query sentence. Automatically analyzing the collected foreign-language text effectively improves the model's event information extraction. Query auxiliary sentences are constructed through a bilingual mapping dictionary, and a shared encoder narrows the distance between the query sentence and the language of the retrieved documents. An interactive attention mechanism then aligns the semantics of the query sentence and the query auxiliary sentence, and an information fusion module finally integrates the event information. This improves the retrieval model's understanding of textual semantics, and analyzing the two languages separately and then fusing them ensures the accuracy of foreign-language document queries.
Description
Technical Field
The application relates to text query technology, and in particular to a text query method and system based on artificial intelligence technology.
Background
With the spread and development of communication and computer technology, more and more users rely on search engines for retrieval. Built on a natural language understanding platform, a search engine can understand human natural language to a certain extent: it extracts the key content from the natural-language input and uses it for searching, so that the text as understood by the search engine is highly consistent with the query text the user wants to search. A suitable natural language understanding platform is therefore important for understanding the user's query text accurately.
The prior art includes natural language understanding platforms, for example a crowdsourcing-based training scheme for natural language understanding systems. Such a system provides a collaborative interactive platform on which multiple developers jointly supply training data for natural language understanding tasks.
However, in the course of implementing the application, the inventor worked with foreign-language documents and therefore needed to query foreign-language text. Existing approaches mostly translate the query requirement first and then search the foreign-language text, so differences between translations easily introduce errors into the query.
Disclosure of Invention
The application aims to provide a text query method and system based on artificial intelligence technology, to solve the problem in the prior art that foreign-language text is mostly queried by first translating the query requirement and then searching the foreign-language text, so that differing translations easily introduce query errors.
To achieve the above object, the application provides the following technical solution: a text query method and system based on artificial intelligence technology, the method comprising the following steps:
S1, collecting data of related texts;
S2, automatically analyzing the collected foreign-language text;
S3, performing event type analysis on the input Chinese query sentence;
S4, generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;
S5, obtaining the cross-language feature representation of the query;
S6, fusing the event type information of the query sentence with the foreign-language retrieved documents, and fusing the event role information of the foreign-language retrieved documents with the query sentence;
S7, calculating the matching scores of the retrieved documents;
S8, displaying the matched query results for the text.
Further, the specific method for automatically analyzing the collected foreign-language text in step S2 is as follows:
A1, vectorizing the text in two different ways: pre-trained BERT and character embedding;
A2, encoding each sentence of the text through a BiLSTM network (sentence-level context encoding);
A3, encoding the text through a Transformer encoder;
A4, jointly learning the encoder-side vector representation of the text and the knowledge graph information constructed on that basis, to obtain a text vector representation incorporating the knowledge graph information;
A5, jointly learning the vector representation incorporating the knowledge graph information and the output features of the document-level encoding module, to obtain a document-level representation incorporating the knowledge graph information;
A6, using a gating fusion mechanism so that the model further captures the key information of the text;
A7, performing event role identification and inference through a CRF layer, completing the identification of the event role information.
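The patent does not say how the CRF layer in step A7 decodes its output. A common choice for a linear-chain CRF is Viterbi decoding, sketched below in plain Python under the assumption that the encoder supplies per-token emission scores and the CRF has learned a tag-transition matrix and start scores. The three-tag O / B-ROLE / I-ROLE scheme in the example is hypothetical.

```python
from typing import List, Tuple


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]],
                   start: List[float]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][k]: score of tag k at position t (e.g. from the encoder).
    transitions[i][j]: score of moving from tag i to tag j.
    start[k]: score of starting the sequence in tag k.
    """
    n_tags = len(start)
    # score[k] = best score of any path that ends in tag k at the current position
    score = [start[k] + emissions[0][k] for k in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Trace the best path backwards from the best final tag
    last = max(range(n_tags), key=lambda k: score[k])
    best_score = score[last]
    path = [last]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, best_score


if __name__ == "__main__":
    # Hypothetical tags: 0 = O, 1 = B-ROLE, 2 = I-ROLE; O -> I-ROLE is penalised.
    trans = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    emis = [[0.1, 2.0, 0.5], [0.2, 0.3, 1.5], [1.0, 0.1, 0.2]]
    print(*viterbi_decode(emis, trans, [0.0, 0.0, -10.0]))  # [1, 2, 0] 4.5
```

The transition penalty is what lets the CRF rule out ill-formed spans (an I-ROLE tag directly after O) that independent per-token classification could produce.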
Further, the specific method for performing event type analysis on the input Chinese query sentence in step S3 is as follows:
B1, first encoding the foreign-language sentence with an encoder to obtain a foreign-language hidden-layer vector representation and a foreign-language sentence-level vector representation;
B2, fusing the Chinese word-level vectors with the foreign-language sentence-level vector;
B3, obtaining Chinese hidden-layer vectors and a Chinese sentence-level vector through a shared-encoder strategy;
B4, jointly learning, with a cross-attention network, the foreign-language hidden-layer vectors and the Chinese hidden-layer vectors to obtain a Chinese vector representation fused with foreign-language word-level information;
B5, jointly learning, with a syntactic graph convolution module, the Chinese vector representation and the Chinese dependency-syntax information to obtain a vector representation fused with dependency-syntax information;
B6, obtaining, through a type-aware network, a Chinese semantic representation based on foreign-language event type information, thereby completing Chinese event detection.
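Step B5 names a syntactic graph convolution module without giving its formula. The sketch below shows one standard graph-convolution layer over a dependency parse; the mean-aggregation normalisation, the ReLU, and the (head, dependent) arc format are assumptions rather than details from the patent.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dependency_gcn_layer(h: Matrix, arcs: List[Tuple[int, int]], w: Matrix) -> Matrix:
    """One graph-convolution layer over a dependency parse: ReLU(D^-1 (A + I) H W).

    h: n x d token vectors; arcs: (head, dependent) index pairs; w: d x d_out weights.
    Each token averages its own vector with those of its syntactic neighbours,
    then a shared linear projection and a ReLU are applied.
    """
    n = len(h)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops (A + I)
    for head, dep in arcs:  # treat each dependency arc as an undirected edge
        adj[head][dep] = 1.0
        adj[dep][head] = 1.0
    norm = [[v / sum(row) for v in row] for row in adj]  # row-normalise: D^-1 (A + I)
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]


if __name__ == "__main__":
    # Three tokens; token 1 is the syntactic head of tokens 0 and 2.
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(dependency_gcn_layer(h, [(1, 0), (1, 2)], identity))
```

Stacking two or three such layers lets information travel along longer dependency paths, which is the usual motivation for combining syntax with the attention-based features from step B4.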
Further, the specific method for generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence in step S4 is as follows:
C1, given a Chinese query sentence Q^V, a document representation D, an event type T and the text event roles R, constructing a query auxiliary sentence Q^C through a bilingual mapping dictionary;
C2, representing the inputs as N-dimensional vectors, as shown in the following formula:
$$E_{QV} = \mathrm{Emb}(Q^V), \quad E_{QC} = \mathrm{Emb}(Q^C), \quad E_D = \mathrm{Emb}(D), \quad E_R = \mathrm{Emb}(T, R)$$
where E_QV, E_QC, E_D and E_R denote the embedded representations of the query sentence, the query auxiliary sentence, the foreign-language document, and the event role and event type information, respectively;
C3, extracting features from the query sentence and the query auxiliary sentence through a Transformer encoder, as shown in the following formula:
$$H^V = \mathrm{TransformerEnc}(E_{QV}), \quad H^C = \mathrm{TransformerEnc}(E_{QC})$$
where H^V = (h_1^v, ..., h_m^v) and H^C = (h_1^c, ..., h_n^c) denote the contextual feature representations of the Chinese query and of the foreign-language query auxiliary sentence, respectively; sharing the encoder parameters between the two reduces the distance between the Chinese and foreign-language semantic spaces;
C4, extracting features from the foreign-language document, as shown in the following formula:
$$H^D = \mathrm{TransformerEnc}(E_D)$$
where H^D denotes the contextual encoding feature sequence of the foreign-language document;
C5, extracting the event role information in the text, as shown in the following formula:
$$H^R = \mathrm{TransformerEnc}(E_R)$$
where H^R denotes the encoded feature sequence of the event role information extracted from the foreign-language document.
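Step C1 builds the query auxiliary sentence through a bilingual mapping dictionary, but the lookup policy is not described. The sketch below assumes token-by-token lookup that keeps every candidate translation and passes tokens without an entry (names, numbers) through unchanged; the dictionary contents are hypothetical.

```python
from typing import Dict, List


def build_auxiliary_sentence(query_tokens: List[str],
                             bilingual_dict: Dict[str, List[str]]) -> List[str]:
    """Map each Chinese query token to its foreign-language dictionary entries."""
    aux: List[str] = []
    for tok in query_tokens:
        # Keep all candidate translations instead of committing to a single one;
        # tokens with no dictionary entry are copied through unchanged.
        aux.extend(bilingual_dict.get(tok, [tok]))
    return aux


if __name__ == "__main__":
    toy_dict = {"地震": ["earthquake"], "伤亡": ["casualties", "deaths"]}  # hypothetical entries
    print(build_auxiliary_sentence(["地震", "伤亡", "2023"], toy_dict))
    # ['earthquake', 'casualties', 'deaths', '2023']
```

Keeping several candidates is one way to avoid locking the query into a single translation, the failure mode described in the Background; whether the patented method does this is not stated.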
Further, the specific method for obtaining the cross-language feature representation of the query in step S5 is as follows:
D1, using the feature representations h_i^c of the foreign-language auxiliary sentence to construct a representation \tilde{h}_j^v of the j-th word of the Chinese sentence, as shown in the following formula:
$$\tilde{h}_j^v = \sum_{i} \alpha_{j,i}\, h_i^c$$
D2, applying the softmax function to the corresponding matching scores m_{j,i} to obtain the attention weights α_{j,i}, as shown in the following formula:
$$\alpha_{j,i} = \frac{\exp(m_{j,i})}{\sum_{k} \exp(m_{j,k})}$$
D3, computing the matching score from the feature vectors h_i^c and h_j^v, as shown in the following formula:
$$m_{j,i} = (h_j^v)^{\top} W\, h_i^c + b$$
where W ∈ R^{n×n} and b ∈ R are attention parameters learned during training.
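Steps D1 to D3 describe a bilinear cross-attention from the Chinese query tokens to the auxiliary-sentence tokens. The plain-Python sketch below follows that description under the assumption that the score has the bilinear form m_{j,i} = (h_j^v)^T W h_i^c + b; the vector sizes and values are illustrative only.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bilinear(u: Vector, w: Matrix, v: Vector) -> float:
    """Compute u^T W v."""
    return sum(u[a] * w[a][b] * v[b] for a in range(len(u)) for b in range(len(v)))


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def cross_attend(h_v: List[Vector], h_c: List[Vector], w: Matrix, b: float) -> List[Vector]:
    """Re-express each Chinese query token h_j^v as a mixture of auxiliary tokens h_i^c."""
    dim = len(h_c[0])
    out = []
    for h_j in h_v:
        m = [bilinear(h_j, w, h_i) + b for h_i in h_c]  # D3: matching scores m_{j,i}
        alpha = softmax(m)                               # D2: attention weights alpha_{j,i}
        # D1: weighted sum of the auxiliary-sentence feature vectors
        out.append([sum(alpha[i] * h_c[i][k] for i in range(len(h_c))) for k in range(dim)])
    return out


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(cross_attend([[10.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], identity, 0.0))
```

Because the scalar b is the same for every i, it shifts all scores for a given j equally and cancels in the softmax; it only matters if the raw scores are used elsewhere.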
Further, the specific method in step S6 is as follows:
E1, filtering out useless features in the event role feature information: a sigmoid activation function serves as the gate state, its output is multiplied element-wise with the event role features, and a tanh activation is then applied, so that the gate unit screens the features;
E2, strengthening the key event information in the document with an attention mechanism: given the event-type embedding vector t and the feature representations h_i^d of the text, each feature of the text is scored through t^T to capture the important information in the text, as shown in the following formula:
$$s_i = t^{\top} h_i^d$$
E3, obtaining the evaluated document representation \tilde{H}^D, as shown in the following formula:
$$\alpha_i = \frac{\exp(s_i)}{\sum_{k} \exp(s_k)}, \quad \tilde{h}_i^d = \alpha_i\, h_i^d$$
where α = (α_1, ..., α_L) is the attention vector and H^D = (h_1^d, ..., h_L^d) is the vector matrix of the document.
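Steps E1 to E3 combine a feature-filtering gate with event-type attention over the document. The sketch assumes the gate has the form g = sigmoid(W r + b) applied element-wise before tanh, and that the softmax-normalised scores t^T h_i rescale each document token vector; both are readings of the text, not formulas taken from the patent.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gate_filter(r: Vector, w: List[Vector], b: Vector) -> Vector:
    """E1 (assumed form): g = sigmoid(W r + b), then tanh(g * r) element-wise.

    w must be square (len(r) x len(r)) so the gate lines up with r element by element.
    """
    g = [sigmoid(sum(w_row[k] * r[k] for k in range(len(r))) + b_a)
         for w_row, b_a in zip(w, b)]
    return [math.tanh(g_a * r_a) for g_a, r_a in zip(g, r)]


def event_type_attention(t: Vector, h_doc: List[Vector]) -> List[Vector]:
    """E2-E3 (assumed form): score tokens by t^T h_i, softmax, rescale each token."""
    scores = [sum(a * c for a, c in zip(t, h)) for h in h_doc]  # E2: s_i = t^T h_i^d
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alpha = [e / z for e in exps]                               # E3: attention vector
    return [[a_i * x for x in h] for a_i, h in zip(alpha, h_doc)]


if __name__ == "__main__":
    print(gate_filter([2.0, -2.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]))
    print(event_type_attention([1.0, 0.0], [[5.0, 0.0], [0.0, 5.0]]))
```

Rescaling tokens instead of pooling them into one vector keeps a token-level document representation, which the token-wise MaxSim scoring in step S7 needs.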
Further, in step S7 the matching score of each retrieved document is calculated by the interaction ranking module, as follows:
The query representation E_q and the document feature representation E_d are obtained from the event information fusion module, and their relevance is then computed by maximum similarity (MaxSim): from E_q and E_d, the score between the query and the document is the sum, over every token representation of the query sentence, of its maximum similarity with the token representations of the document, as shown in the following formula:
$$\mathrm{Score}(q, d) = \sum_{i \in |E_q|} \max_{j \in |E_d|} E_{q_i} \cdot E_{d_j}^{\top}$$
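The MaxSim score in step S7 is the late-interaction scoring popularised by ColBERT. A direct transcription in plain Python, assuming dot-product similarity between token vectors (ColBERT normalises the vectors so the dot product equals cosine similarity; whether this patent does so is not stated):

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query: List[Vector], doc: List[Vector]) -> float:
    """Sum over query tokens of each token's best similarity with any document token."""
    return sum(max(dot(q, d) for d in doc) for q in query)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc = [[1.0, 0.0], [0.5, 0.5]]
    print(maxsim_score(query, doc))  # 1.5
```

Because each query token picks its own best match, adding further document tokens can never lower the score; the query side determines how many terms are summed.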
A text query system based on artificial intelligence technology, comprising:
a data acquisition module for collecting foreign-language text;
a foreign-language text analysis module for automatically analyzing the collected foreign-language text;
a Chinese query sentence input module for inputting Chinese query sentences;
a Chinese query sentence analysis module for performing event type analysis on the input Chinese query sentence;
an embedding representation module for generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;
a cross-language representation module for obtaining the cross-language feature representation of the query;
an event information fusion module for fusing the event type information of the query sentence with the foreign-language retrieved documents, and fusing the event role information of the foreign-language retrieved documents with the query sentence;
an interaction ranking module for calculating the matching scores of the retrieved documents;
and a result display module for displaying the matched query results from the foreign-language text.
Compared with the prior art, the text query method and system based on artificial intelligence technology automatically analyze the collected foreign-language text, which effectively improves the model's event information extraction. Query auxiliary sentences are constructed through a bilingual mapping dictionary, and a shared encoder narrows the distance between the query sentence and the language of the retrieved documents. An interactive attention mechanism aligns the semantics of the query sentence and the query auxiliary sentence, and the information fusion module finally integrates the event information, improving the retrieval model's understanding of textual semantics. Analyzing the two languages separately and then fusing them ensures the accuracy of foreign-language document queries.
Drawings
To illustrate the embodiments of the application or the technical solutions of the prior art more clearly, the drawings required for the embodiments are briefly described below. The drawings described below are only some embodiments of the application, and a person having ordinary skill in the art may obtain other drawings from them.
Fig. 1 is a schematic diagram of an overall structure according to an embodiment of the present application.
Detailed Description
To enable those skilled in the art to better understand the technical solution of the application, the application is described in further detail below with reference to the accompanying drawings.
Embodiment one:
Referring to Fig. 1, a text query method and system based on artificial intelligence technology, the method comprising the following steps:
S1, collecting data of related texts;
S2, automatically analyzing the collected foreign-language text;
S3, performing event type analysis on the input Chinese query sentence;
S4, generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;
S5, obtaining the cross-language feature representation of the query;
S6, fusing the event type information of the query sentence with the foreign-language retrieved documents, and fusing the event role information of the foreign-language retrieved documents with the query sentence;
S7, calculating the matching scores of the retrieved documents;
S8, displaying the matched query results for the text.
Automatically analyzing the collected foreign-language text effectively improves the model's event information extraction. Query auxiliary sentences are constructed through a bilingual mapping dictionary, and a shared encoder narrows the distance between the query sentence and the language of the retrieved documents. An interactive attention mechanism aligns the semantics of the query sentence and the query auxiliary sentence, and the information fusion module finally fuses the event information, improving the retrieval model's understanding of textual semantics. Analyzing the two languages separately and then fusing them ensures the accuracy of foreign-language document queries.
The specific method for automatically analyzing the collected foreign-language text in step S2 is as follows:
A1, vectorizing the text in two different ways: pre-trained BERT and character embedding;
A2, encoding each sentence of the text through a BiLSTM network (sentence-level context encoding);
A3, encoding the text through a Transformer encoder;
A4, jointly learning the encoder-side vector representation of the text and the knowledge graph information constructed on that basis, to obtain a text vector representation incorporating the knowledge graph information;
A5, jointly learning the vector representation incorporating the knowledge graph information and the output features of the document-level encoding module, to obtain a document-level representation incorporating the knowledge graph information;
A6, using a gating fusion mechanism so that the model further captures the key information of the text;
A7, performing event role identification and inference through a CRF layer, completing the identification of the event role information.
This part of the model comprises an embedding representation fusion module, an encoder module, a graph convolution module and a fusion inference module.
1. Embedding representation fusion module: the text is vectorized in two different ways, pre-trained BERT and character embedding. Using two different representations gives the text a richer representation.
2. Encoder module: consists of a sentence-level encoder module and a document-level encoder module. 1) Sentence-level encoder module: each sentence of the text is encoded by a BiLSTM network, and this sentence-level context encoding lets the model learn fine-grained semantic information. 2) Document-level encoder module: encoding by a Transformer encoder lets the model learn semantic information across sentences, so that it captures the deeper semantics of the text.
3. Graph convolution module: the encoder-side vector representation of the text and the knowledge graph information constructed on that basis are learned jointly to obtain a text vector representation incorporating the knowledge graph information.
4. Fusion inference module: the vector representation incorporating the knowledge graph information and the output features of the document-level encoder module are learned jointly to obtain a document-level representation incorporating the knowledge graph information. This representation is then combined with the output of the sentence-level encoder module through a gating fusion mechanism, so that the model further captures the key information of the text. Finally, event role identification and inference are performed through the CRF layer to complete the identification of the event role information.
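The fusion inference module combines the document-level representation with the sentence-level encoder output through a gating fusion mechanism whose exact form the patent leaves open. One common formulation, assumed here, is a sigmoid gate computed from both inputs that interpolates between them element by element; the weight shapes are illustrative.

```python
import math
from typing import List

Vector = List[float]


def gated_fusion(h_sent: Vector, h_doc: Vector, w: List[Vector], b: Vector) -> Vector:
    """Assumed gate: g = sigmoid(W [h_sent; h_doc] + b); out = g * h_sent + (1 - g) * h_doc.

    w has shape d x 2d, so each gate value sees both input vectors.
    """
    x = h_sent + h_doc  # list concatenation gives [h_sent; h_doc]
    g = [1.0 / (1.0 + math.exp(-(sum(w_row[k] * x[k] for k in range(len(x))) + b_a)))
         for w_row, b_a in zip(w, b)]
    return [g_a * s + (1.0 - g_a) * d for g_a, s, d in zip(g, h_sent, h_doc)]


if __name__ == "__main__":
    zeros = [[0.0] * 4, [0.0] * 4]
    # With zero weights the gate is 0.5 everywhere, so the result is the average.
    print(gated_fusion([1.0, 3.0], [3.0, 5.0], zeros, [0.0, 0.0]))  # [2.0, 4.0]
```

Unlike plain concatenation, the gate lets the model decide per dimension how much to trust the fine-grained sentence-level features versus the knowledge-graph-enriched document-level features.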
The specific method for performing event type analysis on the input Chinese query sentence in step S3 is as follows:
B1, first encoding the foreign-language sentence with an encoder to obtain a foreign-language hidden-layer vector representation and a foreign-language sentence-level vector representation;
B2, fusing the Chinese word-level vectors with the foreign-language sentence-level vector;
B3, obtaining Chinese hidden-layer vectors and a Chinese sentence-level vector through a shared-encoder strategy;
B4, jointly learning, with a cross-attention network, the foreign-language hidden-layer vectors and the Chinese hidden-layer vectors to obtain a Chinese vector representation fused with foreign-language word-level information;
B5, jointly learning, with a syntactic graph convolution module, the Chinese vector representation and the Chinese dependency-syntax information to obtain a vector representation fused with dependency-syntax information;
B6, obtaining, through a type-aware network, a Chinese semantic representation based on foreign-language event type information, thereby completing Chinese event detection.
Bilingual information fusion module: this module mainly comprises two networks, a shared encoder network and a cross-attention network. The shared encoder network first encodes the foreign-language sentence with an encoder to obtain foreign-language hidden-layer vector representations and a foreign-language sentence-level vector representation; it then fuses the Chinese word-level vectors with the foreign-language sentence-level vector and obtains Chinese hidden-layer vectors and a Chinese sentence-level vector through the shared-encoder strategy. The cross-attention network jointly learns the foreign-language and Chinese hidden-layer vectors to obtain a Chinese vector representation fused with foreign-language word-level information. The syntactic graph convolution module then jointly learns this Chinese vector representation and the Chinese dependency-syntax information to obtain a vector representation fused with dependency-syntax information. Finally, an event-type-aware network in the event detector produces a Chinese semantic representation based on foreign-language event type information, completing Chinese event detection.
The specific method for generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence in step S4 is as follows:
C1, given a Chinese query sentence Q^V, a document representation D, an event type T and the text event roles R, constructing a query auxiliary sentence Q^C through a bilingual mapping dictionary;
C2, representing the inputs as N-dimensional vectors, as shown in the following formula:
$$E_{QV} = \mathrm{Emb}(Q^V), \quad E_{QC} = \mathrm{Emb}(Q^C), \quad E_D = \mathrm{Emb}(D), \quad E_R = \mathrm{Emb}(T, R)$$
where E_QV, E_QC, E_D and E_R denote the embedded representations of the query sentence, the query auxiliary sentence, the foreign-language document, and the event role and event type information, respectively;
C3, extracting features from the query sentence and the query auxiliary sentence through a Transformer encoder, as shown in the following formula:
$$H^V = \mathrm{TransformerEnc}(E_{QV}), \quad H^C = \mathrm{TransformerEnc}(E_{QC})$$
where H^V = (h_1^v, ..., h_m^v) and H^C = (h_1^c, ..., h_n^c) denote the contextual feature representations of the Chinese query and of the foreign-language query auxiliary sentence, respectively; sharing the encoder parameters between the two reduces the distance between the Chinese and foreign-language semantic spaces;
C4, extracting features from the foreign-language document, as shown in the following formula:
$$H^D = \mathrm{TransformerEnc}(E_D)$$
where H^D denotes the contextual encoding feature sequence of the foreign-language document;
C5, extracting the event role information in the text, as shown in the following formula:
$$H^R = \mathrm{TransformerEnc}(E_R)$$
where H^R denotes the encoded feature sequence of the event role information extracted from the foreign-language document.
The specific method for obtaining the cross-language feature representation of the query in step S5 is as follows:
D1, using the feature representations h_i^c of the foreign-language auxiliary sentence to construct a representation \tilde{h}_j^v of the j-th word of the Chinese sentence, as shown in the following formula:
$$\tilde{h}_j^v = \sum_{i} \alpha_{j,i}\, h_i^c$$
D2, applying the softmax function to the corresponding matching scores m_{j,i} to obtain the attention weights α_{j,i}, as shown in the following formula:
$$\alpha_{j,i} = \frac{\exp(m_{j,i})}{\sum_{k} \exp(m_{j,k})}$$
D3, computing the matching score from the feature vectors h_i^c and h_j^v, as shown in the following formula:
$$m_{j,i} = (h_j^v)^{\top} W\, h_i^c + b$$
where W ∈ R^{n×n} and b ∈ R are attention parameters learned during training.
The specific method of step S6 is as follows:
E1, filtering out useless features in the event role feature information: a sigmoid activation function serves as the gate state, its output is multiplied element-wise with the event role features, and a tanh activation is then applied, so that the gate unit screens the features;
E2, strengthening the key event information in the document with an attention mechanism: given the event-type embedding vector t and the feature representations h_i^d of the text, each feature of the text is scored through t^T to capture the important information in the text, as shown in the following formula:
$$s_i = t^{\top} h_i^d$$
E3, obtaining the evaluated document representation \tilde{H}^D, as shown in the following formula:
$$\alpha_i = \frac{\exp(s_i)}{\sum_{k} \exp(s_k)}, \quad \tilde{h}_i^d = \alpha_i\, h_i^d$$
where α = (α_1, ..., α_L) is the attention vector and H^D = (h_1^d, ..., h_L^d) is the vector matrix of the document.
The specific method for calculating the matching score of each retrieved document in step S7 is as follows:
the query representation E_q and the document feature representation E_d are obtained from the event information fusion module, and their relevance is then computed by maximum similarity (MaxSim): from E_q and E_d, the score between the query and the document is the sum, over every token representation of the query sentence, of its maximum similarity with the token representations of the document, as shown in the following formula:
$$\mathrm{Score}(q, d) = \sum_{i \in |E_q|} \max_{j \in |E_d|} E_{q_i} \cdot E_{d_j}^{\top}$$
Embodiment two:
a text query system based on artificial intelligence technology, comprising:
a data acquisition module for collecting foreign-language text;
a foreign-language text analysis module for automatically analyzing the collected foreign-language text;
a Chinese query sentence input module for inputting Chinese query sentences;
a Chinese query sentence analysis module for performing event type analysis on the input Chinese query sentence;
an embedding representation module for generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;
a cross-language representation module for obtaining the cross-language feature representation of the query;
an event information fusion module for fusing the event type information of the query sentence with the foreign-language retrieved documents, and fusing the event role information of the foreign-language retrieved documents with the query sentence;
an interaction ranking module for calculating the matching scores of the retrieved documents;
and a result display module for displaying the matched query results from the foreign-language text.
In this system, the foreign-language text analysis module automatically analyzes the collected foreign-language text; the Chinese query sentence analysis module performs event type analysis on the input Chinese query sentence; the embedding representation module generates the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence; the cross-language representation module obtains the cross-language feature representation of the query; the event information fusion module fuses the event type information of the query sentence with the foreign-language retrieved documents and fuses the event role information of the retrieved documents with the query sentence; the interaction ranking module calculates the matching scores of the retrieved documents; and the result display module displays the matched query results from the foreign-language text.
Working principle: in use, the foreign-language text analysis module automatically analyzes the collected foreign-language text; the Chinese query sentence analysis module performs event type analysis on the input Chinese query sentence; the embedded characterization module generates the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence; the cross-language characterization module obtains the cross-language feature representation of the query; the event information fusion module fuses the event type information of the query sentence into the foreign-language retrieval documents and fuses the event role information of those documents into the query sentence; the interaction ranking module calculates the matching score of each retrieved document; and the result display module displays the matched query results for the foreign-language text.
While certain exemplary embodiments of the present application have been described above by way of illustration only, it will be apparent to those of ordinary skill in the art that modifications may be made to the described embodiments in various different ways without departing from the spirit and scope of the application. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive of the scope of the application, which is defined by the appended claims.
Claims (8)
1. A text query method based on artificial intelligence technology, characterized by comprising the following steps:
S1, collecting data of the relevant texts;
S2, automatically analyzing the collected foreign-language text;
S3, performing event type analysis on the input Chinese query sentence;
S4, generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;
S5, obtaining the cross-language feature representation of the query;
S6, fusing the event type information of the query sentence with the foreign-language retrieval documents, and fusing the event role information of the foreign-language retrieval documents with the query sentence;
S7, calculating the matching score of each retrieved document;
and S8, displaying the matching query results for the text.
2. The text query method based on artificial intelligence technology according to claim 1, wherein the specific method for automatically analyzing the collected foreign-language text in step S2 is as follows:
A1, vectorizing the text in two different ways: pre-trained BERT and character embedding;
A2, encoding each sentence of the text with a BiLSTM network, followed by sentence-level context encoding;
A3, encoding through a Transformer encoder;
A4, jointly learning the encoder's vector representation of the text together with the knowledge graph information constructed on this basis, to obtain a text vector representation integrated with knowledge graph information;
A5, jointly learning the vector representation integrated with knowledge graph information together with the output features of the document-level encoding module, to obtain a document-level representation integrated with knowledge graph information;
A6, enabling the model to further capture the key information of the text through a gated fusion mechanism;
and A7, performing event role identification and inference through a CRF layer to complete the identification of the event's role information.
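Step A7 decodes event-role tags with a CRF layer, but the claim does not spell out the decoder. A linear-chain CRF is normally decoded with the Viterbi algorithm; the sketch below shows that standard procedure in Python with NumPy. The function name, the three-tag set (O, B-ROLE, I-ROLE) and all score values are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def viterbi_decode(emissions: np.ndarray, transitions: np.ndarray) -> list[int]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions:   (seq_len, num_tags) per-token tag scores, e.g. from the encoder
    transitions: (num_tags, num_tags) score of moving from tag i to tag j
    """
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()  # best path score ending in each tag at t = 0
    backpointers = np.zeros((seq_len, num_tags), dtype=int)
    for t in range(1, seq_len):
        # candidate[i, j]: best path ending in tag i at t-1, then i -> j, then emit j
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backpointers[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    # Follow the back-pointers from the best final tag.
    path = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        path.append(int(backpointers[t][path[-1]]))
    return path[::-1]

# Hypothetical tags: 0 = O, 1 = B-ROLE, 2 = I-ROLE
emissions = np.array([[2.0, 1.0, 0.0], [0.0, 0.5, 1.0], [1.0, 0.0, 0.0]])
transitions = np.zeros((3, 3))
transitions[0, 2] = -10.0  # penalise the invalid O -> I-ROLE transition
print(viterbi_decode(emissions, transitions))  # [0, 1, 0]
```

With all-zero transitions the same emissions decode to `[0, 2, 0]` (an I-ROLE tag with no preceding B-ROLE); the learned transition matrix is what lets a CRF rule out such inconsistent sequences.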
3. The text query method based on artificial intelligence technology according to claim 2, wherein the specific method for performing event type analysis on the input foreign language query sentence in S3 is as follows:
B1, first encoding the foreign-language sentence with an encoder to obtain a foreign-language hidden-layer vector representation and a foreign-language sentence-level vector representation;
B2, fusing the Chinese word-level vectors with the foreign-language sentence-level vector;
B3, obtaining Chinese hidden-layer vectors and a Chinese sentence-level vector through a shared-encoder strategy;
B4, jointly learning the obtained foreign-language hidden-layer vectors and Chinese hidden-layer vectors with a cross-attention network to obtain a Chinese vector representation fused with foreign-language word-level information;
B5, jointly learning the Chinese vector representation and the Chinese dependency syntax information with a syntactic graph convolution module to obtain a vector representation fused with dependency syntax information;
and B6, obtaining a Chinese semantic representation based on foreign-language event type information through a type-aware network, thereby completing Chinese event detection.
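Step B5 joins the Chinese representation with dependency syntax through a "syntactic graph convolution module" without giving its equations. A common choice is the symmetric-normalised graph convolution ReLU(D^-1/2 A D^-1/2 H W) over the dependency tree; the sketch below assumes that form. The helper names, the head-index input format and the toy sizes are hypothetical.

```python
import numpy as np

def dependency_adjacency(heads: list[int]) -> np.ndarray:
    """Symmetric adjacency matrix with self-loops built from dependency heads.

    heads[k] is the index of token k's syntactic head, or -1 for the root.
    """
    n = len(heads)
    adj = np.eye(n)  # self-loops keep each token's own features
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child, head] = adj[head, child] = 1.0
    return adj

def gcn_layer(h: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: ReLU(D^-1/2 A D^-1/2 H W)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm_adj = d_inv_sqrt @ adj @ d_inv_sqrt
    return np.maximum(norm_adj @ h @ weight, 0.0)

# Three tokens whose middle token is the root, e.g. subject <- verb -> object
adj = dependency_adjacency([1, -1, 1])
out = gcn_layer(np.eye(3), adj, np.ones((3, 4)))
print(out.shape)  # (3, 4)
```

Each output row mixes a token's features with those of its syntactic head and dependents, which is how dependency structure enters the vector representation.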
4. The text query method based on artificial intelligence technology according to claim 1, wherein the specific method for generating the corresponding foreign language auxiliary sentence representation for the foreign language query sentence in step S4 is as follows:
C1, given a Chinese query sentence, a document representation, an event type T and the text event roles, constructing a query auxiliary sentence through a bilingual mapping dictionary;
C2, representing the data as N-dimensional vectors, as shown in the following formula:
[Formula image not available]
where $E_{QV}$, $E_{QC}$, $E_D$ and $E_R$ respectively denote the embedded representations of the query sentence, the query auxiliary sentence, the foreign-language document, and the event role and event type information;
C3, extracting features of the query sentence and the query auxiliary sentence through a Transformer encoder, as shown in the following formula:
[Formula image not available]
where the two outputs are the contextual feature representations of the Chinese query and the foreign-language query auxiliary sentence, respectively; sharing parameters in the encoder reduces the distance between the Chinese and foreign-language semantic spaces;
C4, extracting features of the foreign-language document, as shown in the following formula:
[Formula image not available]
where the output is the context-encoding feature sequence of the foreign-language document;
and C5, extracting the event role information from the text, as shown in the following formula:
[Formula image not available]
where the output represents the context-encoding feature sequence of the foreign-language document.
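Step C1 builds the query auxiliary sentence from a bilingual mapping dictionary, but the claim does not say how words are mapped or how the event type and roles are attached. The sketch below assumes word-by-word lookup over an already segmented query, keeps unmapped tokens unchanged, and appends the event type and roles after a separator. The function name, the `[SEP]` token and the English stand-in for the foreign language are all hypothetical.

```python
def build_auxiliary_sentence(query_tokens, bilingual_dict, event_type, event_roles, sep="[SEP]"):
    """Hypothetical sketch of step C1.

    query_tokens:   segmented Chinese query
    bilingual_dict: maps a Chinese token to its foreign-language translation
    event_type:     event type label T for the query
    event_roles:    event role labels extracted from the text
    Tokens missing from the dictionary are kept as-is (an assumption).
    """
    mapped = [bilingual_dict.get(tok, tok) for tok in query_tokens]
    return mapped + [sep, event_type, sep] + list(event_roles)

toy_dict = {"越南": "Vietnam", "发生": "occurs", "地震": "earthquake"}
print(build_auxiliary_sentence(["越南", "昨天", "发生", "地震"], toy_dict, "Disaster", ["Place"]))
```

The resulting token list would then be embedded (C2) and encoded alongside the Chinese query with shared encoder parameters (C3).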
5. The text query method based on artificial intelligence technology according to claim 4, wherein the specific method for obtaining the cross-language feature representation of the query in step S5 is as follows:
D1, using the feature representation of the foreign-language auxiliary sentence to construct the representation of the j-th word of the Chinese sentence, as shown in the following formula:
[Formula image not available]
D2, applying the Softmax function to the corresponding matching scores $m_{j,i}$ to obtain the attention weights $\alpha_{j,i}$, as shown in the following formula:
$$\alpha_{j,i} = \frac{\exp(m_{j,i})}{\sum_{k}\exp(m_{j,k})};$$
D3, calculating the matching score from the feature vectors $h_i^c$ and $h_j^v$, as shown in the following formula:
[Formula image not available]
where $W \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}$ are attention parameters learned during training.
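Steps D1 to D3 describe cross-lingual attention: score every (Chinese word, foreign word) pair, normalise the scores with Softmax, and rebuild each Chinese word from the foreign-language features. The exact scoring formula is not reproduced; since $W$ is $n \times n$ and $b$ a scalar, the sketch below assumes a bilinear score $m_{j,i} = (h_j^c)^{\top} W h_i^v + b$ with Chinese word $j$ and foreign word $i$. That scoring form and the function name are assumptions.

```python
import numpy as np

def cross_lingual_attention(h_c, h_v, W, b):
    """Sketch of steps D1-D3 under a bilinear-scoring assumption.

    h_c: (len_c, n) contextual features of the Chinese query
    h_v: (len_v, n) contextual features of the foreign-language auxiliary sentence
    W:   (n, n) attention matrix; b: scalar bias
    Returns the attended foreign-language representation for each Chinese word
    and the attention weights alpha[j, i].
    """
    m = h_c @ W @ h_v.T + b                   # m[j, i]: Chinese word j vs foreign word i
    m = m - m.max(axis=1, keepdims=True)      # shift for numerical stability
    e = np.exp(m)
    alpha = e / e.sum(axis=1, keepdims=True)  # Softmax over foreign words (D2)
    return alpha @ h_v, alpha                 # weighted sum of foreign features (D1)

rng = np.random.default_rng(0)
attended, alpha = cross_lingual_attention(
    rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(4, 4)), 0.1
)
print(attended.shape, alpha.shape)  # (2, 4) (2, 3)
```

With $W = 0$ and $b = 0$ every weight becomes uniform and each Chinese word simply receives the mean of the foreign-language features, a quick sanity check on the implementation.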
6. The text query method based on artificial intelligence technology according to claim 5, wherein the specific method of step S6 is as follows:
E1, filtering out useless features in the event role feature information: a sigmoid activation function serves as the gate state, which is multiplied element-wise with the event role features, and a tanh activation function is then applied, so that the gate unit screens the features;
E2, reinforcing the key event information in the document with an attention mechanism: given the event-type embedding vector $t$ and the feature representation of the text, each feature in the text is scored by $t^{\top}$ to perceive the important information in the text, as shown in the following formula:
[Formula image not available]
E3, obtaining the evaluated Chinese document representation, as shown in the following formula:
[Formula image not available]
where the attention vector weights the vector matrix of the Chinese document.
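Steps E1 to E3 combine a feature gate with event-type attention. The sketch below follows the wording of E1 (sigmoid gate, element-wise product with the role features, then tanh) and of E2 (score each document feature with $t^{\top}$). It assumes the gate is computed from the role features through a hypothetical weight matrix `W_g` and bias `b_g`, and that the E2 scores are normalised with Softmax before pooling in E3; neither detail is confirmed by the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_role_features(r, W_g, b_g):
    """E1 sketch: sigmoid gate, element-wise product with the role features, then tanh.

    r: (num_features, n) event role features; W_g: (n, n) and b_g: (n,) are hypothetical.
    """
    g = sigmoid(r @ W_g + b_g)
    return np.tanh(g * r)

def event_type_attention(t, H):
    """E2-E3 sketch: score each document feature h_k by t^T h_k, apply Softmax
    (assumed), and pool the document feature matrix H of shape (num_tokens, n)."""
    scores = H @ t
    scores = scores - scores.max()  # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ H, alpha

rng = np.random.default_rng(1)
filtered = gate_role_features(rng.normal(size=(5, 4)), rng.normal(size=(4, 4)), np.zeros(4))
doc_repr, alpha = event_type_attention(rng.normal(size=4), rng.normal(size=(6, 4)))
print(filtered.shape, doc_repr.shape)  # (5, 4) (4,)
```

Features whose gate value is near zero are pushed towards zero before tanh, which is the "screening" E1 describes; features aligned with the event-type vector $t$ receive larger attention weights in E2.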
7. The text query method based on artificial intelligence technology according to claim 6, wherein the specific method for calculating the matching score of the retrieved document in step S7 is as follows:
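the query representation and the document feature representation are obtained from the event information fusion module, and the score between the query and the document is then calculated by maximum similarity (MaxSim): the score is the sum, over every token representation of the query sentence, of that token's maximum similarity with the token representations of the document, as shown in the following formula:
$$\mathrm{Score}(q,d) = \sum_{i=1}^{|q|} \max_{1 \le j \le |d|} \mathrm{sim}(\mathbf{q}_i, \mathbf{d}_j),$$
where $\mathbf{q}_i$ is the representation of the i-th query token and $\mathbf{d}_j$ that of the j-th document token.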
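The MaxSim score in step S7 is fully determined by its description: for each query token, take the best-matching document token and add up those maxima. The sketch below assumes the dot product as the similarity function (as in ColBERT-style late interaction) and adds a hypothetical `rank_documents` helper to show how the scores would order candidate documents for display in S8.

```python
import numpy as np

def maxsim_score(q, d):
    """Sum over query tokens of each token's maximum dot-product similarity
    with any document token.

    q: (len_q, n) query token representations
    d: (len_d, n) document token representations
    """
    sim = q @ d.T  # sim[i, j] = q_i . d_j
    return float(sim.max(axis=1).sum())

def rank_documents(q, docs):
    """Hypothetical helper: indices of docs sorted by descending MaxSim score."""
    return sorted(range(len(docs)), key=lambda k: maxsim_score(q, docs[k]), reverse=True)

q = np.array([[1.0, 0.0], [0.0, 1.0]])
d1 = np.array([[1.0, 0.0], [0.5, 0.5]])
d2 = np.array([[0.0, 1.0]])
print(maxsim_score(q, d1))             # 1.5
print(rank_documents(q, [d2, d1]))     # [1, 0]
```

Because the maximum is taken per query token, a document scores well only if every query token finds some close counterpart in it, rather than if a few tokens match strongly.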
8. A text query system based on artificial intelligence technology, adapted for use in the text query method based on artificial intelligence technology as claimed in any one of claims 1 to 7, comprising:
the data acquisition module is used for collecting foreign-language text;
the foreign-language text analysis module is used for automatically analyzing the collected foreign-language text;
the Chinese query sentence input module is used for inputting Chinese query sentences;
the Chinese query sentence analysis module is used for performing event type analysis on the input Chinese query sentence;
the embedded characterization module is used for generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;
the cross-language characterization module is used for obtaining the cross-language feature representation of the query;
the event information fusion module is used for fusing the event type information of the query sentence with the foreign-language retrieval documents, and fusing the event role information of the foreign-language retrieval documents with the query sentence;
the interaction ranking module is used for calculating the matching score of each retrieved document;
and the result display module is used for displaying the matched query results for the foreign-language text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310987246.1A CN116719910A (en) | 2023-08-08 | 2023-08-08 | Text query method and system based on artificial intelligence technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116719910A | 2023-09-08 |
Family ID: 87875532
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310987246.1A Pending CN116719910A (en) | 2023-08-08 | 2023-08-08 | Text query method and system based on artificial intelligence technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116719910A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112668319A (en) * | 2020-12-18 | 2021-04-16 | 昆明理工大学 | Vietnamese news event detection method based on Chinese information and Vietnamese statement method guidance |
CN113076398A (en) * | 2021-03-30 | 2021-07-06 | 昆明理工大学 | Cross-language information retrieval method based on bilingual dictionary mapping guidance |
CN114004236A (en) * | 2021-09-18 | 2022-02-01 | 昆明理工大学 | Chinese cross-language news event retrieval method integrated with event entity knowledge |
CN114880434A (en) * | 2022-05-24 | 2022-08-09 | 昆明理工大学 | Knowledge graph information guidance-based chapter-level event role identification method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20230908 |