CN116719910A - Text query method and system based on artificial intelligence technology - Google Patents

Text query method and system based on artificial intelligence technology

Publication number
CN116719910A
Authority
CN
China
Prior art keywords
query
text
information
chinese
representation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310987246.1A
Other languages
Chinese (zh)
Inventor
王祥凯
张凯
刘晓旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Zhengyun Information Technology Co ltd
Original Assignee
Shandong Zhengyun Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Zhengyun Information Technology Co ltd
Priority to CN202310987246.1A
Publication of CN116719910A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/098 Distributed learning, e.g. federated learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a text query method and a text query system based on artificial intelligence technology, relating to the field of text query. The method comprises the following steps: S1, collecting data of related texts; S2, automatically analyzing the collected foreign-language text; S3, analyzing the event type of the input Chinese query sentence. Automatically analyzing the collected foreign-language text effectively improves the model's extraction of event information. Query auxiliary sentences are constructed through a bilingual mapping dictionary, and a shared encoder shortens the distance between the language of the query sentence and that of the retrieval documents. An interactive attention mechanism then aligns the semantics of the query sentence and the query auxiliary sentence, and an information fusion module finally integrates the event information, improving the retrieval model's understanding of text semantics. Analyzing the two languages separately and then fusing them ensures the accuracy of foreign-language document queries.

Description

Text query method and system based on artificial intelligence technology
Technical Field
The application relates to a text query technology, in particular to a text query method and a text query system based on an artificial intelligence technology.
Background
With the spread and development of communication and computer technology, more and more users rely on search engines for retrieval. Built on a natural language understanding platform, a search engine can understand human natural language to a certain extent, extract the key content and use it for searching, so that the text as understood by the search engine closely matches the query the user actually intends. A suitable natural language understanding platform is therefore important for understanding the user's query text accurately.
The prior art includes natural language understanding platforms, for example a training mode for a natural language understanding system based on a crowdsourcing mechanism. Such a system provides a collaborative, interactive platform on which multiple developers jointly supply training data for natural language understanding tasks.
However, in the course of implementing the application, the inventors worked with foreign-language documents and therefore needed to query foreign-language text. Most existing approaches translate the query requirement first and then search the foreign-language text, so differences in translation easily introduce errors into the query.
Disclosure of Invention
The application aims to provide a text query method and a text query system based on artificial intelligence technology, to solve the problem in the prior art that most foreign-language text query approaches translate the query requirement first and then search the foreign-language text, so that differences in translation easily introduce errors.
To achieve the above object, the application provides the following technical solution: a text query method and system based on artificial intelligence technology, the method comprising the following steps:
S1, collecting data of related texts;
S2, automatically analyzing the collected foreign-language text;
S3, analyzing the event type of the input Chinese query sentence;
S4, generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;
S5, obtaining the cross-language feature representation of the query;
S6, fusing the event type information of the query sentence with the foreign-language retrieval document, and fusing the event role information of the foreign-language retrieval document with the query sentence;
S7, calculating the matching scores of the retrieval documents;
and S8, displaying the matching query results for the text.
Further, the specific method for automatically analyzing the collected foreign-language text in step S2 is as follows:
A1, vectorizing the text in two different ways, by pre-trained BERT and by character embedding;
A2, encoding each sentence of the text through a Bi-LSTM network, i.e. sentence-level context encoding;
A3, encoding at the document level through a Transformer encoder;
A4, jointly learning the encoder's vector representation of the text and the knowledge graph information constructed from it, to obtain a text vector representation that incorporates the knowledge graph information;
A5, jointly learning the vector representation that incorporates the knowledge graph information and the output features of the document-level encoding module, to obtain a document-level representation that incorporates the knowledge graph information;
A6, letting the model further capture the key information of the text through a gated fusion mechanism;
and A7, performing event role identification and inference through a CRF layer to complete the identification of the event's role information.
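Step A7 names a CRF layer but does not describe how the role tags are decoded. A common choice for CRF inference is Viterbi decoding; the sketch below assumes that choice. The tag set (O, B-ROLE, I-ROLE), the emission scores and the transition scores are hypothetical values for illustration, not taken from the application.

```python
from typing import List

def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][k]   : score of tag k at position t (from the encoder)
    transitions[a][b] : score of moving from tag a to tag b
    """
    num_tags = len(transitions)
    # best[k] = best score of any path ending in tag k at the current position
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for cur in range(num_tags):
            # choose the previous tag that maximises the path score
            prev = max(range(num_tags),
                       key=lambda p: best[p] + transitions[p][cur])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][cur] + emissions[t][cur])
        best = new_best
        backpointers.append(pointers)
    # follow the back-pointers from the best final tag
    tag = max(range(num_tags), key=lambda k: best[k])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]

# Hypothetical tag set: 0 = O, 1 = B-ROLE, 2 = I-ROLE
TRANSITIONS = [
    [0.0, 0.0, -10.0],   # O -> I-ROLE is strongly penalised
    [0.0, -1.0, 1.0],
    [0.0, 0.0, 0.5],
]
EMISSIONS = [
    [2.0, 0.5, 0.1],
    [0.2, 1.5, 1.6],
    [0.1, 0.3, 2.0],
]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS)  # [0, 1, 2]: O, B-ROLE, I-ROLE
```

The transition scores are what let a CRF rule out invalid sequences such as an I-ROLE tag directly after O, which per-token classification alone would not prevent.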
Further, the specific method for performing event type analysis on the input Chinese query sentence in step S3 is as follows:
B1, first encoding the foreign-language sentence through an encoder to obtain the foreign-language hidden-layer vector representation and the foreign-language sentence-level vector representation;
B2, fusing the Chinese word-level vectors with the foreign-language sentence-level vector;
B3, obtaining the Chinese hidden-layer vectors and the Chinese sentence-level vector through a shared encoder strategy;
B4, jointly learning the foreign-language hidden-layer vectors and the Chinese hidden-layer vectors through a cross-attention network, to obtain a Chinese vector representation fused with foreign-language word-level information;
B5, jointly learning the Chinese vector representation and the Chinese dependency syntax information through a syntax graph convolution module, to obtain a vector representation fused with the dependency syntax information;
and B6, obtaining a Chinese semantic representation based on the foreign-language event type information through a type-aware network, thereby completing Chinese event detection.
Further, the specific method for generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence in step S4 is as follows:
C1, given a Chinese query sentence, a document representation, an event type T and the text event roles, constructing a query auxiliary sentence through the bilingual mapping dictionary;
C2, representing the data as N-dimensional vectors, as shown in the following formula:
[formula missing]
where $E_{QV}$, $E_{QC}$, $E_{D}$ and $E_{R}$ respectively denote the embedded representations of the query sentence, the query auxiliary sentence, the foreign-language document, and the event role and event type information;
C3, extracting feature representations of the query sentence and the query auxiliary sentence through a Transformer encoder, as shown in the following formula:
[formula missing]
where the two outputs respectively denote the contextual feature representations of the Chinese query and of the foreign-language query auxiliary sentence; sharing the encoder's parameters reduces the distance between the semantic spaces of Chinese and the foreign language;
C4, extracting the features of the foreign-language document, as shown in the following formula:
[formula missing]
where the output denotes the context-encoded feature sequence of the foreign-language document;
and C5, extracting the event role information in the text, as shown in the following formula:
[formula missing]
where the output denotes the context-encoded feature sequence of the foreign-language document.
Further, the specific method for obtaining the cross-language feature representation of the query in step S5 is as follows:
D1, using the feature representation of the foreign-language auxiliary sentence to construct a representation of the j-th word of the Chinese sentence, as shown in the following formula:
[formula missing]
D2, applying the Softmax function to the corresponding matching scores $m_{j,i}$ to obtain the attention weights $\alpha_{j,i}$, as shown in the following formula:
[formula missing]
D3, calculating the matching score from the feature vectors $h_i^c$ and $h_j^v$, as shown in the following formula:
[formula missing]
where $W \in R^{n \times n}$ and $b \in R$ are attention parameters learned during training.
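The formulas for D1 to D3 do not survive in this text. One reading consistent with the description (a bilinear matching score with parameters $W$ and $b$, a Softmax over the scores, and a weighted sum of the auxiliary-sentence features) is sketched below. It is an assumption, not the application's own formula. In particular, the index convention (j for Chinese words, i for auxiliary-sentence words) follows D1 and D2, the bilinear form is inferred only from the shapes of $W$ and $b$, and $\tilde{h}^{c}_{j}$ is a hypothetical symbol for the constructed representation.

```latex
m_{j,i} = (h^{c}_{j})^{\top} W \, h^{v}_{i} + b, \qquad
\alpha_{j,i} = \frac{\exp(m_{j,i})}{\sum_{k} \exp(m_{j,k})}, \qquad
\tilde{h}^{c}_{j} = \sum_{i} \alpha_{j,i} \, h^{v}_{i}
```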
Further, the specific method of step S6 is as follows:
E1, filtering useless features out of the event role feature information: a sigmoid activation function serves as the gating state, which is multiplied element-wise with the event role features and then passed through a tanh activation function, so that the gate unit screens the features;
E2, reinforcing the key event information in the document with an attention mechanism: an embedding vector $t$ is obtained from the event type together with a feature representation of the text, and $t^{T}$ is used to score each feature in the text so as to perceive the important information in the text, as shown in the following formula:
[formula missing]
E3, obtaining the Chinese document representation after evaluation, as shown in the following formula:
[formula missing]
where the two quantities in the formula are, respectively, the attention vector and the vector matrix of the Chinese document.
Further, the specific method for calculating the matching scores of the retrieval documents in step S7, carried out by the interaction sequencing module, is as follows:
The query representation and the feature representation of the document are obtained from the event information fusion module, and the score is then calculated by maximum similarity (MaxSim). The Score between the query and a document is the sum, over the token representations of the query sentence, of the maximum similarity between that token and the token representations of the document, as shown in the following formula:
[formula missing]
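The MaxSim formula itself is not reproduced in this text, but the verbal definition above (a sum over query tokens of the maximum similarity to any document token) corresponds to the following form. The symbols $q_i$ and $d_j$ for the token representations, and the use of a dot product as the similarity, are assumptions made here for illustration.

```latex
\mathrm{Score}(q, d) = \sum_{i=1}^{|q|} \; \max_{1 \le j \le |d|} \; q_i \cdot d_j
```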
A text query system based on artificial intelligence technology, comprising:
the data acquisition module, used for collecting foreign-language text;
the foreign-language text analysis module, used for automatically analyzing the collected foreign-language text;
the Chinese query sentence input module, used for inputting the Chinese query sentence;
the Chinese query sentence analysis module, used for analyzing the event type of the input Chinese query sentence;
the embedded characterization module, used for generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;
the cross-language characterization module, used for obtaining the cross-language feature representation of the query;
the event information fusion module, used for fusing the event type information of the query sentence with the foreign-language retrieval document and fusing the event role information of the foreign-language retrieval document with the query sentence;
the interaction sequencing module, used for calculating the matching scores of the retrieval documents;
and the result display module, used for displaying the matching query results for the foreign-language text.
Compared with the prior art, the text query method and system based on artificial intelligence technology provided by the application effectively improve the model's extraction of event information by automatically analyzing the collected foreign-language text. Query auxiliary sentences are constructed through a bilingual mapping dictionary, and a shared encoder shortens the distance between the language of the query sentence and that of the retrieval documents. An interactive attention mechanism aligns the semantics of the query sentence and the query auxiliary sentence, and an information fusion module finally integrates the event information, improving the retrieval model's understanding of text semantics. Analyzing the two languages separately and then fusing them ensures the accuracy of foreign-language document queries.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.
Fig. 1 is a schematic diagram of an overall structure according to an embodiment of the present application.
Detailed Description
In order to make the technical scheme of the present application better understood by those skilled in the art, the present application will be further described in detail with reference to the accompanying drawings.
Embodiment one:
Referring to Fig. 1, a text query method and system based on artificial intelligence technology, the method comprising the following steps:
S1, collecting data of related texts;
S2, automatically analyzing the collected foreign-language text;
S3, analyzing the event type of the input Chinese query sentence;
S4, generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;
S5, obtaining the cross-language feature representation of the query;
S6, fusing the event type information of the query sentence with the foreign-language retrieval document, and fusing the event role information of the foreign-language retrieval document with the query sentence;
S7, calculating the matching scores of the retrieval documents;
and S8, displaying the matching query results for the text.
Automatically analyzing the collected foreign-language text effectively improves the model's extraction of event information. Query auxiliary sentences are built through a bilingual mapping dictionary, and a shared encoder shortens the distance between the language of the query sentence and that of the retrieval documents. An interactive attention mechanism aligns the semantics of the query sentence and the query auxiliary sentence, and an information fusion module finally fuses the event information, improving the retrieval model's understanding of text semantics. Analyzing the two languages separately and then fusing them ensures the accuracy of foreign-language document queries.
The specific method for automatically analyzing the collected foreign-language text in step S2 is as follows:
A1, vectorizing the text in two different ways, by pre-trained BERT and by character embedding;
A2, encoding each sentence of the text through a Bi-LSTM network, i.e. sentence-level context encoding;
A3, encoding at the document level through a Transformer encoder;
A4, jointly learning the encoder's vector representation of the text and the knowledge graph information constructed from it, to obtain a text vector representation that incorporates the knowledge graph information;
A5, jointly learning the vector representation that incorporates the knowledge graph information and the output features of the document-level encoding module, to obtain a document-level representation that incorporates the knowledge graph information;
A6, letting the model further capture the key information of the text through a gated fusion mechanism;
and A7, performing event role identification and inference through a CRF layer to complete the identification of the event's role information.
The system comprises an embedded representation fusion module, an encoder module, a graph convolution module and a fusion reasoning module.
1. Embedded representation fusion module: the text is vectorized in two different ways, by pre-trained BERT and by character embedding. Using two different representations gives the text a richer characterization.
2. Encoder module: mainly comprises a sentence-level encoder module and a document-level encoder module.
1) Sentence-level encoder module: each sentence of the text is encoded by a Bi-LSTM network; sentence-level context encoding lets the model learn fine-grained semantic information.
2) Document-level encoder module: encoding with a Transformer encoder lets the model learn semantic information across the sentences of the text and capture its deeper semantics.
3. Graph convolution module: the encoder's vector representation of the text and the knowledge graph information constructed from it are jointly learned to obtain a text vector representation that incorporates the knowledge graph information.
4. Fusion reasoning module: the vector representation that incorporates the knowledge graph information and the output features of the document-level encoding module are jointly learned to obtain a document-level representation that incorporates the knowledge graph information; this is then combined with the output of the sentence-level encoding module through a gated fusion mechanism so that the model further captures the key information of the text. Finally, event role identification and inference is performed through a CRF layer to complete the identification of the event's role information.
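The gated fusion between the document-level representation and the sentence-level output is not specified further. A minimal sketch of one common form follows, assuming an element-wise sigmoid gate that interpolates between the two vectors. The per-dimension weights `w_sent`, `w_doc` and `bias` are hypothetical learned parameters; a real model would more likely use full weight matrices.

```python
import math
from typing import List

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(sent_vec: List[float], doc_vec: List[float],
                 w_sent: List[float], w_doc: List[float],
                 bias: List[float]) -> List[float]:
    """Element-wise gate: g = sigmoid(w_s*s + w_d*d + b); out = g*s + (1-g)*d."""
    fused = []
    for s, d, ws, wd, b in zip(sent_vec, doc_vec, w_sent, w_doc, bias):
        g = sigmoid(ws * s + wd * d + b)        # how much sentence-level signal to keep
        fused.append(g * s + (1.0 - g) * d)     # the rest comes from the document level
    return fused

# With zero parameters the gate is 0.5 everywhere, so the output is the plain average.
averaged = gated_fusion([1.0, 3.0], [3.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])

# A large positive bias pushes the gate towards 1, so the sentence-level vector dominates.
sentence_only = gated_fusion([1.0, 3.0], [3.0, 1.0], [0.0, 0.0], [0.0, 0.0], [50.0, 50.0])
```

The gate lets training decide, per dimension, whether fine-grained sentence context or document-level context carries the key information.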
The specific method for performing event type analysis on the input Chinese query sentence in step S3 is as follows:
B1, first encoding the foreign-language sentence through an encoder to obtain the foreign-language hidden-layer vector representation and the foreign-language sentence-level vector representation;
B2, fusing the Chinese word-level vectors with the foreign-language sentence-level vector;
B3, obtaining the Chinese hidden-layer vectors and the Chinese sentence-level vector through a shared encoder strategy;
B4, jointly learning the foreign-language hidden-layer vectors and the Chinese hidden-layer vectors through a cross-attention network, to obtain a Chinese vector representation fused with foreign-language word-level information;
B5, jointly learning the Chinese vector representation and the Chinese dependency syntax information through a syntax graph convolution module, to obtain a vector representation fused with the dependency syntax information;
and B6, obtaining a Chinese semantic representation based on the foreign-language event type information through a type-aware network, thereby completing Chinese event detection.
Bilingual information fusion module: mainly comprises two network modules, a shared encoder network and a cross-attention network. The shared encoder network first encodes the foreign-language sentence through an encoder to obtain the foreign-language hidden-layer vector representation and the foreign-language sentence-level vector representation, then fuses the Chinese word-level vectors with the foreign-language sentence-level vector, and then obtains the Chinese hidden-layer vectors and the Chinese sentence-level vector through the shared encoder strategy. The cross-attention network jointly learns the foreign-language hidden-layer vectors and the Chinese hidden-layer vectors to obtain a Chinese vector representation fused with foreign-language word-level information. The syntax graph convolution module jointly learns the Chinese vector representation and the Chinese dependency syntax information to obtain a vector representation fused with the dependency syntax information. Finally, the event type-aware network in the event detector produces a Chinese semantic representation based on the foreign-language event type information, thereby completing Chinese event detection.
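The syntax graph convolution module is described only at a high level. As a hedged illustration, the sketch below shows one standard graph-convolution layer over a dependency graph, in which each token's vector becomes the row-normalised average of itself and its syntactic neighbours, followed by a linear map and ReLU. The function names, the undirected treatment of dependency edges and the toy inputs are assumptions, not details from the application.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(num_tokens: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Undirected dependency graph with self-loops, row-normalised."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_tokens)]
           for i in range(num_tokens)]
    for head, dep in edges:
        adj[head][dep] = 1.0
        adj[dep][head] = 1.0
    return [[v / sum(row) for v in row] for row in adj]

def graph_conv(hidden: Matrix, adj: Matrix, weight: Matrix) -> Matrix:
    """One layer: ReLU(A_norm @ H @ W)."""
    mixed = matmul(matmul(adj, hidden), weight)
    return [[max(0.0, v) for v in row] for row in mixed]

# Toy example: 3 tokens, token 1 is the syntactic head of tokens 0 and 2.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = normalized_adjacency(3, [(1, 0), (1, 2)])
W = [[1.0, 0.0], [0.0, 1.0]]  # identity, so the output is just the neighbourhood average
out = graph_conv(H, A, W)
```

With the identity weight matrix, `out[0]` is the average of tokens 0 and 1, which makes it easy to see how dependency edges control which tokens exchange information.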
The specific method for generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence in step S4 is as follows:
C1, given a Chinese query sentence, a document representation, an event type T and the text event roles, constructing a query auxiliary sentence through the bilingual mapping dictionary;
C2, representing the data as N-dimensional vectors, as shown in the following formula:
[formula missing]
where $E_{QV}$, $E_{QC}$, $E_{D}$ and $E_{R}$ respectively denote the embedded representations of the query sentence, the query auxiliary sentence, the foreign-language document, and the event role and event type information;
C3, extracting feature representations of the query sentence and the query auxiliary sentence through a Transformer encoder, as shown in the following formula:
[formula missing]
where the two outputs respectively denote the contextual feature representations of the Chinese query and of the foreign-language query auxiliary sentence; sharing the encoder's parameters reduces the distance between the semantic spaces of Chinese and the foreign language;
C4, extracting the features of the foreign-language document, as shown in the following formula:
[formula missing]
where the output denotes the context-encoded feature sequence of the foreign-language document;
and C5, extracting the event role information in the text, as shown in the following formula:
[formula missing]
where the output denotes the context-encoded feature sequence of the foreign-language document.
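Step C1 builds the query auxiliary sentence from a bilingual mapping dictionary. A minimal sketch of a word-by-word lookup is given below. The dictionary entries, the choice of English as the foreign language, and the `[UNK]` placeholder for missing words are all assumptions for illustration; the application does not say how out-of-dictionary words are handled.

```python
from typing import Dict, List

def build_auxiliary_sentence(query_tokens: List[str],
                             mapping: Dict[str, str],
                             unknown: str = "[UNK]") -> List[str]:
    """Map each Chinese query token to its dictionary translation.

    Tokens missing from the dictionary are replaced by a placeholder;
    that choice is an assumption made for this sketch.
    """
    return [mapping.get(tok, unknown) for tok in query_tokens]

# Hypothetical toy dictionary (Chinese -> English); real entries would come
# from the bilingual mapping dictionary mentioned in step C1.
TOY_DICT = {"地震": "earthquake", "发生": "occur", "日本": "Japan"}

aux = build_auxiliary_sentence(["日本", "发生", "地震"], TOY_DICT)
# aux == ["Japan", "occur", "earthquake"]
```

The auxiliary sentence keeps the word order of the query rather than producing a fluent translation, which is why the later attention step is needed to align it with the query.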
The specific method for obtaining the cross-language feature representation of the query in step S5 is as follows:
D1, using the feature representation of the foreign-language auxiliary sentence to construct a representation of the j-th word of the Chinese sentence, as shown in the following formula:
[formula missing]
D2, applying the Softmax function to the corresponding matching scores $m_{j,i}$ to obtain the attention weights $\alpha_{j,i}$, as shown in the following formula:
[formula missing]
D3, calculating the matching score from the feature vectors $h_i^c$ and $h_j^v$, as shown in the following formula:
[formula missing]
where $W \in R^{n \times n}$ and $b \in R$ are attention parameters learned during training.
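Steps D1 to D3 can be sketched as follows, assuming a bilinear matching score (suggested by the shapes of W and b), a Softmax over the scores for one Chinese word, and a weighted sum of the auxiliary-sentence features. The function names and toy vectors are hypothetical; the exact formulas are not available in this text.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def bilinear_score(h_c: Vector, W: Matrix, h_v: Vector, b: float) -> float:
    """Matching score m = h_c^T W h_v + b (assumed form)."""
    Wh = [sum(w * x for w, x in zip(row, h_v)) for row in W]
    return sum(c * y for c, y in zip(h_c, Wh)) + b

def softmax(scores: Vector) -> Vector:
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_language_repr(h_c_j: Vector, aux_feats: Matrix,
                        W: Matrix, b: float) -> Tuple[Vector, Vector]:
    """Attention weights over auxiliary words and the resulting weighted sum."""
    scores = [bilinear_score(h_c_j, W, h_v, b) for h_v in aux_feats]
    alphas = softmax(scores)
    dim = len(aux_feats[0])
    context = [sum(a * h_v[k] for a, h_v in zip(alphas, aux_feats))
               for k in range(dim)]
    return alphas, context

# Toy example: one Chinese word vector and two auxiliary-sentence word vectors.
W_ID = [[1.0, 0.0], [0.0, 1.0]]
alphas, context = cross_language_repr([1.0, 0.0],
                                      [[1.0, 0.0], [0.0, 1.0]],
                                      W_ID, 0.0)
```

In the toy example the first auxiliary word matches the Chinese word more closely, so it receives the larger attention weight and dominates the constructed representation.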
The specific method of step S6 is as follows:
E1, filtering useless features out of the event role feature information: a sigmoid activation function serves as the gating state, which is multiplied element-wise with the event role features and then passed through a tanh activation function, so that the gate unit screens the features;
E2, reinforcing the key event information in the document with an attention mechanism: an embedding vector $t$ is obtained from the event type together with a feature representation of the text, and $t^{T}$ is used to score each feature in the text so as to perceive the important information in the text, as shown in the following formula:
[formula missing]
E3, obtaining the Chinese document representation after evaluation, as shown in the following formula:
[formula missing]
where the two quantities in the formula are, respectively, the attention vector and the vector matrix of the Chinese document.
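The gate in E1 and the type-aware attention in E2 and E3 can be illustrated with the sketch below. It assumes that the gate's pre-activation values are supplied by some learned layer (here passed in directly as `gate_logits`), and that the scores from $t^{T}$ are normalised with a Softmax before pooling. Both are assumptions, since the formulas are not reproduced in this text.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def gate_filter(role_feats: Vector, gate_logits: Vector) -> Vector:
    """E1: tanh(sigmoid(z) * r), applied element-wise."""
    return [math.tanh((1.0 / (1.0 + math.exp(-z))) * r)
            for r, z in zip(role_feats, gate_logits)]

def softmax(scores: Vector) -> Vector:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def type_aware_pool(type_vec: Vector, token_feats: Matrix) -> Tuple[Vector, Vector]:
    """E2/E3: score each token feature with t^T h, then take the weighted sum."""
    scores = [sum(t * h for t, h in zip(type_vec, feat)) for feat in token_feats]
    weights = softmax(scores)
    pooled = [sum(w * feat[k] for w, feat in zip(weights, token_feats))
              for k in range(len(type_vec))]
    return weights, pooled

# Toy values: a half-open gate, and an event-type vector aligned with the first token.
filtered = gate_filter([1.0, -1.0], [0.0, 0.0])
weights, pooled = type_aware_pool([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
```

The token whose features align with the event-type vector receives the larger weight, so the pooled document representation leans towards event-relevant content.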
The specific method for calculating the matching scores of the retrieval documents in step S7 is as follows:
The query representation and the feature representation of the document are obtained from the event information fusion module, and the score is then calculated by maximum similarity (MaxSim). The Score between the query and a document is the sum, over the token representations of the query sentence, of the maximum similarity between that token and the token representations of the document, as shown in the following formula:
[formula missing]
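The MaxSim computation described above can be sketched in a few lines. The similarity measure is assumed here to be a dot product between token vectors, which the source does not state; the toy vectors are made up for illustration.

```python
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Sum over query tokens of the best similarity to any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Toy example: two query token vectors scored against three document token vectors.
QUERY = [[1.0, 0.0], [0.0, 1.0]]
DOC = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
score = maxsim_score(QUERY, DOC)  # 0.9 (first query token) + 0.8 (second) = 1.7
```

Because each query token only contributes its best match, a long document is not penalised for containing tokens unrelated to the query.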
Embodiment two:
A text query system based on artificial intelligence technology, comprising:
the data acquisition module, used for collecting foreign-language text;
the foreign-language text analysis module, used for automatically analyzing the collected foreign-language text;
the Chinese query sentence input module, used for inputting the Chinese query sentence;
the Chinese query sentence analysis module, used for analyzing the event type of the input Chinese query sentence;
the embedded characterization module, used for generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;
the cross-language characterization module, used for obtaining the cross-language feature representation of the query;
the event information fusion module, used for fusing the event type information of the query sentence with the foreign-language retrieval document and fusing the event role information of the foreign-language retrieval document with the query sentence;
the interaction sequencing module, used for calculating the matching scores of the retrieval documents;
and the result display module, used for displaying the matching query results for the foreign-language text.
Working principle: in use, the foreign-language text analysis module automatically analyzes the collected foreign-language text; the Chinese query sentence analysis module analyzes the event type of the input Chinese query sentence; the embedded characterization module generates the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence; the cross-language characterization module obtains the cross-language feature representation of the query; the event information fusion module fuses the event type information of the query sentence with the foreign-language retrieval document and fuses the event role information of the foreign-language retrieval document with the query sentence; the interaction sequencing module calculates the matching scores of the retrieval documents; and the result display module displays the matching query results for the foreign-language text.
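The order in which the modules hand results to one another can be sketched as a simple pipeline. Everything below is a hypothetical skeleton: the stage names mirror the modules listed in this embodiment, but the `stub` modules only mark that an output was produced and stand in for the real models.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Stage = Tuple[str, Callable[[State], State]]

def run_pipeline(stages: List[Stage], state: State) -> State:
    """Apply each module in order, recording which modules ran."""
    state = dict(state, trace=[])
    for name, module in stages:
        state = module(state)
        state["trace"].append(name)
    return state

def stub(output_key: str) -> Callable[[State], State]:
    """Hypothetical placeholder module: marks its output key as produced."""
    def module(state: State) -> State:
        new_state = dict(state)
        new_state[output_key] = f"<{output_key}>"
        return new_state
    return module

STAGES: List[Stage] = [
    ("foreign_text_analysis", stub("event_roles")),
    ("query_analysis", stub("event_type")),
    ("embedded_characterization", stub("auxiliary_sentence")),
    ("cross_language_characterization", stub("cross_language_features")),
    ("event_information_fusion", stub("fused_features")),
    ("interaction_sequencing", stub("scores")),
    ("result_display", stub("results")),
]

result = run_pipeline(STAGES, {"query": "中文查询", "documents": []})
```

Keeping each module behind the same state-in, state-out interface means an individual stage, such as the scoring step, can be replaced without touching the others.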
While certain exemplary embodiments of the present application have been described above by way of illustration only, it will be apparent to those of ordinary skill in the art that modifications may be made to the described embodiments in various different ways without departing from the spirit and scope of the application. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive of the scope of the application, which is defined by the appended claims.

Claims (8)

1. A text query method based on artificial intelligence technology, characterized by comprising the following steps:
S1, collecting data of related texts;
S2, automatically analyzing the collected foreign-language text;
S3, performing event type analysis on the input Chinese query sentence;
S4, generating a corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;
S5, obtaining a cross-language feature representation of the query;
S6, fusing the event type information of the query sentence with the foreign-language retrieval document, and fusing the event role information of the foreign-language retrieval document with the query sentence;
S7, calculating the matching score of the retrieved document;
S8, displaying the matching query result of the text.
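The ordering of steps S2 to S8 can be sketched as a simple sequential pipeline. Every stage body below is a hypothetical placeholder that only records a note in a shared context; the names `make_stage`, `STAGES`, and `run_pipeline`, as well as the context keys, are illustrative assumptions and do not come from the claims.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]
Stage = Tuple[str, Callable[[Context], Context]]


def make_stage(key: str, note: str) -> Callable[[Context], Context]:
    """Build a placeholder stage that stores a note under `key`."""
    def stage(context: Context) -> Context:
        updated = dict(context)  # copy so earlier states stay untouched
        updated[key] = note
        return updated
    return stage


# Placeholder stages in the order given by steps S2 to S8 (assumed keys).
STAGES: List[Stage] = [
    ("S2", make_stage("document_roles", "event roles of the foreign-language text")),
    ("S3", make_stage("query_event_type", "event type of the Chinese query")),
    ("S4", make_stage("auxiliary_sentence", "foreign-language auxiliary sentence")),
    ("S5", make_stage("cross_language_features", "cross-language query features")),
    ("S6", make_stage("fused_features", "fused event type and role information")),
    ("S7", make_stage("matching_score", "score of each retrieved document")),
    ("S8", make_stage("display", "ranked results shown to the user")),
]


def run_pipeline(stages: List[Stage], context: Context) -> Tuple[Context, List[str]]:
    """Apply each stage in order and record which steps ran."""
    trace: List[str] = []
    for name, stage in stages:
        context = stage(context)
        trace.append(name)
    return context, trace


result, trace = run_pipeline(STAGES, {"query": "地震 伤亡"})
```

Each stage only sees what earlier stages produced, which mirrors the dependency of S6 and S7 on the outputs of S2 to S5.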
2. The text query method based on artificial intelligence technology according to claim 1, wherein the specific method for automatically analyzing the collected foreign-language text in step S2 is as follows:
A1, vectorizing the text in two different ways, pre-trained BERT and character embedding;
A2, encoding each sentence of the text through a Bi-LSTM network to obtain a sentence-level context encoding;
A3, encoding through a Transformer encoder;
A4, performing joint learning on the encoder-side vector representation of the text and the knowledge graph information constructed from that information, to obtain a text vector representation that integrates the knowledge graph information;
A5, performing joint learning on the vector representation that integrates the knowledge graph information and the output features of the document-level encoding module, to obtain a document-level representation that integrates the knowledge graph information;
A6, enabling the model to further capture the key information of the text through a gating fusion mechanism;
A7, performing event role recognition and inference through a CRF layer to complete the identification of the event role information.
3. The text query method based on artificial intelligence technology according to claim 2, wherein the specific method for performing event type analysis on the input foreign-language query sentence in step S3 is as follows:
B1, first encoding a foreign-language sentence through an encoder to obtain a foreign-language hidden-layer vector representation and a foreign-language sentence-level vector representation;
B2, fusing the Chinese word-level vector with the foreign-language sentence-level vector;
B3, obtaining a Chinese hidden-layer vector and a Chinese sentence-level vector through a shared-encoder strategy;
B4, performing joint learning on the obtained foreign-language hidden-layer vector and the Chinese hidden-layer vector through a cross-attention network, to obtain a Chinese vector representation fused with foreign-language word-level information;
B5, performing joint learning on the Chinese vector representation and the Chinese dependency syntax information through a syntactic graph convolution module, to obtain a vector representation fused with the dependency syntax information;
B6, realizing a Chinese semantic representation based on foreign-language event type information through a type-aware network, and completing Chinese event detection.
4. The text query method based on artificial intelligence technology according to claim 1, wherein the specific method for generating the corresponding foreign-language auxiliary sentence representation for the foreign-language query sentence in step S4 is as follows:
C1, given a Chinese query sentence, a document representation, an event type T and the text event roles, constructing a query auxiliary sentence through a bilingual mapping dictionary;
C2, representing the data as N-dimensional vectors, as shown in the following formula:
where $E_{QV}$, $E_{QC}$, $E_D$ and $E_R$ respectively denote the embedded representations of the query sentence, the query auxiliary sentence, the foreign-language document, and the event role information and event type information;
C3, performing feature extraction on the query sentence and the query auxiliary sentence through a Transformer encoder, as shown in the following formula:
where the two outputs denote the contextual feature representations of the Chinese query and the foreign-language query auxiliary sentence, respectively; sharing parameters at the encoder reduces the distance between the Chinese semantic space and the foreign-language semantic space;
C4, extracting the features of the foreign-language document, as shown in the following formula:
where the output denotes the context-encoding feature sequence of the foreign-language document;
C5, extracting the event role information in the text, as shown in the following formula:
where the output denotes a context-encoding feature sequence of the foreign-language document.
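Step C1 can be illustrated with a minimal sketch. The dictionary entries, the use of English as a stand-in foreign language, the pre-segmented token input, and the function name `build_auxiliary_sentence` are all assumptions made for illustration; the claim does not specify how the bilingual mapping dictionary is stored or how words without an entry are handled (here they are kept unchanged).

```python
from typing import Dict, List

# Hypothetical Chinese -> foreign-language mapping dictionary
# (English is used here only as a stand-in foreign language).
BILINGUAL_DICT: Dict[str, str] = {
    "地震": "earthquake",
    "发生": "occur",
    "伤亡": "casualties",
}


def build_auxiliary_sentence(query_tokens: List[str],
                             mapping: Dict[str, str]) -> List[str]:
    """Map each pre-segmented Chinese token to its dictionary translation.

    Tokens without a dictionary entry are kept unchanged. This fallback is
    an assumption; the claim does not describe out-of-dictionary handling.
    """
    return [mapping.get(token, token) for token in query_tokens]


query = ["地震", "发生", "在", "云南"]
auxiliary = build_auxiliary_sentence(query, BILINGUAL_DICT)
```

The resulting token list would then be embedded and encoded alongside the original query, as steps C2 and C3 describe.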
5. The text query method based on artificial intelligence technology according to claim 4, wherein the specific method for obtaining the cross-language feature representation of the query in step S5 is as follows:
D1, using the feature representation of the foreign-language auxiliary sentence to construct a representation of the j-th word of the Chinese sentence, as shown in the following formula:
D2, obtaining the attention weight $\alpha_{j,i}$ by applying the Softmax function to the corresponding matching score $m_{j,i}$, as shown in the following formula:
D3, calculating the matching score based on the feature vectors $h_i^c$ and $h_j^v$, as shown in the following formula:
where $W \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}$ are attention parameters learned during training.
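The attention computation in steps D1 to D3 can be sketched as follows. The bilinear form of the matching score (Chinese feature times W times auxiliary feature, plus b) and the index convention (j for Chinese words, i for auxiliary-sentence words) are assumptions, since the formulas themselves are not reproduced in the text; only the shapes of W and b and the use of Softmax come from the claim.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(W: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in W]


def softmax(scores: Vector) -> Vector:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_language_attention(h_c: Matrix, h_v: Matrix, W: Matrix,
                             b: float) -> Tuple[Matrix, Matrix]:
    """Return one attended vector and one weight row per Chinese word."""
    contexts: Matrix = []
    weights: Matrix = []
    for h_cj in h_c:
        # Assumed bilinear matching score between word j and auxiliary word i.
        m_j = [dot(h_cj, matvec(W, h_vi)) + b for h_vi in h_v]
        alpha_j = softmax(m_j)
        # Weighted sum of auxiliary-sentence features.
        context = [sum(a * h_vi[k] for a, h_vi in zip(alpha_j, h_v))
                   for k in range(len(h_v[0]))]
        contexts.append(context)
        weights.append(alpha_j)
    return contexts, weights


h_c = [[1.0, 0.0], [0.0, 1.0]]               # two Chinese word features
h_v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three auxiliary word features
W = [[1.0, 0.0], [0.0, 1.0]]                 # identity, for illustration only
contexts, weights = cross_language_attention(h_c, h_v, W, 0.0)
```

With an identity W, each Chinese word attends most strongly to the auxiliary words that share its direction in feature space.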
6. The text query method based on artificial intelligence technology according to claim 5, wherein the specific method of step S6 is as follows:
E1, filtering useless features out of the event role feature information: a sigmoid activation function serves as the gating state, which is multiplied element-wise with the event role features and then passed through a tanh activation function, so that the gate unit screens the features;
E2, reinforcing key event information in the document with an attention mechanism: an embedded vector $t$ is obtained according to the event type, together with a feature representation of the text, and each feature in the text is scored through $t^T$ to perceive the important information in the text, as shown in the following formula:
E3, obtaining the evaluated Chinese document representation, as shown in the following formula:
where the symbols denote, respectively, the attention vector and the vector matrix of the Chinese document.
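Steps E1 to E3 can be sketched under stated assumptions. The claim says that a sigmoid gate is multiplied element-wise with the event role features and followed by tanh, and that each text feature is scored through the transposed event type vector. The per-dimension gate parameters, the Softmax normalization of the scores, and the weighted-sum pooling below are assumptions, as are the names `gate_filter` and `type_aware_pooling`.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gate_filter(role_feature: Vector, gate_weight: Vector,
                gate_bias: Vector) -> Vector:
    """E1: screen event role features with a sigmoid gate followed by tanh."""
    gate = [sigmoid(w * x + b)
            for w, x, b in zip(gate_weight, role_feature, gate_bias)]
    return [math.tanh(g * x) for g, x in zip(gate, role_feature)]


def type_aware_pooling(t: Vector,
                       features: List[Vector]) -> Tuple[Vector, Vector]:
    """E2 and E3: score each feature with t^T, normalize, then pool."""
    scores = [sum(ti * fi for ti, fi in zip(t, feature))
              for feature in features]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    pooled = [sum(a * feature[k] for a, feature in zip(attention, features))
              for k in range(len(t))]
    return pooled, attention


filtered = gate_filter([2.0, -1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
pooled, attention = type_aware_pooling([1.0, 0.0], [[3.0, 0.0], [0.0, 3.0]])
```

The feature aligned with the event type vector receives most of the attention mass, so it dominates the pooled document representation.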
7. The text query method based on artificial intelligence technology according to claim 6, wherein the specific method for calculating the matching score of the retrieved document in step S7 is as follows:
obtaining the query representation and the feature representation of the document from the event information fusion module, and then calculating by maximum similarity (MaxSim): from these two representations, a score between the query and the document can be calculated as the sum, over the token representations of the query sentence, of the maximum similarity between each query token representation and the token representations of the document, as shown in the following formula:
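The MaxSim scoring described in claim 7 can be sketched as follows. The use of the dot product as the similarity function and the names `maxsim_score` and `rank_documents` are assumptions; the claim only states that the score is the sum, over query tokens, of the maximum similarity with the document tokens.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector],
                 doc_tokens: List[Vector]) -> float:
    """Sum over query tokens of the best-matching document token similarity."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rank_documents(query_tokens: List[Vector],
                   documents: Dict[str, List[Vector]]) -> List[Tuple[str, float]]:
    """Score every document and sort from highest to lowest score."""
    scored = [(doc_id, maxsim_score(query_tokens, tokens))
              for doc_id, tokens in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy token representations, for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
documents = {
    "doc_a": [[1.0, 0.0], [0.5, 0.5]],
    "doc_b": [[0.0, 1.0], [1.0, 0.0]],
}
ranking = rank_documents(query, documents)
```

In this toy example `doc_b` matches both query tokens exactly and therefore ranks above `doc_a`, which is the ordering the result display step would present.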
8. A text query system based on artificial intelligence technology, adapted for the text query method based on artificial intelligence technology according to any one of claims 1 to 7, comprising:
a data acquisition module, used for collecting foreign-language text;
a foreign-language text analysis module, used for automatically analyzing the collected foreign-language text;
a Chinese query sentence input module, used for inputting a Chinese query sentence;
a Chinese query sentence analysis module, used for performing event type analysis on the input Chinese query sentence;
an embedded characterization module, used for generating the corresponding foreign-language auxiliary sentence representation for the Chinese query sentence;
a cross-language characterization module, used for obtaining the cross-language feature representation of the query;
an event information fusion module, used for fusing the event type information of the query sentence with the foreign-language retrieval document and fusing the event role information of the foreign-language retrieval document with the query sentence;
an interactive ranking module, used for calculating the matching score of the retrieved document;
and a result display module, used for displaying the matching query results of the foreign-language text.
CN202310987246.1A 2023-08-08 2023-08-08 Text query method and system based on artificial intelligence technology Pending CN116719910A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310987246.1A CN116719910A (en) 2023-08-08 2023-08-08 Text query method and system based on artificial intelligence technology

Publications (1)

Publication Number Publication Date
CN116719910A true CN116719910A (en) 2023-09-08

Family

ID=87875532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310987246.1A Pending CN116719910A (en) 2023-08-08 2023-08-08 Text query method and system based on artificial intelligence technology

Country Status (1)

Country Link
CN (1) CN116719910A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668319A (en) * 2020-12-18 2021-04-16 昆明理工大学 Vietnamese news event detection method based on Chinese information and Vietnamese syntax guidance
CN113076398A (en) * 2021-03-30 2021-07-06 昆明理工大学 Cross-language information retrieval method based on bilingual dictionary mapping guidance
CN114004236A (en) * 2021-09-18 2022-02-01 昆明理工大学 Chinese cross-language news event retrieval method integrated with event entity knowledge
CN114880434A (en) * 2022-05-24 2022-08-09 昆明理工大学 Knowledge graph information guidance-based chapter-level event role identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230908
