CN112232086A - Semantic recognition method and device, computer equipment and storage medium - Google Patents

Semantic recognition method and device, computer equipment and storage medium

Info

Publication number
CN112232086A
CN112232086A
Authority
CN
China
Prior art keywords
text
semantic
recognition model
semantic recognition
matching degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011108225.0A
Other languages
Chinese (zh)
Inventor
刘艾婷
常景冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202011108225.0A priority Critical patent/CN112232086A/en
Publication of CN112232086A publication Critical patent/CN112232086A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application discloses a semantic recognition method and apparatus, a computer device, and a storage medium, which can acquire a sample text and a semantic associated text corresponding to the sample text; acquire text feature information based on the sample text and the semantic associated text corresponding to the sample text; acquire, through a first semantic recognition model, a first matching degree between the sample text and the semantic associated text based on the text feature information; acquire, through a second semantic recognition model, a second matching degree between the sample text and the semantic associated text based on the text feature information, where the first semantic recognition model is a student model of the second semantic recognition model; and train the first semantic recognition model according to the first matching degree and the second matching degree to obtain a trained semantic recognition model, so that the trained semantic recognition model can be used to perform semantic recognition on text, improving the accuracy of semantic recognition.

Description

Semantic recognition method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a semantic recognition method, an apparatus, a computer device, and a storage medium.
Background
In user interaction systems such as search engines and question-answering systems, a user generally inputs a sentence through the user interaction system; the system then performs semantic matching on the input sentence to determine its intent, and feeds results back to the user according to the semantic matching result.
Existing semantic matching methods mainly match based on keywords, assisted by word-level information such as word weights; for example, the semantic matching degree is obtained by calculating the word overlap ratio of two sentences. However, such methods do not handle out-of-vocabulary words, do not sufficiently mine the semantic information of sentences, and cannot achieve a deep understanding of them; when sentences are long, they also cannot accurately locate keywords or key phrases, so the recognition accuracy is low.
Disclosure of Invention
The embodiment of the application provides a semantic recognition method, a semantic recognition device, computer equipment and a storage medium, which can improve the accuracy of semantic recognition on a text.
In order to solve the above technical problem, an embodiment of the present application provides the following technical solutions:
the embodiment of the application provides a semantic recognition method, which comprises the following steps:
acquiring a sample text and a semantic associated text corresponding to the sample text;
acquiring text feature information based on the sample text and the semantic associated text corresponding to the sample text;
acquiring a first matching degree between the sample text and the semantic associated text based on the text feature information through a first semantic recognition model;
acquiring a second matching degree between the sample text and the semantic associated text through a second semantic recognition model based on the text feature information, wherein the first semantic recognition model is a student model of the second semantic recognition model;
and training the first semantic recognition model according to the first matching degree and the second matching degree to obtain a trained semantic recognition model, and performing semantic recognition on the text through the trained semantic recognition model.
According to an aspect of the present application, there is also provided a semantic recognition apparatus, including:
a first acquisition unit, configured to acquire a sample text and a semantic associated text corresponding to the sample text;
a second acquisition unit, configured to acquire text feature information based on the sample text and the semantic associated text corresponding to the sample text;
a third acquisition unit, configured to acquire, through a first semantic recognition model, a first matching degree between the sample text and the semantic associated text based on the text feature information;
a fourth acquisition unit, configured to acquire, through a second semantic recognition model based on the text feature information, a second matching degree between the sample text and the semantic associated text, where the first semantic recognition model is a student model of the second semantic recognition model;
and a training unit, configured to train the first semantic recognition model according to the first matching degree and the second matching degree to obtain a trained semantic recognition model, and perform semantic recognition on a text through the trained semantic recognition model.
According to an aspect of the present application, there is also provided a computer device, including a processor and a memory, where the memory stores a computer program, and the processor executes any one of the semantic recognition methods provided by the embodiments of the present application when calling the computer program in the memory.
According to an aspect of the present application, there is also provided a storage medium for storing a computer program, which is loaded by a processor to execute any one of the semantic recognition methods provided by the embodiments of the present application.
The embodiment of the application can acquire a sample text and a semantic associated text corresponding to the sample text, and acquire text feature information based on the sample text and the semantic associated text; acquire a first matching degree between the sample text and the semantic associated text through a first semantic recognition model based on the text feature information, and acquire a second matching degree between the sample text and the semantic associated text through a second semantic recognition model based on the text feature information, where the first semantic recognition model is a student model of the second semantic recognition model; at this point, the first semantic recognition model can be trained according to the first matching degree and the second matching degree to obtain a trained semantic recognition model, so that the trained semantic recognition model can be used to perform semantic recognition on text. In this scheme, the matching degree between the sample text and the semantic associated text can be obtained based on the text feature information extracted from the sample text and the semantic associated text, so that the trained semantic recognition model can learn semantic information from the second semantic recognition model, improving the accuracy of semantic recognition on text.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of a scene of a semantic recognition system provided by an embodiment of the present application;
FIG. 2 is a flow chart of a semantic recognition method provided by an embodiment of the present application;
FIG. 3 is a diagram of training a second semantic recognition model provided by an embodiment of the present application;
FIG. 4 is a schematic illustration of knowledge-based distillation training of a TinyBERT model provided in an embodiment of the present application;
FIG. 5 is a diagram illustrating a semantic recognition result displayed on a display interface according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a semantic recognition apparatus provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a semantic recognition method, a semantic recognition device, computer equipment and a storage medium.
Referring to fig. 1, fig. 1 is a schematic diagram of a scene of a semantic recognition system provided in this embodiment. The semantic recognition system may include a semantic recognition apparatus, and the semantic recognition apparatus may be integrated in a server, a terminal, or another computer device. Taking the computer device as a server 10 as an example, the server 10 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms, but is not limited thereto.
The server 10 and the terminal 20 may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein. The terminal 20 may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, or a wearable device.
The server 10 may be configured to obtain a sample text and a semantic associated text corresponding to the sample text, perform word segmentation processing on the sample text and the semantic associated text respectively to obtain a first word sequence of the sample text and a second word sequence of the semantic associated text, and perform concatenation processing on the first word sequence and the second word sequence to obtain a spliced word sequence. Feature extraction may then be performed on the spliced word sequence to obtain text feature information; a first matching degree between the sample text and the semantic associated text is obtained through a first semantic recognition model based on the text feature information, and a second matching degree between the sample text and the semantic associated text is obtained through a second semantic recognition model based on the text feature information, where the first semantic recognition model is a student model of the second semantic recognition model. The first semantic recognition model can then be trained according to the first matching degree and the second matching degree to obtain a trained semantic recognition model, and semantic recognition is performed on text through the trained semantic recognition model. For example, the trained semantic recognition model may be deployed on the terminal 20; the terminal 20 may receive a current text input by a user, acquire a standard text from a standard text database, and perform word segmentation and concatenation processing on the current text and the standard text to obtain a current spliced word sequence; feature extraction is performed on the current spliced word sequence to obtain current text feature information, a current matching degree between the current text and the standard text is obtained through the trained semantic recognition model based on the current text feature information, texts whose current matching degree is greater than a preset threshold are identified to obtain texts to be displayed, and the texts to be displayed are displayed.
It should be noted that the scene schematic diagram of the semantic recognition system shown in fig. 1 is only an example, and the semantic recognition system and the scene described in the embodiment of the present application are for more clearly illustrating the technical solution of the embodiment of the present application, and do not form a limitation on the technical solution provided in the embodiment of the present application, and as can be known by those skilled in the art, along with the evolution of the semantic recognition system and the appearance of a new service scene, the technical solution provided in the embodiment of the present application is also applicable to similar technical problems.
The following are detailed below. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
The semantic recognition method provided by the embodiment of the application can relate to technologies such as a machine learning technology in artificial intelligence, and the artificial intelligence technology and the machine learning technology are explained first below.
Artificial intelligence is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
The artificial intelligence technology is a comprehensive subject that relates to a wide range of fields, covering both hardware-level and software-level technologies. Artificial intelligence infrastructures generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operating/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specially studies how a computer simulates or realizes human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental way to endow computers with intelligence, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formal learning.
In this embodiment, description will be made from the perspective of a semantic recognition apparatus, which may be specifically integrated in a computer device such as a server or a terminal.
Referring to fig. 2, fig. 2 is a schematic flow chart illustrating a semantic recognition method according to an embodiment of the present application. The semantic recognition method can comprise the following steps:
s101, obtaining a sample text and a semantic associated text corresponding to the sample text.
For example, in an application scenario of a search engine, the sample text may be a sentence that a user inputs in an input text box of the search engine and needs to search. The sample text may include one or more sentences, and its length, language type (for example, Chinese or English), and the like may be flexibly set according to actual requirements; the specific content is not limited here. In this case, the semantic associated text corresponding to the sample text may be a standard question related to the sample text and pre-stored in a Frequently Asked Questions (FAQ) database. It should be noted that one sample text may correspond to zero, one, or more semantic associated texts.
For another example, in an application scenario of the question-answering system, the sample text may be a question that a user inputs to be searched in a question input text box of the question-answering system, and the length, language type (for example, chinese or english), and the like of the sample text are not limited herein. At this time, the semantic related text corresponding to the sample text may be a standard question related to the sample text and stored in advance in the FAQ database.
The sample text and the semantic associated text may be obtained in a manner flexibly set according to actual needs; for example, a pre-stored historical search text may be obtained as the sample text from a text database such as a local database or a server, and the semantic associated text corresponding to the sample text may be obtained from an FAQ database in the local database or on the server. For example, for the sample text "how to play by AAA", the corresponding semantic associated texts may include "how to play by AAA", "how to acquire AAA skin", "how to receive AAA skin for free", "what relationship AAA and BBB are", and "what is the most AAA and BBB", etc.
S102, acquiring text feature information based on the sample text and the semantic associated text corresponding to the sample text.
In one embodiment, the obtaining text feature information based on the sample text and the semantic relation text corresponding to the sample text may include: performing word segmentation processing on the sample text and the semantic associated text respectively to obtain a first word sequence of the sample text and a second word sequence of the semantic associated text; splicing the first word sequence and the second word sequence to obtain a spliced word sequence; and performing feature extraction on the spliced word sequence to obtain text feature information.
After the sample text and the semantic associated text are obtained, word segmentation processing may be performed on the sample text to obtain a first word sequence corresponding to the sample text, and word segmentation processing may be performed on the semantic associated text to obtain a second word sequence corresponding to the semantic associated text. The word segmentation processing mode can be flexibly set according to actual needs. For example, the sample text and the semantic associated text may each be segmented character by character to obtain the first word sequence of the sample text and the second word sequence of the semantic associated text; or the sample text and the semantic associated text may each be segmented at intervals of two or three characters; or word segmentation may be performed according to keywords contained in the sample text and the semantic associated text; and so on.
In an embodiment, the performing word segmentation processing on the sample text and the semantic associated text respectively to obtain a first word sequence of the sample text and a second word sequence of the semantic associated text may include: performing word segmentation processing on the sample text according to the character level to obtain first word segmentation information; vectorizing the first word segmentation information to obtain a first word sequence; performing word segmentation processing on the semantic associated text according to the word level to obtain second word segmentation information; and vectorizing the second word segmentation information to obtain a second word sequence.
In order to improve the reliability of the word segmentation processing and of the subsequent processing of the word sequences obtained therefrom, word segmentation processing may be performed on the sample text at the character level to obtain the first word segmentation information, that is, the sample text may be segmented character by character. For example, the sample text "西红柿炒蛋怎么做" ("how to make stir-fried tomato and egg") may be segmented at the character level into "西", "红", "柿", "炒", "蛋", "怎", "么", and "做". In order to improve the convenience of subsequent calculation, vectorization processing may be performed on the first word segmentation information to obtain the first word sequence. Likewise, word segmentation processing may be performed on the semantic associated text at the character level to obtain the second word segmentation information, and vectorization processing may be performed on the second word segmentation information to obtain the second word sequence. The vectorization processing may convert each character obtained by the word segmentation processing into a numerical value so as to facilitate subsequent calculation.
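As an illustrative sketch only (the vocabulary, ids, and helper names below are hypothetical, not prescribed by the patent), the character-level segmentation and vectorization just described might look as follows in Python:

```python
# Hypothetical sketch: character-level segmentation followed by
# vectorization against a toy vocabulary; a deployed system would use a
# trained tokenizer and a learned embedding table instead.

def segment_by_character(text: str) -> list[str]:
    """Split a text into single-character tokens (character-level segmentation)."""
    return [ch for ch in text if not ch.isspace()]

def vectorize(tokens: list[str], vocab: dict[str, int], unk_id: int = 0) -> list[int]:
    """Convert each character into a numerical value for subsequent calculation."""
    return [vocab.get(tok, unk_id) for tok in tokens]

vocab = {"西": 1, "红": 2, "柿": 3, "炒": 4, "蛋": 5, "怎": 6, "么": 7, "做": 8}
tokens = segment_by_character("西红柿炒蛋怎么做")
first_word_sequence = vectorize(tokens, vocab)
print(tokens)               # ['西', '红', '柿', '炒', '蛋', '怎', '么', '做']
print(first_word_sequence)  # [1, 2, 3, 4, 5, 6, 7, 8]
```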
And then, the first word sequence and the second word sequence can be spliced to obtain a spliced word sequence.
The specific splicing mode can be flexibly set according to actual needs, for example, the first word sequence and the second word sequence can be spliced from beginning to end to obtain a spliced word sequence; or identifiers can be respectively set at the head and the tail of the first word sequence and the second word sequence, and the identifiers can be used for identifying the head and tail positions of the first word sequence and the second word sequence, and then the head and the tail of the first word sequence and the second word sequence after the identifiers are set are spliced to obtain a spliced word sequence; and so on.
It should be noted that when there are a plurality of semantic associated texts, a plurality of second word sequences may be obtained; in this case, the first word sequence may be spliced with each second word sequence respectively to obtain a plurality of spliced word sequences.
In an embodiment, the splicing the first word sequence and the second word sequence to obtain a spliced word sequence may include: setting a first preset character at the head of the first word sequence and setting a second preset character at the tail of the second word sequence; and splicing the tail part of the first word sequence and the head part of the second word sequence through a third preset character to obtain a spliced word sequence.
In order to improve the flexibility and convenience of splicing, the first word sequence and the second word sequence can be spliced through preset characters. Specifically, the first preset character may be set at the head of the first word sequence, the second preset character may be set at the tail of the second word sequence, and the tail of the first word sequence and the head of the second word sequence are spliced through the third preset character to obtain the spliced word sequence. For example, the spliced word sequence may be: [first preset character] first word sequence [third preset character] second word sequence [second preset character]. In this case, the head position of the first word sequence can be located through the first preset character, the tail position of the first word sequence and the head position of the second word sequence can be located through the third preset character, and the tail position of the second word sequence can be located through the second preset character. The first preset character, the second preset character, and the third preset character may be the same or different, and may be flexibly set according to actual needs; the specific content is not limited here.
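A minimal sketch of this splicing scheme, assuming the BERT-style convention in which the first preset character is [CLS] and the third and second preset characters are both [SEP] (the token ids are assumed values, not prescribed by the patent):

```python
# Hypothetical sketch: splice two vectorized word sequences with preset
# characters. [CLS] marks the head of the first sequence, one [SEP] joins
# the tail of the first sequence to the head of the second, and a final
# [SEP] marks the tail of the second sequence.

CLS_ID, SEP_ID = 101, 102  # assumed ids for the preset characters

def splice(first_seq: list[int], second_seq: list[int]) -> list[int]:
    return [CLS_ID] + first_seq + [SEP_ID] + second_seq + [SEP_ID]

spliced = splice([1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11])
print(spliced)  # length N + M + 3, matching x = (x_1, ..., x_(N+M+3))
```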
After the spliced word sequence is obtained, feature extraction may be performed on it to obtain the text feature information. The feature extraction mode can be flexibly set according to actual needs; for example, feature extraction may be performed on the spliced word sequence through the first semantic recognition model or the second semantic recognition model to obtain the text feature information. The text feature information may be the feature information of the spliced word sequence; since the spliced word sequence is obtained by splicing the first word sequence of the sample text and the second word sequence of the semantic associated text, the text feature information may include feature information corresponding to both the sample text and the semantic associated text.
In an embodiment, the performing feature extraction on the spliced word sequence to obtain the text feature information may include: extracting attention information from the spliced word sequence through the multi-head attention layer of the first semantic recognition model and the multi-head attention layer of the second semantic recognition model respectively to obtain multi-head attention information; and performing full-connection processing on the multi-head attention information through the fully-connected feedforward neural network layers of the first semantic recognition model and the second semantic recognition model respectively to obtain the text feature information.
The first semantic recognition model and the second semantic recognition model may be flexibly set according to actual requirements. For example, the first semantic recognition model may be a student model of the second semantic recognition model, that is, the second semantic recognition model may be a teacher model of the first semantic recognition model. The first semantic recognition model may specifically be a compressed Bidirectional Encoder Representations from Transformers (BERT) model, referred to as a TinyBERT model for short, and the second semantic recognition model may specifically be a BERT model, where the model scale of the second semantic recognition model is larger than that of the first semantic recognition model. Each of the first semantic recognition model and the second semantic recognition model may include a multi-head attention (MHA) layer, a fully-connected feed-forward network (FFN) layer, and the like.
The multi-head attention information may include first multi-head attention information output by the multi-head attention layer of the first semantic recognition model and second multi-head attention information output by the multi-head attention layer of the second semantic recognition model; the text feature information may include first text feature information output by the fully-connected feedforward neural network layer of the first semantic recognition model and second text feature information output by the fully-connected feedforward neural network layer of the second semantic recognition model. In order to improve the accuracy of feature extraction, attention information can be extracted from the spliced word sequence through the multi-head attention layer (MHA) of the first semantic recognition model to obtain the first multi-head attention information, and full-connection processing can be performed on the first multi-head attention information through the fully-connected feedforward neural network layer (FFN) of the first semantic recognition model to obtain the first text feature information. For example, word segmentation is performed on the sample text and the semantic associated text at the character level to obtain the first word sequence of the sample text and the second word sequence of the semantic associated text, the first word sequence and the second word sequence are spliced through the preset symbols, and the spliced word sequence is used as the input for feature extraction.
The calculation of the multi-head attention layer may depend on three main components Q, K, and V, where Q, K, and V are word vector matrices that may be composed from the spliced word sequence, and the calculation formulas may be as follows:

Attention(Q,K,V) = softmax(QK^T/√d_k)V (1)

MultiHead(Q,K,V) = Concat(head_1,...,head_h)W^O (2)

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V) (3)

wherein W_i^Q ∈ R^(d_model×d_k), W_i^K ∈ R^(d_model×d_k), W_i^V ∈ R^(d_model×d_v), and W^O ∈ R^(hd_v×d_model) are trainable projection matrices. The combination of formula (1), formula (2), and formula (3) can be calculated to obtain the multi-head attention information MultiHead(Q,K,V).
The calculation formula of the fully-connected feedforward neural network layer may be as follows:

FFN(x) = max(0, xW_1 + b_1)W_2 + b_2 (4)

wherein W_1 and W_2 may represent weight matrices, b_1 and b_2 may represent randomly initialized bias vectors, x may represent the multi-head attention information output by the multi-head attention layer of the BERT model, and FFN(x) may represent the text feature information output by the fully-connected feedforward neural network layer. The hidden-layer dimension may be d_model = 768, the feed-forward dimension may be d_ff = 3072, the number of attention heads may be h = 12, and d_k = d_v = d_model/h = 64. Stacking 12 of the above Transformer layers yields the BERT model, whose attention matrices can capture rich linguistic knowledge, while only 2 such Transformer layers may be set for the TinyBERT model.
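To make formulas (1)–(4) concrete, the following self-contained NumPy sketch runs one multi-head attention layer followed by the fully-connected feedforward layer; the random weight initialization is an illustrative assumption:

```python
import numpy as np

d_model, h = 768, 12
d_k = d_v = d_model // h  # 64, as in the text
rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):                       # formula (1)
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(X):                            # formulas (2) and (3)
    heads = []
    for _ in range(h):
        WQ = rng.normal(size=(d_model, d_k)) * 0.02
        WK = rng.normal(size=(d_model, d_k)) * 0.02
        WV = rng.normal(size=(d_model, d_v)) * 0.02
        heads.append(attention(X @ WQ, X @ WK, X @ WV))
    WO = rng.normal(size=(h * d_v, d_model)) * 0.02
    return np.concatenate(heads, axis=-1) @ WO

def ffn(x, d_ff=3072):                        # formula (4)
    W1 = rng.normal(size=(d_model, d_ff)) * 0.02
    W2 = rng.normal(size=(d_ff, d_model)) * 0.02
    b1, b2 = np.zeros(d_ff), np.zeros(d_model)
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

X = rng.normal(size=(16, d_model))  # a spliced word sequence of length l = 16
print(ffn(multi_head(X)).shape)     # (16, 768): the text feature information
```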
Similarly, attention information may be extracted from the spliced word sequence through the multi-head attention layer of the second semantic recognition model according to formula (1), formula (2), and formula (3) to obtain the second multi-head attention information, and full-connection processing may be performed on the second multi-head attention information through the fully-connected feedforward neural network layer of the second semantic recognition model according to formula (4) to obtain the second text feature information.
S103, acquiring a first matching degree between the sample text and the semantic associated text based on the text feature information through the first semantic recognition model.
For example, after the first text feature information is obtained through the first semantic recognition model, the first matching degree between the sample text and the semantic associated text can be obtained through the first semantic recognition model based on the first text feature information, where the first matching degree may indicate that the sample text matches the semantic associated text or that it does not.
In one embodiment, obtaining, through the first semantic recognition model, the first matching degree between the sample text and the semantic associated text based on the text feature information may include: distilling the text feature information sequentially through the embedding layer, the attention layer, and the prediction layer of the first semantic recognition model to obtain the first matching degree between the sample text and the semantic associated text.
The first semantic recognition model and the second semantic recognition model may each include an Embedding Layer, an attention layer (Transformer Layer), a Prediction Layer, and the like. In order to improve the accuracy of obtaining the matching degree, the text feature information may be distilled (which may also be referred to as knowledge distillation processing) sequentially through the embedding layer, the attention layer, and the prediction layer of the first semantic recognition model to obtain the first matching degree between the sample text and the semantic associated text. For example, the text feature information may be distilled through the embedding layer of the first semantic recognition model to obtain embedded information; the embedded information may then be distilled through the attention layer of the first semantic recognition model to obtain attention information; and finally the attention information may be predicted through the prediction layer of the first semantic recognition model to obtain the first matching degree between the sample text and the semantic associated text.
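For illustration, a minimal PyTorch sketch of such a student pipeline (the module is a generic stand-in with assumed sizes, not the patent's exact TinyBERT architecture):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the first semantic recognition model: an
# embedding layer, two Transformer (attention) layers, and a prediction
# layer applied in sequence to the spliced word sequence.
class StudentMatcher(nn.Module):
    def __init__(self, vocab_size=21128, d_model=312, n_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.prediction = nn.Linear(d_model, 2)   # match / no match

    def forward(self, spliced_ids):
        x = self.embedding(spliced_ids)           # embedding layer
        x = self.transformer(x)                   # attention layers
        logits = self.prediction(x[:, 0])         # predict from the [CLS] position
        return torch.softmax(logits, dim=-1)      # first matching degree

student = StudentMatcher()
ids = torch.randint(0, 21128, (1, 32))            # one spliced word sequence
print(student(ids))                               # e.g. tensor([[p_no_match, p_match]])
```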
S104, acquiring a second matching degree between the sample text and the semantic associated text through a second semantic recognition model based on the text feature information, wherein the first semantic recognition model is a student model of the second semantic recognition model.
For example, after the second text feature information is obtained through the second semantic recognition model, the second matching degree between the sample text and the semantic associated text can be obtained through the second semantic recognition model based on the second text feature information. The second matching degree may indicate that the sample text matches the semantic associated text or that it does not.
In one embodiment, obtaining, through the second semantic recognition model, the second matching degree between the sample text and the semantic associated text based on the text feature information may include: distilling the text feature information sequentially through the embedding layer, the attention layer, and the prediction layer of the second semantic recognition model to obtain the second matching degree between the sample text and the semantic associated text.
For example, the text feature information can be distilled through the embedding layer of the second semantic recognition model to obtain embedded information, the embedded information can be distilled through the attention layer of the second semantic recognition model to obtain attention information, and the attention information can be predicted through the prediction layer of the second semantic recognition model to obtain the second matching degree between the sample text and the semantic associated text.
In one embodiment, the semantic recognition method may further include: acquiring a reference text and a target text corresponding to the reference text; performing feature extraction on the reference text and the target text to obtain target text feature information; acquiring a predicted matching degree between the reference text and the target text based on the target text feature information; and training the initial second semantic recognition model according to the predicted matching degree and a pre-labeled real matching degree to obtain the second semantic recognition model.
In order to improve the accuracy and reliability of the training of the first semantic recognition model, so that the first semantic recognition model serving as the student model of the second semantic recognition model can better learn the semantic information to be learned, the second semantic recognition model can be trained in advance. Specifically, a reference text and a target text corresponding to the reference text may be obtained, where the reference text and the target text may be flexibly set according to actual requirements; for example, the reference text may be similar in nature to the sample text, and the target text corresponding to the reference text may be similar in nature to the semantic associated text.
Then, feature extraction may be performed on the reference text and the target text to obtain the target text feature information. For example, word segmentation may be performed on the reference text and the target text respectively to obtain a third word sequence of the reference text and a fourth word sequence of the target text, where the word segmentation mode may be flexibly set according to actual needs and may be consistent with the word segmentation mode described above for the sample text and the semantic associated text. Next, the third word sequence and the fourth word sequence may be spliced to obtain a target spliced word sequence; the specific splicing mode can be flexibly set according to actual needs and may be consistent with the splicing mode used for the first word sequence and the second word sequence. Feature extraction may then be performed on the target spliced word sequence to obtain the target text feature information, where the feature extraction mode may be flexibly set according to actual needs and may be consistent with the above-described manner of performing feature extraction on the spliced word sequence: attention information may be extracted from the target spliced word sequence through the multi-head attention layer of the second semantic recognition model to obtain target multi-head attention information, and full-connection processing may be performed on the target multi-head attention information through the fully-connected feedforward neural network layer of the second semantic recognition model to obtain the target text feature information.
For example, as shown in fig. 3, taking the second semantic recognition model as a BERT model, word segmentation is performed on the reference text Query and the target text Query at the character level; the third word sequence of the reference text may be Tok 1 … Tok N, the fourth word sequence of the target text may be Tok 1 … Tok M, and two preset symbols [CLS] and [SEP] are used to splice the third word sequence and the fourth word sequence. The spliced word sequence may be [CLS] Tok 1 … Tok N [SEP] Tok 1 … Tok M [SEP], which can be expressed as x = (x_1, x_2, …, x_(N+M+3)). The [CLS] position may be output as the predicted classification label, i.e., whether the reference text and the target text match; for example, a classification label of 0 indicates no match, and a classification label of 1 indicates a match. Feature extraction is performed on the target spliced word sequence through the multi-head attention layer of the BERT model according to formula (1), formula (2), and formula (3), and through the fully-connected feedforward neural network layer according to formula (4), to obtain the target text feature information.
At this point, the predicted matching degree between the reference text and the target text may be obtained based on the target text feature information sequentially through the embedding layer, the attention layer, and the prediction layer of the second semantic recognition model, and the initial second semantic recognition model may be trained according to the predicted matching degree and the pre-labeled real matching degree to obtain the second semantic recognition model. The pre-labeled real matching degree may be the accurate matching degree between the reference text and the target text. For example, a target loss function may be determined through a first loss function of the embedding layer, a second loss function of the attention layer, a third loss function of the prediction layer, and the like of the second semantic recognition model, and the predicted matching degree between the reference text and the target text and the pre-labeled real matching degree are converged based on the target loss function to adjust the parameters of the initial second semantic recognition model to appropriate values, thereby obtaining the second semantic recognition model, i.e., the trained second semantic recognition model. The second matching degree between the sample text and the semantic associated text can then be obtained based on the text feature information through the trained second semantic recognition model.
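A hedged sketch of this pre-training step (the teacher mirrors the student sketch above but with twelve attention layers and a larger hidden size; the data in the example step is synthetic):

```python
import torch
import torch.nn as nn

# Hypothetical teacher: same family as StudentMatcher above, but with
# 12 Transformer layers and hidden size 768, as a full BERT stand-in.
class TeacherMatcher(nn.Module):
    def __init__(self, vocab_size=21128, d_model=768, n_layers=12):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.prediction = nn.Linear(d_model, 2)

    def forward(self, spliced_ids):
        x = self.transformer(self.embedding(spliced_ids))
        return self.prediction(x[:, 0])  # logits for no-match / match

teacher = TeacherMatcher()
optimizer = torch.optim.Adam(teacher.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step: spliced (reference, target) id sequences
# plus the pre-labeled real matching degree (1 = match, 0 = no match).
ids = torch.randint(0, 21128, (8, 64))
labels = torch.randint(0, 2, (8,))
loss = loss_fn(teacher(ids), labels)  # converge predictions toward the labels
loss.backward()
optimizer.step()
```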
S105, training the first semantic recognition model according to the first matching degree and the second matching degree to obtain a trained semantic recognition model, and performing semantic recognition on the text through the trained semantic recognition model.
After the first matching degree between the sample text and the semantic associated text is obtained through the first semantic recognition model and the second matching degree between the sample text and the semantic associated text is obtained through the second semantic recognition model, the first semantic recognition model can be trained according to the first matching degree and the second matching degree to obtain the trained semantic recognition model, so that the first semantic recognition model learns knowledge related to semantic recognition from the second semantic recognition model; subsequently, semantic recognition can be performed on text through the trained semantic recognition model.
In an embodiment, training the first semantic recognition model according to the first matching degree and the second matching degree, and obtaining the trained semantic recognition model may include: acquiring a mapping relation between different network levels in a first semantic recognition model and a second semantic recognition model; determining loss functions corresponding to different network levels according to the mapping relation; determining a target loss function through loss functions corresponding to different network levels; and training the first semantic recognition model based on the target loss function, the first matching degree and the second matching degree to obtain a trained semantic recognition model.
In order to improve the accuracy of the model training, a mapping relationship between corresponding network levels of the first semantic recognition model and the second semantic recognition model may be obtained, where the network levels may include an embedding layer, attention layers, a prediction layer, and the like. For example, as shown in fig. 4, taking the first semantic recognition model as the student model (TinyBERT model) and the second semantic recognition model as the teacher model (BERT model), the TinyBERT model may include an embedding layer, two attention layers, a prediction layer, and the like, and the BERT model may include an embedding layer, twelve attention layers (i.e., N = 12), a prediction layer, and the like. In fig. 4, Transformer denotes an attention layer and Distillation denotes the distillation process; the text input may include the input of the sample text and the semantic associated text; d may be the hidden layer state size (Hidden Size) of the BERT model, d' may be the hidden layer state size of the TinyBERT model, and d' may be smaller than d so as to obtain a smaller TinyBERT model. The pentagonal region may represent the Prediction Layer, the quadrangular region may represent the Embedding Layer, the hexagonal region may represent an attention layer (Transformer Layer), and the number of network layers (Layer Number) of the TinyBERT model is smaller than that of the BERT model. A mapping relationship may then be established between the embedding layer of the first semantic recognition model and the embedding layer of the second semantic recognition model, between the attention layers of the first semantic recognition model and attention layers of the second semantic recognition model, and between the prediction layer of the first semantic recognition model and the prediction layer of the second semantic recognition model. Because the first semantic recognition model has fewer attention layers than the second semantic recognition model, two layers can be selected from the twelve attention layers of the second semantic recognition model and mapped respectively to the two attention layers of the first semantic recognition model. For example, the two attention layers of the first semantic recognition model may be mapped to the first and last attention layers of the second semantic recognition model, or to the first and sixth attention layers of the second semantic recognition model, and so on.
Then, according to the mapping relation between different network levels in the first semantic recognition model and the second semantic recognition model, loss functions corresponding to different network levels can be determined.
In an embodiment, different network levels of the first semantic recognition model and the second semantic recognition model each include an embedding layer, an attention layer, and a prediction layer, and determining the loss function corresponding to the different network levels according to the mapping relationship may include: acquiring the text length of a sample text, first embedding dimension information of a first semantic recognition model, second embedding dimension information of a second semantic recognition model and a linear transformation matrix of dimension mapping between the first semantic recognition model and the second semantic recognition model, and determining a first loss function of an embedding layer according to the text length, the first embedding dimension information, the second embedding dimension information and the linear transformation matrix; acquiring the number of attention heads, an attention weight matrix, first hidden layer state information of a first semantic recognition model and second hidden layer state information of a second semantic recognition model, and determining a second loss function of an attention layer according to the number of attention heads, the attention weight matrix, the first hidden layer state information, the second hidden layer state information, the text length and a linear transformation matrix; and acquiring a first probability distribution output by the first semantic recognition model and a second probability distribution output by the second semantic recognition model, and determining a third loss function of the prediction layer according to the first probability distribution and the second probability distribution.
In order to improve the accuracy and reliability of determining the loss functions corresponding to the different network levels, the loss functions of the embedding layer, the attention layer, and the prediction layer can be determined respectively. Specifically, for the Embedding Layer, the text length of the sample text, the first embedding dimension information of the first semantic recognition model, the second embedding dimension information of the second semantic recognition model, and the linear transformation matrix of the dimension mapping between the first semantic recognition model and the second semantic recognition model may be obtained, and the first loss function of the embedding layer is determined according to the text length, the first embedding dimension information, the second embedding dimension information, and the linear transformation matrix. The first loss function may be represented by the following formula (5):

L_embd = MSE(E^S W_e, E^T) (5)

wherein E^S ∈ R^(l×d0) may represent the first embedding dimension information (student embedding) of the first semantic recognition model (TinyBERT model), E^T ∈ R^(l×d) may represent the second embedding dimension information of the second semantic recognition model (BERT model), l may represent the text length of the input sample text, d0 may represent the dimension of the first embedding dimension information, and d represents the dimension of the second embedding dimension information. Since the embedding layer of the TinyBERT model is made smaller than that of the BERT model to obtain a smaller and faster model, W_e may be a trainable linear transformation matrix of dimension d0 × d, namely the linear transformation matrix of the dimension mapping between the TinyBERT model and the BERT model, which can project the first embedding dimension information (student embedding) of the TinyBERT model into the space where the second embedding dimension information (teacher embedding) of the BERT model is located.
For the attention layer (Transformer Layer), the distillation of the Transformer layer may include attention-based distillation and hidden-states-based distillation, etc. The attention weights in the BERT model may capture rich linguistic knowledge (e.g., grammatical and co-reference information), and therefore attention-based distillation can encourage the transfer of linguistic knowledge from the teacher model (BERT model) to the student model (TinyBERT model). Specifically, the TinyBERT model learns to fit the multi-head attention matrices and the hidden layer state output of the BERT model; the number of attention heads, the attention weight matrices, the first hidden layer state information of the first semantic recognition model, and the second hidden layer state information of the second semantic recognition model can be obtained, and the second loss function of the attention layer is determined according to the number of attention heads, the attention weight matrices, the first hidden layer state information, the second hidden layer state information, the text length, and the linear transformation matrix. The second loss function can be expressed as the following formula (6) and formula (7):

L_attn = (1/h) Σ_{i=1}^{h} MSE(A_i^S, A_i^T) (6)

L_hidn = MSE(H^S W_h, H^T) (7)

wherein h may denote the number of attention heads, A_i ∈ R^(l×l) may represent the attention weight matrix corresponding to the i-th attention head, and l may represent the text length of the input sample text. H^S ∈ R^(l×d') may represent the hidden layer state information of the first semantic recognition model (TinyBERT model), and H^T ∈ R^(l×d) may represent the hidden layer state information of the second semantic recognition model (BERT model); both can be calculated by the above formula (4). d may represent the hidden layer state size of the second semantic recognition model (BERT model), d' may represent the hidden layer state size of the first semantic recognition model (TinyBERT model), and d' may be smaller than d so as to obtain a smaller TinyBERT model. The matrix W_h ∈ R^(d'×d) is a trainable linear transformation matrix that can project the hidden states of the TinyBERT model into the space where the hidden states of the BERT model are located.
For the Prediction Layer, a first probability distribution output by the first semantic recognition model and a second probability distribution output by the second semantic recognition model can be obtained, and the third loss function of the prediction layer is determined according to the first probability distribution and the second probability distribution. The prediction layer may be used to mimic the behavior of the second semantic recognition model (BERT model) at the prediction layer; it may calculate the softmax cross entropy between the second probability distribution output by the second semantic recognition model (BERT model) and the first probability distribution output by the first semantic recognition model (TinyBERT model). The third loss function may be shown in the following formula (8):

L_pred = -softmax(z^T) · log_softmax(z^S / t) (8)

wherein z^S may represent the classification vector (i.e., logits vector) predicted by the first semantic recognition model (TinyBERT model), z^T may represent the logits vector predicted by the second semantic recognition model (BERT model), log_softmax() may represent the log likelihood, and t may represent a temperature value, whose value may be flexibly set according to actual needs; for example, t = 1 by default.
It should be noted that a mean squared error (MSE) may instead be used to calculate the difference between the first probability distribution output by the first semantic recognition model and the second probability distribution output by the second semantic recognition model, whose convergence effect is better than that of the softmax cross entropy; that is, the third loss function may be shown in the following formula (9):

L_pred = MSE(z^T, z^S) (9)
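Under the notation above, the per-layer distillation losses of formulas (5)–(9) reduce to a few tensor operations. A hedged PyTorch sketch with random stand-in tensors (in practice W_e and W_h are trainable parameters, and d' = 312 is an assumed student hidden size):

```python
import torch
import torch.nn.functional as F

l, d, d_prime, h = 64, 768, 312, 12  # text length, teacher/student sizes, heads

E_S, E_T = torch.randn(l, d_prime), torch.randn(l, d)    # embeddings
H_S, H_T = torch.randn(l, d_prime), torch.randn(l, d)    # hidden states
A_S, A_T = torch.randn(h, l, l), torch.randn(h, l, l)    # attention matrices
z_S, z_T = torch.randn(2), torch.randn(2)                # prediction logits
W_e, W_h = torch.randn(d_prime, d), torch.randn(d_prime, d)

L_embd = F.mse_loss(E_S @ W_e, E_T)    # formula (5)
L_attn = F.mse_loss(A_S, A_T)          # formula (6): mean over heads and positions
L_hidn = F.mse_loss(H_S @ W_h, H_T)    # formula (7)

t = 1.0                                # temperature, t = 1 by default
L_pred_ce = -(F.softmax(z_T, dim=-1) * F.log_softmax(z_S / t, dim=-1)).sum()  # formula (8)
L_pred_mse = F.mse_loss(z_T, z_S)      # formula (9)
```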
after the loss functions corresponding to different network levels are obtained, the target loss function can be determined through the loss functions corresponding to different network levels, so that the first semantic recognition model can be trained based on the target loss function, the first matching degree and the second matching degree, and the trained semantic recognition model is obtained. For example, a target loss function may be determined based on the first, second, and third loss functions, which may be shown in equations (10) and (11) below:
wherein, using the above distillation targets (i.e., equations 5, 6, 7, and 8, or equations 5, 6, 7, and 9), the distillation loss function for the corresponding layer between the teacher and student networks is unified as:
Figure BDA0002727687880000161
Figure BDA0002727687880000162
wherein,
Figure BDA0002727687880000163
can represent a loss function, λ, for a given model layer (e.g., a Transformer layer or an embedding layer)mA hyperparameter that may be indicative of the importance of the mth layer distillation. Specifically, Knowledge Distillation (KD) aims to transfer the Knowledge of a second semantic recognition model (i.e., BERT model, which may also be referred to as a teacher model) with a larger model size to a first semantic recognition model (i.e., TinyBERT model, which may also be referred to as a student model) with a smaller model size, and train the student model to mimic the behavior of the teacher model. Assuming that the student model has M transform layers and the teacher model has N transform layers, M layers can be selected from the teacher model for transform layer distillation, and the function N ═ g (M) is used as a mapping function from the student model to the teacher model, which means that the mth layer of the student model can learn information from the nth layer of the teacher model. In this case, the insertion layer distillation and the prediction layer distillation are also considered, and 0 may be set as the insertion layerThe index of (1) is set as the index of the prediction layer, and the corresponding layer mapping is respectively defined as 0 ═ g (0) and N +1 ═ g (M +1), so that the student model can obtain knowledge from the teacher model by minimizing the target loss function corresponding to the formula (11), and the student model can be obtained by distillation by fixing parameters of the teacher model.
It should be noted that the hyperparameters in the present embodiment may be set as shown in the following table:
[Table: hyperparameter settings]
Compared with the original BERT model, the TinyBERT model provided by this embodiment reduces the parameter count of the model without degrading performance (for example, the parameter count can be reduced to as little as 1/12 of the original), and greatly increases the prediction speed on the Graphics Processing Unit (GPU) of the terminal running the TinyBERT model (for example, the prediction speed can be increased by at least 13 times). A good effect is thereby obtained: the goals of compressing the model and accelerating prediction are achieved, and the problems of expensive computing resources and insufficient memory are alleviated.
In an embodiment, after the first semantic recognition model is trained according to the first matching degree and the second matching degree to obtain the trained semantic recognition model, the semantic recognition method may further include: receiving a current text input by a user, and acquiring a standard text from a standard text database; performing word segmentation and splicing processing on the current text and the standard text to obtain a current spliced word sequence; performing feature extraction on the current spliced word sequence to obtain current text feature information; acquiring the current matching degree between the current text and the standard text based on the current text feature information through the trained semantic recognition model; identifying a text whose current matching degree is greater than a preset threshold value to obtain a text to be displayed; and displaying the text to be displayed.
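As an illustrative sketch of this inference flow — assuming a HuggingFace-style tokenizer and sequence-pair classification model, which are not specified by the disclosure:

```python
import torch

def rank_candidates(model, tokenizer, current_text, standard_texts,
                    threshold=0.5, device="cpu"):
    """Match the user's current text against standard texts and return the
    candidates whose matching probability exceeds the preset threshold,
    sorted in descending order of probability."""
    model.eval()
    results = []
    with torch.no_grad():
        for standard_text in standard_texts:
            # Splice the two texts into one input; the tokenizer inserts the
            # preset special characters between and around the sequences.
            inputs = tokenizer(current_text, standard_text,
                               return_tensors="pt").to(device)
            logits = model(**inputs).logits          # shape [1, 2]
            match_prob = torch.softmax(logits, dim=-1)[0, 1].item()
            if match_prob > threshold:
                results.append((standard_text, match_prob))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```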
After the trained semantic recognition model is obtained, it may be applied on a server or a terminal. In one embodiment, taking application on the terminal as an example: when a user needs to search for a question through a search engine, the terminal may receive the current text to be searched that the user enters in the input text box of the search engine displayed by the terminal. For example, as shown in fig. 5, the received current text may be "XXX play". The terminal may then obtain standard texts from a standard text database stored locally or on the server, and perform word segmentation and splicing processing on the current text and the standard texts according to the word segmentation and splicing methods described above, to obtain the current spliced word sequence. Feature extraction is performed on the current spliced word sequence through the trained semantic recognition model to obtain current text feature information, and the current matching degree between the current text and each standard text is obtained through the trained semantic recognition model based on the current text feature information; the current matching degree may include a matching probability distribution, a matching result (match or mismatch), and the like. At this point, texts whose current matching degree is greater than a preset threshold (that is, texts whose matching probability is greater than the preset threshold) can be identified to obtain the texts to be displayed; the preset threshold can be flexibly set according to actual needs. The texts to be displayed can then be shown in the search engine display interface of the terminal; for example, as shown in fig. 5, they can be displayed in descending order of matching probability, and the user can select the text to be searched from the displayed candidates, which improves the accuracy of text display.
In another embodiment, taking application of the trained semantic recognition model on the server as an example: the server may receive the current text input by the user and sent by the terminal, and obtain standard texts from a standard text database. The server then performs word segmentation and splicing processing on the current text and the standard texts according to the word segmentation and splicing methods described above to obtain the current spliced word sequence, performs feature extraction on the current spliced word sequence through the trained semantic recognition model to obtain current text feature information, obtains the current matching degree between the current text and each standard text based on the current text feature information through the trained semantic recognition model, and identifies the texts whose current matching degree is greater than a preset threshold to obtain the texts to be displayed. The texts to be displayed can then be sent to the terminal for display, so that the texts required by the user are displayed accurately and the user's needs are met.
According to the embodiments of the present application, a sample text and the semantic associated text corresponding to the sample text can be obtained, and text characteristic information can be obtained based on the sample text and the semantic associated text; a first matching degree between the sample text and the semantic associated text is obtained through the first semantic recognition model based on the text characteristic information, and a second matching degree between the sample text and the semantic associated text is obtained through the second semantic recognition model based on the text characteristic information, the first semantic recognition model being a student model of the second semantic recognition model; the first semantic recognition model can then be trained according to the first matching degree and the second matching degree to obtain the trained semantic recognition model, so that semantic recognition can be performed on texts through the trained semantic recognition model. In this scheme, because the matching degrees between the sample text and the semantic associated text are obtained from text characteristic information derived from both texts, the trained semantic recognition model can learn semantic information from the second semantic recognition model, which improves the accuracy of semantic recognition on texts.
In order to better implement the semantic recognition method provided by the embodiments of the present application, an embodiment of the present application further provides a device based on the semantic recognition method. The terms used below have the same meanings as in the semantic recognition method above, and for implementation details reference may be made to the description in the method embodiments.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a semantic recognition device according to an embodiment of the present disclosure, where the semantic recognition device may include a first obtaining unit 301, a second obtaining unit 302, a third obtaining unit 303, a fourth obtaining unit 304, a training unit 305, and the like.
The first obtaining unit 301 is configured to obtain a sample text and a semantic association text corresponding to the sample text;
a second obtaining unit 302, configured to obtain text feature information based on the sample text and the semantic associated text corresponding to the sample text;
a third obtaining unit 303, configured to obtain, based on the text characteristic information, a first matching degree between the sample text and the semantic associated text through the first semantic recognition model;
a fourth obtaining unit 304, configured to obtain, based on the text characteristic information, a second matching degree between the sample text and the semantic associated text through a second semantic recognition model, where the first semantic recognition model is a student model of the second semantic recognition model;
the training unit 305 is configured to train the first semantic recognition model according to the first matching degree and the second matching degree to obtain a trained semantic recognition model, so as to perform semantic recognition on the text through the trained semantic recognition model.
In one embodiment, the training unit 305 may include:
the acquisition subunit is used for acquiring the mapping relation between different network levels in the first semantic recognition model and the second semantic recognition model;
the first determining subunit is used for determining loss functions corresponding to different network levels according to the mapping relation;
the second determining subunit is used for determining a target loss function through the loss functions corresponding to different network hierarchies;
and the training subunit is used for training the first semantic recognition model based on the target loss function, the first matching degree and the second matching degree to obtain a trained semantic recognition model.
In an embodiment, the different network levels of the first semantic recognition model and the second semantic recognition model each include an embedding layer, an attention layer, and a prediction layer, and the first determining subunit may be specifically configured to: acquiring the text length of a sample text, first embedding dimension information of a first semantic recognition model, second embedding dimension information of a second semantic recognition model and a linear transformation matrix of dimension mapping between the first semantic recognition model and the second semantic recognition model, and determining a first loss function of an embedding layer according to the text length, the first embedding dimension information, the second embedding dimension information and the linear transformation matrix; acquiring the number of attention heads, an attention weight matrix, first hidden layer state information of a first semantic recognition model and second hidden layer state information of a second semantic recognition model, and determining a second loss function of an attention layer according to the number of attention heads, the attention weight matrix, the first hidden layer state information, the second hidden layer state information, the text length and a linear transformation matrix; and acquiring a first probability distribution output by the first semantic recognition model and a second probability distribution output by the second semantic recognition model, and determining a third loss function of the prediction layer according to the first probability distribution and the second probability distribution.
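For illustration, the first loss function of the embedding layer and the second loss function of the attention layer might be realized as MSE terms of the following form — a sketch consistent with the TinyBERT formulation; the tensor shapes and names are assumptions:

```python
import torch
import torch.nn.functional as F

def embedding_layer_loss(student_emb, teacher_emb, W_e):
    """First loss function: MSE(E^S W_e, E^T). The student embeddings
    (shape [l, d']) are projected by the linear transformation matrix
    W_e ([d', d]) into the teacher's embedding space ([l, d])."""
    return F.mse_loss(student_emb @ W_e, teacher_emb)

def attention_layer_loss(student_attn, teacher_attn):
    """Attention part of the second loss function: the mean of
    MSE(A_i^S, A_i^T) over the h attention heads, where each A_i is an
    [l, l] attention weight matrix and student_attn is [h, l, l]."""
    h = student_attn.shape[0]
    return sum(F.mse_loss(student_attn[i], teacher_attn[i]) for i in range(h)) / h

def hidden_state_loss(student_hidden, teacher_hidden, W_h):
    """Hidden-state part of the second loss function: MSE(H^S W_h, H^T),
    with the hidden states projected like the embeddings so dimensions match."""
    return F.mse_loss(student_hidden @ W_h, teacher_hidden)
```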
In an embodiment, the second determining subunit may be specifically configured to: a target loss function is determined based on the first, second, and third loss functions.
In an embodiment, the second obtaining unit 302 comprises:
the word segmentation subunit is used for respectively carrying out word segmentation processing on the sample text and the semantic associated text to obtain a first word sequence of the sample text and a second word sequence of the semantic associated text;
the splicing subunit is used for splicing the first word sequence and the second word sequence to obtain a spliced word sequence;
and the extraction subunit is used for extracting the characteristics of the spliced word sequence to obtain text characteristic information.
In an embodiment, the splicing subunit may be specifically configured to: setting a first preset character at the head of the first word sequence and setting a second preset character at the tail of the second word sequence; and splicing the tail part of the first word sequence and the head part of the second word sequence through a third preset character to obtain a spliced word sequence.
In one embodiment, the word segmentation subunit may be specifically configured to: performing word segmentation processing on the sample text according to the character level to obtain first word segmentation information; vectorizing the first word segmentation information to obtain a first word sequence; performing word segmentation processing on the semantic associated text according to the word level to obtain second word segmentation information; and vectorizing the second word segmentation information to obtain a second word sequence.
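A minimal sketch of the splicing and segmentation performed by these subunits, assuming BERT-style [CLS]/[SEP] as the preset characters (the disclosure leaves the concrete characters open):

```python
def splice_word_sequences(first_seq, second_seq,
                          first_char="[CLS]", second_char="[SEP]",
                          third_char="[SEP]"):
    """Splice two tokenized word sequences into one input sequence: a first
    preset character at the head of the first sequence, a third preset
    character joining the two sequences, and a second preset character at
    the tail of the second sequence."""
    return [first_char] + first_seq + [third_char] + second_seq + [second_char]

# Sample text segmented at character level, semantic associated text at word level:
# splice_word_sequences(list("样本文本"), ["语义", "关联", "文本"])
# -> ['[CLS]', '样', '本', '文', '本', '[SEP]', '语义', '关联', '文本', '[SEP]']
```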
In an embodiment, the extraction subunit may be specifically configured to: extracting attention information of the spliced word sequence through a multi-head attention layer of the first semantic recognition model and a multi-head attention layer of the second semantic recognition model respectively to obtain multi-head attention information; and carrying out full-connection processing on the multi-head attention information through full-connection feedforward neural network layers of the first semantic recognition model and the second semantic recognition model respectively to obtain text characteristic information.
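For illustration, the multi-head attention and fully connected feed-forward processing might look as follows in PyTorch; the dimensions are TinyBERT-style example values, not mandated by the disclosure:

```python
import torch.nn as nn

class FeatureExtractorBlock(nn.Module):
    """A minimal sketch of the feature-extraction step: multi-head attention
    over the spliced word sequence followed by a fully connected feed-forward
    network, as in a standard Transformer layer."""
    def __init__(self, d_model=312, n_heads=12, d_ff=1200):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                            # x: [batch, seq_len, d_model]
        attn_out, attn_weights = self.attn(x, x, x)  # multi-head attention information
        x = self.norm1(x + attn_out)
        x = self.norm2(x + self.ffn(x))              # text characteristic information
        return x, attn_weights
```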
In an embodiment, the third obtaining unit 303 may specifically be configured to: and distilling the text characteristic information sequentially through the embedding layer, the attention layer and the prediction layer of the first semantic recognition model to obtain a first matching degree between the sample text and the semantic associated text.
In one embodiment, the semantic recognition device may further include:
the fifth acquiring unit is used for acquiring the reference text and the target text corresponding to the reference text;
the first extraction unit is used for extracting the characteristics of the reference text and the target text to obtain the characteristic information of the target text;
a sixth obtaining unit, configured to obtain a prediction matching degree between the reference text and the standard text based on the target text feature information;
and the model training unit is used for training the initial second semantic recognition model according to the predicted matching degree and the pre-labeled real matching degree to obtain the second semantic recognition model.
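A sketch of how the model training unit might fine-tune the initial second semantic recognition model, assuming a two-class (match/mismatch) setup, cross entropy between the predicted and the pre-labeled real matching degrees, and a HuggingFace-style model interface:

```python
import torch
import torch.nn.functional as F

def train_teacher(model, optimizer, dataloader, device="cpu"):
    """Fine-tune the initial second semantic recognition model: the predicted
    matching degree is compared with the pre-labeled real matching degree via
    cross entropy (two classes: mismatch = 0, match = 1)."""
    model.train()
    for batch in dataloader:
        labels = batch.pop("labels").to(device)
        inputs = {k: v.to(device) for k, v in batch.items()}
        logits = model(**inputs).logits
        loss = F.cross_entropy(logits, labels)  # predicted vs. real matching degree
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```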
In one embodiment, the semantic recognition device may further include:
the receiving unit is used for receiving a current text input by a user and acquiring a standard text from a standard text database;
the processing unit is used for performing word segmentation and splicing processing on the current text and the standard text to obtain a word sequence after current splicing;
the second extraction unit is used for extracting the characteristics of the word sequence after the current splicing to obtain the current text characteristic information;
a seventh obtaining unit, configured to obtain, based on the current text characteristic information, a current matching degree between the current text and the standard text through the trained semantic recognition model;
the identification unit is used for identifying the text with the current matching degree larger than a preset threshold value to obtain the text to be displayed;
and the display unit is used for displaying the text to be displayed.
In the embodiment of the present application, the first obtaining unit 301 may obtain a sample text and the semantic associated text corresponding to the sample text, and the second obtaining unit 302 may obtain text characteristic information based on the sample text and the semantic associated text; the third obtaining unit 303 obtains a first matching degree between the sample text and the semantic associated text through the first semantic recognition model based on the text characteristic information, and the fourth obtaining unit 304 obtains a second matching degree between the sample text and the semantic associated text through the second semantic recognition model based on the text characteristic information, the first semantic recognition model being a student model of the second semantic recognition model; the training unit 305 may then train the first semantic recognition model according to the first matching degree and the second matching degree to obtain the trained semantic recognition model, so as to perform semantic recognition on texts through the trained semantic recognition model. Because the matching degrees are obtained from text characteristic information derived from both the sample text and the semantic associated text, the trained semantic recognition model can learn semantic information from the second semantic recognition model, which improves the accuracy of semantic recognition on texts.
An embodiment of the present application further provides a computer device, where the computer device may be a server or a terminal, and as shown in fig. 7, it shows a schematic structural diagram of the computer device according to the embodiment of the present application, specifically:
the computer device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 7 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the computer device, connects various parts of the entire computer device using various interfaces and lines, and performs various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the computer device as a whole. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the computer device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The computer device further comprises a power supply 403 for supplying power to the various components, and preferably, the power supply 403 is logically connected to the processor 401 via a power management system, so that functions of managing charging, discharging, and power consumption are implemented via the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The computer device may also include an input unit 404, the input unit 404 being operable to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application programs stored in the memory 402, thereby implementing various functions as follows:
acquiring a sample text and a semantic associated text corresponding to the sample text, and acquiring text characteristic information based on the sample text and the semantic associated text; acquiring a first matching degree between the sample text and the semantic associated text through a first semantic recognition model based on the text characteristic information, and acquiring a second matching degree between the sample text and the semantic associated text through a second semantic recognition model based on the text characteristic information, wherein the first semantic recognition model is a student model of the second semantic recognition model; the first semantic recognition model can be trained according to the first matching degree and the second matching degree to obtain a trained semantic recognition model, so that the trained semantic recognition model can be used for carrying out semantic recognition on texts, and the accuracy of semantic recognition on texts is improved.
In an embodiment, when the first semantic recognition model is trained according to the first matching degree and the second matching degree, and the trained semantic recognition model is obtained, the processor 401 may perform: acquiring a mapping relation between different network levels in a first semantic recognition model and a second semantic recognition model; determining loss functions corresponding to different network levels according to the mapping relation; determining a target loss function through loss functions corresponding to different network levels; and training the first semantic recognition model based on the target loss function, the first matching degree and the second matching degree to obtain a trained semantic recognition model.
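Putting the pieces together, one possible training step is sketched below — assuming HuggingFace-style models that expose hidden states and attention matrices; W_e and W_h are the linear transformation matrices for dimension mapping, g is the layer mapping, and all names are illustrative rather than part of the disclosed method:

```python
import torch
import torch.nn.functional as F

def distillation_train_step(student, teacher, batch, optimizer, lambdas, g,
                            W_e, W_h, t=1.0):
    """One hypothetical training step of the first semantic recognition model:
    compute the per-layer distillation losses between mapped layers, combine
    them into the target loss of formula (11), and update only the student
    (the teacher parameters stay fixed). `batch` holds tokenized inputs."""
    with torch.no_grad():  # teacher parameters stay fixed
        t_out = teacher(**batch, output_hidden_states=True, output_attentions=True)
    s_out = student(**batch, output_hidden_states=True, output_attentions=True)

    # Embedding-layer loss (first loss function).
    losses = [F.mse_loss(s_out.hidden_states[0] @ W_e, t_out.hidden_states[0])]
    # Transformer-layer losses (second loss function) over mapped layer pairs.
    M = len(s_out.attentions)
    for m in range(1, M + 1):
        n = g(m)
        losses.append(F.mse_loss(s_out.attentions[m - 1], t_out.attentions[n - 1])
                      + F.mse_loss(s_out.hidden_states[m] @ W_h, t_out.hidden_states[n]))
    # Prediction-layer loss (third loss function), formula (8).
    losses.append(-(F.softmax(t_out.logits, dim=-1)
                    * F.log_softmax(s_out.logits / t, dim=-1)).sum(dim=-1).mean())

    loss = sum(lam * l for lam, l in zip(lambdas, losses))  # formula (11)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```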
In an embodiment, different network levels of the first semantic recognition model and the second semantic recognition model each include an embedding layer, an attention layer, and a prediction layer, and when determining the loss function corresponding to the different network levels according to the mapping relationship, the processor 401 may perform: acquiring the text length of a sample text, first embedding dimension information of a first semantic recognition model, second embedding dimension information of a second semantic recognition model and a linear transformation matrix of dimension mapping between the first semantic recognition model and the second semantic recognition model, and determining a first loss function of an embedding layer according to the text length, the first embedding dimension information, the second embedding dimension information and the linear transformation matrix; acquiring the number of attention heads, an attention weight matrix, first hidden layer state information of a first semantic recognition model and second hidden layer state information of a second semantic recognition model, and determining a second loss function of an attention layer according to the number of attention heads, the attention weight matrix, the first hidden layer state information, the second hidden layer state information, the text length and a linear transformation matrix; and acquiring a first probability distribution output by the first semantic recognition model and a second probability distribution output by the second semantic recognition model, and determining a third loss function of the prediction layer according to the first probability distribution and the second probability distribution.
In one embodiment, when determining the target loss function by the loss functions corresponding to different network hierarchies, the processor 401 may perform: a target loss function is determined based on the first, second, and third loss functions.
In one embodiment, when obtaining text feature information based on the sample text and the semantic relation text corresponding to the sample text, processor 401 may perform: performing word segmentation processing on the sample text and the semantic associated text respectively to obtain a first word sequence of the sample text and a second word sequence of the semantic associated text; splicing the first word sequence and the second word sequence to obtain a spliced word sequence; and performing feature extraction on the spliced word sequence to obtain text feature information.
In an embodiment, when the first word sequence and the second word sequence are spliced to obtain a spliced word sequence, the processor 401 may execute: setting a first preset character at the head of the first word sequence and setting a second preset character at the tail of the second word sequence; and splicing the tail part of the first word sequence and the head part of the second word sequence through a third preset character to obtain a spliced word sequence.
In an embodiment, when performing word segmentation processing on the sample text and the semantically associated text to obtain a first word sequence of the sample text and a second word sequence of the semantically associated text, the processor 401 may perform: performing word segmentation processing on the sample text according to the character level to obtain first word segmentation information; vectorizing the first word segmentation information to obtain a first word sequence; performing word segmentation processing on the semantic associated text according to the word level to obtain second word segmentation information; and vectorizing the second word segmentation information to obtain a second word sequence.
In an embodiment, when performing feature extraction on the spliced word sequence to obtain text feature information, the processor 401 may perform: extracting attention information of the spliced word sequence through a multi-head attention layer of the first semantic recognition model and a multi-head attention layer of the second semantic recognition model respectively to obtain multi-head attention information; and carrying out full-connection processing on the multi-head attention information through full-connection feedforward neural network layers of the first semantic recognition model and the second semantic recognition model respectively to obtain text characteristic information.
In one embodiment, when obtaining the first matching degree between the sample text and the semantically associated text based on the text characteristic information through the first semantic recognition model, the processor 401 may perform: and distilling the text characteristic information sequentially through the embedding layer, the attention layer and the prediction layer of the first semantic recognition model to obtain a first matching degree between the sample text and the semantic associated text.
In one embodiment, processor 401 may perform: acquiring a reference text and a target text corresponding to the reference text; extracting the characteristics of the reference text and the target text to obtain target text characteristic information; acquiring a prediction matching degree between a reference text and a standard text based on the target text characteristic information; and training the initial second semantic recognition model according to the predicted matching degree and the pre-labeled real matching degree to obtain a second semantic recognition model.
In an embodiment, after training the first semantic recognition model according to the first matching degree and the second matching degree to obtain a trained semantic recognition model, the processor 401 may perform: receiving a current text input by a user, and acquiring a standard text from a standard text database; performing word segmentation and splicing processing on the current text and the standard text to obtain a word sequence after current splicing; performing feature extraction on the word sequence after the current splicing to obtain current text feature information; acquiring the current matching degree between the current text and the standard text based on the current text characteristic information through the trained semantic recognition model; identifying a text with the current matching degree larger than a preset threshold value to obtain a text to be displayed; and displaying the text to be displayed.
Each of the above embodiments has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the detailed description of the semantic recognition method above, which is not repeated here.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations of the above embodiments.
It will be understood by those skilled in the art that all or part of the steps of the methods of the embodiments described above may be performed by computer instructions, or by computer instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, the present application provides a storage medium, in which a computer program is stored, where the computer program includes computer instructions, and the computer program can be loaded by a processor to execute any one of the semantic recognition methods provided in the present application.
The specific implementation of the above operations can be found in the foregoing embodiments and is not described in detail herein.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Because the computer instructions stored in the storage medium can execute the steps in any semantic recognition method provided in the embodiments of the present application, the beneficial effects that can be achieved by any semantic recognition method provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
The semantic recognition method, the semantic recognition device, the computer device, and the storage medium provided in the embodiments of the present application are described in detail above, and a specific example is applied in the present application to explain the principle and the implementation of the present application, and the description of the above embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (13)

1. A semantic recognition method, comprising:
acquiring a sample text and a semantic associated text corresponding to the sample text;
acquiring text characteristic information based on the sample text and the semantic associated text corresponding to the sample text;
acquiring a first matching degree between the sample text and the semantic associated text based on the text characteristic information through a first semantic recognition model;
acquiring a second matching degree between the sample text and the semantic associated text through a second semantic recognition model based on the text characteristic information, wherein the first semantic recognition model is a student model of the second semantic recognition model;
and training the first semantic recognition model according to the first matching degree and the second matching degree to obtain a trained semantic recognition model, and performing semantic recognition on the text through the trained semantic recognition model.
2. The semantic recognition method of claim 1, wherein the training the first semantic recognition model according to the first matching degree and the second matching degree to obtain a trained semantic recognition model comprises:
acquiring a mapping relation between different network levels in the first semantic recognition model and the second semantic recognition model;
determining loss functions corresponding to different network levels according to the mapping relation;
determining a target loss function through the loss functions corresponding to the different network hierarchies;
and training the first semantic recognition model based on the target loss function, the first matching degree and the second matching degree to obtain a trained semantic recognition model.
3. The semantic recognition method according to claim 2, wherein different network levels of the first semantic recognition model and the second semantic recognition model each include an embedding layer, an attention layer and a prediction layer, and the determining the loss function corresponding to different network levels according to the mapping relationship comprises:
acquiring the text length of the sample text, first embedding dimension information of the first semantic recognition model, second embedding dimension information of the second semantic recognition model and a linear transformation matrix of dimension mapping between the first semantic recognition model and the second semantic recognition model, and determining a first loss function of the embedding layer according to the text length, the first embedding dimension information, the second embedding dimension information and the linear transformation matrix; and the number of the first and second groups,
acquiring the number of attention heads, an attention weight matrix, first hidden layer state information of the first semantic recognition model and second hidden layer state information of the second semantic recognition model, and determining a second loss function of the attention layer according to the number of attention heads, the attention weight matrix, the first hidden layer state information, the second hidden layer state information, the text length and the linear transformation matrix; and the number of the first and second groups,
acquiring a first probability distribution output by the first semantic recognition model and a second probability distribution output by the second semantic recognition model, and determining a third loss function of the prediction layer according to the first probability distribution and the second probability distribution;
the determining a target loss function by the loss functions corresponding to the different network hierarchies comprises:
determining a target loss function based on the first loss function, the second loss function, and the third loss function.
4. The semantic recognition method of claim 1, wherein the obtaining text feature information based on the sample text and the semantic associated text corresponding to the sample text comprises:
performing word segmentation processing on the sample text and the semantic associated text respectively to obtain a first word sequence of the sample text and a second word sequence of the semantic associated text;
splicing the first word sequence and the second word sequence to obtain a spliced word sequence;
and extracting the characteristics of the spliced word sequence to obtain text characteristic information.
5. The semantic recognition method of claim 4, wherein the concatenating the first word sequence and the second word sequence to obtain a concatenated word sequence comprises:
setting a first preset character at the head of the first word sequence and setting a second preset character at the tail of the second word sequence;
and splicing the tail part of the first word sequence and the head part of the second word sequence through a third preset character to obtain a spliced word sequence.
6. The semantic recognition method of claim 5, wherein the performing word segmentation on the sample text and the semantic associated text to obtain a first word sequence of the sample text and a second word sequence of the semantic associated text comprises:
performing word segmentation processing on the sample text according to the character level to obtain first word segmentation information;
vectorizing the first word segmentation information to obtain a first word sequence;
performing word segmentation processing on the semantic associated text according to the word level to obtain second word segmentation information;
and vectorizing the second word segmentation information to obtain a second word sequence.
7. The semantic recognition method of claim 5, wherein the extracting the features of the spliced word sequence to obtain text feature information comprises:
extracting attention information of the spliced word sequence through a multi-head attention layer of the first semantic recognition model and the multi-head attention layer of the second semantic recognition model respectively to obtain multi-head attention information;
and carrying out full-connection processing on the multi-head attention information through full-connection feedforward neural network layers of the first semantic recognition model and the second semantic recognition model respectively to obtain text characteristic information.
8. The semantic recognition method of claim 1, wherein the obtaining, by the first semantic recognition model based on the text characteristic information, the first matching degree between the sample text and the semantic associated text comprises:
and distilling the text characteristic information sequentially through an embedding layer, an attention layer and a prediction layer of a first semantic recognition model to obtain a first matching degree between the sample text and the semantic associated text.
9. The semantic recognition method of claim 1, further comprising:
acquiring a reference text and a target text corresponding to the reference text;
extracting the features of the reference text and the target text to obtain target text feature information;
acquiring the prediction matching degree between the reference text and the standard text based on the target text characteristic information;
and training an initial second semantic recognition model according to the predicted matching degree and the pre-labeled real matching degree to obtain the second semantic recognition model.
10. The semantic recognition method according to any one of claims 1 to 9, wherein after the training of the first semantic recognition model according to the first matching degree and the second matching degree to obtain the trained semantic recognition model, the semantic recognition method further comprises:
receiving a current text input by a user, and acquiring a standard text from a standard text database;
performing word segmentation and splicing processing on the current text and the standard text to obtain a word sequence after current splicing;
performing feature extraction on the current spliced word sequence to obtain current text feature information;
acquiring the current matching degree between the current text and the standard text based on the current text characteristic information through the trained semantic recognition model;
identifying a text with the current matching degree larger than a preset threshold value to obtain a text to be displayed;
and displaying the text to be displayed.
11. A semantic recognition apparatus, comprising:
the system comprises a first acquisition unit, a second acquisition unit and a semantic association unit, wherein the first acquisition unit is used for acquiring a sample text and a semantic association text corresponding to the sample text;
the second acquisition unit is used for acquiring text characteristic information based on the sample text and the semantic associated text corresponding to the sample text;
a third obtaining unit, configured to obtain, through a first semantic recognition model, a first matching degree between the sample text and the semantic associated text based on the text characteristic information;
a fourth obtaining unit, configured to obtain, by a second semantic recognition model based on the text characteristic information, a second matching degree between the sample text and the semantic associated text, where the first semantic recognition model is a student model of the second semantic recognition model;
and the training unit is used for training the first semantic recognition model according to the first matching degree and the second matching degree to obtain a trained semantic recognition model, and performing semantic recognition on a text through the trained semantic recognition model.
12. A computer device comprising a processor and a memory, the memory having stored therein a computer program, the processor when calling the computer program in the memory performing the semantic recognition method according to any one of claims 1 to 10.
13. A storage medium for storing a computer program which is loaded by a processor to perform the semantic recognition method of any one of claims 1 to 10.
CN202011108225.0A 2020-10-16 2020-10-16 Semantic recognition method and device, computer equipment and storage medium Pending CN112232086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011108225.0A CN112232086A (en) 2020-10-16 2020-10-16 Semantic recognition method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011108225.0A CN112232086A (en) 2020-10-16 2020-10-16 Semantic recognition method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112232086A CN112232086A (en) 2021-01-15

Family

ID=74117652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011108225.0A Pending CN112232086A (en) 2020-10-16 2020-10-16 Semantic recognition method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112232086A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062489A (en) * 2019-12-11 2020-04-24 北京知道智慧信息技术有限公司 Knowledge distillation-based multi-language model compression method and device
CN111611377A (en) * 2020-04-22 2020-09-01 淮阴工学院 Knowledge distillation-based multi-layer neural network language model training method and device
CN111667728A (en) * 2020-06-18 2020-09-15 苏州思必驰信息科技有限公司 Voice post-processing module training method and device
CN111554268A (en) * 2020-07-13 2020-08-18 腾讯科技(深圳)有限公司 Language identification method based on language model, text classification method and device

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022194013A1 (en) * 2021-03-16 2022-09-22 Moffett International Co., Limited System and method for knowledge-preserving neural network pruning
CN115146522A (en) * 2021-03-31 2022-10-04 西门子股份公司 Model training method, diagnosis method, device, electronic device and readable medium
CN113177415A (en) * 2021-04-30 2021-07-27 科大讯飞股份有限公司 Semantic understanding method and device, electronic equipment and storage medium
CN113177415B (en) * 2021-04-30 2024-06-07 科大讯飞股份有限公司 Semantic understanding method and device, electronic equipment and storage medium
CN113360616A (en) * 2021-06-04 2021-09-07 科大讯飞股份有限公司 Automatic question-answering processing method, device, equipment and storage medium
CN113807540A (en) * 2021-09-17 2021-12-17 北京搜狗科技发展有限公司 Data processing method and device
CN113836940A (en) * 2021-09-26 2021-12-24 中国南方电网有限责任公司 Knowledge fusion method and device in electric power metering field and computer equipment
CN113836940B (en) * 2021-09-26 2024-04-12 南方电网数字电网研究院股份有限公司 Knowledge fusion method and device in electric power metering field and computer equipment
CN114360537A (en) * 2021-12-27 2022-04-15 科大讯飞股份有限公司 Spoken question and answer scoring method, spoken question and answer training method, computer equipment and storage medium
CN114694253A (en) * 2022-03-31 2022-07-01 深圳市爱深盈通信息技术有限公司 Behavior recognition model training method, behavior recognition method and related device
WO2024187785A1 (en) * 2023-03-10 2024-09-19 腾讯科技(深圳)有限公司 Text recognition method and apparatus, electronic device, storage medium, and program product

Similar Documents

Publication Publication Date Title
CN112232086A (en) Semantic recognition method and device, computer equipment and storage medium
CN109992773B (en) Word vector training method, system, device and medium based on multi-task learning
CN111368079B (en) Text classification method, model training method, device and storage medium
CN112131366A (en) Method, device and storage medium for training text classification model and text classification
CN110234018B (en) Multimedia content description generation method, training method, device, equipment and medium
CN111898374B (en) Text recognition method, device, storage medium and electronic equipment
CN110442718A (en) Sentence processing method, device and server and storage medium
CN112131430B (en) Video clustering method, device, storage medium and electronic equipment
CN112527993B (en) Cross-media hierarchical deep video question-answer reasoning framework
CN111563158A (en) Text sorting method, sorting device, server and computer-readable storage medium
CN113392179A (en) Text labeling method and device, electronic equipment and storage medium
CN114282528A (en) Keyword extraction method, device, equipment and storage medium
CN113761220A (en) Information acquisition method, device, equipment and storage medium
CN114648032B (en) Training method and device of semantic understanding model and computer equipment
CN113761887A (en) Matching method and device based on text processing, computer equipment and storage medium
CN113705191A (en) Method, device and equipment for generating sample statement and storage medium
CN116956116A (en) Text processing method and device, storage medium and electronic equipment
CN117216197A (en) Answer reasoning method, device, equipment and storage medium
CN117711001B (en) Image processing method, device, equipment and medium
CN110502613A (en) A kind of model training method, intelligent search method, device and storage medium
CN116976283A (en) Language processing method, training method, device, equipment, medium and program product
CN115186073A (en) Open domain table text question-answering method based on hybrid retrieval
CN112052320B (en) Information processing method, device and computer readable storage medium
CN118568568B (en) Training method of content classification model and related equipment
CN118228718B (en) Encoder processing method, text processing method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40037818

Country of ref document: HK

SE01 Entry into force of request for substantive examination