CN117009532B - Semantic type recognition method and device, computer readable medium and electronic equipment


Info

Publication number
CN117009532B
CN117009532B (application CN202311222099.5A)
Authority
CN
China
Prior art keywords
semantic
sample
information
target
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311222099.5A
Other languages
Chinese (zh)
Other versions
CN117009532A (en)
Inventor
童丽霞
黄金生
雷植程
吴启辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202311222099.5A
Publication of CN117009532A
Application granted
Publication of CN117009532B
Legal status: Active

Links

Classifications

    • G06F 16/35: Information retrieval of unstructured textual data; Clustering; Classification
    • G06F 18/213: Pattern recognition; Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/214: Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Pattern recognition; Matching criteria, e.g. proximity measures
    • G06F 40/216: Natural language analysis; Parsing using statistical methods
    • G06F 40/284: Recognition of textual entities; Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/30: Handling natural language data; Semantic analysis
    • G06N 3/0455: Neural networks; Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0895: Neural network learning methods; Weakly supervised learning, e.g. semi-supervised or self-supervised learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a semantic type recognition method and device, a computer readable medium, and electronic equipment, relating to natural language processing technology. The semantic type recognition method includes: acquiring semantic information to be identified in a target service scenario; performing feature extraction on the semantic information to be identified through a target semantic model corresponding to the target service scenario to obtain a target feature vector corresponding to the semantic information to be identified, wherein the target semantic model is obtained by first performing model training according to sample semantic information corresponding to a plurality of service scenarios and then performing model training according to sample semantic information corresponding to the target service scenario; and determining the semantic type corresponding to the semantic information to be identified according to the target feature vector. With this technical scheme, the target semantic model has high adaptability across service scenarios and high accuracy for semantic type recognition in the target service scenario, so that the generalization capability of the model is improved while semantic types are still recognized accurately.

Description

Semantic type recognition method and device, computer readable medium and electronic equipment
Technical Field
The application belongs to the technical field of computers, and particularly relates to a semantic type identification method, a semantic type identification device, a computer readable medium and electronic equipment.
Background
With the development of natural language processing technology, products built on it are used in more and more scenarios to improve service efficiency and reduce labor cost, such as intelligent customer service systems and intelligent question-answering systems. In these scenarios, identifying the semantic information of a service object is the basis for providing high-quality services. At present, a common approach is to recognize semantic information with a trained model. Such a model is usually trained on a large amount of training data from a specific scenario and can recognize semantics accurately in that scenario. However, because the model is tied to that specific scenario, its generalization is poor: when the business scenario changes, the model has to be retrained on data from the changed scenario, which takes a long time.
Disclosure of Invention
The purpose of the application is to provide a semantic type recognition method, a semantic type recognition device, a computer readable medium, and electronic equipment, so as to improve the generalization capability of the model while still recognizing semantic types accurately.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned in part by the practice of the application.
According to an aspect of the embodiments of the present application, there is provided a semantic type recognition method, including:
acquiring semantic information to be identified in a target service scene;
extracting features of the semantic information to be identified through a target semantic model corresponding to the target service scene to obtain a target feature vector corresponding to the semantic information to be identified; the target semantic model is obtained by first performing model training according to sample semantic information corresponding to a plurality of service scenes and then performing model training according to sample semantic information corresponding to the target service scene;
and determining the semantic type corresponding to the semantic information to be identified according to the target feature vector.
According to an aspect of the embodiments of the present application, there is provided a semantic type recognition apparatus, including:
the to-be-identified information acquisition module is used for acquiring to-be-identified semantic information in the target service scene;
the feature extraction module is used for extracting features of the semantic information to be identified through a target semantic model corresponding to the target service scene to obtain a target feature vector corresponding to the semantic information to be identified; the target semantic model is obtained by first performing model training according to sample semantic information corresponding to a plurality of service scenes and then performing model training according to sample semantic information corresponding to the target service scene;
The semantic type determining module is used for determining the semantic type corresponding to the semantic information to be identified according to the target feature vector.
In one embodiment of the present application, the feature extraction module includes:
the vector mapping unit is used for carrying out vector mapping on the semantic information to be identified through the input layer of the target semantic model to obtain a first feature vector;
the coding unit is used for coding the first feature vector through a coding layer of the target semantic model to obtain a second feature vector, wherein the coding layer comprises candidate semantic models obtained by model training according to sample semantic information corresponding to a plurality of business scenes;
and the feature extraction unit is used for extracting features of the second feature vector through a characterization layer of the target semantic model to obtain the target feature vector, wherein the vector dimension of the target feature vector is smaller than that of the first feature vector.
In one embodiment of the present application, the vector mapping unit is specifically configured to:
word segmentation is carried out on the semantic information to be identified, and a word sequence to be identified is obtained;
and carrying out vector mapping on each word in the word sequence to be recognized to obtain the first feature vector.
In one embodiment of the present application, the feature extraction unit is specifically configured to:
pooling the second feature vector to obtain a sentence vector to be identified;
and performing dimension reduction processing on the sentence vector to be identified to obtain the target feature vector.
In one embodiment of the present application, the apparatus further comprises:
the sample data acquisition module is used for acquiring a positive sample data set and a negative sample data set corresponding to each preset semantic type in the target service scene; the positive sample data set comprises a plurality of positive sample semantic information matched with the preset semantic type, and the negative sample data set comprises a plurality of negative sample semantic information not matched with the preset semantic type;
the model construction module is used for constructing a sample data set according to a positive sample data set and a negative sample data set corresponding to each preset semantic type, and constructing a semantic type recognition model according to a candidate semantic model obtained by model training according to sample semantic information corresponding to the plurality of business scenes;
and the target model training module is used for training the semantic type recognition model based on the sample data set to obtain the target semantic model.
In one embodiment of the present application, the model building module is specifically configured to:
pairing positive sample semantic information in the positive sample data set corresponding to the preset semantic type to generate a plurality of positive sample data pairs;
pairing the positive sample semantic information in the positive sample data set corresponding to the preset semantic type with the negative sample semantic information in the negative sample data set corresponding to the preset semantic type to form a plurality of negative sample data pairs;
generating a sample data set corresponding to the preset semantic type according to the positive sample data pairs and the negative sample data pairs;
and constructing a sample data set based on the sample data set corresponding to each preset semantic type.
In one embodiment of the present application, the sample data set includes a plurality of sample data pairs, each sample data pair including first data, second data, and a matching tag for the first data and the second data; the target model training module is specifically used for:
respectively extracting features of first data and second data in the sample data pair to obtain sample features corresponding to the first data and sample features corresponding to the second data;
And calculating a loss function according to the distance between the sample characteristic corresponding to the first data and the sample characteristic corresponding to the second data and the matching label of the first data and the second data.
In one embodiment of the present application, the apparatus further comprises:
the training data set dividing module is used for dividing the sample semantic information corresponding to the plurality of business scenes into a plurality of training data sets;
and the candidate model training module is used for training a preset model according to a plurality of training data sets to obtain the candidate semantic model, and the candidate semantic model is used for constructing the target semantic model.
In one embodiment of the present application, the training data set partitioning module is specifically configured to:
classifying sample semantic information corresponding to the plurality of business scenes to obtain sample information sets corresponding to all business semantic types;
determining a target type and a preset number of candidate types from each service semantic type;
extracting two pieces of sample semantic information from the sample information set corresponding to the target type to form a sample pair corresponding to the target type, and extracting two pieces of sample semantic information from the sample information set corresponding to each candidate type to form a sample pair corresponding to each candidate type;
And generating the training data set according to the sample pairs corresponding to the target types and the sample pairs corresponding to the candidate types.
In one embodiment of the present application, the training data set includes a plurality of sample pairs; the candidate model training module comprises:
a data identification unit, configured to select a sample pair from the training data set, take one sample semantic information in the selected sample pair as sample information to be identified, take the other sample semantic information as positive sample information matched with the sample information to be identified, and take sample semantic information in the training data set except for the selected sample pair as negative sample information not matched with the sample information to be identified;
and the training unit is used for training a preset model according to the sample information to be identified, the positive sample information and the negative sample information.
In one embodiment of the present application, the training unit is specifically configured to:
respectively extracting the characteristics of the sample information to be identified, the positive sample information and the negative sample information to obtain sample characteristics to be identified, positive sample characteristics and negative sample characteristics;
calculating a first representative distance between the sample feature to be identified and the positive sample feature, and calculating a second representative distance between the sample feature to be identified and a target sample feature, the target sample feature comprising the positive sample feature and the negative sample feature;
A loss function is calculated from the first representative distance and the second representative distance.
In one embodiment of the present application, the semantic type determination module comprises:
the similarity calculation unit is used for calculating the similarity between the target feature vector and a plurality of preset feature vectors, wherein the preset feature vectors are obtained by extracting features of semantic information of preset semantic types through the target semantic model;
the semantic type determining unit is used for determining the semantic type corresponding to the semantic information to be identified according to the similarity between the target feature vector and each preset feature vector.
In an embodiment of the present application, the semantic type determining unit is specifically configured to:
determining the maximum similarity from the similarity between the target feature vector and each preset feature vector;
and when the maximum similarity is larger than a preset threshold, taking the preset semantic type to which the preset feature vector corresponding to the maximum similarity belongs as the semantic type corresponding to the semantic information to be identified.
According to an aspect of the embodiments of the present application, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a semantic type recognition method as in the above technical solution.
According to an aspect of the embodiments of the present application, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein execution of the executable instructions by the processor causes the electronic device to perform the semantic type recognition method as in the above technical solution.
According to an aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the semantic type recognition method as in the above technical solution.
In the technical scheme provided by the embodiment of the application, when semantic type recognition is performed on the semantic information to be recognized in the target service scene, the target semantic model corresponding to the target service scene is used for extracting the characteristics of the semantic information to be recognized to obtain the target feature vector corresponding to the semantic information to be recognized, and then the semantic type corresponding to the semantic information to be recognized is determined according to the target feature vector.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
Fig. 1 schematically shows a block diagram of an exemplary system architecture to which the technical solution of the present application is applied.
Fig. 2 schematically shows a schematic diagram of an application scenario of the technical solution of the present application.
Fig. 3 schematically illustrates a flowchart of a semantic type recognition method provided in one embodiment of the present application.
Fig. 4 schematically illustrates a flowchart of a semantic type recognition method provided in one embodiment of the present application.
FIG. 5 schematically illustrates a schematic diagram of a semantic type recognition model training process provided by one embodiment of the present application.
Fig. 6 schematically illustrates a flowchart of a semantic type recognition method provided in one embodiment of the present application.
FIG. 7 schematically illustrates a training process of a preset model provided in one embodiment of the present application.
Fig. 8 schematically shows a block diagram of the semantic type recognition apparatus provided in the embodiment of the present application.
Fig. 9 schematically shows a block diagram of a computer system suitable for use in implementing embodiments of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present application. One skilled in the relevant art will recognize, however, that the aspects of the application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
Fig. 1 schematically shows a block diagram of an exemplary system architecture to which the technical solution of the present application is applied.
As shown in fig. 1, system architecture 100 may include a terminal device 110, a network 120, and a server 130. Terminal device 110 may include a smart phone, tablet, notebook, smart voice interaction device, smart home appliance, vehicle terminal, aircraft, and the like. The server 130 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. Network 120 may be a communication medium of various connection types capable of providing a communication link between terminal device 110 and server 130, and may be, for example, a wired communication link or a wireless communication link.
The system architecture in the embodiments of the present application may have any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 130 may be a server group composed of a plurality of server devices. In addition, the technical solution provided in the embodiment of the present application may be applied to the terminal device 110, or may be applied to the server 130, or may be implemented by the terminal device 110 and the server 130 together, which is not limited in particular in this application.
For example, the terminal device 110 obtains the semantic information to be identified in the target service scenario, and sends the semantic information to be identified to the server 130 through the network 120. The server 130 performs feature extraction on the semantic information to be identified through a target semantic model corresponding to the target service scene to obtain a target feature vector corresponding to the semantic information to be identified; the target semantic model is obtained by performing model training according to sample semantic information corresponding to a plurality of service scenes and then performing model training according to sample semantic information corresponding to the target service scenes. Then, the server 130 determines a semantic type corresponding to the semantic information to be recognized according to the target feature vector. Finally, the server 130 feeds back the semantic type corresponding to the semantic information to be identified to the terminal device 110 through the network 120.
Fig. 2 schematically illustrates a schematic view of an application scenario of the technical solution of the present application. As shown in fig. 2, the technical solution of the present application may be implemented by the robot 220 in the intelligent response scenario, that is, the target service scenario is the intelligent response scenario. The user 210 sends voice information to the robot 220, and the voice information can be used as semantic information to be recognized acquired by the robot 220. Before intelligent response is performed, the robot 220 may perform model training through sample semantic information corresponding to a plurality of service scenarios, and then perform model training by using the sample semantic information in the intelligent response scenario to obtain a target semantic model. When intelligent response is performed, the robot 220 inputs the acquired voice information into the target semantic model to obtain a target feature vector corresponding to the voice information, and then determines a semantic type corresponding to the voice information according to the target feature vector. Next, the robot 220 may select response information matching the voice information of the user 210 according to the semantic type corresponding to the recognized voice information, and feed back the response information to the user 210.
In the present application, the target semantic model may be implemented based on machine learning techniques. Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied throughout the various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning.
The determination of the semantic type corresponding to the semantic information to be identified may be implemented by natural language processing techniques. Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, i.e., the language people use daily, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graph techniques, and the like.
The semantic type recognition method provided by the application is described in detail below with reference to the specific embodiments.
Fig. 3 schematically illustrates a flowchart of a semantic type recognition method according to an embodiment of the present application, where the method may be implemented by a semantic type recognition apparatus according to any embodiment of the present application, where the apparatus may be provided in a terminal device or a server, such as the terminal device 110 or the server 130 shown in fig. 1, and an implementation procedure of the method will be described below with the apparatus as an execution body. As shown in fig. 3, the semantic type recognition method provided in the embodiment of the present application includes steps 310 to 330, which are specifically as follows:
Step 310, obtaining semantic information to be identified in the target service scene.
In particular, fields where semantic information needs to be identified often involve multiple business scenarios. For example, in the customer service field, classified by the service initiator, there are an incoming call scenario and an outgoing call scenario: in the incoming call scenario the customer actively initiates a call to the service party, while in the outgoing call scenario the service party actively initiates a call to the customer. Classified by service content, the business scenarios may include a consultation scenario, an information inquiry scenario, an appeal resolution scenario, and the like. The target business scenario is a specified business scenario. The semantic information to be identified can be voice data or text data.
In one embodiment of the present application, when the obtained semantic information to be recognized is voice data, the semantic type recognition device performs data format conversion on it, converting it into text data, so as to facilitate subsequent input into the target semantic model.
Step 320, extracting features of the semantic information to be identified through a target semantic model corresponding to the target service scene to obtain a target feature vector corresponding to the semantic information to be identified; the target semantic model is obtained by performing model training according to sample semantic information corresponding to a plurality of service scenes and then performing model training according to sample semantic information corresponding to the target service scenes.
Specifically, the target semantic model is generated through two rounds of model training. The semantic type recognition device first performs model training according to sample semantic information corresponding to a plurality of business scenarios to obtain a candidate semantic model, so that the candidate semantic model can recognize semantic types across these business scenarios and has high generality. Then, the semantic type recognition device performs model training again according to the candidate semantic model and the sample semantic information in the target service scenario to obtain the target semantic model, which can accurately recognize semantic information in the target service scenario.
When the semantic type recognition is carried out on the semantic information to be recognized by the target semantic model, firstly, feature extraction is carried out on the semantic information to be recognized to obtain a target feature vector, namely, the semantic information is converted into a vector form, so that the semantic type can be conveniently judged according to the target feature vector.
In one embodiment of the present application, a process for extracting features of semantic information to be identified includes: vector mapping is carried out on semantic information to be identified through an input layer of a target semantic model, and a first feature vector is obtained; the first feature vector is encoded through an encoding layer of the target semantic model to obtain a second feature vector, wherein the encoding layer comprises candidate semantic models obtained by model training according to sample semantic information corresponding to a plurality of business scenes; and extracting features of the second feature vector through a characterization layer of the target semantic model to obtain a target feature vector, wherein the vector dimension of the target feature vector is smaller than that of the first feature vector.
Specifically, the target semantic model includes an input layer, an encoding layer, and a characterization layer. The input layer performs vector mapping on the input data of the model to convert it into vector form; the semantic information to be identified is processed by the input layer to obtain the first feature vector. The encoding layer encodes the output data of the input layer; in this embodiment, the encoding layer of the target semantic model is formed by the candidate semantic model obtained by model training according to sample semantic information corresponding to a plurality of service scenarios, so the first feature vector is processed by the candidate semantic model to obtain the second feature vector. The characterization layer performs feature extraction on the second feature vector again to obtain the target feature vector. To fully express the meaning of the semantic information to be recognized in vector form, the first feature vector produced by the input layer generally has a high vector dimension, and the second feature vector produced by the encoding layer, having a richer representation, also has a high vector dimension. A high vector dimension increases the amount of data the model must compute, so to reduce the computation of the subsequent steps, the characterization layer reduces the vector dimension during feature extraction, that is, the vector dimension of the target feature vector is smaller than that of the first feature vector, which improves the efficiency of semantic type recognition.
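To make the three-layer structure concrete, the following is a minimal PyTorch sketch. The class name, the small transformer encoder standing in for the candidate semantic model, and the 768/256 dimensions (taken from the exemplary values later in this description) are illustrative assumptions, not the patent's prescribed implementation:

```python
import torch
import torch.nn as nn

class TargetSemanticModel(nn.Module):
    """Input layer -> encoding layer -> characterization layer, as described above."""

    def __init__(self, vocab_size: int, embed_dim: int = 768, out_dim: int = 256,
                 encoder: nn.Module = None):
        super().__init__()
        # Input layer: maps token ids to word vectors (the first feature vector).
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Encoding layer: the candidate semantic model pre-trained on multiple
        # business scenarios; a small transformer encoder stands in for it here.
        self.encoder = encoder or nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Characterization layer: pooling plus dimension reduction (768 -> 256).
        self.dense = nn.Linear(embed_dim, out_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        first_vec = self.embedding(token_ids)       # (batch, seq_len, 768)
        second_vec = self.encoder(first_vec)        # (batch, seq_len, 768)
        sent_pooling_vec = second_vec.mean(dim=1)   # average pooling -> sentence vector
        sent_fc_vec = self.dense(sent_pooling_vec)  # (batch, 256) target feature vector
        return sent_fc_vec
```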
In one embodiment of the application, when the input layer performs vector mapping on semantic information to be identified, firstly, word segmentation processing is performed on the semantic information to be identified to obtain a word sequence to be identified; and then, carrying out vector mapping on each word in the word sequence to be recognized to obtain a first feature vector.
Specifically, the semantic information to be recognized is text data. When text data is converted into a vector, word segmentation needs to be performed on it first, so as to obtain the word sequence that makes up the semantic information to be recognized, recorded as the word sequence to be recognized. For example, the semantic information to be recognized may be matched against words in a preset dictionary: when a word in the preset dictionary is found to match a piece of the semantic information to be recognized, a word in the semantic information to be recognized has been identified. After word segmentation, vector mapping can be performed on each word: each word in the word sequence to be recognized is converted into its corresponding vector, so the word sequence to be recognized becomes a word vector sequence, which is the first feature vector. For example, the vector mapping of individual words may be implemented by Word Embedding.
For example, the semantic information X to be identified is segmented into words to obtain the word sequence to be identified X = {x_1, x_2, ..., x_n}, where x_i represents the i-th word in the semantic information X to be identified. After word embedding is performed on each word, the first feature vector E = {e_1, e_2, ..., e_n} is obtained, where e_i is the word vector corresponding to x_i and e_i ∈ R^d; d represents the vector dimension, and in one exemplary solution d = 768.
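As an illustration of the dictionary-matching segmentation mentioned above, here is a minimal sketch using forward maximum matching; the function name, the toy dictionary, and the choice of this particular matching strategy are assumptions for illustration, since the patent does not fix a segmentation algorithm:

```python
def forward_max_match(text: str, dictionary: set, max_word_len: int = 4) -> list:
    """Greedily take the longest dictionary word at each position,
    falling back to a single character when nothing matches."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words

# Toy example against a hypothetical preset dictionary.
preset_dictionary = {"语义", "类型", "识别"}
print(forward_max_match("语义类型识别", preset_dictionary))  # ['语义', '类型', '识别']
```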
In one embodiment of the present application, when the characterization layer performs feature extraction on the second feature vector, firstly, pooling processing is performed on the second feature vector to obtain a sentence vector to be identified; and then carrying out dimension reduction processing on the sentence vector to be identified to obtain a target feature vector.
Specifically, when the encoding layer encodes the first feature vector, the encoding layer performs feature processing on each word vector in the first feature vector, so that the obtained second feature vector includes the encoding vector corresponding to each word, that is, the second feature vector still belongs to the word vector. The semantic type recognition needs to recognize sentence information corresponding to the semantic information to be recognized, so that word vectors represented by the second feature vectors need to be converted into sentence vectors, and the purpose can be achieved through pooling processing. The pooling process is implemented by a pooling layer (pooling layer), and sentence vectors to be recognized are obtained after the pooling process. The pooling process may be an average pooling process (mean pooling), a maximum pooling process (max pooling), or the like. After the sentence vector to be identified is obtained, the sentence vector is further subjected to dimension reduction processing through a dimension reduction layer (dense layer) so as to reduce the vector dimension and obtain the target feature vector.
For example, the first feature vector E = {e_1, e_2, ..., e_n} is encoded to obtain the second feature vector H = {h_1, h_2, ..., h_n}, where h_i is the encoded vector corresponding to e_i. The second feature vector H is average-pooled to obtain the sentence vector to be identified, sent_pooling_vec, and a fully connected dimension reduction is applied to sent_pooling_vec through the dimension reduction layer to obtain the target feature vector sent_fc_vec, whose dimension may be 256.
Step 330, determining the semantic type corresponding to the semantic information to be identified according to the target feature vector.
Specifically, after the target feature vector is obtained, the semantic type recognition device can perform semantic type recognition according to the target feature vector to obtain the semantic type corresponding to the semantic information to be recognized. If the semantic type of the semantic information to be recognized can be recognized, it should belong to one of a plurality of semantic types set in advance; for example, the preset semantic types include "technical consultation is required" and "technical consultation is not required". If the semantic type of the semantic information to be identified does not belong to any preset semantic type, the semantic type identification device can feed back that the semantic type of the semantic information cannot be identified.
In the technical scheme provided by the embodiment of the application, when semantic type recognition is performed on the semantic information to be recognized in the target service scene, the target semantic model corresponding to the target service scene is used for extracting the characteristics of the semantic information to be recognized to obtain the target feature vector corresponding to the semantic information to be recognized, and then the semantic type corresponding to the semantic information to be recognized is determined according to the target feature vector.
In an embodiment of the present application, the step of determining, according to the target feature vector, the semantic type corresponding to the semantic information to be identified may be integrated into the target semantic model, that is, feature extraction is performed on the semantic information to be identified by the target semantic model, and the semantic type corresponding to the semantic information to be identified is output according to the extracted target feature vector.
In an embodiment of the present application, when determining the semantic type corresponding to the semantic information to be identified according to the target feature vector, the semantic type identification device may determine it based on the similarity between the target feature vector and a plurality of preset feature vectors. A preset feature vector is the feature vector corresponding to semantic information of a preset semantic type, obtained by extracting features of that semantic information through the target semantic model. The semantic type recognition device pre-stores preset feature vectors corresponding to semantic information of a plurality of preset semantic types; when performing semantic type recognition, it calculates the similarity between the target feature vector corresponding to the semantic information to be recognized and each pre-stored preset feature vector, and then determines the semantic type corresponding to the semantic information to be recognized according to the similarities. The similarity can be represented by parameters such as cosine similarity or Euclidean distance, to which the technical scheme of the application is not limited.
In one embodiment of the application, when the semantic types in the target service scenario change, for example a new semantic type is added, the set of preset feature vectors is updated accordingly, that is, a preset feature vector corresponding to the new semantic type is added. The new semantic type can then be identified without training the target semantic model again, so semantic type changes can be handled conveniently and quickly.
In an embodiment of the present application, the semantic type recognition device may consider that the preset feature vector corresponding to the maximum similarity and the target feature vector belong to the same semantic type, and the semantic type corresponding to the semantic information to be recognized is the preset semantic type to which the preset feature vector corresponding to the maximum similarity belongs.
In one embodiment of the present application, when determining the semantic type of the semantic information to be identified according to the similarities, the semantic type identification device first determines the maximum similarity among the similarities between the target feature vector and each preset feature vector, and then compares the maximum similarity with a preset threshold. When the maximum similarity is greater than the preset threshold, the preset feature vector corresponding to the maximum similarity is considered highly similar to the target feature vector, so the preset semantic type to which that preset feature vector belongs can be taken as the semantic type corresponding to the semantic information to be identified. When the maximum similarity is smaller than the preset threshold, no preset feature vector is sufficiently similar to the target feature vector and the semantic type corresponding to the semantic information to be identified cannot be determined; the semantic type identification device can then feed back that no semantic type corresponding to the semantic information to be identified was found.
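A minimal sketch of this nearest-preset-vector decision, assuming cosine similarity (the patent equally allows Euclidean distance) and an illustrative threshold of 0.8; preset_vectors and all names here are hypothetical:

```python
from typing import Dict, Optional
import numpy as np

def identify_semantic_type(target_vec: np.ndarray,
                           preset_vectors: Dict[str, np.ndarray],
                           threshold: float = 0.8) -> Optional[str]:
    """Return the preset semantic type whose vector is most similar to
    target_vec, or None when even the maximum similarity is below threshold."""
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    best_type, best_sim = None, -1.0
    for semantic_type, preset_vec in preset_vectors.items():
        sim = cosine(target_vec, preset_vec)
        if sim > best_sim:
            best_type, best_sim = semantic_type, sim
    return best_type if best_sim > threshold else None
```

Under this sketch, adding a new preset semantic type (the embodiment two paragraphs up) amounts to inserting one more entry into preset_vectors, encoded once by the target semantic model, with no retraining.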
Fig. 4 schematically shows a flowchart of a semantic type recognition method according to an embodiment of the present application, which is a further refinement of the above embodiment. As shown in fig. 4, the semantic type recognition method provided in the embodiment of the present application includes steps 410 to 460, which are specifically as follows:
step 410, acquiring a positive sample data set and a negative sample data set corresponding to each preset semantic type in a target service scene; the positive sample data set includes a plurality of positive sample semantic information matching a preset semantic type, and the negative sample data set includes a plurality of negative sample semantic information not matching the preset semantic type.
Specifically, the target semantic model needs to be trained on sample semantic information in the target service scenario. The sample semantic information in the target service scenario includes positive sample semantic information and negative sample semantic information, and each piece of sample semantic information is associated with a known preset semantic type. Positive sample semantic information is sample semantic information that matches the preset semantic type it is associated with, that is, its true semantic type is that preset semantic type. Negative sample semantic information is sample semantic information that does not match the preset semantic type it is associated with: it does not actually belong to the preset semantic type, but it is easily misjudged as belonging to it.
For example, suppose the preset semantic types include direct relatives, other relatives, and non-relatives. The positive sample semantic information corresponding to "direct relatives" may be a sentence such as "this is Zhang San's older brother", and the corresponding negative sample semantic information may be "this is not Zhang San's older brother". The negative sample semantic information differs from the positive sample semantic information in meaning: normally, the semantic type "direct relatives" cannot be derived from the negative sample "this is not Zhang San's older brother". However, because the two sentences differ by only a single word, their surface similarity is high, and in many cases the negative sample is misjudged as belonging to the semantic type "direct relatives".
In one embodiment of the present application, the construction process of the sample dataset includes: pairing positive sample semantic information in a positive sample data set corresponding to a preset semantic type to generate a plurality of positive sample data pairs; pairing positive sample semantic information in a positive sample data set corresponding to a preset semantic type with negative sample semantic information in a negative sample data set corresponding to the preset semantic type to form a plurality of negative sample data pairs; generating a sample data set corresponding to a preset semantic type according to the positive sample data pairs and the negative sample data pairs; and constructing a sample data set based on the sample data set corresponding to each preset semantic type.
Specifically, for each preset semantic type, positive sample semantic information in a positive sample data set corresponding to the preset semantic type is paired in pairs to obtain a plurality of positive sample data pairs. Meanwhile, for each positive sample semantic information in the positive sample data set, pairing the positive sample semantic information with each negative sample semantic information in the negative sample data set, and further obtaining a plurality of negative sample data pairs. After the positive sample data set and the negative sample data set of each preset semantic type are subjected to the operation as above, the sample data set corresponding to each preset semantic type is obtained, and all the sample data sets corresponding to the preset semantic types form the sample data set for model training.
As can be seen from the construction process of the sample data set, the sample data set comprises a plurality of sample data pairs, each sample data pair comprising two data, denoted as first data and second data, respectively. In some cases, the sample data pair further includes a matching tag between the two data, the matching tag being used to represent a match of semantic types of the two data. For example, when both data belong to positive sample semantic information, it is indicated that both data belong to data that matches a preset semantic type, and the corresponding matching tag may be set to a first value; when one of the two data is positive sample semantic information and the other is negative sample semantic information, that is, the preset semantic types of the two data are not matched, the corresponding matching label can be set to a second value.
Illustratively, the positive sample data set is denoted pos_data, and the positive sample semantic information in it is denoted p; the negative sample data set is denoted neg_data, and the negative sample semantic information in it is denoted n. Since the sample semantic information is in text form, the positive sample semantic information p may be referred to as sentence p and the negative sample semantic information n as sentence n. For each preset semantic type, the sentences in the positive sample data set pos_data are paired two by two to obtain positive sample data pairs (sentence pi, sentence pj, 1), where sentence pi is the i-th sentence in pos_data, sentence pj is the j-th sentence in pos_data, and 1 is the value of the matching tag in a positive sample data pair. Then the negative sample data set neg_data is traversed, and each sentence in the positive sample data set pos_data is paired with all sentences in the negative sample data set neg_data to obtain negative sample data pairs such as (sentence n1, sentence p1, 0), where 0 is the value of the matching tag in a negative sample data pair. Finally, all the sample data pairs are aggregated to obtain the sample data set.
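The pairing scheme just described can be sketched in a few lines; pos_data and neg_data follow the notation above, while the function name and the list-of-tuples representation are assumptions:

```python
from itertools import combinations

def build_pairs_for_type(pos_data: list, neg_data: list) -> list:
    """(first data, second data, matching tag) pairs for one preset semantic
    type: tag 1 for positive-positive pairs, tag 0 for negative-positive pairs."""
    pairs = [(p_i, p_j, 1) for p_i, p_j in combinations(pos_data, 2)]
    for p in pos_data:          # traverse neg_data for every positive sentence
        for n in neg_data:
            pairs.append((n, p, 0))
    return pairs

# The sample data set is the union over all preset semantic types:
# sample_dataset = [pair for t in preset_types
#                   for pair in build_pairs_for_type(pos_sets[t], neg_sets[t])]
```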
In this way, constructing positive sample data pairs shortens the intra-class distance between similar sample data, and constructing negative sample data pairs enlarges the feature difference between samples of different types, so that the target semantic model trained on the sample data set can accurately identify semantic information that conforms to a preset semantic type and also accurately reject semantic information that does not, which greatly reduces the misjudgment rate of the model and improves the accuracy of semantic type identification.
Step 420, a sample data set is constructed according to a positive sample data set and a negative sample data set corresponding to each preset semantic type, and a semantic type recognition model is constructed according to a candidate semantic model obtained by model training according to sample semantic information corresponding to a plurality of business scenes.
Specifically, the positive sample data sets and negative sample data sets together constitute the sample data set required for model training. On the other hand, a semantic type recognition model is built from the candidate semantic model; by training this model, a target semantic model that includes the candidate semantic model can be obtained. When the semantic type recognition model is constructed, its input layer, encoding layer, and characterization layer need to be built. The input layer performs vector mapping on the sample semantic information in the sample data set to convert it into a first sample feature vector; this process specifically includes word segmentation of the sample semantic information and vector mapping after segmentation. The encoding layer is composed of the candidate semantic model and encodes the first sample feature vector to obtain a second sample feature vector. The characterization layer extracts sentence features from the second sample feature vector and performs dimension reduction; it may be composed of a pooling layer and a dimension reduction layer, where the pooling layer extracts sentence features from the second sample feature vector so as to convert the word vector sequence represented by the second sample feature vector into a sample sentence vector, and the dimension reduction layer reduces the vector dimension of the sample sentence vector.
Step 430, training the semantic type recognition model based on the sample data set to obtain a target semantic model.
Specifically, in the model training process, data in the sample data set is input into the semantic type recognition model to obtain the model's output data; the output data is compared with the model's input data, a loss function is calculated, and the model parameters are adjusted according to the loss function. After the model parameters are adjusted, the process is repeated with data from the sample data set until the loss function is lower than a threshold or the number of training iterations reaches a preset number, at which point model training is complete and the target semantic model is obtained.
In one embodiment of the present application, during training of the model, the input data of the model is pairs of sample data in the sample data set. The semantic type recognition model respectively performs feature extraction on the first data and the second data in the sample data pair to obtain sample features corresponding to the first data and sample features corresponding to the second data; and then calculating a loss function according to the distance between the sample characteristic corresponding to the first data and the sample characteristic corresponding to the second data and the matching label of the first data and the second data.
The loss function L1 in the semantic type recognition model training process is calculated as follows:

L1 = (1/N) Σ_{n=1}^{N} [ y_n · d_n² + (1 − y_n) · max(margin − d_n, 0)² ]
where d_n represents the distance, e.g., the Euclidean distance, between the sample feature corresponding to the first data and the sample feature corresponding to the second data of the n-th sample data pair; y_n represents the matching label of the first data and the second data, with y_n = 1 indicating that the semantic types of the two pieces of data are similar or matching and y_n = 0 indicating that they do not match; margin is a preset threshold, set to 1 in this embodiment; n is the index of the sample data pair, and N is the total number of sample data pairs in each training pass. In this embodiment, the data in the sample data set is divided into batches before model training, that is, each training pass uses the data of one batch, so N represents the data amount of one batch (the batch size), and the loss function L1 is in fact the loss function of one batch. N is related to the memory of the training machine: the larger the memory, the larger N can be set; for example, N can be 64, 128, and so on.
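A minimal PyTorch sketch of this batch loss, under the assumption that the two sides of each pair have already been encoded into feature matrices (function and variable names are illustrative):

```python
import torch
import torch.nn.functional as F

def contrastive_loss_l1(feat_a, feat_b, labels, margin=1.0):
    """Sketch of the loss L1: feat_a and feat_b are (N, dim) features of the
    two sides of each sample data pair; labels is (N,) with y = 1 for
    matching pairs and y = 0 for non-matching pairs (as float values)."""
    d = F.pairwise_distance(feat_a, feat_b)  # Euclidean distance per pair
    # Matching pairs (y = 1) are pulled together; non-matching pairs (y = 0)
    # are pushed apart until their distance exceeds the margin.
    loss = labels * d.pow(2) + (1 - labels) * torch.clamp(margin - d, min=0).pow(2)
    return loss.mean()  # average over the N pairs of the batch
```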
By way of example, FIG. 5 schematically illustrates a semantic type recognition model training process provided by one embodiment of the present application; it should be understood that the target semantic model is a trained semantic type recognition model, so the two are structurally identical. As shown in FIG. 5, the semantic type recognition model performs word segmentation on the first data and the second data in a sample data pair to obtain the corresponding word sequences, and word embedding is then performed on each word sequence to obtain the corresponding first sample feature vectors (not shown in FIG. 5). The first sample feature vectors are then input into the candidate semantic model for encoding to obtain the second sample feature vectors, which are pooled and dimension-reduced in turn to obtain the sample features corresponding to the first data and the sample features corresponding to the second data. Next, a similarity label is computed from these two sample features, that is, the distance between the sample feature corresponding to the first data and the sample feature corresponding to the second data is calculated. A contrastive loss is then computed (that is, the loss function is calculated according to this distance and the matching label of the first data and the second data, following the formula for the loss function L1). Finally, the model parameters are updated according to the loss function, and the above process is repeated until the model convergence condition is reached (the loss function falls below a threshold or the number of training iterations reaches a preset count).
Step 440, obtaining semantic information to be identified in the target service scene.
Step 450, extracting features of the semantic information to be identified through a target semantic model corresponding to the target service scene to obtain a target feature vector corresponding to the semantic information to be identified; the target semantic model is obtained by performing model training according to sample semantic information corresponding to a plurality of service scenes and then performing model training according to sample semantic information corresponding to the target service scenes.
Step 460, determining the semantic type corresponding to the semantic information to be identified according to the target feature vector.
Steps 440-460 are the same as steps 310-330 in the previous embodiments, and are not described here.
Fig. 6 schematically shows a flowchart of a semantic type recognition method according to an embodiment of the present application, which is a further refinement of the above embodiment. As shown in fig. 6, the semantic type recognition method provided in the embodiment of the present application includes steps 610 to 650, which are specifically as follows:
step 610, dividing the sample semantic information corresponding to the multiple business scenes into multiple training data sets.
Specifically, the sample semantic information corresponding to the plurality of business scenes is used to train the candidate semantic model; during training, the sample semantic information is divided into a plurality of batches, with one batch corresponding to one training data set. Each business scene may include multiple types of semantic information, and the semantic information types included in the business scenes are the business semantic types, so the sample semantic information corresponding to the plurality of business scenes is sample semantic information corresponding to a plurality of business semantic types. When generating a training data set, part of the sample semantic information can be extracted from the sample semantic information corresponding to each business semantic type.
In one embodiment of the present application, the process of constructing the training data set includes: classifying sample semantic information corresponding to a plurality of business scenes to obtain sample information sets corresponding to all business semantic types; determining a target type and a preset number of candidate types from each service semantic type; extracting two pieces of sample semantic information from a sample information set corresponding to the target type to form a sample pair corresponding to the target type, and extracting two pieces of sample semantic information from a sample information set corresponding to each candidate type to form a sample pair corresponding to each candidate type; and generating a training data set according to the sample pairs corresponding to the target types and the sample pairs corresponding to the candidate types.
Specifically, classifying the sample semantic information corresponding to the plurality of business scenes can be achieved through a clustering operation, such as k-means clustering. The pieces of sample semantic information corresponding to one business semantic type form the sample information set corresponding to that business semantic type. The data in the training data set is likewise organized in the form of data pairs, denoted as sample pairs; a sample pair thus comprises two pieces of data, i.e., two pieces of sample semantic information. The training data set needs to contain sample semantic information of different business semantic types, so two pieces of sample semantic information can be extracted from the sample information set of each business semantic type to form a sample pair, the sample pairs of the respective business semantic types then form one training data set, and repeating this operation a plurality of times yields a plurality of training data sets.
The number of sample pairs included in a training data set is determined by the set batch size; for example, with a batch size of 64, one training data set includes 64 sample pairs. The number of business semantic type categories included in the training data set likewise equals the batch size. When generating the training data set, a target type and a preset number of candidate types are therefore determined from the business semantic types, the preset number being the batch size minus one; illustratively, with a batch size of 64, the preset number is 63. Both the target type and the candidate types may be selected randomly, but the target type and the candidate types must not duplicate one another. Two pieces of sample semantic information are then extracted from the sample information set corresponding to the target type to generate a sample pair, and likewise two pieces of sample semantic information are extracted from the sample information set corresponding to each candidate type to generate a sample pair. A training data set can then be generated from the sample pair corresponding to the target type and the sample pairs corresponding to the candidate types. Repeating the above operation yields a plurality of training data sets.
In one embodiment of the present application, when determining the target type and the candidate types, one target type may be determined first, and a preset number of candidate types may then be randomly selected from the remaining business semantic types. After a training data set is generated based on the target type and candidate types determined this time, the next training data set can be generated by keeping the previously determined target type unchanged and again randomly selecting a preset number of candidate types from the remaining business semantic types. Repeating this operation a plurality of times (the number of repetitions can be preset) yields a plurality of training data sets based on one target type; a new target type can then be determined and the operation repeated, so that more training data sets can be generated from less data, reducing the pressure of collecting sample semantic information corresponding to a plurality of business scenes. In this process, when a preset number of candidate types are re-selected, business semantic types not previously selected can be preferentially chosen as candidate types; likewise, when sample semantic information is extracted to form a sample pair, sample semantic information not previously selected from the sample information set can be preferentially chosen.
For example, assuming the size of the training data set is 64: first, one class A is selected from all the business semantic classes as the target type, and two pieces of sample semantic information a_text1 and a_text2 are randomly extracted from the sample information set of class A to form a sample pair (a_text1, a_text2), which is added to the training data set. Then, 63 classes B1, B2, …, B63 are randomly extracted from the business semantic classes other than class A as candidate types; for each class Bi, two pieces of sample semantic information bi_text1 and bi_text2 are taken to form a sample pair (bi_text1, bi_text2), which is added to the training data set, where i = 1, 2, …, 63. The training data set of this batch can be expressed as {(a_text1, a_text2), (b1_text1, b1_text2), (b2_text1, b2_text2), …, (b63_text1, b63_text2)}.
The above two steps are repeated n times so that the class-A data is fully used, yielding a plurality of training data sets. Finally, a new class is selected as the target type and the process is repeated until all classes have been traversed.
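The following Python sketch restates this construction procedure; the function name, the per-target repetition count, and the assumptions that every sample information set holds at least two pieces and that there are at least 64 business semantic types are all illustrative:

```python
import random

def build_training_datasets(sample_sets, batch_size=64, rounds_per_target=5):
    """sample_sets: dict mapping each business semantic type to its list of
    sample semantic information (each list assumed to have >= 2 items)."""
    datasets = []
    for target_type in sample_sets:                    # traverse every class
        for _ in range(rounds_per_target):             # repeat n times per target
            # Randomly pick batch_size - 1 candidate types, none equal to
            # the target type.
            others = [t for t in sample_sets if t != target_type]
            candidates = random.sample(others, batch_size - 1)
            # One sample pair per type: two pieces of the same type.
            batch = [tuple(random.sample(sample_sets[target_type], 2))]
            for cand in candidates:
                batch.append(tuple(random.sample(sample_sets[cand], 2)))
            datasets.append(batch)
    return datasets
```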
Step 620, training the preset model according to the plurality of training data sets to obtain a candidate semantic model, where the candidate semantic model is used for constructing the target semantic model.
In this embodiment, the preset model may be a model that uses contrastive learning, such as an M3E (Moka Massive Mixed Embedding) model. In the model training process, the data in the training data set is input into the preset model to obtain the output data of the preset model, the output data is compared with the input data, a loss function is calculated, and the model parameters are adjusted according to the loss function. After the model parameters are adjusted, the process is repeated with the data in the training data set until the model convergence condition is reached (for example, the loss function falls below a threshold or the number of training iterations reaches a preset count), at which point model training is complete and the candidate semantic model is obtained. It should be noted that the model convergence condition corresponding to the candidate semantic model and the model convergence condition corresponding to the target semantic model may be the same or different, which is not limited by the technical scheme of the present application.
In one embodiment of the present application, when training the preset model, the data in the training data set is organized in advance so as to shorten the distance between samples of the same class and narrow the decision boundary, ensuring the effect of in-batch negative sampling (i.e., negative sampling within the training data set). Specifically, a sample pair is first selected from the training data set; one piece of sample semantic information in the selected pair serves as the sample information to be identified, the other piece serves as the positive sample information matched with it, and the sample semantic information in the training data set other than the selected pair serves as the negative sample information not matched with the sample information to be identified. The preset model is then trained according to the sample information to be identified, the positive sample information, and the negative sample information.
Because the two pieces of sample semantic information contained in the same sample pair in the training data set belong to the same business semantic type, one piece of sample semantic information in the selected sample pair serves as the sample information to be identified, and the other can serve as the positive sample information matched with it, thereby shortening the distance between samples of the same class. Meanwhile, the sample semantic information of different sample pairs belongs to different business semantic types, so the sample semantic information in the training data set other than the selected sample pair serves as the negative sample information that does not match the sample information to be identified, so that the difference between the negative sample information and the positive sample information can be clearly distinguished.
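Concretely, the roles within one training data set can be derived as in the sketch below (a hypothetical helper, not part of the embodiment):

```python
def split_batch(batch, selected=0):
    """batch: list of sample pairs (text_a, text_b), one pair per business
    semantic type. Returns the sample to be identified, its positive sample,
    and the in-batch negative samples."""
    anchor, positive = batch[selected]      # same pair -> same semantic type
    negatives = []
    for idx, (a, b) in enumerate(batch):
        if idx != selected:                 # other pairs -> other types
            negatives.extend([a, b])
    return anchor, positive, negatives
```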
In one embodiment of the application, in the training process of the preset model, the preset model first performs feature extraction on the sample information to be identified, the positive sample information, and the negative sample information, respectively, to obtain the sample feature to be identified, the positive sample feature, and the negative sample features; then, a first representative distance between the sample feature to be identified and the positive sample feature is calculated, and a second representative distance between the sample feature to be identified and the target sample feature is calculated, the target sample feature including the positive sample feature and the negative sample feature; finally, a loss function is calculated based on the first representative distance and the second representative distance.
The feature extraction is performed on the sample information to be identified, the positive sample information and the negative sample information respectively, and the feature extraction can be regarded as the coding processing or the vector mapping processing of the sample information to be identified, the positive sample information and the negative sample information so as to convert the text information into vector data, thereby obtaining corresponding sample features to be identified, positive sample features and negative sample features.
The loss function L2 in the training process of the preset model can be calculated by referring to the following formula:

L2 = −log( exp(q · k_+ / τ) / Σ_{i=0}^{K−1} exp(q · k_i / τ) )
where K is the total number of target sample features, i.e., the sum of the numbers of positive and negative sample features. Since each sample pair in the training data set consists of two pieces of sample semantic information, if the training data set has size N (i.e., the number of sample pairs is N), the total amount of sample semantic information is 2N; one piece serves as the sample information to be identified, so the sum of the numbers of positive and negative pieces of sample semantic information is 2N − 1, and K = 2N − 1. i is the index of the target sample feature, starting from 0 with a maximum value of K − 1 (if i starts from 1, the maximum value is K). k_i denotes the i-th target sample feature, and k_+ denotes the positive sample feature; when the positive sample feature is numbered 0, k_+ is k_0. q denotes the sample feature to be identified, and τ is a preset constant, set to 0.5 in this embodiment. The numerator exp(q · k_+ / τ) is the first representative distance between the sample feature to be identified and the positive sample feature; the denominator Σ_i exp(q · k_i / τ) is the second representative distance between the sample feature to be identified and the target sample features (the sum of the representative distances between the sample feature to be identified and each target sample feature), where the target sample features are all the sample features in the training data set (batch), including the positive sample feature and the negative sample features.
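For one sample to be identified, the loss L2 can be sketched as below, assuming the features are already extracted and taking the dot product scaled by τ as the representative distance (a common reading of such contrastive losses; the function name is illustrative):

```python
import torch
import torch.nn.functional as F

def info_nce_loss_l2(q, keys, tau=0.5):
    """q: (dim,) sample feature to be identified; keys: (K, dim) target
    sample features, with keys[0] being the positive feature k_0 and the
    remaining rows the negative features."""
    logits = keys @ q / tau  # representative distances scaled by tau
    # L2 = -log( exp(q . k_+ / tau) / sum_i exp(q . k_i / tau) )
    return -F.log_softmax(logits, dim=0)[0]
```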
By way of example, FIG. 7 schematically illustrates a training process of the preset model provided in an embodiment of the present application. As shown in FIG. 7, the preset model encodes the sample information to be identified to obtain the sample feature q to be identified. Meanwhile, the preset model performs vector encoding (momentum encoder) processing on the sequence (queue) composed of the positive sample information and the negative sample information to obtain the vector sequence composed of the positive sample feature k_0 and the negative sample features k_1, k_2, and so on. Then, similarity is calculated, the similarity being expressed by the representative distances: the first representative distance between the sample feature q to be identified and the positive sample feature k_0, and the second representative distance between the sample feature q to be identified and the target sample features. Finally, the contrastive loss is calculated (that is, the loss function is calculated according to the first representative distance and the second representative distance, following the formula for the loss function L2).
Step 630, obtaining semantic information to be identified in the target service scene.
Step 640, extracting features of the semantic information to be identified through a target semantic model corresponding to the target service scene to obtain a target feature vector corresponding to the semantic information to be identified; the target semantic model is obtained by first performing model training according to sample semantic information corresponding to a plurality of service scenes and then performing model training according to the sample semantic information corresponding to the target service scene;
Step 650, determining the semantic type corresponding to the semantic information to be identified according to the target feature vector.
Steps 630-650 are the same as steps 310-330 in the previous embodiment, and the process of constructing the target semantic model may refer to the related descriptions in steps 410-430 in the previous embodiment, which are not repeated here.
In one application scenario of the application, the sample semantic information of the plurality of service scenes may be pieces of sample semantic information corresponding to each business semantic type in the customer service field; model training based on these pieces of sample semantic information yields a candidate semantic model that can identify semantic information of each business semantic type in the customer service field, which is equivalent to obtaining a semantic type recognition model with good universality. Model training can then be continued with sample semantic information from the customer service outbound-call scene to obtain a target semantic model for that scene, which is equivalent to fine-tuning the candidate semantic model with data from the customer service outbound-call scene to obtain a model that can accurately identify semantic types in that scene, thereby improving the semantic type recognition accuracy in the customer service outbound-call scene.
It can be seen that after the candidate semantic model is obtained, the model training can be continued by using the sample semantic information in the target service scene to obtain a model with higher semantic type identification accuracy in the target service scene, and the change of the target service scene can be flexibly adapted, so that the universality of the model is ensured, and the identification accuracy in the target service scene is improved.
The technical scheme of the application was used to evaluate a certain service, with a sample data set of 100 pieces of data, a positive test set of 88 pieces of data, and a negative test set of 40 pieces of data, comparing the effects of five different models: Word2Vec+SIF, SimCSE, SBert, Text2Vec, and the target semantic model (M3E) of the technical scheme of the application. For the positive test set, a sample is scored 1 if the type judged by the model is consistent with the real label and the similarity exceeds the threshold, and 0 otherwise; for the negative test set, a sample is scored 0 if the type judged by the model is consistent with the real label and the similarity exceeds the threshold. The specific evaluation results are shown in the following table. It can be seen that the target semantic model (M3E) of the technical scheme has the best effect, with a positive sample accuracy of 90.6% (accuracy = 1 − error rate) and a negative sample error rate of 0%, greatly reducing the misjudgment rate.
It should be noted that although the steps of the methods in the present application are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
The following describes an embodiment of an apparatus of the present application, which may be used to perform the semantic type recognition method in the above-described embodiments of the present application. Fig. 8 schematically shows a block diagram of the semantic type recognition apparatus provided in the embodiment of the present application. As shown in fig. 8, the semantic type recognition apparatus provided in the embodiment of the present application includes:
the to-be-identified information obtaining module 810 is configured to obtain to-be-identified semantic information in the target service scenario;
the feature extraction module 820 is configured to perform feature extraction on the semantic information to be identified through a target semantic model corresponding to the target service scene, so as to obtain a target feature vector corresponding to the semantic information to be identified; the target semantic model is obtained by first performing model training according to sample semantic information corresponding to a plurality of service scenes and then performing model training according to the sample semantic information corresponding to the target service scene;
The semantic type determining module 830 is configured to determine a semantic type corresponding to the semantic information to be identified according to the target feature vector.
In one embodiment of the present application, the feature extraction module 820 includes:
the vector mapping unit is used for carrying out vector mapping on the semantic information to be identified through the input layer of the target semantic model to obtain a first feature vector;
the coding unit is used for coding the first feature vector through a coding layer of the target semantic model to obtain a second feature vector, wherein the coding layer comprises candidate semantic models obtained by model training according to sample semantic information corresponding to a plurality of business scenes;
and the feature extraction unit is used for extracting features of the second feature vector through a characterization layer of the target semantic model to obtain the target feature vector, wherein the vector dimension of the target feature vector is smaller than that of the first feature vector.
In one embodiment of the present application, the vector mapping unit is specifically configured to:
word segmentation is carried out on the semantic information to be identified, and a word sequence to be identified is obtained;
and carrying out vector mapping on each word in the word sequence to be recognized to obtain the first feature vector.
In one embodiment of the present application, the feature extraction unit is specifically configured to:
pooling the second feature vector to obtain a sentence vector to be identified;
and performing dimension reduction processing on the sentence vector to be identified to obtain the target feature vector.
In one embodiment of the present application, the apparatus further comprises:
the sample data acquisition module is used for acquiring a positive sample data set and a negative sample data set corresponding to each preset semantic type in the target service scene; the positive sample data set comprises a plurality of pieces of positive sample semantic information matched with the preset semantic type, and the negative sample data set comprises a plurality of pieces of negative sample semantic information not matched with the preset semantic type;
the model construction module is used for constructing a sample data set according to a positive sample data set and a negative sample data set corresponding to each preset semantic type, and constructing a semantic type recognition model according to a candidate semantic model obtained by model training according to sample semantic information corresponding to the plurality of business scenes;
and the target model training module is used for training the semantic type recognition model based on the sample data set to obtain the target semantic model.
In one embodiment of the present application, the model building module is specifically configured to:
pairing positive sample semantic information in the positive sample data set corresponding to the preset semantic type to generate a plurality of positive sample data pairs;
pairing the positive sample semantic information in the positive sample data set corresponding to the preset semantic type with the negative sample semantic information in the negative sample data set corresponding to the preset semantic type to form a plurality of negative sample data pairs;
generating a sample data set corresponding to the preset semantic type according to the positive sample data pairs and the negative sample data pairs;
and constructing a sample data set based on the sample data set corresponding to each preset semantic type.
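A small sketch of this pairing follows (an illustrative strategy; the exhaustive pairing below is an assumption, as the embodiment does not fix how many pairs are formed):

```python
import itertools
import random

def build_sample_dataset(pos_set, neg_set):
    """pos_set: positive sample semantic information of one preset semantic
    type; neg_set: negative sample semantic information of that type.
    Returns (first_data, second_data, matching_label) triples."""
    pairs = []
    # Positive sample data pairs: two matching pieces, label y = 1.
    for a, b in itertools.combinations(pos_set, 2):
        pairs.append((a, b, 1))
    # Negative sample data pairs: a positive piece with a negative piece,
    # label y = 0.
    for p in pos_set:
        for n in neg_set:
            pairs.append((p, n, 0))
    random.shuffle(pairs)
    return pairs
```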
In one embodiment of the present application, the sample data set includes a plurality of sample data pairs including first data, second data, and matching tags for the first data and the second data; the model training module is specifically used for:
respectively extracting features of first data and second data in the sample data pair to obtain sample features corresponding to the first data and sample features corresponding to the second data;
And calculating a loss function according to the distance between the sample characteristic corresponding to the first data and the sample characteristic corresponding to the second data and the matching label of the first data and the second data.
In one embodiment of the present application, the apparatus further comprises:
the training data set dividing module is used for dividing the sample semantic information corresponding to the plurality of business scenes into a plurality of training data sets;
and the candidate model training module is used for training a preset model according to a plurality of training data sets to obtain the candidate semantic model, and the candidate semantic model is used for constructing the target semantic model.
In one embodiment of the present application, the training data set partitioning module is specifically configured to:
classifying sample semantic information corresponding to the plurality of business scenes to obtain sample information sets corresponding to all business semantic types;
determining a target type and a preset number of candidate types from each service semantic type;
extracting two pieces of sample semantic information from the sample information set corresponding to the target type to form a sample pair corresponding to the target type, and extracting two pieces of sample semantic information from the sample information set corresponding to each candidate type to form a sample pair corresponding to each candidate type;
And generating the training data set according to the sample pairs corresponding to the target types and the sample pairs corresponding to the candidate types.
In one embodiment of the present application, the training data set includes a plurality of sample pairs; the candidate model training module comprises:
a data identification unit, configured to select a sample pair from the training data set, take one sample semantic information in the selected sample pair as sample information to be identified, take the other sample semantic information as positive sample information matched with the sample information to be identified, and take sample semantic information in the training data set except for the selected sample pair as negative sample information not matched with the sample information to be identified;
and the training unit is used for training a preset model according to the sample information to be identified, the positive sample information and the negative sample information.
In one embodiment of the present application, the training unit is specifically configured to:
respectively extracting the characteristics of the sample information to be identified, the positive sample information and the negative sample information to obtain sample characteristics to be identified, positive sample characteristics and negative sample characteristics;
calculating a first representative distance between the sample feature to be identified and the positive sample feature, and calculating a second representative distance between the sample feature to be identified and a target sample feature, the target sample feature comprising the positive sample feature and the negative sample feature;
A loss function is calculated from the first representative distance and the second representative distance.
In one embodiment of the present application, the semantic type determination module 830 includes:
the similarity calculation unit is used for calculating the similarity between the target feature vector and a plurality of preset feature vectors, wherein the preset feature vectors are obtained by extracting features of semantic information of preset semantic types through the target semantic model;
the semantic type determining unit is used for determining the semantic type corresponding to the semantic information to be identified according to the similarity between the target feature vector and each preset feature vector.
In an embodiment of the present application, the semantic type determining unit is specifically configured to:
determining the maximum similarity from the similarity between the target feature vector and each preset feature vector;
and when the maximum similarity is larger than a preset threshold, taking the preset semantic type to which the preset feature vector corresponding to the maximum similarity belongs as the semantic type corresponding to the semantic information to be identified.
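As a sketch of this inference step (cosine similarity and the 0.8 threshold are assumptions; the embodiment only requires a similarity measure and a preset threshold):

```python
import torch
import torch.nn.functional as F

def recognize_semantic_type(target_vec, preset_vecs, preset_types, threshold=0.8):
    """target_vec: (dim,) target feature vector; preset_vecs: (M, dim) preset
    feature vectors; preset_types: the M preset semantic type names."""
    sims = F.cosine_similarity(target_vec.unsqueeze(0), preset_vecs)  # (M,)
    best = int(torch.argmax(sims))            # index of the maximum similarity
    if sims[best] > threshold:                # compare with preset threshold
        return preset_types[best]
    return None  # no preset semantic type matched confidently
```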
Specific details of the semantic type recognition device provided in each embodiment of the present application have been described in detail in the corresponding method embodiments, and are not described herein.
Fig. 9 schematically shows a block diagram of a computer system for implementing an electronic device according to an embodiment of the present application.
It should be noted that, the computer system 900 of the electronic device shown in fig. 9 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in FIG. 9, the computer system 900 includes a central processing unit (CPU) 901, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage section 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data required for system operation. The CPU 901, the ROM 902, and the RAM 903 are connected to one another via a bus 904. An input/output interface 905 (I/O interface) is also connected to the bus 904.
The following components are connected to the input/output interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage section 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a local area network card or a modem. The communication section 909 performs communication processing via a network such as the Internet. The drive 910 is also connected to the input/output interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read therefrom is installed into the storage section 908 as needed.
In particular, according to embodiments of the present application, the processes described in the various method flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 909 and/or installed from the removable medium 911. When executed by the central processor 901, performs the various functions defined in the system of the present application.
It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal that propagates in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, in accordance with embodiments of the present application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a usb disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (17)

1. A semantic type recognition method, comprising:
performing batch training of a model through sample semantic information corresponding to a plurality of business scenes to obtain a candidate semantic model, wherein one training batch uses a training data set to perform model training, and the training data set comprises positive sample information and negative sample information generated according to the sample semantic information corresponding to the business scenes; in the training process of a training batch, taking positive sample information in the training data set as sample information to be identified, and calculating a loss function according to the representative distance between the sample information to be identified and other positive sample information in the training data set and the representative distance between the sample information to be identified and other sample information in the training data set;
constructing a semantic type recognition model by taking the candidate semantic model as a coding layer, generating a sample data set according to sample semantic information corresponding to a target service scene, and carrying out batch training on the semantic type recognition model according to the sample data set to obtain a target semantic model; the sample data set comprises sample data pairs generated according to sample semantic information corresponding to the target service scene; in the training process of a training batch, calculating a loss function according to the distance between sample data pairs in the training batch and the matching label between the sample data pairs;
Acquiring semantic information to be identified in a target service scene;
extracting features of the semantic information to be identified through the target semantic model to obtain a target feature vector corresponding to the semantic information to be identified;
and determining the semantic type corresponding to the semantic information to be identified according to the target feature vector.
2. The semantic type recognition method according to claim 1, wherein the feature extraction of the semantic information to be recognized by the target semantic model to obtain a target feature vector corresponding to the semantic information to be recognized comprises:
vector mapping is carried out on the semantic information to be identified through an input layer of the target semantic model, and a first feature vector is obtained;
the first feature vector is encoded through an encoding layer of the target semantic model to obtain a second feature vector, wherein the encoding layer comprises candidate semantic models obtained by model training according to sample semantic information corresponding to a plurality of business scenes;
and extracting features of the second feature vector through a characterization layer of the target semantic model to obtain the target feature vector, wherein the vector dimension of the target feature vector is smaller than that of the first feature vector.
3. The semantic type recognition method according to claim 2, wherein performing vector mapping on the semantic information to be recognized to obtain a first feature vector comprises:
word segmentation is carried out on the semantic information to be identified, and a word sequence to be identified is obtained;
and carrying out vector mapping on each word in the word sequence to be recognized to obtain the first feature vector.
4. The semantic type recognition method according to claim 2, wherein performing feature extraction on the second feature vector to obtain the target feature vector comprises:
pooling the second feature vector to obtain a sentence vector to be identified;
and performing dimension reduction processing on the sentence vector to be identified to obtain the target feature vector.
5. The semantic type recognition method according to claim 1, wherein constructing a semantic type recognition model by using the candidate semantic model as a coding layer, generating a sample data set according to sample semantic information corresponding to a target business scene, and performing batch training on the semantic type recognition model according to the sample data set to obtain a target semantic model, comprises:
acquiring a positive sample data set and a negative sample data set corresponding to each preset semantic type in the target service scene; the positive sample data set comprises a plurality of pieces of positive sample semantic information matched with the preset semantic type, and the negative sample data set comprises a plurality of pieces of negative sample semantic information not matched with the preset semantic type;
constructing a sample data set according to the positive sample data set and the negative sample data set corresponding to each preset semantic type, and constructing a semantic type recognition model with, as the coding layer, the candidate semantic model obtained by model training according to the sample semantic information corresponding to the plurality of business scenes;
training the semantic type recognition model based on the sample data set to obtain the target semantic model.
6. The semantic type recognition method according to claim 5, wherein constructing a sample data set from the positive sample data set and the negative sample data set corresponding to each preset semantic type comprises:
pairing positive sample semantic information in the positive sample data set corresponding to the preset semantic type to generate a plurality of positive sample data pairs;
pairing the positive sample semantic information in the positive sample data set corresponding to the preset semantic type with the negative sample semantic information in the negative sample data set corresponding to the preset semantic type to form a plurality of negative sample data pairs;
generating a sample data set corresponding to the preset semantic type according to the positive sample data pairs and the negative sample data pairs;
and constructing a sample data set based on the sample data set corresponding to each preset semantic type.
7. The semantic type recognition method of claim 5, wherein the sample data set comprises a plurality of sample data pairs, the sample data pairs comprising first data, second data, and matching tags for the first data and the second data; in training the semantic type recognition model based on the sample dataset, the method comprises:
respectively extracting features of first data and second data in the sample data pair to obtain sample features corresponding to the first data and sample features corresponding to the second data;
and calculating a loss function according to the distance between the sample characteristic corresponding to the first data and the sample characteristic corresponding to the second data and the matching label of the first data and the second data.
8. The semantic type recognition method according to claim 1, wherein the training of the model in batches is performed through sample semantic information corresponding to a plurality of business scenarios to obtain candidate semantic models, comprising:
dividing sample semantic information corresponding to the business scenes into a plurality of training data sets;
training a preset model according to a plurality of training data sets to obtain candidate semantic models, wherein the candidate semantic models are used for constructing the target semantic model.
9. The semantic type recognition method according to claim 8, wherein dividing the sample semantic information corresponding to the plurality of business scenarios into a plurality of training data sets comprises:
classifying sample semantic information corresponding to the plurality of business scenes to obtain sample information sets corresponding to all business semantic types;
determining a target type and a preset number of candidate types from each service semantic type;
extracting two pieces of sample semantic information from the sample information set corresponding to the target type to form a sample pair corresponding to the target type, and extracting two pieces of sample semantic information from the sample information set corresponding to each candidate type to form a sample pair corresponding to each candidate type;
and generating the training data set according to the sample pairs corresponding to the target types and the sample pairs corresponding to the candidate types.
10. The semantic type recognition method of claim 8, wherein the training dataset comprises a plurality of sample pairs; training a preset model according to a plurality of training data sets, including:
selecting a sample pair from the training data set, taking one sample semantic information of the selected sample pair as sample information to be identified, taking the other sample semantic information as positive sample information matched with the sample information to be identified, and taking sample semantic information of the training data set except the selected sample pair as negative sample information not matched with the sample information to be identified;
Training a preset model according to the sample information to be identified, the positive sample information and the negative sample information.
11. The semantic type recognition method according to claim 10, wherein in training a preset model according to the sample information to be recognized, the positive sample information, and the negative sample information, the method comprises:
respectively extracting the characteristics of the sample information to be identified, the positive sample information and the negative sample information to obtain sample characteristics to be identified, positive sample characteristics and negative sample characteristics;
calculating a first representative distance between the sample feature to be identified and the positive sample feature, and calculating a second representative distance between the sample feature to be identified and a target sample feature, the target sample feature comprising the positive sample feature and the negative sample feature;
a loss function is calculated from the first representative distance and the second representative distance.
12. The semantic type recognition method according to any one of claims 1 to 11, wherein determining the semantic type corresponding to the semantic information to be recognized according to the target feature vector includes:
Calculating the similarity between the target feature vector and a plurality of preset feature vectors, wherein the preset feature vectors are obtained by extracting features of semantic information of preset semantic types through the target semantic model;
and determining the semantic type corresponding to the semantic information to be identified according to the similarity between the target feature vector and each preset feature vector.
13. The semantic type recognition method according to claim 12, wherein determining the semantic type corresponding to the semantic information to be recognized according to the similarity between the target feature vector and each preset feature vector comprises:
determining the maximum similarity from the similarity between the target feature vector and each preset feature vector;
and when the maximum similarity is larger than a preset threshold, taking the preset semantic type to which the preset feature vector corresponding to the maximum similarity belongs as the semantic type corresponding to the semantic information to be identified.
14. A semantic type recognition apparatus, comprising:
the candidate model training module is used for carrying out batch training on the model through sample semantic information corresponding to a plurality of business scenes to obtain a candidate semantic model, one training batch is used for carrying out model training by using a training data set, and the training data set comprises positive sample information and negative sample information generated according to the sample semantic information corresponding to the business scenes; in the training process of a training batch, taking positive sample information in the training data set as sample information to be identified, and calculating a loss function according to the representative distance between the sample information to be identified and other positive sample information in the training data set and the representative distance between the sample information to be identified and other sample information in the training data set;
The target model training module is used for constructing a semantic type recognition model by taking the candidate semantic model as a coding layer, generating a sample data set according to sample semantic information corresponding to a target service scene, and training the semantic type recognition model in batches according to the sample data set to obtain a target semantic model; the sample data set comprises sample data pairs generated according to sample semantic information corresponding to the target service scene; in the training process of a training batch, calculating a loss function according to the distance between sample data pairs in the training batch and the matching label between the sample data pairs;
the to-be-identified information acquisition module is used for acquiring to-be-identified semantic information in the target service scene;
the feature extraction module is used for extracting features of the semantic information to be identified through the target semantic model to obtain a target feature vector corresponding to the semantic information to be identified;
the semantic type determining module is used for determining the semantic type corresponding to the semantic information to be identified according to the target feature vector.
15. A computer readable medium on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the semantic type recognition method according to any one of claims 1 to 13.
16. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein execution of the executable instructions by the processor causes the electronic device to perform the semantic type recognition method of any one of claims 1 to 13.
17. A computer program product, the computer program product comprising computer instructions stored in a computer readable storage medium;
a processor of a computer device reads and executes the computer instructions from the computer-readable storage medium, causing the computer device to perform the semantic type recognition method of any one of claims 1 to 13.
CN202311222099.5A 2023-09-21 2023-09-21 Semantic type recognition method and device, computer readable medium and electronic equipment Active CN117009532B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311222099.5A CN117009532B (en) 2023-09-21 2023-09-21 Semantic type recognition method and device, computer readable medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN117009532A CN117009532A (en) 2023-11-07
CN117009532B true CN117009532B (en) 2023-12-19

Family

ID=88567470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311222099.5A Active CN117009532B (en) 2023-09-21 2023-09-21 Semantic type recognition method and device, computer readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117009532B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738001A * 2020-08-06 2020-10-02 Tencent Technology (Shenzhen) Co., Ltd. Training method of synonym recognition model, synonym determination method and equipment
CN112084789A * 2020-09-14 2020-12-15 Tencent Technology (Shenzhen) Co., Ltd. Text processing method, device, equipment and storage medium
WO2022100133A1 * 2020-11-10 2022-05-19 Zhejiang Sensetime Technology Development Co., Ltd. Scene recognition method and apparatus, intelligent device, storage medium and computer program
CN114818729A * 2022-04-28 2022-07-29 Sunshine Insurance Group Co., Ltd. Method, device and medium for training semantic recognition model and searching sentence
CN115422944A * 2022-09-01 2022-12-02 Shenzhen Renma Interactive Technology Co., Ltd. Semantic recognition method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN117009532A (en) 2023-11-07

Similar Documents

Publication Title
CN111444340B (en) Text classification method, device, equipment and storage medium
CN108846077B (en) Semantic matching method, device, medium and electronic equipment for question and answer text
CN110705301B (en) Entity relationship extraction method and device, storage medium and electronic equipment
WO2021051513A1 (en) Chinese-english translation method based on neural network, and related devices thereof
CN116795973B (en) Text processing method and device based on artificial intelligence, electronic equipment and medium
CN112131883B (en) Language model training method, device, computer equipment and storage medium
WO2020073533A1 (en) Automatic question answering method and device
CN113268609A (en) Dialog content recommendation method, device, equipment and medium based on knowledge graph
CN111259647A (en) Question and answer text matching method, device, medium and electronic equipment based on artificial intelligence
CN115422944A (en) Semantic recognition method, device, equipment and storage medium
CN110781413A (en) Interest point determining method and device, storage medium and electronic equipment
CN113628059A (en) Associated user identification method and device based on multilayer graph attention network
CN113204611A (en) Method for establishing reading understanding model, reading understanding method and corresponding device
CN110717021A (en) Input text and related device for obtaining artificial intelligence interview
CN112632248A (en) Question answering method, device, computer equipment and storage medium
CN110929532B (en) Data processing method, device, equipment and storage medium
CN115759254A (en) Question-answering method, system and medium based on knowledge-enhanced generative language model
CN111368066B (en) Method, apparatus and computer readable storage medium for obtaining dialogue abstract
CN113705207A (en) Grammar error recognition method and device
CN117473057A (en) Question-answering processing method, system, equipment and storage medium
CN113536784A (en) Text processing method and device, computer equipment and storage medium
CN117009532B (en) Semantic type recognition method and device, computer readable medium and electronic equipment
CN115640378A (en) Work order retrieval method, server, medium and product
CN115146589B (en) Text processing method, device, medium and electronic equipment
CN114398482A (en) Dictionary construction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant