CN116245103A - Model training method, entity determining device, electronic equipment and medium - Google Patents

Info

Publication number
CN116245103A
Authority
CN
China
Prior art keywords
entity
initial
vector
model
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211091272.8A
Other languages
Chinese (zh)
Inventor
王泽勋
冯明超
陈蒙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd filed Critical Jingdong Technology Information Technology Co Ltd
Priority to CN202211091272.8A priority Critical patent/CN116245103A/en
Publication of CN116245103A publication Critical patent/CN116245103A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Abstract

Embodiments of the present disclosure disclose a model training method, an entity determining method, an apparatus, an electronic device, and a medium. One embodiment of the method comprises the following steps: acquiring a training sample and a sample label corresponding to the training sample; performing vector conversion on a plurality of recognition texts using an initial text encoding model to generate an initial conversion vector; inputting a sample entity into an initial entity encoding model to generate an initial entity encoding vector; inputting the initial conversion vector and the initial entity encoding vector into an initial matching probability generation model to generate an initial matching probability; and, in response to determining that an initial matching result differs from the sample label, performing parameter training on the initial text encoding model, the initial entity encoding model, and the initial matching probability generation model. This embodiment relates to artificial intelligence, and can produce a text encoding model, an entity encoding model, and a matching probability generation model with higher entity retrieval accuracy.

Description

Model training method, entity determining device, electronic equipment and medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a model training method, an entity determining method, an apparatus, an electronic device, and a medium.
Background
Currently, entity retrieval for a user's voice dialogue is a technique for retrieving each entity mentioned by the user from an entity library. To retrieve individual entities from target audio, the following approach is generally adopted: first, the target audio is input into a speech recognition model to obtain a recognition text sequence. Then, the recognition text with the highest text confidence is determined from the recognition text sequence. Finally, entity retrieval is performed on that recognition text using an entity retrieval model, yielding the entities associated with the target audio.
However, when the entity retrieval model is trained in the above manner, the following technical problem often arises:
the recognition text with the highest confidence in the recognition text sequence output by the speech recognition model may contain text recognition errors. Consequently, performing entity retrieval on that recognition text with the entity retrieval model can produce retrieval errors.
Disclosure of Invention
This summary is provided to introduce concepts in a simplified form that are further described below in the detailed description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose a model training method, an entity determining method, an apparatus, an electronic device, and a medium to solve the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a model training method, comprising: obtaining a training sample and a sample label corresponding to the training sample, wherein the training sample comprises: a plurality of recognition texts for an audio sample and a sample entity involved in the audio sample; performing vector conversion on the plurality of recognition texts using an initial text encoding model to generate an initial conversion vector, wherein the initial conversion vector characterizes text feature information of the plurality of recognition texts; inputting the sample entity into an initial entity encoding model to generate an initial entity encoding vector; inputting the initial conversion vector and the initial entity encoding vector into an initial matching probability generation model to generate an initial matching probability, wherein the initial matching probability characterizes a matching relationship between the initial conversion vector and the initial entity encoding vector; and, in response to determining that an initial matching result differs from the sample label, performing parameter training on the initial text encoding model, the initial entity encoding model, and the initial matching probability generation model to obtain a text encoding model, an entity encoding model, and a matching probability generation model, wherein the initial matching result is generated based on the initial matching probability.
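The training decision described in the first aspect can be sketched in miniature. The following Python sketch uses randomly initialized linear maps as hypothetical stand-ins for the three models (the aspect does not fix their architectures) and shows how an initial matching probability is thresholded into an initial matching result and compared against the sample label to decide whether parameter training is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the three models; the first aspect leaves
# their architectures open (e.g. BERT/TextCNN, word2vec, a neural head).
W_text = rng.normal(size=(16, 8))    # "initial text encoding model"
W_entity = rng.normal(size=(16, 8))  # "initial entity encoding model"
w_match = rng.normal(size=8)         # "initial matching probability generation model"

def encode_texts(text_features: np.ndarray) -> np.ndarray:
    """Initial conversion vector for the feature rows of several recognition texts."""
    return text_features.mean(axis=0) @ W_text

def encode_entity(entity_features: np.ndarray) -> np.ndarray:
    """Initial entity encoding vector for one sample entity."""
    return entity_features @ W_entity

def matching_probability(conv_vec: np.ndarray, ent_vec: np.ndarray) -> float:
    """Sigmoid over an interaction of the two vectors: the initial matching probability."""
    score = float(np.dot(conv_vec * ent_vec, w_match))
    return 1.0 / (1.0 + np.exp(-score))

def needs_parameter_training(text_features, entity_features, sample_label: int) -> bool:
    """Threshold the probability into an initial matching result and trigger
    parameter training when that result differs from the sample label."""
    prob = matching_probability(encode_texts(text_features),
                                encode_entity(entity_features))
    initial_matching_result = int(prob >= 0.5)
    return initial_matching_result != sample_label
```

In a full implementation, the three parameter sets would then be updated jointly (e.g. by backpropagation) rather than merely flagged for training.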
Optionally, the performing vector conversion on the plurality of recognition texts using the initial text encoding model to generate an initial conversion vector includes: performing text splicing on the plurality of recognition texts to obtain a spliced text; and inputting the spliced text into the initial text encoding model to obtain the initial conversion vector.
Optionally, the performing vector conversion on the plurality of recognition texts using the initial text encoding model to generate an initial conversion vector includes: inputting the plurality of recognition texts into the initial text encoding model to obtain a plurality of initial text encoding vectors; performing vector combination on the plurality of initial text encoding vectors to obtain a combination matrix; and performing a pooling operation on the combination matrix to obtain the initial conversion vector.
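The second optional implementation (encode each recognition text separately, combine the vectors into a matrix, then pool) can be illustrated as follows. The pooling type is not specified above, so mean and max pooling are both shown as assumed choices.

```python
import numpy as np

def combine_and_pool(text_vectors: list, mode: str = "mean") -> np.ndarray:
    """Stack per-text encoding vectors into a combination matrix and pool
    across the text axis to obtain one initial conversion vector."""
    matrix = np.stack(text_vectors)  # shape: (num_texts, dim) -- the combination matrix
    if mode == "mean":
        return matrix.mean(axis=0)
    if mode == "max":
        return matrix.max(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")
```

Mean pooling weights every recognition hypothesis equally, while max pooling keeps the strongest feature activation from any hypothesis; either yields a fixed-size vector regardless of how many recognition texts the speech recognizer returns.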
In a second aspect, some embodiments of the present disclosure provide an entity determining method, comprising: acquiring a plurality of recognition texts for target audio; performing vector conversion on the plurality of recognition texts using a pre-trained text encoding model to generate a conversion vector, wherein the conversion vector characterizes feature information of the plurality of recognition texts, and the text encoding model is trained according to the method of any one of the first aspect; performing entity encoding on each entity in an entity library using a pre-trained entity encoding model to obtain an entity encoding vector set, wherein the entity encoding model is trained according to the method of any one of the first aspect; and screening out, from the entity library using a pre-trained matching probability generation model, entities whose matching probabilities meet a preset condition as target entities, to obtain at least one target entity, wherein the matching probability corresponding to a target entity is generated based on the corresponding entity encoding vector and the conversion vector, and the matching probability generation model is trained according to the method of any one of the first aspect.
Optionally, the screening out, from the entity library using a pre-trained matching probability generation model, an entity whose matching probability meets a preset condition as a target entity includes: determining a keyword set corresponding to the plurality of recognition texts; screening out, from the entity library, entities having an association relationship with the keyword set as associated entities, to obtain an associated entity set; screening out, from the entity encoding vector set, the entity encoding vector corresponding to each associated entity in the associated entity set as a target entity encoding vector, to obtain a target entity encoding vector set; determining, using the matching probability generation model, a matching probability between each target entity encoding vector in the target entity encoding vector set and the conversion vector; and screening out, from the associated entity set, associated entities whose matching probabilities meet the preset condition as target entities.
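The keyword-based two-stage screening above can be sketched as follows. The association test (a keyword occurring in the entity name) and the preset condition (matching probability at least a threshold) are illustrative assumptions, since the text leaves both open.

```python
def filter_by_keywords(keywords, entity_library):
    """Stage 1: keep only entities associated with some keyword, before the
    more expensive matching-probability scoring. The association test used
    here (keyword appears in the entity name) is a hypothetical stand-in."""
    return {name: vec for name, vec in entity_library.items()
            if any(kw in name for kw in keywords)}

def select_targets(conversion_vec, candidates, match_prob, threshold=0.5):
    """Stage 2: score each pre-filtered candidate against the conversion
    vector and keep those whose matching probability meets the preset
    condition (assumed here to be >= threshold)."""
    return [name for name, ent_vec in candidates.items()
            if match_prob(conversion_vec, ent_vec) >= threshold]
```

The point of stage 1 is efficiency: only the associated entities' encoding vectors are scored by the matching probability generation model, instead of every vector in the entity encoding vector set.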
Optionally, the screening out, from the entity library using a pre-trained matching probability generation model, an entity whose matching probability meets a preset condition as a target entity includes: determining, using the matching probability generation model, a matching probability between each entity encoding vector in the entity encoding vector set and the conversion vector; and screening out, from the entity library, entities whose matching probabilities meet the preset condition as target entities.
In a third aspect, some embodiments of the present disclosure provide a model training apparatus, comprising: a first acquisition unit configured to acquire a training sample and a sample label corresponding to the training sample, wherein the training sample comprises: a plurality of recognition texts for an audio sample and a sample entity involved in the audio sample; a first vector conversion unit configured to perform vector conversion on the plurality of recognition texts using an initial text encoding model to generate an initial conversion vector, wherein the initial conversion vector characterizes text feature information of the plurality of recognition texts; a first input unit configured to input the sample entity into an initial entity encoding model to generate an initial entity encoding vector; a second input unit configured to input the initial conversion vector and the initial entity encoding vector into an initial matching probability generation model to generate an initial matching probability, wherein the initial matching probability characterizes a matching relationship between the initial conversion vector and the initial entity encoding vector; and a parameter training unit configured to perform parameter training on the initial text encoding model, the initial entity encoding model, and the initial matching probability generation model in response to determining that an initial matching result differs from the sample label, to obtain a text encoding model, an entity encoding model, and a matching probability generation model, wherein the initial matching result is generated based on the initial matching probability.
Optionally, the first vector conversion unit is further configured to: perform text splicing on the plurality of recognition texts to obtain a spliced text; and input the spliced text into the initial text encoding model to obtain the initial conversion vector.
Optionally, the first vector conversion unit is further configured to: input the plurality of recognition texts into the initial text encoding model to obtain a plurality of initial text encoding vectors; perform vector combination on the plurality of initial text encoding vectors to obtain a combination matrix; and perform a pooling operation on the combination matrix to obtain the initial conversion vector.
In a fourth aspect, some embodiments of the present disclosure provide an entity determining apparatus, comprising: a second acquisition unit configured to acquire a plurality of recognition texts for target audio; a second vector conversion unit configured to perform vector conversion on the plurality of recognition texts using a pre-trained text encoding model, trained according to the method of any one of the first aspect, to generate a conversion vector, wherein the conversion vector characterizes feature information of the plurality of recognition texts; an entity encoding unit configured to perform entity encoding on each entity in an entity library using a pre-trained entity encoding model, trained according to the method of any one of the first aspect, to obtain an entity encoding vector set; and an entity screening unit configured to screen out, from the entity library using a pre-trained matching probability generation model, entities whose matching probabilities meet a preset condition as target entities, to obtain at least one target entity, wherein the matching probability corresponding to a target entity is generated based on the corresponding entity encoding vector and the conversion vector, and the matching probability generation model is trained according to the method of any one of the first aspect.
Optionally, the entity screening unit is further configured to: determine a keyword set corresponding to the plurality of recognition texts; screen out, from the entity library, entities having an association relationship with the keyword set as associated entities, to obtain an associated entity set; screen out, from the entity encoding vector set, the entity encoding vector corresponding to each associated entity in the associated entity set as a target entity encoding vector, to obtain a target entity encoding vector set; determine, using the matching probability generation model, a matching probability between each target entity encoding vector in the target entity encoding vector set and the conversion vector; and screen out, from the associated entity set, associated entities whose matching probabilities meet the preset condition as target entities.
Optionally, the entity screening unit is further configured to: determine, using the matching probability generation model, a matching probability between each entity encoding vector in the entity encoding vector set and the conversion vector; and screen out, from the entity library, entities whose matching probabilities meet the preset condition as target entities.
In a fifth aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement a method as described in any of the implementations of the first or second aspects.
In a sixth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements a method as described in any of the implementations of the first or second aspects.
The above embodiments of the present disclosure have the following beneficial effects: the text encoding model, the entity encoding model, and the matching probability generation model obtained by the model training method of some embodiments of the present disclosure improve the accuracy of entity retrieval. Specifically, the reason related entity retrieval is not accurate enough is that the speech recognition model may produce text recognition errors, so subsequent entity retrieval may produce retrieval errors. Based on this, the model training method of some embodiments of the present disclosure may first obtain a training sample and a sample label. The training sample comprises a plurality of recognition texts for an audio sample and a sample entity involved in the audio sample. Here, the plurality of recognition texts are a plurality of recognition results for the audio sample. Even the recognition result with the highest recognition probability may contain mis-recognized text, and for the erroneous portions of that result, the remaining recognition results may contain the correct text at the same positions. Therefore, using the plurality of recognition texts together with the sample entity as the training sample allows more of the text feature information in the plurality of recognition texts to be considered when training the subsequent matching probability generation model, making entity recognition by the model generated based on the matching probabilities more accurate. Then, vector conversion is performed on the plurality of recognition texts using the initial text encoding model to generate an initial conversion vector.
The initial conversion vector characterizes the text feature information of the plurality of recognition texts: by vector-converting the recognition texts, the initial conversion vector incorporates the text feature information of all of them, so that more of this information is considered when training the subsequent matching probability generation model. Then, the initial entity encoding model is used to generate the initial entity encoding vector corresponding to the sample entity, so that it can subsequently be input into the initial matching probability generation model for the initial matching probability calculation. Inputting the initial conversion vector and the initial entity encoding vector into the initial matching probability generation model allows the model to generate a more accurate initial matching probability from the text feature information of the plurality of recognition texts. The initial matching probability characterizes the matching relationship between the initial conversion vector and the initial entity encoding vector. Finally, in response to determining that the initial matching result differs from the sample label, parameter training is performed on the initial text encoding model, the initial entity encoding model, and the initial matching probability generation model to obtain the text encoding model, the entity encoding model, and the matching probability generation model, making the outputs of these models more accurate. The initial matching result is generated based on the initial matching probability. Therefore, entity determination based on the text encoding model, the entity encoding model, and the matching probability generation model trained by this model training method is more accurate.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a schematic illustration of one application scenario of a model training method according to some embodiments of the present disclosure;
FIG. 2 is a flow chart of some embodiments of a model training method according to the present disclosure;
FIG. 3 is a flow chart of further embodiments of a model training method according to the present disclosure;
FIG. 4 is a schematic diagram of one application scenario of an entity determination method according to some embodiments of the present disclosure;
FIG. 5 is a flow chart of some embodiments of an entity determination method according to the present disclosure;
FIG. 6 is a schematic structural view of some embodiments of a model training apparatus according to the present disclosure;
FIG. 7 is a schematic diagram of the structure of some embodiments of an entity determining apparatus according to the present disclosure;
fig. 8 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that the modifiers "a", "an", and "a plurality of" in this disclosure are illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is a schematic diagram of one application scenario of a model training method according to some embodiments of the present disclosure.
In the application scenario of fig. 1, first, the electronic device 101 may obtain a training sample 102 and a sample tag 113 corresponding to the training sample 102. The training sample 102 includes: a plurality of recognition texts 104 for an audio sample 103 and the sample entity 105 involved in the audio sample 103. In the present application scenario, the plurality of recognition texts 104 may include recognition text 1041, recognition text 1042, and recognition text 1043. The recognition text 1041 may be "play song achievement", the recognition text 1042 may be "play song dust", and the recognition text 1043 may be "play song degree". The sample entity 105 may be "adult". The sample tag 113 may be "1". The electronic device 101 may then perform vector conversion on the plurality of recognition texts 104 using the initial text encoding model 106 to generate an initial conversion vector 107, where the initial conversion vector 107 characterizes the text feature information of the plurality of recognition texts 104. The electronic device 101 may then input the sample entity 105 into the initial entity encoding model 108 to generate an initial entity encoding vector 109. Further, the electronic device 101 may input the initial conversion vector 107 and the initial entity encoding vector 109 into an initial matching probability generation model 110 to generate an initial matching probability 111, where the initial matching probability 111 characterizes the matching relationship between the initial conversion vector 107 and the initial entity encoding vector 109. In the present application scenario, the initial matching probability 111 may be "80%".
Finally, in response to determining that the initial matching result 112 differs from the sample tag 113, the electronic device 101 may perform parameter training on the initial text encoding model 106, the initial entity encoding model 108, and the initial matching probability generation model 110 to obtain a text encoding model, an entity encoding model, and a matching probability generation model. The initial matching result 112 is generated based on the initial matching probability 111; in this application scenario, the initial matching result 112 may be "0".
The electronic device 101 may be hardware or software. When it is hardware, it may be implemented as a distributed cluster formed by a plurality of servers or terminal devices, or as a single server or a single terminal device. When it is software, it may be installed in the hardware devices listed above, and may be implemented as a plurality of pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the number of electronic devices in fig. 1 is merely illustrative. There may be any number of electronic devices as desired for an implementation.
With continued reference to fig. 2, a flow 200 of some embodiments of a model training method according to the present disclosure is shown. The model training method comprises the following steps:
step 201, obtaining a training sample and a sample label corresponding to the training sample.
In some embodiments, the execution body of the model training method (for example, the electronic device 101 shown in fig. 1) may obtain the training sample and the sample label corresponding to the training sample through a wired or wireless connection. The training sample comprises a plurality of recognition texts for an audio sample and a sample entity involved in the audio sample. The training sample may be a sample for subsequently training the initial text encoding model, the initial entity encoding model, and the initial matching probability generation model. The audio sample may be audio to be text-recognized. The plurality of recognition texts are a plurality of character recognition results for the audio sample. The sample entity is an entity mentioned in the audio content corresponding to the audio sample. Here, an entity is a named entity involved in the audio content corresponding to the audio sample. A named entity may be an identified object in the audio content, for example a person name, place name, organization name, or date. For example, if the audio content is "play song achievement", the sample entity may be "adult". The sample label may be a pre-annotated label and may take various forms of identification, for example the digital identification "1" or "0". When the sample label is 1, it may characterize that the sample entity is an entity mentioned in the audio content corresponding to the audio sample; accordingly, when the sample label is 0, it may characterize that the sample entity is not an entity mentioned in that audio content.
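As a hypothetical illustration of the training-sample structure described above (the field names are invented for clarity), using the recognition texts and sample entity from the fig. 1 scenario:

```python
from dataclasses import dataclass

@dataclass
class TrainingSample:
    recognition_texts: list  # several ASR hypotheses for one audio sample
    sample_entity: str       # one candidate entity from the entity library

# Label 1: the entity is mentioned in the audio; label 0: it is not.
sample = TrainingSample(
    recognition_texts=["play song degree", "play song achievement",
                       "play song dust"],
    sample_entity="adult",
)
sample_label = 1
```

Pairing one entity with the whole hypothesis list (rather than with the single best hypothesis) is what lets the later models exploit text features from all recognition results.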
Optionally, the audio sample may be input into an audio recognition module to obtain the plurality of recognition texts. That is, the audio recognition module may output a plurality of recognition texts for the audio sample. The audio recognition module is a module for converting audio into recognition text, i.e., a module employing Automatic Speech Recognition (ASR) technology.
Each recognition text has a corresponding confidence. The confidence may characterize the association between the audio sample and the recognition text: the higher the confidence, the better the recognition text represents the audio content of the audio sample; conversely, the lower the confidence, the less well it represents that content.
It should be noted that the plurality of recognition texts may include texts that recognize the audio content of the audio sample correctly as well as texts that recognize it incorrectly.
Alternatively, the plurality of recognition texts may be the output texts of a plurality of audio recognition modules; that is, the audio sample is input into the plurality of audio recognition modules, which output the plurality of recognition texts.
Step 202, performing vector conversion on the plurality of recognition texts using the initial text encoding model to generate an initial conversion vector.
In some embodiments, the execution body may perform vector conversion on the plurality of recognition texts using an initial text encoding model to generate an initial conversion vector. The initial text encoding model may be a model for vector-encoding text. The initial conversion vector characterizes the text feature information of the plurality of recognition texts. The initial text encoding model may be, but is not limited to, one of the following: a BERT (Bidirectional Encoder Representations from Transformers) model or a TextCNN model. The initial text encoding model is a text encoding model after parameter initialization.
As an example, the execution body may first input the plurality of recognition texts into the initial text encoding model to obtain a plurality of initial text encoding vectors. The execution body may then average the corresponding elements of the plurality of initial text encoding vectors, taking the resulting vector as the initial conversion vector.
In some optional implementations of some embodiments, the vector converting the plurality of recognition texts using an initial text encoding model to generate an initial conversion vector may include the steps of:
The first step, the executing body may perform text splicing on the plurality of recognition texts to obtain a spliced text.
In practice, the execution body may splice the plurality of recognition texts by inserting a text identifier between them, so as to obtain the spliced text.
As an example, the plurality of recognition texts may be { "play song degree", "play song achievement", "play song dust" }. The text identifier may be "[SEP]". The spliced text may then be "play song degree [SEP] play song achievement [SEP] play song dust".
And secondly, the execution main body can input the spliced text into the initial text coding model to obtain the initial conversion vector.
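A minimal sketch of the splicing step, assuming the "[SEP]" identifier from the example above:

```python
def splice_recognition_texts(texts, identifier="[SEP]"):
    """Insert the text identifier between recognition texts before encoding."""
    return f" {identifier} ".join(texts)

texts = ["play song degree", "play song achievement", "play song dust"]
spliced = splice_recognition_texts(texts)
# spliced -> "play song degree [SEP] play song achievement [SEP] play song dust"
```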
In step 203, the sample entity is input to the initial entity encoding model to generate an initial entity encoding vector.
In some embodiments, the execution body may input the sample entity to an initial entity encoding model to generate an initial entity encoding vector. The initial entity encoding model may be a model for vector-encoding an entity, and is an entity encoding model after parameter initialization. The initial entity encoding vector may characterize the entity feature information of the sample entity. The entity encoding model may be, but is not limited to, one of the following: a word2vec model and a TextCNN model.
Step 204, inputting the initial conversion vector and the initial entity coding vector into an initial matching probability generation model to generate an initial matching probability.
In some embodiments, the execution body may input the initial conversion vector and the initial entity encoding vector into an initial matching probability generation model to generate an initial matching probability. Wherein the initial matching probability characterizes the degree of matching between the initial conversion vector and the initial entity encoding vector: the higher the initial matching probability, the more closely the initial conversion vector matches the initial entity encoding vector. The initial matching probability generation model may be a model that determines the degree of matching between the initial conversion vector and the initial entity encoding vector to generate a matching probability, and is a matching probability generation model after parameter initialization. The matching probability generation model may be, but is not limited to, one of the following: a deep residual network (ResNet) and a VGG (Visual Geometry Group) model.
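As an illustrative sketch only (not the ResNet or VGG architectures named above), a matching probability can be produced by scoring the concatenated pair of vectors with a single linear layer followed by a sigmoid; the weights below are random stand-ins for trained parameters:

```python
import numpy as np

def matching_probability(conversion_vec, entity_vec, weights, bias=0.0):
    """Score the concatenated vector pair with one linear layer + sigmoid."""
    pair = np.concatenate([conversion_vec, entity_vec])
    return float(1.0 / (1.0 + np.exp(-(pair @ weights + bias))))

rng = np.random.default_rng(0)        # random stand-ins for trained weights
conversion_vec = rng.normal(size=4)
entity_vec = rng.normal(size=4)
weights = rng.normal(size=8)
prob = matching_probability(conversion_vec, entity_vec, weights)
# prob is a probability in (0, 1); higher means a closer match
```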
And step 205, in response to determining that the initial matching result is different from the sample label, performing parameter training on the initial text coding model, the initial entity coding model and the initial matching probability generation model.
In some embodiments, in response to determining that the initial matching result is different from the sample tag, the execution body may perform parameter training on the initial text encoding model, the initial entity encoding model, and the initial matching probability generation model to obtain a text encoding model, an entity encoding model, and a matching probability generation model. Wherein the initial matching result is generated based on the initial matching probability.
As an example, the initial matching result may be generated by:
first, it is determined whether the initial matching probability is greater than or equal to a preset probability value.
And secondly, in response to determining that the initial matching probability is greater than or equal to the preset probability value, an initial matching result characterizing that the initial entity encoding vector matches the initial conversion vector is generated.
And thirdly, in response to determining that the initial matching probability is less than the preset probability value, an initial matching result characterizing that the initial entity encoding vector does not match the initial conversion vector is generated.
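The three steps above amount to a threshold comparison; a preset probability value of 0.7 is assumed here for illustration:

```python
def initial_matching_result(initial_matching_probability, preset_value=0.7):
    """True characterizes a match between the initial entity encoding vector
    and the initial conversion vector; False characterizes a mismatch."""
    return initial_matching_probability >= preset_value

# initial_matching_result(0.9) -> True   (probability meets the preset value)
# initial_matching_result(0.5) -> False  (probability below the preset value)
```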
The above embodiments of the present disclosure have the following advantageous effects: the text encoding model, the entity encoding model, and the matching probability generation model obtained by the model training method of some embodiments of the present disclosure improve the accuracy of entity retrieval. Specifically, the reason that related entity retrieval is not accurate enough is that the speech recognition model may produce text recognition errors, so that subsequent entity retrieval may return wrong results. Based on this, the model training method of some embodiments of the present disclosure first obtains a training sample and a sample label. Wherein, the training sample comprises: a plurality of recognition texts for an audio sample and a sample entity involved in the audio sample. Here, the plurality of recognition texts are a plurality of recognition results of the audio sample. Even the recognition result with the highest recognition probability may contain text recognition errors, while the remaining recognition results may recognize the text at the same position correctly. Therefore, taking the plurality of recognition texts and the sample entity involved in the audio sample as the training sample allows more text feature information of the plurality of recognition texts to be considered in the subsequent training of the matching probability generation model, which in turn makes entity recognition by models generated based on the matching probability generation model more accurate. Then, vector conversion is performed on the plurality of recognition texts by using the initial text encoding model to generate an initial conversion vector.
Wherein the initial conversion vector characterizes the text feature information of the plurality of recognition texts. Here, by vector-converting the recognition texts, the initial conversion vector contains the text feature information of the plurality of recognition texts, so that more of this information is considered in the subsequent training of the matching probability generation model. Then, the initial entity encoding model is used to generate the initial entity encoding vector corresponding to the sample entity, to facilitate the subsequent calculation of the initial matching probability by the initial matching probability generation model. The initial conversion vector and the initial entity encoding vector are then input into the initial matching probability generation model, so that the model generates a more accurate initial matching probability according to the text feature information of the plurality of recognition texts. Wherein the initial matching probability characterizes the matching relationship between the initial conversion vector and the initial entity encoding vector. Finally, in response to determining that the initial matching result is different from the sample label, parameter training is performed on the initial text encoding model, the initial entity encoding model, and the initial matching probability generation model to obtain the text encoding model, the entity encoding model, and the matching probability generation model, so that the outputs of these models are more accurate. Wherein the initial matching result is generated based on the initial matching probability. Therefore, entity determination based on the text encoding model, the entity encoding model, and the matching probability generation model trained by this model training method is more accurate.
With further reference to fig. 3, a flow 300 of further embodiments of the model training method according to the present disclosure is shown. The model training method comprises the following steps:
step 301, obtaining a training sample and a sample label corresponding to the training sample.
Step 302, inputting the plurality of recognition texts into the initial text coding model to obtain a plurality of initial text coding vectors.
In some embodiments, the executing body (e.g., the electronic device 101 shown in fig. 1) may input the plurality of recognition texts into the initial text encoding model to obtain a plurality of initial text encoding vectors.
And 303, vector combination is carried out on the plurality of initial text coding vectors to obtain a combination matrix.
In some embodiments, the execution body may perform vector combination on the plurality of initial text encoding vectors to obtain a combination matrix.
As an example, the vector dimension of each initial text encoding vector may be 1×d, where d is the number of columns of the initial text encoding vector. The execution body may combine the plurality of initial text encoding vectors from top to bottom in descending order of the confidence of the corresponding recognition texts; the dimension of the obtained combination matrix is n×d, where n is the number of initial text encoding vectors. That is, each row vector in the combination matrix is one initial text encoding vector.
Step 304, performing pooling operation on the combined matrix to obtain the initial conversion vector.
In some embodiments, the execution body may pool the combination matrix to obtain the initial conversion vector. The pooling operation described above may be, but is not limited to, one of the following: average pooling operations and maximum pooling operations.
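Steps 303 and 304 can be sketched with NumPy, assuming the per-text encoding vectors are already available (the 2-dimensional vectors are illustrative):

```python
import numpy as np

def combine_and_pool(text_encoding_vectors, mode="average"):
    """Stack n 1×d vectors into an n×d combination matrix, then pool to d."""
    matrix = np.stack(text_encoding_vectors, axis=0)  # shape: (n, d)
    if mode == "average":
        return matrix.mean(axis=0)    # average pooling operation
    return matrix.max(axis=0)         # maximum pooling operation

vectors = [np.array([1.0, 5.0]), np.array([3.0, 1.0])]
avg_pooled = combine_and_pool(vectors, "average")  # -> array([2., 3.])
max_pooled = combine_and_pool(vectors, "max")      # -> array([3., 5.])
```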
In step 305, the sample entity is input to the initial entity encoding model to generate an initial entity encoding vector.
Step 306, inputting the initial conversion vector and the initial entity encoding vector into an initial matching probability generation model to generate an initial matching probability.
And step 307, in response to determining that the initial matching result is different from the sample label, performing parameter training on the initial text coding model, the initial entity coding model and the initial matching probability generating model to obtain a text coding model, an entity coding model and a matching probability generating model.
In some embodiments, the specific implementation of steps 301, 305-307 and the technical effects thereof may refer to steps 201, 203-205 in the corresponding embodiment of fig. 2, which are not described herein.
As can be seen from fig. 3, compared with the description of some embodiments corresponding to fig. 2, the flow 300 of the model training method in some embodiments corresponding to fig. 3 places more emphasis on the specific steps of generating the initial conversion vector using the initial text encoding model. Therefore, the schemes described in these embodiments can generate an initial conversion vector that more accurately characterizes the text feature information of the plurality of recognition texts, so that the training of subsequent models is more accurate.
Fig. 4 is a schematic diagram of an application scenario of an entity determination method according to some embodiments of the present disclosure.
In the application scenario of fig. 4, first, the electronic device 401 may obtain a plurality of recognition texts 403 for the target audio 402. In the present application scenario, the plurality of recognition texts 403 may include: recognition text 4031, recognition text 4032, and recognition text 4033. The recognition text 4031 may be "play song achievement". The recognition text 4032 may be "play song dust". The recognition text 4033 may be "play song degree". The electronic device 401 may then perform vector conversion on the plurality of recognition texts 403 using the pre-trained text encoding model 404 to generate a conversion vector 405. Wherein the conversion vector 405 characterizes the feature information of the plurality of recognition texts 403, and the text encoding model 404 is generated by a model training method of some embodiments of the present disclosure. The electronic device 401 may then use the pre-trained entity encoding model 407 to perform entity encoding on each entity in the entity library 406, obtaining the entity encoding vector set 408. Wherein the entity encoding model 407 is generated by the model training method of some embodiments of the present disclosure. In this application scenario, the entities in the entity library 406 include: entity 4061, entity 4062, and entity 4063. The entity 4061 may be "degree". The entity 4062 may be "adult". The entity 4063 may be "dust". The entity encoding vector set 408 may include: entity encoding vector 4081 corresponding to entity 4061, entity encoding vector 4082 corresponding to entity 4062, and entity encoding vector 4083 corresponding to entity 4063. Finally, the electronic device 401 may utilize the pre-trained matching probability generation model 409 to screen out, from the entity library 406, the entities whose matching probabilities satisfy a preset condition as target entities, so as to obtain at least one target entity.
The matching probability corresponding to the target entity is generated based on the corresponding entity coding vector and the conversion vector, and the matching probability generation model is generated through a model training method of some embodiments of the present disclosure. In the present application scenario, the entity encoding vector 4081 and the conversion vector 405 are input to the matching probability generation model 409, resulting in the matching probability 4101. The match probability 4101 may be 50%. The entity encoding vector 4082 and the conversion vector 405 are input to the matching probability generation model 409, resulting in a matching probability 4102. The match probability 4102 may be 90%. The entity encoding vector 4083 and the conversion vector 405 are input to the matching probability generation model 409, resulting in a matching probability 4103. The match probability 4103 may be 60%. The entity in the entity library 406 having a corresponding match probability greater than 70% is determined to be the target entity. I.e. the target entity is entity 4062.
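The final screening in this scenario can be sketched as follows, with the entity names and matching probabilities taken from the example above:

```python
def select_target_entities(matching_probabilities, threshold=0.7):
    """Keep entities whose matching probability is greater than the threshold."""
    return [entity for entity, prob in matching_probabilities.items()
            if prob > threshold]

# Probabilities from the scenario: 50% for "degree", 90% for "adult", 60% for "dust".
probabilities = {"degree": 0.50, "adult": 0.90, "dust": 0.60}
target_entities = select_target_entities(probabilities)
# target_entities -> ["adult"]
```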
The electronic device 401 may be hardware or software. When the electronic device is hardware, it may be implemented as a distributed cluster formed by a plurality of servers or terminal devices, or as a single server or a single terminal device. When the electronic device is embodied as software, it may be installed in the above-listed hardware devices. It may be implemented as a plurality of software or software modules, for example, for providing distributed services, or as a single software or software module. The present disclosure is not particularly limited herein.
It should be understood that the number of electronic devices in fig. 4 is merely illustrative. There may be any number of electronic devices as desired for an implementation.
With continued reference to fig. 5, a flow 500 of some embodiments of an entity determination method according to the present disclosure is shown. The entity determining method comprises the following steps:
step 501, a plurality of recognition texts for target audio is obtained.
In some embodiments, the execution body of the entity determining method (e.g., the electronic device 401 shown in fig. 4) may obtain the plurality of recognition texts for the target audio through a wired connection or a wireless connection. The target audio is the audio whose involved entity is to be determined. For a specific explanation of the plurality of recognition texts for the target audio, reference may be made to the plurality of recognition texts of the audio sample in step 201, which is not described herein again.
Step 502, performing vector conversion on the plurality of recognition texts by using a pre-trained text coding model to generate a conversion vector.
In some embodiments, the execution body may perform vector conversion on the plurality of recognition texts using a pre-trained text encoding model to generate a conversion vector. Wherein the transformation vector characterizes the feature information of the plurality of recognition texts, and the text coding model is generated by a model training method according to some embodiments of the present disclosure. The pre-trained text encoding model may be a model with updated model parameters.
As an example, the executing body performs vector conversion on the plurality of recognition texts using a pre-trained text coding model to generate a conversion vector, and may include the steps of:
the first step, the executing body may perform text splicing on the plurality of recognition texts to obtain a spliced text.
And secondly, the execution main body can input the spliced text into the pre-trained text coding model to obtain the conversion vector.
As yet another example, the executing body performing vector conversion on the plurality of recognition texts using a pre-trained text coding model to generate a conversion vector may include the steps of:
In the first step, the execution body may input the plurality of recognition texts into the pre-trained text encoding model to obtain a plurality of text encoding vectors.
And secondly, the execution body may perform vector combination on the plurality of text encoding vectors to obtain a combination matrix.
Thirdly, the execution body may perform a pooling operation on the combination matrix to obtain the conversion vector.
And 503, performing entity coding on each entity in the entity library by utilizing a pre-trained entity coding model to obtain an entity coding vector set.
In some embodiments, the executing body may use a pre-trained entity encoding model to perform entity encoding on each entity in the entity library to obtain an entity encoding vector set. Wherein the entity coding model is generated by a model training method of some embodiments of the present disclosure. The pre-trained entity coding model can be a model with updated model parameters.
And 504, screening out the entity with the matching probability meeting the preset condition from the entity library by utilizing a pre-trained matching probability generation model, and obtaining at least one target entity by taking the entity as the target entity.
In some embodiments, the executing body may use a pre-trained matching probability generation model to screen out an entity whose matching probability meets a preset condition from the entity library, and obtain at least one target entity as the target entity. The matching probability corresponding to the target entity is generated based on the corresponding entity coding vector and the conversion vector. The matching probability generation model is generated by a model training method of some embodiments of the present disclosure. The target entity is an entity which is predicted by the model and is involved in the audio content of the target audio.
In some optional implementations of some embodiments, the generating a model using pre-trained matching probabilities, screening, from the entity library, an entity whose corresponding matching probability meets a preset condition as a target entity may include the following steps:
in the first step, the executing body may determine keyword sets corresponding to the plurality of recognition texts.
As an example, the execution body may perform text word segmentation processing on the plurality of recognition texts to obtain a word set. Then, keywords associated with a target domain (e.g., the e-commerce domain) are determined from the word set, resulting in the keyword set. The target domain may be a predetermined industry domain.
And secondly, the execution subject can screen out the entity with the association relation with the keyword set from the entity library as an association entity to obtain an association entity set.
As an example, first, the execution subject may determine at least one homonym corresponding to each keyword in the keyword set, to obtain the homonym set. Then, the executing body may combine the homonym set and the keyword set to obtain a combined word set as the associated entity set.
And thirdly, the execution main body can screen out entity coding vectors corresponding to each associated entity in the associated entity set from the entity coding vector set to be used as target entity coding vectors, and a target entity coding vector set is obtained.
Fourth, the execution body may determine a matching probability between each target entity encoding vector in the target entity encoding vector set and the conversion vector using the matching probability generation model.
As an example, the execution body may input each target entity encoding vector in the target entity encoding vector set and the conversion vector into a matching probability generation model, to obtain a matching probability.
And fifthly, the execution subject can screen out the associated entity with the corresponding matching probability meeting the preset condition from the associated entity set as a target entity. The preset condition may be that the matching probability is greater than or equal to a preset probability value. For example, the preset probability value is 70%.
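Under stated assumptions (a hypothetical homophone table for the second step, and a stand-in scoring function in place of the trained matching probability generation model), the second through fifth steps above can be sketched as:

```python
import numpy as np

def build_associated_entity_set(keyword_set, homonym_table):
    """Step two: merge each keyword with its homonyms into the associated entity set."""
    associated = set(keyword_set)
    for word in keyword_set:
        associated.update(homonym_table.get(word, []))
    return associated

def screen_target_entities(associated, entity_vectors, conversion_vec, preset=0.7):
    """Steps three to five: score each associated entity's vector, keep matches."""
    def score(vec):  # hypothetical stand-in for the matching probability model
        return 1.0 / (1.0 + np.exp(-(vec @ conversion_vec)))
    return sorted(e for e in associated
                  if e in entity_vectors and score(entity_vectors[e]) >= preset)

keywords = {"dust"}
homonyms = {"dust": ["degree", "adult"]}      # hypothetical homophone table
vectors = {"degree": np.array([-1.0, 0.0]),   # score ~ 0.12, dropped
           "adult":  np.array([ 2.0, 0.0]),   # score ~ 0.98, kept
           "dust":   np.array([ 0.5, 0.0])}   # score ~ 0.73, kept
conversion_vec = np.array([2.0, 0.0])
associated = build_associated_entity_set(keywords, homonyms)
targets = screen_target_entities(associated, vectors, conversion_vec)
# targets -> ["adult", "dust"]
```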
In some optional implementations of some embodiments, the generating a model using pre-trained matching probabilities, screening, from the entity library, an entity whose corresponding matching probability meets a preset condition as a target entity may include the following steps:
in the first step, the execution body may determine a matching probability between each entity encoding vector in the entity encoding vector set and the conversion vector using the matching probability generation model.
And secondly, the execution subject can screen out the entity with the matching probability meeting the preset condition from the entity library as a target entity. The preset condition may be that the matching probability is greater than or equal to a preset probability value. For example, the preset probability value is 70%.
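These two steps can be sketched as a loop that scores every entity encoding vector against the conversion vector and keeps entities at or above the preset probability value; the scoring function here is a hypothetical stand-in for the trained matching probability generation model:

```python
import numpy as np

def score(entity_vec, conversion_vec):
    """Hypothetical stand-in for the trained matching probability model."""
    return 1.0 / (1.0 + np.exp(-(entity_vec @ conversion_vec)))

def screen_entities(entity_vectors, conversion_vec, preset_value=0.7):
    """Keep entities whose matching probability meets the preset condition."""
    return [name for name, vec in entity_vectors.items()
            if score(vec, conversion_vec) >= preset_value]

conversion_vec = np.array([2.0, 0.0])
entity_vectors = {"adult": np.array([1.0, 0.0]),   # score ~ 0.88, kept
                  "dust":  np.array([-1.0, 0.0])}  # score ~ 0.12, dropped
targets = screen_entities(entity_vectors, conversion_vec)
# targets -> ["adult"]
```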
The above embodiments of the present disclosure have the following advantageous effects: by using the feature information of the plurality of recognition texts together with a pre-trained text encoding model, entity encoding model, and matching probability generation model, the entity determining method of some embodiments of the present disclosure can accurately determine at least one entity involved in the audio content corresponding to the target audio.
With further reference to fig. 6, as an implementation of the method illustrated in the above figures, the present disclosure provides some embodiments of a model training apparatus, which apparatus embodiments correspond to those illustrated in fig. 2, which apparatus is particularly applicable in a variety of electronic devices.
As shown in fig. 6, a model training apparatus 600 includes: a first acquisition unit 601, a first vector conversion unit 602, a first input unit 603, a second input unit 604, and a parameter training unit 605. The first obtaining unit 601 is configured to obtain a training sample and a sample label corresponding to the training sample, where the training sample includes: identifying text and sample entities involved in the audio sample for a plurality of the audio samples; a first vector conversion unit 602 configured to perform vector conversion on the plurality of recognition texts using an initial text encoding model, to generate an initial conversion vector, wherein the initial conversion vector characterizes text feature information of the plurality of recognition texts; a first input unit 603 configured to input the sample entities to an initial entity encoding model to generate an initial entity encoding vector; a second input unit 604 configured to input the initial conversion vector and the initial entity encoding vector into an initial matching probability generation model to generate an initial matching probability, wherein the initial matching probability characterizes a matching relationship between the initial conversion vector and the initial entity encoding vector; and a parameter training unit 605 configured to perform parameter training on the initial text coding model, the initial entity coding model, and the initial matching probability generating model in response to determining that an initial matching result is different from the sample label, to obtain the text coding model, the entity coding model, and the matching probability generating model, wherein the initial matching result is generated based on the initial matching probability.
In some optional implementations of some embodiments, the first vector conversion unit 602 may be further configured to: performing text splicing on the plurality of identification texts to obtain a spliced text; and inputting the spliced text into the initial text coding model to obtain the initial conversion vector.
In some optional implementations of some embodiments, the first vector conversion unit 602 may be further configured to: inputting the plurality of identification texts into the initial text coding model to obtain a plurality of initial text coding vectors; vector combination is carried out on the plurality of initial text coding vectors to obtain a combination matrix; and carrying out pooling operation on the combined matrix to obtain the initial conversion vector.
It will be appreciated that the elements described in the apparatus 600 correspond to the various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting benefits described above with respect to the method are equally applicable to the apparatus 600 and the units contained therein, and are not described in detail herein.
With further reference to fig. 7, as an implementation of the method shown in the above figures, the present disclosure provides some embodiments of an entity determining apparatus, which correspond to those method embodiments shown in fig. 5, and which are particularly applicable in various electronic devices.
As shown in fig. 7, an entity determining apparatus 700 includes: a second acquisition unit 701, a second vector conversion unit 702, an entity encoding unit 703, and an entity screening unit 704. Wherein the second acquisition unit 701 is configured to acquire a plurality of recognition texts for the target audio; a second vector conversion unit 702 configured to perform vector conversion on the plurality of recognition texts using a pre-trained text encoding model generated by a model training method of some embodiments of the present disclosure, to generate a conversion vector, wherein the conversion vector characterizes feature information of the plurality of recognition texts; an entity encoding unit 703 configured to perform entity encoding on each entity in the entity library by using a pre-trained entity encoding model, to obtain an entity encoding vector set, where the entity encoding model is generated by a model training method according to some embodiments of the present disclosure; and an entity filtering unit 704 configured to filter, from the entity library, an entity whose corresponding matching probability meets a preset condition, as a target entity, by using a pre-trained matching probability generation model, where the matching probability corresponding to the target entity is generated based on the corresponding entity encoding vector and the conversion vector, and the matching probability generation model is generated by a model training method according to some embodiments of the present disclosure.
In some optional implementations of some embodiments, the entity screening unit 704 may be further configured to: determining a keyword set corresponding to the plurality of identification texts; screening out the entity with association relation with the keyword set from the entity library as an association entity to obtain an association entity set; screening entity coding vectors corresponding to each associated entity in the associated entity set from the entity coding vector set to be used as target entity coding vectors, and obtaining a target entity coding vector set; determining a matching probability between each target entity coding vector in the target entity coding vector set and the conversion vector by using the matching probability generation model; and screening out the associated entity with the corresponding matching probability meeting the preset condition from the associated entity set as a target entity.
In some optional implementations of some embodiments, the entity screening unit 704 may be further configured to: determining a matching probability between each entity coding vector in the entity coding vector set and the conversion vector by using the matching probability generation model; and screening out the entity with the matching probability meeting the preset condition from the entity library as a target entity.
It will be appreciated that the elements described in the apparatus 700 correspond to the various steps in the method described with reference to fig. 5. Thus, the operations, features and resulting benefits described above for the method are equally applicable to the apparatus 700 and the units contained therein, and are not described in detail herein.
Referring now to fig. 8, a schematic diagram of an electronic device 800 (e.g., electronic device 101 of fig. 1) suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 8 is merely an example, and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 8, the electronic device 800 may include a processing means (e.g., a central processor, a graphics processor, etc.) 801, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage means 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the electronic device 800 are also stored. The processing device 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
In general, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, etc.; storage 808 including, for example, magnetic tape, hard disk, etc.; communication means 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 shows an electronic device 800 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 8 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communication device 809, or from storage device 808, or from ROM 802. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing device 801.
It should be noted that, in some embodiments of the present disclosure, the computer readable medium may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the clients and servers may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
The computer readable medium may be contained in the electronic device, or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain a training sample and a sample label corresponding to the training sample, wherein the training sample comprises: a plurality of recognition texts for an audio sample and a sample entity involved in the audio sample; perform vector conversion on the plurality of recognition texts by using an initial text coding model to generate an initial conversion vector, wherein the initial conversion vector represents text feature information of the plurality of recognition texts; input the sample entity into an initial entity coding model to generate an initial entity coding vector; input the initial conversion vector and the initial entity coding vector into an initial matching probability generation model to generate an initial matching probability, wherein the initial matching probability represents a matching relationship between the initial conversion vector and the initial entity coding vector; and in response to determining that an initial matching result is different from the sample label, perform parameter training on the initial text coding model, the initial entity coding model, and the initial matching probability generation model to obtain a text coding model, an entity coding model, and a matching probability generation model, wherein the initial matching result is generated based on the initial matching probability.
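The training flow described above can be sketched end to end with toy linear encoders. Everything below — the dimensions, the linear projections standing in for the three models, and the sigmoid-of-dot-product matching score — is an illustrative assumption, not the patent's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the three initial models; the linear projections and
# dimensions are illustrative, not from the patent.
W_text = rng.normal(size=(8, 4))    # "initial text coding model"
W_entity = rng.normal(size=(8, 4))  # "initial entity coding model"

def text_to_conversion_vector(recognition_vectors):
    # Pool the per-text features, then project: yields the
    # "initial conversion vector" for the plurality of recognition texts.
    return recognition_vectors.mean(axis=0) @ W_text

def entity_to_coding_vector(entity_vector):
    return entity_vector @ W_entity

def matching_probability(conv_vec, ent_vec):
    # "Initial matching probability": sigmoid of the dot product of the
    # conversion vector and the entity coding vector.
    return 1.0 / (1.0 + np.exp(-float(np.dot(conv_vec, ent_vec))))

# One training sample: three recognition texts for an audio sample (as toy
# feature vectors), the sample entity involved, and the label 1.0 ("match").
recognition_texts = rng.normal(size=(3, 8))
sample_entity = rng.normal(size=(8,))
sample_label = 1.0

p = matching_probability(
    text_to_conversion_vector(recognition_texts),
    entity_to_coding_vector(sample_entity),
)
# Binary cross-entropy between the predicted matching probability and the
# label; a real implementation would backpropagate this loss through all
# three models when the matching result differs from the sample label.
loss = -(sample_label * np.log(p) + (1.0 - sample_label) * np.log(1.0 - p))
```

In practice the two encoders would be learned text/entity networks and the matching probability model a trained classifier head; the gradient step itself is omitted here.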
The one or more programs may alternatively cause the electronic device to: acquire a plurality of recognition texts for a target audio; perform vector conversion on the plurality of recognition texts by using a pre-trained text coding model to generate a conversion vector, wherein the conversion vector represents feature information of the plurality of recognition texts, and the text coding model is generated by a model training method according to some embodiments of the present disclosure; perform entity coding on each entity in an entity library by using a pre-trained entity coding model to obtain an entity coding vector set, wherein the entity coding model is generated by a model training method of some embodiments of the present disclosure; and screen out, from the entity library by using a pre-trained matching probability generation model, an entity whose matching probability meets a preset condition as a target entity to obtain at least one target entity, wherein the matching probability corresponding to the target entity is generated based on the corresponding entity coding vector and the conversion vector, and the matching probability generation model is generated by a model training method of some embodiments of the present disclosure.
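The screening step of the entity determination flow above can be sketched as follows; the sigmoid-of-dot-product score, the threshold as the "preset condition", and all vectors and entity names are illustrative assumptions:

```python
import numpy as np

def screen_entities(conversion_vector, entity_coding_vectors, entity_names,
                    threshold=0.5):
    # Toy stand-in for the matching probability generation model: a
    # sigmoid of the dot product between each entity coding vector and
    # the conversion vector. Entities whose probability meets the preset
    # condition (here, >= threshold) are kept as target entities.
    scores = 1.0 / (1.0 + np.exp(-(entity_coding_vectors @ conversion_vector)))
    return [name for name, s in zip(entity_names, scores) if s >= threshold]

conv = np.array([1.0, 0.0])
library = np.array([[3.0, 0.0],    # sigmoid(3)  ~ 0.95 -> kept
                    [-3.0, 0.0]])  # sigmoid(-3) ~ 0.05 -> dropped
targets = screen_entities(conv, library, ["entity_a", "entity_b"])
print(targets)  # ['entity_a']
```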
Computer program code for carrying out operations of some embodiments of the present disclosure may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by means of software or by means of hardware. The described units may also be provided in a processor, for example, described as: a processor comprising a first acquisition unit, a first vector conversion unit, a first input unit, a second input unit, and a parameter training unit. In some cases, the names of these units do not limit the units themselves; for example, the first acquisition unit may also be described as "a unit that obtains a training sample and a sample label corresponding to the training sample".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
The foregoing description presents only preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions in which the above features are replaced with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (10)

1. A model training method, comprising:
obtaining a training sample and a sample label corresponding to the training sample, wherein the training sample comprises: a plurality of recognition texts for an audio sample and a sample entity involved in the audio sample;
performing vector conversion on the plurality of recognition texts by using an initial text coding model to generate an initial conversion vector, wherein the initial conversion vector represents text feature information of the plurality of recognition texts;
inputting the sample entity into an initial entity coding model to generate an initial entity coding vector;
inputting the initial conversion vector and the initial entity coding vector into an initial matching probability generation model to generate initial matching probability, wherein the initial matching probability characterizes a matching relationship between the initial conversion vector and the initial entity coding vector;
and in response to determining that an initial matching result is different from the sample label, performing parameter training on the initial text coding model, the initial entity coding model and the initial matching probability generating model to obtain a text coding model, an entity coding model and a matching probability generating model, wherein the initial matching result is generated based on the initial matching probability.
2. The method of claim 1, wherein vector converting the plurality of recognition texts using an initial text encoding model to generate an initial conversion vector comprises:
performing text splicing on the plurality of recognition texts to obtain a spliced text;
and inputting the spliced text to the initial text coding model to obtain the initial conversion vector.
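The splice-then-encode variant of claim 2 can be illustrated with a short sketch; the `[SEP]` separator and the length-based toy encoder are assumptions for illustration only:

```python
def splice_and_encode(recognition_texts, encoder, separator="[SEP]"):
    # Text-splice the plurality of recognition texts into one spliced
    # text, then run the (initial) text coding model once over the result.
    spliced = separator.join(recognition_texts)
    return spliced, encoder(spliced)

# Toy stand-in for the text coding model: a one-dimensional length feature.
texts = ["play some jazz", "play sun jazz", "played some jazz"]
spliced, vector = splice_and_encode(texts, encoder=lambda s: [len(s)])
print(spliced)  # play some jazz[SEP]play sun jazz[SEP]played some jazz
```

A single forward pass over the spliced text lets the encoder attend across all recognition hypotheses at once, at the cost of a longer input sequence.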
3. The method of claim 1, wherein vector converting the plurality of recognition texts using an initial text encoding model to generate an initial conversion vector comprises:
inputting the plurality of recognition texts into the initial text coding model to obtain a plurality of initial text coding vectors;
performing vector combination on the plurality of initial text coding vectors to obtain a combination matrix;
and carrying out pooling operation on the combined matrix to obtain the initial conversion vector.
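The combine-then-pool variant of claim 3, sketched with NumPy; the two-dimensional vectors and the choice of mean pooling (with max pooling as an alternative) are illustrative assumptions:

```python
import numpy as np

def combine_and_pool(initial_text_coding_vectors, mode="mean"):
    # Vector-combine the per-text coding vectors into a combination
    # matrix, then pool along the text axis to obtain a single initial
    # conversion vector.
    matrix = np.stack(initial_text_coding_vectors)  # shape: (num_texts, dim)
    if mode == "mean":
        return matrix.mean(axis=0)
    return matrix.max(axis=0)  # max pooling as an alternative

vectors = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
mean_pooled = combine_and_pool(vectors)        # array([2., 3.])
max_pooled = combine_and_pool(vectors, "max")  # array([3., 4.])
```

Unlike the splicing variant, each recognition text is encoded independently here, so the per-text passes can run in parallel before the cheap pooling step.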
4. An entity determination method, comprising:
acquiring a plurality of recognition texts for a target audio;
vector converting the plurality of recognized texts using a pre-trained text encoding model to generate a conversion vector, wherein the conversion vector characterizes characteristic information of the plurality of recognized texts, the text encoding model being trained according to the method of any one of claims 1-3;
performing entity coding on each entity in an entity library by using a pre-trained entity coding model to obtain an entity coding vector set, wherein the entity coding model is trained according to the method of any one of claims 1-3;
and screening out the entity with the matching probability meeting the preset condition from the entity library by utilizing a pre-trained matching probability generation model as a target entity to obtain at least one target entity, wherein the matching probability corresponding to the target entity is generated based on the corresponding entity coding vector and the conversion vector, and the matching probability generation model is trained according to the method of any one of claims 1-3.
5. The method of claim 4, wherein the screening out, from the entity library by using the pre-trained matching probability generation model, an entity with a matching probability meeting the preset condition as the target entity comprises:
determining a keyword set corresponding to the plurality of recognition texts;
screening out the entity with the association relation with the keyword set from the entity library as an association entity to obtain an association entity set;
screening out, from the entity coding vector set, the entity coding vector corresponding to each associated entity in the associated entity set as a target entity coding vector, to obtain a target entity coding vector set;
determining, by using the matching probability generation model, a matching probability between each target entity coding vector in the target entity coding vector set and the conversion vector;
and screening out the associated entity with the corresponding matching probability meeting the preset condition from the associated entity set as a target entity.
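Claim 5's two-stage screening (keyword prefilter, then probability match) can be sketched as follows; the substring-based "association relation", the sigmoid score, the threshold, and all entity names and vectors are illustrative assumptions:

```python
import numpy as np

def screen_with_keywords(keywords, entity_library, conversion_vector,
                         threshold=0.5):
    # Stage 1: keep only entities with an association relation to the
    # keyword set (toy association: keyword appears in the entity name),
    # giving the associated entity set.
    associated = [name for name in entity_library
                  if any(kw in name for kw in keywords)]
    # Stage 2: score only the associated entities with the (toy) matching
    # probability and apply the preset condition.
    targets = []
    for name in associated:
        vec = entity_library[name]
        score = 1.0 / (1.0 + np.exp(-float(np.dot(vec, conversion_vector))))
        if score >= threshold:
            targets.append(name)
    return targets

library = {
    "jazz_playlist": np.array([4.0, 0.0]),   # associated, high score -> kept
    "jazz_radio":    np.array([-4.0, 0.0]),  # associated, low score -> dropped
    "rock_playlist": np.array([4.0, 0.0]),   # filtered out by keywords
}
conv = np.array([1.0, 0.0])
result = screen_with_keywords(["jazz"], library, conv)
print(result)  # ['jazz_playlist']
```

The keyword prefilter shrinks the candidate set before the comparatively expensive vector matching, which is the practical point of this claim over claim 6's full-library scan.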
6. The method of claim 4, wherein the screening out, from the entity library by using the pre-trained matching probability generation model, an entity with a matching probability meeting the preset condition as the target entity comprises:
determining, by using the matching probability generation model, a matching probability between each entity coding vector in the entity coding vector set and the conversion vector;
and screening out the entity with the corresponding matching probability meeting the preset condition from the entity library as a target entity.
7. A model training apparatus comprising:
a first acquisition unit configured to acquire a training sample and a sample label corresponding to the training sample, wherein the training sample comprises: a plurality of recognition texts for an audio sample and a sample entity involved in the audio sample;
a first vector conversion unit configured to vector-convert the plurality of recognition texts using an initial text encoding model to generate an initial conversion vector, wherein the initial conversion vector characterizes text feature information of the plurality of recognition texts;
a first input unit configured to input the sample entity to an initial entity coding model to generate an initial entity coding vector;
a second input unit configured to input the initial conversion vector and the initial entity encoding vector into an initial matching probability generation model to generate an initial matching probability, wherein the initial matching probability characterizes a matching relationship between the initial conversion vector and the initial entity encoding vector;
and the parameter training unit is configured to perform parameter training on the initial text coding model, the initial entity coding model and the initial matching probability generating model to obtain a text coding model, an entity coding model and a matching probability generating model in response to determining that an initial matching result is different from the sample label, wherein the initial matching result is generated based on the initial matching probability.
8. An entity determining apparatus, comprising:
a second acquisition unit configured to acquire a plurality of recognition texts for the target audio;
a second vector conversion unit configured to perform vector conversion on the plurality of recognition texts by using a pre-trained text coding model to generate a conversion vector, wherein the conversion vector represents feature information of the plurality of recognition texts, the text coding model being trained according to the method of any one of claims 1-3;
an entity coding unit configured to perform entity coding on each entity in an entity library by using a pre-trained entity coding model to obtain an entity coding vector set, wherein the entity coding model is trained according to the method of any one of claims 1-3;
and the entity screening unit is configured to screen out an entity with the matching probability meeting a preset condition from the entity library by utilizing a pre-trained matching probability generation model as a target entity to obtain at least one target entity, wherein the matching probability corresponding to the target entity is generated based on the corresponding entity coding vector and the conversion vector, and the matching probability generation model is trained according to the method of any one of claims 1-3.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
10. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-6.
CN202211091272.8A 2022-09-07 2022-09-07 Model training method, entity determining device, electronic equipment and medium Pending CN116245103A (en)


Publications (1)

Publication Number: CN116245103A; Publication Date: 2023-06-09; Family ID: 86630149; Country: CN


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination