CN113220841A - Method, apparatus, electronic device and storage medium for determining authentication information

Info

Publication number: CN113220841A
Application number: CN202110537414.8A
Authority: CN
Inventors: 郭佳昌; 代小亚
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-05-17
Filing date: 2021-05-17
Publication date: 2021-08-06
Anticipated expiration: 2041-05-17
Also published as: CN113220841B

Abstract

The present disclosure provides a method, an apparatus, a device and a storage medium for determining authentication information, which are applicable to the field of artificial intelligence, in particular to the fields of deep learning and intelligent medical care. The method for determining authentication information is implemented as follows: determining a target entity described by a text to be processed and at least one piece of feature information for the target entity; determining at least one entity having any one of the at least one piece of feature information as a candidate entity; determining a degree of correlation between the candidate entity and the at least one piece of feature information based on a correlation determination model; and determining, based on the degree of correlation, an entity among the candidate entities that is related to the at least one piece of feature information as information to be authenticated for the target entity.

Description

Method, apparatus, electronic device and storage medium for determining authentication information

Technical Field

The present disclosure relates to the field of artificial intelligence, in particular to the field of deep learning and the field of intelligent medical care, and more particularly to a method, apparatus, device and storage medium for determining authentication information.

Background

With the development of informatization, people increasingly consult professionals for information according to their personal needs, for example about their personal health status or about the functions of various electric appliances. To provide accurate information, professionals generally need to understand the voice information or text information in which people describe their needs, and decide what consultation information to feed back according to that understanding.

When the consultation information involves recommending or determining an entity, a plurality of entities with similar characteristics usually exist in the same field. These entities therefore need to be identified before the consultation information is fed back, and the consultation information is fed back according to the identification result. Owing to limits on the ability and time of professionals, the plurality of entities that need to be identified often cannot be determined accurately.

Disclosure of Invention

A method, apparatus, device, and storage medium for determining authentication information that improves the accuracy of the authentication information are provided.

According to an aspect of the present disclosure, there is provided a method of determining authentication information, including: determining a target entity described by a text to be processed and at least one piece of feature information for the target entity; determining at least one entity having any one of the at least one piece of feature information as a candidate entity; determining a degree of correlation between the candidate entity and the at least one piece of feature information based on a correlation determination model; and determining, based on the degree of correlation, an entity among the candidate entities that is related to the at least one piece of feature information as information to be authenticated for the target entity.

According to another aspect of the present disclosure, there is provided an apparatus for determining authentication information, including: a feature information determining module for determining a target entity described by a text to be processed and at least one piece of feature information for the target entity; a candidate entity determining module for determining at least one entity having any one of the at least one piece of feature information as a candidate entity; a relevance determination module for determining a relevance between the candidate entity and the at least one piece of feature information based on a relevance determination model; and an information-to-be-authenticated determining module for determining, based on the relevance, an entity among the candidate entities that is related to the at least one piece of feature information as information to be authenticated for the target entity.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of determining authentication information provided by the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of determining authentication information provided by the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of determining authentication information provided by the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a schematic view of an application scenario of a method, an apparatus, an electronic device and a storage medium for determining authentication information according to an embodiment of the present disclosure;

FIG. 2 is a flow chart of a method of determining authentication information according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of generating an entity thesaurus according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a principle of determining at least one characteristic information for a target entity according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of a relevance determination model according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of a relevance determination model according to another embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of a relevance determination model according to yet another embodiment of the present disclosure;

FIG. 8 is a block diagram of a structure of an apparatus for determining authentication information according to an embodiment of the present disclosure; and

FIG. 9 is a block diagram of an electronic device for implementing a method of determining authentication information according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

The present disclosure provides a method for determining authentication information, which includes a feature information determining stage, a candidate entity determining stage, a relevance determining stage and an information-to-be-authenticated determining stage. In the feature information determining stage, a target entity described by the text to be processed and at least one piece of feature information for the target entity are determined. In the candidate entity determining stage, at least one entity having any one of the at least one piece of feature information is determined as a candidate entity. In the relevance determining stage, a relevance between the candidate entity and the at least one piece of feature information is determined based on a relevance determination model. In the information-to-be-authenticated determining stage, an entity among the candidate entities that is related to the at least one piece of feature information is determined, based on the relevance, as the information to be authenticated for the target entity.

An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.

Fig. 1 is a schematic view of an application scenario of a method and an apparatus for determining authentication information according to an embodiment of the present disclosure.

As shown in fig. 1, the application scenario 100 includes a terminal device 110 and a database 130, where the terminal device 110 is an electronic device having a display screen and a processing function, including but not limited to a smart phone, a tablet computer, a laptop computer, a desktop computer, and the like. Terminal device 110 may access database 130 over a network to retrieve information from database 130. The network may include wired or wireless communication links. The database 130 may be a storage medium integrated in the terminal device 110, or may be a database system provided in an electronic device other than the terminal device 110.

According to the embodiment of the present disclosure, a user may record multimedia information through the terminal device 110 and form a text to be processed 120 via the terminal device 110. The terminal device 110 may further perform processes such as named entity recognition and semantic understanding on the text to be processed 120, determine the target entity and the feature information of the target entity described in the text to be processed 120, recall the candidate entity associated with the feature information from the database 130, and present the candidate entity to the user as the entity to be authenticated 140. For example, the full set of entity information may be maintained in the database 130.

In an embodiment, after recalling the candidate entity from the database 130, the terminal device 110 may further determine the degree of correlation between the candidate entity and the text to be processed 120 by using, for example, a correlation determination model, select the candidate entity with higher degree of correlation with the text to be processed 120 as the entity to be authenticated 140, and present the entity to be authenticated 140 to the user.

According to the embodiment of the disclosure, as shown in fig. 1, the application scenario 100 may further include a server 150, and the terminal device 110 may be communicatively connected to the server 150 through a network. Server 150 may, for example, send a pre-trained relevance determination model 160 to terminal device 110 in response to a model acquisition request sent by terminal device 110, so that terminal device 110 selects the entity to be authenticated 140 from the recalled candidate entities according to the relevance determination model 160.

Illustratively, the server 150 may be a server that provides various services, such as a background management server that provides support for websites or client applications that users browse with the terminal device 110. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

In an embodiment, for example, the terminal device 110 may also send the to-be-processed text 120 to the server 150, the server 150 recalls the candidate entities from the database 130, and selects the to-be-authenticated entity 140 from the candidate entities, so as to send the to-be-authenticated entity 140 to the terminal device 110 for presentation by the terminal device 110.

It should be noted that the method for determining authentication information provided by the present disclosure may be executed by terminal device 110 or server 150. Accordingly, the apparatus for determining authentication information provided by the present disclosure may be disposed in the terminal device 110 or disposed in the server 150.

It should be understood that the number and type of terminal devices, servers, and databases in fig. 1 are merely illustrative. There may be any number and type of terminal devices, servers, and databases, as the implementation requires.

The method of determining authentication information provided by the present disclosure will be described in detail below with reference to fig. 2 to 7.

Fig. 2 is a flow chart of a method of determining authentication information according to an embodiment of the present disclosure.

As shown in fig. 2, the method 200 of determining authentication information of this embodiment may include operations S210 to S240.

In operation S210, a target entity described by the text to be processed and at least one feature information for the target entity are determined.

According to an embodiment of the present disclosure, the text to be processed may be, for example, a pre-recorded text or a text obtained by recognizing audio data. The text to be processed may describe feature information and entity information preliminarily determined according to the feature information. The preliminarily determined entity information is the target entity, and the described feature information is the feature information for the target entity.

Illustratively, in the medical field, the target entity may be, for example, a disease, the text to be processed may be, for example, text information recorded in a diagnostic process, and the characteristic information may be symptom information and the like. In the article consultation field, the target entity may be an article, the text to be processed may describe article requirement information and the like provided by a user in the consultation process, and the characteristic information may be functions, performance parameters and the like of the article.

According to the embodiment of the disclosure, a Named Entity Recognition (NER) technology can be adopted to process the text to be processed, extract feature information from the text to be processed, and recognize the target entity. The named entity recognition technology may adopt, for example, a rule and dictionary based method, a statistics based method, or a hybrid of the rule and dictionary based method and the statistics based method, and the disclosure is not limited thereto.

For example, the text to be processed may be converted into structured data, and in the process of converting into the structured data, entity information is extracted from the text to be processed by using a named entity recognition technology. The entity information to be extracted may be set according to actual requirements; for example, in the medical field, the extracted entity information may include symptom information, physical sign information, disease information, allergic drugs, allergens, and the like. After the entity information is extracted, the association relations between pieces of entity information can be extracted through a relation extraction model and a reading comprehension model, and the entity information is arranged into word strings based on the association relations and the semantic understanding result of the text to be processed. A word string extracted from the conclusion field of the text to be processed is first selected from all the word strings, and the entity represented by an entity word included in the selected word string may be used as the target entity. Word strings extracted from fields other than the conclusion field are used as the feature information.

Illustratively, in the medical field, the information extracted from the text to be processed by the named entity recognition technology may include symptom information, sign information, crowd category information, disease names, and the like. After information is extracted from the text to be processed, entity words of the target entity and fields for describing characteristic information are selected from the extracted information according to the types and the fields of the words in the information, and therefore the target entity and the characteristic information are determined.
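The selection step described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the `WordString` record, the field labels ("chief_complaint", "examination", "conclusion"), the category label "disease" and the sample strings are all hypothetical, and the named entity recognition and structuring steps are assumed to have already produced the word strings.

```python
from typing import List, NamedTuple, Optional, Tuple


class WordString(NamedTuple):
    """One word string of the structured data (hypothetical layout)."""
    field: str      # field of the text the string came from
    category: str   # category assigned by named entity recognition
    text: str


def split_target_and_features(
    strings: List[WordString],
    conclusion_field: str = "conclusion",
    entity_category: str = "disease",
) -> Tuple[Optional[str], List[str]]:
    """Pick the target entity from the conclusion field; use other fields as features."""
    target = None
    features = []
    for s in strings:
        if s.field == conclusion_field:
            # The first entity word found in the conclusion field is the target.
            if target is None and s.category == entity_category:
                target = s.text
        else:
            # Strings from any other field serve as feature information.
            features.append(s.text)
    return target, features


strings = [
    WordString("chief_complaint", "symptom", "headache for three days"),
    WordString("examination", "sign", "temperature 38.5"),
    WordString("conclusion", "disease", "influenza"),
]
target, features = split_target_and_features(strings)
```

Here `target` is the entity word taken from the conclusion field and `features` holds the strings from the remaining fields.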

In operation S220, at least one entity having any one of the at least one feature information is determined as a candidate entity.

The present disclosure may, for example, maintain a mapping relationship table of entities and feature information in advance. In this operation, entities having a mapping relationship with each of the at least one feature information may be searched for in the mapping relationship table, and the entities found are used as candidate entities.

For example, after the entities having a mapping relationship with each feature information are found, the target entity may be removed from the found entities, and the remaining entities may be used as candidate entities, so that the entities to be authenticated, which need to be authenticated against the target entity, are selected from the candidate entities.
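A minimal sketch of this lookup is given below, assuming the mapping relationship table is held as a dictionary from feature information to entity words. The function name `recall_candidates` and the table contents are hypothetical and serve only as illustration.

```python
from typing import Dict, List


def recall_candidates(
    mapping_table: Dict[str, List[str]],
    features: List[str],
    target_entity: str,
) -> List[str]:
    """Return entities mapped to any feature, excluding the target entity."""
    candidates: List[str] = []
    seen = {target_entity}          # the target entity is never a candidate
    for feature in features:
        for entity in mapping_table.get(feature, []):
            if entity not in seen:  # keep each entity once, in first-seen order
                seen.add(entity)
                candidates.append(entity)
    return candidates


# Hypothetical mapping relationship table: feature information -> entities.
mapping_table = {
    "headache": ["influenza", "migraine", "hypertension"],
    "fever": ["influenza", "pneumonia"],
}
candidates = recall_candidates(mapping_table, ["headache", "fever"], "influenza")
```

An entity is recalled as soon as it maps to any one of the features, which matches the "any one of the at least one feature information" condition of operation S220.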

In operation S230, a degree of correlation between the candidate entity and the at least one feature information is determined based on the degree of correlation determination model.

According to an embodiment of the present disclosure, the relevancy determination model may perform character string matching on the entity word of the candidate entity and the at least one feature information using the edit distance, for example, and take the matching result as the relevancy of the two. Alternatively, the correlation determination model may also use a Mutual Information (MI) method or an Information Gain (IG) method to determine the correlation between the entity word of the candidate entity and the at least one feature Information, and the correlation is used as the correlation between the candidate entity and the at least one feature Information.
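The edit-distance option can be sketched as follows. The source only states that the string matching result is taken as the relevance; normalizing the distance by the longer string length and taking the maximum over the feature strings are assumptions made here for illustration, and the function names are hypothetical. The mutual information and information gain options are not covered by this sketch.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def edit_similarity(entity_word: str, feature: str) -> float:
    """Map the distance to [0, 1]; 1.0 means identical strings (assumed scaling)."""
    longest = max(len(entity_word), len(feature))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(entity_word, feature) / longest


def edit_relevance(entity_word: str, features: List[str]) -> float:
    """Assumed aggregation: best similarity over all feature strings."""
    return max((edit_similarity(entity_word, f) for f in features), default=0.0)
```

Other aggregations, such as averaging over the feature strings, would fit the description equally well.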

Illustratively, the relevance determination model may alternatively be constructed based on a convolutional neural network. The inputs of the relevance determination model are the entity word of the candidate entity and the at least one feature information for the target entity (for example, as character strings), and after processing the input data, the relevance determination model outputs the relevance between the candidate entity and the at least one feature information. The relevance determination model may adopt a structure described below, which is not described in detail here.

In operation S240, an entity related to at least one feature information among the candidate entities is determined as information to be authenticated for the target entity based on the correlation.

According to the embodiment of the present disclosure, a candidate entity whose correlation degree with the at least one feature information is greater than a correlation degree threshold may be taken as information to be authenticated. Alternatively, a predetermined number of candidate entities having the highest correlation degrees with the at least one feature information may be used as the information to be authenticated. The correlation threshold may be, for example, any value greater than 0.5, and the predetermined number may be, for example, any value greater than or equal to 1, which is not limited in this disclosure.
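Both selection strategies can be sketched in a few lines. The scores below are made-up values standing in for the output of a relevance determination model, and the function names are hypothetical.

```python
from typing import Dict, List


def select_by_threshold(scores: Dict[str, float], threshold: float = 0.6) -> List[str]:
    """Keep candidates whose relevance is strictly greater than the threshold."""
    return [entity for entity, score in scores.items() if score > threshold]


def select_top_k(scores: Dict[str, float], k: int = 1) -> List[str]:
    """Keep the k candidates with the highest relevance."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [entity for entity, _ in ranked[:k]]


# Made-up relevance values for three candidate entities.
scores = {"migraine": 0.82, "hypertension": 0.41, "pneumonia": 0.67}
above_threshold = select_by_threshold(scores, threshold=0.6)
top_two = select_top_k(scores, k=2)
```

With these values both strategies keep "migraine" and "pneumonia"; in general the threshold variant returns a variable number of entities, while the top-k variant returns a fixed number.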

According to the embodiment of the disclosure, candidate entities are first recalled according to the feature information, and the information to be authenticated is then screened from the candidate entities according to the correlation degree between the feature information and each candidate entity. Compared with related-art methods that determine the information to be authenticated directly by statistics, this can greatly reduce redundant information to be authenticated and improve its accuracy, thereby improving authentication efficiency and user experience.

FIG. 3 is a schematic diagram of generating an entity thesaurus according to an embodiment of the present disclosure.

The present disclosure may, for example, generate an entity thesaurus in advance, such that candidate entities may be recalled from the entity thesaurus. The entity thesaurus can be generated based on a specific text in the target field to which the target entity belongs, which makes the generated entity thesaurus more authoritative and thus helps improve the accuracy of the recalled candidate entities and of the determined information to be authenticated.

For example, as shown in fig. 3, the embodiment 300 may first obtain a specific text 310 of the target field to which the target entity belongs from a text library, where the specific text 310 may include, for example, an authoritative book, an authoritative report text, or a text in an authoritative journal of the target field. After the specific text 310 is obtained, entities having the same characteristic information in the target field may be determined based on the specific text, and at least one entity group may be obtained. For example, a method similar to the method for determining the target entity and the feature information for the target entity may be adopted to perform knowledge mining on the specific text 310, obtain a plurality of entities in the target field and the feature information of each entity, and establish a mapping relationship between each entity and its feature information. By counting the mapping relationships, entities with the same characteristic information can be summarized into an entity group. For example, the entities 321, 322, … and 323 having the first characteristic information and the entities 322, … and 323 having the second characteristic information are obtained statistically; the entities 321 and 322 having the first characteristic information are grouped into the first entity group 331, the entities 322, … are grouped into the second entity group 332, …, and the entities 323 having the nth characteristic information are grouped into the nth entity group 333, where n is any value greater than or equal to 1. The text library may be, for example, any of various online libraries.

After obtaining at least one entity group, for each entity group, the entity words representing the entities in the group may be combined into an entity word group, with the feature information shared by the entities in the group as an index. For example, with the first feature information as an index, the entity words representing the entity 321 may constitute a first entity word group 341. With the second feature information as an index, the entity words representing the entity 322 may constitute a second entity word group 342. With the nth feature information as an index, the entity words representing the entity 323 may constitute an nth entity word group 343. Thus, at least one entity word group in one-to-one correspondence with the at least one entity group is obtained. The at least one entity word group may be combined into an entity thesaurus 350.

Based on the pre-generated entity thesaurus, when determining at least one entity having any one of the at least one feature information, an entity word group indexed by that feature information can be selected from the entity thesaurus, and the words in the selected entity word group are used as candidate entity words. The entities represented by the candidate entity words are taken as candidate entities.
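The construction and use of such an index can be sketched as an inverted index from feature information to entity words. The mined mapping below is a placeholder (the names `entity_321`, `feature_1` and so on merely echo the reference numerals), and the function names are hypothetical; the knowledge mining step that produces the mapping is assumed to have been done.

```python
from collections import defaultdict
from typing import Dict, List


def build_entity_thesaurus(entity_features: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert an entity -> features mapping into feature -> entity word group."""
    thesaurus: Dict[str, List[str]] = defaultdict(list)
    for entity, features in entity_features.items():
        for feature in features:
            if entity not in thesaurus[feature]:
                thesaurus[feature].append(entity)
    return dict(thesaurus)


def candidate_entity_words(thesaurus: Dict[str, List[str]], features: List[str]) -> List[str]:
    """Collect the words of every entity word group indexed by one of the features."""
    words: List[str] = []
    for feature in features:
        for word in thesaurus.get(feature, []):
            if word not in words:
                words.append(word)
    return words


# Placeholder mapping obtained by knowledge mining on the specific text.
mined = {
    "entity_321": ["feature_1"],
    "entity_322": ["feature_1", "feature_2"],
    "entity_323": ["feature_2", "feature_n"],
}
thesaurus = build_entity_thesaurus(mined)
words = candidate_entity_words(thesaurus, ["feature_1", "feature_n"])
```

Each key of `thesaurus` plays the role of an index, and each value is the entity word group for that index.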

In an embodiment, after obtaining the entities represented by the candidate entity words, the target entities may be removed from the obtained entities, that is, other entities except the target entity in the entities represented by the candidate entity words are determined, so as to obtain at least one entity serving as the candidate entity.

In one embodiment, in the medical field, the specific text may be, for example, a book on differential diagnosis, which takes each symptom as an index and lists the diseases that need to be differentiated for that symptom. In this embodiment, when the entity thesaurus is generated, the mapping relationship between symptoms and diseases can be obtained by performing named entity recognition on the index of the differential diagnosis book, so that groups of diseases to be differentiated that share the same symptom are generated according to the mapping relationship, and the entity thesaurus is formed from a plurality of such disease groups.

Compared with the related-art method of determining the relationship between features and entities by counting how many times they co-occur in historical data, the embodiment of the present disclosure can generate a more authoritative entity thesaurus, thereby improving the accuracy of the recalled candidate entities.

Fig. 4 is a schematic diagram of a principle of determining at least one characteristic information for a target entity according to an embodiment of the present disclosure.

The present disclosure may also generate a plurality of predetermined feature words based on a specific text of the target field to which the target entity belongs. When determining at least one piece of feature information for the target entity, the plurality of predetermined feature words may be used to replace words in the text to be processed that have the same meaning but are expressed differently, and the feature information is extracted from the text after replacement. In this way, the expression of the extracted feature information is more standardized, which helps improve the accuracy of the determined correlation between the feature information and the candidate entity.

For example, a specific text of a target field to which a target entity belongs may be obtained from a text library, and then by performing named entity recognition on the specific text, a feature word of the entity may be extracted from the specific text, and the extracted feature word may be used as a predetermined feature word. The specific text can comprise an authoritative book, an authoritative report text or a text in an authoritative periodical in the target field, so that the accuracy of the determined predetermined characteristic words is improved.

After obtaining the plurality of predetermined feature words, as shown in fig. 4, the embodiment 400 may, when determining at least one feature information for the target entity, convert the text 410 to be processed into structured data to obtain a plurality of word strings. A word string converted from a target field in the text to be processed 410 is selected from the plurality of word strings to obtain a target word string 420. A target feature word 440 matching a word in the target word string is then determined from the plurality of predetermined feature words 430, the word in the target word string that matches the target feature word is replaced with the target feature word 440, and the target word string after replacement is used as the at least one feature information 450 for the target entity.

Illustratively, the target field comprises other fields except for the field describing the target entity, so that the determined relevancy is not influenced by the description field of the target entity, and the accuracy of the determined relevancy is improved.

For example, any method of structuring text included in natural language processing techniques may be employed to convert the text to be processed into structured data. A string matching algorithm may be used to determine the matching relationship between the target string and the target feature word, for example, a string matching algorithm based on the edit distance may be used.

By the method of the embodiment, the feature information can be normalized, and a uniform semantic expression is achieved in the text vectorization process, which facilitates determining the candidate entities and improves the accuracy of the determined candidate entities and information to be authenticated. For example, in the medical field, different expressions of the same symptom, such as "head pain" and "headache", can be unified into "headache" by the method of this example.
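The replacement step can be sketched as follows. Note the swapped-in technique: instead of an edit-distance-based matcher, this sketch uses `difflib.get_close_matches` from the Python standard library, which ranks candidates by a sequence-matching ratio. The cutoff of 0.6, the function name and the word lists are assumptions for illustration.

```python
import difflib
from typing import List


def normalize_words(
    words: List[str],
    predetermined_feature_words: List[str],
    cutoff: float = 0.6,
) -> List[str]:
    """Replace each word with its closest predetermined feature word, if any."""
    normalized = []
    for word in words:
        matches = difflib.get_close_matches(
            word, predetermined_feature_words, n=1, cutoff=cutoff
        )
        # Keep the original word when nothing is similar enough.
        normalized.append(matches[0] if matches else word)
    return normalized


# Hypothetical predetermined feature words mined from the specific text.
standard_words = ["headache", "fever", "cough"]
result = normalize_words(["headaches", "feverish", "nausea"], standard_words)
```

Words with no sufficiently similar predetermined feature word are left unchanged, so unmatched content of the target word string is preserved.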

Fig. 5 is a schematic structural diagram of a correlation determination model according to an embodiment of the present disclosure.

As shown in fig. 5, in this embodiment, the correlation determination model 500 may include a first feature extraction layer 510, a second feature extraction layer 520, and a correlation prediction layer 530. The first feature extraction layer 510 and the second feature extraction layer 520 may be, for example, word embedding layers configured to perform word segmentation and word frequency statistics on input information to obtain a word frequency vector of the input information. Alternatively, the first feature extraction layer 510 and the second feature extraction layer 520 may convert the input information into a word vector using a prediction-based method. The correlation prediction layer 530 may predict the similarity between the word vector extracted by the first feature extraction layer 510 and the word vector extracted by the second feature extraction layer 520 based on, for example, cosine similarity, and take the similarity as the correlation between the two pieces of input information.

In an embodiment, in order to obtain the correlation via the correlation prediction layer 530, fully connected layers (FC) may be further disposed at the rear ends of the first feature extraction layer 510 and the second feature extraction layer 520, respectively, so that two word vectors input to the correlation prediction layer 530 are vectors with the same dimension.

Based on the relevance determination model 500, the aforementioned operation of determining the relevance between the candidate entity and the at least one feature information may obtain the first feature data 503 by using the at least one feature information 501 as an input of the first feature extraction layer 510, and obtain the second feature data 504 by using the entity word 502 representing the candidate entity as an input of the second feature extraction layer 520. Then, the first feature data 503 and the second feature data 504 are used as input of a relevance prediction layer 530, and a relevance 505 between at least one feature information 501 and an entity word 502 of a candidate entity is output through the relevance prediction layer 530, so as to serve as a relevance between the at least one feature information 501 and the candidate entity.
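The word-frequency variant of this structure can be sketched without any learned parameters. In the sketch below, whitespace splitting stands in for word segmentation, and sparse `Counter` objects are used as word frequency vectors; because they share an implicit vocabulary, no fully connected layer is needed to align dimensions. These choices and the function names are assumptions for illustration, not the model of the disclosure.

```python
import math
from collections import Counter
from typing import List


def word_frequency_vector(text: str) -> Counter:
    """Feature extraction: segment the text and count word frequencies."""
    return Counter(text.lower().split())


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Relevance prediction: cosine of the angle between two sparse vectors."""
    dot = sum(u[term] * v[term] for term in u.keys() & v.keys())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_relevance(feature_information: List[str], entity_word: str) -> float:
    """Run both extraction branches, then compare their outputs."""
    first_feature_data = word_frequency_vector(" ".join(feature_information))
    second_feature_data = word_frequency_vector(entity_word)
    return cosine_similarity(first_feature_data, second_feature_data)
```

For example, `predict_relevance(["severe headache", "mild fever"], "tension headache")` is nonzero only because the two inputs share the word "headache", which shows why a prediction-based word vector may be preferable when the entity word and the feature strings rarely share surface words.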

Fig. 6 is a schematic structural diagram of a correlation determination model according to another embodiment of the present disclosure.

According to an embodiment of the present disclosure, as shown in fig. 6, a relevance determination model 600 of the embodiment includes a first feature extraction layer, a second feature extraction layer 620, and a relevance prediction layer 630. The first feature extraction layer comprises a first extraction layer 611, a second extraction layer 612, and a fusion layer 613. The first extraction layer 611 takes the at least one feature information 601 as input and extracts feature data of the at least one feature information. The second extraction layer 612 takes as input key information 602 extracted from the at least one feature information 601 and extracts feature data of the key information 602. The fusion layer 613 is configured to fuse the feature data of the at least one feature information and the feature data of the key information to obtain the first feature data 604. The second feature extraction layer 620 and the correlation prediction layer 630 are similar to those described above and are not described again here. The first extraction layer 611 and the second extraction layer 612, like the second feature extraction layer 620, may be word embedding layers that perform word segmentation and word frequency statistics on the input information to obtain a word frequency vector of the input information. Since the feature information is generally a long string, performing feature extraction on the feature information alone tends to overlook some important information during semantic understanding of the context. By providing the two extraction layers 611 and 612, this embodiment addresses that problem and can therefore improve the accuracy of the obtained first feature data.

In an embodiment, in order to obtain the correlation degree via the correlation degree prediction layer 630, as shown in fig. 6, the correlation degree determination model 600 of this embodiment may set a full connection layer FC 640 at the rear end of the fusion layer 613, so that the feature data converted via the full connection layer FC 640 and the feature data extracted by the second feature extraction layer 620 are vectors with the same dimension. The fusion layer 613 may fuse the feature data of the at least one feature information and the feature data of the key information based on a concat function, for example. It is understood that the dimensions of the feature data and the method of fusing the feature data by the fusion layer 613 are merely examples to facilitate understanding of the present disclosure, and the present disclosure is not limited thereto.
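As a rough illustration of concat-based fusion followed by a full connection layer, the sketch below joins two vectors and projects the result to a chosen output dimension. The weights and bias are hypothetical placeholder values; in a trained model they would be learned parameters.

```python
def concat(first_vector, second_vector):
    """Fuse two feature vectors by concatenation."""
    return list(first_vector) + list(second_vector)


def fully_connected(vector, weights, bias):
    """Linear projection: one weight row and one bias value per output dimension."""
    if any(len(row) != len(vector) for row in weights):
        raise ValueError("each weight row must match the input dimension")
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


# Feature data of the feature information (2 dims) and of the key information (1 dim).
fused = concat([1.0, 2.0], [3.0])
# Project the 3-dim fused vector to 2 dims so it matches the other branch.
projected = fully_connected(fused, [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], [0.5, 0.0])
```

Concatenation grows the dimension, which is why a projection step is useful before the result is compared with the output of the second feature extraction layer.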

Based on the relevance determination model 600, the aforementioned operation of determining the relevance between the candidate entity and the at least one feature information may first perform named entity recognition on the at least one feature information to obtain key information of the at least one feature information. The named entity recognition in this embodiment may, for example, be configured with fewer information categories than are extracted in the process of converting text into structured data described above. For example, in the medical field, the embodiment may extract only symptom information, disease information, and the like from the at least one feature information.
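The category restriction can be sketched with a dictionary lookup standing in for a named entity recognizer. The lexicon, its category labels, and the function name are hypothetical; a real system would typically use a trained recognition model.

```python
# Hypothetical lexicon mapping words to entity categories.
LEXICON = {
    "fever": "symptom",
    "cough": "symptom",
    "pneumonia": "disease",
    "aspirin": "drug",
}

# Only these categories are kept as key information (assumption based on the medical example).
KEY_CATEGORIES = {"symptom", "disease"}


def extract_key_information(feature_information, lexicon=LEXICON, categories=KEY_CATEGORIES):
    """Return the words whose category is one of the selected key categories."""
    key_information = []
    for item in feature_information:
        for token in item.lower().replace(",", " ").split():
            if lexicon.get(token) in categories and token not in key_information:
                key_information.append(token)
    return key_information
```

For instance, `extract_key_information(["Fever and cough for three days", "took aspirin"])` keeps the two symptom words and drops the drug name.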

After obtaining the key information, the first sub-feature data may be obtained by using the at least one feature information 601 as an input of the first extraction layer 611. Meanwhile, the key information 602 is used as the input of the second extraction layer 612 to obtain second sub-feature data. The first sub-feature data and the second sub-feature data are input to the fusion layer 613, and processed by the fusion layer 613 to obtain first feature data. It is understood that, when the full connection layer FC 640 is provided, the first feature data 604 can be obtained after the feature data output by the fusion layer 613 is processed by the full connection layer FC 640.

Meanwhile, the entity words 603 representing the candidate entities may be used as input to the second feature extraction layer 620, obtaining second feature data 605. Then, the first feature data 604 and the second feature data 605 are used as input of a relevance prediction layer 630, and a relevance 606 between at least one feature information 601 and the entity word 603 of the candidate entity is output via the relevance prediction layer 630.

Fig. 7 is a schematic structural diagram of a correlation determination model according to yet another embodiment of the present disclosure.

According to an embodiment of the present disclosure, there may be a plurality of pieces of feature information. As shown in fig. 7, the correlation determination model 700 of this embodiment includes a first feature extraction layer, a second feature extraction layer 720, and a correlation prediction layer 730. The first feature extraction layer comprises a first extraction layer, a second extraction layer 713, and a fusion layer 715. The first extraction layer comprises a plurality of sub-extraction layers 711 to 712 and a sub-fusion layer 714. The plurality of sub-extraction layers 711 to 712 are similar to one another; each receives one of the plurality of pieces of feature information as input and extracts features from it. The plurality of sub-extraction layers 711 to 712 are similar to the second extraction layer 713, and the second extraction layer 713, the fusion layer 715, and the second feature extraction layer 720 are similar to those described above and will not be described herein again.

When the first sub-feature data is obtained, the plurality of pieces of feature information may be input to the plurality of sub-extraction layers 711 to 712, respectively, to obtain a plurality of feature data. The plurality of feature data may then be input to the sub-fusion layer 714, thereby obtaining first sub-feature data into which the plurality of feature data are fused. The relevance determination model 700 is configured in this way because the plurality of pieces of feature information are independent of one another: if they were spliced together and passed through a single extraction layer, that layer would needlessly model context across unrelated pieces of feature information, so that the extracted feature data could not accurately express each piece. Therefore, with the correlation determination model 700 of this embodiment, the accuracy of the determined first sub-feature data can be improved.

Illustratively, the plurality of feature information may include first to mth feature information 701 to 702, and accordingly, the number of sub extraction layers is m, and m is any integer greater than 1. The first feature information 701 to the mth feature information 702 are input to m sub-extraction layers, respectively, and a plurality of feature data can be extracted and obtained via the m sub-extraction layers. In order to facilitate the fusion layer 715 to fuse the input information, as shown in fig. 7, the relevance determination model 700 may further have a full connection layer FC 740 at the rear end of the sub-fusion layer 714, and then the multiple feature data are input to the sub-fusion layer 714, and are fused by the sub-fusion layer 714 and subjected to dimension conversion by the full connection layer FC 740 to obtain first sub-feature data.
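The per-piece extraction followed by sub-fusion can be sketched as follows. The sketch assumes word frequency vectors as the extracted feature data and concatenation as the sub-fusion operation, and it omits the full connection layer; all names are hypothetical.

```python
from collections import Counter


def make_sub_extraction_layer(vocabulary):
    """Build one sub-extraction layer; each layer handles one piece of feature information."""
    def layer(feature_information):
        counts = Counter(feature_information.lower().split())
        return [float(counts[word]) for word in vocabulary]
    return layer


def sub_fusion(feature_data_list):
    """Fuse the m feature vectors by concatenation (assumed fusion method)."""
    fused = []
    for feature_data in feature_data_list:
        fused.extend(feature_data)
    return fused


def first_sub_feature_data(feature_information_list, sub_layers):
    """Route the i-th piece of feature information to the i-th sub-extraction layer."""
    if len(feature_information_list) != len(sub_layers):
        raise ValueError("need exactly one sub-extraction layer per piece of feature information")
    feature_data = [layer(info) for layer, info in zip(sub_layers, feature_information_list)]
    return sub_fusion(feature_data)
```

Because each piece is processed separately, a word appearing in one piece never influences the vector extracted for another piece, which is the independence property the embodiment relies on.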

Illustratively, as shown in fig. 7, a full connection layer FC 750 may be further disposed at the rear end of the second extraction layer 713, so that the sub-feature data extracted from the key information 703 has the same dimension as the first sub-feature data. In an embodiment, the correlation determination model 700 may further set a Sequence Pooling Layer (Sequence Pooling Layer)770 between the second extraction Layer 713 and the full connection Layer FC 750, so that information having a large influence on the correlation may be screened from the key information through the processing of the Sequence Pooling Layer 770. The key information 703 is input to the second extraction layer 713, feature extraction is performed through the second extraction layer 713, and second sub-feature data can be obtained through processing of the sequence pooling layer 770 and the full connection layer FC 750. The first sub-feature data and the second sub-feature data are used as input of a fusion layer 715, and the first feature data 705 can be obtained after the fusion layer 715.
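Sequence pooling reduces a variable-length sequence of token vectors to a single fixed-length vector. The sketch below assumes max pooling, which keeps the strongest response per dimension; the disclosure does not state which pooling type is used, so this choice and the function name are assumptions.

```python
def sequence_max_pool(token_vectors):
    """Collapse a sequence of equal-length token vectors into one vector by per-dimension max."""
    if not token_vectors:
        raise ValueError("cannot pool an empty sequence")
    # zip(*...) groups the values of each dimension across all tokens.
    return [max(column) for column in zip(*token_vectors)]


# Three tokens of key information, each with a 2-dim feature vector.
pooled = sequence_max_pool([[0.1, 0.9], [0.5, 0.2], [0.3, 0.4]])
```

The pooled vector has the same length regardless of how many tokens the key information contains, which makes the subsequent full connection layer straightforward to apply.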

Illustratively, as shown in fig. 7, the relevance determination model 700 may set a full connection layer FC 760 after the second feature extraction layer 720 so that feature data extracted from entity words of candidate entities has the same dimensionality as the first feature data 705. In an embodiment, as shown in fig. 7, the relevance determination model 700 may further provide a self-attention layer 780 between the second feature extraction layer 720 and the full connection layer FC 760, so as to learn the association relationship between words in the entity words of the candidate entity. The entity words 704 of the candidate entities serve as input of the second feature extraction layer 720, feature extraction is performed via the second feature extraction layer 720, and the second feature data 706 can be obtained via processing by the self-attention layer 780 and the full connection layer FC 760.
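To illustrate how a self-attention layer relates the words of an entity word to one another, the sketch below implements scaled dot-product self-attention with identity projections (queries, keys, and values are all the input vectors). A trained layer would use learned projection matrices; the simplification and the function names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention(token_vectors):
    """Each output vector is a weighted average of all token vectors in the sequence."""
    dim = len(token_vectors[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in token_vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in token_vectors
        ]
        weights = softmax(scores)
        outputs.append([
            sum(weight * vector[i] for weight, vector in zip(weights, token_vectors))
            for i in range(dim)
        ])
    return outputs
```

Each token's output mixes in information from the other tokens in proportion to their similarity, which is one way to capture the association between the words that make up an entity word.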

The first feature data 705 and the second feature data 706 may be used as the input of the relevance prediction layer 730 to obtain the relevance 707 between the m pieces of feature information and the entity words of the candidate entity.

In an embodiment, during training of the correlation determination model 700, Dropout layers may further be disposed between the second extraction layer 713 and the sequence pooling layer 770 and between the second feature extraction layer 720 and the self-attention layer 780, respectively. A Dropout layer temporarily discards a part of the neural network units from the network according to a certain probability, which is equivalent to finding a thinner network within the original network. This helps prevent overfitting during model training and improves the accuracy of the correlation determination model 700.
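The effect of a Dropout layer can be sketched in a few lines. The sketch assumes the common "inverted dropout" variant, in which kept units are rescaled during training so that no adjustment is needed at inference; the disclosure does not specify the variant, and the function name and parameters are hypothetical.

```python
import random


def dropout(vector, drop_probability, training, rng=random):
    """Zero each unit with the given probability during training; pass through otherwise."""
    if not 0.0 <= drop_probability < 1.0:
        raise ValueError("drop_probability must be in [0, 1)")
    if not training or drop_probability == 0.0:
        return list(vector)
    keep_probability = 1.0 - drop_probability
    # Rescale kept units so the expected value of each unit is unchanged (inverted dropout).
    return [
        value / keep_probability if rng.random() >= drop_probability else 0.0
        for value in vector
    ]
```

Passing a seeded `random.Random` instance as `rng` makes the dropped units reproducible, which is convenient when debugging a training run.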

Based on the above method for determining authentication information, the present disclosure also provides an apparatus for determining authentication information, which will be described in detail below with reference to fig. 8.

Fig. 8 is a block diagram of a structure of an apparatus for determining authentication information according to an embodiment of the present disclosure.

As shown in fig. 8, the apparatus 800 for determining authentication information according to this embodiment may include a feature information determining module 810, a candidate entity determining module 820, a relevancy determining module 830, and an information to be authenticated determining module 840.

The characteristic information determination module 810 is configured to determine a target entity described by the text to be processed and at least one characteristic information for the target entity. In an embodiment, the characteristic information determining module 810 may be configured to perform the operation S210 described above, which is not described herein again.

The candidate entity determining module 820 is configured to determine at least one entity having any one of the at least one feature information as a candidate entity. In an embodiment, the candidate entity determining module 820 may be configured to perform the operation S220 described above, which is not described herein again.

The relevance determination module 830 is configured to determine a relevance between the candidate entity and the at least one feature information based on the relevance determination model. In an embodiment, the relevancy determining module 830 may be configured to perform the operation S230 described above, and is not described herein again.

The to-be-authenticated information determining module 840 is configured to determine, as to-be-authenticated information for the target entity, an entity related to at least one piece of feature information in the candidate entities based on the correlation degree. In an embodiment, the to-be-authenticated information determining module 840 may be configured to perform the operation S240 described above, which is not described herein again.
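The last step of the apparatus can be illustrated with a short sketch. It assumes that a candidate entity counts as "related" when its relevance reaches a threshold, and that the relevance determination model is exposed as a callable; the threshold value, the ordering by relevance, and all names are assumptions rather than details taken from the disclosure.

```python
def select_information_to_authenticate(candidates, feature_information, relevance_fn, threshold=0.5):
    """Keep the candidate entities whose relevance reaches the threshold, most relevant first."""
    scored = [
        (relevance_fn(candidate, feature_information), candidate)
        for candidate in candidates
    ]
    related = [(score, candidate) for score, candidate in scored if score >= threshold]
    related.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in related]
```

Here `relevance_fn` plays the role of the relevancy determining module 830, and the returned list corresponds to the information to be authenticated produced by module 840.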

According to an embodiment of the present disclosure, the candidate entity determining module 820 may include a word selection sub-module and an entity obtaining sub-module. The word selection sub-module is used for selecting, from the entity thesaurus, the words in the entity phrases indexed by any one of the at least one feature information as candidate entity words. The entity obtaining sub-module is used for determining the entities other than the target entity among the entities represented by the candidate entity words, so as to obtain the at least one entity.
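A minimal sketch of the two sub-modules, assuming the entity thesaurus is held as a dictionary that maps each feature information (the index) to the list of entity words sharing it. The data layout and names are hypothetical.

```python
def select_candidate_entities(entity_thesaurus, feature_information, target_entity):
    """Collect entity words indexed by any feature information, excluding the target entity."""
    candidates = []
    for info in feature_information:
        # Word selection: look up the entity phrase indexed by this feature information.
        for entity_word in entity_thesaurus.get(info, []):
            # Entity obtaining: drop the target entity itself and avoid duplicates.
            if entity_word != target_entity and entity_word not in candidates:
                candidates.append(entity_word)
    return candidates


thesaurus = {
    "fever": ["influenza", "pneumonia"],
    "cough": ["pneumonia", "bronchitis"],
}
candidates = select_candidate_entities(thesaurus, ["fever", "cough", "rash"], "influenza")
```

Feature information that has no entry in the thesaurus, such as "rash" in the example, simply contributes no candidates.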

According to an embodiment of the present disclosure, the apparatus 800 for determining authentication information may further include an entity thesaurus generating module, configured to generate the entity thesaurus. The entity thesaurus generating module comprises an entity group obtaining sub-module and a thesaurus obtaining sub-module. The entity group obtaining sub-module is used for determining entities with the same feature information in the target domain based on the specific text of the target domain to which the target entity belongs, so as to obtain at least one entity group. The thesaurus obtaining sub-module is used for, for each entity group in the at least one entity group, forming the entity words representing the entities in the entity group into an entity phrase, with the feature information shared by the entities in the entity group as an index, so as to obtain an entity thesaurus formed by at least one entity phrase.
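Generating the thesaurus amounts to building an inverted index from feature information to entity words. The sketch assumes that the feature information of each entity has already been extracted from the domain text into a dictionary; the input layout and function name are hypothetical.

```python
def build_entity_thesaurus(entity_features):
    """Group entity words by shared feature information, using the feature as the index."""
    thesaurus = {}
    for entity_word, features in entity_features.items():
        for feature in features:
            entity_phrase = thesaurus.setdefault(feature, [])
            if entity_word not in entity_phrase:
                entity_phrase.append(entity_word)
    return thesaurus


thesaurus = build_entity_thesaurus({
    "influenza": ["fever", "cough"],
    "pneumonia": ["fever"],
})
```

In the example, "fever" indexes an entity phrase containing both entity words, while "cough" indexes a phrase containing only "influenza".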

According to an embodiment of the present disclosure, the feature information determination module may include a structuring sub-module, a feature word determination sub-module, and a replacement sub-module. The structuring sub-module is used for converting the text to be processed into structured data to obtain a target character string of the target field. The feature word determination sub-module is used for determining, among a plurality of predetermined feature words, a target feature word that matches a word in the target character string. The replacement sub-module is used for replacing the word in the target character string that matches the target feature word with the target feature word, so as to obtain at least one piece of feature information for the target entity. The plurality of predetermined feature words are generated based on a unique text of a target domain to which the target entity belongs.
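The matching and replacement steps can be sketched as a normalization pass over the target character string. The sketch assumes that "matching" is given by a table mapping surface variants to predetermined feature words; the table contents and the function name are hypothetical, and the disclosure may use a different matching criterion.

```python
def normalize_feature_information(target_string, variants_to_feature_word):
    """Replace each matched variant in the target string with its predetermined feature word."""
    normalized = target_string
    # Replace longer variants first so that a short variant does not break up a longer one.
    for variant in sorted(variants_to_feature_word, key=len, reverse=True):
        normalized = normalized.replace(variant, variants_to_feature_word[variant])
    return normalized


normalized = normalize_feature_information(
    "high temperature and coughing",
    {"high temperature": "fever", "coughing": "cough"},
)
```

Normalizing to a fixed set of feature words means the result can be used directly as an index into an entity thesaurus built from the same feature words.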

According to an embodiment of the present disclosure, the target field includes other fields than a field describing the target entity.

According to an embodiment of the present disclosure, a relevance determination model includes a first feature extraction layer, a second feature extraction layer, and a relevance prediction layer. The correlation determination module is specifically configured to: taking at least one piece of feature information as input of a first feature extraction layer to obtain first feature data; taking entity words representing candidate entities as input of a second feature extraction layer to obtain second feature data; and obtaining the correlation degree by taking the first characteristic data and the second characteristic data as the input of a correlation degree prediction layer.

According to an embodiment of the present disclosure, the first feature extraction layer includes a first extraction layer, a second extraction layer, and a fusion layer. The relevancy determination module is used for obtaining first characteristic data by the following modes: carrying out named entity identification on at least one piece of characteristic information to obtain key information of the at least one piece of characteristic information; taking at least one piece of feature information as input of a first extraction layer to obtain first sub-feature data; obtaining second sub-feature data by taking the key information as the input of a second extraction layer; and obtaining the first characteristic data by taking the first sub-characteristic data and the second sub-characteristic data as the input of the fusion layer.

According to an embodiment of the present disclosure, the first extraction layer includes a plurality of sub-extraction layers and a sub-fusion layer, and the at least one feature information is plural. The relevancy determination module is used for obtaining first sub-characteristic data by the following modes: respectively inputting the plurality of feature information into a plurality of sub-extraction layers to obtain a plurality of feature data; and taking the plurality of feature data as the input of the sub-fusion layer to obtain first sub-feature data.

It should be noted that, in the technical solution of the present disclosure, the acquisition, storage, application, and the like of the personal information of the related users all comply with the relevant laws and regulations and do not violate public order and good customs.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

Fig. 9 illustrates a schematic block diagram of an example electronic device 900 that may be used to implement the method of determining authentication information of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 9, the apparatus 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM)902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The calculation unit 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.

A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 901 performs the respective methods and processes described above, such as the method of determining authentication information. For example, in some embodiments, the method of determining authentication information may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 900 via ROM 902 and/or communications unit 909. When the computer program is loaded into RAM 903 and executed by computing unit 901, one or more steps of the above-described method of determining authentication information may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the method of determining authentication information by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server may be a cloud Server, which is also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service extensibility in a traditional physical host and a VPS service ("Virtual Private Server", or simply "VPS"). The server may also be a server of a distributed system, or a server incorporating a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method of determining authentication information, comprising:

determining a target entity described by a text to be processed and at least one characteristic information for the target entity;

determining at least one entity with any one of the at least one characteristic information as a candidate entity;

determining a degree of correlation between the candidate entity and the at least one feature information based on a degree of correlation determination model; and

determining, based on the degree of correlation, an entity among the candidate entities that is related to the at least one characteristic information as information to be authenticated for the target entity.

2. The method of claim 1, wherein determining at least one entity having any of the at least one characteristic information comprises:

selecting, from an entity thesaurus, words in entity phrases indexed by any one of the at least one characteristic information as candidate entity words; and

determining entities other than the target entity among the entities represented by the candidate entity words to obtain the at least one entity.

3. The method of claim 2, further comprising generating the entity thesaurus based on:

determining entities with the same characteristic information in a target domain based on a unique text of the target domain to which the target entity belongs, and obtaining at least one entity group; and

for each of the at least one entity group, forming the entity words representing the entities in the entity group into an entity phrase, with the characteristic information shared by the entities in the entity group as an index, to obtain an entity thesaurus formed by at least one entity phrase.

4. The method of claim 1, wherein determining at least one characteristic information for the target entity comprises:

obtaining a target character string of a target field by converting the text to be processed into structured data;

determining a target characteristic word matched with a word in the target character string in a plurality of preset characteristic words; and

replacing the words matched with the target characteristic words in the target character string with the target characteristic words to obtain the at least one characteristic information for the target entity,

wherein the plurality of predetermined feature words are generated based on a unique text of a target domain to which the target entity belongs.

5. The method of claim 4, wherein the target field comprises a field other than a field describing the target entity.

6. The method of claim 1, wherein the relevance determination model comprises a first feature extraction layer, a second feature extraction layer, and a relevance prediction layer; determining a degree of correlation between the candidate entity and the at least one feature information comprises:

taking the at least one feature information as an input of the first feature extraction layer to obtain first feature data;

taking entity words representing the candidate entities as input of the second feature extraction layer to obtain second feature data; and

obtaining the correlation degree by taking the first feature data and the second feature data as the input of the correlation degree prediction layer.

7. The method of claim 6, wherein the first feature extraction layer comprises a first extraction layer, a second extraction layer, and a fusion layer; obtaining the first characteristic data includes:

carrying out named entity identification on the at least one characteristic information to obtain key information of the at least one characteristic information;

taking the at least one feature information as an input of the first extraction layer to obtain first sub-feature data;

obtaining second sub-feature data by taking the key information as the input of the second extraction layer; and

obtaining the first feature data by taking the first sub-feature data and the second sub-feature data as the input of the fusion layer.

8. The method of claim 7, wherein the first extraction layer comprises a plurality of sub-extraction layers and sub-fusion layers; the at least one piece of feature information is multiple, and obtaining the first sub-feature data comprises:

respectively inputting a plurality of feature information into the plurality of sub-extraction layers to obtain a plurality of feature data; and

taking the plurality of feature data as the input of the sub-fusion layer to obtain the first sub-feature data.

9. An apparatus for determining authentication information, comprising:

a characteristic information determining module, configured to determine a target entity described by a text to be processed and at least one characteristic information for the target entity;

a candidate entity determining module, configured to determine at least one entity having any feature information of the at least one feature information as a candidate entity;

a relevance determination module for determining a relevance between the candidate entity and the at least one feature information based on a relevance determination model; and

an information to be authenticated determining module, configured to determine, based on the relevance, an entity among the candidate entities that is related to the at least one feature information as information to be authenticated for the target entity.

10. The apparatus of claim 9, wherein the candidate entity determination module comprises:

a word selection sub-module, configured to select, from an entity thesaurus, words in entity phrases indexed by any one of the at least one feature information as candidate entity words; and

an entity obtaining sub-module, configured to determine entities other than the target entity among the entities represented by the candidate entity words to obtain the at least one entity.

11. The apparatus of claim 10, further comprising an entity thesaurus generation module to generate the entity thesaurus; the entity word stock generation module comprises:

an entity group obtaining sub-module, configured to determine, based on a unique text of a target domain to which the target entity belongs, entities having the same feature information in the target domain, and obtain at least one entity group; and

a thesaurus obtaining sub-module, configured to, for each entity group in the at least one entity group, form the entity words representing the entities in the entity group into an entity phrase, with the feature information shared by the entities in the entity group as an index, to obtain an entity thesaurus formed by at least one entity phrase.

12. The apparatus of claim 9, wherein the feature information determination module comprises:

a structuring sub-module, configured to convert the text to be processed into structured data to obtain a target character string of a target field;

a feature word determining submodule, configured to determine, among a plurality of preset feature words, a target feature word that matches a word in the target character string; and

a replacing submodule, configured to replace the word in the target character string that matches the target feature word with the target feature word, to obtain the at least one piece of feature information for the target entity.
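The matching and replacing steps of claim 12 can be sketched as a normalization pass over the words of the target field. The claims do not specify how a word is matched to a preset feature word; the sketch assumes a hand-written table of surface variants and case-insensitive exact matching, and the function name and sample words are hypothetical.

```python
from typing import Dict, List, Set


def normalize_feature_words(
    target_words: List[str],
    preset_feature_words: Dict[str, Set[str]],
) -> List[str]:
    """Replace each matched word with its preset feature word; keep others as-is."""
    # Assumption: matching is a case-insensitive lookup in a variant table.
    variant_to_feature = {
        variant.lower(): feature
        for feature, variants in preset_feature_words.items()
        for variant in variants | {feature}
    }
    normalized: List[str] = []
    for word in target_words:
        feature = variant_to_feature.get(word.lower(), word)
        if feature not in normalized:  # drop duplicates, keep first-seen order
            normalized.append(feature)
    return normalized


# Hypothetical preset feature words with their surface variants.
preset = {"fever": {"high temperature", "pyrexia"}, "cough": {"coughing"}}
words = ["High temperature", "coughing", "fatigue", "pyrexia"]
normalized = normalize_feature_words(words, preset)
print(normalized)
```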

13. The apparatus of claim 12, wherein the target field comprises a field other than a field describing the target entity.

14. The apparatus of claim 9, wherein the relevance determination model comprises a first feature extraction layer, a second feature extraction layer, and a relevance prediction layer; the relevance determination module is specifically configured to:

15. The apparatus of claim 14, wherein the first feature extraction layer comprises a first extraction layer, a second extraction layer, and a fusion layer; the relevance determination module is configured to obtain the first feature data by:

16. The apparatus of claim 15, wherein the first extraction layer comprises a plurality of sub-extraction layers and a sub-fusion layer; the at least one piece of feature information comprises a plurality of pieces of feature information; the relevance determination module is configured to obtain the first sub-feature data by:

17. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.

18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8.

19. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 8.