CN117909764A

CN117909764A - Information matching method, device, equipment, medium and program product

Info

Publication number: CN117909764A
Application number: CN202410256821.5A
Authority: CN
Inventors: 朱雯君
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2024-03-06
Filing date: 2024-03-06
Publication date: 2024-04-19

Abstract

The disclosure provides an information matching method, an information matching device, information matching equipment, a storage medium and a program product, which can be applied to the technical field of artificial intelligence. The information matching method comprises the following steps: preprocessing the user information to obtain a first information set, wherein the first information set comprises a plurality of first information capable of reflecting user identity information; extracting target entities from a plurality of texts to obtain a plurality of second information sets, wherein each second information set comprises a plurality of target entities, and the target entities extracted from the same text belong to the same second information set; respectively calculating the matching degree of the first information set and the plurality of second information sets to obtain a matching value of each second information set; a target text that matches the user is determined based on the matching value.

Description

Information matching method, device, equipment, medium and program product

Technical Field

The present disclosure relates to the field of artificial intelligence, and more particularly, to an information matching method, apparatus, device, medium, and program product.

Background

When a user applies for financial services, the financial institution needs to assess the user's risk; one way to do so is to check the user's litigation (complaint) records.

Such a check is usually performed by matching the user's information against currently published legal documents to determine whether a party named in a document is the user. However, to protect personal privacy, party information in published legal documents is usually encrypted (masked), so little information is available and matching it against the user's information is difficult.

Disclosure of Invention

In view of the foregoing, the present disclosure provides an information matching method, apparatus, device, medium, and program product that improve the accuracy of information matching.

According to a first aspect of the present disclosure, there is provided an information matching method, including: preprocessing the user information to obtain a first information set, wherein the first information set comprises a plurality of first information capable of reflecting the user identity; extracting target entities from a plurality of texts to obtain a plurality of second information sets, wherein each second information set comprises a plurality of target entities, and the target entities extracted from the same text belong to the same second information set; respectively calculating the matching degree of the first information set and the plurality of second information sets to obtain a matching value of each second information set; a target text that matches the user is determined based on the matching value.

According to an embodiment of the present disclosure, performing a preprocessing operation on the user information to obtain a first information set includes: decomposing the user information based on a preset rule to obtain a plurality of pieces of sub-information; and obtaining the first information of the user based on the sub-information, and constructing the first information set.

According to an embodiment of the present disclosure, extracting target entities from a plurality of texts to obtain a plurality of second information sets includes performing the following for each text: converting the text into a text sequence; extracting text features from the text sequence, and labeling entities in the text sequence based on the text features to obtain a plurality of labeled entities; and extracting the target entities from the plurality of labeled entities to obtain the second information set of the text.

According to an embodiment of the present disclosure, the text features include sentence features and word features, and labeling entities in the text sequence based on the text features to obtain a plurality of labeled entities includes: determining word relationships in a sentence based on the sentence features; and identifying the named entities in the sentence based on the word relationships and the word features, and labeling the types of the named entities to obtain a plurality of labeled entities.

According to an embodiment of the present disclosure, extracting target entities from a plurality of texts to obtain a plurality of second information sets includes: extracting a plurality of target entities from the labeled entities based on the first information; and constructing a second information set based on the plurality of target entities.

According to an embodiment of the present disclosure, calculating matching degrees of a first information set and a plurality of second information sets, respectively, includes: the following operations are performed on each second information set: constructing a data pair based on the data type, wherein the data pair comprises first information and second information with the same data type; performing similarity comparison on the first information and the second information in the data pair to obtain a similarity value of the second information; and carrying out weighted summation on the similarity values of the second information to obtain the matching degree of the second information set, wherein the weights corresponding to the second information of different types are different.

According to an embodiment of the present disclosure, determining a target text corresponding to the user based on the matching degree includes: acquiring the text corresponding to a second information set whose matching degree is greater than a preset threshold; filling the first information into the encrypted (masked) positions of the text; and, if the filling succeeds, determining the text as the target text.

According to an embodiment of the present disclosure, the method further includes: sending the user a request for permission to acquire the user information; and, if the user agrees to the request, acquiring the user information and performing the first preprocessing operation on it.
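The filling check can be sketched as follows. This is a minimal illustration, assuming masked characters are written as `*` or lowercase `x` (the disclosure mentions digits replaced by 'x') and that filling counts as successful when every unmasked character already agrees with the user's value. The function name `fill_masked` and the mask set are hypothetical, not the disclosure's implementation.

```python
from typing import Optional

# Assumed mask characters. Uppercase "X" is excluded because it can be a
# genuine check digit at the end of a resident ID number.
MASK_CHARS = frozenset("*x")


def fill_masked(masked: str, candidate: str, mask_chars=MASK_CHARS) -> Optional[str]:
    """Fill the masked positions of `masked` with the user's `candidate` value.

    Returns the filled string if every unmasked character agrees with the
    candidate at the same position; returns None if filling fails.
    """
    if len(masked) != len(candidate):
        return None
    filled = []
    for shown, actual in zip(masked, candidate):
        if shown in mask_chars:
            filled.append(actual)  # masked slot: take the user's character
        elif shown != actual:
            return None  # a visible character contradicts the user's value
        else:
            filled.append(shown)
    return "".join(filled)
```

For example, `fill_masked("1101011990xxxx1234", "110101199001011234")` succeeds and returns the full number, while changing any visible digit makes it return `None`.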

A second aspect of the present disclosure provides an information matching apparatus, including: the preprocessing module is used for executing preprocessing operation on the user information to obtain a first information set, wherein the first information set comprises a plurality of first information capable of reflecting the user identity; the extraction module is used for extracting target entities from a plurality of texts to obtain a plurality of second information sets, wherein each second information set comprises a plurality of target entities, and the extracted target entities in the same text belong to the same second information set; the matching module is used for respectively calculating the matching degree of the first information set and the plurality of second information sets to obtain a matching value of each second information set; and a determining module for determining a target text matching the user based on the matching value.

A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the information matching method described above.

A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described information matching method.

The fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described information matching method.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an application scenario diagram of an information matching method, apparatus, medium, and program product according to an embodiment of the present disclosure;

FIG. 2 schematically illustrates a flow chart of an information matching method according to an embodiment of the disclosure;

FIG. 3 schematically illustrates a flow chart of extracting structured information to obtain a plurality of second information sets, according to an embodiment of the disclosure;

FIG. 4 schematically illustrates a flow diagram for deriving a plurality of annotated entities based on textual features, according to an embodiment of the disclosure;

FIG. 5 schematically illustrates a flow diagram for building a second information set based on annotated entities, according to an embodiment of the disclosure;

FIG. 6 schematically illustrates a flow chart for calculating the degree of matching of first information to each second information set according to an embodiment of the disclosure;

FIG. 7 schematically illustrates a flow chart for determining target text corresponding to a user based on a degree of matching in accordance with an embodiment of the present disclosure;

FIG. 8 schematically shows a block diagram of a structure of an information matching apparatus according to an embodiment of the present disclosure; and

FIG. 9 schematically illustrates a block diagram of an electronic device adapted to implement an information matching method according to an embodiment of the disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.

Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a convention should be interpreted in the sense in which one of skill in the art would generally understand it (e.g., "a system having at least one of A, B and C" would include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).

In the technical solution of the present disclosure, the user information involved (including but not limited to the user's personal information, image information, and device information, such as location information) and data (including but not limited to data used for analysis, stored data, and displayed data) are information and data authorized by the user or fully authorized by all parties. The collection, storage, use, processing, transmission, provision, disclosure, and application of such data all comply with relevant laws, regulations, and standards; necessary security measures are adopted; public order and good customs are not violated; and corresponding operation entries are provided for the user to grant or refuse authorization.

The embodiments of the present disclosure provide an information matching method, apparatus, medium, and program product, and before introducing the technical solutions provided by the embodiments of the present disclosure, description is given of related technologies related to the present disclosure.

When a financial institution handles business for a user, it first needs to check the user's personal risk. One way is to query the user's litigation records, which indirectly reflect the user's credit standing, reputation, and disputes, and thus effectively improve the accuracy and comprehensiveness of the user's risk assessment.

A common approach is to determine the user's litigation involvement from published legal documents. However, to protect citizens' privacy, personal information in legal documents is generally presented in masked form. For example, in some published documents part of the digits in a party's identity card number are replaced by 'x', or only information such as the party's month of birth, household registration, and gender is disclosed. The information that can be extracted from legal documents is therefore very limited, it often cannot be determined whether a party is the target user, and querying the user's litigation records becomes difficult.

The embodiment of the disclosure provides an information matching method, which comprises the following steps: preprocessing the user information to obtain a first information set, wherein the first information set comprises a plurality of first information capable of reflecting the user identity; extracting target entities from a plurality of texts to obtain a plurality of second information sets, wherein each second information set comprises a plurality of target entities, and the target entities extracted from the same text belong to the same second information set; respectively calculating the matching degree of the first information set and the plurality of second information sets to obtain a matching value of each second information set; and determining target text corresponding to the user based on the matching value.

In the embodiments of the present disclosure, the first preprocessing operation converts the user information into a plurality of pieces of information that may be disclosed in legal documents. This improves the accuracy of matching the user information against legal documents and, in turn, the accuracy of querying the user's litigation records.

Fig. 1 schematically illustrates an application scenario diagram of an information matching method, apparatus, medium and program product according to an embodiment of the present disclosure.

As shown in fig. 1, an application scenario 100 according to this embodiment may include a terminal device 101, a terminal device 102, a terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide communication links between the terminal device 101, the terminal device 102, the terminal device 103, and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the terminal device 101, the terminal device 102, the terminal device 103, to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on terminal devices 101, 102, 103.

Terminal device 101, terminal device 102, terminal device 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by the user using the terminal device 101, the terminal device 102, the terminal device 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.

It should be noted that, the information matching method provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the information matching apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The information matching method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal device 101, the terminal device 102, the terminal device 103, and/or the server 105. Accordingly, the information matching apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal device 101, the terminal device 102, the terminal device 103, and/or the server 105.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

The information matching method of the disclosed embodiment will be described in detail below with reference to fig. 2 to 7 based on the scenario described in fig. 1.

Fig. 2 schematically illustrates a flow chart of an information matching method according to an embodiment of the present disclosure.

As shown in fig. 2, the information matching method of this embodiment includes operations S210 to S240.

In operation S210, a first preprocessing operation is performed on user information, resulting in a first information set.

In some embodiments, the user information may be personal information reserved by the user when submitting the business application, and may include, for example, the name, the identification card number, and the contact information of the user. The user information can be further disassembled to obtain more characteristic information which can be used for reflecting the user identity information from the limited user information, so that the quality and the quantity of the first information are improved, and the matching between the information is completed better.

Specifically, performing the preprocessing operation on the user information includes: disassembling the user information based on a preset rule to obtain a plurality of sub-information; and obtaining the first information of the user based on the sub information, and constructing a first information set.

In a specific implementation, digits at different positions of the identity card number or the contact number carry different meanings, so decomposing these numbers yields more personal information about the user. For example, in an 18-digit resident identity card number, the first six digits are the administrative division code of the user's native place (province, city, and county), digits 7 to 14 are the user's date of birth, and the 17th digit (the second-to-last) indicates gender, odd for male and even for female; the last digit is a check digit. The first seven digits of a mobile phone number identify the region to which the number belongs. The first information of the user is obtained by decomposing and identifying the user information in this way.

It should be noted that, in the embodiments of the present disclosure, the user information is acquired only with the user's consent. Before operation S210 is performed, a request for acquiring the user information is sent to the user; if the user agrees to or authorizes the acquisition, the user information is acquired and operation S210 is performed.
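A minimal sketch of this decomposition follows. It assumes an 18-digit resident identity card number and an 11-digit mobile number; `decompose_user_info` and its field names are hypothetical, and mapping a region code or number segment to a place name would need an external lookup table that is not shown.

```python
from datetime import date


def decompose_user_info(id_number: str, mobile: str) -> dict:
    """Split raw user identifiers into sub-information (hypothetical field names)."""
    if len(id_number) != 18 or len(mobile) != 11:
        raise ValueError("expected an 18-character ID number and an 11-digit mobile number")
    # Digits 7-14 encode the birth date as YYYYMMDD; date() rejects impossible dates.
    birth = date(int(id_number[6:10]), int(id_number[10:12]), int(id_number[12:14]))
    return {
        "region_code": id_number[:6],    # administrative division code
        "province_code": id_number[:2],  # first two digits: province level
        "birth_date": birth.isoformat(),
        "birth_year_month": birth.strftime("%Y-%m"),
        # The 17th digit (index 16) is odd for male, even for female.
        "gender": "male" if int(id_number[16]) % 2 == 1 else "female",
        "mobile_segment": mobile[:7],    # identifies the number's home region
    }
```

With the synthetic number `110101199001011234`, this yields region code `110101`, birth date `1990-01-01`, and gender `male`; the check digit is not validated in this sketch.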

In operation S220, the structured information is extracted from the plurality of texts to obtain a plurality of second information sets, where each second information set includes a plurality of structured information, and the structured information extracted from the same text belongs to the same second information set.

In some embodiments, the texts are unstructured data obtained from public channels and may include, for example, court judgment documents, enforcement cases, and lists of dishonest judgment debtors. Because the texts are unstructured and different types of texts have different formats, matching them with a single unified algorithm is unsuitable; each text must be processed accordingly to extract the required structured information before the subsequent information matching operation.

In the implementation process, the extracted structured information is classified and stored based on the texts, and the structured information extracted from the same text is stored in the same second information set, so that the matching efficiency between the user information and the texts is improved.

In operation S230, the matching degree between the first information set and each second information set is calculated, so as to obtain a matching value of each second information set.

In some embodiments, the second information set contains features of the party in the text, and the first information set contains features of the current user; the matching degree between the second information set and the user information is determined by calculating a similarity value between each feature of the user and the corresponding feature of the party.

In operation S240, a target text matching the user is determined based on the matching value.

In some embodiments, the higher the matching value of a second information set, the more similar the party in the corresponding text is to the current user, that is, the greater the probability that the party is the current user. Second information sets whose matching values exceed a preset threshold are selected, and the texts corresponding to them are determined as target texts.

According to the embodiments of the present disclosure, decomposing the user information yields more user information, which increases the number of features available for user matching, effectively improves matching accuracy, and bridges the large dimensional gap between the user information collected when a financial institution handles business and the party information disclosed in legal documents. For example, legal documents typically disclose a party's native place, whereas financial institutions usually treat native place as an optional field when collecting user information, so native-place data may be missing from the user information. Decomposing the existing user information makes it more comprehensive and thus facilitates matching. Furthermore, on the basis of the obtained user information, the embodiments structure the published legal texts, select target entities containing key information from them, and compare the target entities with the user information to screen qualifying documents from the published ones. This effectively improves matching accuracy and, to a certain extent, narrows the scope of the user's risk investigation.
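The scoring and screening step can be sketched as a weighted sum of per-type similarity values followed by a threshold filter. The weights below are hypothetical placeholders; the disclosure only states that different types of information receive different weights, and the function names are illustrative.

```python
# Hypothetical per-type weights; real values would be tuned or set by policy.
DEFAULT_WEIGHTS = {"name": 0.4, "id_number": 0.3, "native_place": 0.2, "gender": 0.1}


def matching_value(similarities: dict, weights=DEFAULT_WEIGHTS) -> float:
    """Weighted sum of per-type similarity values for one second information set."""
    return sum(weights.get(dtype, 0.0) * value for dtype, value in similarities.items())


def select_target_texts(similarities_by_text: dict, threshold: float,
                        weights=DEFAULT_WEIGHTS) -> list:
    """Return IDs of texts whose matching value exceeds the threshold, best first."""
    scored = {text_id: matching_value(sims, weights)
              for text_id, sims in similarities_by_text.items()}
    return sorted((t for t, v in scored.items() if v > threshold),
                  key=scored.get, reverse=True)
```

Types missing from the weight table contribute nothing, which keeps unexpected entity types from inflating the score.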

Fig. 3 schematically illustrates a flow chart of extracting structured information to obtain a plurality of second information sets, according to an embodiment of the disclosure.

As shown in FIG. 3, extracting structured information to obtain a plurality of second information sets in this embodiment includes operations S310 to S330.

In operation S310, text is converted into a text sequence.

In some embodiments, a preprocessing operation is performed on the text to extract a plurality of valid words from the text, a word vector for each valid word is constructed based on a word embedding model, and the plurality of word vectors are aggregated to obtain a text sequence of the text.

In a specific implementation, the text preprocessing operation includes splitting the text into individual words or phrases, removing stop words, stemming, and the like. Removing stop words means deleting common words that carry no actual meaning, such as "and"; the stop-word list can be set according to actual requirements. Stemming means removing inflections such as noun plurals and verb tenses to obtain the stem of each word. Preprocessing removes useless information, which keeps the text sequence clean and prevents that information from interfering with subsequent processing.
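The preprocessing and sequence construction can be sketched as below. This is a toy under stated assumptions: the regex tokenizer and `crude_stem` suffix stripper stand in for a real tokenizer and stemmer (Chinese text would need a word segmenter such as jieba instead), and `embeddings` is a hypothetical word-to-vector dictionary standing in for a trained word embedding model.

```python
import re

import numpy as np

STOP_WORDS = {"the", "and", "a", "of", "in"}  # configurable per requirements


def crude_stem(word: str) -> str:
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list:
    """Tokenize, drop stop words, and stem the remaining words."""
    tokens = re.findall(r"\w+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def to_text_sequence(words: list, embeddings: dict, dim: int) -> np.ndarray:
    """Stack one vector per word; unknown words map to a zero vector."""
    vectors = [embeddings.get(w, np.zeros(dim)) for w in words]
    return np.stack(vectors) if vectors else np.zeros((0, dim))
```

For example, `preprocess("The banks and the lenders")` gives `["bank", "lender"]`, and the resulting sequence has one row per remaining word.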

In operation S320, text features are extracted from the text sequence, and entity labeling is performed in the text sequence based on the text features, resulting in a plurality of labeled entities.

In some embodiments, the text features at least comprise sentence features and word features, wherein the sentence features refer to features extracted from sentences and capable of representing semantic information of the sentences, the sentence features can reflect meaning and structure of the whole sentences, and relations among words can be better understood when entity labeling is performed. Word characteristics refer to characteristics extracted from individual words that represent word sense information, and are used to reflect the meaning, context, and relationship of each word to other words. The sentence features and the word features are adopted to carry out entity labeling together, so that the accuracy of entity labeling can be effectively improved.

In operation S330, a target entity is extracted from the plurality of labeled entities, resulting in a second set of information for the text.

In some embodiments, the second information set corresponding to the text is constructed by extracting, based on the entity types of the named entities, the structured information required for information matching from the text sequence.

In a specific implementation, texts are often lengthy and correspondingly contain many named entities of many different types. Some named entity types are not used in information matching, so when the second information set is constructed, the named entities need to be screened by type to keep irrelevant entities out of the second information set.

FIG. 4 schematically illustrates a flow diagram for deriving a plurality of annotated entities based on textual features, according to an embodiment of the disclosure.

As shown in FIG. 4, obtaining a plurality of labeled entities based on text features in this embodiment includes operations S410 to S420.

In operation S410, word relationships in the sentence are determined based on the sentence features.

In operation S420, named entities in the sentence are identified based on the word relation and the word characteristics, and the named entities are labeled in type, so as to obtain a plurality of labeled entities.

In some embodiments, because the texts in the present disclosure contain encrypted (masked) information, relying on word features alone may lead to inaccurate recognition of words whose form differs from the usual format, and some word features usable for information matching may be ignored. This lowers the accuracy of named entity labeling and, in turn, the accuracy of information matching.

Accordingly, the present disclosure proposes labeling entities in the text sequence based on both word features and sentence features. Sentence features may include the grammatical structure and semantic roles of the sentence; syntactic analysis and semantic role labeling reveal the relationships between verbs and noun phrases, so entities in the sentence can be identified more reliably. Word features provide specific information about each word, which helps determine entity boundaries and types. Labeling the text sequence with both kinds of features effectively improves the accuracy of identifying and labeling named entities.

For example, consider a text stating that Guo filed a lawsuit in December 2010, where the party's name appears only in the masked form "Guo". If entities were labeled using word features alone, "Guo" might be ignored and filtered out as useless information because it differs from the usual name format. The sentence features, however, show that "Guo" occupies the position of a person's name in the sentence, so it is labeled as a named entity of the person-name type.
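If the labeling model emits one tag per token in the common BIO scheme (an assumption; the disclosure does not name a tagging scheme), the labeled entities can be read off the tag sequence as sketched below. `decode_bio` is a hypothetical helper, and the tags in the test are hand-written rather than model output.

```python
def decode_bio(tokens: list, tags: list) -> list:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    "B-T" starts an entity of type T, "I-T" continues it, and "O" is outside
    any entity. An "I-" tag that does not continue an open entity of the same
    type closes the open entity and is itself dropped.
    """
    entities = []
    current_type, current_tokens = None, []

    def flush():
        if current_type is not None:
            entities.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities
```

For the example sentence, tagging "Guo" as `B-PER` and "December 2010" as `B-DATE I-DATE` yields a person-name entity and a date entity.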

Sentence features can be extracted with a convolutional neural network (Convolutional Neural Network, CNN) or an attention mechanism (Attention Mechanism). With a CNN, a sentence is represented as a sequence of vectors, and convolution and pooling operations progressively reduce the dimensionality of the input until a single sentence feature representing the whole sentence remains. With an attention mechanism, an attention weight is computed for the representation of each word in the sentence, and the word representations are averaged using these weights to obtain a sentence feature for the whole sentence; this feature captures how much each word contributes to the sentence and therefore better represents its semantics.
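The attention variant can be sketched as softmax-weighted pooling over word vectors. This is a minimal illustration with NumPy: the `query` context vector would be learned in a real model and is passed in by hand here, and `attention_pool` is a hypothetical name.

```python
import numpy as np


def attention_pool(word_vectors: np.ndarray, query: np.ndarray):
    """Sentence feature as an attention-weighted average of word vectors.

    word_vectors: shape (n_words, dim); query: shape (dim,), a context vector.
    Returns (sentence_feature, attention_weights).
    """
    scores = word_vectors @ query            # one relevance score per word
    scores = scores - scores.max()           # numerical stability for softmax
    weights = np.exp(scores)
    weights = weights / weights.sum()        # softmax: weights sum to 1
    return weights @ word_vectors, weights
```

With a zero query every word gets equal weight, so the feature is the plain mean; a query aligned with one word shifts weight toward it.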

FIG. 5 schematically illustrates a flow diagram for building a second set of information based on annotated entities, according to an embodiment of the disclosure.

As shown in FIG. 5, constructing the second information set based on the labeled entities in this embodiment includes operations S510 to S520.

In operation S510, a plurality of target entities are extracted from the annotated entities based on the first information.

In operation S520, a second information set is constructed based on the plurality of target entities.

In some embodiments, the number of named entities obtained based on the text is large, the types are rich, only a named entity with the same type as the first information in a small department is extracted from the named entities as the named entity for executing subsequent feature matching, interference of redundant entities on information matching is eliminated, and accuracy of information matching is effectively improved.
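A minimal sketch of this type-based screening follows, assuming labeled entities arrive as `(text, type)` pairs and that the first information set is a dictionary keyed by data type. The function name and the type labels are hypothetical.

```python
def build_second_information_set(labeled_entities: list, first_information: dict) -> dict:
    """Keep only entities whose type also appears in the first information set.

    labeled_entities: [(entity_text, entity_type), ...] from one text.
    first_information: {data_type: user_value}.
    Returns {data_type: [entity_text, ...]} for that text.
    """
    wanted_types = set(first_information)
    second = {}
    for entity_text, entity_type in labeled_entities:
        if entity_type in wanted_types:
            second.setdefault(entity_type, []).append(entity_text)
    return second
```

Entities of types the user information cannot be compared against, such as court names, are discarded here.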

In implementation, a second information set, once identified, can be stored in a pre-constructed database for later reuse, which saves computing resources and time and improves the efficiency of information matching.
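The reuse described above can be sketched with an in-memory dictionary standing in for the pre-constructed database. The `text_id` key and the `extractor` callable are hypothetical; a real deployment would use whatever persistent store and document identifiers the system already has.

```python
from typing import Callable, Dict, List

SecondSet = Dict[str, List[str]]


class SecondSetCache:
    """In-memory stand-in for a database of already-extracted second information sets."""

    def __init__(self, extractor: Callable[[str], SecondSet]):
        self._extractor = extractor  # hypothetical entity-extraction function
        self._store: Dict[str, SecondSet] = {}
        self.misses = 0  # number of times extraction actually ran

    def get(self, text_id: str, text: str) -> SecondSet:
        # Run the (expensive) extraction only the first time a text is seen.
        if text_id not in self._store:
            self.misses += 1
            self._store[text_id] = self._extractor(text)
        return self._store[text_id]
```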

Fig. 6 schematically illustrates a flowchart of calculating the degree of matching of the first information with each of the second information sets according to an embodiment of the present disclosure.

As shown in fig. 6, the calculation of the matching degree between the first information and each second information set in this embodiment includes operations S610 to S630.

In operation S610, a data pair is constructed based on a data type, wherein the data pair includes first information and second information of the same data type.

In some embodiments, second information of the same data type as the first information is looked up, and a data pair is constructed from the second information found and the first information.

For example, if the first information is the user name, whose data type is person name, second information of the person-name type is searched for in the second information set, and each item found is paired with the first information for subsequent similarity comparison.
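Pair construction (operation S610) can be sketched as a join on data type. This assumes the first information set maps each data type to one value and the second information set maps each type to a list of entities; both layouts are illustrative.

```python
from typing import Dict, List, Tuple


def build_data_pairs(
    first_set: Dict[str, str],
    second_set: Dict[str, List[str]],
) -> List[Tuple[str, str, str]]:
    """Return (data_type, first_information, second_information) triples."""
    pairs = []
    for data_type, first_value in first_set.items():
        # Pair the user's value with every second-information item of the same type.
        for second_value in second_set.get(data_type, []):
            pairs.append((data_type, first_value, second_value))
    return pairs
```

Types present on only one side produce no pair, so they do not affect the later similarity comparison.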

In operation S620, similarity comparison is performed on the first information and the second information in the data pair, so as to obtain a similarity value of the second information.

In some embodiments, the first information and the second information in the data pair are compared, and a similarity value of the second information is obtained according to a preset rule: the more fields the second information shares with the first information, the higher its similarity value. For example, if the user name in the first information is "Wang Xiaoming", a person name "Wang X Ming" in the second information has a similarity value of 2, and a person name "Wang X X" has a similarity value of 1.
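One possible preset rule consistent with this example counts aligned positions whose tokens are identical. This is a minimal sketch assuming position-by-position comparison and a hypothetical set of mask placeholders (`X`, `*`, `某`); the disclosure does not fix either choice.

```python
from typing import Sequence

# Hypothetical placeholders used when names are anonymized in published texts.
MASK_TOKENS = {"X", "*", "某"}


def similarity_value(first: Sequence[str], second: Sequence[str]) -> int:
    """Count aligned positions where the second token is unmasked and equals the first."""
    return sum(1 for a, b in zip(first, second) if b not in MASK_TOKENS and a == b)
```

Tokens can be romanized syllables such as `["Wang", "Xiao", "Ming"]`, or a plain string, in which case the comparison runs character by character.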

In operation S630, weighted summation is performed on the similarity values of the second information, so as to obtain the matching degree of the second information set, where weights corresponding to different types of second information are different.

In some embodiments, a similarity value is obtained for each second information in the second information set, and these similarity values are weighted and summed to obtain the matching degree of the second information set. The weight of each second information is determined by its data type: different data types matter to different degrees for information matching, and a more important data type receives a higher weight.
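Operation S630 then reduces to a weighted sum. The weights below are hypothetical placeholders; the disclosure only states that more important data types receive higher weights.

```python
from typing import Dict, List, Tuple

# Hypothetical per-type weights; only their relative order reflects the text.
TYPE_WEIGHTS: Dict[str, float] = {"name": 0.5, "birth_year": 0.3, "address": 0.2}


def matching_degree(
    scored_items: List[Tuple[str, float]],
    weights: Dict[str, float] = TYPE_WEIGHTS,
) -> float:
    """Weighted sum of (data_type, similarity_value) entries; unknown types contribute 0."""
    return sum(weights.get(data_type, 0.0) * sim for data_type, sim in scored_items)
```

With a name similarity of 2 and a birth-year similarity of 1, the matching degree is 0.5 × 2 + 0.3 × 1 = 1.3.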

Fig. 7 schematically illustrates a flowchart of determining target text corresponding to a user based on a degree of matching according to an embodiment of the present disclosure.

As shown in fig. 7, determining the target text corresponding to the user based on the matching degree in this embodiment includes operations S710 to S730.

In operation S710, a text corresponding to the second information set with the matching degree greater than the preset threshold is obtained.

In operation S720, the first information is filled into the encrypted position of the text.

In operation S730, in case of successful filling, it is determined that the text is the target text.

In some embodiments, in addition to matching user information based on the matching degree, the present disclosure verifies the matched text by backfilling the user information, for example by judging from the context whether the backfilled information is reasonable. If the backfilled information fits into the current text (for example, it does not disturb the logic of the originally disclosed content), the filling is successful, and the text is determined to be a target text for later use such as compiling the user's complaint information. Verifying the matched text by backfilling the first information further improves the accuracy of information matching.
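Operations S710 to S730 can be sketched as below. The context-based reasonableness check is replaced here by a much simpler hypothetical test: the user's name "fits" a masked name if both have the same number of tokens and every unmasked token agrees. The text identifiers and mask placeholders are likewise illustrative assumptions.

```python
from typing import Dict, List, Sequence

MASK_TOKENS = {"X", "*", "某"}  # hypothetical anonymization placeholders


def backfill_fits(masked: Sequence[str], candidate: Sequence[str]) -> bool:
    """Simplified stand-in for the context check: same length, every unmasked token agrees."""
    if len(masked) != len(candidate):
        return False
    return all(m in MASK_TOKENS or m == c for m, c in zip(masked, candidate))


def select_target_texts(
    degrees: Dict[str, float],
    masked_names: Dict[str, Sequence[str]],
    user_name: Sequence[str],
    threshold: float,
) -> List[str]:
    targets = []
    for text_id, degree in degrees.items():
        # S710: keep only texts whose matching degree exceeds the preset threshold.
        if degree <= threshold:
            continue
        # S720/S730: fill the user's name into the encrypted position; keep the text if it fits.
        if backfill_fits(masked_names[text_id], user_name):
            targets.append(text_id)
    return targets
```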

Based on the information matching method, the disclosure also provides an information matching device. The device will be described in detail below in connection with fig. 8.

Fig. 8 schematically shows a block diagram of the structure of an information matching apparatus according to an embodiment of the present disclosure.

As shown in fig. 8, the information matching apparatus 800 of this embodiment includes a preprocessing module 810, an extraction module 820, a calculation module 830, and a determination module 840.

The preprocessing module 810 is configured to perform a preprocessing operation on user information to obtain a first information set, where the first information set includes a plurality of first information capable of reflecting user identities. In an embodiment, the preprocessing module 810 may be used to perform operation S210 described above, the details of which are not repeated here.

The extraction module 820 is configured to extract target entities from a plurality of texts to obtain a plurality of second information sets, where each second information set includes a plurality of target entities, and the target entities extracted from the same text belong to the same second information set. In an embodiment, the extraction module 820 may be used to perform operation S220 described above, the details of which are not repeated here.

The calculation module 830 is configured to calculate the matching degrees between the first information set and the plurality of second information sets, respectively, to obtain a matching value of each second information set. In an embodiment, the calculation module 830 may be used to perform operation S230 described above, the details of which are not repeated here.

The determination module 840 is configured to determine a target text that matches the user based on the matching value. In an embodiment, the determination module 840 may be used to perform operation S240 described above, the details of which are not repeated here.
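The four-module structure of apparatus 800 can be sketched as a small composition of callables. The field names and the toy module implementations in the usage are hypothetical; the sketch only shows how the modules hand data to one another.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

FirstSet = Dict[str, str]         # data type -> user's first information
SecondSet = Dict[str, List[str]]  # data type -> target entities from one text


@dataclass
class InformationMatchingApparatus:
    preprocess: Callable[[str], FirstSet]               # preprocessing module 810 (S210)
    extract: Callable[[str], SecondSet]                 # extraction module 820 (S220)
    calculate: Callable[[FirstSet, SecondSet], float]   # calculation module 830 (S230)
    determine: Callable[[Dict[int, float]], List[int]]  # determination module 840 (S240)

    def run(self, user_info: str, texts: List[str]) -> List[int]:
        """Return the indices of the texts judged to match the user."""
        first_set = self.preprocess(user_info)
        # One second information set per text, so entities from different texts never mix.
        second_sets = [self.extract(text) for text in texts]
        values = {i: self.calculate(first_set, s) for i, s in enumerate(second_sets)}
        return self.determine(values)
```

Because each module is an injected callable, modules can be merged or split without changing `run`, in line with the flexibility described for the apparatus.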

According to embodiments of the present disclosure, any plurality of the preprocessing module 810, the extraction module 820, the calculation module 830, and the determination module 840 may be combined into one module, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the preprocessing module 810, the extraction module 820, the calculation module 830, and the determination module 840 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system in a package, or an Application Specific Integrated Circuit (ASIC); in hardware or firmware using any other reasonable way of integrating or packaging circuits; or in any one of software, hardware, and firmware, or a suitable combination of them. Alternatively, at least one of the preprocessing module 810, the extraction module 820, the calculation module 830, and the determination module 840 may be implemented at least in part as a computer program module that performs the corresponding function when executed.

As shown in fig. 9, an electronic device 900 according to an embodiment of the present disclosure includes a processor 901 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. The processor 901 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 901 may also include on-board memory for caching purposes. Processor 901 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present disclosure.

In the RAM 903, various programs and data necessary for the operation of the electronic device 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. The processor 901 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the program may be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, the input/output (I/O) interface 905 also being connected to the bus 904. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 910 so that a computer program read out therefrom is installed into the storage section 908 as needed.

The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 902 and/or RAM 903 and/or one or more memories other than ROM 902 and RAM 903 described above.

Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to implement the information matching method provided by embodiments of the present disclosure.

The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 901. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.

In one embodiment, the computer program may be based on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed, and downloaded and installed in the form of a signal on a network medium, via communication portion 909, and/or installed from removable medium 911. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 909 and/or installed from the removable medium 911. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 901. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.

According to embodiments of the present disclosure, the program code of the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, "C", or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in a variety of ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be variously combined and/or integrated without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.

The embodiments of the present disclosure are described above. These examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims

1. An information matching method, the method comprising:

preprocessing the user information to obtain a first information set, wherein the first information set comprises a plurality of first information capable of reflecting the user identity;

extracting target entities from a plurality of texts to obtain a plurality of second information sets, wherein each second information set comprises a plurality of target entities, and the target entities extracted from the same text belong to the same second information set;

respectively calculating the matching degree of the first information set and the plurality of second information sets to obtain a matching value of each second information set; and

determining a target text matched with the user based on the matching value.

2. The method of claim 1, wherein performing a preprocessing operation on the user information to obtain a first information set comprises:

disassembling the user information based on a preset rule to obtain a plurality of sub-information; and

obtaining the first information of the user based on the sub-information, and constructing the first information set.

3. The information matching method according to claim 2, wherein extracting target entities from a plurality of texts to obtain a plurality of second information sets comprises:

performing the following for each text:

converting the text into a text sequence;

extracting text features from the text sequence, and labeling entities in the text sequence based on the text features to obtain a plurality of labeled entities; and

extracting target entities from the plurality of labeled entities to obtain a second information set of the text.

4. The information matching method according to claim 3, wherein the text features include sentence features and word features, and labeling entities in the text sequence based on the text features to obtain a plurality of labeled entities comprises:

determining word relations in a sentence based on the sentence features; and

identifying named entities in the sentence based on the word relations and the word features, and labeling the types of the named entities to obtain a plurality of labeled entities.

5. The information matching method according to claim 4, wherein extracting target entities from the plurality of texts to obtain a plurality of second information sets comprises:

extracting a plurality of target entities from the labeled entities based on the first information; and

constructing the second information set based on the plurality of target entities.

6. The information matching method according to claim 1, wherein the calculating the matching degree of the first information set and the plurality of second information sets, respectively, includes:

the following operations are performed on each second information set:

constructing a data pair based on the data type, wherein the data pair comprises first information and second information with the same data type;

performing similarity comparison on the first information and the second information in the data pair to obtain a similarity value of the second information;

and carrying out weighted summation on the similarity values of the second information to obtain the matching degree of the second information set, wherein the weights corresponding to the second information of different types are different.

7. The information matching method according to claim 6, wherein the determining a target text corresponding to the user based on the degree of matching includes:

acquiring a text corresponding to a second information set whose matching degree is greater than a preset threshold;

filling the first information into an encrypted position of the text; and

determining the text as the target text if the filling succeeds.

8. The information matching method according to any one of claims 1-7, further comprising:

sending a request for acquiring user information to the user; and

acquiring the user information and performing the preprocessing operation on the user information if the user agrees to the request for acquiring the user information.

9. An information matching apparatus, the apparatus comprising:

a preprocessing module configured to perform a preprocessing operation on the user information to obtain a first information set, wherein the first information set comprises a plurality of pieces of first information capable of reflecting the user identity;

an extraction module configured to extract target entities from a plurality of texts to obtain a plurality of second information sets, wherein each second information set comprises a plurality of target entities, and the target entities extracted from the same text belong to the same second information set;

a computing module configured to respectively compute the matching degree of the first information set and the plurality of second information sets to obtain a matching value of each second information set; and

a determining module configured to determine a target text matched with the user based on the matching value.

10. An electronic device, comprising:

one or more processors;

a storage means for storing one or more computer programs,

characterized in that the one or more processors execute the one or more computer programs to implement the steps of the method according to any one of claims 1-8.

11. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, realizes the steps of the method according to any one of claims 1-8.

12. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-8.