CN114491318B

CN114491318B - Determination method, device, equipment and storage medium of target information

Info

Publication number: CN114491318B
Application number: CN202111547303.1A
Authority: CN
Inventors: 顾杰; 史亚冰; 蒋烨; 柴春光; 朱勇
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-12-16
Filing date: 2021-12-16
Publication date: 2023-09-01
Anticipated expiration: 2041-12-16
Also published as: CN114491318A

Abstract

The disclosure provides a method, a device, equipment and a storage medium for determining target information, and relates to the technical field of computers, in particular to artificial intelligence fields such as deep learning and knowledge graphs. The specific implementation scheme is as follows: candidate information is determined in a database according to the target keywords contained in the received information to be matched; the information to be matched is paired with each piece of candidate information to form matching pairs; in each matching pair, the contents of the information to be matched and of the candidate information are sorted according to a predetermined rule to obtain two groups of sorting results for the matching pair; and the information to be matched in each matching pair is compared with the candidate information according to the two groups of sorting results, and target information is determined among the candidate information according to the comparison result. The scheme achieves a good disambiguation effect.

Description

Determination method, device, equipment and storage medium of target information

Technical Field

The disclosure relates to the field of computer technology, in particular to artificial intelligence fields such as deep learning and knowledge graphs, and more particularly to a method, apparatus, device, and storage medium for determining target information.

Background

In the process of referencing or updating target information, the input information needs to be associated with the stored target information. In conventional technology, the association process suffers from problems such as an excessive number of associations and association errors, so the recall rate is poor.

Disclosure of Invention

The disclosure provides a method, a device, equipment and a storage medium for determining target information.

According to an aspect of the present disclosure, there is provided a method of determining target information, which may include the steps of:

according to the target keywords contained in the received information to be matched, candidate information is determined in a database;

respectively forming matching pairs by the information to be matched and each piece of candidate information;

in each matching pair, ordering the contents in the information to be matched and the candidate information according to a preset rule to obtain two groups of ordering results in the matching pair;

and comparing the information to be matched in each matching pair with the candidate information according to the two groups of sequencing results in each matching pair, and determining target information in the candidate information according to the comparison result.

According to another aspect of the present disclosure, there is provided a target information determining apparatus, which may include:

the candidate information determining module is used for determining candidate information in the database according to the target keywords contained in the received information to be matched;

the matching pair building module is used for respectively forming matching pairs with the information to be matched and each piece of candidate information;

the sorting module is used for sorting the contents of the information to be matched and the candidate information in each matching pair according to a preset rule to obtain two groups of sorting results in the matching pair;

and the target information determining module is used for comparing the information to be matched in each matching pair with the candidate information according to the two groups of sequencing results in each matching pair, and determining target information in the candidate information according to the comparison result.

According to another aspect of the present disclosure, there is provided an electronic device including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any of the embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program/instruction which, when executed by a processor, implements the method in any of the embodiments of the present disclosure.

The technology according to the present disclosure can overcome the defects of poor generalization and poor versatility. The method can be reused for comparison among different objects and has a good disambiguation effect.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a flow chart of a method of determining target information according to the present disclosure;

FIG. 2 is a flow chart of a manner of determining target keywords according to the present disclosure;

FIG. 3 is a flow chart of a comparison process according to the present disclosure;

FIG. 4 is a first flow chart for obtaining feature determination results according to the present disclosure;

FIG. 5 is a schematic illustration of deriving a first text feature according to the present disclosure;

FIG. 6 is a second flow chart for obtaining feature determination results in accordance with the present disclosure;

FIG. 7 is a schematic illustration of obtaining feature determination results according to the present disclosure;

FIG. 8 is a schematic diagram of a determining apparatus of target information according to the present disclosure;

FIG. 9 is a block diagram of an electronic device for implementing a method of determining target information of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

As shown in fig. 1, the present disclosure relates to a method of determining target information, which may include the steps of:

S101: according to the target keywords contained in the received information to be matched, candidate information is determined in a database;

S102: respectively forming matching pairs by the information to be matched and each piece of candidate information;

S103: in each matching pair, ordering the contents in the information to be matched and the candidate information according to a preset rule to obtain two groups of ordering results in the matching pair;

S104: and comparing the information to be matched in each matching pair with the candidate information according to the two groups of sequencing results in each matching pair, and determining target information in the candidate information according to the comparison result.

The execution subject of the above-described scheme of the present disclosure may be a device for performing information matching and disambiguation, such as a smart phone, a tablet computer, or a server.

The information to be matched may include query information. Taking a query for movies or television dramas as an example, the information to be matched may include a title, for example, "movies XXX recorded in 2021", "dramas XX", or "titles XX". The information to be matched may also include the names of actors or directors. Alternatively, the information to be matched may include a story synopsis. In addition, the information to be matched may be a poster of a movie or a television show.

The expression form of the information to be matched can be words, sounds, images and the like. Taking text as an example, the information to be matched may be structured text or unstructured text.

The target keyword in the information to be matched can be determined according to a predetermined rule. For example, when the information to be matched includes the name of a movie or a television show, the name may be used as the target keyword. Alternatively, when the information to be matched contains the name of an actor or director, the names of the director and the starring actors may be used as target keywords. The starring actors may include the male lead, the female lead, or actors who have been highly popular recently. That is, the target keyword may be determined in various ways, which can be changed flexibly according to usage requirements.

Still taking the foregoing movie or television show query as an example, the database may be a database related to movies or television shows. Movie or television show information in the database includes, but is not limited to, title, show time, show platform, director information, starring actor information, drama information, duration information (number of episodes, duration per episode, movie duration), issuing company, story synopsis, etc. The content in the database may be stored in the form of a knowledge graph.

Candidate information related to the target keywords is obtained from the database by using the target keywords in the information to be matched. By way of example, distributed search may be employed to increase search efficiency. The distributed search may use Elasticsearch.

The information to be matched is paired with each retrieved piece of candidate information to form matching pairs. That is, if M pieces of candidate information are retrieved, the number of matching pairs is M. The i-th matching pair contains the information to be matched and the i-th candidate information, where M is a positive integer not less than 1 and 1 ≤ i ≤ M.
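The retrieval and pairing steps can be illustrated with a minimal Python sketch. It assumes a small in-memory list in place of the database and a plain substring test in place of a distributed search engine; the names `find_candidates` and `build_matching_pairs` and the sample records are hypothetical and do not come from the disclosure.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def find_candidates(keywords: List[str], database: List[Record]) -> List[Record]:
    """Return every record whose field values mention at least one keyword."""
    hits = []
    for record in database:
        text = " ".join(record.values()).lower()
        if any(keyword.lower() in text for keyword in keywords):
            hits.append(record)
    return hits


def build_matching_pairs(query: Record, candidates: List[Record]) -> List[Tuple[Record, Record]]:
    """Pair the query with each candidate: M candidates give M matching pairs."""
    return [(query, candidate) for candidate in candidates]


# Toy stand-in for the database (assumed data, for illustration only).
database = [
    {"name": "Movie XXX", "director": "Director A"},
    {"name": "Movie XXX", "director": "Director B"},
    {"name": "Drama YY", "director": "Director A"},
]
query = {"name": "Movie XXX", "director": "Director B"}
pairs = build_matching_pairs(query, find_candidates(["Movie XXX"], database))
```

Here two records share the target keyword, so two matching pairs are formed, and the later comparison steps decide which candidate is the target.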

For each matching pair, the contents of the matching pair may be ordered according to a predetermined rule. Illustratively, the predetermined rule may be ordering by importance. For example, in order of importance, the ranking may be name, starring actor information, director information, show time, etc.

The purpose of ordering according to the predetermined rule is to ensure the integrity of the important information. For example, due to storage limitations, model computing power limitations, and the like, the content in the information to be matched and the candidate information may be truncated, for example, so that only the first 200 characters are retained. Therefore, by ordering the content in the matching pair according to the predetermined rule, it is possible to prevent important information from being deleted during truncation, which would affect the accuracy of disambiguation.
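The effect of ordering before truncation can be sketched in Python as follows. The field names, the priority list `FIELD_PRIORITY`, and the `"field: value"` serialization are assumptions made for illustration; only the 200-character limit and the example ordering come from the text.

```python
from typing import Dict, List

# Hypothetical importance order, following the example in the text.
FIELD_PRIORITY: List[str] = ["name", "starring", "director", "show_time"]


def order_fields(record: Dict[str, str]) -> List[str]:
    """Put prioritized fields first, then the remaining fields in their original order."""
    known = [field for field in FIELD_PRIORITY if field in record]
    rest = [field for field in record if field not in FIELD_PRIORITY]
    return known + rest


def serialize(record: Dict[str, str], max_chars: int = 200) -> str:
    """Join the fields in priority order, then keep only the first max_chars characters."""
    parts = [f"{field}: {record[field]}" for field in order_fields(record)]
    return "; ".join(parts)[:max_chars]


# A long synopsis is stored first, but the name and director survive truncation.
record = {"synopsis": "x" * 300, "director": "Director A", "name": "Movie XXX"}
text = serialize(record)
```

Without the reordering step, the first 200 characters would consist entirely of synopsis text and the name would be lost.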

The information to be matched in each matching pair is compared with the candidate information according to the two groups of sorting results in that matching pair. The comparison may be a similarity comparison, a difference comparison, or the like. By comparing the comparison results across the matching pairs, the target information can be determined among the plurality of pieces of candidate information.

Through the above process, the defects of poor generalization and poor versatility can be overcome. The method can be reused for comparison among different objects and has a good disambiguation effect.

As shown in fig. 2, in one embodiment, the determining manner of the target keyword involved in step S101 may include the following procedures:

S201: preprocessing the received information to be matched to obtain a preprocessing result, wherein the preprocessing result is used for displaying the information to be matched in a natural language form;

S202: and determining the target keywords in the preprocessing result by utilizing a preset keyword determination rule.

In one scenario, the received information to be matched may be structured information. Still taking the foregoing movie or television show query as an example, for the "show time" of a work with the title "movie XXX", the structured information may be represented as "<movie XXX, datePublished, 2015-9-19>". The preprocessing of the received information to be matched may be converting the structured information into natural language information. Correspondingly, when the structured information is represented as "<movie XXX, datePublished, 2015-9-19>", the natural language information may be represented as "the showing time of movie XXX is September 19, 2015".
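A minimal Python sketch of this preprocessing step is given below. It assumes the structured information is a (subject, predicate, value) triple and uses a small template table; the predicate spelling `datePublished`, the `TEMPLATES` table, and the fallback template are assumptions for illustration, and the date value is passed through unchanged rather than reformatted.

```python
from typing import Dict, Tuple

# Hypothetical sentence templates keyed by predicate name.
TEMPLATES: Dict[str, str] = {
    "datePublished": "the showing time of {subject} is {value}",
    "director": "the director of {subject} is {value}",
}


def triple_to_text(triple: Tuple[str, str, str]) -> str:
    """Render a structured triple as a natural-language sentence."""
    subject, predicate, value = triple
    # Fall back to a generic pattern for predicates without a dedicated template.
    template = TEMPLATES.get(predicate, "the {predicate} of {subject} is {value}")
    return template.format(subject=subject, predicate=predicate, value=value)


sentence = triple_to_text(("movie XXX", "datePublished", "2015-9-19"))
fallback = triple_to_text(("movie XXX", "platform", "Platform P"))
```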

The preset keyword determination rule may be based on the importance of the content, the viewing popularity of the content, or the like. For example, for movies, the title has the highest importance, and the starring actor has the highest viewing popularity. By using a preset keyword determination rule, target keywords can be determined in the preprocessing result, so that different types of target keywords can be determined.

Through the process, different preset keyword determination rules can be adopted, and candidate information can be queried according to the searching purpose.
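One way to picture switchable keyword determination rules is a lookup table from rule name to the fields it selects. The sketch below is only an assumption about how such a rule might be encoded; the rule names, field names, and `determine_target_keywords` are hypothetical.

```python
from typing import Dict, List

# Hypothetical rule table: each rule lists the fields to use, in priority order.
KEYWORD_RULES: Dict[str, List[str]] = {
    "importance": ["name"],
    "popularity": ["starring", "director"],
}


def determine_target_keywords(info: Dict[str, str], rule: str) -> List[str]:
    """Pick the values of the fields selected by the given rule, skipping absent fields."""
    return [info[field] for field in KEYWORD_RULES[rule] if field in info]


info = {"name": "movie XXX", "director": "Director A", "starring": "Actor B"}
by_importance = determine_target_keywords(info, "importance")
by_popularity = determine_target_keywords(info, "popularity")
```

Choosing a different rule changes which keywords are sent to the database query, and therefore which candidates are retrieved.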

In one embodiment, step S103 may specifically include the following procedure:

When the information length of at least one of the information to be matched and the candidate information exceeds the corresponding length threshold, the information to be matched and/or the candidate information exceeding the corresponding length threshold is sorted according to the predetermined rule.

The length threshold may be determined based on the accuracy requirements of the comparison, the speed requirements of the comparison, the computing power of the comparison model, and the like. When the information length of the information to be matched and/or the candidate information exceeds the corresponding length threshold, that information may be truncated. To ensure that important information is retained during truncation, when the information length exceeds the corresponding length threshold, the information to be matched and/or the candidate information exceeding the threshold is sorted according to the predetermined rule.

For example, let the length threshold be L. When the information length of at least one of the information to be matched and the candidate information exceeds L, the content of the information to be matched and/or the candidate information whose length exceeds L is reordered.

The reordering may be performed according to a predetermined rule. For example, the predetermined rule may be the importance of the content, the viewing popularity of the content, or the last update date of the content. Illustratively, where the predetermined rule is the importance of the content, the reordering may be in the order of name, director, story synopsis, etc.

Through the above process, the loss of important information during truncation can be avoided, and the corresponding important information can be retained as required during comparison.
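The threshold check can be sketched in Python as below. The sketch assumes each side of a matching pair is a list of (field name, field text) tuples and that length is measured in characters; the `PRIORITY` table and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Field = Tuple[str, str]  # (field name, field text)

# Hypothetical importance ranks: lower numbers are kept closer to the front.
PRIORITY: Dict[str, int] = {"name": 0, "director": 1, "synopsis": 2}


def total_length(fields: List[Field]) -> int:
    """Total number of characters across all field texts."""
    return sum(len(text) for _, text in fields)


def reorder_if_too_long(fields: List[Field], threshold: int) -> List[Field]:
    """Reorder by importance only when the content exceeds the length threshold L."""
    if total_length(fields) <= threshold:
        return list(fields)
    return sorted(fields, key=lambda field: PRIORITY.get(field[0], len(PRIORITY)))


short = [("synopsis", "A short plot."), ("name", "Movie XXX")]
long_ = [("synopsis", "plot " * 50), ("name", "Movie XXX")]
kept = reorder_if_too_long(short, threshold=100)
moved = reorder_if_too_long(long_, threshold=100)
```

Only the over-long input is reordered, so content that already fits is left in its original order.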

As shown in fig. 3, in one embodiment, step S104 may include the following process:

S301: splitting each group of sequencing results in the matched pair according to the data processing capability to obtain N splitting results; N is a positive integer not less than 1;

S302: for each splitting result, carrying out feature determination to obtain a feature determination result;

S303: and comparing the information to be matched in the matching pair with the candidate information by utilizing the characteristic determination result.

The data processing capability may be the number of characters that can be processed at a single time during disambiguation. Each group of sorting results in the matching pair is split according to the data processing capability to obtain N splitting results. Typically, N may be set to 2. That is, in the i-th matching pair, the sorted information to be matched is split into 2 groups, and the sorted i-th candidate information is split into 2 groups. For text feature extraction, the amount that can be handled at a single time is typically about 512 characters. Setting N to 2 therefore supports similarity comparison between input texts of up to about 1000 characters, which meets most disambiguation needs. In addition, when the number of characters in any group of sorting results in the matching pair is smaller than the single processing capability, N=1 may be set, that is, there is a single splitting result and no splitting is required. Conversely, if N=2 still cannot satisfy the single processing capability, the value of N may be adjusted according to the actual situation.
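The relationship between text length, single-pass capacity, and N can be sketched in Python as follows. The sketch assumes fixed-size character chunks and derives N from the text length; the function name `split_for_model` is hypothetical, and a real system might split on token or sentence boundaries instead.

```python
import math
from typing import List


def split_for_model(text: str, capacity: int = 512) -> List[str]:
    """Split text into N chunks of at most `capacity` characters, with N >= 1."""
    n = max(1, math.ceil(len(text) / capacity))
    return [text[i * capacity:(i + 1) * capacity] for i in range(n)]


short_chunks = split_for_model("a" * 300)   # fits in one pass, so N = 1
usual_chunks = split_for_model("a" * 1000)  # the typical case, N = 2
long_chunks = split_for_model("a" * 1500)   # N grows when 2 chunks are not enough
```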

For each split result, a feature determination is made. Namely, in each splitting result, the characteristics of the information to be matched and the characteristics of the candidate information are respectively extracted, and the characteristic determining result corresponding to the splitting result is obtained.

The information to be matched in the matching pair is compared with the candidate information by using the feature determination result of each splitting result. The comparison result may be a similarity value, a difference value, or the like.

When N is not less than 2, the feature extraction results of the splitting results are combined to obtain the final feature extraction result.

Through the above process, the comparison of the information to be matched and the candidate information in each matching pair can be realized. By comparing the results, disambiguation can be achieved.

As shown in fig. 4, in one embodiment, step S302 may include the following process:

S401: determining a first text feature of each split result using a knowledge-enhanced semantic representation model (ERNIE);

S402: determining a second text feature of each split result using a string matching model (Pattern);

S403: and taking the first text characteristic and the second text characteristic as characteristic determination results of each split result.

A schematic diagram of determining the first text feature of each split result using the ERNIE model is shown in fig. 5. In fig. 5, an arbitrary matching pair (the i-th matching pair) is taken as an example, and the number of split results in the matching pair is N=1. Query A may represent the natural language string of the information to be matched in the i-th matching pair, and Query B may represent the natural language string of the candidate information in the i-th matching pair. Taking the processing of Query A as an example, word segmentation is performed on Query A to obtain D word segmentation results (tokens), corresponding to A_part_1 to A_part_D respectively. In the ERNIE model, a lower-layer text encoder (T-Encoder) captures lexical and semantic information from the input tokens. An upper-layer knowledge encoder (K-Encoder) integrates the knowledge information guided by external tokens into the text information output by the lower layer, so that the feature determination result of the split result can be obtained. The external tokens may be information acquired from a third party, including associations between different pieces of entity information. An entity generally refers to a real thing that can be identified by a name, such as a person's name, a work's name, or an organization's name; named entities in a broad sense also include time expressions, numerical expressions, addresses, etc.

The text encoder may include multiple layers of Transformer units. Each Transformer unit in the first layer corresponds in sequence to an input token; Transformer units in the same layer are not connected to each other, and Transformer units in different layers are connected pairwise. The outputs of the last layer of Transformer units are averaged (Avg-Pooling shown in fig. 5) to obtain an average result for each token, U_1 to U_D. The average results (U_1 to U_D) are spliced to obtain the first text feature of the splitting result. The above description takes N=1 as an example; when N > 1, the splitting results may be spliced to obtain the final first text feature U of the information to be matched (Query A) in each matching pair.

Similarly, the final first text feature V of the candidate information (Query B) in each matching pair may be obtained.

For each matching pair, the vector difference mean of the first text feature of the information to be matched and the candidate information may be represented as |U-V|.
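The pooling, splicing, and difference steps can be sketched in plain Python. This is one possible reading under stated assumptions: each split is represented by toy token vectors standing in for encoder outputs, pooling averages over the tokens of a split as is common for sentence embeddings (which may differ from the exact per-token arrangement in fig. 5), and |U-V| is taken element-wise. All function names and numbers are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_pool(token_vectors: List[Vector]) -> Vector:
    """Average the token vectors of one split, dimension by dimension."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


def concat(vectors: List[Vector]) -> Vector:
    """Splice the per-split features into one feature vector."""
    return [value for vector in vectors for value in vector]


def abs_diff(u: Vector, v: Vector) -> Vector:
    """Element-wise |U - V| for two feature vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return [abs(a - b) for a, b in zip(u, v)]


# Toy encoder outputs (assumed values): two splits per side, two tokens per split.
a_splits = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]]]
b_splits = [[[1.0, 1.0], [1.0, 1.0]], [[4.0, 0.0], [0.0, 4.0]]]
U = concat([mean_pool(split) for split in a_splits])
V = concat([mean_pool(split) for split in b_splits])
diff = abs_diff(U, V)
```

With these toy values, U is [2.0, 3.0, 1.0, 1.0], V is [1.0, 1.0, 2.0, 2.0], and the difference feature is [1.0, 2.0, 1.0, 1.0].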

In addition, for the information to be matched and the candidate information in the i-th matching pair, a string matching model (Pattern) can be used for feature determination to determine the second text feature of each splitting result. Pattern addresses disambiguation cases that are difficult to solve with simple disambiguation or with deep learning text matching models; the features used by Pattern are calculated by dedicated attribute interaction comparison operators. Determining the second text feature of each split result with Pattern effectively supplements the first text feature.
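The disclosure does not spell out the comparison operators, so the following Python sketch is purely illustrative of what per-attribute comparison features might look like. The operators `exact_match` and `jaccard`, the `OPERATORS` table, and the use of -1.0 for a missing attribute are all assumptions.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def exact_match(a: str, b: str) -> float:
    """1.0 when the two values are equal ignoring case and surrounding spaces."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def jaccard(a: str, b: str) -> float:
    """Overlap of two comma-separated value lists (for example, cast lists)."""
    set_a = {item.strip().lower() for item in a.split(",") if item.strip()}
    set_b = {item.strip().lower() for item in b.split(",") if item.strip()}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical mapping from attribute to its comparison operator.
OPERATORS: Dict[str, Callable[[str, str], float]] = {
    "name": exact_match,
    "starring": jaccard,
}


def pattern_features(query: Record, candidate: Record) -> List[float]:
    """One feature per attribute; -1.0 marks an attribute missing on either side."""
    features = []
    for field, operator in OPERATORS.items():
        if field in query and field in candidate:
            features.append(operator(query[field], candidate[field]))
        else:
            features.append(-1.0)
    return features


features = pattern_features(
    {"name": "Movie XXX", "starring": "Actor A, Actor B"},
    {"name": "movie xxx", "starring": "Actor B, Actor C"},
)
```

Features of this kind are explicit and attribute-specific, which is how they could complement a learned semantic feature.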

Finally, the first text feature and the second text feature may be used as feature determination results for each split result.

Through the process, the characteristics of the information to be matched and the candidate information can be extracted from different dimensions, so that the support of the underlying data can be provided for disambiguation.

As shown in fig. 6, in an embodiment, in a case that the information to be matched and the candidate information include non-text information, the method may further include the following steps:

S601: respectively determining the characteristics of non-text information in the information to be matched in the matching pair and the characteristics of non-text information in the candidate information;

S602: and taking the characteristics of the non-text information in the information to be matched in the matching pair and the characteristics of the non-text information in the candidate information as characteristic determining results.

The non-text information may include image information or the like. For image information, the features of the non-text information in the information to be matched and the features of the non-text information in the candidate information can be calculated with an image feature determination model and used as the feature determination result. Alternatively, as shown in fig. 7, the cosine similarity between the non-text information in the information to be matched and the non-text information in the candidate information of the matching pair may be calculated using an image feature determination model (MobileNet), and the calculation result is denoted as "non-text information feature similarity W" in fig. 7.
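The cosine similarity itself is straightforward to compute once each image has been turned into a feature vector. The Python sketch below assumes the image model has already produced the vectors; the three-dimensional values are toy numbers, not real model outputs.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy image feature vectors for the query poster and a candidate poster.
query_image_features = [1.0, 0.0, 1.0]
candidate_image_features = [1.0, 1.0, 0.0]
W = cosine_similarity(query_image_features, candidate_image_features)
```

For these toy vectors W is approximately 0.5; identical directions give a value close to 1 and orthogonal vectors give 0.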

In addition, the non-text information may include voice information, (Arabic) numeral information, and the like. Such non-text information may first be converted to text and then processed in the same way as text information. For voice information, the processing may also include identifying the speaker. For example, if the information to be matched is dialogue from a television show or a movie, a speaker tag can be added to the voice recognition result. The speaker tag may be "character 1" or "character 2", or "male character" or "female character". For finer-grained recognition, the names of the speakers, such as actor A or actor B, may also be recognized, thereby providing richer data support for disambiguation.

By combining these features with the vector difference mean |U-V| of the first text features of the information to be matched and the candidate information in each matching pair, multi-modal information for each matching pair can be obtained.

The similarity between the information to be matched and the candidate information in each matching pair can be obtained by using a binary classification model (Softmax). For example, if the result output by the classification model shows that the information to be matched and the candidate information in the i-th matching pair have the highest similarity, or the smallest difference, the candidate information in the i-th matching pair can be determined as the target information.
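The final selection step can be sketched in Python as a two-class linear layer followed by softmax, with the best-scoring pair chosen as the target. The weights below are toy, untrained values chosen so that small difference features score as a match; the feature layout and all names are assumptions rather than the model described in the disclosure.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def match_probability(features: List[float], weights: List[List[float]], bias: List[float]) -> float:
    """Two-class linear layer plus softmax; returns the probability of class 1 ("same")."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)[1]


def pick_target(pair_features: List[List[float]], weights: List[List[float]], bias: List[float]) -> int:
    """Index of the matching pair with the highest match probability."""
    probabilities = [match_probability(f, weights, bias) for f in pair_features]
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# Toy weights: class 0 = "different", class 1 = "same"; larger differences favor class 0.
weights = [[1.0, 1.0], [-1.0, -1.0]]
bias = [0.0, 0.0]
pair_features = [[0.9, 0.8], [0.1, 0.0], [0.5, 0.4]]  # e.g. difference features per pair
best = pick_target(pair_features, weights, bias)
```

With these toy inputs the second pair (index 1) has the smallest differences and is selected as the target.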

In one embodiment, the candidate information includes entity information in a knowledge-graph.

For example, the information to be matched may be a multi-modal entity, and the candidate information includes entity information in a knowledge graph. Finally, the application aims to correlate the received multi-modal entity with the entity in the knowledge graph so as to carry out entity recording and application of the knowledge graph.

Entity disambiguation technology plays an important role in knowledge graph construction and application scenarios, such as entity recording in knowledge graphs, intelligent question answering based on knowledge graphs, and intelligent customer service. Through the process of the application, the entity disambiguation task can achieve high accuracy and recall.

As shown in fig. 8, the present disclosure relates to a target information determining apparatus, which may include:

a candidate information determining module 801, configured to determine candidate information in a database according to a target keyword included in the received information to be matched;

a matching pair constructing module 802, configured to form matching pairs of the information to be matched and each piece of candidate information;

the sorting module 803 is configured to sort the contents in the information to be matched and the candidate information in each matching pair according to a predetermined rule, so as to obtain two sets of sorting results in the matching pair;

the target information determining module 804 is configured to compare the information to be matched in each matching pair with the candidate information according to the two sets of sorting results in each matching pair, and determine the target information in the candidate information according to the comparison result.

In one embodiment, the candidate information determination module 801 may include:

the preprocessing sub-module is used for preprocessing the received information to be matched to obtain a preprocessing result, and the preprocessing result is used for displaying the information to be matched in a natural language form;

and the target keyword determination submodule is used for determining target keywords in the preprocessing result by utilizing preset keyword determination rules.

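In one embodiment, the sorting module 803 is specifically configured to: when the information length of at least one of the information to be matched and the candidate information exceeds the corresponding length threshold, sort the information to be matched and/or the candidate information exceeding the corresponding length threshold according to the predetermined rule.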

In one embodiment, the target information determination module 804 may include:

the splitting module is used for splitting each group of sorting results in the matched pair according to the data processing capacity to obtain N splitting results; n is a positive integer not less than 1;

the characteristic determination submodule is used for carrying out characteristic determination on each splitting result to obtain a characteristic determination result;

and the comparison sub-module is used for comparing the information to be matched in the matching pair with the candidate information by utilizing the characteristic determination result.

In one embodiment, the feature determination submodule may include:

a first text feature determining unit for determining a first text feature of each split result using a knowledge-enhanced semantic representation model (ERNIE);

a second text feature determining unit for determining a second text feature of each split result using a character string matching model (Pattern);

and the feature determination execution unit is used for taking the first text feature and the second text feature as feature determination results of each splitting result.

In one embodiment, in the case that the information to be matched and the candidate information include non-text information, the target information determining module 804 may further include:

respectively determining the characteristics of non-text information in the information to be matched and the characteristics of the non-text information in the candidate information;

and taking the characteristics of the non-text information in the information to be matched and the characteristics of the non-text information in the candidate information as characteristic determining results.

The acquisition, storage, and application of users' personal information involved in the technical solution of the present disclosure comply with relevant laws and regulations and do not violate public order and good customs.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.

Fig. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 9, the apparatus 900 includes a computing unit 910 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 920 or a computer program loaded from a storage unit 980 into a Random Access Memory (RAM) 930. In the RAM 930, various programs and data required for the operation of the device 900 may also be stored. The computing unit 910, ROM 920, and RAM 930 are connected to each other by a bus 940. An input/output (I/O) interface 950 is also connected to bus 940.

Various components in device 900 are connected to I/O interface 950, including: an input unit 960, such as a keyboard, mouse, etc.; an output unit 970 such as various types of displays, speakers, and the like; a storage unit 980, such as a magnetic disk, optical disk, etc.; and a communication unit 990 such as a network card, modem, wireless communication transceiver, etc. Communication unit 990 allows device 900 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.

The computing unit 910 may be any of a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 910 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 910 performs the respective methods and processes described above, for example, the method of determining target information. For example, in some embodiments, the method of determining target information may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 980. In some embodiments, some or all of the computer program may be loaded and/or installed onto device 900 via ROM 920 and/or communication unit 990. When the computer program is loaded into the RAM 930 and executed by the computing unit 910, one or more steps of the above-described method of determining target information may be performed. Alternatively, in other embodiments, the computing unit 910 may be configured to perform the method of determining target information in any other suitable way (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SoCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A method of determining target information, comprising:

in each matching pair, segmenting the contents of the information to be matched and of the candidate information respectively to obtain a plurality of word segmentation results; in the case where the length of the information to be matched and/or the length of the candidate information exceeds a corresponding length threshold, sorting, according to a preset rule, the word segmentation results of whichever of the information to be matched and the candidate information exceeds its corresponding length threshold, to obtain two groups of sorting results in the matching pair; the preset rule includes at least one of importance of the content, viewing heat of the content, and a last update date of the content;

and comparing the information to be matched in each matching pair with candidate information according to two groups of sorting results in each matching pair, and determining target information in the candidate information according to the comparison result.
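By way of illustration only, and not as part of the claim language, the threshold-gated sorting recited in claim 1 might be sketched as below. The whitespace segmenter, the `IMPORTANCE` table, and the use of token count as the "length" are all assumptions introduced for this sketch; the claim fixes none of them and equally permits viewing heat or last update date as the sort key.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical importance weights; the claim only requires "importance of the
# content" as one possible preset rule and does not say how it is scored.
IMPORTANCE: Dict[str, float] = {"director": 3.0, "2019": 2.0, "film": 1.0}


def segment(text: str) -> List[str]:
    """Stand-in word segmenter (assumption): split on whitespace."""
    return text.split()


def importance(token: str) -> float:
    """Look up a token's hypothetical importance; unknown tokens score 0."""
    return IMPORTANCE.get(token, 0.0)


def sort_if_too_long(
    tokens: List[str], length_threshold: int, score: Callable[[str], float]
) -> List[str]:
    """Sort tokens by descending score only when the token count exceeds the threshold."""
    if len(tokens) <= length_threshold:
        return list(tokens)
    # sorted() is stable, so tokens with equal scores keep their original order.
    return sorted(tokens, key=score, reverse=True)


def sorted_pair(
    to_match: str, candidate: str, match_threshold: int, candidate_threshold: int
) -> Tuple[List[str], List[str]]:
    """Return the two groups of sorting results for one matching pair."""
    return (
        sort_if_too_long(segment(to_match), match_threshold, importance),
        sort_if_too_long(segment(candidate), candidate_threshold, importance),
    )
```

In this sketch, a side that does not exceed its threshold is passed through in its original order, so the pair always yields two token lists for the later comparison step.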

2. The method of claim 1, wherein determining the target keyword comprises:

preprocessing the received information to be matched to obtain a preprocessing result, wherein the preprocessing result is used for displaying the information to be matched in a natural language form;

and determining the target keywords in the preprocessing result by utilizing a preset keyword determination rule.

3. The method of claim 1, wherein comparing the information to be matched and the candidate information in each of the matching pairs according to the two sets of ordering results in each of the matching pairs comprises:

splitting each group of sorting results in the matching pair according to the data processing capability to obtain N splitting results; N is a positive integer not less than 1;

for each splitting result, carrying out feature determination to obtain a feature determination result;

and comparing the information to be matched in the matching pair with the candidate information by utilizing the characteristic determination result.
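As a hedged illustration outside the claim language, the splitting step of claim 3 can be read as chunking a sorted token list so that no chunk exceeds what the downstream model can process at once. The `capacity` parameter below is a hypothetical stand-in for "data processing capability" (for example, a maximum input length), which the claim does not quantify.

```python
from typing import List, Sequence


def split_by_capacity(tokens: Sequence[str], capacity: int) -> List[List[str]]:
    """Split a sorted token sequence into N >= 1 chunks of at most `capacity` tokens."""
    if capacity < 1:
        raise ValueError("capacity must be a positive integer")
    chunks = [list(tokens[i:i + capacity]) for i in range(0, len(tokens), capacity)]
    # Keep N >= 1 even for empty input, mirroring "N is a positive integer".
    return chunks or [[]]
```

Because the tokens were sorted beforehand, the earliest chunks carry the highest-ranked content under the chosen preset rule.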

4. The method of claim 3, wherein said performing a feature determination for each of said split results comprises:

determining a first text feature of each split result by using a knowledge enhancement semantic representation model ERNIE;

determining a second text feature of each split result by using a character string matching model;

and taking the first text characteristic and the second text characteristic as characteristic determination results of each splitting result.
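The two-feature combination of claim 4 might be sketched as follows; this is illustrative only and is not part of the claim. The sketch assumes features are computed for a pair of aligned split results, one from each side of the matching pair. The semantic scorer is deliberately a placeholder: `token_overlap_stub` is a hypothetical Jaccard overlap standing in for an ERNIE-based model, which is not invoked here. The second feature uses `difflib.SequenceMatcher` from the Python standard library as one possible string matching model.

```python
from difflib import SequenceMatcher
from typing import Callable, Tuple


def string_match_feature(a: str, b: str) -> float:
    """Second text feature: character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def token_overlap_stub(a: str, b: str) -> float:
    """Placeholder for the semantic model (assumption): Jaccard token overlap."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def chunk_features(
    to_match_chunk: str,
    candidate_chunk: str,
    semantic: Callable[[str, str], float] = token_overlap_stub,
) -> Tuple[float, float]:
    """Return (first text feature, second text feature) for one pair of split results."""
    return (
        semantic(to_match_chunk, candidate_chunk),
        string_match_feature(to_match_chunk, candidate_chunk),
    )
```

Passing the semantic scorer as a parameter lets a real model replace the stub without changing how the two features are combined.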

5. The method of claim 3, wherein in the case that the information to be matched and the candidate information include non-text information, the comparing the information to be matched and the candidate information in each of the matching pairs further includes:

respectively determining the characteristics of non-text information in the information to be matched in the matching pair and the characteristics of non-text information in the candidate information;

and taking the characteristics of the non-text information in the information to be matched in the matching pair and the characteristics of the non-text information in the candidate information as the characteristic determining results.

6. The method of any one of claims 1 to 5, wherein the candidate information comprises entity information in a knowledge graph.

7. A target information determining apparatus, comprising:

the matching pair building module is used for respectively forming matching pairs by the information to be matched and each piece of candidate information;

the sorting module is used for segmenting, in each matching pair, the contents of the information to be matched and of the candidate information respectively to obtain a plurality of word segmentation results; in the case where the length of the information to be matched and/or the length of the candidate information exceeds a corresponding length threshold, sorting, according to a preset rule, the word segmentation results of whichever of the information to be matched and the candidate information exceeds its corresponding length threshold, to obtain two groups of sorting results in the matching pair; the preset rule includes at least one of importance of the content, viewing heat of the content, and a last update date of the content;

and the target information determining module is used for comparing the information to be matched in each matching pair with the candidate information according to the two groups of sorting results in each matching pair, and determining target information in the candidate information according to the comparison result.

8. The apparatus of claim 7, wherein the candidate information determination module comprises:

and the target keyword determination submodule is used for determining the target keywords in the preprocessing result by utilizing a preset keyword determination rule.

9. The apparatus of claim 7, wherein the target information determination module comprises:

the splitting module is used for splitting each group of sorting results in the matching pair according to the data processing capability to obtain N splitting results; N is a positive integer not less than 1;

10. The apparatus of claim 9, wherein the feature determination submodule comprises:

a first text feature determining unit, configured to determine a first text feature of each of the splitting results by using a knowledge-enhanced semantic representation model ERNIE;

a second text feature determining unit, configured to determine a second text feature of each of the splitting results using a string matching model;

11. The apparatus of claim 9, wherein in a case where the information to be matched and the candidate information include non-text information, the target information determining module further comprises:

12. The apparatus of any one of claims 7 to 11, wherein the candidate information comprises entity information in a knowledge graph.

13. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 6.

14. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1 to 6.