CN112949716A

CN112949716A - Similarity evaluation method, system, terminal device and computer readable storage medium

Info

Publication number: CN112949716A
Application number: CN202110230897.7A
Authority: CN
Inventors: 杨双仕; 徐雷; 程筱彪
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2021-03-02
Filing date: 2021-03-02
Publication date: 2021-06-11

Abstract

The embodiments of the disclosure provide a similarity evaluation method, a system, a terminal device and a computer-readable storage medium. The method comprises: respectively acquiring feature data of a plurality of targets to be evaluated, wherein the feature data of each target to be evaluated comprises a feature set consisting of a plurality of features and the number of features in the set; respectively calculating feature similarity values between each target to be evaluated and all other targets to be evaluated based on the feature data of the plurality of targets to be evaluated; and, for each target to be evaluated, screening out from all other targets to be evaluated the target with the highest feature similarity value, based on the feature similarity values between that target and all other targets, and taking it as the target with the highest similarity. Because both the feature sets and the feature set quantity values of the targets are considered when calculating the feature similarity between targets, the disclosure at least addresses the problem that the evaluation result deviates when two targets have different total numbers of features.

Description

Similarity evaluation method, system, terminal device and computer readable storage medium

Technical Field

The present disclosure relates to the field of communications technologies, and in particular, to a similarity evaluation method, a similarity evaluation system, a terminal device, and a computer-readable storage medium.

Background

The results of analyzing and evaluating the similarity of target features can be applied to recommendation, classification and other applications, for example in the operation of e-commerce, games and movies.

In the related art, target feature similarity evaluation methods are generally based on the Euclidean distance algorithm, the cosine algorithm and the like. In principle, these methods evaluate similarity mainly based on the number of identical features of two targets and do not consider the total number of features of the two targets, so the feature similarity evaluation result has a large error.

Disclosure of Invention

The present disclosure provides a similarity evaluation method, system, terminal device, and computer-readable storage medium to at least solve the above-mentioned problems.

According to an aspect of the embodiments of the present disclosure, there is provided a similarity evaluation method, including:

respectively acquiring feature data of a plurality of targets to be evaluated, wherein the feature data of each target to be evaluated comprises a feature set consisting of a plurality of features and the number of the features;

respectively calculating feature similarity values between each target to be evaluated and all other targets to be evaluated based on the feature data of the plurality of targets to be evaluated; and

for each target to be evaluated, screening out from all other targets to be evaluated the target with the highest feature similarity value, based on the feature similarity values between that target and all other targets, and taking it as the target with the highest similarity.

In one embodiment, respectively acquiring the feature data of the plurality of targets to be evaluated includes:

respectively acquiring initial characteristic data of a plurality of targets to be evaluated;

respectively preprocessing the initial feature data of the plurality of targets to be evaluated to obtain a plurality of features and feature quantity of each target to be evaluated;

respectively establishing a feature set of each target to be evaluated based on the plurality of features of each target to be evaluated, wherein identical features are given the same number in the feature sets;

determining the number of features in the feature set of each target to be evaluated based on the number of features of each target to be evaluated; and

obtaining the feature data of each target to be evaluated based on the feature set of each target to be evaluated and the number of features in the feature set.

In one embodiment, the feature similarity value between each target to be evaluated and all other targets to be evaluated is calculated based on the feature data of the targets to be evaluated, and is obtained according to the following formula:

p = α*(a∩b)/(a∪b) + β*[1/(|Na-Nb|+1)], α, β ∈ [0,1]

In the formula, p represents the feature similarity between target A to be evaluated and target B to be evaluated, a represents the feature set of target A, b represents the feature set of target B, Na represents the number of features in the feature set of target A, Nb represents the number of features in the feature set of target B, |Na-Nb| represents the absolute difference between the numbers of features of target A and target B, and α and β are adjustment factors.
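The formula can be sketched in Python as follows. This is a minimal illustration rather than the disclosure's own implementation. It assumes that (a∩b)/(a∪b) means the ratio of the sizes of the intersection and the union of the two feature sets, that Na and Nb equal the sizes of those sets, and that the set term is 0 when both sets are empty. The function name `feature_similarity` and the default values α = β = 0.5 are hypothetical.

```python
def feature_similarity(a, b, alpha=0.5, beta=0.5):
    """Sketch of p = alpha*(a∩b)/(a∪b) + beta*[1/(|Na-Nb|+1)].

    a, b: collections of feature numbers for targets A and B.
    alpha, beta: adjustment factors in [0, 1] (the defaults are assumptions).
    """
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must lie in [0, 1]")
    a, b = set(a), set(b)
    union = a | b
    # Assumption: the set term is |a ∩ b| / |a ∪ b|, taken as 0 for two empty sets.
    overlap = len(a & b) / len(union) if union else 0.0
    # Assumption: Na and Nb are the numbers of features in each set.
    size_term = 1.0 / (abs(len(a) - len(b)) + 1)
    return alpha * overlap + beta * size_term


# Two sets of equal size that share two of six distinct features.
p = feature_similarity({1, 2, 3, 4}, {2, 3, 5, 6})
```

With these inputs the set term is 2/6 and the size term is 1, so p is 0.5 * (1/3) + 0.5 * 1. Larger α weights shared features more heavily; larger β weights closeness in feature count more heavily.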

In one embodiment, after respectively calculating feature similarity values between each target to be evaluated and all other targets to be evaluated based on the feature data of the plurality of targets to be evaluated, the method further includes:

and respectively carrying out similarity ranking on each target to be evaluated and all other targets to be evaluated based on the feature similarity value between each target to be evaluated and all other targets to be evaluated.

According to another aspect of the embodiments of the present disclosure, there is provided a similarity evaluation system including:

an acquisition module configured to respectively acquire feature data of a plurality of targets to be evaluated, wherein the feature data of each target to be evaluated comprises a feature set consisting of a plurality of features and the number of features in the set;

a calculation module configured to respectively calculate feature similarity values between each target to be evaluated and all other targets to be evaluated based on the feature data of the plurality of targets to be evaluated; and

an evaluation module configured to screen out, for each target to be evaluated, the target with the highest feature similarity value from all other targets to be evaluated based on the feature similarity values between that target and all other targets, and to take it as the target with the highest similarity.

In one embodiment, the acquisition module includes:

an acquisition unit configured to respectively acquire initial feature data of a plurality of targets to be evaluated;

a processing unit configured to respectively preprocess the initial feature data of the plurality of targets to be evaluated to obtain a plurality of features and the number of features of each target to be evaluated;

an establishing unit configured to respectively establish a feature set of each target to be evaluated based on the plurality of features of each target to be evaluated, wherein identical features are given the same number in the feature sets; and

a determination unit configured to determine the number of features in the feature set of each target to be evaluated based on the number of features of each target to be evaluated;

wherein the acquisition unit is further configured to obtain the feature data of each target to be evaluated based on the feature set of each target to be evaluated and the number of features in the feature set.

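In one embodiment, the calculation module calculates the feature similarity value according to the following formula:

p = α*(a∩b)/(a∪b) + β*[1/(|Na-Nb|+1)], α, β ∈ [0,1]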

In one embodiment, the system further comprises:

a ranking module configured to respectively rank each target to be evaluated against all other targets to be evaluated by similarity, based on the feature similarity values between each target to be evaluated and all other targets to be evaluated, after the calculation module has calculated those feature similarity values.

According to still another aspect of the embodiments of the present disclosure, there is provided a terminal device including a memory and a processor, the memory having a computer program stored therein, and the processor executing the similarity evaluation method when the processor runs the computer program stored in the memory.

According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the processor executes the similarity evaluation method.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:

The similarity evaluation method provided by the embodiments of the disclosure respectively acquires feature data of a plurality of targets to be evaluated, wherein the feature data of each target to be evaluated comprises a feature set consisting of a plurality of features and the number of features in the set; respectively calculates feature similarity values between each target to be evaluated and all other targets to be evaluated based on the feature data of the plurality of targets to be evaluated; and, for each target to be evaluated, screens out from all other targets to be evaluated the target with the highest feature similarity value, based on the feature similarity values between that target and all other targets, and takes it as the target with the highest similarity. Because both the feature sets and the feature set quantity values of the targets are considered when calculating the feature similarity between targets, the disclosure at least addresses the problem that the evaluation result deviates when two targets have different total numbers of features.

Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the disclosure. The objectives and other advantages of the disclosure may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings are included to provide a further understanding of the disclosed embodiments and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the example serve to explain the principles of the disclosure and not to limit the disclosure.

Fig. 1 is a schematic flow chart of a similarity evaluation method according to an embodiment of the present disclosure;

Fig. 2 is a schematic flow chart of a similarity evaluation method according to another embodiment of the present disclosure;

Fig. 3 is a schematic structural diagram of a similarity evaluation system according to an embodiment of the present disclosure;

Fig. 4 is a schematic structural diagram of the obtaining module 31 in Fig. 3;

Fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present disclosure.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, specific embodiments of the present disclosure are described below in detail with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order; also, the embodiments and features of the embodiments in the present disclosure may be arbitrarily combined with each other without conflict.

In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for the convenience of explanation of the present disclosure, and have no specific meaning in themselves. Thus, "module", "component" or "unit" may be used mixedly.

At present, target feature similarity evaluation methods are all based on the Euclidean distance algorithm, the cosine algorithm and the like. In principle, these methods evaluate similarity based on the number of identical features of two targets and do not consider the total number of features of the two targets. For example, suppose target A has 5 features, target B has 10 features, and target C has 100 features, and targets B and C both contain all 5 features of target A. Such methods would rate the similarity of targets A and B as equal to the similarity of targets A and C, but in reality the similarity of targets A and B is much greater than that of targets A and C, so the evaluation result has a large deviation.

In order to solve the above problem, the similarity evaluation method provided in the embodiment of the present disclosure acquires the feature data of the targets to be evaluated and calculates the feature similarity between targets by considering both the feature set and the feature set quantity value, so as to improve the accuracy of the similarity result. As shown in Fig. 1, the method includes steps S101 to S103.

In step S101, feature data of a plurality of targets to be evaluated are respectively obtained, where the feature data of each target to be evaluated includes a feature set composed of a plurality of features and a number of the features therein.

Assuming that the feature similarity of target A and target B needs to be evaluated, the related feature data of target A and target B are first acquired respectively. The feature data of target A includes the feature set of all features of target A and its feature set quantity value (the number of features in the feature set), and the feature data of target B includes the feature set of all features of target B and its feature set quantity value. Here, a feature set is the set of all features of a target to be evaluated, and the feature set quantity value is the total number of features in the feature set. Because the feature data acquired in this embodiment includes both a feature set and a feature set quantity value, both are considered when calculating the feature similarity between targets, which avoids problems such as deviation of the evaluation result when two targets have different total numbers of features.

In step S102, feature similarity values between each target to be evaluated and all other targets to be evaluated are respectively calculated based on the feature data of the plurality of targets to be evaluated; and

in step S103, for each target to be evaluated, the target with the highest feature similarity value is screened out from all other targets to be evaluated based on the feature similarity values between that target and all other targets, and is taken as the target with the highest similarity.
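Steps S101 to S103 can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the disclosure's implementation: the feature data are assumed to be already available as sets of feature numbers, the set term of the formula is read as a ratio of set sizes, and the names `pair_similarity` and `most_similar_targets`, the sample targets, and the factors α = β = 0.5 are all hypothetical.

```python
def pair_similarity(a, b, alpha=0.5, beta=0.5):
    # Assumed reading of the formula: |a ∩ b| / |a ∪ b| plus a size-difference term.
    union = a | b
    overlap = len(a & b) / len(union) if union else 0.0
    return alpha * overlap + beta / (abs(len(a) - len(b)) + 1)


def most_similar_targets(feature_sets, alpha=0.5, beta=0.5):
    """For each target, return the other target with the highest p (S102 and S103)."""
    result = {}
    for name, features in feature_sets.items():
        # S102: similarity between this target and every other target.
        scores = {
            other: pair_similarity(features, other_features, alpha, beta)
            for other, other_features in feature_sets.items()
            if other != name
        }
        # S103: keep the other target with the highest similarity value.
        result[name] = max(scores, key=scores.get) if scores else None
    return result


# S101: hypothetical feature data, given directly as sets of feature numbers.
targets = {
    "A": {1, 2, 3, 4},
    "B": {2, 3, 5, 6},
    "C": {1, 2, 3, 7, 8},
}
best = most_similar_targets(targets)
```

Every pair is scored twice here (once from each side), which keeps the sketch short; a symmetric cache would avoid the repeated work for large numbers of targets.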

In this embodiment, a feature similarity value between targets to be evaluated is calculated according to a feature set and a feature set quantity value of each target to be evaluated based on a preset similarity calculation method, where a person skilled in the art can set the similarity value according to actual conditions.

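In one embodiment, after step S103, the method further includes the following step: respectively ranking each target to be evaluated against all other targets to be evaluated by similarity, based on the feature similarity values between each target to be evaluated and all other targets to be evaluated.

In this embodiment, the feature similarity values p calculated above are sorted from large to small to obtain the similarity ranking between the targets, where the larger the value of p, the higher the similarity between the two targets.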
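The ranking by p can be sketched as a descending sort. The function name `rank_by_similarity` and the similarity values below are hypothetical and serve only to illustrate the ordering.

```python
def rank_by_similarity(p_values):
    """Return (target, p) pairs sorted from the largest p to the smallest."""
    return sorted(p_values.items(), key=lambda item: item[1], reverse=True)


# Hypothetical similarity values p between target A and three other targets.
p_values_for_a = {"B": 0.67, "C": 0.50, "D": 0.81}
ranking = rank_by_similarity(p_values_for_a)
ranked_names = [name for name, _ in ranking]
```

The first entry of the ranking is the target with the highest similarity, which matches the target selected in step S103.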

In this embodiment, the method used to compare individual features of target A and target B is not particularly limited; cosine similarity or another similarity algorithm may be adopted.

In one embodiment, the feature similarity value in step S102 is obtained according to the following formula:

p = α*(a∩b)/(a∪b) + β*[1/(|Na-Nb|+1)], α, β ∈ [0,1]
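As a worked illustration, the formula can be applied to the example of targets A, B and C given earlier in this description (5, 10 and 100 features, with B and C each containing all 5 features of A). The calculation assumes that (a∩b)/(a∪b) is read as the ratio of the sizes of the intersection and the union, and it uses α = β = 0.5, a choice made here only for illustration.

```latex
\begin{align*}
p_{AB} &= 0.5 \cdot \frac{5}{10} + 0.5 \cdot \frac{1}{|5 - 10| + 1}
        = 0.25 + \frac{1}{12} \approx 0.333 \\
p_{AC} &= 0.5 \cdot \frac{5}{100} + 0.5 \cdot \frac{1}{|5 - 100| + 1}
        = 0.025 + \frac{1}{192} \approx 0.030
\end{align*}
```

Under these assumptions the pair A, B scores roughly ten times higher than the pair A, C, even though both pairs share the same 5 features.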

Referring to Fig. 2, Fig. 2 is a schematic flow chart of a similarity evaluation method according to another embodiment of the present disclosure. Compared with the previous embodiment, this embodiment provides a specific example of acquiring the feature data: the step of respectively acquiring feature data of a plurality of targets to be evaluated (step S101) is further divided into steps S101a to S101e.

In step S101a, initial feature data of a plurality of targets to be evaluated are acquired, respectively.

In step S101b, the initial feature data of the multiple targets to be evaluated are preprocessed, respectively, to obtain multiple features and feature quantities of the multiple targets to be evaluated.

Specifically, if the target to be evaluated is a text, for example, the initial feature data may be the data obtained by word segmentation of the text. The initial feature data is filtered and cleaned to obtain a plurality of features for similarity evaluation against other targets to be evaluated, and the number of those features is obtained.

In step S101c, a feature set of each target to be evaluated is respectively established based on a plurality of features of each target to be evaluated, wherein the same features in the feature set are numbered the same.

Specifically, a feature set is established for each target to be evaluated, and the features in the feature sets are numbered as follows, with identical features given the same number.

For example, the feature set of target A to be evaluated is a = {1, 2, 3, 4, …}, and the feature set of target B to be evaluated is b = {2, 3, 5, 6, …}.

In step S101d, the number of features in the feature set of each target to be evaluated is determined based on the number of features of each target to be evaluated; and

in step S101e, the feature data of each target to be evaluated is obtained based on the feature set of each target to be evaluated and the number of features therein.
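Steps S101a to S101e can be sketched for text targets as follows. This is a simplified illustration under several assumptions: whitespace splitting stands in for word segmentation, removal of punctuation and stop words stands in for filtering and cleaning, and the feature count is taken as the size of the de-duplicated feature set. The function name `build_feature_data`, the stop-word list and the sample texts are hypothetical.

```python
def build_feature_data(texts, stopwords=frozenset()):
    """Map each target name to (feature_set, feature_count).

    texts: mapping of target name -> raw text (S101a, initial feature data).
    """
    numbering = {}  # shared table so that identical features get the same number
    feature_data = {}
    for name, text in texts.items():
        # S101b: crude segmentation, then cleaning and filtering (assumed rules).
        tokens = [token.strip(".,;:!?").lower() for token in text.split()]
        features = [token for token in tokens if token and token not in stopwords]
        # S101c: number the features, reusing the number of any feature seen before.
        feature_set = {numbering.setdefault(f, len(numbering) + 1) for f in features}
        # S101d and S101e: pair the feature set with the number of features in it.
        feature_data[name] = (feature_set, len(feature_set))
    return feature_data


sample_texts = {"A": "The cat sat on the mat.", "B": "The dog sat."}
data = build_feature_data(sample_texts, stopwords=frozenset({"the", "on"}))
```

Here "sat" receives the same number in both feature sets, so it counts as a shared feature when the two sets are compared.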

Based on the same technical concept, the embodiment of the present disclosure correspondingly provides a similarity evaluation system, as shown in fig. 3, the system includes:

the acquiring module 31 is configured to acquire feature data of a plurality of targets to be evaluated respectively, where the feature data of each target to be evaluated includes a feature set composed of a plurality of features and the number of the features therein;

a calculating module 32 configured to respectively calculate feature similarity values between each target to be evaluated and all other targets to be evaluated based on the feature data of the plurality of targets to be evaluated; and

the evaluation module 33 is configured to screen out the target to be evaluated with the highest feature similarity value from all other targets to be evaluated based on the feature similarity values between each target to be evaluated and all other targets to be evaluated, and use the target to be evaluated with the highest feature similarity value as the target to be evaluated with the highest similarity.

In one embodiment, the obtaining module 31 includes:

an obtaining unit 311 configured to obtain initial feature data of a plurality of targets to be evaluated, respectively;

a processing unit 312, configured to perform preprocessing on the initial feature data of the multiple targets to be evaluated, respectively, to obtain multiple features and feature quantities of the multiple targets to be evaluated;

an establishing unit 313 configured to respectively establish a feature set of each target to be evaluated based on the plurality of features of each target to be evaluated, wherein identical features are given the same number in the feature sets;

a determination unit 314 configured to determine the number of features in the feature set of each target to be evaluated based on the number of features of each target to be evaluated; and

the obtaining unit 311 is further configured to obtain feature data of each target to be evaluated based on the feature set of each target to be evaluated and the number of features therein.

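In one embodiment, the calculating module 32 calculates the feature similarity value according to the following formula:

p = α*(a∩b)/(a∪b) + β*[1/(|Na-Nb|+1)], α, β ∈ [0,1]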

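In one embodiment, the system further comprises:

a ranking module configured to respectively rank each target to be evaluated against all other targets to be evaluated by similarity, based on the feature similarity values calculated by the calculating module 32.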

Based on the same technical concept, the embodiment of the present disclosure correspondingly provides a terminal device, as shown in fig. 5, the terminal device includes a memory 51 and a processor 52, the memory 51 stores a computer program, and when the processor 52 runs the computer program stored in the memory 51, the processor 52 executes the similarity evaluation method.

Based on the same technical concept, embodiments of the present disclosure correspondingly provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the processor executes the similarity evaluation method.

It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present disclosure, and not for limiting the same; while the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present disclosure.

Claims

1. A similarity evaluation method is characterized by comprising the following steps:

2. The method according to claim 1, wherein the respectively obtaining feature data of a plurality of targets to be evaluated comprises:

3. The method according to claim 1 or 2, wherein the feature similarity value between each target to be evaluated and all other targets to be evaluated is calculated based on the feature data of the plurality of targets to be evaluated, and is obtained according to the following formula:

p = α*(a∩b)/(a∪b) + β*[1/(|Na-Nb|+1)], α, β ∈ [0,1]

4. The method according to claim 1, wherein after the feature similarity value between each target to be evaluated and all other targets to be evaluated is calculated based on the feature data of the plurality of targets to be evaluated, the method further comprises:

5. A similarity evaluation system, comprising:

6. The system of claim 5, wherein the acquisition module comprises:

7. The system according to claim 5 or 6, wherein the feature similarity value between each target to be evaluated and all other targets to be evaluated is calculated based on the feature data of the plurality of targets to be evaluated, and is obtained according to the following formula:

p = α*(a∩b)/(a∪b) + β*[1/(|Na-Nb|+1)], α, β ∈ [0,1]

8. The system of claim 5, further comprising:

9. A terminal device characterized by comprising a memory in which a computer program is stored and a processor that executes the similarity evaluation method according to any one of claims 1 to 4 when the processor runs the computer program stored in the memory.

10. A computer-readable storage medium, on which a computer program is stored, wherein, when the computer program is executed by a processor, the processor executes the similarity evaluation method according to any one of claims 1 to 4.