CN107665247B - Article recall method and device and electronic equipment - Google Patents


Info

Publication number
CN107665247B
Authority
CN
China
Prior art keywords
article
recalled
identification information
original
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710847727.7A
Other languages
Chinese (zh)
Other versions
CN107665247A (en)
Inventor
邓哲宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201710847727.7A
Publication of CN107665247A
Application granted
Publication of CN107665247B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing

Abstract

The embodiments of the invention provide an article recall method, an article recall device, and electronic equipment. The method comprises: acquiring features of an original article; for each feature, acquiring the articles to be recalled that correspond to the feature according to the similarity between each article to be recalled and the feature; for each article to be recalled, generating an article identification information pair from the identification information of the original article and the identification information of that article to be recalled; processing each article identification information pair with a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair; and taking each article to be recalled whose similarity to the original article is greater than a preset threshold as a target recall article. The embodiments of the invention solve the prior-art technical problem of an overly large computing unit.

Description

Article recall method and device and electronic equipment
Technical Field
The present invention relates to the field of information retrieval technologies, and in particular, to an article recall method and apparatus, and an electronic device.
Background
With the development of technology, systems contain an increasing number of similar articles. When a user searches for an original article, how to recall to the user other articles that are similar to the original article is a technical problem that urgently needs to be solved.
At present, when the similarity between articles to be recalled and the original article needs to be evaluated, multiple features of the original article, such as Feature-1, Feature-2, …, Feature-n, may be determined, and n lists of articles to be recalled are then formed, each containing all the articles to be recalled corresponding to one of the n features. The identification information of the original article (for example, id-1) is then paired with each of the n lists to form n original article identification information/to-be-recalled article list pairs, and each pair is sent to a different actuator. Each actuator calculates the similarity between the original article id-1 and each article in its list, and the articles with higher similarity to the original article across all the lists are determined as the recalled articles.
However, in the prior art the original article and a list of articles to be recalled are computed together as one computing unit. Because a list of articles to be recalled generally contains the identification information of many articles to be recalled, the prior art has the technical problem that the computing unit is too large.
Disclosure of Invention
The embodiments of the invention aim to provide an article recall method, an article recall device, and electronic equipment that split the computing unit. The specific technical solution is as follows:
In a first aspect, to achieve the above object, an embodiment of the present invention provides an article recall method, the method including:
acquiring features of an original article;
for each feature, acquiring the articles to be recalled that correspond to the feature according to the similarity between each article to be recalled and the feature;
for each article to be recalled, generating an article identification information pair from the identification information of the original article and the identification information of that article to be recalled;
processing each article identification information pair with a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair;
and taking each article to be recalled whose similarity to the original article is greater than a preset threshold as a target recall article.
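As a reading aid, the five steps above can be sketched in Python. This is a minimal single-process sketch under stated assumptions, not the patented implementation: the function `recall` and its parameters `candidates_for` and `similarity` are hypothetical, the lookup tables stand in for a real feature index and similarity algorithm (they reuse the FIG. 4 mapping and the example similarities given in step S204 of the Detailed Description), and the threshold 0.6 is an arbitrary example of the "preset threshold".

```python
from typing import Callable, Dict, List


def recall(
    original_id: str,
    features: List[str],
    candidates_for: Callable[[str], List[str]],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Sketch of the first-aspect flow; every (original, candidate) pair is one computing unit."""
    # Steps 1-3: for each feature, look up its articles to be recalled and pair
    # each one's ID with the original article's ID.
    pairs = [(original_id, cand_id) for feature in features for cand_id in candidates_for(feature)]
    # Step 4: score each pair independently. Keying the dict by candidate ID also
    # means a candidate reached through several features is kept only once.
    scores: Dict[str, float] = {cand_id: similarity(orig_id, cand_id) for orig_id, cand_id in pairs}
    # Step 5: keep candidates whose similarity exceeds the preset threshold.
    return [cand_id for cand_id, score in scores.items() if score > threshold]


# Hypothetical data: the feature-to-candidate mapping of FIG. 4 and the example
# similarities from step S204, served from lookup tables instead of a real index.
FIG4 = {
    "Feature1": ["ID11", "ID12", "ID13"],
    "Feature2": ["ID21", "ID22"],
    "Feature3": ["ID31", "ID32"],
    "Feature4": ["ID41", "ID42"],
}
SIMILARITY = {"ID11": 0.7, "ID12": 0.4, "ID13": 0.8, "ID21": 0.2, "ID22": 0.64,
              "ID31": 0.31, "ID32": 0.27, "ID41": 0.48, "ID42": 0.72}

targets = recall("M", list(FIG4), FIG4.__getitem__, lambda _orig, cand: SIMILARITY[cand], threshold=0.6)
print(targets)  # ['ID11', 'ID13', 'ID22', 'ID42']
```

Because each pair is scored on its own, the `similarity` calls in step 4 are mutually independent and could be distributed across executors, which is the point of splitting the computing unit.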
Optionally, in a specific implementation of the embodiment of the present invention, before generating an article identification information pair for each article to be recalled from the identification information of the original article and the identification information of that article to be recalled, the method further includes:
computing a first number corresponding to each feature as the product of the weight (proportion) of the feature among all relevant features of the original article and the total number of articles to be recalled corresponding to the feature;
and generating an article identification information pair for each article to be recalled from the identification information of the original article and the identification information of that article to be recalled includes:
for each feature, sorting the articles to be recalled in descending order of their similarity to the feature to obtain a first sequence;
and, for each first sequence, pairing the identification information of each of the top-ranked articles to be recalled in the first sequence, as many as the first number, with the identification information of the original article, generating the first number of article identification information pairs.
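The first-sequence pairing can be sketched as follows. The function name `pairs_from_first_sequences` and the feature-to-candidate similarity values are hypothetical (the text gives none); the per-feature first numbers are treated as given inputs here, since computing them is the subject of the formula described next in the text.

```python
from typing import Dict, List, Tuple


def pairs_from_first_sequences(
    original_id: str,
    feature_scores: Dict[str, Dict[str, float]],
    first_number: Dict[str, int],
) -> List[Tuple[str, str]]:
    """For each feature, pair the original article with its top first_number candidates."""
    pairs: List[Tuple[str, str]] = []
    for feature, scores in feature_scores.items():
        # First sequence: candidates ordered by similarity to the feature, largest first.
        first_sequence = sorted(scores, key=scores.get, reverse=True)
        # Only the first first_number[feature] positions become pairs.
        for cand_id in first_sequence[: first_number[feature]]:
            pairs.append((original_id, cand_id))
    return pairs


# Hypothetical feature-to-candidate similarities.
feature_scores = {
    "Feature1": {"ID11": 0.90, "ID12": 0.95, "ID13": 0.60},
    "Feature2": {"ID21": 0.80, "ID22": 0.85},
}
pairs = pairs_from_first_sequences("M", feature_scores, {"Feature1": 2, "Feature2": 1})
print(pairs)  # [('M', 'ID12'), ('M', 'ID11'), ('M', 'ID22')]
```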
Optionally, in a specific implementation of the embodiment of the present invention, computing the first number corresponding to the feature as the product of the weight of the feature among all relevant features of the original article and the total number of articles to be recalled corresponding to the feature includes:
computing the first number corresponding to the feature by the formula
$rec\_len = score \times fea\_len$,
where rec_len is the first number corresponding to the feature, and fea_len is the total number of articles to be recalled corresponding to the feature;
[formula image defining score in terms of score_arg and avg_score; not reproduced]
and score is the weight of the feature among all relevant features of the original article, score_arg is the preset score corresponding to the feature, and avg_score is the average of the preset scores corresponding to all relevant features of the original article.
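Because the formula images are not reproduced, the following is only one hypothetical reading of the calculation. The product rec_len = score × fea_len follows the "product" wording of the text; the ratio score = score_arg / avg_score, rounding down, and capping rec_len at fea_len are assumptions of this sketch, and `first_numbers` is a hypothetical name.

```python
import math
from typing import Dict


def first_numbers(preset_scores: Dict[str, float], fea_len: Dict[str, int]) -> Dict[str, int]:
    """rec_len = score * fea_len per feature, with an ASSUMED form for score."""
    # avg_score: mean of the preset scores of all relevant features.
    avg_score = sum(preset_scores.values()) / len(preset_scores)
    rec_len: Dict[str, int] = {}
    for feature, score_arg in preset_scores.items():
        # ASSUMPTION: weight as the ratio of the feature's preset score to the average.
        score = score_arg / avg_score
        # ASSUMPTION: round down, and never exceed the number of candidates available.
        rec_len[feature] = min(math.floor(score * fea_len[feature]), fea_len[feature])
    return rec_len


print(first_numbers({"Feature1": 3.0, "Feature2": 1.0}, {"Feature1": 10, "Feature2": 10}))
# {'Feature1': 10, 'Feature2': 5}
```

Under this reading, a feature whose preset score is above average keeps more of its candidates, and one below average keeps fewer.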
Optionally, in a specific implementation of the embodiment of the present invention, before processing each article identification information pair with the similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair, the method further includes:
for each article identification information pair, deleting any other article identification information pair identical to it.
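A minimal sketch of the de-duplication step, assuming pairs are (original ID, candidate ID) tuples; `dedupe_pairs` is a hypothetical helper. Removing duplicates before scoring means an article reached through several features is compared with the original article only once. The example reuses the pair list from step S203 of the Detailed Description, in which M-ID11 appears twice.

```python
from typing import List, Tuple


def dedupe_pairs(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop repeated article identification information pairs, keeping first occurrences."""
    seen = set()
    unique: List[Tuple[str, str]] = []
    for pair in pairs:
        if pair not in seen:  # skip pairs identical to one already kept
            seen.add(pair)
            unique.append(pair)
    return unique


ids = ["ID11", "ID12", "ID11", "ID13", "ID21", "ID22", "ID31", "ID32", "ID41", "ID42"]
unique = dedupe_pairs([("M", cand) for cand in ids])
print(len(unique))  # 9
```

`list(dict.fromkeys(pairs))` is an equivalent one-line idiom, since dictionaries preserve insertion order in Python 3.7 and later.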
In a second aspect, to achieve the above object, an embodiment of the present invention provides another article recall method, including:
acquiring features of an original article;
for each feature, acquiring the articles to be recalled that correspond to the feature according to the similarity between each article to be recalled and the feature;
for each article to be recalled, generating an article identification information pair from the identification information of the original article and the identification information of that article to be recalled;
processing each article identification information pair with a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair;
sorting the articles to be recalled in the article identification information pairs in descending order of similarity to obtain a second sequence;
and taking the top-ranked articles to be recalled in the second sequence, as many as a second number, as target recall articles.
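The second aspect replaces the threshold with a fixed count. A hypothetical sketch follows, using the example similarities from step S204 of the Detailed Description and an arbitrary second number of 3; `top_n_recall` is an illustrative name.

```python
from typing import Dict, List


def top_n_recall(similarities: Dict[str, float], second_number: int) -> List[str]:
    """Second aspect: rank candidates by similarity (largest first) and keep the top ones."""
    # Second sequence: candidate IDs sorted by similarity to the original article.
    second_sequence = sorted(similarities, key=similarities.get, reverse=True)
    return second_sequence[:second_number]


similarities = {"ID11": 0.7, "ID12": 0.4, "ID13": 0.8, "ID21": 0.2, "ID22": 0.64,
                "ID31": 0.31, "ID32": 0.27, "ID41": 0.48, "ID42": 0.72}
print(top_n_recall(similarities, 3))  # ['ID13', 'ID42', 'ID11']
```

For large candidate sets, `heapq.nlargest(second_number, similarities, key=similarities.get)` returns the same top entries without sorting the whole sequence.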
In a third aspect, to achieve the above object, an embodiment of the present invention provides an article recall apparatus, including a first obtaining module, a second obtaining module, a generating module, a first processing module, and a first setting module, where
the first obtaining module is configured to acquire the features of the original article;
the second obtaining module is configured to, for each feature, obtain the articles to be recalled corresponding to the feature according to the similarity between each article to be recalled and the feature;
the generating module is configured to, for each article to be recalled, generate an article identification information pair from the identification information of the original article and the identification information of that article to be recalled;
the first processing module is configured to process each article identification information pair with a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair;
and the first setting module is configured to take each article to be recalled whose similarity to the original article is greater than a preset threshold as a target recall article.
Optionally, in a specific implementation of the embodiment of the present invention, the apparatus further includes a second processing module, configured to compute the first number corresponding to each feature as the product of the weight of the feature among all relevant features of the original article and the total number of articles to be recalled corresponding to the feature;
the generating module includes a sorting unit and a generating unit, where
the sorting unit is configured to, for each feature, sort the articles to be recalled in descending order of their similarity to the feature to obtain a first sequence;
and the generating unit is configured to, for each first sequence, pair the identification information of each of the top-ranked articles to be recalled in the first sequence, as many as the first number, with the identification information of the original article, generating the first number of article identification information pairs.
Optionally, in a specific implementation of the embodiment of the present invention, the second processing module is further configured to:
compute the first number corresponding to the feature by the formula
$rec\_len = score \times fea\_len$,
where rec_len is the first number corresponding to the feature, and fea_len is the total number of articles to be recalled corresponding to the feature;
[formula image defining score in terms of score_arg and avg_score; not reproduced]
and score is the weight of the feature among all relevant features of the original article, score_arg is the preset score corresponding to the feature, and avg_score is the average of the preset scores corresponding to all relevant features of the original article.
Optionally, in a specific implementation of the embodiment of the present invention, the apparatus further includes a deleting module, configured to, for each article identification information pair, delete any other article identification information pair identical to it.
In a fourth aspect, to achieve the above object, an embodiment of the present invention further provides an article recall apparatus, including a first obtaining module, a second obtaining module, a generating module, a first processing module, a sorting module, and a second setting module, where
the first obtaining module is configured to acquire the features of the original article;
the second obtaining module is configured to, for each feature, obtain the articles to be recalled corresponding to the feature according to the similarity between each article to be recalled and the feature;
the generating module is configured to, for each article to be recalled, generate an article identification information pair from the identification information of the original article and the identification information of that article to be recalled;
the first processing module is configured to process each article identification information pair with a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair;
the sorting module is configured to sort the articles to be recalled in the article identification information pairs in descending order of similarity to obtain a second sequence;
and the second setting module is configured to take the top-ranked articles to be recalled in the second sequence, as many as a second number, as target recall articles.
In a fifth aspect, to achieve the above object, an embodiment of the present invention further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement the steps of the article recall method according to the first aspect when executing the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which when executed on a computer, cause the computer to perform any of the above-described article recall methods.
In yet another aspect, the present invention further provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the steps of the article recall method according to the first aspect.
In a sixth aspect, to achieve the above object, an embodiment of the present invention further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement the steps of the article recall method according to the second aspect when executing the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which when executed on a computer, cause the computer to perform any of the above-described article recall methods.
In yet another aspect, the present invention further provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the steps of the article recall method according to the second aspect.
According to the article recall method, the article recall device, and the electronic equipment provided by the embodiments of the invention, the original article and each article to be recalled form an article identification information pair, and each such pair is used as one computing unit, so the computing unit is split and the prior-art problem of an overly large computing unit is solved. Of course, a product or method implementing the invention does not necessarily need to achieve all of the above advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a schematic diagram of an application scenario according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first article recall method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a first article recall method according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the correspondence between each feature of an original article and the articles to be recalled according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating a second article recall method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating a second article recall method according to an embodiment of the present invention;
FIG. 7 is a flowchart illustrating a third article recall method according to an embodiment of the present invention;
FIG. 8 is a flowchart illustrating a fourth article recall method according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a first article recall device according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a second article recall device according to an embodiment of the present invention;
FIG. 11 is a schematic structural view of a third article recall device according to an embodiment of the present invention;
FIG. 12 is a schematic structural view of a fourth article recall device according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 14 is a schematic structural diagram of another electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In the prior art, for each feature of an original article, a list of articles to be recalled corresponding to the feature is formed according to the similarity between each article to be recalled and the feature, and an original article/to-be-recalled article list pair is then generated. For example, for Feature1 of original article A, the list of articles to be recalled is the articles to be recalled ID11, ID12, and ID13, and the generated pair is: original article A - (ID11, ID12, ID13). Because the original article may have multiple features, multiple such list pairs are generated; each list pair is then sent to an actuator, which calculates the similarity between the original article and each article to be recalled in the list. Because each original article/to-be-recalled article list pair is a single computing unit, the prior art has the technical problem that the computing unit cannot be split.
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present invention. As shown in FIG. 1, an article recall system includes an actuator 11, an actuator 12, and an actuator 13, a recall server 20, and an article information storage server 30. The actuators 11, 12, and 13 calculate the similarity between articles to be recalled and the original article; the recall server 20 generates article identification information pairs; and the article information storage server 30 stores the identification information and feature information of each of a plurality of articles to be recalled. For Feature1 of the original article, the recall server 20 obtains the articles to be recalled ID11, ID12, and ID13 that match Feature1, based on the similarity between Feature1 and the articles to be recalled stored in the article information storage server 30. It then pairs the original article with each of these articles, producing three article identification information pairs, and sends them to the actuators 11, 12, and 13 respectively, which perform the similarity calculations in parallel. For example, the pair original article-ID11 may be sent to actuator 11, the pair original article-ID12 to actuator 12, and the pair original article-ID13 to actuator 13, so that each of the three actuators calculates the similarity between the original article and the article to be recalled in one pair. If the original article has multiple features, the articles to be recalled corresponding to its other features are processed in the same way.
The recall server 20 then takes, according to the similarity results returned by the actuators, each article to be recalled whose similarity to the original article is greater than the preset threshold as a target recall article. In the embodiments of the invention, the original article and each article to be recalled form one article identification information pair, so the similarity calculation task is divided into the smallest computing units. Compared with the prior art, in which the original article and a list of articles to be recalled form one computing unit, this solves the prior-art technical problem of an overly large computing unit.
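The division of labor in FIG. 1 can be imitated in a single process, with a thread pool standing in for actuators 11, 12, and 13. This is an assumption-laden sketch: real actuators would be separate executors or machines, and `score_pairs_in_parallel`, `select_targets`, and the similarity table are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (original article ID, article-to-be-recalled ID)


def score_pairs_in_parallel(
    pairs: List[Pair], similarity: Callable[[str, str], float], actuators: int = 3
) -> Dict[Pair, float]:
    """Each pair is an independent task; a thread pool stands in for the actuators."""
    with ThreadPoolExecutor(max_workers=actuators) as pool:
        # pool.map returns results in the same order as the input pairs.
        scores = list(pool.map(lambda pair: similarity(*pair), pairs))
    return dict(zip(pairs, scores))


def select_targets(scores: Dict[Pair, float], threshold: float) -> List[str]:
    """Recall-server side: keep candidates whose similarity exceeds the threshold."""
    return [cand for (_orig, cand), score in scores.items() if score > threshold]


# Hypothetical similarity lookup for the three Feature1 candidates of FIG. 1.
table = {"ID11": 0.7, "ID12": 0.4, "ID13": 0.8}
scores = score_pairs_in_parallel(
    [("A", "ID11"), ("A", "ID12"), ("A", "ID13")], lambda _orig, cand: table[cand]
)
print(select_targets(scores, threshold=0.6))  # ['ID11', 'ID13']
```

Since no pair depends on another, the pool can hand any pair to any idle worker, which is what lets the load even out across actuators.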
In order to solve the problems in the prior art, embodiments of the present invention provide an article recall method, an article recall device, and an electronic apparatus.
FIG. 2 is a flowchart of a first article recall method according to an embodiment of the present invention, and FIG. 3 is a schematic diagram of the first article recall method according to an embodiment of the present invention. As shown in FIG. 2 and FIG. 3, the method includes:
S201: Acquire features of the original article.
Illustratively, multiple features of the original article may be acquired, such as Feature1, Feature2, Feature3, and Feature4. In general, to improve the accuracy of article recall, the features acquired in this step should together be sufficient to characterize the original article; accordingly, the features referred to in the embodiments of the present invention are those relevant to article recall as needed, and features unrelated to article recall may be excluded. In practice, the number and categories of features of the original article may be adjusted as required; the embodiments of the present invention do not limit them, and any feature-selection method capable of characterizing the original article may be used. For example, the features may be the color and shape of an article, or the title, upload time, author, or hash value of a file, or the key frames of a video file; some or all of these may be selected as features of the original article as required.
In addition, the original article is the article whose information the user inputs into the article recall system; the purpose of inputting this information is to recall target recall articles similar to the original article. Original articles include but are not limited to videos, pictures, and texts.
This step corresponds to the patch (a type of application) data entry section in FIG. 3.
S202: For each feature, acquire the articles to be recalled corresponding to the feature according to the similarity between each article to be recalled and the feature.
For example, the articles to be recalled whose similarity to Feature1 is greater than a set value may be obtained according to the similarity between each article to be recalled and Feature1 of the original article. FIG. 4 is a diagram of the correspondence between each feature of the original article and the articles to be recalled according to an embodiment of the present invention; as shown in FIG. 4, the articles to be recalled that match Feature1 in this step are ID11, ID12, and ID13. An article to be recalled generally has multiple features, so each of its features is compared with Feature1 in turn; if any one of its features has a similarity to Feature1 greater than the set value, the article is regarded as an article to be recalled corresponding to Feature1.
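The "any one feature" matching rule can be sketched as follows. The word-set Jaccard similarity is a stand-in chosen for illustration (the text leaves the feature-similarity method open), and the catalog, feature strings, set value 0.3, and function names are hypothetical.

```python
from typing import Callable, Dict, List


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity; a stand-in for whatever feature matcher is used."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def candidates_for_feature(
    feature: str,
    catalog: Dict[str, List[str]],
    set_value: float,
    feature_similarity: Callable[[str, str], float] = jaccard,
) -> List[str]:
    """Return IDs of articles having at least one feature more similar than set_value."""
    return [
        cand_id
        for cand_id, cand_features in catalog.items()
        # Compare each of the candidate's own features with the original's feature.
        if any(feature_similarity(feature, f) > set_value for f in cand_features)
    ]


# Hypothetical catalog: article ID -> that article's own features.
catalog = {"ID11": ["red round", "large"], "ID12": ["blue square"], "ID13": ["red oval"]}
print(candidates_for_feature("red round", catalog, set_value=0.3))  # ['ID11', 'ID13']
```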
Articles to be recalled corresponding to the other features of the original article, such as Feature2, Feature3, and Feature4, are obtained in the same way.
For example, as shown in FIG. 4, the articles to be recalled matching Feature2 obtained in this step may be ID21 and ID22; those matching Feature3 may be ID31 and ID32; and those matching Feature4 may be ID41 and ID42.
In practice, all articles to be recalled corresponding to Feature1 may be sorted in descending order of similarity to Feature1 to obtain a sequence, and the articles in the first set number of positions of that sequence are taken as the articles to be recalled corresponding to Feature1. Alternatively, with the maximum similarity set to 1 and the minimum set to 0, only the articles in the sequence whose similarity to Feature1 is 1 may be taken as the articles to be recalled corresponding to Feature1, which further reduces their number. In the embodiments of the present invention, each relevant feature of the original article can be understood as a feature of the original article acquired as needed.
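Both trimming options for a feature's candidate sequence, by rank or by a similarity of exactly 1, can be sketched in one hypothetical helper; the scores below are invented for illustration. Python's `sorted` is stable even with `reverse=True`, so candidates with equal similarity keep their original order.

```python
from typing import Dict, List, Optional


def select_for_feature(
    scores: Dict[str, float], set_number: Optional[int] = None, exact_only: bool = False
) -> List[str]:
    """Trim one feature's candidate sequence by rank or by a perfect score."""
    # Sequence of candidates sorted by similarity to the feature, largest first.
    sequence = sorted(scores, key=scores.get, reverse=True)
    if exact_only:
        # Keep only candidates at the maximum similarity value 1.
        sequence = [c for c in sequence if scores[c] == 1.0]
    if set_number is not None:
        # Keep only the first set_number positions.
        sequence = sequence[:set_number]
    return sequence


scores = {"ID11": 1.0, "ID12": 0.9, "ID13": 1.0, "ID14": 0.5}  # hypothetical values
print(select_for_feature(scores, set_number=3))     # ['ID11', 'ID13', 'ID12']
print(select_for_feature(scores, exact_only=True))  # ['ID11', 'ID13']
```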
It can be understood that the similarity to Feature1 may be the similarity or matching degree between a feature of the article to be recalled and Feature1, and that the number of articles to be recalled for Feature1 is not limited to three. In practice, this similarity can be evaluated with a similarity algorithm, for example an existing one; the embodiments of the present invention do not limit the evaluation method, as long as it yields the degree of similarity between the features of the article to be recalled and Feature1.
S203: For each article to be recalled, generate an article identification information pair from the identification information of the original article and the identification information of that article to be recalled.
For example, suppose the identification information of the original article is M. The identification information may be an ID number, an MD5 value, or a hash value; any information that uniquely identifies the original article may be used, and the embodiments of the present invention do not limit it here. The original article is paired with each of the articles to be recalled obtained in step S202, giving the following article identification information pairs (written as original article M-article to be recalled IDxx): M-ID11, M-ID12, M-ID11, M-ID13, M-ID21, M-ID22, M-ID31, M-ID32, M-ID41, and M-ID42.
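Since the identification information may be an MD5 value, the pairing step can be sketched with Python's standard `hashlib`. Using the MD5 digest of the original article's raw content as its identification is an assumption of this sketch, as are the helper names `md5_identification` and `make_pairs`; the candidate IDs are kept as plain strings.

```python
import hashlib
from typing import List, Tuple


def md5_identification(content: bytes) -> str:
    """Identification information as the MD5 hex digest of an article's content."""
    return hashlib.md5(content).hexdigest()


def make_pairs(original_content: bytes, candidate_ids: List[str]) -> List[Tuple[str, str]]:
    """Pair the original article's identification with each candidate's identification."""
    original_id = md5_identification(original_content)
    return [(original_id, cand_id) for cand_id in candidate_ids]


print(md5_identification(b""))  # d41d8cd98f00b204e9800998ecf8427e
pairs = make_pairs(b"original article", ["ID11", "ID12", "ID13"])
print(len(pairs))  # 3
```

MD5 is used here only as a compact identifier, as the text allows; it is not suitable where identifiers must resist deliberate collisions.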
S204: and processing each article identification information pair by using a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the article identification information pair.
Illustratively, the similarity between the original item contained in original item M-to-be-recalled item ID11, original item M-to-be-recalled item ID12, original item M-to-be-recalled item ID11, original item M-to-be-recalled item ID13, original item M-to-be-recalled item ID21, original item M-to-recall item ID22, original item M-to-recall item ID31, original item M-to-recall item ID32, original item M-to-recall item ID41, and original item M-to-recall item ID42 and each of the items to be recalled is calculated using a similarity calculation method, respectively. For example, for the original article M — the item to be recalled ID11, the similarity of the original article M and the item to be recalled ID11 may be calculated using a similarity algorithm; for the original article M-the article to be recalled ID12, the similarity of the original article M and the article to be recalled ID12 may be calculated using a similarity algorithm; for the original article M — the article to be recalled ID13, the similarity between the original article M and the article to be recalled ID13 may be calculated by using a similarity algorithm, and similarly, the other pairs of identification information of the original article and the article to be recalled may be processed in the manner described above.
For example, the resulting similarities between the original article and the articles to be recalled are: ID11, 0.7; ID12, 0.4; ID13, 0.8; ID21, 0.2; ID22, 0.64; ID31, 0.31; ID32, 0.27; ID41, 0.48; and ID42, 0.72.
Typically, each article identification information pair is sent to a different executor for the similarity calculation. The executor retrieves the information of the corresponding articles according to the pair it receives, then applies a similarity algorithm to the original article and the article to be recalled in that pair to obtain the similarity value of the pair. Commonly used similarity algorithms include, but are not limited to, the Euclidean distance, the Manhattan distance, the Minkowski distance and cosine similarity.
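The similarity measures named above can be sketched in plain Python as follows. This is an illustrative sketch assuming each article is represented by a numeric feature vector of equal length; the source does not say how vectors are built, and `distance_to_similarity` is a hypothetical helper showing one common way to turn a distance (smaller is closer) into a similarity (larger is closer).

```python
import math
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def minkowski_distance(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    _check(a, b)
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return minkowski_distance(a, b, 1)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return minkowski_distance(a, b, 2)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    _check(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def distance_to_similarity(distance: float) -> float:
    """Assumed mapping from a distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + distance)


print(euclidean_distance([0, 0], [3, 4]), manhattan_distance([0, 0], [3, 4]))  # 5.0 7.0
```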
In the prior art, by contrast, the several original article-article to be recalled list pairs of an original article are sent to different executors. Because the list of articles to be recalled for some feature of the original article may contain a large number of articles, the executor handling the list pair for that feature takes a long time, which causes data skew. With the embodiment of the present invention, the original article and each article to be recalled form an article identification information pair, and the pairs are sent to different executors so that article similarities are computed in parallel. Each executor computes only one article identification information pair, so the computation times of the executors are close to one another. This avoids data skew, and with it both the slowdown of an individual executor and the failure of the article recall task that such a slowdown can cause.
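The scatter of pairs to executors can be sketched locally as below. This is only an assumption-laden stand-in: a `concurrent.futures.ThreadPoolExecutor` plays the role of the distributed executors, the feature vectors in `VECTORS` are invented, and cosine similarity is just one of the algorithms listed above. The point it shows is that every task handles exactly one pair, so task costs are roughly uniform.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical feature vectors, looked up by identification information.
VECTORS: Dict[str, List[float]] = {
    "M": [1.0, 0.0, 1.0],
    "ID11": [1.0, 0.2, 0.9],
    "ID12": [0.0, 1.0, 0.0],
    "ID21": [0.5, 0.5, 0.5],
}


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_pair(pair: Tuple[str, str]) -> Tuple[str, str, float]:
    """One task handles exactly one identification pair, so tasks cost about the same."""
    original_id, candidate_id = pair
    return original_id, candidate_id, cosine(VECTORS[original_id], VECTORS[candidate_id])


def scatter_pairs(pairs: List[Tuple[str, str]], workers: int = 4) -> List[Tuple[str, str, float]]:
    """Fan the pairs out to a worker pool; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_pair, pairs))


results = scatter_pairs([("M", "ID11"), ("M", "ID12"), ("M", "ID21")])
for original_id, candidate_id, score in results:
    print(original_id, candidate_id, round(score, 3))
```

In a real cluster the pool would be replaced by the framework's own task scheduling; the pairs carry no dependencies on each other, so no data needs to flow between tasks.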
This step corresponds to the data scatter part of fig. 3.
S205: Take the articles to be recalled whose similarity to the original article is greater than a preset threshold as target recall articles.
Illustratively, with a preset threshold of 0.5, the target recall articles are those among the articles to be recalled ID11, ID12, ID13, ID21, ID22, ID31, ID32, ID41 and ID42 whose similarity to the original article exceeds 0.5. Step S204 gave similarities of 0.7 for ID11, 0.4 for ID12, 0.8 for ID13, 0.2 for ID21, 0.64 for ID22, 0.31 for ID31, 0.27 for ID32, 0.48 for ID41 and 0.72 for ID42. Only ID11, ID13, ID22 and ID42 exceed 0.5 (ID32, at 0.27, does not), so the articles to be recalled ID11, ID13, ID22 and ID42 are the target recall articles.
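The threshold selection of step S205 reduces to a filter over the computed similarities. A minimal sketch using the example values from step S204; the dictionary layout and the function name `select_by_threshold` are assumptions for illustration.

```python
from typing import Dict, List

# Example similarities from step S204 (article to be recalled ID -> similarity to M).
SIMILARITIES: Dict[str, float] = {
    "ID11": 0.7, "ID12": 0.4, "ID13": 0.8, "ID21": 0.2, "ID22": 0.64,
    "ID31": 0.31, "ID32": 0.27, "ID41": 0.48, "ID42": 0.72,
}


def select_by_threshold(similarities: Dict[str, float], threshold: float) -> List[str]:
    """Keep candidates whose similarity is strictly greater than the preset threshold."""
    return [cid for cid, score in similarities.items() if score > threshold]


print(select_by_threshold(SIMILARITIES, 0.5))  # ['ID11', 'ID13', 'ID22', 'ID42']
```

The comparison is strict (`>`), matching "greater than a preset threshold"; a candidate exactly at the threshold would be dropped.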
With the embodiment of the present invention, articles to be recalled that have low similarity to the original article are removed, which reduces the number of articles to be recalled that must be processed further and thus the amount of computation in the article recall process.
In the prior art, an original article-article to be recalled list pair is generally generated for each relevant feature of the original article. Since the original article may have multiple features, multiple list pairs are generated; each list pair is sent to an executor, which calculates the similarity between the original article and each article to be recalled in the list. The list pair is a single computing unit and cannot be split. The embodiment of the present invention instead generates an article identification information pair from the original article and each article to be recalled, and then calculates article similarity pair by pair. Because a list of articles to be recalled generally contains far more than one article, the computing unit generated by the embodiment of the present invention, a single pair, is the minimum computing unit for similarity calculation; this smaller computing unit can be computed faster. In addition, because the smallest computing unit in the article recall computation is the article identification information pair, computing units can be assigned even to executors with little available computing resource, so that the computing resources of the executors are fully utilized.
This step corresponds to the recalled article collecting section of fig. 3.
By applying the embodiment shown in fig. 2 of the present invention, an article identification information pair is generated from an original article and each article to be recalled, and one article identification information pair is used as a computing unit.
In addition, compared with prior-art article recall that uses rectangular parallel computation, taking the article identification information pair as the smallest computing unit involves no rectangular parallel operation. In a rectangular parallel operation, each relevant feature of the original article and the articles to be recalled corresponding to that feature are assembled into a matrix, the matrix is partitioned into blocks, and each block is assigned to an executor. Because the matrix blocks are interdependent, a large volume of data must be communicated between executors during the computation, which is known as data expansion. With the embodiment of the present invention, no data is communicated between article identification information pairs when they are computed, so the data expansion problem of rectangular parallel computation is avoided.
In order to solve the problem of the prior art, the embodiment of the invention further provides a second article recalling method.
Fig. 5 is a schematic flowchart of a second article recall method according to an embodiment of the present invention, and fig. 6 is a schematic diagram of the second article recall method. As shown in fig. 5 and fig. 6, on the basis of the embodiment shown in fig. 2, the method adds step S206 before step S203: obtain a first number corresponding to each feature as the product of the weight of the feature among all relevant features of the original article and the total number of articles to be recalled corresponding to the feature.
In practical applications, the original article has multiple features, and these features may differ in importance, so the weight of each relevant feature among all relevant features of the original article may differ. Generally, if Feature1 has a large weight among all relevant features of the original article, Feature1 is considered to identify the original article well, that is, recalling articles according to Feature1 is more accurate.
Illustratively, for each of the features Feature1, Feature2, Feature3 and Feature4 of the original article, the product of the weight of the feature among all relevant features of the original article and the total number of articles to be recalled corresponding to the feature is taken as the first number corresponding to the feature. Taking Feature1 as an example, if the weight of Feature1 among all relevant features of the original article is 0.45 and Feature1 corresponds to 7 articles to be recalled, the product is 0.45 × 7 = 3.15. Rounding 3.15 down gives a first number of 3 for Feature1; in practice it may instead be rounded up, giving 4, or rounded to the nearest integer. Feature2, Feature3 and Feature4 of the original article are processed in the same way according to step S206 to obtain their first numbers.
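The first-number calculation of step S206, with the three rounding choices the text mentions, can be sketched as follows. The function name and the `mode` strings are hypothetical; "half_up" is assumed to mean conventional rounding to the nearest integer with .5 rounded up.

```python
import math


def first_number(weight: float, total: int, mode: str = "floor") -> int:
    """First number for one feature: weight * total, converted to an integer.

    mode selects the rounding rule: "floor" (round down), "ceil" (round up)
    or "half_up" (round to nearest, .5 goes up).
    """
    raw = weight * total
    if mode == "floor":
        return math.floor(raw)
    if mode == "ceil":
        return math.ceil(raw)
    if mode == "half_up":
        return math.floor(raw + 0.5)
    raise ValueError(f"unknown rounding mode: {mode}")


# Feature1 example: weight 0.45, 7 articles to be recalled, product 3.15.
print(first_number(0.45, 7), first_number(0.45, 7, "ceil"), first_number(0.45, 7, "half_up"))  # 3 4 3
```

With floating-point weights, a product that is mathematically an integer can land a hair above it, which would make `ceil` overshoot by one; rounding `raw` to a few decimal places first is one possible safeguard.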
It can be understood that the weight of each feature among all relevant features of the original article may be preset by a user or calculated by the article recall system with a deep learning algorithm. In practice, the original article may have many features, some of which cannot effectively characterize it, for example because a feature is shared by many articles to be recalled. Accordingly, "all relevant features" may be understood as a subset of the features that can characterize the original article, as all features that can characterize the original article, or as all features of the original article. Which features are treated as the relevant features can be adjusted to actual conditions when the embodiment of the present invention is applied.
This step corresponds to the dynamic filtering mechanism portion of fig. 3.
Correspondingly, S203 includes S203A and S203B, where:
S203A: For each feature, sort the articles to be recalled in descending order of their similarity to the feature to obtain a first sequence.
Illustratively, taking Feature1 as an example, the 7 articles to be recalled corresponding to Feature1 are sorted in descending order of their similarity to Feature1 to obtain the first sequence corresponding to Feature1.
Feature2, Feature3 and Feature4 of the original article are processed in the same way according to step S203A, yielding a first sequence for each of them.
S203B: For each first sequence, pair the identification information of each of the top first-number articles to be recalled in the sequence with the identification information of the original article, generating the first number of article identification information pairs.
Illustratively, taking Feature1 as an example, if the first number is 3, the top 3 of the 7 articles to be recalled in the first sequence for Feature1 are selected, and the identification information of each is paired with that of the original article to obtain 3 article identification information pairs.
Feature2, Feature3 and Feature4 of the original article are processed in the same way according to step S203B, yielding the first number of article identification information pairs for each of them.
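Steps S203A and S203B together amount to "sort by similarity to the feature, keep the top k, pair with the original". A sketch under the assumption that each feature's candidates arrive as (ID, feature-similarity) tuples; the candidate IDs and scores below are invented for illustration, and `top_pairs_per_feature` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Hypothetical (candidate ID, similarity to the feature) lists per feature.
FEATURE_CANDIDATES: Dict[str, List[Tuple[str, float]]] = {
    "Feature1": [("A", 0.2), ("B", 0.9), ("C", 0.5), ("D", 0.7),
                 ("E", 0.1), ("F", 0.3), ("G", 0.6)],
}


def top_pairs_per_feature(
    original_id: str,
    feature_candidates: Dict[str, List[Tuple[str, float]]],
    first_numbers: Dict[str, int],
) -> List[Tuple[str, str]]:
    pairs: List[Tuple[str, str]] = []
    for feature, scored in feature_candidates.items():
        # S203A: first sequence = candidates sorted by similarity to the feature, descending.
        first_sequence = sorted(scored, key=lambda item: item[1], reverse=True)
        # S203B: pair only the top first-number candidates with the original article.
        for candidate_id, _ in first_sequence[: first_numbers[feature]]:
            pairs.append((original_id, candidate_id))
    return pairs


print(top_pairs_per_feature("M", FEATURE_CANDIDATES, {"Feature1": 3}))
# [('M', 'B'), ('M', 'D'), ('M', 'G')]
```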
The embodiment of the present invention is particularly suitable when there are many articles to be recalled: only the first number of article identification information pairs is kept for each feature, which reduces the amount of similarity computation, shortens the article recall process and lightens the workload of the article recall system.
As shown in fig. 6, the figure is divided from top to bottom into three parts: the part above the first arrow is the first part, the part between the first and second arrows is the second part, and the rest is the third part. The first part shows the recall sequence of each original article when the embodiment of the present invention is applied to article recall for multiple original articles: for example, the first original article id1 has the recall sequence id1_recall(), the second original article id2 has the recall sequence id2_recall(), and the nth original article idn has the recall sequence idn_recall(). The second part shows the recall sequence of each original article after the first number of articles to be recalled has been obtained for each feature of each original article with the embodiment shown in fig. 5: for example, id1 has the recall sequence id1_1_recall(), id2 has the recall sequence id2_2_recall(), and idn has the recall sequence idn_n_recall(). The third part illustrates the generated article identification information pairs, taking the first original article id1 as an example. The articles to be recalled corresponding to id1 are idi, ..., idj, idk, ..., idf, and the generated article identification information pairs are (id1, idi), (id1, idj), (id1, idk), (id1, idf). Note that (id1, idi) is only one way of writing an article identification information pair; it may also be written as id1-idi, and the embodiment of the present invention does not limit the notation.
With the embodiment shown in fig. 5 of the present invention, only the identification information of the first number of articles to be recalled corresponding to each feature is paired with the identification information of the original article, which reduces the number of article identification information pairs corresponding to each feature.
Optionally, in a specific implementation manner of the embodiment of the present invention, S206 may be S206A:
by means of formula (I),

$$rec\_len = fea\_len \times score \qquad \text{(I)}$$

a first number corresponding to the feature is obtained, where rec_len is the first number corresponding to the feature and fea_len is the total number of articles to be recalled corresponding to the feature; score is given by formula (II):
Figure BDA0001412443660000162
where score is the weight of the feature among all relevant features of the original article, score_arg is the preset score corresponding to the feature, and avg_score is the average of the preset scores corresponding to all relevant features of the original article.
For example, taking Feature1 as an example, if the preset score of Feature1 is 45, the preset score corresponding to all relevant features of the original article is 100, and Feature1 corresponds to 7 articles to be recalled, the first number is 7 × (45/100) = 3.15.
3.15 may be rounded down, giving a first number of 3 for Feature1; in practice it may instead be rounded up, giving 4, or rounded to the nearest integer. Feature2, Feature3 and Feature4 of the original article are processed in the same way according to step S206 to obtain their first numbers.
With the embodiment shown in fig. 5 of the present invention, the first number for each feature of the original article is obtained as the product of the weight of the feature among all relevant features of the original article and the total number of articles to be recalled corresponding to the feature, so the first number for each relevant feature can be calculated conveniently.
In order to solve the problems in the prior art, the embodiment of the invention also provides a third article recalling method.
Fig. 7 is a schematic flowchart of a third article recall method according to an embodiment of the present invention. As shown in fig. 7, on the basis of the embodiment shown in fig. 2, step S207 is added before step S204: for each article identification information pair, delete any other article identification information pair identical to it.
Illustratively, the articles to be recalled corresponding to the different relevant features of the original article may overlap, that is, different features may correspond to the same article to be recalled. For example, the article identification information pairs obtained for Feature1 may be original article M-article to be recalled ID11, original article M-article to be recalled ID12 and original article M-article to be recalled ID13, and those obtained for Feature2 may be original article M-article to be recalled ID13 and original article M-article to be recalled ID22. These pairs include "original article M-article to be recalled ID13" twice, so, to reduce the amount of computation, any pair identical to another article identification information pair of the original article is deleted. For example, the pair "original article M-article to be recalled ID13" obtained for Feature2, which duplicates the one obtained for Feature1, may be deleted.
In practice, the pair "original article M-article to be recalled ID13" obtained for Feature1 may be deleted instead.
With the embodiment shown in fig. 7 of the present invention, duplicate article identification information pairs of the original article are deleted, which avoids computing the same pair repeatedly and reduces the number of pairs, and thus the amount of similarity computation.
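The deduplication of step S207 can be sketched with an order-preserving dictionary. This keeps the first occurrence of each pair, which corresponds to deleting the Feature2 copy in the example above; keeping the last occurrence instead (the alternative mentioned in the text) would need a different approach. The function name is hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[str, str]


def dedupe_pairs(pairs: List[Pair]) -> List[Pair]:
    """Drop repeated pairs, keeping the first occurrence and the original order.

    dict.fromkeys keeps insertion order (Python 3.7+), so a pair that appears
    first under Feature1 survives and its later copy under Feature2 is removed.
    """
    return list(dict.fromkeys(pairs))


feature1 = [("M", "ID11"), ("M", "ID12"), ("M", "ID13")]
feature2 = [("M", "ID13"), ("M", "ID22")]
print(dedupe_pairs(feature1 + feature2))
# [('M', 'ID11'), ('M', 'ID12'), ('M', 'ID13'), ('M', 'ID22')]
```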
In order to solve the problems in the prior art, the embodiment of the invention also provides a fourth article recalling method.
Fig. 8 is a schematic flow chart of a fourth article recall method according to an embodiment of the present invention, as shown in fig. 8, the method includes:
S801: Obtain the features of the original article.
S802: For each feature, obtain the articles to be recalled corresponding to the feature according to the similarity between the articles to be recalled and the feature.
S803: For each article to be recalled, generate an article identification information pair from the identification information of the original article and the identification information of that article to be recalled.
S804: Process each article identification information pair with a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair.
It is emphasized that steps S801, S802, S803 and S804 in the embodiment of fig. 8 of the present invention are the same as the methods performed by steps S201, S202, S203 and S204 in the embodiment of fig. 2, respectively. Therefore, all the embodiments in fig. 2 are applicable to fig. 8, and can achieve the same or similar beneficial effects, and are not described herein again.
S805: Sort the articles to be recalled contained in the article identification information pairs in descending order of similarity to obtain a second sequence.
Illustratively, the articles to be recalled ID11, ID12, ID13, ID21, ID22, ID31, ID32, ID41 and ID42 are sorted in descending order of their similarity to the original article to obtain the second sequence corresponding to the original article. With the similarities from the example given for step S204, the second sequence is:
article to be recalled ID13, article to be recalled ID42, article to be recalled ID11, article to be recalled ID22, article to be recalled ID41, article to be recalled ID12, article to be recalled ID31, article to be recalled ID32, article to be recalled ID21.
S806: Take the articles to be recalled in the top second-number positions of the second sequence as target recall articles.
Illustratively, if the second number is 5, the top 5 articles to be recalled in the second sequence obtained in step S805 are:
article to be recalled ID13, article to be recalled ID42, article to be recalled ID11, article to be recalled ID22, article to be recalled ID41.
These five articles to be recalled are the target recall articles.
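Steps S805 and S806 (rank by similarity, keep the top second number) can be sketched with `heapq.nlargest`, which avoids fully sorting long candidate lists. The input reuses the example similarities given for step S204; the function name is a hypothetical choice.

```python
import heapq
from typing import Dict, List

# Example similarities from step S204 (article to be recalled ID -> similarity to M).
SIMILARITIES: Dict[str, float] = {
    "ID11": 0.7, "ID12": 0.4, "ID13": 0.8, "ID21": 0.2, "ID22": 0.64,
    "ID31": 0.31, "ID32": 0.27, "ID41": 0.48, "ID42": 0.72,
}


def top_second_number(similarities: Dict[str, float], second_number: int) -> List[str]:
    """S805 + S806: rank by similarity (descending) and keep the first second_number IDs."""
    best = heapq.nlargest(second_number, similarities.items(), key=lambda kv: kv[1])
    return [cid for cid, _ in best]


print(top_second_number(SIMILARITIES, 5))  # ['ID13', 'ID42', 'ID11', 'ID22', 'ID41']
```

Unlike the threshold rule of step S205, this always returns exactly `second_number` articles (or all of them if there are fewer).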
With the embodiment of the present invention, articles to be recalled with low similarity to the original article are removed.
In the prior art, an original article-article to be recalled list pair is generally generated for each relevant feature of the original article. Since the original article may have multiple features, multiple list pairs are generated; each list pair is sent to an executor, which calculates the similarity between the original article and each article to be recalled in the list. The list pair is a single computing unit and cannot be split. The embodiment of the present invention instead generates an article identification information pair from the original article and each article to be recalled, and then calculates article similarity pair by pair. Because a list of articles to be recalled generally contains far more than one article, the computing unit generated by the embodiment of the present invention, a single pair, is the minimum computing unit for similarity calculation; this smaller computing unit can be computed faster.
This step corresponds to the recalled article collecting section of fig. 3.
By applying the embodiment shown in fig. 8 of the present invention, an article identification information pair is generated from an original article and each article to be recalled, and one article identification information pair is used as a computing unit.
Corresponding to the embodiment of the invention shown in fig. 2, the embodiment of the invention also provides a first article recalling device.
Fig. 9 is a schematic structural diagram of a first article recall device according to an embodiment of the present invention, and as shown in fig. 9, the device includes: a first obtaining module 901, a second obtaining module 902, a generating module 903, a first processing module 904, and a first setting module 905, wherein,
the first obtaining module 901 is configured to obtain characteristics of an original article;
the second obtaining module 902 is configured to, for each feature, obtain an article to be recalled corresponding to the feature according to a similarity between the article to be recalled and the feature;
the generating module 903 is configured to, for each article to be recalled, generate an article identification information pair from the identification information of the original article and the identification information of that article to be recalled;
the first processing module 904 is configured to, for each pair of article identification information, perform processing by using a similarity algorithm to obtain a similarity between the original article and the article to be recalled that are included in the pair of article identification information;
the first setting module 905 is configured to use an article to be recalled, of which the similarity to the original article is greater than a preset threshold, as a target recall article.
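How modules 901 to 905 fit together can be illustrated with a toy end-to-end pipeline. Everything concrete here is an assumption: articles are represented by invented tag sets, a feature "matches" a candidate if the candidate carries the tag (a binary stand-in for feature similarity in module 902), and Jaccard similarity is swapped in for the similarity algorithm of module 904. It is a sketch of the data flow, not the device's implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical catalogue: article ID -> set of feature tags.
CATALOGUE: Dict[str, Set[str]] = {
    "M": {"comedy", "2017", "actorA"},
    "ID1": {"comedy", "2017", "actorA", "romance"},
    "ID2": {"comedy", "horror"},
    "ID3": {"2017", "actorA"},
    "ID4": {"documentary"},
}


def get_features(article_id: str) -> Set[str]:
    """Module 901 stand-in: features of the original article (here, its tags)."""
    return CATALOGUE[article_id]


def recall_for_feature(feature: str, original_id: str) -> List[str]:
    """Module 902 stand-in: candidates carrying the feature tag."""
    return [aid for aid, tags in CATALOGUE.items() if aid != original_id and feature in tags]


def generate_pairs(original_id: str, candidates: List[str]) -> List[Tuple[str, str]]:
    """Module 903 stand-in: one identification pair per article to be recalled."""
    return [(original_id, cid) for cid in candidates]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Module 904 stand-in: Jaccard similarity of two tag sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recall(original_id: str, threshold: float) -> List[str]:
    pairs: List[Tuple[str, str]] = []
    for feature in sorted(get_features(original_id)):
        pairs += generate_pairs(original_id, recall_for_feature(feature, original_id))
    # Scoring into a dict also collapses repeated pairs (compare step S207).
    scores = {cid: jaccard(CATALOGUE[oid], CATALOGUE[cid]) for oid, cid in pairs}
    # Module 905 stand-in: keep candidates above the preset threshold.
    return sorted(cid for cid, s in scores.items() if s > threshold)


print(recall("M", 0.5))  # ['ID1', 'ID3']
```

Here ID2 is recalled through the shared "comedy" tag but scores only 0.25, so the threshold step filters it out.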
By applying the embodiment shown in fig. 9 of the present invention, an article identification information pair is generated from an original article and each article to be recalled, and one article identification information pair is used as a computing unit.
Corresponding to the embodiment of the invention shown in fig. 5, the invention also provides a second article recalling device.
Fig. 10 is a schematic structural diagram of a second article recall device according to an embodiment of the present invention. As shown in fig. 10, a second processing module 906 is added to the device shown in fig. 9; it is configured to obtain a first number corresponding to each feature as the product of the weight of the feature among all relevant features of the original article and the total number of articles to be recalled corresponding to the feature.
Correspondingly, the generating module 903 includes: a sorting unit 903A and a generating unit 903B, wherein,
the sorting unit 903A (not shown in the figure) is configured to, for each feature, sort the articles to be recalled in an order from a large similarity to a small similarity between the articles to be recalled and the feature, so as to obtain a first sequence;
the generating unit 903B (not shown in the figure) is configured to, for each first sequence, pair the identification information of each of the top first-number articles to be recalled in the sequence with the identification information of the original article, generating the first number of article identification information pairs.
With the embodiment shown in fig. 10 of the present invention, only the identification information of the first number of articles to be recalled corresponding to each feature is paired with the identification information of the original article, which reduces the number of article identification information pairs corresponding to each feature.
Optionally, in a specific implementation manner of the embodiment of the present invention, the second processing module 906 is further configured to:
by means of formula (I),

$$rec\_len = fea\_len \times score \qquad \text{(I)}$$

obtain a first number corresponding to the feature, where rec_len is the first number corresponding to the feature and fea_len is the total number of articles to be recalled corresponding to the feature; score is given by formula (II):
Figure BDA0001412443660000202
where score is the weight of the feature among all relevant features of the original article, score_arg is the preset score corresponding to the feature, and avg_score is the average of the preset scores corresponding to all relevant features of the original article.
With the embodiment shown in fig. 10 of the present invention, the first number for each feature of the original article is obtained as the product of the weight of the feature among all relevant features of the original article and the total number of articles to be recalled corresponding to the feature, so the first number for each relevant feature can be calculated conveniently.
The invention also provides a third article recalling device corresponding to the embodiment of the invention shown in the figure 7.
Fig. 11 is a schematic structural diagram of a third article recall device according to an embodiment of the present invention. As shown in fig. 11, a deletion module 907 is added to the device on the basis of the embodiment shown in fig. 9; it is configured to delete, for each article identification information pair, any other article identification information pair identical to it.
With the embodiment shown in fig. 11 of the present invention, duplicate article identification information pairs of the original article are deleted, which avoids computing the same pair repeatedly and reduces the number of pairs, and thus the amount of similarity computation.
Corresponding to the embodiment of the invention shown in fig. 8, the invention also provides a fourth article recalling device.
Fig. 12 is a schematic structural view of a fourth article recall device according to an embodiment of the present invention, and as shown in fig. 12, the device includes: a first obtaining module 1201, a second obtaining module 1202, a generating module 1203, a first processing module 1204, a sorting module 1205, and a second setting module 1206, wherein,
the first obtaining module 1201 is configured to obtain characteristics of an original article;
the second obtaining module 1202 is configured to, for each feature, obtain an article to be recalled corresponding to the feature according to the similarity between the article to be recalled and the feature;
the generating module 1203 is configured to, for each article to be recalled, generate an article identification information pair from the identification information of the original article and the identification information of that article to be recalled;
the first processing module 1204 is configured to, for each pair of article identification information, perform processing by using a similarity algorithm to obtain a similarity between the original article and the article to be recalled included in the pair of article identification information;
the sorting module 1205 is configured to sort the to-be-recalled articles contained in the article identification information pairs according to a descending order of the similarity, so as to obtain a second sequence;
the second setting module 1206 is configured to take the to-be-recalled articles in the second sequence corresponding to the first second number of orders as target recall articles.
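The work of the sorting module 1205 and the second setting module 1206 could be sketched as follows. This is an illustration only: the dictionary of pair scores, the function name `select_top_n` and the parameter `second_number` are assumptions, not names from the patent.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical pair representation: (original article ID, candidate ID).
Pair = Tuple[Hashable, Hashable]


def select_top_n(similarities: Dict[Pair, float], second_number: int) -> List[Hashable]:
    """Sort candidates by similarity (descending) and keep the first `second_number`.

    `similarities` maps each article identification information pair to the
    score returned by the similarity algorithm.
    """
    # Second sequence: pairs ordered from the largest similarity to the smallest.
    second_sequence = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    # Target recall articles: the candidates in the first `second_number` positions.
    return [candidate for (_original, candidate), _score in second_sequence[:second_number]]


if __name__ == "__main__":
    scores = {("A", "x"): 0.2, ("A", "y"): 0.9, ("A", "z"): 0.5}
    print(select_top_n(scores, 2))  # ['y', 'z']
```

Because Python's `sorted` is stable, candidates with equal similarity keep the order in which their pairs were inserted; the patent does not specify a tie-breaking rule.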
By applying the embodiment shown in fig. 12 of the present invention, an article identification information pair is generated from the original article and each article to be recalled, and each article identification information pair serves as one unit of computation.
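Because each article identification information pair is an independent unit of computation, pairs can be scored without coordinating with one another. The sketch below is hypothetical: it assumes a caller-supplied `similarity(original_id, candidate_id)` function and uses a thread pool only as an example of independent dispatch. For CPU-bound similarity functions in CPython, a process pool or a distributed job would likely be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical pair representation: (original article ID, candidate ID).
Pair = Tuple[Hashable, Hashable]


def score_pairs(pairs: List[Pair],
                similarity: Callable[[Hashable, Hashable], float],
                max_workers: int = 4) -> Dict[Pair, float]:
    """Score every pair independently; each pair is one unit of work."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so the scores line up with `pairs`.
        scores = list(pool.map(lambda p: similarity(p[0], p[1]), pairs))
    return dict(zip(pairs, scores))


if __name__ == "__main__":
    # Toy tag sets and Jaccard similarity, both assumptions for illustration.
    tags = {"A": {"action", "comedy"}, "x": {"action"}, "y": {"drama"}}

    def jaccard(a, b):
        union = tags[a] | tags[b]
        return len(tags[a] & tags[b]) / len(union) if union else 0.0

    print(score_pairs([("A", "x"), ("A", "y")], jaccard))
    # {('A', 'x'): 0.5, ('A', 'y'): 0.0}
```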
Corresponding to the embodiment of the invention shown in fig. 2, the invention also provides an electronic device.
Fig. 13 is a schematic structural diagram of an electronic device 1300 according to an embodiment of the present invention. As shown in fig. 13, the electronic device includes a processor 1301, a communication interface 1302, a memory 1303 and a communication bus 1304, where the processor 1301, the communication interface 1302 and the memory 1303 communicate with one another through the communication bus 1304;
the memory 1303 is configured to store a computer program;
the processor 1301 is configured to implement the following steps when executing the program stored in the memory 1303:
acquiring the features of an original article;
for each feature, acquiring the articles to be recalled corresponding to the feature according to the similarity between each article to be recalled and the feature;
for each article to be recalled, generating an article identification information pair from the identification information of the original article and the identification information of that article to be recalled;
processing each article identification information pair with a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair;
and taking each article to be recalled whose similarity to the original article is greater than a preset threshold as a target recall article.
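A minimal end-to-end sketch of these steps, under several stated assumptions: features are tag labels; a hypothetical `feature_index` maps each feature to (candidate ID, article-feature similarity) entries; a candidate is taken for a feature when that similarity reaches a hypothetical cut-off `min_feature_sim`; and Jaccard similarity of tag sets stands in for the unspecified similarity algorithm.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data model, not taken from the patent:
#   article_features: article ID -> set of feature labels (e.g. tags)
#   feature_index:    feature -> list of (candidate ID, article-feature similarity)
ArticleFeatures = Dict[str, Set[str]]
FeatureIndex = Dict[str, List[Tuple[str, float]]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity algorithm; the text does not name a specific one."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recall_by_threshold(original: str,
                        article_features: ArticleFeatures,
                        feature_index: FeatureIndex,
                        min_feature_sim: float,
                        threshold: float) -> List[str]:
    # Step 1: acquire the features of the original article.
    features = article_features[original]

    # Step 2: for each feature, collect candidates whose similarity to that
    # feature is at least min_feature_sim (an assumed cut-off).
    candidates: List[str] = []
    for feature in sorted(features):
        for candidate, sim in feature_index.get(feature, []):
            if sim >= min_feature_sim and candidate != original:
                candidates.append(candidate)

    # Step 3: one (original ID, candidate ID) pair per candidate.
    pairs = [(original, candidate) for candidate in candidates]

    # Step 4: score each pair; keying by pair means a pair retrieved through
    # several features is scored only once.
    scores: Dict[Tuple[str, str], float] = {}
    for pair in pairs:
        if pair not in scores:
            scores[pair] = jaccard(article_features[pair[0]], article_features[pair[1]])

    # Step 5: keep candidates whose similarity exceeds the preset threshold.
    return [candidate for (_, candidate), score in scores.items() if score > threshold]


if __name__ == "__main__":
    article_features = {
        "A": {"action", "comedy", "2017"},
        "x": {"action", "comedy"},
        "y": {"action", "drama"},
        "z": {"comedy", "2017", "action"},
    }
    feature_index = {
        "action": [("x", 0.9), ("y", 0.8), ("z", 0.4)],
        "comedy": [("x", 0.7), ("z", 0.9)],
        "2017": [("z", 0.6)],
    }
    print(recall_by_threshold("A", article_features, feature_index, 0.5, 0.5))  # ['z', 'x']
```

With this toy data, A's candidates are z, x and y, whose Jaccard similarities to A are 1.0, about 0.67 and 0.25, so a threshold of 0.5 recalls z and x.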
The communication bus mentioned for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used to represent it in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include random access memory (RAM) and may also include non-volatile memory, such as disk storage. Optionally, the memory may also be a storage device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
By applying the embodiment shown in fig. 13 of the present invention, an article identification information pair is generated from the original article and each article to be recalled, and each article identification information pair serves as one unit of computation.
In yet another aspect of the present invention, a computer-readable storage medium is further provided, in which instructions are stored; when run on a computer, the instructions cause the computer to perform the following steps:
acquiring the features of an original article;
for each feature, acquiring the articles to be recalled corresponding to the feature according to the similarity between each article to be recalled and the feature;
for each article to be recalled, generating an article identification information pair from the identification information of the original article and the identification information of that article to be recalled;
processing each article identification information pair with a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair;
and taking each article to be recalled whose similarity to the original article is greater than a preset threshold as a target recall article.
In yet another aspect of the present invention, an embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the following steps:
acquiring the features of an original article;
for each feature, acquiring the articles to be recalled corresponding to the feature according to the similarity between each article to be recalled and the feature;
for each article to be recalled, generating an article identification information pair from the identification information of the original article and the identification information of that article to be recalled;
processing each article identification information pair with a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair;
and taking each article to be recalled whose similarity to the original article is greater than a preset threshold as a target recall article.
The invention also provides another electronic device corresponding to the embodiment of the invention shown in fig. 8.
Fig. 14 is a schematic structural diagram of another electronic device 1400 according to an embodiment of the present invention. As shown in fig. 14, the electronic device includes a processor 1401, a communication interface 1402, a memory 1403 and a communication bus 1404, where the processor 1401, the communication interface 1402 and the memory 1403 communicate with one another through the communication bus 1404;
a memory 1403 for storing a computer program;
the processor 1401, when executing the program stored in the memory, implements the following steps:
acquiring the features of an original article;
for each feature, acquiring the articles to be recalled corresponding to the feature according to the similarity between each article to be recalled and the feature;
for each article to be recalled, generating an article identification information pair from the identification information of the original article and the identification information of that article to be recalled;
processing each article identification information pair with a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair;
sorting the articles to be recalled in the article identification information pairs in descending order of similarity to obtain a second sequence;
and taking the articles to be recalled in the top second-number positions of the second sequence as target recall articles.
By applying the embodiment shown in fig. 14 of the present invention, an article identification information pair is generated from the original article and each article to be recalled, and each article identification information pair serves as one unit of computation.
In yet another aspect of the present invention, a computer-readable storage medium is further provided, in which instructions are stored; when run on a computer, the instructions cause the computer to perform the following steps:
acquiring the features of an original article;
for each feature, acquiring the articles to be recalled corresponding to the feature according to the similarity between each article to be recalled and the feature;
for each article to be recalled, generating an article identification information pair from the identification information of the original article and the identification information of that article to be recalled;
processing each article identification information pair with a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair;
sorting the articles to be recalled in the article identification information pairs in descending order of similarity to obtain a second sequence;
and taking the articles to be recalled in the top second-number positions of the second sequence as target recall articles.
In yet another aspect of the present invention, an embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the following steps:
acquiring the features of an original article;
for each feature, acquiring the articles to be recalled corresponding to the feature according to the similarity between each article to be recalled and the feature;
for each article to be recalled, generating an article identification information pair from the identification information of the original article and the identification information of that article to be recalled;
processing each article identification information pair with a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair;
sorting the articles to be recalled in the article identification information pairs in descending order of similarity to obtain a second sequence;
and taking the articles to be recalled in the top second-number positions of the second sequence as target recall articles.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are produced wholly or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, or magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in this specification are described in a related manner; for the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus, electronic device, computer-readable storage medium, and computer program product embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for article recall, the method comprising:
acquiring the features of an original article;
for each feature, acquiring the articles to be recalled corresponding to the feature according to the similarity between each article to be recalled and the feature;
obtaining a first number corresponding to the feature according to the product of the proportion of the feature among all relevant features of the original article and the total number of articles to be recalled corresponding to the feature;
for each feature, sorting the articles to be recalled in descending order of their similarity to the feature to obtain a first sequence;
for each first sequence, pairing the identification information of each of the articles to be recalled in the top first-number positions of the first sequence with the identification information of the original article, to generate the first number of article identification information pairs;
processing each article identification information pair with a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair;
and taking each article to be recalled whose similarity to the original article is greater than a preset threshold as a target recall article.
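Claim 1 limits how many candidates each feature contributes. The sketch below illustrates that quota under stated assumptions: the per-feature proportions are already known (passed in as a hypothetical `weights` mapping), candidates come from a hypothetical `feature_index`, and the first number is obtained by rounding the product down and capping it at the candidate count. The claim specifies neither the rounding rule nor the cap.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical index: feature -> list of (candidate ID, similarity to the feature).
FeatureIndex = Dict[str, List[Tuple[str, float]]]


def quota_pairs(original: str,
                weights: Dict[str, float],
                feature_index: FeatureIndex) -> List[Tuple[str, str]]:
    """Generate article identification information pairs with a per-feature quota."""
    pairs: List[Tuple[str, str]] = []
    for feature, weight in weights.items():
        candidates = feature_index.get(feature, [])
        fea_len = len(candidates)
        # First number = proportion of the feature x total candidates for the feature.
        # Flooring and clamping to [0, fea_len] are assumptions.
        rec_len = max(0, min(fea_len, math.floor(weight * fea_len)))
        # First sequence: candidates in descending order of similarity to the feature.
        first_sequence = sorted(candidates, key=lambda item: item[1], reverse=True)
        # Pair the original article with each of the top rec_len candidates.
        pairs.extend((original, candidate) for candidate, _ in first_sequence[:rec_len])
    return pairs


if __name__ == "__main__":
    index = {
        "action": [("y", 0.8), ("x", 0.9), ("w", 0.3), ("z", 0.4)],
        "comedy": [("x", 0.7), ("z", 0.9)],
    }
    print(quota_pairs("A", {"action": 0.5, "comedy": 0.5}, index))
    # [('A', 'x'), ('A', 'y'), ('A', 'z')]
```

The pairs produced this way can still contain duplicates when two features select the same candidate, which is what the deletion step of claim 3 addresses.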
2. The method according to claim 1, wherein obtaining the first number corresponding to the feature according to the product of the proportion of the feature among all relevant features of the original article and the total number of articles to be recalled corresponding to the feature comprises:
obtaining the first number corresponding to the feature by the formula
$$\mathit{rec\_len} = \mathit{fea\_len} \times \frac{\mathit{score\_arg}}{\mathit{avg\_score}}$$
wherein rec_len is the first number corresponding to the feature; fea_len is the total number of articles to be recalled corresponding to the feature; $\frac{\mathit{score\_arg}}{\mathit{avg\_score}}$ is the proportion of the feature among all relevant features of the original article, score_arg being the preset score corresponding to the feature and avg_score being the average of the preset scores corresponding to all relevant features of the original article.
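Claim 2 defines the first number through rec_len, fea_len, score_arg and avg_score. A minimal sketch, assuming the proportion of the feature is score_arg / avg_score so that rec_len = fea_len × score_arg / avg_score, and additionally assuming the result is rounded down and capped at fea_len (the claim specifies neither). The function name `first_number` and all feature names and scores are hypothetical.

```python
import math


def first_number(fea_len: int, score_arg: float, avg_score: float) -> int:
    """rec_len for one feature, assuming rec_len = fea_len * score_arg / avg_score."""
    if avg_score <= 0:
        raise ValueError("avg_score must be positive")
    raw = fea_len * score_arg / avg_score
    # Rounding down and capping at fea_len are assumptions, not stated in the claim.
    return min(fea_len, math.floor(raw))


if __name__ == "__main__":
    preset_scores = {"action": 2.5, "comedy": 1.5}  # hypothetical score_arg values
    fea_lens = {"action": 8, "comedy": 20}           # hypothetical candidate counts
    avg_score = sum(preset_scores.values()) / len(preset_scores)  # 2.0
    for feature, score_arg in preset_scores.items():
        print(feature, first_number(fea_lens[feature], score_arg, avg_score))
    # action 8   (8 * 2.5 / 2.0 = 10, capped at fea_len = 8)
    # comedy 15  (20 * 1.5 / 2.0 = 15)
```

Under this reading a feature scored above the average asks for more than its candidate count, which is why some cap is needed in practice; how the original handles that case is not stated.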
3. The method according to claim 1, wherein before processing each article identification information pair with the similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair, the method further comprises:
deleting, for each article identification information pair, any other article identification information pair identical to it.
4. A method for article recall, the method comprising:
acquiring the features of an original article;
for each feature, acquiring the articles to be recalled corresponding to the feature according to the similarity between each article to be recalled and the feature;
obtaining a first number corresponding to the feature according to the product of the proportion of the feature among all relevant features of the original article and the total number of articles to be recalled corresponding to the feature;
for each feature, sorting the articles to be recalled in descending order of their similarity to the feature to obtain a first sequence;
for each first sequence, pairing the identification information of each of the articles to be recalled in the top first-number positions of the first sequence with the identification information of the original article, to generate the first number of article identification information pairs;
processing each article identification information pair with a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair;
sorting the articles to be recalled in the article identification information pairs in descending order of similarity to obtain a second sequence;
and taking the articles to be recalled in the top second-number positions of the second sequence as target recall articles.
5. An article recall apparatus, comprising: a first obtaining module, a second obtaining module, a generating module, a first processing module, a second processing module and a first setting module, wherein
the first obtaining module is configured to obtain the features of an original article;
the second obtaining module is configured to, for each feature, obtain the articles to be recalled corresponding to the feature according to the similarity between each article to be recalled and the feature;
the second processing module is configured to obtain a first number corresponding to the feature according to the product of the proportion of the feature among all relevant features of the original article and the total number of articles to be recalled corresponding to the feature;
the generating module comprises a sorting unit and a generating unit, wherein
the sorting unit is configured to, for each feature, sort the articles to be recalled in descending order of their similarity to the feature to obtain a first sequence;
the generating unit is configured to, for each first sequence, pair the identification information of each of the articles to be recalled in the top first-number positions of the first sequence with the identification information of the original article, to generate the first number of article identification information pairs;
the first processing module is configured to, for each article identification information pair, apply a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair;
the first setting module is configured to take each article to be recalled whose similarity to the original article is greater than a preset threshold as a target recall article.
6. The apparatus according to claim 5, wherein the second processing module is further configured to:
obtain the first number corresponding to the feature by the formula
$$\mathit{rec\_len} = \mathit{fea\_len} \times \frac{\mathit{score\_arg}}{\mathit{avg\_score}}$$
wherein rec_len is the first number corresponding to the feature; fea_len is the total number of articles to be recalled corresponding to the feature; $\frac{\mathit{score\_arg}}{\mathit{avg\_score}}$ is the proportion of the feature among all relevant features of the original article, score_arg being the preset score corresponding to the feature and avg_score being the average of the preset scores corresponding to all relevant features of the original article.
7. The apparatus according to claim 5, further comprising a deletion module configured to delete, for each article identification information pair, any other article identification information pair identical to it.
8. An article recall apparatus, comprising: a first obtaining module, a second obtaining module, a generating module, a first processing module, a second processing module, a sorting module and a second setting module, wherein
the first obtaining module is configured to obtain the features of an original article;
the second obtaining module is configured to, for each feature, obtain the articles to be recalled corresponding to the feature according to the similarity between each article to be recalled and the feature;
the second processing module is configured to obtain a first number corresponding to the feature according to the product of the proportion of the feature among all relevant features of the original article and the total number of articles to be recalled corresponding to the feature;
the generating module comprises a sorting unit and a generating unit, wherein
the sorting unit is configured to, for each feature, sort the articles to be recalled in descending order of their similarity to the feature to obtain a first sequence;
the generating unit is configured to, for each first sequence, pair the identification information of each of the articles to be recalled in the top first-number positions of the first sequence with the identification information of the original article, to generate the first number of article identification information pairs;
the first processing module is configured to, for each article identification information pair, apply a similarity algorithm to obtain the similarity between the original article and the article to be recalled contained in the pair;
the sorting module is configured to sort the articles to be recalled in the article identification information pairs in descending order of similarity to obtain a second sequence;
and the second setting module is configured to take the articles to be recalled in the top second-number positions of the second sequence as target recall articles.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 3 when executing a program stored in the memory.
10. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of claim 4 when executing a program stored in the memory.
CN201710847727.7A 2017-09-19 2017-09-19 Article recall method and device and electronic equipment Active CN107665247B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710847727.7A CN107665247B (en) 2017-09-19 2017-09-19 Article recall method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710847727.7A CN107665247B (en) 2017-09-19 2017-09-19 Article recall method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN107665247A CN107665247A (en) 2018-02-06
CN107665247B true CN107665247B (en) 2020-12-25

Family

ID=61097410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710847727.7A Active CN107665247B (en) 2017-09-19 2017-09-19 Article recall method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN107665247B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753515A (en) * 2018-12-29 2019-05-14 上海易点时空网络有限公司 The information processing method and device recalled for vehicle
CN112231453B (en) * 2020-10-13 2024-02-27 腾讯科技(深圳)有限公司 Intelligent question-answering method and device, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105096935B (en) * 2014-05-06 2019-08-09 阿里巴巴集团控股有限公司 A kind of pronunciation inputting method, device and system
CN105953520B (en) * 2016-05-06 2018-08-10 青岛海尔股份有限公司 Intelligent refrigerator control method and its control system
CN107066459A (en) * 2016-08-30 2017-08-18 广东百华科技股份有限公司 A kind of efficient image search method
CN106485567B (en) * 2016-09-14 2021-11-30 北京小米移动软件有限公司 Article recommendation method and device

Also Published As

Publication number Publication date
CN107665247A (en) 2018-02-06

Similar Documents

Publication Publication Date Title
US20180107933A1 (en) Web page training method and device, and search intention identifying method and device
CN110162695B (en) Information pushing method and equipment
US9967218B2 (en) Online active learning in user-generated content streams
CN107038173B (en) Application query method and device and similar application detection method and device
US10346496B2 (en) Information category obtaining method and apparatus
CN108334951B (en) Pre-statistics of data for nodes of a decision tree
CN110413867B (en) Method and system for content recommendation
CN107766467B (en) Information detection method and device, electronic equipment and storage medium
JP6932360B2 (en) Object search method, device and server
US20120084226A1 (en) Measuring or estimating user credibility
AU2018202112A1 (en) Scoring mechanism for discovery of extremist content
CN107665247B (en) Article recall method and device and electronic equipment
CN113360803A (en) Data caching method, device and equipment based on user behavior and storage medium
CN110968802B (en) Analysis method and analysis device for user characteristics and readable storage medium
CN110555165A (en) information identification method and device, computer equipment and storage medium
CN111667018B (en) Object clustering method and device, computer readable medium and electronic equipment
CN112287102B (en) Data mining method and device
CN109241360B (en) Matching method and device of combined character strings and electronic equipment
WO2021081914A1 (en) Pushing object determination method and apparatus, terminal device and storage medium
CN111708942A (en) Multimedia resource pushing method, device, server and storage medium
CN103530345A (en) Short text characteristic extension and fitting characteristic library building method and device
CN110959157A (en) Accelerating large-scale similarity calculations
CN115034826A (en) Advertisement putting method and device, electronic equipment and readable storage medium
CN114417102A (en) Text duplicate removal method and device and electronic equipment
CN111984867A (en) Network resource determination method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant