CN114090901A - Dark net similar commodity judgment method based on multimode fusion characteristics, storage medium and computing device - Google Patents

Info

Publication number
CN114090901A
Authority
CN
China
Prior art keywords
commodity
value
picture
commodities
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111367617.3A
Other languages
Chinese (zh)
Inventor
李斌
丁建伟
刘志洁
李航
陈周国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 30 Research Institute
Priority to CN202111367617.3A
Publication of CN114090901A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Finance (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a dark net similar commodity judgment method based on multimode fusion characteristics, a storage medium and a computing device, wherein the method comprises the following steps: step 10, collecting dark net commodity data and classifying the commodities, the collected data comprising commodity text and commodity pictures, and generating an md5 value for each collected commodity picture; step 20, calculating a perceptual hash fingerprint value of the commodity picture and a Word2Vec sentence vector value of the commodity text based on the dark net commodity data and the commodity classification result collected in step 10; and step 30, calculating the similarity of the commodities based on the md5 value of the commodity picture, the perceptual hash fingerprint value of the commodity picture and the Word2Vec sentence vector value of the commodity text. By computing a similarity that fuses the commodity picture and the commodity text, the method addresses the difficulty of judging similar dark net commodities, whose pictures are blurry and whose text information is sparse.

Description

Dark net similar commodity judgment method based on multimode fusion characteristics, storage medium and computing device
Technical Field
The invention relates to the technical field of dark net similar commodity judgment, in particular to a dark net similar commodity judgment method, a storage medium and a computing device based on multimode fusion characteristics.
Background
A darknet market is a commercial website that specializes in trading illegal commodities. Such markets are reached through a darknet (e.g., Tor) and differ from open-web e-commerce sites in specialization, technology, and primary support. Most markets are designed to facilitate transactions between buyers and sellers of illegal goods, but darknet markets have a large number of sellers, and many of the listed commodities are extremely similar or even identical. To better monitor the dynamics of darknet market transactions and obtain timely information about the various commodities, commodity information from darknet market sites must be collected as completely as possible; beyond that, the commodities must be classified and counted, similar commodities filtered out, and new commodities discovered and flagged for early warning. The determination of similar commodities is therefore essential.
At present, because commodity pictures on open-web e-commerce sites are clear and high-definition and the text descriptions are detailed, similar commodities there can largely be judged from picture similarity alone or text similarity alone. In addition, collaborative filtering algorithms are used to identify similar commodities and recommend them to buyers. Dark net commodities, by contrast, have blurry pictures and sparse text, which makes similar commodities hard to judge, and few methods for this are currently available.
Disclosure of Invention
The invention aims to provide a dark net similar commodity judgment method based on multimode fusion characteristics, a storage medium and a computing device, so as to solve the problem that similar commodities are difficult to judge because dark net commodity pictures are blurry and the text information is sparse.
The invention provides a dark net similar commodity judgment method based on multimode fusion characteristics, which comprises the following steps:
step 10, collecting dark net commodity data and classifying the commodities; the collected dark net commodity data comprise commodity text and commodity pictures, and an md5 value is generated for each collected commodity picture;
step 20, calculating a perceptual hash fingerprint value of the commodity picture and a Word2Vec sentence vector value of the commodity text based on the dark net commodity data and the commodity classification result collected in step 10;
and step 30, calculating the similarity of the commodities based on the md5 value of the commodity picture, the perceptual hash fingerprint value of the commodity picture and the Word2Vec sentence vector value of the commodity text.
Further, step 10 comprises the following sub-steps:
step 11, constructing a customized collection strategy for the page layout format and anti-crawling mechanism of the target dark net site, so as to collect the dark net commodity data, wherein the collected data comprise the structured commodity text and the commodity pictures of each commodity detail page; the commodity text comprises a commodity id, a commodity name and a commodity description;
step 12, for each commodity that has a commodity picture, collecting the picture and at the same time computing its md5 value with a general-purpose md5 calculation method, using the md5 value as the name of the picture, storing the picture in a Seaweed database at a set storage location, and generating the corresponding storage address string;
and step 13, classifying the commodities, adding first-level and second-level commodity labels, and storing the labels, the collected commodity text, the md5 value of the commodity picture and the storage address string in an ES database.
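The naming rule in step 12 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `picture_md5` is hypothetical, and the sketch assumes the md5 value is the hexadecimal digest of the raw picture bytes, which the text does not state explicitly.

```python
import hashlib


def picture_md5(data: bytes) -> str:
    """Return the hex md5 digest of the raw picture bytes.

    Assumption: the digest is taken over the file bytes as downloaded,
    so two byte-identical pictures always get the same name.
    """
    return hashlib.md5(data).hexdigest()


# Byte-identical pictures collide on purpose; any changed byte gives a new name.
same_a = picture_md5(b"\x89PNG-example-bytes")
same_b = picture_md5(b"\x89PNG-example-bytes")
other = picture_md5(b"\x89PNG-example-bytez")
```

Because the digest depends on every byte, it only detects exact duplicates; a re-encoded or resized copy of the same picture gets a different md5 value, which is why the method also needs the perceptual hash of step 20.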
Further, step 20 comprises the following sub-steps:
step 21, reading from the ES database the commodity text, comprising the commodity id, the commodity name and the commodity description, together with the md5 value of the commodity picture, and, for each commodity whose md5 value is not empty, fetching the commodity picture from the Seaweed database according to the corresponding storage address string;
step 22, combining the commodity id, the commodity name and the commodity description into the complete commodity text;
step 23, calculating the perceptual hash fingerprint value of the commodity picture;
step 24, calculating the Word2Vec sentence vector value of the commodity text;
and step 25, storing the data obtained in steps 21, 23 and 24 in a MySQL commodity feature vector table.
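Step 23 does not say which perceptual hash is used. As one possible reading, the sketch below computes an average hash (aHash), a common simple variant; this choice is an assumption. The function name `average_hash` is hypothetical, and the input is assumed to be a picture already reduced to a small grayscale grid (for example 8x8), so no image library is needed here.

```python
from typing import Sequence


def average_hash(gray: Sequence[Sequence[int]]) -> int:
    """Average-hash fingerprint of a small grayscale grid.

    Each pixel contributes one bit: 1 if it is brighter than the mean
    of the grid, else 0. Bits are packed row by row, most significant first.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


# A 2x2 toy grid: dark, bright / bright, dark -> bits 0110
fingerprint = average_hash([[0, 255], [255, 0]])
# The same pattern at lower contrast yields the same fingerprint.
fingerprint_dim = average_hash([[10, 200], [200, 10]])
```

Unlike md5, this fingerprint changes only slightly when a picture is rescaled or recompressed, so the number of differing bits between two fingerprints can serve as a distance.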
Further, step 30 comprises the following sub-steps:
step 31, reading the commodity id of a new commodity and denoting it id1, reading the md5 value of its commodity picture and denoting it md5_1, reading the perceptual hash fingerprint value of its commodity picture and denoting it h1, reading the Word2Vec sentence vector value of its commodity text and denoting it v1, and reading its first-level and second-level commodity labels and denoting them c and t;
step 32, reading from the MySQL commodity feature vector table the attribute values of the commodities whose first-level and second-level commodity labels are the same as those of the new commodity, the attribute values likewise comprising the commodity id, the md5 value of the commodity picture, the perceptual hash fingerprint value of the commodity picture and the Word2Vec sentence vector value of the commodity text;
step 33, selecting in sequence, from the commodities obtained in step 32, one commodity that has not yet been compared, and denoting its commodity id id2, the md5 value of its commodity picture md5_2, the perceptual hash fingerprint value of its commodity picture h2, and the Word2Vec sentence vector value of its commodity text v2;
step 34, if md5_1 = md5_2, setting the similarity s of the two commodities to 1;
step 35, if the condition in step 34 does not hold, then:
(1) calculating the Hamming distance between the perceptual hash fingerprint values of the two commodity pictures, d = hamming_dist(h1, h2);
(2) calculating the cosine similarity between the Word2Vec sentence vector values of the two commodity texts, c = cos_similarity(v1, v2);
(3) setting the similarity of the two commodities to s = (1/ln(e + d/10) + c)/2, where e is the base of the natural logarithm;
step 36, comparing the similarity s obtained in step 34 or 35 with a preset similarity threshold λ, and retaining the pair of commodities if s ≥ λ;
step 37, combining the commodity ids and the similarity s of each pair that satisfies the condition of step 36 into a triple (id1, id2, s) and storing it in a MySQL commodity similarity table;
and step 38, returning to step 33 until all the commodities have been compared.
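Steps 34 and 35 can be sketched in a few lines. The names `hamming_dist` and `cos_similarity` follow the text; everything else is an assumption made for illustration: fingerprints are held as Python integers, sentence vectors as lists of floats, both commodities are assumed to have a picture, and a zero-length vector is given cosine similarity 0, a case the text does not cover.

```python
import math
from typing import Sequence


def hamming_dist(h1: int, h2: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(h1 ^ h2).count("1")


def cos_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0  # assumption: zero vector -> 0.0


def similarity(md5_1: str, h1: int, v1: Sequence[float],
               md5_2: str, h2: int, v2: Sequence[float]) -> float:
    """Fused score s from steps 34 and 35."""
    if md5_1 == md5_2:          # step 34: byte-identical pictures
        return 1.0
    d = hamming_dist(h1, h2)    # step 35 (1)
    c = cos_similarity(v1, v2)  # step 35 (2)
    return (1 / math.log(math.e + d / 10) + c) / 2  # step 35 (3)
```

With d = 0 the picture term is 1/ln(e) = 1, so two commodities with matching fingerprints and identical text vectors score 1 even when their md5 values differ; each additional differing bit lowers the picture term slowly because of the logarithm.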
The invention also provides a computer terminal storage medium storing computer-terminal-executable instructions, wherein the instructions are used to execute the above dark net similar commodity judgment method based on multimode fusion characteristics.
The present invention also provides a computing device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to execute the above dark net similar commodity judgment method based on multimode fusion characteristics.
In summary, by adopting the above technical scheme, the invention has the following beneficial effects:
1. By constructing a similarity calculation method that fuses commodity pictures and commodity text, the similarity between commodities in dark net markets can be computed and similar commodities obtained within each category, which helps classify dark net market commodities and improves the accuracy of commodity similarity judgment; the approach is simple and highly interpretable, and thus addresses the difficulty of judging similar commodities when dark net commodity pictures are blurry and the text information is sparse.
2. Through dark net data collection, commodity picture feature calculation, commodity text feature calculation and similarity calculation, the trading of new commodities can be effectively monitored, enabling real-time early warning and better tracking of dark net market dynamics.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be regarded as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a general flowchart of the dark net similar commodity judgment method based on multimode fusion characteristics according to an embodiment of the present invention.
Fig. 2 is a flowchart of step 10 of the dark net similar commodity judgment method based on multimode fusion characteristics according to the embodiment of the present invention.
Fig. 3 is a flowchart of step 20 of the dark net similar commodity judgment method based on multimode fusion characteristics according to the embodiment of the present invention.
Fig. 4 is a flowchart of step 30 of the dark net similar commodity judgment method based on multimode fusion characteristics according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
As shown in fig. 1, the present embodiment provides a dark net similar commodity judgment method based on multimode fusion characteristics, comprising the following steps:
Step 10, collecting dark net commodity data and classifying the commodities; the collected dark net commodity data comprise commodity text and commodity pictures, and an md5 value is generated for each collected commodity picture. This step mainly uses dark net data collection techniques to collect the structured commodity text, including the commodity id, commodity name and commodity description, and, for commodities that have pictures, to collect the corresponding commodity pictures at the same time. The md5 value of each commodity picture is then generated, the commodities are classified, the structured data are stored in an ES database, and the commodity pictures are stored in a Seaweed database. As shown in fig. 2, step 10 comprises the following sub-steps:
step 11, constructing a customized collection strategy for the page layout format and anti-crawling mechanism of the target dark net site, so as to collect the dark net commodity data, wherein the collected data comprise the structured commodity text and the commodity pictures of each commodity detail page; the commodity text comprises a commodity id, a commodity name and a commodity description;
step 12, for each commodity that has a commodity picture, collecting the picture and at the same time computing its md5 value with a general-purpose md5 calculation method, using the md5 value as the name of the picture, storing the picture in a Seaweed database at a set storage location, and generating the corresponding storage address string;
and step 13, classifying the commodities, adding first-level and second-level commodity labels, and storing the labels, the collected commodity text, the md5 value of the commodity picture and the storage address string in an ES database.
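The record stored in step 13 might look like the following sketch. All field names and the helper `build_record` are hypothetical; the text lists only which pieces of information are stored, not how they are named, and no database client call is shown here.

```python
from typing import Optional


def build_record(commodity_id: str, name: str, description: str,
                 label_c: str, label_t: str,
                 picture_md5: Optional[str] = None,
                 storage_address: Optional[str] = None) -> dict:
    """Assemble one JSON-serializable commodity record.

    picture_md5 and storage_address stay None for commodities without a
    picture, which matches the "md5 value not empty" check in step 21.
    """
    return {
        "commodity_id": commodity_id,
        "commodity_name": name,
        "commodity_description": description,
        "label_c": label_c,          # classification label denoted c in step 31
        "label_t": label_t,          # classification label denoted t in step 31
        "picture_md5": picture_md5,
        "storage_address": storage_address,
    }


no_picture = build_record("1001", "example name", "example description", "c1", "t1")
```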
Step 20, calculating a perceptual hash fingerprint value of the commodity picture and a Word2Vec sentence vector value of the commodity text based on the dark net commodity data and the commodity classification result collected in step 10. This step mainly computes the perceptual hash fingerprint value of each commodity picture and the Word2Vec sentence vector value of each commodity text from the data collected in step 10, namely the commodity text (commodity id, commodity name and commodity description), the commodity picture, the md5 value of the commodity picture and the commodity classification result, and finally stores these feature values together with the basic commodity information in a MySQL commodity feature vector table. As shown in fig. 3, step 20 comprises the following sub-steps:
step 21, reading from the ES database the commodity text, comprising the commodity id, the commodity name and the commodity description, together with the md5 value of the commodity picture, and, for each commodity whose md5 value is not empty, fetching the commodity picture from the Seaweed database according to the corresponding storage address string;
step 22, combining the commodity id, the commodity name and the commodity description into the complete commodity text;
step 23, calculating the perceptual hash fingerprint value of the commodity picture; the method for calculating the perceptual hash fingerprint value is prior art and is not described again here;
step 24, calculating the Word2Vec sentence vector value of the commodity text; the method for calculating the Word2Vec sentence vector value is prior art and is not described again here;
and step 25, storing the data obtained in steps 21, 23 and 24 in a MySQL commodity feature vector table.
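Step 24 leaves the sentence vector computation to prior art. One common approach, assumed here and not confirmed by the text, is to average the Word2Vec vectors of the words in the commodity text. The sketch uses a toy in-memory lookup table in place of a trained model; the function name `sentence_vector`, the whitespace tokenizer and the zero-vector fallback for unknown text are all illustrative choices.

```python
from typing import Dict, List, Sequence


def sentence_vector(text: str,
                    word_vectors: Dict[str, Sequence[float]],
                    dim: int) -> List[float]:
    """Average the vectors of the known words in the text.

    Words missing from the table are skipped; if no word is known,
    a zero vector of length dim is returned.
    """
    tokens = text.lower().split()
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Toy 2-dimensional "embeddings" standing in for a trained Word2Vec model.
toy_vectors = {"alpha": [1.0, 0.0], "beta": [0.0, 1.0]}
vec = sentence_vector("Alpha beta unknown", toy_vectors, dim=2)
```

Averaging keeps the sentence vector in the same space as the word vectors, so the cosine similarity of step 35 can be applied to it directly.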
Step 30, calculating the similarity of the commodities based on the md5 value of the commodity picture, the perceptual hash fingerprint value of the commodity picture and the Word2Vec sentence vector value of the commodity text. This step mainly uses the data obtained in step 20, namely the md5 value of the commodity picture, the perceptual hash fingerprint value of the commodity picture and the Word2Vec sentence vector value of the commodity text, to compute commodity similarity by combining a Hamming-distance-based measure with a cosine-similarity-based measure, and stores the commodity ids and the similarity of each pair of commodities whose similarity is greater than or equal to a preset similarity threshold in a MySQL commodity similarity table. As shown in fig. 4, step 30 comprises the following sub-steps:
step 31, reading the commodity id of a new commodity (i.e., a newly collected commodity) and denoting it id1, reading the md5 value of its commodity picture and denoting it md5_1, reading the perceptual hash fingerprint value of its commodity picture and denoting it h1, reading the Word2Vec sentence vector value of its commodity text and denoting it v1, and reading its first-level and second-level commodity labels and denoting them c and t;
step 32, reading from the MySQL commodity feature vector table the attribute values of the commodities whose first-level and second-level commodity labels are the same as those of the new commodity, the attribute values likewise comprising the commodity id, the md5 value of the commodity picture, the perceptual hash fingerprint value of the commodity picture and the Word2Vec sentence vector value of the commodity text;
step 33, selecting in sequence, from the commodities obtained in step 32, one commodity that has not yet been compared, and denoting its commodity id id2, the md5 value of its commodity picture md5_2, the perceptual hash fingerprint value of its commodity picture h2, and the Word2Vec sentence vector value of its commodity text v2;
step 34, if md5_1 = md5_2, setting the similarity s of the two commodities to 1;
step 35, if the condition in step 34 does not hold, then:
(1) calculating the Hamming distance between the perceptual hash fingerprint values of the two commodity pictures, d = hamming_dist(h1, h2);
(2) calculating the cosine similarity between the Word2Vec sentence vector values of the two commodity texts, c = cos_similarity(v1, v2);
(3) setting the similarity of the two commodities to s = (1/ln(e + d/10) + c)/2, where e is the base of the natural logarithm;
step 36, comparing the similarity s obtained in step 34 or 35 with a preset similarity threshold λ, and retaining the pair of commodities if s ≥ λ;
step 37, combining the commodity ids and the similarity s of each pair that satisfies the condition of step 36 into a triple (id1, id2, s) and storing it in a MySQL commodity similarity table;
and step 38, returning to step 33 until all the commodities have been compared.
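The loop formed by steps 31 to 38 can be sketched as below, with an in-memory list standing in for the MySQL commodity feature vector table and a returned list standing in for the commodity similarity table. The `Feature` class, its field names and `find_similar` are hypothetical. Skipping a row with the same id as the new commodity is an added assumption, as is treating a zero-length vector as cosine similarity 0.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Feature:
    id: str
    md5: str
    phash: int               # perceptual hash fingerprint as an integer
    vec: Sequence[float]     # Word2Vec sentence vector
    c: str                   # first commodity label
    t: str                   # second commodity label


def score(a: Feature, b: Feature) -> float:
    """Fused similarity s of steps 34 and 35."""
    if a.md5 == b.md5:
        return 1.0
    d = bin(a.phash ^ b.phash).count("1")            # Hamming distance
    dot = sum(x * y for x, y in zip(a.vec, b.vec))
    norm = (math.sqrt(sum(x * x for x in a.vec))
            * math.sqrt(sum(y * y for y in b.vec)))
    cos = dot / norm if norm else 0.0                # assumption for zero vectors
    return (1 / math.log(math.e + d / 10) + cos) / 2


def find_similar(new: Feature, table: Sequence[Feature],
                 lam: float) -> List[Tuple[str, str, float]]:
    """Return triples (id1, id2, s) with s >= lam among same-label rows."""
    triples = []
    for other in table:
        if other.id == new.id:                       # assumption: skip self
            continue
        if (other.c, other.t) != (new.c, new.t):     # step 32: same labels only
            continue
        s = score(new, other)                        # steps 34 and 35
        if s >= lam:                                 # step 36
            triples.append((new.id, other.id, s))    # step 37
    return triples
```

Restricting the comparison to commodities with the same labels keeps the number of pairwise scores per new commodity proportional to the size of its category rather than the whole table.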
The judgment of similar dark net commodities is completed by the above dark net similar commodity judgment method based on multimode fusion characteristics. The method has the following advantages:
(1) By constructing a similarity calculation method that fuses commodity pictures and commodity text, the similarity between commodities in dark net markets can be computed and similar commodities obtained within each category, which helps classify dark net market commodities and improves the accuracy of commodity similarity judgment; the approach is simple and highly interpretable, and thus addresses the difficulty of judging similar commodities when dark net commodity pictures are blurry and the text information is sparse.
(2) Through dark net data collection, commodity picture feature calculation, commodity text feature calculation and similarity calculation, the trading of new commodities can be effectively monitored, enabling real-time early warning and better tracking of dark net market dynamics.
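As a rough check on the interpretability claimed above, the score of step 35 can be evaluated at a few points. The symbol p(d) for the picture term is introduced here only for illustration, and the value d = 64 assumes a 64-bit fingerprint, which the text does not specify; the numbers are rounded.

```latex
\begin{aligned}
p(d) &= \frac{1}{\ln(e + d/10)}, \qquad s = \frac{p(d) + c}{2}, \\
p(0) &= \frac{1}{\ln e} = 1, \qquad
p(10) = \frac{1}{\ln(e + 1)} \approx 0.761, \qquad
p(64) = \frac{1}{\ln(e + 6.4)} \approx 0.452, \\
d = 10,\ c = 0.9 &\;\Rightarrow\; s \approx \frac{0.761 + 0.9}{2} \approx 0.831 .
\end{aligned}
```

Under that assumption the picture term stays between about 0.452 and 1, while the cosine term c can range from -1 to 1, so the text similarity can move the fused score further than the picture similarity can.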
In addition, some embodiments provide a computer terminal storage medium storing computer-terminal-executable instructions, where the instructions are used to execute the dark net similar commodity judgment method based on multimode fusion characteristics described in the foregoing embodiments. Examples of the computer storage medium include a magnetic storage medium (e.g., a floppy disk or a hard disk), an optical recording medium (e.g., a CD-ROM or a DVD), and a memory such as a memory card, a ROM or a RAM. The computer storage medium may also be distributed over a network-connected computer system, such as an application store.
Furthermore, some embodiments provide a computing device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the dark net similar commodity judgment method based on multimode fusion characteristics described in the foregoing embodiments. Examples of computing devices include PCs, tablets, smart phones and PDAs.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A dark net similar commodity judgment method based on multimode fusion characteristics, characterized by comprising the following steps:
step 10, collecting dark net commodity data and classifying the commodities; the collected dark net commodity data comprise commodity text and commodity pictures, and an md5 value is generated for each collected commodity picture;
step 20, calculating a perceptual hash fingerprint value of the commodity picture and a Word2Vec sentence vector value of the commodity text based on the dark net commodity data and the commodity classification result collected in step 10;
and step 30, calculating the similarity of the commodities based on the md5 value of the commodity picture, the perceptual hash fingerprint value of the commodity picture and the Word2Vec sentence vector value of the commodity text.
2. The dark net similar commodity judgment method based on multimode fusion characteristics as claimed in claim 1, wherein step 10 comprises the following sub-steps:
step 11, constructing a customized collection strategy for the page layout format and anti-crawling mechanism of the target dark net site, so as to collect the dark net commodity data, wherein the collected data comprise the structured commodity text and the commodity pictures of each commodity detail page; the commodity text comprises a commodity id, a commodity name and a commodity description;
step 12, for each commodity that has a commodity picture, collecting the picture and at the same time computing its md5 value with a general-purpose md5 calculation method, using the md5 value as the name of the picture, storing the picture in a Seaweed database at a set storage location, and generating the corresponding storage address string;
and step 13, classifying the commodities, adding first-level and second-level commodity labels, and storing the labels, the collected commodity text, the md5 value of the commodity picture and the storage address string in an ES database.
3. The dark net similar commodity judgment method based on multimode fusion characteristics as claimed in claim 2, wherein step 20 comprises the following sub-steps:
step 21, reading from the ES database the commodity text, comprising the commodity id, the commodity name and the commodity description, together with the md5 value of the commodity picture, and, for each commodity whose md5 value is not empty, fetching the commodity picture from the Seaweed database according to the corresponding storage address string;
step 22, combining the commodity id, the commodity name and the commodity description into the complete commodity text;
step 23, calculating the perceptual hash fingerprint value of the commodity picture;
step 24, calculating the Word2Vec sentence vector value of the commodity text;
and step 25, storing the data obtained in steps 21, 23 and 24 in a MySQL commodity feature vector table.
4. The dark net similar commodity judgment method based on the multimode fusion characteristic as claimed in claim 3, wherein the step 30 comprises the following substeps:
step 31, reading a commodity id of a new commodity, recording the commodity id as id1, reading a picture md5 value of the new commodity as md5_1, reading a perceptual hash fingerprint value of the commodity picture of the new commodity, recording the perceptual hash fingerprint value as h1, reading a Word2Vec sentence vector value of a commodity character of the new commodity, recording the vector value as v1, and reading a secondary commodity label of the new commodity, recording the secondary commodity label as c and t;
step 32, reading the attribute values of the commodities, which are the same as the first-level and second-level commodity labels of the new commodities, from the MySQL commodity feature vector table, wherein the attribute values also comprise a commodity id, an md5 value of a commodity picture, a perceptual hash fingerprint value of the commodity picture and a Word2Vec sentence vector value of a commodity text:
step 33, sequentially selecting one commodity which does not participate in comparison from the commodities corresponding to the attribute values of the commodities obtained in the step 32, setting the commodity id to be id2, the md5 value of the commodity picture to be md5_2, the perceptual hash fingerprint value of the commodity picture to be h2, and the Word2Vec sentence vector value of the commodity text to be v 2;
step 34, if md5_1 is md5_2, setting two commodity similarity s to 1;
step 35, if the condition in step 34 is not satisfied, then:
(1) calculating the Hamming distance between the perceptual hash fingerprint values of the commodity pictures of the two commodities, d = hamming_dist(h1, h2);
(2) calculating the cosine similarity between the Word2Vec sentence vector values of the two commodities, c = cos_similarity(v1, v2);
(3) setting the similarity of the two commodities to s = (1/ln(e + d/10) + c)/2, wherein e is the base of the natural logarithm;
step 36, comparing the similarity s of the two commodities obtained in step 34 or step 35 with a preset similarity threshold λ, and retaining the commodity pairs whose similarity s is greater than or equal to λ;
step 37, combining the commodity ids and the similarity s of the two commodities satisfying the condition of step 36 into a triple (id1, id2, s) and storing the triple into a MySQL commodity similarity table;
and step 38, returning to step 33 until all the commodities have been compared.
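Steps 33 to 38 can be sketched as follows. This is an illustration under stated assumptions, not the applicant's implementation: the `Item` record, the representation of the perceptual hash as an integer, and the function `find_similar` are hypothetical, the candidate list is assumed to have already been filtered by commodity labels as in step 32, and the triples are returned rather than written to the MySQL commodity similarity table.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class Item(NamedTuple):
    id: int
    md5: str               # md5 value of the commodity picture
    phash: int             # perceptual hash fingerprint as an integer (assumed)
    vec: Sequence[float]   # Word2Vec sentence vector of the commodity text


def hamming_dist(h1: int, h2: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(h1 ^ h2).count("1")


def cos_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # assumed convention for zero vectors
    return dot / (n1 * n2)


def similarity(a: Item, b: Item) -> float:
    # Step 34: identical picture md5 values give similarity 1.
    if a.md5 == b.md5:
        return 1.0
    # Step 35: s = (1/ln(e + d/10) + c) / 2
    d = hamming_dist(a.phash, b.phash)
    c = cos_similarity(a.vec, b.vec)
    return (1.0 / math.log(math.e + d / 10.0) + c) / 2.0


def find_similar(new: Item, candidates: Sequence[Item],
                 lam: float) -> List[Tuple[int, int, float]]:
    """Steps 33, 36, 37 and 38: compare each candidate once, keep s >= lam."""
    triples = []
    for other in candidates:
        s = similarity(new, other)
        if s >= lam:
            triples.append((new.id, other.id, s))
    return triples
```

The picture term 1/ln(e + d/10) equals 1 when d = 0 and decreases as the Hamming distance grows, so two commodities with identical fingerprints and identical text vectors reach s = 1, the same value that step 34 assigns directly.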
5. A computer terminal storage medium storing computer terminal-executable instructions for performing the dark net similar commodity judgment method based on multimode fusion characteristics according to any one of claims 1 to 4.
6. A computing device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the dark net similar commodity judgment method based on multimode fusion characteristics as claimed in any one of claims 1 to 4.
CN202111367617.3A 2021-11-18 2021-11-18 Dark net similar commodity judgment method based on multimode fusion characteristics, storage medium and computing device Pending CN114090901A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111367617.3A CN114090901A (en) 2021-11-18 2021-11-18 Dark net similar commodity judgment method based on multimode fusion characteristics, storage medium and computing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111367617.3A CN114090901A (en) 2021-11-18 2021-11-18 Dark net similar commodity judgment method based on multimode fusion characteristics, storage medium and computing device

Publications (1)

Publication Number Publication Date
CN114090901A (en) 2022-02-25

Family

ID=80301542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111367617.3A Pending CN114090901A (en) 2021-11-18 2021-11-18 Dark net similar commodity judgment method based on multimode fusion characteristics, storage medium and computing device

Country Status (1)

Country Link
CN (1) CN114090901A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115798517A (en) * 2023-02-08 2023-03-14 南京邮电大学 Commodity searching method and system based on voice information characteristic data

Similar Documents

Publication Publication Date Title
CN106919619B (en) Commodity clustering method and device and electronic equipment
US10565498B1 (en) Deep neural network-based relationship analysis with multi-feature token model
CN107833082B (en) Commodity picture recommendation method and device
CN108664637B (en) Retrieval method and system
CN110543592B (en) Information searching method and device and computer equipment
US8688603B1 (en) System and method for identifying and correcting marginal false positives in machine learning models
CN112990973B (en) Online shop portrait construction method and system
JP6237168B2 (en) Information processing apparatus and information processing program
CN105825396B (en) Method and system for clustering advertisement labels based on co-occurrence
US10699112B1 (en) Identification of key segments in document images
JPWO2019224891A1 (en) Classification device, classification method, generation method, classification program and generation program
Baluja Learning typographic style: from discrimination to synthesis
CN113762309A (en) Object matching method, device and equipment
CN111666275A (en) Data processing method and device, electronic equipment and storage medium
CN114090901A (en) Dark net similar commodity judgment method based on multimode fusion characteristics, storage medium and computing device
JP6178480B1 (en) DATA ANALYSIS SYSTEM, ITS CONTROL METHOD, PROGRAM, AND RECORDING MEDIUM
CN112818687B (en) Method, device, electronic equipment and storage medium for constructing title recognition model
CN115391656A (en) User demand determination method, device and equipment
US20210312223A1 (en) Automated determination of textual overlap between classes for machine learning
CN114239569A (en) Analysis method and device for evaluation text and computer readable storage medium
CN113806641A (en) Deep learning-based recommendation method and device, electronic equipment and storage medium
CN113127597A (en) Processing method and device for search information and electronic equipment
US20220261856A1 (en) Method for generating search results in an advertising widget
Iliev et al. Fake Review Recognition Using an SVM Model
CN110837740B (en) Comment aspect opinion level mining method based on dictionary improvement LDA model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination