CN114065009A - Article information classification method and device - Google Patents

Publication number
CN114065009A
Authority
CN
China
Prior art keywords
information
article
difference
item
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111419326.4A
Other languages
Chinese (zh)
Inventor
陆寅辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lianshang Beijing Network Technology Co ltd
Original Assignee
Lianshang Beijing Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lianshang Beijing Network Technology Co ltd
Priority to CN202111419326.4A
Publication of CN114065009A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application discloses an article information classification method and device. One specific embodiment of the article information classification method comprises the following steps: acquiring information of an article on a shopping website, wherein the information of the article comprises list page information and detail page information of the article; calculating the difference between the list page information and the detail page information; the information of the item is classified based on the difference. According to the embodiment, the matching condition of the list page information and the detail page information of the article can be rapidly determined, the shopping website is helped to filter the abnormal article information, and the waste of time caused by clicking the abnormal article information by a user is avoided.

Description

Article information classification method and device
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to an article information classification method and device.
Background
With the rapid development of the mobile internet, more and more users have become accustomed to online shopping, and shopping websites have grown accordingly. However, stores hosted on shopping websites also exhibit more and more undesirable behavior. One common example is a mismatch between the listed price of an item and its actual price: the price the user sees on the item list page does not match the actual price of the item on the item detail page. The price displayed on the item list page is often that of a low-priced model, an accessory, or even another low-priced item unrelated to the item. Although this behavior raises the store's click-through rate, it wastes a great deal of user time and makes sorting the item list page by price meaningless.
Disclosure of Invention
The embodiment of the application provides an article information classification method and device.
In a first aspect, an embodiment of the present application provides an article information classification method, including: acquiring information of an article on a shopping website, wherein the information of the article comprises list page information and detail page information of the article; calculating the difference between the list page information and the detail page information; the information of the item is classified based on the difference.
In some embodiments, the listing page information includes a title of the item, and the detail page information includes a type tag of the item; and calculating a difference between the list page information and the detail page information, including: the text difference between the title and the type label is calculated through a text recognition technology.
In some embodiments, the listing page information includes a main map of the item, and the detail page information includes a type map of the item; and calculating a difference between the list page information and the detail page information, including: and calculating the image difference between the main image and the type image by an image recognition technology.
In some embodiments, classifying the information of the item based on the difference comprises: and if the character difference is smaller than a first preset difference threshold value or the image difference is smaller than a second preset difference threshold value, dividing the information of the article into a first category.
In some embodiments, classifying the information of the item based on the difference further comprises: and if the character difference is not less than the first preset difference threshold value and the image difference is not less than the second preset difference threshold value, dividing the information of the article into a second category.
In some embodiments, classifying the information of the item based on the difference further comprises: if the character difference is not smaller than a first preset difference threshold value and the image difference is not smaller than a second preset difference threshold value, acquiring historical user behavior data corresponding to the article; information of the item is classified based on historical user behavior data.
In some embodiments, classifying information of the item based on historical user behavior data includes: carrying out weighted summation on historical user behavior data to obtain a classification score of the information of the article; the information of the item is classified based on the classification score.
In some embodiments, classifying information of the item based on historical user behavior data includes: inputting historical user behavior data into a pre-trained classification model to obtain the classification score of the information of the article; the information of the item is classified based on the classification score.
In some embodiments, classifying information of the item based on the classification score includes: if the classification score is higher than a preset score threshold value, the information of the article is divided into a first category; and if the classification score is not higher than a preset score threshold value, classifying the information of the article into a second category.
In some embodiments, the method further comprises: obtaining a category review result of the information of the article obtained based on the category into which the information of the article is divided; adding a category label to the information of the article based on the category review result to generate a training sample; and carrying out iterative updating on the classification model by using the training samples.
In some embodiments, the historical user behavior data includes at least one of: the click rate of the history list page of the article, the reading time of the history detail page and the jump rate of the history detail page within the preset time.
In a second aspect, an embodiment of the present application provides an article information classification apparatus, including: a first acquisition module configured to acquire information of an item on a shopping website, wherein the information of the item comprises list page information and detail page information of the item; a calculation module configured to calculate a difference between the list page information and the detail page information; and a classification module configured to classify the information of the item based on the difference.
In some embodiments, the listing page information includes a title of the item, and the detail page information includes a type tag of the item; and the calculation module comprises: a first calculation sub-module configured to calculate a text difference between the title and the type tag through a text recognition technique.
In some embodiments, the listing page information includes a main map of the item, and the detail page information includes a type map of the item; and the calculation module comprises: and the second calculation sub-module is configured to calculate the image difference of the main image and the type image through an image recognition technology.
In some embodiments, the classification module comprises: the first classification submodule is configured to classify the information of the article into a first category if the text difference is smaller than a first preset difference threshold value or the image difference is smaller than a second preset difference threshold value.
In some embodiments, the classification module further comprises: and the second classification submodule is configured to classify the information of the article into a second category if the character difference is not smaller than the first preset difference threshold value and the image difference is not smaller than the second preset difference threshold value.
In some embodiments, the classification module further comprises: the obtaining sub-module is configured to obtain historical user behavior data corresponding to the article if the character difference is not smaller than a first preset difference threshold value and the image difference is not smaller than a second preset difference threshold value; a third classification sub-module configured to classify information of the item based on historical user behavior data.
In some embodiments, the third classification submodule includes: the first scoring unit is configured to perform weighted summation on the historical user behavior data to obtain a classification score of the information of the article; a first classification unit configured to classify information of the item based on the classification score.
In some embodiments, the third classification submodule includes: the second scoring unit is configured to input historical user behavior data into a pre-trained classification model to obtain a classification score of the information of the article; a second classification unit configured to classify the information of the item based on the classification score.
In some embodiments, the first classification unit or the second classification unit is further configured to: if the classification score is higher than a preset score threshold value, the information of the article is divided into a first category; and if the classification score is not higher than a preset score threshold value, classifying the information of the article into a second category.
In some embodiments, the apparatus further comprises: a second acquisition module configured to acquire a category review result of the information of the article obtained based on the category into which the information of the article is divided; the generation module is configured to add a category label to the information of the article based on the category review result and generate a training sample; an update module configured to iteratively update the classification model using the training samples.
In some embodiments, the historical user behavior data includes at least one of: the click rate of the history list page of the article, the reading time of the history detail page and the jump rate of the history detail page within the preset time.
In a third aspect, an embodiment of the present application provides a computer device, including: one or more processors; a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
The article information classification method provided by the embodiments of the present application can quickly determine whether the list page information and the detail page information of an item match, which helps a shopping website filter out abnormal item information and spares users the time wasted by clicking on it.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow diagram of some embodiments of an item information classification method according to the present application;
FIG. 2 is a flow diagram of further embodiments of an item information classification method according to the present application;
FIG. 3 is a flow diagram of further embodiments of an item information classification method according to the present application;
FIG. 4 is a diagram of a scenario in which an article information classification method according to an embodiment of the present application may be implemented;
FIG. 5 is a schematic block diagram of a computer system suitable for use in implementing the computer device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates a flow 100 of some embodiments of an item information classification method according to the present application. The article information classification method comprises the following steps:
Step 101, obtaining information of an article on a shopping website.
In the present embodiment, the executing body of the item information classification method may acquire information of items on a shopping site. The information of the item may include list page information and detail page information of the item.
Here, the item may be an item sold by a store hosted on a shopping site. The store may list items or update their information on the shopping site. Typically, shopping websites present information about items through a list page and a detail page. Brief information about the item, called list page information, can be displayed on the list page so that a user can see the key information of the item and decide whether to click through to the detail page. Detailed information about the item, called detail page information, can be displayed on the detail page so that a user can decide whether to purchase the item on the shopping website. The list page information may include information such as a title and a main image of the item. The title of an item may include key information such as the name, model, and price of the item. The main image of the item may be a front view of the item. The detail page information may include type labels, type images, etc. of the item. The item may include at least one model, and one type label may include detailed information on the name, model, price, color, accessories, functions, usage, etc. of one model of the item. A type image of an item may include images of one model of the item taken from multiple angles.
Step 102, calculating the difference between the list page information and the detail page information.
In this embodiment, the execution body may calculate a difference between the list page information and the detail page information.
In general, for an item that has only one model, the difference between the list page information of the item and the detail page information of that model can be calculated, and whether the difference is smaller than a preset difference threshold can be determined. For an item that has multiple models, the difference between the list page information of the item and the detail page information of each model can be calculated separately, and whether the difference corresponding to each model is smaller than the preset difference threshold can be determined.
In some embodiments, where the listing page information includes a title of the item and the detail page information includes a type label for the item, textual differences between the title and the type label may be calculated by text recognition techniques. Generally, the text difference between the title and the type label can be determined by means of word-by-word comparison. In the case that a specific price number exists in the title, the specific price number in the title needs to be accurately compared with the specific price number in the type label to determine whether the specific price number is completely consistent with the specific price number in the type label. For the case that there is a price interval in the title, the price interval in the title needs to be fuzzy-compared with the specific price number in the type label to determine whether the specific price number falls in the price interval.
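As a non-limiting illustration, the word-by-word comparison and the two price checks described above might be sketched as follows. The function names, the whitespace tokenization, the "fraction of missing words" metric, and the `low-high` interval syntax are all assumptions made for this sketch; the application does not specify them.

```python
import re
from decimal import Decimal

# Hypothetical patterns: a price interval such as "100-200" and a single price.
PRICE_RANGE = re.compile(r"(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)")
PRICE = re.compile(r"\d+(?:\.\d+)?")


def text_difference(title: str, type_label: str) -> float:
    """Return the fraction of title words that are missing from the type label."""
    title_words = title.lower().split()
    label_words = set(type_label.lower().split())
    if not title_words:
        return 0.0
    missing = sum(1 for word in title_words if word not in label_words)
    return missing / len(title_words)


def price_matches(title_price: str, label_price: str) -> bool:
    """Exact comparison for a single price, containment check for an interval."""
    actual = Decimal(label_price)
    interval = PRICE_RANGE.search(title_price)
    if interval:
        # Fuzzy comparison: the detail-page price must fall inside the interval.
        low, high = Decimal(interval.group(1)), Decimal(interval.group(2))
        return low <= actual <= high
    single = PRICE.search(title_price)
    # Exact comparison: the two prices must be numerically identical.
    return single is not None and Decimal(single.group()) == actual
```

`Decimal` is used here so that prices such as "199.00" and "199" compare as equal without floating-point rounding.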
In some embodiments, where the list page information includes a main image of the item and the detail page information includes a type image of the item, the image difference between the main image and the type image may be calculated by image recognition techniques. The image difference can be calculated in various ways. For example, the image difference between the main image and the type image can be determined by pixel-by-pixel comparison. As another example, the items in the main image and the type image can each be identified through an image recognition technique to obtain corresponding key information, and the image difference between the main image and the type image can then be determined by comparing the key information.
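The pixel-by-pixel variant can be sketched as below, purely as an illustration. It assumes that both images have already been decoded into equally sized grayscale grids with values from 0 to 255; the function name and the normalization to the range 0.0 to 1.0 are choices made for this sketch and are not taken from the application.

```python
from typing import List

GrayImage = List[List[int]]  # rows of pixel intensities in the range 0..255


def pixel_difference(main_image: GrayImage, type_image: GrayImage) -> float:
    """Mean absolute pixel difference, normalized so 0.0 is identical and 1.0 is maximal."""
    if len(main_image) != len(type_image):
        raise ValueError("images must have the same height")
    total = 0
    count = 0
    for main_row, type_row in zip(main_image, type_image):
        if len(main_row) != len(type_row):
            raise ValueError("images must have the same width")
        for main_pixel, type_pixel in zip(main_row, type_row):
            total += abs(main_pixel - type_pixel)
            count += 1
    return total / (255 * count) if count else 0.0
```

A real system would likely resize and align the images first; that preprocessing is outside the scope of this sketch.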
Step 103, classifying the information of the article based on the difference.
In this embodiment, the execution subject may classify the information of the article based on the difference. Specifically, the difference may be compared with a preset difference threshold, and the category of the information of the article may be determined based on the comparison result.
Generally, for an item that has only one model, if the difference corresponding to that model is smaller than the preset difference threshold, the information of the item may be classified into a first category; if the difference is not smaller than the preset difference threshold, the information of the item may be classified into a second category. For an item that has multiple models, if the differences corresponding to all models are smaller than the preset difference threshold, the information of the item may be classified into the first category; if the difference corresponding to at least one model is not smaller than the preset difference threshold, the information of the item may be classified into the second category. The first category may be a normal category and the second category may be an abnormal category. In this case, information of an item in the first category may be published on the shopping website, and information of an item in the second category may be returned to the store with a warning. In addition, to improve classification accuracy, the category of the information of the item can be reviewed manually. Based on the category review result, the final category of the information of the item can be determined, which in turn determines whether the information of the item is published or returned.
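The rule for items with one or several models can be illustrated with a short sketch. The function name and the string labels "first" and "second" are hypothetical; the single-model case is simply a list with one entry.

```python
from typing import List


def classify_by_model_differences(differences: List[float], threshold: float) -> str:
    """Return "first" only if every model's difference is below the threshold."""
    if not differences:
        raise ValueError("at least one model difference is required")
    # One model at or above the threshold is enough to mark the item abnormal.
    return "first" if all(d < threshold for d in differences) else "second"
```

For example, an item whose three models have differences 0.1, 0.2 and 0.6 falls into the second category under a threshold of 0.5, because one model reaches the threshold.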
In some embodiments, for the case of calculating only text differences, if the text differences are less than a first preset difference threshold, the information of the item may be classified into a first category; if the text difference is not less than the first preset difference threshold, the information of the article can be classified into a second category.
In some embodiments, for the case of calculating only image differences, if the image differences are less than a second preset difference threshold, the information of the item may be classified into a first category; if the image difference is not less than the second preset difference threshold, the information of the article can be classified into a second category.
In some embodiments, for the case of calculating the text difference and the image difference simultaneously, if the text difference is smaller than a first preset difference threshold value, or the image difference is smaller than a second preset difference threshold value, the information of the article is divided into a first category; and if the character difference is not less than the first preset difference threshold value and the image difference is not less than the second preset difference threshold value, dividing the information of the article into a second category.
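When both differences are available, the combined rule reduces to a single boolean test. The sketch below is illustrative only; the function and parameter names are assumptions.

```python
def classify_combined(text_diff: float, image_diff: float,
                      text_threshold: float, image_threshold: float) -> str:
    """First category if either difference is below its threshold, else second."""
    if text_diff < text_threshold or image_diff < image_threshold:
        return "first"
    # Both differences are at or above their thresholds.
    return "second"
```

Note that the two branches are exact complements of each other, so every item receives exactly one category.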
The article information classification method provided by the embodiments of the present application can quickly determine whether the list page information and the detail page information of an item match, which helps a shopping website filter out abnormal item information and spares users the time wasted by clicking on it.
With continued reference to FIG. 2, illustrated is a flow 200 of still further embodiments of the item information categorization method according to the present application. The article information classification method comprises the following steps:
Step 201, information of an item on a shopping website is acquired.
In the present embodiment, the executing body of the item information classification method may acquire information of items on a shopping site. The information of the item may include list page information and detail page information of the item. The list page information may include a title and a main image of the item, and the detail page information may include a type tag and a type image of the item.
Step 202, the text difference between the title and the type label is calculated by text recognition technology.
In this embodiment, the execution subject may calculate the text difference between the title and the type tag through a text recognition technique.
Generally, the text difference between the title and the type label can be determined by means of word-by-word comparison. In the case that a specific price number exists in the title, the specific price number in the title needs to be accurately compared with the specific price number in the type label to determine whether the specific price number is completely consistent with the specific price number in the type label. For the case that there is a price interval in the title, the price interval in the title needs to be fuzzy-compared with the specific price number in the type label to determine whether the specific price number falls in the price interval.
Step 203, determining whether the text difference is smaller than a first preset difference threshold.
In this embodiment, the execution subject may compare the text difference with a first preset difference threshold to determine whether the text difference is smaller than the first preset difference threshold. If the difference is smaller than the first preset difference threshold, go to step 206; if not, go to step 204.
Step 204, calculating the image difference between the main image and the type image through an image recognition technology.
In this embodiment, if the text difference is not less than the first preset difference threshold, the execution subject may calculate the image difference between the main image and the type image through an image recognition technique.
The image difference can be calculated in various ways. For example, the image difference between the main image and the type image can be determined by pixel-by-pixel comparison. As another example, the items in the main image and the type image can each be identified through an image recognition technique to obtain corresponding key information, and the image difference between the main image and the type image can then be determined by comparing the key information.
Step 205, it is determined whether the image difference is less than a second preset difference threshold.
In this embodiment, the execution subject may compare the image difference with a second preset difference threshold value, and determine whether the image difference is smaller than the second preset difference threshold value. If the difference is smaller than the second preset difference threshold, go to step 206; if not, go to step 207.
Step 206, information of the item is classified into a first category.
In this embodiment, if the text difference is smaller than a first preset difference threshold, or the image difference is smaller than a second preset difference threshold, the execution main body may classify the information of the article into a first category.
Step 207, obtaining historical user behavior data corresponding to the article.
In this embodiment, if the text difference is not less than the first preset difference threshold and the image difference is not less than the second preset difference threshold, the execution subject may obtain historical user behavior data corresponding to the article.
Generally, historical user behavior data can be obtained statistically by performing big data analysis on the historical operations of a large number of users on the information of items on a shopping site. The historical user behavior data may include, but is not limited to, at least one of: the click rate of the historical list page of the item, the reading time of the historical detail page, and the jump rate of the historical detail page within a preset time period.
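The control flow of steps 203 through 207 can be sketched as a cascade in which the more expensive steps run only when the earlier checks fail. In this illustration the image comparison and the behavior-based classification are passed in as callables; these parameters and the function name are hypothetical and stand in for whatever implementations a system actually uses.

```python
from typing import Callable


def run_flow(text_diff: float,
             compute_image_diff: Callable[[], float],
             classify_by_behavior: Callable[[], str],
             text_threshold: float,
             image_threshold: float) -> str:
    """Cascade: text check, then image check, then behavior-based classification."""
    if text_diff < text_threshold:        # step 203 leads to step 206
        return "first"
    image_diff = compute_image_diff()     # step 204, only reached when the text check fails
    if image_diff < image_threshold:      # step 205 leads to step 206
        return "first"
    return classify_by_behavior()         # steps 207 and 208
```

Deferring the image comparison and the behavior lookup in this way means that items whose titles already match never incur those costs.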
Step 208, information of the item is classified based on historical user behavior data.
In this embodiment, the execution subject may classify the information of the article based on the historical user behavior data.
Generally, the category of an item's information can be determined by analyzing historical user behavior data. For example, if the click rate of the item's historical list page is high but the reading time of its historical detail page is short and the jump rate of its historical detail page within the preset time is high, the information of the item probably belongs to the second category; if the click rate of the historical list page is high, the reading time of the historical detail page is long, and the jump rate of the historical detail page within the preset time is low, the information of the item probably belongs to the first category.
In some embodiments, the information of the item may be scored by a mathematical calculation method. Specifically, historical user behavior data may be subjected to weighted summation to obtain a classification score of information of the item; the information of the item is classified based on the classification score. If the classification score is higher than a preset score threshold value, the information of the article is divided into a first category; and if the classification score is not higher than a preset score threshold value, classifying the information of the article into a second category. Wherein different types of historical user behavior data correspond to different weights. The value of the weight may be determined by the degree of influence of the corresponding historical user behavior data on the classification. Generally, the historical user behavior data that has a large impact on the classification is weighted more heavily, and the historical user behavior data that has a small impact on the classification is weighted less heavily.
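A minimal sketch of the weighted-summation scoring follows. It assumes that each behavior metric has already been normalized to the range 0 to 1; the feature names, the weight values in the usage note, and the use of a negative weight for the jump rate are illustrative assumptions, not values given in the application.

```python
from typing import Dict


def classification_score(behavior: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the behavior metrics named in the weights mapping."""
    return sum(weight * behavior[name] for name, weight in weights.items())


def classify_by_score(score: float, score_threshold: float) -> str:
    """First category if the score is above the threshold, otherwise second."""
    return "first" if score > score_threshold else "second"
```

With hypothetical weights of 0.1 for the list page click rate, 0.6 for the detail page reading time and -0.5 for the detail page jump rate, an item with a high click rate, short reading time and high jump rate scores well below an item with the same click rate but long reading time and low jump rate.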
As can be seen from fig. 2, compared with the embodiment shown in fig. 1, the flow 200 of the method for classifying item information in the present embodiment highlights the classification step. Therefore, the scheme described in the embodiment combines character difference, image difference and historical user behavior data to classify, and improves the classification accuracy.
With further reference to fig. 3, illustrated is a flow 300 of further embodiments of an item information classification method according to the present application. The article information classification method comprises the following steps:
Step 301, obtaining information of an item on a shopping website.
Step 302, calculating the text difference between the title and the type label by using a text recognition technology.
Step 303, determining whether the text difference is smaller than a first preset difference threshold.
Step 304, calculating the image difference between the main graph and the type graph by using an image recognition technology.
Step 305, determining whether the image difference is smaller than a second preset difference threshold.
Step 306, classifying the information of the item into a first category.
Step 307, obtaining historical user behavior data corresponding to the article.
In the present embodiment, the specific operations of steps 301-307 have been described in detail in steps 201-207 in the embodiment shown in fig. 2, and are not described herein again.
Step 308, inputting the historical user behavior data into a pre-trained classification model to obtain the classification score of the information of the article.
In this embodiment, the execution subject of the article information classification method may input the historical user behavior data into a pre-trained classification model to obtain a classification score of the information of the article.
In general, the classification model may be obtained by supervised training of a machine learning model using training samples. The machine learning model may be, for example, a binary classification model. In practice, the parameters of the machine learning model (e.g., weight parameters and bias parameters) may be initialized with small, mutually different random numbers. Small values ensure that the model does not enter a saturation state because of overly large weights, which would cause training to fail; different values ensure that the model can learn normally. The parameters of the machine learning model are adjusted continuously during training until a classification model with sufficiently high classification accuracy is obtained. For example, a BP (Back Propagation) algorithm or an SGD (Stochastic Gradient Descent) algorithm may be used to adjust the parameters of the machine learning model.
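As one possible illustration of the training described above, the following sketch trains a logistic-regression binary classifier with stochastic gradient descent, starting from small, mutually different random parameters. The choice of logistic regression, the feature layout (click rate, normalized reading time, jump rate), the learning rate, and the number of epochs are assumptions made for the example; the application does not prescribe a specific model.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_classifier(samples, labels, epochs=500, lr=0.1, seed=0):
    """Train a logistic-regression model with SGD.

    samples: list of feature lists; labels: 1 for the first category, 0 for the second.
    """
    rng = random.Random(seed)
    # Small, mutually different random initial values for weights and bias.
    weights = [rng.uniform(-0.01, 0.01) for _ in range(len(samples[0]))]
    bias = rng.uniform(-0.01, 0.01)
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in random order (stochastic updates)
        for i in order:
            x, y = samples[i], labels[i]
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the pre-activation
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def classification_score(model, x) -> float:
    """Score in (0, 1); a higher score suggests the first category."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
```

The returned score can then be compared with a preset score threshold in the same way as the weighted-sum score.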
Step 309, classifying the information of the item based on the classification score.
In this embodiment, the execution subject may classify the information of the item based on the classification score.
Generally, if the classification score is higher than a preset score threshold, the information of the article is classified into a first category; and if the classification score is not higher than a preset score threshold value, classifying the information of the article into a second category.
Step 310, obtaining a category review result of the information of the article, the review being performed on the category into which the information of the article was classified.
In this embodiment, the execution subject may obtain a category review result of the information of the article, the review being performed on the category into which the information of the article was classified.
Generally, to improve the classification accuracy, the category into which the information of the article was classified may be reviewed manually to obtain the category review result. The category review result may include the final category of the information of the article.
Step 311, adding a category label to the information of the article based on the category review result, and generating a training sample.
In this embodiment, the execution subject may add a category label to the information of the article based on the category review result, and generate a training sample.
The category label may be the final category of the information of the article. The training sample may include the historical user behavior data corresponding to the article and the category label.
Step 312, iteratively updating the classification model by using the training samples.
In this embodiment, the execution subject may iteratively update the classification model by using the training samples. Specifically, the historical user behavior data in a training sample is used as input and the category label as output to iteratively update the classification model, so that the classification effect of the classification model is continuously optimized.
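Steps 310 to 312 can be sketched as a small feedback loop: a reviewed category becomes a label, the label and the behavior data form a training sample, and the existing model parameters are refined with further gradient steps. The function names, the logistic-regression form of the model, and the hyperparameters below are hypothetical and serve only to illustrate the idea.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x) -> float:
    """Classification score of one behavior-feature vector."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def make_training_sample(behavior, reviewed_category):
    """Pair the behavior data with a label taken from the review result."""
    label = 1 if reviewed_category == "first" else 0
    return list(behavior), label


def iterative_update(weights, bias, samples, lr=0.1, passes=50):
    """Continue training the current model on newly reviewed samples."""
    weights = list(weights)  # do not modify the caller's list
    for _ in range(passes):
        for x, y in samples:
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias
```

Because the update starts from the current parameters rather than from a fresh initialization, each batch of review results adjusts the deployed model incrementally.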
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the flow 300 of the item information classification method in the present embodiment highlights the model scoring step and the model updating step. Therefore, the scheme described in the embodiment performs classification and scoring by using the classification model, and improves the acquisition efficiency and scoring accuracy of the classification score. Meanwhile, in the using process of the model, the classification model is updated in an iterative mode, and the classification effect of the classification model is continuously optimized.
For ease of understanding, fig. 4 shows a diagram of a scenario in which the article information classification method of an embodiment of the present application can be implemented. As shown in fig. 4, a merchant that has joined the shopping website lists or updates commodity information. The commodity information is submitted to an audit system. The audit system acquires the details of the commodity and compares them with the title of the commodity by text recognition. If the text difference is small, the commodity information is marked as the normal category and sent for manual review. If the text difference is large, the details are compared with the main picture of the commodity by image recognition. If the image difference is small, the commodity information is marked as the normal category and sent for manual review. If the image difference is large, user behavior is acquired through big data analysis and input into a machine learning evaluation system for scoring. If the score is low, the commodity information is marked as the abnormal category and sent for manual review. If the manual review result is the normal category, the information of the article is published on the shopping website; if the manual review result is the abnormal category, the information of the article is returned to the shop together with a warning. Meanwhile, training samples can be generated from the manual review results and used to iteratively update the machine learning evaluation system, continuously improving its evaluation capability.
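The staged decision in this scenario can be sketched as a cascade in which each later, more expensive check runs only when the earlier one is inconclusive. The function signature, the callables passed in, and the default threshold values are assumptions for illustration; the application does not define them.

```python
from typing import Callable, Sequence

FIRST, SECOND = "first", "second"  # e.g. normal and abnormal categories


def classify_item(
    text_difference: float,
    image_difference_fn: Callable[[], float],
    behavior_fn: Callable[[], Sequence[float]],
    score_fn: Callable[[Sequence[float]], float],
    text_threshold: float = 0.3,   # assumed first preset difference threshold
    image_threshold: float = 0.3,  # assumed second preset difference threshold
    score_threshold: float = 0.5,  # assumed preset score threshold
) -> str:
    # Stage 1: a small text difference is enough for the first category.
    if text_difference < text_threshold:
        return FIRST
    # Stage 2: image comparison runs only if the text difference is large.
    if image_difference_fn() < image_threshold:
        return FIRST
    # Stage 3: behavior data is fetched and scored only if both differences are large.
    score = score_fn(behavior_fn())
    return FIRST if score > score_threshold else SECOND
```

Passing the later stages as callables keeps image recognition and behavior analysis from running for items that the text comparison already settles. In the scenario of fig. 4, the result of this cascade would then be handed to manual review.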
Referring now to FIG. 5, shown is a block diagram of a computer system 500 suitable for use in implementing the computer devices of embodiments of the present application. The computer device shown in fig. 5 is only an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present application.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or electronic device. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a first acquisition module, a calculation module, and a classification module. The names of these modules do not constitute a limitation to the module itself in this case, and for example, the first acquisition module may also be described as a "module that acquires information of an item on a shopping site".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the computer device described in the above embodiments, or may exist separately without being incorporated into the computer device. The computer readable medium carries one or more programs which, when executed by the computer device, cause the computer device to: acquire information of an article on a shopping website, wherein the information of the article comprises list page information and detail page information of the article; calculate the difference between the list page information and the detail page information; and classify the information of the item based on the difference.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (13)

1. An item information classification method, comprising:
acquiring information of an article on a shopping website, wherein the information of the article comprises list page information and detail page information of the article;
calculating the difference between the list page information and the detail page information;
classifying information of the item based on the difference.
2. The method of claim 1, wherein the list page information includes a title of the item and the detail page information includes a type tag of the item; and
the calculating the difference between the list page information and the detail page information comprises:
calculating the text difference between the title and the type tag by using a text recognition technology.
3. The method of claim 2, wherein the list page information includes a main graph of the item, and the detail page information includes a type graph of the item; and
the calculating the difference between the list page information and the detail page information comprises:
calculating the image difference between the main graph and the type graph by using an image recognition technology.
4. The method of claim 3, wherein the classifying the information of the item based on the difference comprises:
if the text difference is smaller than a first preset difference threshold value or the image difference is smaller than a second preset difference threshold value, classifying the information of the article into a first category.
5. The method of claim 4, wherein the classifying the information of the item based on the difference further comprises:
if the text difference is not smaller than the first preset difference threshold value and the image difference is not smaller than the second preset difference threshold value, classifying the information of the article into a second category.
6. The method of claim 4, wherein the classifying the information of the item based on the difference further comprises:
if the text difference is not smaller than the first preset difference threshold value and the image difference is not smaller than the second preset difference threshold value, acquiring historical user behavior data corresponding to the article;
classifying information of the item based on the historical user behavior data.
7. The method of claim 6, wherein the classifying the information of the item based on the historical user behavior data comprises:
carrying out weighted summation on the historical user behavior data to obtain a classification score of the information of the article;
classifying information of the item based on the classification score.
8. The method of claim 6, wherein the classifying the information of the item based on the historical user behavior data comprises:
inputting the historical user behavior data into a pre-trained classification model to obtain the classification score of the information of the article;
classifying information of the item based on the classification score.
9. The method of claim 7 or 8, wherein said classifying information of the item based on the classification score comprises:
if the classification score is higher than a preset score threshold value, the information of the article is divided into the first category;
and if the classification score is not higher than the preset score threshold value, the information of the article is classified into the second category.
10. The method of claim 8, wherein the method further comprises:
obtaining a category review result of the information of the article obtained based on the category into which the information of the article is divided;
adding a category label to the information of the article based on the category review result to generate a training sample;
and performing iterative updating on the classification model by using the training sample.
11. The method of any of claims 6-10, wherein the historical user behavior data comprises at least one of: the click rate of the history list page, the reading time of the history detail page, and the jump rate of the history detail page within the preset time.
12. A computer device, comprising:
one or more processors;
a storage device on which one or more programs are stored;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-11.
13. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-11.
CN202111419326.4A 2021-11-26 2021-11-26 Article information classification method and device Pending CN114065009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111419326.4A CN114065009A (en) 2021-11-26 2021-11-26 Article information classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111419326.4A CN114065009A (en) 2021-11-26 2021-11-26 Article information classification method and device

Publications (1)

Publication Number Publication Date
CN114065009A true CN114065009A (en) 2022-02-18

Family

ID=80276527

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111419326.4A Pending CN114065009A (en) 2021-11-26 2021-11-26 Article information classification method and device

Country Status (1)

Country Link
CN (1) CN114065009A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination