CN112651439B - Material classification method, device, computer equipment and storage medium - Google Patents

Material classification method, device, computer equipment and storage medium

Info

Publication number
CN112651439B
Authority
CN
China
Prior art keywords
scene
category
annotation
random forest
forest model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011559080.6A
Other languages
Chinese (zh)
Other versions
CN112651439A (en)
Inventor
张莉
王雅青
吴志成
乔延柯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011559080.6A
Publication of CN112651439A
Application granted
Publication of CN112651439B


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/243 - Classification techniques relating to the number of classes
    • G06F18/24323 - Tree-organised classifiers
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 - Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, and provides a material classification method, a device, computer equipment and a storage medium. The material classification method comprises the following steps: training a first random forest model based on the first annotation scene categories and the first feature vectors of a plurality of historical scene materials; identifying a second annotation scene category of a scene material to be classified according to the second feature vector of the scene material to be classified and the plurality of first feature vectors; updating the first random forest model based on the second annotation scene category and the second feature vector to obtain a second random forest model; correcting the first annotation scene category to a first target scene category according to the first output of the first random forest model and the second output of the second random forest model; and calculating a second target scene category of the scene material to be classified according to the second annotation scene category and the second output of the second random forest model. The method and the device can accurately classify scene materials and solve the problem of manual mislabeling.

Description

Material classification method, device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a material classification method, a material classification device, computer equipment and a storage medium.
Background
A marketing campaign not only attracts the attention of consumers but also conveys the core value of a brand, thereby increasing the brand's influence. An existing campaign material platform can provide a wide variety of scene materials for a planner of a marketing campaign to select, so that the marketing campaign is implemented based on the selected scene materials.
In the process of realizing the invention, the inventor found that the existing active material platform relies on manual annotation of the scene categories of existing scene materials, trains a machine learning model based on the annotated scene materials, and uses the trained machine learning model to classify newly uploaded scene materials. Manual annotation is prone to error, and mislabeled scene categories degrade the classification accuracy of the trained model.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a material classification method, apparatus, computer device and storage medium, which can accurately classify scene materials, continuously update the scene categories of historical scene materials, and solve the problem of manual mislabeling.
A first aspect of the present invention provides a material classification method, the method including:
acquiring first annotation scene categories of a plurality of historical scene materials, and extracting first feature vectors of the plurality of historical scene materials;
training a first random forest model based on a plurality of the first annotation scene categories and a plurality of the first feature vectors;
extracting a second feature vector of the scene material to be classified, and identifying a second annotation scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vector;
updating the first random forest model based on the second annotation scene category and the second feature vector to obtain a second random forest model;
correcting the first annotation scene category to be a first target scene category according to the first output of the first random forest model and the second output of the second random forest model;
and calculating a second target scene category of the scene material to be classified according to the second annotation scene category and the second output of the second random forest model.
In an alternative embodiment, the method further comprises:
receiving feedback from the user on the downloaded target scene material;
analyzing the feedback to obtain the real scene category of the target scene material;
updating the second random forest model based on the target scene material and the corresponding real scene category to obtain a third random forest model, so that the scene category of the target scene material output by the third random forest model is the same as the real scene category;
and updating the scene category of other scene materials by using the third random forest model.
In an alternative embodiment, the correcting the first annotation scene category to the first target scene category based on the first output of the first random forest model and the second output of the second random forest model includes:
acquiring a first scene category to be confirmed of each historical scene material in the first output;
acquiring a second scene category to be confirmed of each historical scene material in the second output;
judging whether the first class probability of the first to-be-confirmed scene class and the second class probability of the corresponding second to-be-confirmed scene class are both larger than a preset class probability threshold;
when the first class probability and the second class probability are both larger than the preset class probability threshold, judging whether at least two identical scene classes exist in the first scene class to be confirmed, the second scene class to be confirmed and the first annotation scene class;
when at least two identical scene categories exist in the first scene category to be confirmed, the second scene category to be confirmed and the first annotation scene category, correcting the first annotation scene category to be a first target scene category according to the identical scene category.
In an optional embodiment, the identifying the second annotation scene category of the scene material to be classified according to the plurality of the first feature vectors and the second feature vectors includes:
calculating the similarity between each first feature vector and each second feature vector;
determining a target first feature vector corresponding to the maximum similarity;
and determining the first annotation scene category corresponding to the target first feature vector as the second annotation scene category of the scene material to be classified.
In an optional embodiment, the identifying the second annotation scene category of the scene material to be classified according to the plurality of the first feature vectors and the second feature vectors includes:
clustering the first feature vectors and the second feature vectors to obtain a plurality of feature vector clusters;
determining a target feature vector cluster in which the second feature vector is located;
calculating the target annotation scene category of the target feature vector cluster according to the first annotation scene category of the first feature vector in the target feature vector cluster;
and determining the target annotation scene category as a second annotation scene category of the scene material to be classified.
In an alternative embodiment, the method further comprises:
responding to a scene material downloading request of a user, and extracting scene categories in the scene material downloading request;
querying a plurality of scene materials corresponding to the scene category;
generating a download link of each scene material;
calculating the material quantity of each scene material;
and sequencing and displaying the downloading links according to the material quantity.
In an alternative embodiment, the extracting the first feature vector of the plurality of historical scene materials includes:
performing word segmentation processing on each historical scene material to obtain a plurality of segmented words;
extracting a word vector of each segmented word by using word2vector;
a first feature vector is generated based on word vectors of the plurality of segmented words for each of the historical scene materials.
A second aspect of the present invention provides a material classification apparatus, the apparatus comprising:
the first extraction module, used for obtaining first annotation scene categories of a plurality of historical scene materials and extracting first feature vectors of the plurality of historical scene materials;
the model training module is used for training a first random forest model based on a plurality of the first annotation scene categories and a plurality of the first feature vectors;
the second extraction module is used for extracting a second feature vector of the scene material to be classified, and identifying a second annotation scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vector;
the model updating module is used for updating the first random forest model based on the second annotation scene category and the second feature vector to obtain a second random forest model;
the class correction module is used for correcting the first annotation scene class into a first target scene class according to the first output of the first random forest model and the second output of the second random forest model;
and the category calculating module is used for calculating a second target scene category of the scene material to be classified according to the second annotation scene category and the second output of the second random forest model.
A third aspect of the present invention provides a computer apparatus comprising a processor for implementing the material classification method when executing a computer program stored in a memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the material classification method.
In summary, according to the material classification method, device, computer equipment and storage medium of the invention, a first random forest model is trained in a supervised manner based on the feature vectors and annotation scene categories of historical scene materials; a scene material to be classified is given an annotation scene category through clustering or similarity calculation; the first random forest model is then updated, again in a supervised manner, into a second random forest model based on the scene material to be classified and its annotation scene category, realizing iterative updating of the random forest model and improving the classification effect of the second random forest model; the first annotation scene category is corrected by combining the first output of the first random forest model with the second output of the second random forest model; and the scene material to be classified is classified by combining the second annotation scene category with the second output of the second random forest model. The method and the device not only classify scene materials to be classified accurately, but also correct the annotation scene categories of historical scene materials, solving the problem of manually mislabeled scene categories.
Drawings
Fig. 1 is a flowchart of a material classification method according to an embodiment of the present invention.
Fig. 2 is a block diagram of a material classifying apparatus according to a second embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It should be noted that, without conflict, the embodiments of the present invention and features in the embodiments may be combined with each other.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The material classification method provided by the embodiment of the invention is executed by the computer equipment, and correspondingly, the material classification device is operated in the computer equipment.
Fig. 1 is a flowchart of a material classification method according to an embodiment of the present invention. The material classification method specifically comprises the following steps, the sequence of the steps in the flow chart can be changed according to different requirements, and some steps can be omitted.
S11, obtaining first annotation scene categories of a plurality of historical scene materials, and extracting first feature vectors of the plurality of historical scene materials.
The historical scene materials are materials that have been used by marketing campaigns already held and uploaded to the activity material platform; each historical scene material is a material description text.
The first annotation scene category of each historical scene material in the activity material platform may be annotated using an annotation tool. The first annotation scene category identifies which type of marketing activity scene the corresponding historical scene material belongs to, e.g., a promotion-type activity scene, an activity scene for mining new users, an activity scene for giving back to old users, etc.
The computer device obtains, from the active material platform, the historical scene materials annotated with first annotation scene categories, and extracts a first feature vector of each historical scene material, so that a first random forest model can be trained based on the plurality of first annotation scene categories and the plurality of first feature vectors. Scene materials subsequently uploaded to the active material platform can then be pre-annotated, avoiding manual annotation of those scene materials.
In an alternative embodiment, the extracting the first feature vector of the plurality of historical scene materials includes:
performing word segmentation processing on each historical scene material to obtain a plurality of segmented words;
extracting a word vector of each segmented word by using word2vector;
a first feature vector is generated based on word vectors of the plurality of segmented words for each of the historical scene materials.
The computer device may employ the jieba word segmentation tool to segment each historical scene material, dividing each historical scene material into a plurality of segmented words.
Because the plurality of segmented words include meaningless words such as stop words, the meaningless words are filtered out, and word2vector is used to extract a word vector for each remaining segmented word; every word vector extracted by word2vector has the same dimension.
To avoid the generated first feature vectors having different dimensions, the elements belonging to the same dimension in the word vectors of all segmented words of each historical scene material are added element-wise and, as in the examples below, divided by the number of segmented words. The resulting first feature vector has the same dimension as a single word vector, so the first feature vectors of all historical scene materials have a consistent dimension; this makes the random forest model easier to train subsequently and allows it to converge quickly.
For example, assume that the first historical scene material contains 3 segmented words, where the word vector of segmented word A1 is (a11, a12, a13), the word vector of A2 is (a21, a22, a23), and the word vector of A3 is (a31, a32, a33); the first feature vector generated based on the word vectors of these 3 segmented words is ((a11+a21+a31)/3, (a12+a22+a32)/3, (a13+a23+a33)/3).
Assume that the second historical scene material contains 2 segmented words, where the word vector of segmented word B1 is (b11, b12, b13) and the word vector of B2 is (b21, b22, b23); the first feature vector generated based on the word vectors of these 2 segmented words is ((b11+b21)/2, (b12+b22)/2, (b13+b23)/2).
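The feature extraction described above can be sketched as follows. This is a minimal illustration assuming jieba as the word segmentation tool and gensim's Word2Vec as the "word2vector" tool; the function names, the stop-word list and the vector dimension are illustrative, not part of the patent.

```python
# Minimal sketch of the feature extraction, assuming jieba for word
# segmentation and gensim's Word2Vec as the "word2vector" tool.
# Names, stop words and the vector dimension are illustrative.
import jieba
import numpy as np
from gensim.models import Word2Vec

STOP_WORDS = {"的", "了", "和"}  # illustrative stop-word list

def tokenize(material_text):
    """Segment a material description text and drop meaningless words."""
    return [w for w in jieba.lcut(material_text)
            if w.strip() and w not in STOP_WORDS]

def build_word2vector(token_lists, dim=100):
    """Train word vectors; every word vector has the same dimension."""
    return Word2Vec(sentences=token_lists, vector_size=dim, min_count=1)

def feature_vector(tokens, w2v):
    """Average the word vectors element-wise, so the feature vector has the
    same dimension as a single word vector (as in the examples above)."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.wv.vector_size)
```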
S12, training a first random forest model based on the plurality of first annotation scene categories and the plurality of first feature vectors.
Each first annotation scene category and the corresponding first feature vector are taken as one data pair, a plurality of such data pairs form a data set, and the first random forest model is trained in a supervised manner on this data set.
The training process of the random forest model is the prior art and will not be described in detail.
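For concreteness, a minimal sketch of this supervised training step is given below, assuming scikit-learn's RandomForestClassifier (the patent does not name a particular library); the hyper-parameters are illustrative.

```python
# Minimal sketch of the supervised training of the first random forest model,
# assuming scikit-learn; hyper-parameters are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_first_forest(first_feature_vectors, first_annotation_categories):
    """Each (first feature vector, first annotation scene category) pair is one sample."""
    X = np.vstack(first_feature_vectors)
    y = np.asarray(first_annotation_categories)
    first_model = RandomForestClassifier(n_estimators=100, random_state=0)
    first_model.fit(X, y)
    return first_model
```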
S13, extracting a second feature vector of the scene material to be classified, and identifying a second annotation scene category of the scene material to be classified according to the first feature vectors and the second feature vectors.
The scene materials to be classified are materials which are newly uploaded to an active material platform and need to be classified.
The computer device performs word segmentation on the scene material to be classified using the jieba word segmentation tool to obtain a plurality of segmented words; after meaningless words are removed, word2vector is used to extract a word vector for each segmented word, and a second feature vector is generated based on the word vectors of the plurality of segmented words of the scene material to be classified.
The second feature vector has the same dimensions as the first feature vector.
Although the scene category of the scene material to be classified could be predicted to a certain extent by directly using the first random forest model, the classification accuracy of the first random forest model is limited by mislabeled first annotation scene categories in its training data, so the accuracy of predicting the scene category of the scene material to be classified in this way would also be low. Moreover, because the first random forest model is trained in a supervised manner while the scene material to be classified has no scene category, i.e. no label, the scene material to be classified cannot be used directly to iteratively update the first random forest model. Therefore, after the second feature vector of the scene material to be classified is extracted, a second annotation scene category of the scene material to be classified is first identified according to the plurality of first feature vectors and the second feature vector, which pre-labels the scene material to be classified with a scene category.
In an optional embodiment, the identifying the second annotation scene category of the scene material to be classified according to the plurality of the first feature vectors and the second feature vectors includes:
calculating the similarity between each first feature vector and each second feature vector;
determining a target first feature vector corresponding to the maximum similarity;
and determining the first annotation scene category corresponding to the target first feature vector as the second annotation scene category of the scene material to be classified.
The first feature vector is a feature representation of a historical scene material and the second feature vector is a feature representation of the scene material to be classified, so the similarity between a historical scene material and the scene material to be classified is obtained by calculating the similarity between the corresponding first feature vector and the second feature vector. The larger the similarity, the more similar the corresponding historical scene material is to the scene material to be classified and the more likely the two belong to the same category; the smaller the similarity, the more dissimilar they are and the less likely they belong to the same category. The first annotation scene category of the target first feature vector corresponding to the maximum similarity is therefore determined as the second annotation scene category of the scene material to be classified.
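A minimal sketch of this similarity-based pre-labeling is given below; cosine similarity is assumed, since the patent does not fix a particular similarity measure, and all names are illustrative.

```python
# Minimal sketch of the similarity-based pre-labeling; cosine similarity is
# an assumption, the patent does not fix a similarity measure.
import numpy as np

def pre_label_by_similarity(second_vector, first_vectors, first_categories):
    """Return the first annotation scene category of the most similar
    historical scene material."""
    firsts = np.vstack(first_vectors)
    sims = firsts @ second_vector / (
        np.linalg.norm(firsts, axis=1) * np.linalg.norm(second_vector) + 1e-12)
    return first_categories[int(np.argmax(sims))]
```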
In another optional embodiment, the identifying the second annotation scene category of the scene material to be classified according to the plurality of the first feature vectors and the second feature vectors includes:
clustering the first feature vectors and the second feature vectors to obtain a plurality of feature vector clusters;
determining a target feature vector cluster in which the second feature vector is located;
calculating the target annotation scene category of the target feature vector cluster according to the first annotation scene category of the first feature vector in the target feature vector cluster;
and determining the target annotation scene category as a second annotation scene category of the scene material to be classified.
The computer device may use a K-means clustering algorithm to cluster the plurality of first feature vectors and the plurality of second feature vectors, thereby dividing the plurality of first feature vectors and the plurality of second feature vectors into a plurality of feature vector clusters, each feature vector cluster including one or more feature vectors.
The clustering realizes that the feature vectors with the same annotation scene category are clustered into the same category, and the feature vectors with different annotation scene categories are clustered into different categories.
The target feature vector cluster in which the second feature vector is located is determined, and the second annotation scene category of the second feature vector is determined according to the first annotation scene categories of the first feature vectors in that target feature vector cluster. If the first annotation scene categories of all first feature vectors in the target feature vector cluster are the same, that shared category is the scene category of the target feature vector cluster and is taken as the second annotation scene category of the scene material to be classified. If they are not all the same, the number of occurrences of each distinct first annotation scene category in the target feature vector cluster is counted, and the first annotation scene category with the largest count is determined as the scene category of the target feature vector cluster and as the second annotation scene category of the scene material to be classified.
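A minimal sketch of this clustering-based pre-labeling is given below, assuming scikit-learn's K-means implementation; the number of clusters is illustrative, and the majority vote follows the rule described above.

```python
# Minimal sketch of the clustering-based pre-labeling with K-means,
# assuming scikit-learn; the number of clusters is illustrative.
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans

def pre_label_by_clustering(second_vector, first_vectors, first_categories,
                            n_clusters=10):
    X = np.vstack(list(first_vectors) + [second_vector])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    target_cluster = labels[-1]  # cluster containing the second feature vector
    in_cluster = [c for c, lab in zip(first_categories, labels[:-1])
                  if lab == target_cluster]
    # Majority vote over the first annotation scene categories in the cluster.
    return Counter(in_cluster).most_common(1)[0][0] if in_cluster else None
```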
S14, updating the first random forest model based on the second annotation scene category and the second feature vector to obtain a second random forest model.
The second annotation scene category and the second feature vector are taken as a new data pair, the new data pair is added to the data set to obtain a new data set, and the first random forest model is trained in a supervised manner on the new data set to obtain the second random forest model, thereby realizing iterative updating of the first random forest model.
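A minimal sketch of this update step is given below, again assuming scikit-learn; since scikit-learn's random forest has no incremental-update API, the sketch simply refits on the enlarged data set.

```python
# Minimal sketch of updating the first random forest model into the second
# one; scikit-learn forests have no incremental API, so the model is refit
# on the data set enlarged by the new (vector, category) pair.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def update_to_second_forest(X_old, y_old, second_vector, second_annotation_category):
    X_new = np.vstack([X_old, second_vector])
    y_new = np.append(y_old, second_annotation_category)
    second_model = RandomForestClassifier(n_estimators=100, random_state=0)
    second_model.fit(X_new, y_new)
    return second_model, X_new, y_new
```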
S15, correcting the first annotation scene category to a first target scene category according to the first output of the first random forest model and the second output of the second random forest model.
After training is completed, a random forest model outputs a scene category for each feature vector together with the category probability of that scene category. The output of the first random forest model is referred to as the first output, and the output of the second random forest model is referred to as the second output.
The first annotation scene category of each historical scene material is corrected by combining the first output of the first random forest model and the second output of the second random forest model.
In an alternative embodiment, the correcting the first annotation scene category to the first target scene category based on the first output of the first random forest model and the second output of the second random forest model includes:
acquiring a first scene category to be confirmed of each historical scene material in the first output;
acquiring a second scene category to be confirmed of each historical scene material in the second output;
judging whether the first class probability of the first to-be-confirmed scene class and the second class probability of the corresponding second to-be-confirmed scene class are both larger than a preset class probability threshold;
when the first class probability and the second class probability are both larger than the preset class probability threshold, judging whether at least two identical scene classes exist in the first scene class to be confirmed, the second scene class to be confirmed and the first annotation scene class;
when at least two identical scene categories exist in the first scene category to be confirmed, the second scene category to be confirmed and the first annotation scene category, correcting the first annotation scene category to be a first target scene category according to the identical scene category.
When the first class probability of the first scene category to be confirmed and the second class probability of the second scene category to be confirmed are both larger than the preset class probability threshold, the classification accuracy of the first random forest model and the second random forest model on that historical scene material is relatively high. In this case, if the first scene category to be confirmed, the second scene category to be confirmed, and the first annotation scene category are all the same, the first target scene category of the historical scene material is the first annotation scene category.
When the first class probability and the second class probability are both smaller than the preset class probability threshold, the first class probability is compared with the second class probability: when the first class probability is larger, the first annotation scene category is corrected to the first target scene category according to the first scene category to be confirmed; when the second class probability is larger, the first annotation scene category is corrected to the first target scene category according to the second scene category to be confirmed.
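The correction rule can be sketched as follows; the probability threshold is illustrative, and the handling of the mixed case (one probability above and one below the threshold) is an assumption, since the text only specifies the two cases above.

```python
# Minimal sketch of the correction rule for one historical scene material.
# The threshold is illustrative; the mixed case (one probability above and
# one below the threshold) is not specified in the text and is assumed to
# keep the existing annotation.
def correct_annotation(first_cat, first_prob, second_cat, second_prob,
                       annotated_cat, prob_threshold=0.8):
    if first_prob > prob_threshold and second_prob > prob_threshold:
        candidates = [first_cat, second_cat, annotated_cat]
        for cat in candidates:
            if candidates.count(cat) >= 2:   # at least two identical scene categories
                return cat
        return annotated_cat                 # no agreement: keep the annotation
    if first_prob < prob_threshold and second_prob < prob_threshold:
        return first_cat if first_prob > second_prob else second_cat
    return annotated_cat                     # mixed case (assumption)
```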
S16, calculating a second target scene category of the scene material to be classified according to the second annotation scene category and a second output of the second random forest model.
The second annotation scene category determined for the scene material to be classified by similarity calculation or clustering may be incorrect, so the second target scene category of the scene material to be classified is calculated by combining the second annotation scene category with the second output of the second random forest model.
If the second annotation scene category is the same as the scene category of the scene material to be classified in the second output, the second target scene category of the scene material to be classified is the second annotation scene category.
If the second annotation scene category is different from the scene category of the scene material to be classified in the second output, when the category probability of the scene category of the scene material to be classified in the second output is greater than the preset probability threshold, the second target scene category of the scene material to be classified is the scene category of the scene material to be classified in the second output; when the class probability of the scene class of the scene material to be classified in the second output is smaller than the preset probability threshold, the second target scene class of the scene material to be classified is the first target scene class corresponding to the second annotation scene class.
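A sketch of this decision logic for S16 is given below; predicted_cat and predicted_prob stand for the scene category and its category probability in the second output for the material to be classified, and the threshold value and argument names are illustrative.

```python
# Minimal sketch of S16; predicted_cat / predicted_prob come from the second
# output of the second random forest model for the material to be classified,
# and the threshold value is illustrative.
def second_target_category(second_annotation_cat, predicted_cat, predicted_prob,
                           corrected_target_of_annotation, prob_threshold=0.8):
    if predicted_cat == second_annotation_cat:
        return second_annotation_cat
    if predicted_prob > prob_threshold:
        return predicted_cat
    # Low-confidence disagreement: fall back to the first target scene
    # category corresponding to the second annotation scene category.
    return corrected_target_of_annotation
```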
In an alternative embodiment, the method further comprises:
responding to a scene material downloading request of a user, and extracting scene categories in the scene material downloading request;
querying a plurality of scene materials corresponding to the scene category;
generating a download link of each scene material;
calculating the material quantity of each scene material;
and sequencing and displaying the downloading links according to the material quantity.
When a user needs scene materials for a certain marketing campaign, the relevant scene materials can be downloaded from the active material platform, which avoids creating the scene materials from scratch and improves the efficiency of running the marketing campaign.
The user may input a scene category in a user interface provided by the active material platform to trigger a scene material download request; the computer device extracts the scene category from the scene material download request and queries the active material platform for the plurality of scene materials corresponding to the extracted scene category.
On the active material platform, the scene materials of each scene category are stored in one folder; the scene materials in the same folder have different storage paths, and a download link for each scene material is generated based on its storage path.
Different materials differ in size; the material quantity of a scene material is obtained by counting its segmented words. The larger the material quantity, the closer to the top of the user interface the corresponding download link is displayed, and the smaller the material quantity, the closer to the bottom of the user interface the corresponding download link is displayed. After the download links are displayed in order of material quantity, the material quantity can also be shown next to each download link to prompt the user how many download resources the scene material behind that link will consume, thereby helping the user save download resources.
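A minimal sketch of this ranking of download links is given below; taking the jieba token count as the material quantity and the link format and host are assumptions for illustration only.

```python
# Minimal sketch of the download-link ranking; the material quantity is taken
# as the number of jieba tokens, and the link format / host are illustrative.
import jieba

def ranked_download_links(scene_materials):
    """scene_materials: list of (storage_path, material_text) tuples."""
    entries = []
    for path, text in scene_materials:
        quantity = len(jieba.lcut(text))  # material quantity = token count
        entries.append({
            "link": f"https://materials.example.com/download?path={path}",
            "quantity": quantity,
        })
    # Larger materials are shown first (top of the user interface).
    return sorted(entries, key=lambda e: e["quantity"], reverse=True)
```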
In an alternative embodiment, the method further comprises:
receiving feedback from the user on the downloaded target scene material;
analyzing the feedback to obtain the real scene category of the target scene material;
updating the second random forest model based on the target scene material and the corresponding real scene category to obtain a third random forest model, so that the scene category of the target scene material output by the third random forest model is the same as the real scene category;
and updating the scene category of other scene materials by using the third random forest model.
A feedback input box can be displayed in the user interface provided by the active material platform, so that a user can feed back whether the scene category of the downloaded scene material is correct. If the scene category of the downloaded scene material is correct, "yes" may be entered in the feedback input box. If it is not correct, the real scene category of the scene material can be entered in the feedback input box.
The computer device can record the real scene categories of target scene materials and, when the number of recorded real scene categories exceeds a preset count threshold, retrain the second random forest model based on the target scene materials and the corresponding real scene categories, with the training target that the scene category output for each target scene material equals its real scene category; this yields the third random forest model and improves its classification effect. Finally, the third output of the third random forest model is obtained, and the scene category of each other scene material in the third output is taken as the latest scene category of that scene material.
The other scene materials are the scene materials other than the target scene materials in the second random forest model, including the plurality of historical scene materials and materials subsequently uploaded to the active material platform. In this optional embodiment, feedback from users on downloaded scene materials is received, and the second random forest model is updated once the number of recorded real scene categories obtained from the feedback exceeds the preset count threshold; this ensures that the scene category of a downloaded scene material is its real scene category, realizes correction of the scene categories of downloaded scene materials, and uses those corrections as training to update the second random forest model into the third random forest model. Repeating this process continuously realizes long-term iterative updating of the random forest model and continuously improves the classification effect of the third random forest model.
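A minimal sketch of this feedback-driven update into the third random forest model is given below, assuming scikit-learn and an in-memory record of corrections; the count threshold, class layout and retraining strategy (refitting on the data set extended with the corrected samples) are illustrative.

```python
# Minimal sketch of the feedback-driven update into the third random forest
# model; the count threshold, class layout and retraining strategy are
# illustrative, not taken from the patent.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class FeedbackUpdater:
    def __init__(self, X, y, count_threshold=50):
        self.X, self.y = X, y
        self.count_threshold = count_threshold
        self.corrections = []  # list of (feature_vector, real_scene_category)

    def record(self, feature_vector, real_category):
        """Record one user feedback with the real scene category."""
        self.corrections.append((feature_vector, real_category))

    def maybe_retrain(self):
        """Refit once enough real scene categories have been recorded."""
        if len(self.corrections) < self.count_threshold:
            return None
        X_fb = np.vstack([v for v, _ in self.corrections])
        y_fb = np.array([c for _, c in self.corrections])
        X_new = np.vstack([self.X, X_fb])
        y_new = np.concatenate([self.y, y_fb])
        third_model = RandomForestClassifier(n_estimators=100, random_state=0)
        third_model.fit(X_new, y_new)
        return third_model
```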
In summary, according to the material classification method disclosed by the invention, the first random forest model is trained in a supervised manner based on the feature vectors and annotation scene categories of the historical scene materials; an annotation scene category is assigned to the scene material to be classified through clustering or similarity calculation; the first random forest model is then updated, again in a supervised manner, into the second random forest model based on the scene material to be classified and the corresponding annotation scene category, realizing iterative updating of the random forest model and improving the classification effect of the second random forest model; the first annotation scene category is corrected by combining the first output of the first random forest model with the second output of the second random forest model; and the scene material to be classified is classified by combining the second annotation scene category with the second output of the second random forest model. The method and the device not only classify scene materials to be classified accurately, but also correct the annotation scene categories of historical scene materials, solving the problem of manually mislabeled scene categories.
It is emphasized that to further guarantee the privacy and security of the random forest model, the random forest model may be stored in nodes of the blockchain.
Fig. 2 is a block diagram of a material classifying apparatus according to a second embodiment of the present invention.
In some embodiments, the material classification device 20 may include a plurality of functional modules comprised of computer program segments. The computer program of the individual program segments in the material classification apparatus 20 may be stored in a memory of a computer device and executed by at least one processor to perform the functions of material classification (described in detail with reference to fig. 1).
In this embodiment, the material classifying device 20 may be divided into a plurality of functional modules according to the functions it performs. The functional module may include: a first extraction module 201, a model training module 202, a second extraction module 203, a model updating module 204, a category correction module 205, a category calculation module 206, a link display module 207, and a category feedback module 208. The module referred to in the present invention refers to a series of computer program segments capable of being executed by at least one processor and of performing a fixed function, stored in a memory. In the present embodiment, the functions of the respective modules will be described in detail in the following embodiments.
The first extraction module 201 is configured to obtain a first annotation scene category of a plurality of historical scene materials, and extract a first feature vector of the plurality of historical scene materials.
The historical scene materials are materials which are used by the held marketing activities and uploaded to the activity material platform, and each historical scene material is a material description text.
The first annotation scene category of each historical scene material in the activity material platform may be annotated using an annotation tool, the first annotation scene category being used to identify which type of marketing activity scene the corresponding historical scene material belongs to, e.g., a promotional type activity scene, a new user activity scene mined, an old user activity scene fed back, etc.
The computer device obtains, from the active material platform, the historical scene materials annotated with first annotation scene categories, and extracts a first feature vector of each historical scene material, so that a first random forest model can be trained based on the plurality of first annotation scene categories and the plurality of first feature vectors. Scene materials subsequently uploaded to the active material platform can then be pre-annotated, avoiding manual annotation of those scene materials.
In an alternative embodiment, the first extracting module 201 extracts the first feature vectors of the plurality of historical scene materials includes:
performing word segmentation processing on each historical scene material to obtain a plurality of segmented words;
extracting a word vector of each segmented word by using word2vector;
a first feature vector is generated based on word vectors of the plurality of segmented words for each of the historical scene materials.
The computer device may employ the jieba word segmentation tool to segment each historical scene material, dividing each historical scene material into a plurality of segmented words.
Because the plurality of segmented words include meaningless words such as stop words, the meaningless words are filtered out, and word2vector is used to extract a word vector for each remaining segmented word; every word vector extracted by word2vector has the same dimension.
To avoid the generated first feature vectors having different dimensions, the elements belonging to the same dimension in the word vectors of all segmented words of each historical scene material are added element-wise and, as in the examples below, divided by the number of segmented words. The resulting first feature vector has the same dimension as a single word vector, so the first feature vectors of all historical scene materials have a consistent dimension; this makes the random forest model easier to train subsequently and allows it to converge quickly.
For example, assume that the first historical scene material contains 3 segmented words, where the word vector of segmented word A1 is (a11, a12, a13), the word vector of A2 is (a21, a22, a23), and the word vector of A3 is (a31, a32, a33); the first feature vector generated based on the word vectors of these 3 segmented words is ((a11+a21+a31)/3, (a12+a22+a32)/3, (a13+a23+a33)/3).
Assume that the second historical scene material contains 2 segmented words, where the word vector of segmented word B1 is (b11, b12, b13) and the word vector of B2 is (b21, b22, b23); the first feature vector generated based on the word vectors of these 2 segmented words is ((b11+b21)/2, (b12+b22)/2, (b13+b23)/2).
The model training module 202 is configured to train a first random forest model based on a plurality of the first annotation scene categories and a plurality of the first feature vectors.
Each first annotation scene category and the corresponding first feature vector are taken as one data pair, a plurality of such data pairs form a data set, and the first random forest model is trained in a supervised manner on this data set.
The training process of the random forest model is the prior art and will not be described in detail.
The second extracting module 203 is configured to extract a second feature vector of the scene material to be classified, and identify a second annotation scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vector.
The scene materials to be classified are materials which are newly uploaded to an active material platform and need to be classified.
The computer device performs word segmentation on the scene material to be classified using the jieba word segmentation tool to obtain a plurality of segmented words; after meaningless words are removed, word2vector is used to extract a word vector for each segmented word, and a second feature vector is generated based on the word vectors of the plurality of segmented words of the scene material to be classified.
The second feature vector has the same dimensions as the first feature vector.
Although the scene category of the scene material to be classified could be predicted to a certain extent by directly using the first random forest model, the classification accuracy of the first random forest model is limited by mislabeled first annotation scene categories in its training data, so the accuracy of predicting the scene category of the scene material to be classified in this way would also be low. Moreover, because the first random forest model is trained in a supervised manner while the scene material to be classified has no scene category, i.e. no label, the scene material to be classified cannot be used directly to iteratively update the first random forest model. Therefore, after the second feature vector of the scene material to be classified is extracted, a second annotation scene category of the scene material to be classified is first identified according to the plurality of first feature vectors and the second feature vector, which pre-labels the scene material to be classified with a scene category.
In an optional embodiment, the identifying, by the second extracting module 203, the second annotation scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vectors includes:
calculating the similarity between each first feature vector and each second feature vector;
determining a target first feature vector corresponding to the maximum similarity;
and determining the first annotation scene category corresponding to the target first feature vector as the second annotation scene category of the scene material to be classified.
The first feature vector is a feature representation of a historical scene material and the second feature vector is a feature representation of the scene material to be classified, so the similarity between a historical scene material and the scene material to be classified is obtained by calculating the similarity between the corresponding first feature vector and the second feature vector. The larger the similarity, the more similar the corresponding historical scene material is to the scene material to be classified and the more likely the two belong to the same category; the smaller the similarity, the more dissimilar they are and the less likely they belong to the same category. The first annotation scene category of the target first feature vector corresponding to the maximum similarity is therefore determined as the second annotation scene category of the scene material to be classified.
In another optional embodiment, the identifying, by the second extracting module 203, the second annotation scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vectors includes:
clustering the first feature vectors and the second feature vectors to obtain a plurality of feature vector clusters;
determining a target feature vector cluster in which the second feature vector is located;
calculating the target annotation scene category of the target feature vector cluster according to the first annotation scene category of the first feature vector in the target feature vector cluster;
and determining the target annotation scene category as a second annotation scene category of the scene material to be classified.
The computer device may use a K-means clustering algorithm to cluster the plurality of first feature vectors and the plurality of second feature vectors, thereby dividing the plurality of first feature vectors and the plurality of second feature vectors into a plurality of feature vector clusters, each feature vector cluster including one or more feature vectors.
The clustering realizes that the feature vectors with the same annotation scene category are clustered into the same category, and the feature vectors with different annotation scene categories are clustered into different categories.
The target feature vector cluster in which the second feature vector is located is determined, and the second annotation scene category of the second feature vector is determined according to the first annotation scene categories of the first feature vectors in that target feature vector cluster. If the first annotation scene categories of all first feature vectors in the target feature vector cluster are the same, that shared category is the scene category of the target feature vector cluster and is taken as the second annotation scene category of the scene material to be classified. If they are not all the same, the number of occurrences of each distinct first annotation scene category in the target feature vector cluster is counted, and the first annotation scene category with the largest count is determined as the scene category of the target feature vector cluster and as the second annotation scene category of the scene material to be classified.
The model updating module 204 is configured to update the first random forest model based on the second annotation scene category and the second feature vector to obtain a second random forest model.
The second annotation scene category and the second feature vector are taken as a new data pair, the new data pair is added to the data set to obtain a new data set, and the first random forest model is trained in a supervised manner on the new data set to obtain the second random forest model, thereby realizing iterative updating of the first random forest model.
The class correction module 205 is configured to correct the first annotation scene class to be the first target scene class according to the first output of the first random forest model and the second output of the second random forest model.
After training is completed, a random forest model outputs a scene category for each feature vector together with the category probability of that scene category. The output of the first random forest model is referred to as the first output, and the output of the second random forest model is referred to as the second output.
The first annotation scene category of each historical scene material is corrected by combining the first output of the first random forest model and the second output of the second random forest model.
In an alternative embodiment, the class correction module 205 corrects the first annotation scene class to a first target scene class based on the first output of the first random forest model and the second output of the second random forest model comprises:
acquiring a first scene category to be confirmed of each historical scene material in the first output;
acquiring a second scene category to be confirmed of each historical scene material in the second output;
judging whether the first class probability of the first to-be-confirmed scene class and the second class probability of the corresponding second to-be-confirmed scene class are both larger than a preset class probability threshold;
when the first class probability and the second class probability are both larger than the preset class probability threshold, judging whether at least two identical scene classes exist in the first scene class to be confirmed, the second scene class to be confirmed and the first annotation scene class;
when at least two identical scene categories exist in the first scene category to be confirmed, the second scene category to be confirmed and the first annotation scene category, correcting the first annotation scene category to be a first target scene category according to the identical scene category.
When the first class probability of the first scene category to be confirmed and the second class probability of the second scene category to be confirmed are both larger than the preset class probability threshold, the classification accuracy of the first random forest model and the second random forest model on that historical scene material is relatively high. In this case, if the first scene category to be confirmed, the second scene category to be confirmed, and the first annotation scene category are all the same, the first target scene category of the historical scene material is the first annotation scene category.
When the first class probability and the second class probability are both smaller than the preset class probability threshold, the first class probability is compared with the second class probability: when the first class probability is larger, the first annotation scene category is corrected to the first target scene category according to the first scene category to be confirmed; when the second class probability is larger, the first annotation scene category is corrected to the first target scene category according to the second scene category to be confirmed.
The class calculation module 206 is configured to calculate a second target scene class of the scene material to be classified according to the second annotation scene class and a second output of the second random forest model.
The second annotation scene category determined for the scene material to be classified by similarity calculation or clustering may be incorrect, so the second target scene category of the scene material to be classified is calculated by combining the second annotation scene category with the second output of the second random forest model.
If the second annotation scene category is the same as the scene category of the scene material to be classified in the second output, the second target scene category of the scene material to be classified is the second annotation scene category.
If the second annotation scene category is different from the scene category of the scene material to be classified in the second output, when the category probability of the scene category of the scene material to be classified in the second output is greater than the preset probability threshold, the second target scene category of the scene material to be classified is the scene category of the scene material to be classified in the second output; when the class probability of the scene class of the scene material to be classified in the second output is smaller than the preset probability threshold, the second target scene class of the scene material to be classified is the first target scene class corresponding to the second annotation scene class.
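The decision rule of the category calculation module 206 can be sketched as follows; the function and parameter names and the 0.8 threshold are illustrative assumptions rather than part of the claimed method.

```python
def second_target_category(annotation_category, output_category, output_probability,
                           first_target_of_annotation, probability_threshold=0.8):
    """Decide the second target scene category of a scene material to be classified.

    annotation_category: second annotation scene category (from similarity or clustering)
    output_category / output_probability: scene category and class probability of the
        material in the second output of the second random forest model
    first_target_of_annotation: first target scene category corresponding to the
        second annotation scene category
    """
    if output_category == annotation_category:
        return annotation_category
    if output_probability > probability_threshold:
        return output_category
    return first_target_of_annotation
```

For example, under these assumptions, second_target_category("travel", "finance", 0.91, "travel") returns "finance", because the model's output category differs from the annotation and its class probability exceeds the threshold.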
The link display module 207 is configured to respond to a scene material downloading request of a user, and extract a scene category in the scene material downloading request; querying a plurality of scene materials corresponding to the scene category; generating a download link of each scene material; calculating the material quantity of each scene material; and sequencing and displaying the downloading links according to the material quantity.
When a user needs to prepare scene material for a marketing campaign, the relevant scene material can be downloaded from the campaign material platform, which avoids producing the scene material again and improves the efficiency of running the marketing campaign.
The user may input a scene category in a user interface provided by the campaign material platform to trigger a scene material download request; the computer device extracts the scene category from the scene material download request and queries the campaign material platform for a plurality of scene materials corresponding to the extracted scene category.
The scene materials of each scene category in the campaign material platform are stored in a folder; the scene materials in the same folder have different storage paths, and a download link of each scene material is generated based on its storage path.
Different materials have different sizes: counting the word segments of a scene material gives its material quantity. The larger the material quantity, the closer to the top of the user interface the corresponding download link is displayed; the smaller the material quantity, the closer to the bottom it is displayed. After the download links are displayed in order of material quantity, the material quantity can also be shown next to each download link to indicate how many download resources the scene material at that link will consume, which helps the user save download resources.
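A minimal sketch of this ordering logic follows, assuming the material quantity is taken as the number of word segments and assuming a hypothetical link format; the embodiment only requires that links be generated from storage paths and sorted by material quantity.

```python
def build_download_links(scene_category, materials):
    """Order the download links of a scene category's materials by material quantity,
    approximated here by the number of word segments in each material.

    materials: iterable of (material_name, storage_path, word_segments) tuples.
    Returns a list of dicts sorted so the largest materials come first (top of the UI).
    """
    links = []
    for name, path, segments in materials:
        quantity = len(segments)  # material quantity = number of word segments
        links.append({
            "name": name,
            "link": f"/materials/{scene_category}/{path}",  # hypothetical link format
            "quantity": quantity,  # shown next to the link to indicate download cost
        })
    return sorted(links, key=lambda item: item["quantity"], reverse=True)
```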
The category feedback module 208 is configured to receive feedback from the user on the downloaded target scene material; analyzing the feedback to obtain the real scene category of the target scene material; updating the second random forest model based on the target scene material and the corresponding real scene category to obtain a third random forest model, so that the scene category of the target scene material output by the third random forest model is the same as the real scene category; and updating the scene category of other scene materials by using the third random forest model.
A feedback input box can be displayed in the user interface provided by the campaign material platform, so that the user can report whether the scene category of the downloaded scene material is correct. If the scene category of the downloaded scene material is correct, "yes" may be entered in the feedback input box; if it is not, the real scene category of the scene material can be entered in the feedback input box instead.
The computer device records the real scene category of the target scene material. When the number of times the real scene category has been recorded exceeds a preset count threshold, the second random forest model is retrained based on the target scene material and the corresponding real scene category, with the training target that the scene category output by the retrained model for the target scene material is the real scene category; the retrained model replaces the second random forest model, improving the classification effect of the resulting third random forest model. Finally, a third output of the third random forest model is obtained, and the scene category of each other scene material in the third output is taken as the latest scene category of that material.
The other scene materials are the scene materials in the second random forest model other than the target scene material, including the plurality of historical scene materials and materials subsequently uploaded to the campaign material platform. In this optional embodiment, feedback from the user on downloaded scene materials is received, and the second random forest model is updated once the number of recorded real scene categories obtained from the feedback exceeds the preset count threshold. This ensures that the scene category of a downloaded scene material is its real scene category, realizes correction of the scene category of the downloaded scene material, and uses that correction as training to update the current second random forest model to the third random forest model. By continuously repeating the process of this embodiment, long-term iterative updating of the random forest model is achieved, and the classification effect of the third random forest model is continuously improved.
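A possible sketch of this feedback-driven retraining is shown below, using scikit-learn's RandomForestClassifier as a stand-in for the random forest model (the embodiment does not name a library); the count threshold of 3, the in-place training lists, and all identifiers are assumptions for illustration.

```python
from collections import defaultdict
from sklearn.ensemble import RandomForestClassifier

feedback_counts = defaultdict(int)  # material id -> times its real scene category was reported

def record_feedback(material_id, real_category, feature_vector,
                    train_features, train_labels, count_threshold=3):
    """Record feedback on a downloaded target scene material and retrain the model
    once the real scene category has been reported more than `count_threshold` times.

    train_features / train_labels hold the training data of the current (second)
    random forest model and are extended in place with the corrected sample.
    Returns the retrained model, or None if no retraining was triggered.
    """
    feedback_counts[material_id] += 1
    if feedback_counts[material_id] <= count_threshold:
        return None
    train_features.append(feature_vector)
    train_labels.append(real_category)
    third_model = RandomForestClassifier(n_estimators=100, random_state=0)
    third_model.fit(train_features, train_labels)  # serves as the third random forest model
    return third_model
```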
In summary, the material classification device disclosed by the invention performs supervised training of the first random forest model based on the feature vectors and annotation scene categories of the historical scene materials, assigns an annotation scene category to the scene material to be classified by clustering or similarity calculation, and updates the first random forest model into a second random forest model under supervision based on the scene material to be classified and the corresponding annotation scene category, realizing iterative updating of the random forest model and improving the classification effect of the second random forest model. The first annotation scene category is corrected by combining the first output of the first random forest model with the second output of the second random forest model, and the scene material to be classified is classified by combining the second annotation scene category with the second output of the second random forest model. The device can therefore not only accurately classify the scene material to be classified, but also correct the annotation scene categories of the historical scene materials, solving the problem of manually mislabeled scene categories.
It is emphasized that to further guarantee the privacy and security of the random forest model, the random forest model may be stored in nodes of the blockchain.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention. In the preferred embodiment of the present invention, the computer device 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
It will be appreciated by those skilled in the art that the structure of the computer device shown in Fig. 3 does not limit the embodiments of the present invention; either a bus-type or a star-type structure may be used, and the computer device 3 may further include more or fewer hardware or software components than shown, or have a different arrangement of components.
In some embodiments, the computer device 3 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The computer device 3 may also include a client device, which includes, but is not limited to, any electronic product that can interact with a client by way of a keyboard, mouse, remote control, touch pad, or voice control device, such as a personal computer, tablet, smart phone, digital camera, etc.
It should be noted that the computer device 3 is only an example, and other existing or future electronic products that are adaptable to the present invention are also intended to fall within the scope of protection of the present invention and are incorporated herein by reference.
In some embodiments, the memory 31 stores a computer program that, when executed by the at least one processor 32, implements all or part of the steps of the material classification method described above. The memory 31 includes read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable rewritable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated and linked by cryptographic methods, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In some embodiments, the at least one processor 32 is a Control Unit (Control Unit) of the computer device 3, connects the various components of the entire computer device 3 using various interfaces and lines, and performs various functions and processes of the computer device 3 by running or executing programs or modules stored in the memory 31, and invoking data stored in the memory 31. For example, the at least one processor 32, when executing the computer program stored in the memory, implements all or part of the steps of the material classification method described in embodiments of the invention; or to implement all or part of the functions of the material sorting apparatus. The at least one processor 32 may be comprised of integrated circuits, such as a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functionality, including one or more central processing units (Central Processing unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like.
In some embodiments, the at least one communication bus 33 is arranged to enable connected communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the computer device 3 may further comprise a power source (such as a battery) for powering the various components, preferably the power source is logically connected to the at least one processor 32 via a power management means, whereby the functions of managing charging, discharging, and power consumption are performed by the power management means. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The computer device 3 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described in detail herein.
The integrated units implemented in the form of software functional modules described above may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a computer device, or a network device, etc.) or processor (processor) to perform portions of the methods described in the various embodiments of the invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the invention may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (8)

1. A method of classifying material, the method comprising:
acquiring first annotation scene categories of a plurality of historical scene materials, and extracting first feature vectors of the plurality of historical scene materials;
training a first random forest model based on a plurality of the first annotation scene categories and a plurality of the first feature vectors;
extracting a second feature vector of the scene material to be classified, and identifying a second annotation scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vector;
updating the first random forest model based on the second annotation scene category and the second feature vector to obtain a second random forest model;
correcting the first annotation scene category to a first target scene category according to the first output of the first random forest model and the second output of the second random forest model, comprising: acquiring a first scene category to be confirmed of each historical scene material in the first output; acquiring a second scene category to be confirmed of each historical scene material in the second output; judging whether the first class probability of the first to-be-confirmed scene class and the second class probability of the corresponding second to-be-confirmed scene class are both larger than a preset class probability threshold; when the first class probability and the second class probability are both larger than the preset class probability threshold, judging whether at least two identical scene classes exist in the first scene class to be confirmed, the second scene class to be confirmed and the first annotation scene class; when at least two identical scene categories exist in the first scene category to be confirmed, the second scene category to be confirmed and the first annotation scene category, correcting the first annotation scene category to be a first target scene category according to the identical scene category;
Calculating a second target scene category of the scene material to be classified according to the second annotation scene category and a second output of the second random forest model; receiving feedback of a user on the downloaded target scene material; analyzing the feedback to obtain the real scene category of the target scene material; updating the second random forest model based on the target scene material and the corresponding real scene category to obtain a third random forest model, so that the scene category of the target scene material output by the third random forest model is the same as the real scene category; and updating the scene category of other scene materials by using the third random forest model.
2. The material classification method of claim 1, wherein the identifying the second annotation scene category of the scene material to be classified based on the plurality of the first feature vectors and the second feature vectors comprises:
calculating the similarity between each first feature vector and each second feature vector;
determining a target first feature vector corresponding to the maximum similarity;
and determining the first annotation scene category corresponding to the target first feature vector as the second annotation scene category of the scene material to be classified.
3. The material classification method of claim 1, wherein the identifying the second annotation scene category of the scene material to be classified based on the plurality of the first feature vectors and the second feature vectors comprises:
clustering the first feature vectors and the second feature vectors to obtain a plurality of feature vector clusters;
determining a target feature vector cluster in which the second feature vector is located;
calculating the target annotation scene category of the target feature vector cluster according to the first annotation scene category of the first feature vector in the target feature vector cluster;
and determining the target annotation scene category as a second annotation scene category of the scene material to be classified.
4. A material classification method as claimed in claim 2 or 3, wherein the method further comprises:
responding to a scene material downloading request of a user, and extracting scene categories in the scene material downloading request;
querying a plurality of scene materials corresponding to the scene category;
generating a download link of each scene material;
calculating the material quantity of each scene material;
and sequencing and displaying the downloading links according to the material quantity.
5. A material classification method as claimed in claim 2 or 3, wherein said extracting a first feature vector of said plurality of historical scene materials comprises:
performing word segmentation processing on each historical scene material to obtain a plurality of segmented words;
extracting a word vector of each word segment by using word2vec;
a first feature vector is generated based on word vectors of the plurality of segmented words for each of the historical scene materials.
6. A material classifying apparatus, characterized in that the apparatus comprises:
the first extraction module is used for obtaining first annotation scene categories of a plurality of historical scene materials and extracting first feature vectors of the plurality of historical scene materials;
the model training module is used for training a first random forest model based on a plurality of the first annotation scene categories and a plurality of the first feature vectors;
the second extraction module is used for extracting a second feature vector of the scene material to be classified, and identifying a second annotation scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vector;
the model updating module is used for updating the first random forest model based on the second annotation scene category and the second feature vector to obtain a second random forest model;
The category correction module is configured to correct the first annotation scene category to be a first target scene category according to the first output of the first random forest model and the second output of the second random forest model, and includes: acquiring a first scene category to be confirmed of each historical scene material in the first output; acquiring a second scene category to be confirmed of each historical scene material in the second output; judging whether the first class probability of the first to-be-confirmed scene class and the second class probability of the corresponding second to-be-confirmed scene class are both larger than a preset class probability threshold; when the first class probability and the second class probability are both larger than the preset class probability threshold, judging whether at least two identical scene classes exist in the first scene class to be confirmed, the second scene class to be confirmed and the first annotation scene class; when at least two identical scene categories exist in the first scene category to be confirmed, the second scene category to be confirmed and the first annotation scene category, correcting the first annotation scene category to be a first target scene category according to the identical scene category;
The category calculating module is used for calculating a second target scene category of the scene material to be classified according to the second annotation scene category and a second output of the second random forest model; receiving feedback of a user on the downloaded target scene material; analyzing the feedback to obtain the real scene category of the target scene material; updating the second random forest model based on the target scene material and the corresponding real scene category to obtain a third random forest model, so that the scene category of the target scene material output by the third random forest model is the same as the real scene category; and updating the scene category of other scene materials by using the third random forest model.
7. A computer device comprising a processor for implementing the material classification method of any one of claims 1 to 5 when executing a computer program stored in a memory.
8. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the material classification method according to any one of claims 1 to 5.
CN202011559080.6A 2020-12-25 2020-12-25 Material classification method, device, computer equipment and storage medium Active CN112651439B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011559080.6A CN112651439B (en) 2020-12-25 2020-12-25 Material classification method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011559080.6A CN112651439B (en) 2020-12-25 2020-12-25 Material classification method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112651439A CN112651439A (en) 2021-04-13
CN112651439B true CN112651439B (en) 2023-12-22

Family

ID=75362890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011559080.6A Active CN112651439B (en) 2020-12-25 2020-12-25 Material classification method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112651439B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574544A (en) * 2015-12-16 2016-05-11 平安科技(深圳)有限公司 Data processing method and device
CN108304490A (en) * 2018-01-08 2018-07-20 有米科技股份有限公司 Text based similarity determines method, apparatus and computer equipment
CN108734214A (en) * 2018-05-21 2018-11-02 Oppo广东移动通信有限公司 Image-recognizing method and device, electronic equipment, storage medium
CN109145965A (en) * 2018-08-02 2019-01-04 深圳辉煌耀强科技有限公司 Cell recognition method and device based on random forest disaggregated model
CN111753790A (en) * 2020-07-01 2020-10-09 武汉楚精灵医疗科技有限公司 Video classification method based on random forest algorithm
CN111914881A (en) * 2020-06-18 2020-11-10 北京百度网讯科技有限公司 Random forest generation method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10726153B2 (en) * 2015-11-02 2020-07-28 LeapYear Technologies, Inc. Differentially private machine learning using a random forest classifier

Also Published As

Publication number Publication date
CN112651439A (en) 2021-04-13

Similar Documents

Publication Publication Date Title
CN112257774B (en) Target detection method, device, equipment and storage medium based on federal learning
CN112801718B (en) User behavior prediction method, device, equipment and medium
CN113157927B (en) Text classification method, apparatus, electronic device and readable storage medium
CN110705719A (en) Method and apparatus for performing automatic machine learning
CN113626606B (en) Information classification method, device, electronic equipment and readable storage medium
CN113688923A (en) Intelligent order abnormity detection method and device, electronic equipment and storage medium
CN112860848A (en) Information retrieval method, device, equipment and medium
CN111985545B (en) Target data detection method, device, equipment and medium based on artificial intelligence
CN113570269A (en) Operation and maintenance project management method, device, equipment, medium and program product
CN114781832A (en) Course recommendation method and device, electronic equipment and storage medium
CN114612194A (en) Product recommendation method and device, electronic equipment and storage medium
CN113658002B (en) Transaction result generation method and device based on decision tree, electronic equipment and medium
CN112860989A (en) Course recommendation method and device, computer equipment and storage medium
CN114756669A (en) Intelligent analysis method and device for problem intention, electronic equipment and storage medium
CN113505273B (en) Data sorting method, device, equipment and medium based on repeated data screening
CN113313211B (en) Text classification method, device, electronic equipment and storage medium
CN113656690A (en) Product recommendation method and device, electronic equipment and readable storage medium
CN112651439B (en) Material classification method, device, computer equipment and storage medium
CN113570286B (en) Resource allocation method and device based on artificial intelligence, electronic equipment and medium
CN113420847B (en) Target object matching method based on artificial intelligence and related equipment
CN113626605B (en) Information classification method, device, electronic equipment and readable storage medium
CN113674065B (en) Service contact-based service recommendation method and device, electronic equipment and medium
CN113515591B (en) Text defect information identification method and device, electronic equipment and storage medium
CN112215336B (en) Data labeling method, device, equipment and storage medium based on user behaviors
CN115114073A (en) Alarm information processing method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40041446

Country of ref document: HK

GR01 Patent grant