CN112651439A - Material classification method and device, computer equipment and storage medium

Material classification method and device, computer equipment and storage medium

Info

Publication number
CN112651439A
Authority
CN
China
Prior art keywords
scene
category
random forest
materials
forest model
Prior art date
Legal status
Granted
Application number
CN202011559080.6A
Other languages
Chinese (zh)
Other versions
CN112651439B
Inventor
张莉
王雅青
吴志成
乔延柯
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011559080.6A
Publication of CN112651439A
Application granted
Publication of CN112651439B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing


Abstract

The invention relates to the technical field of artificial intelligence, and provides a material classification method and device, computer equipment and a storage medium, wherein the method comprises the following steps: training a first random forest model based on first labeled scene categories and first feature vectors of a plurality of historical scene materials; identifying a second labeled scene category of a scene material to be classified according to a second feature vector of the scene material to be classified and the plurality of first feature vectors; updating the first random forest model based on the second labeled scene category and the second feature vector to obtain a second random forest model; correcting the first labeled scene category to a first target scene category according to a first output of the first random forest model and a second output of the second random forest model; and calculating a second target scene category of the scene material to be classified according to the second labeled scene category and the second output of the second random forest model. The method and the device can accurately classify scene materials and solve the problem of manual mislabeling.

Description

Material classification method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a material classification method and device, computer equipment and a storage medium.
Background
Marketing campaigns can not only attract the attention of consumers, but also convey the core value of a brand and thereby enhance the brand's influence. An existing campaign material platform can provide various scene materials for a planner of a marketing campaign to select, so that the marketing campaign is implemented according to the selected scene materials.
In the process of implementing the invention, the inventor found that the existing campaign material platform relies on manual scene labeling of the existing scene materials, trains a machine learning model based on the labeled scene materials, and uses the trained machine learning model to classify newly uploaded scene materials. Because the manually labeled scenes contain errors, the classification accuracy of the trained machine learning model is low, so the classification accuracy for newly uploaded scene materials is low, and the scene categories of manually mislabeled scene materials cannot be updated.
Disclosure of Invention
In view of the above, there is a need for a material classification method, apparatus, computer device and storage medium that can accurately classify scene materials, continuously update the scene categories of historical scene materials, and solve the problem of manual mislabeling.
A first aspect of the present invention provides a method of classifying material, the method comprising:
acquiring first labeling scene categories of a plurality of historical scene materials, and extracting first feature vectors of the plurality of historical scene materials;
training a first random forest model based on the plurality of first labeling scene categories and the plurality of first feature vectors;
extracting a second feature vector of the scene materials to be classified, and identifying a second labeled scene category of the scene materials to be classified according to the plurality of first feature vectors and the second feature vector;
updating the first random forest model based on the second labeled scene category and the second feature vector to obtain a second random forest model;
correcting the first labeled scene category to a first target scene category according to a first output of the first random forest model and a second output of the second random forest model;
and calculating a second target scene category of the scene materials to be classified according to the second labeled scene category and the second output of the second random forest model.
In an optional embodiment, the method further comprises:
receiving feedback of the user on the downloaded target scene materials;
analyzing the feedback to obtain the real scene category of the target scene material;
updating the second random forest model based on the target scene materials and the corresponding real scene types to obtain a third random forest model, so that the scene types of the target scene materials output by the third random forest model are the same as the real scene types;
and updating the scene types of other scene materials by using the third random forest model.
In an alternative embodiment, the correcting the first annotated scene class to the first target scene class according to the first output of the first random forest model and the second output of the second random forest model comprises:
acquiring a first to-be-confirmed scene category of each historical scene material in the first output;
acquiring a second scene category to be confirmed of each historical scene material in the second output;
judging whether the first class probability of the first scene class to be confirmed and the corresponding second class probability of the second scene class to be confirmed are both larger than a preset class probability threshold value;
when the first category probability and the second category probability are both greater than the preset category probability threshold, judging whether at least two identical scene categories exist in the first scene category to be confirmed, the second scene category to be confirmed and the first labeled scene category;
and when at least two of the first scene category to be confirmed, the second scene category to be confirmed and the first labeled scene category are identical, correcting the first labeled scene category to a first target scene category according to the identical scene categories.
In an optional embodiment, the identifying a second annotated scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vector comprises:
calculating a similarity between each of the first feature vectors and the second feature vector;
determining the target first feature vector corresponding to the maximum similarity;
and determining the first labeling scene category corresponding to the target first feature vector as a second labeling scene category of the scene materials to be classified.
In an optional embodiment, the identifying a second annotated scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vector comprises:
clustering the first feature vectors and the second feature vectors to obtain a plurality of feature vector clusters;
determining a target feature vector cluster where the second feature vector is located;
calculating a target labeling scene type of the target feature vector cluster according to a first labeling scene type of the first feature vector in the target feature vector cluster;
and determining the target labeling scene category as a second labeling scene category of the scene materials to be classified.
In an optional embodiment, the method further comprises:
responding to a scene material downloading request of a user, and extracting a scene category in the scene material downloading request;
inquiring a plurality of scene materials corresponding to the scene categories;
generating a download link of each scene material;
calculating the material quantity of each scene material;
and sequencing and displaying the plurality of download links according to the material amount.
In an alternative embodiment, the extracting of the first feature vectors of the plurality of historical scene materials comprises:
performing word segmentation processing on each historical scene material to obtain a plurality of word segments;
extracting a word vector of each word segment by using word2vec;
and generating a first feature vector based on the word vectors of the plurality of word segments of each historical scene material.
A second aspect of the present invention provides a material classifying device, including:
the first extraction module is used for acquiring first labeled scene categories of a plurality of historical scene materials and extracting first feature vectors of the plurality of historical scene materials;
the model training module is used for training a first random forest model based on the first labeling scene categories and the first feature vectors;
the second extraction module is used for extracting a second feature vector of the scene material to be classified and identifying a second labeling scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vector;
the model updating module is used for updating the first random forest model based on the second labeled scene category and the second feature vector to obtain a second random forest model;
the class correction module is used for correcting the first annotation scene class into a first target scene class according to the first output of the first random forest model and the second output of the second random forest model;
and the category calculation module is used for calculating a second target scene category of the scene materials to be classified according to the second labeled scene category and the second output of the second random forest model.
A third aspect of the invention provides a computer apparatus comprising a processor for implementing the material classification method when executing a computer program stored in a memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the material classification method.
In summary, according to the material classification method, device, computer equipment and storage medium of the present invention, a first random forest model is trained in a supervised manner based on the feature vectors and labeled scene categories of historical scene materials. For a scene material to be classified, a labeled scene category is first assigned to it by clustering or by similarity; the first random forest model is then updated to a second random forest model based on the scene material to be classified and its corresponding labeled scene category in a supervised manner, which realizes iterative updating of the random forest model and improves the classification effect of the second random forest model. Finally, the first labeled scene category is corrected by combining the first output of the first random forest model and the second output of the second random forest model, and the scene material to be classified is classified by combining the second labeled scene category and the second output of the second random forest model. The method and the device can accurately classify the scene material to be classified, can correct the labeled scene categories of historical scene materials, and solve the problem of manually mislabeled scene categories.
Drawings
Fig. 1 is a flowchart of a material classifying method according to an embodiment of the present invention.
Fig. 2 is a structural diagram of a material classifying apparatus according to a second embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The material classification method provided by the embodiment of the invention is executed by computer equipment, and correspondingly, the material classification device runs in the computer equipment.
Fig. 1 is a flowchart of a material classifying method according to an embodiment of the present invention. The material classification method specifically comprises the following steps, and the sequence of the steps in the flow chart can be changed and some steps can be omitted according to different requirements.
S11, acquiring first labeling scene categories of a plurality of historical scene materials, and extracting first feature vectors of the plurality of historical scene materials.
The historical scene materials are materials that were used in marketing campaigns already held and have been uploaded to the campaign material platform, and each historical scene material is a material description text.
A first labeled scene category of each historical scene material in the campaign material platform can be annotated using an annotation tool, where the first labeled scene category identifies which type of marketing campaign scene the corresponding historical scene material belongs to, for example, a promotional campaign scene, a new-user acquisition campaign scene, an existing-user reward campaign scene, and the like.
The computer device obtains the historical scene materials annotated with first labeled scene categories from the campaign material platform and extracts the first feature vectors of the historical scene materials, so that a first random forest model is trained based on the first labeled scene categories and the first feature vectors and scene materials subsequently uploaded to the campaign material platform can be pre-labeled, avoiding manual labeling of the scene materials.
In an alternative embodiment, the extracting of the first feature vectors of the plurality of historical scene materials comprises:
performing word segmentation processing on each historical scene material to obtain a plurality of word segments;
extracting a word vector of each word segment by using word2vec;
and generating a first feature vector based on the word vectors of the plurality of word segments of each historical scene material.
The computer device may employ the jieba segmentation tool to perform word segmentation processing on each historical scene material, so as to segment each historical scene material into a plurality of word segments.
Because the word segments include meaningless words such as stop words, the meaningless words are filtered out first; word2vec is then used to extract a word vector for each remaining word segment, and every word vector extracted by word2vec has the same dimensionality.
The number of word segments differs across historical scene materials. To avoid differences in the dimensionality of the generated first feature vectors, the elements belonging to the same dimension in the word vectors of the word segments of each historical scene material are summed and averaged over the number of word segments, so that the resulting first feature vector has the same dimensionality as a single word vector. Keeping the first feature vector of every historical scene material at the same dimensionality facilitates subsequent training of the random forest model and allows the random forest model to converge quickly.
For example, assume that the first historical scene material contains 3 word segments, where the word vector of word segment A1 is (A11, A12, A13), the word vector of word segment A2 is (A21, A22, A23), and the word vector of word segment A3 is (A31, A32, A33); the first feature vector generated from the word vectors of the 3 word segments of the first historical scene material is ((A11+A21+A31)/3, (A12+A22+A32)/3, (A13+A23+A33)/3).
Assume that the second historical scene material contains 2 word segments, where the word vector of word segment B1 is (B11, B12, B13) and the word vector of word segment B2 is (B21, B22, B23); the first feature vector generated from the word vectors of the 2 word segments of the second historical scene material is ((B11+B21)/2, (B12+B22)/2, (B13+B23)/2).
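As a non-limiting illustration, the feature extraction described above can be sketched in Python roughly as follows. The use of jieba and gensim's Word2Vec, the stop-word set, and the function name are assumptions for illustration, not part of the disclosure.

```python
# Illustrative sketch: segment the material text, drop stop words, embed each
# word segment with a word2vec model, and mean-pool into one fixed-length vector.
import jieba
import numpy as np
from gensim.models import Word2Vec

def material_to_feature_vector(text: str, w2v: Word2Vec, stop_words: set) -> np.ndarray:
    # Word segmentation, then filtering of meaningless words such as stop words.
    segments = [w for w in jieba.lcut(text) if w.strip() and w not in stop_words]
    # Keep only segments known to the word2vec model (assumption: pretrained model).
    vectors = [w2v.wv[w] for w in segments if w in w2v.wv]
    if not vectors:
        return np.zeros(w2v.vector_size)
    # Element-wise sum divided by the number of word segments, matching the
    # example ((A11+A21+A31)/3, (A12+A22+A32)/3, (A13+A23+A33)/3).
    return np.mean(vectors, axis=0)
```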
S12, training a first random forest model based on the first labeling scene types and the first feature vectors.
Each first labeled scene category and the corresponding first feature vector are taken as a data pair, the plurality of data pairs are taken as a data set, and the first random forest model is trained on the data set in a supervised manner.
The training process of the random forest model is prior art and is not elaborated.
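For concreteness, a minimal sketch of this supervised training step using scikit-learn is shown below; the choice of scikit-learn and hyperparameters such as n_estimators are assumptions and are not specified by the patent.

```python
# Minimal sketch: train the first random forest model from data pairs of
# (first feature vector, first labeled scene category).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_first_random_forest(first_feature_vectors, first_labeled_categories):
    X = np.asarray(first_feature_vectors)      # one row per historical scene material
    y = np.asarray(first_labeled_categories)   # first labeled scene category per material
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X, y)                            # supervised training on the data set
    return model
```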
S13, extracting a second feature vector of the scene material to be classified, and identifying a second labeling scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vector.
The scene materials to be classified are materials which are newly uploaded to the active material platform and need to be classified.
The computer device performs word segmentation processing on the scene material to be classified by using the jieba segmentation tool to obtain a plurality of word segments, removes meaningless words, and then extracts a word vector of each word segment by using word2vec; a second feature vector is generated based on the word vectors of the plurality of word segments of the scene material to be classified.
The second feature vector has the same dimensions as the first feature vector.
Although the scene category of the scene material to be classified can be predicted to a certain extent by directly using the first random forest model, some first labeled scene categories used to train the first random forest model may be mislabeled, so the classification accuracy of the first random forest model is not high and the accuracy of predicting the scene category of the scene material to be classified is not high. Moreover, because the first random forest model is trained in a supervised manner and the scene material to be classified has no scene category, that is, no label, the scene material to be classified cannot be used directly to iteratively update the first random forest model. Therefore, after the second feature vector of the scene material to be classified is extracted, a second labeled scene category of the scene material to be classified is identified according to the plurality of first feature vectors and the second feature vector, thereby pre-labeling the scene category of the scene material to be classified.
In an optional embodiment, the identifying a second annotated scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vector comprises:
calculating a similarity between each of the first feature vectors and the second feature vector;
determining the target first feature vector corresponding to the maximum similarity;
and determining the first labeling scene category corresponding to the target first feature vector as a second labeling scene category of the scene materials to be classified.
The first feature vector is the feature representation of a historical scene material, and the second feature vector is the feature representation of the scene material to be classified, so calculating the similarity between a first feature vector and the second feature vector measures the similarity between the corresponding historical scene material and the scene material to be classified. The greater the similarity, the more similar they are and the more likely they belong to the same category; the smaller the similarity, the more dissimilar they are and the less likely they belong to the same category. The first labeled scene category of the target first feature vector corresponding to the maximum similarity is determined as the second labeled scene category of the scene material to be classified.
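A minimal sketch of this similarity-based pre-labeling is given below; cosine similarity is used as one reasonable choice of similarity measure, which the text does not fix, and the function name is illustrative.

```python
# Sketch: assign the scene material to be classified the first labeled scene
# category of its most similar historical scene material.
import numpy as np

def label_by_similarity(first_feature_vectors, first_labeled_categories, second_feature_vector):
    X = np.asarray(first_feature_vectors, dtype=float)
    v = np.asarray(second_feature_vector, dtype=float)
    # Cosine similarity between the second feature vector and every first feature vector.
    sims = X @ v / (np.linalg.norm(X, axis=1) * np.linalg.norm(v) + 1e-12)
    target_index = int(np.argmax(sims))        # index of the target first feature vector
    # Its first labeled scene category becomes the second labeled scene category.
    return first_labeled_categories[target_index]
```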
In another optional embodiment, the identifying a second annotated scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vector comprises:
clustering the first feature vectors and the second feature vectors to obtain a plurality of feature vector clusters;
determining a target feature vector cluster where the second feature vector is located;
calculating a target labeling scene type of the target feature vector cluster according to a first labeling scene type of the first feature vector in the target feature vector cluster;
and determining the target labeling scene category as a second labeling scene category of the scene materials to be classified.
The computer device may cluster the first feature vectors and the second feature vectors by using a K-means clustering algorithm, so as to divide the first feature vectors and the second feature vectors into a plurality of feature vector clusters, wherein each feature vector cluster includes one or more feature vectors.
The clustering groups feature vectors with the same labeled scene category into the same cluster and feature vectors with different labeled scene categories into different clusters.
The target feature vector cluster where the second feature vector is located is determined, and the second labeled scene category of the second feature vector is then determined according to the first labeled scene categories of the first feature vectors in the target feature vector cluster. If the first labeled scene categories corresponding to all the target first feature vectors in the target feature vector cluster are the same, that first labeled scene category is the scene category of the target feature vector cluster and is taken as the second labeled scene category of the scene material to be classified. If the first labeled scene categories corresponding to the target first feature vectors in the target feature vector cluster are not all the same, the number of occurrences of each distinct first labeled scene category is counted, the most frequent first labeled scene category is determined as the scene category of the target feature vector cluster, and this most frequent first labeled scene category is taken as the second labeled scene category of the scene material to be classified.
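The clustering-based pre-labeling can be sketched as follows; K-means is the algorithm named above, while the number of clusters is an assumption, since the text leaves it unspecified.

```python
# Sketch: cluster all first feature vectors together with the second feature
# vector, then take a majority vote over the first labeled scene categories
# inside the cluster that contains the second feature vector.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def label_by_clustering(first_feature_vectors, first_labeled_categories,
                        second_feature_vector, n_clusters=5):
    X = np.vstack([np.asarray(first_feature_vectors), np.asarray(second_feature_vector)])
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    target_cluster = cluster_ids[-1]           # cluster containing the second feature vector
    # First labeled scene categories of the first feature vectors in that cluster.
    members = [first_labeled_categories[i]
               for i in range(len(first_labeled_categories)) if cluster_ids[i] == target_cluster]
    if not members:                            # second feature vector alone in its cluster
        return None
    # The most frequent category becomes the second labeled scene category.
    return Counter(members).most_common(1)[0][0]
```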
And S14, updating the first random forest model based on the second labeled scene category and the second feature vector to obtain a second random forest model.
The second labeled scene category and the second feature vector are taken as a new data pair, the new data pair is added to the data set to obtain a new data set, and the first random forest model is trained on the new data set in a supervised manner to obtain a second random forest model, thereby realizing iterative updating of the first random forest model.
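One straightforward reading of this update, sketched below, is to append the pre-labeled sample to the data set and retrain from scratch; incremental alternatives are equally possible, and the parameters are assumptions.

```python
# Sketch: add the new data pair (second feature vector, second labeled scene
# category) to the data set and retrain to obtain the second random forest model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def update_random_forest(X, y, second_feature_vector, second_labeled_category):
    X_new = np.vstack([X, second_feature_vector])   # enlarged feature matrix
    y_new = np.append(y, second_labeled_category)   # enlarged label vector
    second_model = RandomForestClassifier(n_estimators=100, random_state=0)
    second_model.fit(X_new, y_new)
    return second_model
```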
S15, correcting the first annotation scene type to be a first target scene type according to the first output of the first random forest model and the second output of the second random forest model.
After training, a random forest model outputs, for each feature vector, a scene category and the category probability of that scene category. The output of the first random forest model is referred to as the first output, and the output of the second random forest model is referred to as the second output.
The first labeled scene category of a historical scene material is corrected by combining the first output of the first random forest model and the second output of the second random forest model.
In an alternative embodiment, the correcting the first annotated scene class to the first target scene class according to the first output of the first random forest model and the second output of the second random forest model comprises:
acquiring a first to-be-confirmed scene category of each historical scene material in the first output;
acquiring a second scene category to be confirmed of each historical scene material in the second output;
judging whether the first class probability of the first scene class to be confirmed and the corresponding second class probability of the second scene class to be confirmed are both larger than a preset class probability threshold value;
when the first category probability and the second category probability are both greater than the preset category probability threshold, judging whether at least two identical scene categories exist in the first scene category to be confirmed, the second scene category to be confirmed and the first labeled scene category;
and when at least two of the first scene category to be confirmed, the second scene category to be confirmed and the first labeled scene category are identical, correcting the first labeled scene category to a first target scene category according to the identical scene categories.
When the first category probability of the first scene category to be confirmed and the corresponding second category probability of the second scene category to be confirmed are both greater than the preset category probability threshold, it indicates that the classification accuracies of the first random forest model and the second random forest model for the same historical scene material are both high. In this case, if the first scene category to be confirmed, the second scene category to be confirmed and the first labeled scene category are all the same, the first target scene category of the historical scene material is the first labeled scene category.
When the first category probability and the second category probability are both smaller than the preset category probability threshold, the first category probability is compared with the second category probability: when the first category probability is greater than the second category probability, the first labeled scene category is corrected to the first target scene category according to the first scene category to be confirmed, and when the second category probability is greater than the first category probability, the first labeled scene category is corrected to the first target scene category according to the second scene category to be confirmed.
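The correction rule for a single historical scene material can be summarised by the sketch below; the probability threshold value and the fallback when neither branch applies are assumptions, since the text does not specify them.

```python
# Sketch: correct the first labeled scene category using the first scene category
# to be confirmed c1 (probability p1 from the first output) and the second scene
# category to be confirmed c2 (probability p2 from the second output).
def correct_first_label(first_labeled, c1, p1, c2, p2, prob_threshold=0.7):
    if p1 > prob_threshold and p2 > prob_threshold:
        # At least two of the three categories identical: adopt the shared category.
        if c1 == c2 or c1 == first_labeled:
            return c1
        if c2 == first_labeled:
            return c2
        return first_labeled                   # no agreement: keep the original label (assumption)
    if p1 < prob_threshold and p2 < prob_threshold:
        # Both models uncertain: trust whichever is more confident.
        return c1 if p1 > p2 else c2
    return first_labeled                       # mixed case not specified in the text (assumption)
```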
And S16, calculating a second target scene type of the scene material to be classified according to the second labeling scene type and the second output of the second random forest model.
Because the second labeled scene category of the scene material to be classified is determined by similarity calculation or clustering, it may be wrong; therefore, the second target scene category of the scene material to be classified is calculated by combining the second labeled scene category with the second output of the second random forest model.
If the second labeled scene category is the same as the scene category of the scene material to be classified in the second output, the second target scene category of the scene material to be classified is the second labeled scene category.
If the second labeled scene category is different from the scene category of the scene material to be classified in the second output, then when the category probability of the scene category of the scene material to be classified in the second output is greater than the preset probability threshold, the second target scene category of the scene material to be classified is the scene category of the scene material to be classified in the second output; and when the category probability of the scene category of the scene material to be classified in the second output is smaller than the preset probability threshold, the second target scene category of the scene material to be classified is the first target scene category corresponding to the second labeled scene category.
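The rule for the scene material to be classified can likewise be sketched as follows; `corrected` is an assumed mapping from a first labeled scene category to its corrected first target scene category obtained in S15, and the threshold is illustrative.

```python
# Sketch: compute the second target scene category from the second labeled scene
# category, the second output's predicted category c_pred with probability p_pred,
# and the corrections produced in S15.
def second_target_category(second_labeled, c_pred, p_pred, corrected, prob_threshold=0.7):
    if c_pred == second_labeled:
        return second_labeled                  # second output agrees with the pre-label
    if p_pred > prob_threshold:
        return c_pred                          # confident second output overrides the pre-label
    # Otherwise fall back to the corrected first target category behind the pre-label.
    return corrected.get(second_labeled, second_labeled)
```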
In an optional embodiment, the method further comprises:
responding to a scene material downloading request of a user, and extracting a scene category in the scene material downloading request;
inquiring a plurality of scene materials corresponding to the scene categories;
generating a download link of each scene material;
calculating the material quantity of each scene material;
and sequencing and displaying the plurality of download links according to the material amount.
When a user needs to create scene materials for a marketing campaign, the relevant scene materials can be downloaded from the campaign material platform, which avoids creating the scene materials from scratch and improves the efficiency of running the marketing campaign.
The user can input a scene category in the user interface provided by the campaign material platform to trigger a scene material download request; the computer device extracts the scene category from the scene material download request and queries the campaign material platform for the plurality of scene materials corresponding to the extracted scene category.
The scene materials of each scene category on the campaign material platform are stored in one folder, the scene materials in the same folder have different storage paths, and the download link of each scene material is generated based on its storage path.
Different materials differ in size, and the material amount of a scene material can be obtained by counting the number of its word segments. The larger the material amount, the closer to the top of the user interface the corresponding download link is displayed; the smaller the material amount, the closer to the bottom. After the plurality of download links are sorted and displayed according to material amount, the material amount can be shown next to each download link to indicate to the user the download resources that downloading the scene material at that link will consume, which helps save the user's download resources.
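A sketch of the query-and-sort step is given below; the link format, the dictionary layout of a scene material record, and the use of jieba to count word segments are illustrative assumptions.

```python
# Sketch: build a download link per scene material and sort the links so that
# materials with a larger material amount (more word segments) appear first.
import jieba

def build_sorted_download_links(scene_materials):
    # scene_materials: list of dicts with 'text' and 'storage_path' keys (assumed layout).
    entries = []
    for m in scene_materials:
        amount = len(jieba.lcut(m["text"]))    # material amount = number of word segments
        link = "https://example.com/download?path=" + m["storage_path"]  # hypothetical link format
        entries.append({"link": link, "amount": amount})
    # Larger material amount is displayed nearer the top of the user interface.
    return sorted(entries, key=lambda e: e["amount"], reverse=True)
```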
In an optional embodiment, the method further comprises:
receiving feedback of the user on the downloaded target scene materials;
analyzing the feedback to obtain the real scene category of the target scene material;
updating the second random forest model based on the target scene materials and the corresponding real scene types to obtain a third random forest model, so that the scene types of the target scene materials output by the third random forest model are the same as the real scene types;
and updating the scene types of other scene materials by using the third random forest model.
A feedback input box can further be displayed in the user interface provided by the campaign material platform, so that the user can feed back whether the scene category of a downloaded scene material is the correct scene category. If the scene category of the downloaded scene material is the correct scene category, "yes" may be entered in the feedback input box. If the scene category of the downloaded scene material is not the correct scene category, the real scene category of the scene material may be entered in the feedback input box.
The computer device can record the real scene category of the target scene material and, when the number of recorded real scene categories exceeds a preset times threshold, retrain the second random forest model based on the target scene materials and the corresponding real scene categories, taking as the training target that the scene category of the target scene material output by the retrained model is the same as the real scene category, thereby improving the classification effect of the resulting third random forest model. Finally, a third output of the third random forest model is obtained, and the scene categories of the other scene materials in the third output are taken as the latest scene categories of those other scene materials.
The other scene materials refer to the scene materials, other than the target scene materials, that participated in updating the second random forest model, including the plurality of historical scene materials and the materials subsequently uploaded to the campaign material platform. In this alternative embodiment, feedback from the user on the downloaded scene materials is received, and the second random forest model is updated only when the number of recorded real scene categories obtained from feedback exceeds the preset times threshold, which ensures that the scene categories of the downloaded scene materials are real and realizes correction of the scene categories of the downloaded scene materials. By training, the second random forest model is updated to the current third random forest model; continuously repeating the process of this embodiment realizes long-term iterative updating of the random forest model and continuously improves its classification effect.
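The feedback loop can be sketched as below; the times threshold, the in-memory record, and retraining from scratch on the enlarged data set are assumptions made for illustration.

```python
# Sketch: record real scene categories fed back by users and, once the number of
# records exceeds a preset times threshold, retrain into a third random forest model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class FeedbackUpdater:
    def __init__(self, X, y, times_threshold=50):
        self.X, self.y = np.asarray(X), np.asarray(y)
        self.times_threshold = times_threshold
        self.records = []                      # (feature vector, real scene category) pairs

    def record_feedback(self, feature_vector, real_category):
        self.records.append((feature_vector, real_category))
        if len(self.records) > self.times_threshold:
            return self.retrain()              # returns the third random forest model
        return None

    def retrain(self):
        fb_X = np.array([r[0] for r in self.records])
        fb_y = np.array([r[1] for r in self.records])
        X_new = np.vstack([self.X, fb_X])
        y_new = np.concatenate([self.y, fb_y])
        third_model = RandomForestClassifier(n_estimators=100, random_state=0)
        third_model.fit(X_new, y_new)
        return third_model
```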
In summary, in the material classification method, a first random forest model is trained in a supervised manner based on the feature vectors and labeled scene categories of historical scene materials. For a scene material to be classified, a labeled scene category is first assigned to it by clustering or by similarity; the first random forest model is then updated to a second random forest model based on the scene material to be classified and its corresponding labeled scene category in a supervised manner, which realizes iterative updating of the random forest model and improves the classification effect of the second random forest model. Finally, the first labeled scene category is corrected by combining the first output of the first random forest model and the second output of the second random forest model, and the scene material to be classified is classified by combining the second labeled scene category and the second output of the second random forest model. The method can accurately classify the scene material to be classified, can correct the labeled scene categories of historical scene materials, and solves the problem of manually mislabeled scene categories.
It is emphasized that to further ensure privacy and security of the random forest model, the random forest model may be stored in the nodes of the blockchain.
Fig. 2 is a structural diagram of a material classifying apparatus according to a second embodiment of the present invention.
In some embodiments, the material classifying apparatus 20 may include a plurality of functional modules composed of computer program segments. The computer program of each program segment in the material classifying apparatus 20 can be stored in the memory of the computer device and executed by at least one processor to perform the function of material classification (described in detail in fig. 1).
In this embodiment, the material classifying device 20 may be divided into a plurality of functional modules according to the functions performed by the device. The functional module may include: a first extraction module 201, a model training module 202, a second extraction module 203, a model update module 204, a category correction module 205, a category calculation module 206, a link display module 207, and a category feedback module 208. The module referred to herein is a series of computer program segments capable of being executed by at least one processor and capable of performing a fixed function and is stored in memory. In the present embodiment, the functions of the modules will be described in detail in the following embodiments.
The first extraction module 201 is configured to obtain a first labeled scene category of a plurality of historical scene materials, and extract a first feature vector of the plurality of historical scene materials.
The historical scene materials are materials that were used in marketing campaigns already held and have been uploaded to the campaign material platform, and each historical scene material is a material description text.
A first labeled scene category of each historical scene material in the campaign material platform can be annotated using an annotation tool, where the first labeled scene category identifies which type of marketing campaign scene the corresponding historical scene material belongs to, for example, a promotional campaign scene, a new-user acquisition campaign scene, an existing-user reward campaign scene, and the like.
The computer device obtains the historical scene materials annotated with first labeled scene categories from the campaign material platform and extracts the first feature vectors of the historical scene materials, so that a first random forest model is trained based on the first labeled scene categories and the first feature vectors and scene materials subsequently uploaded to the campaign material platform can be pre-labeled, avoiding manual labeling of the scene materials.
In an alternative embodiment, the extracting of the first feature vectors of the plurality of historical scene materials by the first extraction module 201 includes:
performing word segmentation processing on each historical scene material to obtain a plurality of word segments;
extracting a word vector of each word segment by using word2vec;
and generating a first feature vector based on the word vectors of the plurality of word segments of each historical scene material.
The computer device may employ the jieba segmentation tool to perform word segmentation processing on each historical scene material, so as to segment each historical scene material into a plurality of word segments.
Because the word segments include meaningless words such as stop words, the meaningless words are filtered out first; word2vec is then used to extract a word vector for each remaining word segment, and every word vector extracted by word2vec has the same dimensionality.
The number of word segments differs across historical scene materials. To avoid differences in the dimensionality of the generated first feature vectors, the elements belonging to the same dimension in the word vectors of the word segments of each historical scene material are summed and averaged over the number of word segments, so that the resulting first feature vector has the same dimensionality as a single word vector. Keeping the first feature vector of every historical scene material at the same dimensionality facilitates subsequent training of the random forest model and allows the random forest model to converge quickly.
For example, assume that the first historical scene material contains 3 word segments, where the word vector of word segment A1 is (A11, A12, A13), the word vector of word segment A2 is (A21, A22, A23), and the word vector of word segment A3 is (A31, A32, A33); the first feature vector generated from the word vectors of the 3 word segments of the first historical scene material is ((A11+A21+A31)/3, (A12+A22+A32)/3, (A13+A23+A33)/3).
Assume that the second historical scene material contains 2 word segments, where the word vector of word segment B1 is (B11, B12, B13) and the word vector of word segment B2 is (B21, B22, B23); the first feature vector generated from the word vectors of the 2 word segments of the second historical scene material is ((B11+B21)/2, (B12+B22)/2, (B13+B23)/2).
The model training module 202 is configured to train a first random forest model based on the plurality of first labeled scene categories and the plurality of first feature vectors.
Each first labeled scene category and the corresponding first feature vector are taken as a data pair, the plurality of data pairs are taken as a data set, and the first random forest model is trained on the data set in a supervised manner.
The training process of the random forest model is prior art and is not elaborated.
The second extracting module 203 is configured to extract a second feature vector of the scene material to be classified, and identify a second labeled scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vector.
The scene materials to be classified are materials which are newly uploaded to the active material platform and need to be classified.
The computer device performs word segmentation processing on the scene material to be classified by using the jieba segmentation tool to obtain a plurality of word segments, removes meaningless words, and then extracts a word vector of each word segment by using word2vec; a second feature vector is generated based on the word vectors of the plurality of word segments of the scene material to be classified.
The second feature vector has the same dimensions as the first feature vector.
Although the scene category of the scene material to be classified can be predicted to a certain extent by directly using the first random forest model, some first labeled scene categories used to train the first random forest model may be mislabeled, so the classification accuracy of the first random forest model is not high and the accuracy of predicting the scene category of the scene material to be classified is not high. Moreover, because the first random forest model is trained in a supervised manner and the scene material to be classified has no scene category, that is, no label, the scene material to be classified cannot be used directly to iteratively update the first random forest model. Therefore, after the second feature vector of the scene material to be classified is extracted, a second labeled scene category of the scene material to be classified is identified according to the plurality of first feature vectors and the second feature vector, thereby pre-labeling the scene category of the scene material to be classified.
In an optional embodiment, the identifying, by the second extraction module 203, the second labeled scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vectors includes:
calculating a similarity between each of the first feature vectors and the second feature vector;
determining the target first feature vector corresponding to the maximum similarity;
and determining the first labeling scene category corresponding to the target first feature vector as a second labeling scene category of the scene materials to be classified.
The first feature vector is the feature representation of a historical scene material, and the second feature vector is the feature representation of the scene material to be classified, so calculating the similarity between a first feature vector and the second feature vector measures the similarity between the corresponding historical scene material and the scene material to be classified. The greater the similarity, the more similar they are and the more likely they belong to the same category; the smaller the similarity, the more dissimilar they are and the less likely they belong to the same category. The first labeled scene category of the target first feature vector corresponding to the maximum similarity is determined as the second labeled scene category of the scene material to be classified.
In another optional embodiment, the identifying, by the second extraction module 203, the second labeled scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vectors includes:
clustering the first feature vectors and the second feature vectors to obtain a plurality of feature vector clusters;
determining a target feature vector cluster where the second feature vector is located;
calculating a target labeling scene type of the target feature vector cluster according to a first labeling scene type of the first feature vector in the target feature vector cluster;
and determining the target labeling scene category as a second labeling scene category of the scene materials to be classified.
The computer device may cluster the first feature vectors and the second feature vectors by using a K-means clustering algorithm, so as to divide the first feature vectors and the second feature vectors into a plurality of feature vector clusters, wherein each feature vector cluster includes one or more feature vectors.
The clustering groups feature vectors with the same labeled scene category into the same cluster and feature vectors with different labeled scene categories into different clusters.
The target feature vector cluster where the second feature vector is located is determined, and the second labeled scene category of the second feature vector is then determined according to the first labeled scene categories of the first feature vectors in the target feature vector cluster. If the first labeled scene categories corresponding to all the target first feature vectors in the target feature vector cluster are the same, that first labeled scene category is the scene category of the target feature vector cluster and is taken as the second labeled scene category of the scene material to be classified. If the first labeled scene categories corresponding to the target first feature vectors in the target feature vector cluster are not all the same, the number of occurrences of each distinct first labeled scene category is counted, the most frequent first labeled scene category is determined as the scene category of the target feature vector cluster, and this most frequent first labeled scene category is taken as the second labeled scene category of the scene material to be classified.
The model updating module 204 is configured to update the first random forest model based on the second labeled scene category and the second feature vector to obtain a second random forest model.
The second labeled scene category and the second feature vector are taken as a new data pair, the new data pair is added to the data set to obtain a new data set, and the first random forest model is trained on the new data set in a supervised manner to obtain a second random forest model, thereby realizing iterative updating of the first random forest model.
The category correction module 205 is configured to correct the first annotated scene category as a first target scene category according to a first output of the first random forest model and a second output of the second random forest model.
After training, a random forest model outputs, for each feature vector, a scene category and the category probability of that scene category. The output of the first random forest model is referred to as the first output, and the output of the second random forest model is referred to as the second output.
The first labeled scene category of a historical scene material is corrected by combining the first output of the first random forest model and the second output of the second random forest model.
In an optional embodiment, the class correction module 205 correcting the first annotated scene class to the first target scene class according to the first output of the first random forest model and the second output of the second random forest model comprises:
acquiring a first to-be-confirmed scene category of each historical scene material in the first output;
acquiring a second scene category to be confirmed of each historical scene material in the second output;
judging whether the first class probability of the first scene class to be confirmed and the corresponding second class probability of the second scene class to be confirmed are both larger than a preset class probability threshold value;
when the first category probability and the second category probability are both greater than the preset category probability threshold, judging whether at least two identical scene categories exist in the first scene category to be confirmed, the second scene category to be confirmed and the first labeled scene category;
and when at least two of the first scene category to be confirmed, the second scene category to be confirmed and the first labeled scene category are identical, correcting the first labeled scene category to a first target scene category according to the identical scene categories.
When the first category probability of the first scene category to be confirmed and the corresponding second category probability of the second scene category to be confirmed are both greater than the preset category probability threshold, it indicates that the classification accuracies of the first random forest model and the second random forest model for the same historical scene material are both high. In this case, if the first scene category to be confirmed, the second scene category to be confirmed and the first labeled scene category are all the same, the first target scene category of the historical scene material is the first labeled scene category.
When the first category probability and the second category probability are both smaller than the preset category probability threshold, the first category probability is compared with the second category probability: when the first category probability is greater than the second category probability, the first labeled scene category is corrected to the first target scene category according to the first scene category to be confirmed, and when the second category probability is greater than the first category probability, the first labeled scene category is corrected to the first target scene category according to the second scene category to be confirmed.
The category calculating module 206 is configured to calculate a second target scene category of the scene material to be classified according to the second annotated scene category and the second output of the second random forest model.
Because the second annotated scene category of the scene material to be classified is determined by similarity calculation or clustering, it may be wrong; the second target scene category of the scene material to be classified is therefore calculated by combining the second annotated scene category with the second output of the second random forest model.
If the second annotated scene category is the same as the scene category of the scene material to be classified in the second output, the second target scene category of the scene material to be classified is the second annotated scene category.
If the second annotated scene category differs from the scene category of the scene material to be classified in the second output, then: when the category probability of that scene category in the second output is greater than the preset probability threshold, the second target scene category of the scene material to be classified is the scene category in the second output; when that category probability is smaller than the preset probability threshold, the second target scene category of the scene material to be classified is the first target scene category corresponding to the second annotated scene category.
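This decision can be sketched as follows; the parameter names and the 0.8 threshold are illustrative assumptions rather than values fixed by the description.

def second_target_category(annotated_cat, model_cat, model_prob,
                           fallback_cat, prob_threshold=0.8):
    # annotated_cat : second annotated scene category (from similarity/clustering)
    # model_cat     : scene category of the material in the second output
    # model_prob    : category probability of model_cat in the second output
    # fallback_cat  : first target scene category corresponding to annotated_cat
    # prob_threshold: preset probability threshold (0.8 is an assumed value)
    if model_cat == annotated_cat:
        return annotated_cat          # both sources agree
    if model_prob > prob_threshold:
        return model_cat              # trust the confident second output
    return fallback_cat               # fall back to the corrected annotation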
The link display module 207 is configured to: respond to a scene material download request from a user and extract the scene category in the download request; query a plurality of scene materials corresponding to the scene category; generate a download link for each scene material; calculate the material quantity of each scene material; and sort and display the plurality of download links according to the material quantity.
When a user needs to prepare scene materials for a marketing activity, the relevant scene materials can be downloaded from the activity material platform, so that the scene materials do not have to be created again, which improves the efficiency of the marketing activity.
The user can enter a scene category in the user interface provided by the activity material platform to trigger a scene material download request. The computer device extracts the scene category from the download request and queries the activity material platform for the plurality of scene materials corresponding to the extracted scene category.
The scene materials of each scene category on the activity material platform are stored in one folder, and the scene materials in the same folder have different storage paths; the download link of each scene material is generated from its storage path.
Different materials have different sizes, and the material quantity of a scene material can be obtained by counting its word segments. The larger the material quantity, the closer to the top of the user interface the corresponding download link is displayed; the smaller the material quantity, the closer to the bottom it is displayed. After the plurality of download links are sorted and displayed according to material quantity, the material quantity can also be shown next to each download link to tell the user how many download resources are needed to download the scene material at that link, thereby saving the user's download resources.
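By way of illustration only, the following sketch builds download links from the storage paths in a category folder and sorts them by material quantity (word-segment count). The folder layout, the /download/ URL prefix and the whitespace tokenizer are assumptions standing in for the platform's actual storage scheme and word segmenter.

import os

def segment(text):
    # Placeholder tokenizer; a real deployment would use a Chinese word segmenter.
    return text.split()

def build_sorted_links(category_folder):
    links = []
    for name in os.listdir(category_folder):        # one folder per scene category
        path = os.path.join(category_folder, name)  # each material has its own storage path
        with open(path, encoding="utf-8") as f:
            amount = len(segment(f.read()))         # material quantity = number of word segments
        links.append({"link": "/download/" + name, "amount": amount})
    # Larger material quantity is displayed nearer the top of the user interface,
    # and the quantity itself can be shown next to each link.
    return sorted(links, key=lambda item: item["amount"], reverse=True)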
The category feedback module 208 is configured to: receive feedback from the user on a downloaded target scene material; parse the feedback to obtain the real scene category of the target scene material; update the second random forest model based on the target scene material and the corresponding real scene category to obtain a third random forest model, so that the scene category of the target scene material output by the third random forest model is the same as the real scene category; and update the scene categories of the other scene materials by using the third random forest model.
A feedback input box may further be displayed in the user interface provided by the activity material platform, so that the user can feed back whether the scene category of a downloaded scene material is correct. If the scene category of the downloaded scene material is correct, "yes" may be entered in the feedback input box. If it is not correct, the actual scene category of the scene material may be entered in the feedback input box.
The computer device records the real scene category of the target scene material. When the number of times the real scene category has been recorded exceeds a preset count threshold, the second random forest model is retrained based on the target scene material and the corresponding real scene category, with the training target that the retrained model outputs the real scene category for the target scene material; this updates the second random forest model and improves the classification effect of the resulting third random forest model. Finally, the third output of the third random forest model is obtained, and the scene category of each other scene material in the third output is taken as its latest scene category.
The other scene materials are the scene materials, other than the target scene material, that participate in updating the second random forest model, including the plurality of historical scene materials and the materials subsequently uploaded to the activity material platform. In this optional embodiment, feedback from the user on downloaded scene materials is received, and the second random forest model is updated only when the number of recordings of the real scene category obtained from the feedback exceeds the preset count threshold, which ensures that the fed-back scene category is real and corrects the scene category of the downloaded scene material. Training updates the second random forest model to the current third random forest model; by continuously repeating the process of this embodiment, long-term iterative updating of the random forest model is achieved and the classification effect is continuously improved.
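A minimal sketch of this feedback-driven update is given below, assuming a scikit-learn style model with a fit method; the count threshold of 10 and all identifiers are illustrative assumptions rather than values taken from the description.

from collections import defaultdict

class FeedbackUpdater:
    # Retrain only after the real scene category of a target material
    # has been recorded more often than the preset count threshold.

    def __init__(self, model, count_threshold=10):    # threshold value is assumed
        self.model = model
        self.count_threshold = count_threshold
        self.records = defaultdict(int)                # (material_id, real_category) -> count

    def feed_back(self, material_id, feature_vector, real_category, train_x, train_y):
        self.records[(material_id, real_category)] += 1
        if self.records[(material_id, real_category)] > self.count_threshold:
            # Retrain with the fed-back sample included, so the updated model
            # (playing the role of the third random forest model) outputs the
            # real scene category for the target scene material.
            self.model.fit(train_x + [feature_vector], train_y + [real_category])
        return self.model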
In summary, the material classification device provided by the present invention trains a first random forest model in a supervised manner based on the feature vectors and annotated scene categories of historical scene materials. For a scene material to be classified, an annotated scene category is first assigned by clustering or similarity calculation, and the first random forest model is then updated in a supervised manner to a second random forest model based on the scene material to be classified and its annotated scene category, which realizes iterative updating of the random forest model and improves the classification effect of the second random forest model. Finally, the first annotated scene category is corrected by combining the first output of the first random forest model with the second output of the second random forest model, and the scene material to be classified is classified by combining the second annotated scene category with the second output of the second random forest model. The device can accurately classify the scene material to be classified, can correct the annotated scene categories of the historical scene materials, and solves the problem of manually mislabeled scene categories.
It is emphasized that to further ensure privacy and security of the random forest model, the random forest model may be stored in the nodes of the blockchain.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention. In the preferred embodiment of the present invention, the computer device 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
It will be appreciated by those skilled in the art that the configuration of the computer device shown in fig. 3 does not constitute a limitation of the embodiments of the present invention; it may be a bus-type or a star-type configuration, and the computer device 3 may include more or fewer hardware or software components than those shown, or a different arrangement of components.
In some embodiments, the computer device 3 is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and the hardware includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The computer device 3 may also include a client device, which includes, but is not limited to, any electronic product capable of interacting with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, etc.
It should be noted that the computer device 3 is only an example; other electronic products that are currently available or may come into existence in the future and that can be adapted to the present invention should also be included in the scope of protection of the present invention and are incorporated herein by reference.
In some embodiments, the memory 31 stores a computer program that, when executed by the at least one processor 32, performs all or part of the steps of the material classification method as described. The memory 31 includes a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-Time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disk memory, magnetic disk memory, tape memory, or any other computer-readable medium capable of carrying or storing data.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each data block containing information about a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In some embodiments, the at least one processor 32 is a Control Unit (Control Unit) of the computer device 3, connects various components of the entire computer device 3 by using various interfaces and lines, and executes various functions and processes data of the computer device 3 by running or executing programs or modules stored in the memory 31 and calling data stored in the memory 31. For example, the at least one processor 32, when executing the computer program stored in the memory, implements all or part of the steps of the material classification method described in the embodiments of the present invention; or to implement all or part of the functions of the material sorting apparatus. The at least one processor 32 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips.
In some embodiments, the at least one communication bus 33 is arranged to enable connection communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the computer device 3 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 32 through a power management device, so as to implement functions of managing charging, discharging, and power consumption through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The computer device 3 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a computer device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from its spirit or essential attributes. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other elements, and the singular does not exclude the plural. A plurality of units or means recited in the present invention may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method for classifying material, the method comprising:
acquiring first labeling scene categories of a plurality of historical scene materials, and extracting first feature vectors of the plurality of historical scene materials;
training a first random forest model based on the plurality of first labeling scene categories and the plurality of first feature vectors;
extracting a second feature vector of the scene materials to be classified, and identifying a second labeled scene category of the scene materials to be classified according to the plurality of first feature vectors and the second feature vector;
updating the first random forest model based on the second scene type and the second feature vector to obtain a second random forest model;
correcting the first annotation scene type into a first target scene type according to the first output of the first random forest model and the second output of the second random forest model;
and calculating a second target scene type of the scene materials to be classified according to the second labeling scene type and a second output of the second random forest model.
2. The method of classifying material as in claim 1 further comprising:
receiving feedback of the user on the downloaded target scene materials;
analyzing the feedback to obtain the real scene category of the target scene material;
updating the second random forest model based on the target scene materials and the corresponding real scene types to obtain a third random forest model, so that the scene types of the target scene materials output by the third random forest model are the same as the real scene types;
and updating the scene types of other scene materials by using the third random forest model.
3. A method as claimed in claim 2 wherein said correcting the first annotated scene class to the first target scene class based on the first output of the first random forest model and the second output of the second random forest model comprises:
acquiring a first to-be-confirmed scene category of each historical scene material in the first output;
acquiring a second scene category to be confirmed of each historical scene material in the second output;
judging whether the first class probability of the first scene class to be confirmed and the corresponding second class probability of the second scene class to be confirmed are both larger than a preset class probability threshold value;
when the first category probability and the second category probability are both greater than the preset category probability threshold, judging whether at least two identical scene categories exist in the first scene category to be confirmed, the second scene category to be confirmed and the first labeled scene category;
and when at least two identical scene categories are selected from the first scene category to be confirmed, the second scene category to be confirmed and the first labeling scene category, correcting the first labeling scene category to be a first target scene category according to the identical scene categories.
4. The method of material classification as claimed in claim 3 wherein said identifying a second annotated scene category of the scene material to be classified based on the plurality of first feature vectors and the second feature vector comprises:
calculating a similarity between each of the first feature vectors and the second feature vector;
determining a first feature vector of a target corresponding to the maximum similarity;
and determining the first labeling scene category corresponding to the target first feature vector as a second labeling scene category of the scene materials to be classified.
5. The method of material classification as claimed in claim 3 wherein said identifying a second annotated scene category of the scene material to be classified based on the plurality of first feature vectors and the second feature vector comprises:
clustering the first feature vectors and the second feature vectors to obtain a plurality of feature vector clusters;
determining a target feature vector cluster where the second feature vector is located;
calculating a target labeling scene type of the target feature vector cluster according to a first labeling scene type of the first feature vector in the target feature vector cluster;
and determining the target labeling scene category as a second labeling scene category of the scene materials to be classified.
6. A method for classifying material as claimed in claim 4 or 5 wherein said method further comprises:
responding to a scene material downloading request of a user, and extracting a scene category in the scene material downloading request;
inquiring a plurality of scene materials corresponding to the scene categories;
generating a download link of each scene material;
calculating the material quantity of each scene material;
and sequencing and displaying the plurality of download links according to the material amount.
7. The method of material classification according to claim 4 or 5, characterized in that said extracting a first feature vector of said plurality of historical scene materials comprises:
performing word segmentation processing on each historical scene material to obtain a plurality of word segments;
extracting a word vector of each word segment by using word2vec;
generating the first feature vector based on the word vectors of the plurality of word segments of each historical scene material.
8. A material sorting apparatus, the apparatus comprising:
the first extraction module is used for acquiring first labeled scene categories of a plurality of historical scene materials and extracting first feature vectors of the plurality of historical scene materials;
the model training module is used for training a first random forest model based on the first labeling scene categories and the first feature vectors;
the second extraction module is used for extracting a second feature vector of the scene material to be classified and identifying a second labeling scene category of the scene material to be classified according to the plurality of first feature vectors and the second feature vector;
the model updating module is used for updating the first random forest model based on the second scene type and the second feature vector to obtain a second random forest model;
the class correction module is used for correcting the first annotation scene class into a first target scene class according to the first output of the first random forest model and the second output of the second random forest model;
and the category calculation module is used for calculating a second target scene category of the scene materials to be classified according to the second labeled scene category and the second output of the second random forest model.
9. A computer device, characterized in that the computer device comprises a processor for implementing the material classification method according to any one of claims 1 to 7 when executing a computer program stored in a memory.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a material classifying method according to any one of claims 1 to 7.
CN202011559080.6A 2020-12-25 2020-12-25 Material classification method, device, computer equipment and storage medium Active CN112651439B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011559080.6A CN112651439B (en) 2020-12-25 2020-12-25 Material classification method, device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN112651439A true CN112651439A (en) 2021-04-13
CN112651439B CN112651439B (en) 2023-12-22

Family

ID=75362890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011559080.6A Active CN112651439B (en) 2020-12-25 2020-12-25 Material classification method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112651439B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574544A (en) * 2015-12-16 2016-05-11 平安科技(深圳)有限公司 Data processing method and device
CN108304490A (en) * 2018-01-08 2018-07-20 有米科技股份有限公司 Text based similarity determines method, apparatus and computer equipment
CN108734214A (en) * 2018-05-21 2018-11-02 Oppo广东移动通信有限公司 Image-recognizing method and device, electronic equipment, storage medium
CN109145965A (en) * 2018-08-02 2019-01-04 深圳辉煌耀强科技有限公司 Cell recognition method and device based on random forest disaggregated model
US20190026489A1 (en) * 2015-11-02 2019-01-24 LeapYear Technologies, Inc. Differentially private machine learning using a random forest classifier
CN111753790A (en) * 2020-07-01 2020-10-09 武汉楚精灵医疗科技有限公司 Video classification method based on random forest algorithm
CN111914881A (en) * 2020-06-18 2020-11-10 北京百度网讯科技有限公司 Random forest generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112651439B (en) 2023-12-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40041446

Country of ref document: HK

GR01 Patent grant