CN109376259B - Label classification method based on big data analysis - Google Patents
- Publication number
- CN109376259B (application number CN201811514705.XA)
- Authority
- CN
- China
- Prior art keywords
- label
- sequence
- new
- classification
- label sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a label classification method based on big data analysis, comprising: S11, identifying a label sequence and capturing a label sequence image; S12, performing deep recognition on the label sequence image and recognizing the bytes in it; S13, segmenting the label sequence image from step S12 byte by byte to obtain sub-label images; S14, performing feature extraction on the segmented label sequence image, extracting the numeric features in each sub-label image, and arranging the extracted numeric features in order from front to back to obtain a new label sequence combination; S15, matching the new label sequence combination obtained in step S14 against the label classification rules stored in a database, and classifying the new label sequence combination accordingly. The invention offers accurate and fast classification.
Description
Technical Field
The invention belongs to the technical field of label classification, and particularly relates to a label classification method based on big data analysis.
Background
In recent years, with the rapid development of science and technology, big data has attracted growing attention. As a huge database built on internet technology, big data brings convenience to people's lives: from big data information on the internet, people can accurately analyze a product's sales model, sales scale, market prospects, and so on, which helps managers make fast, efficient, and accurate decisions.
At the same time, big data must be supported by a huge database, and that database has to be established before any data analysis. Taking commodities as an example, they need to be classified accurately, so that a commodity sales system can classify the commodities it sells according to each commodity's unique label. Analysts can then quickly learn which types of commodities were sold, analyze them accurately from daily sales behavior, and quickly identify best-selling commodities. Fast and accurate classification of commodity labels is therefore crucial to commodity analysis. Current commodity label classification methods are not accurate enough, so commodity classification is inaccurate in practice, and the subsequent classification of commodity sales behavior is inaccurate as well.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a label classification method based on big data analysis that classifies labels quickly and accurately.
The technical scheme of the invention is as follows:
A label classification method based on big data analysis comprises the following steps:
S11, identifying the label sequence and capturing a label sequence image;
S12, performing deep recognition on the label sequence image and recognizing the bytes in the label sequence image;
S13, segmenting the label sequence image from step S12 byte by byte to obtain sub-label images;
S14, performing feature extraction on the segmented label sequence image, extracting the numeric features in each sub-label image, and arranging the extracted numeric features in order from front to back to obtain a new label sequence combination;
S15, matching the new label sequence combination obtained from the numeric features extracted in step S14 against the label classification rules stored in the database, and classifying the new label sequence combination accordingly.
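The ordering part of step S14 can be illustrated with a minimal Python sketch. It assumes that recognition and segmentation have already produced one digit per sub-label image together with its horizontal position; the names `SubLabel`, `x_offset`, and `build_sequence` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubLabel:
    """One segmented sub-label image, reduced to what S14 needs (assumed shape)."""
    x_offset: int  # hypothetical: left edge of the sub-image within the label image
    digit: int     # hypothetical: digit recognized in that sub-image (0 to 9)


def build_sequence(sub_labels: list[SubLabel]) -> list[int]:
    """Arrange the extracted digits from front to back (here: left to right)."""
    for sub in sub_labels:
        if not 0 <= sub.digit <= 9:
            raise ValueError(f"not a single digit: {sub.digit}")
    ordered = sorted(sub_labels, key=lambda sub: sub.x_offset)
    return [sub.digit for sub in ordered]


# Sub-images may arrive in any order after segmentation.
segments = [SubLabel(40, 2), SubLabel(0, 3), SubLabel(20, 0)]
print(build_sequence(segments))  # [3, 0, 2]
```

Sorting by position is only one way to realize "front to back"; a real pipeline might instead preserve the order in which the segmentation step emits sub-images.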
Further, in step S15, the new label sequence combination is discretized using the MLAIM discretization algorithm and then compared step by step until the new label sequence combination has been finely classified.
Further, the label classification method based on big data analysis also comprises establishing a label classification number set, specifically:
S31, setting a preset label sequence set $Y = \{y_0, y_1, y_2, y_3, \ldots, y_n\}$, where $n$ is a positive integer greater than 10 and less than 15, and each of $y_0, y_1, y_2, y_3, \ldots, y_n$ takes any value from 0 to 9;
S32, defining the category information of each value in the set Y;
S33, classifying the label sequence set according to the category information defined in step S32.
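Steps S31 to S33 might be sketched as follows in Python. This is one possible reading, not the patented implementation: it assumes that a sequence indexed from 0 to n holds n + 1 digits, so the constraint 10 < n < 15 allows lengths 12 to 15, and it groups preset sequences only by their first digit. `TOP_LEVEL`, `validate_sequence`, `group_by_top_level`, and the category names are hypothetical.

```python
from collections import defaultdict


def validate_sequence(digits: list[int]) -> None:
    """S31: check a sequence y_0..y_n against the stated constraints."""
    n = len(digits) - 1  # index of the last element
    if not 10 < n < 15:
        raise ValueError(f"n must satisfy 10 < n < 15, got n={n}")
    if any(not (isinstance(d, int) and 0 <= d <= 9) for d in digits):
        raise ValueError("every element must be a digit from 0 to 9")


# S32 (assumed encoding): category information for the values of y_0.
TOP_LEVEL = {0: "commodity", 1: "service"}  # illustrative names only


def group_by_top_level(sequences: list[list[int]]) -> dict[str, list[list[int]]]:
    """S33: classify preset sequences by the category of their first digit."""
    groups: dict[str, list[list[int]]] = defaultdict(list)
    for seq in sequences:
        validate_sequence(seq)
        groups[TOP_LEVEL.get(seq[0], "undefined")].append(seq)
    return dict(groups)


preset = [
    [0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
for category, members in group_by_top_level(preset).items():
    print(category, len(members))
```

The same table could be extended to further positions; only the first position is modeled here to keep the sketch short.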
Further, after the new label sequence combination has been discretized, each digit of the discretized label sequence is compared with the label at the corresponding position in the preset label sequence set, and the values of the new label sequence combination are classified one by one.
Further, a new label sequence set $M = \{m_0, m_1, m_2, m_3, \ldots, m_i\}$ is defined, where $i$ is a positive integer greater than 10 and less than 15, and each of $m_0, m_1, m_2, m_3, \ldots, m_i$ takes any value from 0 to 9;
the digit labels of the discretized new label sequence are compared one by one with the values in the preset label sequence set; each successfully matched digit label is classified into its corresponding category, and the next digit label is then compared, until all label sequences have been compared.
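The position-by-position comparison can be sketched in Python as below. Two assumptions are worth stating plainly: the `discretize` helper simply splits a digit string into single digits and is a stand-in, not the MLAIM algorithm, whose details the text does not give; and the classification rules are modeled as one lookup table per position. All function names and category names are hypothetical.

```python
from typing import Optional


def discretize(sequence: str) -> list[int]:
    """Split a label string into single digits.

    Stand-in only: this is NOT the MLAIM discretization algorithm.
    """
    if not (sequence.isascii() and sequence.isdigit()):
        raise ValueError("label sequence must contain only the digits 0 to 9")
    return [int(ch) for ch in sequence]


def compare_positions(
    m: list[int], rules: list[dict[int, str]]
) -> list[Optional[str]]:
    """Compare each digit m_k with the rule table at position k.

    A matched digit is classified into its category; an unmatched digit
    yields None, and the comparison moves on to the next digit.
    """
    results: list[Optional[str]] = []
    for position, digit in enumerate(m):
        table = rules[position] if position < len(rules) else {}
        results.append(table.get(digit))
    return results


# Hypothetical rule tables for the first two positions.
rules = [{3: "category A"}, {2: "category B"}]
print(compare_positions(discretize("329"), rules))
# ['category A', 'category B', None]
```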
Compared with the prior art, the invention has the following beneficial effects:
The invention discretizes the new label sequence combination with the MLAIM discretization algorithm and then compares it precisely, which improves the accuracy of label recognition and classification. During comparison, the elements of the new label sequence combination are compared in order, one by one, with the values at the corresponding positions in the preset label sequence set, which effectively improves the accuracy and speed of label classification.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A label classification method based on big data analysis comprises the following steps:
S11, identifying the label sequence and capturing a label sequence image;
S12, performing deep recognition on the label sequence image and recognizing the bytes in the label sequence image;
S13, segmenting the label sequence image from step S12 byte by byte to obtain sub-label images;
S14, performing feature extraction on the segmented label sequence image, extracting the numeric features in each sub-label image, and arranging the extracted numeric features in order from front to back to obtain a new label sequence combination;
S15, matching the new label sequence combination obtained from the numeric features extracted in step S14 against the label classification rules stored in the database, and classifying the new label sequence combination accordingly.
Further, in step S15, the new label sequence combination is discretized using the MLAIM discretization algorithm and then compared step by step until the new label sequence combination has been finely classified.
Further, the label classification method based on big data analysis also comprises establishing a label classification number set, specifically:
S31, setting a preset label sequence set $Y = \{y_0, y_1, y_2, y_3, \ldots, y_n\}$, where $n$ is a positive integer greater than 10 and less than 15, and each of $y_0, y_1, y_2, y_3, \ldots, y_n$ takes any value from 0 to 9;
S32, defining the category information of each value in the set Y;
S33, classifying the label sequence set according to the category information defined in step S32.
Further, after the new label sequence combination has been discretized, each digit of the discretized label sequence is compared with the label at the corresponding position in the preset label sequence set, and the values of the new label sequence combination are classified one by one.
Further, a new label sequence set $M = \{m_0, m_1, m_2, m_3, \ldots, m_i\}$ is defined, where $i$ is a positive integer greater than 10 and less than 15, and each of $m_0, m_1, m_2, m_3, \ldots, m_i$ takes any value from 0 to 9;
the digit labels of the discretized new label sequence are compared one by one with the values in the preset label sequence set; each successfully matched digit label is classified into its corresponding category, and the next digit label is then compared, until all label sequences have been compared.
Specifically, when the label classification number set is established, each value in the set $Y = \{y_0, y_1, y_2, y_3, \ldots, y_n\}$ is assigned a meaning. $y_0$ is defined as the broadest category, and its possible values 0 to 9 are assigned to different categories; for example, $y_0 = 0$ denotes commodities. The values 0 to 9 of $y_1$ are then assigned to sub-categories; for example, $y_1 = 0$ denotes articles of daily use, and $y_2 = 0$ denotes washing articles. By analogy, every value in the set Y is assigned a different category for each digit it can take, so that the set Y ultimately represents the different label categories.
When the values in the set M are compared with those in the set Y, $m_0$ in M is first compared with $y_0$ in Y. For example, if $m_0 = 3$, $m_0$ is matched against the broad category represented by $y_0 = 3$, and the set M is classified into that category. $m_1$ is compared next; for example, if $m_1 = 2$, $m_1$ is compared with $y_1$ in Y, and once the category represented by $y_1 = 2$ has been found, the comparison against $y_1$ stops and the set M is classified into the category represented by $y_0 = 3$ and $y_1 = 2$. In the same way, $m_2$ is compared with $y_2$, $m_3$ with $y_3$, and so on, until $m_i$ has been compared with $y_n$. At that point the number sequence represented by the set M has been finely classified.
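The walk-through above suggests that each digit is interpreted within the category selected by the digits before it. One way to sketch that in Python is a nested category tree; the tree structure itself is an assumption, as are the names `Node` and `classify`. The three category names are taken from the example in this description ($y_0 = 0$, $y_1 = 0$, $y_2 = 0$).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A category plus its sub-categories, keyed by the next digit (assumed model)."""
    name: str
    children: dict[int, "Node"] = field(default_factory=dict)


def classify(m: list[int], root: Node) -> list[str]:
    """Walk the tree digit by digit, from the broadest category to the finest.

    The walk stops at the first digit that has no matching sub-category.
    """
    path: list[str] = []
    node = root
    for digit in m:
        child = node.children.get(digit)
        if child is None:
            break
        path.append(child.name)
        node = child
    return path


# y0 = 0: commodity; y1 = 0: article of daily use; y2 = 0: washing article.
root = Node("root", {
    0: Node("commodity", {
        0: Node("article of daily use", {
            0: Node("washing article"),
        }),
    }),
})

print(classify([0, 0, 0, 7], root))
# ['commodity', 'article of daily use', 'washing article']
print(classify([0, 5], root))
# ['commodity']
```

Returning the partial path for an unmatched digit is a design choice of this sketch; the description does not say what should happen when a comparison fails.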
The invention discretizes the new label sequence combination with the MLAIM discretization algorithm and then compares it precisely, which improves the accuracy of label recognition and classification. During comparison, the elements of the new label sequence combination are compared in order, one by one, with the values at the corresponding positions in the preset label sequence set, which effectively improves the accuracy and speed of label classification.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes in the embodiments and/or modifications of the invention can be made, and equivalents and modifications of some features of the invention can be made without departing from the spirit and scope of the invention.
Claims (2)
1. A label classification method based on big data analysis is characterized by comprising the following specific processes:
S11, identifying the label sequence and capturing a label sequence image;
S12, performing deep recognition on the label sequence image and recognizing the bytes in the label sequence image;
S13, segmenting the label sequence image from step S12 byte by byte to obtain sub-label images;
S14, performing feature extraction on the segmented label sequence image, extracting the numeric features in each sub-label image, and arranging the extracted numeric features in order from front to back to obtain a new label sequence combination;
S15, matching the new label sequence combination obtained from the numeric features extracted in step S14 against the label classification rules stored in the database, and classifying the new label sequence combination accordingly;
in step S15, the new label sequence combination is discretized using the MLAIM discretization algorithm and then compared step by step until the new label sequence combination has been finely classified;
the method further comprises establishing a label classification number set, specifically:
S31, setting a preset label sequence set $Y = \{y_0, y_1, y_2, y_3, \ldots, y_n\}$, where $n$ is a positive integer greater than 10 and less than 15, each of $y_0, y_1, y_2, y_3, \ldots, y_n$ takes any value from 0 to 9, and the categories represented by $y_0, y_1, y_2, y_3, \ldots, y_n$ progress gradually from broad categories to narrow categories;
S32, defining the category information of each value in the set Y;
S33, classifying the label sequence set according to the category information defined in step S32;
and after the new label sequence combination has been discretized, comparing each digit of the discretized label sequence with the label at the corresponding position in the preset label sequence set, and classifying the values of the new label sequence combination one by one.
2. The label classification method based on big data analysis according to claim 1, wherein: a new label sequence set $M = \{m_0, m_1, m_2, m_3, \ldots, m_i\}$ is defined, where $i$ is a positive integer greater than 10 and less than 15, and each of $m_0, m_1, m_2, m_3, \ldots, m_i$ takes any value from 0 to 9;
the digit labels of the discretized new label sequence are compared one by one with the values in the preset label sequence set; each successfully matched digit label is classified into its corresponding category, and the next digit label is then compared, until all label sequences have been compared.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811514705.XA CN109376259B (en) | 2018-12-10 | 2018-12-10 | Label classification method based on big data analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811514705.XA CN109376259B (en) | 2018-12-10 | 2018-12-10 | Label classification method based on big data analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109376259A (en) | 2019-02-22 |
CN109376259B (en) | 2022-03-01 |
Family
ID=65373385
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811514705.XA Active CN109376259B (en) | 2018-12-10 | 2018-12-10 | Label classification method based on big data analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109376259B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103530405A (en) * | 2013-10-23 | 2014-01-22 | 天津大学 | Image retrieval method based on layered structure |
CN105740402A (en) * | 2016-01-28 | 2016-07-06 | 百度在线网络技术(北京)有限公司 | Method and device for acquiring semantic labels of digital images |
CN108170849A (en) * | 2018-01-18 | 2018-06-15 | 厦门理工学院 | A kind of picture classification labeling method of zoned diffustion |
CN108829826A (en) * | 2018-06-14 | 2018-11-16 | 清华大学深圳研究生院 | A kind of image search method based on deep learning and semantic segmentation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7809722B2 (en) * | 2005-05-09 | 2010-10-05 | Like.Com | System and method for enabling search and retrieval from image files based on recognized information |
Also Published As
Publication number | Publication date |
---|---|
CN109376259A (en) | 2019-02-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111104466B (en) | Method for quickly classifying massive database tables | |
CN116188475B (en) | Intelligent control method, system and medium for automatic optical detection of appearance defects | |
CN110991657A (en) | Abnormal sample detection method based on machine learning | |
CN112926045B (en) | Group control equipment identification method based on logistic regression model | |
CN110610193A (en) | Method and device for processing labeled data | |
CN112037222B (en) | Automatic updating method and system of neural network model | |
CN108537119A (en) | A kind of small sample video frequency identifying method | |
CN113449725B (en) | Object classification method, device, equipment and storage medium | |
CN108875727B (en) | The detection method and device of graph-text identification, storage medium, processor | |
CN109903053B (en) | Anti-fraud method for behavior recognition based on sensor data | |
CN111241502B (en) | Cross-device user identification method and device, electronic device and storage medium | |
CN116738551B (en) | Intelligent processing method for acquired data of BIM model | |
CN112395881A (en) | Material label construction method and device, readable storage medium and electronic equipment | |
CN112926621A (en) | Data labeling method and device, electronic equipment and storage medium | |
CN112069315A (en) | Method, device, server and storage medium for extracting text multidimensional information | |
CN109376259B (en) | Label classification method based on big data analysis | |
CN103984756B (en) | Semi-supervised probabilistic latent semantic analysis based software change log classification method | |
Jin et al. | A generative semi-supervised model for multi-view learning when some views are label-free | |
CN115296851B (en) | Network intrusion detection method based on mutual information and wolf lifting algorithm | |
CN116579351A (en) | Analysis method and device for user evaluation information | |
CN111814922B (en) | Video clip content matching method based on deep learning | |
CN104778478A (en) | Handwritten numeral identification method | |
CN109829713B (en) | Mobile payment mode identification method based on common drive of knowledge and data | |
Zhao et al. | Barcode character defect detection method based on Tesseract-OCR | |
CN114006986A (en) | Outbound call compliance early warning method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||