CN108170811B - Deep learning sample labeling method based on online education big data - Google Patents

Publication number
CN108170811B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711469133.3A
Other languages
Chinese (zh)
Other versions
CN108170811A (en)
Inventor
熊利
陈靖
李晓清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dasheng Online Technology Co ltd
Original Assignee
Beijing Dasheng Online Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dasheng Online Technology Co ltd
Priority to CN201711469133.3A
Publication of CN108170811A
Application granted
Publication of CN108170811B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Abstract

The invention relates to a deep learning sample labeling method based on online education big data. M items of online education data to be labeled are input, N items are displayed at a time, and the displayed N items are labeled. While M is greater than N multiplied by the number of displays so far, N items are randomly selected from the items not yet displayed; when fewer than N items remain, all the remaining items are selected. When M is less than N multiplied by the number of displays, N items are again randomly selected from the M items and the number of displays is reset to 1. After the total number of labeled items is greater than J times M, the items labeled more than K times are taken as valid data for the classification, and a classification library is obtained for each type of data. The method improves recognition of online education data, user satisfaction, and user experience; it avoids individual subjectivity, relieves the tedium of repetitive single-person labor, and is convenient for all labeling users.

Description

Deep learning sample labeling method based on online education big data
Technical Field
The invention relates to an internet online education system, in particular to a deep learning sample labeling method based on online education big data.
Background
Online education files (videos, audio, pictures, and so on) are currently labeled in two main ways. First, a single person inspects each file and labels it; this is purely manual labor, and one person's judgment is too subjective, so the labels are inaccurate. Second, automatic labeling is still at the research stage: it cannot yet be applied to pictures, let alone to video and audio.
The drawback of the prior art is that labeling thousands of files requires users to repeatedly watch or listen to audio, video, or pictures in order to screen them, which consumes a great deal of manpower and material resources and is highly subjective. The labeled samples are therefore not accurate enough, and the output of deep learning is ultimately inaccurate.
Disclosure of Invention
In view of these defects in the prior art, the invention provides a deep learning sample labeling method based on online education big data that improves recognition of online education data, user satisfaction, user experience, and the product propagation rate.
The technical scheme adopted by the invention is as follows:
A deep learning sample labeling method based on online education big data, comprising:
inputting M items of online education data to be labeled, ensuring that the input data are all of the same category (for example, all audio or all video) and that M is large, generally more than 100,000, so that the best model can be obtained from deep learning training;
storing the online education data to be labeled in a database;
displaying N items to be labeled through an online education data labeling system, the N items being randomly selected from the data in the database that have not yet been labeled;
labeling the displayed N items;
while M is greater than N multiplied by the number of displays, randomly selecting N items from the remaining M - N × (number of displays) items, and when fewer than N items remain, selecting all the remaining items;
when M is less than N multiplied by the number of displays, again randomly selecting N items from the M items, with the number of displays reset to 1;
after multiple rounds of labeling, that is, once the total number of labeled items is greater than J times M, performing classification;
sorting the data in each classification by the number of times they were labeled, from high to low;
taking the data labeled more than K times as valid data for that classification;
and obtaining a classification library for each type of data.
One item among the displayed N items is labeled.
The number N of items displayed each time is determined by the display device, so that every displayed item is shown clearly at the size and resolution of mainstream displays.
The K value is greater than 60.
The J value is greater than 100.
The online education data includes audio, video, or pictures.
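The selection rule above (N random items from those not yet displayed, all remaining items when fewer than N are left, then a fresh round over all M items) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `next_batch` and `one_round` are hypothetical, items are assumed to be unique and hashable, and an in-memory list stands in for the database.

```python
import random


def next_batch(unshown, n, rng=random):
    """Pick up to n items from the pool not yet shown in this round.

    The chosen items are removed from `unshown` so they are not shown
    again until the next round. If n or fewer items remain, all of them
    are returned.
    """
    if len(unshown) <= n:
        batch = list(unshown)
        unshown.clear()
        return batch
    batch = rng.sample(unshown, n)
    chosen = set(batch)
    # Keep only the items that were not just selected.
    unshown[:] = [item for item in unshown if item not in chosen]
    return batch


def one_round(items, n, rng=random):
    """Yield batches until every item has been shown exactly once."""
    unshown = list(items)
    while unshown:
        yield next_batch(unshown, n, rng)


if __name__ == "__main__":
    # 25 hypothetical item ids shown 10 at a time: batches of 10, 10 and 5.
    for batch in one_round(range(25), 10, random.Random(0)):
        print(len(batch), sorted(batch))
```

Starting the next round simply means calling `one_round` again on the full set of M items, which corresponds to resetting the number of displays.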
The steps for completing one display of all M items for every type are as follows:
Step s100, inputting M items of online education data to be labeled, and ensuring that the input data are of the same category;
step s101, storing the data in a database and setting the data labeling types, the types being A, B, C, ... K;
step s102, judging whether the type A data has been completely displayed; if so, executing step s202, otherwise executing step s103;
step s103, setting the type A display counter n1 to zero;
step s104, randomly selecting N items from the remaining M - N × n1 items for display;
step s105, labeling one of the displayed N items;
step s106, incrementing the display counter: n1 = n1 + 1;
step s107, judging whether M is greater than N × n1; if M is greater than N × n1, executing step s104; if M is not greater than N × n1, executing step s108;
step s108, selecting all the remaining items for display, labeling one of them, and executing step s102;
step s202, judging whether the type B data has been completely displayed; if so, executing step s302, otherwise executing step s203;
step s203, setting the type B display counter n2 to zero;
step s204, randomly selecting N items from the remaining M - N × n2 items for display;
step s205, labeling one of the displayed N items;
step s206, incrementing the display counter: n2 = n2 + 1;
step s207, judging whether M is greater than N × n2; if M is greater than N × n2, executing step s204; if M is not greater than N × n2, executing step s208;
step s208, selecting all the remaining items for display, labeling one of them, and executing step s202;
step s302, judging whether the type C data has been completely displayed; if so, executing step s309, otherwise executing step s303;
step s303, setting the type C display counter n3 to zero;
step s304, randomly selecting N items from the remaining M - N × n3 items for display;
step s305, labeling one of the displayed N items;
step s306, incrementing the display counter: n3 = n3 + 1;
step s307, judging whether M is greater than N × n3; if M is greater than N × n3, executing step s304; if M is not greater than N × n3, executing step s308;
step s308, selecting all the remaining items for display, labeling one of them, and executing step s302;
and step s309, completing one display of all types for the M items of data in the database, and ending.
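The per-type flow of steps s100 to s309 can be condensed into one loop over the label types. The sketch below is an assumption-laden illustration: `show_all_types_once` and the `pick` callback (which stands in for the human annotator choosing one item per page) are hypothetical names, and shuffling once and slicing is used as an equivalent of repeatedly drawing N random items from those not yet shown.

```python
import random
from collections import Counter


def show_all_types_once(items, label_types, n, pick, rng=random):
    """Run one display round per label type.

    `pick(label_type, batch)` returns the single item in `batch` that the
    annotator selected for `label_type`, or None if nothing was selected.
    Returns (votes, displays): votes counts (label_type, item) selections,
    displays maps each label type to its final display counter.
    """
    votes = Counter()
    displays = {}
    for label_type in label_types:        # types A, B, C, ...
        remaining = list(items)
        rng.shuffle(remaining)            # random order, no repeats in a round
        count = 0                         # display counter n1 / n2 / n3
        while remaining:
            # The last slice holds all remaining items when fewer than n are left.
            batch, remaining = remaining[:n], remaining[n:]
            choice = pick(label_type, batch)
            if choice is not None:
                votes[(label_type, choice)] += 1
            count += 1
        displays[label_type] = count
    return votes, displays


if __name__ == "__main__":
    # A stand-in annotator that always selects the first item on the page.
    votes, displays = show_all_types_once(
        list(range(25)), ["A", "B", "C"], 10,
        pick=lambda label_type, batch: batch[0],
        rng=random.Random(1),
    )
    print(displays)
```

Repeating this function once per round and accumulating `votes` gives the per-item label counts used later for the K threshold.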
Compared with the prior art, the invention has the beneficial effects that:
With the deep learning sample labeling method based on online education big data, labeling the online education big data samples provides accurate and effective input for deep learning classification and relieves a single person of the labor of checking and screening massive data. Accurate classification gives the deep learning model a data entry point that does not depend on any individual, which improves recognition of online education data, user satisfaction, user experience, and the product propagation rate. Through the online education big data labeling system, the whole company can be mobilized and a large number of people on the Internet can assist in labeling the massive data, which avoids individual subjectivity, relieves the tedium of repetitive single-person labor, and is convenient for all labeling users.
Drawings
FIG. 1 is a schematic flow chart of the deep learning sample labeling method based on online education big data according to the invention;
FIG. 2 is a flow chart of completing one display of all types for the M items of data in the database in the deep learning sample labeling method based on online education big data.
Detailed Description
The invention is described in detail below with reference to the figures and examples:
As shown in FIGS. 1-2, a deep learning sample labeling method based on online education big data comprises:
inputting M items of online education data to be labeled, ensuring that the input data are all of the same category (for example, all audio or all video) and that M is large, generally more than 100,000, so that the best model can be obtained from deep learning training;
storing the online education data to be labeled in a database;
displaying N items to be labeled through an online education data labeling system, the N items being randomly selected from the data in the database that have not yet been labeled;
labeling the displayed N items;
while M is greater than N multiplied by the number of displays, randomly selecting N items from the remaining M - N × (number of displays) items, and when fewer than N items remain, selecting all the remaining items;
when M is less than N multiplied by the number of displays, again randomly selecting N items from the M items, with the number of displays reset to 1;
after multiple rounds of labeling, that is, once the total number of labeled items is greater than J times M, performing classification;
sorting the data in each classification by the number of times they were labeled, from high to low;
taking the data labeled more than K times as valid data for that classification;
and obtaining a classification library for each type of data.
One item among the displayed N items is labeled.
The number N of items displayed each time is determined by the display device, so that every displayed item is shown clearly at the size and resolution of mainstream displays.
The K value is greater than 60.
The J value is greater than 100.
The online education data includes audio, video, or pictures.
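The classification stage described above (a completion condition based on J, sorting by label count, and a validity threshold K) might look like the following sketch. The function names are hypothetical, and `ready_to_classify` encodes only one reading of "the total number of labeled items is greater than J times M", namely a comparison of a running total against J × M.

```python
from collections import Counter


def ready_to_classify(total_labeled, m, j):
    """One reading of the completion rule: total labeled items exceed J times M."""
    return total_labeled > j * m


def valid_items(label_counts, k):
    """Return (item, count) pairs labeled more than k times, highest count first."""
    ranked = sorted(label_counts.items(), key=lambda pair: pair[1], reverse=True)
    return [(item, count) for item, count in ranked if count > k]


if __name__ == "__main__":
    # Hypothetical label counts for one classification.
    counts = Counter({"a.jpg": 75, "b.jpg": 60, "c.jpg": 61, "d.jpg": 3})
    print(valid_items(counts, k=60))
```

Note that the threshold is strict: an item labeled exactly K times is not included, which matches "labeled more than K times" in the text.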
The steps for completing one display of all M items for every type are as follows:
step s100, inputting M items of online education data to be labeled, and ensuring that the input data are of the same category;
step s101, storing the data in a database and setting the data labeling types, the types being A, B, C, ... K; the embodiment is illustrated with types A, B, and C;
step s102, judging whether the type A data has been completely displayed; if so, executing step s202, otherwise executing step s103;
step s103, setting the type A display counter n1 to zero;
step s104, randomly selecting N items from the remaining M - N × n1 items for display;
step s105, labeling one of the displayed N items;
step s106, incrementing the display counter: n1 = n1 + 1;
step s107, judging whether M is greater than N × n1; if M is greater than N × n1, executing step s104; if M is not greater than N × n1, executing step s108;
step s108, selecting all the remaining items for display, labeling one of them, and executing step s102;
step s202, judging whether the type B data has been completely displayed; if so, executing step s302, otherwise executing step s203;
step s203, setting the type B display counter n2 to zero;
step s204, randomly selecting N items from the remaining M - N × n2 items for display;
step s205, labeling one of the displayed N items;
step s206, incrementing the display counter: n2 = n2 + 1;
step s207, judging whether M is greater than N × n2; if M is greater than N × n2, executing step s204; if M is not greater than N × n2, executing step s208;
step s208, selecting all the remaining items for display, labeling one of them, and executing step s202;
step s302, judging whether the type C data has been completely displayed; if so, executing step s309, otherwise executing step s303;
step s303, setting the type C display counter n3 to zero;
step s304, randomly selecting N items from the remaining M - N × n3 items for display;
step s305, labeling one of the displayed N items;
step s306, incrementing the display counter: n3 = n3 + 1;
step s307, judging whether M is greater than N × n3; if M is greater than N × n3, executing step s304; if M is not greater than N × n3, executing step s308;
step s308, selecting all the remaining items for display, labeling one of them, and executing step s302;
and step s309, completing one display of all types for the M items of data in the database, and ending.
With the deep learning sample labeling method based on online education big data, labeling the online education big data samples provides accurate and effective input for deep learning classification and relieves a single person of the labor of checking and screening massive data. Accurate classification gives the deep learning model a data entry point that does not depend on any individual, which improves recognition of online education data, user satisfaction, user experience, and the product propagation rate. Through the online education big data labeling system, the whole company can be mobilized and a large number of people on the Internet can assist in labeling the massive data, which avoids individual subjectivity, relieves the tedium of repetitive single-person labor, and is convenient for all labeling users.
The invention can be applied to systems that perform speech recognition, speech rate detection, expression recognition, and face detection through deep learning.
The specific implementation process is as follows:
1. Preparing data:
1000000 pictures are imported into the database of the online education big data labeling system through a web page. The classification labels are set to 5 classes (happy, angry, sad, surprised, disgusted), the number of display rounds for each classification is set to 100, and the valid-label threshold is set to 60. The online education big data labeling system then generates a labeling URL address.
2. First visit to the labeling URL address:
The first time the user visits the labeling URL address, the page presents 10 photos randomly chosen from the 1000000 photos, and the user is prompted to select a photo in which the teacher's expression is happy (or angry, sad, surprised, or disgusted, that is, one of the 5 classifications set). When the user has made a selection, the next visit to the labeling URL address begins. If no photo is selected, the next visit also begins.
3. Second visit to the labeling URL address:
The second time the user visits the labeling URL address, the page presents 10 photos randomly chosen from 999990 photos (that is, 1000000 - 10, the photos not yet shown for that classification in the current round), and the user is prompted to select a photo in which the teacher's expression is happy (or one of the other classifications set). When the user has made a selection, the next visit to the labeling URL address begins. If no photo is selected, the next visit also begins.
4. n-th visit to the labeling URL address:
The n-th time the user visits the labeling URL address, the page presents 10 photos randomly chosen from 1000000 - 10 × (n - 1) photos (that is, the photos not yet shown for that classification in the current round), and the user is prompted to select a photo in which the teacher's expression is happy (or angry, sad, surprised, or disgusted, that is, one of the 5 classifications set). When the user has made a selection, the next visit to the labeling URL address begins.
5. End of one round of labeling for a classification:
When all 1000000 pictures have been shown for labeling under the classification "happy" (or angry, sad, surprised, or disgusted, that is, one of the 5 classifications set), one round of labeling for that classification is finished.
6. End of labeling for a classification:
When the classification "happy" (or angry, sad, surprised, or disgusted, that is, one of the 5 classifications set) has gone through 100 rounds of labeling (the value set during data preparation in step 1), the classification is considered completely labeled and classification data can be generated. If the required number of rounds has not been reached, the classification data cannot be generated.
7. Generating the valid labeling result for a classification:
When labeling of the classification "happy" (or angry, sad, surprised, or disgusted, that is, one of the 5 classifications set) is finished, the pictures labeled more than 60 times (the value set during data preparation in step 1) are the valid data for that classification, and the library for that classification is exported through the online education big data labeling system.
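The arithmetic in the walk-through above can be checked with a short sketch. The constants restate the values from step 1; the helper names `pool_size_before_visit` and `visits_per_round` are hypothetical and are not taken from the labeling system itself.

```python
import math

M = 1_000_000   # pictures imported in step 1
N = 10          # photos presented per page
ROUNDS = 100    # display rounds per classification
K = 60          # valid-label threshold


def pool_size_before_visit(visit, m=M, n=N):
    """Photos not yet shown in the current round before the given 1-based visit."""
    return max(m - n * (visit - 1), 0)


def visits_per_round(m=M, n=N):
    """Number of page visits needed to show every photo once."""
    return math.ceil(m / n)


if __name__ == "__main__":
    print(pool_size_before_visit(1))   # first visit draws from all M photos
    print(pool_size_before_visit(2))   # second visit draws from M - 10
    print(visits_per_round())          # visits needed to finish one round
```

Because each photo is shown once per round and at most one photo per page is selected, a photo can be labeled at most once per round under this reading, so with 100 rounds the threshold of 60 corresponds to being selected in more than 60 of the 100 rounds.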
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the structure of the present invention in any way. Any simple modification, equivalent change and modification of the above embodiments according to the technical spirit of the present invention are within the technical scope of the present invention.

Claims (4)

1. A deep learning sample labeling method based on online education big data, characterized by:
inputting M items of online education data to be labeled, and ensuring that the input data are of the same category;
storing the online education data to be labeled in a database;
displaying N items to be labeled through an online education data labeling system, the N items being randomly selected from the data in the database that have not yet been labeled;
labeling the displayed N items;
while M is greater than N multiplied by the number of displays, randomly selecting N items from the remaining M - N × (number of displays) items, and when fewer than N items remain, selecting all the remaining items;
when M is less than N multiplied by the number of displays, again randomly selecting N items from the M items, with the number of displays reset to 1;
after the total number of labeled items is greater than J times M, performing classification;
sorting the data in each classification by the number of times they were labeled, from high to low;
taking the data labeled more than K times as valid data for that classification;
obtaining a classification library for each type of data;
labeling one item among the displayed N items;
the number N of items displayed each time being determined by the display device;
the steps for completing one display of all M items for every type being as follows:
step s100, inputting M items of online education data to be labeled, and ensuring that the input data are of the same category;
step s101, storing the data in a database and setting the data labeling types, the types being A, B, C, ... K;
step s102, judging whether the type A data has been completely displayed; if so, executing step s202, otherwise executing step s103;
step s103, setting the type A display counter n1 to zero;
step s104, randomly selecting N items from the remaining M - N × n1 items for display;
step s105, labeling one of the displayed N items;
step s106, incrementing the display counter: n1 = n1 + 1;
step s107, judging whether M is greater than N × n1; if M is greater than N × n1, executing step s104; if M is not greater than N × n1, executing step s108;
step s108, selecting all the remaining items for display, labeling one of them, and executing step s102;
step s202, judging whether the type B data has been completely displayed; if so, executing step s302, otherwise executing step s203;
step s203, setting the type B display counter n2 to zero;
step s204, randomly selecting N items from the remaining M - N × n2 items for display;
step s205, labeling one of the displayed N items;
step s206, incrementing the display counter: n2 = n2 + 1;
step s207, judging whether M is greater than N × n2; if M is greater than N × n2, executing step s204; if M is not greater than N × n2, executing step s208;
step s208, selecting all the remaining items for display, labeling one of them, and executing step s202;
step s302, judging whether the type C data has been completely displayed; if so, executing step s309, otherwise executing step s303;
step s303, setting the type C display counter n3 to zero;
step s304, randomly selecting N items from the remaining M - N × n3 items for display;
step s305, labeling one of the displayed N items;
step s306, incrementing the display counter: n3 = n3 + 1;
step s307, judging whether M is greater than N × n3; if M is greater than N × n3, executing step s304; if M is not greater than N × n3, executing step s308;
step s308, selecting all the remaining items for display, labeling one of them, and executing step s302;
and step s309, completing one display of all types for the M items of data in the database, and ending.
2. The deep learning sample labeling method based on the online education big data as claimed in claim 1, wherein: the K value is greater than 60.
3. The deep learning sample labeling method based on the online education big data as claimed in claim 1, wherein: the J value is greater than 100.
4. The deep learning sample labeling method based on the online education big data as claimed in claim 1, wherein: the online education data includes audio, video, or pictures.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711469133.3A CN108170811B (en) 2017-12-29 2017-12-29 Deep learning sample labeling method based on online education big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711469133.3A CN108170811B (en) 2017-12-29 2017-12-29 Deep learning sample labeling method based on online education big data

Publications (2)

Publication Number Publication Date
CN108170811A CN108170811A (en) 2018-06-15
CN108170811B true CN108170811B (en) 2022-07-15

Family

ID=62519878

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711469133.3A Active CN108170811B (en) 2017-12-29 2017-12-29 Deep learning sample labeling method based on online education big data

Country Status (1)

Country Link
CN (1) CN108170811B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109783661A (en) * 2018-12-06 2019-05-21 安徽教育网络出版有限公司 A kind of deep learning sample mask method based on online education big data
CN110930997B (en) * 2019-12-10 2022-08-16 四川长虹电器股份有限公司 Method for labeling audio by using deep learning model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750347A (en) * 2012-06-08 2012-10-24 天津大学 Method for reordering image or video search
CN103886779A (en) * 2012-12-20 2014-06-25 北大方正集团有限公司 Method and system for processing data
CN106021406A (en) * 2016-05-12 2016-10-12 南京大学 Data-driven iterative image online annotation method
CN106886580A (en) * 2017-01-23 2017-06-23 北京工业大学 A kind of picture feeling polarities analysis method based on deep learning
CN107506736A (en) * 2017-08-29 2017-12-22 北京大生在线科技有限公司 Online education video fineness picture intercept method based on deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8285121B2 (en) * 2007-10-07 2012-10-09 Fall Front Wireless Ny, Llc Digital network-based video tagging system
US20130334300A1 (en) * 2011-01-03 2013-12-19 Curt Evans Text-synchronized media utilization and manipulation based on an embedded barcode

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AggNet: Deep Learning From Crowds for Mitosis Detection in Breast Cancer Histology Images; Shadi Albarqouni; Christoph Baur; Felix Achilles; Vasileios Bela; IEEE Transactions on Medical Imaging; 2016-02-11; vol. 35, no. 5; pp. 1313-1321 *
Research on sentiment analysis based on deep learning and ensemble methods (基于深度学习与集成方法的情感分析研究); 寇凯陈芳李云鹏王明明; 《电脑编程技巧与维护》; 2016-11-15; pp. 37, 52 *

Also Published As

Publication number Publication date
CN108170811A (en) 2018-06-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant