CN110826101A - Privatization deployment data processing method for enterprise - Google Patents

Publication number
CN110826101A
Authority
CN
China
Prior art keywords
private data
marking
task
marked
annotator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911071132.2A
Other languages
Chinese (zh)
Other versions
CN110826101B (en)
Inventor
吴鑫坤
张子斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Data Hall Technology Co Ltd
Original Assignee
Anhui Data Hall Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Data Hall Technology Co Ltd
Priority to CN201911071132.2A
Publication of CN110826101A
Application granted
Publication of CN110826101B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/604 Tools and structures for managing or administering access control systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The invention discloses a privatized deployment data processing method for an enterprise. It addresses three problems in the prior art: enterprise private data cannot be privacy-protected, fully manual annotation is inefficient, and data cannot be distributed reasonably. The method comprises the following steps. S1: a user login module verifies the identity of the user and the user role, and establishes a communication connection between each user who passes verification and the server. S2: internal personnel of the enterprise send the enterprise's private data, together with the level corresponding to that data, over the intranet for storage in the enterprise private cloud. In the invention, an annotation task module sends the tasks to be annotated to an annotation distribution module; efficiency is improved by an iterative, interactive production process that combines manual annotation with intelligent-tool annotation; and the tasks to be annotated are ranked and then distributed to the corresponding annotators according to each annotator's annotation authority value, so that the data is better annotated.

Description

Privatization deployment data processing method for enterprise
Technical Field
The invention relates to the technical field of data annotation processing, in particular to a privatized deployment data processing method for enterprises.
Background
Existing annotation platforms can quickly perform customized cleaning and processing of a client's data and quickly supply the training data required by artificial intelligence applications. However, some enterprise data involves state secrets or customer privacy and cannot be placed in the cloud for processing: the data used by artificial intelligence applications inside an enterprise may be sensitive, non-public data containing enterprise secrets or personal information, and for security reasons it cannot be transferred to the Internet for processing. Having the customer build its own annotation platform, on the other hand, is time-consuming and labor-intensive.
To meet the requirements of enterprise-customized AI data processing, a privatized deployment data processing method for an enterprise is provided. The traditional artificial intelligence data production process consists of manual annotation followed by training; it lacks feedback and error correction from the artificial intelligence itself.
Disclosure of Invention
The invention aims to provide a privatized deployment data processing method for an enterprise. Internal personnel of the enterprise send the enterprise's private data to the enterprise private cloud over the intranet, which improves privacy protection of the enterprise data. An intelligent tool module annotates the tasks to be annotated using a human-in-the-loop incremental data-assisted annotation technique, and sends the tasks it cannot annotate to an annotation distribution module. Efficiency is improved by an iterative, interactive production process that combines manual annotation with intelligent-tool annotation. The tasks to be annotated are ranked and then distributed to the corresponding annotators according to each annotator's annotation authority value, so that the data is better annotated.
The technical problem to be solved by the invention is as follows:
(1) How to transmit and process private data over an intranet, have an intelligent tool annotate it, compute a ranking for the private data that the tool cannot annotate, and distribute that data reasonably to the corresponding annotators. This solves the prior-art problems that enterprise private data cannot be privacy-protected, that fully manual annotation is inefficient, and that data cannot be distributed reasonably.
The purpose of the invention can be achieved by the following technical scheme: a privatized deployment data processing method for an enterprise, comprising the following steps:
S1: a user login module verifies the identity of the user and the user role, and establishes a communication connection between each user who passes verification and the server;
S2: internal personnel of the enterprise send the enterprise's private data, together with the level corresponding to that data, over the intranet for storage in the enterprise private cloud;
S3: an annotation task module creates tasks for the private data; each created task to be annotated, with its bound data, is sent to an intelligent tool module for processing; the intelligent tool module annotates the task using a human-in-the-loop incremental data-assisted annotation technique, and sends the tasks it cannot annotate to an annotation distribution module;
S4: the annotation distribution module distributes the private data to the computer terminals of the corresponding annotators; a template configuration module configures different tasks to be annotated differently; and each annotator annotates the assigned tasks with an annotation tool on the computer terminal;
S5: the annotator sends the annotated task to the intelligent tool module, and the annotation result is exported to the local machine through a result export module.
Preferably, the user roles described in S1 include enterprise internal personnel, administrators and ordinary personnel. The administrators include an authorization administrator and an organization administrator: the authorization administrator manages the system, including authorization of user roles, data management, user management, project management and task management; the organization administrator performs overall management of the organization's personnel, projects, tasks and data. The ordinary personnel include annotators, quality inspectors and acceptance inspectors: an annotator processes and annotates the data to be annotated; a quality inspector performs quality inspection on the annotated data; and an acceptance inspector performs acceptance of the annotated data.
Preferably, the annotation task module in S3 acquires, through the server, the private data in the enterprise private cloud and the level corresponding to that data in order to create a task, and marks the private data that needs annotation as a task to be annotated. After a task is created, data must be bound to it; data binding supports batch binding and index-based binding within a data set. Once the data is bound, the annotation task module sends the task to be annotated to the annotation distribution module. The annotation distribution module distributes the private data to the computer terminals of the corresponding annotators, and the specific distribution steps are as follows:
Step one: denote each task to be annotated as Dji, where j = 1, 2, 3, 4 and i = 1, ..., n; D1i, D2i, D3i and D4i denote voice, picture, video and text tasks respectively; denote the level of the task to be annotated as G_{Dji} and the size of the file corresponding to the task as K_{Dji};
Step two: denote the score value corresponding to each type of task to be annotated as Cj, where j = 1, 2, 3, 4 and C4 > C2 > C3 > C1;
Step three: obtain the ranking value P_{Dji} of the task to be annotated using the formula
[Figure BDA0002260965940000031: formula image]
where λ is a correction factor with a value of 1.2, and v1, v2 and v3 are preset fixed proportionality coefficients;
Step four: classify the tasks to be annotated according to their ranking values; set classification intervals A1, A2, A3 and A4, whose value ranges decrease in that order; a task whose ranking value falls in interval A1 is marked as first private data; a task in interval A2 is marked as second private data; a task in interval A3 is marked as third private data; and a task in interval A4 is marked as fourth private data;
Step five: collect all of the first, second, third and fourth private data to form a first private data set, a second private data set, a third private data set and a fourth private data set respectively;
Step six: divide the annotators into first, second, third and fourth annotators according to their annotation authority values; first annotators annotate the first private data, second annotators annotate the second private data, third annotators annotate the third private data, and fourth annotators annotate the fourth private data;
Step seven: count the total number of first annotators and denote it R1, and sort the first annotators by annotation authority value in descending order; count the number of items of first private data in the first private data set and denote it R2; obtain the number of items R3 assigned to each first annotator using the formula R3 = R2/R1; when R2 is not exactly divisible by R1, R3 is taken as the integer quotient plus one;
Step eight: sort the first private data in the first private data set by ranking value in descending order, distribute the R3 items with the highest ranking values to the computer terminal of the first annotator with the largest annotation authority value, the next R3 items to the first annotator with the next-largest value, and so on; the second, third and fourth private data sets are distributed to the computer terminals of the corresponding annotators in the same way; the first, second, third and fourth annotators then annotate the corresponding first, second, third and fourth private data with the annotation tools.
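Steps seven and eight can be illustrated with a minimal Python sketch. It assumes the ranking values and annotation authority values have already been computed, handles a single tier (one private data set and its matching group of annotators), and uses hypothetical names (Task, Annotator, distribute) that do not appear in the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    ranking_value: float  # ranking value of the task, assumed precomputed


@dataclass
class Annotator:
    name: str
    authority_value: float  # annotation authority value, assumed precomputed


def distribute(tasks, annotators):
    """Assign the tasks of one private data set to the annotators of the matching tier."""
    if not annotators:
        raise ValueError("at least one annotator is required")
    # Step seven: R3 = R2 / R1, rounded up when the division is not exact.
    per_annotator = math.ceil(len(tasks) / len(annotators))
    # Both lists are sorted in descending order before assignment.
    ranked_tasks = sorted(tasks, key=lambda t: t.ranking_value, reverse=True)
    ranked_annotators = sorted(
        annotators, key=lambda a: a.authority_value, reverse=True
    )
    # Step eight: consecutive blocks of R3 tasks go to consecutive annotators.
    assignment = {}
    for index, annotator in enumerate(ranked_annotators):
        start = index * per_annotator
        assignment[annotator.name] = ranked_tasks[start:start + per_annotator]
    return assignment


# Illustrative data only: 7 tasks with ranking values 1..7 and 3 annotators.
tasks = [Task(f"t{n}", float(n)) for n in range(1, 8)]
annotators = [Annotator("a", 10.0), Annotator("b", 30.0), Annotator("c", 20.0)]
result = distribute(tasks, annotators)
```

With seven tasks and three annotators, R3 is 3, so the last annotator in the ordering receives only the single remaining task. The text does not say how the remainder should be handled, so giving the shortfall to the lowest-ranked annotator is an assumption of this sketch.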
Preferably, the annotation tools in S4 include an image annotation tool, a voice annotation tool, a text annotation tool and a video annotation tool. The image annotation tool covers target detection, picture classification, instance segmentation, semantic segmentation and face segmentation; the voice annotation tool covers single-paragraph and multi-paragraph annotation, playback speed adjustment, waveform zooming and spectrogram switching; the text annotation tool covers entity annotation, intention annotation and word segmentation annotation; the video annotation tool covers picture annotation after frame extraction, subject attribute annotation and trajectory tracking. The first, second, third and fourth annotators send the annotated private data to the computer terminal of a quality inspector for quality inspection. The quality inspector spot-checks the annotated private data: when the spot-checked data is qualified, the quality inspector sends the annotated private data to the computer terminal of an acceptance inspector; when the spot-checked data is unqualified, the data is sent back to the computer terminal of the corresponding annotator for re-annotation, and that annotator's total number of annotation errors is increased by one. The acceptance inspector sends the accepted private data to the server for storage.
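The spot-check routing described above can be sketched as follows. This is an illustrative sketch only: the sampling strategy, the sample size and the names (AnnotatorRecord, spot_check, is_qualified) are assumptions, since the text says only that the quality inspector performs a spot check.

```python
import random
from dataclasses import dataclass


@dataclass
class AnnotatorRecord:
    name: str
    error_count: int = 0  # total number of annotation errors of this annotator


def spot_check(batch, annotator, is_qualified, sample_size, rng):
    """Route one annotated batch: to acceptance if the sample passes, else back."""
    sample = rng.sample(batch, min(sample_size, len(batch)))
    if all(is_qualified(item) for item in sample):
        return "acceptance"
    # An unqualified sample sends the batch back and adds one to the error total.
    annotator.error_count += 1
    return "re-annotate"


rng = random.Random(0)  # seeded so the illustration is repeatable
worker = AnnotatorRecord("annotator-1")
passed = spot_check(["a", "b", "c"], worker, lambda item: True, 2, rng)
failed = spot_check(["a", "b", "c"], worker, lambda item: False, 2, rng)
```

The error total updated here is the same quantity that later lowers the annotator's annotation authority value, which is how quality inspection feeds back into task distribution.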
Preferably, the template configuration module in S4 configures different tasks to be annotated differently: it assigns attributes to the different tasks through frame annotation templates and thereby configures different annotation tools. The frame annotation templates include a face frame annotation tool and a vehicle frame annotation tool.
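One way to picture a frame annotation template is as a named set of attributes, each with a list of allowed values. The sketch below is hypothetical: the FrameTemplate class and its validate method are not part of the patent, and the attribute values are taken from the examples given in the detailed description (vehicle color and category, face gender and age category).

```python
from dataclasses import dataclass


@dataclass
class FrameTemplate:
    name: str
    attributes: dict  # attribute name -> tuple of allowed values

    def validate(self, label):
        """Return the problems found in one label; an empty list means valid."""
        problems = []
        for attribute, allowed in self.attributes.items():
            if attribute not in label:
                problems.append(f"missing attribute: {attribute}")
            elif label[attribute] not in allowed:
                problems.append(f"invalid value for {attribute}: {label[attribute]}")
        return problems


vehicle_template = FrameTemplate(
    name="vehicle frame",
    attributes={
        "color": ("blue", "red", "white"),
        "category": ("truck", "bus", "off-road vehicle", "car"),
    },
)
face_template = FrameTemplate(
    name="face frame",
    attributes={
        "gender": ("male", "female"),
        "category": ("infant", "adult", "old"),
    },
)

ok = vehicle_template.validate({"color": "red", "category": "bus"})
bad = vehicle_template.validate({"color": "green"})
```

Keeping the allowed values in data rather than in code matches the idea of a configurable template: a new annotation task needs a new attribute table, not a new tool.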
Preferably, the annotation authority value in step six is calculated by an authority calculation module, and the specific calculation steps are as follows:
SS1: denote each annotator as Wi, where i = 1, ..., n; an annotator is either an annotation engineer internal to the organization or a member of the enterprise's internal personnel; denote the number of items of private data the annotator has annotated as M_{Wi}, and the annotator's total number of annotation errors as C_{Wi};
SS2: obtain the annotation authority value Q_{Wi} of the annotator using the formula Q_{Wi} = M_{Wi} × Zk1 − C_{Wi} × Zk2, where Zk1 and Zk2 are preset proportionality coefficients and k = 1, 2; Z11 and Z12 are the preset coefficients for the number of items annotated and for the total number of annotation errors of an annotation engineer internal to the organization; Z21 and Z22 are the corresponding preset coefficients for enterprise internal personnel;
SS3: set a first threshold, a second threshold and a third threshold in descending order; when the annotation authority value of an annotator is greater than or equal to the first threshold, the annotator is marked as a first annotator; when the value is smaller than the first threshold and greater than or equal to the second threshold, the annotator is marked as a second annotator; when the value is smaller than the second threshold and greater than or equal to the third threshold, the annotator is marked as a third annotator; and when the value is smaller than the third threshold, the annotator is marked as a fourth annotator.
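The calculation in SS1 to SS3 can be sketched in Python as follows. The coefficient and threshold values are hypothetical placeholders, since the patent states only that they are preset. The sketch also assumes that any value below the third threshold falls into the fourth tier, because only three thresholds are defined.

```python
# Hypothetical coefficients (Zk1, Zk2) per annotator type; the source only
# states that these are preset.
COEFFICIENTS = {
    "organization_engineer": (1.0, 5.0),  # Z11, Z12
    "enterprise_staff": (0.5, 5.0),       # Z21, Z22
}
# Hypothetical first, second and third thresholds, in descending order.
THRESHOLDS = (500.0, 200.0, 50.0)


def authority_value(annotated_count, error_count, annotator_type):
    """Q = M * Zk1 - C * Zk2 for the given annotator type."""
    z1, z2 = COEFFICIENTS[annotator_type]
    return annotated_count * z1 - error_count * z2


def tier(value, thresholds=THRESHOLDS):
    """Map an authority value to annotator tier 1 to 4 (SS3)."""
    first, second, third = thresholds
    if value >= first:
        return 1
    if value >= second:
        return 2
    if value >= third:
        return 3
    return 4  # assumption: everything below the third threshold


q_engineer = authority_value(600, 10, "organization_engineer")  # 600*1.0 - 10*5.0
q_staff = authority_value(300, 20, "enterprise_staff")          # 300*0.5 - 20*5.0
```

Because the error total is subtracted, an annotator who is repeatedly sent back by quality inspection drifts toward a lower tier and is assigned lower-ranked data.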
Preferably, the result export module allows the user to export the annotation results of a task online to the local machine; export can be performed manually or through an openAPI.
The invention has the beneficial effects that:
(1) Internal personnel of the enterprise send the enterprise's private data to the enterprise private cloud over the intranet, which improves privacy protection of the enterprise data. The intelligent tool module annotates the tasks to be annotated using a human-in-the-loop incremental data-assisted annotation technique and sends the tasks it cannot annotate to the annotation distribution module; efficiency is improved by an iterative, interactive production process that combines manual annotation with intelligent-tool annotation. The annotation task module acquires, through the server, the private data in the enterprise private cloud and the corresponding level in order to create tasks, and sends the tasks to be annotated to the annotation distribution module, which distributes the private data to the computer terminals of the corresponding annotators. The ranking value of each task to be annotated is obtained with a formula; the tasks are classified according to their ranking values, and all first, second, third and fourth private data are collected into a first, second, third and fourth private data set respectively. The first private data in the first private data set is sorted by ranking value in descending order, the R3 items with the highest ranking values are distributed to the computer terminal of the first annotator with the largest annotation authority value, and so on. By ranking the tasks to be annotated and distributing them to the corresponding annotators according to each annotator's annotation authority value, the data is better annotated.
Drawings
The invention will be further described with reference to the accompanying drawings.
FIG. 1 is a functional block diagram of the privatized deployment data processing method for an enterprise according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, the present invention is a privatized deployment data processing method for an enterprise, comprising the following steps:
S1: a user login module verifies the identity of the user and the user role, and establishes a communication connection between each user who passes verification and the server. The user roles include enterprise internal personnel, administrators and ordinary personnel. The administrators include an authorization administrator and an organization administrator: the authorization administrator manages the system, including authorization of user roles, data management, user management, project management and task management; the organization administrator performs overall management of the organization's personnel, projects, tasks and data. The ordinary personnel include annotators, quality inspectors and acceptance inspectors: an annotator processes and annotates the data to be annotated; a quality inspector performs quality inspection on the annotated data; and an acceptance inspector performs acceptance of the annotated data;
S2: internal personnel of the enterprise send the enterprise's private data, together with the level corresponding to that data, over the intranet for storage in the enterprise private cloud;
S3: an annotation task module creates tasks for the private data; each created task to be annotated, with its bound data, is sent to an intelligent tool module for processing, and the intelligent tool module annotates the task using a human-in-the-loop incremental data-assisted annotation technique. The intelligent tool module handles data acquisition, data processing and annotation: data acquisition includes web crawling, camera acquisition, microphone array acquisition, camera/mobile phone acquisition and the like; data processing includes transcoding, segmentation, frame extraction, desensitization, format standardization, data merging and format conversion; annotation covers voice, image, video, text and 3D point clouds. The intelligent tool module sends the tasks it cannot annotate to the annotation distribution module. The annotation task module acquires, through the server, the private data in the enterprise private cloud and the level corresponding to that data in order to create a task, and marks the private data that needs annotation as a task to be annotated. After a task is created, data must be bound to it; data binding supports batch binding and index-based binding within a data set. Once the data is bound, the annotation task module sends the task to be annotated to the annotation distribution module. The annotation distribution module distributes the private data to the computer terminals of the corresponding annotators, and the specific distribution steps are as follows:
Step one: denote each task to be annotated as Dji, where j = 1, 2, 3, 4 and i = 1, ..., n; D1i, D2i, D3i and D4i denote voice, picture, video and text tasks respectively; denote the level of the task to be annotated as G_{Dji} and the size of the file corresponding to the task as K_{Dji};
Step two: denote the score value corresponding to each type of task to be annotated as Cj, where j = 1, 2, 3, 4 and C4 > C2 > C3 > C1;
Step three: obtain the ranking value P_{Dji} of the task to be annotated using the formula
[Figure BDA0002260965940000081: formula image]
where λ is a correction factor with a value of 1.2, and v1, v2 and v3 are preset fixed proportionality coefficients;
Step four: classify the tasks to be annotated according to their ranking values; set classification intervals A1, A2, A3 and A4, whose value ranges decrease in that order; a task whose ranking value falls in interval A1 is marked as first private data; a task in interval A2 is marked as second private data; a task in interval A3 is marked as third private data; and a task in interval A4 is marked as fourth private data;
Step five: collect all of the first, second, third and fourth private data to form a first private data set, a second private data set, a third private data set and a fourth private data set respectively;
Step six: divide the annotators into first, second, third and fourth annotators according to their annotation authority values; first annotators annotate the first private data, second annotators annotate the second private data, third annotators annotate the third private data, and fourth annotators annotate the fourth private data; the annotation authority value is calculated by an authority calculation module, and the specific calculation steps are as follows:
SS1: denote each annotator as Wi, where i = 1, ..., n; an annotator is either an annotation engineer internal to the organization or a member of the enterprise's internal personnel; denote the number of items of private data the annotator has annotated as M_{Wi}, and the annotator's total number of annotation errors as C_{Wi};
SS2: obtain the annotation authority value Q_{Wi} of the annotator using the formula Q_{Wi} = M_{Wi} × Zk1 − C_{Wi} × Zk2, where Zk1 and Zk2 are preset proportionality coefficients and k = 1, 2; Z11 and Z12 are the preset coefficients for the number of items annotated and for the total number of annotation errors of an annotation engineer internal to the organization; Z21 and Z22 are the corresponding preset coefficients for enterprise internal personnel;
SS3: set a first threshold, a second threshold and a third threshold in descending order; when the annotation authority value of an annotator is greater than or equal to the first threshold, the annotator is marked as a first annotator; when the value is smaller than the first threshold and greater than or equal to the second threshold, the annotator is marked as a second annotator; when the value is smaller than the second threshold and greater than or equal to the third threshold, the annotator is marked as a third annotator; and when the value is smaller than the third threshold, the annotator is marked as a fourth annotator;
Step seven: count the total number of first annotators and denote it R1, and sort the first annotators by annotation authority value in descending order; count the number of items of first private data in the first private data set and denote it R2; obtain the number of items R3 assigned to each first annotator using the formula R3 = R2/R1; when R2 is not exactly divisible by R1, R3 is taken as the integer quotient plus one;
Step eight: sort the first private data in the first private data set by ranking value in descending order, distribute the R3 items with the highest ranking values to the computer terminal of the first annotator with the largest annotation authority value, the next R3 items to the first annotator with the next-largest value, and so on; the second, third and fourth private data sets are distributed to the computer terminals of the corresponding annotators in the same way; the first, second, third and fourth annotators then annotate the corresponding first, second, third and fourth private data with the annotation tools. The template configuration module configures different tasks to be annotated differently: it assigns attributes to the different tasks through frame annotation templates and thereby configures different annotation tools. The frame annotation templates include a face frame annotation tool and a vehicle frame annotation tool, and different tools are configured by setting the attributes of different labels in the template. For example, the face frame annotation tool can set attributes such as gender (male, female), category (infant, adult, old) and skin color (yellow, white, black); the vehicle frame annotation tool can set attributes such as color (blue, red, white) and category (truck, bus, off-road vehicle, car). Annotation tools can be divided into configurable templates and customized templates according to whether user-defined label configuration is supported. The configurable templates include voice annotation templates and picture annotation templates, mainly: a single-paragraph voice template, a multi-paragraph voice template, a point annotation template, a rectangular frame annotation template and a polygon annotation template. When the configurable templates cannot meet the annotation requirement, customized templates are provided for the specific needs of the enterprise's particular annotation field; the customized templates provided include a multi-paragraph voice annotation template and a semantic understanding text annotation template, and template customization for voice, text, image and video is supported;
S4: the annotation distribution module distributes the private data to the computer terminals of the corresponding annotators; the template configuration module configures different tasks to be annotated differently; and each annotator annotates the assigned tasks with an annotation tool on the computer terminal. The annotation tools include an image annotation tool, a voice annotation tool, a text annotation tool and a video annotation tool. The image annotation tool covers target detection, picture classification, instance segmentation, semantic segmentation and face segmentation; the voice annotation tool covers single-paragraph and multi-paragraph annotation, playback speed adjustment, waveform zooming and spectrogram switching; the text annotation tool covers entity annotation, intention annotation and word segmentation annotation; the video annotation tool covers picture annotation after frame extraction, subject attribute annotation and trajectory tracking. The first, second, third and fourth annotators send the annotated private data to the computer terminal of a quality inspector for quality inspection. The quality inspector spot-checks the annotated private data: when the spot-checked data is qualified, the quality inspector sends the annotated private data to the computer terminal of an acceptance inspector; when the spot-checked data is unqualified, the data is sent back to the computer terminal of the corresponding annotator for re-annotation, and that annotator's total number of annotation errors is increased by one. The acceptance inspector sends the accepted private data to the server for storage;
s5: the annotator sends the marked tasks to be annotated to the intelligent tool module, and the annotation result is exported to the local through the result export module; and the result exporting module is used for exporting the marking result of the task to be marked to the local on line by the user, and the exporting comprises manual exporting or exporting through an openAPI.
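The quality-inspection loop in S4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class names, the `is_qualified` callback and the decision to pass in an already selected spot-check sample are all hypothetical, since the text does not say how the sample is drawn or how qualification is judged.

```python
from dataclasses import dataclass


@dataclass
class Annotator:
    name: str
    error_count: int = 0  # total number of annotation errors so far


@dataclass
class AnnotatedItem:
    item_id: str
    annotator: Annotator
    label: str


def quality_inspect(sampled_items, is_qualified):
    """Split spot-checked items into accepted items and items to re-annotate.

    Qualified items go on to the acceptance inspector; unqualified items go
    back to their annotator, whose error count is increased by one.
    `is_qualified` is a hypothetical callback standing in for the inspector.
    """
    accepted, rework = [], []
    for item in sampled_items:
        if is_qualified(item):
            accepted.append(item)
        else:
            item.annotator.error_count += 1
            rework.append(item)
    return accepted, rework
```

The error count updated here is the same quantity that later lowers an annotator's annotation authority value, so a failed spot check also affects future task distribution.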
The working principle of the invention is as follows: enterprise-internal personnel send the enterprise's private data to the enterprise private cloud through the intranet, which improves the privacy protection of the enterprise's data; the intelligent tool module annotates the tasks to be annotated using a human-in-the-loop incremental data assisted annotation technology, and sends the tasks that it cannot annotate to the annotation distribution module; this iterative, interactive production process of manual annotation and intelligent tool annotation improves efficiency; the annotation task module obtains, through the server, the private data in the enterprise private cloud and the level corresponding to the private data to create a task, and sends the task to be annotated to the annotation distribution module; the annotation distribution module distributes the private data to the computer terminal of the corresponding annotator; using the formula
[Formula image: Figure BDA0002260965940000111]
the sorting value $P_{Dji}$ of each task to be annotated is obtained; the tasks to be annotated are classified according to their sorting values, and all the first private data, second private data, third private data and fourth private data are counted to form a first private data set, a second private data set, a third private data set and a fourth private data set respectively; the first private data in the first private data set is sorted by sorting value from largest to smallest, and the R3 first private data with the highest sorting values are distributed to the computer terminal of the first annotator with the largest annotation authority value, and so on; by sorting the tasks to be annotated, the tasks are distributed to the corresponding annotators according to the annotators' annotation authority values, so that they can be annotated better.
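The hand-off between the intelligent tool module and manual annotation described in the working principle can be illustrated with a short Python sketch. The text only states that tasks the tool cannot annotate are passed on for manual annotation; the confidence score, the `min_confidence` cut-off and the `predict` callback used here are assumptions introduced purely for illustration.

```python
def route_tasks(tasks, predict, min_confidence=0.9):
    """Pre-annotate tasks automatically and collect the rest for humans.

    `predict(task)` is a hypothetical model call returning (label, confidence).
    Tasks without a label, or with confidence below the assumed cut-off,
    are treated as "cannot be annotated" and routed to manual annotation.
    """
    auto_labeled = {}
    needs_human = []
    for task in tasks:
        label, confidence = predict(task)
        if label is not None and confidence >= min_confidence:
            auto_labeled[task] = label
        else:
            needs_human.append(task)
    return auto_labeled, needs_human
```

In an iterative setup, the manually annotated results would be fed back to improve the tool before the next batch, which is the interactive production process the paragraph refers to.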
The foregoing is merely exemplary and illustrative of the present invention and various modifications, additions and substitutions may be made by those skilled in the art to the specific embodiments described without departing from the scope of the invention as defined in the following claims.

Claims (7)

1. A privatized deployment data processing method for an enterprise, the processing method comprising the following steps:
S1: a user login module verifies the identity of the user and the user role, and establishes a communication connection between the authenticated user and the server;
S2: enterprise-internal personnel send the enterprise's private data, together with the level corresponding to the private data, through the intranet to the enterprise private cloud for storage;
S3: an annotation task module creates a task for the private data and binds data to the created task to be annotated, which is then sent to an intelligent tool module for processing; the intelligent tool module annotates the task to be annotated using a human-in-the-loop incremental data assisted annotation technology, and sends the tasks that it cannot annotate to an annotation distribution module;
S4: the annotation distribution module distributes the private data to the computer terminals of the corresponding annotators; a template configuration module configures the different tasks to be annotated separately, and each annotator annotates the assigned tasks with an annotation tool on the computer terminal;
S5: the annotator sends the annotated task to the intelligent tool module, and the annotation result is exported locally through a result export module.
2. The privatized deployment data processing method for an enterprise according to claim 1, wherein the user roles in S1 include enterprise-internal personnel, administrators and ordinary personnel; the administrators comprise an authorization administrator and an organization administrator; the authorization administrator manages the system, including authorization of user roles, data management, user management, project management and task management; the organization administrator manages the personnel, projects, tasks and data of the organization as a whole; the ordinary personnel comprise annotators, quality inspectors and acceptance inspectors; the annotator processes and annotates the data to be annotated; the quality inspector performs quality inspection on the annotated data; and the acceptance inspector accepts the annotated data.
3. The privatized deployment data processing method for an enterprise according to claim 1, wherein the annotation task module in S3 obtains, through the server, the private data in the enterprise private cloud and the level corresponding to the private data to create a task, and marks the private data to be annotated as a task to be annotated; after the task is created, data must be bound to the task to be annotated; task data binding supports batch binding and index binding under a data set; after the data is bound, the annotation task module sends the task to be annotated to the annotation distribution module; the annotation distribution module distributes the private data to the computer terminals of the corresponding annotators, and the specific distribution steps are as follows:
Step one: denote the task to be annotated as $D_{ji}$, where $j = 1, 2, 3, 4$ and $i = 1, \ldots, n$; $D_{1i}$, $D_{2i}$, $D_{3i}$ and $D_{4i}$ denote voice, picture, video and text tasks respectively; denote the level of the task to be annotated as $G_{Dji}$ and the size of the file corresponding to the task to be annotated as $K_{Dji}$;
Step two: set the score value corresponding to the tasks to be annotated as $C_j$, where $j = 1, 2, 3, 4$ and $C_4 > C_2 > C_3 > C_1$;
Step three: using the formula
[Formula image: Figure FDA0002260965930000021]
obtain the sorting value $P_{Dji}$ of the task to be annotated, where $\lambda$ is a correction factor with a value of 1.2, and $v_1$, $v_2$ and $v_3$ are preset fixed proportionality coefficients;
Step four: classify the tasks to be annotated according to their sorting values; set the classification intervals A1, A2, A3 and A4, whose value ranges decrease in that order; when the sorting value of a task to be annotated falls in classification interval A1, the task is marked as first private data; when it falls in classification interval A2, the task is marked as second private data; when it falls in classification interval A3, the task is marked as third private data; when it falls in classification interval A4, the task is marked as fourth private data;
Step five: count all the first private data, second private data, third private data and fourth private data, forming a first private data set, a second private data set, a third private data set and a fourth private data set respectively;
Step six: divide the annotators into first annotators, second annotators, third annotators and fourth annotators according to their annotation authority values; the first annotators annotate the first private data, the second annotators annotate the second private data, the third annotators annotate the third private data, and the fourth annotators annotate the fourth private data;
Step seven: count the total number of first annotators and denote it as R1, and sort the first annotators by annotation authority value from largest to smallest; count the number of first private data in the first private data set and denote it as R2; obtain the annotation quantity R3 of each first annotator using the formula R3 = R2/R1; when R2 is not evenly divisible by R1, R3 takes the integer quotient plus one;
Step eight: sort the first private data in the first private data set by sorting value from largest to smallest, and distribute the R3 first private data with the highest sorting values to the computer terminal of the first annotator with the largest annotation authority value, and so on; the second private data set, the third private data set and the fourth private data set are distributed to the computer terminals of the corresponding annotators in the same way; the first, second, third and fourth annotators annotate the corresponding first, second, third and fourth private data with the annotation tool.
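Steps four to eight can be sketched in Python as follows. This is a hedged illustration only: the sorting values are taken as given inputs because the formula of step three survives only as an image, the interval lower bounds passed as `lower_bounds` are hypothetical placeholders for A1 to A3, and the function names are not from the source.

```python
import math


def classify(sort_values, lower_bounds):
    """Assign each task to tier 1..4 by its sorting value (step four).

    `sort_values` maps task id -> sorting value P. `lower_bounds` holds the
    assumed lower bounds of intervals A1, A2, A3 in descending order; any
    value below the last bound falls into A4.
    """
    tiers = {1: [], 2: [], 3: [], 4: []}
    for task_id, p in sort_values.items():
        tier = 4
        for k, bound in enumerate(lower_bounds, start=1):
            if p >= bound:
                tier = k
                break
        tiers[tier].append(task_id)
    return tiers


def distribute(tier_tasks, sort_values, annotators):
    """Distribute one tier's tasks to that tier's annotators (steps seven, eight).

    `annotators` maps annotator name -> annotation authority value.
    R3 = ceil(R2 / R1), i.e. the quotient plus one when not evenly divisible.
    """
    if not annotators:
        return {}
    ranked_annotators = sorted(annotators, key=annotators.get, reverse=True)
    ranked_tasks = sorted(tier_tasks, key=sort_values.get, reverse=True)
    r3 = math.ceil(len(ranked_tasks) / len(ranked_annotators))
    return {
        name: ranked_tasks[i * r3:(i + 1) * r3]
        for i, name in enumerate(ranked_annotators)
    }
```

Note that with this rounding rule the lowest-ranked annotators may receive fewer than R3 tasks, or none, when the task count is small relative to the number of annotators.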
4. The privatized deployment data processing method for an enterprise according to claim 1, wherein the annotation tools in S4 include an image annotation tool, a voice annotation tool, a text annotation tool and a video annotation tool; the image annotation tool supports target detection, picture classification, instance segmentation, semantic segmentation and face segmentation; the voice annotation tool supports single-paragraph and multi-paragraph annotation, playback speed adjustment, waveform scaling and spectrogram switching; the text annotation tool supports entity annotation, intention annotation and word segmentation annotation; the video annotation tool supports picture annotation after frame extraction, annotation of subject attributes and trajectory tracking; the first, second, third and fourth annotators send the annotated private data to the computer terminal of a quality inspector for quality inspection; the quality inspector spot-checks the annotated private data, and when the spot-checked private data is qualified, the quality inspector sends the annotated private data to the computer terminal of an acceptance inspector; when the spot-checked private data is unqualified, it is sent back to the computer terminal of the corresponding annotator for re-annotation, and the total number of annotation errors of that annotator is increased by one; and the acceptance inspector sends the accepted private data to the server for storage.
5. The privatized deployment data processing method for an enterprise according to claim 1, wherein the template configuration module in S4 is used to configure the different tasks to be annotated separately, and assigns attributes to the different tasks to be annotated through a box annotation template so as to configure different annotation tools; the box annotation template comprises a face box annotation tool and a vehicle box annotation tool.
6. The privatized deployment data processing method for an enterprise according to claim 3, wherein the annotation authority value in step six is calculated by an authority calculation module, and the specific calculation steps are as follows:
SS1: denote the annotators as $W_i$, where $i = 1, \ldots, n$; an annotator is either an organization-internal annotation engineer or an enterprise-internal staff member; denote the quantity of private data annotated by the annotator as $M_{Wi}$ and the annotator's total number of annotation errors as $C_{Wi}$;
SS2: obtain the annotation authority value $Q_{Wi}$ of the annotator using the formula $Q_{Wi} = M_{Wi} \cdot Z_{k1} - C_{Wi} \cdot Z_{k2}$, where $Z_{k1}$ and $Z_{k2}$ are preset proportionality coefficients and $k = 1, 2$; $Z_{11}$ and $Z_{12}$ are the preset proportionality coefficients for the quantity of annotated private data and for the total number of annotation errors of an organization-internal annotation engineer; $Z_{21}$ and $Z_{22}$ are the corresponding preset proportionality coefficients for an enterprise-internal staff member;
SS3: set a first threshold, a second threshold and a third threshold in descending order; when the annotation authority value of an annotator is greater than or equal to the first threshold, the annotator is marked as a first annotator; when it is smaller than the first threshold and greater than or equal to the second threshold, the annotator is marked as a second annotator; when it is smaller than the second threshold and greater than or equal to the third threshold, the annotator is marked as a third annotator; and when it is smaller than the third threshold, the annotator is marked as a fourth annotator.
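The authority calculation of SS1 to SS3 can be sketched in Python as follows. The formula is the one stated in SS2; the coefficient values, threshold values and identifier names below are hypothetical, since the claim only says these quantities are preset.

```python
# Hypothetical preset coefficients (Z_k1, Z_k2) per annotator kind:
# k = 1: organization-internal annotation engineer
# k = 2: enterprise-internal staff member
COEFFS = {
    "org_engineer": (1.0, 5.0),
    "enterprise_staff": (0.8, 5.0),
}


def authority_value(kind, labeled_count, error_count, coeffs=COEFFS):
    """Q = M * Z_k1 - C * Z_k2, as stated in SS2."""
    z1, z2 = coeffs[kind]
    return labeled_count * z1 - error_count * z2


def annotator_tier(q, thresholds):
    """Map an authority value to tier 1..4 using three descending thresholds (SS3)."""
    first, second, third = thresholds
    if q >= first:
        return 1
    if q >= second:
        return 2
    if q >= third:
        return 3
    return 4
```

For example, with the assumed coefficients an organization-internal engineer with 100 annotated items and 4 errors gets Q = 100 * 1.0 - 4 * 5.0 = 80, which falls in the second tier for assumed thresholds of 90, 60 and 30.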
7. The privatized deployment data processing method for an enterprise according to claim 1, wherein the result export module allows the user to export the annotation results of the tasks online to local storage, and the export comprises manual export or export through an openAPI.
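A local export of annotation results could look like the following Python sketch. It covers only a file-based export; the openAPI interface is not specified in the text and is therefore not modelled. The JSON layout, the function name and the file path are assumptions made for illustration.

```python
import json
from pathlib import Path


def export_results(results, out_path):
    """Write annotation results (task id -> annotation) to a local JSON file.

    The JSON layout is an assumed format; the source does not define one.
    """
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-ASCII labels (e.g. Chinese text) readable.
        json.dump(results, fh, ensure_ascii=False, indent=2)
    return path
```

A call such as `export_results({"task-1": {"label": "face"}}, "exports/results.json")` would create the directory if needed and write the file there.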
Application number: CN201911071132.2A; priority date: 2019-11-05; filing date: 2019-11-05; title: Privatization deployment data processing method for enterprise; status: Active; publication: CN110826101B (en)

Publications (2)

Publication Number Publication Date
CN110826101A (en) 2020-02-21
CN110826101B (en) 2021-01-05

Family ID: 69552467

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant