CN110826101A - Privatization deployment data processing method for enterprise - Google Patents
- Publication number
- CN110826101A, CN201911071132.2A
- Authority
- CN
- China
- Prior art keywords
- private data
- marking
- task
- marked
- annotator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/604—Tools and structures for managing or administering access control systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06311—Scheduling, planning or task assignment for a person or group
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Abstract
The invention discloses a privatized deployment data processing method for enterprises. It addresses the following problems in the prior art: enterprise private data cannot be privacy-protected, fully manual annotation is inefficient, and data cannot be distributed reasonably. The method comprises the following steps. S1: a user login module verifies the identity of the user and the user role, and establishes a communication connection between each authenticated user and the server. S2: internal personnel of the enterprise send the enterprise's private data, together with the level corresponding to the private data, over the intranet to the enterprise private cloud for storage. In the invention, an annotation task module sends the tasks to be annotated to an annotation distribution module; an iterative, interactive production process that combines manual annotation with intelligent-tool annotation improves efficiency; and the tasks to be annotated are ranked and then distributed to the corresponding annotators according to the annotators' annotation authority values, which enables better annotation.
Description
Technical Field
The invention relates to the technical field of data annotation processing, in particular to a privatized deployment data processing method for enterprises.
Background
Existing annotation platforms can quickly perform customized cleaning and processing of a client's data and quickly provide the training data required by artificial intelligence applications. However, some enterprise data involve state secrets or client privacy and cannot be placed in the cloud for processing. The data used by artificial intelligence applications within an enterprise may be sensitive, non-public data containing enterprise secrets or personal privacy, and for security reasons cannot be transferred to the Internet for processing. Having the client build its own annotation platform is time-consuming and labor-intensive.
In the traditional artificial intelligence data production process, annotation and training are carried out manually, and feedback and error correction from the artificial intelligence are lacking. To meet the requirements of enterprise-customized AI data processing, a privatized deployment data processing method for enterprises is provided.
Disclosure of Invention
The invention aims to provide a privatized deployment data processing method for enterprises. In the invention, internal personnel of the enterprise send the enterprise's private data to the enterprise private cloud over the intranet, which improves privacy protection of the enterprise data. An intelligent tool module annotates the tasks to be annotated using a human-in-the-loop incremental data-assisted annotation technology, and sends the tasks it cannot annotate to an annotation distribution module. An iterative, interactive production process that combines manual annotation with intelligent-tool annotation improves efficiency. The tasks to be annotated are ranked and then distributed to the corresponding annotators according to the annotators' annotation authority values, which enables better annotation.
The technical problem to be solved by the invention is as follows:
(1) how to transmit and process the private data over the intranet, annotate it intelligently with an intelligent tool, rank the private data that the tool cannot annotate, and distribute those data reasonably to the corresponding annotators for annotation; this solves the prior-art problems that enterprise private data cannot be privacy-protected, that fully manual annotation is inefficient, and that data cannot be distributed reasonably.
The purpose of the invention can be achieved by the following technical solution: a privatized deployment data processing method for enterprises, comprising the following steps:
S1: a user login module verifies the identity of the user and the user role, and establishes a communication connection between each authenticated user and the server;
S2: internal personnel of the enterprise send the enterprise's private data, together with the level corresponding to the private data, over the intranet to the enterprise private cloud for storage;
S3: an annotation task module creates tasks for the private data; the created tasks to be annotated, with their bound data, are sent to an intelligent tool module for processing; the intelligent tool module annotates the tasks to be annotated using a human-in-the-loop incremental data-assisted annotation technology, and sends the tasks it cannot annotate to an annotation distribution module;
S4: the annotation distribution module distributes the private data to the computer terminals of the corresponding annotators; a template configuration module configures different tasks to be annotated differently; and each annotator annotates the tasks to be annotated with an annotation tool on the computer terminal;
S5: the annotator sends the annotated tasks to the intelligent tool module, and the annotation results are exported to local storage through a result export module.
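The routing in step S3 (the intelligent tool annotates what it can and forwards the rest for manual annotation) can be illustrated with a minimal Python sketch. The `Task` class, the `route_tasks` function and the `toy_tool` stand-in are hypothetical names introduced only for illustration; the source does not specify how the intelligent tool decides that a task cannot be annotated, so here the tool is assumed to return `None` in that case.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Task:
    task_id: str
    data: str
    label: Optional[str] = None


def route_tasks(
    tasks: List[Task], auto_annotate: Callable[[Task], Optional[str]]
) -> Tuple[List[Task], List[Task]]:
    """Split tasks into machine-annotated ones and ones needing an annotator."""
    annotated, to_distribute = [], []
    for task in tasks:
        # Assumption: the tool returns None when it cannot annotate a task.
        label = auto_annotate(task)
        if label is None:
            to_distribute.append(task)  # forwarded to the distribution module
        else:
            task.label = label
            annotated.append(task)
    return annotated, to_distribute


def toy_tool(task: Task) -> Optional[str]:
    # Stand-in for the intelligent tool: it only handles very short inputs.
    return "short" if len(task.data) < 5 else None


tasks = [Task("t1", "abc"), Task("t2", "a long utterance")]
done, pending = route_tasks(tasks, toy_tool)
```

In an iterative setup, the labels produced manually for the `pending` tasks would be fed back to the tool (step S5), which is the human-in-the-loop part of the process.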
Preferably, the user roles in S1 include enterprise internal personnel, administrators and ordinary personnel. The administrators include an authorization administrator and an organization administrator: the authorization administrator manages the system, including authorization of user roles, data management, user management, project management and task management; the organization administrator has overall management of the organization's personnel, projects, tasks and data. The ordinary personnel include annotators, quality inspectors and acceptance inspectors: an annotator processes and annotates the data to be annotated; a quality inspector performs quality inspection on the annotated data; and an acceptance inspector performs acceptance of the annotated data.
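The role hierarchy described above can be written down as a small data structure. The following Python sketch is only one possible encoding; the `Role` enum, the `ROLE_GROUP` mapping and the `group_of` helper are hypothetical names, and the source does not say how roles are stored or checked at login.

```python
from enum import Enum


class Role(Enum):
    ENTERPRISE_INTERNAL = "enterprise internal personnel"
    AUTHORIZATION_ADMIN = "authorization administrator"
    ORGANIZATION_ADMIN = "organization administrator"
    ANNOTATOR = "annotator"
    QUALITY_INSPECTOR = "quality inspector"
    ACCEPTANCE_INSPECTOR = "acceptance inspector"


# Top-level role groups named in the text: enterprise internal personnel,
# administrators and ordinary personnel.
ROLE_GROUP = {
    Role.ENTERPRISE_INTERNAL: "enterprise internal personnel",
    Role.AUTHORIZATION_ADMIN: "administrator",
    Role.ORGANIZATION_ADMIN: "administrator",
    Role.ANNOTATOR: "ordinary personnel",
    Role.QUALITY_INSPECTOR: "ordinary personnel",
    Role.ACCEPTANCE_INSPECTOR: "ordinary personnel",
}


def group_of(role: Role) -> str:
    """Return the top-level group a role belongs to."""
    return ROLE_GROUP[role]
```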
Preferably, the annotation task module in S3 acquires, through the server, the private data in the enterprise private cloud and the level corresponding to the private data, creates a task, and marks the private data that need annotation as a task to be annotated. After a task is created, data must be bound to it; task data binding supports batch binding and binding by index within a data set. After the data are bound, the annotation task module sends the task to be annotated to the annotation distribution module. The annotation distribution module distributes the private data to the computer terminals of the corresponding annotators; the specific distribution steps are as follows:
Step one: denote the task to be annotated as $D_{ji}$, where $j = 1, 2, 3, 4$ and $i = 1, \ldots, n$; $D_{1i}$, $D_{2i}$, $D_{3i}$ and $D_{4i}$ denote voice, picture, video and text tasks, respectively; denote the level of the task to be annotated as $G_{Dji}$ and the size of the file corresponding to the task as $K_{Dji}$;
Step two: denote the point value corresponding to the task to be annotated as $C_j$, where $j = 1, 2, 3, 4$ and $C_4 > C_2 > C_3 > C_1$;
Step three: use the formula [formula image not reproduced in the source text] to obtain the ranking value $P_{Dji}$ of the task to be annotated, where $\lambda$ is a correction factor with a value of 1.2, and $v_1$, $v_2$ and $v_3$ are preset fixed proportionality coefficients;
Step four: classify the tasks to be annotated by ranking value; set the classification intervals A1, A2, A3 and A4, whose value ranges decrease in that order; a task whose ranking value falls in interval A1 is marked as first private data, in A2 as second private data, in A3 as third private data, and in A4 as fourth private data;
Step five: collect all the first, second, third and fourth private data into a first private data set, a second private data set, a third private data set and a fourth private data set, respectively;
Step six: divide the annotators into first, second, third and fourth annotators according to their annotation authority values; the first annotators annotate the first private data, the second annotators the second private data, the third annotators the third private data, and the fourth annotators the fourth private data;
Step seven: count the first annotators and denote their number as R1; sort the first annotators by annotation authority value in descending order; count the first private data in the first private data set and denote their number as R2; obtain the number R3 of items assigned to each first annotator using the formula $R3 = R2 / R1$; when R2 is not evenly divisible by R1, R3 is the integer quotient plus one;
Step eight: sort the first private data in the first private data set by ranking value in descending order; distribute the R3 first private data with the highest ranking values to the computer terminal of the first annotator with the largest annotation authority value, and so on; the second, third and fourth private data sets are likewise distributed to the computer terminals of the corresponding annotators by the same steps; the first, second, third and fourth annotators annotate the corresponding first, second, third and fourth private data with the annotation tool.
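Steps four, seven and eight above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the ranking formula of step three is not reproduced in this text, so ranking values are taken as given inputs; the interval bounds passed to `classify` are hypothetical; and the function names `classify` and `distribute` are introduced only for this example.

```python
import math


def classify(rank_value, lower_bounds):
    """Return the tier (1..4) of a task from its ranking value.

    lower_bounds holds hypothetical lower edges of A1, A2 and A3 in
    descending order; anything below the last edge falls in A4.
    """
    for tier, bound in enumerate(lower_bounds, start=1):
        if rank_value >= bound:
            return tier
    return len(lower_bounds) + 1


def distribute(tasks, annotators):
    """Assign the tasks of one tier to the annotators of the same tier.

    tasks:      list of (task_id, ranking_value)
    annotators: list of (name, authority_value)
    """
    if not annotators:
        return {}
    # R3 = R2 / R1, rounded up when the division is not exact (step seven).
    per_annotator = math.ceil(len(tasks) / len(annotators))
    ordered_tasks = sorted(tasks, key=lambda t: t[1], reverse=True)
    ordered_annotators = sorted(annotators, key=lambda a: a[1], reverse=True)
    plan = {}
    for k, (name, _) in enumerate(ordered_annotators):
        # The highest-ranked block goes to the highest-authority annotator.
        chunk = ordered_tasks[k * per_annotator:(k + 1) * per_annotator]
        plan[name] = [task_id for task_id, _ in chunk]
    return plan


tasks = [("a", 9.1), ("b", 8.7), ("c", 9.8), ("d", 8.2), ("e", 9.5)]
annotators = [("w1", 40.0), ("w2", 55.0)]
plan = distribute(tasks, annotators)
# plan == {"w2": ["c", "e", "a"], "w1": ["b", "d"]}
```

With five tasks and two annotators, R3 is 3, so the annotator with the larger authority value receives the three highest-ranked tasks and the other annotator receives the remaining two. Rounding up means the last annotator in the ordering may receive fewer than R3 items.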
Preferably, the annotation tools in S4 include image, voice, text and video annotation tools. The image annotation tool covers target detection, picture classification, instance segmentation, semantic segmentation and face segmentation; the voice annotation tool covers single-paragraph and multi-paragraph annotation, playback speed adjustment, waveform scaling and spectrogram switching; the text annotation tool covers entity annotation, intent annotation and word segmentation annotation; the video annotation tool covers picture annotation after frame extraction, subject attribute marking and trajectory tracking. The first, second, third and fourth annotators send the annotated private data to the computer terminal of a quality inspector for quality inspection. The quality inspector spot-checks the annotated private data: when the spot-checked private data are qualified, the quality inspector sends the annotated private data to the computer terminal of an acceptance inspector; when the spot-checked private data are unqualified, they are sent back to the computer terminal of the corresponding annotator for re-annotation, and that annotator's total number of annotation errors is increased by one. The acceptance inspector sends the accepted private data to the server for storage.
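The quality inspection flow above can be sketched as a spot check that forwards qualified items and returns unqualified ones while updating the annotator's error count. The Python sketch below is illustrative only: the item layout (dictionaries with `id`, `annotator` and `label` keys), the `spot_check` function and the `is_acceptable` predicate are assumptions, and the source does not state how the sample is drawn or how large it is.

```python
import random


def spot_check(batch, is_acceptable, error_counts, sample_size, rng=None):
    """Spot-check annotated items.

    Returns (accepted, rework): accepted items go on to the acceptance
    inspector, rework items go back to their annotator. error_counts is
    updated in place, one increment per unqualified item.
    """
    rng = rng or random.Random()
    sample = rng.sample(batch, min(sample_size, len(batch)))
    accepted, rework = [], []
    for item in sample:
        if is_acceptable(item):
            accepted.append(item)
        else:
            name = item["annotator"]
            error_counts[name] = error_counts.get(name, 0) + 1
            rework.append(item)
    return accepted, rework


batch = [
    {"id": 1, "annotator": "w1", "label": "cat"},
    {"id": 2, "annotator": "w2", "label": ""},  # empty label: unqualified
]
errors = {}
accepted, rework = spot_check(
    batch, lambda item: bool(item["label"]), errors, sample_size=2
)
```

The error counts collected here are the totals that feed into the annotation authority value, so a failed spot check lowers the annotator's authority value the next time it is computed.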
Preferably, the template configuration module in S4 configures different tasks to be annotated differently, and configures different annotation tools by assigning attributes to the different tasks through frame annotation templates; the frame annotation templates include a face frame annotation tool and a car frame annotation tool.
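A frame annotation template of this kind can be represented as a mapping from attribute names to allowed values. The Python sketch below uses the example attributes given in the detailed description (gender, category and skin color for faces; color and category for cars); the dictionary layout, the `FRAME_TEMPLATES` name and the `validate_label` helper are assumptions made for illustration.

```python
# Attribute sets taken from the examples in the detailed description;
# the structure itself is a hypothetical encoding.
FRAME_TEMPLATES = {
    "face_frame": {
        "gender": ["male", "female"],
        "category": ["infant", "adult", "elderly"],
        "skin_color": ["yellow", "white", "black"],
    },
    "car_frame": {
        "color": ["blue", "red", "white"],
        "category": ["truck", "bus", "off-road vehicle", "car"],
    },
}


def validate_label(template_name, attributes):
    """Check that every attribute and value is allowed by the template."""
    template = FRAME_TEMPLATES[template_name]
    for key, value in attributes.items():
        if key not in template or value not in template[key]:
            return False
    return True


ok = validate_label("car_frame", {"color": "red", "category": "bus"})
bad = validate_label("face_frame", {"color": "red"})
```

Keeping the allowed values in configuration rather than in the tool's code is what lets the same frame annotation tool serve different tasks.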
Preferably, the annotation authority value in step six is calculated by an authority calculation module; the specific calculation steps are as follows:
SS1: denote an annotator as $W_i$, where $i = 1, \ldots, n$; an annotator is either an annotation engineer within the organization or a member of the enterprise's internal personnel; denote the quantity of private data the annotator has annotated as $M_{Wi}$, and the annotator's total number of annotation errors as $C_{Wi}$;
SS2: use the formula $Q_{Wi} = M_{Wi} \times Z_{k1} - C_{Wi} \times Z_{k2}$ to obtain the annotation authority value $Q_{Wi}$ of the annotator, where $Z_{k1}$ and $Z_{k2}$ are preset proportionality coefficients and $k = 1, 2$; $Z_{11}$ and $Z_{12}$ are the preset coefficients for the quantity of annotated private data and for the total number of annotation errors, respectively, for an annotation engineer within the organization; $Z_{21}$ and $Z_{22}$ are the corresponding coefficients for enterprise internal personnel;
SS3: set a first threshold, a second threshold and a third threshold in descending order; when the annotation authority value of an annotator is greater than or equal to the first threshold, the annotator is marked as a first annotator; when it is smaller than the first threshold and greater than or equal to the second threshold, the annotator is marked as a second annotator; when it is smaller than the second threshold and greater than or equal to the third threshold, the annotator is marked as a third annotator; and when it is smaller than the third threshold, the annotator is marked as a fourth annotator.
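The calculation in SS1 to SS3 can be checked with a short Python sketch. The coefficient and threshold values below are hypothetical, since the source only says they are preset; and because only three thresholds are defined, the fourth tier is taken here to be any value below the third threshold.

```python
# Hypothetical coefficient pairs (Zk1, Zk2) per annotator type.
COEFFICIENTS = {
    "organization_engineer": (1.0, 4.0),  # Z11, Z12
    "enterprise_staff": (0.5, 4.0),       # Z21, Z22
}


def authority_value(annotated_count, error_count, annotator_type):
    """Q = M * Zk1 - C * Zk2 for the given annotator type."""
    z1, z2 = COEFFICIENTS[annotator_type]
    return annotated_count * z1 - error_count * z2


def annotator_tier(q, thresholds):
    """Map an authority value to tier 1..4 using three descending thresholds."""
    first, second, third = thresholds
    if q >= first:
        return 1
    if q >= second:
        return 2
    if q >= third:
        return 3
    return 4


thresholds = (80.0, 50.0, 20.0)  # hypothetical first, second, third threshold
q_engineer = authority_value(100, 3, "organization_engineer")  # 100*1.0 - 3*4.0 = 88.0
q_staff = authority_value(100, 3, "enterprise_staff")          # 100*0.5 - 3*4.0 = 38.0
tier_engineer = annotator_tier(q_engineer, thresholds)  # tier 1
tier_staff = annotator_tier(q_staff, thresholds)        # tier 3
```

With identical workloads and error counts, the two annotator types land in different tiers purely because of their coefficient pairs, which is the role the index $k$ plays in the formula.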
Preferably, the result export module lets the user export the annotation results of the tasks online to local storage; export is performed either manually or through an openAPI.
The beneficial effects of the invention are as follows:
(1) Internal personnel of the enterprise send the enterprise's private data to the enterprise private cloud over the intranet, which improves privacy protection of the enterprise data. The intelligent tool module annotates the tasks to be annotated using a human-in-the-loop incremental data-assisted annotation technology and sends the tasks it cannot annotate to the annotation distribution module; the iterative, interactive production process combining manual annotation with intelligent-tool annotation improves efficiency. The annotation task module acquires, through the server, the private data in the enterprise private cloud and the corresponding level to create a task, and sends the task to be annotated to the annotation distribution module, which distributes the private data to the computer terminals of the corresponding annotators. The ranking value of each task to be annotated is obtained with a formula; the tasks are classified by ranking value, and all the first, second, third and fourth private data are collected into a first, second, third and fourth private data set, respectively. The first private data in the first private data set are sorted by ranking value in descending order, the R3 first private data with the highest ranking values are distributed to the computer terminal of the first annotator with the largest annotation authority value, and so on. Ranking the tasks to be annotated and distributing them to the corresponding annotators according to the annotators' annotation authority values distributes the tasks reasonably, which enables better annotation.
Drawings
The invention will be further described with reference to the accompanying drawings.
FIG. 1 is a functional block diagram of a method of privatized deployment data processing for an enterprise of the present invention;
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention is a method for processing privatized deployment data of an enterprise, the method includes the following steps:
S1: a user login module verifies the identity of the user and the user role, and establishes a communication connection between each authenticated user and the server; the user roles include enterprise internal personnel, administrators and ordinary personnel; the administrators include an authorization administrator and an organization administrator, where the authorization administrator manages the system, including authorization of user roles, data management, user management, project management and task management, and the organization administrator has overall management of the organization's personnel, projects, tasks and data; the ordinary personnel include annotators, quality inspectors and acceptance inspectors; an annotator processes and annotates the data to be annotated; a quality inspector performs quality inspection on the annotated data; an acceptance inspector performs acceptance of the annotated data;
S2: internal personnel of the enterprise send the enterprise's private data, together with the level corresponding to the private data, over the intranet to the enterprise private cloud for storage;
S3: an annotation task module creates tasks for the private data; the created tasks to be annotated, with their bound data, are sent to an intelligent tool module for processing; the intelligent tool module annotates the tasks to be annotated using a human-in-the-loop incremental data-assisted annotation technology; the intelligent tool module handles data acquisition, data processing and annotation; data acquisition includes web crawling, camera acquisition, microphone array acquisition, camera/mobile phone acquisition and the like; data processing includes transcoding, segmentation, frame extraction, desensitization, format standardization, data combination and format conversion; annotation covers voice, image, video, text and 3D point clouds; the intelligent tool module sends the tasks it cannot annotate to the annotation distribution module; the annotation task module acquires, through the server, the private data in the enterprise private cloud and the level corresponding to the private data, creates a task, and marks the private data that need annotation as a task to be annotated; after a task is created, data must be bound to it, and task data binding supports batch binding and binding by index within a data set; after the data are bound, the annotation task module sends the task to be annotated to the annotation distribution module; the annotation distribution module distributes the private data to the computer terminals of the corresponding annotators, and the specific distribution steps are as follows:
Step one: denote the task to be annotated as $D_{ji}$, where $j = 1, 2, 3, 4$ and $i = 1, \ldots, n$; $D_{1i}$, $D_{2i}$, $D_{3i}$ and $D_{4i}$ denote voice, picture, video and text tasks, respectively; denote the level of the task to be annotated as $G_{Dji}$ and the size of the file corresponding to the task as $K_{Dji}$;
Step two: denote the point value corresponding to the task to be annotated as $C_j$, where $j = 1, 2, 3, 4$ and $C_4 > C_2 > C_3 > C_1$;
Step three: use the formula [formula image not reproduced in the source text] to obtain the ranking value $P_{Dji}$ of the task to be annotated, where $\lambda$ is a correction factor with a value of 1.2, and $v_1$, $v_2$ and $v_3$ are preset fixed proportionality coefficients;
Step four: classify the tasks to be annotated by ranking value; set the classification intervals A1, A2, A3 and A4, whose value ranges decrease in that order; a task whose ranking value falls in interval A1 is marked as first private data, in A2 as second private data, in A3 as third private data, and in A4 as fourth private data;
Step five: collect all the first, second, third and fourth private data into a first private data set, a second private data set, a third private data set and a fourth private data set, respectively;
Step six: divide the annotators into first, second, third and fourth annotators according to their annotation authority values; the first annotators annotate the first private data, the second annotators the second private data, the third annotators the third private data, and the fourth annotators the fourth private data; the annotation authority value is calculated by an authority calculation module, and the specific calculation steps are as follows:
SS1: denote an annotator as $W_i$, where $i = 1, \ldots, n$; an annotator is either an annotation engineer within the organization or a member of the enterprise's internal personnel; denote the quantity of private data the annotator has annotated as $M_{Wi}$, and the annotator's total number of annotation errors as $C_{Wi}$;
SS2: use the formula $Q_{Wi} = M_{Wi} \times Z_{k1} - C_{Wi} \times Z_{k2}$ to obtain the annotation authority value $Q_{Wi}$ of the annotator, where $Z_{k1}$ and $Z_{k2}$ are preset proportionality coefficients and $k = 1, 2$; $Z_{11}$ and $Z_{12}$ are the preset coefficients for the quantity of annotated private data and for the total number of annotation errors, respectively, for an annotation engineer within the organization; $Z_{21}$ and $Z_{22}$ are the corresponding coefficients for enterprise internal personnel;
SS3: set a first threshold, a second threshold and a third threshold in descending order; when the annotation authority value of an annotator is greater than or equal to the first threshold, the annotator is marked as a first annotator; when it is smaller than the first threshold and greater than or equal to the second threshold, the annotator is marked as a second annotator; when it is smaller than the second threshold and greater than or equal to the third threshold, the annotator is marked as a third annotator; when it is smaller than the third threshold, the annotator is marked as a fourth annotator;
Step seven: count the first annotators and denote their number as R1; sort the first annotators by annotation authority value in descending order; count the first private data in the first private data set and denote their number as R2; obtain the number R3 of items assigned to each first annotator using the formula $R3 = R2 / R1$; when R2 is not evenly divisible by R1, R3 is the integer quotient plus one;
Step eight: sort the first private data in the first private data set by ranking value in descending order; distribute the R3 first private data with the highest ranking values to the computer terminal of the first annotator with the largest annotation authority value, and so on; the second, third and fourth private data sets are likewise distributed to the computer terminals of the corresponding annotators by the same steps; the first, second, third and fourth annotators annotate the corresponding first, second, third and fourth private data with the annotation tool; the template configuration module configures different tasks to be annotated differently, and configures different annotation tools by assigning attributes to the different tasks through frame annotation templates; the frame annotation templates include a face frame annotation tool and a car frame annotation tool; different tools are configured by setting the attributes of different labels through the frame annotation template, for example: the face frame annotation tool can set attributes such as gender (male, female), category (infant, adult, elderly) and skin color (yellow, white, black); the car frame annotation tool can set attributes such as color (blue, red, white) and category (truck, bus, off-road vehicle, car); annotation tools are divided into configurable templates and customized templates according to whether user-defined label configuration is supported; the configurable templates include voice annotation templates and picture annotation templates, mainly: a single-paragraph voice template, a multi-paragraph voice template, a point annotation template, a rectangular frame annotation template and a polygon annotation template; when the configurable templates cannot meet the annotation requirements, customized templates are provided for the specific requirements of the enterprise's particular annotation field; the customized templates provided include a multi-paragraph voice annotation template and a semantic understanding text annotation template; template customization for voice, text, image and video is supported;
S4: the annotation distribution module distributes the private data to the computer terminals of the corresponding annotators; the template configuration module configures different tasks to be annotated differently; and each annotator annotates the tasks to be annotated with an annotation tool on the computer terminal; the annotation tools include image, voice, text and video annotation tools; the image annotation tool covers target detection, picture classification, instance segmentation, semantic segmentation and face segmentation; the voice annotation tool covers single-paragraph and multi-paragraph annotation, playback speed adjustment, waveform scaling and spectrogram switching; the text annotation tool covers entity annotation, intent annotation and word segmentation annotation; the video annotation tool covers picture annotation after frame extraction, subject attribute marking and trajectory tracking; the first, second, third and fourth annotators send the annotated private data to the computer terminal of a quality inspector for quality inspection; the quality inspector spot-checks the annotated private data, and when the spot-checked private data are qualified, the quality inspector sends the annotated private data to the computer terminal of an acceptance inspector; when the spot-checked private data are unqualified, they are sent back to the computer terminal of the corresponding annotator for re-annotation, and that annotator's total number of annotation errors is increased by one; the acceptance inspector sends the accepted private data to the server for storage;
S5: the annotator sends the annotated tasks to the intelligent tool module, and the annotation results are exported to local storage through the result export module; the result export module lets the user export the annotation results of the tasks online to local storage, and export is performed either manually or through an openAPI.
The working principle of the invention is as follows: internal personnel of the enterprise send the enterprise's private data to the enterprise private cloud over the intranet, which improves the privacy protection of the enterprise data. The intelligent tool module annotates the tasks to be annotated using a human-in-the-loop incremental data-assisted annotation technology and sends the tasks it cannot annotate to the annotation distribution module; this iterative, interactive production process, which combines manual annotation with intelligent tool annotation, improves efficiency. The annotation task module obtains, through the server, the private data in the enterprise private cloud and the level corresponding to the private data in order to create a task, and sends the task to be annotated to the annotation distribution module, which distributes the private data to the computer terminals of the corresponding annotators. The sorting value $P_{Dji}$ of each task to be annotated is obtained using a formula [formula missing in the source text]. The tasks to be annotated are classified according to their sorting values; all first, second, third and fourth private data are counted and form a first, second, third and fourth private data set respectively. The first private data in the first private data set is sorted by sorting value from largest to smallest, the R3 first private data with the highest sorting values are distributed to the computer terminal of the first annotator with the largest annotation authority value, and so on. By sorting the tasks to be annotated and distributing them according to the annotators' annotation authority values, the tasks are reasonably assigned to suitable annotators and are therefore annotated better.
The foregoing is merely exemplary and illustrative of the present invention and various modifications, additions and substitutions may be made by those skilled in the art to the specific embodiments described without departing from the scope of the invention as defined in the following claims.
Claims (7)
1. A privatized deployment data processing method for an enterprise, the processing method comprising the steps of:
S1: a user login module verifies the identity of the user and the user role, and establishes a communication connection between each user who passes identity verification and the server;
S2: internal personnel of the enterprise send the enterprise's private data and the level corresponding to the private data over the intranet to an enterprise private cloud for storage;
S3: an annotation task module creates a task for the private data and sends the created task to be annotated, together with its bound data, to an intelligent tool module for processing; the intelligent tool module annotates the task to be annotated using a human-in-the-loop incremental data-assisted annotation technology and sends any task to be annotated that it cannot annotate to an annotation distribution module;
S4: the annotation distribution module distributes the private data to the computer terminals of the corresponding annotators; a template configuration module configures different tasks to be annotated differently; and each annotator annotates the tasks to be annotated with an annotation tool on the computer terminal;
S5: the annotator sends the annotated tasks to the intelligent tool module, and the annotation results are exported locally through a result export module.
2. The privatized deployment data processing method for an enterprise according to claim 1, wherein the user roles in S1 include internal personnel of the enterprise, administrators and ordinary personnel; the administrators comprise an authorization administrator and an organization administrator; the authorization administrator manages the system, including authorization of user roles, data management, user management, project management and task management; the organization administrator performs overall management of the personnel, projects, tasks and data of an organization; the ordinary personnel comprise annotators, quality inspectors and acceptance inspectors; the annotator processes and annotates the data to be annotated; the quality inspector performs quality inspection on the annotated data; and the acceptance inspector accepts the annotated data.
3. The privatized deployment data processing method for an enterprise according to claim 1, wherein the annotation task module in S3 obtains, through the server, the private data in the enterprise private cloud and the level corresponding to the private data in order to create a task, and marks the private data to be annotated as a task to be annotated; after the task is created, data must be bound to the task to be annotated, the binding supporting both batch binding and index binding under a data set; after the data is bound, the annotation task module sends the task to be annotated to the annotation distribution module; the annotation distribution module distributes the private data to the computer terminals of the corresponding annotators through the following steps:
Step one: let the tasks to be annotated be $D_{ji}$, where $j = 1, 2, 3, 4$ and $i = 1, \ldots, n$; $D_{1i}$, $D_{2i}$, $D_{3i}$ and $D_{4i}$ denote voice, picture, video and text tasks respectively; let the level of a task to be annotated be $G_{Dji}$ and the size of its corresponding file be $K_{Dji}$;
Step two: let the integral value corresponding to a task to be annotated be $C_j$, where $j = 1, 2, 3, 4$, with $C_4 > C_2 > C_3 > C_1$;
Step three: obtain the sorting value $P_{Dji}$ of the task to be annotated using a formula [formula missing in the source text], where $\lambda$ is a correction factor with a value of 1.2, and v1, v2 and v3 are preset fixed proportionality coefficients;
Step four: classify the tasks to be annotated according to their sorting values; set classification intervals A1, A2, A3 and A4, whose value ranges decrease in that order; a task to be annotated whose sorting value falls in interval A1 is marked as first private data, in A2 as second private data, in A3 as third private data, and in A4 as fourth private data;
Step five: count all the first, second, third and fourth private data, which respectively form a first private data set, a second private data set, a third private data set and a fourth private data set;
Step six: divide the annotators into first, second, third and fourth annotators according to their annotation authority values; the first annotator annotates the first private data, the second annotator the second private data, the third annotator the third private data, and the fourth annotator the fourth private data;
Step seven: count the total number of first annotators as R1 and sort the first annotators by annotation authority value from largest to smallest; count the number of first private data in the first private data set as R2; obtain the annotation quota R3 of each first annotator using the formula $R3 = R2 / R1$; when R2 is not exactly divisible by R1, R3 is taken as the integer quotient plus one;
Step eight: sort the first private data in the first private data set by sorting value from largest to smallest, distribute the R3 first private data with the highest sorting values to the computer terminal of the first annotator with the largest annotation authority value, and so on; the second, third and fourth private data sets are likewise distributed to the computer terminals of the corresponding annotators according to the above steps; the first, second, third and fourth annotators annotate the corresponding first, second, third and fourth private data with the annotation tools.
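Steps four to eight can be sketched in Python as follows. This is a hedged illustration, not the claimed implementation: the sorting values are taken as given inputs because the formula that produces them is not available in this text, the interval bounds passed to `classify` are hypothetical, and "and so on" is read as handing each successive block of R3 tasks to the next annotator in descending order of authority value. The function names `classify` and `distribute` are likewise invented for the example.

```python
def classify(sorting_value, lower_bounds):
    """Return 0..3 for intervals A1..A4.

    lower_bounds holds the (assumed) lower edges of A1, A2 and A3 in
    descending order; any value below the last edge falls into A4.
    """
    for index, bound in enumerate(lower_bounds):
        if sorting_value >= bound:
            return index
    return len(lower_bounds)


def distribute(tasks, annotators):
    """Distribute one private data set among the annotators of its class.

    tasks:      list of (task_id, sorting_value)
    annotators: list of (name, authority_value)
    Returns a dict mapping annotator name to the task ids assigned to them.
    """
    if not annotators:
        raise ValueError("at least one annotator is required")
    r1 = len(annotators)            # R1: number of annotators in the class
    r2 = len(tasks)                 # R2: number of tasks in the data set
    r3 = (r2 + r1 - 1) // r1        # R3: quotient, rounded up when inexact
    ranked_tasks = sorted(tasks, key=lambda t: t[1], reverse=True)
    ranked_people = sorted(annotators, key=lambda a: a[1], reverse=True)
    assignment = {}
    for k, (name, _authority) in enumerate(ranked_people):
        block = ranked_tasks[k * r3:(k + 1) * r3]
        assignment[name] = [task_id for task_id, _value in block]
    return assignment
```

With five tasks and two annotators, R3 is 3, so the annotator with the larger authority value receives the three highest-ranked tasks and the other annotator receives the remaining two.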
4. The privatized deployment data processing method for an enterprise according to claim 1, wherein the annotation tools in S4 include an image annotation tool, a voice annotation tool, a text annotation tool and a video annotation tool; the image annotation tool supports target detection, picture classification, instance segmentation, semantic segmentation and face segmentation; the voice annotation tool supports single-paragraph and multi-paragraph annotation, playback speed adjustment, waveform scaling and spectrogram switching; the text annotation tool supports entity annotation, intention annotation and word segmentation annotation; the video annotation tool supports picture annotation after frame extraction, annotation of subject attributes and trajectory tracking; the first, second, third and fourth annotators send the annotated private data to the computer terminal of a quality inspector for quality inspection; when the quality inspector spot-checks the annotated private data and the spot-checked private data is qualified, the quality inspector sends the annotated private data to the computer terminal of an acceptance inspector; when the spot-checked private data is unqualified, the private data is sent back to the computer terminal of the corresponding annotator for re-annotation, and that annotator's total number of annotation errors is increased by one; and the acceptance inspector sends the accepted private data to the server for storage.
5. The privatized deployment data processing method for an enterprise according to claim 1, wherein the template configuration module in S4 configures different tasks to be annotated differently, and assigns attributes to the different tasks to be annotated through a box annotation template so as to configure different annotation tools; the box annotation template comprises a face box annotation tool and a vehicle box annotation tool.
6. The privatized deployment data processing method for an enterprise according to claim 3, wherein the annotation authority value in step six is calculated by an authority calculation module through the following steps:
SS1: let the annotators be $W_i$, where $i = 1, \ldots, n$; an annotator is either an annotation engineer within the organization or an internal employee of the enterprise; let the quantity of private data annotated by the annotator be $M_{Wi}$ and the annotator's total number of annotation errors be $C_{Wi}$;
SS2: obtain the annotation authority value $Q_{Wi}$ of the annotator using the formula $Q_{Wi} = M_{Wi} \cdot Z_{k1} - C_{Wi} \cdot Z_{k2}$, where $Z_{k1}$ and $Z_{k2}$ are preset proportionality coefficients and $k = 1, 2$; $Z_{11}$ and $Z_{12}$ are the preset coefficients for the quantity of annotated private data and for the total number of annotation errors of an annotation engineer within the organization; $Z_{21}$ and $Z_{22}$ are the corresponding preset coefficients for an internal employee of the enterprise;
SS3: set a first threshold, a second threshold and a third threshold in descending order; when the annotation authority value of the annotator is greater than or equal to the first threshold, the annotator is marked as a first annotator; when it is smaller than the first threshold and greater than or equal to the second threshold, the annotator is marked as a second annotator; when it is smaller than the second threshold and greater than or equal to the third threshold, the annotator is marked as a third annotator; and when it is smaller than the third threshold, the annotator is marked as a fourth annotator.
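The authority calculation in SS1 to SS3 can be sketched in Python as shown here. The coefficient values in `COEFFS`, the annotator type keys and the threshold values used in the usage note are hypothetical placeholders, because the text only says they are preset. The sketch also assumes that the fourth class is everything below the third threshold, since only three thresholds are defined.

```python
# Hypothetical preset coefficients (Zk1, Zk2) for the two annotator types.
COEFFS = {
    "org_engineer": (1.0, 5.0),      # Z11, Z12 (assumed values)
    "enterprise_staff": (0.5, 4.0),  # Z21, Z22 (assumed values)
}


def authority_value(annotated_count, error_count, kind):
    """Q = M * Zk1 - C * Zk2 for an annotator of the given type."""
    z1, z2 = COEFFS[kind]
    return annotated_count * z1 - error_count * z2


def annotator_class(q, thresholds):
    """Map an authority value to annotator class 1..4.

    thresholds is (first, second, third) in strictly descending order.
    """
    first, second, third = thresholds
    if not first > second > third:
        raise ValueError("thresholds must be strictly descending")
    if q >= first:
        return 1
    if q >= second:
        return 2
    if q >= third:
        return 3
    return 4
```

For example, with the assumed coefficients an organization engineer who has annotated 100 items with 4 errors gets an authority value of 80.0, which falls into the second class under thresholds of (90, 60, 30).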
7. The privatized deployment data processing method for an enterprise according to claim 1, wherein the result export module lets the user export the annotation results of the tasks to be annotated online to local storage, the export being either manual or through an openAPI.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911071132.2A CN110826101B (en) | 2019-11-05 | 2019-11-05 | Privatization deployment data processing method for enterprise |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911071132.2A CN110826101B (en) | 2019-11-05 | 2019-11-05 | Privatization deployment data processing method for enterprise |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110826101A (en) | 2020-02-21 |
CN110826101B (en) | 2021-01-05 |
Family
ID=69552467
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911071132.2A Active CN110826101B (en) | 2019-11-05 | 2019-11-05 | Privatization deployment data processing method for enterprise |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110826101B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111553161A (en) * | 2020-04-28 | 2020-08-18 | 郑州大学 | Entity and relation labeling system for medical texts |
CN113591888A (en) * | 2020-04-30 | 2021-11-02 | 上海禾赛科技有限公司 | Point cloud data labeling network system and method for laser radar |
CN114036495A (en) * | 2022-01-11 | 2022-02-11 | 北京顶象技术有限公司 | Method and device for updating privatized deployment verification code system |
CN115248831A (en) * | 2021-04-28 | 2022-10-28 | 马上消费金融股份有限公司 | Labeling method, device, system, equipment and readable storage medium |
Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101872343A (en) * | 2009-04-24 | 2010-10-27 | 罗彤 | Semi-supervised mass data hierarchy classification method |
CN102571703A (en) * | 2010-12-23 | 2012-07-11 | 鸿富锦精密工业(深圳)有限公司 | Security control system and security control method for cloud data |
CN102799684A (en) * | 2012-07-27 | 2012-11-28 | 成都索贝数码科技股份有限公司 | Video-audio file catalogue labeling, metadata storage indexing and searching method |
CN103077236A (en) * | 2013-01-09 | 2013-05-01 | 公安部第三研究所 | System and method for realizing video knowledge acquisition and marking function of portable-type device |
CN103530282A (en) * | 2013-10-23 | 2014-01-22 | 北京紫冬锐意语音科技有限公司 | Corpus tagging method and equipment |
CN104917848A (en) * | 2015-07-03 | 2015-09-16 | 成都怡云科技有限公司 | Smart cloud platform for enterprises based on enterprise management and service |
CN106411857A (en) * | 2016-09-07 | 2017-02-15 | 河海大学 | Private cloud GIS service access control method based on virtual isolation mechanism |
US20170118279A1 (en) * | 2015-10-23 | 2017-04-27 | International Business Machines Corporation | Synchronizing proprietary data in an external cloud with data in a private storage system |
CN107153664A (en) * | 2016-03-04 | 2017-09-12 | 同方知网(北京)技术有限公司 | A kind of method flow that research conclusion is simplified based on the scientific and technical literature mark that assemblage characteristic is weighted |
CN107622056A (en) * | 2016-07-13 | 2018-01-23 | 百度在线网络技术(北京)有限公司 | The generation method and device of training sample |
CN108062341A (en) * | 2016-11-08 | 2018-05-22 | 中国移动通信有限公司研究院 | The automatic marking method and device of data |
WO2019005239A1 (en) * | 2017-06-27 | 2019-01-03 | Western Digital Technologies, Inc. | Hybrid data storage system with private storage cloud and public storage cloud |
CN109165293A (en) * | 2018-08-08 | 2019-01-08 | 上海宝尊电子商务有限公司 | A kind of expert data mask method and program towards fashion world |
CN109255044A (en) * | 2018-08-31 | 2019-01-22 | 江苏大学 | A kind of image intelligent mask method based on YOLOv3 deep learning network |
CN109389275A (en) * | 2017-08-08 | 2019-02-26 | 北京图森未来科技有限公司 | A kind of image labeling method and device |
CN109885648A (en) * | 2018-12-29 | 2019-06-14 | 清华大学 | Subtitle scene and speaker information automatic marking method and system based on drama |
CN109992763A (en) * | 2017-12-29 | 2019-07-09 | 北京京东尚科信息技术有限公司 | Language marks processing method, system, electronic equipment and computer-readable medium |
US20190251182A1 (en) * | 2018-02-12 | 2019-08-15 | International Business Machines Corporation | Extraction of information and smart annotation of relevant information within complex documents |
US20190325318A1 (en) * | 2018-04-18 | 2019-10-24 | Ants Technology (Hk) Limited | Method and system for learning in a trustless environment |
CN110555084A (en) * | 2019-08-26 | 2019-12-10 | 电子科技大学 | remote supervision relation classification method based on PCNN and multi-layer attention |
Non-Patent Citations (2)
Title |
---|
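Liu Peng et al.: "Data Annotation Engineering" (《数据标注工程》), 1 June 2019 * |
Cai Li et al.: "A Survey of Data Annotation Research" (数据标注研究综述), Journal of Software (《软件学报》) * |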
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111553161A (en) * | 2020-04-28 | 2020-08-18 | 郑州大学 | Entity and relation labeling system for medical texts |
CN111553161B (en) * | 2020-04-28 | 2022-11-18 | 郑州大学 | Entity and relation labeling system for medical texts |
CN113591888A (en) * | 2020-04-30 | 2021-11-02 | 上海禾赛科技有限公司 | Point cloud data labeling network system and method for laser radar |
CN115248831A (en) * | 2021-04-28 | 2022-10-28 | 马上消费金融股份有限公司 | Labeling method, device, system, equipment and readable storage medium |
CN115248831B (en) * | 2021-04-28 | 2024-03-15 | 马上消费金融股份有限公司 | Labeling method, labeling device, labeling system, labeling equipment and readable storage medium |
CN114036495A (en) * | 2022-01-11 | 2022-02-11 | 北京顶象技术有限公司 | Method and device for updating privatized deployment verification code system |
Also Published As
Publication number | Publication date |
---|---|
CN110826101B (en) | 2021-01-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110826101B (en) | Privatization deployment data processing method for enterprise | |
Jerry Fjermestad | Group support systems: A descriptive evaluation of case and field studies | |
CN109492981A (en) | The checking method and device of information | |
CN112333420B (en) | Big data information security management system of smart campus | |
CN110059978B (en) | Teacher evaluation system based on cloud computing auxiliary teaching evaluation | |
CN106327379A (en) | Mobile smart campus system based on IOT (Internet of Things) | |
CN112036166A (en) | Data labeling method and device, storage medium and computer equipment | |
CN115409658A (en) | Enterprise training post course matching method, system and storage medium | |
CN113850537B (en) | Multi-state mixed operation data management system | |
CN115221380A (en) | Method, system and platform for managing urban construction files in batches | |
CN107256515A (en) | The method of the financial integrated OCR identification softwares of cloud platform | |
CN106408470A (en) | Teaching quality evaluation device | |
Folkerts et al. | Analyzing sentiments of German job references | |
CN111862436A (en) | Digital campus management system for primary and middle schools | |
CN102722790A (en) | Human resource service system | |
CN109711799A (en) | Guide the teaching software and its operation method of the standardization office of administration hilllock | |
CN108805394A (en) | A kind of method and device of automatic management employee | |
CN108717674A (en) | System of examining for the levels on line and method of examining for the levels | |
CN114742412A (en) | Software technology service system and method | |
CN113642291A (en) | Method, system, storage medium and terminal for constructing logical structure tree reported by listed companies | |
Ebrahimzadeh Dastjerdi et al. | The effects of leader’s communication styles on tendency to change: A study on the effective inter-organizational conveyance and readiness for change | |
CN113610676B (en) | Computer teaching system of giving lessons based on cloud platform | |
Jintalan et al. | Organizational Culture and Job Satisfaction of Private School Teachers | |
CN116823555A (en) | Analysis report writing practical training method and system | |
CN115730005A (en) | Method and system for analyzing data standard difference |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||