CN112270532A - Data processing method and device, electronic equipment and storage medium - Google Patents

Data processing method and device, electronic equipment and storage medium

Info

Publication number
CN112270532A
Authority
CN
China
Prior art keywords
data
marked
auditing
labeled
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011261908.XA
Other languages
Chinese (zh)
Other versions
CN112270532B (en)
Inventor
杨雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011261908.XA priority Critical patent/CN112270532B/en
Publication of CN112270532A publication Critical patent/CN112270532A/en
Application granted granted Critical
Publication of CN112270532B publication Critical patent/CN112270532B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/101Collaborative creation, e.g. joint development of products or services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Testing And Monitoring For Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a data processing method and apparatus, an electronic device and a storage medium, relating to the technical field of artificial intelligence and further to fields such as deep learning. The specific implementation scheme is as follows: acquiring annotated data and the standard annotation data of the annotated data; and auditing the annotated data according to the annotation element position and/or the annotation element content, together with the standard annotation data. While data quality is guaranteed, data auditing efficiency is improved, labor cost is reduced, and a new approach to data auditing is provided.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to deep learning and autonomous driving technology, and specifically to a data processing method and apparatus, an electronic device and a storage medium.
Background
With the gradual deployment of artificial intelligence algorithms, algorithm research has become increasingly popular. As the fuel for training these algorithms, data quality plays a crucial role in algorithm accuracy. At present, however, data is mainly audited manually in order to ensure the quality of the output data, which is inefficient, incurs high labor cost, and urgently needs improvement.
Disclosure of Invention
The disclosure provides a data processing method, a data processing device, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided a data processing method, including:
acquiring annotated data and the standard annotation data of the annotated data;
and auditing the annotated data according to the annotation element position and/or the annotation element content, and the standard annotation data.
According to another aspect of the present disclosure, there is provided a data processing apparatus including:
a data acquisition module, configured to acquire annotated data and the standard annotation data of the annotated data;
and a data auditing module, configured to audit the annotated data according to the annotation element position and/or the annotation element content, and the standard annotation data.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a data processing method according to any of the embodiments of the present application.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the data processing method according to any one of the embodiments of the present application.
According to the technology of the application, while data quality is guaranteed, data auditing efficiency is improved, labor cost is reduced, and a new approach to data auditing is provided.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a flowchart of a data processing method provided according to an embodiment of the present application;
FIG. 2 is a flow chart of another data processing method provided according to an embodiment of the application;
FIG. 3 is a flow chart of yet another data processing method provided in accordance with an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 5 is a block diagram of an electronic device for implementing the data processing method according to the embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flowchart of a data processing method provided according to an embodiment of the present application. This embodiment is applicable to scenarios in which data needs to be processed, and in particular to auditing the data annotated by an annotator when annotators are being selected for a data annotation project. Optionally, data annotation projects may include, but are not limited to, obstacle recognition scenarios, target (e.g., vehicle) tracking scenarios, human key-point (e.g., face) recognition scenarios, OCR text recognition scenarios such as named entity recognition, and the like. The embodiment may be performed by a data processing apparatus, which may be implemented in software and/or hardware and may be integrated in an electronic device that carries the data processing function, such as a server device. As shown in fig. 1, the method includes:
s101, obtaining the marked data and the standard marking data of the marked data.
In this embodiment, the annotated data is obtained by annotating the data to be annotated. For example, for a vehicle tracking scenario, the data to be annotated may include several consecutive frames of images, and annotating each frame according to the annotation requirements yields the annotated data. Optionally, the type of the annotated data may be an image; furthermore, the data type may differ across annotation scenarios and may specifically include, but is not limited to, image, voice, text, video, web page, and the like. This embodiment is particularly suitable for scenarios in which the data type is image.
The standard annotation data of the annotated data may be obtained by having an authoritative person (such as an annotator with a high annotation level) annotate the data to be annotated. To ensure the accuracy of auditing the annotated data, the standard annotation data may also be obtained by having multiple people annotate the data to be annotated and fitting their answers into a single standard answer. Optionally, in this embodiment, the standard annotation data is used as the reference against which the annotated data is audited.
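Purely as an illustrative sketch (the coordinate-averaging rule, the box format and the function name below are assumptions, not something the application prescribes), fitting a standard answer from several annotators' boxes might look like this in Python:

    from statistics import mean

    def fit_standard_box(boxes):
        """Fit a standard bounding box from multiple annotators' boxes.

        Each box is (x_min, y_min, x_max, y_max). Coordinate-wise averaging is
        only one possible fitting rule; majority voting or expert adjudication
        could be used instead.
        """
        return tuple(mean(coord) for coord in zip(*boxes))

    # Example: three annotators label the same vehicle.
    annotator_boxes = [(10, 20, 110, 220), (12, 18, 108, 222), (11, 21, 112, 219)]
    standard_box = fit_standard_box(annotator_boxes)
    print(standard_box)  # (11, 19.67, 110, 220.33) up to rounding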
Optionally, in the scenario of selecting annotators, the data to be annotated is an examination paper; correspondingly, the annotated data is the examination answer, and the standard annotation data is the standard answer. One examination paper may correspond to one piece of data to be annotated; an examination paper may include one or more examination questions, and an examination question may include one or more frames of images.
For example, after a candidate annotator participating in the selection examination completes the data annotation, the annotated data may be submitted to a designated database; then, when it is determined that the review mode of the examination is automatic review, the annotated data is acquired, and the standard annotation data of the annotated data is acquired at the same time.
S102, auditing the annotated data according to the annotation element position and/or the annotation element content, and the standard annotation data.
In this embodiment, an annotation element is the annotation primitive used to mark up the data to be annotated, and may include, but is not limited to, a point, a straight line, a curve, a rectangular box, a cuboid (3D) box, and the like. The annotation element position is the position of the annotation element in the annotated data, such as the position of a point or of a line in the annotated data. The annotation element content includes information about the object annotated by the annotation element, which may include, but is not limited to, the object type (e.g., vehicle type), whether the object is a forward obstacle, and the like; the annotation element content may also include the number of annotation elements, the size of the annotation element, whether the annotated object is occluded, and so on.
When the standard annotation data is used to audit the annotated data, multi-dimensional information such as the annotation element position, the annotation element content and the number of annotation elements may be combined in order to guarantee audit quality. For example, the standard annotation data and the annotated data may be compared to check whether their annotation element positions are consistent and/or whether their annotation element contents are the same; as another example, they may be compared to check whether they contain the same number of annotation elements; alternatively, the comparison may check all three at once: whether the number of annotation elements is the same, whether the annotation element positions are consistent, and whether the annotation element contents are the same.
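As a minimal sketch of such a comparison, assuming a simple dict-based element representation, a pixel tolerance and order-based matching (all illustrative assumptions rather than the application's actual implementation):

    def audit_annotation(annotated, standard, pos_tol=3.0):
        """Audit one annotated image against its standard annotation.

        `annotated` and `standard` are lists of dicts, each with a 'position'
        (x, y) and a 'content' dict (e.g. object type, occlusion flag).
        Returns True only if element count, positions and contents all match.
        """
        # Element-count check.
        if len(annotated) != len(standard):
            return False
        # For simplicity elements are matched by order; a real audit would
        # match each annotated element to its nearest standard element.
        for elem, ref in zip(annotated, standard):
            # Position check: Euclidean distance within tolerance (in pixels).
            dx = elem["position"][0] - ref["position"][0]
            dy = elem["position"][1] - ref["position"][1]
            if (dx * dx + dy * dy) ** 0.5 > pos_tol:
                return False
            # Content check: annotated attributes must equal the standard ones.
            if elem["content"] != ref["content"]:
                return False
        return True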
It is worth noting that, in the prior art, annotated data is mainly audited manually, which incurs high labor cost and low efficiency; moreover, auditors' skill levels vary, so the quality of the final output data cannot be guaranteed. By combining the annotation element position and the annotation element content, the annotated data is audited automatically, which saves labor cost, improves audit efficiency, and guarantees data quality.
According to the technical scheme of this embodiment, the annotated data is audited automatically by combining the annotation element position and the annotation element content. Compared with the prior art, data auditing efficiency is improved and labor cost is reduced while data quality is guaranteed, and a new approach to data auditing is provided.
As an optional manner of this embodiment, after the annotated data is audited, the audit result of the annotator associated with the annotated data may be determined according to the audit result of the annotated data.
In this embodiment, the audit result of the annotated data may be represented as a numerical score, a grade, or the like.
Specifically, the audit result of the annotated data may be compared with a preset admission threshold; if the audit result of the annotated data exceeds the admission threshold, the audit result of the annotator associated with that data is determined to be a pass. Further, when there are at least two pieces of annotated data, the candidate annotators participating in the current examination may be sorted in descending order of the audit results of their annotated data, and a preset number of them may be selected from the sorting result as target annotators; the audit result of the target annotators is a pass, and the audit result of the other candidate annotators is a fail.
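A minimal sketch of this selection step, assuming an illustrative score dictionary, admission threshold and quota (none of which are fixed by the application):

    def select_annotators(audit_scores, admission_threshold=0.8, quota=3):
        """Select target annotators from candidates by their audit scores.

        `audit_scores` maps annotator id -> audit score of their annotated data.
        Candidates above the admission threshold are ranked in descending order
        and the top `quota` are admitted; everyone else fails the assessment.
        """
        qualified = {a: s for a, s in audit_scores.items() if s > admission_threshold}
        ranked = sorted(qualified, key=qualified.get, reverse=True)
        passed = set(ranked[:quota])
        return {a: ("pass" if a in passed else "fail") for a in audit_scores}

    # Example usage with hypothetical candidates.
    print(select_annotators({"a1": 0.95, "a2": 0.7, "a3": 0.9, "a4": 0.85, "a5": 0.82}))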
It should be noted that, in the annotator selection scenario, the current practice is to have dedicated auditors review the data after the candidate annotators finish annotating, which is costly and inefficient; moreover, because auditors' annotation levels vary, neither the ability level of the admitted annotators nor the data quality can be guaranteed. Auditing the annotated data automatically by combining the annotation element position and the annotation element content saves labor cost and improves audit efficiency; at the same time, the audit result of the annotated data is in essence a quantification of the annotator's annotation ability, so using it as the basis for selecting annotators guarantees the quality of the selected annotators and thus the data quality.
Fig. 2 is a flowchart of another data processing method provided according to an embodiment of the present application. On the basis of the above embodiments, this embodiment provides a way of auditing annotated data based on the annotation element position. As shown in fig. 2, the method includes:
s201, obtaining the marked data and the standard marking data of the marked data.
S202, determining a position evaluation index according to the annotation element type.
In this embodiment, the annotation element types may include, but are not limited to, point, line, box and region; the annotation element types involved in the annotated data may include at least one of point, line, box and region, and likewise the annotation elements in the standard annotation data include at least one of point, line, box and region. Optionally, if the annotation element type in the standard annotation data differs from that in the annotated data, the audit result of the annotated data may be directly determined to be 0.
If the annotation element type in the standard annotation data is the same as that in the annotated data, the position evaluation index can be determined according to the annotation element type. The position evaluation index is the metric used to assess the annotation element position, and different annotation element types may correspond to different position evaluation indexes. For example, if the annotation element type is point, the position evaluation index may include the point distance; if the type is line, the index may include the point distance (specifically, the distance between key points on the line) and/or the intersection-over-union ratio; if the type is box, the index may include the line spacing and/or the intersection-over-union ratio; and if the type is region, the index may include the intersection-over-union ratio and/or the number of region pixels.
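This mapping from element type to position evaluation index could be pictured with the following sketch, where the dictionary, its key names and the helper function are assumptions made only for illustration:

    # Hypothetical mapping from annotation element type to position evaluation indexes.
    POSITION_INDEXES = {
        "point":  ["point_distance"],
        "line":   ["point_distance", "iou"],      # key-point distance and/or IoU
        "box":    ["line_spacing", "iou"],        # max edge spacing and/or IoU
        "region": ["iou", "region_pixel_count"],  # IoU and/or number of region pixels
    }

    def position_indexes_for(element_type):
        """Return the position evaluation indexes for an annotation element type."""
        return POSITION_INDEXES.get(element_type, [])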
S203, auditing the annotated data according to the position evaluation index, the annotation element position and the standard annotation data.
Specifically, the standard annotation data and the annotated data are compared, and the value of the corresponding position evaluation index is computed from the annotation element positions in the standard annotation data and in the annotated data, from which the audit result of the annotated data is determined.
As an optional manner of this embodiment, S203 may specifically be: determining the value of the position evaluation index according to the annotation element position, the standard annotation data and the annotated data; and determining the audit result of the annotated data according to the value of the position evaluation index and a preset threshold. The threshold may be set according to the actual annotation scenario, for example as a default value; optionally, different position evaluation indexes correspond to different thresholds.
For example, if the annotation element type is point, the position evaluation index includes the point distance. Specifically, for each point in the annotated data, the corresponding point in the standard annotation data and its position are determined; the value of the point distance can then be determined from the position of the annotated point and the position of the corresponding standard point. In practice, the annotated point may be mapped into the standard annotation data according to its position, and the distance between the corresponding standard point and the mapped point, i.e., the value of the point distance, is computed in the standard annotation data. The point distance is then compared with a first threshold, e.g., 3 px; if the point distance exceeds the first threshold, the audit result of the point is an annotation error. Further, when the annotation element type only includes points and there are multiple points, the audit results of the individual points may be aggregated to obtain the audit result of the annotated data. For example, if there are 10 points and the audit result of 3 of them is an annotation error, the audit result of the annotated data is 7/10.
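For instance, a minimal Python sketch of this point-level audit (the 3 px threshold is taken from the example above; the function name and the index-based point matching are simplifying assumptions):

    import math

    def audit_points(annotated_points, standard_points, threshold_px=3.0):
        """Audit point annotations by point distance and return a score such as 7/10.

        Points are (x, y) tuples; here each annotated point is matched to the
        standard point with the same index, which is a simplification.
        """
        correct = 0
        for pt, ref in zip(annotated_points, standard_points):
            if math.dist(pt, ref) <= threshold_px:
                correct += 1
        return correct / len(standard_points)

    # 10 annotated points, 3 of which are off by more than 3 px -> score 0.7.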
If the annotation element type is line, the position evaluation index includes the point distance and/or the intersection-over-union ratio. Specifically, for each line in the annotated data, the corresponding line in the standard annotation data and its position are determined; the value of the point distance can be determined from the position of a key point on the annotated line (e.g., its midpoint) and the position of the corresponding key point on the standard line (e.g., its midpoint). In practice, the key point on the annotated line may be mapped into the standard annotation data according to its position, and the distance between the corresponding key point on the standard line and the mapped key point, i.e., the value of the point distance, is computed in the standard annotation data. The point distance is then compared with a second threshold, e.g., 10 px; if it exceeds the second threshold, the audit result of the line is an annotation error.
Alternatively, for each line in the annotated data, the corresponding line in the standard annotation data is determined; one region is constructed from several points on the annotated line, and another region is constructed from several points on the corresponding standard line; the value of the intersection-over-union ratio between the two constructed regions can then be determined from their positions; this value is compared with a third threshold, and if it is below the third threshold, the audit result of the line is an annotation error.
Alternatively, for each line in the annotated data, the audit result of the line may be determined according to both the intersection-over-union ratio and the point distance. Optionally, when the intersection-over-union ratio and the point distance are audited together, the final audit result of the line is a correct annotation only if each of the two indexes indicates a correct annotation.
Further, when the annotation element type only includes lines and there are multiple lines, the audit results of the individual lines may be aggregated to obtain the audit result of the annotated data.
If the annotation element type is box, the position evaluation index includes the line spacing and/or the intersection-over-union ratio. Specifically, for each box in the annotated data, the corresponding box in the standard annotation data and its position are determined; the value of the line spacing and/or the value of the intersection-over-union ratio can be determined from the position of the annotated box and the position of the corresponding standard box; the audit result of the box is then determined from these values and the corresponding thresholds. Here, the line spacing is the largest of the edge-to-edge spacings over all sides of the box. Optionally, when the line spacing and the intersection-over-union ratio are considered together, the final audit result of the box is a correct annotation only if each of the two indexes indicates a correct annotation.
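A sketch of the box audit, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max) and illustrative threshold values (the function names and numbers are assumptions, not prescribed by the application):

    def iou(box_a, box_b):
        """Intersection-over-union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
        ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
        iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
        inter = ix * iy
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    def audit_box(annotated, standard, max_line_spacing=5.0, min_iou=0.8):
        """A box passes only if both the line-spacing and the IoU criteria pass."""
        # Line spacing: the largest offset between corresponding edges of the two boxes.
        line_spacing = max(abs(a - s) for a, s in zip(annotated, standard))
        return line_spacing <= max_line_spacing and iou(annotated, standard) >= min_iou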
If the annotation element type is region, the position evaluation index includes the intersection-over-union ratio and/or the number of region pixels. Specifically, for each region in the annotated data, the corresponding region in the standard annotation data and its position are determined; the value of the intersection-over-union ratio and/or the value of the number of region pixels can be determined from the position of the annotated region and the position of the corresponding standard region; the audit result of the region is then determined from these values and the corresponding thresholds. Optionally, when the intersection-over-union ratio and the number of region pixels are considered together, the final audit result of the region is a correct annotation only if each of the two indexes indicates a correct annotation. For example, if the intersection-over-union ratio exceeds its threshold (e.g., 80%) and the number of region pixels exceeds its threshold (e.g., 70 px), the final audit result of the region is a correct annotation.
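Representing regions as boolean masks, the region audit might look like the following sketch; the mask representation, the use of NumPy, the thresholds, and reading the pixel count as the size of the overlap are all assumptions made for illustration:

    import numpy as np

    def audit_region(annotated_mask, standard_mask, min_iou=0.8, min_pixels=70):
        """A region passes only if both the mask IoU and the pixel-count criteria pass.

        `annotated_mask` and `standard_mask` are boolean arrays of the same shape.
        """
        inter = np.logical_and(annotated_mask, standard_mask).sum()
        union = np.logical_or(annotated_mask, standard_mask).sum()
        region_iou = inter / union if union > 0 else 0.0
        # Pixel count is taken here as the overlap between the annotated and
        # standard regions, which is only one possible reading of the index.
        return region_iou >= min_iou and inter >= min_pixels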
It should be noted that, when the annotation element content is not considered, the audit policy for the annotation element content can be treated as fully exempt by default; further, depending on the actual scenario, the audit policy for the annotation element content may be a partial exemption, such as a quantity exemption, i.e., the number of elements is not audited. Furthermore, when both the annotation element position and the annotation element content are considered, if either of them fails to satisfy its condition, the audit result of the annotation element is an annotation error. In addition, the audit policy for the annotation element position can likewise be set to full or partial exemption; for example, for an annotation scenario that focuses more on the annotation element content, the audit of the annotation element position can be relatively coarse.
It should be noted that, in this embodiment, different position evaluation indexes can be introduced for different annotation element types, and the annotated data is then audited based on these indexes, which provides an optional way of auditing the data and increases the flexibility of the scheme.
According to the technical scheme of this embodiment, a position evaluation index is introduced and the annotated data is audited based on it, which provides an optional way of auditing data. Moreover, different annotation element types can adopt different position evaluation indexes, which further improves the flexibility of the scheme.
Fig. 3 is a flowchart of yet another data processing method provided in an embodiment of the present application. On the basis of the above embodiments, this embodiment provides a way of auditing annotated data based on the data audit granularity. As shown in fig. 3, the method includes:
s301, the marked data and the standard marking data of the marked data are obtained.
S302, determining the data audit granularity according to the annotation data scenario.
In this embodiment, the annotation data scenarios may include, but are not limited to, obstacle recognition scenarios, target (e.g., vehicle) tracking scenarios, human key-point (e.g., face) recognition scenarios, OCR text recognition scenarios, and the like. The data audit granularity characterizes the granularity at which the annotated data is audited, and may include, but is not limited to, the element dimension, the image dimension and the question dimension, where the audit granularity of the question dimension is coarser than that of the image dimension, and that of the image dimension is coarser than that of the element dimension. The element dimension takes a single annotation element as the audit unit; the image dimension takes a single frame of image as the audit unit; the question dimension takes a single examination question as the audit unit and is mainly suitable for the case where one examination question includes multiple frames of images.
Optionally, different data audit granularities can be set for different annotation data scenarios. Furthermore, each annotation data scenario may have a default data audit granularity, and different data audit granularities may also be set within the same annotation data scenario.
Optionally, this embodiment may directly obtain a preset data audit granularity; if no data audit granularity has been set in advance, it can be set flexibly according to the annotation data scenario.
S303, auditing the annotated data according to the annotation element position and/or the annotation element content, the data audit granularity and the standard annotation data.
In the annotator selection scenario, the standard annotation data and the annotated data are compared. If the data audit granularity is the element dimension, the audit result of each annotation element is determined according to the annotation element position and/or the annotation element content. If the annotated data includes at least two examination questions and each question includes one frame of image, then, when each frame contains a single annotation element, the audit result of that element is the audit result of the frame; the audit results of the individual frames are then aggregated to obtain the audit result of the annotated data.
Furthermore, when each frame of image contains multiple annotation elements of the same type, the audit results of the annotation elements within the frame can be aggregated into the audit result of the frame, and the frame-level results can in turn be aggregated into the audit result of the annotated data. When each frame contains multiple annotation elements of different types, the audit results of the elements of each type within the frame are first aggregated into the audit result of that type for the frame; the per-type results are then aggregated into the audit result of the frame according to preset weights; finally, the frame-level results are aggregated into the audit result of the annotated data.
further, under the condition that the test questions comprise a plurality of frames of images, for each test question comprising a plurality of frames of images, the examination results of the frames of images in the test question can be aggregated to obtain the examination result of the test question; furthermore, the examination results of all examination questions can be aggregated to obtain the examination result of the labeled data.
If the data audit granularity is the image dimension, the audit result of each frame of image is determined according to the annotation element position and/or the annotation element content: if the audit result of any annotation element in the frame is an annotation error, the audit result of the whole frame is an annotation error. Furthermore, if the annotated data includes at least two examination questions and each question includes one frame of image, the audit results of the individual frames are aggregated to obtain the audit result of the annotated data.
Further, when an examination question includes multiple frames of images, the audit results of the frames within each such question can be aggregated into the audit result of the question, and the audit results of all questions can then be aggregated into the audit result of the annotated data.
In addition, if the data audit granularity is the question dimension, the audit result of each examination question is determined according to the annotation element position and/or the annotation element content: if the audit result of any annotation element in the question is an annotation error, the audit result of the question is an annotation error. Furthermore, if the annotated data includes at least two examination questions, the audit results of the questions are aggregated to obtain the audit result of the annotated data.
According to the technical scheme of this embodiment, a data audit granularity is introduced and the annotated data is audited based on it, which provides an optional way of auditing data. Moreover, different data annotation scenarios can adopt different data audit granularities, which further increases the flexibility of the scheme and meets the requirements of different scenarios.
Fig. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. The apparatus is applicable to scenarios in which data needs to be processed, and in particular to auditing the data annotated by an annotator when annotators are being selected for a data annotation project. Optionally, data annotation projects may include, but are not limited to, obstacle recognition scenarios, target (e.g., vehicle) tracking scenarios, human key-point (e.g., face) recognition scenarios, OCR text recognition scenarios such as named entity recognition, and the like. The apparatus may be implemented in software and/or hardware and may be integrated in an electronic device that carries the data processing function, such as a server device. The data processing apparatus 400 specifically includes:
a data acquisition module 401, configured to acquire annotated data and the standard annotation data of the annotated data;
and a data auditing module 402, configured to audit the annotated data according to the annotation element position and/or the annotation element content, and the standard annotation data.
According to the technical scheme of this embodiment, the annotated data is audited automatically by combining the annotation element position and the annotation element content. Compared with the prior art, data auditing efficiency is improved and labor cost is reduced while data quality is guaranteed, and a new approach to data auditing is provided.
Illustratively, the data auditing module 402 includes:
an index determining unit, configured to determine a position evaluation index according to the annotation element type;
and a data auditing unit, configured to audit the annotated data according to the position evaluation index, the annotation element position and the standard annotation data.
Illustratively, the data auditing unit is specifically configured to:
determine the value of the position evaluation index according to the annotation element position, the standard annotation data and the annotated data;
and determine the audit result of the annotated data according to the value of the position evaluation index and a preset threshold.
Illustratively, the data auditing module 402 is specifically configured to:
determine the data audit granularity according to the annotation data scenario;
and audit the annotated data according to the annotation element position and/or the annotation element content, the data audit granularity and the standard annotation data.
Illustratively, in this embodiment the annotated data type is an image, the annotation element type includes at least one of a point, a line, a box and a region, and the position evaluation index includes at least one of a point distance, a line spacing, an intersection-over-union ratio and a region pixel count.
Exemplarily, the apparatus further includes:
and an audit result determining module, configured to determine the audit result of the annotator associated with the annotated data according to the audit result of the annotated data.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 5, the electronic apparatus includes: one or more processors 501, a memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor 501 is taken as an example.
Memory 502 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the data processing method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the data processing method provided by the present application.
The memory 502, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules (e.g., the data acquisition module 401 and the data auditing module 402 shown in fig. 4) corresponding to the data processing method in the embodiments of the present application. The processor 501 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 502, that is, implements the data processing method in the above-described method embodiments.
The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device of the data processing method, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 502 may optionally include memory located remotely from the processor 501, which may be connected to the data processing method electronics over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the data processing method may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means; fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus of the data processing method, such as an input device of a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
According to the technical scheme of the embodiments of the application, the annotated data is audited automatically by combining the annotation element position and the annotation element content. Compared with the prior art, data auditing efficiency is improved and labor cost is reduced while data quality is guaranteed, and a new approach to data auditing is provided.
The application can be applied to the technical field of artificial intelligence, the discipline that studies how to enable computers to simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking and planning), and which covers both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology, and the like.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added or deleted. For example, the steps described in the present application may be executed in parallel, sequentially or in a different order, and the present application is not limited in this respect as long as the desired results of the technical solutions disclosed herein can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A data processing method, comprising:
acquiring annotated data and standard annotation data of the annotated data;
and auditing the annotated data according to the annotation element position and/or the annotation element content, and the standard annotation data.
2. The method of claim 1, wherein auditing the annotated data according to the annotation element position and the standard annotation data comprises:
determining a position evaluation index according to the annotation element type;
and auditing the annotated data according to the position evaluation index, the annotation element position and the standard annotation data.
3. The method of claim 2, wherein auditing the annotated data according to the position evaluation index, the annotation element position and the standard annotation data comprises:
determining the value of the position evaluation index according to the annotation element position, the standard annotation data and the annotated data;
and determining the audit result of the annotated data according to the value of the position evaluation index and a preset threshold.
4. The method of claim 1, wherein auditing the annotated data according to the annotation element position and/or the annotation element content, and the standard annotation data, comprises:
determining a data audit granularity according to the annotation data scenario;
and auditing the annotated data according to the annotation element position and/or the annotation element content, the data audit granularity and the standard annotation data.
5. The method of claim 2, wherein the annotated data type is an image, the annotation element type comprises at least one of a point, a line, a box and a region, and the position evaluation index comprises at least one of a point distance, a line spacing, an intersection-over-union ratio and a region pixel count.
6. The method of claim 1, further comprising:
determining the audit result of the annotator associated with the annotated data according to the audit result of the annotated data.
7. A data processing apparatus, comprising:
a data acquisition module, configured to acquire annotated data and standard annotation data of the annotated data;
and a data auditing module, configured to audit the annotated data according to the annotation element position and/or the annotation element content, and the standard annotation data.
8. The apparatus of claim 7, wherein the data auditing module comprises:
an index determining unit, configured to determine a position evaluation index according to the annotation element type;
and a data auditing unit, configured to audit the annotated data according to the position evaluation index, the annotation element position and the standard annotation data.
9. The apparatus of claim 8, wherein the data auditing unit is specifically configured to:
determine the value of the position evaluation index according to the annotation element position, the standard annotation data and the annotated data;
and determine the audit result of the annotated data according to the value of the position evaluation index and a preset threshold.
10. The apparatus of claim 7, wherein the data auditing module is specifically configured to:
determine a data audit granularity according to the annotation data scenario;
and audit the annotated data according to the annotation element position and/or the annotation element content, the data audit granularity and the standard annotation data.
11. The apparatus of claim 8, wherein the annotated data type is an image, the annotation element type comprises at least one of a point, a line, a box and a region, and the position evaluation index comprises at least one of a point distance, a line spacing, an intersection-over-union ratio and a region pixel count.
12. The apparatus of claim 7, further comprising:
an audit result determining module, configured to determine the audit result of the annotator associated with the annotated data according to the audit result of the annotated data.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the data processing method of any one of claims 1-6.
CN202011261908.XA 2020-11-12 2020-11-12 Data processing method, device, electronic equipment and storage medium Active CN112270532B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011261908.XA CN112270532B (en) 2020-11-12 2020-11-12 Data processing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011261908.XA CN112270532B (en) 2020-11-12 2020-11-12 Data processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112270532A true CN112270532A (en) 2021-01-26
CN112270532B CN112270532B (en) 2023-07-28

Family

ID=74340141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011261908.XA Active CN112270532B (en) 2020-11-12 2020-11-12 Data processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112270532B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113159123A (en) * 2021-03-17 2021-07-23 开易(北京)科技有限公司 Data annotation method, annotator assessment method and annotation result auditing method
CN113221999A (en) * 2021-05-06 2021-08-06 北京百度网讯科技有限公司 Method and device for obtaining accuracy of picture marking and electronic equipment
CN113284509A (en) * 2021-05-06 2021-08-20 北京百度网讯科技有限公司 Method and device for acquiring accuracy of voice annotation and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975980A (en) * 2016-04-27 2016-09-28 百度在线网络技术(北京)有限公司 Method of monitoring image mark quality and apparatus thereof
CN109033220A (en) * 2018-06-29 2018-12-18 北京京东尚科信息技术有限公司 Automatically selecting method, system, equipment and the storage medium of labeled data
CN109697537A (en) * 2017-10-20 2019-04-30 北京京东尚科信息技术有限公司 The method and apparatus of data audit
WO2019137196A1 (en) * 2018-01-11 2019-07-18 阿里巴巴集团控股有限公司 Image annotation information processing method and device, server and system
CN110188769A (en) * 2019-05-14 2019-08-30 广州虎牙信息科技有限公司 Checking method, device, equipment and the storage medium of key point mark
CN111080092A (en) * 2019-11-29 2020-04-28 北京云聚智慧科技有限公司 Data annotation management method and device, electronic equipment and readable storage medium
CN111860304A (en) * 2020-07-17 2020-10-30 北京百度网讯科技有限公司 Image labeling method, electronic device, equipment and storage medium
CN111860305A (en) * 2020-07-17 2020-10-30 北京百度网讯科技有限公司 Image annotation method and device, electronic equipment and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975980A (en) * 2016-04-27 2016-09-28 百度在线网络技术(北京)有限公司 Method of monitoring image mark quality and apparatus thereof
CN109697537A (en) * 2017-10-20 2019-04-30 北京京东尚科信息技术有限公司 The method and apparatus of data audit
WO2019137196A1 (en) * 2018-01-11 2019-07-18 阿里巴巴集团控股有限公司 Image annotation information processing method and device, server and system
CN109033220A (en) * 2018-06-29 2018-12-18 北京京东尚科信息技术有限公司 Automatically selecting method, system, equipment and the storage medium of labeled data
CN110188769A (en) * 2019-05-14 2019-08-30 广州虎牙信息科技有限公司 Checking method, device, equipment and the storage medium of key point mark
CN111080092A (en) * 2019-11-29 2020-04-28 北京云聚智慧科技有限公司 Data annotation management method and device, electronic equipment and readable storage medium
CN111860304A (en) * 2020-07-17 2020-10-30 北京百度网讯科技有限公司 Image labeling method, electronic device, equipment and storage medium
CN111860305A (en) * 2020-07-17 2020-10-30 北京百度网讯科技有限公司 Image annotation method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
常致富; 周风余; 王玉刚; 沈冬冬; 赵阳: "A Survey of Deep-Learning-Based Automatic Image Annotation Methods" (基于深度学习的图像自动标注方法综述), Journal of Shandong University (Engineering Science), no. 06, pages 29-39 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113159123A (en) * 2021-03-17 2021-07-23 开易(北京)科技有限公司 Data annotation method, annotator assessment method and annotation result auditing method
CN113221999A (en) * 2021-05-06 2021-08-06 北京百度网讯科技有限公司 Method and device for obtaining accuracy of picture marking and electronic equipment
CN113284509A (en) * 2021-05-06 2021-08-20 北京百度网讯科技有限公司 Method and device for acquiring accuracy of voice annotation and electronic equipment
CN113221999B (en) * 2021-05-06 2024-01-12 北京百度网讯科技有限公司 Picture annotation accuracy obtaining method and device and electronic equipment
CN113284509B (en) * 2021-05-06 2024-01-16 北京百度网讯科技有限公司 Method and device for obtaining accuracy of voice annotation and electronic equipment

Also Published As

Publication number Publication date
CN112270532B (en) 2023-07-28

Similar Documents

Publication Publication Date Title
CN112270532B (en) Data processing method, device, electronic equipment and storage medium
US20210287015A1 (en) Method and apparatus for vehicle re-identification, training method and electronic device
CN111598164B (en) Method, device, electronic equipment and storage medium for identifying attribute of target object
CN112270533A (en) Data processing method and device, electronic equipment and storage medium
CN112036509A (en) Method and apparatus for training image recognition models
CN111783645A (en) Character recognition method and device, electronic equipment and computer readable storage medium
CN113591580B (en) Image annotation method and device, electronic equipment and storage medium
CN110084289B (en) Image annotation method and device, electronic equipment and storage medium
CN111783760A (en) Character recognition method and device, electronic equipment and computer readable storage medium
CN112668586A (en) Model training method, image processing device, storage medium, and program product
CN111563541B (en) Training method and device of image detection model
CN112508003A (en) Character recognition processing method and device
CN112149741A (en) Training method and device of image recognition model, electronic equipment and storage medium
CN111858905A (en) Model training method, information identification method, device, electronic equipment and storage medium
CN112529180A (en) Method and apparatus for model distillation
CN111967490A (en) Model training method for map detection and map detection method
CN111753911A (en) Method and apparatus for fusing models
CN111507405A (en) Picture labeling method and device, electronic equipment and computer readable storage medium
CN112329732A (en) Model generation method and device, electronic equipment and storage medium
CN111767380B (en) Model self-adaptive retraining method and device, electronic equipment and storage medium
CN112270318A (en) Automatic scoring method and device, electronic equipment and storage medium
CN111783635A (en) Image annotation method, device, equipment and storage medium
CN112529181A (en) Method and apparatus for model distillation
CN112381167A (en) Method for training task classification model, and task classification method and device
CN111768007A (en) Method and apparatus for mining data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant