CN112270532B - Data processing method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112270532B
Authority
CN
China
Legal status
Active
Application number
CN202011261908.XA
Other languages
Chinese (zh)
Other versions
CN112270532A (en)
Inventor
杨雪
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011261908.XA
Publication of CN112270532A
Application granted
Publication of CN112270532B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/101 Collaborative creation, e.g. joint development of products or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The application discloses a data processing method, a data processing device, electronic equipment and a storage medium, and relates to the technical field of artificial intelligence, and further to fields such as deep learning. The specific implementation scheme is as follows: acquiring labeled data and standard labeling data of the labeled data; and auditing the labeled data according to the labeling element position and/or the labeling element content, and the standard labeling data. While ensuring data quality, this improves data auditing efficiency, reduces labor cost, and provides a new approach to data auditing.

Description

Data processing method, device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to deep learning and automatic driving technologies, and specifically relates to a data processing method, a data processing device, electronic equipment and a storage medium.
Background
As artificial intelligence algorithms are increasingly put into practical use, algorithm research has become a hot topic. As the fuel for training algorithms, data quality plays a vital role in algorithm accuracy. However, to ensure the quality of output data, data is currently checked mainly by hand, which is inefficient and labor-intensive, so improvement is needed.
Disclosure of Invention
The disclosure provides a data processing method, a data processing device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a data processing method, the method including:
acquiring labeled data and standard labeling data of the labeled data;
and auditing the labeled data according to the labeling element position and/or the labeling element content, and the standard labeling data.
According to another aspect of the present disclosure, there is provided a data processing apparatus comprising:
a data acquisition module, configured to acquire labeled data and standard labeling data of the labeled data;
and a data auditing module, configured to audit the labeled data according to the labeling element position and/or the labeling element content, and the standard labeling data.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of the embodiments of the present application.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the data processing method according to any of the embodiments of the present application.
According to this technology, data auditing efficiency is improved and labor cost is reduced while data quality is ensured, providing a new approach to data auditing.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are provided for a better understanding of the present solution and do not constitute a limitation of the present application. In the drawings:
FIG. 1 is a flow chart of a data processing method provided according to an embodiment of the present application;
FIG. 2 is a flow chart of another data processing method provided in accordance with an embodiment of the present application;
FIG. 3 is a flow chart of yet another data processing method provided in accordance with an embodiment of the present application;
FIG. 4 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 5 is a block diagram of an electronic device for implementing a data processing method of an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a flowchart of a data processing method according to an embodiment of the present application. This embodiment is applicable to data processing in general, and in particular to auditing the data labeled by annotators when selecting annotators for a data annotation project. Optionally, the data annotation project may include, but is not limited to, obstacle recognition scenes, object (e.g., vehicle) tracking scenes, human body key point (e.g., face) recognition scenes, OCR text recognition scenes, named entity recognition scenes, and the like. This embodiment may be performed by a data processing apparatus, which may be implemented in software and/or hardware and integrated in an electronic device carrying data processing functions, such as a server. As shown in FIG. 1, the method includes:
S101, marked data and standard marked data of the marked data are obtained.
In this embodiment, the labeled data is obtained by labeling the data to be labeled. For example, in a vehicle tracking scene, the data to be labeled may include consecutive multi-frame images, and each frame is labeled according to the labeling requirements to obtain the labeled data. Optionally, the labeled data may be images; more generally, the data type may differ across labeling scenes and may include, but is not limited to, images, voice, text, video, and web pages. This embodiment is particularly suited to scenes where the data type is image.
The standard labeling data of the labeled data can be obtained by having authoritative personnel (such as highly skilled annotators) label the data to be labeled. To further ensure audit accuracy, the standard labeling data may be produced by having multiple people label the same data and merging (fitting) their answers into one. Optionally, in this embodiment, the standard labeling data serves as the reference against which the labeled data is audited.
Optionally, in the annotator selection scene, the data to be labeled is the exam question, the corresponding labeled data is the candidate's answer, and the standard labeling data is the standard answer. One exam can correspond to one piece of data to be labeled; one exam can contain one or more questions, and one question can contain one or more images.
For example, after labeling is complete, the candidates taking the annotator selection exam can submit the labeled data to a designated database; when the exam audit mode is determined to be automatic auditing, the labeled data is retrieved, together with the standard labeling data of the labeled data.
S102, auditing the marked data according to the marked element positions and/or the marked element contents and the standard marked data.
In this embodiment, a labeling element is a labeling tool used to label the data to be labeled, and may include, but is not limited to, a point, a line, a curve, a box, and a 3D box. The labeling element position is the position of the labeling element in the labeled data, such as the position of a point or of a line. The labeling element content includes information about the object labeled by the labeling element, for example, but not limited to, the object type (e.g., a vehicle type) and whether it is a forward obstacle; it may also include the number of the labeling element, the size of the labeling element, whether the labeling element is occluded, and the like.
When auditing the labeled data against the standard labeling data, multiple dimensions such as labeling element position, labeling element content, and labeling element count can be combined to ensure audit quality. For example, the standard labeling data can be compared with the labeled data to check whether their labeling element positions are consistent and/or their labeling element contents are the same; or to check whether they contain the same number of labeling elements; or to check all three: element count, element positions, and element contents.
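The three comparison dimensions above (count, position, content) can be sketched as follows. This is an illustrative sketch only: the `Element` structure, pairing elements by list index, reducing each position to a single (x, y) anchor, and the 3 px tolerance are all assumptions, not part of the original scheme.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical labeling element: a type, a position, and content attributes."""
    kind: str        # e.g. "point", "line", "box", "region"
    position: tuple  # simplified here to a single (x, y) anchor
    content: dict = field(default_factory=dict)  # e.g. {"category": "car"}


def compare(labeled, standard, pos_tol=3.0):
    """Compare labeled elements with standard ones on count, position and content.

    Elements are paired by list index, an assumption that keeps the sketch short.
    """
    report = {"count_match": len(labeled) == len(standard), "pairs": []}
    for got, ref in zip(labeled, standard):
        dx = got.position[0] - ref.position[0]
        dy = got.position[1] - ref.position[1]
        report["pairs"].append({
            "position_ok": (dx * dx + dy * dy) ** 0.5 <= pos_tol,
            "content_ok": got.content == ref.content,
        })
    return report
```

For instance, a box labeled "car" at (10, 10) against a standard "truck" box at (11, 10) yields a matching count and position but a content mismatch.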
Notably, in the prior art, labeled data is mainly audited manually, which is costly and inefficient; moreover, auditors' skill levels vary, so the quality of the final data cannot be guaranteed. Automatically auditing the labeled data by combining labeling element positions and labeling element contents saves labor cost, improves audit efficiency, and ensures data quality.
According to the technical solution of this embodiment, the labeled data is audited automatically by combining labeling element positions and labeling element contents. Compared with the prior art, this improves data auditing efficiency and reduces labor cost while ensuring data quality, providing a new approach to data auditing.
As an optional implementation of this embodiment, after the labeled data is audited, the audit result of the annotator associated with the labeled data may be determined from the audit result of the labeled data.
In this embodiment, the audit result of the labeled data may be expressed as a numerical score or as a grade; this embodiment does not limit how it is expressed.
Specifically, the audit result of the labeled data can be compared with a set admission threshold; if it is greater than the threshold, the audit result of the annotator associated with the labeled data is determined to be a pass. Further, when there are at least two pieces of labeled data, the candidate annotators taking the exam can be sorted in descending order of the audit results of their labeled data, and a preset number of candidates can be selected from the top of the ranking as target annotators; the audit result of the target annotators is a pass, and that of the other candidates is a fail.
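The threshold comparison and ranking described above might be combined as in the sketch below. Applying the admission threshold first and then taking the top `quota` candidates is an assumption (the text presents them as two options); `scores`, `threshold` and `quota` are hypothetical names.

```python
def select_annotators(scores, threshold, quota):
    """Return {annotator: passed?} from per-annotator audit scores.

    threshold stands for the admission threshold; quota is the preset number
    of annotators to admit. Both names are hypothetical.
    """
    # Only candidates scoring above the admission threshold are eligible.
    eligible = [name for name, s in scores.items() if s > threshold]
    # Rank eligible candidates in descending order of score.
    ranked = sorted(eligible, key=lambda name: scores[name], reverse=True)
    admitted = set(ranked[:quota])
    return {name: name in admitted for name in scores}
```

With scores `{"a": 0.9, "b": 0.7, "c": 0.95, "d": 0.5}`, a threshold of 0.6 and a quota of 2, candidates "c" and "a" pass while "b" and "d" fail.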
In the annotator selection scene, annotators are currently audited by dedicated auditors after they finish labeling, which is costly and inefficient; moreover, because auditors' labeling skill levels vary, the ability of the admitted annotators cannot be guaranteed, and neither can data quality. Automatically auditing the labeled data by combining labeling element positions and labeling element contents saves labor cost and improves audit efficiency. At the same time, the audit result of the labeled data is essentially a quantification of the annotator's labeling ability; using it as the basis for selecting annotators ensures the quality of the selected annotators and, in turn, the data quality.
Fig. 2 is a flowchart of another data processing method according to an embodiment of the present application. The embodiment of the application provides a method for auditing marked data based on marked element positions on the basis of the embodiment. As shown in fig. 2, the method includes:
S201, marked data and standard marked data of the marked data are obtained.
S202, determining a position evaluation index according to the type of the labeling element.
In this embodiment, labeling element types may include, but are not limited to, points, lines, boxes, and regions; the labeling elements in the labeled data may be of at least one of these types, and likewise for the labeling elements in the standard labeling data. Optionally, if the labeling element type in the standard labeling data differs from that in the labeled data, the audit result of the labeled data may be directly determined as 0 or similar.
If the labeling element types in the standard labeling data and the labeled data are the same, the position evaluation index can be determined according to the labeling element type. A position evaluation index is an index used to evaluate the position of a labeling element. Optionally, different labeling element types may correspond to different position evaluation indexes. For example, if the labeling element type is a point, the position evaluation index may include the point distance; if it is a line, the index may include the point distance (specifically, the distance between key points on the lines) and/or the intersection over union (IoU); if it is a box, the index may include the line distance and/or the IoU; and if it is a region, the index may include the IoU and/or the region pixel count.
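The type-to-index mapping above, including the shortcut for mismatched types, can be written as a small lookup table. The identifiers are hypothetical; returning `None` stands for the case where the audit result may be set directly to 0.

```python
# Position evaluation indexes per labeling element type, following the text above.
POSITION_INDEXES = {
    "point": ("point_distance",),
    "line": ("point_distance", "iou"),  # point distance measured on key points
    "box": ("line_distance", "iou"),
    "region": ("iou", "pixel_count"),
}


def position_indexes(labeled_type, standard_type):
    """Return the indexes to evaluate, or None when the element types differ."""
    if labeled_type != standard_type:
        return None  # mismatched types: the audit result may be set to 0 directly
    return POSITION_INDEXES[labeled_type]
```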
S203, auditing the marked data according to the position evaluation index, the marked element position and the standard marked data.
Specifically, the standard labeling data and the labeled data are compared, and the numerical value of the corresponding position evaluation index is determined according to the positions of the labeling elements in the standard labeling data and the positions of the labeling elements in the labeled data, so that the auditing result of the labeled data is determined.
As an optional implementation of this embodiment, S203 may specifically be: determining the value of the position evaluation index according to the labeling element positions, the standard labeling data, and the labeled data; and determining the audit result of the labeled data according to the value of the position evaluation index and a set threshold. In this embodiment, the set threshold may be configured for the actual labeling scene, for example as a default value; optionally, different position evaluation indexes correspond to different set thresholds.
For example, if the labeling element type is a point, the position evaluation index includes the point distance. Specifically, for each point in the labeled data, the corresponding point in the standard labeling data and its position are determined, and the point distance is computed from the two positions: the point is mapped into the standard labeling data according to its position, and the distance between the mapped point and the corresponding standard point is the value of the point distance. This value is then compared with a first set threshold, such as 3 px; if it is greater than the first set threshold, the audit result of the point is a labeling error. Further, when the labeling elements are all points and there are multiple points, the audit results of the points can be aggregated into the audit result of the labeled data. For example, if there are 10 points and 3 of them are labeling errors, the audit result of the labeled data is 7/10.
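The point audit and its aggregation can be sketched as follows, assuming the i-th labeled point corresponds to the i-th standard point and both are in the same image coordinates (so mapping a point into the standard data is the identity). The function name is hypothetical; the 3 px default comes from the example above.

```python
import math


def audit_points(labeled, standard, threshold_px=3.0):
    """Return the fraction of points labeled correctly.

    labeled[i] is assumed to correspond to standard[i]; lists must be equal length.
    """
    if len(labeled) != len(standard):
        raise ValueError("point lists must be paired one-to-one")
    errors = 0
    for (x, y), (sx, sy) in zip(labeled, standard):
        # Euclidean point distance in pixels; above the threshold is a labeling error.
        if math.hypot(x - sx, y - sy) > threshold_px:
            errors += 1
    total = len(standard)
    return (total - errors) / total if total else 1.0
```

Reproducing the example in the text, ten points of which three are displaced by 5 px score 7/10.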
If the labeling element type is a line, the position evaluation index includes the point distance and/or the IoU. Specifically, for each line in the labeled data, the corresponding line in the standard labeling data and its position are determined; the point distance is then determined from the position of a key point on the line (e.g., its midpoint) and the position of the corresponding key point on the standard line. That is, the key point is mapped into the standard labeling data according to its position, and the distance between the mapped key point and the corresponding key point on the standard line is the value of the point distance. This value is compared with a second set threshold, such as 10 px; if it is greater than the second set threshold, the audit result of the line is a labeling error.
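A sketch of the key-point check for lines, assuming a line is given as a list of (x, y) points and taking its key point to be the midpoint of its two endpoints. Both the representation and the key-point choice are simplifying assumptions; the function names are hypothetical.

```python
import math


def midpoint(line):
    """Key point of a line: midpoint of its first and last points (assumption)."""
    (x1, y1), (x2, y2) = line[0], line[-1]
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def audit_line_by_point_distance(line, standard_line, threshold_px=10.0):
    """True if the key-point distance does not exceed the second set threshold."""
    mx, my = midpoint(line)
    sx, sy = midpoint(standard_line)
    return math.hypot(mx - sx, my - sy) <= threshold_px
```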
Alternatively, for each line in the labeled data, the corresponding line in the standard labeling data is determined; one region is constructed from multiple points on the line, and another region from multiple points on the corresponding standard line; the IoU of the two regions is computed from their positions and compared with a third set threshold. If the IoU is smaller than the third set threshold, the audit result of the line is a labeling error.
Alternatively, for each line in the labeled data, the audit result of the line may be determined from both the IoU and the point distance. Optionally, when both are checked, the final audit result of the line is correct as long as either the IoU or the point distance indicates that the line is labeled correctly.
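The region-based IoU check for lines, combined with the point-distance result by the "either criterion passes" rule, might look like the sketch below. It assumes the region is the polygon whose vertices are the line's points and approximates IoU by counting pixels whose centers fall inside each polygon. Collinear points give an empty region, so a real implementation might instead buffer the line; the 0.5 threshold and all names are hypothetical.

```python
import math


def point_in_polygon(x, y, poly):
    """Ray casting: True if (x, y) lies inside the polygon [(x, y), ...]."""
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(poly):
    """Set of integer pixels whose centers fall inside the polygon."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return {
        (px, py)
        for px in range(math.floor(min(xs)), math.floor(max(xs)) + 1)
        for py in range(math.floor(min(ys)), math.floor(max(ys)) + 1)
        if point_in_polygon(px + 0.5, py + 0.5, poly)
    }


def line_iou(line_points, standard_points):
    """Pixel-approximated IoU of the regions built from two lines' points."""
    a, b = rasterize(line_points), rasterize(standard_points)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def audit_line(line_points, standard_points, point_distance_ok, iou_threshold=0.5):
    """A line is correct if either the point distance or the IoU check passes."""
    return point_distance_ok or line_iou(line_points, standard_points) >= iou_threshold
```

Two 10 x 10 squares offset by half their width overlap on 50 of 150 pixels, giving an IoU of 1/3.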
Furthermore, under the condition that the labeling element type only comprises lines and the number of the lines is a plurality of lines, the auditing results of the lines can be aggregated to obtain the auditing results of the labeled data.
If the labeling element type is a box, the position evaluation index includes the line distance and/or the IoU. Specifically, for each box in the labeled data, the corresponding box in the standard labeling data and its position are determined; the line distance and/or the IoU are computed from the positions of the two boxes, and the audit result of the box is determined by comparing these values with set thresholds. The value of the line distance is the largest of the distances between corresponding sides of the two boxes. Optionally, when both the line distance and the IoU are considered, the final audit result of the box is correct as long as either of them indicates that the box is labeled correctly.
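A sketch of the box audit, assuming axis-aligned boxes given as (x1, y1, x2, y2), so that the distance between corresponding sides is the absolute difference of matching coordinates. The 5 px and 0.8 thresholds are hypothetical placeholders.

```python
def _area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def max_side_distance(a, b):
    """Line distance: largest offset between corresponding sides."""
    return max(abs(p - q) for p, q in zip(a, b))


def audit_box(box, standard_box, line_threshold=5.0, iou_threshold=0.8):
    """Correct if either the line distance or the IoU passes its threshold."""
    return (max_side_distance(box, standard_box) <= line_threshold
            or box_iou(box, standard_box) >= iou_threshold)
```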
If the labeling element type is a region, the position evaluation index includes the IoU and/or the region pixel count. Specifically, for each region in the labeled data, the corresponding region in the standard labeling data and its position are determined; the IoU and/or the region pixel count are computed from the positions of the two regions, and the audit result of the region is determined by comparing these values with set thresholds. Optionally, when both the IoU and the region pixel count are considered, the final audit result of the region is correct as long as either of them indicates that the region is labeled correctly. For example, if the IoU is greater than a set threshold (e.g., 80%) and the region pixel count is greater than a set threshold (e.g., 70 px), the final audit result of the region is correct.
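A sketch of the region audit with regions represented as sets of (x, y) pixels. The text does not define exactly which pixels the "region pixel count" counts; taking it as the number of overlapping pixels is an assumption. The 80% and 70 px defaults come from the example above.

```python
def audit_region(region, standard_region, iou_threshold=0.8, pixel_threshold=70):
    """Regions are sets of (x, y) pixels; correct if either criterion passes."""
    inter = len(region & standard_region)
    union = len(region | standard_region)
    iou = inter / union if union else 0.0
    # Assumption: "region pixel count" is taken as the number of overlapping pixels.
    return iou > iou_threshold or inter > pixel_threshold
```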
It should be noted that when labeling element content is not considered, the audit policy for labeling element content can default to full exemption; depending on the actual scene, the policy can instead be partial exemption, for example exempting the element number, i.e., not auditing the number. Further, when both labeling element position and labeling element content are considered, if either of them fails, the audit result of the labeling element is a labeling error. Likewise, the audit policy for labeling element position can be set to full or partial exemption; for example, in annotation scenes that focus more on labeling element content, the position audit can be relatively coarse.
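The exemption policies and the "both must pass" rule above can be sketched as a single function. Representing content as a dict, passing the position result as a boolean, and using `exempt_fields=None` to mean full content exemption are all illustrative assumptions.

```python
from typing import Optional, Set


def audit_element(position_ok: bool, labeled_content: dict, standard_content: dict,
                  position_exempt: bool = False,
                  exempt_fields: Optional[Set[str]] = None) -> bool:
    """Combine position and content checks under an exemption policy.

    exempt_fields=None means content auditing is fully exempt; a set lists
    only the content fields to skip, e.g. {"number"} for number exemption.
    """
    if exempt_fields is None:
        content_ok = True
    else:
        keys = (set(labeled_content) | set(standard_content)) - exempt_fields
        content_ok = all(labeled_content.get(k) == standard_content.get(k) for k in keys)
    position_ok = position_ok or position_exempt
    # The element is correct only if both the position and content checks pass.
    return position_ok and content_ok
```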
It is noted that, in this embodiment, different position evaluation indexes can be introduced for different labeling element types and the labeled data audited on that basis, which provides an optional way of auditing data and increases the flexibility of the solution.
According to the technical solution of this embodiment, introducing position evaluation indexes and auditing the labeled data on their basis provides an optional way of auditing data. Meanwhile, different labeling element types can use different position evaluation indexes, which further improves the flexibility of the solution.
Fig. 3 is a flowchart of yet another data processing method provided according to an embodiment of the present application. The embodiment of the application provides a method for auditing marked data based on data auditing granularity on the basis of the embodiment. As shown in fig. 3, the method includes:
S301, marked data and standard marked data of the marked data are obtained.
S302, determining data auditing granularity according to the annotation data scene.
In this embodiment, annotation data scenes may include, but are not limited to, obstacle recognition scenes, object (e.g., vehicle) tracking scenes, human body key point (e.g., face) recognition scenes, and OCR text recognition scenes. The data audit granularity characterizes the granularity at which the labeled data is audited, and may include, but is not limited to, the element dimension, the image dimension, and the question dimension, where the question dimension is coarser than the image dimension and the image dimension is coarser than the element dimension. The element dimension takes a single labeling element as the audit unit; the image dimension takes a single frame as the audit unit; and the question dimension takes a single question as the audit unit, which mainly applies when one exam question contains multiple frames.
Optionally, different annotation data scenarios may set different data audit granularities. Furthermore, each annotation data scene can have default data auditing granularity, and different data auditing granularities can be set in the same annotation data scene.
Optionally, the embodiment may directly obtain a preset data auditing granularity; under the condition that the data auditing granularity is not preset, the data auditing granularity can be flexibly set according to the labeling data scene.
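Choosing the granularity (a preset value if one exists, otherwise a per-scene default) might be sketched as below. The scene keys and their default granularities are hypothetical; the source does not specify which scene uses which default.

```python
from enum import IntEnum
from typing import Optional


class Granularity(IntEnum):
    """Audit granularities, ordered from finest to coarsest."""
    ELEMENT = 1
    IMAGE = 2
    QUESTION = 3


# Hypothetical per-scene defaults; the source does not specify them.
SCENE_DEFAULTS = {
    "obstacle_recognition": Granularity.IMAGE,
    "object_tracking": Granularity.QUESTION,
    "keypoint_recognition": Granularity.ELEMENT,
    "ocr": Granularity.ELEMENT,
}


def resolve_granularity(scene: str, preset: Optional[Granularity] = None) -> Granularity:
    """Use the preset granularity if any; otherwise fall back to the scene default."""
    if preset is not None:
        return preset
    return SCENE_DEFAULTS.get(scene, Granularity.ELEMENT)
```

Using `IntEnum` keeps the ordering in the text (question coarser than image, image coarser than element) comparable with `>`.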
S303, auditing the marked data according to the marked element positions and/or marked element contents, the data auditing granularity and the standard marked data.
In the annotator selection scene, the standard labeling data is compared with the labeled data. If the data audit granularity is the element dimension, the audit result of each labeling element is determined according to the labeling element position and/or the labeling element content. If the labeled data contains at least two exam questions, each question contains one frame, and each frame contains one labeling element, then the audit result of that labeling element is the audit result of the frame, and the audit results of the frames are aggregated into the audit result of the labeled data.
Further, when each frame contains multiple labeling elements of the same type, the audit results of the labeling elements in the frame can be aggregated into the audit result of the frame, and the audit results of all frames then aggregated into the audit result of the labeled data. When each frame contains multiple labeling elements of different types, the audit results of the elements of each type in the frame can first be aggregated into a per-type result for that frame; the per-type results can then be aggregated according to set weights into the audit result of the frame; and finally the audit results of all frames can be aggregated into the audit result of the labeled data.
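The element-dimension aggregation above (per type, then weighted across types, then across frames) can be sketched as follows. The text only says results are "aggregated"; using plain and weighted averages is an assumption, as are the function names and the example weights.

```python
from collections import defaultdict


def image_score(elements, type_weights):
    """Element-dimension score of one frame.

    elements: list of (element_type, is_correct) pairs.
    type_weights: hypothetical per-type weights, e.g. {"box": 0.7, "point": 0.3}.
    """
    by_type = defaultdict(list)
    for kind, ok in elements:
        by_type[kind].append(1.0 if ok else 0.0)
    # Average within each type, then take a weighted average over the types present.
    total_w = sum(type_weights[k] for k in by_type)
    return sum(type_weights[k] * sum(v) / len(v) for k, v in by_type.items()) / total_w


def data_score(frames, type_weights):
    """Average the per-frame scores to score the whole labeled data."""
    return sum(image_score(f, type_weights) for f in frames) / len(frames)
```

With equal weights, a frame with one correct box, one wrong box and one correct point scores 0.75; averaged with a frame containing a single correct box, the labeled data scores 0.875.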
Further, when an exam question contains multiple frames, for each such question the audit results of its frames can be aggregated into the audit result of the question, and the audit results of all questions can then be aggregated into the audit result of the labeled data.
If the data audit granularity is the image dimension, the audit result of each frame is determined according to the labeling element position and/or the labeling element content: if the audit result of any labeling element in the frame is a labeling error, the audit result of the frame is a labeling error. Further, if the labeled data contains at least two exam questions and each question contains one frame, the audit results of the frames are aggregated into the audit result of the labeled data.
Further, when an exam question contains multiple frames, for each such question the audit results of its frames can be aggregated into the audit result of the question, and the audit results of all questions can then be aggregated into the audit result of the labeled data.
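The image-dimension rule (one wrong element makes the whole frame wrong), followed by aggregation over frames and questions, might be sketched as below. Nesting the data as questions, then frames, then per-element booleans, and aggregating by averaging, are assumptions.

```python
def frame_ok(element_results):
    """Image dimension: a frame is wrong if any of its elements is wrong."""
    return all(element_results)


def image_dimension_score(questions):
    """questions: list of questions, each a list of frames, each a list of bools.

    Each question is scored as its fraction of correct frames, and the labeled
    data as the mean over questions (averaging is an assumption).
    """
    question_scores = [
        sum(frame_ok(f) for f in frames) / len(frames) for frames in questions
    ]
    return sum(question_scores) / len(question_scores)
```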
In addition, if the data audit granularity is the question dimension, the audit result of each exam question is determined according to the labeling element position and/or the labeling element content: if the audit result of any labeling element in the question is a labeling error, the audit result of the question is a labeling error. Further, if the labeled data contains at least two exam questions, the audit results of the individual questions are aggregated into the audit result of the labeled data.
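At the question dimension the same "any error fails the unit" rule is applied to a whole question. This sketch reuses the nested layout of questions, then frames, then per-element booleans (an assumption) and scores the labeled data as the fraction of correct questions.

```python
def question_dimension_score(questions):
    """Question dimension: a question is wrong if any element in any frame is wrong.

    questions: list of questions, each a list of frames, each a list of bools.
    Returns the fraction of questions labeled correctly.
    """
    ok = [all(all(frame) for frame in frames) for frames in questions]
    return sum(ok) / len(ok)
```

On the same input, this coarser granularity is stricter: the first question fails as a whole instead of earning partial credit for its correct frame.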
According to this technical solution, introducing a data auditing granularity and auditing the labeled data at that granularity provides an optional way of auditing data. Moreover, different data labeling scenes can use different auditing granularities, which further improves the flexibility of the solution and allows it to meet the requirements of different scenes.
Fig. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. This embodiment is applicable to data processing, and in particular to auditing the labeled data produced by annotators when annotators are being selected for a data labeling project. Optionally, data labeling projects may include, but are not limited to, obstacle recognition scenes, object (e.g., vehicle) tracking scenes, human key point (e.g., face) recognition scenes, OCR text recognition scenes, named entity recognition scenes, and the like. The embodiment may be performed by a data processing apparatus, which may be implemented in software and/or hardware and integrated in an electronic device that provides data processing functions, such as a server. The data processing apparatus 400 specifically includes:
a data acquisition module 401, configured to acquire labeled data and the standard labeling data of the labeled data; and
a data auditing module 402, configured to audit the labeled data according to the labeling element positions and/or labeling element contents and the standard labeling data.
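As a rough sketch of how the two modules could cooperate, the Python below assumes that each item's labeling is a dictionary from a hypothetical element id to an `Element` holding a position tuple and a content string, and that labeled and standard elements are matched by that id. The class names, the `position_ok` callback, and the rule that a missing element counts as a labeling error are illustrative assumptions, not part of the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Element:
    position: Tuple[float, ...]  # hypothetical: point or box coordinates
    content: str                 # hypothetical: class label or transcribed text


Labeling = Dict[str, Element]  # hypothetical element id -> element


class DataAcquisitionModule:
    """Returns an item's labeled data together with its standard labeling data."""

    def __init__(self, labeled: Dict[str, Labeling], standard: Dict[str, Labeling]):
        self._labeled = labeled
        self._standard = standard

    def acquire(self, item_id: str) -> Tuple[Labeling, Labeling]:
        return self._labeled[item_id], self._standard[item_id]


class DataAuditingModule:
    """Checks each standard element against the labeled element with the same id."""

    def __init__(self, position_ok: Callable[[Element, Element], bool],
                 check_position: bool = True, check_content: bool = True):
        self.position_ok = position_ok
        self.check_position = check_position
        self.check_content = check_content

    def audit(self, labeled: Labeling, standard: Labeling) -> Dict[str, bool]:
        results: Dict[str, bool] = {}
        for element_id, reference in standard.items():
            candidate = labeled.get(element_id)
            if candidate is None:
                results[element_id] = False  # assumed: a missing element is a labeling error
                continue
            ok = True
            if self.check_position:
                ok = ok and self.position_ok(candidate, reference)
            if self.check_content:
                ok = ok and candidate.content == reference.content
            results[element_id] = ok
        return results
```

The `check_position` and `check_content` flags mirror the "positions and/or contents" wording: either check can be switched off for scenes where only one of them matters.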
According to this technical solution, the labeled data is audited automatically by combining labeling element positions and labeling element contents. Compared with the prior art, this improves auditing efficiency and reduces labor cost while maintaining data quality, and offers a new approach to data auditing.
Illustratively, the data auditing module 402 includes:
an index determining unit, configured to determine a position evaluation index according to the labeling element type; and
a data auditing unit, configured to audit the labeled data according to the position evaluation index, the labeling element positions, and the standard labeling data.
The data auditing unit is specifically configured to:
determine the value of the position evaluation index according to the labeling element positions, the standard labeling data, and the labeled data; and
determine the audit result of the labeled data according to the value of the position evaluation index and a set threshold.
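One way to read the index determining unit and the threshold check is sketched below. The element-type-to-index table follows the pairing of point, line, box, and region with point distance, line distance, IoU, and region pixel count given for image data, but the `higher_is_better` flags, the reading of region pixel count as a count of differing pixels, and every threshold value are hypothetical placeholders chosen for illustration; a real project would tune them.

```python
from typing import Dict, Optional, Tuple

# element type -> (position evaluation index, whether larger values mean a better match)
POSITION_INDEX: Dict[str, Tuple[str, bool]] = {
    "point": ("point_distance", False),
    "line": ("line_distance", False),
    "box": ("iou", True),
    "region": ("region_pixel_count", False),  # assumed: number of differing pixels
}

# Hypothetical set thresholds, for illustration only.
DEFAULT_THRESHOLDS: Dict[str, float] = {
    "point_distance": 5.0,     # pixels
    "line_distance": 5.0,      # pixels
    "iou": 0.7,
    "region_pixel_count": 50,  # differing pixels
}


def determine_index(element_type: str) -> str:
    """Index determining unit: pick the position evaluation index for an element type."""
    return POSITION_INDEX[element_type][0]


def audit_position(element_type: str, value: float,
                   thresholds: Optional[Dict[str, float]] = None) -> bool:
    """Data auditing unit: compare the index value with its set threshold."""
    limits = DEFAULT_THRESHOLDS if thresholds is None else thresholds
    index, higher_is_better = POSITION_INDEX[element_type]
    limit = limits[index]
    return value >= limit if higher_is_better else value <= limit
```

The direction flag matters: an IoU passes when it is at least the threshold, while a distance passes when it is at most the threshold.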
Illustratively, the data auditing module 402 is specifically configured to:
determine a data auditing granularity according to the data labeling scene; and
audit the labeled data according to the labeling element positions and/or labeling element contents, the data auditing granularity, and the standard labeling data.
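The scene-dependent choice of granularity can be sketched as a lookup followed by grouping per-element outcomes into audit units. The scene names, the scene-to-granularity table, the `(question_id, frame_index, element_id, passed)` record layout, and the `"element"`/`"image"`/`"question"` labels are all hypothetical; the source states only that the granularity is determined from the data labeling scene.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

# Hypothetical record layout: (question_id, frame_index, element_id, passed)
ElementResult = Tuple[str, int, str, bool]

# Hypothetical scene -> granularity table; the source does not specify one.
SCENE_GRANULARITY: Dict[str, str] = {
    "obstacle_recognition": "image",
    "object_tracking": "question",  # a tracking question typically spans several frames
    "face_keypoint": "element",
}

# How each granularity identifies the audit unit a record belongs to.
_UNIT_KEY: Dict[str, Callable[[ElementResult], Hashable]] = {
    "element": lambda r: (r[0], r[1], r[2]),
    "image": lambda r: (r[0], r[1]),
    "question": lambda r: r[0],
}


def determine_granularity(scene: str, default: str = "image") -> str:
    return SCENE_GRANULARITY.get(scene, default)


def audit_at_granularity(results: Iterable[ElementResult],
                         granularity: str) -> Dict[Hashable, bool]:
    key_of = _UNIT_KEY[granularity]
    units: Dict[Hashable, bool] = {}
    for record in results:
        key = key_of(record)
        # A unit passes only while every element assigned to it passes.
        units[key] = units.get(key, True) and record[3]
    return units
```

For instance, a tracking scene would be audited per question, so a single failed element in any frame of question `q1` fails `q1` as a whole.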
In this embodiment, the labeled data type is image; the labeling element type includes at least one of a point, a line, a box, and a region; and the position evaluation index includes at least one of point distance, line distance, intersection over union (IoU), and region pixel count.
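The four position evaluation indexes could be computed as follows. Point distance (Euclidean) and IoU for axis-aligned boxes use their standard definitions. The source does not define line distance or region pixel count, so this sketch assumes that line distance is the mean distance between corresponding vertices of two polylines with equal vertex counts, and that region pixel count is the number of pixels covered by exactly one of the two regions. Both interpretations are assumptions.

```python
import math
from typing import Sequence, Set, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2
Pixel = Tuple[int, int]


def point_distance(p: Point, q: Point) -> float:
    # Euclidean distance between a labeled point and its standard point.
    return math.dist(p, q)


def line_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    # Assumption: mean distance between corresponding polyline vertices.
    if len(a) != len(b) or not a:
        raise ValueError("polylines must be non-empty with equal vertex counts")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def iou(a: Box, b: Box) -> float:
    # Intersection over union of two axis-aligned boxes.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def region_pixel_count(a: Set[Pixel], b: Set[Pixel]) -> int:
    # Assumption: pixels that belong to exactly one of the two regions.
    return len(a ^ b)
```

For example, boxes `(0, 0, 2, 2)` and `(1, 1, 3, 3)` overlap in one unit square out of a union of seven, so `iou` returns 1/7.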
Illustratively, the apparatus further comprises:
an audit result determining module, configured to determine the audit results of the annotators associated with the labeled data according to the audit results of the labeled data.
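Rolling item-level results up to annotators might look like the sketch below. It assumes that each audited item is recorded as a hypothetical `(annotator_id, passed)` pair and that an annotator is accepted when their pass rate reaches a configurable `pass_rate`. The source does not specify the aggregation rule; the ratio-and-threshold rule here is an illustrative choice that fits the annotator-selection scenario described for this apparatus.

```python
from typing import Dict, Iterable, Tuple


def annotator_results(records: Iterable[Tuple[str, bool]],
                      pass_rate: float = 0.9) -> Dict[str, Tuple[float, bool]]:
    """Map each annotator to (accuracy, accepted) from per-item audit results."""
    totals: Dict[str, int] = {}
    passed: Dict[str, int] = {}
    for annotator, ok in records:
        totals[annotator] = totals.get(annotator, 0) + 1
        passed[annotator] = passed.get(annotator, 0) + int(ok)
    summary: Dict[str, Tuple[float, bool]] = {}
    for annotator, total in totals.items():
        accuracy = passed[annotator] / total
        summary[annotator] = (accuracy, accuracy >= pass_rate)
    return summary
```

With records `[("alice", True), ("alice", True), ("bob", True), ("bob", False)]` and the default threshold, `alice` is accepted at accuracy 1.0 and `bob` is rejected at 0.5.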
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 is a block diagram of an electronic device for implementing the data processing method according to an embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present application described and/or claimed herein.
As shown in Fig. 5, the electronic device includes one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as needed. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory for displaying graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multiprocessor system). Fig. 5 takes one processor 501 as an example.
The memory 502 is the non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor, so that the at least one processor performs the data processing method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the data processing method provided by the present application.
As a non-transitory computer-readable storage medium, the memory 502 can store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the data processing method in the embodiments of the present application (e.g., the data acquisition module 401 and the data auditing module 402 shown in Fig. 4). By running the non-transitory software programs, instructions, and modules stored in the memory 502, the processor 501 executes the various functional applications and data processing of the server, that is, implements the data processing method in the above method embodiments.
The memory 502 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the electronic device for the data processing method, and the like. In addition, the memory 502 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 502 may optionally include memory located remotely from the processor 501, and such remote memory may be connected to the electronic device for the data processing method via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the data processing method may further include an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503, and the output device 504 may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 5.
The input device 503 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the data processing method; examples include a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, and joystick. The output device 504 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described herein can be realized in digital electronic circuitry, integrated circuit systems, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), the Internet, and blockchain networks.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS (virtual private server) services.
According to the technical solution of the embodiments of the present application, the labeled data is audited automatically by combining labeling element positions and labeling element contents. Compared with the prior art, this improves auditing efficiency and reduces labor cost while maintaining data quality, and offers a new approach to data auditing.
The solution can be applied in the field of artificial intelligence. Artificial intelligence is the study of enabling computers to simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graphs, and the like.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (12)

1. A data processing method, comprising:
acquiring labeled data and standard labeling data of the labeled data; and
auditing the labeled data according to a labeling element position and/or labeling element content and the standard labeling data;
wherein auditing the labeled data according to the labeling element position and/or labeling element content and the standard labeling data comprises: determining a data auditing granularity according to the data labeling scene; and auditing the labeled data according to the labeling element position and/or labeling element content, the data auditing granularity, and the standard labeling data.
2. The method of claim 1, wherein auditing the labeled data according to the labeling element position and the standard labeling data comprises:
determining a position evaluation index according to a labeling element type; and
auditing the labeled data according to the position evaluation index, the labeling element position, and the standard labeling data.
3. The method of claim 2, wherein auditing the labeled data according to the position evaluation index, the labeling element position, and the standard labeling data comprises:
determining a value of the position evaluation index according to the labeling element position, the standard labeling data, and the labeled data; and
determining an audit result of the labeled data according to the value of the position evaluation index and a set threshold.
4. The method of claim 2, wherein the labeled data type is image, the labeling element type comprises at least one of a point, a line, a box, and a region, and the position evaluation index comprises at least one of point distance, line distance, intersection over union, and region pixel count.
5. The method of claim 1, further comprising:
determining audit results of annotators associated with the labeled data according to the audit result of the labeled data.
6. A data processing apparatus comprising:
a data acquisition module, configured to acquire labeled data and standard labeling data of the labeled data; and
a data auditing module, configured to audit the labeled data according to a labeling element position and/or labeling element content and the standard labeling data;
wherein the data auditing module is specifically configured to: determine a data auditing granularity according to the data labeling scene; and audit the labeled data according to the labeling element position and/or labeling element content, the data auditing granularity, and the standard labeling data.
7. The apparatus of claim 6, wherein the data auditing module comprises:
an index determining unit, configured to determine a position evaluation index according to a labeling element type; and
a data auditing unit, configured to audit the labeled data according to the position evaluation index, the labeling element position, and the standard labeling data.
8. The apparatus of claim 7, wherein the data auditing unit is specifically configured to:
determine a value of the position evaluation index according to the labeling element position, the standard labeling data, and the labeled data; and
determine an audit result of the labeled data according to the value of the position evaluation index and a set threshold.
9. The apparatus of claim 7, wherein the labeled data type is image, the labeling element type comprises at least one of a point, a line, a box, and a region, and the position evaluation index comprises at least one of point distance, line distance, intersection over union, and region pixel count.
10. The apparatus of claim 6, further comprising:
an audit result determining module, configured to determine audit results of annotators associated with the labeled data according to the audit result of the labeled data.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the data processing method of any one of claims 1-5.
CN202011261908.XA 2020-11-12 2020-11-12 Data processing method, device, electronic equipment and storage medium Active CN112270532B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011261908.XA CN112270532B (en) 2020-11-12 2020-11-12 Data processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112270532A CN112270532A (en) 2021-01-26
CN112270532B true CN112270532B (en) 2023-07-28

Family

ID=74340141

Country Status (1)

Country Link
CN (1) CN112270532B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant