CN114461886A - Labeling method, labeling device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN114461886A
Authority
CN
China
Prior art keywords
data
page
annotation
labeling
annotated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210140418.7A
Other languages
Chinese (zh)
Inventor
张峰瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210140418.7A
Publication of CN114461886A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a labeling method, a labeling apparatus, an electronic device and a storage medium, and relates to the technical field of artificial intelligence, in particular to fields such as natural language processing and data labeling. The specific implementation is as follows: loading a search engine on a first annotation page to obtain a search page; inputting data to be annotated on the search page to obtain first data related to the data to be annotated; and performing data annotation according to the first data to obtain a target data tag corresponding to the data to be annotated. The method and apparatus can improve the accuracy of data annotation.

Description

Labeling method, labeling device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and more particularly, to the fields of natural language processing, data tagging, and the like.
Background
With the development of artificial intelligence technology, data annotation, which is an important link in the industrial chain of artificial intelligence technology, is becoming more and more important, and whether data annotation is accurate or not can affect the iteration period of model training and the precision of model training. There is a need for an accurate data annotation scheme, however, no effective solution exists in the related art.
Disclosure of Invention
The disclosure provides a labeling method, a labeling device, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided an annotation method, including:
loading a search engine on a first annotation page to obtain a search page;
inputting data to be annotated on the search page to obtain first data related to the data to be annotated; and
performing data annotation according to the first data to obtain a target data tag corresponding to the data to be annotated.
According to another aspect of the present disclosure, there is provided a labeling apparatus including:
a loading unit, configured to load a search engine on a first annotation page to obtain a search page;
a data processing unit, configured to input data to be annotated on the search page to obtain first data related to the data to be annotated; and
a data annotation unit, configured to perform data annotation according to the first data to obtain a target data tag corresponding to the data to be annotated.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided by any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method provided by any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by a processor, implement the method provided by any one of the embodiments of the present disclosure.
By adopting the method and the device, the search engine of the first labeled page can be loaded to obtain the search page, the data to be labeled is input into the search page, the first data related to the data to be labeled can be obtained, the data labeling is carried out according to the first data, the target data label corresponding to the data to be labeled is obtained, and therefore the accuracy of data labeling can be improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of a distributed cluster processing scenario according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart diagram of an annotation method according to an embodiment of the disclosure;
FIG. 3 is a schematic diagram of page preloading in an application example according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a component configuration supporting generation of an annotation page in an application example according to an embodiment of the disclosure;
FIG. 5 is a schematic diagram of another component configuration supporting generation of an annotation page in an application example according to an embodiment of the disclosure;
FIG. 6 is a schematic diagram of another component configuration supporting generation of an annotation page in an application example according to an embodiment of the disclosure;
FIGS. 7-9 are schematic diagrams of annotation pages corresponding to a plurality of annotation scenes, respectively, in an application example according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a component structure of a tagging device according to an embodiment of the disclosure;
FIG. 11 is a block diagram of an electronic device for implementing an annotation method of an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The term "at least one" herein means any one of a plurality of items or any combination of at least two of them; for example, "including at least one of A, B and C" may mean including any one or more elements selected from the set consisting of A, B and C. The terms "first" and "second" herein are used to refer to and distinguish between multiple similar technical terms, without implying a sequence or order and without limiting the number to two; for example, a first feature and a second feature indicate two types of features or two features, and there may be one or more first features and one or more second features.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Training data is the input to model training, and its quality has an important influence on the training result. Most current model training is supervised, that is, the training data carries data tags (obtained by annotating the training data in advance). For example, if the training data consists of pictures, a picture containing a cat can be given the data tag "cat" and a picture containing a dog can be given the data tag "dog"; the pictures can then be classified by these tags, so that model training iterates faster and tends toward accurate results. In other words, data annotation is the main way to improve the quality of training data, and its accuracy and efficiency directly determine the iteration cycle of model training.
Most current data annotation is manual. Taking text annotation as an example, annotation may be carried out in an Excel spreadsheet, but this approach has the following problems when annotating massive amounts of data:
1) it easily causes visual fatigue: after annotating for some time, it becomes laborious to keep track of which row of the Excel table holds the text being annotated;
2) the knowledge of the annotation user is limited: when the user is unsure which data tag applies to the data to be annotated, an accurate data tag can be given only after switching from the annotation page to other pages, or using additional auxiliary tools, to look up the answer, which costs extra query time.
In conclusion, manual annotation in Excel guarantees neither annotation accuracy nor annotation efficiency, which ultimately affects the accuracy and the iteration cycle of model training.
According to an embodiment of the present disclosure, FIG. 1 is a schematic diagram of a distributed cluster processing scenario. The annotation platform that implements data annotation may be built on a distributed cluster system; the distributed cluster system is only an example, used here to illustrate how data annotation may be performed on such a platform. As shown in FIG. 1, the distributed cluster system includes a plurality of nodes (e.g., server cluster 101, server 102, server cluster 103, server 104, and server 105). The server 105 may further be connected to electronic devices, such as a cell phone 1051 and a desktop 1052, and the nodes and the connected electronic devices may jointly perform one or more data annotation tasks. Optionally, multiple nodes in the distributed cluster system may execute multiple data annotation tasks in a data-parallel manner, or execute some of the subtasks of a single data annotation task in a data-parallel manner, so as to improve annotation efficiency. Optionally, after each round of data annotation is completed, data exchange (e.g., data synchronization) may be performed between the nodes.
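The data-parallel division of annotation subtasks described above can be illustrated with a minimal sketch. The round-robin policy, the function name `shard_tasks`, and the node names are assumptions made for illustration; the disclosure does not specify how tasks are distributed among nodes.

```python
from typing import Dict, List, Sequence


def shard_tasks(task_ids: Sequence[str], node_names: Sequence[str]) -> Dict[str, List[str]]:
    """Assign annotation subtasks to nodes in round-robin order (assumed policy)."""
    if not node_names:
        raise ValueError("at least one node is required")
    shards: Dict[str, List[str]] = {name: [] for name in node_names}
    for index, task_id in enumerate(task_ids):
        # Task i goes to node i modulo the number of nodes.
        shards[node_names[index % len(node_names)]].append(task_id)
    return shards


# Hypothetical node names loosely modeled on the reference numerals of FIG. 1.
shards = shard_tasks(["t0", "t1", "t2", "t3", "t4"], ["server_102", "server_104"])
```

Each node would then annotate its own shard, and the results could be synchronized after each round, as the paragraph above suggests.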
According to an embodiment of the present disclosure, a labeling method is provided, and fig. 2 is a schematic flow chart of the labeling method according to the embodiment of the present disclosure, and the method may be applied to a labeling apparatus, for example, the apparatus may be deployed in a single machine, multiple machines, or a cluster system, and may implement labeling and other processing in a case of being executed by a terminal, a server, or other processing devices. The terminal may be a User Equipment (UE), a mobile device, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the method may also be implemented by a processor calling computer readable instructions stored in a memory. As shown in fig. 2, the method is applied to any node or electronic device (mobile phone or desktop, etc.) in the distributed cluster system shown in fig. 1, and includes:
S201, loading a search engine on the first annotation page to obtain a search page.
S202, inputting data to be annotated on the search page to obtain first data related to the data to be annotated.
S203, performing data annotation according to the first data to obtain a target data tag corresponding to the data to be annotated.
In an example of S201 to S203, the first annotation page refers to the current annotation page. There may be one annotation page or several, and each annotation page contains data to be annotated. For some data to be annotated, the annotation user cannot determine the specific data tag (such as a classification tag) from his or her own technical knowledge. The search engine is therefore loaded on the current annotation page to obtain a search page, and the uncertain data to be annotated is input on the search page to obtain first data related to it (that is, auxiliary knowledge content that helps the annotation user understand and judge). The relevant knowledge is thus obtained in one stop, without switching from the current annotation page to other pages and without using other auxiliary tools. For example, the data to be annotated "Beijing weather" may be input on the search page to obtain first data that helps in understanding and judging the corresponding data tag, so that the data to be annotated is judged better and a more accurate data tag is obtained.
According to the embodiment of the disclosure, a search engine of a first labeling page can be loaded to obtain a search page, data to be labeled is input into the search page, first data related to the data to be labeled can be obtained, data labeling is performed according to the first data, and a target data tag corresponding to the data to be labeled is obtained, so that the accuracy of data labeling can be improved.
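Steps S201 to S203 can be sketched roughly as follows. This is only an illustration under stated assumptions: the endpoint `https://search.example.com/s`, the query parameter `q`, and the two callbacks standing in for the search request and for the annotation user's decision are all hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List
from urllib.parse import urlencode

SEARCH_URL = "https://search.example.com/s"  # hypothetical search endpoint


def build_search_page_url(query: str, base_url: str = SEARCH_URL) -> str:
    """S201: address of the search page embedded in the annotation page."""
    return f"{base_url}?{urlencode({'q': query})}"


@dataclass
class AnnotationResult:
    item: str
    search_url: str
    label: str


def annotate(
    item: str,
    fetch_related: Callable[[str], List[str]],
    choose_label: Callable[[str, List[str]], str],
) -> AnnotationResult:
    url = build_search_page_url(item)       # S201: load the search page
    first_data = fetch_related(url)         # S202: obtain the related "first data"
    label = choose_label(item, first_data)  # S203: decide the target data tag
    return AnnotationResult(item, url, label)


# Stubs: a real system would issue the search and let the annotation user decide.
result = annotate(
    "Beijing weather",
    fetch_related=lambda url: ["forecast snippet"],
    choose_label=lambda item, data: "good" if data else "none",
)
```

Keeping the search and the decision behind callbacks mirrors the point of the method: the related knowledge arrives inside the annotation flow rather than from a separate tool.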
In one embodiment, the method further includes: while data annotation is being performed on the first annotation page, preloading at least one second annotation page, where the at least one second annotation page includes pages to be annotated that come after the first annotation page in time order. This embodiment provides a page preloading mechanism: while data annotation is performed on the first annotation page (the current annotation page), the at least one second annotation page (one or more pages to be annotated after the current one) can be rendered in advance. Compared with loading the next page to be annotated in real time after each annotation page is finished, this improves annotation efficiency and avoids the increase in overall page rendering time that arises once a search engine is introduced into the annotation page to obtain the search page.
In one embodiment, preloading at least one second annotation page while data annotation is performed on the first annotation page includes: starting a multithreaded mode in which data annotation and page preloading are processed in parallel, performing data annotation on the first annotation page with a first thread, and preloading the at least one second annotation page with a second thread. This embodiment supports multithreaded parallel processing of data annotation and page preloading: the first thread handles data annotation on the first annotation page (the current annotation page) while the second thread preloads the at least one second annotation page (one or more pages to be annotated after the current one). Page rendering of the following annotation pages is thus completed in advance while the current page is being annotated, which improves annotation efficiency and avoids the increase in overall page rendering time that arises once a search engine is introduced into the annotation page to obtain the search page.
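A minimal sketch of the two-thread arrangement, assuming that the calling thread plays the role of the first (annotation) thread and a single worker thread plays the role of the second (preloading) thread. The class `Preloader`, the stand-in `render_page`, and the lookahead of two pages are illustrative assumptions, not the implementation of the disclosure.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List


def render_page(item: str) -> str:
    """Stand-in for rendering an annotation page together with its search results."""
    return f"<page for {item!r}>"


class Preloader:
    def __init__(self, items: List[str], lookahead: int = 2) -> None:
        self.items = list(items)
        self.lookahead = lookahead
        self._pool = ThreadPoolExecutor(max_workers=1)  # the "second thread"
        self._pending: Dict[int, Future] = {}

    def _schedule(self, index: int) -> None:
        # Submit each valid page at most once.
        if 0 <= index < len(self.items) and index not in self._pending:
            self._pending[index] = self._pool.submit(render_page, self.items[index])

    def open(self, index: int) -> str:
        """Return page `index` and queue the following pages for rendering."""
        self._schedule(index)
        for ahead in range(1, self.lookahead + 1):
            self._schedule(index + ahead)
        return self._pending[index].result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)


preloader = Preloader(["Beijing weather", "capital weather", "Shanghai weather"])
first_page = preloader.open(0)  # the calling thread annotates this page
preloader.close()
```

While the caller works on the page returned by `open(0)`, the worker thread is already rendering the next two pages, so a later `open(1)` would normally return without waiting.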
In one embodiment, performing data annotation according to the first data to obtain the target data tag corresponding to the data to be annotated includes: performing data annotation according to the first data to obtain at least one candidate data tag, and selecting the target data tag from the at least one candidate data tag by mouse click or by keyboard shortcut. In this embodiment, to complete data annotation more quickly, the target data tag is obtained directly by selection (mouse click or keyboard shortcut), which is faster and easier than typing the data tag manually and therefore improves annotation efficiency.
In one embodiment, the method further includes: displaying the at least one candidate data tag in the same display area; or displaying the at least one candidate data tag in a drop-down list. This embodiment offers several ways of displaying candidate data tags, either together in one display area or in a drop-down list, so that more annotation scenes are covered and annotation is faster and easier, which improves annotation efficiency.
In one embodiment, the method further includes: performing data abstraction on a plurality of annotation scenes to obtain second data, and obtaining, according to the second data, annotation components that can be reused across the plurality of annotation scenes. In this embodiment, reusable annotation components are abstracted by sorting out the data of a number of commonly used annotation scenes and then performing data abstraction. Each annotation component is independent and customizable, and in practical applications the components can be assembled into a corresponding annotation template according to the annotation requirements, the annotation scene, and so on.
In one embodiment, the method further includes: assembling the annotation components into an annotation template corresponding to the scene to be annotated, and obtaining, according to the annotation template, at least one page to be annotated that includes the first annotation page. In other words, the annotation template may yield multiple pages to be annotated, including but not limited to the first annotation page (the current annotation page). Developing a new annotation template for every annotation scene would be very costly. This embodiment supports custom configuration of annotation components so that they can be assembled into the corresponding annotation template as needed, which suits a variety of annotation scenes, provides reusability and generality, saves cost, and improves annotation efficiency.
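The idea of assembling reusable components into a scene-specific template might look like the sketch below. The component kinds (`static_text`, `search_page`, `label_choice`), the option keys, and the `assemble` helper are hypothetical names chosen for illustration; only the label names are taken from the application example later in this description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Component:
    kind: str
    options: Dict[str, object] = field(default_factory=dict)


@dataclass
class Template:
    scene: str
    components: List[Component]


# A small library of reusable components with default options (assumed).
LIBRARY: Dict[str, Component] = {
    "static_text": Component("static_text"),
    "search_page": Component("search_page", {"engine": "engine_1"}),
    "label_choice": Component("label_choice", {"labels": ["good", "weak", "bad", "none"]}),
}


def assemble(
    scene: str,
    kinds: List[str],
    overrides: Optional[Dict[str, Dict[str, object]]] = None,
) -> Template:
    """Build a template from library components, applying per-scene custom options."""
    overrides = overrides or {}
    parts = []
    for kind in kinds:
        base = LIBRARY[kind]
        # Copy the defaults so that customizing one scene never alters the library.
        parts.append(Component(kind, {**base.options, **overrides.get(kind, {})}))
    return Template(scene, parts)


relevance = assemble(
    "query pair relevance",
    ["static_text", "search_page", "label_choice"],
    {"label_choice": {"labels": ["strong correlation", "weak correlation", "irrelevant", "none"]}},
)
```

A new annotation scene then needs only a new call to `assemble` with different options, rather than a newly developed template.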
The following is an example description of the labeling method provided by the embodiment of the present disclosure.
Text annotation can currently be handled by the following two schemes:
1) Data annotation based on Excel tables: the data to be annotated is stored in an Excel table and annotated row by row by an annotation user. This approach is relatively flexible and supports classification and component labeling at low cost. However, visual fatigue sets in easily when facing massive amounts of data, and knowledge about the text being annotated has to be looked up with additional tools, which costs extra time.
2) Web page annotation: an open-source annotation service is set up, the data to be annotated is submitted to the open-source platform in a preset format, and the data is annotated item by item by opening a web page link. With this approach, extending to a new annotation scene requires developing and releasing a new annotation template on top of the established open-source annotation service, and knowledge still has to be looked up with additional tools, which costs extra time.
In this application example, a search engine is introduced into the annotation page, page preloading is implemented, the differing annotation habits of different annotation users are accommodated, and reusable (customizable) annotation components are provided, which improves both annotation accuracy and annotation efficiency. The example mainly includes the following parts:
Firstly, introducing the search engine
A search engine is loaded on the annotation page to obtain a search page. The aim is to meet the annotation user's need to search for annotation-related knowledge during annotation within the annotation page itself, so that annotation is not interrupted: the annotation user need not leave the current annotation page or use additional tools to obtain that knowledge.
Secondly, implementing page preloading
Before the search page is added, the annotation page contains only embedded basic components, and rendering the whole page takes milliseconds. After the search page is introduced, seeing the complete search result takes more than 1 s overall, so annotating a single piece of data also takes more than that time. In other words, once the search engine is introduced into the annotation page, waiting for the data in the search page markedly increases overall page rendering time and adds overhead to data annotation. Page preloading is needed to avoid this problem.
FIG. 3 is a schematic diagram of page preloading in an application example according to an embodiment of the present disclosure. As shown in FIG. 3, while the n-th piece of data is being annotated on the current annotation page, the annotation pages for pieces n+1 and n+2 can be rendered in advance; when annotation of the n-th piece is complete and the current annotation page switches to n+1, the annotation pages for n+2 and n+3 can be rendered in advance, and so on, so that the annotation user does not perceive the page loading process while annotating. In addition, to reduce the waiting time for search results and improve annotation efficiency, data annotation and page preloading may be processed in parallel by multiple threads, and during preloading the text being annotated by the annotation user may be used by default as the search query, so that annotation accuracy can be checked against the search results.
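The sliding window of FIG. 3 reduces to simple index arithmetic, sketched below. The function name `pages_to_prerender` and the zero-based indexing are assumptions made for illustration; the lookahead of two pages follows the n+1 and n+2 example in the text.

```python
from typing import List


def pages_to_prerender(current: int, total: int, lookahead: int = 2) -> List[int]:
    """Indices of the pages to render in advance while page `current` is annotated."""
    first = current + 1
    last = min(current + lookahead, total - 1)  # never run past the final page
    return list(range(first, last + 1))


window_at_start = pages_to_prerender(0, 10)  # pages 1 and 2
window_after_switch = pages_to_prerender(1, 10)  # pages 2 and 3
```

When the annotation user moves from page n to page n+1, page n+2 is already in the previous window, so only page n+3 is new work for the preloader.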
Thirdly, matching the differing annotation habits of different annotation users
In terms of annotation speed, selecting a tag is usually faster than typing it manually, and once the annotation user is accustomed to it, selecting by keyboard shortcut is faster than selecting by mouse. Data annotation can therefore be optimized by supporting selection by keyboard shortcut in addition to selection by mouse.
FIG. 4 is a schematic diagram of a component configuration supporting generation of an annotation page in an application example according to an embodiment of the present disclosure. For example, in the custom tag settings of the component shown in FIG. 4, the data tags (labels) may be "good", "weak", "bad" and "none", and, to facilitate processing by a computer, the values stored in the computer for the data tags are "1", "0.5", "0". To avoid the visual fatigue that annotating massive amounts of data causes for the annotation user, a different color, such as green, red, blue or yellow, can be set for each data tag, and a different keyboard shortcut, such as "a", "s", "d" or "f", can be set for each data tag according to the annotation user's habits. Based on the configuration shown in FIG. 4, the annotation user may annotate data by keyboard shortcut selection or by mouse selection, according to personal habit.
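A tag configuration of the kind shown in FIG. 4 could be modeled as in the sketch below. The pairing of tags with values, colors and shortcuts simply follows the order in which they are listed in the text and is an assumption; in particular, the text gives only three stored values for four tags, so the value of "none" is left unset here. The names `LabelOption`, `build_keymap` and `on_key` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LabelOption:
    name: str
    value: Optional[float]  # stored value; None where the text gives no value
    color: str
    shortcut: str


# Pairings assumed from listing order in the text, not confirmed by the source.
LABELS: List[LabelOption] = [
    LabelOption("good", 1.0, "green", "a"),
    LabelOption("weak", 0.5, "red", "s"),
    LabelOption("bad", 0.0, "blue", "d"),
    LabelOption("none", None, "yellow", "f"),
]


def build_keymap(options: List[LabelOption]) -> Dict[str, LabelOption]:
    """Map each shortcut key to its tag, rejecting duplicate shortcuts."""
    keymap: Dict[str, LabelOption] = {}
    for option in options:
        if option.shortcut in keymap:
            raise ValueError(f"shortcut {option.shortcut!r} is assigned twice")
        keymap[option.shortcut] = option
    return keymap


KEYMAP = build_keymap(LABELS)


def on_key(key: str) -> Optional[LabelOption]:
    """Return the tag selected by a key press, or None for an unmapped key."""
    return KEYMAP.get(key.lower())


selected = on_key("s")
```

Because shortcuts are plain data, each annotation user could be given a different key assignment without changing the selection logic.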
Fourthly, reusable annotation components (customizable)
Reusable components are extracted by performing data abstraction on a number of common annotation scenes, and fully customized configuration of the components is supported, which makes the components more reusable. When an annotation scene is added, the components only need to be assembled: assembly yields the required annotation template, and annotation pages are generated from that template. No new annotation template has to be developed, which reduces development cost.
FIGS. 5-6 are schematic diagrams of other component configurations supporting generation of an annotation page in an application example according to an embodiment of the present disclosure. As shown in FIG. 5, at least one candidate data tag may be displayed in the same display area; as shown in FIG. 6, at least one candidate data tag may also be displayed in the form of a drop-down list. FIGS. 5-6 suit annotation scenes for complex problems: if the data tag corresponding to the data to be annotated cannot be determined from data tags such as "good", "weak", "bad" and "none" alone, at least one predefined candidate data tag is used to assist. Auxiliary identification can be applied not only to data to be annotated in text form but also to link content related to the data to be annotated that is obtained by URL. If auxiliary identification is needed for link content obtained from a plurality of URLs, "splicing URL" may be selected from the drop-down list; if it is needed for link content obtained from one URL, "URL may be acquired in the data set" may be selected from the drop-down list. FIGS. 5-6 are merely examples of components; in addition to the components listed above, a static text presentation component, a text editing component, and the like may be used.
FIGS. 7-9 are schematic diagrams of annotation pages respectively corresponding to a plurality of annotation scenes in an application example according to an embodiment of the present disclosure. FIG. 7 includes an annotation page 701 loaded with a first search engine, an annotation page 703 loaded with a second search engine, and a page 702 for executing the data annotation task. The data annotation task is executed for the same data to be annotated, "Beijing weather", and data annotation is performed according to the different first data obtained from the different search engines. After judging in combination with the different first data obtained from the different search engines, the annotation user selects a candidate data tag; for example, the data tag "positive direction" corresponding to the data to be annotated is selected from "positive direction", "flat state", "negative direction" and "none". To avoid visual fatigue, the data tag "positive direction" may also be highlighted in a color different from the other tags.
FIG. 8 includes an annotation page 801 loaded with a first search engine and a page 802 for executing the data annotation task. Besides the plain text data "Beijing weather", the data to be annotated may be link content related to the data to be annotated that is obtained by URL. If the link content is used as new data to be annotated, "https://m.xxx.com/beijing weather", the data annotation task is executed for this new data to be annotated, and data annotation is performed according to the first data obtained from the first search engine. After judging according to the first data obtained from the first search engine, the annotation user selects the data tag "good" corresponding to the data to be annotated from candidate data tags such as "good", "weak", "bad" and "none". To avoid visual fatigue, the data tag "good" may also be highlighted in a color different from the other tags. Further, if the problem is so complicated that the data tag alone does not convey its exact meaning, an auxiliary data tag, such as "correlation-recall", may be selected to identify the meaning of the data tag.
FIG. 9 includes an annotation page 901 loaded with a first search engine, an annotation page 903 loaded with a second search engine, and a page 902 for executing the data annotation task. The data annotation task is executed for different data to be annotated, such as "Beijing weather" in the first search engine and "capital weather" in the second search engine, and data annotation is performed according to the different first data obtained from the different search engines. After judging in combination with the different first data obtained from the different search engines, the annotation user selects a candidate data tag; for example, the data tag "strong correlation" corresponding to the two pieces of data to be annotated, "Beijing weather" and "capital weather", is selected from "strong correlation", "weak correlation", "irrelevant" and "none". To avoid visual fatigue, the data tag "strong correlation" may also be highlighted in a color different from the other tags.
This application example addresses the low efficiency of manual text labeling and improves both the accuracy and the efficiency of data annotation. It can be applied to an annotation platform to support various annotation scenarios, such as labeling model training data, labeling data for content classification, labeling data for product effect evaluation, and labeling data for better/worse comparative evaluation. Taking a binary classification labeling scenario as an example, the average labeling time for a single piece of data is 0.7 s, and the efficiency is improved by 3 times compared with traditional labeling in Excel. With this improvement, the same manpower can label more data, or the same data can be labeled with less manpower. Efficient labeling in turn shortens the model iteration cycle, so that model training produces results faster and better.
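To make the reported figures concrete, the short calculation below converts the 0.7 s average into hourly throughput. It assumes that "improved by 3 times" means three times the baseline throughput, which the text does not state explicitly; the function names are illustrative only.

```python
def items_per_hour(seconds_per_item: float) -> float:
    """Convert an average per-item labeling time into hourly throughput."""
    return 3600.0 / seconds_per_item


def baseline_seconds(seconds_per_item: float, speedup: float) -> float:
    """Per-item time of the baseline, ASSUMING `speedup` is a throughput ratio."""
    return seconds_per_item * speedup


# Figures taken from the text: 0.7 s per item, 3x improvement.
platform_rate = items_per_hour(0.7)          # roughly 5143 items per hour
excel_time = baseline_seconds(0.7, 3.0)      # roughly 2.1 s per item under the assumption
excel_rate = items_per_hour(excel_time)      # roughly 1714 items per hour
```

Under that reading, one annotator working for an hour would label on the order of 3,400 more items than with the spreadsheet baseline.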
According to an embodiment of the present disclosure, a labeling apparatus is provided. Fig. 10 is a schematic structural diagram of the labeling apparatus according to an embodiment of the present disclosure. As shown in fig. 10, the labeling apparatus 1000 includes: a loading unit 1001, configured to load a search engine in a first annotation page to obtain a search page; a data processing unit 1002, configured to input data to be annotated on the search page and obtain first data related to the data to be annotated; and a data labeling unit 1003, configured to perform data annotation according to the first data to obtain a target data tag corresponding to the data to be annotated.
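The division into three units can be sketched as three small functions. This is only an illustration under stated assumptions: the search engine is replaced by a hypothetical in-memory function `fake_engine`, the annotator's judgment by a callback, and none of the names below come from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a search engine: maps a query to result snippets.
SearchFn = Callable[[str], List[str]]


@dataclass
class SearchPage:
    """Result of the loading unit: a page bound to one search engine."""
    engine_name: str
    search: SearchFn


def load_search_page(engine_name: str, search: SearchFn) -> SearchPage:
    """Loading unit: load the search engine into the annotation page."""
    return SearchPage(engine_name, search)


def fetch_first_data(page: SearchPage, to_annotate: str) -> List[str]:
    """Data processing unit: submit the data to be annotated, get related first data."""
    return page.search(to_annotate)


def annotate(first_data: List[str], decide: Callable[[List[str]], str]) -> str:
    """Data labeling unit: `decide` stands in for the annotating user's judgment."""
    return decide(first_data)


def fake_engine(query: str) -> List[str]:
    # Canned results so the sketch runs offline.
    return [f"{query}: sunny, 25C"] if "weather" in query else []


page = load_search_page("engine-1", fake_engine)
first_data = fetch_first_data(page, "Beijing weather")
tag = annotate(first_data, lambda results: "good" if results else "none")
```

Keeping the search function as a parameter means the same flow could serve the two-engine case of fig. 9 by loading a second `SearchPage`.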
In an embodiment, the apparatus further includes a page preloading unit, configured to perform page preloading on at least one second annotation page while the data annotation is performed on the first annotation page. The at least one second annotation page comprises a page to be annotated that follows the first annotation page in time order.
In one embodiment, the page preloading unit is configured to start a multithreading mode in which the data annotation and the page preloading are processed in parallel; perform the data annotation on the first annotation page using a first thread; and perform the page preloading on the at least one second annotation page using a second thread.
In one embodiment, the data labeling unit is configured to perform data annotation according to the first data to obtain at least one candidate data tag, and to select the target data tag from the at least one candidate data tag by mouse click or keyboard shortcut.
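The two-thread arrangement can be sketched with Python's standard `threading` and `queue` modules. This is a minimal sketch, not the disclosed implementation: `load_page` is a hypothetical loader, the label assignment is a placeholder for the annotating user's choice, and a browser-based platform would likely use a different concurrency mechanism.

```python
import queue
import threading
from typing import Dict, List


def load_page(page_id: str) -> str:
    """Hypothetical page loader; a real one would fetch and render the page."""
    return f"<content of {page_id}>"


def preload_worker(page_ids: List[str], ready: queue.Queue) -> None:
    """Second thread: load upcoming pages in time order and hand them over."""
    for page_id in page_ids:
        ready.put((page_id, load_page(page_id)))


def annotate_all(page_ids: List[str]) -> Dict[str, str]:
    """First thread (the caller): annotate each page while later pages preload."""
    if not page_ids:
        return {}
    ready: queue.Queue = queue.Queue()
    # The first page is loaded directly; the remaining pages are preloaded.
    ready.put((page_ids[0], load_page(page_ids[0])))
    worker = threading.Thread(target=preload_worker, args=(page_ids[1:], ready))
    worker.start()
    labels: Dict[str, str] = {}
    for _ in page_ids:
        page_id, content = ready.get()  # blocks only if preloading has fallen behind
        # Placeholder for the annotating user's choice of tag.
        labels[page_id] = "good" if content else "none"
    worker.join()
    return labels


labels = annotate_all(["page-1", "page-2", "page-3"])
```

The queue preserves time order, so the annotating thread always receives the next page in sequence and waits only when the preloader has not finished it yet.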
In one embodiment, the apparatus further comprises a display unit, configured to display the at least one candidate data tag in the same display area, or to display the at least one candidate data tag in a drop-down list.
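Selecting a target tag by mouse click or keyboard shortcut amounts to resolving a user-interface event against the list of candidate tags. The sketch below assumes that the keys "1", "2", and so on are bound to the tags in display order; that key binding and the event names are illustrative assumptions, not taken from the disclosure.

```python
from typing import Dict, List, Optional

# Candidate tags from the single-engine example in the description.
CANDIDATE_TAGS: List[str] = ["good", "weak", "bad", "none"]


def build_shortcuts(tags: List[str]) -> Dict[str, str]:
    """Bind the keys "1", "2", ... to the candidate tags in display order (assumed scheme)."""
    return {str(i): tag for i, tag in enumerate(tags, start=1)}


def select_tag(event_kind: str, value: str, tags: List[str]) -> Optional[str]:
    """Resolve a UI event to a target tag.

    event_kind is "click" (value is the clicked tag) or "key" (value is the key pressed).
    Returns None when the event does not match any candidate tag.
    """
    if event_kind == "click":
        return value if value in tags else None
    if event_kind == "key":
        return build_shortcuts(tags).get(value)
    return None


by_key = select_tag("key", "1", CANDIDATE_TAGS)
by_click = select_tag("click", "bad", CANDIDATE_TAGS)
```

Because both input paths return the same kind of value, the rest of the annotation flow does not need to know which one the user chose.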
In one embodiment, the apparatus further comprises a component generation unit, configured to perform data abstraction processing on a plurality of annotation scenes to obtain second data, and to obtain reusable annotation components for the plurality of annotation scenes according to the second data.
In one embodiment, the apparatus further comprises a to-be-annotated page generating unit, configured to assemble the annotation components into an annotation template corresponding to a scene to be annotated, and to obtain, according to the annotation template, at least one page to be annotated, including the first annotation page.
In the technical solution of the present disclosure, the acquisition, storage, and application of users' personal information all comply with relevant laws and regulations and do not violate public order and good customs.
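The relationship between reusable components, templates, and generated pages can be sketched with plain data classes. The component names, the `LIBRARY` dictionary, and the page representation below are all hypothetical; the disclosure does not specify how components are stored or assembled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Component:
    """A reusable annotation component abstracted from several scenes."""
    name: str
    options: Tuple[str, ...] = ()


@dataclass
class Template:
    """An annotation template: an ordered assembly of components for one scene."""
    scene: str
    components: List[Component] = field(default_factory=list)


# Hypothetical component library shared by all annotation scenes.
LIBRARY: Dict[str, Component] = {
    "search_panel": Component("search_panel"),
    "relevance_tags": Component("relevance_tags", ("good", "weak", "bad", "none")),
    "pair_tags": Component(
        "pair_tags", ("strong correlation", "weak correlation", "irrelevant", "none")
    ),
}


def assemble(scene: str, component_names: List[str]) -> Template:
    """Assemble library components into a template for the given scene."""
    return Template(scene, [LIBRARY[name] for name in component_names])


def pages_from_template(template: Template, items: List[str]) -> List[dict]:
    """Generate one page to be annotated per data item."""
    names = [component.name for component in template.components]
    return [{"scene": template.scene, "data": item, "components": names} for item in items]


single = assemble("single-engine relevance", ["search_panel", "relevance_tags"])
pages = pages_from_template(single, ["Beijing weather", "capital weather"])
```

A two-engine scene could reuse `search_panel` twice together with `pair_tags`, which is the kind of reuse the abstraction step is meant to enable.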
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 11 shows a schematic block diagram of an example electronic device 1100 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the electronic device 1100 includes a computing unit 1101, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the electronic device 1100 may also be stored. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
A number of components in electronic device 1100 connect to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, and the like; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108 such as a magnetic disk, optical disk, or the like; and a communication unit 1109 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1109 allows the electronic device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1101 can be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 1101 performs the respective methods and processes described above, such as the annotation method. For example, in some embodiments, the annotation method can be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the annotation method described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform the annotation method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions of the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. An annotation method, comprising:
loading a search engine in a first annotation page to obtain a search page;
inputting data to be annotated on the search page to obtain first data related to the data to be annotated;
and performing data annotation according to the first data to obtain a target data tag corresponding to the data to be annotated.
2. The method of claim 1, further comprising:
performing page preloading on at least one second annotation page while the data annotation is performed on the first annotation page;
wherein the at least one second annotation page comprises: a page to be annotated that follows the first annotation page in time order.
3. The method of claim 2, wherein performing page preloading on the at least one second annotation page while the data annotation is performed on the first annotation page comprises:
starting a multithreading mode in which the data annotation and the page preloading are processed in parallel;
performing the data annotation on the first annotation page using a first thread;
and performing the page preloading on the at least one second annotation page using a second thread.
4. The method according to any one of claims 1 to 3, wherein the performing data annotation according to the first data to obtain a target data tag corresponding to the data to be annotated includes:
performing data annotation according to the first data to obtain at least one candidate data tag;
and selecting the target data tag from the at least one candidate data tag by mouse click or keyboard shortcut.
5. The method of claim 4, further comprising:
displaying the at least one candidate data tag in the same display area; or
displaying the at least one candidate data tag in a drop-down list.
6. The method of any of claims 1-3, further comprising:
performing data abstraction processing on a plurality of annotation scenes to obtain second data;
and obtaining reusable annotation components for the plurality of annotation scenes according to the second data.
7. The method of claim 6, further comprising:
assembling the annotation components into an annotation template corresponding to a scene to be annotated;
and obtaining, according to the annotation template, at least one page to be annotated, including the first annotation page.
8. A labeling apparatus, comprising:
a loading unit, configured to load a search engine in a first annotation page to obtain a search page;
a data processing unit, configured to input data to be annotated on the search page to obtain first data related to the data to be annotated;
and a data labeling unit, configured to perform data annotation according to the first data to obtain a target data tag corresponding to the data to be annotated.
9. The apparatus of claim 8, further comprising a page preloading unit configured to:
perform page preloading on at least one second annotation page while the data annotation is performed on the first annotation page;
wherein the at least one second annotation page comprises: a page to be annotated that follows the first annotation page in time order.
10. The apparatus of claim 9, wherein the page preloading unit is configured to:
start a multithreading mode in which the data annotation and the page preloading are processed in parallel;
perform the data annotation on the first annotation page using a first thread;
and perform the page preloading on the at least one second annotation page using a second thread.
11. The apparatus according to any one of claims 8-10, wherein the data annotation unit is configured to:
performing data annotation according to the first data to obtain at least one candidate data tag;
and selecting the target data tag from the at least one candidate data tag by mouse click or keyboard shortcut.
12. The apparatus of claim 11, further comprising a display unit configured to perform:
displaying the at least one candidate data tag in the same display area; or
displaying the at least one candidate data tag in a drop-down list.
13. The apparatus according to any one of claims 8-10, further comprising a component generation unit configured to perform:
performing data abstraction processing on a plurality of annotation scenes to obtain second data;
and obtaining reusable annotation components for the plurality of annotation scenes according to the second data.
14. The apparatus of claim 13, further comprising a to-be-annotated page generation unit configured to:
assembling the annotation components into an annotation template corresponding to a scene to be annotated;
and obtaining, according to the annotation template, at least one page to be annotated, including the first annotation page.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
CN202210140418.7A 2022-02-16 2022-02-16 Labeling method, labeling device, electronic equipment and storage medium Pending CN114461886A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210140418.7A CN114461886A (en) 2022-02-16 2022-02-16 Labeling method, labeling device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114461886A true CN114461886A (en) 2022-05-10

Family

ID=81414096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210140418.7A Pending CN114461886A (en) 2022-02-16 2022-02-16 Labeling method, labeling device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114461886A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116226557A (en) * 2022-12-29 2023-06-06 中国科学院信息工程研究所 Method and device for picking up data to be marked, electronic equipment and storage medium
CN116226557B (en) * 2022-12-29 2024-04-19 中国科学院信息工程研究所 Method and device for picking up data to be marked, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination