WO2021082499A1 - Resource labeling management system - Google Patents

Publication number
WO2021082499A1
Authority
WO
WIPO (PCT)
Prior art keywords
resource
target
module
label
annotation
Application number
PCT/CN2020/099354
Other languages
English (en)
French (fr)
Inventor
赵晗荻
Original Assignee
百果园技术(新加坡)有限公司
Application filed by 百果园技术(新加坡)有限公司
Publication of WO2021082499A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G06F16/75 Clustering; Classification

Definitions

  • This application relates to the field of information management technology, for example to a resource labeling management system.
  • Short video products have gradually become popular applications in the Internet field.
  • Users can watch, record, upload, and publish short videos using the client software of a short video product.
  • For the short video product server, it is necessary to annotate the content of uploaded short videos accurately, effectively, and promptly, that is, to tag the short video content, so that the tags can effectively support the entire recommendation and search system.
  • Tags also help establish user profiles, which provide more comprehensive information for operations, marketing, and other business functions.
  • Tagging of short videos is mainly done manually or with simple recognition models.
  • Many short video products are no longer developed for a single country; most now target users worldwide, which results in a large volume of short video uploads.
  • Because users come from different countries, the uploaded short videos vary in content, language, and culture. The volume of content is large, it is updated quickly, labels are hard to define, and label management must be highly efficient. These factors make short video tagging difficult, and the tagging solutions in the related art cannot effectively address them.
  • The embodiments of the present application provide a resource labeling management system that effectively manages the labeling of original resources and improves labeling efficiency and accuracy.
  • The labeling management system includes a machine labeling module, a resource allocation module, and a labeling resource storage module. The machine labeling module is configured to input original resources into a machine labeling network model to obtain basic labeling resources with basic labels, which form a basic labeling resource set;
  • the resource allocation module is configured to select target labeling resources to be labeled from the basic labeling resource set according to the annotator's resource allocation requirement and allocate them to the annotator;
  • the labeling resource storage module is configured to receive target labeling resources with target labels submitted by the annotator and store them in a target labeling resource set, where each target label is generated by the annotator selecting labels from a given target label template.
  • Figure 1 shows a structural block diagram of a resource labeling management system provided by an embodiment of the present application
  • Figure 2 shows a structural block diagram of another resource labeling management system provided by an embodiment of the present application
  • FIG. 3 shows an example diagram of the architecture of the resource labeling management system provided by an embodiment of the present application
  • Fig. 4 provides an example diagram of the data flow of the resource labeling implementation in the resource labeling management system in an embodiment of the present application.
  • Fig. 1 shows a structural block diagram of a resource labeling management system provided by an embodiment of the present application. As shown in Fig. 1, the system includes: a machine labeling module 11, a resource allocation module 12, and a labeling resource storage module 13.
  • the machine labeling module 11 is configured to input the original resources into the machine labeling network model to obtain basic labeling resources with basic labels to form a basic labeling resource set;
  • the resource allocation module 12 is configured to select target annotation resources to be marked from the basic annotation resource set and allocate them to the annotators according to the resource allocation requirements of the annotators;
  • the annotation resource storage module 13 is configured to receive target annotation resources with target tags submitted by an annotator and store them in a target annotation resource set.
  • The target tags are generated by the annotator selecting labels from a given target tag template.
  • The resource labeling management system integrates a machine labeling module 11, which performs basic labeling on the original resources to obtain basic labeling resources with basic labels.
  • The resulting basic labeling resources form a basic labeling resource set.
  • An original resource can be understood as an audio, video, or text resource uploaded and published by a resource provider.
  • For example, the original resource can be short video content recorded and uploaded by a user through the short video product client.
  • The original resource can also be professional audio, video, or text content produced by professional content creators, or professional content resources and scarce cold-start resources provided through a multi-channel network. A multi-channel network can be understood as an organization that cooperates with the resource platform and can provide services such as content programming, creator collaboration, digital rights management, and sales.
  • Cold-start resources are recently released resources that users have viewed relatively few times.
  • The original resource can also be text content that introduces a user or an author.
  • Original resources need to be labeled before they are presented to audience users, so that the resulting labels better convey to the audience what the resources are meant to express.
  • An original resource becomes a basic labeling resource with a basic label after passing through the machine labeling module 11.
  • The machine labeling module 11 in effect performs coarse-grained recognition of the original resource.
  • The basic tag can be understood as a basic analytical label for the content of the original resource.
  • For example, the basic tag may include tag information such as the upload time of the original resource, the popularity (heat) of the resource, and an overview of the resource content.
  • The main target users of the resource labeling management system may be the annotators who label resources. Because annotators are located in different regions or specialize in different fields, each annotator has their own resource allocation requirements for the resources to be labeled.
  • This embodiment can provide a system entry interface for the annotator. Through this entry interface, the annotator can enter the labeling function page and set a resource allocation requirement there. For example, the requirement may specify how resources are allocated, such as automatic allocation or manual allocation, or which labeling operation is wanted, such as relabeling resources that have already been labeled.
  • The resource allocation module 12 of the labeling management system can receive the resource allocation requirement set by the annotator on the relevant system interface, select matching basic labeling resources from the basic labeling resource set according to that requirement, and allocate them to the annotator as the target labeling resources to be labeled.
  • By having the resource allocation module 12 allocate resources on demand, the system largely avoids assigning annotators resources outside their expertise, so that annotators can label the resources to be labeled simply, conveniently, and effectively. This improves the annotators' labeling efficiency.
  • All basic labeling resources in the basic labeling resource set formed above can be referred to as resources to be labeled.
  • The labeling management system also integrates a labeling resource storage module 13.
  • The labeling resource storage module 13 can receive target labeling resources with target tags and store them in the target labeling resource set.
  • The target tag can be understood as text information that describes a basic labeling resource in more detail and more precisely than its basic tag.
  • For example, the target tag can be prompt text that shows the classification of the resource and the key content it embodies.
  • The target tag of a target labeling resource can be generated by the annotator from the target tag template provided by the labeling management system.
  • The target tag template can be understood as a template file containing the text content of standardized tags.
  • Standardization can be embodied in a standard label format, a standard form of expressing label content, and so on.
  • The target label determined in this way ensures that resource labeling is standardized and reduces the difficulty of label definition for annotators.
  • The labeling resource storage module 13 stores the received target labeling resources with target labels to form a target labeling resource set.
  • The target labeling resources in this set can provide basic information for subsequent content recommendation and search, and also give the business teams responsible for market expansion and paid promotion a content basis for selecting resource uploaders and placing resources.
  • The labeling resource storage module 13 is configured to store not only the received target labeling resources but also the basic labeling resources formed by machine labeling; that is, both the target labeling resource set and the basic labeling resource set formed in this embodiment are stored in the labeling resource storage module 13.
  • In the system of this embodiment, the machine labeling module first inputs the original resources into the machine labeling network model to obtain basic labeling resources with basic labels, which form a basic labeling resource set. The resource allocation module then selects target labeling resources to be labeled from the basic labeling resource set according to the annotator's resource allocation requirement and allocates them to the annotator. Finally, the labeling resource storage module receives the target labeling resources with target labels submitted by the annotator and stores them in the target labeling resource set, where each target label is generated by the annotator selecting labels from a given target label template.
  • The resource labeling management system provided in this embodiment is equivalent to a labeling management platform serving annotators.
  • The labeling management system first machine-labels the large number of original resources it receives to obtain a basic labeling resource set, which achieves coarse-grained identification of the resources. It then allocates target labeling resources to be labeled from that set to annotators according to their requirements, so that annotators can label them simply and conveniently using the given target label template and obtain more refined and accurate label information.
  • When annotators label through this labeling management system, the labeling difficulties that arise in the related art from variable resource content and language differences are effectively avoided.
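  • The three-module flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Resource`, `LabelingSystem`, and `machine_label` are hypothetical, and `machine_label` is a fixed-rule stand-in for the machine labeling network model, which the text does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    resource_id: str
    country: str
    basic_labels: dict = field(default_factory=dict)
    target_labels: list = field(default_factory=list)


def machine_label(resource):
    # Hypothetical stand-in for the machine labeling network model.
    resource.basic_labels = {"overview": "unknown", "heat": 0}
    return resource


class LabelingSystem:
    def __init__(self):
        self.basic_set = {}   # basic labeling resource set
        self.target_set = {}  # target labeling resource set

    def ingest(self, resource):
        # Machine labeling module: attach coarse-grained basic labels.
        labeled = machine_label(resource)
        self.basic_set[labeled.resource_id] = labeled

    def allocate(self, requirement):
        # Resource allocation module: pick resources matching the requirement.
        return [r for r in self.basic_set.values() if requirement(r)]

    def submit(self, resource, chosen, template):
        # Storage module: accept only labels selected from the template.
        invalid = [label for label in chosen if label not in template]
        if invalid:
            raise ValueError(f"labels not in template: {invalid}")
        resource.target_labels = list(chosen)
        self.target_set[resource.resource_id] = resource


system = LabelingSystem()
system.ingest(Resource("v1", "SG"))
system.ingest(Resource("v2", "BR"))
batch = system.allocate(lambda r: r.country == "SG")
system.submit(batch[0], ["food"], template=["food", "travel", "music"])
```

  • Rejecting labels that are not in the template is one way to enforce the standardization the text attributes to label selection; the source does not say how the system enforces it.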
  • The resource allocation module 12 in this embodiment may be configured to: when the resource allocation requirement sent by the annotator is automatic resource allocation, determine the region where the annotator is located; filter from the basic labeling resource set at least one candidate basic labeling resource matching that region; and select the candidate basic labeling resources whose basic labels meet preset filtering items and allocate them to the annotator as target labeling resources to be labeled.
  • The resource allocation module 12 uses different allocation methods to allocate labeling resources to annotators based on the different resource allocation requirements they send.
  • When the resource allocation requirement is automatic resource allocation, the resource allocation module 12 may use attributes of the annotator (such as geographic location and first language) to autonomously filter the basic labeling resources that match the annotator.
  • The resource allocation module 12 can determine the annotator's geographic location to obtain the region, and thus the country, to which the annotator belongs, and first screen out the basic labeling resources that match the annotator's language as candidate basic labeling resources.
  • The resource allocation module 12 may then match preset filtering items against the basic tags of the candidate basic labeling resources.
  • For example, the upload time tag and the heat tag of a resource can be used to set the filtering items.
  • The upload time in the upload time tag can be required to be earlier than a time threshold.
  • The heat in the heat tag can be required to be higher than a heat threshold.
  • The resource allocation module 12 compares the preset filtering items with the basic tags of the candidate basic labeling resources, thereby filtering out at least one target labeling resource to be labeled and assigning it to the corresponding annotator.
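  • The two-stage automatic allocation above (region match, then preset filtering items on the basic tags) might look like the sketch below. The function name `auto_allocate`, the dictionary layout, and the sample thresholds are assumptions made for illustration; the source only states that upload time is compared with a time threshold and heat with a heat threshold.

```python
from datetime import datetime


def auto_allocate(basic_set, annotator_region, uploaded_before, min_heat):
    # Stage 1: keep candidates whose region matches the annotator's region.
    candidates = [r for r in basic_set if r["region"] == annotator_region]
    # Stage 2: compare preset filtering items with each candidate's basic tags.
    return [
        r for r in candidates
        if r["basic"]["upload_time"] < uploaded_before
        and r["basic"]["heat"] > min_heat
    ]


basic_set = [
    {"id": "a", "region": "IN", "basic": {"upload_time": datetime(2020, 1, 5), "heat": 80}},
    {"id": "b", "region": "IN", "basic": {"upload_time": datetime(2020, 3, 1), "heat": 90}},
    {"id": "c", "region": "IN", "basic": {"upload_time": datetime(2020, 1, 2), "heat": 10}},
    {"id": "d", "region": "BR", "basic": {"upload_time": datetime(2020, 1, 1), "heat": 95}},
]
to_label = auto_allocate(basic_set, "IN", datetime(2020, 2, 1), min_heat=50)
```

  • In this sample only resource "a" passes both stages: "b" was uploaded after the threshold, "c" is below the heat threshold, and "d" is in another region.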
  • The resource allocation module 12 in this embodiment may be configured to: when the resource allocation requirement sent by the annotator is conditional resource allocation (that is, manual allocation), obtain the first screening condition set by the annotator, the first screening condition including at least one of the following: the country to which the resource belongs, the resource identifier of the resource, and the uploader identifier of the resource uploader; and select from the basic labeling resource set the basic labeling resources that satisfy the first screening condition and allocate them to the annotator as target labeling resources to be labeled.
  • When the resource allocation requirement is conditional allocation, the resource allocation module 12 may filter from the basic labeling resource set the basic labeling resources that meet the conditions set by the annotator and use them as the target labeling resources to be labeled.
  • The annotator can set filter conditions according to the labeling requirements.
  • This embodiment refers to these filter conditions as the first screening condition. It can be set according to attributes of the resource itself, for example the country to which the resource belongs (the country in which the resource was uploaded), the resource identifier (the unique identifier assigned to the resource when it is uploaded), and the uploader identifier of the resource uploader (the unique user identifier assigned to the uploader at registration).
  • The resource allocation module 12 selects the basic labeling resources to allocate from a to-be-labeled queue containing all basic labeling resources in the basic labeling resource set. After allocating the selected resources to the annotator, it deletes them from the queue to avoid allocating them again to other annotators.
  • When the resource allocation module 12 allocates multiple target labeling resources to an annotator, a cache queue corresponding to that annotator may be used to cache them, and they are displayed to the annotator as a list.
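  • Conditional allocation, removal from the to-be-labeled queue, and the per-annotator cache queue can be sketched together. The class `Allocator` and the field names `resource_id`, `country`, and `uploader_id` are hypothetical; exact-match comparison on each field is an assumption, since the source does not say how a screening condition is evaluated.

```python
from collections import deque


class Allocator:
    def __init__(self, basic_resources):
        self.pending = list(basic_resources)  # to-be-labeled queue
        self.caches = {}                      # annotator id -> cache queue

    def allocate(self, annotator_id, **condition):
        # First screening condition: every given field must match exactly.
        matched = [
            r for r in self.pending
            if all(r.get(key) == value for key, value in condition.items())
        ]
        # Delete allocated resources so no other annotator receives them.
        matched_ids = {r["resource_id"] for r in matched}
        self.pending = [r for r in self.pending if r["resource_id"] not in matched_ids]
        # Cache the allocation per annotator and return it as a list for display.
        self.caches.setdefault(annotator_id, deque()).extend(matched)
        return list(self.caches[annotator_id])


alloc = Allocator([
    {"resource_id": "r1", "country": "ID", "uploader_id": "u9"},
    {"resource_id": "r2", "country": "ID", "uploader_id": "u7"},
    {"resource_id": "r3", "country": "VN", "uploader_id": "u9"},
])
first = alloc.allocate("ann1", country="ID")
second = alloc.allocate("ann2", country="ID")
```

  • The second annotator receives nothing for the same condition because the matching resources were already removed from the pending queue.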
  • The resource allocation module 12 in this embodiment may be configured to: when the resource allocation requirement sent by the annotator is resource relabeling allocation, obtain the second screening condition set by the annotator in the displayed resource relabeling interface, the second screening condition including at least the resource identifier of the resource to be relabeled; and select from the target labeling resource set the target labeling resources that satisfy the second screening condition and allocate them to the annotator as target labeling resources to be relabeled.
  • The resource allocation module 12 also handles the case in which an annotator wants to relabel target labeling resources that already have target tags.
  • In this case, the resource allocation requirement received by the resource allocation module 12 is resource relabeling allocation.
  • The system can show a resource relabeling interface to the annotator, in which the annotator sets the second screening condition required for relabeling. The target labeling resources that meet this condition are then filtered from the target labeling resource set and assigned to the annotator.
  • The second screening condition may be the geographic location of the target labeling resource to be relabeled, the time of its last labeling, its resource identifier, or the annotator who last labeled it.
  • The resource allocation module 12 integrated in the resource labeling management system of this embodiment can flexibly allocate suitable resources to annotators according to their requirements. This achieves a rational allocation of resources and avoids situations in which annotators cannot label resources effectively because of language and cultural differences.
  • The resource allocation module 12 may be configured to send the annotator, together with the target labeling resource to be labeled, the target label template associated with that resource; the target label template is selected according to the basic label of the target labeling resource to be labeled.
  • The target label template is stored in a preset label template set, and the label template set includes at least one target label template formed by combining the standard labels of historical labeling resources.
  • The labeling management system provided in this embodiment also includes a label template set.
  • The label template set includes at least one target label template.
  • The target label templates can be determined from the standard labels of historical labeling resources: the standard labels are clustered or classified into different sets, each of which forms a target label template.
  • After the resource allocation module 12 determines the target labeling resource to be allocated to the annotator, it can analyze the basic label of that resource and send the annotator the target label template to which the basic label belongs, for use in labeling the target label.
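  • One way to picture the label template set is sketched below. It takes the classification route rather than clustering and assumes that each historical labeling resource carries a category that also appears in the basic label; that shared `category` field, and the names `build_template_set` and `template_for`, are hypothetical and not stated in the source.

```python
from collections import defaultdict


def build_template_set(history):
    # Classify the standard labels of historical resources by category;
    # each resulting group forms one target label template.
    templates = defaultdict(set)
    for category, standard_labels in history:
        templates[category].update(standard_labels)
    return {category: sorted(labels) for category, labels in templates.items()}


def template_for(basic_label, template_set):
    # Select the template to which the resource's basic label belongs.
    return template_set.get(basic_label["category"], [])


history = [
    ("food", ["dessert", "street food"]),
    ("food", ["dessert", "home cooking"]),
    ("music", ["cover", "live"]),
]
template_set = build_template_set(history)
food_template = template_for({"category": "food"}, template_set)
```

  • Returning an empty template for an unknown category is a design choice of this sketch; a real system might instead fall back to a default template.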
  • The resource labeling management system described above sets out how the resource allocation module allocates target labeling resources to be labeled to annotators according to their different resource allocation requirements.
  • With the resource allocation module, the labeling management system achieves flexible allocation of labeling resources.
  • FIG. 2 shows a structural block diagram of another resource labeling management system provided by an embodiment of the present application. This embodiment elaborates on the above embodiment.
  • The system also includes a resource receiving module 14 and a resource classification storage module 15.
  • The resource receiving module 14 includes at least one resource entry and is configured to receive the original resource provided by the resource provider through the resource entry matching that resource; the number of resource entries is set based on the source channels of the original resources;
  • the resource classification storage module 15 is configured to store the original resources received from the at least one resource entry into the original resource set corresponding to each resource entry.
  • A resource receiving module 14 and a resource classification storage module 15 are also integrated in the labeling management system.
  • Through these two functional modules, the system of this embodiment can effectively receive and store original resources, which facilitates classified management of them. Resource entries can be set according to the source channel of the original resource; for example, original resources can be crawled by crawlers, provided by professionally generated content platforms, provided by user-generated content platforms, or provided by a resource review platform. This embodiment can set a different resource entry for each source channel, so that resources that meet the conditions are automatically pushed to the labeling management system and automatically classified and stored by the resource classification storage module 15. Original resources are thus sorted by source channel at the time of initial reception, which improves the efficiency of resource allocation.
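  • The pairing of one resource entry with one original resource set per source channel can be sketched as follows. The class `ResourceReceiver` and the channel names are illustrative assumptions; the source lists crawlers, professional content platforms, user content platforms, and a review platform as example channels but does not name them in code.

```python
class ResourceReceiver:
    def __init__(self, channels):
        # One resource entry, and one original resource set, per source channel.
        self.original_sets = {channel: [] for channel in channels}

    def receive(self, channel, resource):
        # Resources arrive only through the entry matching their source channel.
        if channel not in self.original_sets:
            raise KeyError(f"no resource entry for channel {channel!r}")
        self.original_sets[channel].append(resource)

    def add_channel(self, channel):
        # A new source channel gets its own entry and storage.
        self.original_sets.setdefault(channel, [])


receiver = ResourceReceiver(["crawler", "pgc", "ugc", "review"])
receiver.receive("ugc", "video-1")
receiver.receive("pgc", "video-2")
receiver.add_channel("partner")
receiver.receive("partner", "video-3")
```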
  • The resource labeling management system can effectively label original resources and can also effectively manage target labeling resources, for example through multi-dimensional sampling and statistics on them. It can also respond at any time to changes in label templates and label content and make timely adjustments, which improves the management efficiency of the labeling team and ensures the accuracy of the labeling results.
  • The resource labeling management system provided in this embodiment may further include a label sampling module 16, which is configured to receive a label sampling request triggered by a spot checker, determine from the target labeling resource set the resource to be sampled that corresponds to the request, and display that resource and its target tag to the spot checker, who is a system user with sampling permission.
  • The label sampling request includes at least one of the following: resource source information of the resource to be sampled, resource identifier of the resource to be sampled, annotator identifier, key label content, and the country of the resource to be sampled.
  • The label sampling module 16 provides sampling management of the target labeling resource set formed by the annotators' labeling.
  • The system can provide a sampling portal to spot checkers who have sampling permission. A spot checker sends a label sampling request to the label sampling module 16 through the portal; the module analyzes the request to determine the corresponding resource to be sampled and displays that resource's label information to the spot checker.
  • The spot checker can use the label sampling module 16 to spot-check resources along dimensions such as the source of the resource, the annotator or labeling team, the specific label content, and the country to which the resource belongs.
  • The label sampling module 16 is further configured to receive label modification information for the sampled resource from the spot checker and to replace the target label of that resource in the target labeling resource set with the label modification information as the new target label.
  • If the spot checker, after obtaining the target tag of the resource to be sampled through the label sampling module 16, finds that the target tag produced by the annotator is incorrect, the spot checker can modify it directly. The modification forms label modification information that the label sampling module 16 receives and uses to replace the original target tag of the sampled resource in the target labeling resource set.
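  • The spot-check flow (select resources by request dimensions, show them, optionally replace the target label) can be sketched as below. The functions `sample` and `correct`, the dictionary fields, and the use of random sampling with a sample size `k` are assumptions of this sketch; the source does not state how many resources a request returns or how they are drawn.

```python
import random


def sample(target_set, k, seed=None, **dimensions):
    # Keep resources matching every requested dimension (source, annotator, ...).
    matches = [
        r for r in target_set.values()
        if all(r.get(name) == value for name, value in dimensions.items())
    ]
    # Draw at most k of them at random for the spot checker to review.
    return random.Random(seed).sample(matches, min(k, len(matches)))


def correct(target_set, resource_id, new_labels):
    # Replace the stored target label with the spot checker's modification.
    target_set[resource_id]["target_labels"] = list(new_labels)


target_set = {
    "r1": {"resource_id": "r1", "annotator_id": "u1", "country": "IN", "target_labels": ["food"]},
    "r2": {"resource_id": "r2", "annotator_id": "u1", "country": "BR", "target_labels": ["pets"]},
    "r3": {"resource_id": "r3", "annotator_id": "u2", "country": "IN", "target_labels": ["music"]},
}
picked = sample(target_set, k=5, seed=0, annotator_id="u1")
correct(target_set, "r2", ["wildlife"])
```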
  • The system provided in this embodiment may further include a tag query module 17, which is configured to receive a tag query request triggered by a querier, obtain from the target labeling resource set the target labeling resources and corresponding target tags that match the request, and return them to the querier, who is a system user with query permission.
  • The tag query request includes at least one of the following: tag labeling time information, resource identifier of the resource to be queried, resource source information of the resource to be queried, and annotator identifier.
  • The tag query module 17 provides the query function over the target labeling resource set formed by the annotators' labeling. The system provides a tag information query portal to queriers (such as annotators and other system users with query permission). A querier sends a tag query request to the tag query module 17, and the module searches for the label information of the matching target labeling resources according to the request.
  • The querier can use the tag query module 17 to query label information along dimensions such as the time at which the annotator created the label, the resource identifier, the source of the resource, and the labeling team's identifier.
  • The system may further include a label adjustment module 18 configured to perform adjustment operations on the target label templates in the label template set.
  • The label adjustment module 18 may be configured to receive and parse a label adjustment request triggered by a regulator.
  • The label adjustment request includes a label template to be adjusted and an adjustment requirement corresponding to it. The module selects from the label template set the target label template that matches the label adjustment request as the label template to be adjusted and adjusts it according to the adjustment requirement. The adjustment requirement is at least one of the following: a label rename request, a label hiding request, and a label classification request.
  • The label adjustment module 18 provides regulation and management of the label templates in the system. The system can provide a label adjustment portal to regulators who have control rights. After a regulator sends a label adjustment request to the label adjustment module 18 through the portal, the module analyzes the request to determine the label template to be adjusted and makes the corresponding adjustment.
  • The regulator can use the label adjustment module 18 to revise the label template set, rename or hide specified labels, and add labels at different levels, so that the labeling management system can flexibly adapt to changes in resources and to the demand for resource labels at different stages of business development.
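  • The three adjustment requirements named above (rename, hide, classify) can be sketched as a small dispatcher. The request format, the `hidden` and `category` fields, and the function name `adjust` are hypothetical; the source specifies only the kinds of request, not their structure.

```python
def adjust(template, request):
    # template maps a label name to its settings: {"hidden": bool, "category": str or None}
    kind, label = request["kind"], request["label"]
    if label not in template:
        raise KeyError(label)
    if kind == "rename":
        # Move the label's settings to the new name.
        template[request["new_name"]] = template.pop(label)
    elif kind == "hide":
        # Hidden labels stay in the template but are no longer offered.
        template[label]["hidden"] = True
    elif kind == "classify":
        # Place the label under a category (a label level).
        template[label]["category"] = request["category"]
    else:
        raise ValueError(f"unknown adjustment: {kind}")
    return template


template = {
    "food": {"hidden": False, "category": None},
    "pets": {"hidden": False, "category": None},
}
adjust(template, {"kind": "rename", "label": "food", "new_name": "cuisine"})
adjust(template, {"kind": "hide", "label": "pets"})
adjust(template, {"kind": "classify", "label": "cuisine", "category": "lifestyle"})
```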
  • The label sampling module 16 and the label query module 17 provided in this labeling management system mainly implement management functions for the target labeling resource set, whereas the label adjustment module 18, the user control module 19, and the system configuration management module 20 implement system-level management functions for the components or functional modules of the system.
  • The label adjustment module 18 can be regarded as a module for adjusting and managing the label template set.
  • The label adjustment module 18, the user control module 19, and the system configuration management module 20 need not have a substantive connection with the machine labeling module 11, resource allocation module 12, labeling resource storage module 13, resource receiving module 14, resource classification storage module 15, label sampling module 16, label query module 17, information feedback module 21, and data information statistics module 22.
  • For example, the system of this embodiment may further include a user control module 19, which is configured to perform corresponding control and management of system users according to a control management request triggered by a regulator;
  • the system user roles include at least: annotator, spot checker, querier, and regulator;
  • the control management request is at least one of the following: a user update management request, a user group creation or update management request, and a user permission management request.
  • In this embodiment, it follows from the functional modules described above that the system user roles in this labeling management system include at least annotator, spot checker, querier, and regulator. To keep system users organized and their permissions clearly defined, the user control module 19 can be used to add system users, create or modify system user groups, and manage the permissions of the different system user roles.
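As a rough sketch of how the three kinds of control management request could be dispatched, the following assumes a simple role-to-permission table. The permission names, the request keys, and the `UserControl` class are hypothetical; the source only names the roles and the request types.

```python
from enum import Enum
from typing import Dict, Set


class Role(Enum):
    ANNOTATOR = "annotator"
    SPOT_CHECKER = "spot_checker"
    QUERIER = "querier"
    REGULATOR = "regulator"


# Hypothetical default permissions per role (not specified in the source).
DEFAULT_PERMISSIONS: Dict[Role, Set[str]] = {
    Role.ANNOTATOR: {"annotate", "query"},
    Role.SPOT_CHECKER: {"spot_check", "modify_label", "query"},
    Role.QUERIER: {"query"},
    Role.REGULATOR: {"adjust_templates", "manage_users", "configure_system", "query"},
}


class UserControl:
    def __init__(self, initial_regulator: str) -> None:
        self.users: Dict[str, Role] = {initial_regulator: Role.REGULATOR}
        self.groups: Dict[str, Set[str]] = {}
        self.permissions = {role: set(p) for role, p in DEFAULT_PERMISSIONS.items()}

    def handle(self, requester: str, request: dict) -> None:
        """Apply a control management request; only regulators may trigger one."""
        if self.users.get(requester) is not Role.REGULATOR:
            raise PermissionError("only a regulator may trigger control requests")
        kind = request["kind"]
        if kind == "user_update":
            self.users[request["user"]] = Role(request["role"])
        elif kind == "group_update":
            self.groups[request["group"]] = set(request["members"])
        elif kind == "permission_update":
            self.permissions[Role(request["role"])] = set(request["permissions"])
        else:
            raise ValueError(f"unsupported request kind: {kind}")

    def allowed(self, user: str, action: str) -> bool:
        role = self.users.get(user)
        return role is not None and action in self.permissions[role]
```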
  • For example, the system of this embodiment further includes a system configuration management module 20, which is configured to perform configuration management on the items to be configured according to a received system configuration management request; the items to be configured include at least one of the following: display interface language configuration items, resource source channel configuration items, and machine labeling network model update configuration items.
  • The system configuration management module 20 mainly handles the management of functional configuration items in the system. One example is the language shown on the display interface: the interface can be displayed in Chinese when the system users are Chinese users and in English when they are European or American users. Another example is the management of resource source channels: when a new source channel is added, a corresponding resource entry can be added and corresponding storage space allocated. The module also manages the threshold parameters of the machine labeling network model and the addition or removal of models.
  • For example, the system of this embodiment may further include an information feedback module 21, which is configured to feed back the basic labels of the basic annotation resources in the basic annotation resource set and the target labels of the target annotation resources in the target annotation resource set, as training samples, to a third-party model training management platform, so that the platform uses the training samples to update the model parameters of the machine labeling network model.
  • In this embodiment, because the labeling management system receives a huge number of original resources, the numbers of basic annotation resources with basic labels in the basic annotation resource set and of target annotation resources with target labels in the target annotation resource set are also huge. This embodiment therefore uses this data as sample information for the third-party model training management platform to update and train the machine labeling network model. The labeling management system in turn performs machine labeling with the model as updated and trained by that platform, which helps maintain the accuracy of machine labeling.
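One way to picture the feedback step is as a join of the two resource sets on a resource identifier. The sketch below is only an assumption about the sample layout (the source does not define one); `build_training_samples` and the dictionary keys are hypothetical names.

```python
from typing import Dict, Iterator, List


def build_training_samples(
    basic_labels: Dict[str, List[str]],
    target_labels: Dict[str, List[str]],
) -> Iterator[dict]:
    """Yield one training sample per machine-labeled resource."""
    for resource_id, basic in basic_labels.items():
        yield {
            "resource_id": resource_id,
            "basic_labels": basic,
            # None means no annotator has submitted a target label yet.
            "target_labels": target_labels.get(resource_id),
        }
```

A training platform could then, for instance, keep only the samples whose `target_labels` is not `None` when it needs human-confirmed labels.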
  • For example, the system of this embodiment may further include a data information statistics module 22, which is configured to compile and store statistics on the information to be counted according to a received data statistics request; the information to be counted includes at least one of the following: the number of target annotation resources formed by labeling on the current day, the regional distribution of the target annotation resources formed on the current day, each annotator's daily labeling volume, and each annotator's labeling accuracy rate.
  • The statistics on this information are computed in a distributed manner on a distributed data processing platform.
  • In this embodiment, the data information statistics module 22 produces statistical results from the information to be counted, and the relevant system users can use these results to track the labeling management situation of the system in real time. For example, the regulator can view the regions and volumes of the labeling operations performed each day, as well as the efficiency and accuracy of the annotators, spot checkers, or reviewers among the system users, and can thus monitor the work efficiency of different system users in a timely manner and allocate human resources sensibly.
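The four statistics listed above can be computed from per-annotation records, as in the single-machine sketch below. It makes two assumptions that the source does not state: the record fields are hypothetical, and accuracy is defined here as the share of an annotator's labels that were not later replaced during spot checking. A distributed platform would compute the same aggregates over partitioned data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class AnnotationRecord:
    resource_id: str
    annotator: str
    region: str
    day: date
    corrected: bool  # assumed flag: a spot checker later replaced the target label


def daily_statistics(records: List[AnnotationRecord], day: date) -> dict:
    """Aggregate the labeling records of one day."""
    todays = [r for r in records if r.day == day]
    per_annotator = Counter(r.annotator for r in todays)
    corrected = Counter(r.annotator for r in todays if r.corrected)
    return {
        "resource_count": len(todays),
        "region_distribution": dict(Counter(r.region for r in todays)),
        "daily_volume": dict(per_annotator),
        "accuracy": {a: 1 - corrected[a] / n for a, n in per_annotator.items()},
    }
```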
  • The embodiments above describe the functional modules of the resource labeling management system one by one. To show how these modules relate to each other, FIG. 3 gives an example architecture diagram of the resource labeling management system provided by an embodiment of the present application. As shown in FIG. 3, the labeling management system roughly comprises six parts: resource source management 30, machine labeling management 31, manual labeling management 32, manual correction management 33, result output management 34, and system maintenance management 35.
  • Resource source management 30 provides a database-backed resource pool in which original resources are stored in different sub-resource pools according to their source. Machine labeling management 31 provides the function implementation of machine labeling through the active contour model in the machine labeling network model. Manual labeling management 32 provides the function implementation of resource labeling under two resource allocation modes, namely automatically screened allocation and manually screened allocation. Manual correction management 33 provides the function implementations of relabeling resource allocation, spot-check modification management, and query modification management. Result output management 34 provides the function implementations of basic label result output, target label result output, and data information statistical result output.
  • System maintenance management 35 provides the function implementations of label template set regulation, user control management, and system configuration management, and it can be carried out during any of the functional stages above.
  • In addition, to illustrate the workflow of the resource labeling management system, FIG. 4 gives an example diagram of the data flow of resource labeling in the system in an embodiment of the present application.
  • As shown in FIG. 4, the system background server 41 provides a HyperText Transfer Protocol (HTTP) interface service to receive the original resources uploaded by the user service terminal 42, and the original resources are stored in the corresponding resource pool 43, which contains different sub-resource pools. The system producer process 44 caches the address of each original resource, together with the encoded original resource, in the queue awaiting machine labeling. The system consumer process 45, which corresponds to the trained machine labeling network model, then performs machine labeling on the original resources in that queue, stores the resulting basic annotation resources in the database 46, and at the same time places them in the queue awaiting manual labeling for the annotators.
  • When the background front-end function module 47 in the browser requests, through the HTTP interface, the basic annotation resources cached in the queue awaiting manual labeling, the system background server 41 screens the resources to be labeled from that queue according to the resource allocation requirement and renders the interface for the background front-end function module 47 so that the annotator can perform manual labeling.
  • The annotator submits the labeled target annotation resource, with its target label, to the system background server 41 through the background front-end function module 47, and the system background server 41 stores it in the database 46.
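The producer and consumer stages of this data flow can be sketched with two in-memory queues. This is a simplified illustration under stated assumptions: threads stand in for the system producer and consumer processes, a dictionary stands in for database 46, and `toy_model` is a placeholder for the trained machine labeling network model, whose real interface the source does not describe.

```python
import queue
import threading

STOP = object()  # sentinel telling the consumer to exit


def producer(resource_urls, machine_queue):
    """Cache the address of each uploaded resource in the machine-labeling queue."""
    for url in resource_urls:
        machine_queue.put({"url": url})
    machine_queue.put(STOP)


def consumer(machine_queue, manual_queue, database, model):
    """Machine-label each queued resource, store it, and queue it for annotators."""
    while True:
        item = machine_queue.get()
        if item is STOP:
            break
        item["basic_labels"] = model(item["url"])
        database[item["url"]] = item   # stands in for database 46
        manual_queue.put(item)         # queue awaiting manual labeling


def toy_model(url):
    # Placeholder for the trained machine labeling network model.
    return ["video"] if url.endswith(".mp4") else ["text"]


machine_queue, manual_queue, database = queue.Queue(), queue.Queue(), {}
worker = threading.Thread(
    target=consumer, args=(machine_queue, manual_queue, database, toy_model)
)
worker.start()
producer(["a.mp4", "b.txt"], machine_queue)
worker.join()
```

Decoupling the two stages with a queue lets uploads continue at their own pace while machine labeling catches up, which matches the separate producer and consumer roles in FIG. 4.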
  • Continuing from the above, the third-party model training management platform 48 obtains the basic annotation resources and target annotation resources from the database 46 through the data conversion process 50 and the distributed processing platform 51, uses them as training samples to perform error regression training on the machine labeling network model so as to improve its accuracy, and pushes the trained model to the labeling management system for subsequent machine labeling.
  • In addition, the system background server 41 stores the data on the annotators' labeling operations during manual labeling in the static database 49, and the data in the database 46 is converted by the data conversion process 50. The distributed processing platform 51 computes the data information statistics in a distributed manner from the static database 49 and the converted data, and stores the results in the static database 49.
  • The system background server 41 then obtains the statistical results from the static database 49 and presents them on the browser side as reports.
  • The resource labeling management system provided by the embodiments of the present application flexibly combines machine labeling with manual labeling, so that the results of manual labeling become sample support for model training, which compensates for the long model training cycle and insufficient content coverage.
  • At the same time, the labeling management system spot-checks, queries, and corrects the results of manual labeling in a timely and multi-dimensional manner, which keeps labeling accuracy under control and ensures that manual labeling errors are corrected promptly. In addition, the labeling management system keeps track of the current distribution of original resources from different source channels, which provides a direction for improving resource recommendation in resource-sharing applications such as short video.
  • It should be understood that each part of this application can be implemented in hardware, software, firmware, or a combination thereof.
  • In the above embodiments, multiple steps or methods can be implemented by software or firmware that is stored in a memory and executed by a suitable instruction execution device.
  • For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following technologies known in the art: discrete logic circuits having logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits (ASICs) with suitable combinational logic gate circuits, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

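A resource labeling management system, comprising a machine labeling module (11), a resource allocation module (12), and a labeling resource storage module (13). The machine labeling module (11) is configured to input original resources into a machine labeling network model to obtain basic annotation resources with basic labels, forming a basic annotation resource set. The resource allocation module (12) is configured to select, according to an annotator's resource allocation requirement, target annotation resources to be labeled from the basic annotation resource set and allocate them to the annotator. The labeling resource storage module (13) is configured to receive target annotation resources with target labels submitted by the annotator and store them in a target annotation resource set, the target labels being generated by the annotator through label selection from a given target label template.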

Description

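Resource labeling management system
This application claims priority to Chinese patent application No. 201911052744.7, filed with the Chinese Patent Office on October 31, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of information management technology, for example to a resource label management system.
Background
Short video products have become popular applications on the Internet. Users can watch, record, upload, and publish short videos through the client software of such products. The server side of a short video product needs to label the content of uploaded short videos accurately, effectively, and promptly, that is, tag the short video content, so as to support the overall recommendation and search systems; this also makes it possible to build user profiles and provide more comprehensive information for operations, marketing, and other business functions.
In the related art, tagging short videos relies mainly on manual work or simple recognition models. However, many short video products are no longer confined to a single country and mostly serve users worldwide, so the volume of uploaded short videos has grown sharply. Because users come from different countries, the uploaded videos also vary in content and differ in language and culture. The resulting problems include excessive content volume, overly fast updates, difficulty in defining labels, and high demands on label management efficiency, all of which complicate short video tagging; the tagging schemes in the related art cannot effectively avoid them.
Summary
The embodiments of the present application provide a resource labeling management system that manages the labeling of original resources effectively and improves both labeling efficiency and labeling accuracy.
The labeling management system includes a machine labeling module, a resource allocation module, and a labeling resource storage module. The machine labeling module is configured to input original resources into a machine labeling network model to obtain basic annotation resources with basic labels, forming a basic annotation resource set. The resource allocation module is configured to select, according to an annotator's resource allocation requirement, target annotation resources to be labeled from the basic annotation resource set and allocate them to the annotator. The labeling resource storage module is configured to receive target annotation resources with target labels submitted by the annotator and store them in a target annotation resource set, the target labels being generated by the annotator through label selection from a given target label template.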
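Brief Description of the Drawings
FIG. 1 is a structural block diagram of a resource labeling management system provided by an embodiment of the present application;
FIG. 2 is a structural block diagram of another resource labeling management system provided by an embodiment of the present application;
FIG. 3 is an example architecture diagram of the resource labeling management system provided by an embodiment of the present application;
FIG. 4 is an example diagram of the data flow of resource labeling in the resource labeling management system in an embodiment of the present application.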
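Detailed Description
FIG. 1 is a structural block diagram of a resource labeling management system provided by an embodiment of the present application. As shown in FIG. 1, the system includes a machine labeling module 11, a resource allocation module 12, and a labeling resource storage module 13.
The machine labeling module 11 is configured to input original resources into a machine labeling network model to obtain basic annotation resources with basic labels, forming a basic annotation resource set;
the resource allocation module 12 is configured to select, according to an annotator's resource allocation requirement, target annotation resources to be labeled from the basic annotation resource set and allocate them to the annotator;
the labeling resource storage module 13 is configured to receive target annotation resources with target labels submitted by the annotator and store them in a target annotation resource set, the target labels being generated by the annotator through label selection from a given target label template.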
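In this embodiment, the resource labeling management system integrates the machine labeling module 11, which performs basic labeling of original resources to obtain basic annotation resources with basic labels; these basic annotation resources form the basic annotation resource set. An original resource can be understood as an audio, video, or text resource uploaded and published by a resource provider. For example, it may be short video content that a user records, uploads, and publishes through a short video client; it may be professional audio, video, or text content produced by professional content creators; or it may be curated professional content and scarce cold-start resources supplied through a multi-channel network. A multi-channel network can be understood as an organization that cooperates with the resource platform and offers services such as content programming, creator collaboration, digital rights management, and sales acquisition. Cold-start resources are resources that were published recently and have been viewed relatively few times. An original resource may also be text that introduces a user or an author.
Original resources need to be labeled before they are presented to an audience, so that the labels convey what the resource is about. In this embodiment, passing an original resource through the machine labeling module 11 yields a basic annotation resource with basic labels. The machine labeling module 11 amounts to a coarse-grained recognition of the original resource, and the basic labels it produces can be understood as a basic analysis of the resource content. For example, the basic labels may include the upload time of the original resource, its popularity, and a summary of its content.
In this embodiment, the main users of the resource labeling management system may be annotators who label resources. Because annotators are located in different regions and specialize in different fields, each annotator has individual resource allocation requirements for the resources the system pushes for labeling. This embodiment can provide annotators with a system entry interface through which they reach the labeling function page and set their resource allocation requirement there. For example, the requirement may concern how resources are allocated, such as automatic or manual allocation, or it may concern the labeling operation, such as relabeling resources that have already been labeled.
For example, the resource allocation module 12 of the labeling management system can receive the resource allocation requirement that the annotator sets in the relevant system interface and, according to that requirement, select matching basic annotation resources from the basic annotation resource set as the target annotation resources to be labeled and allocate them to the annotator. Allocating resources on demand in this way avoids assigning annotators resources they are not suited to label, lets them label the resources simply and effectively, and thereby improves their labeling efficiency. It can be understood that every basic annotation resource in the basic annotation resource set formed above can be called a resource to be labeled.
In this embodiment, the labeling management system also integrates the labeling resource storage module 13, which can receive target annotation resources with target labels and store them in the target annotation resource set. A target label can be understood as text that explains a basic annotation resource in more detail and more precisely than its basic labels. For example, a target label may be text indicating the category a resource belongs to and the key content it presents.
In this embodiment, the target label of a target annotation resource can be generated by the annotator through label selection from a target label template given by the labeling management system. A target label template can be understood as a template file containing standardized label text; the standardization may cover the label format, the form in which label content is expressed, and so on. Target labels determined in this way keep resource labeling standardized and ease the difficulty annotators have in defining labels. The labeling resource storage module 13 stores the received target annotation resources with target labels to form the target annotation resource set. The target annotation resources in this set can provide basic information for subsequent content recommendation and search, and can give business teams responsible for market expansion and paid user acquisition a content basis for choosing resource uploaders and the resources to promote.
It should be noted that the labeling resource storage module 13 is configured to store not only the received target annotation resources but also the basic annotation resources formed by machine labeling; that is, both the target annotation resource set and the basic annotation resource set of this embodiment are stored in the labeling resource storage module 13.
In the resource labeling management system provided by the embodiments of the present application, the machine labeling module first inputs original resources into the machine labeling network model to obtain basic annotation resources with basic labels, forming the basic annotation resource set. The resource allocation module then selects, according to the annotator's resource allocation requirement, target annotation resources to be labeled from that set and allocates them to the annotator. Finally, the labeling resource storage module receives the target annotation resources with target labels submitted by the annotator and stores them in the target annotation resource set, the target labels being generated by the annotator through label selection from a given target label template. The system thus acts as a labeling management platform serving annotators. It first machine-labels the large volume of received original resources to obtain the basic annotation resource set, giving a coarse-grained identification of the resources. It then allocates target annotation resources from that set according to each annotator's needs, so that annotators can label them simply and conveniently with the given target label template and obtain finer and more accurate label information. For annotators, working through this system avoids the labeling difficulties that arise in the related art from variable resource content and language differences.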
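For example, the resource allocation module 12 in this embodiment may be configured to: when the resource allocation requirement sent by the annotator is automatic resource allocation, determine the region where the annotator is located; screen the basic annotation resource set for at least one candidate basic annotation resource that matches that region; and select the candidate basic annotation resources whose basic labels satisfy preset filter items as the target annotation resources to be labeled and allocate them to the annotator.
In this embodiment, the resource allocation module 12 allocates resources to annotators in different ways depending on the resource allocation requirement they send. For example, when the requirement is automatic resource allocation, the module can use attributes of the annotator (such as geographic location and first language) to screen for basic annotation resources that match the annotator. The module can determine the annotator's geographic location to obtain the region and thus the country the annotator belongs to, and first screen out the basic annotation resources that match the annotator's language as candidates. It can then match preset filter items against the basic labels of the candidate basic annotation resources.
For example, this embodiment can define filter items on a resource's upload time label and popularity label, such as requiring the upload time to be earlier than a time threshold or the popularity to be higher than a given popularity level. The resource allocation module 12 compares the preset filter items with the basic labels of the candidate basic annotation resources and thereby selects at least one target annotation resource to be labeled for allocation to the corresponding annotator.
For example, the resource allocation module 12 in this embodiment may be configured to: when the resource allocation requirement sent by the annotator is conditional resource allocation (that is, manual allocation), obtain a first filter condition set by the annotator, where the first filter condition includes at least one of the following: the country the resource belongs to, the resource identifier of the resource, and the uploader identifier of the resource uploader; and screen the basic annotation resource set for basic annotation resources that satisfy the first filter condition as the target annotation resources to be labeled and allocate them to the annotator.
For example, when the requirement is conditional resource allocation, the resource allocation module 12 can screen the basic annotation resource set for basic annotation resources that satisfy the filter condition set by the annotator. The annotator can set the filter condition according to labeling needs; to distinguish it, this embodiment calls it the first filter condition. It can be based on attributes of the resource itself, such as the country the resource belongs to (the country in which it was uploaded), the resource identifier (a unique identifier assigned to the resource at upload), and the uploader identifier of the resource uploader (a unique user identifier assigned to the uploader at registration).
It can be understood that the resource allocation module 12 selects the basic annotation resources to allocate from a to-be-labeled queue that caches all basic annotation resources in the basic annotation resource set. After allocating the selected resources to an annotator, it can delete them from the to-be-labeled queue so that they are not allocated again to other annotators. When multiple target annotation resources are allocated to an annotator, a cache queue dedicated to that annotator can hold them, and they can be presented to the annotator as a list.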
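The automatic and conditional allocation modes, together with the removal of allocated resources from the to-be-labeled queue, can be sketched as follows. The field names, the `demand` dictionary layout, and the `limit` parameter are assumptions made for the example; the source describes the behavior but not a data model, and here the annotator's region is matched directly against the resource's country for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class BasicResource:
    resource_id: str
    country: str
    uploader_id: str
    upload_time: datetime
    popularity: int


def allocate(pending: List[BasicResource], annotator_region: str,
             demand: dict, limit: int = 10) -> List[BasicResource]:
    """Pick resources for one annotator and remove them from the pending queue."""
    if demand["mode"] == "auto":
        # Region match first, then the preset filter items on the basic labels.
        chosen = [r for r in pending
                  if r.country == annotator_region
                  and r.upload_time < demand["uploaded_before"]
                  and r.popularity > demand["min_popularity"]]
    elif demand["mode"] == "conditional":
        # First filter condition: any of country, resource_id, uploader_id.
        conditions = demand["conditions"]
        chosen = [r for r in pending
                  if all(getattr(r, key) == value for key, value in conditions.items())]
    else:
        raise ValueError(f"unsupported allocation mode: {demand['mode']}")
    chosen = chosen[:limit]
    for resource in chosen:
        pending.remove(resource)  # avoid allocating the same resource twice
    return chosen
```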
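In addition, the resource allocation module 12 in this embodiment may be configured to: when the resource allocation requirement sent by the annotator is relabeling allocation, obtain a second filter condition that the annotator sets in the displayed resource relabeling interface, where the second filter condition includes at least the resource identifier of the resource to be relabeled; and screen the target annotation resource set for target annotation resources that satisfy the second filter condition as the target annotation resources to be relabeled and allocate them to the annotator.
In this embodiment, another allocation case is that the annotator wants to relabel target annotation resources that already have target labels. In that case the resource allocation module 12 receives a resource allocation requirement of relabeling allocation; the system can then show the annotator a resource relabeling interface in which the second filter condition needed for relabeling is set, and the module screens the target annotation resource set for matching target annotation resources to allocate to the annotator. For example, the second filter condition may be the geographic location of the target annotation resource to be relabeled, the time of its previous labeling, and its resource identifier, or it may be the name or identifier of the annotator who labeled it previously.
The resource allocation module 12 integrated in the resource labeling management system of this embodiment can flexibly allocate suitable resources to annotators according to their needs. This makes resource allocation more reasonable and avoids situations in which annotators cannot label resources effectively because of language and cultural differences.
On the basis of the above embodiment, the resource allocation module 12, while allocating the target annotation resources to be labeled to the annotator, may also be configured to send the annotator the target label template associated with those resources, the target label template being selected according to the basic labels of the target annotation resources to be labeled. The target label template is stored in a preset label template set, which contains at least one target label template formed from the standard labels of historically labeled resources.
The labeling management system of this embodiment also includes a label template set containing at least one target label template. Target label templates can be determined from the standard labels of historically labeled resources: the standard labels can be clustered or classified into different sets, each forming a target label template. In this embodiment, after the resource allocation module 12 determines the target annotation resources to allocate to the annotator, it can analyze their basic labels and send the annotator the target label template to which those basic labels belong, for use in assigning target labels.
The resource labeling management system of this embodiment, with the above features, sets out how the resource allocation module allocates target annotation resources to be labeled according to annotators' different resource allocation requirements. A labeling management system equipped with this module achieves flexible allocation of resources for labeling.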
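FIG. 2 is a structural block diagram of another resource labeling management system provided by an embodiment of the present application. This embodiment refines the above embodiment; here the system further includes a resource receiving module 14 and a resource classification storage module 15.
The resource receiving module 14 contains at least one resource entry and is configured to receive original resources from resource providers through the resource entry that matches each resource; the number of resource entries is set according to the source channels of the original resources;
the resource classification storage module 15 is configured to store the original resources received from the at least one resource entry in the original resource sets corresponding to the respective resource entries.
In this embodiment, the labeling management system also integrates the resource receiving module 14 and the resource classification storage module 15. With these two modules the system can receive and store original resources effectively and manage them by category. For example, resource entries can be set according to the source channels of original resources, which may include crawler collection, professionally generated content platforms, user generated content platforms, and resource review platforms. Different resource entries can be set for different source channels, so that resources meeting the conditions are pushed automatically to the labeling management system and the resource classification storage module 15 stores them automatically by category. Original resources are thus sorted by source channel at the moment they are received, which improves allocation efficiency.
It should be noted that the resource labeling management system of this embodiment not only labels original resources effectively but also manages the target annotation resources effectively, for example through multi-dimensional spot checks and statistics on them. It can also respond at any time to changes in label templates and label content and adjust promptly, which improves the management efficiency of the labeling team and helps ensure the accuracy of the labeling results.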
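For example, the resource labeling management system of this embodiment may further include a label sampling module 16, which is configured to receive a label spot-check request triggered by a spot checker, determine from the target annotation resource set the resources to be spot-checked that correspond to the request, and display those resources and their target labels to the spot checker, the spot checker being a system user with spot-check permission.
The label spot-check request includes at least one of the following: resource source information of the resource to be spot-checked, the resource identifier of the resource to be spot-checked, an annotator identifier, key label content, and the country the resource to be spot-checked belongs to.
In this embodiment, the label sampling module 16 handles spot-check management of the target annotation resource set formed by the annotators' labeling. The system can provide a spot-check portal to spot checkers with spot-check permission. A spot checker sends a label spot-check request to the label sampling module 16 through this portal; the module parses the request to determine the corresponding resources to be spot-checked and displays their label information to the spot checker.
For example, the spot checker can spot-check resources through the label sampling module 16 along dimensions such as resource source, annotator or labeling team, specific label content, and the country the resource belongs to.
On the basis of the above embodiment, the label sampling module 16 is further configured to receive the spot checker's label modification information for a resource being spot-checked and to replace the target label of that resource in the target annotation resource set with the label modification information, which becomes the new target label.
Based on the above, after the spot checker obtains the target label of a resource through the label sampling module 16 and finds that the label assigned by the annotator is wrong, the spot checker can modify the label directly. This produces label modification information that the label sampling module 16 receives and uses to replace the original target label of the resource, and the result is stored in the target annotation resource set.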
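A spot check followed by a label correction can be sketched as a filter over the target annotation resource set plus an in-place replacement. The record fields (`source`, `annotator_id`, `country`, `target_label`) and the `label_keyword` request key are assumptions for illustration only.

```python
from typing import Dict, List


def spot_check(target_set: Dict[str, dict], request: dict) -> List[dict]:
    """Return the target annotation resources that match a spot-check request."""
    keyword = request.get("label_keyword")  # key label content, matched as a substring
    exact = {k: v for k, v in request.items() if k != "label_keyword"}
    return [
        resource for resource in target_set.values()
        if all(resource.get(key) == value for key, value in exact.items())
        and (keyword is None or keyword in resource["target_label"])
    ]


def modify_label(target_set: Dict[str, dict], resource_id: str, new_label: str) -> None:
    """Replace the stored target label with the spot checker's correction."""
    target_set[resource_id]["target_label"] = new_label
```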
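For example, the system of this embodiment may further include a label query module 17, which is configured to receive a label query request triggered by a querier, obtain from the target annotation resource set the target annotation resources and corresponding target labels that match the request, and return them to the querier, the querier being a system user with query permission.
The label query request includes at least one of the following: labeling time information, the resource identifier of the resource to be queried, resource source information of the resource to be queried, and an annotator identifier.
In this embodiment, the label query module 17 provides the query function over the target annotation resource set formed by the annotators' labeling. The system provides queriers (such as annotators and other system users with query permission) with a query portal for label information, through which the querier sends the generated label query request to the label query module 17, and the module looks up the label information of the relevant target annotation resources according to the request.
For example, the querier can query the label information of resources through the label query module 17 along dimensions such as the time at which the annotator created the label, the resource identifier, the source channel of the resource, and the identifier of the annotator or labeling team.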
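On the basis of the above embodiment, the system may further include a label adjustment module 18, which is configured to perform adjustment operations on the target label templates in the label template set.
For example, the label adjustment module 18 may be configured to: receive and parse a label adjustment request triggered by a regulator, the request including the label template to be adjusted and the adjustment requirement for that template; select from the label template set the target label template that matches the request as the label template to be adjusted, and adjust it according to the adjustment requirement. The adjustment requirement is at least one of the following: a label renaming request, a label hide/show request, and a label level classification request.
In this embodiment, the label adjustment module 18 handles the regulation and management of the label templates formed in the system. The system can provide a label adjustment portal to regulators who hold regulation permissions. A regulator sends a label adjustment request to the label adjustment module 18 through this portal, and the module parses the request to determine which label template to adjust and then makes the corresponding adjustment.
For example, the regulator can revise the label template set through the label adjustment module 18 by renaming or hiding specified labels and adding labels of different levels, so that the labeling management system can adapt flexibly to resource changes and to the demands placed on resource labels at different stages of business development.
It should be noted that the label sampling module 16 and the label query module 17, together with the information feedback module 21 and the data information statistics module 22, mainly implement functions for managing the target annotation resources in the target annotation resource set. The label adjustment module 18, the user control module 19, and the system configuration management module 20, by contrast, implement functions for managing the components or functional modules of the system at the system level. For example, the label adjustment module 18 can be regarded as a module for adjusting and managing the label template set. Consequently, the label adjustment module 18, the user control module 19, and the system configuration management module 20 may have no substantial connection with the machine labeling module 11, resource allocation module 12, labeling resource storage module 13, resource receiving module 14, resource classification storage module 15, label sampling module 16, label query module 17, information feedback module 21, and data information statistics module 22 described above.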
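For example, the system of this embodiment may further include a user control module 19, which is configured to perform corresponding control and management of system users according to a control management request triggered by a regulator; the system user roles include at least annotator, spot checker, querier, and regulator; the control management request is at least one of the following: a user update management request, a user group creation or update management request, and a user permission management request.
In this embodiment, it follows from the functional modules described above that the system user roles in this labeling management system include at least annotator, spot checker, querier, and regulator. To keep system users organized and their permissions clearly defined, the user control module 19 can be used to add system users, create or modify system user groups, and manage the permissions of the different system user roles.
For example, the system of this embodiment further includes a system configuration management module 20, which is configured to perform configuration management on the items to be configured according to a received system configuration management request; the items to be configured include at least one of the following: display interface language configuration items, resource source channel configuration items, and machine labeling network model update configuration items.
The system configuration management module 20 mainly handles the management of functional configuration items in the system. One example is the language shown on the display interface: the interface can be displayed in Chinese when the system users are Chinese users and in English when they are European or American users. Another example is the management of resource source channels: when a new source channel is added, a corresponding resource entry can be added and corresponding storage space allocated. The module also manages the threshold parameters of the machine labeling network model and the addition or removal of models.
For example, the system of this embodiment may further include an information feedback module 21, which is configured to feed back the basic labels of the basic annotation resources in the basic annotation resource set and the target labels of the target annotation resources in the target annotation resource set, as training samples, to a third-party model training management platform, so that the platform uses the training samples to update the model parameters of the machine labeling network model.
In this embodiment, because the labeling management system receives a huge number of original resources, the numbers of basic annotation resources with basic labels and of target annotation resources with target labels are also huge. This embodiment therefore uses this data as sample information for the third-party model training management platform to update and train the machine labeling network model. The labeling management system in turn performs machine labeling with the model as updated and trained by that platform, which helps maintain the accuracy of machine labeling.
For example, the system of this embodiment may further include a data information statistics module 22, which is configured to compile and store statistics on the information to be counted according to a received data statistics request; the information to be counted includes at least one of the following: the number of target annotation resources formed by labeling on the current day, the regional distribution of the target annotation resources formed on the current day, each annotator's daily labeling volume, and each annotator's labeling accuracy rate.
The statistics on this information are computed in a distributed manner on a distributed data processing platform.
In this embodiment, the data information statistics module 22 produces statistical results from the information to be counted, and the relevant system users can use these results to track the labeling management situation of the system in real time. For example, the regulator can view the regions and volumes of the labeling operations performed each day, as well as the efficiency and accuracy of the annotators, spot checkers, or reviewers among the system users, and can thus monitor the work efficiency of different system users in a timely manner and allocate human resources sensibly.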
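The embodiments above describe the functional modules of the resource labeling management system one by one. To show how these modules relate to each other, FIG. 3 gives an example architecture diagram of the resource labeling management system provided by an embodiment of the present application. As shown in FIG. 3, the labeling management system roughly comprises six parts: resource source management 30, machine labeling management 31, manual labeling management 32, manual correction management 33, result output management 34, and system maintenance management 35. Resource source management 30 provides a database-backed resource pool in which original resources are stored in different sub-resource pools according to their source. Machine labeling management 31 provides the function implementation of machine labeling through the active contour model in the machine labeling network model. Manual labeling management 32 provides the function implementation of resource labeling under two resource allocation modes, namely automatically screened allocation and manually screened allocation. Manual correction management 33 provides the function implementations of relabeling resource allocation, spot-check modification management, and query modification management. Result output management 34 provides the function implementations of basic label result output, target label result output, and data information statistical result output. System maintenance management 35 provides the function implementations of label template set regulation, user control management, and system configuration management, and it can be carried out during any of the functional stages above.
In addition, to illustrate the workflow of the resource labeling management system, FIG. 4 gives an example diagram of the data flow of resource labeling in the system in an embodiment of the present application. As shown in FIG. 4, the system background server 41 provides a HyperText Transfer Protocol (HTTP) interface service to receive the original resources uploaded by the user service terminal 42, and the original resources are stored in the corresponding resource pool 43, which contains different sub-resource pools. The system producer process 44 caches the address of each original resource, together with the encoded original resource, in the queue awaiting machine labeling. The system consumer process 45, which corresponds to the trained machine labeling network model, then performs machine labeling on the original resources in that queue, stores the resulting basic annotation resources in the database 46, and at the same time places them in the queue awaiting manual labeling for the annotators. When the background front-end function module 47 in the browser requests, through the HTTP interface, the basic annotation resources cached in the queue awaiting manual labeling, the system background server 41 screens the resources to be labeled from that queue according to the resource allocation requirement and renders the interface for the background front-end function module 47 so that the annotator can perform manual labeling. The annotator submits the labeled target annotation resource, with its target label, to the system background server 41 through the background front-end function module 47, and the system background server 41 stores it in the database 46.
Continuing from the above, the third-party model training management platform 48 obtains the basic annotation resources and target annotation resources from the database 46 through the data conversion process 50 and the distributed processing platform 51, uses them as training samples to perform error regression training on the machine labeling network model so as to improve its accuracy, and pushes the trained model to the labeling management system for subsequent machine labeling. In addition, the system background server 41 stores the data on the annotators' labeling operations during manual labeling in the static database 49, and the data in the database 46 is converted by the data conversion process 50. The distributed processing platform 51 computes the data information statistics in a distributed manner from the static database 49 and the converted data, and stores the results in the static database 49. The system background server 41 then obtains the statistical results from the static database 49 and presents them on the browser side as reports.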
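The resource labeling management system provided by the embodiments of the present application flexibly combines machine labeling with manual labeling, so that the results of manual labeling become sample support for model training, which compensates for the long model training cycle and insufficient content coverage. At the same time, the labeling management system spot-checks, queries, and corrects the results of manual labeling in a timely and multi-dimensional manner, which keeps labeling accuracy under control and ensures that manual labeling errors are corrected promptly. In addition, the labeling management system keeps track of the current distribution of original resources from different source channels, which provides a direction for improving resource recommendation in resource-sharing applications such as short video.
It is worth noting that the units and modules of the resource labeling management system above are divided only according to functional logic. The division is not limited to the one described, as long as the corresponding functions can be realized. The specific names of the functional units are likewise only for ease of distinguishing them from one another and do not limit the scope of protection of the present application.
It should be understood that each part of this application can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented by software or firmware that is stored in a memory and executed by a suitable instruction execution device. For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following technologies known in the art: discrete logic circuits having logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gate circuits, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.
In the description of this specification, references to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" mean that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present application. Schematic uses of these terms in this specification do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.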

Claims (18)

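  1. A resource labeling management system, comprising a machine labeling module, a resource allocation module, and a labeling resource storage module;
    wherein the machine labeling module is configured to input original resources into a machine labeling network model to obtain basic annotation resources with basic labels, forming a basic annotation resource set;
    the resource allocation module is configured to select, according to a resource allocation requirement of an annotator, target annotation resources to be labeled from the basic annotation resource set and allocate them to the annotator;
    the labeling resource storage module is configured to receive target annotation resources with target labels submitted by the annotator and store them in a target annotation resource set, the target labels being generated by the annotator through label selection from a given target label template.
  2. The system according to claim 1, wherein the resource allocation module is configured to:
    when the resource allocation requirement sent by the annotator is automatic resource allocation, determine the region where the annotator is located;
    screen the basic annotation resource set for at least one candidate basic annotation resource that matches the region;
    select the candidate basic annotation resources whose basic labels satisfy preset filter items as the target annotation resources to be labeled and allocate them to the annotator.
  3. The system according to claim 1, wherein the resource allocation module is configured to:
    when the resource allocation requirement sent by the annotator is conditional resource allocation, obtain a first filter condition set by the annotator, wherein the first filter condition includes at least one of the following: the country the resource belongs to, the resource identifier of the resource, and the uploader identifier of the resource uploader;
    screen the basic annotation resource set for basic annotation resources that satisfy the first filter condition as the target annotation resources to be labeled and allocate them to the annotator.
  4. The system according to claim 1, wherein the resource allocation module is configured to:
    when the resource allocation requirement sent by the annotator is relabeling allocation, obtain a second filter condition set by the annotator in a displayed resource relabeling interface, wherein the second filter condition includes at least the resource identifier of the resource to be relabeled;
    screen the target annotation resource set for target annotation resources that satisfy the second filter condition as the target annotation resources to be relabeled and allocate them to the annotator.
  5. The system according to claim 1, wherein the resource allocation module, while allocating the target annotation resources to be labeled to the annotator, is further configured to:
    send the annotator the target label template associated with the target annotation resources to be labeled, the target label template being selected according to the basic labels of the target annotation resources to be labeled;
    wherein the target label template is stored in a preset label template set, and the label template set contains at least one target label template formed from the standard labels of historically labeled resources.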
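  6. The system according to any one of claims 1-5, further comprising a resource receiving module and a resource classification storage module;
    wherein the resource receiving module contains at least one resource entry and is configured to receive original resources provided by resource providers through the resource entry that matches each resource, the number of resource entries being set according to the source channels of the original resources;
    the resource classification storage module is configured to store the original resources received from the at least one resource entry in the original resource sets corresponding to the at least one resource entry.
  7. The system according to any one of claims 1-5, further comprising a label sampling module;
    wherein the label sampling module is configured to receive a label query request triggered by a spot checker, determine from the target annotation resource set the resources to be spot-checked that correspond to the label spot-check request triggered by the spot checker, and display the resources to be spot-checked and their target labels to the spot checker, the spot checker being a system user with spot-check permission.
  8. The system according to claim 7, wherein the label spot-check request includes at least one of the following: resource source information of the resource to be spot-checked, the resource identifier of the resource to be spot-checked, an annotator identifier, key label content, and the country the resource to be spot-checked belongs to.
  9. The system according to claim 7, wherein the label sampling module is further configured to:
    receive the spot checker's label modification information for the resource to be spot-checked, and replace the target label corresponding to that resource in the target annotation resource set with the label modification information as the new target label.
  10. The system according to any one of claims 1-5, further comprising a label query module;
    wherein the label query module is configured to receive a label query request triggered by a querier, obtain from the target annotation resource set the target annotation resources and corresponding target labels that match the label query request, and return them to the querier, the querier being a system user with query permission.
  11. The system according to claim 10, wherein the label query request includes at least one of the following: labeling time information, the resource identifier of the resource to be queried, resource source information of the resource to be queried, and an annotator identifier.
  12. The system according to any one of claims 1-5, further comprising a label adjustment module;
    wherein the label adjustment module is configured to perform adjustment operations on the target label templates in the label template set.
  13. The system according to claim 12, wherein the label adjustment module is configured to:
    receive and parse a label adjustment request triggered by a regulator, the label adjustment request including the label template to be adjusted and the adjustment requirement for that template;
    select from the label template set the target label template that matches the label adjustment request as the label template to be adjusted, and adjust it according to the adjustment requirement;
    the adjustment requirement being at least one of the following: a label renaming request, a label hide/show request, and a label level classification request.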
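  14. The system according to any one of claims 1-5, further comprising a user control module;
    wherein the user control module is configured to perform corresponding control and management of system users according to a control management request triggered by a regulator;
    the system user roles include at least: annotator, spot checker, querier, and regulator;
    the control management request is at least one of the following: a user update management request, a user group creation or update management request, and a user permission management request.
  15. The system according to any one of claims 1-5, further comprising a system configuration management module;
    wherein the system configuration management module is configured to perform configuration management on the items to be configured according to a received system configuration management request;
    the items to be configured include at least one of the following: display interface language configuration items, resource source channel configuration items, and machine labeling network model update configuration items.
  16. The system according to any one of claims 1-5, further comprising an information feedback module;
    the information feedback module is configured to feed back the basic labels of the basic annotation resources in the basic annotation resource set and the target labels of the target annotation resources in the target annotation resource set, as training samples, to a third-party model training management platform, so that the third-party model training management platform uses the training samples to update the model parameters of the machine labeling network model.
  17. The system according to any one of claims 1-5, further comprising a data information statistics module;
    wherein the data information statistics module is configured to compile and store statistics on the information to be counted according to a received data statistics request;
    wherein the information to be counted includes at least one of the following: the number of target annotation resources formed by labeling on the current day, the regional distribution of the target annotation resources formed by labeling on the current day, the annotator's daily labeling volume, and the annotator's labeling accuracy rate.
  18. The system according to claim 17, wherein the statistics on the information to be counted are computed in a distributed manner on a distributed data processing platform.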
PCT/CN2020/099354 2019-10-31 2020-06-30 Resource labeling management system WO2021082499A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911052744.7A CN110825914A (zh) 2019-10-31 2019-10-31 Resource labeling management system
CN201911052744.7 2019-10-31

Publications (1)

Publication Number Publication Date
WO2021082499A1 true WO2021082499A1 (zh) 2021-05-06

Family

ID=69551738

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099354 WO2021082499A1 (zh) 2019-10-31 2020-06-30 Resource labeling management system

Country Status (2)

Country Link
CN (1) CN110825914A (zh)
WO (1) WO2021082499A1 (zh)

Also Published As

Publication number Publication date
CN110825914A (zh) 2020-02-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20880879

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20880879

Country of ref document: EP

Kind code of ref document: A1