WO2021043101A1 - 音频分配方法、装置及存储介质 - Google Patents

音频分配方法、装置及存储介质 Download PDF

Info

Publication number
WO2021043101A1
WO2021043101A1 (PCT/CN2020/112510; CN2020112510W)
Authority
WO
WIPO (PCT)
Prior art keywords
labeling
audio
party
user information
parties
Prior art date
Application number
PCT/CN2020/112510
Other languages
English (en)
French (fr)
Inventor
彭捷
杨益
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021043101A1 publication Critical patent/WO2021043101A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/61Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/686Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311Scheduling, planning or task assignment for a person or group

Definitions

  • This application relates to the field of artificial intelligence technology, and mainly relates to an audio distribution method, device and storage medium.
  • In the prior art, audio labeling tasks are distributed mainly according to task volume: the number of tasks requiring audio labeling is counted first, and the tasks are then distributed evenly according to the number of labeling parties.
  • The inventor realized that different audio labeling tasks correspond to different security levels, so even distribution may lead to inaccurate assignment of audio labeling tasks and thereby affect audio security.
  • The embodiments of the present application provide an audio distribution method, device, and storage medium, which can improve the accuracy and security of assigning audio labeling tasks.
  • In a first aspect, an embodiment of the present application provides an audio distribution method, including:
  • acquiring first user information and audio attributes of audio to be labeled, and acquiring second user information and processing attributes of each of multiple labeling parties;
  • determining the security value of each labeling party from a preset rating list corresponding to the audio attribute according to the first user information and each piece of second user information, where the information in the preset rating list is used to describe the correspondence between the first user information, the second user information, and the security value;
  • selecting, according to the security value of each labeling party, labeling parties whose security value is greater than a first threshold from the multiple labeling parties, to obtain multiple labeling parties to be assigned;
  • selecting a target labeling party from the multiple labeling parties to be assigned according to the audio attribute and the processing attribute of each labeling party to be assigned;
  • assigning the labeling task corresponding to the audio to be labeled to the target labeling party.
  • In a second aspect, an embodiment of the present application provides an audio distribution device, wherein:
  • the processing unit is configured to acquire the first user information and audio attributes of the audio to be labeled, and to acquire the second user information and processing attributes of each of the multiple labeling parties; determine the security value of each labeling party from the preset rating list corresponding to the audio attribute according to the first user information and each piece of second user information, where the information in the preset rating list is used to describe the correspondence between the first user information, the second user information, and the security value; select, according to the security value of each labeling party, labeling parties whose security value is greater than a first threshold from the multiple labeling parties to obtain multiple labeling parties to be assigned; and select a target labeling party from the multiple labeling parties to be assigned according to the audio attribute and the processing attribute of each labeling party to be assigned;
  • the communication unit is configured to allocate the labeling task corresponding to the audio to be labelled to the target labeling party.
  • In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for performing some or all of the steps described in the first aspect.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program causes a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
  • FIG. 1 is a schematic flowchart of an audio distribution method provided by an embodiment of the application
  • FIG. 2 is a schematic structural diagram of an audio distribution device provided by an embodiment of the application.
  • FIG. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • an embodiment of the present application provides a schematic flowchart of an audio distribution method.
  • the audio distribution method is applied to electronic devices.
  • The electronic devices involved in the embodiments of this application may include various handheld devices with wireless communication functions, wearable devices, computing devices or other processing devices connected to wireless modems, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and so on. For ease of description, the devices mentioned above are collectively referred to as electronic devices.
  • an audio distribution method is applied to an electronic device, in which:
  • S101 Acquire first user information and audio attributes of the audio to be labeled, and acquire second user information and processing attributes of each labeling party among a plurality of labeling parties.
  • the audio to be labeled may be an unlabeled audio file, or may be an already labeled audio file used in the training process of the annotating party, which is not limited herein.
  • the first user information of the audio to be annotated refers to the user information of the entry person corresponding to the audio to be annotated, that is, the user information of the person who entered the audio to be annotated.
  • The first user information may include related information such as the hometown, region, age, occupation, gender, educational background, and work experience of the entering person, which is not limited here.
  • the audio attributes of the audio to be labeled may include audio type, audio capacity, audio source, audio content, and so on.
  • the audio capacity is used to describe the data size of the audio to be marked.
  • the audio source is used to describe the upload information of the audio to be marked. For example, if the audio source is a WeChat account, it means that the audio to be marked is the audio input by the entry person in the WeChat application.
  • the audio content may include summary information corresponding to the audio. Audio types can be classified according to application types, such as browsers, instant messaging applications, financial management applications, etc.
  • the audio types can also be classified according to language types, such as: Chinese, English, Mandarin, dialects, etc.
  • the audio type can also be classified according to the input type, such as search, voice chat, etc., or the audio type can also be classified according to the audio content, such as dialogue scenes, identity verification scenes, etc., which are not limited here.
  • the tagger may be a person who is registered in the audio tagging system in the electronic device and can handle audio tagging tasks.
  • the second user information of the tagging party refers to the user information of the tagging party, for example, the hometown, region, age, occupation, gender, education background, work experience, etc. of the tagging party, which are not limited here.
  • the tagging party may also be an electronic device, that is, processing the audio tagging task based on a computer program in the electronic device.
  • the second user information of the tagging party refers to the hardware information of the tagging party, such as capacity, remaining memory size, physical address, network speed, etc., which is not limited here.
  • the processing attributes of the labeling party may include processing audio type, average labeling rate, and so on.
  • the processed audio type includes the audio type that has been trained by the labeling party.
  • the average tagging rate is the average rate of processing audio tagging tasks of the tagging party. Further, different types of audio tagging tasks correspond to different processing efficiencies, and the average tagging rate can be divided into average tagging rates corresponding to each audio type.
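  • For orientation, the sketch below models the inputs that step S101 acquires as plain Python dataclasses. The class and field names (AudioToLabel, LabelingParty, avg_label_rate, and so on) are illustrative assumptions for this sketch, not identifiers from the application.

```python
from dataclasses import dataclass

@dataclass
class AudioToLabel:
    """Audio awaiting labeling (step S101)."""
    first_user_info: dict[str, str]   # entering person: hometown, region, age, occupation, ...
    audio_attributes: dict[str, str]  # audio type, capacity, source, content summary, ...

@dataclass
class LabelingParty:
    """A registered labeling party (a person, or an electronic device described by hardware info)."""
    party_id: str
    second_user_info: dict[str, str]  # region, occupation, ... (or capacity, memory, address, ...)
    processed_audio_types: list[str]  # audio types the party has already been trained on
    avg_label_rate: dict[str, float]  # average labeling rate per audio type
```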
  • S102 According to the first user information and each piece of second user information, determine the security value of each labeling party from the preset rating list corresponding to the audio attribute.
  • In the embodiments of the present application, the security value is used to describe how secure it is for a labeling party to process the audio to be labeled; the greater the security value, the more secure it is for that labeling party to process the audio.
  • The information in the preset rating list is used to describe the correspondence between the first user information, the second user information, and the security value. The preset rating list may describe in detail the various pieces of information that may be encountered, or information derived from the two together, for example, the correlation value between the person who entered the audio to be labeled and the labeling party.
  • For example, suppose the preset rating list corresponding to the audio attribute is as shown in Table 1 below. The preset rating list has two columns, information type and rating criterion, where the rating criterion describes the score assigned to the difference between the first user information and the second user information for that information type.
  • Table 1
        Information type    Rating criterion
        Region              Same region: 0; different region: 2
        Occupation          Same occupation: 0; related occupation: 1; unrelated occupation: 2
  • When the region of the entering person corresponding to the audio to be labeled in the first user information is Shenzhen and the occupation is teacher, while the region in the second user information of the labeling party is Chongqing and the occupation is doctor, the region and occupation scores are looked up in Table 1 and summed, giving a security value of 4.
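  • A minimal sketch of the Table 1 lookup, assuming Python: the per-dimension scores are summed to give the security value. The helper names and the related_jobs mapping are assumptions made for illustration; only the scoring rules and the Shenzhen/Chongqing example come from the description.

```python
def region_score(entry_region: str, party_region: str) -> int:
    # Table 1: same region -> 0, different region -> 2
    return 0 if entry_region == party_region else 2

def occupation_score(entry_job: str, party_job: str,
                     related_jobs: dict[str, set[str]]) -> int:
    # Table 1: same occupation -> 0, related occupation -> 1, unrelated occupation -> 2
    if entry_job == party_job:
        return 0
    return 1 if party_job in related_jobs.get(entry_job, set()) else 2

def security_value(first_info: dict, second_info: dict,
                   related_jobs: dict[str, set[str]]) -> int:
    # Sum the per-dimension scores from the rating list.
    return (region_score(first_info["region"], second_info["region"])
            + occupation_score(first_info["occupation"], second_info["occupation"], related_jobs))

# Example from the description: entering person in Shenzhen (teacher), labeling party in Chongqing (doctor).
print(security_value({"region": "Shenzhen", "occupation": "teacher"},
                     {"region": "Chongqing", "occupation": "doctor"},
                     related_jobs={"teacher": {"professor"}}))  # prints 4
```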
  • In a possible example, the preset rating list includes multiple preset scoring dimensions, and the specific implementation of step S102 includes steps A1-A2, wherein:
  • A1 Determine an evaluation value corresponding to each of the preset scoring dimensions according to the first user information and the second user information.
  • In this example, the preset scoring dimensions may be the various information types between the first user information and the second user information, and may also include related information derived from those information types, for example: the correlation value between the entering person corresponding to the audio to be labeled and the labeling party, the distance between the entering person and the labeling party, the similarity value between the entering person and the labeling party, and so on.
  • A2. Determine the safety value of each labeling party according to the preset weight and evaluation value corresponding to each of the preset scoring dimensions.
  • In this example, the weights corresponding to different preset scoring dimensions can be set in advance. For example, when the preset scoring dimension is the correlation value between the entering person and the labeling party, the corresponding preset weight is 0.5; when the preset scoring dimension is the distance between the entering person and the labeling party, the corresponding preset weight is 0.2; when the preset scoring dimension is the similarity value between the entering person and the labeling party, the corresponding preset weight is 0.3; and so on.
  • In this example, the preset weight and evaluation value corresponding to each preset scoring dimension may be combined by weighted summation to obtain the security value of each labeling party.
  • For example, suppose the preset rating list corresponding to the audio attribute is as shown in Table 2. According to Table 2, when the correlation value between the entering person and the labeling party is 0.3, the corresponding evaluation value is 2; when the distance between the entering person and the labeling party is 20,000 meters, the evaluation value is 3; and when the similarity value between the entering person and the labeling party is 0.5, the evaluation value is 3. Assuming the preset weights corresponding to the correlation value, the distance, and the similarity value are 0.5, 0.2, and 0.3 respectively, the weighted sum of the preset weights and evaluation values over the preset scoring dimensions is 0.5*2+0.2*3+0.3*3, giving a security value of 2.5.
  • It can be understood that in steps A1 and A2, the evaluation value corresponding to each preset scoring dimension is determined according to the first user information and the second user information, and is combined with the preset weight corresponding to each scoring dimension to determine the security value of each labeling party, which improves the accuracy of determining the security value.
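  • The weighted variant of steps A1-A2 can be sketched as follows, assuming Python. The per-dimension bucketing functions are invented for illustration and are chosen only so that the worked example above (correlation 0.3 -> 2, distance 20,000 m -> 3, similarity 0.5 -> 3, weights 0.5/0.2/0.3) reproduces the security value of 2.5; the application does not specify those mappings.

```python
# Hypothetical per-dimension evaluation lookups; bucket boundaries are assumptions.
def correlation_eval(v: float) -> int:
    return 1 if v < 0.2 else 2 if v < 0.5 else 3

def distance_eval(meters: float) -> int:
    return 1 if meters < 1_000 else 2 if meters < 10_000 else 3

def similarity_eval(v: float) -> int:
    return 1 if v < 0.3 else 2 if v < 0.4 else 3

PRESET_WEIGHTS = {"correlation": 0.5, "distance": 0.2, "similarity": 0.3}  # weights from the example

def weighted_security_value(correlation: float, distance_m: float, similarity: float) -> float:
    evaluations = {
        "correlation": correlation_eval(correlation),
        "distance": distance_eval(distance_m),
        "similarity": similarity_eval(similarity),
    }
    # Steps A1-A2: weighted sum of the evaluation value of every preset scoring dimension.
    return sum(PRESET_WEIGHTS[dim] * evaluations[dim] for dim in PRESET_WEIGHTS)

print(weighted_security_value(0.3, 20_000, 0.5))  # 0.5*2 + 0.2*3 + 0.3*3 = 2.5
```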
  • S103 According to the security value of each labeling party, select labeling parties whose security value is greater than a first threshold from the multiple labeling parties, to obtain multiple labeling parties to be assigned.
  • the first threshold is not limited.
  • the method further includes: determining an audio type according to the audio attribute, and using a preset labeling duration corresponding to the audio type as the first threshold.
  • This application can directly obtain the audio type from the audio attributes, and can also determine the audio type according to the audio content and/or audio scene, and can also determine the audio type according to the application type and/or input type. It can be understood that the audio attribute may reflect the audio type, and determining the audio type of the audio to be labeled according to the audio attribute can improve the accuracy of determining the audio type.
  • the preset marking duration corresponding to the audio type of the audio to be marked is used as the first threshold. In this way, different labeling parties to be assigned can be selected according to the audio type, which improves the accuracy of selecting the labeling parties to be assigned.
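  • Step S103 then reduces to a filter over the candidate labeling parties, as sketched below in Python. The per-audio-type values in PRESET_FIRST_THRESHOLD are assumptions; the description only states that a preset value (a preset labeling duration) corresponding to the audio type may serve as the first threshold.

```python
PRESET_FIRST_THRESHOLD = {"English": 2.0, "Mandarin": 1.5, "dialect": 3.0}  # assumed values

def select_candidates(security_values: dict[str, float], audio_type: str,
                      default_threshold: float = 2.0) -> list[str]:
    """Step S103: keep labeling parties whose security value exceeds the first threshold."""
    threshold = PRESET_FIRST_THRESHOLD.get(audio_type, default_threshold)
    return [party_id for party_id, value in security_values.items() if value > threshold]

print(select_candidates({"party_1": 2.5, "party_2": 1.0, "party_3": 4.0}, "English"))
# ['party_1', 'party_3']
```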
  • S104 Select a target labeling party from the multiple labeling parties to be allocated according to the audio attribute and the processing attribute of each labeling party to be allocated.
  • The target labeling party is the labeling party to which the labeling task corresponding to the audio to be labeled is to be assigned; that is, the target labeling party processes the labeling task after receiving it. It can be understood that selecting the target labeling party based on the audio attributes, the security value of each labeling party, and the processing attributes can improve the security and processing efficiency of the labeling task corresponding to the audio to be labeled.
  • This application does not limit the method for selecting the target labeling party. In a possible example, the specific implementation of step S104 includes steps B1-B4, wherein:
  • B1. Obtain the labeling progress corresponding to each labeling party to be assigned.
  • Here, the labeling progress is the progress of the currently assigned audio tasks completed by the labeling party to be assigned.
  • This application does not limit the method for obtaining the marking progress.
  • the specific implementation of step B1 includes steps B11-B14, where:
  • the distribution list is used to record the audio allocated to each labeling party to be allocated, and the first user information and audio attributes of each allocated audio.
  • the average labeling rate is used to describe the labeling efficiency of each labeling party to be allocated, which can be obtained by analyzing the audio capacity and completion time of each labeling party to be allocated.
  • the size of the labeled data is used to describe the task volume of the allocated audio, which can be obtained through the capacity of each allocated audio.
  • It can be understood that in steps B11-B14, the allocation list and the average labeling rate of each labeling party to be assigned are obtained first, the labeled data size corresponding to each labeling party to be assigned is then obtained from each allocation list, and the labeling progress of each labeling party to be assigned is finally obtained from its labeled data size and average labeling rate. In this way, obtaining the labeling progress from the already assigned labeling tasks and the average labeling rate of the labeling party to be assigned improves the accuracy of the obtained labeling progress.
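  • A rough Python sketch of steps B11-B14 follows. The description only says that the labeling progress is obtained from the labeled data size in the allocation list and the pre-stored average labeling rate; the elapsed-time estimate below is an assumption about how those two quantities might be combined, not the patented formula.

```python
import time
from typing import Optional

def labeling_progress(allocation_list: list[dict], avg_rate_mb_per_hour: float,
                      now: Optional[float] = None) -> float:
    """Estimate the progress (0..1) of a labeling party's currently assigned audio tasks."""
    now = time.time() if now is None else now
    total_size_mb = sum(item["size_mb"] for item in allocation_list)   # B13: labeled data size
    if total_size_mb == 0:
        return 1.0
    earliest = min(item["assigned_at"] for item in allocation_list)
    elapsed_hours = max(0.0, (now - earliest) / 3600.0)
    # B14: combine labeled data size with the average labeling rate (assumed combination).
    return min(1.0, (elapsed_hours * avg_rate_mb_per_hour) / total_size_mb)
```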
  • B2. Determine the allocation probability of each labeling party to be allocated according to the audio attribute and the processing attribute of each labeling party to be allocated.
  • the distribution probability is used to describe the probability of each party to be assigned to process the to-be-annotated audio. Specifically, it can be obtained according to the service type required by the audio attribute and the service capability in the processing attributes of the party to be assigned.
  • For example, suppose the multiple labeling parties to be assigned include a first labeling party to be assigned, a second labeling party to be assigned, and a third labeling party to be assigned, and the audio attribute is English. The average labeling rate of the first labeling party to be assigned for processing English audio is 2 words per minute, that of the second labeling party to be assigned is 5 words per minute, and that of the third labeling party to be assigned is 4 words per minute. In this case, the allocation probability of the first labeling party to be assigned may be determined to be 0.5, that of the second labeling party to be assigned 0.8, and that of the third labeling party to be assigned 0.7.
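  • The mapping from the per-type average labeling rate to an allocation probability is not given in the description; the linear form in the sketch below is only an assumption that happens to reproduce the example figures (2 -> 0.5, 4 -> 0.7, 5 -> 0.8 words per minute).

```python
def allocation_probability(avg_rate_for_type: float) -> float:
    """Step B2: map a party's average labeling rate for the required audio type to an allocation probability.

    The linear mapping below is assumed for illustration; it is clipped to [0, 1].
    """
    return max(0.0, min(1.0, 0.3 + 0.1 * avg_rate_for_type))

rates = {"party_1": 2.0, "party_2": 5.0, "party_3": 4.0}  # words per minute for English audio
print({p: allocation_probability(r) for p, r in rates.items()})
# approximately {'party_1': 0.5, 'party_2': 0.8, 'party_3': 0.7}
```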
  • the evaluation value is used to describe the sequence of assigning the audio to be labeled to the party to be assigned.
  • This application does not limit the method for determining the evaluation value.
  • For example, the weights corresponding to the labeling progress and the allocation probability can be set separately, and the evaluation value of each labeling party to be assigned can then be obtained as the weighted combination of its labeling progress and allocation probability. Suppose the labeling progress of a labeling party to be assigned is 60% and its allocation probability is 0.5; when the weights corresponding to the labeling progress and the allocation probability are both 0.5, the evaluation value is 0.55.
  • It can be understood that in steps B1-B4, the evaluation value of each labeling party to be assigned is determined according to the labeling progress and allocation probability corresponding to that labeling party, and the labeling party to be assigned corresponding to the maximum evaluation value is taken as the target labeling party. In this way, labeling efficiency can be improved.
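  • Steps B3-B4 can then be sketched as a weighted combination followed by an argmax, assuming Python. The equal 0.5/0.5 weights and the 60% / 0.5 example come from the description; the extra candidates are made up to show the selection.

```python
def evaluation_value(progress: float, alloc_prob: float,
                     w_progress: float = 0.5, w_prob: float = 0.5) -> float:
    # Step B3: weighted combination of labeling progress and allocation probability.
    # With progress 0.6, probability 0.5 and equal weights: 0.5*0.6 + 0.5*0.5 = 0.55, as in the example.
    return w_progress * progress + w_prob * alloc_prob

def pick_target_party(candidates: dict[str, tuple[float, float]]) -> str:
    """Step B4: the candidate with the maximum evaluation value becomes the target labeling party."""
    return max(candidates, key=lambda pid: evaluation_value(*candidates[pid]))

candidates = {"party_1": (0.60, 0.5), "party_2": (0.40, 0.8), "party_3": (0.75, 0.7)}
print(pick_target_party(candidates))  # 'party_3' (0.5*0.75 + 0.5*0.7 = 0.725)
```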
  • S105 Assign a labeling task corresponding to the audio to be labelled to the target labeling party.
  • It can be understood that in the audio distribution method shown in FIG. 1, the first user information and audio attributes of the audio to be labeled, and the second user information and processing attributes of each of the multiple labeling parties, are acquired first. The security value of each labeling party is then determined from the preset rating list corresponding to the audio attribute according to the first user information and each piece of second user information, and the labeling parties whose security value is greater than the first threshold are taken as the labeling parties to be assigned. The target labeling party is then determined according to the audio attributes of the audio to be labeled and the processing attributes of each labeling party to be assigned, and the labeling task corresponding to the audio to be labeled is assigned to the target labeling party. In this way, the accuracy and security of assigning audio labeling tasks can be improved.
  • In a possible example, the specific implementation of step S105 includes step C1 and step C2, wherein C1 separates the audio to be labeled to obtain multiple audio segments, and C2 assigns the labeling tasks corresponding to the multiple audio segments to the target labeling party.
  • The audio to be labeled may be separated by voiceprint recognition, that is, by identifying the users in the audio to be labeled so that each audio segment corresponds to one user.
  • The audio to be labeled may also be separated by channel separation, that is, by grouping the audio picked up by different pickup devices: for example, two channels yield 2 audio segments and three channels yield 3 audio segments, which is not limited here.
  • In a possible example, the audio attribute includes the audio type, and the specific implementation of step C1 includes steps C11-C13, wherein:
  • Speech recognition technology converts the vocabulary content of human speech into computer-readable input, such as keystrokes, binary codes, or character sequences.
  • In this example, the segmentation can be performed according to sentence completeness, that is, each complete passage of text forms one text segment.
  • It can be understood that in steps C11-C13, speech recognition is first performed on the audio to be labeled to obtain text information, and the text information is then segmented to obtain multiple text segments, which improves the accuracy of the text segmentation. The audio to be labeled is then separated according to the time information of each text segment to obtain multiple audio segments, thereby improving the accuracy of the audio segmentation.
  • It can be understood that in steps C1 and C2, the audio to be labeled is separated to obtain multiple audio segments, and the labeling tasks corresponding to the multiple audio segments are then assigned to the target labeling party. In this way, the target labeling party can label each audio segment individually while taking the surrounding context into account, which helps improve labeling efficiency and accuracy.
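  • A sketch of the step C13 slicing, assuming Python and assuming that some speech recognizer (not specified in the application) has already produced time-stamped text segments: each segment's time information is used to cut the audio to be labeled into one clip per segment.

```python
def split_audio_by_text_segments(samples: list[float], sample_rate: int,
                                 text_segments: list[tuple[float, float, str]]) -> list[dict]:
    """Cut the audio to be labeled into one clip per recognized text segment (start_sec, end_sec, text).

    How the speech recognition (C11) and sentence segmentation (C12) are implemented is not specified
    in the source; only the final slicing by each segment's time information (C13) is shown here.
    """
    clips = []
    for start_sec, end_sec, text in text_segments:
        lo = max(0, int(start_sec * sample_rate))
        hi = min(len(samples), int(end_sec * sample_rate))
        clips.append({"text": text, "start": start_sec, "end": end_sec, "samples": samples[lo:hi]})
    return clips
```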
  • In a possible example, after step S105, steps D1-D3 may also be performed, wherein:
  • the target annotation file is a file obtained by the target annotator who annotates the audio to be marked.
  • the target annotation file may include the text translation, speech rate, emotion, role, gender, identity, etc. of the audio to be annotated, which is not limited here.
  • the reference mark file is a pre-stored standard mark file.
  • the recognition rate is used to describe the recognition accuracy rate of the target annotation file.
  • the second threshold in this application is not limited, and can be set according to training.
  • It can be understood that in steps D1-D3, the target annotation file sent by the target labeling party through its labeling device is received, and the target annotation file is compared with the reference annotation file to obtain a recognition rate. The recognition rate is then compared with the second threshold; if it is less than the second threshold, prompt information is sent to the labeling device to prompt the target labeling party to relabel the audio to be labeled. In this way, the labeling capability of the target labeling party is improved through verification.
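  • The verification in steps D1-D3 can be sketched as below in Python. How the recognition rate is computed and the value of the second threshold are not specified in the application, so the word-overlap measure and the 0.8 threshold here are assumptions.

```python
from typing import Optional

def recognition_rate(target_annotation: str, reference_annotation: str) -> float:
    """Step D2 sketch: compare the target annotation file with the reference annotation file.

    Simple word-level overlap against the reference is used purely as an illustration.
    """
    ref_words = reference_annotation.split()
    tgt_words = set(target_annotation.split())
    if not ref_words:
        return 1.0
    return sum(w in tgt_words for w in ref_words) / len(ref_words)

SECOND_THRESHOLD = 0.8  # assumed value; the source leaves the second threshold unspecified

def verify_annotation(target_annotation: str, reference_annotation: str) -> Optional[str]:
    """Steps D1-D3: return a prompt message for the labeling device if the rate is too low."""
    if recognition_rate(target_annotation, reference_annotation) < SECOND_THRESHOLD:
        return "Recognition rate below threshold: please relabel the audio to be labeled."
    return None
```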
  • FIG. 2 is a schematic structural diagram of an audio distribution device provided by an embodiment of the present application, and the device is applied to an electronic device. As shown in FIG. 2, the above-mentioned audio distribution device 200 includes:
  • the processing unit 201 is configured to acquire the first user information and audio attributes of the audio to be labeled, and to acquire the second user information and processing attributes of each of the multiple labeling parties; determine the security value of each labeling party from the preset rating list corresponding to the audio attribute according to the first user information and each piece of second user information, where the information in the preset rating list is used to describe the correspondence between the first user information, the second user information, and the security value; select, according to the security value of each labeling party, labeling parties whose security value is greater than the first threshold from the multiple labeling parties to obtain multiple labeling parties to be assigned; and select a target labeling party from the multiple labeling parties to be assigned according to the audio attributes and the processing attributes of each labeling party to be assigned;
  • the communication unit 202 is configured to allocate the labeling task corresponding to the audio to be labelled to the target labeling party.
  • It can be understood that the first user information and audio attributes of the audio to be labeled, and the second user information and processing attributes of each of the multiple labeling parties, are acquired first. Then, according to the first user information and each piece of second user information, the security value of each labeling party is determined from the preset rating list corresponding to the audio attribute, and the labeling parties whose security value is greater than the first threshold are taken as the labeling parties to be assigned. Then, the target labeling party is determined according to the audio attributes of the audio to be labeled and the processing attributes of each labeling party to be assigned, and the labeling task corresponding to the audio to be labeled is assigned to the target labeling party. In this way, the accuracy and security of assigning audio labeling tasks can be improved.
  • In a possible example, in terms of selecting the target labeling party from the multiple labeling parties to be assigned according to the audio attribute and the processing attribute of each labeling party to be assigned, the processing unit 201 is specifically configured to: obtain the labeling progress corresponding to each labeling party to be assigned, to obtain multiple labeling progresses; determine the allocation probability of each labeling party to be assigned according to the audio attribute and the processing attribute of each labeling party to be assigned; determine the evaluation value of each labeling party to be assigned according to the labeling progress and allocation probability corresponding to each labeling party to be assigned, to obtain multiple evaluation values; and take the labeling party to be assigned corresponding to the maximum of the multiple evaluation values as the target labeling party.
  • In a possible example, in terms of obtaining the labeling progress corresponding to each labeling party to be assigned, the processing unit 201 is specifically configured to: obtain the allocation list corresponding to each labeling party to be assigned, to obtain multiple allocation lists; obtain the pre-stored average labeling rate corresponding to each labeling party to be assigned, to obtain multiple average labeling rates; obtain the labeled data size corresponding to each labeling party to be assigned according to the multiple allocation lists, to obtain multiple labeled data sizes; and obtain the labeling progress corresponding to each labeling party to be assigned according to the multiple labeled data sizes and the multiple average labeling rates, to obtain multiple labeling progresses.
  • In a possible example, the preset rating list includes multiple preset scoring dimensions, and in terms of determining the security value of each labeling party from the preset rating list corresponding to the audio attribute according to the first user information and each piece of second user information, the processing unit 201 is specifically configured to: determine the evaluation value corresponding to each preset scoring dimension according to the first user information and the second user information; and determine the security value of each labeling party according to the preset weight and evaluation value corresponding to each preset scoring dimension.
  • In a possible example, the processing unit 201 is further configured to separate the audio to be labeled to obtain multiple audio segments; the communication unit 202 is specifically configured to assign the labeling tasks corresponding to the multiple audio segments to the target labeling party.
  • In a possible example, in terms of separating the audio to be labeled to obtain multiple audio segments, the processing unit 201 is specifically configured to: perform speech recognition on the audio to be labeled to obtain text information; segment the text information to obtain multiple text segments; and separate the audio to be labeled according to the time information of each text segment to obtain multiple audio segments.
  • In a possible example, after the labeling task corresponding to the audio to be labeled is assigned to the target labeling party, the communication unit 202 is further configured to receive the target annotation file sent by the labeling device corresponding to the target labeling party in response to the labeling task; the processing unit 201 is further configured to compare the target annotation file with the reference annotation file corresponding to the audio to be labeled, to obtain a recognition rate; and the communication unit 202 is further configured to send prompt information to the labeling device if the recognition rate is less than a second threshold, where the prompt information is used to prompt the target labeling party to relabel the audio to be labeled.
  • FIG. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • As shown in FIG. 3, the electronic device 300 includes a processor 310, a memory 320, a communication interface 330, and one or more programs 340, where the one or more programs 340 are stored in the memory 320 and configured to be executed by the processor 310, and the program 340 includes instructions for executing the following steps:
  • acquiring first user information and audio attributes of audio to be labeled, and acquiring second user information and processing attributes of each of multiple labeling parties;
  • determining the security value of each labeling party from the preset rating list corresponding to the audio attribute according to the first user information and each piece of second user information, where the information in the preset rating list is used to describe the correspondence between the first user information, the second user information, and the security value;
  • selecting, according to the security value of each labeling party, labeling parties whose security value is greater than a first threshold from the multiple labeling parties, to obtain multiple labeling parties to be assigned;
  • selecting a target labeling party from the multiple labeling parties to be assigned according to the audio attribute and the processing attribute of each labeling party to be assigned;
  • assigning the labeling task corresponding to the audio to be labeled to the target labeling party.
  • It can be understood that the security value of each labeling party is first determined from the preset rating list corresponding to the audio attribute according to the first user information of the audio to be labeled and the second user information of each labeling party, and the labeling parties whose security value is greater than the first threshold are then taken as the labeling parties to be assigned. The target labeling party is then determined according to the audio attributes of the audio to be labeled and the processing attributes of each labeling party to be assigned, and the labeling task corresponding to the audio to be labeled is assigned to the target labeling party. In this way, the accuracy and security of assigning audio labeling tasks can be improved.
  • In a possible example, in terms of selecting a target labeling party from the multiple labeling parties to be assigned according to the audio attribute and the processing attribute of each labeling party to be assigned, the program 340 specifically includes instructions for: obtaining the labeling progress corresponding to each labeling party to be assigned, to obtain multiple labeling progresses; determining the allocation probability of each labeling party to be assigned according to the audio attribute and the processing attribute of each labeling party to be assigned; determining the evaluation value of each labeling party to be assigned according to the labeling progress and allocation probability corresponding to each labeling party to be assigned, to obtain multiple evaluation values; and taking the labeling party to be assigned corresponding to the maximum of the multiple evaluation values as the target labeling party.
  • In a possible example, in terms of obtaining the labeling progress corresponding to each labeling party to be assigned, the program 340 specifically includes instructions for: obtaining the allocation list corresponding to each labeling party to be assigned, to obtain multiple allocation lists; obtaining the pre-stored average labeling rate corresponding to each labeling party to be assigned, to obtain multiple average labeling rates; obtaining the labeled data size corresponding to each labeling party to be assigned according to the multiple allocation lists, to obtain multiple labeled data sizes; and obtaining the labeling progress corresponding to each labeling party to be assigned according to the multiple labeled data sizes and the multiple average labeling rates, to obtain multiple labeling progresses.
  • In a possible example, the preset rating list includes multiple preset scoring dimensions, and in terms of determining the security value of each labeling party from the preset rating list corresponding to the audio attribute according to the first user information and each piece of second user information, the program 340 specifically includes instructions for: determining the evaluation value corresponding to each preset scoring dimension according to the first user information and the second user information; and determining the security value of each labeling party according to the preset weight and evaluation value corresponding to each preset scoring dimension.
  • In a possible example, in terms of assigning the labeling task corresponding to the audio to be labeled to the target labeling party, the program 340 specifically includes instructions for: separating the audio to be labeled to obtain multiple audio segments; and assigning the labeling tasks corresponding to the multiple audio segments to the target labeling party.
  • In a possible example, in terms of separating the audio to be labeled to obtain multiple audio segments, the program 340 specifically includes instructions for: performing speech recognition on the audio to be labeled to obtain text information; segmenting the text information to obtain multiple text segments; and separating the audio to be labeled according to the time information of each text segment to obtain multiple audio segments.
  • In a possible example, after the labeling task corresponding to the audio to be labeled is assigned to the target labeling party, the program 340 further includes instructions for: receiving the target annotation file sent by the labeling device corresponding to the target labeling party in response to the labeling task; comparing the target annotation file with the reference annotation file corresponding to the audio to be labeled, to obtain a recognition rate; and, if the recognition rate is less than a second threshold, sending prompt information to the labeling device, where the prompt information is used to prompt the target labeling party to relabel the audio to be labeled.
  • An embodiment of the present application also provides a computer storage medium, where the computer storage medium stores a computer program, and the computer program causes a computer to execute some or all of the steps of any method recorded in the method embodiments; the computer includes an electronic device.
  • the computer-readable storage medium may be non-volatile or volatile.
  • An embodiment of the present application also provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any method described in the method embodiments. The computer program product may be a software installation package, and the computer includes an electronic device.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are only illustrative, for example, the division of units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components can be combined or integrated into Another system, or some features can be ignored, or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • The above-mentioned integrated unit can be implemented in the form of hardware or in the form of a software program.
  • If the integrated unit is implemented in the form of a software program and sold or used as an independent product, it can be stored in a computer-readable memory.
  • the technical solution of the present application essentially or the part that contributes to the existing technology or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a memory, A number of instructions are included to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned memory includes: U disk, read-only memory (read-only memory, ROM), random access memory (random access memory, RAM), mobile hard disk, magnetic disk, or optical disk and other media that can store program codes.
  • the program can be stored in a computer-readable memory, and the memory can include: a flash disk, ROM, RAM, magnetic disk, or optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Library & Information Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Software Systems (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)
  • Telephonic Communication Services (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An audio distribution method, device, and storage medium. The method includes: acquiring first user information and audio attributes of audio to be labeled, and acquiring second user information and processing attributes of each of multiple labeling parties (S101); determining, according to the first user information and each piece of second user information, a security value of each labeling party from a preset rating list corresponding to the audio attributes (S102); selecting, according to the security value of each labeling party, labeling parties whose security value is greater than a first threshold from the multiple labeling parties, to obtain multiple labeling parties to be assigned (S103); selecting a target labeling party from the multiple labeling parties to be assigned according to the audio attributes and the processing attributes of each labeling party to be assigned (S104); and assigning a labeling task corresponding to the audio to be labeled to the target labeling party (S105). The method can improve the accuracy and security of assigning audio labeling tasks.

Description

音频分配方法、装置及存储介质
本申请要求于2019.09.02日提交中国专利局、申请号为201910826025.X,发明名称为“音频分配方法、装置及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能技术领域,主要涉及了一种音频分配方法、装置及存储介质。
背景技术
在现有技术中,音频标注任务基本上是基于任务量需求进行分发的,即首先统计需要进行音频标注的任务数量,再根据标注方的数量对需要进行音频标注的任务进行平均分发。发明人意识到,不同的音频标注任务对应的安全等级不同,平均分发可能导致音频标注任务分配的不准确,从而影响音频的安全性。
发明内容
本申请实施例提供了一种音频分配方法、装置及存储介质,可提高分配音频标注任务的准确性和安全性。
第一方面,本申请实施例提供一种音频分配方法,包括:
获取待标注音频的第一用户信息和音频属性,以及获取多个标注方中每一标注方的第二用户信息和处理属性;
根据所述第一用户信息和每一所述第二用户信息,从所述音频属性对应的预设评分列表中确定每一所述标注方的安全值;所述预设评分列表中的信息用于描述所述第一用户信息、所述第二用户信息以及所述安全值之间的对应关系;
根据每一所述标注方的安全值,从所述多个标注方中选取安全值大于第一阈值的标注方,以得到多个待分配标注方;
根据所述音频属性和每一所述待分配标注方的处理属性,从所述多个待分配标注方中选取目标标注方;
将所述待标注音频对应的标注任务分配给所述目标标注方。
第二方面,本申请实施例提供一种音频分配装置,其中:
处理单元,用于获取待标注音频的第一用户信息和音频属性,以及获取多个标注方中每一标注方的第二用户信息和处理属性;根据所述第一用户信息和每一所述第二用户信息,从所述音频属性对应的预设评分列表中确定每一所述标注方的安全值;所述预设评分列表中的信息用于描述所述第一用户信息、所述第二用户信息以及所述安全值之间的对应关系;根据每一所述标注方的安全值,从所述多个标注方中选取安全值大于第一阈值的标注方,以得到多个待分配标注方;根据所述音频属性和每一所述待分配标注方的处理属性,从所述多个待分配标注方中选取目标标注方;
通信单元,用于将所述待标注音频对应的标注任务分配给所述目标标注方。
第三方面,本申请实施例提供一种电子设备,包括处理器、存储器、通信接口以及一 个或多个程序,其中,上述一个或多个程序被存储在上述存储器中,并且被配置由上述处理器执行,所述程序包括用于如第一方面中所描述的部分或全部步骤的指令。
第四方面,本申请实施例提供了一种计算机可读存储介质,其中,所述计算机可读存储介质存储计算机程序,其中,所述计算机程序使得计算机执行如本申请实施例第一方面中所描述的部分或全部步骤。
附图说明
图1为本申请实施例提供的一种音频分配方法的流程示意图;
图2为本申请实施例提供的一种音频分配装置的结构示意图;
图3为本申请实施例提供的一种电子设备的结构示意图。
具体实施方式
为了使本技术领域的人员更好地理解本申请方案,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。根据本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其他步骤或单元。
在本文中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本申请的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本文所描述的实施例可以与其它实施例相结合。
下面对本申请实施例进行详细介绍。
请参照图1,本申请实施例提供一种音频分配方法的流程示意图。该音频分配方法应用于电子设备,本申请实施例所涉及到的电子设备可以包括各种具有无线通信功能的手持设备、可穿戴设备、计算设备或连接到无线调制解调器的其他处理设备,以及各种形式的用户设备(user equipment,UE),移动台(mobile station,MS),终端设备(terminal device)等等。为方便描述,上面提到的设备统称为电子设备。
具体的,如图1所示,一种音频分配方法,应用于电子设备,其中:
S101:获取待标注音频的第一用户信息和音频属性,以及获取多个标注方中每一标注方的第二用户信息和处理属性。
在本申请实施例中,待标注音频可以为未进行标注的音频文件,也可以是用于标注方的训练过程中使用的已经标注完成的音频文件,在此不做限定。
待标注音频的第一用户信息是指该待标注音频对应的录入人员的用户信息,也就是说, 录入该待标注音频的人员的用户信息。该第一用户信息可以包括该录入人员的籍贯、所在地区、年龄、职业、性别、教育背景、工作经历等相关信息,在此不做限定。
待标注音频的音频属性可包括音频类型、音频容量、音频来源、音频内容等。其中,音频容量用于描述待标注音频的数据大小。音频来源用于描述待标注音频的上传信息,例如:音频来源为微信账号,则表示该待标注音频为录入人员在微信应用中输入的音频。音频内容可包括音频对应的摘要信息。音频类型可以按照应用类型进行分类,例如:浏览器、即时通讯应用、金融管理应用等。该音频类型也可按照语种类型进行分类,例如:中文、英语、普通话、方言等。该音频类型还可以按照输入类型进行分类,例如:搜索、语音聊天等,或者音频类型还可以按照音频内容进行分类,例如:对话场景、身份验证场景等,在此也不做限定。
在本申请实施例中,标注方可以是在电子设备中音频标注系统中注册,且可处理音频标注任务的人员。该标注方的第二用户信息是指该标注方的用户信息,例如,该标注方的籍贯、所在地区、年龄、职业、性别、教育背景、工作经历等,在此不做限定。
在本申请实施例中,标注方也可以是电子设备,即基于电子设备中的计算机程序处理音频标注任务。该标注方的第二用户信息是指该标注方的硬件信息,例如,容量、剩余内存大小、物理地址、网络速度等,在此也不做限定。
标注方的处理属性可包括处理音频类型、平均标注速率等。其中,处理音频类型包括标注方已训练完成的音频类型。平均标注速率为该标注方的处理音频标注任务的平均速率。进一步的,不同类型的音频标注任务对应的处理效率不同,该平均标注速率可分为各个音频类型对应的平均标注速率。
S102:根据所述第一用户信息和每一所述第二用户信息,从所述音频属性对应的预设评分列表中确定每一所述标注方的安全值。
在本申请实施例中,安全值用于描述标注方处理待标注音频的安全性,安全值越大,则标注方处理该待标注音频越安全。预设评分列表中的信息用于描述所述第一用户信息、所述第二用户信息以及所述安全值之间的对应关系。其中,预设评分列表可详细描述了各种可能遇到的信息,或者两者对应的信息,例如,待标注音频对应的录入人员和标注方之间的关联值。
举例来说,假设与音频属性对应的预设评分列表如下表1所示,预设评分列表可分为评分标准和信息类型两项,该评分标准描述了第一用户信息和第二用户信息之间所在地区和职业对应的评分值。当第一用户信息中待标注音频对应的录入人员的所在地区为深圳,职业为教师,且第二用户信息中标注方所在地区为重庆,职业为医生时,则根据表1将所在地区和职业对应的评分值进行求和得到安全值为4。
表1
信息类型 评分标准
所在地区 同一地区为0,不同地区为2
职业 同一职业为0,相关职业为1,不相关职业为2
在一种可能的示例中,所述预设评分列表包括多个预设评分维度,步骤S102的具体实施方式包括步骤A1-A2,其中:
A1、根据所述第一用户信息和所述第二用户信息,确定每一所述预设评分维度对应的评价值。
在该示例中,预设评分维度可以是第一用户信息和第二用户信息之间的各项信息类型,也可包括各项信息类型对应的关联信息,例如:待标注音频对应的录入人员和标注方之间的关联值,录入人员和标注方之间的距离,录入人员和标注方之间的相似值等。
A2、根据每一所述预设评分维度对应的预设权值和评价值,确定每一所述标注方的安全值。
在该示例中,可预先设置不同预设评分维度对应的权值,例如,当预设评分维度为录入人员和标注方之间的关联值时,该预设评分维度对应的预设权值为0.5。当预设评分维度为录入人员和标注方之间的距离时,该预设评分维度对应的预设权值为0.2。当预设评分维度为录入人员和标注方之间的相似值时,该预设评分维度对应的预设权值为0.3等。
在该示例中,可对每一所述预设评分维度对应的预设权值和评价值进行加权求和,以得到每一标注方的安全值。举例来说,假设与音频属性对应的预设评分列表如下表2所示,根据表2可知,当录入人员和标注方之间的关联值为0.3时,对应的评价值为2。当录入人员和标注方之间的距离为2万米时,对应的评价值为3。当录入人员和标注方之间的相似值为0.5时,对应的评价值为3。假设录入人员和标注方之间的关联值对应的预设权值为0.5,录入人员和标注方之间的距离对应的预设权值为0.2,录入人员和标注方之间的相似值对应的预设权值为0.3,则对每一所述预设评分维度对应的预设权值和评价值进行加权求和,即0.5*2+0.2*3+0.3*3,可得到安全值为2.5。
表2
表2（原文为图片：Figure PCTCN2020112510-appb-000001）
可以理解,在步骤A1和步骤A2中,根据第一用户信息和第二用户信息确定每一预设 评分维度对应的评价值,再结合每一评分维度对应的预设权值确定各个标注方的安全值,提高了确定安全值的准确性。
S103:根据每一所述标注方的安全值,从所述多个标注方中选取安全值大于第一阈值的标注方,以得到多个待分配标注方。
在本申请实施例中,第一阈值不做限定。在一种可能的示例中,所述方法还包括:根据所述音频属性确定音频类型,将所述音频类型对应的预设标注时长作为所述第一阈值。
本申请可直接从音频属性中获取音频类型,还可根据音频内容和/或音频场景进行确定音频类型,也可按照应用类型和/或输入类型进行确定音频类型。可以理解,音频属性可体现音频类型,根据音频属性确定待标注音频的音频类型,可提高确定音频类型的准确性。
可以理解,在该可能的示例中,根据待标注音频的音频类型对应的预设标注时长作为第一阈值。如此,可依据音频类型选取不同的待分配标注方,提高了选取待分配标注方的准确性。
S104:根据所述音频属性和每一所述待分配标注方的处理属性,从所述多个待分配标注方中选取目标标注方。
在本申请实施例中,目标标注方为待分配待标注音频对应的标注任务对应的标注方,即该目标标注方在接收该标注任务之后,处理该标注任务。可以理解,根据音频属性、每一标注方的安全值和处理属性选取目标标注方,可提高处理待标注音频对应的标注任务的安全性和处理效率。
本申请对于选取目标标注方的方法不做限定,在一种可能的示例中,步骤S104的具体实施方式包括步骤B1-B5,其中:
B1、获取每一所述待分配标注方对应的标注进度。
其中,标注进度为待分配标注方完成当前音频任务的进度。本申请对于获取标注进度的方法不做限定,在一种可能的示例中,步骤B1的具体实施方式包括步骤B11-B14,其中:
B11、获取每一所述待分配标注方对应的分配列表,以得到多个分配列表。
其中,分配列表用于记录为各个待分配标注方所分配的音频,以及各个已分配音频的第一用户信息和音频属性。
B12、获取预先存储的每一所述待分配标注方对应的平均标注速率,以得到多个平均标注速率。
其中,平均标注速率用于描述各个待分配标注方的标注效率,可通过各个待分配标注方的音频容量以及完成时间进行分析得到。
B13、根据所述多个分配列表获取每一所述待分配标注方对应的标注数据大小,以得到多个标注数据大小。
其中,标注数据大小用于描述已分配音频的任务量,可通过各个已分配音频的容量进行获取。
B14、根据所述多个标注数据大小和所述多个平均标注速率获取每一所述待分配标注方对应的标注进度,以得到多个标注进度。
可以理解,在步骤B11-B14中,先获取各个待分配标注方的分配列表以及平均标注速率,再根据各个分配列表获取各个待分配标注方对应的标注数据大小,最后根据各个待分配标注方对应的标注数据大小和平均标注速率获取该待分配标注方对应的标注进度。如此,根据已分配的标注任务和待分配标注方的平均标注速率获取标注进度,可提高获取标注进度的准确性。
B2、根据所述音频属性和每一所述待分配标注方的处理属性确定每一所述待分配标注方的分配概率。
其中,分配概率用于描述各个待分配标注方的处理待标注音频的概率。具体的,可根据音频属性所要求的业务类型,与待分配标注方的处理属性中的业务能力进行获取,例如,多个待分配标注方包括第一待分配标注方、第二待分配标注方和第三待分配标注方。音频属性为英语,第一待分配标注方处理英语音频的平均标注速率为每分钟2个单词,第二待分配标注方处理英语音频的平均标注速率为每分钟5个单词,第三待分配标注方处理英语音频的平均标注速率为每分钟4个单词。如此,可确定第一待分配标注方的分配概率为0.5,第二待分配标注方的分配概率为0.8,第三待分配标注方的分配概率为0.7。
B3、根据每一所述待分配标注方对应的标注进度和分配概率确定每一所述待分配标注方的评价值,以得到多个评价值。
其中,评价值用于描述将待标注音频分配给待分配标注方的排列顺序。本申请对于确定评价值的方法不做限定,可分别设置标注进度和分配概率对应的权值,再与标注进度和分配概率进行加权,以得到各个待分配标注方的评价值。举例来说,假设待分配标注方的标注进度为60%,分配概率为0.5。当标注进度和分配概率对应的权值分别为0.5和0.5时,评价值为0.55。
B4、将所述多个标注进度中的最大值对应的所述待分配标注方作为目标标注方。
可以理解,在步骤B1-B4中,根据各个待分配标注方对应的标注进度和分配概率确定各个待分配标注方的评价值,再将评价值中的最大值作为目标标注方。如此,可提高标注效率。
S105:将所述待标注音频对应的标注任务分配给所述目标标注方。
可以理解,在如图1所示的音频分配方法中,先获取待标注音频的第一用户信息和音频属性,以及多个标注方中每一标注方的第二属性信息和处理属性。然后根据第一用户信息和每一第二用户信息从音频属性对应的预设评分列表中确定每一标注方的安全值,再将安全值大于第一阈值的标注方作为待分配标注方。然后根据待标注音频的音频属性和每一待分配标注方的处理属性确定目标标注方,并将待标注音频对应的标注任务分配给目标标注方。如此,可提高分配音频标注任务的准确性和安全性。
在一种可能的示例中,步骤S105的具体实施方式包括步骤C1和步骤C2,其中:
C1、对所述待标注音频进行分离,以得到多个音频片段。
其中,待标注音频的分离方法可通过声纹识别的方法,即识别待标注音频中的用户,每一音频片段对应一个用户。待标注音频的分离方法也可通过声道分离的方法,即将不同 拾取设备获取的音频片段进行分类,例如:双声道分为2个音频片段,三声道分为3个音频片段,在此不做限定。
在一种可能的示例中,所述音频属性包括音频类型,步骤C1的具体实施方式包括步骤C11-C13,其中:
C11、对所述待标注音频进行语音识别,以得到文本信息。
语音识别技术,是将人类的语音中的词汇内容转换为计算机可读的输入,例如按键、二进制编码或者字符序列。
C12、对所述文本信息进行分割,以得到多个文本片段。
在该示例中,可按照语句的完整性进行分割,即同一段文字划分为一个文本片段。
C13、根据每一所述文本片段的时间信息,对所述待标注音频进行分离,以得到多个音频片段。
可以理解,在步骤C11-C13中,先对待标注音频进行语音识别以得到文本信息,再对文本信息进行分割以得到多个文本片段,如此,可提高分割文本片段的准确性。然后根据每一文本片段的时间信息,对待标注音频进行分离以得到多个音频片段,从而可提高分割音频片段的准确性。
C2、将所述多个音频片段对应的标注任务分配给所述目标标注方。
可以理解,在步骤C1和步骤C2中,将待标注音频进行分类以得到多个音频片段,再将多个音频片段对应的标注任务分配给目标标注方,如此,目标标注方可单独标注音频片段,并结合上下语义进行标注,便于提高标注的效率和准确性。
在一种可能的示例中,在步骤S105之后,还可执行步骤D1-D3,其中:
D1、接收所述目标标注方对应的标注设备针对所述标注任务发送的目标标注文件。
其中,目标标注文件为目标标注方对待标注音频进行标注得到的文件。该目标标注文件可包括对待标注音频的文字翻译、语速、情绪、角色、性别、身份等,在此不做限定。
D2、对所述目标标注文件和所述待标注音频对应的参考标注文件进行比对,以得到识别率。
其中,参考标注文件为预先存储的标准标注文件。识别率用于描述目标标注文件的识别准确率。
D3、若所述识别率小于第二阈值,则向所述标注设备发送提示信息,所述提示信息用于提示所述目标标注方重新标注所述待标注音频。
本申请第二阈值不做限定,可依据训练进行设定。
可以理解,在步骤D1-D3中,接收目标标注方通过标注设备发送的目标标注文件,再将该目标标注文件与参考标注文件进行比对以得到识别率。然后将识别率与第二阈值进行比对,若小于第二阈值,则向标注设备发送提示信息,以提示目标标注方重新标注该待标注音频。如此,通过校验的方式提高目标标注方的标注业务能力。
与图1的实施例一致,请参照图2,图2是本申请实施例提供的一种音频分配装置的结构示意图,所述装置应用于电子设备。如图2所示,上述音频分配装置200包括:
处理单元201,用于获取待标注音频的第一用户信息和音频属性,以及获取多个标注方中每一标注方的第二用户信息和处理属性;根据所述第一用户信息和每一所述第二用户信息,从所述音频属性对应的预设评分列表中确定每一所述标注方的安全值;所述预设评分列表中的信息用于描述所述第一用户信息、所述第二用户信息以及所述安全值之间的对应关系;根据每一所述标注方的安全值,从所述多个标注方中选取安全值大于第一阈值的标注方,以得到多个待分配标注方;根据所述音频属性和每一所述待分配标注方的处理属性,从所述多个待分配标注方中选取目标标注方;
通信单元202,用于将所述待标注音频对应的标注任务分配给所述目标标注方。
可以理解,先获取待标注音频的第一用户信息和音频属性,以及多个标注方中每一标注方的第二属性信息和处理属性。然后根据第一用户信息和每一第二用户信息从音频属性对应的预设评分列表中确定每一标注方的安全值,再将安全值大于第一阈值的标注方作为待分配标注方。然后根据待标注音频的音频属性和每一待分配标注方的处理属性确定目标标注方,并将待标注音频对应的标注任务分配给目标标注方。如此,可提高分配音频标注任务的准确性和安全性。
在一个可能的示例中,在所述根据所述音频属性和每一所述待分配标注方的处理属性,从所述多个待分配标注方中选取目标标注方方面,所述处理单元201具体用于获取每一所述待分配标注方对应的标注进度,以得到多个标注进度;根据所述音频属性和每一所述待分配标注方的处理属性确定每一所述待分配标注方的分配概率;根据每一所述待分配标注方对应的标注进度和分配概率确定每一所述待分配标注方的评价值,以得到多个评价值;将所述多个评价值中的最大值对应的所述待分配标注方作为目标标注方。
在一个可能的示例中,在所述获取每一所述待分配标注方对应的标注进度,以得到多个标注进度方面,所述处理单元201具体用于获取每一所述待分配标注方对应的分配列表,以得到多个分配列表;获取预先存储的每一所述待分配标注方对应的平均标注速率,以得到多个平均标注速率;根据所述多个分配列表获取每一所述待分配标注方对应的标注数据大小,以得到多个标注数据大小;根据所述多个标注数据大小和所述多个平均标注速率获取每一所述待分配标注方对应的标注进度,以得到多个标注进度。
在一个可能的示例中,所述预设评分列表包括多个预设评分维度,在所述根据所述第一用户信息和每一所述第二用户信息,从所述音频属性对应的预设评分列表中确定每一所述标注方的安全值方面,所述处理单元201具体用于根据所述第一用户信息和所述第二用户信息,确定每一所述预设评分维度对应的评价值;根据每一所述预设评分维度对应的预设权值和评价值,确定每一所述标注方的安全值。
标注方在一个可能的示例中,所述处理单元201还用于对所述待标注音频进行分离,以得到多个音频片段;所述通信单元202具体用于将所述多个音频片段对应的标注任务分配给所述目标标注方。
在一个可能的示例中,在所述对所述待标注音频进行分离,以得到多个音频片段方面,所述处理单元201具体用于对所述待标注音频进行语音识别,以得到文本信息;对所述文 本信息进行分割,以得到多个文本片段;根据每一所述文本片段的时间信息,对所述待标注音频进行分离,以得到多个音频片段。
在一个可能的示例中,在所述将所述待标注音频对应的标注任务分配给所述目标标注方之后,所述通信单元202还用于接收所述目标标注方对应的标注设备针对所述标注任务发送的目标标注文件;所述处理单元202还用于对所述目标标注文件和所述待标注音频对应的参考标注文件进行比对,以得到识别率;所述通信单元202还用于若所述识别率小于第二阈值,则向所述标注设备发送提示信息,所述提示信息用于提示所述目标标注方重新标注所述待标注音频。
与图1的实施例一致,请参照图3,图3是本申请实施例提供的一种电子设备的结构示意图。如图3所示,该电子设备300包括处理器310、存储器320、通信接口330以及一个或多个程序340,其中,上述一个或多个程序340被存储在上述存储器320中,并且被配置由上述处理器310执行,上述程序340包括用于执行以下步骤的指令:
获取待标注音频的第一用户信息和音频属性,以及获取多个标注方中每一标注方的第二用户信息和处理属性;
根据所述第一用户信息和每一所述第二用户信息,从所述音频属性对应的预设评分列表中确定每一所述标注方的安全值;所述预设评分列表中的信息用于描述所述第一用户信息、所述第二用户信息以及所述安全值之间的对应关系;
根据每一所述标注方的安全值,从所述多个标注方中选取安全值大于第一阈值的标注方,以得到多个待分配标注方;
根据所述音频属性和每一所述待分配标注方的处理属性,从所述多个待分配标注方中选取目标标注方;
将所述待标注音频对应的标注任务分配给所述目标标注方。
可以理解,先根据待标注音频的第一用户信息和每一标注方的第二用户信息,从音频属性对应的预设评分列表中确定每一标注方的安全值,再将安全值大于第一阈值的标注方作为待分配标注方。然后根据待标注音频的音频属性和每一待分配标注方的处理属性确定目标标注方,并将待标注音频对应的标注任务分配给目标标注方。如此,可提高分配音频标注任务的准确性和安全性。
在一个可能的示例中,在所述根据所述音频属性和每一所述待分配标注方的处理属性,从所述多个待分配标注方中选取目标标注方方面,所述程序340具体用于执行以下步骤的指令:
获取每一所述待分配标注方对应的标注进度,以得到多个标注进度;
根据所述音频属性和每一所述待分配标注方的处理属性确定每一所述待分配标注方的分配概率;
根据每一所述待分配标注方对应的标注进度和分配概率确定每一所述待分配标注方的评价值,以得到多个评价值;
将所述多个评价值中的最大值对应的所述待分配标注方作为目标标注方。
在一个可能的示例中,在所述获取每一所述待分配标注方对应的标注进度,以得到多个标注进度方面,所述程序340具体用于执行以下步骤的指令:
获取每一所述待分配标注方对应的分配列表,以得到多个分配列表;
获取预先存储的每一所述待分配标注方对应的平均标注速率,以得到多个平均标注速率;
根据所述多个分配列表获取每一所述待分配标注方对应的标注数据大小,以得到多个标注数据大小;
根据所述多个标注数据大小和所述多个平均标注速率获取每一所述待分配标注方对应的标注进度,以得到多个标注进度。
在一个可能的示例中,所述预设评分列表包括多个预设评分维度,在所述根据所述第一用户信息和每一所述第二用户信息,从所述音频属性对应的预设评分列表中确定每一所述标注方的安全值方面,所述程序340具体用于执行以下步骤的指令:
根据所述第一用户信息和所述第二用户信息,确定每一所述预设评分维度对应的评价值;
根据每一所述预设评分维度对应的预设权值和评价值,确定每一所述标注方的安全值。
标注方在一个可能的示例中,在所述将所述待标注音频对应的标注任务分配给所述目标标注方方面,所述程序340具体用于执行以下步骤的指令:
对所述待标注音频进行分离,以得到多个音频片段;
将所述多个音频片段对应的标注任务分配给所述目标标注方。
在一个可能的示例中,在所述对所述待标注音频进行分离,以得到多个音频片段方面,所述程序340具体用于执行以下步骤的指令:
对所述待标注音频进行语音识别,以得到文本信息;
对所述文本信息进行分割,以得到多个文本片段;
根据每一所述文本片段的时间信息,对所述待标注音频进行分离,以得到多个音频片段。
在一个可能的示例中,在所述将所述待标注音频对应的标注任务分配给所述目标标注方之后,所述程序340还用于执行以下步骤的指令:
接收所述目标标注方对应的标注设备针对所述标注任务发送的目标标注文件;
对所述目标标注文件和所述待标注音频对应的参考标注文件进行比对,以得到识别率;
若所述识别率小于第二阈值,则向所述标注设备发送提示信息,所述提示信息用于提示所述目标标注方重新标注所述待标注音频。
本申请实施例还提供一种计算机存储介质,其中,该计算机存储介质存储用于存储计算机程序,该计算机程序使得计算机执行如方法实施例中记载的任一方法的部分或全部步骤,计算机包括电子设备。其中,所述计算机可读存储介质可以是非易失性,也可以是易失性的。
本申请实施例还提供一种计算机程序产品,计算机程序产品包括存储了计算机程序的 非瞬时性计算机可读存储介质,计算机程序可操作来使计算机执行如方法实施例中记载的任一方法的部分或全部步骤。该计算机程序产品可以为一个软件安装包,计算机包括电子设备。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模式并不一定是本申请所必须的。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件程序模式的形式实现。
集成的单元如果以软件程序模式的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储器中。根据这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储器中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本申请各个实施例方法的全部或部分步骤。而前述的存储器包括:U盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序可以存储于一计算机可读存储器中,存储器可以包括:闪存盘、ROM、RAM、磁盘或光盘等。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (20)

  1. 一种音频分配方法,其中,包括:
    获取待标注音频的第一用户信息和音频属性,以及获取多个标注方中每一标注方的第二用户信息和处理属性;
    根据所述第一用户信息和每一所述第二用户信息,从所述音频属性对应的预设评分列表中确定每一所述标注方的安全值;所述预设评分列表中的信息用于描述所述第一用户信息、所述第二用户信息以及所述安全值之间的对应关系;
    根据每一所述标注方的安全值,从所述多个标注方中选取安全值大于第一阈值的标注方,以得到多个待分配标注方;
    根据所述音频属性和每一所述待分配标注方的处理属性,从所述多个待分配标注方中选取目标标注方;
    将所述待标注音频对应的标注任务分配给所述目标标注方。
  2. 根据权利要求1所述的方法,其中,所述根据所述音频属性和每一所述待分配标注方的处理属性,从所述多个待分配标注方中选取目标标注方,包括:获取每一所述待分配标注方对应的标注进度;
    根据所述音频属性和每一所述待分配标注方的处理属性确定每一所述待分配标注方的分配概率;
    根据每一所述待分配标注方对应的标注进度和分配概率确定每一所述待分配标注方的评价值,以得到多个评价值;
    将所述多个评价值中的最大值对应的所述待分配标注方作为目标标注方。
  3. 根据权利要求2所述的方法,其中,所述获取每一所述待分配标注方对应的标注进度,以得到多个标注进度,包括:
    获取每一所述待分配标注方对应的分配列表,以得到多个分配列表;
    获取预先存储的每一所述待分配标注方对应的平均标注速率,以得到多个平均标注速率;
    根据所述多个分配列表获取每一所述待分配标注方对应的标注数据大小,以得到多个标注数据大小;
    根据所述多个标注数据大小和所述多个平均标注速率获取每一所述待分配标注方对应的标注进度,以得到多个标注进度。
  4. 根据权利要求1-3任一项所述的方法,其中,所述预设评分列表包括多个预设评分维度,所述根据所述第一用户信息和每一所述第二用户信息,从所述音频属性对应的预设评分列表中确定每一所述标注方的安全值,包括:
    根据所述第一用户信息和所述第二用户信息,确定每一所述预设评分维度对应的评价值;
    根据每一所述预设评分维度对应的预设权值和评价值,确定每一所述标注方的安全值。
  5. 根据权利要求1-3任一项所述的方法,其中,所述将所述待标注音频对应的标注任务分配给所述目标标注方,包括:
    对所述待标注音频进行分离,以得到多个音频片段;
    将所述多个音频片段对应的标注任务分配给所述目标标注方。
  6. 根据权利要求5所述的方法,其中,所述对所述待标注音频进行分离,以得到多个音频片段,包括:
    对所述待标注音频进行语音识别,以得到文本信息;
    对所述文本信息进行分割,以得到多个文本片段;
    根据每一所述文本片段的时间信息,对所述待标注音频进行分离,以得到多个音频片段。
  7. 根据权利要求1-3任一项所述的方法,其中,在所述将所述待标注音频对应的标注任务分配给所述目标标注方之后,所述方法还包括:
    接收所述目标标注方对应的标注设备针对所述标注任务发送的目标标注文件;
    对所述目标标注文件和所述待标注音频对应的参考标注文件进行比对,以得到识别率;
    若所述识别率小于第二阈值,则向所述标注设备发送提示信息,所述提示信息用于提示所述目标标注方重新标注所述待标注音频。
  8. 一种音频分配装置,其中,包括:
    处理单元,用于获取待标注音频的第一用户信息和音频属性,以及获取多个标注方中每一标注方的第二用户信息和处理属性;根据所述第一用户信息和每一所述第二用户信息,从所述音频属性对应的预设评分列表中确定每一所述标注方的安全值;所述预设评分列表中的信息用于描述所述第一用户信息、所述第二用户信息以及所述安全值之间的对应关系;根据每一所述标注方的安全值,从所述多个标注方中选取安全值大于第一阈值的标注方,以得到多个待分配标注方;根据所述音频属性和每一所述待分配标注方的处理属性,从所述多个待分配标注方中选取目标标注方;
    通信单元,用于将所述待标注音频对应的标注任务分配给所述目标标注方。
  9. 一种电子设备,其中,所述电子设备包括存储器和处理器,所述处理器、和所述存储器相互连接,其中,所述存储器用于存储计算机程序,所述计算机程序包括程序指令,所述处理器用于执行所述存储器的所述程序指令,其中:
    获取待标注音频的第一用户信息和音频属性,以及获取多个标注方中每一标注方的第二用户信息和处理属性;
    根据所述第一用户信息和每一所述第二用户信息,从所述音频属性对应的预设评分列表中确定每一所述标注方的安全值;所述预设评分列表中的信息用于描述所述第一用户信息、所述第二用户信息以及所述安全值之间的对应关系;
    根据每一所述标注方的安全值,从所述多个标注方中选取安全值大于第一阈值的标注方,以得到多个待分配标注方;
    根据所述音频属性和每一所述待分配标注方的处理属性,从所述多个待分配标注方中 选取目标标注方;
    将所述待标注音频对应的标注任务分配给所述目标标注方。
  10. 根据权利要求9所述的电子设备,其中,所述处理器用于:
    获取每一所述待分配标注方对应的标注进度;
    根据所述音频属性和每一所述待分配标注方的处理属性确定每一所述待分配标注方的分配概率;
    根据每一所述待分配标注方对应的标注进度和分配概率确定每一所述待分配标注方的评价值,以得到多个评价值;
    将所述多个评价值中的最大值对应的所述待分配标注方作为目标标注方。
  11. 根据权利要求10所述的电子设备,其中,所述处理器用于:
    获取每一所述待分配标注方对应的分配列表,以得到多个分配列表;
    获取预先存储的每一所述待分配标注方对应的平均标注速率,以得到多个平均标注速率;
    根据所述多个分配列表获取每一所述待分配标注方对应的标注数据大小,以得到多个标注数据大小;
    根据所述多个标注数据大小和所述多个平均标注速率获取每一所述待分配标注方对应的标注进度,以得到多个标注进度。
  12. 根据权利要求9-11任一项所述的电子设备,其中,所述预设评分列表包括多个预设评分维度,所述处理器用于:
    根据所述第一用户信息和所述第二用户信息,确定每一所述预设评分维度对应的评价值;
    根据每一所述预设评分维度对应的预设权值和评价值,确定每一所述标注方的安全值。
  13. 根据权利要求9-11任一项所述的电子设备,其中,所述处理器用于:
    对所述待标注音频进行分离,以得到多个音频片段;
    将所述多个音频片段对应的标注任务分配给所述目标标注方。
  14. 根据权利要求13所述的电子设备,其中,所述处理器用于:
    对所述待标注音频进行语音识别,以得到文本信息;
    对所述文本信息进行分割,以得到多个文本片段;
    根据每一所述文本片段的时间信息,对所述待标注音频进行分离,以得到多个音频片段。
  15. 根据权利要求9-11任一项所述的电子设备,其中,所述处理器用于:
    接收所述目标标注方对应的标注设备针对所述标注任务发送的目标标注文件;
    对所述目标标注文件和所述待标注音频对应的参考标注文件进行比对,以得到识别率;
    若所述识别率小于第二阈值,则向所述标注设备发送提示信息,所述提示信息用于提示所述目标标注方重新标注所述待标注音频。
  16. 一种计算机可读存储介质,其中,所述计算机可读存储介质存储有计算机程序, 所述计算机程序包括程序指令,所述程序指令被处理器执行时,用于实现以下步骤:
    获取待标注音频的第一用户信息和音频属性,以及获取多个标注方中每一标注方的第二用户信息和处理属性;
    根据所述第一用户信息和每一所述第二用户信息,从所述音频属性对应的预设评分列表中确定每一所述标注方的安全值;所述预设评分列表中的信息用于描述所述第一用户信息、所述第二用户信息以及所述安全值之间的对应关系;
    根据每一所述标注方的安全值,从所述多个标注方中选取安全值大于第一阈值的标注方,以得到多个待分配标注方;
    根据所述音频属性和每一所述待分配标注方的处理属性,从所述多个待分配标注方中选取目标标注方;
    将所述待标注音频对应的标注任务分配给所述目标标注方。
  17. 根据权利要求16所述的计算机可读存储介质,其中,所述程序指令被处理器执行时,还用于实现以下步骤:
    获取每一所述待分配标注方对应的标注进度;
    根据所述音频属性和每一所述待分配标注方的处理属性确定每一所述待分配标注方的分配概率;
    根据每一所述待分配标注方对应的标注进度和分配概率确定每一所述待分配标注方的评价值,以得到多个评价值;
    将所述多个评价值中的最大值对应的所述待分配标注方作为目标标注方。
  18. 根据权利要求17所述的计算机可读存储介质,其中,所述程序指令被处理器执行时,还用于实现以下步骤:
    获取每一所述待分配标注方对应的分配列表,以得到多个分配列表;
    获取预先存储的每一所述待分配标注方对应的平均标注速率,以得到多个平均标注速率;
    根据所述多个分配列表获取每一所述待分配标注方对应的标注数据大小,以得到多个标注数据大小;
    根据所述多个标注数据大小和所述多个平均标注速率获取每一所述待分配标注方对应的标注进度,以得到多个标注进度。
  19. 根据权利要求16-18任一项所述的计算机可读存储介质,其中,所述预设评分列表包括多个预设评分维度,所述程序指令被处理器执行时,还用于实现以下步骤:
    根据所述第一用户信息和所述第二用户信息,确定每一所述预设评分维度对应的评价值;
    根据每一所述预设评分维度对应的预设权值和评价值,确定每一所述标注方的安全值。
  20. 根据权利要求16-18任一项所述的计算机可读存储介质,其中,所述程序指令被处理器执行时,还用于实现以下步骤:
    对所述待标注音频进行分离,以得到多个音频片段;
    将所述多个音频片段对应的标注任务分配给所述目标标注方。
PCT/CN2020/112510 2019-09-02 2020-08-31 音频分配方法、装置及存储介质 WO2021043101A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910826025.X 2019-09-02
CN201910826025.XA CN110688517B (zh) 2019-09-02 2019-09-02 音频分配方法、装置及存储介质

Publications (1)

Publication Number Publication Date
WO2021043101A1 true WO2021043101A1 (zh) 2021-03-11

Family

ID=69108895

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/112510 WO2021043101A1 (zh) 2019-09-02 2020-08-31 音频分配方法、装置及存储介质

Country Status (2)

Country Link
CN (1) CN110688517B (zh)
WO (1) WO2021043101A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110688517B (zh) * 2019-09-02 2023-05-30 平安科技(深圳)有限公司 音频分配方法、装置及存储介质
CN111462725B (zh) * 2020-04-17 2021-01-12 北京灵伴即时智能科技有限公司 录音编辑管理方法及系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140310587A1 (en) * 2013-04-16 2014-10-16 Electronics And Telecommunications Research Institute Apparatus and method for processing additional media information
CN108170845A (zh) * 2018-01-17 2018-06-15 腾讯音乐娱乐科技(深圳)有限公司 多媒体数据处理方法、装置及存储介质
CN109151023A (zh) * 2018-08-21 2019-01-04 平安科技(深圳)有限公司 任务分配方法、装置及存储介质
CN109359798A (zh) * 2018-08-21 2019-02-19 平安科技(深圳)有限公司 任务分配方法、装置及存储介质
CN110138865A (zh) * 2019-05-17 2019-08-16 南方科技大学 空间众包任务分配方法、装置、设备及存储介质
CN110688517A (zh) * 2019-09-02 2020-01-14 平安科技(深圳)有限公司 音频分配方法、装置及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110109747A1 (en) * 2009-11-12 2011-05-12 Siemens Industry, Inc. System and method for annotating video with geospatially referenced data
US9460457B1 (en) * 2013-03-14 2016-10-04 Google Inc. Automatically annotating content items with an entity
CN106407407B (zh) * 2016-09-22 2019-10-15 江苏通付盾科技有限公司 一种文件标注系统及方法
CN107066983B (zh) * 2017-04-20 2022-08-09 腾讯科技(上海)有限公司 一种身份验证方法及装置

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140310587A1 (en) * 2013-04-16 2014-10-16 Electronics And Telecommunications Research Institute Apparatus and method for processing additional media information
CN108170845A (zh) * 2018-01-17 2018-06-15 腾讯音乐娱乐科技(深圳)有限公司 多媒体数据处理方法、装置及存储介质
CN109151023A (zh) * 2018-08-21 2019-01-04 平安科技(深圳)有限公司 任务分配方法、装置及存储介质
CN109359798A (zh) * 2018-08-21 2019-02-19 平安科技(深圳)有限公司 任务分配方法、装置及存储介质
CN110138865A (zh) * 2019-05-17 2019-08-16 南方科技大学 空间众包任务分配方法、装置、设备及存储介质
CN110688517A (zh) * 2019-09-02 2020-01-14 平安科技(深圳)有限公司 音频分配方法、装置及存储介质

Also Published As

Publication number Publication date
CN110688517A (zh) 2020-01-14
CN110688517B (zh) 2023-05-30

Similar Documents

Publication Publication Date Title
WO2020143844A1 (zh) 意图分析方法、装置、显示终端及计算机可读存储介质
US10354677B2 (en) System and method for identification of intent segment(s) in caller-agent conversations
US20190197119A1 (en) Language-agnostic understanding
US10460038B2 (en) Target phrase classifier
CN112749344B (zh) 信息推荐方法、装置、电子设备、存储介质及程序产品
US10268686B2 (en) Machine translation system employing classifier
WO2021043101A1 (zh) 音频分配方法、装置及存储介质
US8612532B2 (en) System and method for optimizing response handling time and customer satisfaction scores
US10496751B2 (en) Avoiding sentiment model overfitting in a machine language model
WO2019041520A1 (zh) 基于社交数据的金融产品推荐方法、电子装置及介质
US9811517B2 (en) Method and system of adding punctuation and establishing language model using a punctuation weighting applied to chinese speech recognized text
CN112733042A (zh) 推荐信息的生成方法、相关装置及计算机程序产品
CN110633475A (zh) 基于计算机场景的自然语言理解方法、装置、系统和存储介质
US11361759B2 (en) Methods and systems for automatic generation and convergence of keywords and/or keyphrases from a media
JP6307822B2 (ja) プログラム、コンピュータおよび訓練データ作成支援方法
US20210294969A1 (en) Generation and population of new application document utilizing historical application documents
CN112990625A (zh) 标注任务的分配方法、装置及服务器
JP2017045054A (ja) 言語モデル改良装置及び方法、音声認識装置及び方法
CN114528851B (zh) 回复语句确定方法、装置、电子设备和存储介质
CN111563381A (zh) 文本处理方法和装置
CN114141235A (zh) 语音语料库生成方法、装置、计算机设备和存储介质
CN108831473B (zh) 一种音频处理方法及装置
WO2021062757A1 (zh) 同声传译方法、装置、服务器和存储介质
CN112836529B (zh) 生成目标语料样本的方法和装置
US20230177269A1 (en) Conversation topic extraction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20861903

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20861903

Country of ref document: EP

Kind code of ref document: A1