CN110555117A - Data processing method and device and electronic equipment - Google Patents

Data processing method and device and electronic equipment

Info

Publication number
CN110555117A
CN110555117A
Authority
CN
China
Prior art keywords
data
subdata
target
dimensions
target data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910852560.2A
Other languages
Chinese (zh)
Other versions
CN110555117B (en)
Inventor
Zhang Guannan (张冠南)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN201910852560.2A
Publication of CN110555117A
Application granted
Publication of CN110555117B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40: Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/48: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/483: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Library & Information Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method, a data processing apparatus, and an electronic device. The method includes: acquiring target data; obtaining sub-data of the target data in at least two dimensions; obtaining labeling information of target sub-data in one or more of the at least two dimensions; and setting the labeling information of the target sub-data as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data. With the method and the device, multiple pieces of labeled training data can be obtained from a single labeling operation, which greatly improves labeling efficiency.

Description

Data processing method and device and electronic equipment
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a data processing method and apparatus, and an electronic device.
Background
In the field of machine learning, a large amount of labeled training data is required for an algorithm model to reach a given accuracy; for example, a face recognition model requires hundreds of thousands of face images.
At present, each piece of training data is labeled manually, and manual labeling often results in low labeling efficiency.
How to improve labeling efficiency has therefore become an urgent problem to be solved.
Disclosure of Invention
In view of this, the present application provides the following technical solutions:
A data processing method, comprising:
acquiring target data;
obtaining sub-data of the target data in at least two dimensions;
obtaining labeling information of target sub-data in one or more of the sub-data in the at least two dimensions; and
setting the labeling information of the target sub-data as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data.
Preferably, the sub-data in the at least two dimensions are sub-data in the target data having the same data attribute.
Preferably, the obtaining sub-data of the target data in at least two dimensions includes:
determining a data feature of the target data;
determining a target data attribute of the target data based on the data feature; and
obtaining the sub-data of the target data in at least two dimensions corresponding to the target data attribute.
Preferably, the data feature includes an audio feature, and the determining a target data attribute of the target data based on the data feature includes:
determining a time attribute of the target data based on the audio feature.
Preferably, the determining the time attribute of the target data based on the audio feature includes:
preprocessing the target data using a first audio feature; and
determining the time attribute of the preprocessed target data using a second audio feature.
Preferably, the data feature further includes a subtitle feature, and the method further includes:
correcting the time attribute according to the subtitle feature.
Preferably, the method further includes:
determining target labeling information in a case where at least two kinds of labeling information exist in the labeling information of the target sub-data.
Preferably, the method further includes:
outputting the labeling information of the sub-data in the at least two dimensions.
A data processing apparatus, comprising:
a data acquisition module, configured to acquire target data;
a sub-data acquisition module, configured to obtain sub-data of the target data in at least two dimensions;
a label acquisition module, configured to obtain labeling information of target sub-data in one or more of the sub-data in the at least two dimensions; and
a label setting module, configured to set the labeling information of the target sub-data as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data.
An electronic device, comprising:
a memory, configured to store an application program and data generated by running the application program; and
a processor, configured to execute the application program to perform the following functions: acquiring target data; obtaining sub-data of the target data in at least two dimensions; obtaining labeling information of target sub-data in one or more of the sub-data in the at least two dimensions; and setting the labeling information of the target sub-data as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data.
As can be seen from the foregoing technical solutions, the embodiments of the present application provide a data processing method in which labeling information of target sub-data in one or more of the at least two dimensions of the target data is obtained, and that labeling information is then set as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data. In this way, multiple pieces of labeled training data can be obtained from a single labeling operation, which greatly improves labeling efficiency.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; other drawings can be derived from them by those skilled in the art without creative effort.
Fig. 1 is a flowchart of a data processing method according to a first embodiment of the present application;
Fig. 2 is an example of sub-data provided in an embodiment of the present application;
Fig. 3 is an emotion annotation interface provided in an embodiment of the present application;
Fig. 4 is a flowchart of a data processing method according to a second embodiment of the present application;
Fig. 5 is another example of sub-data provided in an embodiment of the present application;
Fig. 6 is yet another example of sub-data provided in an embodiment of the present application;
Fig. 7 is a flowchart of a data processing method according to a third embodiment of the present application;
Fig. 8 is a flowchart of a data processing method according to a fourth embodiment of the present application;
Fig. 9 is a flowchart of a data processing method according to a fifth embodiment of the present application;
Fig. 10 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 11 is a scene schematic diagram of a data processing method disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort shall fall within the protection scope of the present application.
The data processing method provided by the embodiments of the present application can be applied to user-side electronic devices, such as smartphones, computers, and notebooks. In the embodiments of the present application, the electronic device has web page browsing and interaction functions, which may be implemented by an embedded functional component of a client or by a web browser; this is not limited in the embodiments of the present application.
In a first embodiment of the data processing method disclosed in the present application, as shown in Fig. 1, the method includes the following steps:
Step 101: Target data is acquired.
In the embodiment of the present application, the target data can be obtained from a designated data source. The target data may be streaming media or other multimedia; this is not limited in the embodiment of the present application.
Step 102: Sub-data of the target data in at least two dimensions is obtained.
In the embodiment of the present application, web technology can be used to perform data track separation on the target data, extracting sub-data in at least two dimensions from it, where the dimensions may be any of video, audio, subtitle, and single-frame image. Of course, these are merely examples of dimensions, and other dimensions not listed are also within the scope of the present application.
Fig. 2 is an example of sub-data provided in an embodiment of the present application. Referring to Fig. 2, one media stream is split to obtain sub-data in four dimensions: video, audio, subtitle, and single-frame image.
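For illustration, the following is a minimal TypeScript sketch of the kind of data model such track separation could produce; the type and field names are assumptions of this description, not part of the disclosure.

```typescript
// Illustrative data model for the separated sub-data (all names assumed).
type Dimension = "video" | "audio" | "subtitle" | "frame";

interface SubData {
  dimension: Dimension;
  trackId: string;   // identifier of the separated track
  startMs: number;   // start of the segment on the media timeline
  endMs: number;     // end of the segment on the media timeline
  label?: string;    // labeling information, filled in later (e.g., "surprise")
}

// One media stream yields sub-data in several dimensions over the same span.
function splitIntoDimensions(mediaId: string): SubData[] {
  const dims: Dimension[] = ["video", "audio", "subtitle", "frame"];
  return dims.map((dimension, i) => ({
    dimension,
    trackId: `${mediaId}-track-${i}`,
    startMs: 0,
    endMs: 60_000,
  }));
}
```

The sketch only models the result of the separation; the actual demultiplexing would run in a background worker thread, as described later.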
Step 103: Labeling information of target sub-data in one or more of the sub-data in the at least two dimensions is acquired.
In this embodiment of the application, the user-side electronic device may interact with the user through H5 (HTML5) technology, and the user adds a label to the target sub-data in one or more of the at least two dimensions, for example an emotion label such as happiness, anger, or surprise.
For ease of understanding, the description continues with the sub-data shown in Fig. 2:
The user-side electronic device can output the multimedia stream based on the user's request and, in the process of outputting the multimedia stream, interact with the user over the sub-data in any one or more of the four dimensions of video, audio, subtitle, and single-frame image.
See the emotion annotation interface shown in Fig. 3, where the user-side electronic device interacts with the user over the sub-data of the video dimension. The electronic device can display several candidate emotion labels for the target person in the video-dimension sub-data in a designated area of the video display interface; the user selects the target emotion label for the target person, the electronic device determines the selection by responding to the user's touch operation, and the target emotion label is used as the label of the video-dimension sub-data.
Of course, while the user selects the target emotion label, the multimedia stream simultaneously outputs the sub-data in the audio dimension and in the subtitle dimension, so that when choosing a label the user can refer not only to the person's expression in the video but also to the person's voice in the audio and the person's words in the subtitles.
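For illustration, a minimal sketch of the selection interaction, assuming a plain DOM interface; the container id and the candidate label set are assumptions.

```typescript
// Render candidate emotion labels; a click on one becomes the label of the
// video-dimension sub-data. "label-area" is an assumed container id.
const EMOTIONS = ["happiness", "anger", "surprise"];

function renderEmotionPicker(onPick: (emotion: string) => void): void {
  const area = document.getElementById("label-area");
  if (!area) return;
  for (const emotion of EMOTIONS) {
    const button = document.createElement("button");
    button.textContent = emotion;
    button.addEventListener("click", () => onPick(emotion));
    area.appendChild(button);
  }
}
```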
Step 104: The labeling information of the target sub-data is set as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data.
For ease of understanding, the description continues with the sub-data shown in Fig. 2:
Assuming the target emotion label selected by the user for the target person is "surprise", the labeling information of the video-dimension sub-data can be determined as "surprise". The labeling information of the sub-data in the three dimensions of audio, subtitle, and single-frame image may then also be set to "surprise".
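For illustration, a minimal sketch of this propagation step, reusing the SubData shape from the earlier sketch; the function name is an assumption.

```typescript
// The label chosen for one target sub-data is copied to the sub-data in
// every other dimension of the same group (audio, subtitle, frame, ...).
function propagateLabel(group: SubData[], target: SubData, label: string): void {
  target.label = label;
  for (const sub of group) {
    if (sub !== target) {
      sub.label = label;
    }
  }
}
```

A single call such as propagateLabel(group, videoSub, "surprise") thus yields one labeled training sample per dimension from one click.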
The data processing method provided by the embodiment of the present application can be applied to various heterogeneous platforms, such as Windows, Linux, Android, and iOS. On a web browser or in a client, without the user perceiving it, the algorithm and the web worker code are delivered through web technology and the algorithm runs silently in a background thread; throughout the whole process the user only needs one click to produce multiple kinds of training data for artificial intelligence training.
The method and the device can help companies solve the industry problem of the high labor cost of obtaining proprietary data, at close to zero cost. A streaming media provider can rely on massive user requests and, by implementing appropriate incentive measures and modest virtual rewards, acquire a massive amount of otherwise expensive training data in a very short time.
In addition, since one piece of target sub-data is likely to be labeled by multiple users, there may be multiple pieces of labeling information for it. To handle such labeling divergence, in the embodiment of the present application, target labeling information may be determined in the case where at least two kinds of labeling information exist for the target sub-data.
Specifically, the labeling information of the target sub-data can be uploaded to a local or cloud server, which forwards it to the management-side electronic device, where a manager determines the final target labeling information: either one of the existing pieces of labeling information is chosen, or a new piece of target labeling information different from all of them is set. The target labeling information is then set as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data.
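For illustration, a minimal sketch of such divergence handling: a unanimous label is accepted directly, and anything else is deferred to the management side, as the embodiment describes. The function and its null-means-manual convention are assumptions.

```typescript
// Resolve the labels submitted by multiple users for one target sub-data.
// Returns the agreed label, or null when a manager must decide.
function resolveLabel(candidates: string[]): string | null {
  const unique = [...new Set(candidates)];
  if (unique.length === 1) return unique[0]; // no divergence
  console.log("divergent labels, forwarding to management side:", unique);
  return null;
}
```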
In addition, the user-side electronic device can also output the labeling information of the sub-data in the at least two dimensions. Specifically, the labeling information may be sent to a (local and/or cloud) server and persisted by the server. To reduce the consumption of storage resources, the user-side electronic device may send the identifier of each piece of sub-data (such as its number, or the start and stop times it corresponds to) together with its labeling information to the server in a text format.
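For illustration, a minimal sketch of such a text-format upload, reusing the SubData shape from the earlier sketch; the endpoint path and record fields are assumptions.

```typescript
// POST compact text (JSON) records: sub-data identifier, start/stop time,
// and labeling information. "/api/annotations" is a hypothetical endpoint.
async function uploadLabels(records: SubData[]): Promise<void> {
  await fetch("/api/annotations", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(
      records.map(r => ({
        trackId: r.trackId,
        dimension: r.dimension,
        startMs: r.startMs,
        endMs: r.endMs,
        label: r.label,
      })),
    ),
  });
}
```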
Of course, to prevent the processing task for the target data from being computed repeatedly, the labeling information of the sub-data in the at least two dimensions can be synchronized through the server to all other users on the network, and the target data is then no longer processed by any subsequent user-side electronic device.
With the data processing method provided by the embodiment of the present application, multiple pieces of labeled training data can be obtained from a single labeling operation, which greatly improves labeling efficiency.
As an implementation of sub-data in at least two dimensions, a second embodiment of the present application discloses a data processing method. As shown in Fig. 4, the method includes the following steps:
Step 201: Target data is acquired.
Step 202: Sub-data of the target data in at least two dimensions is obtained, where the sub-data in the at least two dimensions are sub-data in the target data having the same data attribute.
In the embodiment of the present application, the sub-data of the target data in the at least two dimensions share the same data attribute, such as a spatial attribute or a time attribute.
For ease of understanding, sub-data having the same data attribute is described below, taking the spatial attribute and the time attribute as examples.
1) Fig. 5 is another example of sub-data provided in the embodiment of the present application. Referring to Fig. 5, one media stream is track-separated to obtain sub-data as data segments in four dimensions: video, audio, subtitle, and single-frame image.
Sub-data 1-1 in the video dimension, sub-data 2-1 in the audio dimension, sub-data 3-1 in the subtitle dimension, and sub-data 4-1 in the single-frame image all belong to spatial scene 1 (such as building a); sub-data 1-2 in the video dimension, sub-data 2-2 in the audio dimension, sub-data 3-2 in the subtitle dimension, and sub-data 4-2 in the single-frame image all belong to spatial scene 2 (such as building b); sub-data 1-3 in the video dimension, sub-data 2-3 in the audio dimension, sub-data 3-3 in the subtitle dimension, and sub-data 4-3 in the single-frame image all belong to spatial scene 3 (such as building c).
Therefore, when grouping by the spatial attribute, sub-data 1-1, 2-1, 3-1, and 4-1 belonging to spatial scene 1 form a first group of sub-data, sub-data 1-2, 2-2, 3-2, and 4-2 belonging to spatial scene 2 form a second group, and sub-data 1-3, 2-3, 3-3, and 4-3 belonging to spatial scene 3 form a third group.
2) Fig. 6 is a further example of sub-data provided in the embodiment of the present application. Referring to Fig. 6, one media stream is track-separated to obtain sub-data as data segments in four dimensions: video, audio, subtitle, and single-frame image.
Sub-data A-1 in the video dimension, sub-data B-1 in the audio dimension, sub-data C-1 in the subtitle dimension, and sub-data D-1 in the single-frame image all belong to time window 1; sub-data A-2, B-2, C-2, and D-2 in those dimensions all belong to time window 2; and sub-data A-3, B-3, C-3, and D-3 all belong to time window 3.
Therefore, when grouping by the time attribute, sub-data A-1, B-1, C-1, and D-1 belonging to time window 1 form a first group of sub-data, sub-data A-2, B-2, C-2, and D-2 belonging to time window 2 form a second group, and sub-data A-3, B-3, C-3, and D-3 belonging to time window 3 form a third group.
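For illustration, a minimal sketch of this grouping, reusing the SubData shape from the earlier sketch; the window representation is an assumption.

```typescript
// Segments whose spans fall inside the same time window form one group,
// so one labeling operation can cover the whole group.
interface TimeWindow { startMs: number; endMs: number; }

function groupByWindow(subs: SubData[], windows: TimeWindow[]): SubData[][] {
  return windows.map(w =>
    subs.filter(s => s.startMs >= w.startMs && s.endMs <= w.endMs),
  );
}
```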
Step 203: Labeling information of target sub-data in one or more of the sub-data in the at least two dimensions is acquired.
For ease of understanding, the description continues with the sub-data example shown in Fig. 6:
For each of time window 1, time window 2, and time window 3, one or more pieces of sub-data in the group corresponding to that window are labeled first. For example, for time window 2, the labeling information of one or more of sub-data A-2, B-2, C-2, and D-2 may first be obtained.
Step 204: The labeling information of the target sub-data is set as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data.
For ease of understanding, the description continues with the sub-data example shown in Fig. 6:
Assuming that, for time window 2, the labeling information of sub-data A-2 is "surprise" after emotion labeling, the labeling information of sub-data B-2, C-2, and D-2 is also set to "surprise".
With the data processing method provided by the embodiment of the present application, multiple pieces of labeled training data sharing the same data attribute can be obtained from a single labeling operation, which improves scene adaptability while greatly improving labeling efficiency.
As an implementation of obtaining sub-data of the target data in at least two dimensions, a third embodiment of the present application discloses a data processing method. As shown in Fig. 7, the method includes the following steps:
Step 301: Target data is acquired.
Step 302: A data feature of the target data is determined.
In the embodiment of the present application, different data feature extraction strategies are set for different labeling scenarios. For example, for the spatial attribute, a subtitle feature can be extracted, specifically by identifying keywords in the subtitle stream that identify a spatial scene; for the time attribute, an audio feature can be extracted, specifically a sub-feature (frequency, amplitude, and/or waveform) that identifies a time window in the audio stream, and a subtitle feature can also be extracted, specifically the timestamps in the subtitle stream.
Step 303: A target data attribute of the target data is determined based on the data feature.
For ease of understanding, the description continues with the spatial attribute and the time attribute as examples:
If the target data attribute is a spatial attribute, one or more spatial scenes may be determined based on the subtitle feature. If the target data attribute is a time attribute, one or more time windows may be determined based on the audio feature and/or the subtitle feature.
Step 304: Sub-data of the target data in at least two dimensions corresponding to the target data attribute is acquired.
Step 305: Labeling information of target sub-data in one or more of the sub-data in the at least two dimensions is acquired.
Step 306: The labeling information of the target sub-data is set as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data.
With the data processing method provided by the embodiment of the present application, multiple pieces of labeled training data sharing the same data attribute can be obtained from a single labeling operation, which improves scene adaptability while greatly improving labeling efficiency.
As an implementation of determining the target data attribute of the target data based on the data feature, when the data feature includes an audio feature, a fourth embodiment of the present application discloses a data processing method. As shown in Fig. 8, the method includes the following steps:
Step 401: Target data is acquired.
Step 402: A data feature of the target data is determined.
Step 403: A time attribute of the target data is determined based on the audio feature.
In the embodiment of the present application, the audio feature includes frequency, amplitude, and waveform, where frequency characterizes pitch, amplitude characterizes loudness, and waveform characterizes timbre. In the process of determining the time attribute, at least one time window of the target data may be determined based on a preset labeling scenario condition.
For example, if the labeling scenario is labeling a target person, each time window in which the target person appears in the media stream can be determined according to the target person's timbre; specifically, a time window in which the waveform in the audio stream matches the waveform of the target person's voice can be used as one time window of the media stream.
For another example, if the labeling scenario is labeling a target gender, each time window in which the target gender appears in the media stream can be determined according to its pitch; specifically, a time window in which the frequency in the audio stream matches the frequency range of the target gender can be used as one time window of the media stream.
For yet another example, if the labeling scenario is labeling dialog, the time window of each dialog in the media stream may be determined according to loudness; specifically, a time window in which the amplitude in the audio stream matches the amplitude of dialog can be used as one time window of the media stream.
Step 404: Sub-data of the target data in at least two dimensions corresponding to the time attribute is acquired.
Step 405: Labeling information of target sub-data in one or more of the sub-data in the at least two dimensions is acquired.
Step 406: The labeling information of the target sub-data is set as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data.
With the data processing method provided by the embodiment of the present application, multiple pieces of labeled training data with time attributes can be obtained from a single labeling operation, which improves scene adaptability while greatly improving labeling efficiency.
As an implementation of determining the time attribute of the target data based on the audio feature, a fifth embodiment of the present application discloses a data processing method. As shown in Fig. 9, the method includes the following steps:
Step 501: Target data is acquired.
Step 502: A data feature of the target data is determined.
Step 503: The target data is preprocessed using a first audio feature.
In the embodiment of the present application, different preprocessing strategies can be set for different labeling scenarios. For example, when the labeling scenario is labeling dialog, the non-human-voice portions of the media stream can be removed according to the target frequency band in which human voice lies: the time windows of the audio stream whose frequencies are not in the target band are determined first, and the corresponding portions of the media stream are then removed. This reduces subsequent data processing and lowers resource overhead.
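For illustration, a minimal sketch of this preprocessing step, assuming a dominant-frequency estimate per fixed-size audio window is already available; the voice-band limits are assumptions of this description, not values from the disclosure.

```typescript
// Drop windows whose dominant frequency lies outside an assumed
// human-voice band, before any further analysis.
const VOICE_BAND_HZ = { low: 85, high: 3400 }; // assumed band

interface AudioWindow { startMs: number; endMs: number; dominantHz: number; }

function keepVoiceWindows(windows: AudioWindow[]): AudioWindow[] {
  return windows.filter(
    w => w.dominantHz >= VOICE_BAND_HZ.low && w.dominantHz <= VOICE_BAND_HZ.high,
  );
}
```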
Step 504: The time attribute of the preprocessed target data is determined using a second audio feature.
In the embodiment of the present application, the first audio feature and the second audio feature may be completely the same, completely different, or partially the same.
Continuing with the example in which the labeling scenario is labeling dialog:
After the non-human-voice portions of the media stream have been removed, the dialog window of each dialog in the remaining human-voice segments can be determined according to loudness. Specifically, positions in the audio stream where the amplitude is close to zero can be determined as dialog boundaries; two dialog boundaries, or a dialog boundary and a segment boundary, or two segment boundaries of the human-voice media stream, can then delimit one time window.
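For illustration, a minimal sketch of such amplitude-based boundary detection over normalized audio samples; the silence threshold and the minimum silence duration are assumptions.

```typescript
// Positions where the amplitude stays near zero long enough are treated as
// dialog boundaries; consecutive boundaries delimit one time window.
function findDialogWindows(
  samples: Float32Array,  // normalized to [-1, 1]
  sampleRate: number,
  silenceThreshold = 0.01,
  minSilenceMs = 300,
): { startMs: number; endMs: number }[] {
  const minSilence = Math.floor((minSilenceMs / 1000) * sampleRate);
  const windows: { startMs: number; endMs: number }[] = [];
  let windowStart = 0;
  let silentRun = 0;
  for (let i = 0; i < samples.length; i++) {
    if (Math.abs(samples[i]) < silenceThreshold) {
      silentRun++;
      if (silentRun === minSilence) {
        const end = i - silentRun + 1; // close the window where silence began
        if (end > windowStart) {
          windows.push({
            startMs: (windowStart / sampleRate) * 1000,
            endMs: (end / sampleRate) * 1000,
          });
        }
      }
    } else {
      if (silentRun >= minSilence) windowStart = i; // new window after silence
      silentRun = 0;
    }
  }
  if (silentRun < minSilence && windowStart < samples.length) {
    windows.push({
      startMs: (windowStart / sampleRate) * 1000,
      endMs: (samples.length / sampleRate) * 1000,
    });
  }
  return windows;
}
```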
In addition, to improve the precision with which time windows are divided, when the data feature also includes a subtitle feature, the time attribute can be corrected according to the subtitle feature.
Continuing with the example in which the labeling scenario is labeling dialog:
In the embodiment of the present application, the time windows determined from the audio feature are further corrected by exploiting the synchronization of subtitles with the audio and video, using the timestamps attached to the dialog in the subtitle stream.
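For illustration, a minimal sketch of such a correction step, snapping each audio-derived window boundary to the nearest subtitle cue timestamp within a tolerance; the tolerance value is an assumption.

```typescript
// Exploit subtitle/audio synchronization: move each boundary onto the
// closest cue timestamp from the subtitle stream, if one is near enough.
function snapToSubtitles(
  win: { startMs: number; endMs: number },
  cueTimesMs: number[],   // cue start/end timestamps from the subtitle stream
  toleranceMs = 500,
): { startMs: number; endMs: number } {
  const snap = (t: number): number => {
    let best = t;
    let bestDist = toleranceMs;
    for (const cue of cueTimesMs) {
      const d = Math.abs(cue - t);
      if (d < bestDist) {
        best = cue;
        bestDist = d;
      }
    }
    return best;
  };
  return { startMs: snap(win.startMs), endMs: snap(win.endMs) };
}
```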
Step 505: Sub-data of the target data in at least two dimensions corresponding to the time attribute is acquired.
Step 506: Labeling information of target sub-data in one or more of the sub-data in the at least two dimensions is acquired.
Step 507: The labeling information of the target sub-data is set as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data.
With the data processing method provided by the embodiment of the present application, multiple pieces of labeled training data with time attributes can be obtained from a single labeling operation, which improves scene adaptability while greatly improving labeling efficiency.
Corresponding to the above data processing method, an embodiment of the present application further discloses a data processing apparatus. As shown in Fig. 10, the apparatus includes:
a data acquisition module 10, configured to acquire target data;
a sub-data acquisition module 20, configured to obtain sub-data of the target data in at least two dimensions;
a label acquisition module 30, configured to obtain labeling information of target sub-data in one or more of the sub-data in the at least two dimensions; and
a label setting module 40, configured to set the labeling information of the target sub-data as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data.
Optionally, the sub-data in the at least two dimensions are sub-data in the target data having the same data attribute.
Optionally, the sub-data acquisition module 20 obtaining sub-data of the target data in at least two dimensions includes:
determining a data feature of the target data; determining a target data attribute of the target data based on the data feature; and obtaining the sub-data of the target data in at least two dimensions corresponding to the target data attribute.
Optionally, the data feature includes an audio feature, and the sub-data acquisition module 20 determining the target data attribute of the target data based on the data feature includes:
determining a time attribute of the target data based on the audio feature.
Optionally, the sub-data acquisition module 20 determining the time attribute of the target data based on the audio feature includes:
preprocessing the target data using a first audio feature; and determining the time attribute of the preprocessed target data using a second audio feature.
Optionally, the data feature further includes a subtitle feature, and the sub-data acquisition module 20 is further configured to:
correct the time attribute according to the subtitle feature.
Optionally, the label acquisition module 30 is further configured to:
determine target labeling information in the case where at least two kinds of labeling information exist in the labeling information of the target sub-data.
Optionally, the label setting module 40 is further configured to:
output the labeling information of the sub-data in the at least two dimensions.
The data processing apparatus provided by the embodiment of the present application can obtain multiple pieces of labeled training data from a single labeling operation, which greatly improves labeling efficiency.
Corresponding to the above data processing method, an embodiment of the present application further discloses an electronic device, which includes:
a memory, configured to store an application program and data generated by running the application program; and
a processor, configured to execute the application program to perform the following functions: acquiring target data; obtaining sub-data of the target data in at least two dimensions; obtaining labeling information of target sub-data in one or more of the sub-data in the at least two dimensions; and setting the labeling information of the target sub-data as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data.
For ease of understanding, the following describes the present application in detail, taking a streaming media data processing scheme as an example:
In fields of machine learning such as computer vision and natural language processing, a large amount of labeled training data is required to train an algorithm model to a given accuracy; for example, face recognition requires hundreds of thousands of face images. The quantity and quality of training data have a decisive influence on artificial intelligence, and especially on deep learning models.
At present, training data is labeled mainly by manual work, and the enormous cost to companies in the industry lies in labor hours, licensing fees, and equipment operation costs; for enterprises, training data is something of a luxury good.
Aimed at these industry pain points, a cross-platform technology based on streaming media and the web is provided, which can generate video data, text data, voice data, and picture data for artificial intelligence training from a single labeling operation. With this technology, a streaming media provider can rely on massive user requests and, by implementing appropriate incentive measures, acquire a massive amount of otherwise expensive labeled training data in a very short time at low cost.
Fig. 11 is a scene schematic diagram of a data processing method disclosed in an embodiment of the present application. Referring to Fig. 11, a large number of media streams are stored in the cloud server, which provides a web-based streaming media on-demand service supporting various heterogeneous user-side platforms, such as Windows, Linux, Android, and iOS, without requiring any software installation. On a web browser or in a client, without the user perceiving it, the algorithm and web worker code are delivered through web technology and run silently in a background thread; throughout the whole process the user only needs one click to produce multiple kinds of training data for artificial intelligence training.
The user-side electronic device obtains the media stream from the cloud server through streaming media on-demand and plays it using H5 technology. While playing the media stream, the user-side electronic device performs data track separation (i.e., track splitting) on it in a web worker thread to obtain a video stream, an audio stream, a subtitle stream, and single-frame images.
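For illustration, a minimal sketch of this worker hand-off pattern; the worker script name and message shape are assumptions, and the demultiplexing logic itself is out of scope here.

```typescript
// The page fetches the media bytes and transfers them to a background
// worker, keeping the UI thread free; the worker answers with the
// separated tracks. "splitter.js" is an assumed worker script.
const worker = new Worker("splitter.js");

worker.onmessage = (event: MessageEvent) => {
  const { video, audio, subtitles, frames } = event.data; // separated tracks
  console.log("tracks ready", { video, audio, subtitles, frames });
};

async function splitMedia(url: string): Promise<void> {
  const response = await fetch(url);
  const bytes = await response.arrayBuffer();
  worker.postMessage({ bytes }, [bytes]); // transfer without copying
}
```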
Furthermore, the user-side electronic device builds the algorithm running environment inside the browser's JS engine; this works across platforms, requires no installation, and is imperceptible to the user.
Furthermore, the user-side electronic device selects the human-voice frequency band from the audio stream and rejects the non-human-voice band. Dialog boundaries are judged by the amplitude among the audio features, and, by exploiting the synchronization of subtitles with the audio and video, the timestamps of the dialog in the subtitle stream are used to segment the time windows of the dialog labeling scenario precisely. Finally, the video clips, audio clips, subtitle clips, and corresponding single-frame images of the different dialog scenes are extracted.
Still further, the device interacts with the user through H5 technology, letting the user select a target emotion label, such as happiness, anger, or surprise, for the video clip of the current dialog scene.
Finally, the target emotion label is applied to the audio clip, the subtitle clip, and the single-frame image of the current dialog scene, and the results are returned to the cloud server through a Rest API for persistence, providing a basis for subsequent data reprocessing or algorithm training.
In addition, the cloud server can synchronize the results to users across the whole network using a dimensional data structure, preventing processing tasks from being computed repeatedly.
With a single labeling operation, the present application can acquire four kinds of labeled data, namely video, audio, text (subtitles), and pictures, for training algorithms in major scenarios such as NLP (natural language processing), CV (computer vision), and video understanding.
In addition, the present application can help companies solve the industry problem of the high labor cost of acquiring proprietary data sets, at close to zero cost. The daily average click volume of a large video website exceeds a hundred million; with a modest virtual incentive policy under which each user interacts once while watching on demand, billions of data labels can be obtained.
The present application also addresses the problems of slow data labeling and long collection cycles. In the field of computer vision, a skilled worker can label only a few hundred pictures per day; for a complex algorithm, the required labeling man-hours may exceed tens of thousands, and data collection can last for years. With the present application, hundreds of millions of data items can be acquired within a day.
The embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the others, and for the parts that are the same or similar the embodiments may be referred to one another. Since the disclosed apparatus corresponds to the disclosed method, its description is brief, and the relevant points can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
the previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A data processing method, comprising:
acquiring target data;
obtaining sub-data of the target data in at least two dimensions;
obtaining labeling information of target sub-data in one or more of the sub-data in the at least two dimensions; and
setting the labeling information of the target sub-data as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data.
2. The method of claim 1, wherein the sub-data in the at least two dimensions are sub-data in the target data having the same data attribute.
3. The method of claim 2, wherein the obtaining sub-data of the target data in at least two dimensions comprises:
determining a data feature of the target data;
determining a target data attribute of the target data based on the data feature; and
obtaining the sub-data of the target data in at least two dimensions corresponding to the target data attribute.
4. The method of claim 3, wherein the data feature comprises an audio feature, and the determining a target data attribute of the target data based on the data feature comprises:
determining a time attribute of the target data based on the audio feature.
5. The method of claim 4, wherein the determining a time attribute of the target data based on the audio feature comprises:
preprocessing the target data using a first audio feature; and
determining the time attribute of the preprocessed target data using a second audio feature.
6. The method of claim 4, wherein the data feature further comprises a subtitle feature, the method further comprising:
correcting the time attribute according to the subtitle feature.
7. The method of claim 1, further comprising:
determining target labeling information in a case where at least two kinds of labeling information exist in the labeling information of the target sub-data.
8. The method of claim 1, further comprising:
outputting the labeling information of the sub-data in the at least two dimensions.
9. A data processing apparatus, comprising:
a data acquisition module, configured to acquire target data;
a sub-data acquisition module, configured to obtain sub-data of the target data in at least two dimensions;
a label acquisition module, configured to obtain labeling information of target sub-data in one or more of the sub-data in the at least two dimensions; and
a label setting module, configured to set the labeling information of the target sub-data as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data.
10. An electronic device, comprising:
a memory, configured to store an application program and data generated by running the application program; and
a processor, configured to execute the application program to perform the following functions: acquiring target data; obtaining sub-data of the target data in at least two dimensions; obtaining labeling information of target sub-data in one or more of the sub-data in the at least two dimensions; and setting the labeling information of the target sub-data as the labeling information of the sub-data in the other dimensions corresponding to the target sub-data.
CN201910852560.2A 2019-09-10 2019-09-10 Data processing method and device and electronic equipment Active CN110555117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910852560.2A CN110555117B (en) 2019-09-10 2019-09-10 Data processing method and device and electronic equipment


Publications (2)

Publication Number Publication Date
CN110555117A 2019-12-10
CN110555117B CN110555117B (en) 2022-05-31

Family

ID=68739628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910852560.2A Active CN110555117B (en) 2019-09-10 2019-09-10 Data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110555117B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103984738A (en) * 2014-05-22 2014-08-13 中国科学院自动化研究所 Role labelling method based on search matching
CN104317894A (en) * 2014-10-23 2015-01-28 北京百度网讯科技有限公司 Method and device for determining sample labels
CN105138953A (en) * 2015-07-09 2015-12-09 浙江大学 Method for identifying actions in video based on continuous multi-instance learning
CN107004141A (en) * 2017-03-03 2017-08-01 香港应用科技研究院有限公司 To the efficient mark of large sample group
CN108806668A (en) * 2018-06-08 2018-11-13 国家计算机网络与信息安全管理中心 A kind of audio and video various dimensions mark and model optimization method
CN109977255A (en) * 2019-02-22 2019-07-05 北京奇艺世纪科技有限公司 Model generating method, audio-frequency processing method, device, terminal and storage medium


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111526405A (en) * 2020-04-30 2020-08-11 网易(杭州)网络有限公司 Media material processing method, device, equipment, server and storage medium
CN114025216A (en) * 2020-04-30 2022-02-08 网易(杭州)网络有限公司 Media material processing method, device, server and storage medium
CN114025216B (en) * 2020-04-30 2023-11-17 网易(杭州)网络有限公司 Media material processing method, device, server and storage medium
CN112487238A (en) * 2020-10-27 2021-03-12 百果园技术(新加坡)有限公司 Audio processing method, device, terminal and medium
CN112487238B (en) * 2020-10-27 2024-05-17 百果园技术(新加坡)有限公司 Audio processing method, device, terminal and medium

Also Published As

Publication number Publication date
CN110555117B (en) 2022-05-31

Similar Documents

Publication Publication Date Title
US11321667B2 (en) System and method to extract and enrich slide presentations from multimodal content through cognitive computing
CN105391730B (en) A kind of information feedback method, apparatus and system
US20180130496A1 (en) Method and system for auto-generation of sketch notes-based visual summary of multimedia content
CN106960051B (en) Audio playing method and device based on electronic book and terminal equipment
US20100070860A1 (en) Animated cloud tags derived from deep tagging
CN107492383B (en) Live content screening method, device, equipment and storage medium
CN112511854A (en) Live video highlight generation method, device, medium and equipment
US9953451B2 (en) Audio media mood visualization
CN112929744A (en) Method, apparatus, device, medium and program product for segmenting video clips
CN109493888B (en) Cartoon dubbing method and device, computer-readable storage medium and electronic equipment
US20200090246A1 (en) Media enhancement with customized add-on content
CN104038473A (en) Method of audio ad insertion, device, equipment and system
US11749255B2 (en) Voice question and answer method and device, computer readable storage medium and electronic device
CN110555117B (en) Data processing method and device and electronic equipment
CN109286848B (en) Terminal video information interaction method and device and storage medium
CN104918060A (en) Method and device for selecting position to insert point in video advertisement
CN113411674A (en) Video playing control method and device, electronic equipment and storage medium
CN110248235B (en) Software teaching method, device, terminal equipment and medium
CN113259763B (en) Teaching video processing method and device and electronic equipment
US20200226208A1 (en) Electronic presentation reference marker insertion
CN111695670A (en) Neural network model training method and device
CN114513706B (en) Video generation method and device, computer equipment and storage medium
CN113411517B (en) Video template generation method and device, electronic equipment and storage medium
CN112601129B (en) Video interaction system, method and receiving terminal
CN113762056A (en) Singing video recognition method, device, equipment and storage medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant