CN112215336A - Data labeling method, device, equipment and storage medium based on user behavior - Google Patents

Data labeling method, device, equipment and storage medium based on user behavior

Info

Publication number
CN112215336A
Authority
CN
China
Prior art keywords
behavior
data
user behavior
standard
image
Prior art date
Legal status
Granted
Application number
CN202011058992.5A
Other languages
Chinese (zh)
Other versions
CN112215336B (en)
Inventor
杜晨冰
张一帆
沈志勇
高宏
Current Assignee
China Merchants Finance Technology Co Ltd
Original Assignee
China Merchants Finance Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Merchants Finance Technology Co Ltd filed Critical China Merchants Finance Technology Co Ltd
Priority to CN202011058992.5A priority Critical patent/CN112215336B/en
Publication of CN112215336A publication Critical patent/CN112215336A/en
Application granted granted Critical
Publication of CN112215336B publication Critical patent/CN112215336B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06N 3/02 Neural networks; G06N 3/04 Architecture, e.g. interconnection topology; G06N 3/045 Combinations of networks
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/08 Learning methods
    • G06F 18/21 Design or setup of recognition systems or techniques; G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to artificial intelligence technology and discloses a data annotation method based on user behavior, which comprises the following steps: acquiring a marked image set and dividing it into a plurality of marked image subsets; acquiring a standard user behavior set; generating a predicted user behavior set for a target marked image subset among the plurality of marked image subsets; updating the parameters of a behavior model by calculating loss values between the standard user behaviors corresponding to the target marked image subset and the predicted user behavior set, so as to obtain an updated behavior model; training the updated behavior model to obtain a standard behavior model; acquiring a data set to be labeled and obtaining the user behavior preferences of the data set to be labeled by using the standard behavior model; and labeling the data in the data set to be labeled according to the user behavior preferences. The invention also provides a data labeling device, equipment and a storage medium based on user behavior. In addition, the invention relates to blockchain technology: the marked image set can be stored in a blockchain node. The invention can improve the accuracy of data labeling.

Description

Data labeling method, device, equipment and storage medium based on user behavior
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a data annotation method and device based on user behaviors, electronic equipment and a computer readable storage medium.
Background
With the development of artificial intelligence, more and more complex intelligent models, such as models for image recognition and image localization, are trained with massive amounts of data. In this process, the data must first be labeled, and the model is then trained with the labeled data. In machine-learning-based algorithms, labeled data is therefore crucial for model training: if the labeling is of low quality, the accuracy of model training is affected.
Most existing data labeling methods train a data labeling model on labeled images and then use the trained model to generate predicted labels for new data, thereby realizing data labeling. However, the labeled images currently used to train such data labeling models contain a large number of erroneous labels, so the accuracy of the trained models in labeling data is low. How to improve the accuracy of data labeling has therefore become an urgent problem to be solved.
Disclosure of Invention
The invention provides a data labeling method and device based on user behaviors, electronic equipment and a computer readable storage medium, and mainly aims to improve the accuracy of labeling data.
In order to achieve the above object, the present invention provides a data annotation method based on user behavior, which includes:
acquiring a marked image set, and dividing the marked image set into a plurality of marked image subsets;
obtaining a standard user behavior set, wherein the standard user behavior set comprises standard user behaviors corresponding to the plurality of tagged image subsets;
generating a predicted user behavior set of a target tagged image subset of the plurality of tagged image subsets by using a pre-constructed behavior model;
calculating loss values of the standard user behaviors corresponding to the target mark image subset and the predicted user behavior set, and updating parameters of the behavior model according to the loss values to obtain an updated behavior model;
sequentially utilizing the plurality of labeled image subsets to train the updated behavior model in order to obtain a standard behavior model;
acquiring a data set to be labeled, and performing behavior analysis on the data set to be labeled by using the standard behavior model to obtain a plurality of user behavior preferences;
and marking the data to be marked in the data set to be marked according to the plurality of user behavior preferences.
Optionally, the obtaining the standard user behavior set includes:
generating a prediction label set corresponding to a plurality of marked image subsets in the marked image set by using a pre-constructed marking model;
calculating the credible value of each prediction label in the prediction label set;
correcting the predicted labels in the predicted label set according to the credibility value to obtain a standard label set;
and searching and acquiring a standard user behavior set corresponding to the standard label set in a preset database.
Optionally, the generating, by using a pre-constructed annotation model, a prediction label set corresponding to a plurality of tagged image subsets in the tagged image set includes:
performing convolution processing on the marked images in the plurality of marked image sets by using the marking model to obtain a convolution image set;
pooling all the convolution images in the convolution image set to obtain a coding feature set of the convolution image set;
carrying out full-connection processing on the coding features in the coding feature set to obtain a full-connection feature set;
calculating the label probability that each full-connection feature in the full-connection feature set belongs to a preset label by using a first activation function;
and determining that the preset label with the label probability larger than the probability threshold is a prediction label of the marked image corresponding to the full-connection feature, and obtaining a plurality of prediction label sets corresponding to a plurality of marked image subsets in the plurality of marked image sets.
Optionally, the sequentially training the updated behavior model by using the plurality of labeled image subsets to obtain a standard behavior model includes:
inputting a forward target marked image subset in the plurality of marked image subsets into an updating behavior model to obtain a hidden state generated by the updating behavior model based on the forward target marked image subset;
and updating the parameters of the updated behavior model by using the hidden state and the backward target marked image subsets in the plurality of marked image subsets to obtain the standard behavior model.
Optionally, the labeling, according to the multiple user behavior preferences, the data to be labeled in the data set to be labeled, includes:
carrying out serialization coding on the plurality of user behavior preferences to obtain a serialization behavior list;
and labeling the data to be labeled in the data set to be labeled according to the sequence of the user behavior preference in the serialized behavior list.
Optionally, the acquiring a marker image set includes:
determining a storage environment of the tagged image set;
acquiring a compiler corresponding to the storage environment;
generating a data call request by using the compiler;
and executing the data call request to obtain the marked image set.
Optionally, the calculating the loss values of the standard user behaviors and the predicted user behavior set corresponding to the target tagged image subset includes:
calculating a loss value f(P_i) between the predicted user behavior set and the standard user behaviors corresponding to the target marked image subset by using the following loss function:
f(P_i) = l(Q^T ψ^(-1)(P_i))
wherein Q is the distribution matrix of the standard user behaviors, T denotes matrix transposition, ψ is the error parameter of the identification model, and P_i is the i-th predicted user behavior in the predicted user behavior set.
In order to solve the above problem, the present invention further provides a data annotation device based on user behavior, the device comprising:
the image dividing module is used for acquiring a marked image set and dividing the marked image set into a plurality of marked image subsets;
a behavior obtaining module, configured to obtain a standard user behavior set, where the standard user behavior set includes standard user behaviors corresponding to the plurality of tagged image subsets;
the behavior prediction module is used for generating a predicted user behavior set of a target marked image subset in the plurality of marked image subsets by using a pre-constructed behavior model;
the first updating module is used for calculating loss values of the standard user behaviors and the predicted user behavior set corresponding to the target mark image subset, and updating parameters of the behavior model according to the loss values to obtain an updated behavior model;
the second updating module is used for sequentially utilizing the plurality of label image subsets to train the updating behavior model in order to obtain a standard behavior model;
the behavior analysis module is used for acquiring a data set to be labeled, and performing behavior analysis on the data set to be labeled by using the standard behavior model to obtain a plurality of user behavior preferences;
and the data marking module is used for marking the data to be marked in the data set to be marked according to the user behavior preferences.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one computer program; and
a processor executing the computer program stored in the memory to implement the data annotation method based on user behavior as described in any one of the above.
In order to solve the above problem, the present invention further provides a computer-readable storage medium including a storage data area and a storage program area, the storage data area storing created data, the storage program area storing a computer program; wherein the computer program, when executed by a processor, implements the method for data annotation based on user behavior as described in any one of the above.
In the embodiment of the invention, standard user behaviors are acquired, a pre-constructed behavior model is used to generate a predicted user behavior set for a target marked image subset among a plurality of marked image subsets, loss values between the standard user behaviors corresponding to the target marked image subset and the predicted user behavior set are calculated, and the parameters of the behavior model are updated according to the loss values to obtain an updated behavior model. The updated behavior model is then trained sequentially on the plurality of marked image subsets to obtain a standard behavior model. Because the model is trained with the behavior data of users rather than with training data carrying erroneous labels, the drop in model accuracy caused by erroneous labels in the training data can be avoided. A data set to be labeled is acquired, behavior analysis is performed on it with the standard behavior model to obtain a plurality of user behavior preferences, and the data to be labeled is then labeled according to these user behavior preferences; generating the user behavior preferences for the data set first and labeling the data according to them improves the accuracy of data labeling. Therefore, the data labeling method, device and computer-readable storage medium based on user behavior can improve the accuracy of data labeling.
Drawings
Fig. 1 is a schematic flowchart of a data annotation method based on user behavior according to an embodiment of the present invention;
FIG. 2 is a block diagram of a data annotation device based on user behavior according to an embodiment of the present invention;
fig. 3 is a schematic internal structural diagram of an electronic device implementing a data annotation method based on user behavior according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The execution subject of the data annotation method based on the user behavior provided by the embodiment of the present application includes, but is not limited to, at least one of electronic devices such as a server and a terminal that can be configured to execute the method provided by the embodiment of the present application. In other words, the data annotation method based on user behavior may be performed by software or hardware installed in the terminal device or the server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
The invention provides a data annotation method based on user behaviors. Fig. 1 is a schematic flow chart of a data annotation method based on user behavior according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the data annotation method based on user behavior includes:
s1, obtaining a marked image set, and dividing the marked image set into a plurality of marked image subsets.
In an embodiment of the present invention, the labeled image set is any image set that can be used for training a machine learning model, and the labeled image set includes a plurality of labeled images and user behaviors generated by a user on the labeled images.
The marked image set may be stored in a pre-constructed blockchain node; the high data throughput of the blockchain node improves the efficiency of obtaining the marked image set.
In detail, the acquiring a marker image set includes:
determining a storage environment of the tagged image set;
acquiring a compiler corresponding to the storage environment;
generating a data call request by using the compiler;
and executing the data call request to obtain the marked image set.
Further, the embodiment of the present invention divides the marker image set into a plurality of marker image subsets, where the plurality of marker image subsets include: a first subset of marker images, a second subset of marker images, …, an nth subset of marker images.
For example, the marked image set includes 1000 marked images, and the marked image set is divided into 10 marked image subsets, each of which includes 100 marked images. In an optional embodiment of the present invention, after acquiring the plurality of marker image subsets, the method further includes:
sorting the plurality of tagged image subsets.
Specifically, the plurality of marked image subsets may be sorted randomly, or a pre-stored sequence with a specific order may be acquired and the subsets sorted according to the order set in that sequence.
In an optional embodiment of the present invention, sorting the plurality of marked image subsets facilitates the subsequent sequential training of the model on these subsets.
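For illustration only, the division and ordering of the marked image set may be sketched in Python as follows, assuming the marked image set is held as an in-memory list; the function and variable names are not part of the embodiment.

import random

def split_into_subsets(labeled_images, num_subsets, shuffle=True, seed=None):
    # Optionally shuffle before splitting; a pre-stored ordering could be applied instead.
    images = list(labeled_images)
    if shuffle:
        random.Random(seed).shuffle(images)
    size = len(images) // num_subsets
    # The first num_subsets - 1 subsets get `size` images each; the last one takes the remainder.
    subsets = [images[i * size:(i + 1) * size] for i in range(num_subsets - 1)]
    subsets.append(images[(num_subsets - 1) * size:])
    return subsets

# Example: 1000 marked images divided into 10 ordered subsets of 100 images each.
labeled_image_set = ["img_%04d.png" % i for i in range(1000)]
marked_subsets = split_into_subsets(labeled_image_set, num_subsets=10, seed=0)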
S2, obtaining a standard user behavior set, wherein the standard user behavior set comprises standard user behaviors corresponding to the plurality of label image subsets.
In the embodiment of the present invention, the acquiring a standard user behavior set includes:
generating a prediction label set corresponding to a plurality of marked image subsets in the marked image set by using a pre-constructed marking model;
calculating the credible value of each prediction label in the prediction label set;
correcting the predicted labels in the predicted label set according to the credibility value to obtain a standard label set;
and searching and acquiring a standard user behavior set corresponding to the standard label set in a preset database.
In the embodiment of the invention, the annotation model is used to generate the prediction label sets corresponding to the plurality of marked image subsets in the marked image set, the credible value of each prediction label in the prediction label sets is calculated, and the prediction labels are corrected into a standard label set according to the credible values; the standard user behavior set corresponding to the standard label set is then retrieved from the preset database, which improves the correctness of the acquired standard user behavior set.
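For illustration only, the correction-and-lookup step may be sketched in Python as follows. The embodiment does not fix how the credible value is computed or how labels are corrected; the sketch assumes the credible value is a confidence score attached to each prediction, that low-confidence labels are replaced by hypothetical fallback labels, and that a plain dictionary stands in for the preset database. All names are illustrative.

def correct_labels(predicted, threshold=0.8, fallback_labels=None):
    # predicted: list of (label, credible value) pairs for the images of one marked image subset.
    # fallback_labels: hypothetical reviewed labels used when the credible value is low.
    standard = []
    for i, (label, credibility) in enumerate(predicted):
        if credibility >= threshold or fallback_labels is None:
            standard.append(label)
        else:
            standard.append(fallback_labels[i])
    return standard

def lookup_standard_behaviors(standard_labels, behavior_db):
    # behavior_db stands in for the preset database mapping standard labels to standard user behaviors.
    return [behavior_db.get(label, "unknown") for label in standard_labels]

behavior_db = {"cat": "select", "dog": "drag", "car": "stretch"}      # illustrative mapping
predicted = [("cat", 0.95), ("dog", 0.60), ("car", 0.88)]
standard_labels = correct_labels(predicted, fallback_labels=["cat", "cat", "car"])
standard_behaviors = lookup_standard_behaviors(standard_labels, behavior_db)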
In detail, the generating, by using a pre-constructed annotation model, a prediction tag set corresponding to a plurality of tagged image subsets in the tagged image set includes:
performing convolution processing on the marked images in the plurality of marked image sets by using the marking model to obtain a convolution image set;
pooling all the convolution images in the convolution image set to obtain a coding feature set of the convolution image set;
carrying out full-connection processing on the coding features in the coding feature set to obtain a full-connection feature set;
calculating the label probability that each full-connection feature in the full-connection feature set belongs to a preset label by using a first activation function;
and determining that the preset label with the label probability larger than the probability threshold is a prediction label of the marked image corresponding to the full-connection feature, and obtaining a plurality of prediction label sets corresponding to a plurality of marked image subsets in the plurality of marked image sets.
Preferably, the first activation function includes a softmax activation function, a tanh activation function, a relu activation function, and the like.
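For illustration only, the label-prediction pipeline described above (convolution, pooling, full connection, a first activation function, and a probability threshold) may be sketched in Python with PyTorch as follows. This is not the annotation model of the embodiment: the layer sizes, the number of labels, the probability threshold of 0.5 and every name in the sketch are assumptions.

import torch
import torch.nn as nn

class AnnotationModel(nn.Module):
    # A minimal stand-in for the pre-constructed annotation model:
    # convolution -> pooling -> full connection -> softmax label probabilities.
    def __init__(self, num_labels=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # convolution processing
        self.pool = nn.AdaptiveAvgPool2d((8, 8))                 # pooling into coding features
        self.fc = nn.Linear(16 * 8 * 8, num_labels)              # full-connection processing

    def forward(self, images):
        features = self.pool(torch.relu(self.conv(images)))
        features = features.flatten(start_dim=1)
        return torch.softmax(self.fc(features), dim=1)           # first activation function

model = AnnotationModel(num_labels=10)
images = torch.randn(4, 3, 64, 64)                               # a batch from one marked image subset
probabilities = model(images)
threshold = 0.5                                                  # preset probability threshold (illustrative)
# For each image, keep the preset labels whose probability exceeds the threshold as predicted labels.
predicted_labels = [(row > threshold).nonzero(as_tuple=True)[0].tolist() for row in probabilities]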
And S3, generating a predicted user behavior set of a target mark image subset in the plurality of mark image subsets by using a pre-constructed behavior model.
In the embodiment of the present invention, the pre-constructed behavior model may be any supervised learning model with machine learning capability, and the behavior model is used to predict the behavior of the user on different tagged images (for example, selecting, dragging or stretching a tagged image).
In an embodiment of the present invention, the behavior model may be a convolutional neural network with a behavior prediction function. The convolutional neural network includes, but is not limited to, a combination of one or more convolutional layers, pooling layers and fully connected layers. Specifically:
the convolutional layer performs convolution processing on the image: each feature in the image is first perceived locally, and the local features are then combined at a higher level to obtain global information;
the pooling layer pools the convolved image, which reduces the dimensionality of the features and the number of data and parameters, reduces overfitting and improves the fault tolerance of the model;
the fully connected layer performs linear classification on the image, which is equivalent to linearly combining the extracted high-level feature vectors and outputting the result.
The convolutional neural network may be expressed as:
h = h^(n) · h^(n-1) · … · h^(1)
wherein h^(n) denotes the network structure of the n-th layer of the convolutional neural network.
Preferably, in the embodiment of the present invention, the labeled images in the labeled image subset are input into the behavior model, so as to obtain the predicted user behavior corresponding to the labeled images.
For example, a marked image x in the marked image subset is input into the behavior model to obtain the predicted user behavior f(x) corresponding to x. After the predicted user behaviors corresponding to all marked images in the marked image subset have been obtained, they are collected into a predicted user behavior set.
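For illustration only, the collection of the predicted user behavior set may be sketched in Python with PyTorch as follows, assuming a behavior_model that maps an image tensor to behavior scores and a simple list standing in for the behavior vocabulary; the stand-in model and all names are assumptions.

import torch

behaviors = ["select", "drag", "stretch"]                        # illustrative behavior vocabulary

def predict_behavior_set(behavior_model, target_subset):
    # Collect the predicted user behavior f(x) of every marked image x in the target subset.
    predicted_set = []
    with torch.no_grad():
        for x in target_subset:                                  # x: one marked image tensor (C, H, W)
            scores = behavior_model(x.unsqueeze(0))              # f(x): behavior scores for image x
            predicted_set.append(behaviors[scores.argmax(dim=1).item()])
    return predicted_set

# Example with a stand-in model: a linear layer over flattened 3x64x64 images.
behavior_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, len(behaviors)))
target_subset = [torch.randn(3, 64, 64) for _ in range(5)]
predicted_user_behavior_set = predict_behavior_set(behavior_model, target_subset)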
S4, calculating loss values of the standard user behaviors corresponding to the target mark image subsets and the predicted user behavior set, and updating parameters of the behavior model according to the loss values to obtain an updated behavior model.
In an embodiment of the present invention, the calculating a loss value of the standard user behavior and the predicted user behavior set corresponding to the target tagged image subset includes:
calculating a loss value f(P_i) between the predicted user behavior set and the standard user behaviors corresponding to the target marked image subset by using the following loss function:
f(P_i) = l(Q^T ψ^(-1)(P_i))
wherein Q is the distribution matrix of the standard user behaviors, T denotes matrix transposition, ψ is the error parameter of the identification model, and P_i is the i-th predicted user behavior in the predicted user behavior set.
According to the embodiment of the invention, the loss values of the standard user behaviors corresponding to the target marked image subset and the predicted user behavior set are calculated, and the difference between the standard user behaviors and the predicted user behavior set can be intuitively obtained according to the loss values.
In the embodiment of the invention, when the loss value is greater than the preset loss threshold value, the predicted user behavior set output by the behavior model is corrected, and then the parameters of the behavior model are updated by using the corrected predicted user behavior set to obtain the updated behavior model.
In the embodiment of the present invention, the parameters of the behavior model may be updated by using a gradient descent algorithm, which includes, but is not limited to, a batch gradient descent algorithm, a random gradient descent algorithm, and a small batch gradient descent algorithm.
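For illustration only, the loss computation and the subsequent parameter update may be sketched in Python with PyTorch as follows. Only the formula f(P_i) = l(Q^T ψ^(-1)(P_i)) and the use of gradient descent come from the description above; the concrete choice of l (a squared error here), the shapes of Q and ψ, the stand-in behavior model and the loss threshold of 0.1 are assumptions.

import torch

def behavior_loss(P_i, Q, psi, l=torch.square):
    # f(P_i) = l(Q^T psi^(-1)(P_i)); l, the shapes of Q and psi, and the reduction of the
    # result to a scalar (a sum) are illustrative assumptions.
    corrected = torch.linalg.inv(psi) @ P_i                      # psi^(-1)(P_i)
    return l(Q.T @ corrected).sum()

behavior_model = torch.nn.Linear(8, 3)                           # stand-in for the pre-constructed behavior model
optimizer = torch.optim.SGD(behavior_model.parameters(), lr=0.01)
x = torch.randn(1, 8)                                            # features of one marked image (assumed shape)
P_i = torch.softmax(behavior_model(x).squeeze(0), dim=0)         # the i-th predicted user behavior as a distribution
Q, psi = torch.eye(3), torch.eye(3)                              # distribution matrix and error parameter (illustrative)
loss = behavior_loss(P_i, Q, psi)
if loss.item() > 0.1:                                            # preset loss threshold (illustrative)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                             # gradient-descent parameter update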
And S5, sequentially training the updated behavior model by utilizing the plurality of labeled image subsets to obtain a standard behavior model.
In this embodiment of the present invention, the sequentially training the updated behavior model by using the plurality of labeled image subsets to obtain a standard behavior model includes:
inputting a forward target marked image subset in the plurality of marked image subsets into an updating behavior model to obtain a hidden state generated by the updating behavior model based on the forward target marked image subset;
and updating the parameters of the updated behavior model by using the hidden state and the backward target marked image subsets in the plurality of marked image subsets to obtain the standard behavior model.
Preferably, the forward target marked image subset and the backward target marked image subset are defined relative to each other. For example, if the plurality of marked image subsets comprises a first marked image subset, a second marked image subset, …, and a tenth marked image subset, the first marked image subset is a forward target marked image subset relative to the second to tenth marked image subsets, the second marked image subset is a forward target marked image subset relative to the third to tenth marked image subsets, the second marked image subset is a backward target marked image subset relative to the first marked image subset, and so on.
Preferably, an LSTM network controller is added to the updated behavior model so that the updated behavior model can retain a hidden state generated from the forward target marked image subset while the model is trained sequentially on the plurality of marked image subsets. The LSTM network controller is an ordered neural network; training the updated behavior model sequentially on the plurality of marked image subsets alleviates the long-term dependence problem in neural network training and thus improves the accuracy of the finally trained standard behavior model.
For example, if the plurality of marked image subsets comprises a first marked image subset, a second marked image subset, …, and a tenth marked image subset, the first marked image subset is input into the updated behavior model, the hidden state generated by the updated behavior model from this forward target marked image subset is obtained through the LSTM network controller, and the updated behavior model is then trained with this hidden state and the second marked image subset, and so on, until all marked image subsets have been used to train the updated behavior model, yielding the standard behavior model.
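For illustration only, the sequential training idea may be sketched in Python with PyTorch as follows. The feature dimensions, the number and size of the subsets, the loss function and the random stand-in data are assumptions; the sketch shows only how an LSTM-style controller can carry a hidden state produced on a forward subset into the training on the following (backward) subsets.

import torch
import torch.nn as nn

class SequentialBehaviorTrainer(nn.Module):
    # An LSTM-style controller carries a hidden state produced on the forward (earlier)
    # subsets into the training on the backward (later) subsets.
    def __init__(self, feature_dim=64, hidden_dim=32, num_behaviors=3):
        super().__init__()
        self.controller = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_behaviors)

    def forward(self, subset_features, hidden=None):
        # subset_features: (1, subset_size, feature_dim) features of one marked image subset.
        out, hidden = self.controller(subset_features, hidden)
        return self.head(out), hidden

# Ten subsets of 100 images each, represented by random feature vectors and behavior labels.
dataset_of_subsets = [(torch.randn(1, 100, 64), torch.randint(0, 3, (100,))) for _ in range(10)]

model = SequentialBehaviorTrainer()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
hidden = None
for subset_features, subset_behaviors in dataset_of_subsets:
    predictions, hidden = model(subset_features, hidden)
    loss = criterion(predictions.squeeze(0), subset_behaviors)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    hidden = tuple(h.detach() for h in hidden)                   # keep the hidden state, detach the old graph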
And S6, acquiring a data set to be labeled, and performing behavior analysis on the data set to be labeled by using the standard behavior model to obtain a plurality of user behavior preferences.
In the embodiment of the invention, the data set to be labeled may be acquired from a pre-constructed database that stores data to be labeled, by using a Python statement with a data-fetching function.
And further, inputting the data set to be labeled into a standard behavior model, and performing behavior analysis on the data set to be labeled by using the standard behavior model to obtain the user behavior preference.
Preferably, a user behavior preference refers to a behavior performed by the user on different marked images (for example, selecting, dragging or stretching a marked image).
And S7, labeling the data to be labeled in the data set to be labeled according to the user behavior preferences.
In an optional embodiment of the present invention, the labeling, according to the multiple user behavior preferences, data to be labeled in the data set to be labeled, includes:
carrying out serialization coding on the plurality of user behavior preferences to obtain a serialization behavior list;
and labeling the data to be labeled in the data set to be labeled according to the sequence of the user behavior preference in the serialized behavior list.
In detail, the plurality of user behavior preferences are serially encoded to obtain a serialized behavior list. For example, if the user behavior preferences include preference A, preference B and preference C, and preference A is encoded as No. 1, preference B as No. 3 and preference C as No. 2, a serialized behavior list with the order preference A, preference C, preference B is obtained.
In the embodiment of the invention, the plurality of user behavior preferences are serially encoded into the serialized behavior list, and the data to be labeled in the data set to be labeled is labeled according to the order of the user behavior preferences in that list. This allows the data to be labeled in an orderly manner, avoids repeated labeling, and improves the efficiency of labeling the data to be labeled.
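For illustration only, the serial encoding and the ordered labeling may be sketched in Python as follows. The encoding table, the preference_of helper and the example data are hypothetical; the embodiment only requires that the preferences be encoded into an ordered list and that the data be labeled following that order.

def serialize_preferences(preferences, codes):
    # codes: hypothetical mapping from each user behavior preference to its serial number.
    # Sorting by the serial number yields the serialized behavior list.
    return sorted(preferences, key=lambda p: codes[p])

def label_in_order(data_to_label, serialized_list, preference_of):
    # Label the data items preference by preference, following the serialized behavior list,
    # so that every item is labeled exactly once and repeated labeling is avoided.
    labeled = {}
    for preference in serialized_list:
        for item in data_to_label:
            if item not in labeled and preference_of(item) == preference:
                labeled[item] = preference
    return labeled

codes = {"preference A": 1, "preference C": 2, "preference B": 3}   # example from the description
serialized = serialize_preferences(["preference A", "preference B", "preference C"], codes)
# serialized == ["preference A", "preference C", "preference B"]
labels = label_in_order(["d1", "d2", "d3"], serialized,
                        preference_of=lambda item: {"d1": "preference B",
                                                    "d2": "preference A",
                                                    "d3": "preference C"}[item])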
In the embodiment of the invention, standard user behaviors are acquired, a pre-constructed behavior model is used to generate a predicted user behavior set for a target marked image subset among a plurality of marked image subsets, loss values between the standard user behaviors corresponding to the target marked image subset and the predicted user behavior set are calculated, and the parameters of the behavior model are updated according to the loss values to obtain an updated behavior model. The updated behavior model is then trained sequentially on the plurality of marked image subsets to obtain a standard behavior model. Because the model is trained with the behavior data of users rather than with training data carrying erroneous labels, the drop in model accuracy caused by erroneous labels in the training data can be avoided. A data set to be labeled is acquired, behavior analysis is performed on it with the standard behavior model to obtain a plurality of user behavior preferences, and the data to be labeled is then labeled according to these user behavior preferences; generating the user behavior preferences for the data set first and labeling the data according to them improves the accuracy of data labeling. Therefore, the data annotation method based on user behavior can improve the accuracy of data annotation.
Fig. 2 is a schematic block diagram of a data annotation device based on user behavior according to the present invention.
The data annotation device 100 based on user behavior can be installed in an electronic device. According to the realized functions, the data annotation device based on the user behaviors can comprise an image dividing module 101, a behavior acquisition module 102, a behavior prediction module 103, a first updating module 104, a second updating module 105, a behavior analysis module 106 and a data annotation module 107. The module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the image dividing module 101 is configured to obtain a marker image set, and divide the marker image set into a plurality of marker image subsets;
the behavior obtaining module 102 is configured to obtain a standard user behavior set, where the standard user behavior set includes standard user behaviors corresponding to the plurality of tag image subsets;
the behavior prediction module 103 is configured to generate a predicted user behavior set of a target tagged image subset of the plurality of tagged image subsets by using a pre-constructed behavior model;
the first updating module 104 is configured to calculate loss values of the standard user behavior and the predicted user behavior set corresponding to the target tagged image subset, and update parameters of the behavior model according to the loss values to obtain an updated behavior model;
the second updating module 105 is configured to sequentially train the updated behavior model by using the plurality of labeled image subsets to obtain a standard behavior model;
the behavior analysis module 106 is configured to obtain a data set to be labeled, perform behavior analysis on the data set to be labeled by using the standard behavior model, and obtain a plurality of user behavior preferences;
the data labeling module 107 is configured to label the data to be labeled in the data set to be labeled according to the multiple user behavior preferences.
The modules in the data annotation device 100 based on user behaviors provided by the embodiment of the present invention can use the same technical means as the embodiment of the data annotation method based on user behaviors in fig. 1, and produce the same technical effect.
Fig. 3 is a schematic structural diagram of an electronic device implementing the data annotation method based on user behavior according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a data annotation program 12 based on user behavior, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as code of the data annotation program 12 based on user behavior, but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (e.g., executing a data tagging program based on user behavior, etc.) stored in the memory 11 and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The data annotation program 12 based on user behavior stored in the memory 11 of the electronic device 1 is a combination of a plurality of computer programs, which when executed in the processor 10, can realize:
acquiring a marked image set, and dividing the marked image set into a plurality of marked image subsets;
obtaining a standard user behavior set, wherein the standard user behavior set comprises standard user behaviors corresponding to the plurality of tagged image subsets;
generating a predicted user behavior set of a target tagged image subset of the plurality of tagged image subsets by using a pre-constructed behavior model;
calculating loss values of the standard user behaviors corresponding to the target mark image subset and the predicted user behavior set, and updating parameters of the behavior model according to the loss values to obtain an updated behavior model;
sequentially utilizing the plurality of labeled image subsets to train the updated behavior model in order to obtain a standard behavior model;
acquiring a data set to be labeled, and performing behavior analysis on the data set to be labeled by using the standard behavior model to obtain a plurality of user behavior preferences;
and marking the data to be marked in the data set to be marked according to the plurality of user behavior preferences.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
Further, the computer usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims should not be construed as limiting the claims concerned.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names only and do not denote any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A data annotation method based on user behavior is characterized by comprising the following steps:
acquiring a marked image set, and dividing the marked image set into a plurality of marked image subsets;
obtaining a standard user behavior set, wherein the standard user behavior set comprises standard user behaviors corresponding to the plurality of tagged image subsets;
generating a predicted user behavior set of a target tagged image subset of the plurality of tagged image subsets by using a pre-constructed behavior model;
calculating loss values of the standard user behaviors corresponding to the target mark image subset and the predicted user behavior set, and updating parameters of the behavior model according to the loss values to obtain an updated behavior model;
sequentially utilizing the plurality of labeled image subsets to train the updated behavior model in order to obtain a standard behavior model;
acquiring a data set to be labeled, and performing behavior analysis on the data set to be labeled by using the standard behavior model to obtain a plurality of user behavior preferences;
and marking the data to be marked in the data set to be marked according to the plurality of user behavior preferences.
2. The user behavior-based data annotation method of claim 1, wherein the obtaining the set of standard user behaviors comprises:
generating a prediction label set corresponding to a plurality of marked image subsets in the marked image set by using a pre-constructed marking model;
calculating the credible value of each prediction label in the prediction label set;
correcting the predicted labels in the predicted label set according to the credibility value to obtain a standard label set;
and searching and acquiring a standard user behavior set corresponding to the standard label set in a preset database.
3. The method for data annotation based on user behavior of claim 2, wherein the generating the corresponding predictive tag set of the plurality of tagged image subsets in the tagged image set using a pre-constructed annotation model comprises:
performing convolution processing on the marked images in the plurality of marked image sets by using the marking model to obtain a convolution image set;
pooling all the convolution images in the convolution image set to obtain a coding feature set of the convolution image set;
carrying out full-connection processing on the coding features in the coding feature set to obtain a full-connection feature set;
calculating the label probability that each full-connection feature in the full-connection feature set belongs to a preset label by using a first activation function;
and determining that the preset label with the label probability larger than the probability threshold is a prediction label of the marked image corresponding to the full-connection feature, and obtaining a plurality of prediction label sets corresponding to a plurality of marked image subsets in the plurality of marked image sets.
4. The user behavior-based data annotation method of claim 1, wherein the sequentially training the updated behavior model with the plurality of tagged image subsets to obtain a standard behavior model comprises:
inputting a forward target marked image subset in the plurality of marked image subsets into an updating behavior model to obtain a hidden state generated by the updating behavior model based on the forward target marked image subset;
and updating the parameters of the updated behavior model by using the hidden state and the backward target marked image subsets in the plurality of marked image subsets to obtain the standard behavior model.
5. The method for labeling data based on user behavior according to any one of claims 1 to 4, wherein the labeling data to be labeled in the data set to be labeled according to the user behavior preferences comprises:
carrying out serialization coding on the plurality of user behavior preferences to obtain a serialization behavior list;
and labeling the data to be labeled in the data set to be labeled according to the sequence of the user behavior preference in the serialized behavior list.
6. The user behavior-based data annotation method of any one of claims 1 to 4, wherein the obtaining of the tagged image set comprises:
determining a storage environment of the tagged image set;
acquiring a compiler corresponding to the storage environment;
generating a data call request by using the compiler;
and executing the data call request to obtain the marked image set.
7. The method for user behavior-based data annotation of claim 1, wherein the calculating the loss values for the set of predicted user behaviors and the standard user behaviors corresponding to the subset of target markup images comprises:
calculating a loss value f(P_i) between the predicted user behavior set and the standard user behaviors corresponding to the target marked image subset by using the following loss function:
f(P_i) = l(Q^T ψ^(-1)(P_i))
wherein Q is the distribution matrix of the standard user behaviors, T denotes matrix transposition, ψ is the error parameter of the identification model, and P_i is the i-th predicted user behavior in the predicted user behavior set.
8. An apparatus for annotating data based on user behavior, the apparatus comprising:
the image dividing module is used for acquiring a marked image set and dividing the marked image set into a plurality of marked image subsets;
a behavior obtaining module, configured to obtain a standard user behavior set, where the standard user behavior set includes standard user behaviors corresponding to the plurality of tagged image subsets;
the behavior prediction module is used for generating a predicted user behavior set of a target marked image subset in the plurality of marked image subsets by using a pre-constructed behavior model;
the first updating module is used for calculating loss values of the standard user behaviors and the predicted user behavior set corresponding to the target mark image subset, and updating parameters of the behavior model according to the loss values to obtain an updated behavior model;
the second updating module is used for sequentially utilizing the plurality of label image subsets to train the updating behavior model in order to obtain a standard behavior model;
the behavior analysis module is used for acquiring a data set to be labeled, and performing behavior analysis on the data set to be labeled by using the standard behavior model to obtain a plurality of user behavior preferences;
and the data marking module is used for marking the data to be marked in the data set to be marked according to the user behavior preferences.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and the number of the first and second groups,
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of data annotation based on user behavior of any one of claims 1 to 7.
10. A computer-readable storage medium comprising a storage data area storing created data and a storage program area storing a computer program; wherein the computer program when executed by a processor implements a method of data annotation based on user behavior as claimed in any one of claims 1 to 7.
CN202011058992.5A 2020-09-30 2020-09-30 Data labeling method, device, equipment and storage medium based on user behaviors Active CN112215336B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011058992.5A CN112215336B (en) 2020-09-30 2020-09-30 Data labeling method, device, equipment and storage medium based on user behaviors

Publications (2)

Publication Number Publication Date
CN112215336A true CN112215336A (en) 2021-01-12
CN112215336B CN112215336B (en) 2024-02-09

Family

ID=74050946

Country Status (1)

Country Link
CN (1) CN112215336B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114706927A (en) * 2022-04-12 2022-07-05 平安国际智慧城市科技股份有限公司 Data batch annotation method based on artificial intelligence and related equipment
CN114706927B (en) * 2022-04-12 2024-05-03 平安国际智慧城市科技股份有限公司 Data batch labeling method based on artificial intelligence and related equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334814A (en) * 2018-01-11 2018-07-27 浙江工业大学 A kind of AR system gesture identification methods based on convolutional neural networks combination user's habituation behavioural analysis
WO2020164270A1 (en) * 2019-02-15 2020-08-20 平安科技(深圳)有限公司 Deep-learning-based pedestrian detection method, system and apparatus, and storage medium
CN111709766A (en) * 2020-04-14 2020-09-25 中国农业银行股份有限公司 User behavior prediction method and device, storage medium and electronic equipment
CN111652278A (en) * 2020-04-30 2020-09-11 中国平安财产保险股份有限公司 User behavior detection method and device, electronic equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
黎健成;袁春;宋友;: "基于卷积神经网络的多标签图像自动标注", 计算机科学, no. 07 *

Also Published As

Publication number Publication date
CN112215336B (en) 2024-02-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant