WO2018214895A1 - Data processing method, data processing device, storage device, and network device - Google Patents

Info

Publication number
WO2018214895A1
WO2018214895A1 PCT/CN2018/087961 CN2018087961W WO2018214895A1 WO 2018214895 A1 WO2018214895 A1 WO 2018214895A1 CN 2018087961 W CN2018087961 W CN 2018087961W WO 2018214895 A1 WO2018214895 A1 WO 2018214895A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
target
sample data
data
verification system
Prior art date
Application number
PCT/CN2018/087961
Other languages
English (en)
French (fr)
Inventor
何卓略
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2018214895A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The present invention relates to the field of Internet technologies, specifically to data processing technologies based on machine learning, and in particular to a data processing method, a data processing device, a storage device, and a network device.
  • Sample data with labeled results, such as image data labeled with face positions, image data labeled with facial expressions, or voice data labeled with the speaker's age, is what a machine trains and learns on, and is the basis of machine learning.
  • The demand for sample data keeps growing across Internet systems based on machine learning. For example, as the number of layers in deep neural networks increases, the amount of sample data they require may reach hundreds of millions. As another example, a social recommendation system may require hundreds of billions of samples to produce more accurate and effective social recommendations.
  • In contrast to this large demand, sample data on the Internet is generally lacking. The main reason is that sample data is currently labeled manually by dedicated labeling personnel, which is costly and leaves sample data on the Internet scarce.
  • the embodiment of the present application provides a data processing method, a data processing device, a storage device, and a network device, which can reduce the labeling cost of the sample data and expand the number of sample data in the Internet.
  • the embodiment of the present application provides a data processing method, which may include:
  • the embodiment of the present application provides a data processing apparatus, which may include:
  • an obtaining unit, configured to acquire target sample data to be processed in the verification system;
  • an output unit, configured to output the target sample data to at least one target user entering the verification system, so that the at least one target user labels the target sample data;
  • a collecting unit, configured to collect at least one piece of annotation data generated when the at least one target user labels the target sample data; and
  • a learning unit, configured to perform learning processing on the at least one piece of annotation data of the target sample data by using a machine learning algorithm, to obtain an annotation result of the target sample data.
  • the embodiment of the present application provides a network device, including:
  • a processor, adapted to implement one or more instructions; and
  • a storage device storing one or more instructions, the one or more instructions being adapted to be loaded by the processor to perform the data processing method described in the embodiments of the present application.
  • The embodiments of the present application can tap the fragmented short-term attention that Internet users spend during verification, using those users to label sample data at large scale and in a distributed manner and thereby expand the amount of sample data on the Internet. Further, sample data with a labeled result can be used as verification material with a known answer, expanding the amount of material data in Internet verification systems.
  • FIG. 1 is a schematic diagram of an annotation page according to an embodiment of the present application.
  • FIG. 2 is a flowchart of a data processing method according to an embodiment of the present application.
  • FIG. 3 is a flowchart of another data processing method according to an embodiment of the present application.
  • FIG. 4a is a schematic diagram of another annotation page provided by an embodiment of the present application.
  • FIG. 4b is a schematic diagram of another labeling page according to an embodiment of the present application.
  • FIG. 4c is a schematic diagram of still another annotation page provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of a network device according to an embodiment of the present disclosure.
  • Machine learning is an interdisciplinary technique that draws on many disciplines, such as probability theory, statistics, approximation theory, convex analysis, and algorithm complexity theory. It studies how machines can simulate or implement human learning behavior in order to acquire new knowledge or skills, and how they can reorganize existing knowledge structures to keep improving their own performance.
  • a machine herein may refer to a computer, an electronic computer, a neural computer, or the like.
  • machine learning is widely used in various Internet scenarios, for example, data mining scenarios, computer vision scenarios, natural language processing scenarios, neural network construction scenarios, information recommendation scenarios, and the like.
  • Sample data with labeled results is what a machine trains and learns on, and is the basis of machine learning.
  • For example, image data labeled with face positions can be used as sample data for machine learning; so can image data labeled with facial expressions, voice data labeled with the speaker's age, and so on.
  • The demand for sample data keeps growing as more Internet systems are built on machine learning technology. For example, as the number of layers in deep neural networks increases, the amount of sample data they require may reach hundreds of millions. As another example, a social recommendation system may require hundreds of billions of samples to produce more accurate and effective social recommendations.
  • In contrast to this large demand, sample data on the Internet is generally lacking, which shows in two respects. On the one hand, the variety of sample data is limited.
  • For example, sample data on face location may exist on the Internet, but sample data on face gender, face age, facial expression, face pose, and so on is scarce.
  • On the other hand, the amount of sample data is seriously insufficient.
  • For example, only a small amount of sample data on voice, objects, animals, and autonomous driving exists on the Internet.
  • The main reason for this contradiction is that sample data is currently labeled manually by dedicated labeling personnel, which is costly and leaves sample data on the Internet scarce.
  • Labeling a piece of sample data requires only "short-term attention". For example, when annotating the emotional index of a photo or a speech segment, assume the emotional index is set as follows: 1. Depressed; 2. Generally depressed; 3. Peaceful; 4. Happy; 5. Very happy. A person can complete this labeling after paying attention to the photo or speech for only a few seconds. A closer analysis shows that the Internet itself can supply a large amount of such "short-term attention". For example, to protect Internet users during application login, e-commerce transactions, and other scenarios, Internet services usually include a verification system, which requires the user to perform a verification step such as entering a verification code.
  • Such verification steps require the user to spend "short-term attention": looking carefully at the verification code picture and entering the correct result in order to pass verification as quickly as possible.
  • The embodiments of the present application use the large amount of "short-term attention" available on the Internet: the relatively reliable output that many Internet users produce while paying "short-term attention" during verification is collected to label sample data, and the labeling result of the sample data is then obtained through machine learning. This greatly reduces the labeling cost of sample data and expands the amount of sample data on the Internet.
  • A traditional verification system includes only a verification mode; that is, a user who enters the verification system goes directly to identity verification in the verification mode. For example, when a user enters the verification system, a verification code image is output to the user in the verification mode, and the user must fill in and submit the correct verification code to pass verification.
  • The verification system of the embodiments of the present application adds an annotation mode on top of the verification mode. For example, when a user enters the verification system, the user first labels sample data in the annotation mode; a verification code image is then output to the user in the verification mode, and the user fills in and submits the correct verification code to pass verification.
  • The solution of the embodiments of the present application is briefly described below with reference to FIG. 1.
  • Suppose the emotional index reflected by the face photo shown in FIG. 1 is to be labeled, and the emotional index is set as follows: 1. Depressed; 2. Generally depressed; 3. Peaceful; 4. Happy; 5. Very happy.
  • the solution of the embodiment of the present application is as follows:
  • When user A on the Internet enters the verification system, the face photo and the emotional index options are first displayed to user A in the annotation mode, and user A is asked to select an emotional index.
  • Likewise, when user B, user C, user D, and other Internet users enter the verification system, the same face photo and emotional index options are displayed to them in the annotation mode, and they are asked to select an emotional index. Understandably, a user's choice of emotional index may be genuine, random, or even meaningless.
  • On the one hand, the embodiments of the present application collect the emotional indexes that all users select for the same face photo. This selection data necessarily follows some distribution, so a machine learning method is used to identify the valid data in it and finally obtain the annotation result of the face photo with respect to the emotional index.
  • On the other hand, after each user's selected emotional index is collected, a verification failure prompt such as "Error, please re-select" is output to that user; the system then switches from the annotation mode to the verification mode and re-verifies each user in the verification mode using material data already available on the Internet. For example, a verification code image is output to the user, and the user must fill in and submit the correct verification code to pass verification.
  • The embodiments of the present application can tap the fragmented short-term attention that Internet users spend during verification, using those users to label sample data at large scale and in a distributed manner and thereby expand the amount of sample data on the Internet. Further, sample data with a labeled result can be used as verification material with a known answer, expanding the amount of material data in Internet verification systems.
  • the embodiment of the present application provides a data processing method.
  • the method may include the following steps S101 to S104.
  • A traditional verification system is a system that provides identity verification services. For example, an Internet user enters the verification system for identity verification during application login, e-commerce transactions, and the like: a verification code picture is output, and the user must fill in and submit the correct verification code to confirm that the operation is not performed by a machine, which ensures the security of the login or transaction.
  • The verification system of the embodiments of the present application provides a sample annotation service in addition to the identity verification service. Continuing the example above: when an Internet user enters the verification system during application login, e-commerce transactions, and the like, the user is first required to label sample data, and the user's annotation data is collected; the verification step is then performed, in which a verification code picture is output to the user and the user must fill in and submit the correct verification code.
  • the verification system includes an annotation mode and a verification mode
  • the verification system includes a sample library and a material library.
  • the sample library includes at least one piece of sample data, the sample data including any of the following: an image, a voice, and a text.
  • the material library includes at least one material data, and the material data includes any one of the following: an image, a voice, and a text.
  • the annotation mode is used to label each sample data in the sample library.
  • the sample data in the sample library is unlabeled data.
  • the material data in the material library is data with annotations.
  • the verification mode is used to authenticate a user entering the verification system by using each material data in the material library.
  • a sample data may be randomly selected from the sample library as the target sample data, or a sample data may be specified from the sample library as the target sample data according to actual needs.
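The two ways of choosing target sample data described above (random selection or explicit specification) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `SampleData` and `SampleLibrary` classes, their fields, and the file names are all hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SampleData:
    sample_id: str
    kind: str      # "image", "voice", or "text"
    payload: str   # hypothetical reference to the content, e.g. a file name


@dataclass
class SampleLibrary:
    """Hypothetical in-memory sample library holding unlabeled sample data."""
    samples: Dict[str, SampleData] = field(default_factory=dict)

    def add(self, sample: SampleData) -> None:
        self.samples[sample.sample_id] = sample

    def pick_target(self, sample_id: Optional[str] = None) -> SampleData:
        """Return the specified sample, or a random one if none is specified."""
        if not self.samples:
            raise LookupError("sample library is empty")
        if sample_id is not None:
            return self.samples[sample_id]
        return random.choice(list(self.samples.values()))


library = SampleLibrary()
library.add(SampleData("s1", "image", "face_photo.png"))
library.add(SampleData("s2", "voice", "speech_clip.wav"))
specified = library.pick_target("s2")   # chosen according to actual needs
randomly_chosen = library.pick_target() # chosen at random
```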
  • S102 Output the target sample data to at least one target user entering the verification system, so that the at least one target user labels the target sample data.
  • the target sample data can be output in a certain format. For example, some optional annotation data of the target sample data may be outputted while the target sample data is output, so that the target user can complete the annotation of the target sample data by selecting an annotation data. Alternatively, the input box may be displayed while the target sample data is being output, so that the target user can complete the labeling of the target sample data by manually inputting the annotation data.
  • S103 Collect at least one piece of annotation data generated when the at least one target user labels the target sample data.
  • Each target user labels the target sample data to generate annotated data.
  • In the example of FIG. 1, if user A selects "3. Peaceful", then "3. Peaceful" is the annotation data generated when user A labels the face photo; similarly, if user B selects "2. Generally depressed", then "2. Generally depressed" is the annotation data generated when user B labels the face photo.
  • In step S103, the annotation data generated by each target user labeling the target sample data is collected.
  • S104 Perform learning processing on at least one annotation data of the target sample data by using a machine learning algorithm to obtain an annotation result of the target sample data.
  • The annotation data of the target sample data may be genuine, random, or even meaningless, but it necessarily follows some distribution, so a machine learning algorithm can be used to learn from the annotation data and obtain the annotation result of the target sample data.
  • the machine learning algorithm may include, but is not limited to, an anomaly detection algorithm, a collaborative filtering algorithm, a decision tree algorithm, an optimization algorithm, and the like.
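The source does not say how a specific algorithm turns noisy annotation data into one annotation result. As a deliberately simple stand-in (an assumption, not the method of the application), the sketch below drops selections that lie far from the median, which is a crude form of anomaly detection, and then takes the most frequent remaining selection. The function name `aggregate_labels` and the `max_deviation` parameter are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import List


def aggregate_labels(labels: List[int], max_deviation: float = 1.0) -> int:
    """Combine many users' emotional-index selections (1 to 5) into one result."""
    if not labels:
        raise ValueError("no annotation data collected")
    center = median(labels)
    # Crude anomaly filter: keep only selections close to the median.
    kept = [x for x in labels if abs(x - center) <= max_deviation]
    if not kept:
        # Every selection was far from the median; fall back to all of them.
        kept = list(labels)
    counts = Counter(kept)
    top = max(counts.values())
    # Break ties deterministically by taking the smallest label.
    return min(label for label, n in counts.items() if n == top)


selections = [3, 3, 4, 3, 1, 5, 3, 2]  # made-up selections from eight users
result = aggregate_labels(selections)
```

A production system would more plausibly weight each user by reliability or fit a probabilistic model; the point here is only that random or meaningless selections can be outvoted once enough annotation data is collected.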
  • The embodiments of the present application can tap the fragmented short-term attention that Internet users spend during verification, using those users to label sample data at large scale and in a distributed manner and thereby expand the amount of sample data on the Internet.
  • An embodiment of the present application provides another data processing method.
  • the method may include the following steps S201 to S208.
  • The embodiments of the present application could have all Internet users label sample data. However, to improve the user experience, only some Internet users may be selected as target users to label the sample data.
  • the determining process of step S201 is a process of determining a target user; in some embodiments, step S201 includes at least the following three possible implementation modes:
  • step S201 may include the following steps s11-s13:
  • Steps s11-s13 adopt a random method to determine the target user, that is, randomly select the target user according to the frequency of the historical annotation.
  • The historical annotation information of a user records how many times the user performed labeling operations within a preset period. For example, if in the last hour user A labeled sample data a once, sample data b twice, and sample data c once, then user A's historical annotation information records a labeling frequency of 4 times/hour.
  • a user corresponds to a historical annotation information, and the historical annotation information of each user can be stored in a local or cloud storage space, and updated in real time according to the user's annotation operation, so the user's historical annotation information can be obtained from the local or cloud storage space.
  • The first preset threshold may be set according to actual needs; for example, it may be 5 times/hour, 2 times/minute, and so on. If a user's labeling frequency is greater than or equal to the first preset threshold, the user has already labeled sample data many times within the preset period. Requiring such a user to keep labeling during verification may harm the user's experience, so the user can be determined to be a normal user who performs no further labeling in the current preset period. Conversely, if a user's labeling frequency is less than the first preset threshold, the user has labeled sample data only a few times within the preset period. Requiring the user to label again during verification will not noticeably affect the user's experience, so the user can be determined to be a target user and proceed to the subsequent labeling flow.
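The frequency-based determination can be sketched as below, assuming labeling operations are stored as timestamps per user. The `AnnotationHistory` class, the one-hour period, and the threshold of 5 times/hour (one of the example values in the text) are illustrative choices.

```python
from typing import Dict, List


class AnnotationHistory:
    """Hypothetical store of labeling timestamps (in seconds) per user."""

    def __init__(self, period_seconds: int = 3600) -> None:
        self.period_seconds = period_seconds
        self._events: Dict[str, List[float]] = {}

    def record(self, user_id: str, timestamp: float) -> None:
        self._events.setdefault(user_id, []).append(timestamp)

    def frequency(self, user_id: str, now: float) -> int:
        """Number of labeling operations within the most recent period."""
        start = now - self.period_seconds
        return sum(1 for t in self._events.get(user_id, []) if t > start)


def is_target_by_frequency(history: AnnotationHistory, user_id: str,
                           now: float, first_threshold: int = 5) -> bool:
    """A user below the first preset threshold is treated as a target user."""
    return history.frequency(user_id, now) < first_threshold


history = AnnotationHistory()
for t in (1000, 9000, 9100, 9200, 9300):  # the event at t=1000 is too old
    history.record("A", t)
a_frequency = history.frequency("A", now=10_000)
a_is_target = is_target_by_frequency(history, "A", now=10_000)
```

Here user A has labeled 4 times in the last hour, matching the example above, and is still below the threshold.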
  • step S201 may include the following steps s21-s23.
  • If the identifier is a preset candidate user identifier, determine that the user entering the verification system is the target user.
  • If the identifier is not a preset candidate user identifier, determine that the user entering the verification system is a normal user.
  • Steps s21-s23 adopt a directional manner to determine the target user, that is, some target users are pre-selected.
  • The preset candidate user identifier is the identifier of a pre-selected target user. The identifier here may include, but is not limited to, an instant messaging identifier (such as a QQ number or WeChat ID), an SNS identifier (such as a microblog account or blog account), a communication identifier (such as a mobile phone number or landline number), an email address, and so on. If the identifier of the user entering the verification system is a preset candidate user identifier, the user may be determined to be a target user; otherwise, the user is determined to be a normal user.
  • step S201 may include the following steps s31-s33.
  • Steps s31-s33 adopt a screening method to determine the target user, that is, filter the qualified target users according to the success rate of the historical verification.
  • The historical verification information of a user records the user's success rate of identity verification in the verification mode of the verification system. For example, if user A performs identity verification with a verification code, the verification codes submitted on the 1st through (N-1)th attempts are wrong, and the correct verification code is submitted on the Nth attempt, then user A's success rate is 1/N (N is a positive integer).
  • Each user has one piece of historical verification information. It can be stored in local or cloud storage space and updated in real time as the user goes through verification.
  • The second preset threshold may be set according to actual needs; for example, it may be 1/2, 1/3, and so on. If a user's verification success rate is greater than or equal to the second preset threshold, the user usually completes verification carefully and submits data that is genuine and reliable, so the user is suitable to be a target user who labels sample data. Conversely, if the success rate is less than the second preset threshold, the user usually goes through verification casually and submits data that is less genuine and less reliable, so the user is not suitable for labeling sample data and can be determined to be a normal user.
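The success-rate screening can be sketched as follows, assuming a user's verification history is a list of attempt outcomes. `Fraction` keeps comparisons against thresholds such as 1/2 exact. The function names are hypothetical, and treating a user with no history as having a success rate of 0 is an assumption the source does not make.

```python
from fractions import Fraction
from typing import List


def success_rate(outcomes: List[bool]) -> Fraction:
    """Share of successful attempts; N-1 failures then 1 success gives 1/N."""
    if not outcomes:
        return Fraction(0)  # assumption: no history counts as a rate of 0
    return Fraction(sum(outcomes), len(outcomes))


def is_target_by_success_rate(outcomes: List[bool],
                              second_threshold: Fraction = Fraction(1, 2)) -> bool:
    """A user at or above the second preset threshold is a target user."""
    return success_rate(outcomes) >= second_threshold


careful_user = [False, True]          # correct on the 2nd attempt: rate 1/2
casual_user = [False, False, True]    # correct on the 3rd attempt: rate 1/3
careful_is_target = is_target_by_success_rate(careful_user)
casual_is_target = is_target_by_success_rate(casual_user)
```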
  • If it is determined that a target user has entered the verification system, the process proceeds to step S202 to label the target sample data in the annotation mode; if it is determined that a normal user has entered the verification system, the process goes to step S208 to authenticate the normal user in the verification mode.
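The dispatch between the two modes can be sketched with a pluggable target-user check. The example below uses the identifier-based (directional) determination as the default; the candidate identifiers, function names, and the mode strings are hypothetical.

```python
from typing import Callable, Set

# Hypothetical identifiers of pre-selected target users.
CANDIDATE_IDS: Set[str] = {"user-a", "user-c"}


def is_candidate(user_id: str) -> bool:
    """Directional determination: the user is a target if pre-selected."""
    return user_id in CANDIDATE_IDS


def route_user(user_id: str,
               is_target: Callable[[str], bool] = is_candidate) -> str:
    """Return the mode the verification system starts for this user."""
    # Target users label sample data first (step S202);
    # normal users go straight to identity verification (step S208).
    return "annotation" if is_target(user_id) else "verification"


mode_for_a = route_user("user-a")
mode_for_b = route_user("user-b")
```

Passing a different `is_target` callable would let the same dispatch use the frequency-based or success-rate-based determination instead.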
  • S202 Start an annotation mode of the verification system, and obtain target sample data to be processed in the verification system in the annotation mode.
  • the target sample data is any sample data in the sample library, which may be an image, such as a face image, an animal image, or the like. It can also be a voice, such as a voice spoken by a person, a song, and so on. It can also be text, such as: a sentence, a word, and so on.
  • S203 Output the target sample data to at least one target user entering the verification system, so that the at least one target user labels the target sample data.
  • the output method can be flexibly selected according to the type of target sample data. For example, if the target sample data is an image or text, it can be output by display. Another example: if the target sample data is voice, then it can be output through the speaker playback mode.
  • the purpose of outputting target sample data is to enable the target user to recognize the target sample data through the sensory system (eye, ear, mouth, nose) and to mark the target sample data through listening, speaking, reading and writing.
  • step S203 may specifically include the following steps s41-s43.
  • If the labeling manner of the target sample data is a selection mode, output the target sample data to the at least one target user, and output at least one piece of candidate annotation data corresponding to the target sample data for the at least one target user to select.
  • If the labeling manner of the target sample data is an input mode, output the target sample data to the at least one target user, and display an input box so that the at least one target user enters the annotation data corresponding to the target sample data in the input box.
  • Steps s41-s43 define how the target user labels the target sample data. Specifically, if the labeling manner is the selection mode, at least one piece of candidate annotation data is displayed while the target sample data is output, and the target user directly selects one piece of annotation data to complete the labeling.
  • at least one candidate annotation data can be encapsulated as an option (as shown in FIG. 1), in which case the target user can click on an option to select an annotation data.
  • At least one candidate annotation data can also be packaged into the sliding area (as shown in Figure 4a), at which point the target user selects an annotation data by operating the slider in the sliding area.
  • If the labeling manner is the input mode, an input box is displayed while the target sample data is output, and the target user enters the annotation data directly in the input box to complete the labeling. The input box may be a text input box (as shown in FIG. 4b) or a voice input box (as shown in FIG. 4c).
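How the labeling manner shapes what is output to the target user can be sketched as below. The dictionary layout, the `widget` names, and the function `build_annotation_page` are hypothetical; the source only specifies that the selection mode shows candidate annotation data and the input mode shows an input box.

```python
from typing import Any, Dict, List, Optional


def build_annotation_page(sample: str, manner: str,
                          candidates: Optional[List[str]] = None) -> Dict[str, Any]:
    """Describe the annotation page for one piece of target sample data."""
    if manner == "selection":
        if not candidates:
            raise ValueError("selection mode needs candidate annotation data")
        # Candidates could be rendered as options (FIG. 1) or a slider (FIG. 4a).
        return {"sample": sample, "widget": "options",
                "candidates": list(candidates)}
    if manner == "input":
        # The input box could accept text (FIG. 4b) or voice (FIG. 4c).
        return {"sample": sample, "widget": "input_box"}
    raise ValueError(f"unknown labeling manner: {manner}")


emotion_options = ["1. Depressed", "2. Generally depressed", "3. Peaceful",
                   "4. Happy", "5. Very happy"]
selection_page = build_annotation_page("face_photo.png", "selection",
                                       emotion_options)
input_page = build_annotation_page("face_photo.png", "input")
```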
  • S205 Perform learning processing on at least one annotation data of the target sample data by using a machine learning algorithm, and obtain an annotation result of the target sample data.
  • Steps S204-S205 may refer to steps S103-S104 of the embodiment shown in FIG. 2, and details are not described herein.
  • two processing logics are entered, one of which is processing steps S205-S206; and the other processing logic is steps S207-S208.
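  • In one logic, once the amount of collected annotation data reaches a preset number, the process proceeds to steps S205-S206, in which the machine learning algorithm learns from the at least one piece of annotation data to obtain the annotation result of the target sample data.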
  • the preset number here can be set according to actual needs. In order to ensure accuracy, the preset quantity can usually be set larger, such as hundreds, thousands, tens of thousands, etc., to ensure a sufficient number of label data. So far, a sample data with labelled results that can be used for machine learning training has been obtained.
  • the target sample data with the labeled result is added as material data to the material library, which expands the sample data for training and learning in the Internet, and expands the material data used for authentication in the Internet verification system.
  • In the other logic, the process proceeds to steps S207-S208: verification failure prompt information such as "Error, please re-select" or "Error, please re-enter" is output to each target user, prompting the target user to be re-authenticated in the verification mode.
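The two processing logics can be sketched together as follows. This is an illustration under stated assumptions: the trigger condition (learning starts once a preset number of annotations has been collected) is inferred from the discussion of the preset number, a plain majority vote stands in for the machine learning algorithm, and the class name, the dictionary used as a material library, and the tiny preset number of 3 are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

FAILURE_PROMPT = "Error, please re-select"


class AnnotationSession:
    """Collects annotation data for one piece of target sample data."""

    def __init__(self, sample_id: str, preset_number: int) -> None:
        self.sample_id = sample_id
        self.preset_number = preset_number
        self.annotations: List[int] = []
        self.result: Optional[int] = None

    def submit(self, label: int, material_library: Dict[str, int]) -> str:
        """Record one user's label, then always answer with a failure prompt."""
        if self.result is None:
            self.annotations.append(label)
            if len(self.annotations) >= self.preset_number:
                # Stand-in for the learning step: take the majority label.
                self.result = Counter(self.annotations).most_common(1)[0][0]
                # The labeled sample becomes material data with a known answer.
                material_library[self.sample_id] = self.result
        # The user is then re-authenticated in the verification mode.
        return FAILURE_PROMPT


material_library: Dict[str, int] = {}
session = AnnotationSession("face_photo", preset_number=3)
prompts = [session.submit(label, material_library) for label in (3, 3, 2)]
```

In practice the preset number would be far larger (hundreds or more, as noted above), so that random or meaningless selections do not dominate the result.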
  • The embodiments of the present application can tap the fragmented short-term attention that Internet users spend during verification, using those users to label sample data at large scale and in a distributed manner and thereby expand the amount of sample data on the Internet. Further, sample data with a labeled result can be used as verification material with a known answer, expanding the amount of material data in Internet verification systems.
  • The embodiments of the present application further disclose a data processing device, which may be a computer program (including program code) that runs on a terminal (such as a PC (Personal Computer) or a mobile phone) or on a network device (such as a single server or a server cluster) to perform the data processing method shown in either of the embodiments of FIG. 2 and FIG. 3.
  • The data processing apparatus runs the following units:
  • the obtaining unit 101 is configured to acquire target sample data to be processed in the verification system.
  • the output unit 102 is configured to output the target sample data to at least one target user entering the verification system, so that the at least one target user labels the target sample data.
  • the collecting unit 103 is configured to collect at least one annotation data generated by the at least one target user to mark the target sample data.
  • the learning unit 104 is configured to perform learning processing on the at least one annotation data of the target sample data by using a machine learning algorithm to obtain an annotation result of the target sample data.
  • the verification system includes an annotation mode and a verification mode, and the verification system includes a sample library and a material library;
  • the sample library includes at least one piece of sample data, the sample data including any one of the following: an image, a voice, and a text;
  • the material library includes at least one material data, the material data including any one of the following: an image, a voice, and text;
  • the annotation mode is used to label each sample data in the sample library; the verification mode is used to authenticate a user entering the verification system by using each material data in the material library.
  • The data processing apparatus also runs the following units:
  • the determining unit 105 is configured to determine, when it is detected that any user enters the verification system, whether the user entering the verification system is the target user.
  • the processing unit 106 is configured to: if the user entering the verification system is the target user, start the labeling mode of the verification system, and notify the acquiring unit to obtain the target sample data to be processed in the verification system in the labeling mode; or If the user entering the verification system is a normal user, the verification mode of the verification system is started, and a material data is selected from the material library in the verification mode to authenticate the user entering the verification system.
  • the data processing apparatus specifically runs the following units in the process of running the determining unit 105:
  • the first information acquiring unit 1001 is configured to acquire, when it is detected that any user enters the verification system, historical annotation information of that user, where the historical annotation information records the frequency at which the user labels sample data in the sample library in the annotation mode of the verification system.
  • the first determining unit 1002 is configured to: if the frequency is less than a first preset threshold, determine that the user entering the verification system is a target user; or if the frequency is greater than or equal to the first preset threshold, determine that the user entering the verification system is a normal user.
  • the data processing apparatus specifically runs the following unit in the process of running the determining unit 105:
  • the identifier obtaining unit 1011 is configured to acquire an identifier of the user entering the verification system when detecting that any user enters the verification system.
  • the second determining unit 1012 is configured to: if the identifier is a preset candidate user identifier, determine that the user entering the verification system is a target user; or if the identifier is not a preset candidate user identifier, determine that the user entering the verification system is a normal user.
  • the data processing apparatus specifically runs the following unit in the process of running the determining unit 105:
  • the second information acquiring unit 1111 is configured to acquire, when detecting that any user enters the verification system, historical verification information of the user entering the verification system, where the historical verification information records the user entering the verification system in the verification system. The success rate of authentication in authentication mode.
  • a third determining unit 1112 configured to determine that the user entering the verification system is a target user if the success rate is greater than or equal to a second preset threshold; or if the success rate is less than the second preset valve The value determines that the user entering the verification system is a normal user.
  • the data processing apparatus specifically runs the following units in the process of running the output unit 102:
  • the mode setting unit 2001 is configured to set an annotation manner for the target sample data, where the annotation manner is either of the following: a selection manner and an input manner.
  • the data output unit 2002 is configured to: if the annotation manner of the target sample data is the selection manner, output the target sample data to the at least one target user and output at least one piece of candidate annotation data corresponding to the target sample data for the at least one target user to choose from; or, if the annotation manner of the target sample data is the input manner, output the target sample data to the at least one target user and display an input box so that the at least one target user enters the annotation data corresponding to the target sample data in the input box.
  • the data processing apparatus further runs the following unit:
  • the prompting unit 107 is configured to output verification failure prompt information to the at least one target user, switch from the annotation mode to the verification mode, and notify the processing unit to select, in the verification mode, one piece of material data from the material library to authenticate the at least one target user.
  • the data processing apparatus further runs the following unit:
  • the adding unit 108 is configured to add the target sample data and its annotation result to the material library as new material data.
  • steps involved in the data processing method illustrated in FIG. 2 may be performed by respective units in the data processing apparatus illustrated in FIG. 5.
  • steps S101-S104 shown in FIG. 2 may be performed by the acquiring unit 101, the output unit 102, the collecting unit 103, and the learning unit 104 shown in FIG. 5, respectively.
  • steps involved in the data processing method shown in FIG. 3 may also be performed by respective units in the data processing apparatus shown in FIG. 5.
  • steps S201-S208 shown in FIG. 3 may be performed by the determining unit 105, the acquiring unit 101, the output unit 102, the collecting unit 103, the learning unit 104, the processing unit 106, the adding unit 108, and the prompting unit 107 shown in FIG. 5; steps s11 and s12-s13 shown in FIG. 3 may be performed by the first information acquiring unit 1001 and the first determining unit 1002 shown in FIG. 5; steps s21 and s22-s23 may be performed by the identifier obtaining unit 1011 and the second determining unit 1012 shown in FIG. 5; steps s31 and s32-s33 may be performed by the second information acquiring unit 1111 and the third determining unit 1112 shown in FIG. 5; and steps s41-s42 may be performed by the mode setting unit 2001 and the data output unit 2002 shown in FIG. 5.
  • the units in the data processing apparatus shown in FIG. 5 may be individually or entirely combined into one or several other units, or one or more of the units may be further split into multiple functionally smaller units; either arrangement achieves the same operation without affecting the technical effects of the embodiments of the present application.
  • the above units are divided based on logical functions. In practical applications, the function of one unit may be implemented by multiple units, or the functions of multiple units may be implemented by one unit. In other embodiments of the present application, the data processing apparatus may also include other units; in practical applications, these functions may also be implemented with the assistance of other units, and may be implemented by multiple units in cooperation.
  • the data processing apparatus shown in FIG. 5 may be constructed, and the data processing method according to the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the data processing method shown in FIG. 2 or FIG. 3 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM).
  • the computer program may be recorded, for example, on a computer-readable recording medium, loaded into the above computing device through the computer-readable recording medium, and run therein.
  • the embodiments of the present application can make use of the brief, fragmented attention that Internet users spend during verification and have users annotate sample data on a large scale and in a distributed manner, increasing the amount of sample data on the Internet; further, sample data with annotation results can in turn serve as verification material with known answers, increasing the amount of material data in Internet verification systems.
  • the embodiment of the present application further provides a network device, which may be a terminal device such as a PC (Personal Computer), a mobile phone, a PDA (tablet computer), or a service device such as an application server or a cluster server.
  • the internal structure of the network device may include, but is not limited to, a processor, a network interface, and a memory.
  • the processor, the network interface, and the memory in the network device may be connected by a bus or other means.
  • FIG. 6 of the embodiments of the present application takes connection by a bus as an example.
  • the processor (also called a CPU (Central Processing Unit)) is the computing core and control core of the network device.
  • the network interface may optionally include a standard wired interface and a wireless interface (such as WI-FI or a mobile communication interface).
  • the storage device (memory) is the memory component of the network device and stores programs and data. The storage device here may be a high-speed RAM storage device or a non-volatile memory, such as at least one disk storage device; optionally, it may also be at least one storage device located remotely from the foregoing processor.
  • the storage device provides a storage space that stores the operating system of the network device, which may include, but is not limited to, a Windows system, a Linux system, an Android system (a mobile operating system), an iOS system (a mobile operating system), and so on, which is not limited in this application; the storage space also stores one or more instructions suitable for being loaded and executed by the processor, and these instructions may be one or more computer programs (including program code).
  • the processor loads and executes the one or more instructions stored in the storage device to implement the corresponding steps of the method flows shown in FIG. 2 and FIG. 3; in some embodiments, the one or more instructions in the storage device are loaded by the processor to perform the following steps:
  • the verification system includes an annotation mode and a verification mode, and the verification system includes a sample library and a material library;
  • the sample library includes at least one piece of sample data, the sample data including any one of the following: an image, a voice, and a text;
  • the material library includes at least one material data, the material data including any one of the following: an image, a voice, and text;
  • the annotation mode is used to label each sample data in the sample library; the verification mode is used to authenticate a user entering the verification system by using each material data in the material library.
  • if the user entering the verification system is a target user, the annotation mode of the verification system is started, and the target sample data to be processed in the verification system is acquired in the annotation mode;
  • if the user entering the verification system is a normal user, the verification mode of the verification system is started, and one piece of material data is selected from the material library in the verification mode to authenticate the user entering the verification system.
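  • in some embodiments, when the one or more instructions in the storage device are loaded by the processor to perform the step of determining whether a user entering the verification system is a target user, the following steps are specifically performed:
  • acquiring historical annotation information of the user entering the verification system, where the historical annotation information records the frequency at which the user has annotated sample data in the sample library in the annotation mode of the verification system;
  • if the frequency is less than the first preset threshold, determining that the user entering the verification system is a target user;
  • if the frequency is greater than or equal to the first preset threshold, determining that the user entering the verification system is a normal user.
  • in some embodiments, when the one or more instructions in the storage device are loaded by the processor to perform the step of determining whether a user entering the verification system is a target user, the following steps are specifically performed:
  • acquiring an identifier of the user entering the verification system;
  • if the identifier is a preset candidate user identifier, determining that the user entering the verification system is a target user;
  • if the identifier is not a preset candidate user identifier, determining that the user entering the verification system is a normal user.
  • in some embodiments, when the one or more instructions in the storage device are loaded by the processor to perform the step of determining whether a user entering the verification system is a target user, the following steps are specifically performed:
  • acquiring historical verification information of the user entering the verification system, where the historical verification information records the success rate of the user's identity verification in the verification mode of the verification system;
  • if the success rate is greater than or equal to the second preset threshold, determining that the user entering the verification system is a target user;
  • if the success rate is less than the second preset threshold, determining that the user entering the verification system is a normal user.
  • in some embodiments, when the one or more instructions in the storage device are loaded by the processor to perform the step of outputting the target sample data to at least one target user entering the verification system so that the at least one target user annotates the target sample data, the following steps are specifically performed:
  • setting an annotation manner for the target sample data, where the annotation manner is either of the following: a selection manner and an input manner;
  • if the annotation manner of the target sample data is the selection manner, outputting the target sample data to the at least one target user, and outputting at least one piece of candidate annotation data corresponding to the target sample data for the at least one target user to choose from;
  • if the annotation manner of the target sample data is the input manner, outputting the target sample data to the at least one target user, and displaying an input box so that the at least one target user enters the annotation data corresponding to the target sample data in the input box.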
  • in some embodiments, after the one or more instructions in the storage device are loaded by the processor to perform the step of applying a machine learning algorithm to the at least one piece of annotation data of the target sample data to obtain an annotation result of the target sample data, the following step is also performed:
  • the target sample data and its annotation result are added to the material library as new material data.
  • the embodiments of the present application can make use of the brief, fragmented attention that Internet users spend during verification and have users annotate sample data on a large scale and in a distributed manner, increasing the amount of sample data on the Internet; further, sample data with annotation results can in turn serve as verification material with known answers, increasing the amount of material data in Internet verification systems.
  • the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to.
  • a feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature.
  • "a plurality" means at least two, such as two or three, unless specifically defined otherwise.
  • portions of the application can be implemented in hardware, software, firmware, or a combination thereof.
  • multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system.
  • for example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), etc.
  • each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer readable storage medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

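This application discloses a data processing method, apparatus, storage device, and network device. The method may include: acquiring target sample data to be processed in a verification system; outputting the target sample data to at least one target user entering the verification system, so that the at least one target user annotates the target sample data; collecting at least one piece of annotation data produced when the at least one target user annotates the target sample data; and applying a machine learning algorithm to the at least one piece of annotation data of the target sample data to obtain an annotation result of the target sample data. This application can reduce the cost of annotating sample data and increase the amount of sample data on the Internet.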

Description

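Data processing method, data processing apparatus, storage device, and network device
This application claims priority to Chinese Patent Application No. 201710378502.1, filed with the Chinese Patent Office on May 25, 2017 and entitled "Data processing method, data processing apparatus, storage device, and network device", which is incorporated herein by reference in its entirety.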
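Technical Field
This application relates to the field of Internet technologies, specifically to data processing based on machine learning, and in particular to a data processing method, a data processing apparatus, a storage device, and a network device.
Background
Sample data with annotation results, such as image data annotated with face positions, image data annotated with facial expressions, or voice data annotated with a speaker's age, is what machines train on and is the basis of machine learning. As machine learning technology develops, the Internet systems built on it demand ever more sample data. For example, as the number of layers in a deep neural network grows, the amount of sample data it needs may reach hundreds of millions; likewise, a social recommendation system may need as many as hundreds of billions of samples to make accurate and effective recommendations. In contrast to this demand, sample data on the Internet is generally scarce. The main reason is that sample data is currently annotated manually by dedicated annotators, which is costly and keeps sample data on the Internet in short supply.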
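Summary
Embodiments of this application provide a data processing method, a data processing apparatus, a storage device, and a network device that can reduce the cost of annotating sample data and increase the amount of sample data on the Internet.
An embodiment of this application provides a data processing method, which may include:
acquiring target sample data to be processed in a verification system;
outputting the target sample data to at least one target user entering the verification system, so that the at least one target user annotates the target sample data;
collecting at least one piece of annotation data produced when the at least one target user annotates the target sample data;
applying a machine learning algorithm to the at least one piece of annotation data of the target sample data to obtain an annotation result of the target sample data.
An embodiment of this application provides a data processing apparatus, which may include:
an acquiring unit, configured to acquire target sample data to be processed in a verification system;
an output unit, configured to output the target sample data to at least one target user entering the verification system, so that the at least one target user annotates the target sample data;
a collecting unit, configured to collect at least one piece of annotation data produced when the at least one target user annotates the target sample data;
a learning unit, configured to apply a machine learning algorithm to the at least one piece of annotation data of the target sample data to obtain an annotation result of the target sample data.
An embodiment of this application provides a network device, including:
a processor, adapted to implement one or more instructions; and
a storage device storing one or more instructions, the one or more instructions being adapted to be loaded by the processor to perform the data processing method described in the embodiments of this application.
The embodiments of this application can make use of the brief, fragmented attention that Internet users spend during verification and have users annotate sample data on a large scale and in a distributed manner, increasing the amount of sample data on the Internet. Further, sample data with annotation results can in turn serve as verification material with known answers, increasing the amount of material data in Internet verification systems.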
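Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an annotation page according to an embodiment of this application;
FIG. 2 is a flowchart of a data processing method according to an embodiment of this application;
FIG. 3 is a flowchart of another data processing method according to an embodiment of this application;
FIG. 4a is a schematic diagram of another annotation page according to an embodiment of this application;
FIG. 4b is a schematic diagram of another annotation page according to an embodiment of this application;
FIG. 4c is a schematic diagram of yet another annotation page according to an embodiment of this application;
FIG. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of a network device according to an embodiment of this application.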
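Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments.
Machine learning is a technology that spans many disciplines, mainly probability theory, statistics, approximation theory, convex analysis, and algorithmic complexity theory. It studies how machines can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to keep improving their own performance. The machine here may be a computer, an electronic computer, a neural computer, or a similar device. Machine learning is now widely used in Internet scenarios such as data mining, computer vision, natural language processing, neural network construction, and information recommendation. Sample data with annotation results is what machines train on and is the basis of machine learning. For example, image data annotated with face positions, image data annotated with facial expressions, or voice data annotated with a speaker's age can each serve as sample data for machine learning. As machine learning technology develops, the Internet systems built on it demand ever more sample data. For example, as the number of layers in a deep neural network grows, the amount of sample data it needs may reach hundreds of millions. Likewise, a social recommendation system may need as many as hundreds of billions of samples to make accurate and effective recommendations. In contrast to this demand, sample data on the Internet is generally scarce, in two respects. First, the variety of sample data is poor: sample data on face positions may exist on the Internet, but sample data on face gender, face age, facial expression, face pose, and the like is badly lacking. Second, the quantity of sample data is severely insufficient: little sample data currently exists on the Internet for voice, objects, animals, autonomous driving, and so on. The main reason is that sample data is currently annotated manually by dedicated annotators, which is costly and keeps sample data on the Internet in short supply.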
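In fact, annotating one piece of sample data once requires only "brief attention". For example, suppose the emotion index expressed by a photo or a voice clip is to be annotated, with the index defined as: 1, depressed; 2, somewhat depressed; 3, peaceful; 4, happy; 5, very happy. An annotator needs to pay attention to the photo or voice clip for only a few seconds to complete the annotation. On closer analysis, the Internet itself offers a great deal of such brief attention. For example, to keep Internet users secure during application login, e-commerce, or other scenarios, Internet services usually include a verification system that requires the user to complete an identity verification step such as entering a verification code. Such a step requires the user to spend brief attention looking carefully at the verification code image and entering the correct result in order to pass as quickly as possible. On this basis, the embodiments of this application use the large amount of brief attention available on the Internet: they collect the relatively reliable output that many Internet users produce with brief attention during identity verification to annotate sample data, and obtain the annotation result of the sample data through machine learning, which greatly reduces the annotation cost and increases the amount of sample data on the Internet. The main idea is as follows. A conventional verification system has only a verification mode; that is, a user who enters the system performs identity verification in the verification mode. For example, when a user enters the system, a verification code image is output to the user in the verification mode, and the user must fill in and submit the correct code to pass. The verification system of the embodiments adds an annotation mode on top of the verification mode. For example, when a user enters the system, the user first annotates sample data in the annotation mode; the system then switches to the verification mode, outputs a verification code image, and requires the user to fill in and submit the correct code to pass.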
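The scheme of the embodiments is described briefly below with reference to FIG. 1. Suppose the emotion index expressed by the face photo in FIG. 1 is to be annotated, with the index defined as: 1, depressed; 2, somewhat depressed; 3, peaceful; 4, happy; 5, very happy. The scheme then works as follows. When Internet user A enters the verification system, the face photo and the emotion index options are first shown to user A in the annotation mode, and user A is asked to choose. Likewise, the same face photo and emotion index options are shown in the annotation mode to user B, user C, user D, and other Internet users entering the verification system, who are asked to choose an emotion index. A user's choice may be genuine, arbitrary, or even meaningless. On the one hand, the embodiments collect the emotion indexes chosen by all users for the same face photo; these choices necessarily follow some distribution, and a machine learning method is used to identify the valid data among them and finally obtain the annotation result of the face photo with respect to the emotion index. On the other hand, after each user's chosen emotion index has been collected, verification failure prompt information such as "Error, please choose again" is output to each user; the system then switches from the annotation mode to the verification mode and re-verifies each user in the verification mode with material data already available on the Internet, for example by outputting a verification code image and requiring the user to fill in and submit the correct code. As this example shows, the embodiments can make use of the brief, fragmented attention that Internet users spend during verification and have users annotate sample data on a large scale and in a distributed manner, increasing the amount of sample data on the Internet. Further, sample data with annotation results can in turn serve as verification material with known answers, increasing the amount of material data in Internet verification systems.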
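Based on the above description, an embodiment of this application provides a data processing method. Referring to FIG. 2, the method may include the following steps S101 to S104.
S101: acquire target sample data to be processed in a verification system.
A conventional verification system provides an identity verification service. For example, Internet users enter a verification system for identity verification during application login, e-commerce, and similar processes; the system outputs a verification code image and requires the user to fill in and submit the correct code to confirm that the operation is not performed by a machine, which keeps the login or transaction secure. The verification system of the embodiments provides a sample annotation service in addition to the identity verification service. Following the example above: before an Internet user undergoes identity verification during application login, e-commerce, or a similar process, the user is first asked to annotate sample data and the user's annotation data is collected; only then is the verification step performed, such as outputting a verification code image and requiring the user to fill in and submit the correct code. In the embodiments, the verification system includes an annotation mode and a verification mode, and contains a sample library and a material library. The sample library includes at least one piece of sample data, which is any one of an image, a voice, and a text. The material library includes at least one piece of material data, which is any one of an image, a voice, and a text. The annotation mode is used to annotate each piece of sample data in the sample library. Sample data in the sample library is unannotated; material data in the material library is annotated. The verification mode is used to authenticate users entering the verification system by using the material data in the material library. In this step, one piece of sample data may be selected at random from the sample library as the target sample data, or a piece of sample data may be designated from the sample library as the target sample data according to actual needs.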
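As an illustrative, non-limiting sketch, the two libraries and the selection of target sample data in step S101 could be modelled as follows. The class names `Sample`, `Material`, and `VerificationSystem`, the `payload` field, and the method `pick_target_sample` are hypothetical names chosen for this example and do not appear in the application.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sample:
    """An unannotated item in the sample library."""
    sample_id: str
    kind: str      # "image", "voice", or "text"
    payload: str   # hypothetical: a file path or the text itself


@dataclass
class Material:
    """An annotated item in the material library (a challenge with a known answer)."""
    sample: Sample
    label: str


@dataclass
class VerificationSystem:
    sample_library: Dict[str, Sample] = field(default_factory=dict)
    material_library: List[Material] = field(default_factory=list)

    def pick_target_sample(self, sample_id: Optional[str] = None,
                           rng: Optional[random.Random] = None) -> Sample:
        """Step S101: return the designated sample, or a random one if none is designated."""
        if not self.sample_library:
            raise LookupError("sample library is empty")
        if sample_id is not None:
            return self.sample_library[sample_id]
        rng = rng or random.Random()
        return rng.choice(list(self.sample_library.values()))
```

Passing `sample_id` corresponds to designating a sample according to actual needs; omitting it corresponds to random selection.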
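S102: output the target sample data to at least one target user entering the verification system, so that the at least one target user annotates the target sample data.
The target sample data may be output in a particular format. For example, some optional annotation data for the target sample data may be output together with it, so that a target user can annotate the target sample data by selecting one piece of annotation data. Alternatively, an input box may be displayed together with the target sample data, so that a target user can annotate the target sample data by entering annotation data manually.
S103: collect at least one piece of annotation data produced when the at least one target user annotates the target sample data.
Each target user who annotates the target sample data produces annotation data. As shown in FIG. 1, if user A selects "3, peaceful", then "3, peaceful" is the annotation data produced by user A for the face photo; likewise, if user B selects "2, somewhat depressed", then "2, somewhat depressed" is the annotation data produced by user B for the face photo. Step S103 collects the annotation data that each target user produces for the target sample data.
S104: apply a machine learning algorithm to the at least one piece of annotation data of the target sample data to obtain an annotation result of the target sample data.
The individual annotation data of the target sample data may be genuine, arbitrary, or even meaningless, but the annotation data as a whole necessarily follows some distribution, so a machine learning algorithm can be applied to it to obtain the annotation result of the target sample data. Here, the machine learning algorithm may include, but is not limited to, an anomaly detection algorithm, a collaborative filtering algorithm, a decision tree algorithm, an optimization algorithm, and so on.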
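The application does not specify which algorithm is applied in step S104. Purely as an assumption for illustration, the sketch below substitutes a very simple stand-in: labels far from the median are discarded as likely careless answers, and the most common remaining label is accepted only if enough annotators agree. The function name and the parameters `max_dev` and `min_share` are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import List, Optional


def aggregate_annotations(labels: List[int], max_dev: float = 1.0,
                          min_share: float = 0.5) -> Optional[int]:
    """Return a consensus label, or None if the annotations do not agree enough.

    Labels further than `max_dev` from the median are dropped as outliers.
    The most common remaining label wins if it accounts for at least
    `min_share` of all submitted labels.
    """
    if not labels:
        return None
    mid = median(labels)
    kept = [x for x in labels if abs(x - mid) <= max_dev]
    if not kept:
        return None
    label, count = Counter(kept).most_common(1)[0]
    return label if count / len(labels) >= min_share else None
```

For the emotion index of FIG. 1, nine answers such as 3, 3, 3, 2, 4, 3, 1, 5, 3 would yield the result 3, while evenly scattered answers yield no result and more annotations would be needed.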
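The embodiments of this application can make use of the brief, fragmented attention that Internet users spend during verification and have users annotate sample data on a large scale and in a distributed manner, increasing the amount of sample data on the Internet.
An embodiment of this application provides another data processing method. Referring to FIG. 3, the method may include the following steps S201 to S208.
S201: when any user is detected entering the verification system, determine whether the user is a target user; if so, the user is determined to be a target user and the process proceeds to S202; if not, the user is determined to be a normal user and the process proceeds to S208.
The embodiments may rely on all Internet users to annotate sample data. To improve user experience, however, only some Internet users may be selected as target users to annotate sample data. The determination in step S201 is the process of identifying target users. In some embodiments, step S201 has at least the following three feasible implementations.
In one feasible implementation, step S201 may include the following steps s11 to s13:
s11: when any user is detected entering the verification system, acquire historical annotation information of the user, where the historical annotation information records the frequency at which the user has annotated sample data in the sample library in the annotation mode of the verification system.
s12: if the frequency is less than a first preset threshold, determine that the user entering the verification system is a target user.
s13: if the frequency is greater than or equal to the first preset threshold, determine that the user entering the verification system is a normal user.
Steps s11 to s13 determine target users in a random manner, that is, target users are selected randomly according to the frequency of historical annotation. A user's historical annotation information records the total number of annotation operations the user performed within a predetermined period. For example, if user A annotated sample data a once, sample data b twice, and sample data c once within the last hour, user A's historical annotation information records an annotation frequency of 4 per hour. Each user has one piece of historical annotation information; it can be stored in local or cloud storage and updated in real time as the user annotates, so it can be retrieved from local or cloud storage. The first preset threshold can be set according to actual needs, for example 5 per hour or 2 per minute. If a user's annotation frequency is greater than or equal to the first preset threshold, the user has already annotated sample data many times within the preset period; asking the user to annotate too often during verification may harm the user's experience, so the user can be determined to be a normal user and is not asked to annotate again within the current period. Conversely, if a user's annotation frequency is less than the first preset threshold, the user has annotated sample data only a few times within the preset period; asking the user to annotate again during verification will not harm the user's experience, so the user can be determined to be a target user and proceeds to the subsequent annotation flow.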
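A minimal sketch of steps s11 to s13 follows, assuming that the historical annotation information is kept as a list of timestamps within a sliding period. The class `AnnotationHistory`, the function `is_target_user_by_frequency`, and the default values are hypothetical and only illustrate the comparison against the first preset threshold.

```python
from collections import deque
from typing import Deque


class AnnotationHistory:
    """Timestamps (in seconds) of one user's annotation operations."""

    def __init__(self, period_seconds: float = 3600.0) -> None:
        self.period_seconds = period_seconds
        self._events: Deque[float] = deque()

    def record(self, now: float) -> None:
        """Register one annotation operation performed at time `now`."""
        self._events.append(now)

    def frequency(self, now: float) -> int:
        """s11: number of annotations performed within the last period."""
        while self._events and now - self._events[0] >= self.period_seconds:
            self._events.popleft()
        return len(self._events)


def is_target_user_by_frequency(history: AnnotationHistory, now: float,
                                first_threshold: int = 5) -> bool:
    """s12/s13: target user if the frequency is below the first preset threshold."""
    return history.frequency(now) < first_threshold
```

With the example from the text, a user who annotated four times in the last hour is still a target user under a threshold of 5 per hour; a fifth annotation within the same hour makes the user a normal user until older operations fall out of the period.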
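In another feasible implementation, S201 may include the following steps s21 to s23.
s21: when any user is detected entering the verification system, acquire the identifier of the user.
s22: if the identifier is a preset candidate user identifier, determine that the user entering the verification system is a target user.
s23: if the identifier is not a preset candidate user identifier, determine that the user entering the verification system is a normal user.
Steps s21 to s23 determine target users in a directed manner, that is, some target users are chosen in advance. The preset candidate user identifiers are the identifiers of the target users chosen in advance. The identifier here may include, but is not limited to, an instant messaging identifier (such as a QQ number or WeChat ID), an SNS identifier (such as a Weibo account or blog account), a communication identifier (such as a mobile phone number or landline number), an email address, and so on. If the identifier of the user entering the verification system is a preset candidate user identifier, the user can be determined to be a target user; otherwise, the user is determined to be a normal user.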
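Steps s21 to s23 amount to a membership test against the preset candidate user identifiers. The sketch below assumes, beyond what the application states, that identifiers are compared after trimming whitespace and lowercasing; the function names are hypothetical.

```python
from typing import Iterable, Set


def load_candidate_ids(raw_ids: Iterable[str]) -> Set[str]:
    """Build the preset candidate identifier set (normalization is an assumption)."""
    return {r.strip().lower() for r in raw_ids if r.strip()}


def is_target_user_by_identifier(user_id: str, candidate_ids: Set[str]) -> bool:
    """s22/s23: target user only if the identifier is a preset candidate identifier."""
    return user_id.strip().lower() in candidate_ids
```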
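In yet another feasible implementation, S201 may include the following steps s31 to s33.
s31: when any user is detected entering the verification system, acquire historical verification information of the user, where the historical verification information records the success rate of the user's identity verification in the verification mode of the verification system.
s32: if the success rate is greater than or equal to a second preset threshold, determine that the user entering the verification system is a target user.
s33: if the success rate is less than the second preset threshold, determine that the user entering the verification system is a normal user.
Steps s31 to s33 determine target users by screening, that is, qualifying target users are screened according to the success rate of historical verification. A user's historical verification information records the success rate of the user's identity verification in the verification mode of the verification system. For example, if user A, while verifying with a verification code, submits a wrong code on each of the 1st to (N-1)th attempts and the correct code on the Nth attempt, user A's success rate is 1/N (N being a positive integer). Each user has one piece of historical verification information; it can be stored in local or cloud storage and updated in real time as the user verifies, so it can be retrieved from local or cloud storage. The second preset threshold can be set according to actual needs, for example 1/2 or 1/3. If a user's verification success rate is greater than or equal to the second preset threshold, the user usually completes verification conscientiously, and the data the user submits is relatively genuine and credible, so the user is suitable to be a target user who annotates sample data. Conversely, if a user's verification success rate is less than the second preset threshold, the user usually behaves carelessly during verification, and the data the user submits is less genuine and less credible, so the user is not suitable for annotating sample data and can be determined to be a normal user.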
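The success-rate screening of steps s31 to s33 can be sketched as below. Exact fractions are used so that thresholds such as 1/2 or 1/3 compare without rounding. Treating a user with no verification history as a normal user is an assumption of this example, not something the application states; the function names are hypothetical.

```python
from fractions import Fraction


def success_rate(successes: int, attempts: int) -> Fraction:
    """Success rate of identity verification, e.g. 1/N for one success in N attempts."""
    if attempts <= 0 or not 0 <= successes <= attempts:
        raise ValueError("need 0 <= successes <= attempts and attempts > 0")
    return Fraction(successes, attempts)


def is_target_user_by_success_rate(successes: int, attempts: int,
                                   second_threshold: Fraction = Fraction(1, 2)) -> bool:
    """s32/s33: target user if the success rate reaches the second preset threshold."""
    if attempts == 0:
        return False  # assumption: no history means the user is treated as a normal user
    return success_rate(successes, attempts) >= second_threshold
```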
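In practical applications, any of the above three implementations can be chosen flexibly. If a target user is determined to have entered the verification system, the process proceeds to step S202 and the subsequent annotation of the target sample data is performed in the annotation mode; if a normal user is determined to have entered the verification system, the process proceeds to step S208 and the normal user is authenticated in the verification mode.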
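The routing that follows step S201 can be sketched as a single function that takes whichever determination strategy is in use as a callable. The enum `Mode` and the function `select_mode` are hypothetical names for this example.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    ANNOTATION = "annotation"      # proceed to S202
    VERIFICATION = "verification"  # proceed to S208


def select_mode(user_id: str, is_target_user: Callable[[str], bool]) -> Mode:
    """S201: route a user entering the verification system to one of the two modes."""
    return Mode.ANNOTATION if is_target_user(user_id) else Mode.VERIFICATION
```

Because the strategy is passed in, the frequency-based, identifier-based, and success-rate-based determinations can be swapped without changing the routing itself.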
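S202: start the annotation mode of the verification system, and acquire the target sample data to be processed in the verification system in the annotation mode.
For step S202, refer to step S101 shown in FIG. 2; details are not repeated here. The target sample data is any piece of sample data in the sample library. It may be an image, such as a face image or an animal image. It may be a voice, such as a clip of a person speaking or a clip of a song. It may also be a text, such as a sentence or a word.
S203: output the target sample data to at least one target user entering the verification system, so that the at least one target user annotates the target sample data.
The output manner can be chosen flexibly according to the type of the target sample data. For example, if the target sample data is an image or a text, it can be output by display. If the target sample data is a voice, it can be output by playing it through a speaker. The purpose of outputting the target sample data is to let the target user perceive it through the senses (eyes, ears, mouth, nose) and annotate it by listening, speaking, reading, or writing. In some embodiments, step S203 may specifically include the following steps s41 to s43.
s41: set an annotation manner for the target sample data, where the annotation manner is either of the following: a selection manner and an input manner.
s42: if the annotation manner of the target sample data is the selection manner, output the target sample data to the at least one target user, and output at least one piece of candidate annotation data corresponding to the target sample data for the at least one target user to choose from.
s43: if the annotation manner of the target sample data is the input manner, output the target sample data to the at least one target user, and display an input box so that the at least one target user enters the annotation data corresponding to the target sample data in the input box.
Steps s41 to s43 define how the target user annotates the target sample data. Specifically, if the annotation manner is the selection manner, at least one piece of candidate annotation data is displayed together with the target sample data, and the target user annotates the target sample data simply by selecting one piece of annotation data. The candidate annotation data may be packaged as options (as shown in FIG. 1), in which case the target user clicks an option to select a piece of annotation data. The candidate annotation data may also be packaged into a sliding area (as shown in FIG. 4a), in which case the target user operates a slider in the sliding area to select a piece of annotation data. If the annotation manner is the input manner, an input box is displayed together with the target sample data, and the target user enters annotation data directly in the input box to complete the annotation; the input box may be a text input box (as shown in FIG. 4b) or a voice input box (as shown in FIG. 4c).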
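One possible way to represent the outcome of steps s41 to s43 is a small prompt description that a page could render. The names `AnnotationPrompt` and `build_prompt` and the string values "selection" and "input" are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class AnnotationPrompt:
    sample_id: str
    manner: str                                   # "selection" or "input"
    candidates: List[str] = field(default_factory=list)
    show_input_box: bool = False


def build_prompt(sample_id: str, manner: str,
                 candidates: Optional[Sequence[str]] = None) -> AnnotationPrompt:
    """s42/s43: describe what is shown alongside the target sample data."""
    if manner == "selection":
        if not candidates:
            raise ValueError("the selection manner needs at least one candidate")
        return AnnotationPrompt(sample_id, manner, list(candidates))
    if manner == "input":
        return AnnotationPrompt(sample_id, manner, show_input_box=True)
    raise ValueError(f"unknown annotation manner: {manner!r}")
```

Whether the candidates are rendered as clickable options or as a slider, and whether the input box accepts text or voice, is left to the page that consumes the prompt.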
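S204: collect at least one piece of annotation data produced when the at least one target user annotates the target sample data; afterwards, proceed to steps S205-S206 and also to steps S207-S208.
S205: apply a machine learning algorithm to the at least one piece of annotation data of the target sample data to obtain an annotation result of the target sample data.
For steps S204-S205, refer to steps S103-S104 of the embodiment shown in FIG. 2; details are not repeated here.
S206: add the target sample data and its annotation result to the material library as new material data.
S207: output verification failure prompt information to the at least one target user, and switch from the annotation mode to the verification mode.
S208: start the verification mode of the verification system, and in the verification mode select one piece of material data from the material library to authenticate the user entering the verification system.
In the embodiments, after step S204 collects the annotation data of at least one target user, two processing branches follow: one is steps S205-S206, and the other is steps S207-S208. On the one hand, once the annotation data collected in step S204 reaches a preset number, steps S205-S206 apply a machine learning algorithm to the annotation data to obtain the annotation result of the target sample data. The preset number can be set according to actual needs; to ensure accuracy, it is usually set fairly large, such as hundreds, thousands, or tens of thousands, so that enough annotation data is available. This yields a piece of sample data with an annotation result that can be used for machine learning training. Adding this annotated target sample data to the material library as material data both increases the sample data available on the Internet for training and increases the material data available to Internet verification systems for identity verification. On the other hand, after step S204 collects each target user's annotation data, steps S207-S208 output verification failure prompt information such as "Error, please choose again" or "Error, please enter again" to each target user, to prompt the target user to perform identity verification again in the verification mode.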
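The two branches after step S204 can be sketched together as follows. The class `AnnotationCollector` and its fields are hypothetical, the prompt text is taken from the example in the description, and the most-common-label rule is only a placeholder for whatever machine learning algorithm is used in S205.

```python
from collections import Counter
from typing import Dict, List, Tuple

FAILURE_PROMPT = "Error, please choose again"


class AnnotationCollector:
    def __init__(self, preset_count: int) -> None:
        self.preset_count = preset_count
        self.pending: Dict[str, List[str]] = {}            # sample_id -> collected labels
        self.material_library: List[Tuple[str, str]] = []  # (sample_id, annotation result)

    def submit(self, sample_id: str, label: str) -> str:
        """S204: store one annotation and return the failure prompt (S207)."""
        labels = self.pending.setdefault(sample_id, [])
        labels.append(label)
        if len(labels) >= self.preset_count:
            # S205 placeholder: take the most common label as the annotation result.
            result = Counter(labels).most_common(1)[0][0]
            # S206: the annotated sample becomes new material data.
            self.material_library.append((sample_id, result))
            del self.pending[sample_id]
        # Every annotating user is told verification failed and is then
        # re-verified in the verification mode (S208).
        return FAILURE_PROMPT
```

In practice the preset count would be far larger than in a test, as the description suggests hundreds or more annotations per sample.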
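The embodiments of this application can make use of the brief, fragmented attention that Internet users spend during verification and have users annotate sample data on a large scale and in a distributed manner, increasing the amount of sample data on the Internet; further, sample data with annotation results can in turn serve as verification material with known answers, increasing the amount of material data in Internet verification systems. Based on the description of the above method embodiments, the embodiments of this application further disclose a data processing apparatus. The data processing apparatus may be a computer program (including program code), and the computer program may run on a network device such as a terminal (for example, a PC (Personal Computer) or a mobile phone), a single server, or a cluster service device, to perform the data processing method shown in either of the embodiments of FIG. 2 and FIG. 3. Referring also to FIG. 5, the data processing apparatus runs the following units:
an acquiring unit 101, configured to acquire target sample data to be processed in a verification system.
an output unit 102, configured to output the target sample data to at least one target user entering the verification system, so that the at least one target user annotates the target sample data.
a collecting unit 103, configured to collect at least one piece of annotation data produced when the at least one target user annotates the target sample data.
a learning unit 104, configured to apply a machine learning algorithm to the at least one piece of annotation data of the target sample data to obtain an annotation result of the target sample data.
In some embodiments, the verification system includes an annotation mode and a verification mode, and the verification system contains a sample library and a material library;
the sample library includes at least one piece of sample data, which is any one of an image, a voice, and a text; the material library includes at least one piece of material data, which is any one of an image, a voice, and a text;
the annotation mode is used to annotate each piece of sample data in the sample library; the verification mode is used to authenticate users entering the verification system by using the material data in the material library.
In some embodiments, the data processing apparatus further runs the following units:
a determining unit 105, configured to determine, when any user is detected entering the verification system, whether the user is a target user.
a processing unit 106, configured to: if the user entering the verification system is a target user, start the annotation mode of the verification system and, in the annotation mode, notify the acquiring unit to acquire the target sample data to be processed in the verification system; or, if the user entering the verification system is a normal user, start the verification mode of the verification system and, in the verification mode, select one piece of material data from the material library to authenticate that user.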
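In one feasible implementation, the data processing apparatus specifically runs the following units in the process of running the determining unit 105:
a first information acquiring unit 1001, configured to acquire, when any user is detected entering the verification system, historical annotation information of the user, where the historical annotation information records the frequency at which the user has annotated sample data in the sample library in the annotation mode of the verification system.
a first determining unit 1002, configured to: if the frequency is less than a first preset threshold, determine that the user entering the verification system is a target user; or, if the frequency is greater than or equal to the first preset threshold, determine that the user is a normal user.
In another feasible implementation, the data processing apparatus specifically runs the following units in the process of running the determining unit 105:
an identifier obtaining unit 1011, configured to acquire, when any user is detected entering the verification system, the identifier of the user.
a second determining unit 1012, configured to: if the identifier is a preset candidate user identifier, determine that the user entering the verification system is a target user; or, if the identifier is not a preset candidate user identifier, determine that the user is a normal user.
In yet another feasible implementation, the data processing apparatus specifically runs the following units in the process of running the determining unit 105:
a second information acquiring unit 1111, configured to acquire, when any user is detected entering the verification system, historical verification information of the user, where the historical verification information records the success rate of the user's identity verification in the verification mode of the verification system.
a third determining unit 1112, configured to: if the success rate is greater than or equal to a second preset threshold, determine that the user entering the verification system is a target user; or, if the success rate is less than the second preset threshold, determine that the user is a normal user.
In some embodiments, the data processing apparatus specifically runs the following units in the process of running the output unit 102:
a mode setting unit 2001, configured to set an annotation manner for the target sample data, where the annotation manner is either of the following: a selection manner and an input manner.
a data output unit 2002, configured to: if the annotation manner of the target sample data is the selection manner, output the target sample data to the at least one target user and output at least one piece of candidate annotation data corresponding to the target sample data for the at least one target user to choose from; or, if the annotation manner of the target sample data is the input manner, output the target sample data to the at least one target user and display an input box so that the at least one target user enters the annotation data corresponding to the target sample data in the input box.
In some embodiments, the data processing apparatus further runs the following unit:
a prompting unit 107, configured to output verification failure prompt information to the at least one target user, switch from the annotation mode to the verification mode, and notify the processing unit to select, in the verification mode, one piece of material data from the material library to authenticate the at least one target user.
In some embodiments, the data processing apparatus further runs the following unit:
an adding unit 108, configured to add the target sample data and its annotation result to the material library as new material data.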
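According to one embodiment of this application, the steps of the data processing method shown in FIG. 2 may be performed by the units of the data processing apparatus shown in FIG. 5. For example, steps S101-S104 shown in FIG. 2 may be performed by the acquiring unit 101, the output unit 102, the collecting unit 103, and the learning unit 104 shown in FIG. 5, respectively.
According to another embodiment of this application, the steps of the data processing method shown in FIG. 3 may also be performed by the units of the data processing apparatus shown in FIG. 5. For example, steps S201-S208 shown in FIG. 3 may be performed by the determining unit 105, the acquiring unit 101, the output unit 102, the collecting unit 103, the learning unit 104, the processing unit 106, the adding unit 108, and the prompting unit 107 shown in FIG. 5; steps s11 and s12-s13 shown in FIG. 3 may be performed by the first information acquiring unit 1001 and the first determining unit 1002 shown in FIG. 5; steps s21 and s22-s23 may be performed by the identifier obtaining unit 1011 and the second determining unit 1012 shown in FIG. 5; steps s31 and s32-s33 may be performed by the second information acquiring unit 1111 and the third determining unit 1112 shown in FIG. 5; and steps s41-s42 may be performed by the mode setting unit 2001 and the data output unit 2002 shown in FIG. 5.
According to a further embodiment of this application, the units of the data processing apparatus shown in FIG. 5 may be individually or entirely combined into one or several other units, or one or more of the units may be further split into multiple functionally smaller units; either arrangement achieves the same operation without affecting the technical effects of the embodiments of this application. The above units are divided based on logical functions. In practical applications, the function of one unit may be implemented by multiple units, or the functions of multiple units may be implemented by one unit. In other embodiments of this application, the data processing apparatus may also include other units; in practical applications, these functions may also be implemented with the assistance of other units, and may be implemented by multiple units in cooperation.
According to yet another embodiment of this application, the data processing apparatus shown in FIG. 5 may be constructed, and the data processing method according to the embodiments of this application implemented, by running a computer program (including program code) capable of executing the data processing method shown in FIG. 2 or FIG. 3 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be recorded, for example, on a computer-readable recording medium, loaded into the above computing device through the computer-readable recording medium, and run therein.
The embodiments of this application can make use of the brief, fragmented attention that Internet users spend during verification and have users annotate sample data on a large scale and in a distributed manner, increasing the amount of sample data on the Internet; further, sample data with annotation results can in turn serve as verification material with known answers, increasing the amount of material data in Internet verification systems.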
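The embodiments of this application further provide a network device, which may be a terminal device such as a PC (Personal Computer), a mobile phone, or a PDA (tablet computer), or a service device such as an application server or a cluster server. Referring to FIG. 6, the internal structure of the network device may include, but is not limited to, a processor, a network interface, and a memory. The processor, the network interface, and the memory in the network device may be connected by a bus or in other ways; FIG. 6 of the embodiments takes connection by a bus as an example.
The processor (also called a CPU (Central Processing Unit)) is the computing core and control core of the network device. The network interface may optionally include a standard wired interface and a wireless interface (such as WI-FI or a mobile communication interface). The storage device (memory) is the memory component of the network device and stores programs and data. The storage device here may be a high-speed RAM storage device or a non-volatile memory, such as at least one disk storage device; optionally, it may also be at least one storage device located remotely from the foregoing processor. The storage device provides a storage space that stores the operating system of the network device, which may include, but is not limited to, a Windows system, a Linux system, an Android system (a mobile operating system), an iOS system (a mobile operating system), and so on, which is not limited in this application. The storage space also stores one or more instructions suitable for being loaded and executed by the processor; these instructions may be one or more computer programs (including program code).
In the embodiments of this application, the processor loads and executes the one or more instructions stored in the storage device to implement the corresponding steps of the method flows shown in FIG. 2 and FIG. 3. In some embodiments, the one or more instructions in the storage device are loaded by the processor to perform the following steps: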
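acquiring target sample data to be processed in a verification system;
outputting the target sample data to at least one target user entering the verification system, so that the at least one target user annotates the target sample data;
collecting at least one piece of annotation data produced when the at least one target user annotates the target sample data;
applying a machine learning algorithm to the at least one piece of annotation data of the target sample data to obtain an annotation result of the target sample data.
The verification system includes an annotation mode and a verification mode, and the verification system contains a sample library and a material library;
the sample library includes at least one piece of sample data, which is any one of an image, a voice, and a text; the material library includes at least one piece of material data, which is any one of an image, a voice, and a text;
the annotation mode is used to annotate each piece of sample data in the sample library; the verification mode is used to authenticate users entering the verification system by using the material data in the material library.
In some embodiments, before the one or more instructions in the storage device are loaded by the processor to perform the step of acquiring the target sample data to be processed in the verification system, the following steps are also performed:
when any user is detected entering the verification system, determining whether the user is a target user;
if the user entering the verification system is a target user, starting the annotation mode of the verification system, and acquiring the target sample data to be processed in the verification system in the annotation mode;
if the user entering the verification system is a normal user, starting the verification mode of the verification system, and in the verification mode selecting one piece of material data from the material library to authenticate the user.
In some embodiments, when the one or more instructions in the storage device are loaded by the processor to perform the step of determining, when any user is detected entering the verification system, whether the user is a target user, the following steps are specifically performed:
when any user is detected entering the verification system, acquiring historical annotation information of the user, where the historical annotation information records the frequency at which the user has annotated sample data in the sample library in the annotation mode of the verification system;
if the frequency is less than a first preset threshold, determining that the user entering the verification system is a target user;
if the frequency is greater than or equal to the first preset threshold, determining that the user entering the verification system is a normal user.
In some embodiments, when the one or more instructions in the storage device are loaded by the processor to perform the step of determining, when any user is detected entering the verification system, whether the user is a target user, the following steps are specifically performed:
when any user is detected entering the verification system, acquiring the identifier of the user;
if the identifier is a preset candidate user identifier, determining that the user entering the verification system is a target user;
if the identifier is not a preset candidate user identifier, determining that the user entering the verification system is a normal user.
In some embodiments, when the one or more instructions in the storage device are loaded by the processor to perform the step of determining, when any user is detected entering the verification system, whether the user is a target user, the following steps are specifically performed:
when any user is detected entering the verification system, acquiring historical verification information of the user, where the historical verification information records the success rate of the user's identity verification in the verification mode of the verification system;
if the success rate is greater than or equal to a second preset threshold, determining that the user entering the verification system is a target user;
if the success rate is less than the second preset threshold, determining that the user entering the verification system is a normal user.
In some embodiments, when the one or more instructions in the storage device are loaded by the processor to perform the step of outputting the target sample data to at least one target user entering the verification system so that the at least one target user annotates the target sample data, the following steps are specifically performed:
setting an annotation manner for the target sample data, where the annotation manner is either of the following: a selection manner and an input manner;
if the annotation manner of the target sample data is the selection manner, outputting the target sample data to the at least one target user, and outputting at least one piece of candidate annotation data corresponding to the target sample data for the at least one target user to choose from;
if the annotation manner of the target sample data is the input manner, outputting the target sample data to the at least one target user, and displaying an input box so that the at least one target user enters the annotation data corresponding to the target sample data in the input box.
In some embodiments, after the one or more instructions in the storage device are loaded by the processor to perform the step of collecting the at least one piece of annotation data produced when the at least one target user annotates the target sample data, the following steps are also performed:
outputting verification failure prompt information to the at least one target user, and switching from the annotation mode to the verification mode;
in the verification mode, selecting one piece of material data from the material library to authenticate the at least one target user.
In some embodiments, after the one or more instructions in the storage device are loaded by the processor to perform the step of applying a machine learning algorithm to the at least one piece of annotation data of the target sample data to obtain an annotation result of the target sample data, the following step is also performed:
adding the target sample data and its annotation result to the material library as new material data.
The embodiments of this application can make use of the brief, fragmented attention that Internet users spend during verification and have users annotate sample data on a large scale and in a distributed manner, increasing the amount of sample data on the Internet; further, sample data with annotation results can in turn serve as verification material with known answers, increasing the amount of material data in Internet verification systems.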
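In this specification, a description referring to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. Illustrative uses of these terms in this specification do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, where they do not contradict each other, a person skilled in the art may combine the different embodiments or examples described in this specification and the features of those different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of this application, "a plurality" means at least two, such as two or three, unless specifically defined otherwise.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing steps of a specific logical function or process, and the scope of the embodiments of this application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of this application belong.
It should be understood that the parts of this application can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and so on. In addition, the functional units in the embodiments of this application may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module can be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as a stand-alone product, it may also be stored in a computer-readable storage medium.
What is disclosed above is merely preferred embodiments of this application and certainly cannot be used to limit the scope of the rights of this application; equivalent changes made in accordance with the claims of this application therefore still fall within the scope covered by this application.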

Claims (20)

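  1. A data processing method, comprising:
    acquiring target sample data to be processed in a verification system;
    outputting the target sample data to at least one target user entering the verification system, so that the at least one target user annotates the target sample data;
    collecting at least one piece of annotation data produced when the at least one target user annotates the target sample data;
    applying a machine learning algorithm to the at least one piece of annotation data of the target sample data to obtain an annotation result of the target sample data.
  2. The method according to claim 1, wherein the verification system contains a sample library and a material library;
    the sample library includes at least one piece of unannotated sample data, the sample data being any one of an image, a voice, and a text;
    the material library includes at least one piece of annotated material data, the material data being any one of an image, a voice, and a text.
  3. The method according to claim 1, further comprising:
    when a user is detected entering the verification system, determining whether the user is a target user;
    if the user is a target user, performing the steps of acquiring the target sample data and outputting the target sample data to the target user;
    if the user is a normal user, selecting one piece of material data from the material library to authenticate the user.
  4. The method according to claim 3, wherein determining whether the user is a target user comprises:
    acquiring historical annotation information of the user, the historical annotation information recording the frequency at which the user entering the verification system has annotated sample data in the sample library in the annotation mode of the verification system;
    if the frequency is less than a preset first threshold, determining that the user is a target user;
    if the frequency is greater than or equal to the first threshold, determining that the user is a normal user.
  5. The method according to claim 3, wherein determining whether the user is a target user comprises:
    acquiring an identifier of the user;
    if the identifier is a preset candidate user identifier, determining that the user entering the verification system is a target user;
    if the identifier is not a preset candidate user identifier, determining that the user entering the verification system is a normal user.
  6. The method according to claim 3, wherein determining whether the user is a target user comprises:
    acquiring historical verification information of the user, the historical verification information recording the success rate of identity verification of the user entering the verification system in the verification mode of the verification system;
    if the success rate is greater than or equal to a second preset threshold, determining that the user entering the verification system is a target user;
    if the success rate is less than the second preset threshold, determining that the user entering the verification system is a normal user.
  7. The method according to any one of claims 2-6, wherein outputting the target sample data to the at least one target user entering the verification system, so that the at least one target user annotates the target sample data, comprises:
    setting an annotation manner for the target sample data, the annotation manner being either of the following: a selection manner and an input manner;
    if the annotation manner of the target sample data is the selection manner, outputting the target sample data to the at least one target user, and outputting at least one piece of candidate annotation data corresponding to the target sample data for the at least one target user to choose from;
    if the annotation manner of the target sample data is the input manner, outputting the target sample data to the at least one target user, and displaying an input box so that the at least one target user enters the annotation data corresponding to the target sample data in the input box.
  8. The method according to claim 2, further comprising:
    outputting verification failure prompt information to the at least one target user;
    selecting one piece of material data from the material library to authenticate the at least one target user.
  9. The method according to claim 2, further comprising:
    adding the target sample data and its annotation result to the material library as new material data.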
  10. 一种数据处理装置,包括:处理器和存储器,所述存储器中存储有机器可读指令,可以使所述处理器:
    获取验证系统中待处理的目标样本数据;
    向进入所述验证系统的至少一个目标用户输出所述目标样本数据,以使所述至少一个目标用户对所述目标样本数据进行标注;
    采集所述至少一个目标用户对所述目标样本数据进行标注所产生的至少一个标注数据;
    采用机器学习算法对所述目标样本数据的至少一个标注数据进行学习处理,获得所述目标样本数据的标注结果。
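  11. The apparatus of claim 10, wherein the instructions are capable of causing the processor to:
    when a user is detected entering the verification system, determine whether the user is a target user;
    if the user is a target user, obtain the target sample data from a preset sample library and output the target sample data to the target user; the sample library comprises at least one piece of unannotated sample data, the sample data comprising any one of the following: image, speech, or text.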
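  12. The apparatus of claim 11, wherein the instructions are capable of causing the processor to:
    if the user is an ordinary user, select one piece of material data from a preset material library to perform identity verification on the user entering the verification system; the material library comprises at least one piece of annotated material data, the material data comprising any one of the following: image, speech, or text.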
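  13. The apparatus of claim 10, wherein the instructions are capable of causing the processor to:
    output verification failure prompt information to the at least one target user;
    select one piece of material data from a preset material library to perform identity verification on the at least one target user; the material library comprises at least one piece of annotated material data, the material data comprising any one of the following: image, speech, or text.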
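  14. The apparatus of claim 10, wherein the instructions are capable of causing the processor to:
    add the target sample data and its annotation result to a material library as new material data; the material library comprises at least one piece of annotated material data, the material data comprising any one of the following: image, speech, or text.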
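  15. The apparatus of claim 11, wherein the instructions are capable of causing the processor to:
    obtain historical annotation information of the user, the historical annotation information recording the frequency with which the user entering the verification system has annotated sample data in the sample library in the annotation mode of the verification system;
    if the frequency is less than a preset first threshold, determine that the user entering the verification system is a target user;
    if the frequency is greater than or equal to the preset first threshold, determine that the user entering the verification system is an ordinary user.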
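  16. The apparatus of claim 11, wherein the instructions are capable of causing the processor to:
    obtain an identifier of the user;
    if the identifier is a preset candidate user identifier, determine that the user entering the verification system is a target user;
    if the identifier is not a preset candidate user identifier, determine that the user entering the verification system is an ordinary user.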
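  17. The apparatus of claim 11, wherein the instructions are capable of causing the processor to:
    obtain historical verification information of the user, the historical verification information recording the success rate of identity verification performed by the user entering the verification system in the verification mode of the verification system;
    if the success rate is greater than or equal to a preset second threshold, determine that the user entering the verification system is a target user;
    if the success rate is less than the preset second threshold, determine that the user entering the verification system is an ordinary user.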
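  18. The apparatus of claim 10, wherein the instructions are capable of causing the processor to:
    obtain the annotation manner set for the target sample data, the annotation manner being any one of the following: a selection manner or an input manner;
    if the annotation manner of the target sample data is the selection manner, output the target sample data to the at least one target user, and output at least one piece of candidate annotation data corresponding to the target sample data to the at least one target user for selection;
    if the annotation manner of the target sample data is the input manner, output the target sample data to the at least one target user, and display an input box so that the at least one target user enters, in the input box, the annotation data corresponding to the target sample data.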
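  19. A storage device, characterized in that the storage device stores one or more instructions, the one or more instructions being adapted to be loaded by a processor to execute the data processing method of any one of claims 1-9.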
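  20. A network device, characterized by comprising:
    a processor, adapted to implement one or more instructions; and
    a storage device, the storage device storing one or more instructions, the one or more instructions being adapted to be loaded by the processor to execute the data processing method of any one of claims 1-9.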
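
The flow recited in claims 1 and 3-6 can be illustrated with a minimal Python sketch. This is a sketch under stated assumptions, not the applicant's implementation: the names `User`, `is_target_user`, and `aggregate_annotations`, the default threshold values, and the order in which the three criteria are tried are all hypothetical. The claims do not specify which machine learning algorithm produces the annotation result, so a simple majority vote is used here purely as a stand-in.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical record of a user entering the verification system."""
    user_id: str
    annotation_count: int = 0           # times the user annotated samples in annotation mode
    verification_success: float = 0.0   # identity-verification success rate in verification mode


def is_target_user(user, first_threshold=5, candidate_ids=None, second_threshold=None):
    """Decide target vs. ordinary user with one of the criteria of claims 4-6.

    Which criterion applies is an assumption of this sketch: a candidate-ID
    set takes precedence, then a success-rate threshold, then the frequency.
    """
    if candidate_ids is not None:
        # Claim 5: the identifier is a preset candidate user identifier.
        return user.user_id in candidate_ids
    if second_threshold is not None:
        # Claim 6: success rate >= second threshold means target user.
        return user.verification_success >= second_threshold
    # Claim 4: annotation frequency < first threshold means target user.
    return user.annotation_count < first_threshold


def aggregate_annotations(annotations, min_votes=3):
    """Majority vote over collected annotation data (stand-in for the
    unspecified machine learning algorithm of claim 1).

    Returns None while fewer than min_votes annotations exist or on a tie.
    """
    if len(annotations) < min_votes:
        return None
    top = Counter(annotations).most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None  # tie between the two most frequent annotations
    return top[0][0]
```

In such a sketch, a sample whose aggregation returns a value could then be added to the material library together with that value, as recited in claim 9, while a `None` result would leave the sample in the sample library for further annotation.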
PCT/CN2018/087961 2017-05-25 2018-05-23 Data processing method, data processing apparatus, storage device, and network device WO2018214895A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710378502.1 2017-05-25
CN201710378502.1A CN107256428B (zh) 2017-05-25 2017-05-25 Data processing method, data processing apparatus, storage device, and network device

Publications (1)

Publication Number Publication Date
WO2018214895A1 true WO2018214895A1 (zh) 2018-11-29

Family

ID=60028034

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/087961 WO2018214895A1 (zh) 2017-05-25 2018-05-23 Data processing method, data processing apparatus, storage device, and network device

Country Status (2)

Country Link
CN (1) CN107256428B (zh)
WO (1) WO2018214895A1 (zh)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
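CN107256428B (zh) * 2017-05-25 2022-11-18 腾讯科技(深圳)有限公司 Data processing method, data processing apparatus, storage device, and network device
CN109697537A (zh) * 2017-10-20 2019-04-30 北京京东尚科信息技术有限公司 Method and apparatus for data auditing
CN108154197B (zh) * 2018-01-22 2022-03-15 腾讯科技(深圳)有限公司 Method and apparatus for verifying image annotation in a virtual scene
JP6760317B2 (ja) * 2018-03-14 2020-09-23 オムロン株式会社 Learning support device
CN108537129B (zh) * 2018-03-14 2021-01-08 北京影谱科技股份有限公司 Method, apparatus, and system for annotating training samples
CN110163376B (zh) * 2018-06-04 2023-11-03 腾讯科技(深圳)有限公司 Sample detection method, media object recognition method, apparatus, terminal, and medium
CN109325213B (zh) * 2018-09-30 2023-11-28 北京字节跳动网络技术有限公司 Method and apparatus for annotating data
CN109376868B (zh) * 2018-09-30 2021-06-25 北京字节跳动网络技术有限公司 Information management system
CN109993315B (zh) * 2019-03-29 2021-05-18 联想(北京)有限公司 Data processing method, apparatus, and electronic device
CN110516558B (zh) * 2019-08-01 2022-04-22 仲恺农业工程学院 Sample data acquisition method, apparatus, computer device, and storage medium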

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
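CN102637172A (zh) * 2011-02-10 2012-08-15 北京百度网讯科技有限公司 Web page block annotation method and system
CN103150454A (zh) * 2013-03-27 2013-06-12 山东大学 Dynamic machine learning modeling method based on sample-recommended annotation
CN103514369A (zh) * 2013-09-18 2014-01-15 上海交通大学 Active-learning-based regression analysis system and method
CN103530321A (zh) * 2013-09-18 2014-01-22 上海交通大学 Machine-learning-based ranking system
CN107256428A (zh) * 2017-05-25 2017-10-17 腾讯科技(深圳)有限公司 Data processing method, data processing apparatus, storage device, and network device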

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110313779A1 (en) * 2010-06-17 2011-12-22 Microsoft Corporation Augmentation and correction of location based data through user feedback
CN102867025A (zh) * 2012-08-23 2013-01-09 百度在线网络技术(北京)有限公司 Method and apparatus for obtaining picture annotation data
US9489373B2 (en) * 2013-07-12 2016-11-08 Microsoft Technology Licensing, Llc Interactive segment extraction in computer-human interactive learning
CN103824053B (zh) * 2014-02-17 2018-02-02 北京旷视科技有限公司 Gender annotation method for face images and face gender detection method

Also Published As

Publication number Publication date
CN107256428A (zh) 2017-10-17
CN107256428B (zh) 2022-11-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18806450

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18806450

Country of ref document: EP

Kind code of ref document: A1