CN109840274A - Data processing method and device, storage medium - Google Patents

Info

Publication number
CN109840274A
Authority
CN
China
Prior art keywords
dimension
pending data
dimensional characteristics
data
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811618506.3A
Other languages
Chinese (zh)
Other versions
CN109840274B (en)
Inventor
刘佳祥
万星
白林楠
张傲
王经委
李芝
孙宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811618506.3A
Publication of CN109840274A
Application granted
Publication of CN109840274B
Legal status: Active
Anticipated expiration

Abstract

The present invention provides a data processing method and device, and a storage medium. The method comprises: performing feature recognition on data to be processed according to preset dimensions to obtain the dimension features of the data to be processed, wherein the preset dimensions include a behavior dimension, a field dimension, and a purpose dimension; and annotating the data to be processed according to each recognized dimension feature. The technical solution provided by the present invention can reduce annotation cost and improve processing capacity.

Description

Data processing method and device, storage medium
Technical field
The present invention relates to the technical field of data processing, and in particular to a data processing method and device, and a storage medium.
Background
With the continuous development of artificial intelligence technology, intelligent robots are used in more and more scenarios to respond to users' questions.
In the prior art, a machine learning model is generally trained by machine learning methods in order to produce responses. Before the model is trained, high-quality input data with specific intent features is required. At present, intent annotation is generally performed manually by maintenance personnel, and the annotation is generally flat: maintenance personnel manually identify the semantics of an utterance and annotate the intent it expresses.
With the flat annotation method, a single service type alone has dozens of intents, and as the business expands the number of intents may grow into the thousands. This greatly increases annotation cost and lowers processing capacity.
Summary of the invention
The present invention provides a data processing method and device, and a storage medium, so as to reduce annotation cost and improve processing capacity.
In a first aspect, the present invention provides a data processing method, comprising:
performing feature recognition on data to be processed according to preset dimensions to obtain the dimension features of the data to be processed, wherein the preset dimensions include a behavior dimension, a field dimension, and a purpose dimension; and
annotating the data to be processed according to each recognized dimension feature.
In a second aspect, the present invention provides a data processing device, comprising:
a recognition module, configured to perform feature recognition on data to be processed according to preset dimensions to obtain the dimension features of the data to be processed, wherein the preset dimensions include a behavior dimension, a field dimension, and a purpose dimension; and
an annotation module, configured to annotate the data to be processed according to each recognized dimension feature.
In a third aspect, the present invention provides a data processing device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and is configured to be executed by the processor to implement the method according to the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the method according to the first aspect.
The data processing method and device and the storage medium provided by the present invention perform feature recognition on the data to be processed from three dimensions (behavior, field, and purpose) to obtain its dimension features. For a single piece of data to be processed, this avoids the adverse effect that a maintainer's subjective judgment of intent has on the annotation result. Moreover, compared with annotating intent in the form of a short phrase, this solution only needs simple labels for behavior, field, and purpose to annotate the intent, which can effectively reduce the number of annotations, reduce the labor and time cost of the annotation process, and improve annotation processing capacity.
In addition, this annotation approach places lower demands on the system, does not require scaling back business capabilities, and is more flexible. Especially when the data to be processed contains a large amount of content, this solution can obtain many intents by combining dimension features. This amounts to reducing the dimensionality of the intent set: it greatly reduces the overall number of annotations and significantly improves the processing capacity of the annotation process.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
Fig. 1 is a schematic flowchart of a data processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another data processing method provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of another data processing method provided by an embodiment of the present invention;
Fig. 4 is a functional block diagram of a data processing device provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the physical structure of a data processing device provided by an embodiment of the present invention.
The above drawings show specific embodiments of the present disclosure, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the disclosed concept in any way, but rather to explain the concept of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed description of the embodiments
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure, as detailed in the appended claims.
The specific application scenario of the present invention is the data preprocessing stage of an intelligent response system.
In this scenario, maintenance personnel usually annotate the initial data with intents in advance, and these annotated intents are used to produce responses so as to improve the reply accuracy of the response system. The current intent annotation approach is flat: maintenance personnel manually identify the semantics and annotate the intent expressed.
For example, if the data to be annotated is "I want to check how much is left in my data package", the corresponding intent is judged manually to be "query data balance"; if the data is "I want to see how much my data package costs", the corresponding intent is judged manually to be "query data package price". Within the scope of the data package service alone, many kinds of information besides "balance" and "price" can be abstracted into intents, and once this is extended to all services, the number of intents multiplies.
That is, the existing intent annotation approach not only increases annotation cost, but may also degrade the training result of the machine learning model because of the larger amount of data it introduces.
The data processing method provided by the present invention aims to solve the above technical problems of the prior art, and proposes the following approach: design an intent classification scheme for dialogue systems that solves the annotation and classification problems encountered in engineering practice, and avoids the situation where unrestricted, illogical classification has disastrous effects on annotation and thereby constrains the capability of the business system. Specifically, the annotation scheme starts from three dimensions (behavior, field, and purpose) and uses combinations of features from these three dimensions to annotate different intents, which effectively reduces the number of annotations and improves processing efficiency.
The technical solution of the present invention, and how it solves the above technical problems, are described in detail below with specific embodiments. The following embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention are described below with reference to the accompanying drawings.
Embodiment one
An embodiment of the present invention provides a data processing method. Referring to Fig. 1, the method includes the following steps:
S102: perform feature recognition on the data to be processed according to preset dimensions to obtain the dimension features of the data to be processed, wherein the preset dimensions include a behavior dimension, a field dimension, and a purpose dimension.
Specifically, in this application scenario, the data to be processed may be interaction data entered by a user during human-computer interaction.
S104: annotate the data to be processed according to each recognized dimension feature.
For ease of understanding, the dimensions involved in the embodiments of the present invention are explained below.
First, the behavior (act) dimension characterizes the processing mode requested by the data to be processed.
Specifically, the behavior dimension features involved in the embodiments of the present invention may include, but are not limited to, at least one of the following: sign up, cancel, change, and query.
For example, if the data to be processed is the user request "query remaining plan allowance" in a human-computer interaction scenario, the processing mode the user requests for the remaining plan allowance is "query", so "query" can serve as the dimension feature of the data in the behavior dimension (hereinafter, the behavior dimension feature). As another example, if the data to be processed is the user request "sign up for a data package", the processing mode the user requests for the data package is "sign up", so "sign up" can serve as the behavior dimension feature of the data.
In addition to the behavior dimension features above, the embodiments of the present invention may include other behavior dimension features, including but not limited to at least one of the following: top up, apply, order, unsubscribe, and consult.
Second, the field dimension characterizes the field to which the business requested by the data to be processed belongs. Specifically, the field dimension may include, but is not limited to, the business field.
Taking the human-computer interaction scenario between a telecom operator and a user as an example, the business field in this scenario may be a specific service type. For example, the business field may include, but is not limited to: data package, plan, tariff, or broadband.
For example, if the data to be processed is the user request "query remaining plan allowance" in a human-computer interaction scenario, the business field the request is directed at is "plan", so "plan" can serve as the dimension feature of the data in the field dimension (hereinafter, the field dimension feature). As another example, if the data to be processed is the user request "sign up for a data package", the business requested is a data package service, so "data package" can serve as the field dimension feature of the data.
Third, the purpose dimension characterizes information about the business that the data to be processed requests to be handled.
Specifically, the purpose dimension features involved in the embodiments of the present invention may include, but are not limited to, at least one of the following: business attribute, business name, remaining business balance, and business consumption. Business attributes may include, but are not limited to, business price, business introduction, and the like.
For example, if the data to be processed is the user request "query remaining plan allowance" in a human-computer interaction scenario, the information about the plan that the user asks for is the "remaining allowance", so "remaining allowance" can serve as the dimension feature of the data in the purpose dimension (hereinafter, the purpose dimension feature). As another example, if the data to be processed is the user request "sign up for a data package", the business requested is signing up for a data package; the machine side only needs to execute the sign-up process, so the purpose dimension feature of the data may be recorded as empty or as a designated character.
As the above shows, in a human-computer interaction scenario the purpose dimension feature is meaningful only when the behavior dimension feature is of the "ask" type; when the behavior dimension feature is of the "give" type, the purpose dimension feature may be ignored or left unrecorded. An "ask"-type behavior dimension feature means that, through the interaction, the user obtains desired information from the machine side. For example, with "How much does my data package cost?", the user hopes to get an answer back from the machine side. A "give"-type behavior dimension feature means that, through the interaction, the user cooperates with the machine side by supplying an answer. For example: "System: Where do you want your flight to depart from? User: Beijing."
Starting from the three dimensions of behavior, field, and purpose, the data processing method provided by the embodiment of the present invention only needs to record a dimension feature under each of these three dimensions. By obtaining and annotating the dimension features, the user's intent can be annotated accurately.
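As a minimal sketch of what one such three-dimension annotation could look like in code, the record below holds one feature per dimension and lets the purpose feature stay empty. The class name, attribute names, and English feature strings are illustrative assumptions, not part of the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntentAnnotation:
    """Hypothetical record: one feature for each of the three preset dimensions."""
    behavior: str                  # requested processing mode, e.g. "query"
    business_field: str            # business field, e.g. "plan"
    purpose: Optional[str] = None  # requested information; None when not recorded


# The two running examples from the text (English wording is an assumption).
query_remaining = IntentAnnotation("query", "plan", "remaining allowance")
sign_up_package = IntentAnnotation("sign up", "data package")

print(query_remaining)
print(sign_up_package)
```

Leaving `purpose` as `None` corresponds to recording the purpose dimension feature as empty; a designated placeholder character would work equally well.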
For ease of understanding, refer to Table 1, which shows the prior-art flat annotation method alongside the data processing method provided by the embodiment of the present invention, which starts from the three dimensions of behavior, field, and purpose.
Table 1
Comparing the five groups of data in Table 1 shows that, when any one of these groups is taken as the data to be processed, the existing flat intent annotation scheme requires maintenance personnel to judge and manually write out the intent phrases, whereas the embodiment of the present invention only needs to record three dimension features. This avoids the adverse effect that a maintainer's subjective judgment of intent has on the annotation result. Moreover, compared with annotating intent in the form of a short phrase, this solution only needs simple labels for behavior, field, and purpose to annotate the intent, which can effectively reduce the number of annotations, reduce the labor and time cost of the annotation process, and improve annotation processing capacity.
In addition, the solution provided by the embodiment of the present invention performs even better when the volume of data to be processed is large. Specifically, if the data to be processed contains 5 "behaviors", 20 "fields", and 40 "purposes", the existing flat annotation method would require 5 × 20 × 40 = 4000 intents to be annotated. In contrast, this solution only needs to annotate 5 + 20 + 40 = 65 dimension features, which reduces the annotation cost by more than 97% compared with the existing flat annotation method.
That is, under the same constraints on manpower and engineering capacity, the existing flat annotation method usually has to shrink the business scheme and scale back business capabilities. With the data processing method provided by the embodiment of the present invention, the number of business capabilities covered can be increased significantly, or even multiplied, with manpower and engineering capacity unchanged. In other words, the technical solution provided by the embodiment of the present invention can obtain many intents merely by combining dimension features. This amounts to reducing the dimensionality of the intent set: it greatly reduces the overall number of annotations and significantly improves the processing capacity of the annotation process.
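The counting argument (multiplying the dimension sizes for flat annotation versus adding them for dimension-wise annotation) can be checked with a short sketch. The function names are illustrative and do not come from the source.

```python
import math


def flat_label_count(dimension_sizes):
    """Flat annotation needs one label per combination of features."""
    return math.prod(dimension_sizes)


def dimension_label_count(dimension_sizes):
    """Dimension-wise annotation needs one label per feature."""
    return sum(dimension_sizes)


sizes = [5, 20, 40]  # behaviors, fields, purposes
flat = flat_label_count(sizes)        # 5 * 20 * 40 = 4000
dims = dimension_label_count(sizes)   # 5 + 20 + 40 = 65
reduction = 1 - dims / flat

print(flat, dims, f"{reduction:.1%}")
```

The gap widens as any dimension grows, since the flat count scales with the product of the dimension sizes while the dimension-wise count scales with their sum.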
Based on the foregoing design, the embodiments of the present invention further provide specific implementations of the foregoing steps.
Specifically, the dimension feature recognition process in S102 may follow the approach shown in Fig. 2:
S1022: perform semantic recognition on the data to be processed to obtain the semantic features of the data to be processed.
Before semantic recognition is executed, the data to be processed may also be preprocessed, and semantic recognition is then performed on the preprocessed data. Specifically, preprocessing may include, but is not limited to, at least one of the following: word segmentation, speech-to-text conversion, and the like.
In addition, the embodiment of the present invention places no particular limitation on the semantic recognition algorithm. In a specific implementation, it may be a short-sentence recognition algorithm, a long-sentence recognition algorithm, or a model-based recognition algorithm.
S1024: perform semantic matching on each semantic feature according to the preset dimensions to obtain the dimension features of the data to be processed.
In this step, the semantic similarity between each semantic feature and each preset dimension feature may be obtained and compared with a preset similarity threshold. For any semantic feature and any dimension feature, if the semantic similarity between them reaches the similarity threshold, the dimension feature is determined to match the semantic feature and may be taken as one dimension feature of the data to be processed. Conversely, if the semantic similarity is below the similarity threshold, the two are determined not to match; the semantic feature may then be discarded and the next round of judgment carried out.
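The threshold-based matching in S1024 can be sketched as follows. The patent does not specify a similarity measure, so this sketch substitutes a character-level ratio from Python's standard `difflib` as a stand-in; the preset feature lists, the threshold value, and all names are assumptions for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical preset dimension features (English wording is an assumption).
PRESET_FEATURES = {
    "behavior": ["query", "sign up", "cancel", "change"],
    "field": ["plan", "data package", "tariff", "broadband"],
    "purpose": ["price", "remaining allowance", "name", "consumption"],
}


def similarity(a, b):
    """Stand-in for semantic similarity: character-level ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def match_features(semantic_features, threshold=0.8):
    """Keep each preset feature whose similarity to some semantic feature
    reaches the threshold; semantic features with no match are dropped."""
    matched = {dim: [] for dim in PRESET_FEATURES}
    for sem in semantic_features:
        for dim, candidates in PRESET_FEATURES.items():
            for cand in candidates:
                if similarity(sem, cand) >= threshold and cand not in matched[dim]:
                    matched[dim].append(cand)
    return matched


result = match_features(["query", "plan", "weather"])
print(result)
```

A real system would presumably replace `similarity` with an embedding-based or model-based semantic score; only the comparison against the threshold is taken from the text.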
In addition, to further improve processing efficiency in a specific implementation, the semantic features obtained in S1022 may be screened before the semantic matching in this step is executed, and semantic features unrelated to behavior, field, and purpose may be screened out. This reduces the amount of data in the subsequent matching and improves processing efficiency.
Based on any of the foregoing implementations, the dimension features of the data to be processed can be obtained.
The embodiments of the present invention further provide ways of annotating the data to be processed.
It should be noted that "annotating the data to be processed" in the embodiments of the present invention means establishing an association between the annotation and the data to be processed. It is not limited to modifying the data to be processed directly; it may also be implemented by storing the data to be processed and the annotation data separately and establishing an association between the two.
In a specific annotation, the dimension features recognized in S102 may be used directly to annotate the data to be processed. Alternatively, the multiple dimension features recognized in S102 may be filtered to obtain the target dimension features of each dimension, and the target dimension features of each dimension are then used to annotate the data to be processed.
If annotation is performed with filtering, the filtering may be done by manual selection, or according to a preset algorithm.
In one possible implementation scenario, referring to Fig. 3, the method includes:
S1042: output the multiple dimension features of the data to be processed.
In one implementation, when the dimension features are output, they may be output sequentially by the dimension each belongs to, or output in separate columns. Grouping the dimension features in this way makes the user's selection among the output features more convenient and clear, and more user-friendly.
S1044: obtain the user's operation information for the multiple dimension features.
In a practical application scenario, the user here may be a maintainer, or the user participating in the human-computer interaction.
S1046: obtain the dimension features indicated by the operation information and take them as the target dimension features.
That is, the user's selection among the output dimension features is obtained, and the dimension features the user selects are taken as the target dimension features.
S1048: annotate the data to be processed using the target dimension features.
The implementation shown in Fig. 3 gives the user the final decision in selecting the target dimension features. A user on the user side can select, as needed, the dimension features closest to the meaning he or she intends to express; an operator can add subjective judgment on top of the objective screening performed by this solution, and further pick out the dimension features closest to the meaning the user intends to express. That is, this further interaction helps bring the dimension features closer to the true intent; in other words, it helps improve the accuracy of the dimension features.
In addition, filtering can also be implemented by a preset filtering algorithm.
For example, in one possible implementation scenario, filtering may be based on the number of occurrences of each dimension feature. For instance, within each dimension, the dimension features may be sorted by number of occurrences, and the one or more top-ranked features taken as the target dimension features; or the dimension features whose number of occurrences exceeds a preset count threshold may be taken as the target dimension features.
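Both occurrence-based filters can be sketched with the standard library's `Counter`. The function names and sample data are hypothetical; the sketch works on the features of a single dimension.

```python
from collections import Counter


def top_k_features(features, k):
    """Rank features by number of occurrences and keep the k most frequent."""
    return [feature for feature, _ in Counter(features).most_common(k)]


def features_above(features, count_threshold):
    """Keep features that occur more than count_threshold times."""
    counts = Counter(features)
    return [feature for feature, n in counts.items() if n > count_threshold]


# Hypothetical behavior-dimension features recognized for one piece of data.
recognized = ["query", "query", "cancel", "query", "change", "cancel"]

print(top_k_features(recognized, 2))
print(features_above(recognized, 1))
```

The first function corresponds to the ranking variant, the second to the count-threshold variant; either result would serve as the target dimension features for that dimension.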
In addition, in any of the foregoing filtering scenarios, a quantity threshold may be set for the target dimension features of each dimension; that is, the number of target dimension features of any dimension is kept less than or equal to the quantity threshold of that dimension. This implementation further limits the number of target dimension features finally annotated, which helps keep the number of annotations within a preset range.
Specifically, the quantity thresholds of the dimensions may be the same or different, and are set as needed. For example, in one possible design, the quantity threshold for behavior dimension features may be set to 5 and the quantity threshold for field dimension features may be set to 20.
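A per-dimension quantity threshold can be applied as a simple cap on already-ranked features, as in the sketch below. The thresholds for behavior and field follow the example in the text; the purpose threshold, the function name, and the sample data are assumptions.

```python
# Behavior and field thresholds follow the text; the purpose value is assumed.
QUANTITY_THRESHOLDS = {"behavior": 5, "field": 20, "purpose": 40}


def cap_targets(ranked_by_dimension, thresholds):
    """Truncate each dimension's ranked features to that dimension's threshold."""
    return {
        dim: features[: thresholds[dim]]
        for dim, features in ranked_by_dimension.items()
    }


# Hypothetical ranked candidates, most relevant first.
ranked = {
    "behavior": ["query", "consult", "change", "cancel", "order", "apply", "top up"],
    "field": ["plan", "data package"],
    "purpose": ["price"],
}

capped = cap_targets(ranked, QUANTITY_THRESHOLDS)
print(capped["behavior"])
```

Dimensions that already hold fewer features than their threshold pass through unchanged.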
In addition to manual selection alone or preset-algorithm filtering alone, the foregoing filtering step may also be implemented by combining the two.
For example, in one implementation scenario, the dimension features may first be filtered by a preset algorithm so that the filtered dimension features of each dimension fall within that dimension's quantity threshold, after which the output and subsequent steps shown in Fig. 3 are executed to annotate the data to be processed. In this implementation, the candidates output in S1042 are within the quantity thresholds, and the user only needs to make a selection in each of the three dimensions to determine the target dimension features.
When the data to be processed is annotated in any of the foregoing implementations, the dimension features (or target dimension features) may be used directly for annotation, or the identifiers corresponding to the dimension features (or target dimension features) may be used for annotation.
Still taking the method shown in Fig. 3 as an example, in another possible design, step S1048 may also be implemented as follows:
obtain the identifier corresponding to each target dimension feature;
annotate the data to be processed using the identifiers.
An identifier may take many forms, such as a number, a letter, or a character string. Taking behavior dimension features as an example, the identifier for "sign up" may be set to C, the identifier for "cancel" to D, the identifier for "change" to U, and the identifier for "query" to R.
In addition, the embodiment of the present invention places no particular limitation on the correspondence between target dimension features and identifiers: it may be one-to-one, one-to-many, or many-to-one. Still taking the behavior features above as an example, C may correspond to several behavior dimension features, such as sign up, top up, apply, and order; similarly, D may correspond to cancel and unsubscribe; R may correspond to query and consult; and U may correspond to a single behavior dimension feature, change.
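The many-to-one correspondence between behavior dimension features and identifiers can be written as a lookup table. The identifier letters come from the text; the English feature names are translations chosen for this sketch, and the dictionary and function names are hypothetical.

```python
# Many-to-one mapping from behavior dimension feature to identifier.
BEHAVIOR_IDENTIFIERS = {
    "sign up": "C",
    "top up": "C",
    "apply": "C",
    "order": "C",
    "cancel": "D",
    "unsubscribe": "D",
    "change": "U",
    "query": "R",
    "consult": "R",
}


def behavior_identifier(feature):
    """Return the identifier used to annotate a behavior dimension feature."""
    return BEHAVIOR_IDENTIFIERS[feature]


print(behavior_identifier("top up"))
print(behavior_identifier("consult"))
```

Because several features share one identifier, annotating with identifiers further shrinks the label set for the behavior dimension from nine values to four in this example.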
With any of the above approaches, intent annotation of the data to be processed can be achieved. The annotation approach is simple and feasible, the number of annotations can be reasonably controlled, and the approach is highly scalable and flexible.
To explain this solution more clearly, the embodiments of the present invention give at least the following two feasible applications:
First, the intents annotated as described above may be applied to dialogue in a human-computer interaction scenario. Specifically, the method may further include the following steps:
determining the interaction intent of the data to be processed according to the dimension features annotated on it;
obtaining reply data corresponding to the interaction intent; and
outputting the reply data.
This helps improve dialogue efficiency in human-computer interaction scenarios and the reply efficiency of the machine side.
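One simple way to realize these dialogue steps is to treat the combination of annotated dimension features as the interaction intent and use it as a lookup key for reply data. The table contents, reply strings, and function name below are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# An interaction intent as a (behavior, field, purpose) combination.
Intent = Tuple[str, str, Optional[str]]

# Hypothetical reply table keyed by interaction intent.
REPLIES: Dict[Intent, str] = {
    ("query", "plan", "remaining allowance"): "Here is your remaining plan allowance.",
    ("sign up", "data package", None): "Your data package request has been submitted.",
}


def reply_for(behavior, business_field, purpose=None,
              default="Sorry, I did not understand that."):
    """Determine the intent from the annotated features and fetch its reply."""
    return REPLIES.get((behavior, business_field, purpose), default)


print(reply_for("query", "plan", "remaining allowance"))
print(reply_for("sign up", "data package"))
print(reply_for("cancel", "broadband"))
```

A production system would more likely generate or retrieve replies dynamically; the sketch only shows how the three annotated features jointly identify one intent.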
It second, can be by the aforementioned answer model Training scene being intended to apply in human-computer interaction scene marked.Tool For body, this method can also include the following steps:
Utilize the pending data for having marked each dimensional characteristics, the man-machine alternate acknowledge model of training.
It is, will mark the pending datas of each dimensional characteristics as the primary data of human-computer interaction answer model or Initial sample carrys out the training of finishing man-machine interaction answer model.In the realization scene, this programme can effectively be extracted wait locate The dimensional characteristics in data are managed, the model accuracy rate for improving human-computer interaction answer model is conducive to.
It is understood that step or operation are only example, the embodiment of the present application some or all of in above-described embodiment The deformation of other operations or various operations can also be performed.In addition, each step can be presented not according to above-described embodiment With sequence execute, and it is possible to do not really want to execute all operationss in above-described embodiment.
The technical solution provided by the embodiments of the present invention has at least the following technical effects:
The data processing method and device and the storage medium provided by the present invention perform feature identification on the data to be processed from three dimensions (behavior, field and purpose) to obtain its dimension features. For a single item of data to be processed, this avoids the adverse effect that a maintainer's subjective judgment of the intent would have on the intent annotation result. Moreover, compared with labeling an intent in the form of a short phrase, this solution labels the intent simply by labeling its behavior, field and purpose, which effectively reduces the number of labels, lowers the labor and time costs of the annotation process, and improves annotation throughput.
In addition, this labeling approach places low demands on the system, does not require compressing business capabilities, and is highly flexible. In particular, when the data to be processed contains more content, the solution can obtain multiple intents through combinations of dimension features. This amounts to a dimensionality reduction of the number of intents, which greatly reduces the number of labels overall and significantly improves the throughput of the annotation process.
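As a non-limiting illustration of this reduction, the following sketch compares the number of labels that must be maintained under the two schemes. Only the four behavior features come from the description; the field values and the use of one feature per dimension are assumptions made for the example.

```python
from itertools import product

# Behavior features named in the description.
BEHAVIORS = ["handle", "cancel", "change", "query"]
# Hypothetical field values, chosen only for illustration.
FIELDS = ["data plan", "voice plan", "broadband"]
# Purpose features named in the description.
PURPOSES = [
    "service attribute",
    "service name",
    "service remaining",
    "service consumption",
]

# Each combination of one feature per dimension is a distinct intent.
intents = list(product(BEHAVIORS, FIELDS, PURPOSES))

# Labels to maintain when every intent is its own short-phrase label,
# versus when each dimension is labeled separately.
labels_per_intent = len(intents)
labels_per_dimension = len(BEHAVIORS) + len(FIELDS) + len(PURPOSES)

print(labels_per_intent, labels_per_dimension)
```

With these example lists, 11 dimension labels are enough to express 48 intents.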
Embodiment two
Based on the data processing method provided in Embodiment One, an embodiment of the present invention further provides device embodiments that implement the steps and methods of the above method embodiments.
An embodiment of the present invention provides a data processing device. Referring to FIG. 4, the data processing device 400 includes:
an identification module 41, configured to perform feature identification processing on data to be processed according to preset dimensions, to obtain dimension features of the data to be processed, wherein the preset dimensions include a behavior dimension, a field dimension and a purpose dimension; and
a labeling module 42, configured to label the data to be processed according to each identified dimension feature.
In the embodiment of the present invention, the behavior dimension is used to characterize the processing mode requested by the data to be processed;
wherein the behavior dimension features include at least one of the following: handle, cancel, change and query.
In the embodiment of the present invention, the field dimension is used to characterize the field to which the business requested by the data to be processed belongs;
wherein the field dimension includes: a business field.
In the embodiment of the present invention, the purpose dimension is used to characterize information related to the business requested by the data to be processed;
wherein the purpose dimension includes at least one of the following: a service attribute, a service name, a service remaining status and a service consumption status.
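As a non-limiting illustration, the three dimensions and their features could be represented as follows. The class and attribute names are hypothetical; the description does not prescribe any particular data structure, and the field dimension is kept as free-form strings because its values are not enumerated.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Behavior(Enum):
    """Processing mode requested by the data to be processed."""
    HANDLE = "handle"
    CANCEL = "cancel"
    CHANGE = "change"
    QUERY = "query"


class Purpose(Enum):
    """Information related to the requested business."""
    SERVICE_ATTRIBUTE = "service attribute"
    SERVICE_NAME = "service name"
    SERVICE_REMAINING = "service remaining"
    SERVICE_CONSUMPTION = "service consumption"


@dataclass
class DimensionFeatures:
    """Dimension features identified for one item of data to be processed."""
    behavior: List[Behavior] = field(default_factory=list)
    business_field: List[str] = field(default_factory=list)
    purpose: List[Purpose] = field(default_factory=list)


# Hypothetical example: "check how much data I have left".
example = DimensionFeatures(
    behavior=[Behavior.QUERY],
    business_field=["data plan"],
    purpose=[Purpose.SERVICE_REMAINING],
)
```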
In one possible design, the identification module 41 is specifically configured to:
perform semantic recognition on the data to be processed, to obtain semantic features of the data to be processed; and
perform, according to the preset dimensions, semantic matching on each semantic feature respectively, to obtain the dimension features of the data to be processed.
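As a non-limiting illustration, the two steps above can be sketched as follows. The description does not specify the semantic recognition or matching technique, so a whitespace tokenizer and a hand-written lexicon stand in for them here; `LEXICON`, `recognize_semantics` and `match_dimensions` are hypothetical names.

```python
# Hypothetical lexicon mapping a semantic feature (here, a token) to a
# dimension feature, one table per preset dimension.
LEXICON = {
    "behavior": {"check": "query", "cancel": "cancel",
                 "change": "change", "subscribe": "handle"},
    "field": {"data": "data plan", "broadband": "broadband"},
    "purpose": {"left": "service remaining", "used": "service consumption"},
}


def recognize_semantics(text):
    """Stand-in for semantic recognition: lowercase tokens, punctuation stripped."""
    return [word.strip(".,?!") for word in text.lower().split()]


def match_dimensions(semantic_features):
    """Match each semantic feature against every preset dimension."""
    result = {dimension: [] for dimension in LEXICON}
    for dimension, vocabulary in LEXICON.items():
        for feature in semantic_features:
            value = vocabulary.get(feature)
            if value is not None and value not in result[dimension]:
                result[dimension].append(value)
    return result


features = match_dimensions(recognize_semantics("Check how much data I have left"))
print(features)
```

In a real system the tokenizer and lexicon would presumably be replaced by a trained semantic model; the sketch only shows the data flow from semantic features to per-dimension features.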
In one possible design, the labeling module 42 may be configured to:
filter the multiple dimension features of the data to be processed, to obtain target dimension features of the data to be processed; and
label the data to be processed by using the target dimension features.
In another possible design, the labeling module 42 may be specifically configured to:
output the multiple dimension features of the data to be processed;
obtain operation information of a user for the multiple dimension features; and
obtain the dimension features indicated by the operation information, as the target dimension features.
In another possible design, the number of target dimension features of any dimension is less than or equal to the number threshold of that dimension.
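As a non-limiting illustration, the filtering designs above can be combined in one sketch. The function name, the dictionary layout, and the choice to truncate a dimension's features when they exceed its threshold are assumptions; the description states only that the count must not exceed the threshold.

```python
def filter_features(candidates, thresholds, selected=None):
    """Reduce candidate dimension features to target dimension features.

    candidates: dict mapping dimension -> list of identified features.
    thresholds: dict mapping dimension -> maximum number of target features.
    selected:   optional dict mapping dimension -> features chosen by the
                user (the "operation information"); None keeps all candidates.
    """
    targets = {}
    for dimension, features in candidates.items():
        if selected is not None:
            chosen = selected.get(dimension, ())
            features = [f for f in features if f in chosen]
        limit = thresholds.get(dimension, len(features))
        # Assumed policy: keep the first `limit` features of the dimension.
        targets[dimension] = features[:limit]
    return targets


candidates = {
    "behavior": ["query", "change"],
    "field": ["data plan"],
    "purpose": ["service remaining", "service consumption"],
}
thresholds = {"behavior": 1, "field": 1, "purpose": 2}
user_choice = {
    "behavior": {"query"},
    "field": {"data plan"},
    "purpose": {"service remaining"},
}

print(filter_features(candidates, thresholds))
print(filter_features(candidates, thresholds, user_choice))
```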
In another possible design, the labeling module 42 is specifically configured to:
obtain the identifier corresponding to each target dimension feature; and
label the data to be processed with the identifier.
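As a non-limiting illustration, identifier-based labeling might look like the following. The identifier scheme ("B4", "F1", and so on) and the record layout are hypothetical; the description does not define the form of the identifiers.

```python
DIMENSION_ORDER = ("behavior", "field", "purpose")

# Hypothetical identifier table: one short identifier per dimension feature.
IDENTIFIERS = {
    "behavior": {"handle": "B1", "cancel": "B2", "change": "B3", "query": "B4"},
    "field": {"data plan": "F1", "broadband": "F2"},
    "purpose": {
        "service attribute": "P1",
        "service name": "P2",
        "service remaining": "P3",
        "service consumption": "P4",
    },
}


def label_data(text, target_features):
    """Attach the identifier of each target dimension feature to the data."""
    identifiers = [
        IDENTIFIERS[dimension][feature]
        for dimension in DIMENSION_ORDER
        for feature in target_features.get(dimension, [])
    ]
    return {"text": text, "labels": identifiers}


labeled = label_data(
    "Check how much data I have left",
    {"behavior": ["query"], "field": ["data plan"], "purpose": ["service remaining"]},
)
print(labeled)
```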
In addition, in another possible design, the data processing device 400 further includes:
a determining module (not shown in FIG. 4), configured to determine the interaction intent of the data to be processed according to each dimension feature labeled in the data to be processed;
an obtaining module (not shown in FIG. 4), configured to obtain response data corresponding to the interaction intent; and
an output module (not shown in FIG. 4), configured to output the response data.
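As a non-limiting illustration, the determining, obtaining and output modules can be sketched as a lookup keyed by the combination of labeled dimension features. The reply table, its texts and the fallback reply are hypothetical; the description does not say how response data is stored or retrieved.

```python
# Hypothetical response table keyed by (behavior, field, purpose).
REPLIES = {
    ("query", "data plan", "service remaining"):
        "Here is your remaining data allowance.",
    ("cancel", "broadband", "service name"):
        "Your broadband cancellation request has been received.",
}
FALLBACK = "Sorry, I did not understand the request."


def determine_intent(labels):
    """Combine the labeled dimension features into one interaction intent."""
    return (labels["behavior"], labels["field"], labels["purpose"])


def get_reply(labels):
    """Obtain the response data that corresponds to the interaction intent."""
    return REPLIES.get(determine_intent(labels), FALLBACK)


reply = get_reply(
    {"behavior": "query", "field": "data plan", "purpose": "service remaining"}
)
print(reply)
```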
In addition, in another possible design, the data processing device 400 further includes:
a training module (not shown in FIG. 4), configured to train a human-computer interaction answer model using the data to be processed that has been labeled with each dimension feature.
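As a non-limiting illustration, labeled data can be fed to a trainer as (text, dimension labels) pairs. The description does not disclose the architecture of the answer model, so a simple token and label co-occurrence counter stands in for it here; the sample texts and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train(samples):
    """Count, per dimension, how often each token co-occurs with each label."""
    counts = defaultdict(lambda: defaultdict(Counter))
    for text, labels in samples:
        for token in text.lower().split():
            for dimension, value in labels.items():
                counts[dimension][token][value] += 1
    return counts


def predict(counts, text):
    """Pick, per dimension, the label with the most token votes."""
    prediction = {}
    for dimension, table in counts.items():
        votes = Counter()
        for token in text.lower().split():
            votes.update(table.get(token, {}))
        if votes:
            prediction[dimension] = votes.most_common(1)[0][0]
    return prediction


# Hypothetical labeled samples: text plus one feature per dimension.
samples = [
    ("check my remaining data",
     {"behavior": "query", "field": "data plan", "purpose": "service remaining"}),
    ("cancel my broadband",
     {"behavior": "cancel", "field": "broadband", "purpose": "service name"}),
]
model = train(samples)
print(predict(model, "cancel broadband"))
```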
In addition, the division of the modules in the data processing device 400 shown in FIG. 4 is merely a division of logical functions. In an actual implementation, the modules may be fully or partially integrated into one physical entity, or may be physically separate. These modules may all be implemented as software invoked by a processing element, or all be implemented in hardware; alternatively, some modules may be implemented as software invoked by a processing element and others in hardware. Furthermore, these modules may be fully or partially integrated together, or may be implemented independently. The processing element described here may be an integrated circuit with signal processing capability. During implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
In addition, an embodiment of the present invention provides a data processing device. Referring to FIG. 5, the data processing device 500 includes:
a memory 510;
a processor 520; and
a computer program;
wherein the computer program is stored in the memory 510 and is configured to be executed by the processor 520 to implement the data processing method described in any implementation of the above embodiments.
In the data processing device 500, there may be one or more processors 520. The processor 520, which may also be referred to as a processing unit, can implement certain control functions. The processor 520 may be a general-purpose processor, a special-purpose processor, or the like. There may be one or more memories 510. The memory 510 stores instructions or intermediate data, and the instructions can be run on the processor 520, so that the data processing device 500 executes the method described in the above method embodiments. Optionally, the memory may also store other related data.
In addition, as shown in FIG. 5, the data processing device 500 is further provided with a transceiver 530 for transmitting data to or communicating with other devices; details are not repeated here.
As shown in FIG. 5, the memory 510, the processor 520 and the transceiver 530 are connected to and communicate with one another through a bus.
In addition, in one possible design, the data processing device involved in the embodiments of the present invention may be an independent device or part of a larger device. For example, the larger device may be a human-computer interaction server or client.
In addition, an embodiment of the present invention provides a readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the method described in any implementation of Embodiment One.
Since each module in this embodiment can perform the method shown in Embodiment One, for parts not described in detail in this embodiment, refer to the related description of Embodiment One.
The technical solution provided by the embodiments of the present invention has at least the following technical effects:
The data processing method and device and the storage medium provided by the present invention perform feature identification on the data to be processed from three dimensions (behavior, field and purpose) to obtain its dimension features. For a single item of data to be processed, this avoids the adverse effect that a maintainer's subjective judgment of the intent would have on the intent annotation result. Moreover, compared with labeling an intent in the form of a short phrase, this solution labels the intent simply by labeling its behavior, field and purpose, which effectively reduces the number of labels, lowers the labor and time costs of the annotation process, and improves annotation throughput.
In addition, this labeling approach places low demands on the system, does not require compressing business capabilities, and is highly flexible. In particular, when the data to be processed contains more content, the solution can obtain multiple intents through combinations of dimension features. This amounts to a dimensionality reduction of the number of intents, which greatly reduces the number of labels overall and significantly improves the throughput of the annotation process.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are indicated by the following claims.

Claims (14)

1. A data processing method, characterized by comprising:
performing feature identification processing on data to be processed according to preset dimensions, to obtain dimension features of the data to be processed, wherein the preset dimensions include a behavior dimension, a field dimension and a purpose dimension; and
labeling the data to be processed according to each identified dimension feature.
2. The method according to claim 1, characterized in that the behavior dimension is used to characterize the processing mode requested by the data to be processed;
wherein the behavior dimension features include at least one of the following: handle, cancel, change and query.
3. The method according to claim 1, characterized in that the field dimension is used to characterize the field to which the business requested by the data to be processed belongs;
wherein the field dimension includes: a business field.
4. The method according to claim 1, characterized in that the purpose dimension is used to characterize information related to the business requested by the data to be processed;
wherein the purpose dimension includes at least one of the following: a service attribute, a service name, a service remaining status and a service consumption status.
5. The method according to any one of claims 1-4, characterized in that the performing feature identification processing on data to be processed according to preset dimensions, to obtain dimension features of the data to be processed, comprises:
performing semantic recognition on the data to be processed, to obtain semantic features of the data to be processed; and
performing, according to the preset dimensions, semantic matching on each semantic feature respectively, to obtain the dimension features of the data to be processed.
6. The method according to any one of claims 1-4, characterized in that the labeling the data to be processed according to each identified dimension feature comprises:
filtering the multiple dimension features of the data to be processed, to obtain target dimension features of the data to be processed; and
labeling the data to be processed by using the target dimension features.
7. The method according to claim 6, characterized in that the filtering the multiple dimension features of the data to be processed, to obtain target dimension features of the data to be processed, comprises:
outputting the multiple dimension features of the data to be processed;
obtaining operation information of a user for the multiple dimension features; and
obtaining the dimension features indicated by the operation information, as the target dimension features.
8. The method according to claim 6, characterized in that the number of target dimension features of any dimension is less than or equal to the number threshold of that dimension.
9. The method according to claim 6, characterized in that the labeling the data to be processed by using the target dimension features comprises:
obtaining the identifier corresponding to each target dimension feature; and
labeling the data to be processed by using the identifier.
10. The method according to any one of claims 1-4, characterized in that the method further comprises:
determining the interaction intent of the data to be processed according to each dimension feature labeled in the data to be processed;
obtaining response data corresponding to the interaction intent; and
outputting the response data.
11. The method according to any one of claims 1-4, characterized in that the method further comprises:
training a human-computer interaction answer model using the data to be processed that has been labeled with each dimension feature.
12. A data processing device, characterized by comprising:
an identification module, configured to perform feature identification processing on data to be processed according to preset dimensions, to obtain dimension features of the data to be processed, wherein the preset dimensions include a behavior dimension, a field dimension and a purpose dimension; and
a labeling module, configured to label the data to be processed according to each identified dimension feature.
13. A data processing device, characterized by comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and is configured to be executed by the processor to implement the method according to any one of claims 1-11.
14. A computer-readable storage medium, characterized in that a computer program is stored thereon,
the computer program being executed by a processor to implement the method according to any one of claims 1-11.
CN201811618506.3A 2018-12-28 2018-12-28 Data processing method and device and storage medium Active CN109840274B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811618506.3A CN109840274B (en) 2018-12-28 2018-12-28 Data processing method and device and storage medium


Publications (2)

Publication Number Publication Date
CN109840274A true CN109840274A (en) 2019-06-04
CN109840274B CN109840274B (en) 2021-11-30

Family

ID=66883437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811618506.3A Active CN109840274B (en) 2018-12-28 2018-12-28 Data processing method and device and storage medium

Country Status (1)

Country Link
CN (1) CN109840274B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant