CN109840274A - Data processing method and device, storage medium - Google Patents
- Publication number
- CN109840274A CN201811618506.3A
- Authority
- CN
- China
- Prior art keywords
- dimension
- pending data
- dimensional characteristics
- data
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The present invention provides a data processing method and device, and a storage medium. The method comprises: performing feature recognition on pending data according to preset dimensions to obtain the dimensional characteristics of the pending data, wherein the preset dimensions include a behavior dimension, a field dimension, and a purpose dimension; and annotating the pending data according to each identified dimensional characteristic. The technical solution provided by the present invention can reduce annotation cost and improve processing capacity.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a data processing method and device, and a storage medium.
Background technique
With the continuous development of artificial intelligence technology, intelligent robots are being used in more and more scenarios to answer users' questions.
In the prior art, a machine learning model is generally trained by machine learning methods in order to produce responses. Before the model is trained, high-quality input data with specific intent features is required. At present, intent annotation is generally performed manually by maintenance personnel, and the annotation approach is generally flat; that is, the maintenance personnel manually identify the semantics and annotate the intent the data expresses.
With the flat annotation method, a single type of service alone has dozens of intents, and as the business expands the number of intents is likely to grow into the thousands, which greatly increases annotation cost and lowers processing capacity.
Summary of the invention
The present invention provides a data processing method and device, and a storage medium, to reduce annotation cost and improve processing capacity.
In a first aspect, the present invention provides a data processing method, comprising:
performing feature recognition on pending data according to preset dimensions to obtain the dimensional characteristics of the pending data, wherein the preset dimensions include a behavior dimension, a field dimension, and a purpose dimension; and
annotating the pending data according to each identified dimensional characteristic.
In a second aspect, the present invention provides a data processing device, comprising:
an identification module, configured to perform feature recognition on pending data according to preset dimensions to obtain the dimensional characteristics of the pending data, wherein the preset dimensions include a behavior dimension, a field dimension, and a purpose dimension; and
a labeling module, configured to annotate the pending data according to each identified dimensional characteristic.
In a third aspect, the present invention provides a data processing device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and is configured to be executed by the processor to implement the method described in the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the method described in the first aspect.
The data processing method and device and the storage medium provided by the present invention start from three dimensions, namely behavior, field, and purpose, and perform feature recognition on the pending data to obtain its dimensional characteristics. For a single piece of pending data, this avoids the adverse effect that a maintenance person's subjective judgment of the intent has on the intent annotation result. Moreover, compared with annotating the intent as a short phrase, this solution only needs to record the behavior, field, and purpose to annotate the intent, which effectively reduces the number of annotations, lowers the labor and time cost of the annotation process, and improves annotation processing capacity.
In addition, this annotation approach places lower demands on the system, does not require the business capability to be cut back, and is more flexible. Especially when the pending data contains a lot of content, this solution can obtain multiple intents by combining dimensional characteristics. This amounts to reducing the dimensionality of the intent count, which greatly reduces the overall number of annotations and significantly improves the processing capacity of the annotation process.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the disclosure.
Fig. 1 is a schematic flowchart of a data processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another data processing method provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of another data processing method provided by an embodiment of the present invention;
Fig. 4 is a functional block diagram of a data processing device provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the physical structure of a data processing device provided by an embodiment of the present invention.
The above drawings show specific embodiments of the present disclosure, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the disclosed concept in any way, but to explain the concept of the disclosure to those skilled in the art by reference to specific embodiments.
Specific embodiment
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the disclosure, as detailed in the appended claims.
The specific application scenario of the present invention is the data preprocessing process in an intelligent response system.
In this scenario, maintenance personnel usually annotate the initial data with intents in advance, and the annotated intents are then used to produce responses, so as to improve the reply accuracy of the response system. The current intent annotation approach is flat; that is, maintenance personnel manually identify the semantics and annotate the intent the data expresses.
For example, if the pending data to be annotated with an intent is "I want to check how much of my data package is left", its corresponding intent is manually judged to be "query data package balance"; if the pending data is "I want to see how much my data package costs", its corresponding intent is manually judged to be "query data package price". Within the scope of the data package service alone, much other information besides "balance" and "price" can be abstracted into intents, and when this is extended to all services the number of intents grows many times over.
That is, the existing intent annotation approach not only increases annotation cost, but may also affect the training result of the machine learning model because more data is introduced.
The data processing method provided by the present invention aims to solve the above technical problems of the prior art and proposes the following approach: design an intent classification system for dialogue systems to solve the annotation and classification problems that arise in engineering practice, and avoid the situation in which unrestricted, illogical classification has disastrous consequences for annotation and thereby constrains the capability of the business system. Specifically, this annotation system starts from the three dimensions of behavior, field, and purpose, and annotates different intents with combinations of the dimensional characteristics of the three dimensions, which effectively reduces the number of annotations and improves processing efficiency.
The technical solution of the present invention, and how it solves the above technical problems, are described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention are described below with reference to the accompanying drawings.
Embodiment one
An embodiment of the present invention provides a data processing method. Referring to Fig. 1, the method includes the following steps:
S102: perform feature recognition on the pending data according to preset dimensions to obtain the dimensional characteristics of the pending data, wherein the preset dimensions include a behavior dimension, a field dimension, and a purpose dimension.
Specifically, in this application scenario, the pending data may be the interaction data input by the user during human-computer interaction.
S104: annotate the pending data according to each identified dimensional characteristic.
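The two steps S102 and S104 can be sketched as follows. This is only a minimal illustration under stated assumptions: the keyword lexicon, the English labels, and the names `LEXICON`, `recognize`, and `annotate` are hypothetical, and simple substring lookup stands in for whatever feature recognition a real system would use.

```python
from dataclasses import dataclass, field

# Hypothetical lexicon of preset dimensional characteristics (assumption).
LEXICON = {
    "behavior": ["query", "handle", "cancel", "change"],
    "field": ["data package", "plan", "tariff", "broadband"],
    "purpose": ["remaining", "price", "usage"],
}


@dataclass
class AnnotatedData:
    text: str
    features: dict = field(default_factory=dict)


def recognize(text: str) -> dict:
    """S102: collect the characteristics found in the text, grouped by dimension."""
    lowered = text.lower()
    return {dim: [w for w in words if w in lowered] for dim, words in LEXICON.items()}


def annotate(text: str) -> AnnotatedData:
    """S104: attach the recognized characteristics to the pending data."""
    return AnnotatedData(text=text, features=recognize(text))


result = annotate("Query the remaining plan allowance")
print(result.features)
```

The annotation is one small record per dimension rather than a free-form intent phrase.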
For ease of understanding, the dimensions involved in the embodiments of the present invention are described below.
First, the behavior (act) dimension characterizes the processing mode requested by the pending data.
Specifically, the behavior dimensional characteristics involved in the embodiments of the present invention may include, but are not limited to, at least one of the following: handle, cancel, change, and query.
For example, if the pending data is the user question "query the remaining plan allowance" in a human-computer interaction scenario, the processing mode the user requests for the remaining plan allowance is "query", so "query" can serve as the dimensional characteristic of the pending data in the behavior dimension (hereinafter, the behavior dimensional characteristic). As another example, if the pending data is the user question "handle a data package" in a human-computer interaction scenario, the processing mode the user requests is "handle", so "handle" can serve as the behavior dimensional characteristic of the pending data.
In addition to the behavior dimensional characteristics above, the embodiments of the present invention may also have other behavior dimensional characteristics, which may include, but are not limited to, at least one of the following: recharge, apply, order, unsubscribe, and consult.
Second, the field dimension characterizes the business field to which the service requested by the pending data belongs. Specifically, the field dimension may include, but is not limited to, the business field.
Specifically, taking the human-computer interaction scenario between a telecom operator and a user as an example, the business field in this scenario may be a specific type of service. For example, the business field may include, but is not limited to: data package, plan, tariff, or broadband.
For example, if the pending data is the user question "query the remaining plan allowance" in a human-computer interaction scenario, the business field of the user's request is "plan", so "plan" can serve as the dimensional characteristic of the pending data in the field dimension (hereinafter, the field dimensional characteristic). As another example, if the pending data is the user question "handle a data package" in a human-computer interaction scenario, the service the user requests is a data-package-related service, so "data package" can serve as the field dimensional characteristic of the pending data.
The purpose dimension characterizes the relevant information about the service that the pending data requests to process.
Specifically, the purpose dimension involved in the embodiments of the present invention may include, but is not limited to, at least one of the following: service attribute, service name, remaining service allowance, and service consumption. The service attribute may include, but is not limited to: service price, service introduction information, and so on.
For example, if the pending data is the user question "query the remaining plan allowance" in a human-computer interaction scenario, the information about the plan that the user asks for is the "remaining allowance"; that is, "remaining allowance" can serve as the dimensional characteristic of the pending data in the purpose dimension (hereinafter, the purpose dimensional characteristic). As another example, if the pending data is the user question "handle a data package" in a human-computer interaction scenario, the service the user requests is handling a data package; in this case the machine side only needs to carry out the handling process, so the purpose dimensional characteristic of the pending data can be recorded as empty or as a designated character.
As the above shows, in a human-computer interaction scenario the purpose dimensional characteristic is meaningful only when the behavior dimensional characteristic is of the "ask" type; when the behavior dimensional characteristic is of the "give" type, the purpose dimensional characteristic can be ignored or left unrecorded. An "ask"-type behavior dimensional characteristic means that, through human-computer interaction, the user side obtains desired information from the machine side. For example, with "How much does my data package cost?", the user hopes to get an answer from the machine side. A "give"-type behavior dimensional characteristic means that, through human-computer interaction, the user supplies an answer to the machine side's question. For example: "System: Where do you want the flight you are buying to depart from? User: Beijing".
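The rule that the purpose is recorded only for "ask"-type behaviors can be sketched as a small record type. The split of behaviors into `ASK_BEHAVIORS` and the names `IntentLabel` and `make_label` are assumptions made for illustration; the text does not enumerate which behaviors belong to which type.

```python
from typing import NamedTuple, Optional


class IntentLabel(NamedTuple):
    behavior: str
    field: str
    purpose: Optional[str]  # None stands for "recorded as empty"


# Assumed set of "ask"-type behaviors (hypothetical; not listed in the text).
ASK_BEHAVIORS = {"query", "consult"}


def make_label(behavior: str, field: str, purpose: Optional[str] = None) -> IntentLabel:
    """Keep the purpose only when the behavior asks the machine side for information."""
    if behavior not in ASK_BEHAVIORS:
        purpose = None
    return IntentLabel(behavior, field, purpose)


ask_label = make_label("query", "plan", "remaining allowance")
give_label = make_label("handle", "data package", "price")
print(ask_label, give_label)
```

A designated placeholder character could be used instead of `None` if the storage format cannot hold empty values.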
Starting from the three dimensions of behavior, field, and purpose, the data processing method provided by the embodiments of the present invention only needs to record the dimensional characteristics under each of these three dimensions; by obtaining and annotating the dimensional characteristics, the user's intent can be annotated accurately.
For ease of understanding, refer to Table 1, which shows both the flat annotation method of the prior art and the data processing method provided by the embodiments of the present invention, which starts from the three dimensions of behavior, field, and purpose.
Table 1
The comparison of the 5 groups of data in Table 1 shows that, when any one of these 5 groups is taken as the pending data, the existing flat intent annotation scheme requires maintenance personnel to judge and annotate the intent phrases manually, whereas the embodiments of the present invention only need to record three dimensional characteristics. This avoids the adverse effect that a maintenance person's subjective judgment of the intent has on the intent annotation result. Moreover, compared with annotating the intent as a short phrase, this solution only needs to record the behavior, field, and purpose to annotate the intent, which effectively reduces the number of annotations, lowers the labor and time cost of the annotation process, and improves annotation processing capacity.
In addition, when the volume of pending data is large, the solution provided by the embodiments of the present invention performs even better. Specifically, if the pending data involves 5 "behaviors", 20 "fields", and 40 "purposes", the existing flat annotation method would have 5*20*40=4000 intents to annotate. In contrast, this solution only needs to annotate 5+20+40=65 dimensional characteristics, which reduces the annotation cost by about 98% compared with the existing flat annotation method.
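The arithmetic of this example can be checked directly. With 5 behaviors, 20 fields, and 40 purposes, the product 5 × 20 × 40 evaluates to 4000 flat intents, against 65 dimensional characteristics; the variable names below are illustrative only.

```python
n_behavior, n_field, n_purpose = 5, 20, 40

# Flat annotation: every combination is its own intent label.
flat_intents = n_behavior * n_field * n_purpose

# Dimensional annotation: each characteristic is labelled once per dimension.
dimensional_characteristics = n_behavior + n_field + n_purpose

reduction = 1 - dimensional_characteristics / flat_intents
print(flat_intents, dimensional_characteristics, f"{reduction:.1%}")
```

The saving grows with the size of each dimension, because the flat count is multiplicative while the dimensional count is additive.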
That is, under the same manpower and engineering-capability constraints, the existing flat annotation method usually chooses to shrink the business system and weaken the business capability. With the data processing method provided by the embodiments of the present invention, however, the number of business capabilities covered can be increased significantly, or even multiplied, while manpower and engineering capability stay the same. In other words, the technical solution provided by the embodiments of the present invention can obtain multiple intents merely by combining dimensional characteristics. This amounts to reducing the dimensionality of the intent count, which greatly reduces the overall number of annotations and significantly improves the processing capacity of the annotation process.
Based on the foregoing design, the embodiments of the present invention further provide specific implementations of the foregoing steps.
Specifically, for the dimensional characteristic recognition process in S102, refer to the approach shown in Fig. 2:
S1022: perform semantic recognition on the pending data to obtain the semantic features of the pending data.
Before semantic recognition, the pending data may also be preprocessed, and semantic recognition is then performed on the preprocessed pending data. Specifically, the preprocessing may include, but is not limited to, at least one of the following: word segmentation, speech-to-text conversion, and so on.
In addition, the embodiments of the present invention do not specifically limit the semantic recognition algorithm. In a specific implementation, it may be a short-sentence recognition algorithm, a long-sentence recognition algorithm, or a model-based recognition algorithm.
S1024: according to the preset dimensions, perform semantic matching on each semantic feature to obtain the dimensional characteristics of the pending data.
In this step, the semantic similarity between each semantic feature and each preset dimensional characteristic can be obtained and compared with a preset similarity threshold. For any semantic feature and any dimensional characteristic, if the semantic similarity between them reaches the similarity threshold, the dimensional characteristic is determined to match the semantic feature and can be taken as one dimensional characteristic of the pending data. Conversely, if the semantic similarity does not reach (is lower than) the similarity threshold, the two are determined not to match, the semantic feature can be discarded, and the next round of judgment proceeds.
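The threshold comparison in S1024 can be sketched as below. The text does not specify how semantic similarity is computed, so this sketch substitutes character-level similarity from Python's `difflib.SequenceMatcher` purely as a stand-in; the preset characteristics, the threshold value 0.8, and the function names are likewise assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical preset dimensional characteristics per dimension (assumption).
PRESET = {
    "behavior": ["query", "handle", "cancel", "change"],
    "field": ["data package", "plan", "broadband"],
    "purpose": ["remaining allowance", "price"],
}
SIMILARITY_THRESHOLD = 0.8  # assumed value


def similarity(a: str, b: str) -> float:
    """Stand-in for semantic similarity: character-level ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def match_features(semantic_features):
    """Keep each preset characteristic whose similarity reaches the threshold."""
    matched = {dim: [] for dim in PRESET}
    for sem in semantic_features:
        for dim, candidates in PRESET.items():
            for cand in candidates:
                if similarity(sem, cand) >= SIMILARITY_THRESHOLD and cand not in matched[dim]:
                    matched[dim].append(cand)
    return matched


matched = match_features(["query", "plans", "hello"])
print(matched)
```

Semantic features that reach the threshold for no preset characteristic, such as "hello" here, simply contribute nothing, which corresponds to discarding them.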
In addition, in a specific implementation, to further improve processing efficiency, the semantic features obtained in S1022 may be screened before the semantic matching in this step, removing the semantic features unrelated to behavior, field, and purpose, so as to reduce the data volume of the subsequent matching process.
With any of the foregoing implementations, each dimensional characteristic of the pending data can be obtained.
The embodiments of the present invention further provide ways of annotating the pending data.
It should be noted that "annotating the pending data" in the embodiments of the present invention means establishing an association between the annotation and the pending data. It is not limited to the implementation in which the pending data is modified directly; it can also be implemented by storing the pending data and the annotation data separately and establishing an association between the two.
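Storing the pending data and the annotation separately, linked by an association, might look like the following sketch. The two in-memory dictionaries and the shared `data_id` key are assumptions chosen for illustration; any storage with a common key would serve the same purpose.

```python
pending_store = {}     # data_id -> original pending data, never modified
annotation_store = {}  # data_id -> annotation (dimensional characteristics)


def annotate_by_association(data_id: str, text: str, features: dict) -> None:
    """Store data and annotation separately; the shared data_id is the association."""
    pending_store[data_id] = text
    annotation_store[data_id] = dict(features)


def lookup(data_id: str):
    """Follow the association from a data id to both the data and its annotation."""
    return pending_store[data_id], annotation_store[data_id]


annotate_by_association(
    "d1",
    "query the remaining plan allowance",
    {"behavior": "query", "field": "plan", "purpose": "remaining allowance"},
)
print(lookup("d1"))
```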
In a specific annotation, the dimensional characteristics identified in step S102 can be used directly to annotate the pending data. Alternatively, the multiple dimensional characteristics identified in step S102 can be filtered to obtain the target dimensional characteristics of each dimension, and the target dimensional characteristics of each dimension are then used to annotate the pending data.
If annotation is performed with filtering, the filtering can be done by manual selection or according to a preset algorithm.
In one possible implementation scenario, refer to Fig. 3; the method includes:
S1042: output the multiple dimensional characteristics of the pending data.
In one implementation, the dimensional characteristics can be output sequentially by the dimension to which they belong, or output in separate columns. Presenting the dimensional characteristics grouped in this way makes it easier and clearer for the user to select among the output dimensional characteristics, and is more user-friendly.
S1044: obtain the user's operation information for the multiple dimensional characteristics.
In a practical application scenario, the user here may be a maintenance person, or the user side participating in the human-computer interaction.
S1046: obtain the dimensional characteristics indicated by the operation information as the target dimensional characteristics.
That is, the user's selection among the output dimensional characteristics is obtained, and the dimensional characteristics selected by the user are taken as the target dimensional characteristics.
S1048: annotate the pending data using the target dimensional characteristics.
The implementation shown in Fig. 3 hands the final decision on the target dimensional characteristics to the user. On the user side, the user can select the dimensional characteristics closest to what they intend to express; on the operator side, the operator can combine subjective judgment with this solution's objective screening to further filter out the dimensional characteristics closest to what the user side intends to express. That is, this further interaction helps bring the dimensional characteristics closer to the true intent; in other words, it helps improve the accuracy of the dimensional characteristics.
In addition, filtering can also be implemented with a preset filtering algorithm.
For example, in one possible implementation scenario, filtering can be based on the number of occurrences of each dimensional characteristic. For example, the dimensional characteristics in each dimension can be sorted by number of occurrences, and the one or more top-ranked dimensional characteristics are taken as the target dimensional characteristics; alternatively, the dimensional characteristics whose number of occurrences is greater than a preset count threshold are taken as the target dimensional characteristics.
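Both occurrence-based filters can be sketched with `collections.Counter`. The function names and the sample data are hypothetical; the input is assumed to be the list of characteristics recognized within a single dimension, with repeats.

```python
from collections import Counter


def top_ranked(characteristics, k):
    """Sort by number of occurrences and keep the k top-ranked characteristics."""
    return [name for name, _ in Counter(characteristics).most_common(k)]


def above_count_threshold(characteristics, count_threshold):
    """Keep characteristics that occur more often than the preset count threshold."""
    counts = Counter(characteristics)
    return [name for name, n in counts.items() if n > count_threshold]


behavior_hits = ["query", "query", "handle", "query", "cancel", "handle"]
print(top_ranked(behavior_hits, 2))
print(above_count_threshold(behavior_hits, 1))
```

The same functions would be applied once per dimension, so each dimension gets its own target characteristics.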
In addition, in any of the foregoing filtering scenarios, a quantity threshold for the target dimensional characteristics of each dimension can further be set; that is, the number of target dimensional characteristics of any dimension is guaranteed to be less than or equal to the quantity threshold of that dimension. This implementation further limits the number of target dimensional characteristics finally annotated, which helps keep the number of annotations within a preset range.
Specifically, the quantity thresholds of the dimensions may be the same or different, and are set as needed. For example, in one possible design, the quantity threshold of the behavior dimensional characteristics may be set to 5, and the quantity threshold of the field dimensional characteristics may be set to 20.
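A per-dimension quantity threshold can be sketched as a simple cap on an already ranked list. The values 5 and 20 come from the example above; the value for the purpose dimension and the names `QUANTITY_THRESHOLDS` and `cap_targets` are assumptions.

```python
# Thresholds for behavior and field follow the example; purpose is an assumed value.
QUANTITY_THRESHOLDS = {"behavior": 5, "field": 20, "purpose": 40}


def cap_targets(ranked_by_dimension, thresholds=QUANTITY_THRESHOLDS):
    """Keep at most thresholds[dim] target characteristics in each dimension."""
    return {
        dim: ranked[: thresholds[dim]]
        for dim, ranked in ranked_by_dimension.items()
    }


ranked = {
    "behavior": ["query", "handle", "cancel", "change", "order", "consult"],
    "field": ["plan", "data package"],
    "purpose": [],
}
capped = cap_targets(ranked)
print(capped)
```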
In addition to using manual selection or preset-algorithm filtering alone, the foregoing filtering step can also be implemented by combining the two.
For example, in one implementation scenario, the dimensional characteristics can first be filtered by the preset algorithm so that the filtered dimensional characteristics of each dimension fall within the quantity threshold of that dimension, and then the output and subsequent steps shown in Fig. 3 are performed to annotate the pending data. In this implementation, the candidates output in S1042 are within the aforementioned quantity thresholds, and the user only needs to make a selection in each of the three dimensions to determine the target dimensional characteristics.
When the pending data is annotated with any of the foregoing implementations, the annotation can use the dimensional characteristics (or target dimensional characteristics) directly, or use the identifiers corresponding to the dimensional characteristics (or target dimensional characteristics).
Still taking the method shown in Fig. 3 as an example, in another possible design, the annotation step S1048 can also be implemented as follows:
obtain the identifier corresponding to each target dimensional characteristic;
annotate the pending data using the identifiers of the target dimensional characteristics.
The identifier can take many forms: a number, a letter, a character string, and so on. Taking the behavior dimensional characteristics as an example, the identifier corresponding to "handle" can be set to C, the identifier corresponding to "cancel" to D, the identifier corresponding to "change" to U, and the identifier corresponding to "query" to R.
In addition, the embodiments of the present invention do not specifically limit the correspondence between target dimensional characteristics and identifiers: the two may correspond one-to-one, or one-to-many (or many-to-one). Still taking the behavior dimensional characteristics as an example, C may correspond to multiple behavior dimensional characteristics, such as handle, recharge, apply, and order; similarly, D may correspond to multiple behavior dimensional characteristics: cancel and unsubscribe; R may correspond to multiple behavior dimensional characteristics: query and consult; and U may still correspond to a single behavior dimensional characteristic: change.
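The many-to-one correspondence between behavior dimensional characteristics and the identifiers C, D, U, and R can be written as a lookup table. The grouping follows the example above; the English labels and the names `IDENTIFIER_OF` and `to_identifier` are illustrative assumptions.

```python
# Many-to-one mapping from behavior characteristics to identifiers,
# following the grouping given in the example.
IDENTIFIER_OF = {
    "handle": "C", "recharge": "C", "apply": "C", "order": "C",
    "cancel": "D", "unsubscribe": "D",
    "change": "U",
    "query": "R", "consult": "R",
}


def to_identifier(behavior: str) -> str:
    """Return the identifier used to annotate a behavior characteristic."""
    return IDENTIFIER_OF[behavior]


print([to_identifier(b) for b in ["order", "unsubscribe", "change", "consult"]])
```

Grouping several characteristics under one identifier keeps the label set small while the original wording remains available in the pending data itself.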
With any of the above approaches, the intent of the pending data can be annotated. The annotation approach is simple and feasible, the number of annotations can be kept under reasonable control, and the approach is highly scalable and flexible.
To explain this solution more clearly, the embodiments of the present invention give at least the following two feasible applications:
First, the annotated intents can be applied to the dialogue scenario in human-computer interaction. Specifically, the method may further include the following steps:
determine the interaction intent of the pending data according to the dimensional characteristics annotated on the pending data;
obtain the reply data corresponding to the interaction intent;
output the reply data.
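These three steps can be sketched as a lookup keyed by the combination of annotated dimensional characteristics. The reply table, the fallback text, and the function names are hypothetical; a real system would obtain reply data from its own business back end.

```python
# Hypothetical reply table keyed by (behavior, field, purpose).
REPLIES = {
    ("query", "plan", "remaining allowance"): "Here is the remaining allowance of your plan.",
    ("query", "data package", "price"): "Here is the price of your data package.",
    ("handle", "data package", None): "Your data package request has been submitted.",
}
FALLBACK = "Sorry, I did not understand the request."


def interaction_intent(features: dict) -> tuple:
    """Determine the intent as the combination of the three annotated characteristics."""
    return (features.get("behavior"), features.get("field"), features.get("purpose"))


def respond(features: dict) -> str:
    """Obtain and return the reply data that corresponds to the interaction intent."""
    return REPLIES.get(interaction_intent(features), FALLBACK)


print(respond({"behavior": "query", "field": "plan", "purpose": "remaining allowance"}))
```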
This helps improve dialogue efficiency in human-computer interaction scenarios and the reply efficiency of the machine side.
Second, the annotated intents can be applied to the training of a response model in human-computer interaction. Specifically, the method may further include the following step:
train a human-computer interaction response model using the pending data annotated with the dimensional characteristics.
That is, the pending data annotated with the dimensional characteristics is used as the initial data or initial samples of the human-computer interaction response model to complete its training. In this implementation scenario, this solution can effectively extract the dimensional characteristics from the pending data, which helps improve the accuracy of the human-computer interaction response model.
It can be understood that some or all of the steps or operations in the above embodiments are only examples, and the embodiments of the present application may also perform other operations or variations of the operations. In addition, the steps may be performed in an order different from that presented in the above embodiments, and not all of the operations in the above embodiments necessarily need to be performed.
The technical solution provided by the embodiments of the present invention has at least the following technical effects:
The data processing method and device and the storage medium provided by the present invention start from three dimensions, namely behavior, field, and purpose, and perform feature recognition on the pending data to obtain its dimensional characteristics. For a single piece of pending data, this avoids the adverse effect that a maintenance person's subjective judgment of the intent has on the intent annotation result. Moreover, compared with annotating the intent as a short phrase, this solution only needs to record the behavior, field, and purpose to annotate the intent, which effectively reduces the number of annotations, lowers the labor and time cost of the annotation process, and improves annotation processing capacity.
In addition, this annotation approach places lower demands on the system, does not require the business capability to be cut back, and is more flexible. Especially when the pending data contains a lot of content, this solution can obtain multiple intents by combining dimensional characteristics. This amounts to reducing the dimensionality of the intent count, which greatly reduces the overall number of annotations and significantly improves the processing capacity of the annotation process.
Embodiment two
Based on the data processing method provided by Embodiment one, the embodiments of the present invention further provide a device embodiment that implements the steps and methods of the above method embodiment.
An embodiment of the present invention provides a data processing device. Referring to Fig. 4, the data processing device 400 comprises:
an identification module 41, configured to perform feature recognition on pending data according to preset dimensions to obtain the dimensional characteristics of the pending data, wherein the preset dimensions include a behavior dimension, a field dimension, and a purpose dimension;
a labeling module 42, configured to annotate the pending data according to each identified dimensional characteristic.
In the embodiment of the present invention, the behavior dimension characterizes the processing mode requested by the pending data;
wherein the behavior dimension features include at least one of the following: handling, cancelling, changing and querying.
In the embodiment of the present invention, the field dimension characterizes the field to which the business that the pending data requests to be processed belongs;
wherein the field dimension includes: business field.
In the embodiment of the present invention, the purpose dimension characterizes information related to the business that the pending data requests to be processed;
wherein the purpose dimension includes at least one of the following: business attribute, business name, business remaining status and business consumption status.
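The three dimensions above can be pictured as a small data model. The following is a minimal sketch, not the implementation of the embodiment; the class names, member names and the example field value are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Behavior(Enum):
    """Behavior dimension: the processing mode requested by the pending data."""
    HANDLE = "handle"
    CANCEL = "cancel"
    CHANGE = "change"
    QUERY = "query"


class Purpose(Enum):
    """Purpose dimension: information related to the requested business."""
    BUSINESS_ATTRIBUTE = "business_attribute"
    BUSINESS_NAME = "business_name"
    BUSINESS_REMAINING = "business_remaining"
    BUSINESS_CONSUMPTION = "business_consumption"


@dataclass(frozen=True)
class DimensionFeatures:
    """Dimension features identified for one item of pending data."""
    behavior: Behavior
    field: str  # business field; free text because no concrete values are listed
    purpose: Purpose


# Hypothetical example: a user asking how much of a data plan is left.
example = DimensionFeatures(Behavior.QUERY, "data_plan", Purpose.BUSINESS_REMAINING)
```

Keeping the field as free text is a design choice of this sketch only, since the text names the business field without enumerating values.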
In one possible design, the identification module 41 is specifically configured to:
perform semantic recognition on the pending data to obtain semantic features of the pending data; and
perform, according to the preset dimensions, semantic matching on each semantic feature to obtain the dimension features of the pending data.
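The two steps of the identification module can be sketched as follows. This is a simplified illustration that assumes a hypothetical keyword lexicon and uses literal phrase lookup as a stand-in for semantic recognition; a real system would likely rely on a semantic model, and none of the names below come from the embodiment.

```python
# Hypothetical lexicon: phrase -> dimension feature, grouped by preset dimension.
LEXICON = {
    "behavior": {"cancel": "cancel", "change": "change", "check": "query", "subscribe": "handle"},
    "field": {"data plan": "data_plan", "broadband": "broadband"},
    "purpose": {"left": "remaining", "remaining": "remaining", "used": "consumption"},
}


def recognize_semantic_features(text: str) -> list[str]:
    """Stand-in for semantic recognition: return the lexicon phrases found in the text."""
    lowered = text.lower()
    phrases = {phrase for table in LEXICON.values() for phrase in table}
    return sorted(p for p in phrases if p in lowered)


def match_dimensions(semantic_features: list[str]) -> dict[str, set[str]]:
    """Match each semantic feature against each preset dimension."""
    result = {dimension: set() for dimension in LEXICON}
    for feature in semantic_features:
        for dimension, table in LEXICON.items():
            if feature in table:
                result[dimension].add(table[feature])
    return result


features = match_dimensions(
    recognize_semantic_features("Check how much of my data plan is left")
)
```

Here `features` holds one set of dimension features per preset dimension, which is the input the labeling module works on.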
In one possible design, the labeling module 42 may be configured to:
filter the multiple dimension features of the pending data to obtain target dimension features of the pending data; and
annotate the pending data using the target dimension features.
In another possible design, the labeling module 42 may be specifically configured to:
output the multiple dimension features of the pending data;
obtain operation information of a user on the multiple dimension features; and
take the dimension features indicated by the operation information as the target dimension features.
In another possible design, the number of target dimension features of any dimension is less than or equal to the number threshold of that dimension.
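The filtering step and the per-dimension number threshold can be sketched together. The function name, the data shapes and the truncation policy are assumptions of this sketch; the text only requires that the number of target features per dimension not exceed the threshold and does not say how the limit is enforced.

```python
def filter_dimension_features(candidates, selected, thresholds):
    """Keep only user-selected features, capped per dimension by a number threshold.

    candidates: dict mapping dimension -> list of identified features (in order)
    selected:   set of (dimension, feature) pairs indicated by the user's operation
    thresholds: dict mapping dimension -> maximum number of target features
    """
    targets = {}
    for dimension, features in candidates.items():
        kept = [f for f in features if (dimension, f) in selected]
        # Assumed policy: keep the first features up to the threshold.
        targets[dimension] = kept[: thresholds[dimension]]
    return targets


candidates = {
    "behavior": ["query", "change"],
    "field": ["data_plan"],
    "purpose": ["remaining", "consumption"],
}
selected = {
    ("behavior", "query"),
    ("field", "data_plan"),
    ("purpose", "remaining"),
    ("purpose", "consumption"),
}
thresholds = {"behavior": 1, "field": 1, "purpose": 1}

targets = filter_dimension_features(candidates, selected, thresholds)
```

An alternative policy would be to reject the selection and ask the user to choose again; the sketch truncates only to stay short.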
In another possible design, the labeling module 42 is specifically configured to:
obtain the identifier corresponding to each target dimension feature; and
label the pending data with the identifiers.
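Labeling with identifiers can be sketched as a table lookup. The identifier values and the record layout below are hypothetical, since the text does not specify an identifier format.

```python
# Hypothetical identifier table; the identifier format is an assumption.
IDENTIFIERS = {
    ("behavior", "query"): "B04",
    ("field", "data_plan"): "F01",
    ("purpose", "remaining"): "P03",
}


def annotate(record: dict, target_features: dict) -> dict:
    """Return a copy of the record with the identifiers of its target features attached."""
    labels = [
        IDENTIFIERS[(dimension, feature)]
        for dimension, features in target_features.items()
        for feature in features
    ]
    return {**record, "labels": labels}


annotated = annotate(
    {"text": "How much of my data plan is left?"},
    {"behavior": ["query"], "field": ["data_plan"], "purpose": ["remaining"]},
)
```

Storing short identifiers rather than feature names keeps the annotation compact and independent of how the features are worded.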
In addition, in another possible design, the data processing device 400 further includes:
a determining module (not shown in FIG. 4), configured to determine the interaction intent of the pending data according to each dimension feature annotated in the pending data;
an obtaining module (not shown in FIG. 4), configured to obtain reply data corresponding to the interaction intent; and
an output module (not shown in FIG. 4), configured to output the reply data.
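The determine, obtain and output steps can be sketched as a lookup keyed by the combination of dimension features. The reply table, the fallback text and the function names are hypothetical and serve only to illustrate how an intent composed of three features might select reply data.

```python
# Hypothetical reply table keyed by the (behavior, field, purpose) combination.
REPLIES = {
    ("query", "data_plan", "remaining"): "Your remaining data allowance is shown on the account page.",
}
FALLBACK_REPLY = "Sorry, I did not understand the request."


def determine_intent(labels: dict) -> tuple:
    """Combine the annotated dimension features into one interaction intent."""
    return (labels["behavior"], labels["field"], labels["purpose"])


def respond(labels: dict) -> str:
    """Look up the reply data for the interaction intent and return it."""
    return REPLIES.get(determine_intent(labels), FALLBACK_REPLY)


reply = respond({"behavior": "query", "field": "data_plan", "purpose": "remaining"})
```

A combination with no entry in the table falls back to a default reply in this sketch; how unmatched intents are handled is not stated in the text.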
In addition, in another possible design, the data processing device 400 further includes:
a training module (not shown in FIG. 4), configured to train a human-computer interaction response model using the pending data annotated with each dimension feature.
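How annotated data might feed a model can be sketched with a deliberately simple stand-in. The text does not specify the model type; the sketch below assumes a token-count voting classifier per dimension, and the sample sentences and labels are invented for illustration.

```python
from collections import Counter, defaultdict


def train(samples):
    """Count, per dimension, how often each token co-occurs with each label."""
    model = defaultdict(lambda: defaultdict(Counter))
    for text, labels in samples:
        for token in text.lower().split():
            for dimension, label in labels.items():
                model[dimension][token][label] += 1
    return model


def predict(model, text):
    """Predict one label per dimension by summing the votes of the tokens."""
    prediction = {}
    for dimension, token_table in model.items():
        votes = Counter()
        for token in text.lower().split():
            votes.update(token_table.get(token, {}))
        if votes:
            prediction[dimension] = votes.most_common(1)[0][0]
    return prediction


# Hypothetical annotated samples: text plus one label per dimension.
samples = [
    ("check my remaining data", {"behavior": "query", "purpose": "remaining"}),
    ("cancel my data plan", {"behavior": "cancel", "purpose": "business_name"}),
]
model = train(samples)
prediction = predict(model, "check remaining balance")
```

Because each dimension is learned separately, the model only has to distinguish a few labels per dimension rather than every combined intent.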
In addition, the division of modules in the data processing device 400 shown in FIG. 4 is merely a division of logical functions. In actual implementation, the modules may be fully or partially integrated into one physical entity, or may be physically separate. The modules may all be implemented as software invoked by a processing element, or all in hardware; alternatively, some modules may be implemented as software invoked by a processing element and others in hardware. Furthermore, the modules may be fully or partially integrated together, or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. During implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
In addition, an embodiment of the present invention provides a data processing device. Referring to FIG. 5, the data processing device 500 includes:
a memory 510;
a processor 520; and
a computer program;
wherein the computer program is stored in the memory 510 and is configured to be executed by the processor 520 to implement the data processing method described in any implementation of Embodiment One.
In the data processing device 500, there may be one or more processors 520. The processor 520 may also be called a processing unit and can implement certain control functions. The processor 520 may be a general-purpose processor, a special-purpose processor, or the like. There may be one or more memories 510. Instructions or intermediate data are stored on the memory 510, and the instructions can be run on the processor 520 so that the data processing device 500 performs the method described in the above method embodiment. Optionally, other related data may also be stored in the memory.
In addition, as shown in FIG. 5, the data processing device 500 is further provided with a transceiver 530 for data transmission or communication with other devices, which is not described in detail here.
As shown in FIG. 5, the memory 510, the processor 520 and the transceiver 530 are connected and communicate through a bus.
In addition, in one possible design, the data processing device involved in the embodiment of the present invention may be a standalone device or part of a larger device. For example, the larger device may be a human-computer interaction server or client.
In addition, an embodiment of the present invention provides a readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the method described in any implementation of Embodiment One.
Since the modules in this embodiment can perform the method shown in Embodiment One, for parts not described in detail in this embodiment, refer to the related description of Embodiment One.
The technical solution provided by the embodiment of the present invention has at least the following technical effects:
The data processing method and device and the storage medium provided by the present invention perform feature identification on pending data from three dimensions, namely behavior, field and purpose, to obtain its dimension features. For a single item of pending data, this avoids the adverse effect that the maintenance personnel's subjective judgment of intent has on the intent annotation result. Moreover, compared with annotating intent in the form of short sentences, this solution only needs simple labels for behavior, field and purpose to annotate the intent, which effectively reduces the number of annotations, lowers the labor cost and time cost of the annotation process, and improves annotation processing capability.
In addition, this annotation method places lower demands on the system, does not require business capability to be compressed, and is more flexible. Particularly when the pending data contains more data content, this solution can obtain multiple intents by combining dimension features. On the whole, this amounts to reducing the dimensionality of the set of intents, which greatly reduces the number of annotations and significantly improves the processing capability of the annotation process.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
Claims (14)
1. A data processing method, characterized by comprising:
performing feature identification processing on pending data according to preset dimensions to obtain dimension features of the pending data, wherein the preset dimensions include a behavior dimension, a field dimension and a purpose dimension; and
annotating the pending data according to each identified dimension feature.
2. The method according to claim 1, characterized in that the behavior dimension characterizes the processing mode requested by the pending data;
wherein the behavior dimension features include at least one of the following: handling, cancelling, changing and querying.
3. The method according to claim 1, characterized in that the field dimension characterizes the field to which the business that the pending data requests to be processed belongs;
wherein the field dimension includes: business field.
4. The method according to claim 1, characterized in that the purpose dimension characterizes information related to the business that the pending data requests to be processed;
wherein the purpose dimension includes at least one of the following: business attribute, business name, business remaining status and business consumption status.
5. The method according to any one of claims 1-4, characterized in that performing feature identification processing on the pending data according to the preset dimensions to obtain the dimension features of the pending data comprises:
performing semantic recognition on the pending data to obtain semantic features of the pending data; and
performing, according to the preset dimensions, semantic matching on each semantic feature to obtain the dimension features of the pending data.
6. The method according to any one of claims 1-4, characterized in that annotating the pending data according to each identified dimension feature comprises:
filtering the multiple dimension features of the pending data to obtain target dimension features of the pending data; and
annotating the pending data using the target dimension features.
7. The method according to claim 6, characterized in that filtering the multiple dimension features of the pending data to obtain the target dimension features of the pending data comprises:
outputting the multiple dimension features of the pending data;
obtaining operation information of a user on the multiple dimension features; and
taking the dimension features indicated by the operation information as the target dimension features.
8. The method according to claim 6, characterized in that the number of target dimension features of any dimension is less than or equal to the number threshold of that dimension.
9. The method according to claim 6, characterized in that annotating the pending data using the target dimension features comprises:
obtaining the identifier corresponding to each target dimension feature; and
annotating the pending data using the identifiers.
10. The method according to any one of claims 1-4, characterized in that the method further comprises:
determining the interaction intent of the pending data according to each dimension feature annotated in the pending data;
obtaining reply data corresponding to the interaction intent; and
outputting the reply data.
11. The method according to any one of claims 1-4, characterized in that the method further comprises:
training a human-computer interaction response model using the pending data annotated with each dimension feature.
12. A data processing device, characterized by comprising:
an identification module, configured to perform feature identification processing on pending data according to preset dimensions to obtain dimension features of the pending data, wherein the preset dimensions include a behavior dimension, a field dimension and a purpose dimension; and
a labeling module, configured to annotate the pending data according to each identified dimension feature.
13. A data processing device, characterized by comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and is configured to be executed by the processor to implement the method according to any one of claims 1-11.
14. A computer-readable storage medium, characterized in that a computer program is stored thereon,
the computer program being executed by a processor to implement the method according to any one of claims 1-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811618506.3A CN109840274B (en) | 2018-12-28 | 2018-12-28 | Data processing method and device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109840274A | 2019-06-04 |
CN109840274B | 2021-11-30 |
Family
ID=66883437
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811618506.3A (Active, CN109840274B) | Data processing method and device and storage medium | 2018-12-28 | 2018-12-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109840274B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN109840274B (en) | 2021-11-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||