CN109840274B - Data processing method and device and storage medium - Google Patents


Info

Publication number
CN109840274B
Authority
CN
China
Prior art keywords
data
dimension
processed
feature
dimensional features
Legal status
Active
Application number
CN201811618506.3A
Other languages
Chinese (zh)
Other versions
CN109840274A (en)
Inventor
刘佳祥
万星
白林楠
张傲
王经委
李芝
孙宇
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811618506.3A
Publication of CN109840274A
Application granted
Publication of CN109840274B
Legal status: Active
Anticipated expiration

Abstract

The invention provides a data processing method and device and a storage medium. The method comprises: performing feature recognition on data to be processed according to preset dimensions to obtain dimension features of the data to be processed, wherein the preset dimensions include a behavior dimension, a domain dimension, and a purpose dimension; and labeling the data to be processed according to the identified dimension features. The technical solution provided by the invention can reduce labeling cost and improve processing capacity.

Description

Data processing method and device and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, and a storage medium.
Background
With the continuous development of artificial intelligence technology, intelligent robots are used in more and more scenarios to answer users' questions.
In the prior art, responses are generally produced by a model trained with machine learning. Before such a model can be trained, high-quality input data with well-defined intent labels is required. At present, intent labeling is generally done manually by maintenance personnel in a flat (tiled) mode: the maintainer identifies the semantics of each utterance and labels the intent it expresses.
With flat labeling, a single service type alone may involve dozens of intents, and as the business expands, the number of intents may grow into the thousands. This greatly increases labeling cost and lowers processing capacity.
Disclosure of Invention
The invention provides a data processing method and device and a storage medium, aiming to reduce labeling cost and improve processing capacity.
In a first aspect, the present invention provides a data processing method, including:
performing feature recognition on data to be processed according to preset dimensions to obtain dimension features of the data to be processed, wherein the preset dimensions include a behavior dimension, a domain dimension, and a purpose dimension; and
labeling the data to be processed according to the identified dimension features.
In a second aspect, the present invention provides a data processing apparatus comprising:
an identification module, configured to perform feature recognition on data to be processed according to preset dimensions to obtain dimension features of the data to be processed, wherein the preset dimensions include a behavior dimension, a domain dimension, and a purpose dimension; and
a labeling module, configured to label the data to be processed according to the identified dimension features.
In a third aspect, the present invention provides a data processing apparatus comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the first aspect.
In a fourth aspect, the invention provides a computer readable storage medium having stored thereon a computer program for execution by a processor to perform the method according to the first aspect.
In the data processing method, device, and storage medium provided by the invention, feature recognition is performed on the data to be processed along three dimensions (behavior, domain, and purpose) to obtain its dimension features. For each item of data to be processed, this avoids the distortion that a maintainer's subjective judgment of intent introduces into the labeling result. Moreover, instead of writing each intent out as a short phrase, an intent is labeled simply by recording its behavior, domain, and purpose, which effectively reduces the number of labels, cuts the labor and time spent on labeling, and improves labeling throughput.
In addition, this labeling approach places low demands on the system, does not require shrinking the business scope covered, and is highly flexible. In particular, when the data to be processed covers a lot of content, many intents can be obtained by combining dimension features, which amounts to reducing the dimensionality of the intent space. The total number of labels therefore drops sharply, and the processing capacity of the labeling process improves significantly.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a schematic flow chart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating another data processing method according to an embodiment of the present invention;
FIG. 3 is a flow chart illustrating another data processing method according to an embodiment of the present invention;
FIG. 4 is a block diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic physical structure diagram of a data processing apparatus according to an embodiment of the present invention.
Certain embodiments of the disclosure are shown in the foregoing drawings and described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The specific application scenario of the invention is a data preprocessing process in an intelligent response system.
In this scenario, maintenance personnel generally label intents for the initial data in advance and use these labeled intents to generate responses, improving the response accuracy of the system. The current intent labeling mode is flat: maintenance personnel manually identify the semantics and label the intent expressed.
For example, if the data to be processed is "check how much balance is left in my data package", a maintainer manually judges its intent to be "query data package balance"; if the data to be processed is "I want to check how much my data package costs", the intent is manually judged to be "query data package price". Within the scope of data packages alone, much more information besides "balance" and "price" can be abstracted as intents, and when this is extended to all services, the number of intents multiplies.
That is, the existing intent labeling method not only raises labeling cost, but the extra data it introduces may also affect the training results of the machine learning model.
The data processing method provided by the invention aims to solve these technical problems with the following approach: design an intent classification system for dialogue systems that resolves classification problems arising in labeling and engineering practice, avoiding both the disastrous effect that unbounded, illogical classification has on labeling and the strain imposed by the limited capacity of the business system. Specifically, the labeling system uses three dimensions (behavior, domain, and purpose), and different intents are labeled by combining the dimension features of these three dimensions, which effectively reduces the number of labels and improves processing efficiency.
The following describes the technical solutions of the present invention and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
Example one
The embodiment of the invention provides a data processing method. Referring to fig. 1, the method includes the following steps:
S102: perform feature recognition on the data to be processed according to preset dimensions to obtain dimension features of the data to be processed, wherein the preset dimensions include a behavior dimension, a domain dimension, and a purpose dimension.
In a human-computer interaction scenario, for example, the data to be processed may be interaction data input by a user.
S104: label the data to be processed according to the identified dimension features.
For ease of understanding, dimensions involved in embodiments of the present invention are described below.
First, the behavior dimension characterizes the kind of processing that the data to be processed requests.
Specifically, the behavior dimension features in embodiments of the present invention may include, but are not limited to, at least one of the following: transact, cancel, change, and query.
For example, if the data to be processed is the user question "query package remaining amount" in a human-computer interaction scenario, the processing the user requests for the remaining package amount is "query", so "query" may be used as the dimension feature of the data in the behavior dimension (hereinafter, the behavior dimension feature). As another example, if the data to be processed is the user question "transact data package", the processing the user requests for the data package is "transact", so "transact" may be used as the behavior dimension feature of the data.
Besides the foregoing, behavior dimension features in embodiments of the present invention may also include, but are not limited to, at least one of the following: recharge, apply for, subscribe, unsubscribe, consult, etc.
Second, the domain dimension characterizes the domain to which the service requested by the data to be processed belongs. Specifically, domain dimension features may include, but are not limited to, business domains.
Taking a human-computer interaction scenario between a telecom operator and a user as an example, a business domain may be a specific service type. For example, business domains may include, but are not limited to: data packages, packages (plans), tariffs, broadband, etc.
For example, if the data to be processed is the user question "query package remaining amount", the business domain targeted by the request is "package", so "package" may be used as the dimension feature of the data in the domain dimension (hereinafter, the domain dimension feature). As another example, if the data to be processed is the user question "transact data package", the user requests a service related to data packages, so "data package" may be used as the domain dimension feature of the data.
Third, the purpose dimension characterizes the information about the requested service that the data to be processed is after.
Specifically, purpose dimension features in embodiments of the present invention may include, but are not limited to, at least one of the following: service attributes, service name, service remaining amount, and service consumption. Service attributes may include, but are not limited to, service price, service introduction, etc.
For example, if the data to be processed is the user question "query package remaining amount", the package information the user asks about is the "remaining amount", so "remaining amount" may be used as the dimension feature of the data in the purpose dimension (hereinafter, the purpose dimension feature). As another example, if the data to be processed is the user question "transact data package", the user requests the data package sign-up service and the machine side only needs to run the sign-up flow, so the purpose dimension feature of the data can be recorded as null or as a designated placeholder character.
As these examples show, in a human-computer interaction scenario the purpose dimension feature is meaningful only when the behavior dimension feature is of the "ask" type; when it is of the "give" type, the purpose dimension feature need not be considered or recorded. An "ask" type behavior means the user obtains expected information from the machine side through the interaction; for example, "How much does my data package cost?" is an interaction whose aim is to get an answer back from the machine side. A "give" type behavior means the user supplies information that matches what the machine side asked for; for example, System: "Which city is the ticket you want to buy departing from?" User: "Beijing."
With the data processing method provided by the embodiments of the present invention, only the dimension features in each of the three dimensions (behavior, domain, and purpose) need to be recorded; identifying and labeling these features is enough to label the user's intent accurately.
For ease of understanding, Table 1 compares the prior-art flat labeling method with the data processing approach based on the three dimensions of behavior, domain, and purpose provided by embodiments of the present invention.
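The three-dimension label described above can be pictured as a small record with an optional purpose field. The sketch below is illustrative only: the class `IntentLabel`, the helper `make_label`, and the choice of `query` and `consult` as the only "ask" type behaviors are assumptions, not definitions from the patent.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical container for one label; field names are illustrative only.
@dataclass(frozen=True)
class IntentLabel:
    behavior: str                  # e.g. "query", "transact", "cancel", "change"
    domain: str                    # e.g. "package", "data package", "broadband"
    purpose: Optional[str] = None  # None stands in for the "null" purpose feature

# Assumption: for this sketch, only these behaviors are treated as "ask" type.
ASK_BEHAVIORS = {"query", "consult"}

def make_label(behavior: str, domain: str, purpose: Optional[str] = None) -> IntentLabel:
    """Build a label, recording the purpose only when the behavior is an "ask" type."""
    if behavior not in ASK_BEHAVIORS:
        purpose = None  # e.g. "transact data package": the machine just runs the flow
    return IntentLabel(behavior, domain, purpose)

q = make_label("query", "package", "remaining amount")
t = make_label("transact", "data package")
print(q)  # IntentLabel(behavior='query', domain='package', purpose='remaining amount')
print(t)  # IntentLabel(behavior='transact', domain='data package', purpose=None)
```

Freezing the dataclass makes labels hashable, so they can later serve directly as dictionary keys or set members.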
TABLE 1 (flat labeling vs. three-dimension labeling) [table images not reproduced]
Comparing the 5 groups of data in Table 1 shows that, for any group used as the data to be processed, the existing flat scheme requires a maintainer to judge and manually write an intent phrase, whereas the embodiment of the present invention only needs to record three dimension features. This avoids the distortion that a maintainer's subjective judgment introduces into the labeling result and, compared with labeling intents as short phrases, labels an intent simply by its behavior, domain, and purpose. The number of labels is thereby reduced, labor and time costs fall, and labeling throughput improves.
In addition, the scheme performs even better as the volume of data to be processed grows. For example, if the data to be processed involves 5 behaviors, 20 domains, and 40 purposes, the existing flat method must label 5 × 20 × 40 = 4000 intents, whereas this scheme only needs to label 5 + 20 + 40 = 65 dimension features, reducing labeling cost by about 98% compared with flat labeling.
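The label-count comparison above is simple arithmetic: flat labeling needs one label per combination (a product), while dimensional labeling needs one label per feature (a sum). A quick check using the counts from the example:

```python
from math import prod

# Feature counts per dimension, taken from the example in the text.
counts = {"behavior": 5, "domain": 20, "purpose": 40}

flat_labels = prod(counts.values())        # every combination is a separate intent
dimensional_labels = sum(counts.values())  # each feature is labeled once
reduction = 1 - dimensional_labels / flat_labels

print(flat_labels, dimensional_labels, f"{reduction:.0%}")  # 4000 65 98%
```

Because the product grows multiplicatively while the sum grows additively, the gap widens as any single dimension gains more features.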
That is, under the same manpower and engineering capacity, the existing flat method usually has to shrink the business system and cover fewer services; with the data processing method provided by the embodiments, the business coverage can be increased significantly, even doubled, without changing manpower or engineering capacity. In other words, multiple intents are obtained simply by combining dimension features, which amounts to reducing the dimensionality of the intent space; the total number of labels drops sharply and the processing capacity of the labeling process improves significantly.
Based on this design, embodiments of the present invention further provide specific implementations of the foregoing steps.
Specifically, the dimension-based recognition in S102 may follow the procedure shown in FIG. 2:
S1022: perform semantic recognition on the data to be processed to obtain its semantic features.
Before semantic recognition, the data to be processed may be preprocessed, and semantic recognition is then performed on the preprocessed data. The preprocessing may include, but is not limited to, at least one of: word segmentation, speech-to-text conversion, etc.
In addition, embodiments of the present invention do not restrict the semantic recognition algorithm; it may be implemented, for example, with a short-sentence recognition algorithm, a long-sentence recognition algorithm, or a model-based recognition algorithm.
S1024: semantically match each semantic feature against the preset dimensions to obtain the dimension features of the data to be processed.
In this step, the semantic similarity between each semantic feature and each preset dimension feature may be computed and compared with a preset similarity threshold. For a given semantic feature and dimension feature, if their similarity reaches the threshold, the two are considered matched, and the dimension feature is taken as one of the dimension features of the data to be processed. Otherwise, if the similarity is below the threshold, the two are considered unmatched, and that pair is discarded before moving on to the next comparison.
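The threshold matching in S1024 can be sketched as a nested loop over semantic features and preset dimension features. This is a minimal sketch under stated assumptions: word-overlap (Jaccard) similarity stands in for a real semantic similarity model, and the `PRESET` vocabulary and the default threshold of 0.6 are hypothetical.

```python
from typing import Dict, List

def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for a real semantic similarity model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Hypothetical preset dimension features; a real system would use a larger vocabulary.
PRESET: Dict[str, List[str]] = {
    "behavior": ["query", "transact", "cancel", "change"],
    "domain": ["package", "data package", "broadband"],
    "purpose": ["remaining amount", "price"],
}

def match_features(semantic_features: List[str], threshold: float = 0.6) -> Dict[str, List[str]]:
    """Keep each preset feature whose similarity to some semantic feature reaches the threshold."""
    matched: Dict[str, List[str]] = {dim: [] for dim in PRESET}
    for sem in semantic_features:
        for dim, candidates in PRESET.items():
            for cand in candidates:
                if jaccard(sem, cand) >= threshold and cand not in matched[dim]:
                    matched[dim].append(cand)
                # Below the threshold, the pair is simply skipped (discarded).
    return matched

print(match_features(["query", "package", "remaining amount"]))
# {'behavior': ['query'], 'domain': ['package'], 'purpose': ['remaining amount']}
```

With a lower threshold such as 0.5, "package" would also match "data package", which shows why later filtering of multiple candidates per dimension can be useful.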
In addition, to further improve efficiency, the semantic features obtained in S1022 may be screened before the matching in this step, removing those unrelated to behavior, domain, or purpose. This reduces the amount of data to be matched and improves processing efficiency.
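The pre-matching screening step can be as simple as a vocabulary filter. In this sketch, the `RELEVANT_WORDS` set and the rule "keep a feature if it shares at least one word with the vocabulary" are assumptions chosen for illustration; the patent does not say how relevance is judged.

```python
from typing import Iterable, List, Set

# Hypothetical vocabulary of words related to behavior, domain, or purpose.
RELEVANT_WORDS: Set[str] = {
    "query", "check", "transact", "cancel", "package", "data", "broadband", "remaining", "price",
}

def screen(semantic_features: Iterable[str]) -> List[str]:
    """Drop semantic features that share no word with the relevant vocabulary."""
    return [f for f in semantic_features if set(f.lower().split()) & RELEVANT_WORDS]

print(screen(["please", "check", "my data package", "thanks"]))  # ['check', 'my data package']
```

Only the surviving features are then passed to the more expensive similarity comparison.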
Based on any one of the above implementation manners, each dimension feature in the data to be processed can be obtained.
Embodiments of the present invention further provide ways of labeling the data to be processed.
Note that "labeling the data to be processed" in embodiments of the present invention means establishing an association between a label and the data to be processed. It is not limited to directly modifying the data to attach the label; the data and its labels may also be stored separately with an association established between them.
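Labeling by association rather than by modification can be sketched as two separate maps joined by a shared record ID. The `LabelStore` class and its methods are hypothetical names for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class LabelStore:
    """Keeps raw data and labels in separate maps linked by a record ID (hypothetical design)."""
    data: Dict[int, str] = field(default_factory=dict)
    labels: Dict[int, Tuple[str, ...]] = field(default_factory=dict)

    def add(self, record_id: int, text: str) -> None:
        self.data[record_id] = text  # the original text is stored unmodified

    def label(self, record_id: int, *features: str) -> None:
        if record_id not in self.data:
            raise KeyError(f"unknown record {record_id}")
        self.labels[record_id] = features  # the shared ID is the association

store = LabelStore()
store.add(1, "query package remaining amount")
store.label(1, "query", "package", "remaining amount")
print(store.data[1], store.labels[1])
```

Keeping labels apart from the data means a relabeling pass never touches the original records.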
When labeling, the dimension features identified in step S102 may be used directly to label the data to be processed; alternatively, the dimension features identified in step S102 may be filtered to obtain the target dimension features of each dimension, which are then used to label the data.
If filtering is used, it may be done by manual selection or according to a preset algorithm.
In one possible implementation, referring to FIG. 3, step S104 includes:
S1042: output the multiple dimension features of the data to be processed.
In one implementation, each dimension feature may be output together with the dimension it belongs to, or the features may be output in columns, one column per dimension. Grouping by dimension makes it easier and clearer for the user to choose among the output features, which improves usability.
S1044: obtain the user's operation information on the multiple dimension features.
In practice, the user here may be a maintainer or the user side taking part in the human-computer interaction.
S1046: take the dimension features indicated by the operation information as target dimension features.
That is, the user's selection among the output dimension features is obtained, and the selected features are used as target dimension features.
S1048: label the data to be processed with the target dimension features.
The implementation shown in FIG. 3 leaves the final choice of target dimension features to the user. The user side can pick the dimension features closest to what it means to express; on the operator side, a maintainer can combine subjective judgment with the scheme's objective screening to further narrow down to the features closest to what the user side wants to express. This extra interaction brings the dimension features closer to the true meaning, that is, it helps improve their accuracy.
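The FIG. 3 flow (output candidates, collect the user's choice, keep the chosen features) can be sketched with a callback standing in for the user interface. The function `label_interactively` and the callback signature are assumptions; in practice the callback might wrap `input()` or a GUI.

```python
from typing import Callable, Dict, List

# The callback receives a dimension name and its candidates and returns the user's picks.
Choice = Callable[[str, List[str]], List[str]]

def label_interactively(candidates: Dict[str, List[str]], choose: Choice) -> Dict[str, List[str]]:
    """Show candidates per dimension, let the user pick, and keep the picks as target features."""
    targets: Dict[str, List[str]] = {}
    for dim, feats in candidates.items():
        # S1042: output candidates grouped by dimension (one "column" per dimension).
        print(f"{dim}: {', '.join(feats)}")
        # S1044/S1046: the user's operation selects target features; picks not offered are ignored.
        targets[dim] = [f for f in choose(dim, feats) if f in feats]
    return targets  # S1048: these target features become the label

def first_only(dim: str, feats: List[str]) -> List[str]:
    """Toy stand-in for a user who always picks the first candidate."""
    return feats[:1]

result = label_interactively({"behavior": ["query", "consult"], "domain": ["package", "data package"]}, first_only)
```

Validating picks against the offered candidates keeps a mistyped or stale selection from entering the label.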
In addition, filtering can be realized by means of a preset filtering algorithm.
For example, in one possible implementation, filtering may be based on how often each dimension feature occurs: the dimension features in each dimension may be ranked by number of occurrences and the top-ranked one or more taken as target dimension features; or the dimension features whose number of occurrences exceeds a preset threshold may be taken as target dimension features.
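Both occurrence-based filtering rules above map directly onto `collections.Counter`. The function name `filter_by_count` and its parameters are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Optional

def filter_by_count(occurrences: Iterable[str], top_k: Optional[int] = None,
                    min_count: Optional[int] = None) -> List[str]:
    """Keep the top_k most frequent features, or those seen more than min_count times."""
    counts = Counter(occurrences)
    if top_k is not None:
        return [feat for feat, _ in counts.most_common(top_k)]
    if min_count is not None:
        return [feat for feat, n in counts.most_common() if n > min_count]
    return list(counts)

occ = ["query", "query", "consult", "query", "check", "consult"]
print(filter_by_count(occ, top_k=1))      # ['query']
print(filter_by_count(occ, min_count=1))  # ['query', 'consult']
```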
Moreover, in any of the foregoing filtering scenarios, a count threshold may be set for the target dimension features of each dimension, ensuring that the number of target dimension features in any dimension does not exceed that dimension's threshold. This caps the number of target dimension features finally labeled and helps keep the number of labels within a preset range.
The count thresholds of different dimensions may be the same or different and can be set as needed. For example, in one possible design, the count threshold for behavior dimension features may be set to 5 and that for domain dimension features to 20.
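Per-dimension count thresholds amount to truncating each dimension's ranked list. The caps of 5 and 20 come from the example above; the purpose cap of 40 and the names `CAPS` and `apply_caps` are assumptions for illustration.

```python
from typing import Dict, List

# Caps for behavior (5) and domain (20) follow the text; the purpose cap (40) is assumed.
CAPS: Dict[str, int] = {"behavior": 5, "domain": 20, "purpose": 40}

def apply_caps(ranked: Dict[str, List[str]], caps: Dict[str, int] = CAPS) -> Dict[str, List[str]]:
    """Truncate each dimension's ranked target features to at most its cap."""
    return {dim: feats[: caps.get(dim, len(feats))] for dim, feats in ranked.items()}

ranked = {"behavior": [f"b{i}" for i in range(7)], "domain": ["d1"]}
print(apply_caps(ranked))  # {'behavior': ['b0', 'b1', 'b2', 'b3', 'b4'], 'domain': ['d1']}
```

The input lists are assumed to be already ranked (for example, by occurrence count), so truncation keeps the strongest candidates.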
Besides filtering by manual selection or by a preset algorithm alone, the two approaches can also be combined.
For example, in one implementation, the dimension features may first be filtered by a preset algorithm so that each dimension's filtered features fall within that dimension's count threshold, after which the output and subsequent steps shown in FIG. 3 are performed to label the data. In this implementation, the candidates output in S1042 are already within the count thresholds, and the user only needs to choose within each of the three dimensions to determine the target dimension features.
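The combined scheme chains an algorithmic shortlist with a user choice. This sketch assumes the preset algorithm is "most frequent first, up to the dimension's cap"; `combined_filter` and the `keep_all` callback are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

def combined_filter(occurrences: Dict[str, List[str]], caps: Dict[str, int],
                    choose: Callable[[str, List[str]], List[str]]) -> Dict[str, List[str]]:
    """Algorithmic shortlist (most frequent, within the cap), then the user's choice per dimension."""
    result: Dict[str, List[str]] = {}
    for dim, occ in occurrences.items():
        shortlist = [f for f, _ in Counter(occ).most_common(caps[dim])]
        result[dim] = [f for f in choose(dim, shortlist) if f in shortlist]
    return result

def keep_all(dim: str, shortlist: List[str]) -> List[str]:
    """Toy user who accepts the whole shortlist."""
    return shortlist

result = combined_filter(
    {"behavior": ["query", "query", "consult", "consult", "consult", "check"]},
    {"behavior": 2},
    keep_all,
)
print(result)  # {'behavior': ['consult', 'query']}
```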
When the labeling of the data to be processed is realized in any of the foregoing implementation manners, the labeling may be directly performed by using each dimension feature (or target dimension feature), or may also be performed by using an identifier corresponding to each dimension feature (or target dimension feature).
Still taking the method shown in FIG. 3 as an example, in another possible design, step S1048 may be implemented as follows:
obtaining the identifier corresponding to each target dimension feature; and
labeling the data to be processed with these identifiers.
The identifier may be in various forms, and may be a number, a letter, a character string, or the like. Taking the behavior dimension characteristic as an example, the identifier corresponding to "transact" may be set to C, the identifier corresponding to "cancel" may be set to D, the identifier corresponding to "change" may be set to U, and the identifier corresponding to "query" may be set to R.
In addition, embodiments of the present invention do not restrict the correspondence between target dimension features and identifiers: it may be one-to-one, one-to-many, or many-to-one. Taking the foregoing behavior features as an example, C may correspond to several behavior dimension features (transact, recharge, apply for, and subscribe); similarly, D may correspond to cancel and unsubscribe, and R to query and consult, while U corresponds to a single behavior dimension feature, change.
Any of these approaches achieves intent labeling of the data to be processed; the labeling is simple and practical, keeps the number of labels under reasonable control, and offers high extensibility and flexibility.
To illustrate the solution more clearly, embodiments of the present invention also provide at least the following two possible applications:
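The many-to-one identifier mapping for behavior features can be held in a plain dictionary. The mapping mirrors the example in the text; the function name `to_identifiers` and the choice to drop duplicate identifiers are assumptions.

```python
from typing import Dict, List

# Many-to-one mapping from behavior features to identifiers, following the example in the text.
BEHAVIOR_IDS: Dict[str, str] = {
    "transact": "C", "recharge": "C", "apply for": "C", "subscribe": "C",
    "cancel": "D", "unsubscribe": "D",
    "query": "R", "consult": "R",
    "change": "U",
}

def to_identifiers(features: List[str]) -> List[str]:
    """Replace each behavior feature with its identifier, keeping order and dropping duplicates."""
    ids: List[str] = []
    for feat in features:
        ident = BEHAVIOR_IDS[feat]  # raises KeyError for an unmapped feature
        if ident not in ids:
            ids.append(ident)
    return ids

print(to_identifiers(["recharge", "transact", "consult"]))  # ['C', 'R']
```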
First, the labeled intents can be used in dialogue within a human-computer interaction scenario. Specifically, the method may further include the following steps:
determining an interaction intention of the data to be processed according to the dimension characteristics marked in the data to be processed;
acquiring reply data corresponding to the interaction intention;
and outputting the reply data.
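The three dialogue steps above can be sketched as a lookup keyed by the combined dimension features. The `REPLIES` table, its placeholder reply texts, and the fallback message are hypothetical; a real system would fetch reply content from a business backend.

```python
from typing import Dict, Optional, Tuple

Intent = Tuple[str, str, Optional[str]]  # (behavior, domain, purpose)

# Hypothetical reply table; "<balance>" is a placeholder, not real data.
REPLIES: Dict[Intent, str] = {
    ("query", "data package", "remaining amount"): "Your remaining data package balance is <balance>.",
    ("transact", "data package", None): "Starting the data package sign-up flow.",
}

def reply(labels: Dict[str, str], fallback: str = "Sorry, could you rephrase that?") -> str:
    """Combine the labeled dimension features into an intent and look up its reply."""
    intent: Intent = (labels["behavior"], labels["domain"], labels.get("purpose"))
    return REPLIES.get(intent, fallback)

print(reply({"behavior": "transact", "domain": "data package"}))
```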
This helps improve dialogue efficiency in human-computer interaction and the response efficiency of the machine side.
Second, the labeled intents can be used to train a response model for human-computer interaction. Specifically, the method may further include the following step:
training a human-computer interaction response model using the data to be processed labeled with dimension features.
That is, the labeled data serves as the initial data, or initial samples, for training the human-computer interaction response model. In this scenario, the dimension features in the data to be processed are extracted effectively, which helps improve the accuracy of the response model.
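One way to turn labeled data into training samples is to map each dimension's features to class indices, as a model with one classification head per dimension would expect. That model structure is an assumption (the patent does not describe the model), and `build_vocab` and `to_training_samples` are hypothetical helpers.

```python
from typing import Dict, List, Optional, Tuple

DIMS = ["behavior", "domain", "purpose"]

def build_vocab(values: List[Optional[str]]) -> Dict[Optional[str], int]:
    """Map each distinct feature (None included) to a class index in first-seen order."""
    vocab: Dict[Optional[str], int] = {}
    for v in values:
        vocab.setdefault(v, len(vocab))
    return vocab

def to_training_samples(records: List[Tuple[str, Dict[str, Optional[str]]]]
                        ) -> Tuple[List[Tuple[str, List[int]]], Dict[str, Dict[Optional[str], int]]]:
    """Turn (text, labels) records into (text, [behavior_id, domain_id, purpose_id]) samples."""
    vocabs = {d: build_vocab([labels.get(d) for _, labels in records]) for d in DIMS}
    samples = [(text, [vocabs[d][labels.get(d)] for d in DIMS]) for text, labels in records]
    return samples, vocabs

records = [
    ("query package remaining amount", {"behavior": "query", "domain": "package", "purpose": "remaining amount"}),
    ("transact data package", {"behavior": "transact", "domain": "data package", "purpose": None}),
]
samples, vocabs = to_training_samples(records)
print(samples)  # [('query package remaining amount', [0, 0, 0]), ('transact data package', [1, 1, 1])]
```

Treating the null purpose as its own class lets the model learn when no purpose applies.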
It should be understood that some or all of the steps or operations in the above embodiments are merely examples; embodiments of the present application may also perform other operations or variations of them. Moreover, the steps may be performed in an order different from that presented above, and not all of the operations need to be performed.
The technical solutions provided by the embodiments of the invention have at least the following technical effects:
The data processing method, apparatus, and storage medium provided by the invention perform feature recognition on the data to be processed along three dimensions (behavior, domain, and destination) to obtain its dimensional features. For a single piece of data to be processed, this avoids the adverse effect that a maintainer's subjective judgment of the intent would otherwise have on the intent labeling result. Compared with labeling an intent as a short phrase, the intent is labeled simply by labeling its behavior, domain, and destination, which effectively reduces the number of labels, lowers the labor and time costs of labeling, and improves labeling throughput.
Moreover, this labeling approach places low demands on the system, does not require reducing service capacity, and is highly flexible. In particular, when the data to be processed contains a large amount of content, many different intents can be obtained by combining dimensional features, which amounts to reducing the dimensionality of the intent space. The total number of labels is therefore greatly reduced, and the throughput of the labeling process is markedly improved.
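The dimension-reduction argument above is a counting argument. The sketch assumes that every combination of dimension values is a distinct intent. Under that assumption, labeling intents directly needs one label per combination (a product), whereas dimensional labeling needs one label per value (a sum). The behavior and destination sizes below use the four example values listed for each dimension in this document. The domain size of 5 is a hypothetical figure.

```python
from math import prod

# Behavior and destination sizes follow the four example values listed in the text;
# the domain size is a hypothetical figure for illustration.
vocab_sizes = {"behavior": 4, "domain": 5, "destination": 4}


def intent_count(sizes):
    # Labeling whole intents: one label per (behavior, domain, destination) combination.
    return prod(sizes.values())


def label_count(sizes):
    # Dimensional labeling: one label per value in each dimension.
    return sum(sizes.values())


print(intent_count(vocab_sizes), label_count(vocab_sizes))  # prints: 80 13
```

Not every combination is a meaningful intent, so the product is an upper bound. The gap still widens quickly as any one dimension grows.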
Embodiment Two
Based on the data processing method provided in the first embodiment, an embodiment of the present invention further provides an apparatus embodiment for implementing the steps and methods of the above method embodiment.
Referring to fig. 4, a data processing apparatus 400 according to an embodiment of the present invention includes:
an identification module 41, configured to perform feature recognition processing on the data to be processed according to preset dimensions to obtain the dimensional features of the data to be processed, where the preset dimensions include a behavior dimension, a domain dimension, and a destination dimension;
and a labeling module 42, configured to label the data to be processed according to the recognized dimensional features.
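The two-module structure of apparatus 400 can be sketched as a simple composition. This is an illustrative assumption, not the patented implementation. The recognizer is left as a pluggable callable, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Dimension name -> recognized feature values, e.g. {"behavior": ["query"]}.
Features = Dict[str, List[str]]


@dataclass
class IdentificationModule:
    # Hypothetical: any callable mapping raw text to per-dimension features.
    recognize: Callable[[str], Features]

    def identify(self, text: str) -> Features:
        return self.recognize(text)


@dataclass
class LabelingModule:
    def label(self, text: str, features: Features) -> dict:
        # Attach the recognized dimensional features to the sample as its labels.
        return {"text": text, "labels": features}


@dataclass
class DataProcessingApparatus:
    identification: IdentificationModule
    labeling: LabelingModule = field(default_factory=LabelingModule)

    def process(self, text: str) -> dict:
        # Identification first, then labeling with whatever was identified.
        features = self.identification.identify(text)
        return self.labeling.label(text, features)
```

Keeping the recognizer injectable mirrors the text's point that module boundaries are logical. The same labeling module works whether recognition is rule-based or model-based.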
In this embodiment of the invention, the behavior dimension characterizes the type of processing requested by the data to be processed;
the behavior dimension features include at least one of: transacting, canceling, changing, and querying.
In this embodiment of the invention, the domain dimension characterizes the domain to which the service requested by the data to be processed belongs;
the domain dimension includes the business domain.
In this embodiment of the invention, the destination dimension characterizes information related to the service requested by the data to be processed;
the destination dimension includes at least one of: service attribute, service name, service remaining status, and service consumption status.
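The three preset dimensions and their example values can be captured as a small label schema. In this sketch, the behavior and destination values are taken from the text. The concrete business-domain names and the validate helper are hypothetical.

```python
from enum import Enum


class Behavior(Enum):
    # Processing type requested by the data (values listed in the embodiment).
    TRANSACT = "transact"
    CANCEL = "cancel"
    CHANGE = "change"
    QUERY = "query"


class Destination(Enum):
    # Information about the requested service (values listed in the embodiment).
    SERVICE_ATTRIBUTE = "service attribute"
    SERVICE_NAME = "service name"
    SERVICE_REMAINING = "service remaining"
    SERVICE_CONSUMPTION = "service consumption"


# The domain dimension holds the business domain; these names are hypothetical.
DOMAINS = {"data plan", "voice call", "broadband"}


def validate(behavior: str, domain: str, destination: str):
    """Return a typed (Behavior, domain, Destination) triple, or raise ValueError."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain}")
    # Enum lookup by value raises ValueError for values outside the schema.
    return Behavior(behavior), domain, Destination(destination)
```

Using closed enumerations for the listed values keeps the label vocabulary small and checkable, which is what lets the number of labels stay low.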
In one possible design, the identification module 41 is specifically configured to:
perform semantic recognition on the data to be processed to obtain semantic features of the data to be processed;
and perform semantic matching on each semantic feature according to the preset dimensions to obtain the dimensional features of the data to be processed.
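The text does not specify further how semantic recognition and per-dimension semantic matching work. The sketch below substitutes a deliberately simple technique. Word and bigram extraction stands in for semantic recognition, and lookup in a hand-written trigger-phrase lexicon stands in for semantic matching. The lexicon contents and function names are hypothetical, and a real system would likely use a trained semantic model.

```python
import re
from typing import Dict, List

# Hypothetical lexicon: dimension -> feature value -> trigger phrases.
LEXICON: Dict[str, Dict[str, List[str]]] = {
    "behavior": {"query": ["how much", "check"], "cancel": ["cancel", "unsubscribe"]},
    "domain": {"data plan": ["data", "traffic"], "voice call": ["call", "minutes"]},
    "destination": {"service remaining": ["left", "remaining"],
                    "service consumption": ["spent", "used"]},
}


def semantic_features(text: str) -> List[str]:
    # Stand-in for semantic recognition: lowercase words plus adjacent-word bigrams.
    words = re.findall(r"[a-z]+", text.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


def match_dimensions(text: str) -> Dict[str, List[str]]:
    feats = set(semantic_features(text))
    result: Dict[str, List[str]] = {}
    for dim, values in LEXICON.items():
        # Keep every value in this dimension whose trigger phrases appear in the features.
        result[dim] = [v for v, phrases in values.items() if feats & set(phrases)]
    return result
```

Because matching runs separately for each dimension, one input can yield several candidates per dimension. That is why the following designs filter the candidates.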
In one possible design, the labeling module 42 may be configured to:
filter the multiple dimensional features of the data to be processed to obtain the target dimensional features of the data to be processed;
and label the data to be processed using the target dimensional features.
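The claims describe the filtering step only as a "preset algorithm". One plausible choice, assumed here purely for illustration, is to keep candidate features whose recognition confidence reaches a cutoff. The scores and the 0.5 default are hypothetical.

```python
from typing import Dict, List, Tuple

# Dimension -> [(candidate feature, hypothetical confidence score)].
Scored = Dict[str, List[Tuple[str, float]]]


def filter_features(candidates: Scored, min_score: float = 0.5) -> Dict[str, List[str]]:
    """Keep only candidate features whose confidence reaches min_score."""
    return {
        dim: [feat for feat, score in feats if score >= min_score]
        for dim, feats in candidates.items()
    }
```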
In another possible design, the labeling module 42 may be specifically configured to:
output the multiple dimensional features of the data to be processed;
acquire the user's operation information on these dimensional features;
and take the dimensional features indicated by the operation information as the target dimensional features.
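The user-confirmation variant can be sketched as two steps. The candidates are presented as a numbered list, and the user's operation information is read as the list of indices the user keeps. The index-based selection format and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def present(features: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Output the candidate features as a numbered list and return that ordering."""
    options = [(dim, f) for dim, feats in features.items() for f in feats]
    for i, (dim, f) in enumerate(options):
        print(f"[{i}] {dim}: {f}")
    return options


def apply_selection(options: List[Tuple[str, str]],
                    selected: Sequence[int]) -> Dict[str, List[str]]:
    """Treat the user's operation information as the indices of the options to keep."""
    targets: Dict[str, List[str]] = {}
    for i in selected:
        dim, f = options[i]
        targets.setdefault(dim, []).append(f)
    return targets
```

The machine proposes candidates and the annotator only confirms or rejects them. This keeps the final labels under human control while still limiting the number of choices the annotator sees.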
In another possible design, for any dimension, the number of target dimensional features is less than or equal to the number threshold of that dimension.
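One way to enforce a per-dimension number threshold, assumed here for illustration, is to rank each dimension's candidates by a confidence score and keep the top k, where k is that dimension's threshold. The scores, thresholds, and function name are hypothetical.

```python
from typing import Dict, List, Tuple


def cap_per_dimension(candidates: Dict[str, List[Tuple[str, float]]],
                      thresholds: Dict[str, int]) -> Dict[str, List[str]]:
    """Keep at most thresholds[dim] highest-scoring features in each dimension."""
    capped: Dict[str, List[str]] = {}
    for dim, feats in candidates.items():
        # Dimensions without a configured threshold keep all candidates.
        limit = thresholds.get(dim, len(feats))
        ranked = sorted(feats, key=lambda fs: fs[1], reverse=True)
        capped[dim] = [f for f, _ in ranked[:limit]]
    return capped
```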
In another possible design, the labeling module 42 is specifically configured to:
acquire the identifier corresponding to each target dimensional feature;
and label the data to be processed with the identifiers.
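Labeling with identifiers amounts to a lookup from each target feature to its identifier. The identifier codes below (such as B04) are invented for illustration, because the patent does not define an identifier format.

```python
from typing import Dict, List, Tuple

# Hypothetical identifier table: (dimension, feature) -> short identifier code.
IDENTIFIERS: Dict[Tuple[str, str], str] = {
    ("behavior", "query"): "B04",
    ("behavior", "cancel"): "B02",
    ("domain", "data plan"): "D01",
    ("destination", "service remaining"): "P03",
}


def mark(text: str, targets: Dict[str, List[str]]) -> dict:
    """Look up each target feature's identifier and attach the identifiers to the text.

    Raises KeyError if a target feature has no identifier in the table.
    """
    ids = [IDENTIFIERS[(dim, f)] for dim, feats in targets.items() for f in feats]
    return {"text": text, "ids": ids}
```

Storing compact identifiers instead of free-text labels keeps annotations consistent across annotators and easy to aggregate.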
Furthermore, in another possible design, the data processing apparatus 400 further includes:
a determining module (not shown in fig. 4), configured to determine the interaction intent of the data to be processed from the dimensional features labeled on it;
an obtaining module (not shown in fig. 4), configured to obtain reply data corresponding to the interaction intent;
and an output module (not shown in fig. 4), configured to output the reply data.
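The interaction intent is the combination of the labeled dimensions, so reply lookup can be keyed directly on the (behavior, domain, destination) triple. The reply templates, slot names, and function names here are hypothetical. The patent does not specify how reply data is stored or obtained.

```python
from typing import Dict, Optional, Tuple

Intent = Tuple[str, str, str]  # (behavior, domain, destination)

# Hypothetical reply templates keyed by the combined dimensional features.
REPLIES: Dict[Intent, str] = {
    ("query", "data plan", "service remaining"): "You have {remaining} of data left this month.",
    ("cancel", "data plan", "service name"): "Your data plan has been scheduled for cancellation.",
}


def determine_intent(labels: Dict[str, str]) -> Intent:
    # The interaction intent is the combination of the three labeled dimensions.
    return labels["behavior"], labels["domain"], labels["destination"]


def reply_for(labels: Dict[str, str], **slots: str) -> Optional[str]:
    """Return the filled reply for the labeled intent, or None if no reply is defined."""
    template = REPLIES.get(determine_intent(labels))
    return template.format(**slots) if template else None
```

An output module would then simply send the returned string (for example, print it) when it is not None.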
Furthermore, in another possible design, the data processing apparatus 400 further includes:
a training module (not shown in fig. 4), configured to train the human-computer interaction response model using the data to be processed labeled with dimensional features.
The division of modules in the data processing apparatus 400 shown in fig. 4 is only a logical division. In an actual implementation, the modules may be wholly or partly integrated into one physical entity, or they may be physically separate. The modules may all be implemented as software invoked by a processing element, all in hardware, or partly as software invoked by a processing element and partly in hardware. The processing element described here may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each of the above modules, may be completed by an integrated hardware logic circuit in the processing element or by instructions in the form of software.
Furthermore, an embodiment of the present invention provides a data processing apparatus. Referring to fig. 5, the data processing apparatus 500 includes:
a memory 510;
a processor 520; and
a computer program;
wherein the computer program is stored in the memory 510 and configured to be executed by the processor 520 to implement the data processing method according to any one of the above-mentioned embodiments.
The data processing apparatus 500 may have one or more processors 520. These may also be referred to as processing units and may implement certain control functions. The processor 520 may be a general-purpose processor, a special-purpose processor, or the like. There may likewise be one or more memories 510. The memory 510 stores instructions or intermediate data, and the instructions may be executed on the processor 520 so that the data processing apparatus 500 performs the method described in the above method embodiment. Optionally, other related data may also be stored in the memory.
In addition, as shown in fig. 5, the data processing apparatus 500 is further provided with a transceiver 530 for data transmission or communication with other devices; this is not described further here.
As shown in fig. 5, the memory 510, the processor 520, and the transceiver 530 are connected to and communicate with each other via a bus.
Furthermore, in one possible design, the data processing apparatus according to the embodiments of the present invention may be a stand-alone device or part of a larger device, for example a human-computer interaction server or client.
Furthermore, an embodiment of the present invention provides a readable storage medium storing a computer program that, when executed by a processor, implements the method according to any one of the above embodiments.
Since each module in this embodiment can execute the method shown in the first embodiment, refer to the related description of the first embodiment for any part of this embodiment not described in detail.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (11)

1. A data processing method, comprising:
performing feature recognition processing on data to be processed according to a preset dimension to obtain a dimension feature of the data to be processed; wherein the preset dimensions include: a behavior dimension, a domain dimension, and a destination dimension;
filtering the multiple dimensional features of the data to be processed through a preset algorithm to obtain filtered dimensional features, wherein the number of filtered dimensional features for each dimension is within the number threshold range corresponding to that dimension;
outputting the filtered dimensional features;
acquiring operation information of a user for the filtered dimensional features;
acquiring the dimensional features indicated by the operation information as target dimensional features;
and marking the data to be processed by utilizing the target dimension characteristics.
2. The method of claim 1, wherein the behavior dimension is used to characterize a processing manner requested by the data to be processed;
wherein the behavior dimension characteristics comprise at least one of the following: transacting, canceling, changing and querying.
3. The method according to claim 1, wherein the domain dimension is used for characterizing a domain to which a service requested to be processed by the data to be processed belongs;
wherein the domain dimension comprises the business domain.
4. The method according to claim 1, wherein the destination dimension is used to characterize relevant information of a service requested to be processed by the data to be processed;
wherein the destination dimension comprises at least one of: service attribute, service name, service remaining status, and service consumption status.
5. The method according to any one of claims 1 to 4, wherein the performing feature identification processing on the data to be processed according to a preset dimension to obtain a dimension feature of the data to be processed comprises:
performing semantic recognition on the data to be processed to obtain semantic features of the data to be processed;
and performing semantic matching on each semantic feature according to the preset dimensions to obtain the dimensional features of the data to be processed.
6. The method according to claim 1, wherein the labeling the data to be processed by using the target dimension feature comprises:
acquiring identifiers corresponding to the target dimension characteristics;
and marking the data to be processed by using the identifier.
7. The method according to any one of claims 1-4, further comprising:
determining an interaction intention of the data to be processed according to the dimension characteristics marked in the data to be processed;
acquiring reply data corresponding to the interaction intention;
and outputting the reply data.
8. The method according to any one of claims 1-4, further comprising:
and training a human-computer interaction response model by using the data to be processed with the marked dimension characteristics.
9. A data processing apparatus, comprising:
the identification module is used for carrying out feature identification processing on data to be processed according to a preset dimension to obtain a dimension feature of the data to be processed; wherein the preset dimensions include: a behavior dimension, a domain dimension, and a destination dimension;
the marking module is used for filtering the plurality of dimensional features of the data to be processed through a preset algorithm to obtain filtered dimensional features, wherein the number of filtered dimensional features for each dimension is within the number threshold range corresponding to that dimension; outputting the filtered dimensional features; acquiring operation information of a user for the filtered dimensional features; and acquiring the dimensional features indicated by the operation information as target dimensional features;
and the marking module is also used for marking the data to be processed by utilizing the target dimension characteristics.
10. A data processing apparatus, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-8.
11. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-8.
CN201811618506.3A 2018-12-28 2018-12-28 Data processing method and device and storage medium Active CN109840274B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811618506.3A CN109840274B (en) 2018-12-28 2018-12-28 Data processing method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811618506.3A CN109840274B (en) 2018-12-28 2018-12-28 Data processing method and device and storage medium

Publications (2)

Publication Number Publication Date
CN109840274A 2019-06-04
CN109840274B 2021-11-30

Family

ID=66883437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811618506.3A Active CN109840274B (en) 2018-12-28 2018-12-28 Data processing method and device and storage medium

Country Status (1)

Country Link
CN (1) CN109840274B (en)


Also Published As

Publication number Publication date
CN109840274A (en) 2019-06-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant