CN110263161A - Information processing method, apparatus, and device - Google Patents

Information processing method, apparatus, and device

Info

Publication number
CN110263161A
Authority
CN
China
Prior art keywords
resource
label
corpus data
scheduled
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910458461.6A
Other languages
Chinese (zh)
Other versions
CN110263161B (en
Inventor
江少华
钟文亮
符劼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910458461.6A
Publication of CN110263161A
Application granted
Publication of CN110263161B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this specification disclose an information processing method, apparatus, and device. The method comprises: obtaining corpus data corresponding to a predetermined keyword from a predetermined corpus database; classifying the corpus data based on the category label corresponding to the keyword and a predetermined classification model, to determine the category label to which the corpus data belongs, wherein the classification model is a model that classifies text based on a neural network, and the category label is an attribute label of a user to whom a resource is to be issued; and performing phrase extraction on the corpus data of the different category labels to obtain the target phrase corresponding to each category label, so that a first display copy (the short promotional text shown with the resource) of the resource to be issued is generated based on the target phrases.

Description

Information processing method, apparatus, and device
Technical field
This specification relates to the field of computer technology, and in particular to an information processing method, apparatus, and device.
Background
To attract more user attention, merchants run many marketing activities, such as issuing prizes or holding prize draws. To achieve a better marketing effect, a corresponding display copy (a short promotional text) is usually set for each prize. For example, if the prize a user draws is a spend-threshold discount coupon for a certain shop, then the display copy of the prize, such as "Dedicated to you, who keep striving", can be sent to the user together with the information about the coupon.
In marketing activities, the display copy for a prize is usually written manually. That is, for a given prize, the merchant writes one or more display copies by hand, and when a user draws the prize, the display copy is sent to the user together with the information about the prize. However, because the display copy for every prize must be written by hand by the merchant or operations staff, this approach consumes considerable human resources, generates display copy inefficiently, and may yield display copy that performs poorly. A processing scheme that generates display copy more efficiently and better suits the user is therefore needed.
Summary of the invention
The purpose of the embodiments of this specification is to provide an information processing method, apparatus, and device, so as to provide a processing scheme that generates display copy more efficiently and better suits the user.
To achieve the above technical solution, the embodiments of this specification are implemented as follows:
An information processing method provided by the embodiments of this specification comprises:
obtaining corpus data corresponding to a predetermined keyword from a predetermined corpus database;
classifying the corpus data based on the category label corresponding to the keyword and a predetermined classification model, to determine the category label to which the corpus data belongs, wherein the classification model is a model that classifies text based on a neural network, and the category label is an attribute label of a user to whom a resource is to be issued; and
performing phrase extraction on the corpus data of the different category labels to obtain the target phrase corresponding to each category label, so that a first display copy of the resource to be issued is generated based on the target phrases.
Optionally, the method further comprises:
receiving a resource acquisition request from a target user;
determining information about the resource to be issued to the target user;
obtaining, from the category labels corresponding to the target user, a first category label that matches the information about the resource;
obtaining, according to the first category label, the target phrase corresponding to the first category label from the target phrases; and
generating, based on the obtained target phrase, the first display copy of the resource to be sent to the target user.
Optionally, the method further comprises:
obtaining resource corpus data corresponding to a predetermined resource keyword from the predetermined corpus database; and
generating a second display copy of the resource to be issued based on the predetermined resource keyword and the resource corpus data.
Optionally, generating the second display copy of the resource to be issued based on the predetermined resource keyword and the resource corpus data comprises:
inputting the predetermined resource keyword and the resource corpus data into a predetermined Pointer-Generator model to obtain the second display copy of the resource to be issued.
Optionally, the method further comprises:
obtaining sample data corresponding to the predetermined keyword from the predetermined corpus database; and
training the classification model based on the category label corresponding to the keyword and the obtained sample data, to obtain a trained classification model.
Optionally, the method further comprises:
performing a random mask operation on the sample data, so as to randomly replace keywords in the sample data.
Optionally, the method further comprises:
adjusting the quantity of sample data corresponding to each category label, so that the quantity of sample data corresponding to each category label falls within a predetermined quantity threshold range.
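The two optional sample-preparation steps above (random masking of keywords and keeping the per-label sample count within a range) can be sketched as follows. This is a minimal illustration, not the method of the specification: the `[MASK]` token, the masking probability, the helper names `random_mask` and `balance`, and the choice of down-sampling and duplication to reach the range are all assumptions.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical placeholder; the specification names no token


def random_mask(text, keywords, rng, mask_prob=0.5):
    """Replace each keyword found in the text with MASK_TOKEN, with probability mask_prob."""
    for keyword in keywords:
        if keyword in text and rng.random() < mask_prob:
            text = text.replace(keyword, MASK_TOKEN)
    return text


def balance(samples_by_label, min_count, max_count, rng):
    """Keep each label's sample count within [min_count, max_count].

    Oversized labels are down-sampled; undersized (non-empty) labels are
    padded by duplicating randomly chosen samples. Both choices are assumptions.
    """
    balanced = {}
    for label, samples in samples_by_label.items():
        samples = list(samples)
        if len(samples) > max_count:
            samples = rng.sample(samples, max_count)
        elif 0 < len(samples) < min_count:
            extra = [rng.choice(samples) for _ in range(min_count - len(samples))]
            samples = samples + extra
        balanced[label] = samples
    return balanced


rng = random.Random(0)
masked = random_mask("love basketball shoes", ["basketball shoes"], rng, mask_prob=1.0)
balanced = balance({"shoe addict": list(range(10)), "cooking expert": [1]}, 3, 5, rng)
```

One plausible reading of the masking step is that it keeps the classifier from relying only on the literal keyword that was used to retrieve the sample; the specification does not state the motivation.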
Optionally, performing phrase extraction on the corpus data of the different category labels to obtain the target phrase corresponding to each category label comprises:
performing phrase extraction on the corpus data of the different category labels based on a predetermined regular expression, to obtain an extraction result; and
filtering the extraction result to obtain the target phrase corresponding to each category label.
Optionally, filtering the extraction result to obtain the target phrase corresponding to each category label comprises:
performing one or more of text length filtering, IDF score filtering, and NER filtering on the extraction result, to obtain the target phrase corresponding to each category label.
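The three filters named above can be sketched as follows, under stated assumptions. The specification does not say which direction each filter cuts, so this sketch assumes that phrases outside a token-length range are dropped, that phrases whose mean token IDF falls below a threshold are dropped, and that phrases containing a named entity are dropped. The `entity_terms` set is a simple stand-in for a real NER model, and all names and thresholds are hypothetical.

```python
import math


def idf_table(documents):
    """Compute IDF = log(N / document frequency) for whitespace-separated tokens."""
    doc_count = len(documents)
    doc_freq = {}
    for doc in documents:
        for token in set(doc.lower().split()):
            doc_freq[token] = doc_freq.get(token, 0) + 1
    return {token: math.log(doc_count / freq) for token, freq in doc_freq.items()}


def filter_phrases(phrases, idf, min_len=2, max_len=6, min_idf=0.1, entity_terms=()):
    """Apply length, IDF-score, and (stand-in) NER filtering to extracted phrases."""
    kept = []
    for phrase in phrases:
        tokens = phrase.lower().split()
        # Text length filter: drop phrases that are empty, too short, or too long.
        if not tokens or not (min_len <= len(tokens) <= max_len):
            continue
        # IDF score filter: drop phrases made mostly of very common tokens.
        score = sum(idf.get(token, 0.0) for token in tokens) / len(tokens)
        if score < min_idf:
            continue
        # NER filter (stand-in): drop phrases containing a known entity term.
        if any(token in entity_terms for token in tokens):
            continue
        kept.append(phrase)
    return kept


docs = ["love spicy food", "love the park", "love to cook"]
idf = idf_table(docs)
kept = filter_phrases(
    ["love spicy food", "love", "spicy Nike food"], idf, entity_terms={"nike"}
)
```

Chinese corpus data would need a word segmenter in place of `split()`; whitespace tokenization is used here only to keep the sketch self-contained.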
An information processing apparatus provided by the embodiments of this specification comprises:
a corpus acquisition module, configured to obtain corpus data corresponding to a predetermined keyword from a predetermined corpus database;
a classification module, configured to classify the corpus data based on the category label corresponding to the keyword and a predetermined classification model, to determine the category label to which the corpus data belongs, wherein the classification model is a model that classifies text based on a neural network, and the category label is an attribute label of a user to whom a resource is to be issued; and
an extraction module, configured to perform phrase extraction on the corpus data of the different category labels to obtain the target phrase corresponding to each category label, so that a first display copy of the resource to be issued is generated based on the target phrases.
Optionally, the apparatus further comprises:
a resource corpus acquisition module, configured to obtain resource corpus data corresponding to a predetermined resource keyword from the predetermined corpus database; and
a copy generation module, configured to generate a second display copy of the resource to be issued based on the predetermined resource keyword and the resource corpus data.
Optionally, the copy generation module is configured to input the predetermined resource keyword and the resource corpus data into a predetermined Pointer-Generator model to obtain the second display copy of the resource to be issued.
Optionally, the extraction module comprises:
an extraction unit, configured to perform phrase extraction on the corpus data of the different category labels based on a predetermined regular expression, to obtain an extraction result; and
a filtering unit, configured to filter the extraction result to obtain the target phrase corresponding to each category label.
Optionally, the filtering unit is configured to perform one or more of text length filtering, IDF score filtering, and NER filtering on the extraction result, to obtain the target phrase corresponding to each category label.
An information processing device provided by the embodiments of this specification comprises:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to:
obtain corpus data corresponding to a predetermined keyword from a predetermined corpus database;
classify the corpus data based on the category label corresponding to the keyword and a predetermined classification model, to determine the category label to which the corpus data belongs, wherein the classification model is a model that classifies text based on a neural network, and the category label is an attribute label of a user to whom a resource is to be issued; and
perform phrase extraction on the corpus data of the different category labels to obtain the target phrase corresponding to each category label, so that a first display copy of the resource to be issued is generated based on the target phrases.
As can be seen from the technical solutions provided above, the embodiments of this specification obtain corpus data corresponding to a predetermined keyword from a predetermined corpus database, and then classify the corpus data based on the category label corresponding to the keyword and a predetermined classification model to determine the category label to which the corpus data belongs, where the classification model is a model that classifies text based on a neural network and the category label is an attribute label of a user to whom a resource is to be issued. Phrase extraction is then performed on the corpus data of the different category labels to obtain the target phrase corresponding to each category label, so that a first display copy of the resource to be issued is generated based on the target phrases. In this way, the corresponding corpus data is obtained through the keywords corresponding to users' category labels, the corpus data is mapped to the corresponding category labels by the classification model, and the phrases corresponding to each category label are obtained. These phrases can serve as display copy when resources are subsequently issued, so that the display copy better suits the user. Moreover, no manual work is needed to generate the display copy, which reduces the consumption of human resources and improves the efficiency of generating display copy.
Brief description of the drawings
To describe the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments recorded in this specification, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 shows an embodiment of an information processing method of this specification;
Fig. 2 is a schematic diagram of the display copy of a prize of this specification;
Fig. 3 shows another embodiment of an information processing method of this specification;
Fig. 4 is a schematic diagram of the processing principle of a Text-CNN model of this specification;
Fig. 5 shows an embodiment of an information processing apparatus of this specification;
Fig. 6 shows an embodiment of an information processing device of this specification.
Detailed description
The embodiments of this specification provide an information processing method, apparatus, and device.
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this specification without creative effort shall fall within the scope of protection of this specification.
Embodiment one
As shown in Fig. 1, an embodiment of this specification provides an information processing method. The method may be executed by a server, which may be a standalone server or a server cluster composed of multiple servers. The server may be the back-end server for a particular marketing activity, or the back-end server of a financial application, a shopping application, or the like. The method can be used in marketing, for processing such as generating the display copy of a resource (such as a prize) when the resource is issued to a user. The method may specifically include the following steps:
In step S102, corpus data corresponding to a predetermined keyword is obtained from a predetermined corpus database.
The corpus database may be a database containing many different kinds of corpus data, and the corpus data may include text data and the like. A keyword may be a word constructed for a user that characterizes some attribute of that user, such as "digital expert", "cooking expert", "spicy-food lover", or "shoe addict".
In implementation, to attract more user attention, merchants run many marketing activities, such as issuing red envelopes, giving away goods, or holding draws for other prizes. To achieve a better marketing effect, a corresponding display copy (a short promotional text) is usually set for each prize. For example, as shown in Fig. 2, if the prize a user draws is a coupon for a certain shop or product, the display copy of the prize, such as "Dedicated to you, who keep striving", can be sent to the user together with the information about the coupon.
In marketing activities, the display copy for a prize is usually written manually. For example, for a given prize the merchant writes one or more display copies by hand, and when a user draws the prize, a display copy is sent to the user together with the information about the prize; if several display copies have been written for the prize, one of them can be chosen at random and sent. This manual approach has two drawbacks. First, if there is only one display copy, every user who draws the prize sees the same text; because users differ in education level, personality, and so on, they accept a given display copy to different degrees, so some users may dislike it. Second, because the display copy for every prize must be written by hand by the merchant or operations staff, the approach consumes considerable human resources, generates display copy inefficiently, and may yield display copy that performs poorly. A processing scheme that generates display copy more efficiently and better suits the user is therefore needed. To this end, the embodiments of this specification provide a method that implements such a processing scheme, which may specifically include the following:
The merchant or operations staff can determine, according to the actual situation, the resources (such as prizes) used in a marketing activity. The quantity and types of resources can be set according to the actual situation. A resource may be a virtual item, such as a coupon, a red envelope, or points, or a physical item, such as a cup or a digital product (for example, a mobile phone or tablet computer). The information about these resources can be combined into a resource set (such as a prize set). The marketing activity may be carried out for a single line of business, in which case the resources are prizes for that business, or the resources may be prizes for marketing activities carried out for several different lines of business.
The corpus data used to generate display copy can be obtained through multiple channels, which can be set according to the actual situation, for example, data from an online shopping platform, information delivered in the past, and corpus data crawled by a web crawler from external networks (such as self-media, social media, search websites, and creative copywriting websites). The corpus data obtained in this way can serve as a general corpus, and the data set it forms can serve as the corpus database.
User labels (that is, category labels) can be constructed for one or more users according to the actual situation. For example, if a user often buys digital products (such as mobile phones, tablet computers, and cameras), then when the quantity, variety, and frequency of the user's digital product purchases each reach a predetermined threshold, the category label "digital expert" can be set for the user. Other category labels, such as "spicy-food lover", can also be set for the same user. A set of category labels and keywords can be created at the same time. For example, the keywords corresponding to the category label "shoe addict" may include "shoe addict", "buying shoes", "basketball shoes", and "AJ"; the keywords corresponding to "cooking expert" may include "cooking expert" and "cooking"; and the keywords corresponding to "digital expert" may include "digital expert" and "digital".
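The threshold rule for assigning a category label can be sketched as follows. This is a minimal illustration only: the `PurchaseStats` fields, the `qualifies_for_label` helper, the per-month unit for frequency, and the threshold values are hypothetical and are not given in the specification.

```python
from dataclasses import dataclass


@dataclass
class PurchaseStats:
    """Hypothetical per-user purchase statistics for one product category."""
    quantity: int      # number of items bought
    variety: int       # number of distinct product types bought
    frequency: float   # purchases per month (assumed unit)


def qualifies_for_label(stats, min_quantity, min_variety, min_frequency):
    """Return True only when quantity, variety, and frequency each reach their threshold."""
    return (
        stats.quantity >= min_quantity
        and stats.variety >= min_variety
        and stats.frequency >= min_frequency
    )


user_labels = []
digital_stats = PurchaseStats(quantity=12, variety=4, frequency=2.0)
if qualifies_for_label(digital_stats, min_quantity=10, min_variety=3, min_frequency=1.0):
    user_labels.append("digital expert")
```

A user may hold several labels at once, so the same check would be repeated per category with that category's own thresholds.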
Once the set of category labels and keywords has been constructed as above, keywords can be taken from the set and used as label keywords. Corpus data corresponding to the label keywords can then be obtained from the predetermined corpus database. Specifically, the corpus database can be searched for corpus data that contains one or more of the label keywords, and the corpus data found can be used as the corpus data corresponding to the predetermined keyword.
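The keyword lookup in step S102 can be sketched as a substring search over the corpus. This is a minimal sketch under assumptions: the corpus is an in-memory list of strings, `LABEL_KEYWORDS` and `retrieve_corpus` are hypothetical names, and the labels and keywords are English renderings of the examples in the text.

```python
# Hypothetical set of category labels and their keywords, echoing the examples in the text.
LABEL_KEYWORDS = {
    "digital expert": ["digital expert", "digital"],
    "cooking expert": ["cooking expert", "cooking"],
    "shoe addict": ["shoe addict", "buying shoes", "basketball shoes", "AJ"],
}


def retrieve_corpus(corpus, label_keywords):
    """Return (text, label) pairs for every text that contains one of a label's keywords."""
    matches = []
    for text in corpus:
        lowered = text.lower()
        for label, keywords in label_keywords.items():
            if any(keyword.lower() in lowered for keyword in keywords):
                matches.append((text, label))
    return matches


corpus = [
    "New basketball shoes for the weekend game",
    "Cooking at home beats takeout",
    "A quiet walk in the park",
]
pairs = retrieve_corpus(corpus, LABEL_KEYWORDS)
```

The label attached here is only the keyword-derived label; a text that matches keywords of several labels appears once per label, which is one reason a later classification step is useful.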
In step S104, the corpus data is classified based on the category label corresponding to the keyword and a predetermined classification model, to determine the category label to which the corpus data belongs. The classification model is a model that classifies text based on a neural network, and the category label is an attribute label of a user to whom a resource is to be issued.
The classification model may be a model used to classify corpus data, and may be a model that classifies text based on a neural network. The neural network may include multiple layers, such as an input layer, hidden layers, and an output layer, and may be of various types, such as a convolutional neural network, a recurrent neural network, or a deep neural network. If the neural network is a convolutional neural network, its hidden layers may further include convolutional layers, pooling layers, fully connected layers, Inception modules, and so on, and a convolutional layer may include convolution kernels, convolutional layer parameters, an activation function, and so on. The function of a convolutional layer is to extract features from the input data; it may contain multiple convolution kernels, and each element of a convolution kernel corresponds to a weight coefficient and a bias. The convolutional layer parameters include the kernel size, stride, and padding, which together determine the size of the feature map output by the convolutional layer. A category label may be a label of a category used by the classification model when classifying. An attribute label may be a label corresponding to attribute information such as the user's age, gender, or education background, or a label corresponding to a behavioral attribute of the user, such as a label corresponding to attribute information about the user's behavior on a shopping website or while using a payment method.
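As a worked illustration of how kernel size, stride, and padding together determine the output size, the sketch below uses the standard relation floor((n + 2p - k) / s) + 1 for one spatial dimension. The specification does not state this formula; the function name and the example numbers are assumptions for illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length of a convolution along one dimension (no dilation)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# A sequence of 7 positions, kernel of width 3:
no_padding = conv_output_size(7, 3)                       # stride 1, no padding
strided = conv_output_size(7, 3, stride=2, padding=1)     # stride 2, padding 1
```

With stride 1 and no padding the output shrinks by `kernel_size - 1` positions; adding padding of 1 with a kernel of width 3 would keep the length unchanged.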
In implementation, because the corpus data needs to be classified, a classification model is required, and the classification model must be trained before use. To this end, sample data can be obtained in several ways: for example, users can be invited, with rewards, to take part in training the classification model, or relevant data can be purchased from users as sample data. The sample data can then be divided into two parts, one used to train the classification model and the other used to validate it. Specifically, the sample data can be fed to the input layer of the classification model, which passes it to the hidden layers. The hidden layers contain undetermined parameters; equations or systems of equations containing these parameters can be constructed from the sample data and solved to obtain the parameter values, and these values then replace the undetermined parameters in the hidden layers, yielding the trained classification model. To make the classification model more accurate, the trained model can be validated with the other part of the sample data. If validation passes, the classification model can be put into use; if it fails, the classification model can be retrained, until a classification model that can be used normally is obtained.
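The split into a training part and a validation part, and the pass-or-retrain check, can be sketched as follows. This is a minimal sketch under assumptions: the 80/20 split, the accuracy metric, the 0.9 pass threshold, and the helper names are hypothetical, and the trivial `predict` function merely stands in for a trained classification model.

```python
import random


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into a training part and a validation part."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, validation):
    """Fraction of (text, label) validation pairs that the model labels correctly."""
    if not validation:
        return 0.0
    correct = sum(1 for text, label in validation if predict(text) == label)
    return correct / len(validation)


samples = [(f"cooking tip {i}", "cooking expert") for i in range(10)]
train_part, validation_part = split_samples(samples)


def predict(text):
    """Stand-in for a trained model: labels every text 'cooking expert'."""
    return "cooking expert"


PASS_THRESHOLD = 0.9  # hypothetical acceptance criterion
passed = accuracy(predict, validation_part) >= PASS_THRESHOLD  # retrain if False
```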
Through the processing of step S102 above, a set of category labels and keywords for users is constructed in advance. Because the corresponding corpus data can be obtained by keyword, and category labels correspond to keywords, category labels also correspond to corpus data. To map corpus data accurately to its category label, each piece of corpus data can be input into the classification model to compute its category label. The category label that the classification model determines for a piece of corpus data is then compared with the category label in the above correspondence between corpus data and category labels: if the two are identical, the correspondence need not be changed; if they differ, the category label obtained from the classification model is taken as the classification result for that corpus data.
Note that the category labels output by the classification model are the same as the category labels corresponding to the above keywords. For example, if the category labels corresponding to the keywords include digital expert, cooking expert, spicy-food lover, and shoe addict, then the category labels output by the classification model also include digital expert, cooking expert, spicy-food lover, and shoe addict. That is, the label set used by the classification model is derived from the category labels corresponding to the keywords.
In step S106, phrase extraction is performed on the corpus data of the different category labels to obtain the target phrase corresponding to each category label, so that a first display copy of the resource to be issued is generated based on the target phrases.
A target phrase may be any phrase. A phrase is a linguistic unit without sentence intonation, formed by combining linguistic units that are compatible at the syntactic, semantic, and pragmatic levels. A phrase is generally a syntactic unit larger than a word but smaller than a sentence, and it becomes a sentence when sentence intonation is added.
In implementation, the corpus data can be classified as in step S104 above, mapping each piece of corpus data to its category label. Phrase extraction can then be performed on the corpus data of each category label. In practice, phrase extraction can be implemented in several ways. For example, a regular expression can be set in advance and used to extract matching phrases from the corpus data. Alternatively, phrase extraction templates can be preset according to the actual situation; when phrases need to be extracted from a piece of corpus data, a template that fits the content of that corpus data is selected from the extraction templates and used to extract phrases, yielding the phrases (that is, the target phrases) corresponding to the different category labels. The resulting target phrases can be stored in association with their category labels.
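The regular-expression variant of phrase extraction can be sketched as follows. The specification does not disclose its pattern, so `PHRASE_PATTERN` below is a hypothetical English example that captures phrases of the form "for those who ..."; the function name and the sample text are likewise assumptions.

```python
import re

# Hypothetical pattern: "for those who" followed by one to four lowercase words.
PHRASE_PATTERN = re.compile(r"for those who [a-z]+(?: [a-z]+){0,3}", re.IGNORECASE)


def extract_phrases(corpus_by_label, pattern=PHRASE_PATTERN):
    """Extract every pattern match from each label's texts, keyed by category label."""
    phrases = {}
    for label, texts in corpus_by_label.items():
        found = []
        for text in texts:
            found.extend(match.group(0) for match in pattern.finditer(text))
        phrases[label] = found
    return phrases


target_phrases = extract_phrases(
    {"spicy-food lover": ["A gift for those who love spicy food. Enjoy!"]}
)
```

Storing the result as a mapping from category label to phrase list matches the text's suggestion to keep each target phrase together with its label.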
As shown in Fig. 2, when information about a resource (such as the coupon prize in Fig. 2) needs to be sent to a user, the category labels held by the user can be obtained. Based on these category labels and the information about the resource, one or more target phrases are selected from the target phrases corresponding to the user's category labels as the display copy (that is, the first display copy) of the resource (that is, the coupon).
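Selecting a display copy for a given user and resource can be sketched as follows. This is an illustration under assumptions: how a resource is matched to category labels is not detailed in this passage, so the sketch assumes each resource comes with a set of relevant labels (`resource_labels`), and it picks one candidate phrase at random. All names are hypothetical.

```python
import random


def choose_display_copy(user_labels, resource_labels, phrases_by_label, rng=None):
    """Pick one target phrase from the user's labels that also match the resource.

    Returns None when no label matches or no phrase is stored for the match.
    """
    rng = rng or random.Random()
    matching = [label for label in user_labels if label in resource_labels]
    candidates = [
        phrase for label in matching for phrase in phrases_by_label.get(label, [])
    ]
    return rng.choice(candidates) if candidates else None


phrases_by_label = {"cooking expert": ["for those who cook"]}
copy_text = choose_display_copy(
    ["digital expert", "cooking expert"], {"cooking expert"}, phrases_by_label
)
```

When nothing matches, a caller might fall back to a default display copy; that fallback is outside what the text describes.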
This embodiment of this specification provides an information processing method that obtains corpus data corresponding to a predetermined keyword from a predetermined corpus database, and then classifies the corpus data based on the category label corresponding to the keyword and a predetermined classification model to determine the category label to which the corpus data belongs, where the classification model is a model that classifies text based on a neural network and the category label is an attribute label of a user to whom a resource is to be issued. Phrase extraction is then performed on the corpus data of the different category labels to obtain the target phrase corresponding to each category label, so that a first display copy of the resource to be issued is generated based on the target phrases. In this way, the corresponding corpus data is obtained through the keywords corresponding to users' category labels, the corpus data is mapped to the corresponding category labels by the classification model, and the phrases corresponding to each category label are obtained. These phrases can serve as display copy when resources are subsequently issued, so that the display copy better suits the user. Moreover, no manual work is needed to generate the display copy, which reduces the consumption of human resources and improves the efficiency of generating display copy.
Embodiment two
As shown in Fig. 3, the embodiments of this specification provide an information processing method. The method may be executed by a server, which may be a standalone server or a server cluster composed of multiple servers. The server may be the back-end server of a marketing campaign, or the back-end server of a financial application, a shopping application, or the like. The method may be used in marketing, for example to generate the display copy of a resource (such as a prize) when the resource is issued to a user. The method may specifically include the following steps:
In step S302, sample data corresponding to a predetermined keyword is obtained from a predetermined corpus database.
In an implementation, the corpus data used to generate display copy may be obtained through multiple channels, which may be set according to actual conditions. Examples include the data of an online shopping platform, information about resources issued in the past (such as previously issued prizes), and corpus data crawled by a web crawler from external networks (such as self-media, social media, search websites, and creative copywriting websites). The corpus data obtained in this way may serve as a general corpus, and the data set formed by this corpus data may serve as the corpus database.
Class labels may be constructed for one or more users according to actual conditions, such as "digital gadget enthusiast", "spicy-food lover", or "shoe-buying addict". One user may correspond to multiple different class labels, and different users may share the same class label. In addition, a set of class labels and keywords may be created. Keywords may be obtained from this set and used as label keywords, and corpus data corresponding to the label keywords may then be obtained from the predetermined corpus database. For example, corpus data containing one or more label keywords may be searched for in the corpus database, and the corpus data obtained may be used as the sample data corresponding to the predetermined keyword.
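For illustration only, the keyword lookup described above might be sketched as follows. The label-to-keyword set, the example sentences, and the function name `collect_samples` are hypothetical and are not taken from this specification; simple substring matching is assumed as the search rule.

```python
from collections import defaultdict

# Hypothetical set of class labels and their label keywords (illustrative only).
LABEL_KEYWORDS = {
    "digital gadget enthusiast": ["phone", "headphones"],
    "spicy-food lover": ["hot pot", "chili"],
}

def collect_samples(corpus, label_keywords):
    """Return {class_label: [sentences containing any of its label keywords]}."""
    samples = defaultdict(list)
    for sentence in corpus:
        lowered = sentence.lower()
        for label, keywords in label_keywords.items():
            # Assumed rule: a sentence matches if it contains any keyword as a substring.
            if any(kw in lowered for kw in keywords):
                samples[label].append(sentence)
    return dict(samples)

corpus = [
    "New headphones for the music lover in you",
    "Hot pot night, made for those who love the heat",
    "A quiet weekend at home",
]
samples = collect_samples(corpus, LABEL_KEYWORDS)
```

Because each sentence is found through a label keyword, it is already tied to a class label, which is the correspondence that the later balancing and training steps rely on.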
After the sample data is obtained, it may first be preprocessed; see the processing of steps S304 and S306 below.
In step S304, the quantity of sample data corresponding to each class label is adjusted so that it falls within a predetermined quantity threshold range, where the class label is an attribute label possessed by a user to whom a resource is to be issued.
The quantity threshold range may be set according to actual conditions, for example (1000, 1100).
In an implementation, sample data is obtained through the processing of step S302 above. Because class labels correspond to label keywords and the sample data is obtained through the label keywords, the sample data also corresponds to class labels. For example, if class label 1 includes label keyword A and sample data p is found through label keyword A, then sample data p may correspond to class label 1. So that the quantities of sample data for the class labels are comparable or identical, the quantity of sample data corresponding to each class label may be adjusted so that it falls within the predetermined quantity threshold range.
It should be noted that if the sample data corresponding to one or more class labels is scarce and does not fall within the predetermined quantity threshold range, the sample data corresponding to those class labels may be merged into one set, and the class label of the merged sample data may be set to an "other" category. The sample data of the "other" category can then be used in subsequent processing.
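A minimal sketch of this adjustment is given below. It assumes that over-represented class labels are reduced by random downsampling and that the range bounds are treated as inclusive; the specification does not state how the quantity is adjusted, so both points are assumptions, as is the function name `balance_samples`.

```python
import random

def balance_samples(samples, low, high, seed=0):
    """Cap large classes at `high`; merge classes smaller than `low` into 'other'."""
    rng = random.Random(seed)
    balanced, other = {}, []
    for label, items in samples.items():
        if len(items) < low:
            other.extend(items)                        # too few samples: fold into "other"
        elif len(items) > high:
            balanced[label] = rng.sample(items, high)  # assumed strategy: random downsampling
        else:
            balanced[label] = list(items)              # already within the range
    if other:
        balanced["other"] = other
    return balanced

samples = {"a": list(range(1500)), "b": list(range(1050)), "c": [1, 2], "d": [3]}
balanced = balance_samples(samples, low=1000, high=1100)
```

Here labels "c" and "d" fall below the range and are merged into the "other" category, while "a" is cut down to the upper bound.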
In step S306, a random mask operation is performed on the sample data, so as to randomly replace keywords in the sample data.
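The random mask operation could look roughly like the sketch below. The placeholder token `[MASK]`, the masking probability, and the function name `random_mask` are assumptions for illustration; the specification only states that keywords are replaced at random.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, not specified in the source

def random_mask(tokens, keywords, prob=0.5, rng=None):
    """Replace each token that is a label keyword with MASK with probability `prob`."""
    rng = rng or random.Random()
    return [MASK if t in keywords and rng.random() < prob else t for t in tokens]

tokens = ["spicy", "hot", "pot", "tonight"]
all_masked = random_mask(tokens, {"spicy", "hot"}, prob=1.0, rng=random.Random(0))
none_masked = random_mask(tokens, {"spicy", "hot"}, prob=0.0, rng=random.Random(0))
```

One plausible motivation, not stated in the source, is that masking some keywords keeps the classifier from relying solely on the very keywords that were used to retrieve the samples.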
In step S308, the classification model is trained based on the class label corresponding to the keyword and the obtained sample data, to obtain the trained classification model.
The classification model may be a model that classifies text based on a convolutional neural network, for example a Text-CNN model. In practical applications, the classification model is not limited to a model based on a convolutional neural network; it may also be a model based on another neural network, such as a recurrent neural network or a deep neural network.
In an implementation, a neural network may include an input layer, hidden layers, and an output layer. In the embodiments of this specification, the classification model may be a model based on a convolutional neural network, i.e. a Text-CNN model. As shown in Fig. 4, the Text-CNN model may include four layers: a mapping (embedding) layer, a convolutional layer, a pooling layer, and a fully connected layer. Specifically, one or more words in the sample data may be mapped or embedded, by some method, from the text space into a numerical vector space, giving the sentence matrix corresponding to the sample data (see the leftmost grid in Fig. 4), in which each row is a word vector. The sentence matrix then passes through a one-dimensional convolutional layer to produce the corresponding output. For example, as shown in Fig. 4, it may pass through a one-dimensional convolutional layer with kernel_sizes=(2,3,4), where each kernel_size may have two output channels (that is, two outputs). The pooling layer may be a 1-Max pooling layer, so that sentences of different lengths become fixed-length feature representations after the pooling layer. Finally, a fully connected Softmax layer may be attached to output the probability of each class label, and the classification model is then adjusted based on the class label corresponding to the sample data, thereby training the classification model.
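The effect of the convolutional and 1-Max pooling layers can be seen in a small pure-Python sketch. It uses randomly initialized filters, an assumed embedding dimension of 5, and an assumed ReLU activation; the fully connected Softmax layer and the training loop are omitted. It is meant only to show that sentences of different lengths yield a feature vector of the same length (three kernel sizes times two channels, i.e. six values).

```python
import random

def conv1d_max(sentence, kernel):
    """Slide one filter (k rows x dim) over the sentence matrix, then 1-max pool."""
    k = len(kernel)
    scores = []
    for start in range(len(sentence) - k + 1):
        window = sentence[start:start + k]
        s = sum(w * x
                for krow, xrow in zip(kernel, window)
                for w, x in zip(krow, xrow))
        scores.append(max(s, 0.0))  # ReLU activation (an assumption)
    return max(scores)              # 1-max pooling over all positions

def text_cnn_features(sentence, filters):
    """One pooled value per filter, independent of sentence length."""
    return [conv1d_max(sentence, f) for f in filters]

rng = random.Random(0)
dim = 5  # assumed embedding dimension
# Two filters (output channels) for each of the kernel sizes 2, 3 and 4.
filters = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(k)]
           for k in (2, 3, 4) for _ in range(2)]

def embed(n_words):
    """Stand-in for the embedding layer: a random n_words x dim sentence matrix."""
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_words)]

short_features = text_cnn_features(embed(4), filters)
long_features = text_cnn_features(embed(9), filters)
```

Both feature vectors have six entries, so the same fully connected layer can be applied regardless of sentence length.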
To make the classification model more accurate, the trained classification model may be verified with part of the sample data. If the verification passes, the classification model may be put into use; if it does not pass, the classification model may be retrained until a usable classification model is obtained.
After the trained classification model is obtained through the above processing, it may be used to classify corpus data, which may specifically include the following steps S310 to S318.
In step S310, corpus data corresponding to a predetermined keyword is obtained from a predetermined corpus database.
In step S312, the class label of the user corresponding to each piece of corpus data is obtained, and the obtained class labels are used as the class labels into which the classification model classifies.
In step S314, the corpus data is input into the classification model to obtain the mapping between each piece of corpus data and a class label, thereby determining the class label to which the corpus data belongs.
In step S316, phrase extraction is performed on the corpus data of the different class labels based on a predetermined regular expression, to obtain extraction results.
The predetermined regular expression may be a mechanism for extracting phrases from text data. How the regular expression is set may be determined according to actual conditions, which is not limited in the embodiments of this specification.
In an implementation, a regular expression may be set according to the content of the corpus data from which phrases are to be extracted. For example, if the corpus data is "Exquisite shirt, dedicated to the exquisite you", the regular expression "dedicated to the (.+) you" may be used to extract a phrase. Applying this regular expression to the corpus data, the extracted phrase may be "dedicated to the exquisite you". Through this processing, phrase extraction may be performed on the corpus data of the different class labels based on the regular expression, giving the corresponding extraction results.
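The extraction step can be sketched with Python's `re` module. The pattern below is an English-language analogue of the regular expression in the example above, and the class label name and function name `extract_phrases` are hypothetical.

```python
import re

# English analogue of the example pattern; the real pattern is set per corpus.
PATTERN = re.compile(r"dedicated to the (.+) you")

def extract_phrases(corpus_by_label, pattern):
    """Return {class_label: [matched phrases]} for classified corpus data."""
    results = {}
    for label, sentences in corpus_by_label.items():
        hits = []
        for sentence in sentences:
            match = pattern.search(sentence)
            if match:
                hits.append(match.group(0))  # the whole matched phrase
        results[label] = hits
    return results

corpus_by_label = {
    "exquisite lifestyle": [
        "Exquisite shirt, dedicated to the exquisite you",
        "Free shipping today",
    ],
}
extracted = extract_phrases(corpus_by_label, PATTERN)
```

Sentences that do not match the pattern simply contribute nothing, so the extraction results are kept per class label.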
In step S318, the extraction results are filtered to obtain the target phrases corresponding to the different class labels, so that a first display copy of the resource to be issued can be generated based on the target phrases.
In an implementation, to reduce noisy data and invalid extractions in the extraction results, the extraction results may be filtered. Data in the extraction results that does not conform to the regular expression, and extracted phrases that contain specified characters (such as * or #) or symbols (such as punctuation marks), may be filtered out, finally giving the target phrases corresponding to the different class labels. The target phrases obtained may be stored in association with the corresponding class labels.
The processing of step S318 may take various forms. Besides the processing described above, it may be implemented in other ways. One optional processing mode is provided below, which may specifically include the following: performing one or more of text length filtering, IDF (Inverse Document Frequency) score filtering, and NER (Named Entity Recognition) filtering on the extraction results, to obtain the target phrases corresponding to the different class labels.
IDF score filtering may be filtering based on the importance of words. IDF is a measure of the general importance of a word: the IDF of a particular word may be obtained by dividing the total number of extraction results by the number of extraction results that contain the word, and then taking the logarithm of the quotient.
In an implementation, considering that a phrase should not be too long, text length filtering may be performed on the extraction results, and extraction results whose text length exceeds a predetermined length threshold may be filtered out. For example, if the predetermined length threshold is 20 characters, an extraction result containing 25 characters may be filtered out, while an extraction result containing 10 characters may be retained.
In addition, IDF score filtering may be performed on the extraction results. Specifically, the importance of the words contained in each extraction result may be calculated (by dividing the total number of extraction results by the number of extraction results containing the word and then taking the logarithm of the quotient). Extraction results whose importance is greater than a predetermined importance threshold may be retained, and extraction results whose importance is less than the threshold may be filtered out.
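The length and IDF filters might be combined as in the following sketch. The specification does not say how word-level IDF values are aggregated into a score for a whole extraction result, so the mean IDF of the words is assumed here; whitespace tokenization, the threshold values, and the function names are likewise illustrative.

```python
import math

def idf(word, results):
    """log(total results / results containing the word), as described above."""
    n_containing = sum(1 for r in results if word in r.split())
    return math.log(len(results) / n_containing)

def filter_results(results, max_len=20, min_score=0.2):
    """Keep results that are short enough and whose mean word IDF exceeds min_score."""
    kept = []
    for r in results:
        words = r.split()
        if not words or len(r) > max_len:
            continue                       # text length filter
        # Assumed aggregation: average IDF over the words of the result.
        score = sum(idf(w, results) for w in words) / len(words)
        if score > min_score:
            kept.append(r)                 # IDF score filter
    return kept

results = [
    "for the you",
    "for the bold you",
    "for the endlessly curious and adventurous you",
]
kept = filter_results(results, max_len=20, min_score=0.2)
```

In this toy input the first result consists only of words that occur in every result, so its score is zero, and the third result exceeds the length threshold; only the second survives.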
Alternatively, NER filtering may be performed on the extraction results. Specifically, the person names, place names, organization names, proper nouns, and the like that need to be recognized may be preset. Each extraction result may then be recognized; extraction results that include such person names, place names, organization names, proper nouns, and the like may be retained, and extraction results that do not meet this filter condition may be filtered out.
It should be noted that the target phrases obtained need to generalize to some degree. A single class label usually covers several different preferences. For example, if the class label is "foodie", the corresponding preferences may include hot pot, snacks, late-night snacks, and so on. The generated target phrases therefore need a certain level of abstraction in order to express what these preferences have in common, while also avoiding the leakage of users' private information. Target phrases that contain fine-grained product words may be filtered out. Besides mining target phrases with regular expressions, they may also be mined in other ways; for example, phrase extraction may be performed by an algorithm or a model. Where phrase extraction is performed by a model, historically accumulated phrase data may be used as the training samples for building that model. In addition, a large set of class label-phrase pairs may be obtained from the classified corpus data and the target phrases.
In addition, display copy may also be generated based on resource-related corpus data, which may specifically include the processing of steps S320 and S322 below.
In step S320, resource corpus data corresponding to a predetermined resource keyword is obtained from a predetermined corpus database.
In an implementation, a merchant or operations staff may determine, according to actual conditions, the resources (such as prizes) used in a marketing campaign; the quantity and type of the resources may be set according to actual conditions. The information about these resources may be combined into a resource set (such as a prize set). Resource keywords (such as prize keywords) may be extracted from the resource set, and corpus data containing one or more of the resource keywords may then be searched for in the corpus database. The corpus data obtained may be used as the resource corpus data corresponding to the predetermined resource keyword.
In step S322, a second display copy of the resource to be issued is generated based on the predetermined resource keyword and the resource corpus data.
In an implementation, a template for generating display copy may be preset, and the second display copy of the resource to be issued may be generated based on the preset template, the resource keyword, and the resource corpus data.
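A template-based variant could be as simple as the sketch below. The template text, the rule of picking the shortest corpus sentence that mentions the resource keyword, and the function name `second_copy` are all assumptions made for illustration; the specification leaves the template and selection rule open.

```python
# Hypothetical preset template with two slots.
TEMPLATE = "{phrase}: claim your {resource} now"

def second_copy(resource_keyword, resource_corpus, template=TEMPLATE):
    """Fill the template with the keyword and one matching corpus sentence."""
    candidates = [s for s in resource_corpus if resource_keyword in s]
    if not candidates:
        return None
    # Assumed selection rule: prefer the shortest matching sentence.
    return template.format(phrase=min(candidates, key=len), resource=resource_keyword)

resource_corpus = [
    "A coupon makes every purchase sweeter",
    "Save more with a coupon",
]
copy_text = second_copy("coupon", resource_corpus)
missing = second_copy("voucher", resource_corpus)
```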
In practical applications, step S322 may be processed in various ways. One optional processing mode is provided below, which may specifically include the following: based on the predetermined resource keyword and the resource corpus data, generating the second display copy of the resource to be issued by means of a predetermined Pointer-Generator model.
The Pointer-Generator model may be a model built on a summarization mechanism. The Pointer part of the model may extract characters or words of high importance from the current resource corpus data, together with the resource keyword, to build a summary; the Generator part may build a summary based on the characters or words contained in all the resource corpus data.
In an implementation, each piece of resource corpus data obtained and the resource keyword may be input into the Pointer-Generator model to train it. The trained Pointer-Generator model can then extract a summary related to the resource keyword, and the extracted summary may be used as the second display copy of the resource to be issued.
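In the commonly published formulation of pointer-generator networks, which this specification does not spell out, the two parts are combined at each output step by a generation probability p_gen: the final word distribution is p_gen times the vocabulary distribution plus (1 - p_gen) times the attention weight placed on source tokens. The sketch below illustrates only that mixing step with made-up numbers; it is not a trained model, and all names and values are hypothetical.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix the generator's vocabulary distribution with the pointer's copy distribution."""
    # Generator part: scale the vocabulary distribution by p_gen.
    final = {w: p_gen * p for w, p in vocab_dist.items()}
    # Pointer part: add (1 - p_gen) * attention to each token of the source text.
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final

vocab_dist = {"enjoy": 0.6, "your": 0.4}   # made-up generator output
attention = [0.7, 0.3]                     # made-up attention over the source tokens
source_tokens = ["coupon", "your"]
dist = final_distribution(0.5, vocab_dist, attention, source_tokens)
```

Note that "coupon" is absent from the toy vocabulary yet still receives probability mass through the pointer, which is why such a model can copy resource keywords into the generated copy.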
The first display copy and the second display copy can be obtained in the ways described above, and a copy set may be built from them. When a user draws a resource, corresponding display copy may be provided to that user from the copy set; see the processing of steps S324 to S334 below.
In step S324, a resource acquisition request of a target user is received.
In an implementation, a merchant or marketing staff may set up a marketing campaign according to actual conditions and determine the resources (such as prizes) used in the campaign; the quantity and type of the resources may be set according to actual conditions. If the target user needs to participate in the campaign, a corresponding processing mechanism may be triggered. The target user may then send a resource acquisition request to the server, and the server may receive the request.
In step S326, information about the resource to be issued to the target user is determined.
In an implementation, after the server receives the resource acquisition request, it may select a resource for the target user according to the resource selection rules. Once the selection is complete, the information about the selected resource may be obtained.
In step S328, a first class label matching the information about the resource is obtained from the class labels corresponding to the target user.
In an implementation, because class labels have been set for each user in advance, the target user also has corresponding class labels, of which there may be one or several. A first class label matching the information about the resource may be obtained from the class labels corresponding to the target user. For example, if the target user's prize is a 50-yuan coupon for a digital product and the target user's class labels include "shoe-buying addict", "digital gadget enthusiast", and "spicy-food lover", then "digital gadget enthusiast" may be taken as the first class label matching the prize information.
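One simple way to realize this matching, assumed here purely for illustration, is to check whether the resource information mentions any label keyword of the user's class labels. The keyword lists and the function name `match_first_label` are hypothetical.

```python
def match_first_label(user_labels, resource_info, label_keywords):
    """Return the first of the user's class labels whose keywords appear in the resource info."""
    text = resource_info.lower()
    for label in user_labels:
        if any(kw in text for kw in label_keywords.get(label, [])):
            return label
    return None  # no class label matches this resource

# Hypothetical label keywords for the three example labels.
label_keywords = {
    "shoe-buying addict": ["shoe"],
    "digital gadget enthusiast": ["digital", "phone"],
    "spicy-food lover": ["spicy", "hot pot"],
}
user_labels = ["shoe-buying addict", "digital gadget enthusiast", "spicy-food lover"]
first_label = match_first_label(
    user_labels, "50-yuan coupon for a digital product", label_keywords
)
```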
In step S330, the target phrase corresponding to the first class label is obtained from the target phrases according to the first class label.
In step S332, the first display copy of the resource to be sent to the target user is generated based on the obtained target phrase.
In step S334, the corresponding second display copy is obtained according to the information about the resource to be issued to the target user, and the first display copy and/or the second display copy is sent to the target user.
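Steps S330 to S334 can be pictured with the following sketch, which looks up a target phrase for the first class label, looks up the second display copy for the resource, and returns whichever of the two exist. The storage layout (two dictionaries), the random choice among candidate phrases, and the resource identifier `coupon-50` are assumptions, not details from this specification.

```python
import random

def build_display_copy(first_label, target_phrases, second_copies, resource_id, rng=None):
    """Return the list of display copies (first and/or second) to send to the user."""
    rng = rng or random.Random()
    phrases = target_phrases.get(first_label, [])
    first = rng.choice(phrases) if phrases else None   # first display copy
    second = second_copies.get(resource_id)            # second display copy
    return [c for c in (first, second) if c is not None]

# Hypothetical stored results of the earlier steps.
target_phrases = {"digital gadget enthusiast": ["for the gadget lover in you"]}
second_copies = {"coupon-50": "Save 50 yuan on digital products"}

both = build_display_copy("digital gadget enthusiast", target_phrases, second_copies, "coupon-50")
only_second = build_display_copy("unknown label", target_phrases, second_copies, "coupon-50")
```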
It should be noted that, in practical applications, the resource mentioned above may be a prize, the resource set may be a prize set, the resource keyword may be a prize keyword, and the resource corpus data may be prize corpus data. The resource may also be something other than a prize and may be set according to actual conditions, which is not limited in the embodiments of this specification.
The embodiments of this specification provide an information processing method. Corpus data corresponding to a predetermined keyword is obtained from a predetermined corpus database. The corpus data is then classified based on the class label corresponding to the keyword and a predetermined classification model, to determine the class label to which the corpus data belongs, where the classification model is a model that classifies text based on a neural network and the class label is an attribute label possessed by a user to whom a resource is to be issued. Phrase extraction is then performed on the corpus data of the different class labels to obtain the target phrases corresponding to the different class labels, so that a first display copy of the resource to be issued can be generated based on the target phrases. In this way, corresponding corpus data is obtained through the keywords corresponding to users' class labels, the corpus data is mapped to the corresponding class labels by the classification model, and the phrases corresponding to each class label are obtained. These phrases can serve as display copy when resources are subsequently issued, so that the display copy better suits the user. Moreover, no manual involvement is needed to generate the display copy, which reduces the consumption of human resources and improves the efficiency of generating display copy.
Embodiment three
The above is the information processing method provided by the embodiments of this specification. Based on the same idea, the embodiments of this specification further provide an information processing device, as shown in Fig. 5.
The information processing device includes a corpus acquisition module 501, a classification module 502, and an extraction module 503, where:
the corpus acquisition module 501 is configured to obtain corpus data corresponding to a predetermined keyword from a predetermined corpus database;
the classification module 502 is configured to classify the corpus data based on the class label corresponding to the keyword and a predetermined classification model, and to determine the class label to which the corpus data belongs, where the classification model is a model that classifies text based on a neural network and the class label is an attribute label possessed by a user to whom a resource is to be issued;
the extraction module 503 is configured to perform phrase extraction on the corpus data of the different class labels to obtain the target phrases corresponding to the different class labels, so that a first display copy of the resource to be issued can be generated based on the target phrases.
In the embodiments of this specification, the device further includes:
a request receiving module, configured to receive a resource acquisition request of a target user;
a resource determination module, configured to determine information about the resource to be issued to the target user;
a label acquisition module, configured to obtain, from the class labels corresponding to the target user, a first class label matching the information about the resource;
a phrase acquisition module, configured to obtain, according to the first class label, the target phrase corresponding to the first class label from the target phrases;
a display copy generation module, configured to generate, based on the obtained target phrase, the first display copy of the resource to be sent to the target user.
In the embodiments of this specification, the device further includes:
a resource corpus acquisition module, configured to obtain resource corpus data corresponding to a predetermined resource keyword from a predetermined corpus database;
a copy generation module, configured to generate a second display copy of the resource to be issued based on the predetermined resource keyword and the resource corpus data.
In the embodiments of this specification, the copy generation module is configured to input the predetermined resource keyword and the resource corpus data into a predetermined Pointer-Generator model to obtain the second display copy of the resource to be issued.
In the embodiments of this specification, the device further includes:
a sample acquisition module, configured to obtain sample data corresponding to a predetermined keyword from a predetermined corpus database;
a training module, configured to train the classification model based on the class label corresponding to the keyword and the obtained sample data, to obtain the trained classification model.
In the embodiments of this specification, the device further includes:
a replacement module, configured to perform a random mask operation on the sample data, so as to randomly replace keywords in the sample data.
In the embodiments of this specification, the device further includes:
an adjustment module, configured to adjust the quantity of sample data corresponding to each class label so that it falls within a predetermined quantity threshold range.
In the embodiments of this specification, the extraction module 503 includes:
an extraction unit, configured to perform phrase extraction on the corpus data of the different class labels based on a predetermined regular expression, to obtain extraction results;
a filtering unit, configured to filter the extraction results to obtain the target phrases corresponding to the different class labels.
In the embodiments of this specification, the filtering unit is configured to perform one or more of text length filtering, IDF score filtering, and NER filtering on the extraction results, to obtain the target phrases corresponding to the different class labels.
The embodiments of this specification provide an information processing device. Corpus data corresponding to a predetermined keyword is obtained from a predetermined corpus database. The corpus data is then classified based on the class label corresponding to the keyword and a predetermined classification model, to determine the class label to which the corpus data belongs, where the classification model is a model that classifies text based on a neural network and the class label is an attribute label possessed by a user to whom a resource is to be issued. Phrase extraction is then performed on the corpus data of the different class labels to obtain the target phrases corresponding to the different class labels, so that a first display copy of the resource to be issued can be generated based on the target phrases. In this way, corresponding corpus data is obtained through the keywords corresponding to users' class labels, the corpus data is mapped to the corresponding class labels by the classification model, and the phrases corresponding to each class label are obtained. These phrases can serve as display copy when resources are subsequently issued, so that the display copy better suits the user. Moreover, no manual involvement is needed to generate the display copy, which reduces the consumption of human resources and improves the efficiency of generating display copy.
Embodiment four
The above is the information processing device provided by the embodiments of this specification. Based on the same idea, the embodiments of this specification further provide information processing equipment, as shown in Fig. 6.
The information processing equipment may be the server provided in the above embodiments.
The information processing equipment may vary considerably with configuration or performance. It may include one or more processors 601 and a memory 602, and the memory 602 may store one or more application programs or data. The memory 602 may be transient storage or persistent storage. An application program stored in the memory 602 may include one or more modules (not shown in the figure), each of which may include a series of computer-executable instructions for the information processing equipment. Further, the processor 601 may be configured to communicate with the memory 602 and to execute, on the information processing equipment, the series of computer-executable instructions in the memory 602. The information processing equipment may also include one or more power supplies 603, one or more wired or wireless network interfaces 604, one or more input/output interfaces 605, and one or more keyboards 606.
Specifically, in this embodiment, the information processing equipment includes a memory and one or more programs. The one or more programs are stored in the memory and may include one or more modules, each of which may include a series of computer-executable instructions for the information processing equipment. The one or more programs are configured to be executed by one or more processors and include computer-executable instructions for:
obtaining corpus data corresponding to a predetermined keyword from a predetermined corpus database;
classifying the corpus data based on the class label corresponding to the keyword and a predetermined classification model, and determining the class label to which the corpus data belongs, where the classification model is a model that classifies text based on a neural network and the class label is an attribute label possessed by a user to whom a resource is to be issued;
performing phrase extraction on the corpus data of the different class labels to obtain the target phrases corresponding to the different class labels, so that a first display copy of the resource to be issued can be generated based on the target phrases.
In the embodiments of this specification, the instructions further include:
receiving a resource acquisition request of a target user;
determining information about the resource to be issued to the target user;
obtaining, from the class labels corresponding to the target user, a first class label matching the information about the resource;
obtaining, according to the first class label, the target phrase corresponding to the first class label from the target phrases;
generating, based on the obtained target phrase, the first display copy of the resource to be sent to the target user.
In the embodiments of this specification, the instructions further include:
obtaining resource corpus data corresponding to a predetermined resource keyword from a predetermined corpus database;
generating a second display copy of the resource to be issued based on the predetermined resource keyword and the resource corpus data.
In the embodiments of this specification, generating the second display copy of the resource to be issued based on the predetermined resource keyword and the resource corpus data includes:
inputting the predetermined resource keyword and the resource corpus data into a predetermined Pointer-Generator model to obtain the second display copy of the resource to be issued.
In the embodiments of this specification, the instructions further include:
obtaining sample data corresponding to a predetermined keyword from a predetermined corpus database;
training the classification model based on the class label corresponding to the keyword and the obtained sample data, to obtain the trained classification model.
In the embodiments of this specification, the instructions further include:
performing a random mask operation on the sample data, so as to randomly replace keywords in the sample data.
In the embodiments of this specification, the instructions further include:
adjusting the quantity of sample data corresponding to each class label so that it falls within a predetermined quantity threshold range.
In an embodiment of this specification, performing phrase extraction on the corpus data of the different category labels to obtain the target phrases corresponding to the different category labels includes:
Based on a predetermined regular expression, perform phrase extraction on the corpus data of the different category labels to obtain an extraction result;
Filter the extraction result to obtain the target phrases corresponding to the different category labels.
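As an illustration of regular-expression-based phrase extraction, the sketch below treats each punctuation-delimited clause as a candidate phrase. The pattern `CLAUSE_PATTERN` and the function name are assumptions; the specification does not disclose the regular expression it uses.

```python
import re
from typing import List

# Hypothetical pattern: a clause starts at a character that is neither
# whitespace nor punctuation and runs until the next punctuation mark
# (ASCII or full-width).
CLAUSE_PATTERN = re.compile(r"[^,.;!?，。；！？\s][^,.;!?，。；！？]*")


def extract_phrases(text: str) -> List[str]:
    """Return the punctuation-delimited clauses of the text as candidates."""
    return [match.group().strip() for match in CLAUSE_PATTERN.finditer(text)]
```

The candidates returned here correspond to the extraction result, which would then be passed to the filtering step.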
In an embodiment of this specification, filtering the extraction result to obtain the target phrases corresponding to the different category labels includes:
Perform one or more of text-length filtering, IDF score filtering, and NER filtering on the extraction result to obtain the target phrases corresponding to the different category labels.
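The three filters can be sketched together as below. Several choices are assumptions rather than details from the specification: length is counted in whitespace-separated tokens, the IDF score of a phrase is the mean of its token IDF values with low scores rejected, and the NER filter is represented by a precomputed set of entity tokens that would come from an external named-entity recognizer.

```python
import math
from typing import Dict, FrozenSet, List


def idf_table(documents: List[List[str]]) -> Dict[str, float]:
    """Compute idf(t) = log(N / df(t)) over tokenised documents."""
    doc_freq: Dict[str, int] = {}
    for doc in documents:
        for token in set(doc):
            doc_freq[token] = doc_freq.get(token, 0) + 1
    total = len(documents)
    return {token: math.log(total / count) for token, count in doc_freq.items()}


def filter_phrases(phrases: List[str],
                   idf: Dict[str, float],
                   min_len: int = 2,
                   max_len: int = 8,
                   min_idf: float = 0.1,
                   entities: FrozenSet[str] = frozenset()) -> List[str]:
    """Keep phrases that pass the length, IDF score, and entity checks."""
    kept: List[str] = []
    for phrase in phrases:
        tokens = phrase.lower().split()
        # Text-length filter: drop phrases that are too short or too long.
        if not (min_len <= len(tokens) <= max_len):
            continue
        # IDF score filter: drop phrases made of very common tokens.
        score = sum(idf.get(token, 0.0) for token in tokens) / len(tokens)
        if score < min_idf:
            continue
        # NER filter (assumed direction): drop phrases containing an entity.
        if any(token in entities for token in tokens):
            continue
        kept.append(phrase)
    return kept
```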
An embodiment of this specification provides an information processing device. The device obtains corpus data corresponding to a predetermined keyword from a predetermined corpus database. Then, based on the category label corresponding to the keyword and a predetermined classification model, it classifies the corpus data and determines the category label to which the corpus data belongs. The classification model is a model that classifies text based on a neural network, and the category label is an attribute label of a user to whom a resource is to be issued. The device then performs phrase extraction on the corpus data of the different category labels to obtain the target phrases corresponding to those labels, so that a first display copy of the resource to be issued can be generated from the target phrases. In this way, the keywords corresponding to a user's category labels are used to obtain the corresponding corpus data, the classification model maps that corpus data to the corresponding category labels, and the phrases obtained for each category label can serve as display copy for subsequent resource issuance. The display copy is therefore better suited to the user, and no manual work is needed to generate it, which reduces the consumption of human resources and improves the efficiency of generating display copy.
The foregoing describes specific embodiments of this specification. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (such as a field programmable gate array (Field Programmable Gate Array, FPGA)) is such an integrated circuit, whose logic function is determined by the user's programming of the device. Designers program a digital system to "integrate" it onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development. The source code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained simply by lightly logic-programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller may be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing a controller purely as computer-readable program code, the method steps can be logic-programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for implementing various functions may also be regarded as structures within the hardware component. The means for implementing various functions may even be regarded as both software modules that implement the method and structures within the hardware component.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described with its functions divided into various units. Of course, when implementing one or more embodiments of this specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of this specification may be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The embodiments of this specification are described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of this specification. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable information processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable information processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable information processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable information processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art will understand that the embodiments of this specification may be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
One or more embodiments of this specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of this specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For the same or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for related details, refer to the description of the method embodiment.
The foregoing is merely embodiments of this specification and is not intended to limit this specification. For those skilled in the art, this specification may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of this specification shall fall within the scope of the claims of this specification.

Claims (15)

1. An information processing method, the method comprising:
Obtaining corpus data corresponding to a predetermined keyword from a predetermined corpus database;
Classifying the corpus data based on the category label corresponding to the keyword and a predetermined classification model, and determining the category label to which the corpus data belongs, wherein the classification model is a model that classifies text based on a neural network, and the category label is an attribute label of a user to whom a resource is to be issued;
Performing phrase extraction on the corpus data of different category labels to obtain target phrases corresponding to the different category labels, so as to generate a first display copy of the resource to be issued based on the target phrases.
2. The method according to claim 1, the method further comprising:
Receiving a resource acquisition request from a target user;
Determining information about the resource to be issued to the target user;
Obtaining, from the category labels corresponding to the target user, a first category label that matches the information about the resource;
Obtaining, according to the first category label, the target phrase corresponding to the first category label from the target phrases;
Generating, based on the obtained target phrase, the first display copy of the resource to be sent to the target user.
3. The method according to claim 1, the method further comprising:
Obtaining resource corpus data corresponding to a predetermined resource keyword from a predetermined corpus database;
Generating a second display copy of the resource to be issued based on the predetermined resource keyword and the resource corpus data.
4. The method according to claim 3, wherein generating the second display copy of the resource to be issued based on the predetermined resource keyword and the resource corpus data comprises:
Inputting the predetermined resource keyword and the resource corpus data into a predetermined Pointer-Generator model to obtain the second display copy of the resource to be issued.
5. The method according to claim 1, the method further comprising:
Obtaining sample data corresponding to a predetermined keyword from a predetermined corpus database;
Training the classification model based on the category label corresponding to the keyword and the obtained sample data to obtain a trained classification model.
6. The method according to claim 1, the method further comprising:
Performing a random Mask operation on the sample data, so as to randomly replace keywords in the sample data.
7. The method according to claim 6, the method further comprising:
Adjusting the quantity of sample data corresponding to each category label so that the quantity of sample data corresponding to each category label falls within a predetermined quantity threshold range.
8. The method according to claim 1, wherein performing phrase extraction on the corpus data of different category labels to obtain the target phrases corresponding to the different category labels comprises:
Performing, based on a predetermined regular expression, phrase extraction on the corpus data of different category labels to obtain an extraction result;
Filtering the extraction result to obtain the target phrases corresponding to the different category labels.
9. The method according to claim 8, wherein filtering the extraction result to obtain the target phrases corresponding to the different category labels comprises:
Performing one or more of text-length filtering, IDF score filtering, and NER filtering on the extraction result to obtain the target phrases corresponding to the different category labels.
10. An information processing apparatus, the apparatus comprising:
A corpus acquisition module, configured to obtain corpus data corresponding to a predetermined keyword from a predetermined corpus database;
A classification module, configured to classify the corpus data based on the category label corresponding to the keyword and a predetermined classification model, and determine the category label to which the corpus data belongs, wherein the classification model is a model that classifies text based on a neural network;
An extraction module, configured to perform phrase extraction on the corpus data of different category labels to obtain target phrases corresponding to the different category labels, so as to generate a first display copy of the resource to be issued based on the target phrases.
11. The apparatus according to claim 10, the apparatus further comprising:
A resource corpus acquisition module, configured to obtain resource corpus data corresponding to a predetermined resource keyword from a predetermined corpus database;
A copy generation module, configured to generate a second display copy of the resource to be issued based on the predetermined resource keyword and the resource corpus data.
12. The apparatus according to claim 11, wherein the copy generation module is configured to input the predetermined resource keyword and the resource corpus data into a predetermined Pointer-Generator model to obtain the second display copy of the resource to be issued.
13. The apparatus according to claim 10, wherein the extraction module comprises:
An extraction unit, configured to perform, based on a predetermined regular expression, phrase extraction on the corpus data of different category labels to obtain an extraction result;
A filtering unit, configured to filter the extraction result to obtain the target phrases corresponding to the different category labels.
14. The apparatus according to claim 13, wherein the filtering unit is configured to perform one or more of text-length filtering, IDF score filtering, and NER filtering on the extraction result to obtain the target phrases corresponding to the different category labels.
15. An information processing device, the information processing device comprising:
A processor; and
A memory arranged to store computer-executable instructions that, when executed, cause the processor to:
Obtain corpus data corresponding to a predetermined keyword from a predetermined corpus database;
Classify the corpus data based on the category label corresponding to the keyword and a predetermined classification model, and determine the category label to which the corpus data belongs, wherein the classification model is a model that classifies text based on a neural network, and the category label is an attribute label of a user to whom a resource is to be issued;
Perform phrase extraction on the corpus data of different category labels to obtain target phrases corresponding to the different category labels, so as to generate a first display copy of the resource to be issued based on the target phrases.
CN201910458461.6A 2019-05-29 2019-05-29 Information processing method, device and equipment Active CN110263161B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910458461.6A CN110263161B (en) 2019-05-29 2019-05-29 Information processing method, device and equipment

Publications (2)

Publication Number Publication Date
CN110263161A true CN110263161A (en) 2019-09-20
CN110263161B CN110263161B (en) 2023-09-26

Family

ID=67915863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910458461.6A Active CN110263161B (en) 2019-05-29 2019-05-29 Information processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN110263161B (en)

Also Published As

Publication number Publication date
CN110263161B (en) 2023-09-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

TA01 Transfer of patent application right
GR01 Patent grant