Detailed Description
The following describes the scheme provided in the present specification with reference to the drawings.
Fig. 1 is a schematic diagram of an information push system provided in the present specification. As shown in fig. 1, the system may include a data engineering layer 102, a model training layer 104, an algorithm layer 106, and a product layer 108. Data engineering layer 102: support for multiple data schemas, feature extraction operators (OPs), and configurable filtering of repeated exposures. Model training layer 104: feature selection and reconstruction, rigid quantity change, optimizer parameter tuning, network structure optimization, and hour-level auxiliary model optimization. Algorithm layer 106: the DMP performs deep learning based on the Ray framework and produces high-precision algorithm output by means of intention recognition and enhanced natural-language sequence-to-sequence (seq2seq) modeling. Product layer 108: the DMP deep learning model is connected to the product layer. Combined with a specific scene (such as the Spring Festival), it forms a scene-based intelligent semantic combination operation tool that intelligently splits and recombines text based on intention recognition. Connected with the to-do items of the product in intelligent reminders, it forms an operation tool that addresses the user's attention, click willingness, and click frequency (stickiness) as a whole.
In addition, the system may further include an infrastructure layer 110, a model verification layer 112, and a model service layer 114. Infrastructure layer 110: system bubble reminders, real-time association of application programming interface (API) calls, and conduction of H5 and native interaction pages. Model verification layer 112: delayed export after the offline model is loaded, a horse-racing mechanism, and real-time monitoring. Model service layer 114: sufficient stress testing to ensure service stability, effect balancing, and pressure monitoring.
In fig. 1, the Ray-based deep learning algorithm of the product layer 108 can make better use of higher-order relationships. In particular, even when user behavior data is missing and the corpus is insufficient, high-precision intention recognition can still be combined with the operation scene to push information to users with a high degree of fit and accuracy.
It should be noted that, the information pushing method provided in the present specification is implemented based on the tag phrase mapping table, so before describing the scheme provided in the present specification, the process of creating the tag phrase mapping table is described.
Fig. 2 is a flowchart of a method for creating a tag phrase mapping table provided in the present specification. As shown in fig. 2, the setup process may include the steps of:
Step 202, collecting a plurality of business corpora related to a plurality of users.
For example, multiple business corpora related to multiple users may be collected from a business corpus library. Each business corpus may have a corresponding initial label.
Specifically, the business corpora can be collected and organized based on a specific behavior scene (such as the Spring Festival) and the keywords to be highlighted (such as five-blessing red envelopes). In one example, the collected business corpora may be: "exquisite shirt donates to exquisite you", "one sentence to spell you: never stop believing itself early", and so on. The initial label corresponding to "exquisite shirt donates to exquisite you" may be "fashion sister", and the initial label corresponding to "one sentence to spell you: never stop believing itself early" may be "office in job site".
It should be noted that, after the business corpora are collected, those with poor user relevance may be filtered out based on a relevance rule, so that the relevant corpora are recalled.
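The collection and filtering in step 202 can be sketched roughly as follows. This is a minimal illustration only: the `BusinessCorpus` record, the keyword-containment rule in `is_relevant`, the scene keyword list, and the second sample entry are assumptions made for the example, since the specification does not define the relevance rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessCorpus:
    """One business corpus entry with its initial label (hypothetical record)."""
    text: str
    initial_label: str


def is_relevant(corpus: BusinessCorpus, scene_keywords: list) -> bool:
    """Assumed relevance rule: keep an entry if it mentions any scene keyword."""
    return any(keyword in corpus.text for keyword in scene_keywords)


def recall_relevant(corpora: list, scene_keywords: list) -> list:
    """Drop entries with poor relevance and keep the relevant ones."""
    return [c for c in corpora if is_relevant(c, scene_keywords)]


collected = [
    BusinessCorpus("exquisite shirt donates to exquisite you", "fashion sister"),
    # Invented entry, present only to show an item being filtered out.
    BusinessCorpus("unrelated system notice", "none"),
]
recalled = recall_relevant(collected, scene_keywords=["exquisite", "spell"])
```

Any real relevance rule would likely be richer than substring matching; the point here is only that filtering happens before keyword extraction.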
Step 204, for the business corpus related to each user, extracting the corresponding keywords from the business corpus.
The semantics of the keywords reflect the behavior habits of the user in the specified behavior scene.
In one implementation, the corresponding keywords may be extracted from the business corpus based on a context_service algorithm. Taking the business corpus "exquisite shirt donates to exquisite you" as an example, the extracted keyword may be "exquisite you". Taking the business corpus "one sentence to spell you: never stop believing itself early" as an example, the extracted keyword may be "spell you". Here, the semantics of "exquisite you" and "spell you" reflect the user's behavior habits in a shopping scene.
The extracted keywords may be screened based on rules such as part-of-speech, relevance, and the like.
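The screening of extracted keywords by part of speech and relevance might look like the sketch below. The `Candidate` structure, the part-of-speech tag names, the relevance scores, and the threshold are all hypothetical; the specification names the screening criteria but not how they are computed.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A keyword candidate with hypothetical part-of-speech and relevance fields."""
    word: str
    part_of_speech: str  # tag set is an assumption, e.g. "noun_phrase"
    relevance: float     # assumed score in [0, 1] from an upstream extractor


def screen_keywords(candidates, allowed_pos, min_relevance):
    """Keep candidates with an allowed part of speech and sufficient relevance."""
    return [
        c.word
        for c in candidates
        if c.part_of_speech in allowed_pos and c.relevance >= min_relevance
    ]


# Illustrative candidates; the tags and scores are made up for the example.
candidates = [
    Candidate("exquisite you", "noun_phrase", 0.9),
    Candidate("donates to", "verb_phrase", 0.8),
    Candidate("shirt", "noun_phrase", 0.3),
]
keywords = screen_keywords(candidates, allowed_pos={"noun_phrase"}, min_relevance=0.5)
```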
Step 206, based on the keywords, determining related words by adopting a deep learning algorithm.
Here, the deep learning algorithm employed may be a DMP algorithm based on the Ray framework.
The process of determining related words using the Ray-framework-based DMP algorithm is shown in fig. 3. Specifically, when related words are determined based on the DMP algorithm shown in fig. 3, the keyword may be input at the uppermost position, labeled final distribution. The keyword may then be split to obtain a first sub-word and a second sub-word. The semantics of the first sub-word reflect the user's life form, and the semantics of the second sub-word reflect the user's character designation. Taking the keyword "exquisite you" as an example, the resulting first sub-word may be "exquisite" and the second sub-word may be "you". Taking the keyword "spell you" as an example, the resulting first sub-word may be "spell" and the second sub-word may be "you". The split sub-words may be filtered with a convolutional neural network algorithm (e.g., text_cnn) to remove unreasonable splits.
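The splitting of a keyword into two sub-words can be illustrated with the sketch below. It assumes a small hypothetical lexicon of character designations (`PERSONA_DESIGNATIONS`) and whitespace tokenization; the specification does not say how the split is actually performed, and the text_cnn filter is not modeled here.

```python
from typing import Optional, Tuple

# Hypothetical lexicon of character designations (second sub-words).
PERSONA_DESIGNATIONS = {"you"}


def split_keyword(keyword: str) -> Optional[Tuple[str, str]]:
    """Split a keyword into (first_sub_word, second_sub_word).

    Returns None when no known character designation is found or nothing
    remains for the first sub-word, so such splits can be discarded.
    """
    tokens = keyword.split()
    for i, token in enumerate(tokens):
        if token in PERSONA_DESIGNATIONS:
            rest = " ".join(tokens[:i] + tokens[i + 1:])
            if rest:
                return rest, token
    return None


print(split_keyword("exquisite you"))  # ('exquisite', 'you')
print(split_keyword("spell you"))      # ('spell', 'you')
```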
In fig. 3, the determined related words include at least a first related word, whose output position may be the abstract position at the lower right. The semantics of the first related word reflect other behavior habits of the user and/or the user portrait in the specified behavior scene. The user portrait may include, for example, "gender", "occupation", "region", "work area", and "hobbies". Further, the determined related words may also include a second related word, whose output position is the source text position at the lower left. The semantics of the second related word reflect the behavior habits and user portrait of the user in other behavior scenes. Taking the foregoing "exquisite you" and "spell you" as examples, the second related words here may be the user's portraits in a professional scene: "white collar", "student", and so on.
As can be seen from fig. 3, determining the related words goes through the following stages: decoder hidden state -> vocabulary distribution -> final distribution. The relationship among the three is shown in fig. 3 and is not repeated here.
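Since fig. 3 is not reproduced here, the following is only a rough numerical illustration of how such a chain is commonly computed in pointer-style sequence-to-sequence models, not the relationship defined by the figure. The mixing formula, the generation probability `p_gen`, and all weights and attention values are assumptions chosen for the example.

```python
import math


def softmax(scores):
    """Convert raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def vocabulary_distribution(hidden_state, projection):
    """Project the decoder hidden state onto the vocabulary and normalize."""
    scores = [sum(w * h for w, h in zip(row, hidden_state)) for row in projection]
    return softmax(scores)


def final_distribution(vocab, vocab_dist, source_words, attention, p_gen):
    """Mix the vocabulary distribution with attention mass copied from the source text."""
    final = {word: p_gen * p for word, p in zip(vocab, vocab_dist)}
    for word, weight in zip(source_words, attention):
        final[word] = final.get(word, 0.0) + (1.0 - p_gen) * weight
    return final


vocab = ["exquisite", "you", "white collar"]
hidden_state = [0.2, -0.1]                           # toy decoder hidden state
projection = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # toy weights, one row per word
source_words = ["exquisite", "you"]
attention = [0.7, 0.3]                               # toy attention over the source text

dist = final_distribution(
    vocab, vocabulary_distribution(hidden_state, projection),
    source_words, attention, p_gen=0.6,
)
best_word = max(dist, key=dist.get)
```

Because both input distributions sum to one, the mixed result also sums to one for any `p_gen` between 0 and 1.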
In another example, a pointer timer model may also be used to determine the related words. This model integrates a k-means mechanism and a coverage mechanism into the basic DMP model, thereby ensuring the precision of semantic recognition for the user.
In this method, when the related words are determined, branches are adjusted under the BERT distributed strategy so that they are searched together, and the generated results are finally optimized and output, which can greatly improve the flexibility of the determined words. In addition, because the related words are derived from the user's intention, intelligent splitting of keywords can be realized.
Returning to fig. 2, fig. 2 may further include the steps of:
Step 208, based on the keywords and related words, generating user phrases and user tags corresponding to the user in the specified behavior scene.
Here, before the generating step is performed, the keywords and the related words may be normalized and mapped. Specifically, keywords and/or related words having the same meaning may be normalized to the same word, and the keywords and/or related words may be mapped to words that embody the user's intention (intent words for short).
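The normalization and mapping described above can be sketched with two lookup tables. Both tables below are hypothetical: the synonym pairs and the intent words are invented for illustration and do not come from the specification.

```python
# Hypothetical synonym table: variant wording -> canonical word.
SYNONYMS = {
    "spelled you": "spell you",
    "elaborated you": "exquisite you",
}

# Hypothetical intent-word table: canonical word -> intent word.
INTENT_WORDS = {
    "exquisite you": "fashion",
    "spell you": "self-improvement",
}


def normalize(word, synonyms=SYNONYMS):
    """Map words with the same meaning to one canonical word."""
    return synonyms.get(word, word)


def to_intent_word(word, synonyms=SYNONYMS, intent_words=INTENT_WORDS):
    """Normalize a word, then map it to an intent word when one is known."""
    canonical = normalize(word, synonyms)
    return intent_words.get(canonical, canonical)


normalized = [normalize(w) for w in ["spelled you", "spell you", "exquisite you"]]
intents = [to_intent_word(w) for w in ["elaborated you", "white collar"]]
```

Words that appear in neither table pass through unchanged, so the step is safe to apply to every keyword and related word.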
It will be appreciated that after the normalization and mapping described above, the user phrases and user tags may be generated. Specifically, a user tag may be generated based on the initial label corresponding to the business corpus, or based on the related words determined from the keywords extracted from the business corpus corresponding to that initial label. For example, the initial label "fashion sister" described above may be used as a user tag, and the related word "white collar" may also be used as a user tag.
A user phrase may be generated based on the keywords and first related words in the behavior scene corresponding to each user tag. Taking "fashion sister" as an example, the corresponding user phrase may be "exquisite you"; that is, the split keywords are recombined.
When the related words further include a second related word, the generated user tag may be "white collar", and the corresponding user phrase may be "on the day of the business steaming". In this example, the user phrase may be generated automatically by combining the second related words based on relevant rules. That is, when the related words further include the second related word, user tags and user phrases corresponding to the user in other behavior scenes may also be generated.
In the present specification, a convolutional neural network algorithm (e.g., text_cnn) may be used to screen the generated user phrases and user tags, so as to retain user phrases of higher quality.
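The recombination of split sub-words into user phrases, followed by quality screening, might be sketched as follows. The scoring function is a toy stand-in for a trained classifier such as text_cnn; its known-good set and the threshold are assumptions, as is the deliberately poor candidate "you you".

```python
def generate_user_phrases(first_sub_words, second_sub_words):
    """Recombine split sub-words into candidate user phrases."""
    return [f"{first} {second}"
            for first in first_sub_words
            for second in second_sub_words]


def screen_phrases(phrases, score, threshold):
    """Keep phrases whose quality score reaches the threshold."""
    return [p for p in phrases if score(p) >= threshold]


def toy_score(phrase):
    """Toy stand-in for a trained quality model: 1.0 for known-good phrases."""
    known_good = {"exquisite you", "spell you"}
    return 1.0 if phrase in known_good else 0.0


candidates = generate_user_phrases(["exquisite", "spell"], ["you"])
# "you you" is an invented low-quality candidate that the filter should drop.
kept = screen_phrases(candidates + ["you you"], toy_score, threshold=0.5)
```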
Step 210, a tag phrase mapping table is established based at least on the user phrases and the user tags corresponding to the user in the specified behavior scenario.
It will be appreciated that in practical applications, steps 204 to 208 may be repeated until the user tags and user phrases corresponding to the respective users are generated.
When generating user tags and user phrases corresponding to the plurality of users, step 210 may be to build a tag phrase mapping table based on the user tags and user phrases of the plurality of users.
It should be noted that the user tags and user phrases in the finally established tag phrase mapping table can be added to the business corpus library as new business corpora, so as to achieve information backflow. Through information backflow, the business corpora in the business corpus library become increasingly rich.
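One possible shape for the tag phrase mapping table and the information backflow is sketched below. The nested-dictionary layout (scene, then tag, then phrases) and the representation of the corpus library as a list of (text, label) pairs are assumptions; the scene names are illustrative, while the tags and phrases are the examples used in this description.

```python
from collections import defaultdict


def build_tag_phrase_table(records):
    """Build {behavior_scene: {user_tag: [user_phrase, ...]}} from generated records."""
    table = defaultdict(lambda: defaultdict(list))
    for scene, tag, phrase in records:
        if phrase not in table[scene][tag]:
            table[scene][tag].append(phrase)
    return {scene: dict(tags) for scene, tags in table.items()}


def backflow(table, corpus_library):
    """Add each (phrase, tag) pair to the corpus library as a new business corpus."""
    for tags in table.values():
        for tag, phrases in tags.items():
            for phrase in phrases:
                entry = (phrase, tag)
                if entry not in corpus_library:
                    corpus_library.append(entry)
    return corpus_library


records = [
    ("shopping", "fashion sister", "exquisite you"),
    ("professional", "white collar", "on the day of the business steaming"),
    ("shopping", "fashion sister", "exquisite you"),  # duplicate, ignored
]
table = build_tag_phrase_table(records)
library = backflow(table, corpus_library=[])
```

Keying the table by scene first keeps the later lookup (scene, then tag) a pair of dictionary accesses.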
In summary, embodiments of the present disclosure may build the tag phrase mapping table described above based on the behavior habit data, intent words, and completion information of the user.
Fig. 4 is a schematic diagram of a process for creating a tag phrase mapping table provided in the present specification. In fig. 4, first, a plurality of business corpora related to a plurality of users are collected from a corpus, and respective corresponding keywords are extracted from each corpus. The plurality of business corpora here corresponds to a specific behavioral scenario. Then, based on the extracted keywords, learning other words related to the specific behavior scene by adopting a deep learning algorithm; and learning other words related to other behavioral scenarios. For other words learned, a convolutional neural network algorithm may be employed to filter them. And finally, generating corresponding relations between the user phrases and the user labels in different behavioral scenes based on the keywords and other filtered words, and establishing a label phrase mapping table based on the corresponding relations.
It can be appreciated that the foregoing is a description of the process of creating the tag phrase mapping table, and the following describes an information pushing method based on the mapping table.
Fig. 5 is a flowchart of an information pushing method according to an embodiment of the present disclosure. As shown in fig. 5, the method may include the steps of:
Step 502, obtaining user behavior data of a user.
The user behavior data herein may refer to data generated when a user performs a business behavior such as a browsing behavior, a clicking behavior, or a consuming behavior on an application.
Step 504, determining a behavior scene of the user based on the user behavior data.
Here, the behavior scenes may be predefined, and may include, but are not limited to, a Spring Festival scene, a shopping scene, a professional scene, and the like.
Step 506, identify the user tag of the user.
Here, the user tag of the user may be identified based on the user behavior data. The user tag is used to describe a user portrait of the user.
In a shopping scene, the user tag of the user may be, for example, "fashion sister", "housewife", or "office work". In a professional scene, the user tag of the user may be, for example, "white collar" or "college student".
Step 508, query the pre-established tag phrase mapping table to obtain the user phrase corresponding to the user tag in the behavior scene.
The user phrase is used to describe the behavior habits of the user.
Taking the tag phrase mapping table shown in fig. 4 as an example, when the behavior scene of the user is determined to be a shopping scene and the user tag of the user is "fashion sister", the obtained user phrase is "exquisite you".
Step 510, find target information that matches the user phrase.
Step 512, pushing the target information to the user.
Here, the matching target information may be found based on a preset matching relationship. For example, under an assumed matching relationship, digital products or cooking-related items may be pushed to the current user.
In summary, according to the information pushing method provided by the embodiment of the present disclosure, after the behavior of the user is captured, the user tag of the user may first be identified, and information that may interest the user is then searched for and pushed based on the user phrase corresponding to the user tag, so that accurate pushing of the information may be achieved. In addition, the flexibility of the pushed information can be greatly improved.
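Steps 504 to 512 can be tied together in a short sketch. The function name `push_information`, the injected callables, the table layout, and in particular the matching entry ("hypothetical fashion promotion") are all assumptions for illustration; scene determination and tag identification are replaced by trivial stand-ins.

```python
def push_information(behavior_data, determine_scene, identify_tag,
                     tag_phrase_table, matching, send):
    """Run steps 504 to 512 on behavior data already obtained in step 502."""
    scene = determine_scene(behavior_data)                         # step 504
    user_tag = identify_tag(behavior_data)                         # step 506
    phrases = tag_phrase_table.get(scene, {}).get(user_tag, [])    # step 508
    targets = [info for phrase in phrases
               for info in matching.get(phrase, [])]               # step 510
    for info in targets:                                           # step 512
        send(info)
    return targets


# Toy inputs; every value below is illustrative.
table = {"shopping": {"fashion sister": ["exquisite you"]}}
matching = {"exquisite you": ["hypothetical fashion promotion"]}
sent = []

pushed = push_information(
    behavior_data=[{"action": "click", "page": "clothing"}],
    determine_scene=lambda data: "shopping",       # stand-in for scene determination
    identify_tag=lambda data: "fashion sister",    # stand-in for tag identification
    tag_phrase_table=table,
    matching=matching,
    send=sent.append,
)
```

Passing the scene and tag logic in as callables keeps the sketch independent of how those steps are actually modeled; an unknown scene or tag simply yields no target information.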
Finally, in the scheme of this specification, Ray serves as the distributed computing platform and plays an important role in accurate and efficient deep learning. The deep learning operation, from acquiring user behavior data from the data source to the efficient intention recognition of the final converted model, remains end to end (End2End) within one framework. The scheme provides important features such as data engineering collocation, real-time model training, online model verification, and minute-level model updating.
Corresponding to the above information pushing method, an embodiment of the present disclosure further provides an information pushing device, as shown in fig. 6, where the device may include:
an obtaining unit 602, configured to obtain user behavior data of a user.
A determining unit 604, configured to determine a behavior scenario of the user based on the user behavior data acquired by the acquiring unit 602.
An identification unit 606 for identifying a user tag of the user, the user tag describing a user portrait of the user.
The obtaining unit 602 is further configured to query a pre-established tag phrase mapping table to obtain a user phrase corresponding to the user tag in the current behavior scene. The user phrase is used to describe the behavior habits of the user. The tag phrase mapping table is used to record the correspondence between user tags and user phrases in different behavior scenes. The user tags and user phrases in one behavior scene are determined based on the behavior habits of the user in that behavior scene or other related behavior scenes and on a deep learning algorithm.
The deep learning algorithm here may be, for example, a DMP algorithm based on the Ray framework.
A searching unit 608, configured to search for target information that matches the user phrase acquired by the acquiring unit 602.
And a pushing unit 610, configured to push the target information to a user.
Optionally, the apparatus may further include:
an establishing unit (not shown in the figure), configured to collect a plurality of business corpora related to a plurality of users.
For the business corpus related to any first user, extracting corresponding keywords from the business corpus, wherein the semantics of the keywords reflect the behavior habit of the first user under the appointed behavior scene.
Based on the keywords, a deep learning algorithm is adopted to determine related words. The related words at least comprise first related words, and the semantics of the first related words reflect other behavior habits and/or user portraits of the first user in the specified behavior scene.
Based on the keywords and related words, generating user phrases and user labels corresponding to the first user in the specified behavior scene.
And establishing a tag phrase mapping table at least based on the user phrase and the user tag corresponding to the first user in the specified behavior scene.
Optionally, the related words may further include a second related word, where the semantics of the second related word reflect the behavior habits and the user portrait of the first user in other behavior scenes.
The determining unit 604 is further configured to determine, based on the second related word, a user phrase and a user tag corresponding to the first user in other behavior scenarios.
The establishing unit may specifically be configured to:
and establishing a tag phrase mapping table at least based on the user phrase and the user tag corresponding to the first user in the appointed behavior scene and the user phrase and the user tag corresponding to the first user in other behavior scenes.
The establishing unit may in particular be further adapted to:
splitting the keywords to obtain a first sub-word and a second sub-word. The semantics of the first sub-word reflect the life form of the first user and the semantics of the second sub-word reflect the character designation of the first user.
Based on the first sub-word and the second sub-word, a deep learning algorithm is adopted to determine related words.
Optionally, the apparatus may further include:
a screening unit (not shown in the figure) for screening the first sub-word and the second sub-word based on a convolutional neural network algorithm.
The establishing unit may in particular be further adapted to:
and based on the first sub-word and the second sub-word after screening, determining related words by adopting a deep learning algorithm.
The functions of the functional modules of the apparatus in the foregoing embodiments of the present disclosure may be implemented by the steps of the foregoing method embodiments, so that the specific working process of the apparatus provided in one embodiment of the present disclosure is not repeated herein.
In the information pushing device provided in one embodiment of the present disclosure, the obtaining unit 602 obtains user behavior data of a user. The determining unit 604 determines a behavior scene of the user based on the acquired user behavior data. The identification unit 606 identifies a user tag of the user, which is used to describe the user portrait of the user. The obtaining unit 602 queries a pre-established tag phrase mapping table to obtain a user phrase corresponding to the user tag in the current behavior scene. The user phrase is used to describe the behavior habits of the user. The tag phrase mapping table is used to record the correspondence between user tags and user phrases in different behavior scenes. The user tags and user phrases in one behavior scene are determined based on the behavior habits of the user in that behavior scene or other related behavior scenes and on a deep learning algorithm. The searching unit 608 searches for target information that matches the acquired user phrase. The pushing unit 610 pushes the target information to the user. Therefore, accurate pushing of information can be achieved.
Corresponding to the above information pushing method, the embodiment of the present disclosure further provides an information pushing device, as shown in fig. 7, where the device may include: memory 702, one or more processors 704, and one or more programs. Wherein the one or more programs are stored in the memory 702 and configured to be executed by the one or more processors 704, the programs when executed by the processor 704 performing the steps of:
user behavior data of a user is obtained.
Based on the user behavior data, a behavior scenario of the user is determined.
A user tag of the user is identified, the user tag describing a user representation of the user.
A pre-established tag phrase mapping table is queried to obtain a user phrase corresponding to the user tag in the determined behavior scene. The user phrase is used to describe the behavior habits of the user. The tag phrase mapping table is used to record the correspondence between user tags and user phrases in different behavior scenes. The user tags and user phrases in one behavior scene are determined based on the behavior habits of the user in that behavior scene or other related behavior scenes and on a deep learning algorithm.
Target information matching the user phrase is found.
Target information is pushed to a user.
The information pushing equipment provided by the embodiment of the specification can realize accurate pushing of information.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware, or may be embodied in software instructions executed by a processor. The software instructions may consist of corresponding software modules, which may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. In addition, the ASIC may reside in a server. The processor and the storage medium may also reside as discrete components in a server.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on, or transmitted as one or more instructions or code over, a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The foregoing detailed description of the embodiments has further described the objects, technical solutions and advantages of the present specification, and it should be understood that the foregoing description is only a detailed description of the embodiments of the present specification, and is not intended to limit the scope of the present specification, but any modifications, equivalents, improvements, etc. made on the basis of the technical solutions of the present specification should be included in the scope of the present specification.