CN113326411A - Network behavior knowledge enhancement method and device and electronic equipment - Google Patents

Info

Publication number
CN113326411A
Authority
CN
China
Prior art keywords
webpage
network
behavior
hierarchy
behaviors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010127236.7A
Other languages
Chinese (zh)
Other versions
CN113326411B (en)
Inventor
刘良军
黄益晓
曹勇
陈翔宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Fujian Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Fujian Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Fujian Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202010127236.7A priority Critical patent/CN113326411B/en
Publication of CN113326411A publication Critical patent/CN113326411A/en
Application granted granted Critical
Publication of CN113326411B publication Critical patent/CN113326411B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses a network behavior knowledge enhancement method and device and electronic equipment, and relates to the technical field of computers. When the user network behavior is detected, crawling web page information corresponding to the user network behavior; taking a webpage hierarchy corresponding to the webpage information as a current webpage hierarchy, and sequentially crawling a plurality of network sub-behaviors included in each webpage hierarchy based on the current webpage hierarchy and according to a preset webpage hierarchy relation; respectively selecting an optimal network sub-behavior from a plurality of network sub-behaviors corresponding to each webpage level; and constructing a first knowledge enhancement path based on a preset network hierarchy relation and the optimal network sub-behaviors in each webpage hierarchy so as to realize effective analysis of the network behaviors of the user.

Description

Network behavior knowledge enhancement method and device and electronic equipment
Technical Field
The application relates to the technical field of computers, in particular to a network behavior knowledge enhancement method and device and electronic equipment.
Background
With the rapid development of mobile internet and the continuous emergence and prosperity of emerging technologies such as cloud computing and internet of things, the network behavior of users generates massive data. The same is true in the telecommunication industry, and with the increasing competition of the industry, higher requirements are put on how operators understand users and characterize user behaviors and portraits.
At present, operators mostly analyze the collected user network behavior data by means of mathematical statistics, refine the analysis with static attribute information about their users, and then characterize the network behavior of the users.
Disclosure of Invention
In view of the above problems, embodiments of the present application provide a method and an apparatus for enhancing network behavior knowledge, and an electronic device, which are as follows.
In a first aspect, an embodiment of the present application provides a network behavior knowledge enhancing method, where the method includes:
when a user network behavior is detected, crawling webpage information corresponding to the user network behavior;
taking a webpage hierarchy corresponding to the webpage information as a current webpage hierarchy, and sequentially crawling a plurality of network sub-behaviors included in each webpage hierarchy based on the current webpage hierarchy and according to a preset webpage hierarchy relation;
respectively selecting an optimal network sub-behavior from a plurality of network sub-behaviors corresponding to each webpage level;
and constructing a first knowledge enhancement path based on the preset network hierarchy relation and the optimal network sub-behaviors in each webpage hierarchy.
Further, as a possible implementation manner, the step of selecting an optimal network sub-behavior from a plurality of network sub-behaviors corresponding to each web page hierarchy includes:
respectively crawling a webpage text corresponding to each network sub-behavior aiming at a plurality of network sub-behaviors corresponding to each webpage hierarchy;
extracting keywords meeting preset requirements from each webpage text to form a plurality of keyword lists corresponding to each webpage text one by one;
respectively calculating semantic difference values between each keyword list and a preset user behavior information word packet;
based on the size of the semantic difference value, selecting an optimal keyword list from the plurality of keyword lists;
and taking the network sub-behavior corresponding to the optimal keyword list as the optimal network sub-behavior in the webpage hierarchy.
Further, as a possible implementation manner, the step of extracting keywords meeting preset requirements from each web page text to form a plurality of keyword lists corresponding to each web page text one to one includes:
for each webpage text, performing word segmentation processing on the webpage text to obtain a plurality of keywords;
calculating the word weight of each keyword according to a preset word weight model;
and selecting a preset number of keywords with larger word weights from the plurality of keywords according to the word weights to form a keyword list.
Further, as a possible implementation manner, the step of respectively calculating a semantic difference value between each keyword list and a preset user behavior information word packet includes:
converting each keyword list into a plurality of first word vector matrixes in one-to-one correspondence, and converting the user behavior information word packet into a second word vector matrix;
and respectively calculating the distance between each first word vector matrix and each second word vector matrix to obtain a plurality of semantic difference values which are in one-to-one correspondence with each keyword list.
Further, as a possible implementation manner, the semantic difference value SD is:
Figure BDA0002394771290000021
wherein m is the number of the participles in the user behavior information word package, n is the number of the keywords in the keyword list, kwv represents a word vector in a first word vector matrix, v represents a word vector in a second word vector matrix, α is the contribution degree of the participles in the user behavior information word package, β represents the contribution degree of the keywords in the keyword list, α β is 1/i, and i is the index value of the current participle in the user behavior information word package.
Further, as a possible implementation manner, after the step of constructing the first knowledge enhancement path based on the preset network hierarchy relationship and the optimal network sub-behavior in each webpage hierarchy, the method further includes:
constructing a second knowledge enhancement path based on an optimal keyword list corresponding to each optimal network sub-behavior in the first knowledge enhancement path;
and/or
forming a knowledge enhancement word packet based on the keywords in the optimal keyword list included in the second knowledge enhancement path.
Further, as a possible implementation manner, the method further includes:
and if the number of the related webpage levels reaches a preset value when the network sub-behaviors are crawled, stopping a crawling process of the network sub-behaviors contained in the next webpage level, and executing the step of selecting the optimal network sub-behaviors from a plurality of network sub-behaviors corresponding to the webpage level aiming at each webpage level.
In a second aspect, an embodiment of the present application provides a network behavior knowledge enhancing apparatus, where the apparatus includes:
the webpage information crawling module is used for crawling webpage information corresponding to the user network behavior when the user network behavior is detected;
the network child behavior crawling module is used for sequentially crawling a plurality of network child behaviors included in each webpage hierarchy based on the current webpage hierarchy and according to a preset webpage hierarchy relation by taking the webpage hierarchy corresponding to the webpage information as the current webpage hierarchy;
the optimal sub-behavior selection module is used for selecting optimal network sub-behaviors from a plurality of network sub-behaviors corresponding to each webpage level;
and the enhanced path construction module is used for constructing a first knowledge enhanced path based on the preset network hierarchy relationship and the optimal network sub-behaviors in each webpage hierarchy.
In a third aspect, an embodiment of the present application provides an electronic device, including:
at least one processor;
at least one memory coupled to the processor;
wherein the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method as described above.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing computer instructions, which cause the computer to execute the method described above.
The embodiment of the application adopts at least one technical scheme which can achieve the following beneficial effects:
the method comprises the steps of using web page information corresponding to detected network behaviors of a user as a current network level, crawling network sub-behaviors contained in different web page levels, and constructing a first knowledge enhancement path based on a preset network level relation and optimal network sub-behaviors in each web page level so as to analyze behavior trends of the user in a network, and further effectively analyzing single and scattered network behaviors of the user.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic flow chart of a network behavior knowledge enhancement method according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a web page hierarchical relationship provided in the embodiment of the present application.
Fig. 3 is a block diagram of a network behavior knowledge enhancing apparatus according to an embodiment of the present application.
Fig. 4 is a block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Example one
Fig. 1 is a schematic flow chart of a network behavior knowledge enhancement method provided in the embodiment of the present application. The method may be executed by, but is not limited to, an electronic device, and specifically by hardware and/or software in the electronic device. Optionally, the electronic device may be, but is not limited to, a terminal such as a smartphone, a computer, a server, or a wearable device. Referring to fig. 1, the network behavior knowledge enhancement method provided by the present application may include the following steps.
S11, when the user network behavior is detected, crawling the webpage information corresponding to the user network behavior.
Optionally, the user network behavior may be, but is not limited to, a user input behavior initiated by the user based on the user interface, such as a website input behavior, a text input behavior, a picture input behavior, and the like, and the user network behavior may include a plurality of network sub-behaviors at the same time, such as a plurality of sub-websites may be included in the website input behavior, which is not limited in this embodiment. In addition, in practical implementation, webpage information corresponding to the user network behavior can be crawled by using a greedy crawler algorithm.
S12, taking the webpage hierarchy corresponding to the webpage information as the current webpage hierarchy, and sequentially crawling a plurality of network sub-behaviors included in each webpage hierarchy according to a preset webpage hierarchy relation based on the current webpage hierarchy.
Optionally, referring to fig. 2, if the user network behavior is a website W input by the user, the web page information W' corresponding to the website W is the first web page level. If the web page information W' further includes network sub-behaviors (web links) A1, A2, A3, ……, An, then the web page information A1', A2', A3', ……, An' respectively corresponding to the network sub-behaviors A1, A2, A3, ……, An is the second web page level. If the web page information A1' includes network sub-behaviors B1, B2, B3, ……, Bn, the web page information A2' includes network sub-behaviors C1, C2, C3, ……, Cn, and the web page information A3' includes network sub-behaviors D1, D2, D3, ……, Dn, then the web page information corresponding to B1, B2, B3, ……, Bn, C1, C2, C3, ……, Cn, and D1, D2, D3, ……, Dn is the third web page level, and so on, so as to obtain the hierarchical relationship among the web page information.
It should be noted that, when crawling the network sub-behaviors, one piece of web page information may include multiple network sub-behaviors or may include none, which is not limited in this embodiment. In addition, in the process of crawling the network sub-behaviors, a web page (corresponding to a network sub-behavior) that does not respond or returns an error can be directly removed, so that the "peeping" of the network behaviors of the user is completed.
Further, in order to improve the knowledge enhancement efficiency of the network behavior and avoid excessive enhancement of the network behavior in actual implementation, in this embodiment, if the number of web page levels (that is, the number of knowledge enhancements) involved in crawling the network sub-behavior reaches a preset value (for example, 10 levels or the like), the crawling process of the network sub-behavior included in the next web page level is stopped, and the step of selecting the optimal network sub-behavior from the plurality of network sub-behaviors corresponding to each web page level in S13 is performed.
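The level-by-level crawl with a preset level limit described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `crawl_levels` and the in-memory `links` map are hypothetical stand-ins for a real crawler, and a page that is absent from the map is treated as unresponsive and removed.

```python
from typing import Dict, List


def crawl_levels(start: str, links: Dict[str, List[str]], max_levels: int) -> List[List[str]]:
    """Group pages by web page level, starting from `start`.

    `links` (hypothetical) maps a page to the links (network sub-behaviors)
    found on it. Pages missing from the map are treated as unresponsive
    and are dropped, mirroring the removal of failed pages.
    """
    levels: List[List[str]] = [[start]]
    seen = {start}
    # Stop once the preset number of web page levels has been reached.
    while len(levels) < max_levels:
        next_level: List[str] = []
        for page in levels[-1]:
            for child in links.get(page, []):
                if child in links and child not in seen:
                    seen.add(child)
                    next_level.append(child)
        if not next_level:
            break  # no further sub-behaviors to crawl
        levels.append(next_level)
    return levels


if __name__ == "__main__":
    demo = {"W": ["A1", "A2"], "A1": ["B1"], "A2": [], "B1": []}
    print(crawl_levels("W", demo, max_levels=10))
```

A real crawler would replace the `links` lookup with an HTTP fetch and link extraction; the stopping condition on `max_levels` is the part that corresponds to the preset value (for example, 10 levels).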
S13, selecting the optimal network sub-behavior from the plurality of network sub-behaviors corresponding to each web page level.
In this embodiment, when the optimal network sub-behavior is selected, a semantic difference value between text information included in web page information corresponding to each network sub-behavior and a preset network user behavior information word packet is used as a selection basis, so that behavior information and a trend of the user network behavior can be introduced in a network behavior knowledge enhancement process, and the optimal network sub-behavior is obtained from each web page level for subsequent network behavior knowledge enhancement.
Optionally, in some implementations, the optimal network sub-behavior in each web page hierarchy described in S13 can be selected through the following steps S131 to S135, as follows.
S131, crawling the webpage texts corresponding to the network sub-behaviors respectively according to the network sub-behaviors corresponding to the webpage levels. For example, assuming that each network sub-behavior is A, B, C, the web page information corresponding to each network sub-behavior A, B, C may be crawled by a greedy crawler algorithm, and then the web page information is analyzed by a web page analysis tool, so as to obtain the web page sub-behaviors and web page texts included in each web page information.
S132, extracting keywords meeting preset requirements from each webpage text respectively to form a plurality of keyword lists corresponding to the webpage texts one by one. Alternatively, the aforementioned S132 may be implemented by the following S1321 to S1323, which are described below.
S1321, performing word segmentation processing on each webpage text to obtain a plurality of keywords.
S1322, calculating the word weight of each keyword according to the preset word weight model.
S1323, according to the word weight value, selecting a preset number of keywords with larger word weight values from the keywords to form a keyword list.
Illustratively, assume that the web page text is Text_i. A word segmentation tool such as, but not limited to, NLPIR may be used to segment Text_i into a plurality of keywords; the keywords are then sorted by the word weights computed with the word weight model, and a preset number (for example, 3) of top-ranked keywords are selected to form a keyword list KW = (KW_1, KW_2, KW_3, ……, KW_n).
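Steps S1321 to S1323 can be sketched as below. The embodiment does not specify the word weight model, so this sketch assumes plain relative term frequency as the weight; the function name `top_keywords` is hypothetical, and the input is assumed to be already segmented (for example, by NLPIR).

```python
from collections import Counter
from typing import List


def top_keywords(tokens: List[str], k: int = 3) -> List[str]:
    """Return the k highest-weighted keywords from segmented text.

    Assumption: the word weight is the relative term frequency. The
    embodiment only refers to "a preset word weight model".
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    weights = {word: count / total for word, count in counts.items()}
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(weights, key=lambda word: (-weights[word], word))
    return ranked[:k]


if __name__ == "__main__":
    print(top_keywords(["5g", "plan", "5g", "data", "plan", "5g", "price"]))
```

Any other weighting scheme can be substituted by changing how `weights` is computed; the ranking and truncation to a preset number stay the same.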
S133, respectively calculating semantic difference values between each keyword list and a preset user behavior information word packet.
The user behavior information word packet may be updated as crawling proceeds: each time a web page level has been crawled, the keywords in the optimal keyword list corresponding to the optimal network sub-behavior in that web page level are added to the preset user behavior information word packet. This gives the network behavior knowledge enhancement method provided by the application a self-learning function: the updated user behavior information word packet serves as the reference word meaning information when the optimal network sub-behavior is selected in the next web page level, so that the user information is continuously adjusted and the accuracy of the subsequent user network behavior knowledge enhancement result, such as the first knowledge enhancement path in S14, is improved.
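The self-learning update of the user behavior information word packet can be illustrated with a short sketch. The function name `update_word_packet` is hypothetical, and skipping keywords already present in the packet is an assumption made here; the embodiment does not say how duplicates are handled.

```python
from typing import List


def update_word_packet(packet: List[str], best_keywords: List[str]) -> List[str]:
    """Return a new word packet extended with the optimal keyword list.

    Assumption: keywords already in the packet are not added again,
    and the original order of the packet is preserved.
    """
    merged = list(packet)
    for keyword in best_keywords:
        if keyword not in merged:
            merged.append(keyword)
    return merged


if __name__ == "__main__":
    packet = ["mobile", "plan"]
    packet = update_word_packet(packet, ["5g", "plan", "data"])
    print(packet)
```

The updated packet would then be used as the reference when semantic difference values are computed for the next web page level.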
In addition, in some implementation manners, the user behavior information word package may also be set in advance according to interests, ages, and the like of the user, and this embodiment is not limited herein.
Optionally, as an optional implementation manner, the semantic difference value described in S133 may be implemented by S1331 and S1332, which are as follows.
S1331, converting each keyword list into a plurality of first word vector matrices corresponding to one another, and converting the user behavior information word packets into second word vector matrices.
When the word vector conversion is carried out, the word vector model may be, but is not limited to, a Skip-gram word vector model, which can be obtained with the open source tool word2vec by training on a Wikipedia Chinese data set. Taking the keyword list KW as an example, the keyword list KW may be converted by the word vector model into a first word vector matrix kwv = (kwv_{1,1}, kwv_{1,2}, kwv_{1,3}, ……, kwv_{n,200}), where n is the number of keywords in the keyword list KW and 200 is the preset dimension of the word vector model (the dimension may be set according to actual needs); that is, the word vector of each keyword may be represented as (v_1, v_2, v_3, …, v_200).
S1332, calculating distances between the first word vector matrices and the second word vector matrix, respectively, to obtain a plurality of semantic difference values corresponding to the keyword lists one to one. Optionally, in this embodiment, the semantic difference value SD may be calculated based on cosine distance. For example, assuming that the first word vector matrix is kwv = (kwv_{1,1}, kwv_{1,2}, kwv_{1,3}, ……, kwv_{n,200}) and the second word vector matrix is v = (v_{1,1}, v_{1,2}, v_{1,3}, ……, v_{m,200}), the semantic difference value SD is:
Figure BDA0002394771290000071
wherein m is the number of the participles in the user behavior information word package, n is the number of the keywords in the keyword list, kwv represents a word vector in the first word vector matrix, v represents a word vector in the second word vector matrix, α is the contribution degree of the participles in the user behavior information word package, β represents the contribution degree of the keywords in the keyword list, α β is 1/i, and i is the index value of the current participle in the user behavior information word package.
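A cosine-distance-based difference between the two word vector matrices can be sketched as follows. This is explicitly not the formula of the application, which is given only as a figure: the sketch assumes an unweighted mean of pairwise cosine distances and omits the contribution degrees α and β. The names `cosine_distance` and `semantic_difference` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Cosine distance 1 - cos(a, b); a zero vector is treated as distance 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def semantic_difference(kwv: List[Vector], v: List[Vector]) -> float:
    """Mean pairwise cosine distance between keyword vectors and packet vectors.

    Assumption: every (packet word, keyword) pair contributes equally;
    the weights alpha and beta of the original formula are not modelled.
    """
    if not kwv or not v:
        raise ValueError("both word vector matrices must be non-empty")
    total = sum(cosine_distance(k, p) for p in v for k in kwv)
    return total / (len(kwv) * len(v))


if __name__ == "__main__":
    print(semantic_difference([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]))
```

With trained 200-dimensional word2vec vectors, the rows of `kwv` would be the n keyword vectors and the rows of `v` the m word packet vectors.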
S134, based on the size of the semantic difference value, selecting an optimal keyword list from the multiple keyword lists.
S135, taking the network sub-behavior corresponding to the optimal keyword list as the optimal network sub-behavior in the webpage hierarchy.
In the present application, the selection of the crawler path (that is, the selection of network sub-behaviors) is guided by the calculated semantic difference, which effectively improves the efficiency of network behavior knowledge enhancement and the reliability of the result.
It should be noted that, when the optimal network sub-behavior included in each web page level is selected, the optimal network sub-behavior may be selected immediately after the crawling of the network sub-behavior of one web page level is completed; or after crawling the network sub-behaviors in all the web page levels is completed, uniformly selecting the optimal network sub-behaviors in each web page level, which is not limited in this embodiment.
S14, constructing a first knowledge enhancement path based on the preset network hierarchy relation and the optimal network sub-behaviors in each webpage hierarchy.
Illustratively, referring again to fig. 2, assuming that W' is the optimal network sub-behavior in the first web page level, A2' is the optimal network sub-behavior in the second web page level, and D3' is the optimal network sub-behavior in the third web page level, then the first knowledge enhancement path described in S14 is W' → A2' → D3'.
Further, in some implementations, in addition to the network behavior path in S14 described above, network behavior knowledge enhancement may also be performed in the word dimension. For example, after S14 the method may further include: constructing a second knowledge enhancement path based on the preset network hierarchy relation and the optimal keyword list corresponding to each web page level; and/or forming a knowledge enhancement word packet based on the keywords in the optimal keyword list corresponding to each web page level, so as to enrich the knowledge of the word meaning content and subject of the user network behavior. In addition, the web page text corresponding to each optimal network sub-behavior can be extracted, and LDA topic modeling can be performed to extract topic information and the like.
Based on the foregoing description, the following describes an implementation flow of the network behavior knowledge enhancement method provided by the present application, taking selection of an optimal path in a web page hierarchy as an example, and the content is as follows.
(1) Obtain the user network behavior Beh_i of the user at time t, where Beh_i represents the i-th user network behavior, that is, the original network input entered by the user.
(2) Based on the user network behavior Beh_i, construct a user candidate behavior list CUBL = {<Beh_1>, <Beh_2>, <Beh_3>, ……, <Beh_p>}.
(3) Use a greedy crawler algorithm to crawl the web page information corresponding to each user network behavior in the user candidate behavior list CUBL = {<Beh_1>, <Beh_2>, <Beh_3>, ……, <Beh_p>}, and extract from the web page information the web page text Text_i and the network sub-behavior list SublBeh_i (including a plurality of network sub-behaviors).
(4) Based on the user network behavior Beh_i, the web page text Text_i, and the network sub-behavior list SublBeh_i, construct a triple list of user network behaviors TerBeh = {<Beh_1, Text_1, SublBeh_1>, <Beh_2, Text_2, SublBeh_2>, <Beh_3, Text_3, SublBeh_3>, ……, <Beh_p, Text_p, SublBeh_p>}.
(5) For each triple <Beh_i, Text_i, SublBeh_i> in the triple list of user network behaviors, extract the web page text Text_i, construct a keyword list KW = (KW_1, KW_2, KW_3, …, KW_n) corresponding one to one to each triple, and calculate the semantic difference value SD_i between each keyword list KW_i and the user behavior information word packet Bbw.
(6) Based on the triple list of user network behaviors, the keyword lists KW, and the semantic difference values SD_i, construct a quintuple sequence QuiBeh = {<Beh_1, Text_1, SublBeh_1, SD_1, KW_1>, <Beh_2, Text_2, SublBeh_2, SD_2, KW_2>, <Beh_3, Text_3, SublBeh_3, SD_3, KW_3>, ……, <Beh_p, Text_p, SublBeh_p, SD_p, KW_p>}.
(7) According to the semantic difference value SD_i included in each quintuple, sort the quintuples in the quintuple sequence by priority from small to large to obtain a candidate path quintuple priority queue QuiBehsort = (Q_1, Q_2, Q_3, Q_4, …, Q_p), where Q_i represents the candidate path quintuple with index i after sorting.
(8) Based on the candidate path quintuple priority queue QuiBehsort = (Q_1, Q_2, Q_3, Q_4, …, Q_p), select the optimal network sub-behavior (optimal path) under the current web page level as follows:
Judge whether the current knowledge enhancement quantity Scur (the number of web page levels) reaches the preset value; if so, stop path selection and execute (9).
If not, examine Q_1 in the candidate path quintuple priority queue (where the contained semantic difference value SD is the largest). If the sub-behavior list SublBehseq included in Q_1 is not empty, take the user network behavior corresponding to the sub-behavior list SublBehseq as the optimal path, and repeat (2)-(8) on the network sub-behaviors included in SublBehseq to select the optimal path in the next web page level, until the knowledge enhancement quantity Scur reaches the preset value; then execute (9).
Alternatively, in some implementations, if the sub-behavior list SublBehseq included in Q_1 is empty, stop path selection and execute (9).
Or, if the sub-behavior list SublBehseq included in Q_1 is empty, backtrack to select Q_2 in the candidate path quintuple priority queue for analysis, and so on, until the optimal path in the current web page level and the optimal path in the next web page level are selected; when the knowledge enhancement quantity Scur reaches the preset value, execute (9).
It should be noted that if the optimal path in the current web page level is still not found after all quintuples in the candidate path quintuple priority queue have been traversed by backtracking, path selection is stopped and (9) is executed.
(9) Construct a first knowledge enhancement path (namely, a user word sense selection path) according to the optimal path in each web page level, completing the knowledge enhancement of the user network behavior trend. Construct a second knowledge enhancement path from the optimally selected keyword lists KWop in the quintuples, completing the knowledge enrichment of the word meaning content and subject of the user network behavior. In addition, the web page text Textop in each optimally selected quintuple can be extracted, and LDA topic modeling can be performed to extract topic information.
It should be noted that the network behavior knowledge enhancement flow given in the foregoing (1)-(9) is only one possible implementation of the technical solution given in this embodiment, and this embodiment does not limit this. Likewise, the pairs, triples, quintuples, and the like are only a description manner adopted to facilitate understanding, and do not limit the technical solution given in this application.
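The path selection of steps (7) and (8), including the backtracking variant, can be sketched as follows. This is an illustration under stated assumptions only: the `Candidate` class, the `expand` callable (a stand-in for crawling a behavior and computing its SD), and the function name `build_path` are hypothetical. The sketch sorts candidates from small to large SD as in step (7); because the preferred ordering may differ between implementations, a `prefer_largest` flag is exposed to reverse it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Candidate:
    """Reduced quintuple: behavior, semantic difference, sub-behavior list."""
    behavior: str
    sd: float
    sub_behaviors: List[str] = field(default_factory=list)


def build_path(
    start: List[str],
    expand: Callable[[str], Candidate],
    max_levels: int,
    prefer_largest: bool = False,
) -> List[str]:
    """Greedily pick one behavior per web page level, with backtracking."""
    path: List[str] = []
    candidates = start
    # Scur corresponds to len(path); stop at the preset number of levels.
    while candidates and len(path) < max_levels:
        queue = sorted(
            (expand(b) for b in candidates),
            key=lambda c: c.sd,
            reverse=prefer_largest,
        )
        # Backtracking: skip queue heads whose sub-behavior list is empty.
        chosen = next((c for c in queue if c.sub_behaviors), None)
        if chosen is None:
            break  # no candidate leads further; stop path selection
        path.append(chosen.behavior)
        candidates = chosen.sub_behaviors
    return path


if __name__ == "__main__":
    pages = {
        "W": Candidate("W", 0.1, ["A1", "A2"]),
        "A1": Candidate("A1", 0.2, []),
        "A2": Candidate("A2", 0.5, ["D1"]),
        "D1": Candidate("D1", 0.3, []),
    }
    print(build_path(["W"], pages.__getitem__, max_levels=10))
```

In this sketch a behavior without sub-behaviors is never selected, which follows the rule that path selection stops when no quintuple in the queue has a non-empty sub-behavior list.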
Further, as can be seen from the network behavior knowledge enhancement method provided in the foregoing, the technical solution provided in the present application has at least the following technical effects.
When knowledge enhancement is performed on a single user network behavior, network sub-behaviors are selected with a greedy algorithm: by combining the word meaning information in the webpage texts with the semantic difference values, an optimal network sub-behavior is selected from the candidate network sub-behaviors contained in each webpage level to construct the first knowledge enhancement path. In this way, the user's candidate behavior information can be effectively mined and more effective semantic information is provided for path selection. Meanwhile, by referring to the semantic information, the trend of the user network behavior can be simulated through self-learning, which helps ensure the effectiveness of the enhanced knowledge.
In other words, for a single network behavior of the user, knowledge expansion (network behavior knowledge enhancement) is performed in the word dimension, the path dimension, and the text dimension. This provides a basis for analyzing the user's network behavior and addresses the difficulty that traditional user network behavior analysis has with a single behavior or with sparse knowledge. Meanwhile, compared with traditional user network behavior knowledge enhancement approaches that require massive data, the present application only needs to analyze the webpage text of the current path level, so it is more efficient and can also eliminate interference from a large amount of useless knowledge, thereby ensuring the effectiveness of the enhanced knowledge.
Example two
Fig. 3 is a block diagram illustrating a network behavior knowledge enhancement apparatus 100 according to an exemplary embodiment, where the network behavior knowledge enhancement apparatus 100 is applicable to an electronic device. Referring to fig. 3, the network behavior knowledge enhancing apparatus 100 includes a web page information crawling module 110, a network child behavior crawling module 120, an optimal child behavior selecting module 130, and an enhanced path constructing module 140.
The webpage information crawling module 110 is configured to crawl webpage information corresponding to the user network behavior when the user network behavior is detected;
the network child behavior crawling module 120 is configured to take a webpage hierarchy corresponding to the webpage information as a current webpage hierarchy, and crawl a plurality of network child behaviors included in each webpage hierarchy in sequence according to a preset webpage hierarchy relationship based on the current webpage hierarchy;
an optimal sub-behavior selection module 130, configured to select an optimal network sub-behavior from multiple network sub-behaviors corresponding to each web page level;
and the enhanced path construction module 140 is configured to construct a first knowledge enhanced path based on a preset network hierarchy relationship and the optimal network sub-behavior in each webpage hierarchy.
The specific manner in which each module of the apparatus 100 in this embodiment performs its operations has been described in detail in the method embodiment and is not elaborated here. For example, for a detailed description of the webpage information crawling module 110, refer to the description of S11 in the first embodiment; for the network child behavior crawling module 120, refer to the description of S12 in the first embodiment; and so on.
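The cooperation of modules 110 to 140 can be pictured with the following sketch. It is only an assumed composition for illustration: the class name, the callable signatures, and the toy lambdas are hypothetical, and each callable stands in for a module whose real behavior is described in the method embodiment.

```python
from typing import Callable, List


class NetworkBehaviorKnowledgeEnhancer:
    """Hypothetical composition of modules 110-140; callables are placeholders."""

    def __init__(
        self,
        crawl_page_info: Callable[[str], str],                  # module 110
        crawl_sub_behaviors: Callable[[str], List[List[str]]],  # module 120
        select_optimal: Callable[[List[str]], str],             # module 130
    ) -> None:
        self.crawl_page_info = crawl_page_info
        self.crawl_sub_behaviors = crawl_sub_behaviors
        self.select_optimal = select_optimal

    def enhance(self, user_behavior: str) -> List[str]:
        """Return a first knowledge enhancement path for one user behavior."""
        page_info = self.crawl_page_info(user_behavior)
        # One list of candidate sub-behaviors per webpage level, in level order.
        levels = self.crawl_sub_behaviors(page_info)
        # Module 140: chain the per-level optima in level order.
        return [self.select_optimal(level) for level in levels if level]


# Toy wiring: fixed candidate levels and an alphabetical "optimum".
enhancer = NetworkBehaviorKnowledgeEnhancer(
    crawl_page_info=lambda behavior: f"page:{behavior}",
    crawl_sub_behaviors=lambda page: [["a1", "a2"], ["b1"]],
    select_optimal=lambda level: sorted(level)[-1],
)
first_path = enhancer.enhance("click")
print(first_path)
```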
Example three
Referring to fig. 4, a block diagram of an electronic device 10 according to an exemplary embodiment is provided, where the electronic device 10 may at least include a processor 11 and a memory 12 for storing instructions executable by the processor 11. Wherein the processor 11 is configured to execute the instructions to implement all or part of the steps of the network behavior knowledge enhancement method as in the above embodiments.
The processor 11 and the memory 12 are electrically connected directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
Wherein the processor 11 is adapted to read/write data or programs stored in the memory and to perform corresponding functions.
The memory 12 is used to store programs or data, such as instructions executable by the processor 11. The memory 12 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
Further, as a possible implementation, the electronic device 10 may also include power components, multimedia components, audio components, input/output (I/O) interfaces, sensor components, and communication components, among others.
The power supply components provide power to the various components of the electronic device 10. The power components may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 10.
The multimedia components include a screen that provides an output interface between the electronic device 10 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the electronic device 10 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component is configured to output and/or input an audio signal. For example, the audio component may include a Microphone (MIC) configured to receive an external audio signal when the electronic device 10 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 12 or transmitted via the communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals.
The I/O interface provides an interface between the processing component and a peripheral interface module, which may be a keyboard, click wheel, button, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly includes one or more sensors for providing various aspects of status assessment for the electronic device 10. For example, the sensor assembly may detect an open/closed state of the electronic device 10 and the relative positioning of components, such as the display and keypad of the electronic device 10. The sensor assembly may also detect a change in the position of the electronic device 10 or of a component of the electronic device 10, the presence or absence of user contact with the electronic device 10, the orientation or acceleration/deceleration of the electronic device 10, and a change in the temperature of the electronic device 10. The sensor assembly may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component is configured to facilitate wired or wireless communication between the electronic device 10 and other devices. The electronic device 10 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 10 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
It should be understood that the configuration shown in fig. 4 is merely a schematic diagram of the configuration of the electronic device 10, and that the electronic device 10 may include more or fewer components than shown in fig. 4, or have a different configuration than shown in fig. 4. The components shown in fig. 4 may be implemented in hardware, software, or a combination thereof.
Example four
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 12 comprising instructions, executable by the processor 11 of the electronic device 10 to perform the network behavior knowledge enhancement method described above is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for enhancing knowledge of network behavior, the method comprising:
when a user network behavior is detected, crawling webpage information corresponding to the user network behavior;
taking a webpage hierarchy corresponding to the webpage information as a current webpage hierarchy, and sequentially crawling a plurality of network sub-behaviors included in each webpage hierarchy based on the current webpage hierarchy and according to a preset webpage hierarchy relation;
respectively selecting an optimal network sub-behavior from a plurality of network sub-behaviors corresponding to each webpage level;
and constructing a first knowledge enhancement path based on the preset network hierarchy relation and the optimal network sub-behaviors in each webpage hierarchy.
2. The method for enhancing knowledge of network behaviors of claim 1, wherein the step of selecting the optimal network sub-behavior from the plurality of network sub-behaviors corresponding to each web page hierarchy comprises:
respectively crawling a webpage text corresponding to each network sub-behavior aiming at a plurality of network sub-behaviors corresponding to each webpage hierarchy;
extracting keywords meeting preset requirements from each webpage text to form a plurality of keyword lists corresponding to each webpage text one by one;
respectively calculating semantic difference values between each keyword list and a preset user behavior information word packet;
based on the size of the semantic difference value, selecting an optimal keyword list from the plurality of keyword lists;
and taking the network sub-behavior corresponding to the optimal keyword list as the optimal network sub-behavior in the webpage hierarchy.
3. The method for enhancing knowledge of network behaviors of claim 2, wherein the step of extracting keywords satisfying a preset requirement from each web page text to form a plurality of keyword lists corresponding to each web page text one to one comprises:
for each webpage text, performing word segmentation processing on the webpage text to obtain a plurality of keywords;
calculating the word weight of each keyword according to a preset word weight model;
and selecting a preset number of keywords with larger word weights from the plurality of keywords according to the word weights to form a keyword list.
4. The method of claim 2, wherein the step of calculating semantic difference values between the keyword lists and the predetermined user behavior information word packets respectively comprises:
converting each keyword list into a plurality of first word vector matrixes in one-to-one correspondence, and converting the user behavior information word packet into a second word vector matrix;
and respectively calculating the distance between each first word vector matrix and each second word vector matrix to obtain a plurality of semantic difference values which are in one-to-one correspondence with each keyword list.
5. The method according to claim 4, wherein the semantic difference value SD is:
Figure FDA0002394771280000021
wherein m is the number of segmented words in the user behavior information word packet, n is the number of keywords in the keyword list, kwv represents a word vector in the first word vector matrix, v represents a word vector in the second word vector matrix, α is the contribution degree of a segmented word in the user behavior information word packet, β represents the contribution degree of a keyword in the keyword list, α β is 1/i, and i is the index value of the current segmented word in the user behavior information word packet.
6. The method of claim 2, wherein after the step of constructing the first knowledge enhancement path based on the preset network hierarchy relationship and the optimal network sub-behavior in each web page hierarchy, the method further comprises:
constructing a second knowledge enhancement path based on an optimal keyword list corresponding to each optimal network sub-behavior in the first knowledge enhancement path;
and forming a knowledge enhancement word packet based on the keywords in the optimal keyword list included in the second knowledge enhancement path.
7. The network behavior knowledge enhancement method of claim 2, further comprising:
and if the number of the related webpage levels reaches a preset value when the network sub-behaviors are crawled, stopping a crawling process of the network sub-behaviors contained in the next webpage level, and executing the step of selecting the optimal network sub-behaviors from a plurality of network sub-behaviors corresponding to the webpage level aiming at each webpage level.
8. A network behavior knowledge enhancement apparatus, the apparatus comprising:
the webpage information crawling module is used for crawling webpage information corresponding to the user network behavior when the user network behavior is detected;
the network child behavior crawling module is used for sequentially crawling a plurality of network child behaviors included in each webpage hierarchy based on the current webpage hierarchy and according to a preset webpage hierarchy relation by taking the webpage hierarchy corresponding to the webpage information as the current webpage hierarchy;
the optimal sub-behavior selection module is used for selecting optimal network sub-behaviors from a plurality of network sub-behaviors corresponding to each webpage level;
and the enhanced path construction module is used for constructing a first knowledge enhanced path based on the preset network hierarchy relationship and the optimal network sub-behaviors in each webpage hierarchy.
9. An electronic device, comprising:
at least one processor;
at least one memory coupled to the processor;
wherein the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 7.
10. A computer-readable storage medium, wherein the storage medium stores computer instructions that cause the computer to perform the method of any one of claims 1 to 7.
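The selection steps recited in claims 2 to 4 can be illustrated with the following sketch. It rests on several labeled assumptions: whitespace tokenization stands in for word segmentation, term frequency stands in for the preset word weight model, the toy `vectors` table stands in for a trained word vector model, and the mean pairwise Euclidean distance is a stand-in distance, not the formula of claim 5. Choosing the largest semantic difference value follows the description, where the five-tuple with the largest SD is processed first.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def top_keywords(text: str, k: int) -> List[str]:
    """Keep the k highest-weighted words (weight = term frequency here)."""
    counts = Counter(text.lower().split())
    return [word for word, _ in counts.most_common(k)]


def semantic_difference(keywords: List[str],
                        word_packet: List[str],
                        vectors: Dict[str, Vector]) -> float:
    """Mean pairwise Euclidean distance between the two sets of word vectors."""
    pairs = [(vectors[kw], vectors[w])
             for kw in keywords for w in word_packet
             if kw in vectors and w in vectors]
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def select_optimal(texts: Dict[str, str],
                   word_packet: List[str],
                   vectors: Dict[str, Vector],
                   k: int = 2) -> Tuple[str, List[str], float]:
    """Return (optimal sub-behavior, its keyword list, its SD)."""
    scored = {}
    for behavior, text in texts.items():
        keywords = top_keywords(text, k)
        scored[behavior] = (semantic_difference(keywords, word_packet, vectors),
                            keywords)
    best = max(scored, key=lambda b: scored[b][0])
    return best, scored[best][1], scored[best][0]


# Toy word vectors and webpage texts (illustrative values only).
vectors = {"phone": (0.0, 0.0), "plan": (0.0, 1.0),
           "data": (1.0, 0.0), "music": (3.0, 4.0)}
texts = {"/plans": "plan data plan", "/music": "music music data"}
best, best_keywords, best_sd = select_optimal(texts, ["phone"], vectors)
print(best, best_keywords, best_sd)
```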
CN202010127236.7A 2020-02-28 2020-02-28 Network behavior knowledge enhancement method and device and electronic equipment Active CN113326411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010127236.7A CN113326411B (en) 2020-02-28 2020-02-28 Network behavior knowledge enhancement method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010127236.7A CN113326411B (en) 2020-02-28 2020-02-28 Network behavior knowledge enhancement method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113326411A (en) 2021-08-31
CN113326411B (en) 2024-05-03

Family

ID=77412577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010127236.7A Active CN113326411B (en) 2020-02-28 2020-02-28 Network behavior knowledge enhancement method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113326411B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102119389A (en) * 2008-06-11 2011-07-06 微软公司 Automatic image annotation using semantic distance learning
CN103186676A (en) * 2013-04-08 2013-07-03 湖南农业大学 Method for searching thematic knowledge self growth form focused crawlers
CN103970729A (en) * 2014-04-29 2014-08-06 河海大学 Multi-subject extracting method based on semantic categories
CN103970730A (en) * 2014-04-29 2014-08-06 河海大学 Method for extracting multiple subject terms from single Chinese text
CN104331394A (en) * 2014-08-29 2015-02-04 南通大学 Text classification method based on viewpoint
CN108154395A (en) * 2017-12-26 2018-06-12 上海新炬网络技术有限公司 A kind of customer network behavior portrait method based on big data
KR20190047939A (en) * 2017-10-30 2019-05-09 한림대학교 산학협력단 Method and apparatus for collecting and analyzing text data for crawling text data
CN109740091A (en) * 2018-12-26 2019-05-10 武汉大学 A kind of forecasting system and method for the user network behavior of Behavior-based control cognition

Also Published As

Publication number Publication date
CN113326411B (en) 2024-05-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant