CN111552857B - Feature event identification method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN111552857B
Authority
CN
China
Prior art keywords
target data
internet channel
event
internet
identifying
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010373434.1A
Other languages
Chinese (zh)
Other versions
CN111552857A (en
Inventor
刘小刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010373434.1A priority Critical patent/CN111552857B/en
Publication of CN111552857A publication Critical patent/CN111552857A/en
Application granted granted Critical
Publication of CN111552857B publication Critical patent/CN111552857B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9536Search customisation based on social or collaborative filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

One or more embodiments of the present disclosure provide a method and apparatus for identifying a feature event, an electronic device, and a storage medium. The method comprises the following steps: respectively determining the grade of each internet channel based on the association degree of the internet channel and the characteristic event, wherein the grade is positively correlated with the association degree; distributing acquisition resources to each Internet channel, and respectively acquiring target data from each Internet channel through the distributed acquisition resources; wherein, the resource quantity of the distributed acquisition resources is positively correlated with the grade corresponding to each Internet channel; and analyzing the collected target data to identify characteristic events contained in the target data.

Description

Feature event identification method and device, electronic equipment and storage medium
Technical Field
One or more embodiments of the present disclosure relate to the field of network information processing technologies, and in particular, to a method and apparatus for identifying a feature event, an electronic device, and a storage medium.
Background
With the development of the internet, more and more users spread information online, and the management of internet information security needs to be strengthened. At present, internet information is generally managed by manually browsing pages, collecting pages, and marking them, which incurs a large labor cost.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure provide a method and apparatus for identifying a feature event, an electronic device, and a storage medium.
In order to achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
according to a first aspect of one or more embodiments of the present disclosure, a method for identifying a feature event is provided, including:
respectively determining the grade of each internet channel based on the association degree of the internet channel and the characteristic event, wherein the grade is positively correlated with the association degree;
distributing acquisition resources to each Internet channel, and respectively acquiring target data from each Internet channel through the distributed acquisition resources; wherein, the resource quantity of the distributed acquisition resources is positively correlated with the grade corresponding to each Internet channel;
and analyzing the collected target data to identify characteristic events contained in the target data.
According to a second aspect of one or more embodiments of the present specification, there is provided an identification device for a characteristic event, including:
the determining module is used for respectively determining the grade of each internet channel based on the association degree of the internet channel and the characteristic event, and the grade is positively correlated with the association degree;
the distribution module is used for distributing acquisition resources to each Internet channel and respectively acquiring target data from each Internet channel through the distributed acquisition resources; wherein, the resource quantity of the distributed acquisition resources is positively correlated with the grade corresponding to each Internet channel;
and the identification module is used for identifying the characteristic event contained in the target data by analyzing the collected target data.
According to a third aspect of one or more embodiments of the present specification, there is provided an electronic device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of identifying a characteristic event as described in any preceding claim by executing the executable instructions.
According to a fourth aspect of one or more embodiments of the present description, a computer-readable storage medium is provided, on which computer instructions are stored, which instructions, when executed by a processor, implement the steps of the method for identifying a characteristic event according to any of the above.
Drawings
Fig. 1 is a schematic diagram of a system architecture for implementing feature event recognition as shown in an exemplary embodiment of the present description.
Fig. 2 is a flowchart illustrating a method for identifying a feature event according to an exemplary embodiment of the present disclosure.
Fig. 3 is a flowchart illustrating another method for identifying a characteristic event according to an exemplary embodiment of the present disclosure.
Fig. 4 is a flowchart illustrating another method for identifying a characteristic event according to an exemplary embodiment of the present disclosure.
Fig. 5 is a schematic structural view of an apparatus shown in an exemplary embodiment of the present specification.
Fig. 6 is a schematic block diagram of an apparatus for identifying a characteristic event according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.
It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.
In an embodiment, the feature event identification scheme of the present disclosure may be applied to an electronic device. For example, the electronic device may be any type of mobile phone, tablet device, notebook computer, palmtop computer (PDA, Personal Digital Assistant), or wearable device (such as smart glasses or a smart watch), and the present disclosure is not limited thereto.
Fig. 1 is a schematic diagram of a system architecture for implementing feature event recognition according to an exemplary embodiment of the present disclosure. As shown in fig. 1, the system may include a server 11, a network 12, a number of electronic devices, such as a cell phone 13, a cell phone 14, a cell phone 15, and the like.
The server 11 may be a physical server comprising a separate host, or the server 11 may be a virtual server carried by a cluster of hosts. During the running process, the server 11 may run a program on the server side of a certain application to implement the relevant service functions of the application, for example, when the server 11 runs a program of the identification application of the feature event, it may be implemented as a server of the identification application of the feature event.
It should be noted that the mobile phones 13-15 are only one type of electronic device that the user can use. The user can obviously also use other types of electronic devices, such as tablet devices, notebook computers, palmtop computers (PDAs, Personal Digital Assistants), and wearable devices (e.g., smart glasses, smart watches), and one or more embodiments of the present description are not limited in this regard. The network 12 through which the mobile phones 13-15 interact with the server 11 may comprise various types of wired or wireless networks. In one embodiment, the network 12 may include the public switched telephone network (PSTN) and the internet.
Fig. 2 is a flowchart illustrating a method for identifying a feature event according to an exemplary embodiment of the present disclosure. As shown in fig. 2, the method may include the steps of:
Step 202, determining the grade of each internet channel based on the association degree of the internet channel and the feature event.
The internet channels may include, but are not limited to: forums, official websites, apps (applications), microblogs, WeChat mini programs, official accounts, and the like. In this embodiment, the status of the internet channel is not limited: it may be a channel that has completed domain name registration and has filed for Internet information services, one that has not filed for Internet information services, or one whose filing has expired.
The feature event is an internet event to be monitored. For example, each instance in which an internet channel publishes information containing preset target content is called a feature event. The target content can be set according to actual requirements and may be sensitive content, negative content, and the like.
In step 202, the grade of the internet channel is positively correlated with the association degree: the higher the association degree between the internet channel and the feature event, the higher the grade of the internet channel; the lower the association degree, the lower the grade.
In one embodiment, the degree of association between the internet channels and the feature event may be determined according to historical empirical data, so as to rank each internet channel. Taking the example that the characteristic event is the release sensitive information, if the probability of releasing the sensitive information on the microblog (the internet channel) is larger than the probability of releasing the sensitive information on the forum (the internet channel) according to the historical experience data, the association degree of the microblog and the characteristic event is higher than the association degree of the forum and the characteristic event, and the microblog grade can be determined to be higher than the forum grade.
In another embodiment, target data may first be collected from each internet channel, and the initially collected target data analyzed, for example by checking whether it contains preset keywords, to determine the association degree of the internet channel and the feature event and thereby the grade of the internet channel. The initial collection of target data may be implemented by, but is not limited to, open data downloads, API interfaces, web crawlers, and the like.
It can be understood that the page content presented on an internet channel is updated from time to time, so whether the internet channel generates a feature event cannot be judged accurately from target data collected only once. The following steps are therefore needed to collect the target data of the internet channel cyclically and analyze it to determine whether the internet channel generates a feature event.
Step 204, distributing acquisition resources to each internet channel, and respectively acquiring target data from each internet channel through the distributed acquisition resources.
The resource quantity of the acquisition resources allocated to each Internet channel is positively correlated with the grade corresponding to each Internet channel. Acquisition resources may be, but are not limited to, storage resources, computing resources, IP resources, etc. used in data acquisition.
It can be understood that feature events on the internet break out frequently, spread quickly, reach a wide audience, and take various forms. The wider the acquisition range and the higher the acquisition frequency for an internet channel's target data, the more accurately and promptly feature events can be identified, but this presupposes that enough acquisition resources are allocated.
However, when acquisition resources are limited, sufficient acquisition resources cannot be allocated to every internet channel. In step 204, each internet channel is allocated an amount of acquisition resources corresponding to its grade. A higher-grade internet channel can be allocated more acquisition resources, so that its target data can be collected at a higher acquisition frequency (that is, a shorter acquisition period), which realizes focused monitoring of higher-grade internet channels. The acquisition resources may be used to collect, but are not limited to, text type, picture type, video type, and voice type target data of the internet channel.
Step 206, analyzing the collected target data to identify the characteristic event contained in the target data.
In step 206, analyzing the target data may consist of determining whether the target data contains a preset keyword; when it does, it is determined that the internet channel may have generated a feature event. The preset keywords are keywords related to the feature event and can be set according to actual requirements. For example, where the feature event is the release of sensitive information containing illegal financial activity content, the keywords may be, but are not limited to, "high interest credits," "high interest," and the like.
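The keyword check described for step 206 can be sketched as follows. This is a minimal illustration only: the function name, the case-insensitive substring matching, and the sample texts are assumptions, and the two keywords are the translated examples given above.

```python
# Hypothetical preset keywords (the two examples named in the text).
PRESET_KEYWORDS = ["high interest credits", "high interest"]


def contains_preset_keyword(target_text, keywords=PRESET_KEYWORDS):
    """Return True if any preset keyword occurs in the collected text.

    Matching is a plain case-insensitive substring test, which is an
    assumption; the source does not specify the matching method.
    """
    text = target_text.lower()
    return any(keyword.lower() in text for keyword in keywords)


if __name__ == "__main__":
    print(contains_preset_keyword("Guaranteed HIGH INTEREST returns"))
    print(contains_preset_keyword("Weather forecast for tomorrow"))
```

A positive result only indicates that the channel may have generated a feature event; further analysis would still be needed to confirm it.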
Because internet channels of different grades are allocated different amounts of acquisition resources, target data can be collected from them with different acquisition periods (or acquisition frequencies) and acquisition ranges. The acquisition periods of channels of different grades are therefore staggered, which means that full collection of target data is not performed on all internet channels at the same time, and the amount of target data collected in any time period or at any moment does not become too large. The collected target data can thus be analyzed in real time in step 206, and feature events on the internet channels can be identified promptly. Full collection means collecting all data provided by the internet channel, including the page data of the home page, the page data of sub-pages, and so on.
In this embodiment, the amount of acquisition resources allocated to an internet channel matches the channel's grade. A higher-grade channel has a higher association degree with the feature event, so it is allocated more acquisition resources and is collected and monitored with priority; its acquisition frequency and range are increased so that feature events arising on the channel at any moment can be captured. A lower-grade channel has a lower association degree with the feature event and is allocated fewer acquisition resources. In this way, a large amount of useful data is collected and the utility of the acquisition results is maximized within the existing acquisition resource budget.
In another embodiment, if the target data is identified as containing a feature event, early warning information may also be sent to a preset alert object so that the alert object can respond to and handle the feature event in time. The early warning information may include the reason the feature event was generated, the domain name of the website that generated it, the publisher and/or registrar of the website, the web content that generated it, and the like. The early warning information can be sent to the preset alert object by short message, mail, or other means, so that feature events are discovered and warned of early. A user interface for viewing feature events can also be provided to the preset alert object, and feature events can be prompted through a page popup window. The preset alert object may also use the user interface to set keywords for the feature event.
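As a rough illustration of the early warning information listed above, the sketch below bundles the named fields into one record and renders a message body. The class name, field names, message wording, and the example.com values are all hypothetical; delivery by short message or mail is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class EarlyWarning:
    """Hypothetical container for the early warning fields named in the text."""
    reason: str      # why the feature event was flagged
    domain: str      # domain name of the website that generated the event
    publisher: str   # publisher and/or registrar of the website
    content: str     # web content that triggered the event

    def to_message(self):
        """Render a plain-text body suitable for a short message or mail."""
        return (
            f"Feature event detected on {self.domain}. "
            f"Reason: {self.reason}. "
            f"Publisher/registrar: {self.publisher}. "
            f"Content: {self.content}"
        )


if __name__ == "__main__":
    warning = EarlyWarning(
        reason="matched preset keyword",
        domain="example.com",
        publisher="unknown",
        content="sample page text",
    )
    print(warning.to_message())
```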
In another embodiment, the grades of the internet channels may also be updated based on the target data collected in step 204. The grades determined in step 202 are preliminary, and the association degree between an internet channel and the feature event changes as the channel's data is updated. To improve the accuracy of feature event identification, the grades of the internet channels can therefore be adjusted according to the target data collected in step 204.
If the target data of the internet channel is collected periodically in step 204, the target data collected in each acquisition period can be analyzed, once collection for that period is complete, and the grade of the internet channel adjusted accordingly. Cyclically collecting target data and repeatedly correcting the grades of the internet channels in this way improves the accuracy of the grading, supports a reasonable allocation of acquisition resources, and improves the accuracy of feature event identification.
Fig. 3 is a flowchart of another method for identifying a feature event according to an exemplary embodiment of the present disclosure. In this embodiment, target data is initially collected from the internet channels, and the grade of each internet channel is determined from the initially collected target data. As shown in fig. 3, the method may include the steps of:
Step 302, initial collection of target data is carried out on each internet channel.
In step 302, the initial collection of the target data may be implemented by, but not limited to, an open data download, an API interface, a web crawler, etc.
Step 304, determining the association degree of each internet channel and the feature event according to the target data initially collected from each internet channel.
The association degree of the internet channel and the feature event can be determined according to keywords contained in the target data, and the user can preset the keywords associated with the feature event. If the target data contains no content matching the preset keywords, the internet channel is preliminarily determined to be unrelated to the feature event. If the target data contains content matching the preset keywords, the internet channel is preliminarily determined to be associated with the feature event, and the association degree needs to be further quantified, which can be, but is not limited to being, implemented as follows:
Keywords are matched in the target data, and the association degree of the internet channel is scored according to the number of keywords in the target data and the weights of those keywords. The score is positively correlated with the association degree: the higher the score, the higher the association degree of the internet channel and the feature event; the lower the score, the lower the association degree.
For ease of understanding, one possible implementation of quantifying the degree of association is described below taking as an example that the feature event is the release of sensitive information containing illegal financial activity content:
For example, "high interest credits", "high interest", and "P2P second mark" may be preset as keywords related to the feature event, and a weight set for each keyword according to actual requirements. "High interest credits" and "P2P second mark" are strongly associated with illegal financial activity and may be given a higher weight, for example 0.8; "high interest" is less strongly associated with illegal financial activity and may be given a lower weight, for example 0.2. Each internet channel can then be scored according to the number of keywords contained in the target data and their weights, and the relevance score can be calculated as:
Z = a×A + b×B + c×C
where Z is the relevance score; A, B, and C are the numbers of occurrences of the three keywords; and a, b, and c are the weights of the corresponding keywords.
It will be appreciated that the weight of the same keyword may be the same or different for different types of feature events. For example, taking the three keywords above, for a feature event of sensitive information monitoring, a is 0.8, b is 0.2, and c is 0.8; for a feature event of negative information monitoring, a is 0.5, b is 0.4, and c is 0.5.
It should be noted that the number of keywords can be set according to actual requirements and is not limited to the 3 keywords given above; it may be 5 keywords, 6 keywords, or even more. When the number of keywords changes, the calculation formula for the relevance score needs to be modified accordingly.
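The weighted scoring formula can be sketched in a few lines. The weights below are the example values for sensitive information monitoring from the text; the function names and the counting strategy are assumptions. Because "high interest" is a substring of "high interest credits", this sketch counts longer keywords first and removes them before counting shorter ones, which is one possible way to avoid double counting and is not specified by the source.

```python
# Example weights from the text (sensitive information monitoring).
KEYWORD_WEIGHTS = {
    "high interest credits": 0.8,  # weight a
    "high interest": 0.2,          # weight b
    "P2P second mark": 0.8,        # weight c
}


def count_keywords(text, keywords):
    """Count case-sensitive occurrences of each keyword, longest first.

    Matched text is removed before shorter keywords are counted, so an
    occurrence of a long keyword is not counted again for a keyword
    contained inside it.
    """
    counts = {}
    remaining = text
    for keyword in sorted(keywords, key=len, reverse=True):
        counts[keyword] = remaining.count(keyword)
        remaining = remaining.replace(keyword, " ")
    return counts


def relevance_score(text, weights=KEYWORD_WEIGHTS):
    """Compute Z as the sum of weight times occurrence count per keyword."""
    counts = count_keywords(text, weights.keys())
    return sum(weights[keyword] * counts[keyword] for keyword in weights)


if __name__ == "__main__":
    sample = "high interest credits, high interest, P2P second mark"
    print(count_keywords(sample, KEYWORD_WEIGHTS.keys()))
    print(relevance_score(sample))
```

Supporting a different number of keywords only requires changing the weight table, since the sum runs over whatever keywords it contains.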
Step 306, determining the grade of each internet channel based on the association degree of the internet channel and the characteristic event.
Wherein, the grade of the internet channel is positively correlated with the degree of association.
In one embodiment, the relevance scores may be divided into value ranges, each corresponding to a grade: ranges with higher scores correspond to higher grades, and ranges with lower scores correspond to lower grades. For example, internet channels with relevance scores in the range 0.7-1 correspond to the key monitoring level, those in the range 0.5-0.7 to the medium monitoring level, those in the range 0.2-0.5 to the low monitoring level, and those in the range 0-0.2 to the extremely low monitoring level.
It should be noted that the number of the classification levels of the internet channel is not limited to the above-provided 4 levels, and the number of the levels may be set according to the actual requirements, for example, to 3 levels or 5 levels.
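The four-level example can be expressed as a simple threshold lookup. The text leaves the handling of boundary values open, so this sketch assumes that each lower bound is inclusive (a score of exactly 0.7 falls in the key monitoring level) and that any score above 1 is also treated as key monitoring; the function name and level labels are illustrative.

```python
# (lower bound, level) pairs in descending order; bounds follow the example ranges.
GRADE_THRESHOLDS = [
    (0.7, "key"),
    (0.5, "medium"),
    (0.2, "low"),
]
LOWEST_GRADE = "extremely low"


def grade_for_score(score):
    """Map a relevance score to a monitoring level.

    Assumption: lower bounds are inclusive, so 0.7 maps to "key".
    """
    for lower_bound, grade in GRADE_THRESHOLDS:
        if score >= lower_bound:
            return grade
    return LOWEST_GRADE


if __name__ == "__main__":
    for value in (0.9, 0.7, 0.6, 0.3, 0.0):
        print(value, grade_for_score(value))
```

Changing the number of levels then amounts to editing the threshold table.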
Step 308, distributing acquisition resources to each internet channel, and respectively acquiring target data from each internet channel through the distributed acquisition resources.
The amount of acquisition resources allocated to each internet channel is positively correlated with the channel's grade. A higher grade indicates that the internet channel generates feature events more frequently, so more acquisition resources can be allocated to it, and its target data can be collected at a higher acquisition frequency (shorter acquisition period) and over a larger acquisition range, realizing focused monitoring of higher-grade internet channels. A lower grade indicates that the internet channel is less likely to generate feature events, so fewer acquisition resources can be allocated to it, and its target data can be collected at a lower acquisition frequency and over a smaller acquisition range (for example, collecting only the page data of some sub-pages).
For example, assuming that the internet channels are divided into 4 levels, 60% of the total acquisition resources can be allocated to internet channels of the key monitoring level, so that their target data is collected once every minute; 30% of the total acquisition resources can be allocated to internet channels of the medium monitoring level, so that their target data is collected once every day; 8% of the total acquisition resources can be allocated to internet channels of the low monitoring level, so that their target data is collected once every week; and 2% of the total acquisition resources can be allocated to internet channels of the extremely low monitoring level, so that their target data is collected once every month. The acquisition resources are thereby matched to the grades of the internet channels, and the utility of the acquisition results is maximized within the existing acquisition resource budget.
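The example allocation can be written as a lookup table from monitoring level to resource share and acquisition period. The shares and periods are the ones given in the text; the function name, the abstract "resource units", and the approximation of one month as 30 days are assumptions made for this sketch.

```python
from datetime import timedelta

# level -> (share of total acquisition resources, acquisition period)
ALLOCATION_PLAN = {
    "key": (0.60, timedelta(minutes=1)),
    "medium": (0.30, timedelta(days=1)),
    "low": (0.08, timedelta(weeks=1)),
    "extremely low": (0.02, timedelta(days=30)),  # "1 month" approximated as 30 days
}


def allocate(total_resources, grade):
    """Return (resources for this level, acquisition period) for a grade.

    total_resources is an abstract quantity (for example a number of
    crawler workers); the source does not fix a unit.
    """
    share, period = ALLOCATION_PLAN[grade]
    return total_resources * share, period


if __name__ == "__main__":
    for level in ALLOCATION_PLAN:
        print(level, allocate(1000, level))
```

Both the share and the acquisition frequency decrease with the grade, which is the positive correlation the step requires.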
In another embodiment, the grades of the internet channels may also be updated using the target data collected in each acquisition period. Specifically:
The association degree of each internet channel and the feature event is re-determined from the target data collected in each acquisition period, and the grade of the internet channel is updated according to the re-determined association degree. The quantification of the association degree is similar to that of step 304 and is not repeated here. The grades of the internet channels are thus adjusted repeatedly according to the cyclically collected target data, which improves the accuracy of the grading of the internet channels.
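One way to picture this update is a function that runs at the end of each acquisition period. In the sketch below the scoring and grading steps are passed in as callables so that it stays independent of any particular formula; the function name, the channel names, and the toy scoring and grading functions in the usage section are all hypothetical. Channels that delivered no new data in the period simply keep their previous grade, which is an assumption.

```python
def regrade_channels(current_grades, period_data, score_fn, grade_fn):
    """Return updated grades after one acquisition period.

    current_grades: dict mapping channel name to its current grade
    period_data:    dict mapping channel name to text collected this period
    score_fn:       callable turning collected text into a relevance score
    grade_fn:       callable turning a relevance score into a grade
    """
    updated = dict(current_grades)  # leave the input untouched
    for channel, text in period_data.items():
        updated[channel] = grade_fn(score_fn(text))
    return updated


if __name__ == "__main__":
    # Toy stand-ins: score counts one keyword, grade uses one threshold.
    def toy_score(text):
        return 0.8 * text.count("P2P second mark")

    def toy_grade(score):
        return "key" if score >= 0.7 else "low"

    grades = {"forum": "low", "microblog": "key", "official site": "low"}
    data = {"forum": "P2P second mark offer", "microblog": "holiday photos"}
    print(regrade_channels(grades, data, toy_score, toy_grade))
```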
Step 310, analyzing the collected target data to identify the characteristic event contained in the target data.
In one embodiment, target data of the text type may be analyzed by, but not limited to, semantic analysis, internet slang translation, and variant character recognition to identify the characteristic events it contains. For target data of the video or voice type, the voice data may be analyzed with speech recognition technology to identify the characteristic events it contains. For target data of the picture or video type, image recognition and classification techniques based on multiple context structures and deep feature mining may be used, fusing features from different semantic levels and combining them with sentiment analysis to analyze the images and video. If the target data contains at least two types of data, the analysis results for those types can be weighted and summed to identify the characteristic event contained in the target data.
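The weighted summation over data types could look like the sketch below. The source only states that the per-type results are weighted and summed; the per-type scores in [0, 1], the weight values, and the renormalization over the types actually present are assumptions made for illustration.

```python
def fuse_scores(scores, weights):
    """Weighted sum of per-type analysis scores.

    `scores` maps a data type (e.g. "text", "voice", "image") to a score.
    `weights` maps data types to weights. Weights are renormalized over the
    types that are present so the result stays on the same scale (assumption).
    """
    present = [t for t in scores if t in weights]
    total_weight = sum(weights[t] for t in present)
    if total_weight == 0:
        raise ValueError("no weighted data type present in scores")
    return sum(scores[t] * weights[t] for t in present) / total_weight
```

A channel page with text and image content but no audio would then be scored from the text and image results alone.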
In another embodiment, a pre-trained event recognition model may also be used for feature event recognition, specifically: inputting the text type target data into an event recognition model, recognizing the characteristic event contained in the Internet channel according to the confidence coefficient output by the event recognition model, and representing the association degree of the Internet channel and the characteristic event by the confidence coefficient.
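As a rough sketch of how the confidence output might be consumed, the code below treats the model as any callable that returns a confidence in [0, 1]. The threshold, the use of the mean confidence as the channel's degree of association, and `toy_model` (a stand-in for a trained model) are all assumptions for illustration.

```python
from statistics import mean


def identify_events(texts, model, threshold=0.5):
    """Return (text, confidence, is_event) for every collected text."""
    results = []
    for text in texts:
        confidence = model(text)
        results.append((text, confidence, confidence >= threshold))
    return results


def channel_association(results):
    """Summarize a channel by the mean confidence of its texts (assumption)."""
    return mean(conf for _, conf, _ in results) if results else 0.0


def toy_model(text):
    """Stand-in for a trained event recognition model, for demonstration only."""
    return 0.9 if "fraud" in text.lower() else 0.1
```

In practice `toy_model` would be replaced by the trained event recognition model, and the threshold would be tuned on labeled data.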
The event recognition model is obtained by training a random forest (Random Forest), a YOLO neural network, or another convolutional neural network on text samples labeled with characteristic event labels. Different models may be trained for different types of characteristic events, such as characteristic events containing sensitive information and characteristic events containing negative information. An event recognition model for recognizing characteristic events containing sensitive information is trained on texts labeled with sensitive information labels; an event recognition model for recognizing characteristic events containing negative information is trained on texts labeled with negative information labels. The specific training process of the model is not described in detail here.
The event recognition model can accurately recognize characteristic events from the target data and exclude invalid characteristic events. An invalid characteristic event is one where the target data contains keywords related to the characteristic event, but the content is positive publicity or part of a public service advertisement and therefore should not be identified as the characteristic event.
In another embodiment, if the collected target data includes target data of a picture type, the target data of the picture type is first converted into target data of a text type, and then the target data is input into the event recognition model. Text conversion may be, but is not limited to, employing OCR (optical character recognition) technology.
In another embodiment, before the target data is input into the event recognition model, the target data may further be cleaned, normalized, and rearranged, and the target data that has undergone data cleaning, data normalization, and data rearrangement is then input into the event recognition model.
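One possible reading of this preprocessing is sketched below. The source does not define the three operations, so the choices here are assumptions: cleaning removes HTML-like tags and empty records, normalization applies Unicode NFKC folding, whitespace collapsing, and lowercasing, and rearrangement orders records newest first and drops repeated texts.

```python
import re
import unicodedata


def preprocess(records):
    """Clean, normalize, and reorder (timestamp, text) records.

    Returns a list of (timestamp, text) tuples, newest first, without
    empty or repeated texts.
    """
    seen = set()
    output = []
    # Rearrangement (assumed meaning): newest records first.
    for timestamp, raw in sorted(records, key=lambda r: r[0], reverse=True):
        text = re.sub(r"<[^>]+>", " ", raw)         # cleaning: strip tags
        text = unicodedata.normalize("NFKC", text)  # fold full-width forms
        text = re.sub(r"\s+", " ", text).strip().lower()
        if not text or text in seen:
            continue                                # drop empty or repeated text
        seen.add(text)
        output.append((timestamp, text))
    return output
```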
In this embodiment, different amounts of acquisition resources are used to collect target data from internet channels of different levels. A higher-level internet channel has a higher degree of association with the characteristic event, so more resources can be allocated to it for focused collection and monitoring, and its collection frequency and range can be expanded. This makes it possible to capture characteristic events that are short-lived behaviors of a website, for example a page that is taken down 10 minutes after it is published on the internet channel. A lower-level internet channel has a lower degree of association with the characteristic event and can be allocated fewer resources. More useful information can thus be obtained for the existing resource consumption, maximizing the utility of the collection results.
In this embodiment, acquisition resources of different amounts are allocated to internet channels of different levels, so that the target data of the internet channels is collected with different collection periods (or collection frequencies) and/or different collection ranges, and the amount of target data collected in each period or each pass is not too large. The target data can therefore be analyzed to identify characteristic events while collection is still in progress, so that characteristic events are discovered in time and early warning information is generated.
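Collecting channels with different periods can be illustrated with a small scheduler built on a priority queue. This is only a sketch under assumed names: `collection_schedule` works on abstract integer time steps and returns the order in which channels would be visited, leaving the actual collection and analysis out.

```python
import heapq


def collection_schedule(periods, horizon):
    """Return (time, channel) collection events in [0, horizon), by time.

    `periods` maps a channel name to its collection period in time steps.
    Shorter periods (higher-level channels) appear more often.
    """
    if any(p <= 0 for p in periods.values()):
        raise ValueError("collection periods must be positive")
    queue = [(0, name) for name in sorted(periods)]
    heapq.heapify(queue)
    events = []
    while queue:
        due, name = heapq.heappop(queue)
        if due >= horizon:
            continue  # past the horizon: do not reschedule
        events.append((due, name))
        heapq.heappush(queue, (due + periods[name], name))
    return events
```

Because each event covers only one channel and one period, the data handed to the analysis step per event stays small, which is what allows analysis to run alongside collection.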
Fig. 4 is a flowchart of another method for identifying a characteristic event according to an exemplary embodiment of the present disclosure. It is substantially the same as the method shown in fig. 3, except that this embodiment also uses the data update frequency of the internet channel and/or the resource consumption when collecting the target data as a reference when quantifying the degree of association. As shown in fig. 4, the method may include the following steps:
step 402, initial collection of target data is carried out on each internet channel.
In step 402, the target data of each internet channel may be collected synchronously or asynchronously. It can be understood that synchronous collection is more efficient, while asynchronous collection can obtain more target data when acquisition resources are limited; the user can choose the collection mode according to actual needs.
If the internet channels are acquired by adopting a synchronous acquisition mode, acquisition resources with the same resource amount can be allocated to each internet channel, and initial acquisition of the target data is respectively carried out on each internet channel by using initial acquisition resources with the same resource amount.
Step 404, determining the data update frequency of each internet channel and/or the resource consumption when collecting the target data.
Step 406, determining the association degree of each internet channel and the characteristic event according to the target data and the data updating frequency which are initially acquired from each internet channel and/or the resource consumption when the target data are acquired.
In the process of determining the association degree, association degree scoring can be performed according to the keyword quantity contained in the target data, then weighting and summing are performed on scoring values, data updating frequency and resource consumption, and the association degree of the internet channel and the characteristic event is represented by using the weighted and summed result. Specific implementation of the association score may refer to step 304, which is not described herein.
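The weighted summation in step 406 can be sketched as follows. The source does not say how the three quantities are brought to a common scale or which weights are used, so the min-max normalization across channels and the default weights below are assumptions for illustration.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def association_degrees(channels, weights=(0.6, 0.3, 0.1)):
    """Weighted sum of normalized score, update frequency, and consumption.

    `channels` maps a channel name to a tuple
    (keyword_score, update_frequency, resource_consumption).
    """
    names = list(channels)
    columns = list(zip(*(channels[n] for n in names)))
    normalized = [min_max(list(col)) for col in columns]
    return {
        name: sum(w * col[i] for w, col in zip(weights, normalized))
        for i, name in enumerate(names)
    }
```

All three terms enter with positive weights, matching the positive correlation between the degree of association and each quantity.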
Step 408, determining the grade of each internet channel based on the association degree of the internet channel and the characteristic event.
And step 410, distributing acquisition resources to each Internet channel, and respectively acquiring target data from each Internet channel through the distributed acquisition resources.
Step 412, analyzing the collected target data to identify the characteristic event contained in the target data.
The specific implementation process of steps 408 to 412 is similar to that of steps 306 to 310, and will not be repeated here.
In another embodiment, the levels of the internet channels may also be updated based on the target data collected in step 410. If the target data is collected periodically from each internet channel in step 410, then during a level update, when collection completes in each collection period, an association score can be computed from the number of keywords contained in the target data, and the resource consumption of that collection period and the data update frequency of the internet channel within that period can be determined. The three results are weighted and summed, and the weighted sum represents the degree of association between the internet channel and the characteristic event.
Fig. 5 is a schematic structural view of an apparatus according to an exemplary embodiment of the present specification. Referring to fig. 5, at the hardware level, the device includes a processor 502, an internal bus 504, a network interface 506, a memory 508, and a nonvolatile memory 510, although other hardware may be included as needed for other services. The processor 502 reads the corresponding computer program from the non-volatile memory 510 into the memory 508 and then runs, forming the recognition means of the characteristic event at the logic level. Of course, in addition to software implementation, one or more embodiments of the present disclosure do not exclude other implementation manners, such as a logic device or a combination of software and hardware, etc., that is, the execution subject of the following processing flow is not limited to each logic unit, but may also be hardware or a logic device.
Referring to fig. 6, in a software implementation, the identifying device for a feature event may include:
a determining module 61, configured to determine a level of each internet channel based on a degree of association between the internet channel and the feature event, where the level is positively related to the degree of association;
the distribution module 62 is configured to distribute the acquisition resources to each internet channel, and collect the target data from each internet channel through the distributed acquisition resources; wherein, the resource quantity of the distributed acquisition resources is positively correlated with the grade corresponding to each Internet channel;
the identifying module 63 is configured to identify a feature event contained in the collected target data by analyzing the target data.
Optionally, the determining module includes:
the acquisition unit is used for carrying out initial acquisition of target data on each Internet channel by initial acquisition resources with the same resource quantity;
and the determining unit is used for determining the association degree of each internet channel and the characteristic event according to the target data initially collected from each internet channel.
Optionally, the determining unit is configured to:
keyword matching is carried out on the target data;
and scoring the association degree of the Internet channels according to the number of keywords and the weight of the keywords contained in the target data, wherein the score is positively correlated with the association degree.
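The keyword-based scoring can be sketched as a weighted count of keyword occurrences. The case-insensitive substring matching and the example keywords are assumptions; the source only requires that the score grow with the number and weight of the matched keywords.

```python
def keyword_score(text, keyword_weights):
    """Sum of (occurrence count x weight) over all configured keywords."""
    lowered = text.lower()
    return sum(
        lowered.count(keyword.lower()) * weight
        for keyword, weight in keyword_weights.items()
    )
```

For example, with weights `{"loan": 1.0, "scam": 2.0}`, a text that mentions "loan" twice and "scam" once scores 4.0.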
Optionally, the determining module is further configured to:
determining the data updating frequency of each internet channel and/or the resource consumption when collecting target data;
the degree of association is positively correlated with the data update frequency and/or the resource consumption when collecting the target data.
Optionally, the identification module includes:
the input unit is used for inputting the text type target data into an event recognition model, wherein the event recognition model is obtained by training a neural network through a text marked with a characteristic event label;
and the identification unit is used for identifying the characteristic event contained in the Internet channel according to the confidence coefficient output by the event identification model.
Optionally, the identifying device of the characteristic event further includes:
the conversion module is used for converting the target data of the picture type into the target data of the text type under the condition that the collected target data contains the target data of the picture type, so as to be used for inputting the event recognition model.
Optionally, the distribution module periodically collects target data from each internet channel;
wherein the acquisition period is inversely related to the amount of resources of the acquisition resources allocated to the corresponding internet channel.
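One simple way to obtain an acquisition period that is inversely related to the allocated resources is to assume a fixed amount of work per collection pass. The model below, including the `work_per_pass` parameter and its unit, is an assumption for illustration and is not stated in the source.

```python
def acquisition_period(resources, work_per_pass):
    """Seconds per collection pass for a channel.

    `work_per_pass` is the work of one pass in resource-seconds (assumed
    unit); doubling `resources` halves the period.
    """
    if resources <= 0:
        raise ValueError("resources must be positive")
    return work_per_pass / resources
```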
Optionally, the identifying device further includes:
and the updating module is used for updating the grade of each internet channel according to the target data acquired by the acquisition resources from the internet channels.
Optionally, the updating module includes:
the determining unit is used for re-determining the association degree of the internet channel and the characteristic event according to the target data acquired by the acquisition resources in each acquisition period;
and the adjusting unit is used for adjusting the grade of the Internet channel according to the redetermined association degree.
Optionally, the identifying device further includes:
and the early warning module is used for sending early warning information to a preset warning object under the condition that the target data is identified to contain the characteristic event.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may implement information storage by any method or technology and which have stored thereon a computer program (information) that, when executed by a processor, performs the method steps provided by any of the embodiments described above. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments of this specification. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
The foregoing describes only preferred embodiments of one or more embodiments of this specification and is not intended to limit the one or more embodiments to the particular embodiments described.

Claims (13)

1. A method of identifying a characteristic event, comprising:
respectively determining the grade of each internet channel based on the association degree of the internet channel and the characteristic event, wherein the grade is positively correlated with the association degree;
distributing acquisition resources to each Internet channel, and respectively acquiring target data from each Internet channel through the distributed acquisition resources; wherein, the resource quantity of the distributed acquisition resources is positively correlated with the grade corresponding to each Internet channel;
and analyzing the collected target data to identify characteristic events contained in the target data.
2. The method for identifying a feature event according to claim 1, determining a degree of association of an internet channel with the feature event, comprising:
respectively carrying out initial acquisition of target data on each Internet channel by using initial acquisition resources with the same resource quantity;
and determining the association degree of each Internet channel and the characteristic event according to the target data initially collected from each Internet channel.
3. The method for identifying a feature event according to claim 2, wherein determining the degree of association between the internet channel and the feature event according to the initially collected target data comprises:
keyword matching is carried out on the target data;
and scoring the association degree of the Internet channels according to the number of keywords and the weight of the keywords contained in the target data, wherein the score is positively correlated with the association degree.
4. The method for identifying a feature event according to claim 2, wherein the determining the association degree of each internet channel with the feature event comprises:
determining the data updating frequency of each internet channel and/or the resource consumption when collecting target data;
the degree of association is positively correlated with the data update frequency and/or the resource consumption when the target data is collected.
5. The method for identifying a feature event according to claim 1, wherein the feature event contained in the target data is identified by analyzing the collected target data, comprising:
inputting target data of a text type into an event recognition model, wherein the event recognition model is obtained by training a neural network by adopting a text marked with a characteristic event label;
and identifying the characteristic event contained in the Internet channel according to the confidence coefficient output by the event identification model.
6. The method for identifying a characteristic event according to claim 5, further comprising:
and under the condition that the collected target data comprises target data of a picture type, converting the target data of the picture type into target data of a text type for inputting the event recognition model.
7. The method for identifying characteristic events according to claim 1, wherein the collecting target data from each internet channel through the distributed collection resources comprises:
periodically collecting target data from each internet channel;
wherein the acquisition period is inversely related to the amount of resources of the acquisition resources allocated to the corresponding internet channel.
8. The method of identifying a feature event according to claim 1, the method further comprising:
and updating the grade of each internet channel according to the target data acquired by the acquisition resources from the internet channels.
9. The method for identifying a characteristic event according to claim 8, updating the level of each internet channel, comprising:
the association degree of the internet channel and the characteristic event is redetermined according to the target data acquired by the acquisition resources in each acquisition period;
and adjusting the grade of the Internet channel according to the redetermined association degree.
10. The method of identifying a feature event according to claim 1, the method further comprising:
and under the condition that the target data is identified to contain the characteristic event, sending early warning information to a preset warning object.
11. An identification device for a characteristic event, comprising:
the determining module is used for respectively determining the grade of each internet channel based on the association degree of the internet channel and the characteristic event, and the grade is positively correlated with the association degree;
the distribution module is used for distributing acquisition resources to each Internet channel and respectively acquiring target data from each Internet channel through the distributed acquisition resources; wherein, the resource quantity of the distributed acquisition resources is positively correlated with the grade corresponding to each Internet channel;
and the identification module is used for identifying the characteristic event contained in the target data by analyzing the collected target data.
12. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the method of identifying a characteristic event according to any of claims 1-10 by executing the executable instructions.
13. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of identifying a characteristic event according to any of claims 1-10.
CN202010373434.1A 2020-05-06 2020-05-06 Feature event identification method and device, electronic equipment and storage medium Active CN111552857B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010373434.1A CN111552857B (en) 2020-05-06 2020-05-06 Feature event identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010373434.1A CN111552857B (en) 2020-05-06 2020-05-06 Feature event identification method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111552857A CN111552857A (en) 2020-08-18
CN111552857B true CN111552857B (en) 2023-09-19

Family

ID=72004566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010373434.1A Active CN111552857B (en) 2020-05-06 2020-05-06 Feature event identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111552857B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463686A (en) * 2017-08-10 2017-12-12 深圳市腾讯计算机系统有限公司 A kind of method and device of calculating network public sentiment temperature
CN107766481A (en) * 2017-10-13 2018-03-06 国家计算机网络与信息安全管理中心 A kind of method and system for finding internet financial platform
CN109840183A (en) * 2018-12-05 2019-06-04 平安科技(深圳)有限公司 Data center's grading forewarning system method, apparatus and storage medium
CN110557608A (en) * 2019-07-29 2019-12-10 视联动力信息技术股份有限公司 resource monitoring method, device and computer readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10135712B2 (en) * 2016-04-07 2018-11-20 At&T Intellectual Property I, L.P. Auto-scaling software-defined monitoring platform for software-defined networking service assurance
US10153983B2 (en) * 2016-11-04 2018-12-11 Bank Of America Corporation Optimum resource routing using contextual data analysis

Also Published As

Publication number Publication date
CN111552857A (en) 2020-08-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40035837; Country of ref document: HK)
GR01 Patent grant