CN111581533B - Method and device for identifying state of target object, electronic equipment and storage medium - Google Patents

Publication number
CN111581533B
Authority
CN
China
Prior art keywords
word
target
preset
state
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010398237.5A
Other languages
Chinese (zh)
Other versions
CN111581533A (en)
Inventor
刘健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010398237.5A
Publication of CN111581533A
Application granted
Publication of CN111581533B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to the technical field of computers, and in particular to a method and a device for identifying the state of a target object, electronic equipment, and a storage medium, which are used to improve the efficiency and accuracy of object state identification. The method comprises the following steps: acquiring network content of a target website, analyzing the network content, and extracting text information; filtering the text information to obtain a target sentence that contains both a preset object word and a preset state word; performing dependency syntactic analysis on the target sentence, and determining the preset object word and the preset state word targeted by a target operation in the target sentence; and, when the type of the preset state word is the target type, determining a target object according to the preset object word in the target sentence, and determining the state information of the target object according to the type of the preset state word. Because dependency syntactic analysis is performed on the target sentences in the network content, the state information of the target object can be identified automatically, which effectively improves the efficiency and accuracy of object state identification.

Description

Method and device for identifying state of target object, electronic equipment and storage medium
Technical Field
The application relates to the technical field of computers, in particular to the technical field of artificial intelligence, and provides a method and a device for identifying the state of a target object, electronic equipment and a storage medium.
Background
In the related art, there are two ways to identify the state of a platform. The first is manual auditing: public opinion and news about the platform are collected and then researched and judged manually. This approach has a high auditing cost, requires a large amount of manpower, and lags behind events. The second is to obtain the business registration information of the enterprise and check whether its current operating state is normal. This approach has an excessive delay, which can last months or even years, because business registration information is not updated promptly when the operating state of the enterprise changes.
That is, both of the above platform state identification methods take a long time and are affected by manual intervention, so the efficiency and accuracy of identifying the platform state are low.
Disclosure of Invention
The embodiment of the application provides a method, a device, electronic equipment and a storage medium for identifying the state of a target object, which are used for improving the identification efficiency and accuracy of the state of the object.
The method for identifying the state of the target object provided by the embodiment of the application comprises the following steps:
acquiring network content of a target website, analyzing the network content, and extracting text information of the network content;
filtering the text information to obtain a target sentence which simultaneously contains a preset object word and a preset state word in the text information;
performing dependency syntactic analysis on the target sentence, and determining a preset object word and a preset state word aimed at by target operation in the target sentence;
when the type of the preset state word is a target type, determining a target object according to the preset object word in the target sentence, and determining the state information of the target object according to the type of the preset state word.
The device for identifying the state of the target object provided by the embodiment of the application comprises the following components:
the acquisition unit is used for acquiring the network content of the target website, analyzing the network content and extracting the text information of the network content;
the filtering unit is used for filtering the text information to obtain target sentences which simultaneously contain preset object words and preset state words in the text information;
The analysis unit is used for carrying out dependency syntax analysis on the target sentence and determining a preset object word and a preset state word aimed at by a target operation in the target sentence;
and the determining unit is used for determining a target object according to the preset object word in the target sentence when the type of the preset state word is the target type, and determining the state information of the target object according to the type of the preset state word.
Optionally, the apparatus further includes:
the application unit is used for marking the network content according to the acquired state information of the target object, so as to prompt an account viewing the network content; or
updating, according to the acquired state information of the target object, the object state corresponding to the target object in a pre-stored correspondence between object names and object states.
The electronic device provided by the embodiment of the application comprises a processor and a memory, wherein the memory stores program code, and when the program code is executed by the processor, the processor is caused to execute the steps of any one of the above methods for identifying the state of a target object.
An embodiment of the application provides a computer readable storage medium comprising program code which, when run on an electronic device, causes the electronic device to perform the steps of any one of the above methods for identifying the state of a target object.
The application has the following beneficial effects:
according to the method, the device, the electronic equipment and the storage medium for identifying the state of the target object, the dependency syntax analysis is carried out on the target sentences in the network content, so that the dependency relationship between the preset object words and the preset state words can be obtained, and further the determined dependency relationship is analyzed to obtain the state information of the target object, the automatic identification of the state of the target object is realized, the whole identification process does not need manual intervention, the influence of subjective consciousness of auditors can be effectively reduced, and the identification efficiency and accuracy are improved. In addition, the embodiment of the application can acquire the target website in real time, automatically identify the network content of the target website and extract the target sentence in the network content, so that the object state change can be actively perceived and discovered at the first time, and the timely update of the state information of the target object is realized.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1A is an alternative schematic diagram of a platform state monitoring system in the related art;
FIG. 1B is a schematic diagram of an alternative platform state monitoring system according to the related art;
FIG. 2 is a schematic diagram of an application scenario in an embodiment of the present application;
FIG. 3 is a schematic diagram of a method for identifying a state of a target object according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a prompting method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an audit system according to an embodiment of the present application;
FIG. 6A is a diagram illustrating a lexical analysis according to an embodiment of the present application;
FIG. 6B is a schematic diagram of another lexical analysis in an embodiment of the present application;
FIG. 7 is a flowchart of a complete method for identifying the status of a target object according to one embodiment of the application;
FIG. 8 is a schematic diagram of a configuration of a status recognition device for a target object according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a composition structure of an electronic device according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a hardware configuration of a computing device to which an embodiment of the present application is applied.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the technical solutions of the present application, but not all embodiments. All other embodiments, based on the embodiments described in the present document, which can be obtained by a person skilled in the art without any creative effort, are within the scope of protection of the technical solutions of the present application.
Some of the concepts involved in the embodiments of the present application are described below.
Object words: words that indicate the name of an object, which may be a formal name or an alias. The object in the embodiment of the application can be a person, a platform, an enterprise, an organization, and the like, so the object word can be a person name, a platform name, an enterprise name, a platform alias, and the like. An alias is a name other than the formal or legally registered name: a person, thing, or business may have, besides its official name, another name used in writing or in speech, such as a nickname.
State words: words that indicate the state of an object. In the embodiment of the application, if the object is a platform or an enterprise, the state words can be, for example, bankrupt, closed down, collapsed, seized, placed under investigation, out of contact, and the like; if the object is a person, the state words can be of various types, such as promotion, advancement, resignation, entrance, job entry, and the like.
Target sentence: text information in the form of a single sentence, obtained by screening after the network content is analyzed, such as a news headline or a sentence in the body of a news article. In the embodiment of the application, when target sentences are extracted from the network content and the network content is an article, the article first needs to be split into sentences; generally, sentences can be split using a regular expression over punctuation marks, and the resulting sentences are then screened to extract the target sentences. The same applies when the network content takes other forms.
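The punctuation-based sentence segmentation mentioned above could be sketched roughly as follows. This is only an illustration: the function name `split_sentences` and the particular set of sentence-ending punctuation marks are assumptions, not details given in the application.

```python
import re

# Assumed set of sentence-ending marks (Chinese and Western); a line break
# also ends a sentence so that headlines become sentences of their own.
_SENTENCE = re.compile(r"[^。！？!?；;\n]+[。！？!?；;]?")


def split_sentences(text: str) -> list[str]:
    """Split raw article text into sentences by punctuation."""
    pieces = (m.group(0).strip() for m in _SENTENCE.finditer(text))
    # Drop fragments that were only whitespace.
    return [p for p in pieces if p]


if __name__ == "__main__":
    article = "Platform A was seized! Is platform B running?\nHeadline"
    for sentence in split_sentences(article):
        print(sentence)
```

The Western full stop is deliberately left out of the pattern, because it also appears in URLs and decimal numbers; a production splitter would need additional rules for it.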
Dependency syntax: a sentence is analyzed into a dependency syntax tree that describes the dependency relationships among its words, that is, it points out the syntactic collocation relationships between words, which are related to semantics. In dependency syntax theory, a dependency is a relationship in which one word governs another; the relationship is not symmetric and has a direction. Specifically, the governing component is called the dominant word (head), and the governed component is called the subordinate word (dependent). In an embodiment of the application, a dependency relationship may be represented by a directed line segment that points from the subordinate word to the dominant word.
Dependency type: in the embodiment of the application, the dependency relationship between two words can be subdivided into several types, each representing a specific syntactic relationship between the two words, for example: verb-object relationship, prepositional object relationship, coordinate relationship, adverbial-head relationship, additional relationship, quantity relationship, appositive relationship, preposition-object relationship, subject-predicate relationship, comparison relationship, time relationship, place relationship, verb-complement structure, pivotal structure, association word, association structure, voice structure, serial-verb structure, core relationship, prepositional object, double object, and reduplication relationship.
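As a rough illustration of how typed dependency arcs might be represented and queried, consider the sketch below. The `Arc` class, the function `objects_of_state_word`, the relation labels `SBV`/`VOB`/`FOB`, and the rule that an object word must depend directly on the state word are all assumptions made for this example; the application does not specify them here, and the arcs are hand-written rather than produced by a parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One directed dependency: `dependent` is governed by `head`."""
    dependent: str   # subordinate word
    head: str        # dominant word
    relation: str    # dependency type label (hypothetical tag names)


def objects_of_state_word(arcs, state_word, object_words,
                          relations=("SBV", "VOB", "FOB")):
    """Return the object words that depend directly on `state_word`
    through one of the accepted relation types."""
    return [
        a.dependent
        for a in arcs
        if a.head == state_word
        and a.dependent in object_words
        and a.relation in relations
    ]


if __name__ == "__main__":
    # Hand-written arcs for "PlatformA was seized after PlatformB reported it".
    arcs = [
        Arc("PlatformA", "seized", "SBV"),
        Arc("PlatformB", "reported", "SBV"),
        Arc("reported", "seized", "ADV"),
    ]
    print(objects_of_state_word(arcs, "seized", {"PlatformA", "PlatformB"}))
```

The example shows why co-occurrence alone is not enough: both platform names appear in the sentence, but only one of them is syntactically tied to the state word.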
Web crawler cluster: a web crawler is a program that automatically fetches the content behind internet links; deployed as a server cluster, crawlers can acquire a large amount of website content at scale and concurrently. In the embodiment of the application, the network content corresponding to the target website can be acquired through the web crawler cluster.
Conditional random field (CRF): a discriminative probabilistic model and a kind of random field, commonly used for labeling or analyzing sequence data such as natural language text or biological sequences. A conditional random field is a conditional probability distribution model P(Y|X) of a set of output random variables Y given a set of input random variables X, characterized by the assumption that the output random variables form a Markov random field. Conditional random fields can be used for different prediction problems. In the embodiment of the application, a CRF-based method can perform dependency syntactic analysis on the target sentence and acquire the dependency relationships among the words in the target sentence.
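For reference, the most widely used special case, the linear-chain CRF over a label sequence y = (y_1, ..., y_T) and an input sequence x, is usually written as below. The feature functions f_k and weights \lambda_k are standard textbook notation and are not taken from the application, which does not state which CRF variant it uses.

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)
```

Here Z(x) normalizes over all candidate label sequences y', so the model scores whole sequences rather than individual labels.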
Artificial intelligence (AI) is the theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine learning (ML) is a multi-domain interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and how it reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout the various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
In the method for identifying the state of a target object provided by the embodiment of the application, the dependency syntactic analysis of the target sentence is implemented based on deep learning. Deep learning is a newer research direction in the field of machine learning: it learns the internal rules and levels of representation of sample data, and the information obtained during learning greatly helps the interpretation of data such as text, images, and sound. Its ultimate goal is to give machines the ability to analyze and learn like a person and to recognize text, image, and sound data. Deep learning is a complex machine learning algorithm whose results in speech and image recognition far exceed those of earlier techniques. In the embodiment of the application, deep learning is applied mainly to natural language processing: after a syntactic-analysis neural network has been trained on a large number of samples, a target sentence is input into the neural network to obtain the dependency relationships among the words in the target sentence, and the state information of the target object is analyzed based on the obtained dependency relationships.
The following briefly describes the design concept of the embodiment of the present application:
The rapid development of internet technology brings more and more convenience to people's lives. Through the Internet, people can conveniently share and download data, obtain important information, pay bills online, and so on. At the same time, the security situation of the Internet is not optimistic, and illegal activities that exploit the Internet, such as illegal transactions, are increasingly rampant. When a supervision department carries out daily supervision of financial, investment and wealth management, online lending, and other platforms on the Internet, it needs to exclude platforms that have already been placed under investigation or banned, so as to avoid repeated work and improve supervision efficiency. Meanwhile, third party monitoring platforms also need to obtain platform state information in time, give early warning about illegal platforms, and remind users.
Taking a third party monitoring platform for a certain type of transaction as an example, the platform records the state of each transaction platform in time, and marks and updates it, as shown in FIG. 1A. The interface shown in FIG. 1A records the state information of a transaction platform; as can be seen from FIG. 1A, the website corresponding to the platform is a certain network, and the current state is: law enforcement personnel have intervened, i.e., the platform is being investigated.
FIG. 1B shows another third party monitoring platform (an applet) provided in the related art. The platform has years of data and technology accumulation in fields such as websites, APPs (Applications), and communication. Driven by security big data and using AI big data analysis and modeling, it helps the relevant supervision departments give early warning of risks in the internet field. A platform for which a risk warning has been issued also needs its state updated in time to remind users to take precautions. As can be seen in FIG. 1B, a certain network has been reported for illegal transactions, the current state of the platform is that relevant law enforcement personnel have intervened, and the related project is project A. In addition, the platform can be judged based on external information such as related reports; the judgment results are displayed in the interface shown in FIG. 1B, and information such as the sources and links of the related reports can also be displayed.
In the related art, whether the current operating state of an enterprise is normal is checked mainly by manual auditing or by acquiring the enterprise's business registration information. These methods make platform state updates too slow, so the operating state of a platform cannot be updated in time.
In view of this, the application provides a method, a device, an electronic device, and a storage medium for identifying the state of a target object. Network content is acquired through automatic data collection; dependency syntactic analysis is then performed on the target sentence in the network content to obtain the dependency relationships among the words in the target sentence; the dependency relationship between the preset object word and the preset state word in the target sentence is further analyzed to determine the state information of the target object represented by the preset object word. This realizes automatic identification and monitoring of the state of the target object, improves the efficiency and accuracy of identifying that state, and makes it possible to actively perceive and discover state changes of the target object at the first opportunity, give early warning about risky enterprises, improve supervision efficiency, and reduce user losses as far as possible.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are for illustration and explanation only, and not for limitation of the present application, and embodiments of the present application and features of the embodiments may be combined with each other without conflict.
FIG. 2 is a schematic diagram of an application scenario according to an embodiment of the present application. The application scenario includes two terminal devices 210 and a server 230; the relevant interface 220 for executing the target service can be logged in to through a terminal device 210. The terminal device 210 and the server 230 may communicate through a communication network.
In an alternative embodiment, the communication network is a wired network or a wireless network. The terminal device 210 and the server 230 may be directly or indirectly connected through wired or wireless communication, and the present application is not limited herein.
In the embodiment of the present application, the terminal device 210 is an electronic device used by a user. It may be a computer device that has a certain computing capability and runs instant messaging software and websites or social software and websites, such as a personal computer, a mobile phone, a tablet computer, a notebook, or an electronic book reader. Each terminal device 210 is connected to the server 230 through a wireless network. The server 230 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc.
In the embodiment of the present application, a user may access a network link through the terminal device 210, and the server 230 may obtain the network content corresponding to the network link accessed by the user, obtain the target sentence from the network content, and further determine the state information of the target object represented by the object word in the target sentence by performing dependency syntax analysis on the target sentence. Optionally, the server 230 may further tag the network content according to the acquired status information of the target object, and send the network content to the terminal device 210, where the terminal device 210 prompts a user accessing the network content.
In addition, the server 230 may acquire the web content corresponding to the web link, return the web content to the terminal device 210, acquire the target sentence from the web content by the terminal device 210, further perform dependency syntax analysis on the target sentence, determine the state information of the target object represented by the object word in the target sentence, and mark the web content according to the acquired state information of the target object.
Referring to fig. 3, a flowchart of an implementation of a method for identifying a state of a target object according to an embodiment of the present application is shown, where a specific implementation flow of the method is as follows:
S31: acquiring network content of a target website, analyzing the network content, and extracting text information of the network content;
S32: filtering the text information to obtain target sentences which simultaneously contain preset object words and preset state words in the text information;
The web content refers to web articles, news, information, video and the like.
In the embodiment of the present application, when obtaining the network content of the target website, an optional implementation manner is as follows:
First, a target website is acquired, and the crawling record is queried to determine whether content crawling has been performed on the target website within a preset duration. If so, a new website is acquired instead. If not, content crawling is performed on the target website through the web crawler cluster to obtain the network content of the target website.
The target website may be a network link acquired in real time, for example a network link sent by a user during a chat, such as a link that user A sends to user B while the two chat through some social software. For a given website, the target website can also be a network link that a user clicks while browsing that website, a network link that a user searches for, and the like.
In the embodiment of the application, after content crawling is performed each time, a crawling record is generated and stored based on the crawled website and crawling time.
That is, the crawling record is generated after content crawling is performed on the target website each time, so that before content crawling is performed on the target website each time, whether the website has been crawled recently can be determined by querying the crawling record, and if so, crawling is not needed again, thereby realizing the duplication removal of the target website and reducing unnecessary load expenditure of the system.
The preset duration may be customized according to the situation, for example, the preset duration is set to one week, two weeks, or one day, which is not limited herein. For example, if the user a sends the network link 1 to the user B, the user B forwards the network link to the user C after 5 minutes, and the user C forwards the network link to the user D after 5 minutes, and if the user a sends the network link 1 to the user B, and content crawling is performed on the link and a crawling record is generated, when the user B forwards the network link to the user C, the user B can know that the link has been content-crawled before 5 minutes without repeated detection by querying the crawling record, and when the user C forwards the network link to the user D, the user C can know that the link has been content-crawled before 10 minutes by querying the crawling record without repeated detection.
In the embodiment, the same target website is only detected and analyzed once within a certain time, so that the load overhead of the system can be effectively reduced, and the detection efficiency is improved.
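The crawling-record check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the in-memory dictionary, the one-day duration, the placeholder link, and the names `should_crawl` and `record_crawl` are all assumptions introduced here.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory crawling record: website -> time of the last crawl.
crawl_records = {}

# Assumed preset duration; the text allows one day, one week, two weeks, etc.
PRESET_DURATION = timedelta(days=1)


def should_crawl(url, now, records=crawl_records, window=PRESET_DURATION):
    """Return True if the website was not crawled within the preset duration."""
    last = records.get(url)
    return last is None or now - last >= window


def record_crawl(url, now, records=crawl_records):
    """Store a crawling record (website and crawl time) after a crawl."""
    records[url] = now


# Usage mirroring the forwarding example: A -> B, B -> C, C -> D.
t0 = datetime(2020, 3, 25, 12, 0, 0)
link = "https://news.example.com/link1"  # placeholder link
first = should_crawl(link, t0)                           # no record yet
record_crawl(link, t0)
second = should_crawl(link, t0 + timedelta(minutes=5))   # crawled 5 minutes ago
third = should_crawl(link, t0 + timedelta(minutes=10))   # crawled 10 minutes ago
```

In a deployed system the record would presumably live in a shared store so that every crawler in the cluster sees it; the dictionary here only keeps the sketch self-contained.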
Optionally, if network content were analyzed every time a website is acquired, the system load could be high, because many websites may be acquired within a short time. For example, in a certain social software many users chat at the same time, so many websites may be acquired; on a certain website many users browse at the same time and click or search for network links. In view of this, in the embodiment of the application the acquired websites may be screened when target websites are acquired: only websites meeting certain conditions are retained as target websites, and for the retained websites the crawling records are queried to determine whether their network content needs to be downloaded for analysis.
A website meeting certain conditions may be a website of a designated site, a website clicked by a designated user, a website searched for by a designated user, a website sent by a designated user, and the like.
Taking the case where the target website is a website of a designated site as an example, a news website list may be preset in which the websites of mainstream online news platforms are stored, and links belonging to these websites can be used as target websites. To ensure the reliability of the network news, only official news platforms may be added to the news website list, which is not specifically limited herein.
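As a rough sketch of the screening step, a link can be checked against the news website list by comparing its host name. The host names below are placeholders, and exact host matching is an assumption made here; the text does not say how a link is matched to a listed website.

```python
from urllib.parse import urlparse

# Placeholder host names standing in for the preset news website list.
NEWS_SITE_HOSTS = {"news.xxa.com", "news.xxb.com", "news.xxc.com.cn"}


def is_listed_news_link(url, hosts=NEWS_SITE_HOSTS):
    """Return True if the link's host is in the news website list."""
    host = (urlparse(url).hostname or "").lower()
    return host in hosts


listed = is_listed_news_link("https://news.xxa.com/20/0324/item.html")
unlisted = is_listed_news_link("https://blog.example.org/post/1")
```

Only links for which the check succeeds would go on to the crawling-record query.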
In the embodiment of the application, after the network content is acquired in the above manner, the text information corresponding to the network content can be obtained by analyzing the network content. The text information is then filtered to obtain target sentences that contain both a preset object word and a preset state word. A sentence that contains only a preset object word, only a preset state word, or neither does not need dependency syntax analysis. Because the application performs dependency syntax analysis only on target sentences, the range of sentences to be analyzed is narrowed and the recognition efficiency is improved.
The preset object words refer to preset names of some objects, which may be formal names of the objects, or aliases of the objects, for example, names of some common enterprises or platforms, names of some known enterprises, etc., or names of some specified objects; the preset status words refer to some words representing the status of the object, which are preset, and may be some common status words representing the status of the object corresponding to the preset object words.
Specifically, if the web content is in the form of an article, semantic analysis can be performed on the text in the article by means of text recognition and the like, and the extracted target sentence can be text information such as an article title, a news title, a text and the like. If the web content is not an article, but in the form of video or audio, text in the video image can be extracted by means of image recognition, voice recognition, or the like, or the recognized voice can be converted to obtain corresponding text information.
It should be noted that, the above-listed manner of extracting text information of web content is merely an example, and in the embodiment of the present application, the manner of extracting text information is not specifically limited.
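The filtering of text information into target sentences can be illustrated with a short sketch. The word lists, the punctuation-based sentence split, and the substring matching are assumptions for illustration; a real system would use its own preset lists and a proper word segmenter.

```python
import re

# Hypothetical preset word lists.
PRESET_OBJECT_WORDS = {"X platform", "Y platform"}
PRESET_STATE_WORDS = {"destroyed", "checked"}


def split_sentences(text):
    """Naive sentence split on common end punctuation (an assumption)."""
    return [s.strip() for s in re.split(r"[.!?。！？]+", text) if s.strip()]


def is_target_sentence(sentence):
    """A target sentence contains both an object word and a state word."""
    has_object = any(word in sentence for word in PRESET_OBJECT_WORDS)
    has_state = any(word in sentence for word in PRESET_STATE_WORDS)
    return has_object and has_state


def filter_target_sentences(text):
    """Keep only the sentences that need dependency syntax analysis."""
    return [s for s in split_sentences(text) if is_target_sentence(s)]


text = ("City A police successfully destroyed X platform. "
        "Y platform released a new product. "
        "Several suspects were checked.")
targets = filter_target_sentences(text)
```

Here the second sentence contains only an object word and the third only a state word, so neither is kept.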
S33: performing dependency syntactic analysis on the target sentence, and determining a preset object word and a preset state word aimed at by a target operation in the target sentence;
In an embodiment of the present application, a target operation may refer to an action represented by a verb in the target sentence, such as check, destroy, or close.
In an alternative embodiment, the specific implementation procedure of step S33 is as follows:
performing dependency syntactic analysis on the target sentence to obtain the dependency relationship among the words in the target sentence, wherein the dependency relationship is used for representing the dependency relationship of each word in one sentence on a syntactic structure; and determining preset object words and preset state words aimed at by target operation in the target sentence according to the obtained dependency relationship among the words.
For example, a target sentence contains 5 words: word A, word B, word C, word D, and word E. After dependency syntax analysis is performed on the target sentence, it is known that word A and word B have a dependency relationship, word B and word D have a dependency relationship, word C and word D have a dependency relationship, and word D and word E have a dependency relationship. After the dependency relationships among the words are determined, the preset object words and the preset state words targeted by the target operation in the target sentence are obtained by analyzing these dependency relationships.
In the embodiment of the present application, there are various manners of performing dependency syntax analysis on the target sentence, such as a rule-based method (the rules may be custom), a CRF (conditional random field)-based method, a method based on a Deep Biaffine Attention neural network, and the like, which are not limited herein.
Among the methods listed above, the deep learning-based method has high accuracy and coverage. In the embodiment of the application, the syntactic analysis model may adopt the Deep Biaffine Attention neural network model listed above, which learns the dependency relationships among the words in samples using a biaffine attention mechanism, so that when the model is used the dependency relationships among the words in a target sentence can be identified accurately and efficiently for subsequent identification of the target object state.
In an alternative embodiment, when determining the preset object word and the preset status word for the target operation in the target sentence according to the obtained dependency relationship between the words, the specific implementation manner is as follows:
determining words conforming to the dependency relationship of the designated type according to the type of the dependency relationship among the words in the target sentence; taking the preset object word in the determined word as the preset object word aimed by the target operation in the target sentence, and taking the preset state word in the determined word as the preset state word aimed by the target operation in the target sentence.
The specified type refers to a preset specific type of dependency relationship, such as a fronted-object relationship, a parallel relationship, or a verb-object relationship. In general, the dependency relationship between two object words that represent target objects, or between an object word and a state word that represents the state of the object, belongs to one of the specific types listed above. Other types of dependency relationship, such as an adverbial relationship or an attachment relationship, generally do not exist between two object words or between an object word and a state word. Therefore, in the embodiment of the application, only the words conforming to the specified types of dependency relationship are analyzed, and the preset object words and preset state words targeted by the target operation in the target sentence are determined from them.
For example, for the target sentence listed in the above embodiment, which includes word A, word B, word C, word D, and word E, it is known after dependency syntax analysis that word A has a dependency relationship with word E, word B has a dependency relationship with word E, word C has a dependency relationship with word D, and word D has a dependency relationship with word E.
Assume that word A, word B, and word D are all preset object words and that word E is a preset state word. If the types of the dependency relationships were not analyzed, all three object words would be taken directly as target objects, although word D is actually the name of a regulatory department. Here the dependency relationships between word C and word D and between word D and word E are not of a specified type, whereas those between word A and word E and between word B and word E are. Based on the above method, only word A and word B are taken as the preset object words targeted by the target operation in the target sentence, and word E is taken as the preset state word targeted by the target operation. This implementation therefore effectively improves both recognition efficiency and accuracy.
In the above embodiment, the scope of the words to be analyzed is further narrowed according to the type of the dependency relationship, and the preset object word and the preset status word for the action in the target sentence are determined from the words conforming to the specified type of the dependency relationship, which is more accurate and efficient than the manner of determining according to the dependency relationship among all the words in the sentence.
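The selection by dependency type can be sketched on the five-word example. The edge representation `(dependent, head, relation)` and the concrete relation labels are assumptions made here; the text only states which edges are of a specified type.

```python
# Dependency edges as (dependent, head, relation). The labels are illustrative.
edges = [
    ("A", "E", "fronted_object"),      # specified type
    ("B", "E", "fronted_object"),      # specified type
    ("C", "D", "attributive"),         # not a specified type
    ("D", "E", "subject_predicate"),   # not a specified type
]

SPECIFIED_TYPES = {"verb_object", "fronted_object", "parallel"}
PRESET_OBJECT_WORDS = {"A", "B", "D"}
PRESET_STATE_WORDS = {"E"}


def words_targeted_by_operation(edges):
    """Return the object words and state words on edges of a specified type."""
    selected = set()
    for dependent, head, relation in edges:
        if relation in SPECIFIED_TYPES:
            selected.update((dependent, head))
    objects = sorted(selected & PRESET_OBJECT_WORDS)
    states = sorted(selected & PRESET_STATE_WORDS)
    return objects, states


objects, states = words_targeted_by_operation(edges)
```

Word D is a preset object word, but none of its edges is of a specified type, so it is left out, which matches the regulatory-department case described above.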
S34: when the type of the preset status word is the target type, determining a target object according to the preset object word in the target sentence, and determining the status information of the target object according to the type of the preset status word.
In practice, the actual state of an object may be expressed by several state words that have the same meaning; these are in fact state words of the same type. The target types are types specified in advance. For example, when the object is a platform, the target types may be checked, closed, and rectified, which generally refer to the formal states of the platform. A state word may be any word representing a target type, either the formal state itself or a state alias. Therefore, in the embodiment of the present application, when the type of the preset state word is a target type, the target object may be determined according to the preset object word in the target sentence, and the state information of the target object may be determined according to the type of the preset state word. The specific implementation is as follows:
Considering that a sentence does not necessarily contain only one preset object word, when the state of a target object is analyzed, the words having a dependency relationship with each preset object word can be analyzed separately. For any preset object word in the target sentence, if the words having a dependency relationship with that preset object word include a preset state word, and the type of the preset state word is a target type, the preset object word is determined to be a target object and the type of the preset state word is determined to be the state information of the target object.
For example, word E, which has a dependency relationship with word A, is a preset state word, and the type of word E is a target type. In this case, word A is taken directly as target object A, and the type of word E is taken as the state information of target object A. Similarly, the word having a dependency relationship with word B is also word E, so word B is taken directly as target object B, and the type of word E is taken as the state information of target object B.
In the embodiment, the state information of the target object can be deduced based on the dependency relationship between the preset object word and the preset state word, so that automatic analysis is completely realized, manual intervention is not needed, the influence of subjective consciousness of an auditor during manual audit can be avoided, and the identification efficiency and accuracy can be effectively improved.
In an alternative embodiment, if the word having the dependency relationship with the preset object word further includes other preset object words, and the dependency relationship between the preset object word and the other preset object words is a parallel relationship, determining the other preset object words as another target object, and taking the state information of the target object represented by the preset object word as the state information of the other target object.
For example, the words having a dependency relationship with word A also include another preset object word, namely word F, and word F and word A are in a parallel relationship. In this case, word F can be taken directly as target object F, and the state information of target object A can be taken as the state information of target object F.
In the above embodiment, based on the dependency relationship between the preset object words, the state of the object represented by one of the preset object words can be determined according to the state of the object represented by the other preset object word, which is simple and efficient.
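Copying a state across a parallel relationship can be sketched as a small propagation over pairs of object words. The function name and the pair representation are hypothetical. Following a chain of parallel relationships (A with F, F with G) is an extension assumed here; the text itself only describes the direct case.

```python
from collections import deque


def propagate_states(direct_states, parallel_pairs):
    """Copy each known state to object words linked by parallel relationships."""
    neighbours = {}
    for a, b in parallel_pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    states = dict(direct_states)
    queue = deque(direct_states)  # start from words whose state is known
    while queue:
        word = queue.popleft()
        for other in neighbours.get(word, ()):
            if other not in states:  # never overwrite a directly derived state
                states[other] = states[word]
                queue.append(other)
    return states


# Word A was found to be "checked"; word F is in a parallel relationship with A.
result = propagate_states({"A": "checked"}, [("A", "F")])
```

Directly derived states take precedence: a word that already has a state is never overwritten by propagation.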
Optionally, after the state information of the target object is obtained based on the target website, the corresponding relationship between the preset stored object name and the object state can be updated.
In the embodiment of the application, the corresponding relation between the object names and the object states can be applied to the third party supervision platform shown in fig. 1A or 1B, namely, after the object states corresponding to the target objects are updated, the updated object states are timely displayed to the user through the interface shown in fig. 1A or 1B, so that the user can timely check the latest states of the objects.
Optionally, after the state information of the target object is obtained, the network content can be marked based on the obtained information so as to prompt a user accessing the network content, so that timely early warning can be realized, and the loss of the user is reduced.
For example, fig. 4 is a schematic diagram of a prompting method in an embodiment of the present application, in which the network content is marked mainly by marking the target website corresponding to the network content. In the chat interface shown in fig. 4, user A sends the target website to another user, and the network content corresponding to the website shows that the target object has been checked for involvement in illegal transactions. Prompt information can then be generated according to the state information of the target object to mark the network content, which gives early warning of the illegal platform, reminds the user, and avoids unnecessary loss.
It should be noted that the foregoing labeling manner is merely illustrative, and is not specifically limited in the embodiments of the present application, and any labeling manner is applicable to the embodiments of the present application.
The following describes the method for identifying the state of the target object in the embodiment of the present application in detail with reference to the schematic diagram of the auditing system shown in fig. 5:
Fig. 5 is an architecture diagram of an auditing system according to an embodiment of the present application, which is mainly used for implementing the method for identifying the state of a target object in the embodiment of the present application. The auditing system shown in fig. 5 mainly comprises a content analysis and filtering module, a lexical analysis module, and a rule extraction and analysis module. In addition, a web crawler cluster, public opinion news links, and a news website list are also involved.
The news website list stores mainstream network news platform websites. Links belonging to these websites enter the auditing system shown in fig. 5 for subsequent processing. Examples are as follows:
Table 1 News website list
Website name | Website address
A news website | news.xxa.com
B news website | news.xxb.com
C news website | news.xxc.com.cn
As shown in Table 1, which is a news website list provided in the embodiment of the present application, after a public opinion news link is obtained, it is first determined whether the link belongs to a website included in the news website list; if so, the link can enter the system shown in fig. 5 for subsequent processing. For example, the following link is acquired:
https://news.xxc.com/20/0324/07/F8FGV2TR00019B3E.html. The link is a specific news link under the C news website, so the network content of the link can be captured and downloaded through the web crawler cluster.
The web crawler cluster is a cluster formed by web crawlers (servers), the web crawlers are programs for automatically capturing internet link content, and a large amount of website content information can be acquired in a large-scale and concurrent mode through server cluster deployment.
In the embodiment of the application, after the web content of the target website is downloaded from the internet through the web crawler cluster, the web content needs to be analyzed to obtain the target sentence in the web content, and the process can be realized based on the content analysis and filtration module in the system shown in fig. 5.
The content analysis and filtering module is used for parsing the original HTML (Hyper Text Markup Language) of the website content acquired by the web crawler (namely, the network content) and extracting information such as the news headline text and the news body text. Through text matching and filtering, only texts containing the given platform names and platform states below are retained, namely target sentences containing both a preset object word and a preset state word.
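A minimal sketch of the parsing step, using Python's standard `html.parser`, is given below. Treating `<title>` as the headline and `<p>` elements as body text is an assumption for illustration; real news pages usually need site-specific extraction rules.

```python
from html.parser import HTMLParser


class NewsTextExtractor(HTMLParser):
    """Collect the <title> text and the text of <p> elements."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current = None  # tag currently being captured, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Text outside <title>/<p> (for example inside <script>) is ignored.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._buffer).strip()
            if tag == "title":
                self.title = text
            elif text:
                self.paragraphs.append(text)
            self._current = None


html = ("<html><head><title>X platform checked</title></head>"
        "<body><p>City A police destroyed X platform.</p>"
        "<script>var x = 1;</script><p>More details follow.</p></body></html>")
parser = NewsTextExtractor()
parser.feed(html)
parser.close()
```

The extracted headline and paragraphs would then be passed to the matching filter so that only target sentences remain.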
As shown in Table 2, a platform name list provided by the embodiment of the present application stores platform names and their alias information. Examples are as follows:
Table 2 Platform name list
Formal name of platform | Alias | Affiliated enterprise
A platform | X1 financial platform | A certain business consulting (A City) Co., Ltd.
B platform | X2 financial platform | B City certain information technology Co., Ltd.
C platform | X3 mall, X4 mall | B City certain e-commerce Co., Ltd.
As can be seen from Table 2, the alias of the A platform is the X1 financial platform, and its affiliated enterprise is a certain business consulting (A City) Co., Ltd.; the alias of the B platform is the X2 financial platform, and its affiliated enterprise is B City certain information technology Co., Ltd.; the C platform has two aliases, the X3 mall and the X4 mall, and its affiliated enterprise is B City certain e-commerce Co., Ltd.
As shown in Table 3, a platform state list provided by the embodiment of the present application stores platform states and their state alias information. Examples are as follows:
Table 3 Platform state list
Formal state | State alias
Checked | cracked, destroyed, detected, captured, caught …
Closed | absconded, lost contact, collapsed …
Rectified | overhauled …
In the embodiment of the application, the type of a state word obtained from the target sentence refers to the formal state to which the state word belongs. As can be seen from Table 3, one formal state may be represented by several aliases. The target types are the formal states listed in Table 3, such as checked, closed, and rectified.
For example, when the state word is cracked, destroyed, detected, captured, caught, or checked, its type is checked; when the state word is absconded, lost contact, collapsed, or closed, its type is closed; when the state word is overhauled or rectified, its type is rectified.
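The lookup from a state word to its type can be sketched as a reverse index over the platform state list. The alias strings below are illustrative English renderings chosen for this sketch, and the function names are hypothetical.

```python
# Formal state -> aliases, in the shape of the platform state list.
PLATFORM_STATES = {
    "checked": ["cracked", "destroyed", "detected", "captured", "caught"],
    "closed": ["absconded", "lost contact", "collapsed"],
    "rectified": ["overhauled"],
}


def build_alias_index(states):
    """Map every formal state and every alias to its formal state."""
    index = {}
    for formal, aliases in states.items():
        index[formal] = formal
        for alias in aliases:
            index[alias] = formal
    return index


ALIAS_TO_FORMAL = build_alias_index(PLATFORM_STATES)


def state_type(word):
    """Return the formal state for a state word, or None if it has no target type."""
    return ALIAS_TO_FORMAL.get(word)
```

Building the index once keeps each lookup to a single dictionary access, which matters when many sentences are checked.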
In the embodiment of the application, after the target sentence is obtained from the web content, the target sentence needs to be analyzed for dependency syntax, and the process can be implemented based on a lexical analysis module in the system shown in fig. 5.
The lexical analysis module is used for performing word segmentation on the input text and determining the dependency relationships, such as the subject-predicate relationship and the verb-object relationship. The following examples illustrate how the lexical analysis module performs dependency syntax analysis on a target sentence:
Example 1: for the target sentence "City A police successfully destroyed X platform", X platform is the preset object word, destroyed is the preset state word, and the target operation is destroy. The result output by the lexical analysis module is shown in fig. 6A. The sentence is segmented into 5 words in total: City A | police | successfully | destroyed | X platform.
As can be seen from fig. 6A, the 0th word in the analysis result is fixed as the root node. The first word is "City A", whose part of speech is ns (geographical name), representing a place noun. The second word is "police", whose part of speech is n (general noun). The third word is "successfully", whose part of speech is a (adjective). The fourth word is "destroyed", whose part of speech is v (verb). The fifth word is "X platform", whose part of speech is nh, representing an organization name; in the embodiment of the application, organization names are also treated as a class of nouns.
There is a dependency relationship between the first word and the second word, shown as a connecting line in the figure, and its type is the attributive relationship. Each dependency relationship is represented in fig. 6A by a directed line segment; here the dependent word is "City A" and the head (dominant) word is "police". There is a dependency relationship between the second word and the fourth word, and its type is the subject-predicate relationship; the dependent word is "police" and the head word is "destroyed". There is a dependency relationship between the third word and the fourth word, and its type is the adverbial relationship; the dependent word is "successfully" and the head word is "destroyed". There is a dependency relationship between the fourth word and the fifth word, and its type is the verb-object relationship; the dependent word is "X platform" and the head word is "destroyed". The fourth word and the 0th word are in the core relationship.
Example 2: for the target sentence "X platform and Y platform were successively checked", X platform and Y platform are the preset object words, checked is the preset state word, and the target operation is check. The result output by the lexical analysis module is shown in fig. 6B. The sentence is segmented into 6 words in total: X platform | and | Y platform | successively | 被 (passive marker) | checked.
As can be seen from fig. 6B, the 0th word in the analysis result is fixed as the root node. The first word is "X platform", whose part of speech is nh. The second word is "and", whose part of speech is c (conjunction). The third word is "Y platform", whose part of speech is n. The fourth word is "successively", whose part of speech is d (adverb). The fifth word is the passive marker "被", whose part of speech is p (preposition). The sixth word is "checked", whose part of speech is v.
There is a dependency relationship between the first word and the third word, and its type is the parallel relationship; the dependent word is "Y platform" and the head word is "X platform". There is a dependency relationship between the first word and the sixth word, and its type is the fronted-object relationship; the dependent word is "X platform" and the head word is "checked". There is a dependency relationship between the second word and the third word, and its type is the left attachment relationship; the dependent word is "and" and the head word is "Y platform". There is a dependency relationship between the fourth word and the sixth word, and its type is the adverbial relationship; the dependent word is "successively" and the head word is "checked". There is a dependency relationship between the fifth word and the sixth word, and its type is the adverbial relationship; the dependent word is "被" and the head word is "checked". The sixth word and the 0th word are in the core relationship.
In the embodiment of the application, after the dependency relationship among the words in the target sentence is obtained through the lexical analysis module, the words conforming to the specified type dependency relationship can be combined, and the state information of the target object can be analyzed. This process may be implemented based on a rule extraction analysis module in the system shown in fig. 5.
The rule extraction and analysis module is used for extracting the useful rules according to the rule base, in which the specified types of dependency relationship are preset. For example, the specified types include the verb-object relationship, the fronted-object relationship, the parallel relationship, and the like; they can be adjusted according to specific requirements and are not limited herein. The useful rules obtained from Example 1 and Example 2 above are:
verb-object relationship: (destroyed - X platform);
fronted-object relationship: (X platform - (被) - checked);
parallel relationship: (X platform - Y platform).
The platform state list shows that the aliases of checked include destroyed, that is, the type of the state word destroyed is a target type, so the verb-object relationship in Example 1 gives the state of the X platform as checked. The fronted-object relationship in Example 2 likewise gives the state of the X platform as checked. In addition, the parallel relationship implies that the Y platform has the same state as the X platform, so the state of the Y platform is also checked.
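The derivation for the two examples can be reproduced with a short sketch. The edges are transcribed by hand from the described parses rather than produced by a parser, and the relation labels are illustrative English names for the relation types named in the text; the helper names are hypothetical.

```python
SPECIFIED_TYPES = {"verb_object", "fronted_object", "parallel"}
OBJECT_WORDS = {"X platform", "Y platform"}
STATE_WORD_TYPE = {"destroyed": "checked", "checked": "checked"}

# (dependent, head, relation) edges for Example 1 and Example 2.
EXAMPLE_1 = [
    ("City A", "police", "attributive"),
    ("police", "destroyed", "subject_predicate"),
    ("successfully", "destroyed", "adverbial"),
    ("X platform", "destroyed", "verb_object"),
]
EXAMPLE_2 = [
    ("Y platform", "X platform", "parallel"),
    ("X platform", "checked", "fronted_object"),
    ("and", "Y platform", "left_attachment"),
    ("successively", "checked", "adverbial"),
    ("被", "checked", "adverbial"),
]


def useful_rules(edges):
    """Keep only the edges whose relation is of a specified type."""
    return [edge for edge in edges if edge[2] in SPECIFIED_TYPES]


def derive_states(edges):
    """Derive {object word: state type} from the useful rules."""
    rules = useful_rules(edges)
    states = {}
    # Object word directly governed by a state word of a target type.
    for dependent, head, relation in rules:
        if relation != "parallel" and dependent in OBJECT_WORDS and head in STATE_WORD_TYPE:
            states[dependent] = STATE_WORD_TYPE[head]
    # Object word in a parallel relationship inherits the head's state.
    for dependent, head, relation in rules:
        if relation == "parallel" and head in states and dependent in OBJECT_WORDS:
            states.setdefault(dependent, states[head])
    return states


states_1 = derive_states(EXAMPLE_1)
states_2 = derive_states(EXAMPLE_2)
```

Running the sketch yields checked for the X platform in both examples and checked for the Y platform in Example 2, in line with the derivation above.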
In the embodiment of the present application, after obtaining the state information of the target object based on the above process, the above derived result may be stored and updated in the platform state result library shown in table 4, and examples are as follows:
Table 4 Platform state result library
Platform | State | State update time
X platform | Checked | 2020-03-25 12:00:00
Y platform | Checked | 2020-03-25 13:00:00
In addition, it should be noted that when an obtained result is stored in the platform state result library, a platform alias or a state alias may first be converted into the corresponding formal name through the platform name list and the platform state list and then stored in the result library; for example, the X3 mall is converted into the Y platform, and the state alias is converted into the formal state checked.
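Normalizing aliases before storage can be sketched as two dictionary lookups followed by an update of the result library. The alias tables, the dictionary-based result library, and the function names are assumptions for illustration only.

```python
from datetime import datetime

# Hypothetical alias tables in the shape of the platform name and state lists.
PLATFORM_ALIASES = {
    "X1 financial platform": "A platform",
    "X2 financial platform": "B platform",
}
STATE_ALIASES = {"destroyed": "checked", "lost contact": "closed"}


def to_formal(name, aliases):
    """Return the formal name for an alias; formal names pass through unchanged."""
    return aliases.get(name, name)


def update_result_library(library, platform, state, updated_at):
    """Store the formal platform name, formal state, and update time."""
    formal_platform = to_formal(platform, PLATFORM_ALIASES)
    formal_state = to_formal(state, STATE_ALIASES)
    library[formal_platform] = {"state": formal_state, "updated_at": updated_at}
    return library


library = {}
update_result_library(library, "X1 financial platform", "destroyed",
                      datetime(2020, 3, 25, 12, 0, 0))
```

Storing only formal names means later queries by platform never have to consider aliases.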
Referring to fig. 7, a complete flow chart of a method for identifying the status of a target object is shown. The specific implementation flow of the method is as follows:
step S71: acquiring a target website sent or received by a user when chatting through a certain social application in real time;
step S72: judging whether the target website is in a given news website list, if so, executing a step S73, otherwise, returning to the step S71;
step S73: determining, by querying the historical crawling records, whether content crawling has been performed on the target website within the preset duration; if so, ending the flow; otherwise, executing step S74;
step S74: downloading the network content of the target website through the web crawler cluster;
step S75: judging whether the network content of the target website is successfully downloaded, if so, executing a step S76, otherwise, ending the flow;
step S76: analyzing the network content and extracting text information in the network content;
step S77: analyzing the content of the extracted text information sentence by sentence;
step S78: judging whether all text information of the network content is completely analyzed, if so, ending the flow, otherwise, executing step S79;
step S79: judging whether the current text is a sentence containing both a preset object word and a preset state word, if so, executing step S710, otherwise, returning to step S77 (i.e. skipping the current text and analyzing the next text);
step S710: the current text is a target sentence, dependency syntax analysis is carried out on the target sentence, and the dependency relation among all words in the target sentence is obtained;
step S711: judging whether the obtained dependency relationship contains the dependency relationship of the specified type, if so, executing a step S712, otherwise, returning to the step S77;
step S712: and determining the state information of each target object involved in the target sentence according to the words conforming to the specified type dependency relationship.
The text information in step S76 is not necessarily a sentence containing both the preset object word and the preset status word, but is a sentence related to a news headline, a text, or the like extracted from the web content. The text information may be further filtered based on step S79, and only sentences that contain both the preset object word and the preset state word are reserved as target sentences.
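The control flow of steps S72 to S712 can be outlined as a single driver function. Every callable passed in is a hypothetical stand-in for the corresponding module, and the stubs in the usage example exist only to make the sketch runnable.

```python
def run_flow(url, *, in_news_list, recently_crawled, download,
             extract_sentences, is_target_sentence, parse,
             specified_types, derive):
    """Outline of steps S72 to S712 with each module injected as a callable."""
    if not in_news_list(url):                    # S72: not a listed news site
        return {}
    if recently_crawled(url):                    # S73: crawled within the preset duration
        return {}
    content = download(url)                      # S74
    if content is None:                          # S75: download failed
        return {}
    results = {}
    for sentence in extract_sentences(content):  # S76 to S78
        if not is_target_sentence(sentence):     # S79: skip non-target text
            continue
        edges = parse(sentence)                  # S710: dependency syntax analysis
        useful = [e for e in edges if e[2] in specified_types]  # S711
        if not useful:
            continue
        results.update(derive(useful))           # S712
    return results


# Stub modules for illustration only.
states = run_flow(
    "https://news.xxa.com/item.html",
    in_news_list=lambda u: True,
    recently_crawled=lambda u: False,
    download=lambda u: "X platform was checked. The weather is fine.",
    extract_sentences=lambda c: [s.strip() for s in c.split(".") if s.strip()],
    is_target_sentence=lambda s: "X platform" in s and "checked" in s,
    parse=lambda s: [("X platform", "checked", "fronted_object")],
    specified_types={"verb_object", "fronted_object", "parallel"},
    derive=lambda edges: {dep: "checked" for dep, head, rel in edges},
)
```

Passing the modules in as parameters keeps the flow itself independent of any particular crawler, parser, or rule base.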
Based on the same inventive concept, the embodiment of the present application further provides a device for identifying a state of a target object, as shown in fig. 8, which is a schematic structural diagram of a device 800 for identifying a state of a target object according to the embodiment of the present application, and may include:
an obtaining unit 801, configured to obtain network content of a target website, analyze the network content, and extract text information of the network content;
a filtering unit 802, configured to filter the text information, and obtain a target sentence that includes a preset object word and a preset status word in the text information;
an analysis unit 803, configured to perform dependency syntax analysis on the target sentence, and determine a preset object word and a preset status word for a target operation in the target sentence;
the determining unit 804 is configured to determine, when the type of the preset status word is the target type, a target object according to the preset object word in the target sentence, and determine status information of the target object according to the type of the preset status word.
Optionally, the analysis unit 803 is specifically configured to:
performing dependency syntax analysis on the target sentence to obtain the dependency relationships among the words in the target sentence, wherein a dependency relationship represents how the words in a sentence depend on one another in the syntactic structure;
and determining, according to the obtained dependency relationships among the words, the preset object word and the preset state word targeted by the target operation in the target sentence.
Optionally, the analysis unit 803 is specifically configured to:
determining words conforming to the dependency relationship of the designated type according to the type of the dependency relationship among the words in the target sentence;
taking the preset object word among the determined words as the preset object word targeted by the target operation in the target sentence, and taking the preset state word among the determined words as the preset state word targeted by the target operation in the target sentence.
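The selection performed by the analysis unit can be sketched as follows. This is an illustrative sketch only: it assumes the dependency parse is already available as a list of (head, relation, dependent) arcs from some parser, and it uses Universal Dependencies style labels (`nsubj`, `obj`) as the specified relation types. The `Arc` type, the label set, and the function name are hypothetical.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    """One dependency arc: `dependent` depends on `head` via `relation`."""
    head: str
    relation: str
    dependent: str


# Assumed "specified types": subject-of-verb and object-of-verb relations.
SPECIFIED_RELATIONS = {"nsubj", "obj"}


def words_targeted_by_operation(arcs, object_words, state_words,
                                relations=SPECIFIED_RELATIONS):
    """Return (object words, state words) found on arcs of a specified type."""
    selected = []
    for arc in arcs:
        if arc.relation in relations:
            for word in (arc.head, arc.dependent):
                if word not in selected:
                    selected.append(word)
    objects = [w for w in selected if w in object_words]
    states = [w for w in selected if w in state_words]
    return objects, states
```

For a sentence such as "AcmeBank declared bankruptcy", the verb "declared" plays the role of the target operation, and the words attached to it by the specified relations are the candidates.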
Optionally, the determining unit 804 is specifically configured to:
for any preset object word in the target sentence, if the words having a dependency relationship with the preset object word include a preset state word, and the type of the preset state word is the target type, determining the preset object word as the target object, and determining the type of the preset state word as the state information of the target object.
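The rule applied by the determining unit can be sketched as below. The sketch assumes that the words related to each preset object word have already been collected from the dependency parse, and that each preset state word has been assigned a type in advance; the dictionary layout, the type names, and the function name are hypothetical.

```python
def determine_states(related_words, state_word_types, target_types):
    """Map each object word to the type of the first related state word
    whose type is one of the target types.

    related_words:    {object word: [words having a dependency relationship with it]}
    state_word_types: {preset state word: type of that state word}
    target_types:     set of types that are of interest
    """
    result = {}
    for object_word, neighbours in related_words.items():
        for word in neighbours:
            state_type = state_word_types.get(word)
            if state_type in target_types:
                result[object_word] = state_type
                break  # the first matching state word decides the state
    return result
```

Object words with no related state word of a target type are simply left out of the result, so they are not treated as target objects.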
Optionally, the analysis unit 803 is further configured to:
if the words having a dependency relationship with the preset object word further include other preset object words, and the dependency relationship between the preset object word and the other preset object words is a parallel relationship, determining the other preset object words as another target object, and taking the state information of the target object represented by the preset object word as the state information of the other target object.
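The handling of parallel (coordinated) object words can be sketched as follows. The sketch assumes the parallel relationships are supplied as pairs of words, for example taken from `conj` arcs of a dependency parse; the function and parameter names are hypothetical.

```python
def propagate_to_parallel_objects(states, parallel_pairs, object_words):
    """Copy a known state to an object word that stands in a parallel
    relationship with an object word whose state is already known.

    states:         {object word: state information}
    parallel_pairs: iterable of (word_a, word_b) in a parallel relationship
    object_words:   set of preset object words
    """
    result = dict(states)  # do not modify the caller's mapping
    for a, b in parallel_pairs:
        if a in object_words and b in object_words:
            if a in result and b not in result:
                result[b] = result[a]
            elif b in result and a not in result:
                result[a] = result[b]
    return result
```

In a sentence such as "AcmeBank and FooPay went bankrupt", only one of the two names may be attached directly to the state word; the pass above gives the other name the same state. A single pass is used, so chains of three or more coordinated names are covered only when the pairs are listed in an order that lets the state flow along the chain.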
Optionally, the acquiring unit 801 is specifically configured to:
querying the crawling records corresponding to the target website within a preset duration, wherein a crawling record is generated from the crawled website and the crawling time after each content crawl;
if no content crawling has been performed on the target website within the preset duration, crawling the content of the target website through a web crawler cluster to obtain the network content of the target website.
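The crawl-record check can be sketched as below. The in-memory dictionary of last crawl times and the `crawl` callable, which stands in for the web crawler cluster, are assumptions made for this example.

```python
from datetime import datetime, timedelta


def needs_crawl(crawl_records, site, now, preset_duration):
    """True if `site` has no crawl record within `preset_duration` before `now`."""
    last_crawled = crawl_records.get(site)
    return last_crawled is None or now - last_crawled > preset_duration


def acquire_network_content(crawl_records, site, now, preset_duration, crawl):
    """Crawl `site` only if it was not crawled recently, then record the crawl.

    `crawl` is a hypothetical callable that takes a site and returns its content.
    Returns the crawled content, or None when the crawl is skipped.
    """
    if not needs_crawl(crawl_records, site, now, preset_duration):
        return None
    content = crawl(site)
    crawl_records[site] = now  # record the crawled website and the crawling time
    return content
```

Skipping recently crawled websites avoids fetching and analysing the same content repeatedly within the preset duration.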
Optionally, the apparatus further comprises:
an application unit 805, configured to mark the network content according to the acquired state information of the target object, so as to prompt an account viewing the network content; or
to update, according to the acquired state information of the target object, the object state corresponding to the target object in a pre-stored correspondence between object names and object states.
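The two uses of the recognised state information can be sketched as follows. The record layout produced by `mark_content` and both function names are hypothetical; how the prompt is actually displayed to an account is outside the scope of this sketch.

```python
def mark_content(content, object_states):
    """Attach state notes to a piece of network content so that a viewer can be prompted."""
    notes = [f"{name}: {state}" for name, state in sorted(object_states.items())]
    return {"content": content, "state_notes": notes}


def update_object_states(stored_states, object_states):
    """Update the pre-stored object-name-to-state correspondence in place."""
    stored_states.update(object_states)
    return stored_states
```

For example, a stored entry for an object name is overwritten with the newly recognised state, while entries for other object names are left unchanged.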
For convenience of description, the above parts are described as being functionally divided into separate modules (or units). Of course, when implementing the present application, the functions of the modules (or units) may be implemented in one or more pieces of software or hardware.
Having described the method and apparatus for recognizing the state of a target object according to an exemplary embodiment of the present application, next, an electronic device according to another exemplary embodiment of the present application is described.
Those skilled in the art will appreciate that the various aspects of the application may be implemented as a system, method, or program product. Accordingly, aspects of the application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, any of which may be referred to herein as a "circuit," "module," or "system."
Fig. 9 is a block diagram of an electronic device 900 according to an example embodiment. The electronic device comprises:
a processor 910;
a memory 920 for storing instructions executable by the processor 910;
wherein the processor 910 is configured to execute instructions to implement a method for identifying a state of a target object in an embodiment of the present application, such as the steps shown in fig. 3.
In an exemplary embodiment, a storage medium including instructions is also provided, such as the memory 920, the instructions being executable by the processor 910 of the electronic device 900 to perform the above-described method. Optionally, the storage medium may be a non-transitory computer readable storage medium, for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In some possible implementations, the computing device of the present application may include at least one processor, and at least one memory. Wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps in the method for identifying the state of the target object according to the various exemplary embodiments of the application described in the present specification. For example, the processor may perform the steps as shown in fig. 3.
A computing device 100 according to such an embodiment of the application is described below with reference to fig. 10. The computing device 100 of fig. 10 is only one example and should not be taken as limiting the functionality and scope of use of embodiments of the application.
As shown in fig. 10, the computing device 100 is in the form of a general purpose computing device. Components of computing device 100 may include, but are not limited to: the at least one processing unit 101, the at least one memory unit 102, a bus 103 connecting the different system components, including the memory unit 102 and the processing unit 101.
Bus 103 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a processor or local bus using any of a variety of bus architectures.
The storage unit 102 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 1021 and/or cache memory unit 1022, and may further include Read Only Memory (ROM) 1023.
Storage unit 102 may also include program/utility 1025 having a set (at least one) of program modules 1024, such program modules 1024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The computing device 100 may also communicate with one or more external devices 104 (e.g., keyboard, pointing device, etc.), one or more devices that enable a user to interact with the computing device 100, and/or any devices (e.g., routers, modems, etc.) that enable the computing device 100 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 105. Moreover, computing device 100 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 106. As shown, network adapter 106 communicates with other modules for computing device 100 over bus 103. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in connection with computing device 100, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
In some possible embodiments, aspects of the method for identifying a state of a target object provided by the present application may also be implemented in the form of a program product comprising program code. When the program product is run on a computer device, the program code causes the computer device to perform the steps in the method for identifying a state of a target object according to the various exemplary embodiments of the present application described above; for example, the computer device may perform the steps shown in fig. 3.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (9)

1. A method for identifying the state of a target object, the method comprising:
acquiring network content of a target website, analyzing the network content, and extracting text information of the network content;
filtering the text information to obtain a target sentence which simultaneously contains a preset object word and a preset state word in the text information; the preset state words are words for expressing the state of the target object, and the state is either a formal state or a state alias of the formal state;
performing dependency syntax analysis on the target sentence to obtain the dependency relationships among the words in the target sentence, wherein a dependency relationship represents how the words in a sentence depend on one another in the syntactic structure;
determining words conforming to the dependency relationship of the designated type according to the type of the dependency relationship among the words in the target sentence;
taking the preset object word in the determined word as the preset object word aimed at by the target operation in the target sentence, and taking the preset state word in the determined word as the preset state word aimed at by the target operation in the target sentence; wherein the target operation is an action represented by a verb in the target sentence;
for any preset object word in the target sentence, if the words having a dependency relationship with the preset object word include a preset state word and the type of the preset state word is a target type, determining the preset object word as a target object, and determining the formal state corresponding to the preset state word as state information of the target object; the type of the preset state word is used for expressing the formal state of the target object.
2. The method of claim 1, wherein the method further comprises:
if the word having the dependency relationship with the preset object word further comprises other preset object words, and the dependency relationship between the preset object word and the other preset object words is a parallel relationship, determining the other preset object words as another target object, and taking the state information of the target object represented by the preset object word as the state information of the other target object.
3. The method of claim 1, wherein acquiring the network content of the target website specifically comprises:
querying the crawling records corresponding to the target website within a preset duration, wherein a crawling record is generated from the crawled website and the crawling time after each content crawl;
if it is determined that no content crawling has been performed on the target website within the preset duration, crawling the content of the target website through a web crawler cluster to obtain the network content of the target website.
4. A method according to any one of claims 1 to 3, further comprising:
marking the network content according to the acquired state information of the target object, so as to prompt an account viewing the network content; or
updating, according to the acquired state information of the target object, the object state corresponding to the target object in a pre-stored correspondence between object names and object states.
5. A state recognition apparatus of a target object, comprising:
the acquisition unit is used for acquiring the network content of the target website, analyzing the network content and extracting the text information of the network content;
the filtering unit is used for filtering the text information to obtain target sentences which simultaneously contain preset object words and preset state words in the text information; the preset state words are words for expressing the state of the target object, and the state is either a formal state or a state alias of the formal state;
the analysis unit is used for carrying out dependency syntactic analysis on the target sentence and obtaining the dependency relationship among the words in the target sentence, wherein the dependency relationship is used for representing the dependency relationship of each word in one sentence on a syntactic structure;
determining words conforming to the dependency relationship of the designated type according to the type of the dependency relationship among the words in the target sentence;
taking the preset object word in the determined word as the preset object word aimed at by the target operation in the target sentence, and taking the preset state word in the determined word as the preset state word aimed at by the target operation in the target sentence; wherein the target operation is an action represented by a verb in the target sentence;
the determining unit is used for determining a preset object word as a target object and determining a formal state corresponding to the preset state word as state information of the target object aiming at any preset object word in the target sentence if the word with the dependency relationship with the preset object word comprises a preset state word and the type of the preset state word is a target type; the type of the preset status word is used for expressing the formal status of the target object.
6. The apparatus of claim 5, wherein the analysis unit is further configured to:
if the word having the dependency relationship with the preset object word further comprises other preset object words, and the dependency relationship between the preset object word and the other preset object words is a parallel relationship, determining the other preset object words as another target object, and taking the state information of the target object represented by the preset object word as the state information of the other target object.
7. The apparatus of claim 5, wherein the acquisition unit is specifically configured to:
querying the crawling records corresponding to the target website within a preset duration, wherein a crawling record is generated from the crawled website and the crawling time after each content crawl;
if it is determined that no content crawling has been performed on the target website within the preset duration, crawling the content of the target website through a web crawler cluster to obtain the network content of the target website.
8. An electronic device comprising a processor and a memory, wherein the memory stores program code that, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1-4.
9. A computer readable storage medium, characterized in that it comprises a program code for causing an electronic device to perform the steps of the method of any of claims 1-4 when said program code is run on said electronic device.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010398237.5A CN111581533B (en) 2020-05-12 2020-05-12 Method and device for identifying state of target object, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111581533A CN111581533A (en) 2020-08-25
CN111581533B true CN111581533B (en) 2023-11-03

Family

ID=72125054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010398237.5A Active CN111581533B (en) 2020-05-12 2020-05-12 Method and device for identifying state of target object, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111581533B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069381A (en) * 2020-09-27 2020-12-11 中国科学院深圳先进技术研究院 Monitoring management method and system based on natural language processing technology

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107918633A (en) * 2017-03-23 2018-04-17 广州思涵信息科技有限公司 Sensitive public sentiment content identification method and early warning system based on semantic analysis technology
CN109614550A (en) * 2018-12-11 2019-04-12 平安科技(深圳)有限公司 Public sentiment monitoring method, device, computer equipment and storage medium
CN109635298A (en) * 2018-12-11 2019-04-16 平安科技(深圳)有限公司 Group's state identification method, device, computer equipment and storage medium
CN110555205A (en) * 2018-05-31 2019-12-10 北京京东尚科信息技术有限公司 negative semantic recognition method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102246936B1 (en) * 2019-06-20 2021-04-29 엘지전자 주식회사 Method and apparatus for recognizing a voice

Also Published As

Publication number Publication date
CN111581533A (en) 2020-08-25

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40029138

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant