CN117633051A - Virtual-real entity detection technology based on five kinds of network data


Info

Publication number
CN117633051A
Authority
CN
China
Prior art keywords
data
entity
virtual
real
types
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311523796.4A
Other languages
Chinese (zh)
Inventor
林山
陈子夫
陈妙瑛
叶青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Anscen Network Technology Co ltd
Original Assignee
Xiamen Anscen Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Anscen Network Technology Co ltd
Priority to CN202311523796.4A
Publication of CN117633051A
Legal status: Pending


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a virtual-real entity detection technology based on five types of network data, comprising the following steps: S1, acquiring five types of network data; S2, processing and cleaning the five types of network data using big data processing technology to generate structured data; S3, automatically identifying the entities and entity relations in the structured data using natural language processing technology; S4, storing the entities and entity relations in the structured data in a distributed database; and S5, computing statistics on the entities and entity relations in the structured data with a statistical model to provide a visual display of the data. The invention improves efficiency, ensures the stability of data synchronization, and reduces maintenance cost.

Description

Virtual-real entity detection technology based on five kinds of network data
Technical Field
The application relates to the field of data mining and analysis and natural language processing in big data technology, and mainly relates to a virtual-real entity detection technology based on five types of network data.
Background
In big data technology, data mining and analysis and natural language processing are two important fields. Data mining and analysis involves collecting, cleaning, and preprocessing large-scale, discrete information resources, building different models for different data, and then interpreting and analyzing the mining results to obtain the required data. Natural language processing is used to manage, process, and exploit large volumes of data efficiently so that the data become more valuable and useful. In a big data environment, the required data must be located and extracted quickly from massive data sets in order to better serve business needs and development. Data mining and analysis techniques are well suited to this need: through feature selection and feature extraction they select or create the most relevant features from the raw data to represent it, and then an appropriate data mining model or algorithm is chosen, such as clustering, classification, association rule mining, or time series analysis, and a suitable model is built according to the characteristics of the data and the requirements of the problem. Natural language processing, in turn, handles steps such as information extraction and hands tedious data extraction work to the machine, which improves data quality and usability and reduces cost.
Existing products rely on big data technology to crawl and capture data from different websites and to preprocess, filter, classify, and index the data in order to speed up search. They combine data storage, data cleaning, analysis, mining, and similar techniques, which makes search more convenient, faster, and broader in coverage. These applications nevertheless have shortcomings when an entity's identity must be found precisely. For example, when a virtual identity has to be linked to related identifying information, the link cannot be established automatically; each piece of data must be integrated and queried through manual analysis. This takes a long time, and the data are mixed with false information, so time and effort are wasted. Users therefore urgently need a more accurate application based on virtual-real entity detection to meet more targeted business needs. The present technology was developed to meet this need for precision, so that users can obtain relevant information and association analysis data more quickly.
Disclosure of Invention
In view of the problems in the prior art, the application provides a virtual-real entity detection technology based on five types of network data. It uses big data technology to govern the five types of collected virtual-real related data and identifies and extracts entities and entity relations from the data, which improves the reliability, accuracy, and analysis efficiency of queries; it also adds a natural language processing function that processes the related data automatically.
The invention aims to provide a virtual-real entity detection technology based on five types of network data. After the virtual-real related data of the five types of networks are collected, collection, cleaning, and preprocessing techniques are applied, and data mining is used to select an appropriate model or algorithm for each kind of data, such as clustering, classification, association rule mining, or time series analysis, and to build a suitable model. The data are then processed with natural language processing, which automatically extracts and identifies entities together with their attributes, relations, and other information from the text data, helping users obtain relevant information and knowledge more quickly. The technology can achieve the following objects:
1. The precision of search engines and recommendation systems is improved: the virtual-real entity detection technology based on five types of network data can make search engines and recommendation systems more intelligent and accurate, thereby improving the quality and accuracy of search results and recommendations.
2. The decision analysis efficiency of enterprises and institutions is improved: the virtual-real entity detection technology based on five types of network data can help enterprises and institutions to find related information more quickly, and intelligent and efficient processing of information is achieved, so that decision analysis efficiency and accuracy can be improved.
3. Network security is improved: the virtual-real entity detection technology based on five types of network data can help to detect abuse of virtual identities and improve network security.
According to one aspect of the present invention, a virtual-real entity detection technology based on five types of network data is provided, which specifically comprises the following steps:
Step S1: collecting five types of network data.
Further, the five types of network data specifically include social media, news media, stock data, web page data, forum information, and the like.
Step S2: processing and cleaning the five types of network data using big data processing technology to generate structured data.
Further, the big data processing technology specifically includes performing noise removal, segmentation, sentence splitting, time formatting, and structuring on the collected five types of network data.
The structuring specifically comprises identifying the data and obtaining the entities, attributes, and relations in the data to generate structured data.
Step S3: automatically identifying the entities and entity relations in the structured data using natural language processing technology.
Further, the natural language processing technology specifically includes parsing the structured data and performing word segmentation, part-of-speech tagging, and grammatical analysis.
Further, the automatic identification of the entities and entity relations in the structured data is specifically the identification and extraction of virtual-real mapping entity relations.
The identification and extraction of virtual-real mapping entity relations specifically further comprises:
Splitting the structured data into sentences, and performing word segmentation, part-of-speech tagging, named entity recognition, and dependency parsing on the sentences.
Performing rule matching on the dependency structure to obtain syntax rules, where each matched production is one syntax rule, and generating triples using the syntax rules.
Further processing the triples to identify relations.
Step S4: storing the entities and entity relations in the structured data in a distributed database.
Further, the distributed database specifically uses the distributed search and analysis engine Elasticsearch to store and query data efficiently.
Step S5: computing statistics on the entities and entity relations in the structured data with a statistical model to provide a visual display of the data.
Further, the statistical model specifically includes statistics on total data volume, data category, relation category, and common data.
The invention has the advantages and beneficial effects as follows:
1. Query accuracy is improved: association tables of the five types of network data are collected, and search becomes more accurate after big data processing.
2. Query efficiency is improved: association tables of the five types of network data are collected, so data can be queried more efficiently.
3. The associated data of the five types of network data are analyzed and processed, which improves analysis efficiency and supports intelligent decision assistance.
4. The collected five types of network data undergo parsing, word segmentation, part-of-speech tagging, grammatical analysis, and similar natural language processing steps, so that entities and their related information are extracted and the data are processed automatically, which improves data processing efficiency and reduces maintenance cost.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain the principles of the invention. Other embodiments and many of the intended advantages of the embodiments will be readily appreciated as they become better understood by reference to the following detailed description. The elements of the drawings are not necessarily to scale relative to each other. Like reference numerals designate corresponding similar parts.
Fig. 1 is a schematic flow diagram of a virtual-real entity detection technique according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a data distribution scenario suitable for implementing a statistical model of an embodiment of the present application;
Fig. 3 is a schematic diagram of a computer system 600 suitable for implementing the electronic device of the embodiments of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows a schematic flow chart of a virtual-real entity detection technology according to an embodiment of the present invention. As shown in Fig. 1, in this embodiment a user needs to query a virtual-real mapping relationship, and the method for detecting virtual-real entities in five types of network data provided by the present invention is implemented as follows:
Step S1: collecting five types of network data.
Further, the five types of network data specifically include social media, news media, stock data, web page data, forum information, and the like.
Step S2: processing and cleaning the five types of network data using big data processing technology to generate structured data.
Further, the big data processing technology specifically includes performing noise removal, segmentation, sentence splitting, time formatting, and structuring on the collected five types of network data.
The structuring specifically comprises identifying the data and obtaining the entities, attributes, and relations in the data to generate structured data.
After structuring is complete, simple queries can be performed: a user can query the identity of an entity through a social account, and related information can be looked up in the database by virtual identity and displayed on a page.
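The cleaning operations named for step S2 (noise removal, sentence splitting, time formatting) can be illustrated with a minimal Python sketch. The patent does not disclose an implementation; the record fields `source`, `time`, and `text`, the list of timestamp layouts, and the regular expressions below are all assumptions made for illustration.

```python
import re
from datetime import datetime
from typing import List, Optional

# Hypothetical timestamp layouts; real sources would need their own list.
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M", "%Y-%m-%d")


def remove_noise(text: str) -> str:
    """Strip HTML tags and URLs, then collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> List[str]:
    """Split after Western or Chinese sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p for p in parts if p]


def normalize_time(raw: str) -> Optional[str]:
    """Return an ISO 8601 timestamp, or None if no known layout matches."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(raw, fmt).isoformat()
        except ValueError:
            continue
    return None


def clean_record(record: dict) -> dict:
    """Turn one raw crawled record into a cleaned, sentence-level record."""
    body = remove_noise(record["text"])
    return {
        "source": record["source"],
        "time": normalize_time(record["time"]),
        "sentences": split_sentences(body),
    }
```

Entity, attribute, and relation identification (the structuring proper) is left out here; the sketch only prepares text for it.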
Step S3: automatically identifying the entities and entity relations in the structured data using natural language processing technology.
Further, the natural language processing technology specifically includes parsing the structured data and performing word segmentation, part-of-speech tagging, and grammatical analysis.
After the processing of this step, relations can be queried: once the user has obtained a virtual identity, the related associations can be queried, or a social account can be queried to obtain data on associated persons for analysis.
In this embodiment, words such as couple, friend, and attendance are identified as indicating an association relation and can be identified and extracted; entries belonging to the same group in the data are identified as having an acquaintance relation, and the association information can be extracted once the relation is identified.
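The two heuristics in this paragraph (cue words that signal an association, and shared group membership that signals acquaintance) can be sketched in Python as follows. The cue list, the function names, and the `groups` mapping from a group identifier to its member accounts are hypothetical; the patent gives only the example words.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical cue list built from the example words in the text.
ASSOCIATION_CUES = ("couple", "friend", "attend")


def has_association_cue(sentence: str) -> bool:
    """Return True if the sentence contains any association cue word."""
    lowered = sentence.lower()
    return any(cue in lowered for cue in ASSOCIATION_CUES)


def acquaintance_pairs(groups: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Pair up every two distinct members that share a group."""
    pairs = set()
    for members in groups.values():
        # Sorting gives each pair a canonical order, so (a, b) and (b, a)
        # are not stored twice.
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs
```

Substring matching is deliberately crude; a production system would more likely match on segmented, part-of-speech-tagged tokens.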
Further, the automatic identification of the entities and entity relations in the structured data is specifically the identification and extraction of virtual-real mapping entity relations.
The identification and extraction of virtual-real mapping entity relations specifically further comprises:
Splitting the structured data into sentences, and performing word segmentation, part-of-speech tagging, named entity recognition, and dependency parsing on the sentences.
Performing rule matching on the dependency structure to obtain syntax rules, where each matched production is one syntax rule, and generating triples using the syntax rules.
Further processing the triples to identify relations.
In this embodiment, the sentence "someone 1 carries wife someone 2 attends XX conference" is analyzed by the natural language processing described above, which yields an analysis table (the table is not reproduced in this text).
From that table, two triples can be derived:
(someone 1 someone 2, carries, wife);
(someone 1 someone 2, attends, XX conference).
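How rule matching on a dependency structure can yield triples is sketched below in Python. This is a simplified illustration, not the patent's rule set: the dependency parse is hand-annotated rather than produced by a parser, the labels `nsubj`, `obj`, `appos`, and `conj` follow common dependency-grammar conventions, and the single subject-verb-object rule is an assumption. The triples it produces are therefore simpler than the two listed in the embodiment.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    text: str
    pos: str   # part-of-speech tag
    head: int  # index of the governing token, -1 for the root
    dep: str   # dependency label relative to the head


def child(tokens: List[Token], head: int, dep: str) -> Optional[Token]:
    """Return the first token attached to `head` with label `dep`."""
    for tok in tokens:
        if tok.head == head and tok.dep == dep:
            return tok
    return None


def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Apply one rule: VERB with a subject and an object -> (subj, verb, obj)."""
    triples = []
    for i, tok in enumerate(tokens):
        if tok.pos != "VERB":
            continue
        subj = child(tokens, i, "nsubj")
        # A coordinated verb inherits the subject of the verb it attaches to.
        if subj is None and tok.dep == "conj":
            subj = child(tokens, tok.head, "nsubj")
        obj = child(tokens, i, "obj")
        if subj is not None and obj is not None:
            triples.append((subj.text, tok.text, obj.text))
    return triples


# Hand-annotated parse of the example sentence (hypothetical annotation).
SENTENCE = [
    Token("someone 1", "NOUN", 1, "nsubj"),
    Token("carries", "VERB", -1, "root"),
    Token("wife", "NOUN", 1, "obj"),
    Token("someone 2", "NOUN", 2, "appos"),
    Token("attends", "VERB", 1, "conj"),
    Token("XX conference", "NOUN", 4, "obj"),
]

triples = extract_triples(SENTENCE)
```

Each additional pattern over the dependency structure would be one more rule in `extract_triples`, which is one way to read "each matched production is one syntax rule".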
Step S4: storing the entities and entity relations in the structured data in a distributed database.
The virtual-real entity detection technology based on five types of network data requires an entity relation database for storing and querying information on the various entities and entity relations; the database uses distributed search.
Further, the distributed database specifically uses the distributed search and analysis engine Elasticsearch to store and query data efficiently.
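As an illustration of how extracted triples might be laid out for Elasticsearch, the Python sketch below builds an index mapping, a request body in the newline-delimited format accepted by the Bulk API, and a term query in the query DSL. It only constructs the JSON payloads and does not contact a cluster. The index name `entity_relations` and the field names `subject`, `relation`, and `object` are assumptions, not taken from the patent.

```python
import json
from typing import Iterable, Tuple

INDEX = "entity_relations"  # hypothetical index name

# Mapping that stores each field as an exact-match keyword (an assumption;
# full-text search on entity names would need a text field instead).
MAPPING = {
    "mappings": {
        "properties": {
            "subject": {"type": "keyword"},
            "relation": {"type": "keyword"},
            "object": {"type": "keyword"},
        }
    }
}


def to_bulk_body(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Build a bulk body: one action line, then one document line, per triple."""
    lines = []
    for subj, rel, obj in triples:
        lines.append(json.dumps({"index": {"_index": INDEX}}))
        lines.append(json.dumps(
            {"subject": subj, "relation": rel, "object": obj},
            ensure_ascii=False,
        ))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


def relation_query(subject: str) -> dict:
    """Query DSL body that finds all relations of one exact subject."""
    return {"query": {"bool": {"filter": [{"term": {"subject": subject}}]}}}
```

The body from `to_bulk_body` would be sent to the `_bulk` endpoint and the dictionary from `relation_query` to the `_search` endpoint of the index.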
Step S5: computing statistics on the entities and entity relations in the structured data with a statistical model to provide a visual display of the data.
Further, the statistical model specifically includes, but is not limited to, classified statistics such as total data volume, data category, relation category, and common data. These models are used for display so that users can view and count the data conveniently and intuitively. In a specific embodiment, weight data of students are imported, and the distribution of the data can be seen clearly through the statistical model, as shown in Fig. 2.
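The classified statistics listed for step S5 can be sketched with Python's `collections.Counter`. The record fields (`source`, `relation`, `subject`, `object`) are hypothetical, and reading "common data" as the most frequently occurring entities is an assumption; the patent does not define the term.

```python
from collections import Counter
from typing import Dict, List


def summarize(records: List[Dict[str, str]], top_n: int = 3) -> dict:
    """Compute the counts a dashboard would plot for a set of relation records."""
    entities = Counter(
        name for r in records for name in (r["subject"], r["object"])
    )
    return {
        "total": len(records),                                   # total data volume
        "by_source": Counter(r["source"] for r in records),      # data category
        "by_relation": Counter(r["relation"] for r in records),  # relation category
        "common_entities": entities.most_common(top_n),          # "common data"
    }
```

The resulting dictionary can be passed to any charting library to produce the visual display.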
In summary, the technical solution of the virtual-real entity detection technology based on five types of network data spans several disciplines and must combine natural language processing, data mining and analysis, statistical models, databases, and other technologies in order to achieve efficient and accurate entity identification and association analysis.
First, throughout the process the data must be examined comprehensively, and the data governance and virtual-real mapping work must be done well. Second, after data collection, when data of the same type are identified, natural language processing can be used to process them automatically. Entities and entity relations are identified automatically without manual intervention. This improves efficiency, ensures the stability of data synchronization, and reduces maintenance cost.
Referring now to FIG. 3, a schematic diagram of a computer system 600 suitable for use in implementing an electronic device of an embodiment of the present application is shown. The electronic device shown in fig. 3 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments herein.
As shown in fig. 3, the computer system 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, mouse, etc.; an output portion 607 including a Liquid Crystal Display (LCD) or the like, a speaker or the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to the I/O interface 605 as needed. Removable media 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on drive 610 so that a computer program read therefrom is installed as needed into storage section 608.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 601. It should be noted that the computer readable medium of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments described in the present application may be implemented by software, or may be implemented by hardware.
As another aspect, the present application also provides a computer-readable storage medium that may be included in the electronic device described in the above embodiments; or may exist alone without being incorporated into the electronic device. The computer-readable storage medium carries one or more programs that, when executed by the electronic device, enable the electronic device to implement virtual-real entity detection techniques for five types of network data.
The foregoing description is only of the preferred embodiments of the present application and is presented as a description of the principles of the technology being utilized. It will be appreciated by persons skilled in the art that the scope of the invention referred to in this application is not limited to the specific combinations of features described above, but is intended to cover other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the invention. For example, it covers technical solutions formed by mutually replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (11)

1. A virtual-real entity detection technology based on five types of network data, characterized by comprising the following steps:
S1, acquiring five types of network data;
S2, processing and cleaning the five types of network data using big data processing technology to generate structured data;
S3, automatically identifying the entities and entity relations in the structured data using natural language processing technology;
S4, storing the entities and entity relations in the structured data in a distributed database;
S5, computing statistics on the entities and entity relations in the structured data with a statistical model to provide a visual display of the data.
2. The virtual-real entity detection technique of claim 1, wherein:
the five types of network data in step S1 specifically include: data collected from social media, news media, stock data, web page data, and forum information.
3. The virtual-real entity detection technique of claim 1, wherein:
the big data processing technology in step S2 specifically includes: performing noise removal, segmentation, sentence splitting, time formatting, and structuring on the collected five types of network data.
4. The virtual-real entity detection technique of claim 1, wherein:
the natural language processing technology in step S3 specifically includes: parsing the structured data and performing word segmentation, part-of-speech tagging, and grammatical analysis.
5. The virtual-real entity detection technique of claim 1, wherein:
the automatic identification of the entities and entity relations in the structured data in step S3 is specifically the identification and extraction of virtual-real mapping entity relations.
6. The virtual-real entity detection technique of claim 1, wherein:
the distributed database in step S4 specifically includes: using the distributed search and analysis engine Elasticsearch to store and query data efficiently.
7. The virtual-real entity detection technique of claim 1, wherein:
the statistical model in step S5 specifically includes: statistics on total data volume, data category, relation category, and common data.
8. The virtual-real entity detection technique of claim 3, wherein: the structuring specifically comprises: identifying the data and obtaining the entities, attributes, and relations in the data to generate structured data.
9. The virtual-real entity detection technique of claim 5, wherein: the identification and extraction of virtual-real mapping entity relations specifically further comprises:
splitting the structured data into sentences, and performing word segmentation, part-of-speech tagging, named entity recognition, and dependency parsing on the sentences;
performing rule matching on the dependency structure to obtain syntax rules, and generating triples using the syntax rules;
further processing the triples to identify relations.
10. The virtual-real entity detection technique of claim 9, wherein:
in the rule matching performed on the dependency structure to obtain syntax rules, each production matched using the dependency rules is one syntax rule.
11. A computer readable medium, in which a computer program is stored which, when executed by a processor, implements the method of any one of claims 1-10.
CN202311523796.4A 2023-11-15 2023-11-15 Virtual-real entity detection technology based on five kinds of network data Pending CN117633051A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311523796.4A CN117633051A (en) 2023-11-15 2023-11-15 Virtual-real entity detection technology based on five kinds of network data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311523796.4A CN117633051A (en) 2023-11-15 2023-11-15 Virtual-real entity detection technology based on five kinds of network data

Publications (1)

Publication Number Publication Date
CN117633051A true CN117633051A (en) 2024-03-01

Family

ID=90019066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311523796.4A Pending CN117633051A (en) 2023-11-15 2023-11-15 Virtual-real entity detection technology based on five kinds of network data

Country Status (1)

Country Link
CN (1) CN117633051A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination