CN110825792A - High-concurrency distributed data retrieval method based on golang middleware coroutine mode - Google Patents
- Publication number
- CN110825792A
- Authority
- CN
- China
- Prior art keywords
- data
- golang
- middleware
- configuration
- page
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Fuzzy Systems (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a high-concurrency distributed data retrieval method based on a golang middleware coroutine mode, comprising the following steps: uploading the data information to be acquired through the provided user interaction page, following the operation guide on the page; preprocessing the uploaded data information and automatically adding the index, type and type structure in Elasticsearch; adding a timed acquisition task for the data acquisition configuration on the timer configuration page; having the system automatically schedule the data acquisition task, which is executed through the golang middleware, and storing the data into the Elasticsearch library; performing word segmentation and semantic analysis on the acquired data using the word segmentation and semantic analysis technology of Elasticsearch; and opening the data set to terminal users through the interface configuration page. The method introduces golang's high-concurrency coroutine technology, which speeds up data collection and data arrangement to a certain extent and improves collection efficiency; it also automatically removes duplicate data, which improves the data utilization rate.
Description
Technical Field
The invention relates to the technical field of database retrieval, in particular to a high-concurrency distributed data retrieval method based on a golang middleware coroutine mode, which is used for constructing resource synchronization of a public security database.
Background
With the rapid development of the economy and of science and technology in recent years, informatization in the public security industry has also developed rapidly, but it has been accompanied by problems such as low data quality, poor processing capability, insufficient standards and specifications, insufficient data sharing, and shallow professional applications. How to use technology to meet the challenges posed by the growing volume and heterogeneity of data resources and by diversified, complex application requirements is the key to informatization. At present, however, each vendor of full-text search products is responsible only for its own product and adopts its own technical implementation. Because there is no unified technical approach, data extraction and external interface schemes are inefficient, interfaces are not universal, and later maintenance is not timely. In view of these problems, the applicant compared and analyzed the mainstream full-text search products on the market and found that most of these products, and the technologies they use, have the following problems:
1. Retrieval function: 1) the word hit rate is low and category retrieval is limited; 2) word-segmentation retrieval is lacking; 3) information is acquired far more slowly than network resources grow.
2. Data cleaning and data governance: 1) data extraction is disorganized; 2) the data source is single, and the data storage mode is complex, slow and not universal; 3) a unified technical approach is lacking, so external interface schemes are inefficient and interfaces are not universal.
3. Other aspects: 1) compatibility is insufficient, and the products suit only Internet-peripheral product forms; 2) the products demand strong technical skills, are cumbersome to operate, and cannot adapt well to diverse application scenarios; 3) later maintenance and data updates are not timely, data-flow logs are lacking, and tuning places high demands on hardware.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a high-concurrency distributed data retrieval method based on a golang middleware coroutine mode, and a set of simple and easy-to-use web configuration pages is designed and developed to solve the problems of single extraction data source, complex interactive interface design, complex and slow data storage mode and high data storage difficulty, so that the data acquisition efficiency and the data application efficiency are effectively improved, the later maintenance is ensured, and conditions are prepared for strengthening law enforcement regulations and improving the law enforcement efficiency.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows.
The high-concurrency distributed data retrieval method based on the golang middleware coroutine mode comprises the following specific steps of:
A. uploading the data information to be acquired through the provided user interaction page, following the operation guide on the page, and then uploading the configuration information for data acquisition;
B. preprocessing the data information uploaded in step A to form a corresponding data structure rule, and automatically adding the index, type and type structure in Elasticsearch;
C. after the data acquisition environment of step B has been arranged, adding, according to the designed data acquisition configuration, a timed acquisition task for that configuration on the timer configuration page;
D. based on the timed task configured in step C, having the system automatically schedule the data acquisition task, which is executed through the golang middleware, and storing the data into the Elasticsearch library;
E. when data enters the Elasticsearch library, performing word segmentation and semantic analysis on the acquired data using the word segmentation and semantic analysis technology of Elasticsearch, to obtain the final data set to be stored;
F. opening the data set to terminal users through the interface configuration page.
As a further refinement of the technical scheme, the data information comprises text data and text data configuration information.
As a further refinement of the technical scheme, the data information comprises database connection configuration information and table information.
As a further refinement of the technical scheme, in step B, the feature rule is used as the basis for page rendering, data arrangement and data storage.
As a further refinement of the technical scheme, in step B, the index, type and type structure of Elasticsearch are added automatically by a system background automation program when a text directory or a database and table is added, according to a set of configured data structure mappings.
As a further refinement of the technical scheme, in step C, the data acquisition environment is arranged by combining automation with manually entered configuration.
As a further refinement of the technical scheme, step D comprises the following specific steps:
D1. landing the data to be stored into a local file on the server through golang code, saving the mapping relation between the input source and the output source in the program, and saving the related log;
D2. comparing the landed file with the data in the mapped Elasticsearch index, filtering out illegal data, and screening out the data that needs to be stored and holding it in memory;
D3. importing the filtered data into the mapped Elasticsearch index through highly concurrent multiple coroutines.
As a further refinement of the technical scheme, in step D2, the data comparison mainly uses the kNN algorithm to classify and screen the data.
As a further refinement of the technical scheme, in step E, the word segmentation and semantic analysis technology mainly uses the jieba word segmenter, which performs word segmentation by the following algorithm:
E1. performing efficient word-graph scanning based on a prefix dictionary, and generating a directed acyclic graph of all possible word formations of the Chinese characters in a sentence;
E2. searching for the maximum-probability path by dynamic programming, and finding the best segmentation based on word frequency;
E3. for unknown words, adopting an HMM model based on the word-forming capability of Chinese characters and using the Viterbi algorithm, supported by a large amount of real data, to convert pinyin into Chinese characters and to segment the text into words.
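Steps E1 and E2 can be illustrated with a short Go sketch. It is not jieba's implementation: the function name `segment`, the tiny dictionary and its frequency counts are all hypothetical, and the word graph is built by testing every substring rather than by walking a real prefix dictionary. It only shows the two ideas named above, a DAG of candidate words followed by a dynamic-programming search for the maximum-probability path.

```go
package main

import (
	"fmt"
	"math"
)

// segment splits text into words using a word-frequency dictionary.
// The dictionary and its counts are hypothetical illustration data.
func segment(text string, freq map[string]float64) []string {
	runes := []rune(text)
	n := len(runes)

	total := 0.0
	for _, f := range freq {
		total += f
	}
	if total < 1 {
		total = 1 // avoid log(0) for an empty dictionary
	}
	// logProb scores a word; unknown single characters count as frequency 1.
	logProb := func(w string) float64 {
		f, ok := freq[w]
		if !ok || f <= 0 {
			f = 1
		}
		return math.Log(f / total)
	}

	// E1: dag[i] holds every end index j such that runes[i..j] is a word.
	dag := make([][]int, n)
	for i := 0; i < n; i++ {
		dag[i] = []int{i} // a single character is always a candidate
		for j := i + 1; j < n; j++ {
			if _, ok := freq[string(runes[i:j+1])]; ok {
				dag[i] = append(dag[i], j)
			}
		}
	}

	// E2: right-to-left dynamic programming over the DAG.
	score := make([]float64, n+1) // score[n] = 0 stands for the empty suffix
	next := make([]int, n)
	for i := n - 1; i >= 0; i-- {
		score[i] = math.Inf(-1)
		for _, j := range dag[i] {
			s := logProb(string(runes[i:j+1])) + score[j+1]
			if s > score[i] {
				score[i], next[i] = s, j
			}
		}
	}

	// Follow the recorded choices from left to right to emit the words.
	var words []string
	for i := 0; i < n; i = next[i] + 1 {
		words = append(words, string(runes[i:next[i]+1]))
	}
	return words
}

func main() {
	freq := map[string]float64{
		"北京": 100, "大学": 80, "北京大学": 60,
		"北": 5, "京": 5, "大": 5, "学": 5,
	}
	fmt.Println(segment("北京大学", freq))
}
```

With these made-up counts the single word "北京大学" scores higher than the split "北京" + "大学", because one moderately frequent word beats the product of two word probabilities.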
Due to the adoption of the technical scheme, the technical progress of the invention is as follows.
The invention realizes the import of various data into the database by adopting a visual mode to form corresponding rules, the defined rules can be used as the basis of page rendering, data arrangement and data storage, and the acquisition method comprising a configuration mode ensures that business personnel can complete transverse expansion through the provided configuration function under the condition of not needing participation of developers, thereby meeting the acquisition requirements of various data sources, simultaneously reducing the workload of the developers to a certain extent and reducing the coupling degree of codes. The method effectively solves the problems of single extraction data source, complex design of an interactive interface, complex and slow data storage mode and large data storage difficulty, effectively improves the data acquisition efficiency and the data application efficiency, ensures timely maintenance in the later period, and prepares conditions for strengthening law enforcement standards and improving the law enforcement efficiency.
The invention combines configuration-driven design with relational database applications to acquire many kinds of data from heterogeneous data sources and to ensure the robustness of the acquisition method.
The method introduces golang's high-concurrency coroutine technology, which speeds up data collection and data arrangement to a certain extent and improves collection efficiency; it also automatically removes duplicate data, which improves the data utilization rate.
The invention adopts the word segmentation technology and the semantic analysis technology to carry out deep analysis on the extracted text information so as to extract element information with higher data value, provide data support for realizing more subsequent upper-layer applications, fully exert data efficiency and provide assistance for automatic data arrangement and manual data arrangement of the acquisition method.
Drawings
FIG. 1 is a general flow diagram of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the figures and specific examples.
In the high-concurrency distributed data retrieval method based on a golang middleware coroutine mode, functions are developed around the characteristics of golang so that the high-concurrency, multi-coroutine advantages of the language can be brought to bear on data processing and on import into Elasticsearch (ES), enabling full-text retrieval. High concurrency is one of the factors that must be considered in the architecture of an Internet distributed system; it generally means that the system is designed to handle many requests at the same time. A coroutine needs only about 2 KB of memory to start, so thousands of concurrent tasks can run simultaneously with a small memory footprint. More than 40,000 records can be processed per second, and a large-scale cluster can process more than 100,000 records per second.
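As a rough illustration of the coroutine (goroutine) model described above, the following Go sketch fans a batch of records out to a fixed number of workers. The function name `processAll`, the worker count and the per-record processing function are assumptions made for the example; the patent does not specify how its middleware distributes work.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// processAll applies process to every record using a fixed number of
// goroutines and returns the results in input order.
func processAll(records []string, workers int, process func(string) string) []string {
	if workers < 1 {
		workers = 1 // at least one worker, otherwise the send below would block forever
	}
	results := make([]string, len(records))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handled by exactly one goroutine,
				// so writing results[i] needs no extra locking.
				results[i] = process(records[i])
			}
		}()
	}

	for i := range records {
		jobs <- i
	}
	close(jobs) // lets the workers leave their range loops
	wg.Wait()
	return results
}

func main() {
	out := processAll([]string{" a ", "b ", " c"}, 2, strings.TrimSpace)
	fmt.Println(out)
}
```

A bounded pool is used here instead of one goroutine per record so that memory and downstream load stay predictable even for very large batches.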
The high-concurrency distributed data retrieval method based on the golang middleware coroutine mode is shown in the combined figure 1 and comprises the following specific steps:
A. Upload the data information to be acquired through the provided user interaction page, following the operation guide on the page, and then upload the configuration information for data acquisition. The data information includes text data and text data configuration information, or it includes database connection configuration information and table information.
B. Preprocess the data information uploaded in step A to form a corresponding data structure rule, and automatically add the index, type and type structure in Elasticsearch. A data structure is the way a computer stores and organizes data; it refers to a collection of data elements that have one or more specific relationships to each other.
The index, type and type structure of Elasticsearch are added automatically by a system background automation program when a text directory or a database and table is added, according to a set of configured data structure mappings. Database types supported in this step: Oracle, MySQL, PostgreSQL.
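One way to picture the automatic creation of an index structure from a configured mapping is the Go sketch below. The `FieldConfig` type and `buildMapping` function are hypothetical, and the JSON layout follows the typeless `mappings.properties` form used by recent Elasticsearch versions; older versions that still use mapping types nest the properties under a type name, which this sketch does not do.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FieldConfig is a hypothetical entry of the configured data structure mapping.
type FieldConfig struct {
	Name string // column or text field name from the data source
	Type string // Elasticsearch field type, e.g. "text", "keyword", "date"
}

// buildMapping turns the configured fields into the JSON body of an
// index-creation request.
func buildMapping(fields []FieldConfig) ([]byte, error) {
	props := make(map[string]map[string]string, len(fields))
	for _, f := range fields {
		props[f.Name] = map[string]string{"type": f.Type}
	}
	body := map[string]interface{}{
		"mappings": map[string]interface{}{"properties": props},
	}
	return json.Marshal(body) // map keys are emitted in sorted order
}

func main() {
	body, err := buildMapping([]FieldConfig{
		{Name: "title", Type: "text"},
		{Name: "id", Type: "keyword"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The resulting bytes would be sent as the body of the request that creates the index; the HTTP call itself is left out because the patent does not say which client the middleware uses.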
C. After the data acquisition environment of step B has been arranged by combining automation with manually entered configuration, add, according to the designed data acquisition configuration, a timed acquisition task for that configuration on the timer configuration page.
Automation means that a system automatically builds a data structure and automatically synchronizes data. The manual input configuration refers to manual configuration of a data source and data scheduling.
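A timed acquisition task of the kind configured in step C could be driven by a ticker, as in the minimal Go sketch below. The `runEvery` function, its `maxRuns` limit and the stop channel are illustrative assumptions; the patent does not describe the scheduler's internals.

```go
package main

import (
	"fmt"
	"time"
)

// runEvery calls task once per interval until stop is closed or maxRuns
// executions have completed (maxRuns <= 0 means no limit). It returns the
// number of completed runs. interval must be positive.
func runEvery(interval time.Duration, maxRuns int, stop <-chan struct{}, task func()) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	runs := 0
	for maxRuns <= 0 || runs < maxRuns {
		select {
		case <-stop: // a nil stop channel never fires
			return runs
		case <-ticker.C:
			task()
			runs++
		}
	}
	return runs
}

func main() {
	n := runEvery(10*time.Millisecond, 3, nil, func() {
		fmt.Println("collecting at", time.Now().Format(time.RFC3339Nano))
	})
	fmt.Println("runs:", n)
}
```

Because the task runs inside the loop, a slow acquisition delays the next tick rather than overlapping with it; starting each task in its own goroutine would be the alternative when overlap is acceptable.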
D. Based on the timed task configured in step C, the system automatically schedules the data acquisition task, which is executed through the golang middleware, and the data is stored and merged into the Elasticsearch library.
Step D comprises the following specific steps:
D1. Land the data to be stored into a local file on the server through golang code, save the mapping relation between the input source and the output source in the program, and save the related log.
D2. Compare the landed file with the data in the mapped Elasticsearch index, filter out illegal data, and screen out the data that needs to be stored and hold it in memory.
Illegal data refers to data with an abnormal format or a data value outside the configured normal range.
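The definition of illegal data suggests a simple validity filter. The Go sketch below assumes a hypothetical record layout (`ID`, `Date`, `Age`) and a hypothetical `Limits` range; real records and rules would come from the configured data structure.

```go
package main

import (
	"fmt"
	"time"
)

// Record is a hypothetical landed row; the fields are for illustration only.
type Record struct {
	ID   string
	Date string // expected format: YYYY-MM-DD
	Age  int
}

// Limits is the configured normal range for the numeric value.
type Limits struct{ MinAge, MaxAge int }

// isLegal reports whether a record has a valid format and an in-range value.
func isLegal(r Record, lim Limits) bool {
	if r.ID == "" {
		return false // missing key
	}
	if _, err := time.Parse("2006-01-02", r.Date); err != nil {
		return false // abnormal date format
	}
	return r.Age >= lim.MinAge && r.Age <= lim.MaxAge
}

// filterLegal keeps only the records that pass isLegal.
func filterLegal(recs []Record, lim Limits) []Record {
	var out []Record
	for _, r := range recs {
		if isLegal(r, lim) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	recs := []Record{
		{ID: "1", Date: "2019-11-15", Age: 30},
		{ID: "2", Date: "15/11/2019", Age: 30}, // bad format
		{ID: "3", Date: "2019-11-15", Age: 300}, // out of range
	}
	fmt.Println(filterLegal(recs, Limits{MinAge: 0, MaxAge: 150}))
}
```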
Data comparison means that the database data set is queried in a golang middleware coroutine and compared with the Elasticsearch data set through configured keywords. The comparison is mainly implemented with the kNN algorithm, so that duplicate data is automatically deduplicated and arranged.
kNN (k-nearest neighbors) is a basic classification and regression method. It relies on the fact that samples of the same class cluster together in feature space, so the algorithm can be used to classify and screen the data.
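For readers unfamiliar with kNN, a minimal classifier in Go looks like the sketch below. How the patent derives feature vectors from the configured keywords is not stated, so the two-dimensional features and the labels "duplicate" and "new" are purely hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Sample is a labeled point in feature space; the features are hypothetical.
type Sample struct {
	Features []float64
	Label    string
}

// distance returns the Euclidean distance; both slices must have equal length.
func distance(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// classify returns the majority label among the k training samples nearest
// to query. Ties go to the label that reaches the top count first when
// neighbors are visited from nearest to farthest.
func classify(train []Sample, query []float64, k int) string {
	type neighbor struct {
		dist  float64
		label string
	}
	ns := make([]neighbor, 0, len(train))
	for _, s := range train {
		ns = append(ns, neighbor{distance(s.Features, query), s.Label})
	}
	sort.SliceStable(ns, func(i, j int) bool { return ns[i].dist < ns[j].dist })
	if k > len(ns) {
		k = len(ns)
	}

	votes := map[string]int{}
	best, bestVotes := "", 0
	for _, n := range ns[:k] {
		votes[n.label]++
		if votes[n.label] > bestVotes {
			best, bestVotes = n.label, votes[n.label]
		}
	}
	return best
}

func main() {
	train := []Sample{
		{[]float64{0, 0}, "duplicate"}, {[]float64{0, 1}, "duplicate"},
		{[]float64{5, 5}, "new"}, {[]float64{6, 5}, "new"},
	}
	fmt.Println(classify(train, []float64{0.2, 0.1}, 3))
}
```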
D3. Import the filtered data into the mapped Elasticsearch index through highly concurrent multiple coroutines.
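Step D3 can be sketched in Go as batches rendered in the newline-delimited format of the Elasticsearch bulk API and handed to goroutines. The `Doc` type, the function names and the `send` callback are assumptions; in a real deployment `send` would post the body to the cluster's `_bulk` endpoint, which is deliberately left out here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// Doc is a hypothetical document ready for import.
type Doc struct {
	ID     string
	Source map[string]string
}

// bulkBody renders docs as bulk-API lines: one action line and one source
// line per document, each terminated by a newline.
func bulkBody(index string, docs []Doc) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode appends '\n' after every value
	for _, d := range docs {
		action := map[string]map[string]string{"index": {"_index": index, "_id": d.ID}}
		if err := enc.Encode(action); err != nil {
			return nil, err
		}
		if err := enc.Encode(d.Source); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// importConcurrently splits docs into batches and passes each batch body to
// send in its own goroutine. It returns the first error observed, if any.
func importConcurrently(index string, docs []Doc, batchSize int, send func([]byte) error) error {
	if batchSize < 1 {
		batchSize = 1
	}
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for start := 0; start < len(docs); start += batchSize {
		end := start + batchSize
		if end > len(docs) {
			end = len(docs)
		}
		batch := docs[start:end] // new variable per iteration, safe to capture
		wg.Add(1)
		go func() {
			defer wg.Done()
			body, err := bulkBody(index, batch)
			if err == nil {
				err = send(body)
			}
			if err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return firstErr
}

func main() {
	docs := []Doc{{ID: "1", Source: map[string]string{"title": "a"}}}
	err := importConcurrently("cases", docs, 500, func(b []byte) error {
		fmt.Print(string(b)) // stand-in for the HTTP request
		return nil
	})
	fmt.Println("error:", err)
}
```

This version starts one goroutine per batch; for very large imports a bounded worker pool would cap the number of simultaneous requests.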
E. When data enters the Elasticsearch library, word segmentation and semantic analysis are performed on the collected data using the word segmentation and semantic analysis technology of Elasticsearch, so that more valuable information is extracted from the collected data and the search hit rate and search speed are improved. This yields the final data set to be stored.
The word segmentation and semantic analysis technology mainly uses the jieba word segmenter, which performs word segmentation by the following algorithm:
E1. Efficient word-graph scanning is performed based on a prefix dictionary, generating a directed acyclic graph (DAG) of all possible word formations of the Chinese characters in the sentence.
E2. Dynamic programming is used to search for the maximum-probability path and to find the best segmentation based on word frequency.
E3. For unknown words, in order to convert pinyin into Chinese characters and to segment the text into words, an HMM model is adopted and the Viterbi algorithm is used; the algorithm computes the optimal result based on a large amount of real data. Its principle is as follows:
The probability distribution of each state $S_t$ in the random process depends only on its previous state $S_{t-1}$, i.e. $P(S_t \mid S_1, S_2, S_3, \ldots, S_{t-1}) = P(S_t \mid S_{t-1})$.
The steps of the Viterbi algorithm are summarized as follows:
If the most probable path P (or shortest path) passes through a certain point, such as X22, then the sub-path Q of P from the starting point S to X22 must be the shortest path between S and X22. Otherwise, replacing Q with the shortest path R from S to X22 would yield a path shorter than P, which is a contradiction. This shows that the principle of optimality is satisfied.
The path from S to E must pass through some state at the i-th time step. Assuming there are k states at the i-th time step, if the shortest paths from S to all k nodes of the i-th time step are recorded, the final shortest path must pass through one of them, so at any time step only a limited number of shortest paths needs to be considered.
Combining the above two points: suppose that, when moving from time step i to time step i+1, the shortest paths from S to each node of time step i have been found and recorded on those nodes. Then, when calculating the shortest path from the starting point S to a node $X_{i+1,j}$ of time step i+1, it is only necessary to consider the shortest paths from S to all k nodes of time step i and the distances from those nodes to $X_{i+1,j}$.
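The recurrence described in these three points can be written as a compact Go sketch. The `HMM` type and every probability in the example are hypothetical toy values, not parameters from jieba or from the patent; the code works in log space so that long sequences do not underflow, and "shortest path" corresponds to the highest log-probability.

```go
package main

import (
	"fmt"
	"math"
)

// HMM holds plain probabilities; all numbers used below are toy values.
type HMM struct {
	States []string
	Start  map[string]float64            // P(S_1)
	Trans  map[string]map[string]float64 // P(S_t | S_{t-1})
	Emit   map[string]map[string]float64 // P(observation | state)
}

// logp returns log(m[key]); a missing entry yields log(0) = -Inf.
func logp(m map[string]float64, key string) float64 {
	return math.Log(m[key])
}

// Viterbi returns the most probable state sequence for obs.
func (h HMM) Viterbi(obs []string) []string {
	n := len(obs)
	if n == 0 || len(h.States) == 0 {
		return nil
	}
	score := make([]map[string]float64, n) // best log-probability ending in each state
	back := make([]map[string]string, n)   // predecessor on that best path

	score[0] = map[string]float64{}
	for _, s := range h.States {
		score[0][s] = logp(h.Start, s) + logp(h.Emit[s], obs[0])
	}
	for t := 1; t < n; t++ {
		score[t] = map[string]float64{}
		back[t] = map[string]string{}
		for _, s := range h.States {
			// Only the k recorded best paths of step t-1 need to be examined.
			best, bestPrev := math.Inf(-1), h.States[0]
			for _, p := range h.States {
				if v := score[t-1][p] + logp(h.Trans[p], s); v > best {
					best, bestPrev = v, p
				}
			}
			score[t][s] = best + logp(h.Emit[s], obs[t])
			back[t][s] = bestPrev
		}
	}

	// Pick the best final state, then walk the back-pointers.
	last, bestScore := h.States[0], math.Inf(-1)
	for _, s := range h.States {
		if score[n-1][s] > bestScore {
			last, bestScore = s, score[n-1][s]
		}
	}
	path := make([]string, n)
	path[n-1] = last
	for t := n - 1; t > 0; t-- {
		path[t-1] = back[t][path[t]]
	}
	return path
}

func main() {
	// B = word begin, E = word end, S = single-character word (toy model).
	h := HMM{
		States: []string{"B", "E", "S"},
		Start:  map[string]float64{"B": 0.6, "S": 0.4},
		Trans: map[string]map[string]float64{
			"B": {"E": 1.0},
			"E": {"B": 0.5, "S": 0.5},
			"S": {"B": 0.5, "S": 0.5},
		},
		Emit: map[string]map[string]float64{
			"B": {"北": 0.5, "大": 0.5},
			"E": {"京": 0.5, "学": 0.5},
			"S": {"的": 1.0},
		},
	}
	fmt.Println(h.Viterbi([]string{"北", "京", "大", "学"}))
}
```

The inner loop over previous states is the point made in the text: at each time step only the k recorded best paths are extended, so the cost grows with the sequence length times k squared instead of exponentially.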
F. The data set can be opened to the user of the terminal for use by using the interface configuration page.
The method introduces golang's high-concurrency coroutine technology, which speeds up data collection and data arrangement to a certain extent and improves collection efficiency; it also automatically removes duplicate data, which improves the data utilization rate.
The invention adopts the word segmentation technology and the semantic analysis technology to carry out deep analysis on the extracted text information so as to extract element information with higher data value, provide data support for realizing more subsequent upper-layer applications, fully exert data efficiency and provide assistance for automatic data arrangement and manual data arrangement of the acquisition method.
Claims (9)
1. The high-concurrency distributed data retrieval method based on the golang middleware coroutine mode is characterized by comprising the following specific steps of:
A. uploading the data information to be acquired through the provided user interaction page, following the operation guide on the page, and then uploading the configuration information for data acquisition;
B. preprocessing the data information uploaded in step A to form a corresponding data structure rule, and automatically adding the index, type and type structure in Elasticsearch;
C. after the data acquisition environment of step B has been arranged, adding, according to the designed data acquisition configuration, a timed acquisition task for that configuration on the timer configuration page;
D. based on the timed task configured in step C, having the system automatically schedule the data acquisition task, which is executed through the golang middleware, and storing the data into the Elasticsearch library;
E. when data enters the Elasticsearch library, performing word segmentation and semantic analysis on the acquired data using the word segmentation and semantic analysis technology of Elasticsearch, to obtain the final data set to be stored;
F. opening the data set to terminal users through the interface configuration page.
2. The method for highly concurrent distributed data retrieval based on the golang middleware coroutine mode as claimed in claim 1, wherein the data information comprises text data and text data configuration information.
3. The method for highly concurrent distributed data retrieval based on the golang middleware coroutine mode as claimed in claim 1, wherein the data information comprises configuration database connection information and table information.
4. The method for highly concurrent distributed data retrieval based on the golang middleware coroutine mode as claimed in claim 1, wherein in the step B, the feature rules are used as the basis for page rendering, data arrangement and data storage.
5. The high-concurrency distributed data retrieval method based on the golang middleware coroutine mode as claimed in claim 1, wherein in step B, the index, type and type structure of Elasticsearch are added automatically by a system background automation program when a text directory or a database and table is added, according to a set of configured data structure mappings.
6. The method for highly concurrent distributed data retrieval based on the golang middleware coroutine mode as claimed in claim 1, wherein in the step C, the collected data environment is arranged by combining automation and manual input configuration.
7. The method for highly concurrent distributed data retrieval based on the golang middleware coroutine mode as claimed in claim 1, wherein the step D comprises the following specific steps:
D1. landing the data to be stored into a local file on the server through golang code, saving the mapping relation between the input source and the output source in the program, and saving the related log;
D2. comparing the landed file with the data in the mapped Elasticsearch index, filtering out illegal data, and screening out the data that needs to be stored and holding it in memory;
D3. importing the filtered data into the mapped Elasticsearch index through highly concurrent multiple coroutines.
8. The high-concurrency distributed data retrieval method based on the golang middleware coroutine mode as claimed in claim 7, wherein in step D2, the data comparison mainly uses the kNN algorithm to classify and screen the data.
9. The high-concurrency distributed data retrieval method based on the golang middleware coroutine mode as claimed in claim 1, wherein in said step E, the word segmentation and semantic analysis technique mainly uses a jieba word segmenter to implement word segmentation by the following algorithm:
E1. performing efficient word-graph scanning based on a prefix dictionary, and generating a directed acyclic graph of all possible word formations of the Chinese characters in a sentence;
E2. searching for the maximum-probability path by dynamic programming, and finding the best segmentation based on word frequency;
E3. for unknown words, adopting an HMM model based on the word-forming capability of Chinese characters and using the Viterbi algorithm, supported by a large amount of real data, to convert pinyin into Chinese characters and to segment the text into words.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911117727.7A (CN110825792B) | 2019-11-15 | 2019-11-15 | High-concurrency distributed data retrieval method based on golang middleware coroutine mode |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911117727.7A (CN110825792B) | 2019-11-15 | 2019-11-15 | High-concurrency distributed data retrieval method based on golang middleware coroutine mode |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110825792A | 2020-02-21 |
CN110825792B | 2024-06-07 |
Family
ID=69555567
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201911117727.7A | Active | CN110825792B (en) | 2019-11-15 | 2019-11-15 | High-concurrency distributed data retrieval method based on golang middleware coroutine mode |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110825792B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111782396A (en) * | 2020-07-01 | 2020-10-16 | 浪潮云信息技术股份公司 | Concurrency flexible control method based on distributed database |
CN111814142A (en) * | 2020-06-29 | 2020-10-23 | 上海三零卫士信息安全有限公司 | Big data rapid threat detection system based on OpenIOC |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103678684A (en) * | 2013-12-25 | 2014-03-26 | 沈阳美行科技有限公司 | Chinese word segmentation method based on navigation information retrieval |
CN109358956A (en) * | 2018-09-30 | 2019-02-19 | 上海保险交易所股份有限公司 | Service calling method |
CN109582551A (en) * | 2018-10-11 | 2019-04-05 | 平安科技(深圳)有限公司 | Daily record data analytic method, device, computer equipment and storage medium |
CN109739727A (en) * | 2019-01-03 | 2019-05-10 | 优信拍(北京)信息科技有限公司 | Service monitoring method and device in micro services framework |
Non-Patent Citations (4)
Title |
---|
WEYLAU: "Go 用 500 行 Golang 代码实现高性能的消息回调中间件", pages 1 - 12 * |
余昌发等: "基于Kubernetes的分布式TensorFlow平台的设计与实现", 计算机科学, no. 2, 15 November 2018 (2018-11-15) * |
罗东锋;李芳;郝汪洋;吴仲城;: "基于Docker的大规模日志采集与分析系统", no. 10, pages 82 - 88 * |
罗东锋等: "基于Docker 的大规模日志采集与分析系统", pages 82 - 88 * |
Also Published As
Publication number | Publication date |
---|---|
CN110825792B (en) | 2024-06-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |