WO2021139343A1 - Data analysis method, apparatus and computer device based on natural language processing - Google Patents

Data analysis method, apparatus and computer device based on natural language processing

Info

Publication number
WO2021139343A1
WO2021139343A1 PCT/CN2020/124735 CN2020124735W WO2021139343A1 WO 2021139343 A1 WO2021139343 A1 WO 2021139343A1 CN 2020124735 W CN2020124735 W CN 2020124735W WO 2021139343 A1 WO2021139343 A1 WO 2021139343A1
Authority
WO
WIPO (PCT)
Prior art keywords
analysis
preset
information
analyzed
data
Prior art date
Application number
PCT/CN2020/124735
Other languages
English (en)
French (fr)
Inventor
赵亦杨
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021139343A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • This application relates to the field of artificial intelligence technology, in particular to a data analysis method, device, computer equipment and storage medium based on natural language processing.
  • Data analysis is the analysis of original data to find the root cause of the status quo.
  • Through the establishment of data analysis models and prediction models, layer-by-layer abstraction, dimensionality reduction, generalization and interpretation are carried out, and data support is finally used to achieve business growth.
  • The inventor realizes that the current technical threshold of data analysis is high, which makes it impossible to use data efficiently and give full play to the value of data.
  • a data analysis method based on natural language processing comprising:
  • the data analysis result is extracted into natural language based on natural language generation technology, and an analysis report corresponding to the information to be analyzed is generated.
  • a data analysis device based on natural language processing comprising:
  • the data analysis instruction acquisition module is used to acquire the data analysis instruction, and the data analysis instruction carries the information to be analyzed based on natural language expression;
  • the semantic analysis module is used to perform semantic analysis on the information to be analyzed based on natural language processing to obtain the word segmentation structure;
  • the data query module is used to call the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set;
  • a data analysis module which is used to perform anomaly analysis on the original data set to obtain a data analysis result
  • the analysis report generation module is used to extract the data analysis result into natural language based on natural language generation technology, and generate an analysis report corresponding to the information to be analyzed.
  • a computer device includes a memory and a processor, the memory stores a computer program, and when the processor executes the computer program, a data analysis method based on natural language processing is implemented, including:
  • the data analysis result is extracted into natural language based on natural language generation technology, and an analysis report corresponding to the information to be analyzed is generated.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, a data analysis method based on natural language processing is realized, including:
  • the data analysis result is extracted into natural language based on natural language generation technology, and an analysis report corresponding to the information to be analyzed is generated.
  • The user can input the information to be analyzed in natural language to initiate a data analysis instruction. Based on natural language processing, the information to be analyzed in the data analysis instruction is analyzed semantically to obtain the word segmentation structure, and the search engine is called to query the corresponding data according to the word segmentation structure to obtain the original data set. The original data set is analyzed for anomalies to obtain the data analysis result, which is then refined into natural language to generate the analysis report corresponding to the information to be analyzed. Users can thus obtain the corresponding analysis report simply by inputting the information to be analyzed in natural language and initiating a data analysis instruction, which lowers the technical threshold of data analysis so that data is used efficiently and its value is fully realized.
  • FIG. 1 is an application environment diagram of a data analysis method based on natural language processing in an embodiment.
  • FIG. 2 is a schematic flowchart of a data analysis method based on natural language processing in an embodiment.
  • FIG. 3 is a schematic flowchart of one step of a data analysis method based on natural language processing in an embodiment.
  • FIG. 4 is a structural block diagram of a data analysis device based on natural language processing in an embodiment.
  • FIG. 5 is a structural block diagram of a data analysis device based on natural language processing in another embodiment.
  • FIG. 6 is an internal structure diagram of a computer device in an embodiment.
  • the data analysis method based on natural language processing can be applied to the application environment as shown in FIG. 1.
  • the terminal 102 communicates with the server 104 through the network.
  • the server 104 obtains the data analysis instruction sent by the user through the terminal 102.
  • The data analysis instruction carries the information to be analyzed, expressed in natural language. The server 104 performs semantic analysis on the information to be analyzed based on natural language processing to obtain the word segmentation structure; calls the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set; performs anomaly analysis on the original data set to obtain the data analysis result; and extracts the data analysis result into natural language based on natural language generation technology to generate the analysis report corresponding to the information to be analyzed.
  • The server 104 automatically triggers the data analysis instruction according to the preset data analysis instruction trigger time period, the data analysis instruction carrying the information to be analyzed expressed in natural language. The server 104 then performs semantic analysis on the information to be analyzed based on natural language processing to obtain the word segmentation structure; calls the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set; performs anomaly analysis on the original data set to obtain the data analysis result; and extracts the data analysis result into natural language based on natural language generation technology to generate the analysis report corresponding to the information to be analyzed.
  • the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 104 may be implemented by an independent server or a server cluster composed of multiple servers.
  • a data analysis method based on natural language processing is provided. Taking the method applied to the server in FIG. 1 as an example for description, the method includes the following steps:
  • Step S220: a data analysis instruction is obtained, the data analysis instruction carrying information to be analyzed that is expressed in natural language.
  • The data analysis instruction is an instruction used to instruct the server to perform data analysis.
  • Natural language is a language that naturally evolves with culture and is used for human communication and thinking.
  • the information to be analyzed uses natural language to describe the content information that needs to be analyzed.
  • For example, the user needs to know how active WeChat has been in the past three months.
  • The user can open the information input interface that the server exposes on the terminal and input "How is WeChat active in the last three months?"
  • Based on the input "How is WeChat active in the last three months?", the terminal generates a data analysis instruction and sends it to the server.
  • step S240 semantic analysis is performed on the information to be analyzed based on natural language processing to obtain a word segmentation structure.
  • NLP stands for natural language processing.
  • Semantic analysis is the use of various methods of natural language processing to understand the semantic content represented by a text.
  • The word segmentation structure is a structure that splits the information to be analyzed into subject + time + qualifier + purpose.
  • Named-entity recognition (NER).
  • Part-of-speech tagging: marking the part of speech of each word according to its meaning and context.
  • Stemming: removing the plural forms of nouns, the different tenses of verbs, and so on.
  • Sentence grammar tree: a graphic representation of the structure of the constructed sentence.
  • Referential relations: determining the meaning of each word or symbol in the information to be analyzed.
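The subject + time + qualifier + purpose split described above can be illustrated with a small rule-based sketch. This is only a stand-in for the trained NER, tagging and parsing steps the text lists: the `SegmentationStructure` class, the `KNOWN_SUBJECTS` and `QUALIFIERS` vocabularies, and the "trend"/"lookup" purpose labels are all hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class SegmentationStructure:
    subject: str
    time: str
    qualifier: str
    purpose: str


# Hypothetical vocabularies; a real system would rely on trained NLP models.
KNOWN_SUBJECTS = ("WeChat",)
QUALIFIERS = {"active": "activity", "activity": "activity"}
TIME_PATTERN = re.compile(
    r"(?:last|past)\s+\w+\s+(?:day|week|month|year)s?", re.IGNORECASE
)


def parse(text: str) -> SegmentationStructure:
    """Split a natural-language question into subject/time/qualifier/purpose."""
    lowered = text.lower()
    # Subject: first known entity mentioned in the text (stand-in for NER).
    subject = next((s for s in KNOWN_SUBJECTS if s.lower() in lowered), "")
    # Time: a phrase such as "last three months".
    match = TIME_PATTERN.search(text)
    time = match.group(0).lower() if match else ""
    # Qualifier: the metric being asked about, normalized to a canonical name.
    qualifier = next(
        (canon for word, canon in QUALIFIERS.items()
         if re.search(rf"\b{word}\b", lowered)),
        "",
    )
    # Purpose: a "how" question is treated as a request for a trend analysis.
    purpose = "trend" if re.search(r"\bhow\b", lowered) else "lookup"
    return SegmentationStructure(subject, time, qualifier, purpose)


structure = parse("How is WeChat active in the last three months?")
```

Keeping the result in a small typed record makes the later query-building step independent of how the fields were extracted.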
  • Step S260 Invoke the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set.
  • the search engine is a retrieval technology that uses specific strategies to retrieve information from the Internet and feed it back to users based on user needs and certain algorithms.
  • the search engine can be Elasticsearch.
  • Elasticsearch is a full-text search engine with distributed multi-user capabilities.
  • Query DSL is the general query language of Elasticsearch.
  • The original data set is all the data found through the search engine according to the word segmentation structure. It should be emphasized that, to further ensure the privacy and security of the data in the original data set, the data can also be stored in a node of a blockchain.
  • The word segmentation structure is filled into the corresponding slots of a query statement to form a complete query statement.
  • The complete query statement is executed to query the database for the corresponding data.
  • The queried data is the data in the original data set.
  • All data in the database is characterized in advance by more than one of the features subject, time, qualifier and type, and those features are associated with the corresponding data.
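As a rough illustration of filling the word segmentation structure into a query statement, the sketch below builds an Elasticsearch Query DSL body as a plain Python dictionary. The field names `subject`, `qualifier` and `date`, the `NUMBER_WORDS` table and the `build_query` helper are assumptions made for this example; the patent does not specify the index schema or the query template.

```python
import json

# Elasticsearch date-math unit codes for the time units handled here.
UNIT_CODES = {"day": "d", "week": "w", "month": "M", "year": "y"}
# Hypothetical lookup for spelled-out counts in phrases like "last three months".
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "six": 6, "twelve": 12}


def build_query(subject: str, time_phrase: str, qualifier: str) -> dict:
    """Fill the segmentation fields into a bool/filter Query DSL body."""
    filters = [
        {"term": {"subject": subject}},
        {"term": {"qualifier": qualifier}},
    ]
    words = time_phrase.lower().split()
    # Expected shape: "<last|past> <count> <unit>[s]".
    if len(words) == 3:
        count = NUMBER_WORDS.get(words[1])
        if count is None and words[1].isdigit():
            count = int(words[1])
        unit = UNIT_CODES.get(words[2].rstrip("s"))
        if count and unit:
            # Range filter using date math, e.g. now-3M/d .. now/d.
            filters.append(
                {"range": {"date": {"gte": f"now-{count}{unit}/d", "lte": "now/d"}}}
            )
    return {"query": {"bool": {"filter": filters}}}


query = build_query("WeChat", "last three months", "activity")
print(json.dumps(query, indent=2))
```

The resulting dictionary could be passed as the request body of a search call; the call itself is omitted here because the index name and client setup are not given in the text.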
  • Step S280: perform anomaly analysis on the original data set to obtain a data analysis result.
  • Anomaly analysis mines the data in the original data set for abnormal data to find out whether an anomaly exists, determines the abnormal points, and performs correlation analysis on the data corresponding to the abnormal points to obtain the reason each abnormal point occurred.
  • The abnormal points and the reasons they occurred make up the data analysis result.
  • Step S300: the data analysis result is extracted into natural language based on natural language generation technology, and an analysis report corresponding to the information to be analyzed is generated.
  • Natural language generation is a technology that uses artificial intelligence and computational linguistics to convert structured data into text expressed in human language.
  • The analysis report expresses the results of data analysis in natural language.
  • The next word is predicted with a language model (a model trained, based on natural language generation technology, to refine the data analysis result into natural language); that is, the model estimates the probability of each candidate word given the preceding sequence. For example, to predict the next word after "the reason for the decrease in activity rate", the language model predicts the probability of candidate next words such as "A1" and "B3", and decides according to those probabilities whether "the reason for the decrease in activity rate" is followed by "A1" or "B3".
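The idea of choosing the next word by probability can be shown with a toy bigram model. This is a deliberate simplification: the language model in the text is a trained natural language generation model, whereas the sketch below only counts which word follows which in a tiny made-up corpus. The training sentences and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count, for every word, how often each other word follows it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probabilities(counts, prev):
    """Return P(next word | prev) estimated from the bigram counts."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()} if total else {}


# Made-up training sentences in which the cause is "A1" twice and "B3" once.
corpus = [
    "the reason for the decrease in activity rate is A1",
    "the reason for the decrease in activity rate is A1",
    "the reason for the decrease in activity rate is B3",
]
model = train_bigram(corpus)
probs = next_word_probabilities(model, "is")
best = max(probs, key=probs.get)  # the most probable continuation
```

A bigram model conditions only on the single preceding word; a model of the kind the text describes would condition on the whole preceding sequence and on the structured analysis result.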
  • the analysis report can be sent to the terminal for display, and the user can download and view it. It should be emphasized that, in order to further ensure the privacy and security of the above analysis report, the above analysis report can also be stored in a node of a blockchain.
  • The user inputs the information to be analyzed in natural language to initiate a data analysis instruction. Based on natural language processing, semantic analysis is performed on the information to be analyzed in the data analysis instruction to obtain the word segmentation structure, and the search engine is called to query the corresponding data according to the word segmentation structure to obtain the original data set. Anomaly analysis is performed on the original data set to obtain the data analysis result, which is then extracted into natural language based on natural language generation technology to generate the analysis report corresponding to the information to be analyzed. Users can thus obtain the corresponding analysis report by inputting the information to be analyzed in natural language and initiating a data analysis instruction, which lowers the technical threshold of data analysis so that data is used efficiently and its value is fully realized.
  • Performing anomaly analysis on the original data set to obtain the data analysis result includes: analyzing the original data set based on the isolation forest algorithm to obtain data anomalies; and calling an association rule analysis model to perform correlation analysis on the data anomalies to obtain the data analysis result.
  • The isolation forest algorithm (Isolation Forest) is an unsupervised anomaly detection method suitable for continuous data.
  • In the isolation forest, the original data set is recursively and randomly partitioned until every point in the original data set is isolated. Under this random partitioning strategy, anomalous points usually have shorter paths, because they can be isolated with fewer partitions.
  • The association rule analysis model is a correlation analysis model trained on a large amount of sample data.
  • The association rule analysis model can be trained based on the Apriori algorithm.
  • The Apriori algorithm is an association rule mining algorithm that uses a layer-by-layer iterative search to find the relationships among itemsets in the database and form rules.
  • Its process consists of connection (a matrix-like join operation) and pruning (removal of unnecessary intermediate results).
  • An itemset in this algorithm is a collection of items.
  • A set containing k items is a k-itemset.
  • The frequency of an itemset is the number of transactions that contain it. If an itemset meets the minimum support, it is called a frequent itemset.
  • The association rule analysis model based on the Apriori algorithm scans the data set corresponding to the data anomaly (detailed list data containing active K, where active K is the indicator, and the dimension items), $\{K, A_1, A_2, B_1, B_2, B_3, \ldots, N_1, N_2\}$, to filter out the frequent itemsets $L$ containing $K$. For each non-empty subset $S$ of $L$, if $P(M \cap N \cap T \mid K) \ge \text{min\_conf}$ (the confidence threshold, customizable), then the frequent itemset $S(K, M, N, T)$ is an active correlation set.
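The connection, pruning and confidence-filtering steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the trained model the text refers to: the transactions are made up, support is measured as an absolute transaction count, and the function names `apriori` and `active_correlation_sets` are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count the support of each candidate and keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Connection step: join frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Pruning step: drop candidates that have an infrequent (k-1)-subset.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def active_correlation_sets(transactions, target, min_support, min_conf):
    """Frequent itemsets containing `target` whose confidence given
    `target` is at least `min_conf`."""
    frequent = apriori(transactions, min_support)
    base = frequent.get(frozenset([target]), 0)
    result = {}
    for itemset, count in frequent.items():
        if base and target in itemset and len(itemset) > 1:
            confidence = count / base  # P(other items | target)
            if confidence >= min_conf:
                result[itemset] = confidence
    return result


# Made-up detail records: K is the indicator, A*/B* are dimension items.
records = [
    {"K", "A1", "B3"},
    {"K", "A1", "B3"},
    {"K", "A1"},
    {"A2", "B1"},
    {"K", "A1", "B3"},
]
correlated = active_correlation_sets(records, "K", min_support=3, min_conf=0.7)
```

With these records, the itemsets {K, A1}, {K, B3} and {K, A1, B3} all pass both thresholds, so A1 and B3 would be reported as dimensions correlated with the indicator K.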
  • Analyzing the original data set based on the isolation forest algorithm to obtain data anomalies includes:
  • performing average path analysis on the original data set to obtain the average path length of the original data set; and analyzing the average path length together with the expected path length of each data item in the original data set to determine the data anomalies.
  • The isolation forest algorithm is used for abnormal data mining. Take as an example a scenario that requires analyzing whether the data of the past three months is abnormal: the original data set is a set of $n$ samples from the past three months, and the average path length is calculated according to the isolation forest algorithm as: $c(n) = 2H(n-1) - \frac{2(n-1)}{n}$
  • where $H(i)$ is the harmonic number, and
  • $c(n)$ is the average path length for a given number of samples $n$.
  • The anomaly score of each sample $x$ is defined as: $s(x, n) = 2^{-E(h(x))/c(n)}$
  • where $E(h(x))$ is the expected path length of sample $x$ over a batch of isolation trees.
  • When $E(h(x)) \to 0$, $s \to 1$, and the sample can be judged to be a data anomaly.
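The relationship between the expected path length and the anomaly score can be checked numerically. The sketch below uses the standard isolation forest definitions, $c(n) = 2H(n-1) - 2(n-1)/n$ and $s = 2^{-E(h(x))/c(n)}$, with the usual approximation $H(i) \approx \ln(i) + 0.5772156649$; the sample size of 256 and the path lengths are arbitrary illustrative values, and building the isolation trees themselves is out of scope here.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def harmonic(i: int) -> float:
    """Approximate the i-th harmonic number H(i) as ln(i) + gamma."""
    return math.log(i) + EULER_GAMMA


def average_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful search among n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n


def anomaly_score(expected_path_length: float, n: int) -> float:
    """s(x, n) = 2 ** (-E(h(x)) / c(n)); values near 1 indicate anomalies."""
    if n < 2:
        raise ValueError("at least two samples are needed to score a point")
    return 2.0 ** (-expected_path_length / average_path_length(n))


n = 256
short_path = anomaly_score(1.0, n)    # isolated quickly: score close to 1
typical = anomaly_score(average_path_length(n), n)  # exactly 0.5
long_path = anomaly_score(20.0, n)    # hard to isolate: score well below 0.5
```

A sample whose expected path length equals $c(n)$ scores exactly 0.5, which is why scores clearly above 0.5 are read as suspicious and scores close to 1 as anomalies.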
  • the method further includes:
  • Step S420: the search engine is called to analyze the degree of matching between the word segmentation structure and each preset analysis report stored in the search engine, obtaining a matching degree for each preset analysis report.
  • Step S440: when one of the preset analysis reports has a matching degree that reaches the preset matching degree threshold, that preset analysis report is used as the analysis report corresponding to the information to be analyzed.
  • Step S460: when no preset analysis report's matching degree reaches the preset matching degree threshold, the corresponding data is queried according to the word segmentation structure to obtain the original data set, and the method proceeds to step S280.
  • a preset analysis report is an analysis report obtained when performing data analysis based on history
  • a preset analysis report is an analysis report that is frequently analyzed and generated in the current preset period.
  • The match between the word segmentation structure and the preset analysis reports stored in the search engine is analyzed with the search engine's Lucene scoring mechanism, which uses a scoring algorithm to calculate the relevance score of every document with respect to the search statement.
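The threshold logic of steps S420 to S460 can be sketched as a lookup that either returns a stored report or signals that the full analysis must run. Note the substitution: Jaccard overlap between term sets is used here as a simple stand-in for Lucene's relevance scoring, and the stored reports, the threshold value and the function names are hypothetical.

```python
def jaccard(a, b) -> float:
    """Overlap between two term collections, in the range 0..1."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def find_preset_report(structure_terms, preset_reports, threshold):
    """Return the best-matching preset report, or None if no stored report
    reaches the matching threshold (the caller then runs the full analysis)."""
    best_report, best_score = None, 0.0
    for terms, report in preset_reports:
        score = jaccard(structure_terms, terms)
        if score > best_score:
            best_report, best_score = report, score
    return best_report if best_score >= threshold else None


# Made-up preset reports keyed by their subject/time/qualifier/purpose terms.
presets = [
    (("WeChat", "last three months", "activity", "trend"), "report A"),
    (("Alipay", "last month", "payments", "trend"), "report B"),
]

hit = find_preset_report(
    ("WeChat", "last three months", "activity", "trend"), presets, 0.8
)
miss = find_preset_report(
    ("WeChat", "last year", "revenue", "trend"), presets, 0.8
)
```

Here `hit` is the stored report and `miss` is `None`, which corresponds to falling through to the query and anomaly analysis steps.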
  • The way a preset analysis report is preset includes: counting, on the basis of the preset period, how often the information to be analyzed is analyzed in the current preset period to determine its analysis frequency; and, when the analysis frequency of the information to be analyzed reaches the preset threshold, presetting the analysis report corresponding to the information to be analyzed into the search engine as a preset analysis report.
  • The preset period can be set according to actual conditions, for example one day, half a month or one month.
  • The current preset period is determined by the preset period: when the preset period is half a month, the current preset period is the current half month; when it is one month, the current month; and when it is one day, the current day.
  • The analysis frequency of the information to be analyzed is the number of times that the information to be analyzed is carried in the data analysis instructions acquired in the current preset period.
  • Information to be analyzed that has the same semantics as given information to be analyzed can also be counted as an occurrence of that information.
  • The preset threshold is used to filter out information to be analyzed with a low analysis frequency and retain information with a high analysis frequency; it can be set according to the measurement scale of the analysis frequency.
  • After the step of presetting the analysis report corresponding to the information to be analyzed into the search engine as a preset analysis report when the analysis frequency reaches the preset threshold, the method further includes: determining the popularity value of the preset analysis report according to the analysis frequency of its corresponding information to be analyzed and the time at which it was preset into the search engine; and updating the preset analysis reports in the search engine according to their popularity values.
  • the time preset to the search engine is the time when the information to be analyzed is judged to be a highly popular analysis report.
  • the popularity value of the preset analysis report will be decremented.
  • The preset analysis reports in the search engine can be updated with a daily T+1 statistical update. If the popularity value of a preset analysis report falls below the threshold, its preset is cancelled; a preset analysis report whose popularity value exceeds the threshold remains a preset analysis report.
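A rough sketch of the frequency count and the popularity-based refresh is given below. The text states only that popularity depends on the analysis frequency and the preset time and that it is decremented; the exponential decay formula, the `decay` parameter and all function names here are assumptions chosen for illustration.

```python
from collections import Counter


def frequent_queries(queries_in_period, preset_threshold):
    """Queries whose analysis frequency in the period reaches the threshold."""
    counts = Counter(queries_in_period)
    return {q: n for q, n in counts.items() if n >= preset_threshold}


def popularity(analysis_frequency, days_since_preset, decay=0.9):
    """Assumed formula: the frequency decays by a fixed factor per day."""
    return analysis_frequency * decay ** days_since_preset


def refresh_presets(presets, popularity_threshold, decay=0.9):
    """Keep only presets whose popularity is still at or above the threshold.

    `presets` maps a query to (analysis_frequency, days_since_preset).
    """
    return {
        query: stats
        for query, stats in presets.items()
        if popularity(stats[0], stats[1], decay) >= popularity_threshold
    }


# Made-up data: one day's queries, then a daily (T+1) refresh of the presets.
todays_queries = ["wechat activity", "wechat activity", "wechat activity", "other"]
new_presets = frequent_queries(todays_queries, preset_threshold=3)
kept = refresh_presets(
    {"wechat activity": (10, 1), "old report": (10, 30)},
    popularity_threshold=5.0,
)
```

In this example the report preset one day ago is kept, while the one preset thirty days ago has decayed below the threshold and is dropped.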
  • The data analysis method based on natural language processing further includes: obtaining the user's degree of satisfaction with the analysis report; when the degree of satisfaction reaches the preset degree of satisfaction, obtaining the user information of the user; based on the user information, identifying users of the same type whose needs are similar to the user's; analyzing the user information of the users of the same type to determine their level of demand for the analysis report; and, when the level of demand reaches the preset level of demand, sending the analysis report to the users of the same type.
  • The user's rating can be a numeric score, such as 90 or 80 points, or a label such as very satisfied, satisfied, fair or dissatisfied. The user's degree of satisfaction with the analysis report, which can likewise be very satisfied, satisfied, fair, dissatisfied and so on, is determined from the rating.
  • the preset satisfaction level can be set according to the actual situation, such as: satisfied and very satisfied, etc.
  • the user information can be the user's occupation, age, gender, industry, hobbies, and so on. Users of the same type are users who have similar needs as the user.
  • For example, if the user who sends the data analysis instruction is an operator of WeChat, users of the same type can be other WeChat operators.
  • The analysis report that the user who sent the data analysis instruction has read and approved is also recommended to the other WeChat operators, who then do not need to go through the above process again to get the analysis report.
  • A data analysis device based on natural language processing is provided, including: a data analysis instruction acquisition module 310, a semantic analysis module 320, a data query module 330, a data analysis module 340, and an analysis report generation module 350, where:
  • the data analysis instruction acquisition module 310 is configured to acquire data analysis instructions, and the data analysis instructions carry information to be analyzed based on natural language expressions;
  • the semantic analysis module 320 is used to perform semantic analysis on the information to be analyzed based on natural language processing to obtain the word segmentation structure;
  • the data query module 330 is used to call the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set;
  • the data analysis module 340 is used to perform anomaly analysis on the original data set to obtain data analysis results
  • the analysis report generation module 350 is used to extract the data analysis result into natural language based on the natural language generation technology, and generate an analysis report.
  • The data analysis module 340 is further configured to: analyze the original data set based on the isolation forest algorithm to obtain data anomalies; and call an association rule analysis model to perform correlation analysis on the data anomalies to obtain data analysis results.
  • The data analysis module 340 is further configured to: perform average path analysis on the original data set based on the isolation forest algorithm to obtain the average path length of the original data set; and analyze the average path length and the expected path length of each data item in the original data set to determine the data anomalies.
  • The data analysis device based on natural language processing further includes a preset analysis report matching module 360, which is used to: call the search engine to perform matching analysis between the word segmentation structure and each preset analysis report stored in the search engine, obtaining the matching degree of each preset analysis report; when one of the preset analysis reports has a matching degree that reaches the preset matching degree threshold, use that preset analysis report as the analysis report corresponding to the information to be analyzed; and when no preset analysis report's matching degree reaches the preset matching degree threshold, execute the step of calling the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set.
  • The data analysis device based on natural language processing further includes a preset analysis report preset module 370, configured to: count, on the basis of the preset period, how often the information to be analyzed is analyzed in the current preset period to determine its analysis frequency; and, when the analysis frequency of the information to be analyzed reaches a preset threshold, preset the analysis report corresponding to the information to be analyzed into the search engine as a preset analysis report.
  • The preset analysis report preset module 370 is further configured to: determine the popularity value of the preset analysis report according to the analysis frequency of its corresponding information to be analyzed and the time at which it was preset into the search engine; and update the preset analysis reports in the search engine according to their popularity values.
  • the data analysis device based on natural language processing further includes: an analysis report recommendation module 380, configured to obtain the user's satisfaction degree based on the analysis report; when the satisfaction degree reaches a preset satisfaction degree, obtain user information of the user; Based on user information, analyze users of the same type that are similar to user needs; obtain user information of the same type of users for analysis, and determine the level of demand for the analysis report of the same type of users; when the level of demand reaches the preset level of demand, to the same type of users Send analysis report.
  • Each module in the above data analysis device based on natural language processing can be implemented in whole or in part by software, hardware, or a combination thereof.
  • The above modules may be embedded in, or independent of, the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call them and execute the operations corresponding to each module.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 6.
  • the computer equipment includes a processor, a memory, and a network interface connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer equipment is used to store the original data set.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize a data analysis method based on natural language processing.
  • FIG. 6 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer parts than shown in the figure, combine some parts, or have a different arrangement of parts.
  • A computer device is provided, including a memory and a processor. A computer program is stored in the memory, and the processor implements the following steps when executing the computer program:
  • obtaining a data analysis instruction that carries information to be analyzed expressed in natural language; performing semantic analysis on the information to be analyzed based on natural language processing to obtain the word segmentation structure; calling the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set; performing anomaly analysis on the original data set to obtain the data analysis result; and extracting the data analysis result into natural language based on natural language generation technology to generate the analysis report.
  • the processor further implements the following steps when executing the computer program: analyzing the original data set based on the isolated forest algorithm to obtain data abnormalities; calling the association rule analysis model to perform correlation analysis on the data abnormalities to obtain data analysis result.
  • the processor further implements the following steps when executing the computer program: based on the isolated forest algorithm, perform average path analysis on the original data set to obtain the average path length of the original data set; according to the average path length and the data in the original data set The expectation of path length is analyzed to determine the abnormal point of the data.
  • the processor further implements the following steps when executing the computer program: calling the search engine to perform matching analysis with each preset analysis report preset in the search engine to obtain the information of each preset analysis report. Matching degree; when there is a preset analysis report whose matching degree reaches the preset matching degree threshold in each preset analysis report, the preset analysis report that reaches the preset matching degree threshold will be used as the corresponding analysis report of the information to be analyzed; when each preset analysis report When the matching degree of the analysis report does not reach the preset matching degree threshold, the step of calling the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set is executed.
  • the processor further implements the following steps when executing the computer program: counting the analysis frequency of the information to be analyzed in the current preset period based on the preset period, and determining the analysis frequency of the information to be analyzed in the current preset period; When the analysis frequency of the analysis information reaches the preset threshold, the analysis report corresponding to the information to be analyzed is preset to the search engine as a preset analysis report.
  • the processor further implements the following steps when executing the computer program: determining the popularity value of the preset analysis report according to the analysis frequency of the information to be analyzed corresponding to the preset analysis report and the time preset to the search engine; Analyze the popularity value of the report, and update the preset analysis report preset in the search engine.
  • the processor further implements the following steps when executing the computer program: obtaining the user’s satisfaction level based on the analysis report; when the satisfaction level reaches the preset satisfaction level, obtaining the user’s user information; based on the user information, analyzing the user’s satisfaction level with the user Users of the same type with similar needs; obtain user information of users of the same type for analysis to determine the degree of demand for the same type of users for the analysis report; when the degree of demand reaches the preset demand level, the analysis report is sent to the same type of users.
  • A computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the following steps are implemented:
  • Obtaining a data analysis instruction that carries information to be analyzed expressed in natural language; performing semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure; calling a search engine to query the corresponding data according to the word segmentation structure to obtain an original data set; performing anomaly analysis on the original data set to obtain a data analysis result; and condensing the data analysis result into natural language based on natural language generation technology to generate an analysis report.
  • When the computer program is executed by the processor, the following steps are also implemented: analyzing the original data set based on the isolation forest algorithm to obtain anomalous data points; and calling an association rule analysis model to perform correlation analysis on the anomalous data points to obtain the data analysis result.
  • When the computer program is executed by the processor, the following steps are also implemented: performing average path analysis on the original data set based on the isolation forest algorithm to obtain the average path length of the original data set; and analyzing the average path length together with the expected path length of each item of data in the original data set to determine the anomalous data points.
  • When the computer program is executed by the processor, the following steps are also implemented: calling the search engine to perform matching-degree analysis between the word segmentation structure and each preset analysis report preset in the search engine to obtain the matching degree of each preset analysis report; when a preset analysis report whose matching degree reaches a preset matching-degree threshold exists, using that preset analysis report as the analysis report corresponding to the information to be analyzed; and when no preset analysis report reaches the preset matching-degree threshold, executing the step of calling the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set.
  • When the computer program is executed by the processor, the following steps are also implemented: counting, based on a preset period, the analysis frequency of the information to be analyzed within the current preset period; and when the analysis frequency of the information to be analyzed reaches a preset threshold, presetting the analysis report corresponding to the information to be analyzed in the search engine as a preset analysis report.
  • When the computer program is executed by the processor, the following steps are also implemented: determining the popularity value of the preset analysis report according to the analysis frequency of the information to be analyzed corresponding to the preset analysis report and the time at which it was preset in the search engine; and updating the preset analysis reports preset in the search engine according to their popularity values.
  • When the computer program is executed by the processor, the following steps are also implemented: obtaining the user's degree of satisfaction with the analysis report; when the degree of satisfaction reaches a preset degree of satisfaction, obtaining the user information of the user; identifying, based on the user information, users of the same type whose needs are similar to the user's; obtaining and analyzing the user information of the users of the same type to determine their degree of demand for the analysis report; and when the degree of demand reaches a preset degree of demand, sending the analysis report to the users of the same type.
  • Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, or optical storage.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM may be in various forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), etc.
  • The blockchain referred to in the present invention is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • The blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.

Abstract

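The present application relates to artificial intelligence and provides a data analysis method, apparatus, computer device, and storage medium based on natural language processing. The method includes: obtaining a data analysis instruction that carries information to be analyzed expressed in natural language; performing semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure; calling a search engine to query the corresponding data according to the word segmentation structure to obtain an original data set; performing anomaly analysis on the original data set to obtain a data analysis result; and condensing the data analysis result into natural language based on natural language generation technology to generate an analysis report corresponding to the information to be analyzed. The present invention also relates to blockchain technology: the original data set may be stored in a blockchain. With this method, a user enters the information to be analyzed in natural language and initiates a data analysis instruction to obtain an analysis report, which lowers the technical threshold of data analysis.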

Description

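Data analysis method, apparatus, and computer device based on natural language processing
This application claims priority to Chinese patent application No. 202010604394.7, filed with the China Patent Office on June 29, 2020 and entitled "Data analysis method, apparatus, and computer device based on natural language processing", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular to a data analysis method, apparatus, computer device, and storage medium based on natural language processing.
Background
With the development of computer technology, data analysis technology has emerged. Data analysis examines raw data to find the root causes of the current situation. By building data analysis models and prediction models, the data is abstracted, reduced in dimension, summarized, and interpreted layer by layer, so that data ultimately supports business growth.
Although the value of data analysis is widely recognized, technologies and tools such as Hadoop (a distributed system infrastructure), unstructured databases, and data visualization tools require professional data analysts with a strong technical background before an enterprise, or certain departments within it, can apply them to real business scenarios.
The inventor realized that the technical threshold of current data analysis is high, so data cannot be used efficiently and its value cannot be fully realized.
Technical Problem
In view of the above technical problem, it is necessary to provide a data analysis method, apparatus, computer device, and storage medium based on natural language processing that can lower the technical threshold of data analysis.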
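Technical Solution
A data analysis method based on natural language processing, the method including:
obtaining a data analysis instruction, the data analysis instruction carrying information to be analyzed expressed in natural language;
performing semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure;
calling a search engine to query the corresponding data according to the word segmentation structure to obtain an original data set;
performing anomaly analysis on the original data set to obtain a data analysis result;
condensing the data analysis result into natural language based on natural language generation technology to generate an analysis report corresponding to the information to be analyzed.
A data analysis apparatus based on natural language processing, the apparatus including:
a data analysis instruction obtaining module, configured to obtain a data analysis instruction, the data analysis instruction carrying information to be analyzed expressed in natural language;
a semantic parsing module, configured to perform semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure;
a data query module, configured to call a search engine to query the corresponding data according to the word segmentation structure to obtain an original data set;
a data analysis module, configured to perform anomaly analysis on the original data set to obtain a data analysis result;
an analysis report generation module, configured to condense the data analysis result into natural language based on natural language generation technology to generate an analysis report corresponding to the information to be analyzed.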
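A computer device, including a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements a data analysis method based on natural language processing, including:
obtaining a data analysis instruction, the data analysis instruction carrying information to be analyzed expressed in natural language;
performing semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure;
calling a search engine to query the corresponding data according to the word segmentation structure to obtain an original data set;
performing anomaly analysis on the original data set to obtain a data analysis result;
condensing the data analysis result into natural language based on natural language generation technology to generate an analysis report corresponding to the information to be analyzed.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements a data analysis method based on natural language processing, including:
obtaining a data analysis instruction, the data analysis instruction carrying information to be analyzed expressed in natural language;
performing semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure;
calling a search engine to query the corresponding data according to the word segmentation structure to obtain an original data set;
performing anomaly analysis on the original data set to obtain a data analysis result;
condensing the data analysis result into natural language based on natural language generation technology to generate an analysis report corresponding to the information to be analyzed.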
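Beneficial Effects
With the above data analysis method, apparatus, computer device, and storage medium based on natural language processing, a user initiates a data analysis instruction simply by entering the information to be analyzed in natural language. Semantic parsing is performed on the information to be analyzed in the data analysis instruction based on natural language processing to obtain a word segmentation structure; the search engine is called to query the corresponding data according to the word segmentation structure to obtain the original data set; anomaly analysis is performed on the original data set to obtain a data analysis result; and the data analysis result is then condensed into natural language based on natural language generation technology to generate the analysis report corresponding to the information to be analyzed. The user thus obtains the corresponding analysis report merely by entering the information to be analyzed in natural language, which lowers the technical threshold of data analysis, so that data is used efficiently and its value is fully realized.
Brief Description of the Drawings
FIG. 1 is a diagram of the application environment of the data analysis method based on natural language processing in one embodiment;
FIG. 2 is a schematic flowchart of the data analysis method based on natural language processing in one embodiment;
FIG. 3 is a schematic flowchart of one of the steps of the data analysis method based on natural language processing in one embodiment;
FIG. 4 is a structural block diagram of the data analysis apparatus based on natural language processing in one embodiment;
FIG. 5 is a structural block diagram of the data analysis apparatus based on natural language processing in another embodiment;
FIG. 6 is a diagram of the internal structure of the computer device in one embodiment.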
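Best Mode for Carrying Out the Invention
The data analysis method based on natural language processing provided in this application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 over a network. The server 104 obtains the data analysis instruction sent by the user through the terminal 102, the data analysis instruction carrying information to be analyzed expressed in natural language. The server 104 performs semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure; calls the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set; performs anomaly analysis on the original data set to obtain a data analysis result; and condenses the data analysis result into natural language based on natural language generation technology to generate the analysis report corresponding to the information to be analyzed.
Alternatively, the server 104 may trigger the data analysis instruction automatically according to a preset trigger period and obtain the information to be analyzed, expressed in natural language, that the instruction carries. The server 104 then performs the same semantic parsing, data query, anomaly analysis, and report generation as described above. The terminal 102 may be, but is not limited to, any of various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices. The server 104 may be implemented as a standalone server or as a server cluster composed of multiple servers.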
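In one embodiment, as shown in FIG. 2, a data analysis method based on natural language processing is provided. Taking the method as applied to the server in FIG. 1 as an example, it includes the following steps:
Step S220: obtain a data analysis instruction, the data analysis instruction carrying information to be analyzed expressed in natural language.
The data analysis instruction instructs the server to perform data analysis and carries information to be analyzed expressed in natural language. Natural language is language that evolved naturally with culture and is used for human communication and thought. The information to be analyzed describes, in natural language, the content on which data analysis is to be performed.
In one scenario, a user wants to know how active WeChat has been over the past three months. Through the input interface for information to be analyzed that the server exposes to the terminal, the user enters "微信近三个月活跃怎么样" ("How active has WeChat been over the past three months?"). The terminal generates a data analysis instruction from this input and sends it to the server.
Step S240: perform semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure.
Natural language processing (NLP) covers the theories and methods that enable effective communication between humans and computers in natural language. Semantic parsing applies natural language processing methods to understand the semantic content of a piece of text. The word segmentation structure splits the information to be analyzed into the structure subject + time + qualifier + purpose.
Taking the information to be analyzed "微信近三个月活跃怎么样" as an example: based on natural language processing, the text is processed by named-entity recognition (NER, identifying entities with specific meaning in the text), part-of-speech tagging (labeling each word's part of speech according to its meaning and context), stemming (removing noun plurals, verb tenses, and the like), construction of a sentence syntax tree (a graphical representation of the sentence structure), and reference resolution (determining what each word or symbol in the information to be analyzed refers to). The sentence is thereby split into the word segmentation structure 微信 + 近三个月 + 活跃 + 怎么样: the subject is "微信" (WeChat), the time is "近三个月" (the past three months), the qualifier is "活跃" (active), and the purpose is "怎么样" (how is it).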
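The subject + time + qualifier + purpose structure can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the lexicons, the `SegmentedQuery` type, and the `segment` function are hypothetical names, and simple dictionary lookup stands in for the NER, part-of-speech tagging, and syntax-tree steps that the description lists.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexicons; a real system would rely on NER, POS tagging,
# and a parse tree rather than fixed word lists.
SUBJECTS = ["微信"]
TIME_PHRASES = ["近三个月", "近一个月", "近一周"]
QUALIFIERS = ["活跃", "留存"]
PURPOSES = ["怎么样", "为什么下降"]


@dataclass(frozen=True)
class SegmentedQuery:
    subject: Optional[str]
    time: Optional[str]
    qualifier: Optional[str]
    purpose: Optional[str]


def _first_match(text: str, candidates: list) -> Optional[str]:
    """Return the candidate that occurs earliest in text, or None."""
    hits = [(text.find(c), c) for c in candidates if c in text]
    return min(hits)[1] if hits else None


def segment(text: str) -> SegmentedQuery:
    """Split a query into subject + time + qualifier + purpose slots."""
    return SegmentedQuery(
        subject=_first_match(text, SUBJECTS),
        time=_first_match(text, TIME_PHRASES),
        qualifier=_first_match(text, QUALIFIERS),
        purpose=_first_match(text, PURPOSES),
    )


result = segment("微信近三个月活跃怎么样")
```

Slots that cannot be filled are left as `None`, so a downstream query builder can decide which search fields to use.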
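Step S260: call the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set.
A search engine is a retrieval technology that, according to user needs and certain algorithms, uses specific strategies to retrieve specified information from the Internet and return it to the user. The search engine may be Elasticsearch, a distributed, multi-tenant full-text search engine. In Elasticsearch, structured query statements in Query DSL (a general query framework) are set up in advance over one or more of the search fields subject, time, qualifier, and type. The original data set is all the data that the search engine finds according to the word segmentation structure. It should be emphasized that, to further ensure the privacy and security of the data in the original data set, that data may also be stored in a node of a blockchain.
In one embodiment, the Query DSL structured query statement set up in advance over one or more of the search fields subject, time, qualifier, and type is completed by filling the word segmentation structure into the corresponding slots. The completed query statement is executed against the database, and the data returned is the data of the original data set. All data in the database is processed in advance: one or more of the features subject, time, qualifier, and type are extracted from each item of data and associated with it. A query built from the preset Query DSL statement over those same fields can therefore find the corresponding data.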
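Filling the word segmentation structure into a preset query template might look like the sketch below. It only builds the request body as a Python dictionary and sends nothing to a cluster. The field names `subject`, `qualifier`, `type`, and `date` are assumptions for illustration; the source does not give the index mapping.

```python
def build_query(subject=None, time_range=None, qualifier=None, data_type=None):
    """Fill a bool/filter Query DSL template with whichever slots are present.

    Field names are hypothetical; the real index mapping is not given in the text.
    """
    filters = []
    if subject is not None:
        filters.append({"term": {"subject": subject}})
    if qualifier is not None:
        filters.append({"term": {"qualifier": qualifier}})
    if data_type is not None:
        filters.append({"term": {"type": data_type}})
    if time_range is not None:
        # time_range is a date-math lower bound such as "now-3M/d"
        filters.append({"range": {"date": {"gte": time_range, "lte": "now/d"}}})
    if not filters:
        raise ValueError("at least one search field is required")
    return {"query": {"bool": {"filter": filters}}}


# "近三个月" (the past three months) mapped by hand to a date-math expression
query = build_query(subject="微信", time_range="now-3M/d", qualifier="活跃")
```

Using `filter` clauses rather than scored `must` clauses is a design choice of this sketch: the slots act as exact constraints on pre-tagged data, so relevance scoring is not needed at this step.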
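Step S280: perform anomaly analysis on the original data set to obtain a data analysis result.
Anomaly analysis mines the data in the original data set for anomalies. The mining determines whether an anomaly exists and then identifies the anomalous points; correlation analysis on the data corresponding to the anomalous points yields the causes of the anomalies; and the data analysis result is derived from the anomalous points and their causes.
Step S300: condense the data analysis result into natural language based on natural language generation technology to generate the analysis report corresponding to the information to be analyzed.
Natural language generation is a technology that uses artificial intelligence and computational linguistics to produce natural language, converting structured data into text expressed in human language. The analysis report presents the data analysis result in natural language. A language model (a model trained, on the basis of natural language generation technology, to condense data analysis results into natural language) predicts the next word that is likely to appear, that is, it estimates the probability of each word occurring next in the sequence. For example, to predict the word that follows "the cause of the drop in the active rate", the language model estimates the probabilities of candidate next words such as "A 1" and "B 3" and chooses between them by probability. When the probability of "A 1" is higher than that of "B 3", the result condensed into natural language is "the cause of the drop in the active rate is A 1". The analysis report can be sent to the terminal for display, and the user can download and view it. It should be emphasized that, to further ensure the privacy and security of the analysis report, the report may also be stored in a node of a blockchain.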
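The next-word choice described above can be illustrated with a tiny sketch. The probabilities are made-up placeholders rather than the output of any trained model, and `complete_sentence` is a hypothetical helper name.

```python
def complete_sentence(prefix: str, continuation_probs: dict) -> str:
    """Append the most probable continuation to the prefix."""
    if not continuation_probs:
        raise ValueError("no candidate continuations")
    best = max(continuation_probs, key=continuation_probs.get)
    return f"{prefix} {best}"


sentence = complete_sentence(
    "The cause of the drop in the active rate is",
    {"A1": 0.62, "B3": 0.38},  # illustrative probabilities only
)
```

A real language model would repeat this step word by word and would usually sample or use beam search rather than always taking the single most probable word.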
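In the above data analysis method based on natural language processing, a user initiates a data analysis instruction simply by entering the information to be analyzed in natural language. Semantic parsing is performed on the information to be analyzed based on natural language processing to obtain a word segmentation structure; the search engine is called to search for the corresponding data according to the word segmentation structure to obtain the original data set; anomaly analysis is performed on the original data set to obtain a data analysis result; and the data analysis result is condensed into natural language based on natural language generation technology to generate the corresponding analysis report. This lowers the technical threshold of data analysis, so that data is used efficiently and its value is fully realized.
In one embodiment, performing anomaly analysis on the original data set to obtain a data analysis result includes: analyzing the original data set based on the isolation forest algorithm to obtain anomalous data points; and calling an association rule analysis model to perform correlation analysis on the anomalous data points to obtain the data analysis result.
The isolation forest algorithm (Isolation Forest) is an unsupervised anomaly detection method suited to continuous data. In an isolation forest, the original data set is recursively and randomly partitioned until the point corresponding to every item of data is isolated. Under this random partitioning strategy, anomalous points are usually the ones isolated along shorter paths. The association rule analysis model performs correlation analysis and is trained on a large amount of sample data. It may be trained with the Apriori algorithm, an association rule mining algorithm that uses an iterative, level-wise search to find relationships among itemsets in a database and form rules. The process consists of joining (a matrix-like operation) and pruning (discarding unnecessary intermediate results). In this algorithm an itemset is a set of items, and a set containing k items is a k-itemset. The frequency of an itemset is the number of transactions that contain it. If an itemset satisfies the minimum support, it is called a frequent itemset.
In one embodiment, the association rule analysis model trained with the Apriori algorithm scans the data set corresponding to the anomalous data points (detailed records containing the active indicator K and the dimension items), {K, A_1, A_2, B_1, B_2, B_3, ..., N_1, N_2}, and filters out the frequent itemsets L that contain K. For every non-empty subset S of L, if P(M ∪ N ∪ T | K) ≥ min_conf (a confidence threshold, which can be user-defined), then the frequent itemset S(K, M, N, T) is an activity-related set (where M = A_1, N = B_3, T = N_2). The dimension items A_1, B_3, and N_2 are obtained and sorted by degree of influence, and the sorted sequence is the data analysis result.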
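The support and confidence test above can be sketched as follows. Note the simplification: the code enumerates candidate itemsets by brute force instead of using Apriori's level-wise join-and-prune search, so it shows only the filtering criterion, not the algorithm's efficiency. The function name, the `max_size` cap, and the sample transactions are assumptions for illustration.

```python
from itertools import combinations


def related_itemsets(transactions, target, min_support, min_conf, max_size=3):
    """Find dimension-item sets S with support(S ∪ {target}) >= min_support
    and P(S | target) >= min_conf. Brute force, not level-wise Apriori."""
    n = len(transactions)
    with_target = [t for t in transactions if target in t]
    if not with_target:
        return []
    items = sorted({i for t in with_target for i in t} - {target})
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            joint = sum(1 for t in with_target if set(combo) <= t)
            support = joint / n                   # share of all records
            confidence = joint / len(with_target)  # P(combo | target)
            if support >= min_support and confidence >= min_conf:
                results.append((combo, support, confidence))
    # Highest confidence first, then larger itemsets, then alphabetical
    results.sort(key=lambda r: (-r[2], -len(r[0]), r[0]))
    return results


# Hypothetical detailed records: K is the active indicator, the rest are dimension items
records = [
    {"K", "A1", "B3", "N2"},
    {"K", "A1", "B3", "N2"},
    {"K", "A1", "B3"},
    {"K", "A2"},
    {"A2", "B1"},
]
ranked = related_itemsets(records, "K", min_support=0.4, min_conf=0.5)
```

With these sample records, the set (A1, B3, N2) passes both thresholds, which mirrors the M = A_1, N = B_3, T = N_2 example in the text.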
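In one embodiment, analyzing the original data set based on the isolation forest algorithm to obtain anomalous data points includes:
performing average path analysis on the original data set based on the isolation forest algorithm to obtain the average path length of the original data set; and analyzing the average path length together with the expected path length of each item of data in the original data set to determine the anomalous data points.
First, the isolation forest algorithm is selected for anomalous data mining. Taking a scenario that requires further analysis of whether the data of the past three months is abnormal as an example: the original data set is a data set of n samples from the past three months, and the average path length computed by the isolation forest algorithm is:
$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}$$
where H(i) is the harmonic number and c(n) is the average path length for a given number of samples n.
The anomaly score of each sample x is defined as:
$$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}$$
where E(h(x)) is the expected path length of sample x over a collection of isolation trees. When E(h(x)) → 0, s → 1, and the sample can be judged to be an anomalous data point.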
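The anomaly score of the isolation forest algorithm can be computed directly, as in the sketch below. It assumes the standard isolation forest definitions, c(n) = 2H(n-1) - 2(n-1)/n and s = 2^(-E(h(x))/c(n)), and evaluates the harmonic number exactly rather than with the usual logarithmic approximation. The function names are illustrative.

```python
def harmonic(i: int) -> float:
    """Exact harmonic number H(i) = 1 + 1/2 + ... + 1/i (H(0) = 0)."""
    return sum(1.0 / k for k in range(1, i + 1))


def average_path_length(n: int) -> float:
    """c(n): average path length for a data set of n samples."""
    if n < 2:
        return 0.0
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n


def anomaly_score(expected_path_length: float, n: int) -> float:
    """s(x, n) = 2 ** (-E(h(x)) / c(n)); values near 1 indicate anomalies."""
    c = average_path_length(n)
    if c == 0.0:
        raise ValueError("need at least two samples")
    return 2.0 ** (-expected_path_length / c)


score_short_path = anomaly_score(0.0, 256)                         # path length near 0
score_average_path = anomaly_score(average_path_length(256), 256)  # typical path length
```

A sample whose expected path length equals c(n) scores 0.5, while a sample isolated almost immediately scores close to 1, matching the limit stated in the text.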
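In one embodiment, referring to FIG. 3, after the step of performing semantic parsing on the information to be analyzed based on natural language processing to obtain the word segmentation structure, the method further includes:
Step S420: call the search engine to perform matching-degree analysis between the word segmentation structure and each preset analysis report preset in the search engine to obtain the matching degree of each preset analysis report.
Step S440: when a preset analysis report whose matching degree reaches the preset matching-degree threshold exists among the preset analysis reports, use that preset analysis report as the analysis report corresponding to the information to be analyzed.
Step S460: when no preset analysis report reaches the preset matching-degree threshold, query the corresponding data according to the word segmentation structure to obtain the original data set, and proceed to step S280.
A preset analysis report is an analysis report obtained from earlier data analysis runs, specifically one that has been generated frequently within the current preset period. The matching-degree analysis between the word segmentation structure and each preset analysis report uses the scoring mechanism of Lucene, on which the search engine is built. Lucene's scoring mechanism uses a scoring algorithm to compute a relevance score between every document and the search statement. The scoring algorithm may be the TF/IDF algorithm (a term-frequency algorithm):
$$\mathrm{score}(q,d) = \mathrm{queryNorm}(q)\cdot\mathrm{coord}(q,d)\cdot\sum_{t \in q}\left(\mathrm{tf}(t \in d)\cdot\mathrm{idf}(t)^{2}\cdot t.\mathrm{getBoost}()\cdot\mathrm{norm}(t,d)\right)$$
Here score(q,d) is the matching degree. coord(q,d) is a scoring factor based on the number of query terms that appear in the historical data analyses: the more query terms appear there, the higher the matching degree. queryNorm(q) is the normalization factor of the query. tf(t in d) reflects the number of times term t appears in the historical data analyses, and its value is the square root of that count. idf(t) is the inverse analysis frequency, derived from the analysis frequency with which term t appears. t.getBoost() is the query-time weight of the query term, which here is the popularity value of each preset analysis report. norm(t,d) is a length-related weighting factor.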
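Steps S420 to S460 reduce to "score every preset report, reuse the best one if it clears the threshold, otherwise fall back to a fresh query". The sketch below shows that decision together with a deliberately simplified scorer. The scorer keeps only the tf, idf squared, and boost terms of the formula and omits queryNorm, coord, and norm. The idf form 1 + ln(N / (df + 1)) is an assumption here, and all names are hypothetical.

```python
import math
from typing import Optional


def simple_score(query_terms, doc_terms, doc_freq, num_docs, boost=1.0):
    """Simplified TF/IDF score: sum of sqrt(tf) * idf^2 * boost over query terms.
    queryNorm, coord and norm from the full formula are left out."""
    score = 0.0
    for term in set(query_terms):
        freq = doc_terms.count(term)
        if freq == 0:
            continue
        tf = math.sqrt(freq)
        idf = 1.0 + math.log(num_docs / (doc_freq.get(term, 0) + 1))  # assumed idf form
        score += tf * idf ** 2 * boost
    return score


def pick_preset_report(scores: dict, threshold: float) -> Optional[str]:
    """Return the id of the best-matching preset report, or None when no report
    reaches the threshold (the caller then runs the full query and analysis)."""
    if not scores:
        return None
    best_id = max(scores, key=scores.get)
    return best_id if scores[best_id] >= threshold else None


chosen = pick_preset_report({"report-1": 0.4, "report-2": 0.9}, threshold=0.8)
fallback = pick_preset_report({"report-1": 0.4, "report-2": 0.9}, threshold=0.95)
```

Returning `None` instead of raising keeps the fallback path (query, anomaly analysis, report generation) an ordinary branch in the caller.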
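In one embodiment, the preset analysis reports are preset as follows: based on a preset period, count the analysis frequency of the information to be analyzed within the current preset period; when the analysis frequency of the information to be analyzed reaches a preset threshold, preset the analysis report corresponding to the information to be analyzed in the search engine as a preset analysis report.
The preset period can be set according to actual conditions, for example half a month, one month, or one day. The current preset period follows from the preset period: if the preset period is half a month, the current preset period is the current half month; if it is one month, the current month; if it is one day, the current day. The analysis frequency of the information to be analyzed is the number of times that data analysis instructions obtained within the current preset period carry that information. Information to be analyzed that has the same meaning may also be counted as an occurrence of that information. The preset threshold filters out information to be analyzed with a low analysis frequency and keeps information with a high analysis frequency; it can be set according to the scale used to judge whether an analysis frequency is high or low.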
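Counting analysis frequency within the current period can be sketched with a `Counter`. The log format (timestamp plus an already-normalized query string) and the function name are assumptions. Merging queries with the same meaning is assumed to have happened upstream.

```python
from collections import Counter
from datetime import datetime, timedelta


def frequent_queries(log, now, period, threshold):
    """Return {query: count} for queries analyzed at least `threshold` times
    within the window [now - period, now].

    `log` is an iterable of (timestamp, normalized_query) pairs."""
    start = now - period
    counts = Counter(q for ts, q in log if start <= ts <= now)
    return {q: c for q, c in counts.items() if c >= threshold}


now = datetime(2020, 6, 29)
log = [
    (now - timedelta(days=1), "微信近三个月活跃怎么样"),
    (now - timedelta(days=2), "微信近三个月活跃怎么样"),
    (now - timedelta(days=40), "微信近三个月活跃怎么样"),  # outside the window
    (now - timedelta(days=3), "QQ近一周留存怎么样"),
]
to_preset = frequent_queries(log, now, period=timedelta(days=30), threshold=2)
```

The reports that correspond to the returned queries are the candidates to be preset in the search engine.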
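In one embodiment, after the step of presetting the analysis report corresponding to the information to be analyzed in the search engine as a preset analysis report when the analysis frequency reaches the preset threshold, the method further includes: determining the popularity value of the preset analysis report according to the analysis frequency of the corresponding information to be analyzed and the time at which the report was preset in the search engine; and updating the preset analysis reports preset in the search engine according to their popularity values.
The time of presetting in the search engine is the time at which the information to be analyzed was judged to have a high-frequency, popular analysis report. The further that time lies from the current time, the more the popularity value of the preset analysis report is decreased. For example, the value (that is, the popularity value) is value = 16/(Ttoday + 1 - Tcreate), where Ttoday is the current date and Tcreate is the time of presetting in the search engine. The higher the analysis frequency of the information to be analyzed corresponding to the preset analysis report, the more the popularity value is increased; a preset increment can be defined for each additional analysis. The preset analysis reports in the search engine can be updated from their popularity values on a daily T+1 statistical schedule: a report whose popularity value falls below the threshold is removed from the presets, and a report whose popularity value exceeds the threshold is preset as a preset analysis report.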
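The decay formula value = 16/(Ttoday + 1 - Tcreate) translates into a few lines of Python. Two points are assumptions of this sketch rather than statements from the text: dates are measured in whole days, and the per-analysis increment is simply added to the decayed value. The parameter names `extra_hits` and `per_hit_bonus` are hypothetical.

```python
from datetime import date


def popularity_value(created: date, today: date,
                     extra_hits: int = 0, per_hit_bonus: float = 0.0) -> float:
    """value = 16 / (Ttoday + 1 - Tcreate), in days, plus an assumed additive
    bonus for each additional analysis of the same information."""
    age_days = (today - created).days
    if age_days < 0:
        raise ValueError("created date lies in the future")
    return 16.0 / (age_days + 1) + extra_hits * per_hit_bonus


def refresh_presets(created_by_report: dict, today: date, threshold: float) -> list:
    """Keep only the reports whose popularity value is at least the threshold."""
    return [
        report_id
        for report_id, created in created_by_report.items()
        if popularity_value(created, today) >= threshold
    ]


today = date(2020, 6, 29)
kept = refresh_presets(
    {"fresh": date(2020, 6, 28), "stale": date(2020, 6, 1)}, today, threshold=2.0
)
```

A report preset today scores 16, one preset three days ago scores 4, and the value keeps shrinking with age unless further analyses add to it.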
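In one embodiment, the data analysis method based on natural language processing further includes: obtaining the user's degree of satisfaction with the analysis report; when the degree of satisfaction reaches a preset degree of satisfaction, obtaining the user information of the user; identifying, based on the user information, users of the same type whose needs are similar to the user's; obtaining and analyzing the user information of the users of the same type to determine their degree of demand for the analysis report; and when the degree of demand reaches a preset degree of demand, sending the analysis report to the users of the same type.
The user rates the analysis report. The rating may be a score, such as 90 or 80, or a label such as satisfied, dissatisfied, very satisfied, or average. The user's degree of satisfaction with the analysis report is determined from the rating and may itself be satisfied, dissatisfied, very satisfied, average, and so on. The preset degree of satisfaction can be set according to actual conditions, for example satisfied or very satisfied. The user information may include the user's occupation, age, gender, industry, hobbies, and so on. Users of the same type are users whose needs are similar to the user's. For example, if the user who sent the data analysis instruction is a WeChat operations staff member, similar users of the same type may be other WeChat operations staff. An analysis report that the sending user has read and approved is then also recommended to those other staff members, who do not need to go through the above process again to obtain it.
It should be understood that, although the steps in the flowcharts of FIGS. 2-3 are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed one after another, and they may instead be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.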
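In one embodiment, as shown in FIG. 4, a data analysis apparatus based on natural language processing is provided, including a data analysis instruction obtaining module 310, a semantic parsing module 320, a data query module 330, a data analysis module 340, and an analysis report generation module 350, wherein:
the data analysis instruction obtaining module 310 is configured to obtain a data analysis instruction, the data analysis instruction carrying information to be analyzed expressed in natural language;
the semantic parsing module 320 is configured to perform semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure;
the data query module 330 is configured to call the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set;
the data analysis module 340 is configured to perform anomaly analysis on the original data set to obtain a data analysis result;
the analysis report generation module 350 is configured to condense the data analysis result into natural language based on natural language generation technology to generate the analysis report.
In one embodiment, the data analysis module 340 is further configured to: analyze the original data set based on the isolation forest algorithm to obtain anomalous data points; and call the association rule analysis model to perform correlation analysis on the anomalous data points to obtain the data analysis result.
In one embodiment, the data analysis module 340 is further configured to: perform average path analysis on the original data set based on the isolation forest algorithm to obtain the average path length of the original data set; and analyze the average path length together with the expected path length of each item of data in the original data set to determine the anomalous data points.
Referring to FIG. 5, in one embodiment, the data analysis apparatus based on natural language processing further includes a preset analysis report matching module 360, configured to: call the search engine to perform matching-degree analysis between the word segmentation structure and each preset analysis report preset in the search engine to obtain the matching degree of each preset analysis report; when a preset analysis report whose matching degree reaches the preset matching-degree threshold exists, use that preset analysis report as the analysis report corresponding to the information to be analyzed; and when no preset analysis report reaches the preset matching-degree threshold, execute the step of calling the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set.
In one embodiment, the data analysis apparatus based on natural language processing further includes a preset analysis report presetting module 370, configured to: count, based on a preset period, the analysis frequency of the information to be analyzed within the current preset period; and when the analysis frequency of the information to be analyzed reaches a preset threshold, preset the analysis report corresponding to the information to be analyzed in the search engine as a preset analysis report.
In one embodiment, the preset analysis report presetting module 370 is further configured to: determine the popularity value of the preset analysis report according to the analysis frequency of the corresponding information to be analyzed and the time at which the report was preset in the search engine; and update the preset analysis reports preset in the search engine according to their popularity values.
In one embodiment, the data analysis apparatus based on natural language processing further includes an analysis report recommendation module 380, configured to: obtain the user's degree of satisfaction with the analysis report; when the degree of satisfaction reaches a preset degree of satisfaction, obtain the user information of the user; identify, based on the user information, users of the same type whose needs are similar to the user's; obtain and analyze the user information of the users of the same type to determine their degree of demand for the analysis report; and when the degree of demand reaches a preset degree of demand, send the analysis report to the users of the same type.
For the specific limitations of the data analysis apparatus based on natural language processing, refer to the limitations of the data analysis method based on natural language processing above, which are not repeated here. Each module of the apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.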
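In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, and a network interface connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the original data set. The network interface of the computer device communicates with an external terminal over a network connection. When executed by the processor, the computer program implements a data analysis method based on natural language processing.
Those skilled in the art will understand that the structure shown in FIG. 6 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
In one embodiment, a computer device is provided, including a memory and a processor. A computer program is stored in the memory, and the processor implements the following steps when executing the computer program:
obtaining a data analysis instruction that carries information to be analyzed expressed in natural language; performing semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure; calling a search engine to query the corresponding data according to the word segmentation structure to obtain an original data set; performing anomaly analysis on the original data set to obtain a data analysis result; and condensing the data analysis result into natural language based on natural language generation technology to generate an analysis report.
In one embodiment, the processor further implements the following steps when executing the computer program: analyzing the original data set based on the isolation forest algorithm to obtain anomalous data points; and calling an association rule analysis model to perform correlation analysis on the anomalous data points to obtain the data analysis result.
In one embodiment, the processor further implements the following steps when executing the computer program: performing average path analysis on the original data set based on the isolation forest algorithm to obtain the average path length of the original data set; and analyzing the average path length together with the expected path length of each item of data in the original data set to determine the anomalous data points.
In one embodiment, the processor further implements the following steps when executing the computer program: calling the search engine to perform matching-degree analysis between the word segmentation structure and each preset analysis report preset in the search engine to obtain the matching degree of each preset analysis report; when a preset analysis report whose matching degree reaches a preset matching-degree threshold exists, using that preset analysis report as the analysis report corresponding to the information to be analyzed; and when no preset analysis report reaches the preset matching-degree threshold, executing the step of calling the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set.
In one embodiment, the processor further implements the following steps when executing the computer program: counting, based on a preset period, the analysis frequency of the information to be analyzed within the current preset period; and when the analysis frequency of the information to be analyzed reaches a preset threshold, presetting the analysis report corresponding to the information to be analyzed in the search engine as a preset analysis report.
In one embodiment, the processor further implements the following steps when executing the computer program: determining the popularity value of the preset analysis report according to the analysis frequency of the corresponding information to be analyzed and the time at which the report was preset in the search engine; and updating the preset analysis reports preset in the search engine according to their popularity values.
In one embodiment, the processor further implements the following steps when executing the computer program: obtaining the user's degree of satisfaction with the analysis report; when the degree of satisfaction reaches a preset degree of satisfaction, obtaining the user information of the user; identifying, based on the user information, users of the same type whose needs are similar to the user's; obtaining and analyzing the user information of the users of the same type to determine their degree of demand for the analysis report; and when the degree of demand reaches a preset degree of demand, sending the analysis report to the users of the same type.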
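In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the following steps are implemented:
obtaining a data analysis instruction that carries information to be analyzed expressed in natural language; performing semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure; calling a search engine to query the corresponding data according to the word segmentation structure to obtain an original data set; performing anomaly analysis on the original data set to obtain a data analysis result; and condensing the data analysis result into natural language based on natural language generation technology to generate an analysis report.
In one embodiment, when the computer program is executed by the processor, the following steps are also implemented: analyzing the original data set based on the isolation forest algorithm to obtain anomalous data points; and calling an association rule analysis model to perform correlation analysis on the anomalous data points to obtain the data analysis result.
In one embodiment, when the computer program is executed by the processor, the following steps are also implemented: performing average path analysis on the original data set based on the isolation forest algorithm to obtain the average path length of the original data set; and analyzing the average path length together with the expected path length of each item of data in the original data set to determine the anomalous data points.
In one embodiment, when the computer program is executed by the processor, the following steps are also implemented: calling the search engine to perform matching-degree analysis between the word segmentation structure and each preset analysis report preset in the search engine to obtain the matching degree of each preset analysis report; when a preset analysis report whose matching degree reaches a preset matching-degree threshold exists, using that preset analysis report as the analysis report corresponding to the information to be analyzed; and when no preset analysis report reaches the preset matching-degree threshold, executing the step of calling the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set.
In one embodiment, when the computer program is executed by the processor, the following steps are also implemented: counting, based on a preset period, the analysis frequency of the information to be analyzed within the current preset period; and when the analysis frequency of the information to be analyzed reaches a preset threshold, presetting the analysis report corresponding to the information to be analyzed in the search engine as a preset analysis report.
In one embodiment, when the computer program is executed by the processor, the following steps are also implemented: determining the popularity value of the preset analysis report according to the analysis frequency of the corresponding information to be analyzed and the time at which the report was preset in the search engine; and updating the preset analysis reports preset in the search engine according to their popularity values.
In one embodiment, when the computer program is executed by the processor, the following steps are also implemented: obtaining the user's degree of satisfaction with the analysis report; when the degree of satisfaction reaches a preset degree of satisfaction, obtaining the user information of the user; identifying, based on the user information, users of the same type whose needs are similar to the user's; obtaining and analyzing the user information of the users of the same type to determine their degree of demand for the analysis report; and when the degree of demand reaches a preset degree of demand, sending the analysis report to the users of the same type.
A person of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium in the embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, optical storage, and the like. Volatile memory may include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM).
The blockchain referred to in the present invention is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, as long as a combination contains no contradiction, it should be considered within the scope of this specification. The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, all of which fall within its scope of protection. The scope of protection of this patent application shall therefore be determined by the appended claims.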

Claims (20)

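  1. A data analysis method based on natural language processing, wherein the method comprises:
    obtaining a data analysis instruction, the data analysis instruction carrying information to be analyzed expressed in natural language;
    performing semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure;
    calling a search engine to query the corresponding data according to the word segmentation structure to obtain an original data set;
    performing anomaly analysis on the original data set to obtain a data analysis result;
    condensing the data analysis result into natural language based on natural language generation technology to generate an analysis report corresponding to the information to be analyzed.
  2. The method according to claim 1, wherein performing anomaly analysis on the original data set to obtain a data analysis result comprises:
    analyzing the original data set based on the isolation forest algorithm to obtain anomalous data points;
    calling an association rule analysis model to perform correlation analysis on the anomalous data points to obtain the data analysis result.
  3. The method according to claim 2, wherein analyzing the original data set based on the isolation forest algorithm to obtain anomalous data points comprises:
    performing average path analysis on the original data set based on the isolation forest algorithm to obtain the average path length of the original data set;
    analyzing the average path length together with the expected path length of each item of data in the original data set to determine the anomalous data points.
  4. The method according to claim 1, wherein, after the step of performing semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure, the method further comprises:
    calling the search engine to perform matching-degree analysis between the word segmentation structure and each preset analysis report preset in the search engine to obtain the matching degree of each preset analysis report;
    when a preset analysis report whose matching degree reaches a preset matching-degree threshold exists among the preset analysis reports, using the preset analysis report that reaches the preset matching-degree threshold as the analysis report corresponding to the information to be analyzed;
    when the matching degree of none of the preset analysis reports reaches the preset matching-degree threshold, executing the step of calling the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set.
  5. The method according to claim 4, wherein the preset analysis report is preset by:
    counting, based on a preset period, the analysis frequency of the information to be analyzed within the current preset period, to determine the analysis frequency of the information to be analyzed within the current preset period;
    when the analysis frequency of the information to be analyzed reaches a preset threshold, presetting the analysis report corresponding to the information to be analyzed in the search engine as a preset analysis report.
  6. The method according to claim 5, wherein, after the step of presetting the analysis report corresponding to the information to be analyzed in the search engine as a preset analysis report when the analysis frequency of the information to be analyzed reaches the preset threshold, the method further comprises:
    determining the popularity value of the preset analysis report according to the analysis frequency of the information to be analyzed corresponding to the preset analysis report and the time at which it was preset in the search engine;
    updating the preset analysis reports preset in the search engine according to the popularity value of the preset analysis report.
  7. The method according to claim 1, wherein the method further comprises:
    obtaining the user's degree of satisfaction with the analysis report;
    when the degree of satisfaction reaches a preset degree of satisfaction, obtaining the user information of the user;
    identifying, based on the user information, users of the same type whose needs are similar to the user's;
    obtaining and analyzing the user information of the users of the same type to determine the degree of demand of the users of the same type for the analysis report;
    when the degree of demand reaches a preset degree of demand, sending the analysis report to the users of the same type.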
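  8. A data analysis apparatus based on natural language processing, wherein the apparatus comprises:
    a data analysis instruction obtaining module, configured to obtain a data analysis instruction, the data analysis instruction carrying information to be analyzed expressed in natural language;
    a semantic parsing module, configured to perform semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure;
    a data query module, configured to call a search engine to query the corresponding data according to the word segmentation structure to obtain an original data set;
    a data analysis module, configured to perform anomaly analysis on the original data set to obtain a data analysis result;
    an analysis report generation module, configured to condense the data analysis result into natural language based on natural language generation technology to generate an analysis report corresponding to the information to be analyzed.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements a data analysis method based on natural language processing, comprising:
    obtaining a data analysis instruction, the data analysis instruction carrying information to be analyzed expressed in natural language;
    performing semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure;
    calling a search engine to query the corresponding data according to the word segmentation structure to obtain an original data set;
    performing anomaly analysis on the original data set to obtain a data analysis result;
    condensing the data analysis result into natural language based on natural language generation technology to generate an analysis report corresponding to the information to be analyzed.
  10. The computer device according to claim 9, wherein performing anomaly analysis on the original data set to obtain a data analysis result comprises:
    analyzing the original data set based on the isolation forest algorithm to obtain anomalous data points;
    calling an association rule analysis model to perform correlation analysis on the anomalous data points to obtain the data analysis result.
  11. The computer device according to claim 10, wherein analyzing the original data set based on the isolation forest algorithm to obtain anomalous data points comprises:
    performing average path analysis on the original data set based on the isolation forest algorithm to obtain the average path length of the original data set;
    analyzing the average path length together with the expected path length of each item of data in the original data set to determine the anomalous data points.
  12. The computer device according to claim 9, wherein, after the step of performing semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure, the method further comprises:
    calling the search engine to perform matching-degree analysis between the word segmentation structure and each preset analysis report preset in the search engine to obtain the matching degree of each preset analysis report;
    when a preset analysis report whose matching degree reaches a preset matching-degree threshold exists among the preset analysis reports, using the preset analysis report that reaches the preset matching-degree threshold as the analysis report corresponding to the information to be analyzed;
    when the matching degree of none of the preset analysis reports reaches the preset matching-degree threshold, executing the step of calling the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set.
  13. The computer device according to claim 12, wherein the preset analysis report is preset by:
    counting, based on a preset period, the analysis frequency of the information to be analyzed within the current preset period, to determine the analysis frequency of the information to be analyzed within the current preset period;
    when the analysis frequency of the information to be analyzed reaches a preset threshold, presetting the analysis report corresponding to the information to be analyzed in the search engine as a preset analysis report.
  14. The computer device according to claim 13, wherein, after the step of presetting the analysis report corresponding to the information to be analyzed in the search engine as a preset analysis report when the analysis frequency of the information to be analyzed reaches the preset threshold, the method further comprises:
    determining the popularity value of the preset analysis report according to the analysis frequency of the information to be analyzed corresponding to the preset analysis report and the time at which it was preset in the search engine;
    updating the preset analysis reports preset in the search engine according to the popularity value of the preset analysis report.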
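  15. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements a data analysis method based on natural language processing, comprising:
    obtaining a data analysis instruction, the data analysis instruction carrying information to be analyzed expressed in natural language;
    performing semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure;
    calling a search engine to query the corresponding data according to the word segmentation structure to obtain an original data set;
    performing anomaly analysis on the original data set to obtain a data analysis result;
    condensing the data analysis result into natural language based on natural language generation technology to generate an analysis report corresponding to the information to be analyzed.
  16. The computer-readable storage medium according to claim 15, wherein performing anomaly analysis on the original data set to obtain a data analysis result comprises:
    analyzing the original data set based on the isolation forest algorithm to obtain anomalous data points;
    calling an association rule analysis model to perform correlation analysis on the anomalous data points to obtain the data analysis result.
  17. The computer-readable storage medium according to claim 16, wherein analyzing the original data set based on the isolation forest algorithm to obtain anomalous data points comprises:
    performing average path analysis on the original data set based on the isolation forest algorithm to obtain the average path length of the original data set;
    analyzing the average path length together with the expected path length of each item of data in the original data set to determine the anomalous data points.
  18. The computer-readable storage medium according to claim 15, wherein, after the step of performing semantic parsing on the information to be analyzed based on natural language processing to obtain a word segmentation structure, the method further comprises:
    calling the search engine to perform matching-degree analysis between the word segmentation structure and each preset analysis report preset in the search engine to obtain the matching degree of each preset analysis report;
    when a preset analysis report whose matching degree reaches a preset matching-degree threshold exists among the preset analysis reports, using the preset analysis report that reaches the preset matching-degree threshold as the analysis report corresponding to the information to be analyzed;
    when the matching degree of none of the preset analysis reports reaches the preset matching-degree threshold, executing the step of calling the search engine to query the corresponding data according to the word segmentation structure to obtain the original data set.
  19. The computer-readable storage medium according to claim 18, wherein the preset analysis report is preset by:
    counting, based on a preset period, the analysis frequency of the information to be analyzed within the current preset period, to determine the analysis frequency of the information to be analyzed within the current preset period;
    when the analysis frequency of the information to be analyzed reaches a preset threshold, presetting the analysis report corresponding to the information to be analyzed in the search engine as a preset analysis report.
  20. The computer-readable storage medium according to claim 19, wherein, after the step of presetting the analysis report corresponding to the information to be analyzed in the search engine as a preset analysis report when the analysis frequency of the information to be analyzed reaches the preset threshold, the method further comprises:
    determining the popularity value of the preset analysis report according to the analysis frequency of the information to be analyzed corresponding to the preset analysis report and the time at which it was preset in the search engine;
    updating the preset analysis reports preset in the search engine according to the popularity value of the preset analysis report.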
PCT/CN2020/124735 2020-06-29 2020-10-29 基于自然语言处理的数据分析方法、装置和计算机设备 WO2021139343A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010604394.7A CN111753527A (zh) 2020-06-29 2020-06-29 基于自然语言处理的数据分析方法、装置和计算机设备
CN202010604394.7 2020-06-29

Publications (1)

Publication Number Publication Date
WO2021139343A1 true WO2021139343A1 (zh) 2021-07-15

Family

ID=72678387

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124735 WO2021139343A1 (zh) 2020-06-29 2020-10-29 基于自然语言处理的数据分析方法、装置和计算机设备

Country Status (2)

Country Link
CN (1) CN111753527A (zh)
WO (1) WO2021139343A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753527A (zh) * 2020-06-29 2020-10-09 平安科技(深圳)有限公司 基于自然语言处理的数据分析方法、装置和计算机设备
CN112732743B (zh) * 2021-01-12 2023-09-22 北京久其软件股份有限公司 一种基于中文自然语言的数据分析方法及装置
CN113283760B (zh) * 2021-05-31 2023-04-18 浙江环玛信息科技有限公司 案件流程分析报告生成方法及系统
CN115438142B (zh) * 2021-06-02 2023-07-11 戎易商智(北京)科技有限公司 一种对话式交互数据分析报告系统
CN113449509A (zh) * 2021-08-05 2021-09-28 湖南特能博世科技有限公司 文本分析方法、装置及计算机设备
CN114330370B (zh) * 2022-03-17 2022-05-20 天津思睿信息技术有限公司 一种基于人工智能的自然语言处理系统及处理方法
CN115221374B (zh) * 2022-09-20 2022-11-25 华谱科仪(北京)科技有限公司 基于色谱数据分析的推送方法、装置及电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107196953A (zh) * 2017-06-14 2017-09-22 上海丁牛信息科技有限公司 一种基于用户行为分析的异常行为检测方法
CN109388740A (zh) * 2017-08-06 2019-02-26 北京国双科技有限公司 一种网络信息传播效果的监测方法及装置
CN109948669A (zh) * 2019-03-04 2019-06-28 腾讯科技(深圳)有限公司 一种异常数据检测方法及装置
CN110147541A (zh) * 2019-05-23 2019-08-20 北京神州泰岳软件股份有限公司 一种经济报告的生成方法及装置
US10535003B2 (en) * 2013-09-20 2020-01-14 Namesforlife, Llc Establishing semantic equivalence between concepts
CN111753527A (zh) * 2020-06-29 2020-10-09 平安科技(深圳)有限公司 基于自然语言处理的数据分析方法、装置和计算机设备

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020164B (zh) * 2012-11-26 2015-06-10 华北电力大学 一种基于多语义分析和个性化排序的语义检索方法
US20190095444A1 (en) * 2017-09-22 2019-03-28 Amazon Technologies, Inc. Voice driven analytics
CN108241735A (zh) * 2017-12-05 2018-07-03 新华智云科技有限公司 一种数据分析方法及设备
CN109976930A (zh) * 2017-12-28 2019-07-05 腾讯科技(深圳)有限公司 异常数据的检测方法、系统及存储介质

Also Published As

Publication number Publication date
CN111753527A (zh) 2020-10-09

Similar Documents

Publication Publication Date Title
WO2021139343A1 (zh) 基于自然语言处理的数据分析方法、装置和计算机设备
US11449538B2 (en) Method and system for high performance integration, processing and searching of structured and unstructured data
US11334635B2 (en) Domain specific natural language understanding of customer intent in self-help
US10706113B2 (en) Domain review system for identifying entity relationships and corresponding insights
US10331684B2 (en) Generating answer variants based on tables of a corpus
US9318027B2 (en) Caching natural language questions and results in a question and answer system
US9483519B2 (en) Authorship enhanced corpus ingestion for natural language processing
US9785684B2 (en) Determining temporal categories for a domain of content for natural language processing
Chen et al. Mining user requirements to facilitate mobile app quality upgrades with big data
US20160196265A1 (en) Tailoring Question Answer Results to Personality Traits
Penczynski Using machine learning for communication classification
US9720962B2 (en) Answering superlative questions with a question and answer system
US10586174B2 (en) Methods and systems for finding and ranking entities in a domain specific system
US11188819B2 (en) Entity model establishment
US20160196299A1 (en) Determining Answer Stability in a Question Answering System
US20200250212A1 (en) Methods and Systems for Searching, Reviewing and Organizing Data Using Hierarchical Agglomerative Clustering
US11663518B2 (en) Cognitive system virtual corpus training and utilization
US20220358379A1 (en) System, apparatus and method of managing knowledge generated from technical data
CN113095073B (zh) 语料标签生成方法、装置、计算机设备和存储介质
CN113095078A (zh) 关联资产确定方法、装置和电子设备
CN114925185B (zh) 交互方法、模型的训练方法、装置、设备及介质
US20210383256A1 (en) System and method for analyzing crowdsourced input information
Demmelmaier et al. Data Segmentation Using NLP: Gender and Age
CN114840666A (zh) 检索方法、装置、电子设备、存储介质和程序产品

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20912076

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20912076

Country of ref document: EP

Kind code of ref document: A1