CN113157853A - Problem mining method and device, electronic equipment and storage medium - Google Patents


Info

Publication number: CN113157853A
Authority: CN (China)
Prior art keywords: word, vector, initial, word vector, vectors
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202110581186.4A
Other languages: Chinese (zh)
Other versions: CN113157853B (en)
Inventors: 罗霄, 孙泽烨, 陈思姣, 李炫�
Current Assignee: Ping An Life Insurance Company of China Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Ping An Life Insurance Company of China Ltd
Application CN202110581186.4A filed by Ping An Life Insurance Company of China Ltd
Publication of CN113157853A; application granted and published as CN113157853B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/316 Indexing structures (information retrieval of unstructured textual data)
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles (querying)
    • G06F16/35 Clustering; Classification
    • G06F40/126 Character encoding (text processing; use of codes for handling textual entities)
    • G06F40/279 Recognition of textual entities (natural language analysis)
    • G06F2216/03 Data mining (indexing scheme relating to additional aspects of information retrieval)
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention relates to the field of artificial intelligence and discloses a problem mining method comprising the following steps: preprocessing an original text to obtain an initial word set, and calculating the weight of each word of the initial word set in the original text; converting each word in the initial word set into a word vector using a word vector conversion model, and screening sentence vectors that meet preset conditions out of the original text according to the word vectors and the weights to obtain target sentence vectors; identifying the association relations between the target sentence vectors, and constructing a relation network structure of the target sentence vectors from those relations; and calculating the average distance value of each target sentence vector in the relation network structure, and generating the problem mining result of the original text from the average distance values. The invention also provides a problem mining device, an electronic device, and a storage medium. In addition, the invention relates to blockchain technology: the problem mining result may be stored in a blockchain. The invention can improve the accuracy of problem mining.

Description

Problem mining method and device, electronic equipment and storage medium
Technical Field
The invention relates to the field of artificial intelligence, in particular to a problem mining method, a problem mining device, electronic equipment and a computer-readable storage medium.
Background
Problem mining refers to the non-trivial process of revealing implicit, previously unknown, and potentially valuable information from data. It can support business intelligence applications and decision analysis such as customer segmentation, cross-selling, fraud detection, customer churn analysis, and merchandise sales forecasting, and is currently widely used in the banking, finance, medical, industrial, retail, and telecommunications industries. For example, in a telemarketing scenario, a salesperson accumulates a large number of conversation records with customers, and these conversations contain some high-frequency dialogs that express a consistent intent; through problem mining, such dialogs can be used to identify customers' real needs.
At present, problem mining is usually realized with clustering methods, but because of the complexity of the original data, the number of clusters is difficult to set, so problem mining yields ambiguous results and its accuracy suffers.
Disclosure of Invention
The invention provides a problem mining method, a problem mining device, electronic equipment and a computer readable storage medium, and mainly aims to improve the accuracy of problem mining.
In order to achieve the above object, the problem mining method provided by the present invention comprises:
preprocessing data in an original text to obtain an initial word set, and calculating the weight of each word in the initial word set in the original text;
converting each word in the initial word set into a word vector by using a word vector conversion model, and generating a weighted word vector according to the word vector and the weight;
according to the weighted word vector, sentence vectors meeting preset conditions are screened out from the original text to obtain target sentence vectors;
identifying the association relations among the target sentence vectors, and constructing a relation network structure of the target sentence vectors according to the association relations;
and calculating the average distance value of each target sentence vector in the relational network structure, and generating the problem mining result of the original text according to the average distance value.
Optionally, the calculating a weight of each word in the initial set of words in the original text comprises:
sequentially selecting any two words in the initial word set through a traversal operation, and calculating the dependency association degree of the two words in the original text;
calculating the gravitational attraction between the two words according to the dependency association degree;
calculating the association strength of the two words according to the dependency association degree and the gravitational attraction;
and calculating the weight of either of the two words according to the association strength.
Optionally, the converting each word in the initial word set into a word vector using a word vector conversion model includes:
vector coding is carried out on each word in the initial word set by utilizing a coding layer in the word vector conversion model, and an initial word vector is generated;
and performing sequence extraction on the initial word vector by utilizing a feedforward attention mechanism in the word vector conversion model to obtain a word vector.
Optionally, the vector-coding each word in the initial word set by using a coding layer in the word vector conversion model to generate an initial word vector includes:
index coding is carried out on each word in the initial word set by utilizing the coding layer to obtain a word coding index;
and constructing a vector matrix for the word coding index to generate an initial word vector.
Optionally, the performing sequence extraction on the initial word vector by using a feedforward attention mechanism in the word vector conversion model to obtain a word vector includes:
querying the initial word vector using a self-attention module in the feed-forward attention mechanism;
performing feature extraction on the initial word vector by using a convolution module in the feedforward attention mechanism to obtain a feature word vector;
extracting an information sequence of the feature word vector by using an encoder in the feedforward attention mechanism;
and decoding the information sequence by using a decoder in the feedforward attention mechanism to obtain a word vector.
Optionally, the step of screening a sentence vector meeting a preset condition from the original text according to the weighted word vector to obtain a target sentence vector includes:
summarizing the weighted word vectors to generate initial sentence vectors, and calculating the average weight of the initial sentence vectors;
and screening sentence vectors meeting the preset conditions from the initial sentence vectors according to the average weight to obtain target sentence vectors.
Optionally, the identifying an association relationship between the target sentence vectors includes:
extracting a feature word vector in the target sentence vector;
and identifying the node relation among the feature word vectors to obtain the association relation among the corresponding target sentence vectors.
In order to solve the above problem, the present invention also provides a problem mining device, including:
the calculation module is used for preprocessing data in an original text to obtain an initial word set and calculating the weight of each word in the initial word set in the original text;
the conversion module is used for converting each word in the initial word set into a word vector by using a word vector conversion model and generating a weighted word vector according to the word vector and the weight;
the screening module is used for screening sentence vectors meeting preset conditions from the original text according to the weighted word vectors to obtain target sentence vectors;
the building module is used for identifying the incidence relation among the target sentence vectors and building a relation network structure of the target sentence vectors according to the incidence relation;
and the generating module is used for calculating the average distance value of each target sentence vector in the relational network structure and generating the problem mining result of the original text according to the average distance value.
In order to solve the above problem, the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to implement the problem mining method described above.
In order to solve the above problem, the present invention also provides a computer-readable storage medium having at least one computer program stored therein, the at least one computer program being executed by a processor in an electronic device to implement the problem mining method described above.
The embodiment of the invention first preprocesses the data in the original text, which screens out the words of higher importance in the original text and hence the sentences of higher importance, so that the text features of those sentences are more representative. Second, by identifying the association relations between sentence vectors and building the corresponding relation network structure, the embodiment clusters sentences with the same meaning, which determines the number of clusters for subsequent problem mining and improves the sentence mining accuracy for the original text. Further, the embodiment generates the problem mining result of the text from the average distance values of the sentence vectors in the relation network structure, selecting the sentence vector with the smallest average distance value, which identifies the standard sentence in the original text and improves the accuracy with which standard sentences are mined. Therefore, the problem mining method, device, electronic device, and storage medium proposed here can improve the accuracy of problem mining.
Drawings
FIG. 1 is a schematic flow chart of a problem mining method according to an embodiment of the present invention;
FIG. 2 is a detailed flowchart illustrating a step of the problem mining method provided in FIG. 1 according to a first embodiment of the present invention;
FIG. 3 is a block diagram of a problem-mining device according to an embodiment of the present invention;
fig. 4 is a schematic internal structural diagram of an electronic device implementing a problem mining method according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a problem mining method. The execution subject of the problem mining method includes, but is not limited to, at least one of the electronic devices, such as a server or a terminal, that can be configured to execute the method provided by the embodiments of the present application. In other words, the problem mining method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server, a cloud server cluster, and the like.
Fig. 1 is a schematic flow chart of a problem mining method according to an embodiment of the present invention. In an embodiment of the present invention, the problem mining method includes:
s1, preprocessing data in the original text to obtain an initial word set, and calculating the weight of each word in the initial word set in the original text.
In an embodiment of the present invention, the original text can be understood as dialog text generated by collecting the questions raised by customers in a business scenario; for example, in insurance business consultation these questions concern the type of insurance product, its price, its effective time, and so on. Further, it should be understood that the data generated in a real service scenario is complex and voluminous, so in order to analyze and process the original text better, the present invention cleans the data of the original text and deletes useless data in it, improving the efficiency of subsequent text processing.
In detail, the preprocessing the data in the original text to obtain an initial word set includes: and segmenting the data in the original text by using a segmentation algorithm to obtain an original word set, and deleting stop words in the original word set by using a preset stop word list to obtain an initial word set.
In an optional embodiment, the word segmentation algorithm may be the well-known jieba segmentation algorithm, and the stop word list may be downloaded from a professional website, such as the Harbin Institute of Technology (HIT) stop word list.
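The preprocessing step above (segmentation followed by stop-word removal) can be sketched as follows. This is an illustrative simplification: a regex tokenizer and a tiny hard-coded stop-word set stand in for a real segmenter such as jieba and a published list such as the HIT stop word list, which would be needed for Chinese text.

```python
import re

# Hypothetical stop-word set; in practice this is loaded from a stop word list.
STOPWORDS = {"the", "a", "of", "to", "for"}

def preprocess(text: str) -> list[str]:
    """Tokenize the raw text and drop stop words to form the initial word set.

    A regex tokenizer stands in for a proper segmentation algorithm here.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

words = preprocess("The price of the insurance product")
```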
Further, the calculating of the weight of each word in the initial word set in the original text includes: sequentially selecting any two words in the initial word set through a traversal operation, and calculating the dependency association degree of the two words in the original text; calculating the gravitational attraction between the two words according to the dependency association degree; calculating the association strength of the two words according to the dependency association degree and the gravitational attraction; and calculating the weight of either of the two words according to the association strength.
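The patent names the quantities involved (dependency association degree, gravitational attraction, association strength) but does not give their formulas, so the sketch below uses illustrative assumptions: co-occurrence within a small window weighted by inverse distance for the dependency degree, a physics-style freq(a)·freq(b)/d² term for the attraction, and their product for the association strength, normalized into per-word weights.

```python
from collections import Counter
from itertools import combinations

def word_weights(words: list[str], window: int = 2) -> dict[str, float]:
    """Sketch of the weighting step; the exact formulas are assumptions."""
    freq = Counter(words)
    strength = Counter()
    for i, j in combinations(range(len(words)), 2):
        d = j - i
        if d > window:
            continue  # only word pairs within the co-occurrence window
        a, b = words[i], words[j]
        dep = 1.0 / d                      # dependency association degree
        grav = freq[a] * freq[b] / d ** 2  # gravitational attraction
        s = dep * grav                     # association strength
        strength[a] += s
        strength[b] += s
    total = sum(strength.values()) or 1.0
    return {w: strength[w] / total for w in freq}
```

On this toy input, a word that recurs near others (here "price") accumulates the largest weight.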
And S2, converting each word in the initial word set into a word vector by using a word vector conversion model, and generating a weighted word vector according to the word vector and the weight.
In this embodiment of the present invention, the word vector conversion model is constructed with a BERT network and is configured to convert each word in the initial word set into a word vector, where a word vector is a vector to which a word or phrase from the vocabulary is mapped as real numbers.
Referring to fig. 2, the converting each word in the initial word set into a word vector using a word vector conversion model includes:
s20, carrying out vector coding on each word in the initial word set by utilizing a coding layer in the word vector conversion model to generate an initial word vector;
and S21, performing sequence extraction on the initial word vector by using a feedforward attention mechanism in the word vector conversion model to obtain a word vector.
In an alternative embodiment, the S20 includes: performing index coding on each word in the initial word set using the coding layer to obtain a word coding index; and constructing a vector matrix for the word coding index to generate an initial word vector. The index coding is set according to different user requirements; for example, if the initial words are "deep", "in", and "learning", index coding with 0, 1, and 2 gives "deep" the coding index 0, "in" the coding index 1, and "learning" the coding index 2. The vector matrix is constructed according to the latent factor data corresponding to the initial word; for example, if an initial word has 3 latent factors, the corresponding vector matrix is constructed as a 1 × 3 matrix. A latent factor refers to a feature node value of the word; for example, for the word "deep", the corresponding latent factors include height, depth, thickness, and so on.
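The index coding and vector-matrix construction above can be sketched as below. The placeholder vectors are assumptions for illustration: a real coding layer (e.g. in BERT) would look indices up in a learned embedding matrix rather than fill them deterministically.

```python
def build_index(words: list[str]) -> dict[str, int]:
    """Assign each distinct word an integer coding index in first-seen
    order, mirroring the 'deep' -> 0, 'in' -> 1, 'learning' -> 2 example."""
    index: dict[str, int] = {}
    for w in words:
        index.setdefault(w, len(index))
    return index

def initial_word_vectors(words: list[str], n_factors: int = 3) -> dict[str, list[float]]:
    """Map each word's coding index to a 1 x n_factors vector matrix.
    Real models learn these latent factors; placeholders are used here."""
    index = build_index(words)
    return {w: [float(i)] * n_factors for w, i in index.items()}
```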
In an alternative embodiment, the feedforward attention mechanism is composed of N identical Feed Forward Transformer blocks (FFTs), and each FFT includes an encoder, a decoder, a self-attention module, and a convolution module. In detail, the S21 includes: and querying the initial word vector by using a self-attention module in the feedforward attention mechanism, performing feature extraction on the initial word vector by using a convolution module in the feedforward attention mechanism to obtain a feature word vector, extracting an information sequence of the feature word vector by using an encoder in the feedforward attention mechanism, and decoding the information sequence by using a decoder in the feedforward attention mechanism to obtain a word vector.
Further, it should be understood that a word vector can represent the semantic context well but cannot reflect the importance of the word to text classification, whereas the weight can represent the word's importance to the text. The present invention therefore weights each word vector by its weight to generate a weighted word vector, so that the weighted word vector represents the text better.
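Combining the two signals amounts to scaling each word vector by the word's scalar weight from step S1; a minimal sketch, assuming vectors are plain lists of floats:

```python
def weight_word_vectors(
    word_vectors: dict[str, list[float]],
    weights: dict[str, float],
) -> dict[str, list[float]]:
    """Scale each word vector element-wise by the word's weight,
    producing the weighted word vectors used for sentence screening."""
    return {w: [weights[w] * x for x in vec] for w, vec in word_vectors.items()}
```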
Furthermore, because a sentence is generated from different word combinations, the sentence vectors meeting the preset conditions are screened out of the original text according to the weighted word vectors, laying the groundwork for the subsequent mining of standard problems from the text.
In detail, the step of screening sentence vectors meeting preset conditions from the original text according to the weighted word vectors to obtain target sentence vectors includes: summarizing the weighted word vectors to generate initial sentence vectors, calculating the average weight of the initial sentence vectors, and screening out the sentence vectors meeting the preset conditions from the initial sentence vectors according to the average weight to obtain target sentence vectors.
The average weight is an average value of weights corresponding to all word vectors in the initial sentence vector, and the preset condition may be that the average weight is greater than a preset weight, for example, if the average weight of the initial sentence vector is 0.75 and the preset weight is 0.7, the initial sentence vector is used as the target sentence vector. Optionally, the preset weight is set to 0.6, and may also be set according to an actual service scenario.
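The screening rule described above (keep a sentence if the mean of its words' weights exceeds a preset weight, 0.6 by default) can be sketched as follows, with sentences represented simply as word lists:

```python
def screen_sentences(
    sentences: list[list[str]],
    weights: dict[str, float],
    preset_weight: float = 0.6,
) -> list[list[str]]:
    """Keep each sentence whose average word weight exceeds the preset
    weight; the survivors correspond to the target sentence vectors."""
    targets = []
    for sent in sentences:
        avg = sum(weights.get(w, 0.0) for w in sent) / len(sent)
        if avg > preset_weight:
            targets.append(sent)
    return targets
```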
S3, identifying the incidence relation among the target sentence vectors, and constructing a relation network structure of the target sentence vectors according to the incidence relation.
It should be understood that there will be similar sentence vectors among the target sentence vectors, i.e. pairs of sentence vectors whose expressed meanings are similar. For example, sentence vector A is "I would like to ask about the conditions for an enterprise loan" and sentence vector B is "Please help me check whether my conditions meet the enterprise loan requirements"; the meaning of both A and B is consulting the enterprise loan conditions, so A and B are similar sentence vectors. By identifying the association relations of the target sentence vectors, the embodiment of the invention makes the sentence vectors with the same meaning more intuitively identifiable.
In detail, the identifying the association relationship between the target sentence vectors includes: and extracting the feature word vectors in the target sentence vectors, and identifying the node relation among the feature word vectors to obtain the association relation among the corresponding target sentence vectors.
In an alternative embodiment, the feature word vector is used to characterize the meaning of the target sentence vector, and the extraction of the feature word vector may be performed according to the weights of the corresponding words, such as the first three word vectors with the highest weights in a sentence.
In an optional embodiment, the node relationship may be identified by a community relationship identification algorithm (e.g., Louvain algorithm).
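The grouping that the community relationship identification step produces can be illustrated with a deliberately simplified stand-in: connect sentence vectors whose cosine similarity exceeds a threshold and take the connected groups via union-find. This is not the Louvain algorithm named above (which optimizes modularity on a weighted graph), and the 0.8 threshold is an assumption.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def cluster_sentences(vectors: list[list[float]], threshold: float = 0.8) -> list[list[int]]:
    """Union-find grouping of sentence vectors by cosine similarity;
    a simplified stand-in for community detection such as Louvain."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) > threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups: dict[int, list[int]] = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```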
Further, the relation network structure of the target sentence vectors is constructed according to the association relations; that is, the target sentence vectors that have association relations are clustered together, so as to display the relationships between sentence vectors intuitively and lay the groundwork for the subsequent problem mining of the original text.
Further, the embodiment of the present invention uses currently known translation-based embedding models (Trans models) to realize the relation network structure of the target sentence vectors. The Trans models include: translating embeddings for multi-relational data (TransE), knowledge embedding on hyperplanes (TransH), separate embedding of entities and relations (TransR), embedding via a dynamic mapping matrix (TransD), and an adaptive metric function (TransA). Entity-relation vector modeling with Trans models is a mature technology and is not described further here.
S4, calculating the average distance value of each target sentence vector in the relational network structure, and generating the problem mining result of the original text according to the average distance value.
In the embodiment of the present invention, the standard problem in the original text is found by calculating the average distance value of each target sentence vector in the relation network structure. Each sentence vector in the relation network structure has a distance value to each of the remaining sentence vectors, so the present invention takes the mean of a sentence vector's distances to the remaining sentence vectors as that vector's average distance value. For example, if the relation network structure contains sentence vectors A, B, and C, the distance value of AB is a1, and the distance value of AC is a2, then the average distance value of sentence vector A is (a1 + a2) / 2.
Further, the present invention generates the problem mining result of the original text according to the average distance values; that is, the target sentence vector with the smallest average distance value is taken as the problem mining result. For example, when the relation network structure contains sentence vectors A, B, and C and the average distance value of vector A is smallest, the text sentence corresponding to vector A is taken as the problem mining result of the original text.
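The selection step above can be sketched directly: compute every vector's mean distance to the others and return the index of the minimum. Euclidean distance is an assumption, since the patent does not name the distance metric.

```python
import math

def standard_question(vectors: list[list[float]]) -> int:
    """Return the index of the sentence vector with the smallest average
    distance to the other vectors in its relation network, i.e. the
    mined standard question."""
    def dist(u: list[float], v: list[float]) -> float:
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    n = len(vectors)
    avg = [
        sum(dist(vectors[i], vectors[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    return min(range(n), key=avg.__getitem__)
```

With A = (0, 0), B = (2, 0), C = (0, 2), vector A is closest on average to the rest, matching the worked example in the text.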
Further, in order to ensure privacy and security of the problem mining result, the problem mining result can also be stored in a blockchain node.
The embodiment of the invention first preprocesses the data in the original text, which screens out the words of higher importance in the original text and hence the sentences of higher importance, so that the text features of those sentences are more representative. Second, by identifying the association relations between sentence vectors and building the corresponding relation network structure, the embodiment clusters sentences with the same meaning, which determines the number of clusters for subsequent problem mining and improves the sentence mining accuracy for the original text. Further, the embodiment generates the problem mining result of the text from the average distance values of the sentence vectors in the relation network structure, selecting the sentence vector with the smallest average distance value, which identifies the standard sentence in the original text and improves the accuracy with which standard sentences are mined. Therefore, the problem mining method provided by the invention can improve the accuracy of problem mining.
Fig. 3 is a functional block diagram of the problem mining device according to the present invention.
The problem mining device 100 according to the present invention may be installed in an electronic device. According to the implemented functions, the problem mining device may include a calculation module 101, a conversion module 102, a screening module 103, a construction module 104, and a generation module 105. A module according to the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device, can perform a fixed function, and are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the calculation module 101 is configured to pre-process data in an original text to obtain an initial word set, and calculate a weight of each word in the initial word set in the original text;
the conversion module 102 is configured to convert each word in the initial word set into a word vector by using a word vector conversion model, and generate a weighted word vector according to the word vector and the weight;
the screening module 103 is configured to screen sentence vectors meeting preset conditions from the original text according to the weighted word vectors to obtain target sentence vectors;
the constructing module 104 is configured to identify an association relationship between the target sentence vectors, and construct a relationship network structure of the target sentence vectors according to the association relationship;
the generating module 105 is configured to calculate an average distance value of each target sentence vector in the relational network structure, and generate a problem mining result of the original text according to the average distance value.
In detail, when the modules in the problem mining device 100 according to the embodiment of the present invention are used, the same technical means as the problem mining method described in fig. 1 and fig. 2 are used, and the same technical effects can be produced, and no further description is given here.
Fig. 4 is a schematic structural diagram of an electronic device for implementing the problem mining method according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a problem-mining program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of the problem mining program 12, but also to temporarily store data that has been output or is to be output.
In some embodiments, the processor 10 may be composed of integrated circuits, for example, a single packaged integrated circuit, or a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit of the electronic device: it connects the components of the whole electronic device through various interfaces and lines, and executes the various functions of the electronic device 1 and processes its data by running or executing the programs or modules stored in the memory 11 (e.g., performing problem mining) and calling the data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. The bus is arranged to enable communication between the memory 11, the at least one processor 10, and the other components.
Fig. 4 shows an electronic device with only some of its components; it will be understood by those skilled in the art that the structure shown in fig. 4 does not limit the electronic device 1, which may comprise fewer or more components than shown, combine certain components, or arrange the components differently.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component. Preferably, the power supply is logically connected to the at least one processor 10 through a power management device, which implements functions such as charge management, discharge management, and power consumption management. The power supply may further include one or more DC or AC power sources, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and other components. The electronic device 1 may also include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described here again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may include a display and an input unit such as a keyboard, and optionally a standard wired interface and a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used to display information processed in the electronic device 1 and to present a visualized user interface.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The problem-mining program 12 stored in the memory 11 of the electronic device 1 is a combination of computer instructions which, when executed by the processor 10, can implement:
preprocessing data in an original text to obtain an initial word set, and calculating the weight of each word in the initial word set in the original text;
converting each word in the initial word set into a word vector by using a word vector conversion model, and generating a weighted word vector according to the word vector and the weight;
screening out, according to the weighted word vectors, sentence vectors meeting a preset condition from the original text to obtain target sentence vectors;
identifying the association relation among the target sentence vectors, and constructing a relation network structure of the target sentence vectors according to the association relation;
and calculating the average distance value of each target sentence vector in the relational network structure, and generating the problem mining result of the original text according to the average distance value.
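The first three of these steps can be sketched end to end in a minimal form. The term-frequency weights and seeded random embeddings below are stand-ins for the patent's dependency-based weights and its word vector conversion model, and the sentence strings are hypothetical:

```python
import random

def mine_candidates(sentences, dim=8, seed=0):
    rng = random.Random(seed)
    # Step 1: preprocess into an initial word set and compute word weights
    # (plain term frequency here; the patent derives weights from the
    # dependency association degree, gravity, and association strength).
    words = [w for s in sentences for w in s.split()]
    weights = {w: words.count(w) / len(words) for w in set(words)}
    # Step 2: convert each word to a vector (seeded random embeddings
    # stand in for the word vector conversion model) and weight it.
    embed = {w: [rng.gauss(0, 1) for _ in range(dim)] for w in sorted(weights)}
    weighted = {w: [weights[w] * x for x in embed[w]] for w in embed}
    # Step 3: sentence vector = mean of weighted word vectors; screen by
    # average word weight against a preset condition (the mean here).
    sent_vecs, avg_ws = [], []
    for s in sentences:
        toks = s.split()
        vec = [sum(weighted[w][k] for w in toks) / len(toks) for k in range(dim)]
        sent_vecs.append(vec)
        avg_ws.append(sum(weights[w] for w in toks) / len(toks))
    threshold = sum(avg_ws) / len(avg_ws)
    targets = [i for i, a in enumerate(avg_ws) if a >= threshold]
    return targets, sent_vecs

targets, vecs = mine_candidates(
    ["refund delayed again", "refund delayed", "nice weather"])
```

Steps four and five would then link the returned target sentence vectors into a relation network and rank them by average distance.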
Specifically, for the implementation of the computer program by the processor 10, reference may be made to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not repeated here.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, it may be any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
preprocessing data in an original text to obtain an initial word set, and calculating the weight of each word in the initial word set in the original text;
converting each word in the initial word set into a word vector by using a word vector conversion model, and generating a weighted word vector according to the word vector and the weight;
screening out, according to the weighted word vectors, sentence vectors meeting a preset condition from the original text to obtain target sentence vectors;
identifying the association relation among the target sentence vectors, and constructing a relation network structure of the target sentence vectors according to the association relation;
and calculating the average distance value of each target sentence vector in the relational network structure, and generating the problem mining result of the original text according to the average distance value.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each block contains a batch of network transaction information used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
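As a minimal illustration of the linkage described above, each block can store the hash of its predecessor, so that tampering with any block breaks the chain. SHA-256 hashing and JSON serialization are assumed choices here, not requirements of the patent:

```python
import hashlib
import json
import time

def make_block(transactions, prev_hash):
    # Each block carries a batch of transaction information plus the
    # cryptographic hash of the previous block.
    block = {
        "timestamp": time.time(),
        "transactions": transactions,
        "prev_hash": prev_hash,
    }
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

genesis = make_block(["init"], "0" * 64)
block1 = make_block(["tx1", "tx2"], genesis["hash"])
```

Because block1 embeds the hash of genesis, altering genesis changes its hash and invalidates the link, which is what makes the stored information verifiable.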
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or devices recited in the system claims may also be implemented by one unit or device in software or hardware. Terms such as first and second are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope.

Claims (10)

1. A problem mining method, characterized in that the method comprises:
preprocessing data in an original text to obtain an initial word set, and calculating the weight of each word in the initial word set in the original text;
converting each word in the initial word set into a word vector by using a word vector conversion model, and generating a weighted word vector according to the word vector and the weight;
screening out, according to the weighted word vectors, sentence vectors meeting a preset condition from the original text to obtain target sentence vectors;
identifying the association relation among the target sentence vectors, and constructing a relation network structure of the target sentence vectors according to the association relation;
and calculating the average distance value of each target sentence vector in the relational network structure, and generating the problem mining result of the original text according to the average distance value.
2. The problem mining method of claim 1, wherein said calculating a weight of each word in said initial set of words in said original text comprises:
sequentially selecting any two words in the initial word set through a traversal operation, and calculating the dependency association degree of the two words in the original text;
calculating the gravity of the two words according to the dependency association degree;
calculating the association strength of the two words according to the dependency association degree and the gravity;
and calculating the weight of any one of the two words according to the association strength.
3. The problem mining method of claim 1, wherein said converting each word in said initial set of words into a word vector using a word vector conversion model comprises:
performing vector coding on each word in the initial word set by using a coding layer in the word vector conversion model to generate an initial word vector;
and performing sequence extraction on the initial word vector by utilizing a feedforward attention mechanism in the word vector conversion model to obtain a word vector.
4. The problem mining method of claim 3, wherein said vector-coding each word in said initial set of words using a coding layer in said word vector transformation model to generate an initial word vector comprises:
performing index coding on each word in the initial word set by using the coding layer to obtain a word coding index;
and constructing a vector matrix for the word coding index to generate an initial word vector.
5. The problem mining method of claim 3, wherein said extracting the sequence of said initial word vectors using a feed-forward attention mechanism in said word vector transformation model to obtain word vectors comprises:
querying the initial word vector using a self-attention module in the feed-forward attention mechanism;
performing feature extraction on the initial word vector by using a convolution module in the feedforward attention mechanism to obtain a feature word vector;
extracting an information sequence of the feature word vector by using an encoder in the feedforward attention mechanism;
and decoding the information sequence by using a decoder in the feedforward attention mechanism to obtain a word vector.
6. The problem mining method of claim 1, wherein the step of screening sentence vectors satisfying a preset condition from the original text according to the weighted word vectors to obtain target sentence vectors comprises:
summarizing the weighted word vectors to generate initial sentence vectors, and calculating the average weight of the initial sentence vectors;
and screening sentence vectors meeting the preset conditions from the initial sentence vectors according to the average weight to obtain target sentence vectors.
7. The problem mining method according to any one of claims 1 to 6, wherein the identifying the association between the target sentence vectors comprises:
extracting a feature word vector in the target sentence vector;
and identifying the node relation among the feature word vectors to obtain the association relation among the corresponding target sentence vectors.
8. A problem-mining device, the device comprising:
the calculation module is used for preprocessing data in an original text to obtain an initial word set and calculating the weight of each word in the initial word set in the original text;
the conversion module is used for converting each word in the initial word set into a word vector by using a word vector conversion model and generating a weighted word vector according to the word vector and the weight;
the screening module is used for screening sentence vectors meeting preset conditions from the original text according to the weighted word vectors to obtain target sentence vectors;
the building module is used for identifying the association relation among the target sentence vectors and building a relation network structure of the target sentence vectors according to the association relation;
and the generating module is used for calculating the average distance value of each target sentence vector in the relational network structure and generating the problem mining result of the original text according to the average distance value.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the problem-mining method of any one of claims 1 to 7.
10. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the problem mining method of any one of claims 1 to 7.
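The weight computation of claim 2 can be sketched as follows. The average-token-distance definition of the dependency association degree and the Newton-style gravity and association-strength formulas are assumptions in the spirit of word-attraction models, not the patent's exact formulas, and the input tokens are hypothetical:

```python
from itertools import combinations

def word_weights(tokens):
    freq = {w: tokens.count(w) for w in set(tokens)}
    strength = {w: 0.0 for w in freq}
    # Traverse every pair of distinct words (claim 2's traversal operation).
    for w1, w2 in combinations(sorted(freq), 2):
        # Dependency association degree: average token distance between
        # occurrences (an assumption; the patent derives it from the
        # dependency structure of the original text).
        dists = [abs(i - j)
                 for i, a in enumerate(tokens) if a == w1
                 for j, b in enumerate(tokens) if b == w2]
        if not dists:
            continue
        d = sum(dists) / len(dists)
        # Gravity of the two words, by analogy with Newtonian attraction.
        gravity = freq[w1] * freq[w2] / (d * d)
        # Association strength combines the association degree and gravity.
        pair_strength = gravity / d
        strength[w1] += pair_strength
        strength[w2] += pair_strength
    total = sum(strength.values()) or 1.0
    # A word's weight is its normalized accumulated association strength.
    return {w: s / total for w, s in strength.items()}

weights = word_weights("refund delayed refund lost".split())
```

Words that occur often and close to their partners accumulate high association strength, so frequent, tightly coupled terms receive the largest weights.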
CN202110581186.4A 2021-05-27 2021-05-27 Problem mining method, device, electronic equipment and storage medium Active CN113157853B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110581186.4A CN113157853B (en) 2021-05-27 2021-05-27 Problem mining method, device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113157853A true CN113157853A (en) 2021-07-23
CN113157853B CN113157853B (en) 2024-02-06

Family

ID=76877755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110581186.4A Active CN113157853B (en) 2021-05-27 2021-05-27 Problem mining method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113157853B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109960792A (en) * 2017-12-25 2019-07-02 上海智臻智能网络科技股份有限公司 Opinion mining method and storage medium, terminal
CN110502748A (en) * 2019-07-19 2019-11-26 平安科技(深圳)有限公司 Text subject abstracting method, device and computer readable storage medium
US20190377793A1 (en) * 2018-06-07 2019-12-12 Alibaba Group Holding Limited Method and apparatus for establishing a hierarchical intent system
CN111104799A (en) * 2019-10-16 2020-05-05 中国平安人寿保险股份有限公司 Text information representation method and system, computer equipment and storage medium
CN111753498A (en) * 2020-08-10 2020-10-09 腾讯科技(深圳)有限公司 Text processing method, device, equipment and storage medium


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704411A (en) * 2021-08-31 2021-11-26 平安银行股份有限公司 Word vector-based similar passenger group mining method, device, equipment and storage medium
CN113704411B (en) * 2021-08-31 2023-09-15 平安银行股份有限公司 Word vector-based similar guest group mining method, device, equipment and storage medium
CN114757154A (en) * 2022-06-13 2022-07-15 深圳市承儒科技有限公司 Job generation method, device and equipment based on deep learning and storage medium

Also Published As

Publication number Publication date
CN113157853B (en) 2024-02-06

Similar Documents

Publication Publication Date Title
US11232365B2 (en) Digital assistant platform
CN112541338A (en) Similar text matching method and device, electronic equipment and computer storage medium
CN112541745A (en) User behavior data analysis method and device, electronic equipment and readable storage medium
CN113157853B (en) Problem mining method, device, electronic equipment and storage medium
CN114491047A (en) Multi-label text classification method and device, electronic equipment and storage medium
CN114781832A (en) Course recommendation method and device, electronic equipment and storage medium
CN114638547A (en) Enterprise strategy intelligent early warning method and device, electronic equipment and storage medium
CN114840684A (en) Map construction method, device and equipment based on medical entity and storage medium
CN113806492A (en) Record generation method, device and equipment based on semantic recognition and storage medium
CN113658002A (en) Decision tree-based transaction result generation method and device, electronic equipment and medium
CN113505273A (en) Data sorting method, device, equipment and medium based on repeated data screening
CN116578696A (en) Text abstract generation method, device, equipment and storage medium
CN116340516A (en) Entity relation cluster extraction method, device, equipment and storage medium
CN114610854A (en) Intelligent question and answer method, device, equipment and storage medium
CN114996386A (en) Business role identification method, device, equipment and storage medium
CN114548114A (en) Text emotion recognition method, device, equipment and storage medium
CN113626605A (en) Information classification method and device, electronic equipment and readable storage medium
CN113723114A (en) Semantic analysis method, device and equipment based on multi-intent recognition and storage medium
CN113706019A (en) Service capability analysis method, device, equipment and medium based on multidimensional data
CN113591881A (en) Intention recognition method and device based on model fusion, electronic equipment and medium
CN113657546A (en) Information classification method and device, electronic equipment and readable storage medium
CN113515591A (en) Text bad information identification method and device, electronic equipment and storage medium
CN112328960B (en) Optimization method and device for data operation, electronic equipment and storage medium
CN113688924B (en) Abnormal order detection method, device, equipment and medium
CN116483974A (en) Dialogue reply screening method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant