CN111476669A - Data analysis method and device - Google Patents


Info

Publication number
CN111476669A
Authority
CN
China
Prior art keywords
vector
data
analysis result
analyzed
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010222269.XA
Other languages
Chinese (zh)
Inventor
杨斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Shiwei Network Technology Co ltd
Original Assignee
Hangzhou Shiwei Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Shiwei Network Technology Co ltd
Priority to CN202010222269.XA
Publication of CN111476669A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data analysis method and device. The method comprises: obtaining data to be analyzed of a user; identifying keywords in the data to be analyzed and obtaining a corresponding target vector; determining, in a pre-generated vector space, a similar vector corresponding to the target vector; determining at least one analysis result based on the similar vector; and outputting the at least one analysis result. With this method, analysis results highly similar to the user's data to be analyzed are obtained and sent to the customer service staff, reminding them of the points the user is concerned about so that they can answer accordingly. This shortens the reply time of the customer service staff and improves their working efficiency.

Description

Data analysis method and device
Technical Field
The invention relates to the technical field of insurance, in particular to a data analysis method and device.
Background
As living standards improve, people are increasingly aware of the need to purchase insurance, and more and more of them consult insurance knowledge and purchase insurance products over the internet. Insurance products on the market iterate very quickly, and the number of users seeking consultation keeps growing. However, when a user asks about a particular insurance question or insurance product, the customer service staff take a long time to respond, and their working efficiency is low.
Disclosure of Invention
The embodiment of the invention provides a data analysis method and device, which aim to solve the prior-art problems of long response times and low working efficiency of customer service staff.
In order to solve the technical problem, the invention is realized as follows:
In a first aspect, a data analysis method is provided, which includes:
acquiring data to be analyzed of a user;
identifying keywords in the data to be analyzed and obtaining corresponding target vectors;
determining a similar vector corresponding to the target vector in a vector space generated in advance;
determining at least one analysis result based on the similar vector, and outputting the at least one analysis result.
In a second aspect, a data analysis method is provided, which includes:
receiving at least one analysis result corresponding to the data to be analyzed, wherein the at least one analysis result is determined by a similar vector corresponding to a target vector, and the target vector is determined by a keyword corresponding to the data to be analyzed of the user.
In a third aspect, there is provided a data analysis apparatus, comprising:
the first acquisition module is used for acquiring data to be analyzed of a user;
the first identification module is used for identifying keywords in the data to be analyzed and obtaining corresponding target vectors;
a determining module, configured to determine a similar vector corresponding to the target vector in a vector space generated in advance;
an output module for determining at least one analysis result based on the similar vector and outputting the at least one analysis result.
In a fourth aspect, there is provided a data analysis apparatus, the apparatus comprising:
the first receiving module is used for receiving at least one analysis result corresponding to the data to be analyzed, the at least one analysis result is determined through a similar vector corresponding to a target vector, and the target vector is determined through a keyword corresponding to the data to be analyzed of a user.
In a fifth aspect, an electronic device is provided, comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method according to the first aspect.
In a sixth aspect, an electronic device is provided, comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method according to the second aspect.
In a seventh aspect, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, realizes the steps of the method according to the first aspect.
In an eighth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, realizes the steps of the method according to the second aspect.
In the embodiment of the invention, keywords in the user's data to be analyzed are identified to obtain the target vector of the keywords; the target vector is matched against the vectors in the pre-generated vector space to obtain the similar vector corresponding to the target vector; and at least one analysis result is determined from the similar vector and output to the customer service staff. Information for replying to the data to be analyzed can be obtained from the at least one analysis result, and the customer service staff are reminded of the points the user is concerned about, which shortens their reply time, improves working efficiency, and reduces user loss.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic block diagram of a data analysis system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of analysis results obtained from an age-based similar vector according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of analysis results obtained from a similar vector corresponding to a product according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the predicted words presented when a product is consulted, according to one embodiment of the present invention;
FIG. 5 is a schematic flow diagram of a data analysis method according to an embodiment of the invention;
FIG. 6 is a schematic flow diagram of a data analysis method according to yet another embodiment of the invention;
FIG. 7 is a schematic structural diagram of a data analysis device according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a data analysis device according to another embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an electronic device according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic structural diagram of a data analysis system according to an embodiment of the present invention. The system includes a user side, a server side, and a customer service side, and may be applied in the customer service field. A user at the user side initiates a query. After receiving the query, the server side both forwards it to the customer service side and analyzes it: the server side determines the keywords in the query, matches a reply (analysis result) corresponding to the query according to the keywords, and sends that reply to the customer service side as a reference for the customer service staff there. The system can thus provide the customer service staff with at least one analysis result as reference data for a reply, improving both the speed and the quality of their replies.
Specifically, as shown in fig. 1, the method includes:
Step S102, obtaining data to be analyzed of a user.
The data to be analyzed can be understood as a query raised by a user at the user side, and may be acquired from the user side.
For example, the data to be analyzed of the user may be "can you introduce peaceful to me", "how is the Jin You Life product", or "what kind of insurance is suitable for a person aged 35".
Step S104, identifying keywords in the data to be analyzed, and obtaining corresponding target vectors.
In step S104, a BERT semantic model may be used to recognize the data to be analyzed, and the labeling model in the BERT semantic model is used to label the keywords, thereby determining the keywords in the data to be analyzed. After the keywords are determined, the target vector corresponding to the keywords is determined according to the word-vector training process of the BERT semantic model.
For example, when the data to be analyzed is "can you introduce peaceful to me", the keyword is identified as "peaceful" and is labeled, so that the keyword in the data to be analyzed can be readily determined from the label; the target vector of "peaceful" is then obtained.
Or, when the data to be analyzed is "how is the Jin You Life product", the keyword is identified as "Jin You Life" and is labeled, so that the keyword can be readily determined from the label; the target vector corresponding to "Jin You Life" is then obtained.
Or, when the data to be analyzed is "what kind of insurance is suitable for a person aged 35", the keyword is identified as "age 35" and is labeled, so that the keyword can be readily determined from the label; the target vector of "age 35" is then obtained.
Step S106, determining a similar vector corresponding to the target vector in a vector space generated in advance.
Before the data to be analyzed is analyzed, a large number of articles in related fields and chat records between customer service staff and users can be used to identify keywords and obtain the vectors corresponding to those keywords. The vectors of the plurality of keywords are combined to generate a vector space. Specifically, a plurality of vector spaces may be built, one per keyword category, or all keywords may be placed in a single vector space. Once the vector space has been generated and the target vector determined, one or more similar vectors are determined in the vector space based on the target vector. The similarity between vectors can be determined from the angle between the target vector and a vector in the vector space. Specifically, the similarity between the target vector and each vector in the vector space is calculated, and a vector whose similarity to the target vector satisfies a threshold is determined to be a similar vector.
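As an illustration of the angle-based matching described above, the following is a minimal sketch that assumes cosine similarity as the angle measure and a threshold of 0.8. The function names, the threshold value, and the toy three-dimensional vectors are assumptions made for illustration and are not taken from the source.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

def find_similar_vectors(target, vector_space, threshold=0.8):
    """Return (key, similarity) pairs that meet the threshold, best first."""
    scored = [(key, cosine_similarity(target, vec)) for key, vec in vector_space.items()]
    hits = [(key, sim) for key, sim in scored if sim >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy vectors for illustration only; real word vectors have many more dimensions.
space = {
    "age 21 to age 35": [0.9, 0.1, 0.0],
    "age 36 to age 50": [0.1, 0.9, 0.0],
}
matches = find_similar_vectors([1.0, 0.2, 0.0], space)
```

A smaller angle gives a cosine closer to 1, so raising the threshold makes the match stricter.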
For a plurality of vector spaces determined according to the category, the target vectors may be classified first, and then the similar vectors may be determined from the vector space corresponding to the category according to the category of the target vectors. Specifically, determining a similar vector corresponding to the target vector in a pre-generated vector space may include:
classifying the target vector;
determining a target vector space in the vector space according to the category of the target vector;
and determining, in the target vector space, a similar vector corresponding to the target vector.
It is understood that a plurality of vector spaces represent a plurality of classes of vectors, i.e. one vector space represents one class of vectors.
In some embodiments, the target vector is classified to obtain its category, and a target vector space, which is one of the plurality of vector spaces, is obtained according to that category. The target vector is then matched against the vectors in the target vector space to obtain the similar vector corresponding to the target vector. Classifying the target vector narrows the range of the vector space to be searched, reduces interfering vectors, and improves working efficiency.
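The category-based selection of a target vector space can be sketched as follows. The source does not say how the target vector is classified, so this sketch substitutes a plain lexicon lookup on the keyword; the lexicon, the function names, and the toy vectors are all hypothetical.

```python
def classify_keyword(keyword, category_lexicon):
    """Return the category whose lexicon contains the keyword, or None."""
    for category, keywords in category_lexicon.items():
        if keyword in keywords:
            return category
    return None

def select_target_space(keyword, category_lexicon, vector_spaces):
    """Pick the vector space of the keyword's category (empty dict if unknown)."""
    category = classify_keyword(keyword, category_lexicon)
    return category, vector_spaces.get(category, {})

# Hypothetical lexicon and per-category vector spaces with toy vectors.
lexicon = {
    "product name": {"Jin You Life"},
    "disease": {"severe danger"},
    "status": {"age 35"},
}
spaces = {
    "product name": {"Jin You Life": [0.2, 0.8]},
    "disease": {"severe danger": [0.7, 0.3]},
    "status": {"age 21 to age 35": [0.9, 0.1]},
}
category, target_space = select_target_space("Jin You Life", lexicon, spaces)
```

Only the vectors in `target_space` then need to be compared with the target vector, which is the narrowing effect the text describes.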
For example, the target vector representing "Jin You Life" is classified into the product name class to obtain the target vector space of the product name class, and is then matched against the vectors in that space to obtain the similar vector corresponding to the target vector representing "Jin You Life".
Or, the target vector representing "severe danger" is classified into the disease class to obtain the target vector space of the disease class, and is then matched against the vectors in that space to obtain the similar vector corresponding to the target vector representing "severe danger".
Optionally, in some embodiments, the categories of the target vector include: product name class, disease class, and status class.
It should be understood that the product name class includes, for example, Jin You Life, Ping An Life, Jin Fu Life, and Jia and Bao; the disease class includes serious disease, accident, medical risk, and cancer prevention; and the status class refers to the state of a person or thing, such as "Zhang San is 20 years old".
In some embodiments, classifying the target vector accurately determines whether it belongs to the product name class, the disease class, the status class, or another class, which gives the method practical value and makes it accurate and readily deployable.
Optionally, in some embodiments, the generating of the pre-generated vector space may comprise:
acquiring training data;
identifying keywords in the training data;
and obtaining the vector space based on the keywords in the training data and on reference data, wherein the reference data comprises key fields and vectors corresponding to the key fields.
It should be appreciated that the training data may be queries made by users, and the reference data may be chat records between users and customer service staff and articles in the relevant field.
In some embodiments, to improve training efficiency, the obtained training data is pre-trained in the BERT semantic model by randomly masking 20% of the tokens, so as to obtain the keywords in the training data. The keywords are then trained with the BERT semantic model, and the parameters of the BERT semantic model are corrected and calibrated using the reference data to obtain the vector space.
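The random masking step can be illustrated with a short sketch. It is a simplification that replaces every selected position with a mask token and records the original tokens as prediction targets; the function name, the `[MASK]` string, and the `seed` parameter are assumptions, and no actual model training is shown.

```python
import random

def mask_tokens(tokens, mask_ratio=0.2, mask_token="[MASK]", seed=None):
    """Randomly replace a fraction of tokens with a mask token.

    Returns the masked sequence and a dict {position: original token}
    holding the tokens a model would be asked to predict.
    """
    rng = random.Random(seed)
    # Mask at least one token of any non-empty sequence.
    n_mask = max(1, round(len(tokens) * mask_ratio)) if tokens else 0
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [mask_token if i in positions else tok for i, tok in enumerate(tokens)]
    labels = {i: tokens[i] for i in positions}
    return masked, labels

tokens = ["what", "kind", "of", "insurance", "is", "suitable",
          "for", "a", "person", "aged"]
masked, labels = mask_tokens(tokens, mask_ratio=0.2, seed=0)
```

With ten tokens and a ratio of 0.2, two positions are masked; which two depends on the seed.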
For vector spaces of multiple classes, training data may be classified first, and training data and reference data belonging to one class may be trained to obtain a vector space corresponding to the class.
Optionally, in some embodiments, the obtaining the vector space based on the keywords in the training data and the reference data may include:
combining keywords with similarity meeting preset requirements in the training data;
determining word vectors corresponding to the combined keywords, and calibrating the word vectors according to the reference data to obtain the vector space.
It should be understood that the preset requirement is set according to actual conditions or manually. To improve the accuracy, recall rate, and comprehensive evaluation index of the vector space, keywords in the training data whose similarity meets the preset requirement are combined to obtain combined keywords, and the corresponding word vectors are obtained from the combined keywords. The word vectors are then calibrated against the reference data, taking context into account, until the error between the reference data and the word vectors falls within the allowable range, which yields the vector space. In other words, the similarity between keywords is determined, and similar keywords are trained as a single keyword. For example, "lifetime" and "peaceful lifetime" can be trained as one keyword to obtain one vector. Such a vector may correspond to several keywords and therefore yield several results, so that the analysis result (the answer to the query) corresponding to the data to be analyzed can be determined more comprehensively.
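The keyword-combining step can be sketched as a greedy grouping. The source does not name the similarity measure, so this sketch assumes a character-level ratio from Python's `difflib` and a preset requirement of 0.6; both, along with the function names, are illustrative assumptions.

```python
from difflib import SequenceMatcher

def string_similarity(a, b):
    """Character-level similarity in [0, 1]; an assumed stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()

def combine_keywords(keywords, threshold=0.6, similarity=string_similarity):
    """Greedily group keywords whose similarity to a group's first member
    meets the threshold; each group would be trained as one keyword."""
    groups = []
    for keyword in keywords:
        for group in groups:
            if similarity(group[0], keyword) >= threshold:
                group.append(keyword)
                break
        else:
            groups.append([keyword])
    return groups

groups = combine_keywords(["lifetime", "peaceful lifetime", "accident"])
```

Each resulting group would map to a single word vector, so one vector can stand for several keywords.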
Step S108, determining at least one analysis result based on the similar vector, and outputting the at least one analysis result.
In step S108, a plurality of analysis results may be determined and sorted, and then output to the customer service side, where the customer service staff can quickly compose a reply to the data to be analyzed based on them. This improves the reply speed of the customer service staff.
For example, as shown in fig. 2, if the target vector representing "age 35" yields the similar vector representing "age 21 to age 35", an analysis result is obtained based on that similar vector, namely information on the insurance suitable for purchase in the 21-to-35 age range. The analysis result is sent to the customer service side, where the customer service staff determine the point the user is concerned about from the analysis result and can quickly reply to the data to be analyzed.
Or, as shown in fig. 3, if the similar vector represents "Jin You Life", two analysis results are obtained and sorted, and then sent to the customer service staff at the customer service side for reference, so that they can quickly reply to the data to be analyzed.
Optionally, in some embodiments, to improve user satisfaction, the answers of customer service staff with higher satisfaction are pushed preferentially. The priority is based on a confidence score derived from the reference data, and the confidence score is determined by the reasonableness, accuracy, and success rate of the customer service answers to user questions.
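The priority push can be sketched as a sort on a confidence score. The source names three inputs to the score but not how they are combined, so the unweighted mean used here is an assumption, as are the class name, field names, and sample values.

```python
from dataclasses import dataclass

@dataclass
class CandidateReply:
    text: str
    reasonableness: float  # each factor assumed to lie in [0, 1]
    accuracy: float
    success_rate: float

    def confidence(self):
        # Assumption: unweighted mean of the three factors named in the text.
        return (self.reasonableness + self.accuracy + self.success_rate) / 3.0

def rank_replies(candidates):
    """Order candidate replies so the highest confidence is pushed first."""
    return sorted(candidates, key=lambda c: c.confidence(), reverse=True)

candidates = [
    CandidateReply("reply A", 0.9, 0.8, 0.7),
    CandidateReply("reply B", 0.6, 0.6, 0.6),
    CandidateReply("reply C", 1.0, 0.9, 0.95),
]
ranked = rank_replies(candidates)
```

A weighted combination would fit the same interface; only `confidence` would change.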
In particular, determining at least one analysis result based on the similar vector may include:
obtaining at least one predicted word corresponding to the similar vector;
and determining an analysis result corresponding to each predicted word.
In some embodiments, each similar vector corresponds to at least one predicted word; that is, a plurality of predicted words can be determined from the similar vectors, so that analysis results can be provided to the customer service staff more comprehensively. This improves the reply speed of the customer service staff.
For example, the similar vector corresponds to two predicted words, "safety happy" and "safety happy"; each predicted word has a corresponding analysis result, and a more comprehensive prompt or information for replying to the data to be analyzed can be given or recommended based on the two analysis results.
Alternatively, as shown in fig. 4, when the data to be analyzed is "how is the Jin You Life product", the keyword is determined to be "Jin You Life", and the corresponding similar vector is obtained based on the keyword. The similar vector corresponds to two predicted words, "Jin You Life" and "heavy disease", each of which has a corresponding analysis result.
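The mapping from a similar vector to predicted words and then to analysis results can be sketched with two lookup tables. The source does not say how these associations are stored, so the dictionaries, the vector identifier, and the placeholder reply strings below are hypothetical.

```python
# Hypothetical lookup tables; identifiers and reply texts are placeholders.
PREDICTED_WORDS = {
    "vec_jin_you_life": ["Jin You Life", "heavy disease"],
}
ANALYSIS_RESULTS = {
    "Jin You Life": "Reference reply for the first predicted word.",
    "heavy disease": "Reference reply for the second predicted word.",
}

def analysis_results_for(similar_vector_id,
                         predicted_words=PREDICTED_WORDS,
                         results=ANALYSIS_RESULTS):
    """Return (predicted word, analysis result) pairs for one similar vector."""
    words = predicted_words.get(similar_vector_id, [])
    # Skip predicted words that have no stored analysis result.
    return [(word, results[word]) for word in words if word in results]

pairs = analysis_results_for("vec_jin_you_life")
```

Each pair is one candidate reference for the customer service staff, in the order the predicted words are stored.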
Optionally, in some embodiments, as shown in fig. 1, the method further comprises:
Step S110, obtaining feedback information corresponding to the at least one analysis result.
Step S112, sending the feedback information to the user.
In some embodiments, the server side obtains the feedback information that the customer service side produces from the at least one analysis result and sends it to the user at the user side, so that the user quickly receives a satisfactory reply and is more willing to purchase.
In the embodiment of the invention, keywords in the user's data to be analyzed are identified to obtain the target vector of the keywords; the target vector is matched against the vectors in the pre-generated vector space to obtain the similar vector corresponding to the target vector; and at least one analysis result is determined from the similar vector and output to the customer service staff. Information for replying to the data to be analyzed can be obtained from the at least one analysis result, and the customer service staff are reminded of the points the user is concerned about, which shortens their reply time, improves working efficiency, and reduces user loss.
Fig. 5 is a flow chart of a data analysis method according to an embodiment of the present invention, where the method shown in fig. 5 may be executed by a customer service end, and the method shown in fig. 5 includes:
step S502, receiving at least one analysis result corresponding to the data to be analyzed, wherein the at least one analysis result is determined by a similar vector corresponding to a target vector, and the target vector is determined by a keyword corresponding to the data to be analyzed of the user.
In some embodiments, the customer service end receives at least one analysis result sent by the service end, and performs subsequent processing on the at least one analysis result. The at least one analysis result is obtained according to the similar vector, the similar vector is obtained according to the target vector, and the target vector is obtained according to the keywords in the data to be analyzed of the user.
Optionally, in other embodiments, the method shown in fig. 5 further includes:
displaying the at least one analysis result;
receiving an operation instruction of the at least one analysis result to obtain feedback information;
and sending the feedback information.
In some embodiments, the at least one analysis result is displayed to the customer service staff, so that the customer service staff can directly obtain the at least one analysis result, receive an operation instruction of the customer service staff on the at least one analysis result, obtain feedback information, and send the feedback information to the user through the server.
For example, three analysis results are displayed in order of satisfaction. When an operation instruction is received in which a customer service staff member clicks and copies the first analysis result, the feedback information is obtained, namely the first analysis result, and is sent to the user. The customer service staff can thus reply to the user quickly, which improves their working efficiency.
Fig. 6 is a schematic flow chart of a data analysis method according to still another embodiment of the present invention, and the method shown in fig. 6 includes:
Step S602, obtaining data to be analyzed of the user, that is, obtaining a user question.
Step S604, recognizing the user question with a BERT semantic model to obtain a keyword, and labeling the user question with the character-level labeling scheme (BIO) in the BERT semantic model. For example, for the question "can you introduce peaceful to me", "peaceful" is the keyword of the sentence, the sentence is labeled [O O O O O O O O B-IPN I-IPN I-IPN O], and the corresponding target vector is obtained based on "peaceful"; for instance, the target vector for B-IPN I-IPN I-IPN is 123.
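Recovering keywords from character-level BIO labels can be sketched as below. The tag sequence follows the twelve-tag example in step S604, while the placeholder characters and the function name are assumptions; a real system would take the tags from the labeling model.

```python
def extract_keywords(chars, tags):
    """Collect the character spans labeled B-xxx followed by I-xxx tags."""
    keywords = []
    current = []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current:  # a new span starts right after another one
                keywords.append("".join(current))
            current = [ch]
        elif tag.startswith("I-") and current:
            current.append(ch)
        else:  # an "O" tag (or a stray "I-") ends any open span
            if current:
                keywords.append("".join(current))
            current = []
    if current:
        keywords.append("".join(current))
    return keywords

# Twelve placeholder characters standing in for the characters of the question.
chars = list("abcdefghXYZi")
tags = ["O"] * 8 + ["B-IPN", "I-IPN", "I-IPN", "O"]
keywords = extract_keywords(chars, tags)
```

Here the three characters tagged B-IPN, I-IPN, I-IPN are joined into a single keyword, which is the span the target vector is computed for.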
Step S606, classifying the target vectors to obtain a target vector space, for example, the target vector space may be a product name class, a status class, a disease class, or the like.
Step S608, calculating the similarity between the target vector and each vector in the target vector space using a similarity calculation method, and determining the similar vector corresponding to the target vector. If the target vector and a vector are semantically similar, the label is 1; otherwise, the label is 0.
Step S610, determining at least one predicted word corresponding to the similar vector, obtaining an analysis result based on the predicted word, and obtaining an answer to the question based on the analysis result. This reminds the customer service staff of the points the user is concerned about and enables accurate service, preventing user loss.
Fig. 7 is a schematic structural diagram of a data analysis apparatus according to an embodiment of the present invention, and as shown in fig. 7, the apparatus 70 includes:
a first obtaining module 71, configured to obtain data to be analyzed of a user;
a first identification module 72, configured to identify a keyword in the data to be analyzed, and obtain a corresponding target vector;
a determining module 73, configured to determine a similar vector corresponding to the target vector in a vector space generated in advance;
an output module 74, configured to determine at least one analysis result based on the similar vector, and output the at least one analysis result.
In the embodiment of the invention, keywords in the user's data to be analyzed are identified to obtain the target vector of the keywords; the target vector is matched against the vectors in the pre-generated vector space to obtain the similar vector corresponding to the target vector; and at least one analysis result is determined from the similar vector and output to the customer service staff. Information for replying to the data to be analyzed can be obtained from the at least one analysis result, and the customer service staff are reminded of the points the user is concerned about, which shortens their reply time, improves working efficiency, and reduces user loss.
Optionally, as an embodiment, the apparatus 70 further includes:
the second acquisition module is used for acquiring feedback information corresponding to the at least one analysis result;
and the feedback module is used for sending the feedback information to the user.
Optionally, as an embodiment, the determining module 73 includes:
a classification submodule for classifying the target vectors;
the category determination submodule is used for determining a target vector space in the vector space according to the category of the target vector;
and the vector determining submodule is used for determining, in the target vector space, the similar vector corresponding to the target vector.
Optionally, as an embodiment, the categories of the target vector include: product name class, disease class, and status class.
Optionally, as an embodiment, the output module 74 includes:
the word obtaining submodule is used for obtaining at least one predicted word corresponding to the similar vector;
and the result determining submodule is used for determining an analysis result corresponding to each predicted word.
Optionally, as an embodiment, the apparatus 70 further includes:
the third acquisition module is used for acquiring training data;
the second identification module is used for identifying the keywords in the training data;
and the obtaining module is used for obtaining the vector space based on the keywords in the training data and the reference data, wherein the reference data comprises key fields and vectors corresponding to the key fields.
Optionally, as an embodiment, the obtaining module includes:
the combination sub-module is used for combining the keywords with the similarity meeting the preset requirements in the training data;
and the calibration submodule is used for determining word vectors corresponding to the combined keywords and calibrating the word vectors according to the reference data to obtain the vector space.
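The combination and calibration submodules can be sketched as follows. This is only one possible reading: string similarity from Python's `difflib` stands in for the unspecified similarity measure, the threshold `0.8` stands in for the "preset requirement", merged keywords are represented by the mean of their word vectors, and calibration is assumed to mean replacing that mean with the reference vector when the reference data contains a matching key field. All names are hypothetical.

```python
from difflib import SequenceMatcher


def merge_similar(keywords, threshold=0.8):
    """Group keywords whose similarity to a group's first member meets the threshold."""
    groups = []
    for kw in keywords:
        for group in groups:
            if SequenceMatcher(None, kw, group[0]).ratio() >= threshold:
                group.append(kw)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([kw])
    return groups


def build_vector_space(groups, word_vectors, reference):
    """Build {representative keyword: vector}, calibrated against reference data."""
    space = {}
    for group in groups:
        vectors = [word_vectors[kw] for kw in group if kw in word_vectors]
        if not vectors:
            continue
        # Word vector of the combined keywords: mean of the members' vectors.
        mean = [sum(col) / len(vectors) for col in zip(*vectors)]
        # Calibration (assumption): prefer the reference vector of a matching key field.
        ref = next((reference[kw] for kw in group if kw in reference), None)
        space[group[0]] = ref if ref is not None else mean
    return space
```

For example, `merge_similar(["hypertension", "hypertention", "diabetes"])` groups the misspelling with the correct term, so both contribute to a single entry of the vector space.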
Fig. 8 is a schematic structural diagram of a data analysis apparatus according to another embodiment of the present invention. As shown in Fig. 8, the apparatus 80 includes:
the first receiving module 81 is configured to receive at least one analysis result corresponding to data to be analyzed, where the at least one analysis result is determined by a similar vector corresponding to a target vector, and the target vector is determined by a keyword corresponding to the data to be analyzed of a user.
Optionally, as an embodiment, the apparatus 80 further includes:
a display module for displaying the at least one analysis result;
the second receiving module is used for receiving an operation instruction for the at least one analysis result to obtain feedback information;
and the sending module is used for sending the feedback information.
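The interplay of the receiving, display, and sending modules of apparatus 80 might look like the following sketch. The class `AgentTerminal`, its method names, and the idea that an operation instruction either selects a result or supplies edited text are assumptions for illustration; the `outbox` list merely stands in for an actual network send.

```python
from dataclasses import dataclass, field


@dataclass
class AgentTerminal:
    """Hypothetical customer-service-side terminal corresponding to apparatus 80."""

    results: list = field(default_factory=list)
    outbox: list = field(default_factory=list)  # stand-in for sending over the network

    def receive(self, analysis_results):
        """First receiving module: store the received analysis results."""
        self.results = list(analysis_results)

    def display(self):
        """Display module: numbered lines for the staff to choose from."""
        return [f"{i}. {r}" for i, r in enumerate(self.results, start=1)]

    def operate(self, index, edited_text=None):
        """Second receiving module: an operation instruction selects result `index`
        (1-based) or supplies edited text, which yields the feedback information."""
        return edited_text if edited_text is not None else self.results[index - 1]

    def send(self, feedback):
        """Sending module: queue the feedback information for delivery."""
        self.outbox.append(feedback)
```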
An electronic device according to an embodiment of the present application is described in detail below with reference to Fig. 9. At the hardware level, the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include internal memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk storage. The electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be interconnected by the internal bus, which may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in Fig. 9, but this does not mean that there is only one bus or one type of bus.
The memory is used for storing programs. Specifically, a program may include program code, and the program code includes computer operating instructions. The memory may include internal memory and non-volatile storage, and it provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile storage into internal memory and then runs it, forming the data analysis apparatus at the logical level. The processor executes the program stored in the memory and specifically performs the following operations:
acquiring data to be analyzed of a user;
identifying keywords in the data to be analyzed and obtaining corresponding target vectors;
determining a similar vector corresponding to the target vector in a vector space generated in advance;
determining at least one analysis result based on the similar vector, and outputting the at least one analysis result.
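The four operations above can be sketched end to end as follows. This is a minimal illustration under stated assumptions: keyword identification is reduced to substring lookup, the target vectors come from a hypothetical `EMBEDDINGS` table, similarity is measured by cosine similarity, and the toy vectors and result texts are invented for the example. None of these choices is prescribed by the embodiment.

```python
import math

# Hypothetical keyword -> target vector table (toy 3-d vectors).
EMBEDDINGS = {
    "high blood pressure": [0.2, 0.8, 0.1],
    "claim denied": [0.1, 0.1, 0.9],
}

# Hypothetical pre-generated vector space.
VECTOR_SPACE = {
    "critical illness insurance": [0.9, 0.1, 0.0],
    "hypertension": [0.1, 0.9, 0.1],
    "claim rejected": [0.0, 0.2, 0.9],
}

# Hypothetical analysis result for each entry of the vector space.
ANALYSIS_RESULTS = {
    "critical illness insurance": "User asks about a critical illness product.",
    "hypertension": "User mentions hypertension; check the underwriting rules.",
    "claim rejected": "User reports a rejected claim; check the claim status.",
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_keywords(text):
    """Naive keyword identification: known keywords that occur in the text."""
    lowered = text.lower()
    return [kw for kw in EMBEDDINGS if kw in lowered]


def analyze(text):
    """Acquire data, identify keywords, find similar vectors, output analysis results."""
    results = []
    for kw in identify_keywords(text):
        target = EMBEDDINGS[kw]
        similar = max(VECTOR_SPACE, key=lambda k: cosine(target, VECTOR_SPACE[k]))
        results.append(ANALYSIS_RESULTS[similar])
    return results
```

With these toy values, `analyze("I have high blood pressure, can I still apply?")` matches the keyword "high blood pressure" to the "hypertension" entry and returns its analysis result.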
In this embodiment of the invention, keywords are identified in the user's data to be analyzed and a target vector is obtained for each keyword. The target vector is matched against the vectors in the pre-generated vector space to obtain the similar vector corresponding to the target vector, and at least one analysis result is determined from the similar vector and output to the customer service staff. The staff can derive the information for replying to the data to be analyzed from the at least one analysis result and are reminded of the points that need attention, which shortens their reply time, improves working efficiency, and reduces user loss.
The method performed by the data analysis apparatus according to the embodiment shown in Fig. 1 of the present application may be applied to or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware.
Of course, besides the software implementation, the electronic device of the present application does not exclude other implementations, such as a logic device or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or a logic device.
An embodiment of the invention provides a computer-readable storage medium storing instructions that are used for: acquiring data to be analyzed of a user; identifying keywords in the data to be analyzed and obtaining corresponding target vectors; determining a similar vector corresponding to the target vector in a vector space generated in advance; and determining at least one analysis result based on the similar vector, and outputting the at least one analysis result.
In this embodiment of the invention, keywords are identified in the user's data to be analyzed and a target vector is obtained for each keyword. The target vector is matched against the vectors in the pre-generated vector space to obtain the similar vector corresponding to the target vector, and at least one analysis result is determined from the similar vector and output to the customer service staff. The staff can derive the information for replying to the data to be analyzed from the at least one analysis result and are reminded of the points that need attention, which shortens their reply time, improves working efficiency, and reduces user loss.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media) such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present invention, and are not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (10)

1. A method of data analysis, the method comprising:
acquiring data to be analyzed of a user;
identifying keywords in the data to be analyzed and obtaining corresponding target vectors;
determining a similar vector corresponding to the target vector in a vector space generated in advance;
determining at least one analysis result based on the similar vector, and outputting the at least one analysis result.
2. The method of claim 1, wherein the method further comprises:
obtaining feedback information corresponding to the at least one analysis result;
and sending the feedback information to the user.
3. The method of claim 1 or 2, wherein the determining a similar vector in a pre-generated vector space corresponding to the target vector comprises:
classifying the target vector;
determining a target vector space in the vector space according to the category of the target vector;
and determining, in the target vector space, a similar vector corresponding to the target vector.
4. The method of claim 3, wherein the categories of the target vectors comprise: product name class, disease class, and status class.
5. The method of claim 1 or 2, wherein said determining at least one analysis result based on said similar vector comprises:
obtaining at least one predicted word corresponding to the similar vector;
and determining an analysis result corresponding to each predicted word.
6. The method of claim 1 or 2, wherein the method further comprises:
acquiring training data;
identifying keywords in the training data;
and obtaining the vector space based on the keywords in the training data and on reference data, wherein the reference data comprises key fields and vectors corresponding to the key fields.
7. The method of claim 6, wherein the obtaining the vector space based on the keywords in the training data and on the reference data comprises:
combining keywords with similarity meeting preset requirements in the training data;
determining word vectors corresponding to the combined keywords, and calibrating the word vectors according to the reference data to obtain the vector space.
8. A method of data analysis, the method comprising:
receiving at least one analysis result corresponding to the data to be analyzed, wherein the at least one analysis result is determined by a similar vector corresponding to a target vector, and the target vector is determined by a keyword corresponding to the data to be analyzed of the user.
9. The method of claim 8, wherein the method further comprises:
displaying the at least one analysis result;
receiving an operation instruction for the at least one analysis result to obtain feedback information;
and sending the feedback information.
10. A data analysis apparatus, the apparatus comprising:
the first acquisition module is used for acquiring data to be analyzed of a user;
the first identification module is used for identifying keywords in the data to be analyzed and obtaining corresponding target vectors;
a determining module, configured to determine a similar vector corresponding to the target vector in a vector space generated in advance;
an output module for determining at least one analysis result based on the similar vector and outputting the at least one analysis result.
CN202010222269.XA 2020-03-26 2020-03-26 Data analysis method and device Pending CN111476669A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010222269.XA CN111476669A (en) 2020-03-26 2020-03-26 Data analysis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010222269.XA CN111476669A (en) 2020-03-26 2020-03-26 Data analysis method and device

Publications (1)

Publication Number Publication Date
CN111476669A true CN111476669A (en) 2020-07-31

Family

ID=71748450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010222269.XA Pending CN111476669A (en) 2020-03-26 2020-03-26 Data analysis method and device

Country Status (1)

Country Link
CN (1) CN111476669A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107193811A (en) * 2016-03-09 2017-09-22 阿里巴巴集团控股有限公司 Information processing method and device
CN107301213A (en) * 2017-06-09 2017-10-27 腾讯科技(深圳)有限公司 Intelligent answer method and device
CN108804529A (en) * 2018-05-02 2018-11-13 深圳智能思创科技有限公司 A kind of question answering system implementation method based on Web
CN109145099A (en) * 2018-08-17 2019-01-04 百度在线网络技术(北京)有限公司 Answering method and device based on artificial intelligence
CN109284377A (en) * 2018-09-13 2019-01-29 云南电网有限责任公司 A kind of file classification method and device based on vector space
CN109446302A (en) * 2018-09-25 2019-03-08 中国平安人寿保险股份有限公司 Question and answer data processing method, device and computer equipment based on machine learning
CN109947909A (en) * 2018-06-19 2019-06-28 平安科技(深圳)有限公司 Intelligent customer service answer method, equipment, storage medium and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
高扬: "《智能摘要与深度学习》", 北京理工大学出版社, pages: 46 - 54 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination