CN113761182A - Method and device for determining service problem - Google Patents

Publication number
CN113761182A
Authority
CN
China
Prior art keywords
sentence
category
vector
vectors
sentence vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010553996.4A
Other languages
Chinese (zh)
Inventor
刘沛文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202010553996.4A
Publication of CN113761182A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a method and a device for determining a business problem, and relates to the technical field of computers. One embodiment of the method comprises: performing semantic analysis on obtained audit opinion data to obtain a sentence vector of the audit opinion data, and storing the sentence vector; querying the stored sentence vectors of audit opinion data for sentence vectors that meet specified conditions, and clustering the queried sentence vectors; and determining the service problem to be solved according to the sentence vectors of each category obtained by clustering. The method can automatically find potential problem points, reduce the cognitive bias that subjective judgment introduces when business problems are analyzed manually, and target business pain points directly, which avoids wasting research and development resources. It also reduces the number of times information is passed from person to person and therefore the information lost along the way, improves the accuracy with which requirements are identified, and allows potential problems to be found and the system optimized before those problems actually surface to users and affect user experience.

Description

Method and device for determining service problem
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for determining a business problem.
Background
Every internet product relies on an operation team and an operation management system. As the first line of contact with customers, the operation team regularly receives feedback on all kinds of problems. At present, the overall process by which a business problem goes from arising to finally being implemented as a product iteration passes through two important nodes: the operation team's product manager extracts and summarizes the problem, and the technical team's product manager interprets and converts that summary. Both steps usually rely on the individual product manager's thinking and some simple data statistics.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
determining a business problem depends on the analysis and summary made by the operation team's product manager, which places heavy demands on that product manager's ability to understand and summarize the business; the requirements that are raised may consume a large amount of research and development resources without actually solving the business pain point, so the effect is small;
in the existing process, because each person thinks differently, every handoff of a message may lose part of the information, so the product requirement document that a technician receives may differ greatly from the original requirement;
a business problem can be determined only after it has occurred, which affects user experience.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for determining a service problem. They can automatically find potential problem points, reduce the cognitive bias that subjective judgment introduces when a service problem is analyzed manually, target service pain points directly, and avoid wasting research and development resources. They also reduce the number of times information is passed from person to person and therefore the information lost along the way, improve the accuracy with which requirements are identified, and allow potential problems to be found and the system optimized before those problems actually surface to users and affect user experience.
To achieve the above object, according to an aspect of an embodiment of the present invention, a method for determining a service problem is provided.
A method of determining business problems, comprising: performing semantic analysis on obtained audit opinion data to obtain a sentence vector of the audit opinion data, and storing the sentence vector; querying the stored sentence vectors of audit opinion data for sentence vectors that meet specified conditions, and clustering the queried sentence vectors; and determining the service problem to be solved according to the sentence vectors of each category obtained by clustering.
Optionally, a timed task periodically queries and acquires the audit opinion data newly added within the most recent preset time period, semantic analysis is performed on each piece of audit opinion data to obtain a sentence vector of the newly added audit opinion data, and the sentence vector is stored in a database, wherein the database is used for storing the sentence vectors of the stored audit opinion data.
Optionally, the audit opinion data is semantically analyzed by: performing word segmentation on the audit opinion data to obtain a word segmentation array; and calculating a word vector for each word in the word segmentation array by using a word vector model, and computing a weighted average of the obtained word vectors, with weights corresponding to the parts of speech of the words, to obtain a sentence vector of the audit opinion data.
Optionally, determining the service problem to be solved according to the sentence vectors of each category obtained by clustering includes: determining, as the service problem to be solved, the service problem indicated by the category that contains the largest number of sentence vectors among the categories obtained by clustering.
Optionally, determining the service problem to be solved according to the sentence vectors of each category obtained by clustering includes: calculating, for each sentence vector in each category obtained by clustering, the distance between that sentence vector and the central point vector of its category; determining the convergence degree of the sentence vectors of each category according to the distances and the number of sentence vectors in each category; and determining, as the service problem to be solved, the service problem indicated by a category whose convergence degree meets a preset condition.
Optionally, determining the convergence degree of the sentence vectors of each category according to the distances and the number of sentence vectors in each category includes: sorting the sentence vectors of each category by their distance to the central point vector of that category to obtain a sentence vector sequence for each category; and, for each category's sentence vector sequence, calculating the ratio of the number of sentence vectors whose distance is smaller than a distance threshold to the total number of sentence vectors in the category, to obtain the convergence degree of the sentence vectors of that category.
Optionally, the distance threshold is determined by: constructing a plurality of sentence sets, each of which comprises a plurality of semantically similar sentences; converting the sentences in each sentence set into sentence vectors, and calculating the average sentence vector distance of each sentence set, which is the mean of the distances between every two sentence vectors in that sentence set; and averaging the average sentence vector distances of all the sentence sets to obtain the distance threshold.
According to another aspect of the embodiments of the present invention, an apparatus for determining a service problem is provided.
An apparatus for determining business problems, comprising: a semantic analysis module, configured to perform semantic analysis on obtained audit opinion data to obtain and store a sentence vector of the audit opinion data; a clustering module, configured to query the stored sentence vectors of audit opinion data for sentence vectors that meet specified conditions and to cluster the queried sentence vectors; and a service problem determination module, configured to determine the service problem to be solved according to the sentence vectors of each category obtained by clustering.
Optionally, the semantic analysis module is further configured to: periodically query and acquire, through a timed task, the audit opinion data newly added within the most recent preset time period, perform semantic analysis on each piece of audit opinion data to obtain a sentence vector of the newly added audit opinion data, and store the sentence vector in a database, wherein the database is used for storing the sentence vectors of the stored audit opinion data.
Optionally, the semantic analysis module performs semantic analysis on the audit opinion data by: performing word segmentation on the audit opinion data to obtain a word segmentation array; and calculating a word vector for each word in the word segmentation array by using a word vector model, and computing a weighted average of the obtained word vectors, with weights corresponding to the parts of speech of the words, to obtain a sentence vector of the audit opinion data.
Optionally, the service problem determination module is further configured to: determine, as the service problem to be solved, the service problem indicated by the category that contains the largest number of sentence vectors among the categories obtained by clustering.
Optionally, the service problem determination module is further configured to: calculate, for each sentence vector in each category obtained by clustering, the distance between that sentence vector and the central point vector of its category; determine the convergence degree of the sentence vectors of each category according to the distances and the number of sentence vectors in each category; and determine, as the service problem to be solved, the service problem indicated by a category whose convergence degree meets a preset condition.
Optionally, the service problem determination module includes a convergence determination sub-module, configured to: sort the sentence vectors of each category by their distance to the central point vector of that category to obtain a sentence vector sequence for each category; and, for each category's sentence vector sequence, calculate the ratio of the number of sentence vectors whose distance is smaller than a distance threshold to the total number of sentence vectors in the category, to obtain the convergence degree of the sentence vectors of that category.
Optionally, the apparatus further includes a distance threshold determining module, configured to determine the distance threshold by: constructing a plurality of sentence sets, each of which comprises a plurality of semantically similar sentences; converting the sentences in each sentence set into sentence vectors, and calculating the average sentence vector distance of each sentence set, which is the mean of the distances between every two sentence vectors in that sentence set; and averaging the average sentence vector distances of all the sentence sets to obtain the distance threshold.
According to yet another aspect of an embodiment of the present invention, an electronic device is provided.
An electronic device, comprising: one or more processors; a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of determining business problems provided by embodiments of the present invention.
According to yet another aspect of an embodiment of the present invention, a computer-readable medium is provided.
A computer readable medium, on which a computer program is stored, which when executed by a processor implements the method of determining a business problem provided by an embodiment of the invention.
One embodiment of the above invention has the following advantages or benefits: semantic analysis is performed on obtained audit opinion data to obtain and store a sentence vector of the audit opinion data; the stored sentence vectors of audit opinion data are queried for sentence vectors that meet specified conditions, and the queried sentence vectors are clustered; and the service problem to be solved is determined according to the sentence vectors of each category obtained by clustering. As a result, potential problem points can be found automatically, the cognitive bias that subjective judgment introduces when business problems are analyzed manually is reduced, business pain points are targeted directly, and research and development resources are not wasted. The number of times information is passed from person to person is reduced, along with the information lost along the way, the accuracy with which requirements are identified is improved, and potential problems can be found and the system optimized before those problems actually surface to users and affect user experience.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main steps of a method of determining a business problem according to one embodiment of the present invention;
FIG. 2 is a flow diagram of determining a business problem according to one embodiment of the present invention;
FIG. 3 is a schematic diagram of the main blocks of an apparatus for determining business problems according to one embodiment of the present invention;
FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 5 is a schematic block diagram of a computer system suitable for use with a server implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of the main steps of a method of determining a business problem according to one embodiment of the present invention.
As shown in fig. 1, the method for determining a business problem according to an embodiment of the present invention mainly includes the following steps S101 to S103.
Step S101: performing semantic analysis on the obtained audit opinion data to obtain a sentence vector of the audit opinion data, and storing the sentence vector.
Step S102: querying the stored sentence vectors of audit opinion data for sentence vectors that meet specified conditions, and clustering the queried sentence vectors.
Step S103: determining the service problem to be solved according to the sentence vectors of each category obtained by clustering.
An operation system provides various audit functions, and every audit record contains an important piece of information called the "audit opinion". Audit opinions can reflect where customer operations are non-compliant, and can also indirectly reveal unreasonable, misleading, or problematic points in an application's process design. The audit opinion data of the embodiment of the invention consists of such audit opinions, for example those produced when auditing a user's real-name authentication information, auditing a large-amount transaction order generated by a user, auditing the enterprise qualification submitted at user registration, auditing a user's application to enable a certain permission, auditing a user's invoicing application, and the like. The audit opinion data of the embodiment of the invention is not limited to the audit opinions listed above; it may be audit opinions from any scenario that requires an audit. Such data is generated by the operation team in its daily audit work rather than supplied from the user side.
In one embodiment, a timed task can periodically query and acquire the audit opinion data newly added within the most recent preset time period; semantic analysis is performed on each piece of audit opinion data to obtain a sentence vector of the newly added audit opinion data, and the sentence vector is stored in a database, wherein the database is used for storing the sentence vectors of the stored audit opinion data. The sentence vectors of the stored audit opinion data are the sentence vectors of audit opinion data stored in the database.
A piece of audit opinion data may be semantically analyzed as follows: performing word segmentation on the audit opinion data to obtain a word segmentation array; and calculating a word vector for each word in the word segmentation array by using a word vector model, and performing weighted average on each obtained word vector according to weights corresponding to the parts of speech to obtain a sentence vector of the audit opinion data.
Word segmentation can be implemented with any of various word segmentation algorithms, and the word vector model includes but is not limited to a word2vec model. Each word obtained by segmentation has a part of speech, and each part of speech has a corresponding weight. Because nouns and verbs carry the main semantics of a sentence, they are given higher weights and the other parts of speech lower weights; the specific weight of each part of speech can be set as needed in line with this principle.
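The part-of-speech weighting described above can be sketched as follows. This is a minimal illustration only: the weight values, the tag names, the toy two-dimensional word vectors, and the function name `sentence_vector` are all assumptions, and normalizing by the sum of the weights is one common reading of "weighted average" rather than something the text specifies. A real system would obtain tagged words from a word segmenter and word vectors from a trained model such as word2vec.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical part-of-speech weights: nouns and verbs weigh more.
POS_WEIGHTS: Dict[str, float] = {"noun": 1.0, "verb": 1.0}
DEFAULT_WEIGHT = 0.3  # assumed weight for every other part of speech


def sentence_vector(
    tagged_words: Sequence[Tuple[str, str]],
    word_vectors: Dict[str, List[float]],
) -> List[float]:
    """Weighted average of word vectors, weights chosen by part of speech."""
    total_weight = 0.0
    acc: List[float] = []
    for word, pos in tagged_words:
        vec = word_vectors.get(word)
        if vec is None:  # skip words the model does not know
            continue
        weight = POS_WEIGHTS.get(pos, DEFAULT_WEIGHT)
        if not acc:
            acc = [0.0] * len(vec)
        for i, value in enumerate(vec):
            acc[i] += weight * value
        total_weight += weight
    if total_weight == 0.0:
        return []  # no known words, so no sentence vector
    # Assumed normalization: divide by the sum of the weights.
    return [value / total_weight for value in acc]


# Toy 2-dimensional vectors standing in for a trained word vector model.
toy_vectors = {"upload": [1.0, 0.0], "license": [0.0, 1.0], "again": [1.0, 1.0]}
tagged = [("upload", "verb"), ("license", "noun"), ("again", "adverb")]
result = sentence_vector(tagged, toy_vectors)
print(result)
```

With these assumed weights the adverb contributes less than a third of what the verb or the noun contributes, so the sentence vector stays close to the midpoint of the two content words.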
In one embodiment, determining the service problem to be solved according to the sentence vectors of each category obtained by clustering includes: determining, as the service problem to be solved, the service problem indicated by the category that contains the largest number of sentence vectors among the categories obtained by clustering.
In another embodiment, determining a service problem to be solved according to each category sentence vector obtained by clustering includes: respectively calculating the distance between each sentence vector in each category of sentence vectors obtained by clustering and the central point vector of the category of the sentence vector; determining the convergence degree of each category of sentence vectors according to the distance (namely the distance between each sentence vector in each category of sentence vectors and the central point vector of the category of the sentence vector) and the number of each category of sentence vectors; and determining the service problem indicated by the category with the convergence degree meeting the preset condition as the service problem to be solved.
Determining the convergence degree of each category of sentence vectors according to the distance between each sentence vector in each category of sentence vectors and the central point vector of the category where the sentence vector is located and the number of each category of sentence vectors, which may specifically include: sorting the sentence vectors of each category according to the distance between the sentence vectors of each category and the central point vector of the corresponding category to obtain a sentence vector sequence of each category; and for each category sentence vector sequence, calculating the ratio of the number of the sentence vectors of which the distance is smaller than the distance threshold in the category sentence vector sequence to the total number of the category sentence vectors to obtain the convergence of the category sentence vectors.
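The convergence degree described above can be sketched in a few lines. The text does not name a distance measure, so Euclidean distance is an assumption here, as are the function names and the sample numbers.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumed choice of distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def convergence(
    members: List[Sequence[float]],
    center: Sequence[float],
    distance_threshold: float,
) -> float:
    """Share of a category's sentence vectors that lie within the threshold."""
    if not members:
        return 0.0
    # Sorted sequence of distances to the category's central point vector.
    distances = sorted(euclidean(vec, center) for vec in members)
    close = sum(1 for d in distances if d < distance_threshold)
    return close / len(distances)


# Hypothetical category: three vectors near the center, one far away.
center = [0.0, 0.0]
members = [[0.1, 0.0], [0.0, 0.2], [3.0, 4.0], [0.3, 0.0]]
score = convergence(members, center, distance_threshold=0.5)
print(score)
```

Sorting is not needed to compute the ratio itself; it is kept because the sorted sequence is also what the method describes as the per-category sentence vector sequence.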
The distance threshold may be determined as follows: constructing a plurality of sentence sets, each of which comprises a plurality of semantically similar sentences; converting the sentences in each sentence set into sentence vectors, and calculating the average sentence vector distance of each sentence set, which is the mean of the distances between every two sentence vectors in that sentence set; and averaging the average sentence vector distances of all the sentence sets to obtain the distance threshold. Preferably, the sentence sets should differ from one another semantically, that is, each sentence set has one topic and different sentence sets have different topics.
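The threshold calculation can be sketched as a mean of per-set mean pairwise distances. Euclidean distance, the function names, and the toy vectors are assumptions for illustration; in practice each set would hold the sentence vectors of hand-picked, semantically similar sentences.

```python
import math
from itertools import combinations
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumed choice of distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_pairwise_distance(vectors: List[Sequence[float]]) -> float:
    """Mean distance over every pair of sentence vectors in one set."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("a sentence set needs at least two vectors")
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


def distance_threshold(sentence_sets: List[List[Sequence[float]]]) -> float:
    """Mean of the per-set mean pairwise distances."""
    averages = [mean_pairwise_distance(s) for s in sentence_sets]
    return sum(averages) / len(averages)


# Two hypothetical sentence sets, already converted to sentence vectors.
set_one = [[0.0, 0.0], [3.0, 4.0]]              # one pair
set_two = [[0.0, 0.0], [0.0, 3.0], [4.0, 0.0]]  # three pairs
threshold = distance_threshold([set_one, set_two])
print(threshold)
```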
The preset condition according to which the service problem to be solved is determined may be set as needed, for example, the service problem indicated by the category whose convergence is greater than or equal to a certain value (between 0 and 1, for example, 0.8) is set as the service problem to be solved.
Fig. 2 is a flow diagram illustrating a process for determining a business problem according to one embodiment of the present invention.
As shown in fig. 2, the process of determining the service problem according to an embodiment of the present invention may be executed at a server, and may specifically be implemented by using a data analysis server and an operation system (a server related to the operation system), where the data analysis server and the operation system server may be the same server or different servers.
Specifically, the operation system obtains the audit opinion data, performs semantic analysis on it to obtain sentence vectors of the audit opinion data, and stores the sentence vectors in a database. The data analysis server includes a data analysis tool (analysis tool for short). The analysis tool queries the database for sentence vectors that meet specified conditions, clusters the queried sentence vectors, calculates the distance between each audit opinion and the center point of its category (that is, the distance between the sentence vector of the audit opinion and the central point vector of its category), and sorts the sentence vectors of the audit opinions in each category by that distance in ascending order, so as to determine the service problem to be solved. The flow of the present embodiment is described in detail below.
Model loading for semantic recognition, sentence vector calculation, and similar steps are relatively time-consuming and CPU-intensive (CPU: central processing unit), while the analysis of audit opinion data has no strict real-time requirement. To avoid affecting normal online business, this processing can therefore run as a timed task (for example, the audit opinion data of the previous 24 hours is processed in one batch at 2 a.m. every day). The operation system of this embodiment uses a timed task framework to perform semantic analysis on the audit opinion data newly added within a time period, generate sentence vectors of that data, and store them in the database. Most operation systems today are built on Spring, so this embodiment can be implemented with Quartz, a stable and mature timer framework.
The timed task is implemented in the following steps: writing the job code, which is the content that the timed task actually executes, specifically querying the audit opinion data within the processing period, looping over the query results, calculating the sentence vector of each piece of audit opinion data, and storing the sentence vector; configuring the trigger, which is the trigger of the timed task and holds information such as the execution time and the number of executions; configuring the scheduler, which is the scheduler of the timed task and is responsible for binding the job to the trigger; and, once these steps are complete, releasing the system online, after which the timed task takes effect.
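The job logic can be illustrated independently of any scheduling framework. The sketch below is plain Python and does not use or imitate the Quartz API; `next_run`, `processing_window`, `run_job`, the in-memory records, and the stand-in vectorizer are all hypothetical names and data. It shows how a daily 2 a.m. run might pick its 24-hour processing window and loop over the query results.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, List, Tuple


def next_run(now: datetime, hour: int = 2) -> datetime:
    """Next daily execution time (trigger role): the coming 02:00 by default."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def processing_window(run_time: datetime, hours: int = 24) -> Tuple[datetime, datetime]:
    """Half-open window [start, end) covering the hours before the run."""
    return run_time - timedelta(hours=hours), run_time


def run_job(
    run_time: datetime,
    fetch_opinions: Callable[[datetime, datetime], Iterable[Tuple[str, str]]],
    vectorize: Callable[[str], List[float]],
    store: Callable[[str, List[float]], None],
) -> int:
    """Job role: query the window, vectorize each opinion, store the result."""
    start, end = processing_window(run_time)
    count = 0
    for opinion_id, text in fetch_opinions(start, end):
        store(opinion_id, vectorize(text))
        count += 1
    return count


# In-memory stand-ins for the audit opinion table and the vector table.
records = [
    ("op-1", datetime(2020, 6, 17, 9, 0), "license photo is blurred"),
    ("op-2", datetime(2020, 6, 16, 1, 0), "invoice title does not match"),
]
vector_table: Dict[str, List[float]] = {}


def fetch(start: datetime, end: datetime) -> List[Tuple[str, str]]:
    return [(oid, text) for oid, ts, text in records if start <= ts < end]


def save(opinion_id: str, vector: List[float]) -> None:
    vector_table[opinion_id] = vector


run_at = next_run(datetime(2020, 6, 17, 15, 30))
processed = run_job(run_at, fetch, lambda text: [float(len(text))], save)
print(run_at, processed, sorted(vector_table))
```

In a Quartz-based deployment the scheduler would bind a job class containing logic like `run_job` to a trigger configured for the daily execution time; only the separation of roles is mirrored here.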
The analysis tool can query the audit opinion data to be analyzed by conditions such as time period, business line, and audit service type, and then analyze it with the selected clustering algorithm, which automatically clusters the sentence vectors of the audit opinion data. Within the data set formed by the queried audit opinion data, the tool calculates the distance between the sentence vector of each piece of audit opinion data and each centroid in the clustering result, and assigns the sentence vector to the category with the nearest centroid. It then calculates the distance between each audit opinion and the center point of its category, and the sentence vectors of all audit opinions in each category can be sorted by this distance in ascending order. Finally, a statistical chart or statistical analysis report is generated, from which the technical product manager can see which aspects the audit opinions concentrate on under the specified conditions and so propose system improvement requirements in a more targeted way.
Semantic analysis may be implemented based on the word2vec model. A sentence vector must be computed for each piece of audit opinion data for use in the subsequent cluster calculation. The implementation steps are as follows:
A Chinese word segmentation tool is used to segment the newly added audit opinion data; various open-source Chinese word segmenters are currently available. For example, for the audit opinion data "interest and hobby", segmentation yields the word segmentation array ["interest", "hobby"]. Each word in the word segmentation array is then passed into the model on its own to calculate its word vector, and a weighted average of all the word vectors is computed according to part of speech. Because nouns and verbs carry the main semantics of the sentence, they can be given higher weights and words of other parts of speech lower weights. The value obtained from the weighted average is the sentence vector of the audit opinion data. For example:
Word vector: $\mathrm{VEC}_1 = [v_{11}, v_{12}, \ldots, v_{1n}]$, weight: $k_1$
Word vector: $\mathrm{VEC}_2 = [v_{21}, v_{22}, \ldots, v_{2n}]$, weight: $k_2$
Word vector: $\mathrm{VEC}_m = [v_{m1}, v_{m2}, \ldots, v_{mn}]$, weight: $k_m$
Sentence vector:
[Formula image: Figure BDA0002543616820000091]
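Given m word vectors VEC_1 through VEC_m with weights k_1 through k_m, and given that the sentence vector is described as their weighted average, one plausible written form is shown below. The symbol VEC_s for the sentence vector is introduced here for illustration, and dividing by the sum of the weights is an assumption: the text says only "weighted average" and does not state the normalization.

```latex
\[
\mathrm{VEC}_{s} \;=\; \frac{\sum_{i=1}^{m} k_i \,\mathrm{VEC}_i}{\sum_{i=1}^{m} k_i},
\qquad
\mathrm{VEC}_i = [v_{i1}, v_{i2}, \ldots, v_{in}]
\]
```

Under this reading the j-th component of the sentence vector is the weighted average of the j-th components v_{1j}, ..., v_{mj} of the word vectors.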
The calculated sentence vector can be stored in the same database table as the audit opinion data, in a separate database table, or even in a dedicated distributed database. To avoid affecting the business processing of the audit opinion data, the second option is preferred. When a sentence vector is stored, the ID (identification) of the corresponding audit opinion data must be stored with it so that the two can be associated.
The main functions of the data analysis tool are data query, cluster calculation, result analysis, and generation of statistical charts or statistical analysis reports. It is implemented in the following steps. First, a database interface is called to query sentence vectors by a specified condition, which yields a data set. The query condition (namely the specified condition) can be customized to the user's needs, for example the audit service type or the audit time; any field that is stored in the database can be supported. Each query result consists of an audit opinion data ID and a sentence vector; the results can be held directly in memory if the data set is small, or stored in a dedicated cloud storage system if it is large. Second, the data set is read and cluster calculation is performed on it; the clustering algorithm and the number of clusters (number of categories) can be selected by the user, and the result of the cluster calculation is the central point vector of each category. Third, the distance between each sentence vector in the data set and each centroid in the clustering result is calculated, each vector is assigned to the category at the smallest distance, and a cluster statistics table is generated, in which each category can be understood as one class of service problem.
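The cluster calculation and the assignment step can be sketched with a small pure-Python k-means. The text leaves the clustering algorithm to the user, so k-means, Euclidean distance, the naive initialization from the first k vectors, and every name in the block are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(vec: Sequence[float], centers: List[Vector]) -> int:
    """Index of the central point vector closest to vec."""
    return min(range(len(centers)), key=lambda i: euclidean(vec, centers[i]))


def kmeans(vectors: List[Vector], k: int, iterations: int = 20) -> List[Vector]:
    """Plain k-means; returns the central point vector of each category."""
    centers = [list(v) for v in vectors[:k]]  # naive deterministic start
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for vec in vectors:
            groups[nearest(vec, centers)].append(vec)
        new_centers = [
            [sum(col) / len(group) for col in zip(*group)] if group else centers[i]
            for i, group in enumerate(groups)
        ]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return centers


def cluster_table(dataset: Dict[str, Vector], k: int) -> Dict[int, List[Tuple[str, float]]]:
    """Per category: (audit opinion ID, distance to center), sorted ascending."""
    centers = kmeans(list(dataset.values()), k)
    table: Dict[int, List[Tuple[str, float]]] = {i: [] for i in range(k)}
    for opinion_id, vec in dataset.items():
        idx = nearest(vec, centers)
        table[idx].append((opinion_id, euclidean(vec, centers[idx])))
    for rows in table.values():
        rows.sort(key=lambda row: row[1])
    return table


# Hypothetical query result: audit opinion data ID -> sentence vector.
dataset = {
    "a1": [0.0, 0.0], "a2": [0.0, 1.0], "a3": [1.0, 0.0],
    "b1": [10.0, 10.0], "b2": [10.0, 11.0],
}
table = cluster_table(dataset, k=2)
for category, rows in table.items():
    print(category, len(rows), rows)
```

The per-category row counts give the cluster statistics, and the sorted distances are the input needed for a convergence calculation.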
In one embodiment, the product manager can use the cluster statistical table to find the business problem on which the audit opinions are most concentrated, and take it as the starting point for analyzing and proposing new requirements, which makes it easier to address real business pain points.
In another embodiment, the distance between each sentence vector in each category obtained by clustering and the central point vector of that category is calculated. The sentence vectors of each category are then sorted by this distance to obtain a sentence vector sequence for each category. For each such sequence, the ratio of the number of sentence vectors whose distance to the central point vector of the category is less than a distance threshold to the total number of sentence vectors in the category is calculated as the convergence degree of the category. It is then determined whether the convergence degree of each category meets a preset condition; if the convergence degree of a category meets the preset condition, the service problem indicated by that category may be determined as the service problem to be solved.
For example, suppose 100 pieces of audit opinion data are obtained in total and, after clustering, three categories of service problems A, B, and C contain 50, 30, and 20 pieces respectively. Suppose the sentence vectors of the 50 pieces in category A are relatively far from the central point vector of A and relatively scattered, while the sentence vectors of the 30 pieces in category B are relatively close to the central point vector of B and relatively convergent. Although category A contains more sentence vectors, many of them may belong to audit opinion data that differs considerably in meaning from all of A, B, and C, and were assigned to A only because they happen to be nearest to its central point. The clustering result for category B is more convergent and more semantically meaningful, so the system should be optimized first for the service problem indicated by category B. By calculating the convergence degree of the sentence vectors of each category, the service problem to be solved can be determined more accurately.
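A minimal sketch of the convergence degree calculation follows. The distance metric (Euclidean), the form of the preset condition (convergence degree at or above a minimum value), the function names, and the toy data are assumptions for illustration and are not specified by the description.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def convergence_degree(vectors, centroid, distance_threshold):
    # Sorted sequence of distances to the category's central point vector.
    # Sorting mirrors the description; the ratio itself does not depend on it.
    distances = sorted(euclidean(v, centroid) for v in vectors)
    if not distances:
        return 0.0
    within = sum(1 for d in distances if d < distance_threshold)
    return within / len(distances)


def categories_to_solve(categories, distance_threshold, min_convergence):
    """categories: {name: (centroid, [sentence vectors])}.

    Assumed preset condition: convergence degree >= min_convergence.
    """
    degrees = {
        name: convergence_degree(vectors, centroid, distance_threshold)
        for name, (centroid, vectors) in categories.items()
    }
    selected = [name for name, d in degrees.items() if d >= min_convergence]
    return degrees, selected


# Toy data: category A is larger but scattered, category B is smaller but tight.
categories = {
    "A": ([0.0, 0.0], [[3.0, 0.0], [0.0, 3.0], [-3.0, 0.0], [0.5, 0.0]]),
    "B": ([10.0, 10.0], [[10.1, 10.0], [10.0, 10.2], [9.9, 10.0]]),
}
degrees, selected = categories_to_solve(
    categories, distance_threshold=1.0, min_convergence=0.8
)
```

With these toy values, only one of the four vectors in A lies within the threshold, while all three vectors in B do, so B is selected even though A has more members.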
The clustering algorithm of the embodiment of the invention may be any of various clustering algorithms; the k-means algorithm is used as an example below. In one embodiment, the clustering operation is implemented as follows. First, a value of k, that is, the number of categories, is selected. The choice of k has a large influence on the result, and there are generally two ways of choosing it: one is the elbow method, which determines the best k from the functional relationship between the clustering result and k; the other is to set k according to specific requirements, for example, when classifying a product, k may be set to the number of classes into which the product is divided by a certain attribute. Second, the initial cluster points (centroids) are selected, usually at random within the data range. Because random selection affects the result, two approaches are commonly used: one is to run the clustering multiple times and take the mean, and the other is bisecting k-means. The main idea of bisecting k-means is as follows: all points are first treated as one cluster, which is split in two; then the cluster whose split reduces the clustering cost function (that is, the sum of squared errors) the most is selected and split in two, and this continues until the number of clusters equals k. Third, the distances between every point in the data set and the centroids are calculated, and each point is assigned to the category of its nearest centroid; the mean of each cluster is then calculated and used as the new centroid. These two steps are repeated until the new centroids are identical to the previous ones, which gives the final result. The centroids on which the algorithm finally stabilizes are the central point vectors.
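The assign-and-update loop of plain k-means can be sketched in a few lines. This sketch covers only the basic iteration with random initialization from the data points; it does not implement the elbow method or bisecting k-means, and the function names, the `seed` and `max_iter` parameters, and the toy points are assumptions.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda i: euclidean(point, centroids[i]))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Initial centroids: k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Step 1: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Step 2: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            mean_vector(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids unchanged: converged
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = k_means(points, k=2)
```

Which of the two groups receives label 0 depends on the random initialization, so callers should compare labels with each other rather than rely on a particular numbering.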
The flow for determining business problems according to the embodiment of the invention analyzes the audit opinion data in the operation system through a combination of artificial intelligence algorithms, and thereby finds the points where product problems are concentrated. On this basis, the technical product manager directly produces the PRD (product requirements document). This effectively reduces the cognitive bias that arises for subjective reasons when product managers and the technical team analyze business problems, and reduces the information lost in transmission, so that the PRD directly targets the business pain points and research and development resources are not wasted. In addition, potential problems can be found before they actually occur and the system can be optimized accordingly, so that business problems are nipped in the bud and the user experience is effectively improved.
Fig. 3 is a schematic diagram of main blocks of an apparatus for determining a business problem according to an embodiment of the present invention.
As shown in fig. 3, an apparatus 300 for determining a business problem according to an embodiment of the present invention mainly includes: a semantic analysis module 301, a clustering module 302 and a business problem determination module 303.
The semantic analysis module 301 is configured to perform semantic analysis on the obtained audit opinion data to obtain a sentence vector of the audit opinion data and store the sentence vector.
The clustering module 302 is configured to query the sentence vectors meeting the condition from the stored sentence vectors of the audit opinion data, and cluster the queried sentence vectors.
The service problem determination module 303 is configured to determine a service problem to be solved according to the sentence vectors of each category obtained by clustering.
An operation system provides various audit functions, and every audit record inevitably contains an important piece of information called the "audit opinion". Audit opinions can reflect the non-standard parts of a client's operations, and can also indirectly reflect places in the application that are unreasonable, easily misleading, or flawed in process design.
In an embodiment, the semantic analysis module 301 periodically queries, through a timing task, the audit opinion data newly added within the latest preset time period, performs the semantic analysis on each piece of audit opinion data to obtain a sentence vector of the newly added audit opinion data, and stores the sentence vector in a database, where the database is used to store the sentence vectors of the stored audit opinion data.
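One run of such a timing task could be sketched as below. The record layout (`id`, `text`, `created_at`), the function names, the in-memory dictionary standing in for the database, and the placeholder analysis function are all hypothetical; a real deployment would invoke this from a scheduler and call the actual semantic analysis.

```python
from datetime import datetime, timedelta


def select_newly_added(opinions, now, period):
    # Keep only opinions created within the latest preset time period.
    window_start = now - period
    return [o for o in opinions if window_start <= o["created_at"] < now]


def run_timed_task(opinions, now, period, analyze, vector_store):
    # Analyze each newly added opinion and store its vector under its ID.
    for opinion in select_newly_added(opinions, now, period):
        vector_store[opinion["id"]] = analyze(opinion["text"])
    return vector_store


def placeholder_analyze(text):
    # Stand-in for real semantic analysis; returns a 1-dimensional "vector".
    return [float(len(text))]


# Invented sample records.
opinions = [
    {"id": 1, "text": "address incomplete", "created_at": datetime(2020, 6, 17, 9, 30)},
    {"id": 2, "text": "invoice missing", "created_at": datetime(2020, 6, 17, 11, 15)},
    {"id": 3, "text": "wrong category", "created_at": datetime(2020, 6, 17, 11, 45)},
]
store = run_timed_task(
    opinions,
    now=datetime(2020, 6, 17, 12, 0),
    period=timedelta(hours=1),
    analyze=placeholder_analyze,
    vector_store={},
)
```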
The semantic analysis module 301 may perform semantic analysis on a piece of audit opinion data by: performing word segmentation on the audit opinion data to obtain a word segmentation array; and calculating a word vector for each word in the word segmentation array by using a word vector model, and performing a weighted average of the obtained word vectors according to the weights corresponding to their parts of speech to obtain a sentence vector of the audit opinion data.
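The weighted-average step can be illustrated with a small sketch. It assumes the text has already been segmented and tagged with parts of speech, uses a tiny hand-written lookup table in place of a trained word vector model, and normalizes by the sum of the weights; the normalization, the weight values, and all names are assumptions, since the exact formula is not reproduced in this text.

```python
def sentence_vector(tokens, word_vectors, pos_weights, default_weight=1.0):
    """tokens: list of (word, part_of_speech) pairs from word segmentation.

    Returns the part-of-speech-weighted average of the word vectors.
    Words missing from the word vector table are skipped.
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, pos in tokens:
        vector = word_vectors.get(word)
        if vector is None:
            continue
        weight = pos_weights.get(pos, default_weight)
        for i, value in enumerate(vector):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0:
        return total
    return [value / weight_sum for value in total]


# Hypothetical 2-dimensional word vectors and part-of-speech weights.
word_vectors = {"invoice": [1.0, 0.0], "missing": [0.0, 1.0], "the": [1.0, 1.0]}
pos_weights = {"noun": 2.0, "verb": 1.0, "other": 0.0}
tokens = [("the", "other"), ("invoice", "noun"), ("missing", "verb")]

vec = sentence_vector(tokens, word_vectors, pos_weights)
```

Giving function words a weight of zero, as in this toy table, makes the sentence vector depend only on the content-bearing words.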
In an embodiment, the service problem determining module 303 may be specifically configured to determine, as the service problem to be solved, the service problem indicated by the category that contains the largest number of sentence vectors among the categories obtained by clustering.
In another embodiment, the service problem determining module 303 may be specifically configured to: respectively calculating the distance between each sentence vector in each category of sentence vectors obtained by clustering and the central point vector of the category of the sentence vector; determining the convergence degree of each category of sentence vectors according to the distance and the number of each category of sentence vectors; and determining the service problem indicated by the category with the convergence degree meeting the preset condition as the service problem to be solved.
The service problem determining module 303 may include a convergence determining sub-module, configured to: sort the sentence vectors of each category according to their distance to the central point vector of the corresponding category to obtain a sentence vector sequence for each category; and, for each such sequence, calculate the ratio of the number of sentence vectors whose distance to the central point vector of the category is less than the distance threshold to the total number of sentence vectors in the category, to obtain the convergence degree of the category.
The apparatus 300 for determining a business problem may further comprise a distance threshold determination module for determining the distance threshold by: building a plurality of sentence sets, wherein each sentence set comprises a plurality of sentences with similar semantics; converting the sentences in each sentence set into sentence vectors, and calculating the average sentence vector distance of each sentence set, wherein the average sentence vector distance of a sentence set is the mean of the distances between every two sentence vectors in the set; and calculating the mean of the average sentence vector distances of all the sentence sets to obtain the distance threshold.
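The threshold calculation can be sketched as follows, starting from sentence sets that have already been converted to sentence vectors. Euclidean distance, the function names, and the toy vectors are assumptions for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_pairwise_distance(vectors):
    # Mean of the distances between every two sentence vectors in one set.
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


def distance_threshold(sentence_sets):
    # Mean of the per-set average distances.
    averages = [average_pairwise_distance(s) for s in sentence_sets]
    return sum(averages) / len(averages)


# Toy sentence vectors; each inner list stands for one set of
# semantically similar sentences.
sentence_sets = [
    [[0.0, 0.0], [3.0, 4.0]],              # one pair, distance 5
    [[0.0, 0.0], [0.0, 2.0], [0.0, 4.0]],  # distances 2, 4, 2
]
threshold = distance_threshold(sentence_sets)
```

For these toy sets the per-set averages are 5 and 8/3, so the threshold is their mean, 23/6.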
According to the embodiment of the invention, the audit opinion data in the operation system is analyzed periodically through a combination of artificial intelligence algorithms, so that potential problem points in the product are found. This effectively reduces the cognitive bias that arises for subjective reasons when business problems are analyzed manually, allows the product requirements document to directly target the business pain points, and avoids wasting research and development resources. It also reduces the number of times information is passed between people, which reduces information loss and improves the accuracy of requirements. Potential problems can be found before they actually occur and the system can be optimized accordingly, which prevents business problems from being exposed to users and improves the user experience.
In addition, the detailed implementation of the apparatus for determining a business problem in the embodiment of the present invention has been described in detail in the above method for determining a business problem, and therefore, the repeated content will not be described again.
Fig. 4 illustrates an exemplary system architecture 400 of a method of determining a business problem or an apparatus for determining a business problem to which embodiments of the present invention may be applied.
As shown in fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405. The network 404 serves as a medium for providing communication links between the terminal devices 401, 402, 403 and the server 405. Network 404 may include various types of connections, such as wired links, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal devices 401, 402, 403 to interact with a server 405 over a network 404 to receive or send messages or the like. The terminal devices 401, 402, 403 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 405 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 401, 402, 403. The background management server may analyze and otherwise process received data such as a product information query request, and feed back a processing result (for example, product information; this is only an example) to the terminal device.
It should be noted that the method for determining the business problem provided by the embodiment of the present invention is generally executed by the server 405, and accordingly, the device for determining the business problem is generally disposed in the server 405.
It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use in implementing a server according to embodiments of the present application is shown. The server shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The above-described functions defined in the system of the present application are executed when the computer program is executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprises a semantic analysis module, a clustering module and a business problem determination module. The names of these modules do not form a limitation on the module itself under certain circumstances, for example, the semantic analysis module may also be described as a "module for performing semantic analysis on the obtained review comment data to obtain a sentence vector of the review comment data and saving the sentence vector".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform: performing semantic analysis on the obtained audit opinion data to obtain a sentence vector of the audit opinion data and storing the sentence vector; querying the sentence vectors meeting the condition from the stored sentence vectors of the audit opinion data, and clustering the queried sentence vectors; and determining the service problem to be solved according to the sentence vectors of each category obtained by clustering.
According to the technical scheme of the embodiment of the invention, semantic analysis is performed on the obtained audit opinion data to obtain and store sentence vectors of the audit opinion data; the sentence vectors meeting the condition are queried from the stored sentence vectors of the audit opinion data, and the queried sentence vectors are clustered; and the service problem to be solved is determined according to the sentence vectors of each category obtained by clustering. In this way, potential problem points can be found automatically, and the cognitive bias that arises for subjective reasons when business problems are analyzed manually is effectively reduced, so that business pain points are targeted directly and research and development resources are not wasted. The number of times information is passed between people is reduced, which reduces information loss and improves the accuracy of requirements. Potential problems can be found before they actually occur so that the system can be optimized, which prevents business problems from being exposed to users and harming the user experience.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (14)

1. A method for determining a business problem, comprising:
performing semantic analysis on the obtained audit opinion data to obtain a sentence vector of the audit opinion data and storing the sentence vector;
inquiring the sentence vectors meeting the conditions from the sentence vectors of the stored audit opinion data, and clustering the inquired sentence vectors;
and determining the service problem to be solved according to the sentence vectors of each category obtained by clustering.
2. The method as claimed in claim 1, wherein the newly added audit opinion data within a latest preset time period is regularly queried and obtained through a timing task, and the semantic analysis is performed on each audit opinion data to obtain a sentence vector of the newly added audit opinion data, and the sentence vector is stored in a database, wherein the database is used for storing the sentence vector of the stored audit opinion data.
3. The method according to claim 1 or 2, wherein the audit opinion data is semantically analyzed by:
performing word segmentation on the audit opinion data to obtain a word segmentation array;
and calculating a word vector for each word in the word segmentation array by using a word vector model, and performing weighted average on each obtained word vector according to weights corresponding to the parts of speech to obtain a sentence vector of the audit opinion data.
4. The method according to claim 1, wherein the determining the service problem to be solved according to the sentence vectors of each category obtained by clustering comprises:
respectively calculating the distance between each sentence vector in each category of sentence vectors obtained by clustering and the central point vector of the category of the sentence vector;
determining the convergence degree of the sentence vectors of each category according to the distance and the number of the sentence vectors of each category;
and determining the service problem indicated by the category with the convergence degree meeting the preset condition as the service problem to be solved.
5. The method of claim 4, wherein said determining the convergence of the sentence vectors of each category according to the distance and the number of the sentence vectors of each category comprises:
sorting the sentence vectors of each category according to the distance between the sentence vectors of each category and the central point vector of the corresponding category to obtain a sentence vector sequence of each category;
and for each category sentence vector sequence, calculating the ratio of the number of the sentence vectors of which the distance is smaller than the distance threshold in the category sentence vector sequence to the total number of the category sentence vectors to obtain the convergence of the category sentence vectors.
6. The method of claim 5, wherein the distance threshold is determined by:
building a plurality of sentence sets, wherein each sentence set comprises a plurality of sentences with similar semantics;
respectively converting the sentences in the sentence sets into sentence vectors, and calculating the average distance of the sentence vectors of each sentence set, wherein the average distance of the sentence vectors of the sentence sets is the average value of the distance between every two sentence vectors in the sentence sets;
and calculating the average value of the sentence vector average distance of each sentence set to obtain the distance threshold value.
7. An apparatus for determining business problems, comprising:
the semantic analysis module is used for performing semantic analysis on the obtained audit opinion data to obtain and store a sentence vector of the audit opinion data;
the clustering module is used for querying the sentence vectors meeting the conditions from the stored sentence vectors of the audit opinion data and clustering the queried sentence vectors;
and the service problem determination module is used for determining the service problem to be solved according to the sentence vectors of each category obtained by clustering.
8. The apparatus of claim 7, wherein the semantic analysis module is further configured to: periodically query, through a timing task, the audit opinion data newly added within a latest preset time period, perform the semantic analysis on each piece of audit opinion data to obtain a sentence vector of the newly added audit opinion data, and store the sentence vector in a database, wherein the database is used for storing the sentence vectors of the stored audit opinion data.
9. The apparatus according to claim 7 or 8, wherein the semantic analysis module performs semantic analysis on the audit opinion data by:
performing word segmentation on the audit opinion data to obtain a word segmentation array;
and calculating a word vector for each word in the word segmentation array by using a word vector model, and performing weighted average on each obtained word vector according to weights corresponding to the parts of speech to obtain a sentence vector of the audit opinion data.
10. The apparatus of claim 7, wherein the service problem determination module is further configured to:
respectively calculating the distance between each sentence vector in each category of sentence vectors obtained by clustering and the central point vector of the category of the sentence vector;
determining the convergence degree of the sentence vectors of each category according to the distance and the number of the sentence vectors of each category;
and determining the service problem indicated by the category with the convergence degree meeting the preset condition as the service problem to be solved.
11. The apparatus of claim 10, wherein the service problem determination module comprises a convergence determination sub-module configured to:
sorting the sentence vectors of each category according to the distance between the sentence vectors of each category and the central point vector of the corresponding category to obtain a sentence vector sequence of each category;
and for each category sentence vector sequence, calculating the ratio of the number of the sentence vectors of which the distance is smaller than the distance threshold in the category sentence vector sequence to the total number of the category sentence vectors to obtain the convergence of the category sentence vectors.
12. The apparatus of claim 11, further comprising a distance threshold determination module configured to determine the distance threshold by:
building a plurality of sentence sets, wherein each sentence set comprises a plurality of sentences with similar semantics;
respectively converting the sentences in the sentence sets into sentence vectors, and calculating the average distance of the sentence vectors of each sentence set, wherein the average distance of the sentence vectors of the sentence sets is the average value of the distance between every two sentence vectors in the sentence sets;
and calculating the average value of the sentence vector average distance of each sentence set to obtain the distance threshold value.
13. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-6.
14. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-6.
CN202010553996.4A 2020-06-17 2020-06-17 Method and device for determining service problem Pending CN113761182A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010553996.4A CN113761182A (en) 2020-06-17 2020-06-17 Method and device for determining service problem

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010553996.4A CN113761182A (en) 2020-06-17 2020-06-17 Method and device for determining service problem

Publications (1)

Publication Number Publication Date
CN113761182A true CN113761182A (en) 2021-12-07

Family

ID=78785475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010553996.4A Pending CN113761182A (en) 2020-06-17 2020-06-17 Method and device for determining service problem

Country Status (1)

Country Link
CN (1) CN113761182A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090248399A1 (en) * 2008-03-21 2009-10-01 Lawrence Au System and method for analyzing text using emotional intelligence factors
US9336192B1 (en) * 2012-11-28 2016-05-10 Lexalytics, Inc. Methods for analyzing text
CN105955965A (en) * 2016-06-21 2016-09-21 上海智臻智能网络科技股份有限公司 Question information processing method and device
CN107644323A (en) * 2017-09-30 2018-01-30 成都莲合软件科技有限公司 A kind of intelligent checks system of service-oriented stream
CN109726383A (en) * 2017-10-27 2019-05-07 普天信息技术有限公司 A kind of article semantic vector representation method and system
CN109783806A (en) * 2018-12-21 2019-05-21 众安信息技术服务有限公司 A kind of text matching technique using semantic analytic structure
CN109800307A (en) * 2019-01-18 2019-05-24 深圳壹账通智能科技有限公司 Analysis method, device, computer equipment and the storage medium of product evaluation
CN109918498A (en) * 2019-01-16 2019-06-21 平安科技(深圳)有限公司 A kind of problem storage method and device
CN110147452A (en) * 2019-05-17 2019-08-20 北京理工大学 A kind of coarseness sentiment analysis method based on level BERT neural network
CN110377900A (en) * 2019-06-17 2019-10-25 深圳壹账通智能科技有限公司 Checking method, device, computer equipment and the storage medium of Web content publication
CN110874531A (en) * 2020-01-20 2020-03-10 湖南蚁坊软件股份有限公司 Topic analysis method and device and storage medium
CN110908663A (en) * 2018-09-18 2020-03-24 北京京东尚科信息技术有限公司 Service problem positioning method and positioning device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
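Li Kairong, Lin Ying, Hang Yueqin: "Document feature extraction based on a semantic model", Computer Engineering and Applications, no. 17, 1 May 2006 (2006-05-01) *
Zhao Fan; Ma Shengli: "Research on the theoretical and practical progress of the database content structure analysis method", Information Studies: Theory & Application, no. 02, 30 March 2008 (2008-03-30) *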

Similar Documents

Publication Publication Date Title
CN108536650B (en) Method and device for generating gradient boosting tree model
US20060235885A1 (en) Selective delivery of digitally encoded news content
CN112527649A (en) Test case generation method and device
CN107392259B (en) Method and device for constructing unbalanced sample classification model
US11775894B2 (en) Intelligent routing framework
CN112861529A (en) Method and device for managing error codes
CN112257868A (en) Method and device for constructing and training integrated prediction model for predicting passenger flow
CN110852057A (en) Method and device for calculating text similarity
CN113641713A (en) Data processing method and device
CN110727759B (en) Method and device for determining theme of voice information
CN113918577B (en) Data table identification method and device, electronic equipment and storage medium
CN113778818A (en) Method, apparatus, device and computer readable medium for optimizing system
CN110852078A (en) Method and device for generating title
CN113761182A (en) Method and device for determining service problem
US20230342369A1 (en) Data processing method and apparatus, and electronic device and storage medium
CN112783615B (en) Data processing task cleaning method and device
CN112862554A (en) Order data processing method and device
CN113590322A (en) Data processing method and device
CN113066479A (en) Method and device for evaluating model
CN113779017A (en) Method and apparatus for data asset management
CN111782776A (en) Method and device for realizing intention identification through slot filling
CN113626175A (en) Data processing method and device
CN113434754A (en) Method and device for determining recommended API (application program interface) service, electronic equipment and storage medium
CN110851438A (en) Database index optimization suggestion and verification method and device
CN116361112B (en) Alarm convergence method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination