CN115357220A - Industrial APP development-oriented crowd-sourcing demand acquisition method - Google Patents


Publication number
CN115357220A
Authority
CN
China
Prior art keywords
industrial
app
demand
phrases
apps
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211007604.XA
Other languages
Chinese (zh)
Inventor
孙海龙
应昌君
齐斌航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202211007604.XA
Publication of CN115357220A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/10 Requirements analysis; Specification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a crowd-sourcing demand acquisition method for industrial APP development, built on artificial intelligence techniques. Using data collected from industrial APP platforms and the mobile APP market, it establishes a user demand acquisition method and a macroscopic demand prediction method, which together form a crowd-sourcing demand acquisition system. Because the user demands of industrial APPs differ from those of mobile APPs and industrial APP comment data is scarce, the method applies transfer learning: the model is pre-trained on a large volume of mobile APP comment data and then fine-tuned on a small volume of industrial APP comment data. Because some industrial APPs have no field classification and existing research lacks market-level demand prediction, the method performs multi-label classification of APPs from their description information and predicts macroscopic market demand from the field labels and release times of the APPs.

Description

Industrial APP development-oriented crowd-sourcing demand acquisition method
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a crowd-sourcing demand acquisition method for industrial APP development.
Background
In recent years, industrial APPs have received a great deal of attention and have been deployed in large numbers on various industrial internet platforms. However, two problems remain in industrial APP development: (1) some industrial APPs fail to meet users' actual requirements for functional and non-functional characteristics; (2) industrial APP development is unbalanced across application fields, so the needs of some fields go unmet. Accurately acquiring the real demand for industrial APPs has therefore become a key problem for their further development.
To address software demand acquisition, researchers have proposed crowd-sourcing demand acquisition methods, which extract demand information from the feedback and comments of user groups by means such as crowdsourcing and big data analysis. Most existing research on crowd-sourcing demand acquisition concentrates on mobile APPs and mainly mines user demands from mobile APP comments.
Many research efforts have attempted to solve the problem of crowd-sourcing demand acquisition for industrial APP development. To ensure that an APP meets the real demands of its user group, existing research relies on demand acquisition methods based on mobile APP comments, including demand type classification, functional demand acquisition, and non-functional demand acquisition.
Because the data characteristics of industrial APP comments differ from those of mobile APP comments and the volume of industrial APP comment data is small, existing methods cannot be applied directly to obtain user demands from industrial APP comments. Furthermore, because some industrial APPs have no field classification and existing research lacks macro-level prediction of market demand, developers cannot know the market demand of each industrial field.
To address the poor quality of many industrial APPs and their unbalanced development across fields, the invention studies a method for obtaining users' actual demands from the comments of user groups, helping developers update and improve their APPs and thereby raising industrial APP quality. It also studies a macroscopic demand prediction method for the industrial APP market, helping developers predict which industrial fields will urgently need industrial APPs in the future and thereby easing the unbalanced development across fields.
Disclosure of Invention
To this end, the invention provides a crowd-sourcing demand acquisition method for industrial APP development. Based on user comment data acquired from industrial APP platforms and the mobile APP market, two models are constructed: a user demand acquisition method and a macroscopic demand prediction method. The user demand acquisition method focuses on users and obtains demands from the user's perspective; the macroscopic demand prediction method focuses on industrial fields and obtains demands from a macroscopic perspective. The former yields key phrases extracted from user comments; the latter predicts, from the number of industrial APPs in each industrial field over a past period, which fields will urgently need industrial APPs in the future. The latter helps developers identify which fields urgently need industrial APPs before development begins, and the former helps developers iteratively update an industrial APP after it has been developed. Together they realize a crowd-sourcing demand acquisition system whose final output is the key phrases that represent user demands in the comments;
the user demand acquisition method comprises the following specific steps. First, positive and negative samples are generated: mobile APP comments are matched against the entities of a specific network information source, and the matched phrases are screened and marked as demand phrases; phrases that appear twice or more within a single mobile APP comment are also marked as demand phrases; the marked demand phrases serve as positive samples, while contiguous word sequences randomly cut from the mobile APP comments serve as negative samples. Next, features are extracted: the mobile APP comment data set, and the industrial APP comments used for transfer learning, are fed into the pre-trained model RoBERTa, which encodes the comment text; RoBERTa follows BERT in using the Transformer as its feature extractor, and the Transformer's multi-head attention mechanism is used to generate an attention map for each positive and negative sample phrase. Then demand phrases are classified: the attention map of each sample is input into a convolutional neural network (CNN) for binary classification training, and the CNN classifies each phrase as a demand phrase or a non-demand phrase according to its attention map. After CNN training is complete, industrial APP comment data is fed into the pre-trained model for encoding, an attention map is computed for each randomly generated word sequence, and the attention maps are passed to the CNN classifier to obtain demand phrases. Finally, the demand phrases are grouped by the APP to which they belong, the demand phrases from the comments under each APP are clustered, and the resulting cluster centers are the users' points of concern;
the macroscopic demand prediction method predicts which industrial fields urgently need industrial APPs from the release times and field labels of industrial APPs. It comprises two parts, multi-label classification and trend prediction: a field classification standard is selected and applied to all industrial APPs, the industrial APPs are given multiple labels based on their description information, and the fields that will urgently need industrial APPs in the future are then predicted from the labels and release times.
The positive and negative samples are generated as follows: demand phrases are extracted from the comment data in an unsupervised manner, mined automatically from the text according to the characteristics of demand phrases, and phrases appearing twice or more in one comment are taken as demand phrases.
Features are extracted as follows: the degree of association between a phrase and the other words is mined through the self-attention mechanism.
Demand phrases are classified as follows: a classifier is trained on the positive and negative sample data set, with RoBERTa extracting the features of the data set. By default RoBERTa has 12 layers with 12 attention heads each, so for a text of N words the input data is treated as an image of height and width N with 144 channels. The phrase classification problem is thereby converted into an image classification problem: given a multi-channel attention map, a two-layer convolutional neural network (CNN) model decides whether the phrase is a demand phrase.
Transfer learning is applied as follows: the model is trained on mobile APP comments and, once fully trained, is transferred to the industrial APP comment data set for fine-tuning. Because mobile APP comment data closely resembles industrial APP comment data in vocabulary frequency, text length, text format, and text semantics, and because the industrial APP comment data set is small, the parameters of the original CNN model are not retrained and only the fully connected layer is retrained. Finally, a sigmoid layer is added after the fully connected layer, and transfer learning is carried out on the industrial APP comment data.
Demand phrases are clustered as follows: the unsupervised K-means algorithm clusters the demand phrases, with the distance between words computed from cosine similarity.
In the multi-label classification part, a large amount of APP description information is crawled from the industrial internet platforms. The description information indicates the industrial fields to which an industrial APP applies, and the industrial APPs are classified into fields on that basis. A model is first trained on the existing labeled data and then applied to the unlabeled data. Multi-label classification uses a combination of ALBERT and TextCNN: ALBERT encodes the data set, and TextCNN extracts features from the encoded text vectors by applying convolution and pooling with different convolution kernels and then concatenating the results. To prevent overfitting, a portion of the data is discarded during training. Finally, the data is input into a fully connected layer and a sigmoid function produces the multi-label classification result.
The trend prediction part performs macroscopic trend prediction from the number of field labels and the release times of industrial APPs. Taking one month as the time unit, it counts the number of industrial APPs in each field at the end of each month and, from the monthly counts to date, predicts the number of industrial APPs in each field over the next three months using a polynomial regression equation; one polynomial regression equation is generated per field with the sklearn package for Python. The average growth rate of industrial APPs over the past year is 5.8%. After the future change in the number of industrial APPs has been predicted for each industrial field, the fields whose predicted average growth rate exceeds 10% are selected and regarded as the industrial fields with vigorous demand.
The technical effects to be realized by the invention are as follows:
Because the user demands of industrial APPs differ from those of mobile APPs and industrial APP comment data is scarce, the invention applies transfer learning: the model is pre-trained on a large volume of mobile APP comment data and then fine-tuned on a small volume of industrial APP comment data. Because some industrial APPs have no field classification and existing research lacks market-level demand prediction, the invention performs multi-label classification of APPs from their description information and predicts macroscopic market demand from the field labels and release times of the APPs.
Drawings
FIG. 1 is the overall framework diagram;
FIG. 2 is the demand acquisition method framework based on industrial APP comments;
FIG. 3 is the demand prediction method framework;
FIG. 4 is the ALBERT + TextCNN method diagram;
FIG. 5 is the trend prediction method framework.
Detailed Description
The following is a preferred embodiment of the present invention, further described with reference to the accompanying drawings; the present invention is not limited to this embodiment.
The invention provides a crowd-sourcing demand acquisition method for industrial APP development. FIG. 1 shows the overall framework of the crowd-sourcing demand acquisition system. Based on data obtained from the mobile APP market and from domestic and foreign industrial APP platforms, the invention provides a comment-based demand acquisition method and a macroscopic demand prediction method based on industrial APP description information and release times.
The method comprises two parts, user demand acquisition and macroscopic demand prediction, and targets the application scenario of industrial APP development. The user demand acquisition method mines, from user comments, the key phrases that represent user demands. Its input is user comments in English text format, and its output is the key phrases representing user demands in those comments. In a concrete scenario, for a certain industrial APP, the user comment "if sound more models added to it floor better than that of the real a floor organization" is input; the method performs demand extraction on the comment and finds that the user wants more example models to be added, so the output is "example models added". Macroscopic demand prediction predicts which fields urgently need industrial APPs from the number of industrial APPs developed in each field over a past period. There are 17 industrial fields in total, including "electronic information" and "wind power photovoltaic". The output of macroscopic demand prediction is therefore a set of specific industrial fields. The first part yields key phrases extracted from user comments; the second predicts, from the number of industrial APPs in each industrial field over a past period, which fields will urgently need industrial APPs in the future. The latter helps developers identify which fields urgently need industrial APPs before development begins, and the former helps developers iteratively update an industrial APP after it has been developed.
1. User demand acquisition method
One problem facing industrial APPs today is that they fail to meet users' actual demands. Existing research derives user demands mainly through demand classification, functional demand acquisition, and non-functional demand acquisition, and essentially revolves around mobile APP comments. However, industrial APP comment data is scarce, and the demands of industrial APP users differ from those of mobile APP users, so existing methods cannot be applied directly to industrial APPs. The invention therefore uses transfer learning to overcome the small volume of industrial APP data: the model is pre-trained on a large number of mobile APP comments and then fine-tuned on industrial APP comments. In addition, because mobile APP comment data is plentiful but unlabeled, the invention uses unsupervised learning to generate a large number of samples automatically from the comment data according to fixed rules, where every positive sample is a phrase representing a user demand. With this unsupervised, transfer-learning-based phrase mining method, users' real demands are obtained from the comments, addressing the failure of industrial APPs to meet users' actual demands. FIG. 2 shows the overall framework of the method. It consists of five parts: positive and negative sample generation, feature extraction, demand phrase classification, transfer learning, and demand phrase clustering.
The method is proposed to address the small volume of industrial APP comment data. Its innovation is to handle the small sample through transfer learning: the model is trained on mobile APP comments, for which many samples exist, and then fine-tuned on industrial APP comments.
Positive and negative sample generation. The invention extracts demand phrases from the comment data in an unsupervised manner, without relying on external manual labeling, and the positive and negative samples are generated fully automatically. A characteristic of demand phrases is that, in general, a phrase repeated within a text often marks the demand phrase of that passage, where a passage is a series of sentences built around a certain core term. Accordingly, the invention selects phrases that appear twice or more within one comment as demand phrases.
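The repetition rule can be illustrated with a minimal sketch. The function names, the three-word length limit, and the sample comment are assumptions made for illustration and do not come from the patent; the matching of comments against an external entity source is omitted, and removing positives from the negative pool is a simplification added here.

```python
from collections import Counter
import random


def ngrams(words, max_len=3):
    """Yield every contiguous word sequence of 1..max_len words as a tuple."""
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])


def positive_phrases(comment, max_len=3):
    """Return phrases that occur at least twice within a single comment."""
    words = comment.lower().split()
    counts = Counter(ngrams(words, max_len))
    return {" ".join(p) for p, c in counts.items() if c >= 2}


def negative_phrases(comment, k, positives, max_len=3, seed=0):
    """Randomly sample contiguous word sequences that are not positives.

    Excluding positives from the pool is an assumption of this sketch.
    """
    words = comment.lower().split()
    pool = sorted({" ".join(p) for p in ngrams(words, max_len)} - positives)
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


# Hypothetical comment, used only to exercise the rule.
comment = "the dark mode is great but dark mode drains battery"
pos = positive_phrases(comment)
neg = negative_phrases(comment, 3, pos)
print(sorted(pos))
print(neg)
```

In this toy comment the repeated sequences are "dark", "mode", and "dark mode". A practical version would probably filter single common words, which the patent text does not specify.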
Feature extraction. Feature extraction obtains from the text the features that distinguish demand phrases from non-demand phrases. These features help determine what role each sample phrase plays in the text and how important it is. A demand phrase is the central word sequence around which a series of sentences is organized, and because it is the center around which the other words revolve, it must be strongly related to the other words in the text. Existing work shows that the self-attention mechanism reveals the attention distribution of a sentence well, so the degree of association between a phrase and the other words can be mined through self-attention.
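As a rough illustration of what an attention map is, the sketch below computes a single-head scaled dot-product attention matrix in plain Python. It is not the RoBERTa computation: the toy embeddings are invented, the learned query and key projections are replaced by the identity, and the helper attention_received is a hypothetical summary statistic, all assumptions of this sketch.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(embeddings):
    """Return the N x N matrix softmax(E E^T / sqrt(d)), one row per query word."""
    d = len(embeddings[0])
    scale = math.sqrt(d)
    rows = []
    for q in embeddings:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in embeddings]
        rows.append(softmax(scores))
    return rows


def attention_received(attn, span):
    """Average attention mass that all words place on the words in span [start, end)."""
    start, end = span
    n = len(attn)
    return sum(attn[i][j] for i in range(n) for j in range(start, end)) / n


# Toy 2-dimensional embeddings for a 4-word text (invented values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
attn = attention_map(embeddings)
print(attention_received(attn, (2, 3)))  # attention mass received by the third word
```

Each row of the matrix sums to 1, so a phrase that many other words attend to receives a larger share of the total mass than a peripheral phrase.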
Demand phrase classification. Demand phrase classification constructs a classifier that determines whether a phrase is a demand phrase. The invention trains the classifier on the positive and negative sample data sets, with RoBERTa [1] performing feature extraction on the data set. By default RoBERTa has 12 layers with 12 attention heads each, so for a text of N words the input data can be regarded as an image of height and width N with 144 channels. The phrase classification problem is thus converted into an image classification problem: given a multi-channel attention map, the invention uses a two-layer convolutional neural network (CNN) model to decide whether the phrase is a demand phrase.
Transfer learning. Transfer learning here means training the model on mobile APP comments and, once it is fully trained, transferring it to the industrial APP comment data set for fine-tuning. Because mobile APP comment data closely resembles industrial APP comment data in vocabulary frequency, text length, text format, and text semantics, and because the industrial APP comment data set is small, the parameters of the original CNN model are not retrained; only the fully connected layer is retrained. Finally, a sigmoid layer is added after the fully connected layer, and transfer learning is carried out on the industrial APP comment data.
Demand phrase clustering. Demand phrase clustering groups the keywords from different comments under the same APP. Each APP has many comments, and each comment may contain several keywords, so handing all of these keywords to developers would only add to their burden. The demand phrases are therefore clustered with the unsupervised K-means algorithm. K-means is a classic distance-based clustering algorithm; here the distance between words is computed from cosine similarity.
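A minimal sketch of K-means with a cosine-based distance follows. The patent does not say how phrases are turned into vectors, so the 2-dimensional vectors here are invented; normalizing points to unit length and seeding the centers with the first k points are simplifications assumed for a deterministic example.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine_distance(a, b):
    """1 - cosine similarity; both inputs are assumed to be unit-length."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def kmeans_cosine(vectors, k, iterations=20):
    """Cluster vectors with K-means using cosine distance.

    Centers are seeded with the first k points (an assumption of this sketch).
    """
    points = [normalize(v) for v in vectors]
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by cosine distance.
        labels = [
            min(range(k), key=lambda c: cosine_distance(p, centers[c]))
            for p in points
        ]
        # Update step: re-normalized mean of each cluster's members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centers[c] = normalize(mean)
    return labels, centers


# Invented phrase vectors: two point roughly along x, two roughly along y.
vectors = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9]]
labels, centers = kmeans_cosine(vectors, k=2)
print(labels)
```

The phrase nearest each returned center could then be reported to developers as a user point of concern.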
The complete procedure is as follows. In the first step, the invention matches the mobile APP comments against entities in Wikipedia [2], screens the matched phrases, and marks them as demand phrases: Wikipedia entities are mostly nouns outside everyday vocabulary, and when one appears in a comment it is usually an aspect of the APP that the user cares about. Phrases that appear twice or more within each mobile APP comment are then screened and also marked as demand phrases. The marked demand phrases serve as positive samples. Next, contiguous word sequences are randomly cut from the mobile APP comments and used as negative samples. The mobile APP comment data set is fed into the pre-trained model RoBERTa, which encodes the comment text; RoBERTa follows BERT in using the Transformer as its feature extractor, and the Transformer's multi-head attention mechanism generates an attention map for each positive and negative sample phrase. The attention map of each sample is then input into the convolutional neural network (CNN) for binary classification training. Because CNNs are widely used for image recognition and classification and are well suited to image classification, the invention uses the CNN to classify each phrase as a demand phrase or a non-demand phrase according to its attention map.
Once CNN training is complete, the whole model is trained. Industrial APP comment data is then randomly cut into a large number of contiguous word sequences; for example, 'this APP needed path method' is split into contiguous word sequences such as 'this APP', 'APP needed', 'path method' and 'need path'. The industrial APP comment data is fed into the pre-trained model for encoding, an attention map is computed for each of the randomly generated word sequences, and the attention maps are passed to the CNN classifier to obtain demand phrases. Finally, the demand phrases are grouped by the APP to which they belong, the demand phrases from the comments under each APP are clustered, and the resulting cluster centers are the users' points of concern.
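The inference flow described above (cut candidate sequences, classify them, group the survivors by APP) can be sketched as follows. The classifier is passed in as a plain callable that stands in for the trained RoBERTa + CNN model; the stand-in used here, the app names, and the fixed two-word window are all hypothetical, and the sketch enumerates every window rather than sampling at random.

```python
from collections import defaultdict


def candidate_sequences(comment, n=2):
    """Return every contiguous n-word sequence in the comment."""
    words = comment.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def extract_by_app(reviews, is_demand_phrase, n=2):
    """Group accepted candidate phrases by the APP they were written about.

    reviews: iterable of (app_name, comment) pairs.
    is_demand_phrase: callable standing in for the trained classifier.
    """
    grouped = defaultdict(list)
    for app, comment in reviews:
        for phrase in candidate_sequences(comment, n):
            if is_demand_phrase(phrase):
                grouped[app].append(phrase)
    return dict(grouped)


# Hypothetical stand-in for the classifier: accepts one known phrase only.
def toy_classifier(phrase):
    return phrase == "path method"


reviews = [
    ("AppA", "this APP needed path method"),
    ("AppB", "works fine"),
]
result = extract_by_app(reviews, toy_classifier)
print(result)
```

The per-APP lists produced this way are the input to the clustering step.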
2. Macroscopic demand prediction method
To address the failure of existing industrial APPs to meet the needs of some industrial fields, the method predicts which industrial fields urgently need industrial APPs from the release times and field labels of industrial APPs. Because domestic industrial APP platforms classify industrial APPs into fields differently, and some platforms do not classify them at all, a field classification standard must be selected and applied to all industrial APPs. One industrial APP may carry several field labels at once, and its description information indicates the fields to which it applies. The method therefore first performs multi-label classification of industrial APPs from their description information and then predicts, from the labels and release times, which fields will urgently need industrial APPs in the future. FIG. 3 shows the demand prediction method designed by the invention.
The method mainly comprises two parts of multi-label classification and trend prediction.
Multi-label classification. The method crawls a large amount of APP description information from the industrial internet platforms. The description information indicates the industrial fields to which an industrial APP applies, and the industrial APPs are classified into fields on that basis. The crawl results show that some platforms assign field labels to their APPs, but many platforms do not. Since the demand prediction analysis should cover as much of the published domestic industrial APP data as possible, the method first trains a model on the existing labeled data and then applies it to the unlabeled data. Unlike multi-class classification, multi-label classification may assign one object to several categories: for example, the same industrial APP may carry the labels smart agriculture, water conservancy detection, and smart country at once, spanning several fields. An industrial APP may belong to several labels simultaneously or to none, which is a situation multi-label classification has to handle. The method performs multi-label classification with a combination of ALBERT [3] and TextCNN. ALBERT encodes the data set; TextCNN extracts features from the encoded text vectors, applying convolution and pooling with different convolution kernels and then concatenating the results. To prevent overfitting, the method discards a portion of the data during training so that it does not participate in training. Finally, the data is input into a fully connected layer, and a sigmoid function produces the multi-label classification result. FIG. 4 illustrates the multi-label classification method.
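The final sigmoid step is what allows an APP to receive several labels or none. A minimal sketch, assuming one output score per field and a decision threshold of 0.5 (the threshold and the score values are assumptions of this sketch; the label names are taken from the example in the text):

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, label_names, threshold=0.5):
    """Return every label whose independent sigmoid probability meets the threshold."""
    return [name for name, z in zip(label_names, logits) if sigmoid(z) >= threshold]


label_names = ["smart agriculture", "water conservancy detection", "smart country"]

# Invented fully-connected-layer outputs for two APP descriptions.
several = predict_labels([2.0, 0.3, -1.5], label_names)
none = predict_labels([-1.0, -2.0, -3.0], label_names)
print(several)
print(none)
```

Because each label is scored independently, the probabilities need not sum to 1, unlike a softmax over mutually exclusive classes.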
Trend prediction. After each industrial APP has been assigned field labels from its description information, the method performs macroscopic trend prediction from the number of field labels and the release times of industrial APPs, helping developers learn which industrial fields will be in vigorous demand. The industrial APP release times used by the method span August 15, 2017 to March 30, 2022. Taking one month as the time unit, the method counts the number of industrial APPs in each field at the end of each month and, from the monthly counts to date, predicts the number of industrial APPs in each field over the next three months. Because the model input is simple, with the past number of industrial APPs as the only factor in predicting the future number, the method uses a polynomial regression equation for prediction; one polynomial regression equation is generated per field with the sklearn package for Python. The average monthly growth rate of industrial APPs over the past year is 5.8%. After the change in the number of industrial APPs over the next three months has been predicted for each industrial field, the method selects the fields whose predicted average monthly growth rate exceeds 10% and regards them as the industrial fields with vigorous demand. FIG. 5 shows the framework of the trend prediction method.
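The screening rule can be sketched end to end. The patent fits the regression with sklearn; to stay dependency-free, this sketch substitutes a small hand-written least-squares fit, which is a swapped-in technique and not the patent's code. The polynomial degree of 2, the month-over-month definition of growth rate, the use of the fitted value for the last observed month as the baseline, and the monthly counts are all assumptions.

```python
def fit_polynomial(ys, degree=2):
    """Least-squares fit of ys against x = 0, 1, 2, ...; coefficients lowest power first."""
    n = degree + 1
    xs = range(len(ys))
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    a = [[float(sum(x ** (i + j) for x in xs)) for j in range(n)] for i in range(n)]
    b = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at month index x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def average_monthly_growth(ys, horizon=3, degree=2):
    """Mean month-over-month growth rate over the next `horizon` predicted months."""
    coeffs = fit_polynomial(ys, degree)
    last = len(ys) - 1
    values = [predict(coeffs, last + h) for h in range(horizon + 1)]
    rates = [(values[h + 1] - values[h]) / values[h] for h in range(horizon)]
    return sum(rates) / horizon


def high_demand_fields(counts_by_field, threshold=0.10):
    """Fields whose predicted average monthly growth rate exceeds the threshold."""
    return [f for f, ys in counts_by_field.items()
            if average_monthly_growth(ys) > threshold]


# Invented end-of-month APP counts for two fields named in the text.
counts = {
    "electronic information": [100, 101, 102, 103, 104, 105],
    "wind power photovoltaic": [1, 2, 5, 10, 17, 26],
}
hot = high_demand_fields(counts)
print(hot)
```

With these invented counts only the second field is predicted to grow by more than 10% per month. Fields with a zero or negative fitted baseline would need special handling that this sketch leaves out.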
To address the lack of methods for obtaining market demand for industrial APPs in existing research, the invention provides a macroscopic demand prediction method based on the description information and release time of industrial APPs, which helps developers predict which industrial fields urgently need industrial APPs. The method classifies industrial APPs into fields from their description information using multi-label classification, and then predicts market demand from the classification results and release times, thereby identifying the fields likely to be in strong demand in the future.

Claims (8)

1. A crowd-sourcing demand acquisition method for industrial APP development, characterized in that: based on user comment data obtained from industrial APP platforms and the mobile APP market, two models are constructed, namely a user demand acquisition method and a macroscopic demand prediction method; the user demand acquisition method focuses on users and obtains demands from user comments, and the macroscopic demand prediction method focuses on industrial fields and obtains demands from a macroscopic view, thereby realizing a crowd-sourcing demand acquisition system; key phrases are obtained from user comments, the industrial fields that will urgently need industrial APPs are predicted from the distribution of the number of industrial APPs across industrial fields over a past period, and the key phrases representing user demands in the comments and the industrial fields that urgently need industrial APPs are finally output;
the user demand acquisition method comprises the following specific steps: first, positive and negative samples are generated: mobile APP comments are matched against the entities of a specific network information source, and the matched phrases are marked as demand phrases; phrases that appear twice or more within a single mobile APP comment are also marked as demand phrases; the marked demand phrases serve as positive samples, and continuous word sequences randomly cut from the mobile APP comments serve as negative samples; then, feature extraction is performed: the mobile APP comment data set and the industrial APP comments used for transfer learning are fed into the pre-trained model RoBERTa, which encodes the comment text; RoBERTa is based on BERT and uses the Transformer as its feature extractor, and the multi-head attention mechanism of the Transformer is used to generate a corresponding attention map for each positive and negative sample phrase; then, demand phrase classification is performed: after the attention map of each sample is obtained, the attention maps are input into a convolutional neural network (CNN) for classification training, and the CNN classifies each phrase as a demand phrase or a non-demand phrase according to its attention map; after CNN training is finished, the industrial APP comment data are encoded by the pre-trained model, an attention map is computed for each word sequence randomly generated as above, and the attention maps are fed into the CNN classifier to obtain demand phrases; the demand phrases are grouped by the APP to which they belong, the demand phrases of the comments under each APP are clustered, and the resulting cluster centers are the users' points of attention;
the macroscopic demand prediction method predicts which industrial fields urgently need industrial APPs from the release times and field labels of industrial APPs, and comprises two parts, multi-label classification and trend prediction: a field classification standard is selected for labeling all industrial APPs, multi-label classification is performed on the industrial APPs using their description information, and the fields that will urgently need industrial APPs are then predicted from the labels and release times and fed back to the user.
2. The crowd-sourcing demand acquisition method for industrial APP development according to claim 1, characterized in that the positive and negative sample generation method is as follows: demand phrases are extracted from the comment data in an unsupervised manner; starting from the characteristics of the data, demand phrases are mined automatically from the text according to their characteristics, and phrases appearing twice or more in one comment are taken as demand phrases.
3. The crowd-sourcing demand acquisition method for industrial APP development according to claim 2, characterized in that the feature extraction method is as follows: the degree of association between a phrase and other words is mined through a self-attention mechanism.
4. The crowd-sourcing demand acquisition method for industrial APP development according to claim 3, characterized in that the demand phrase classification method is as follows: a classifier is trained on the positive and negative sample data set; in the feature extraction, RoBERTa performs feature extraction on the data set; RoBERTa has 12 layers by default, each with 12 attention heads, so for a text of N words the input data is regarded as an image of height and width N with 144 channels; the phrase text classification problem is thereby converted into an image classification problem, in which a multi-channel attention map is given and it is judged whether the phrase is a demand phrase; a two-layer convolutional neural network (CNN) model classifies the multi-channel attention maps.
5. The crowd-sourcing demand acquisition method for industrial APP development according to claim 4, characterized in that the transfer learning technique is as follows: the model is trained on mobile APP comments; after the model is fully trained, it is migrated to the industrial APP comment data set for fine-tuning; because mobile APP comment data is highly similar to industrial APP comment data, with vocabulary frequency, text length, text format, and text semantics all very close, and because the industrial APP comment data set is small, the parameters of the original CNN model are not retrained; only the fully connected layer is retrained, and a sigmoid layer is added after the fully connected layer, so that learning is transferred to the industrial APP comment data.
6. The crowd-sourcing demand acquisition method for industrial APP development according to claim 5, characterized in that the demand phrase clustering method is as follows: the demand phrases are clustered with the unsupervised K-means algorithm, and the distance between words is computed with cosine similarity.
7. The crowd-sourcing demand acquisition method for industrial APP development according to claim 6, characterized in that: the multi-label classification part crawls a large amount of APP description information from industrial internet platforms; the description information indicates the industrial fields for which each industrial APP is suited, and the industrial APPs are classified into fields from the APP description information; a model is first trained on the existing labeled data and then applied to the unlabeled data; multi-label classification is performed with a combination of ALBERT and TextCNN, in which ALBERT encodes the data set and TextCNN extracts features from the encoded text vectors, different convolution kernels each performing convolution and pooling before their outputs are concatenated; to prevent the model from overfitting, part of the data is discarded during training; finally, the result is input to a fully connected layer, and a sigmoid function yields the multi-label classification result.
8. The crowd-sourcing demand acquisition method for industrial APP development according to claim 7, characterized in that: the trend prediction part performs macroscopic trend prediction from the number of APPs per field label and the release times of the industrial APPs; with the month as the time unit, the number of industrial APPs in each field is counted as of the end of each month, and the number of industrial APPs in each field over the next three months is predicted from the past monthly counts using polynomial regression, one polynomial regression equation being generated per field with the sklearn package for Python; the average growth rate of industrial APPs over the past year is 5.8%; after the future change in the number of industrial APPs is predicted for each industrial field, the fields whose average growth rate in the prediction result exceeds 10% are selected and regarded as industrial fields in strong demand.
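The positive and negative sample generation recited in claims 1 and 2 can be sketched as below. This is a minimal illustration under stated assumptions, not the claimed implementation: phrases are modelled as word n-grams of 2 to 4 words, tokenization is a plain whitespace split, negatives that coincide with a positive phrase are skipped, and the entity matching against a network information source is omitted. All function names and the sample comment are hypothetical.

```python
import random
from collections import Counter


def ngrams(words, n):
    """All contiguous word sequences of length n."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def positive_phrases(comment, min_len=2, max_len=4):
    """Phrases (word n-grams) that occur at least twice within one comment."""
    words = comment.lower().split()
    found = set()
    for n in range(min_len, max_len + 1):
        counts = Counter(ngrams(words, n))
        found.update(" ".join(g) for g, c in counts.items() if c >= 2)
    return found


def negative_phrases(comment, k, positives, rng, min_len=2, max_len=4):
    """Randomly cut up to k contiguous word sequences from the comment.

    Skipping sequences that are already positive samples is an assumption
    of this sketch; the claims only say the sequences are cut at random.
    """
    words = comment.lower().split()
    negatives = set()
    if len(words) < min_len:
        return negatives
    attempts = 0
    while len(negatives) < k and attempts < 100 * k:
        attempts += 1
        n = rng.randint(min_len, min(max_len, len(words)))
        start = rng.randint(0, len(words) - n)
        phrase = " ".join(words[start:start + n])
        if phrase not in positives:
            negatives.add(phrase)
    return negatives


# Hypothetical comment, made up for this example.
COMMENT = ("the export report fails and the export report "
           "is slow when I open the app")

positives = positive_phrases(COMMENT)
print(sorted(positives))
# ['export report', 'the export', 'the export report']

negatives = negative_phrases(COMMENT, 3, positives, random.Random(0))
print(sorted(negatives))
```

Seeding the random generator only makes the sketch repeatable; the positive samples are deterministic, while the negative samples depend on the seed.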
CN202211007604.XA 2022-08-22 2022-08-22 Industrial APP development-oriented crowd-sourcing demand acquisition method Pending CN115357220A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211007604.XA CN115357220A (en) 2022-08-22 2022-08-22 Industrial APP development-oriented crowd-sourcing demand acquisition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211007604.XA CN115357220A (en) 2022-08-22 2022-08-22 Industrial APP development-oriented crowd-sourcing demand acquisition method

Publications (1)

Publication Number Publication Date
CN115357220A (en) 2022-11-18

Family

ID=84002249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211007604.XA Pending CN115357220A (en) 2022-08-22 2022-08-22 Industrial APP development-oriented crowd-sourcing demand acquisition method

Country Status (1)

Country Link
CN (1) CN115357220A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116932853A (en) * 2023-07-25 2023-10-24 重庆邮电大学 User demand acquisition method based on APP comment data

Similar Documents

Publication Publication Date Title
CN109934293B (en) Image recognition method, device, medium and confusion perception convolutional neural network
CN111897908B (en) Event extraction method and system integrating dependency information and pre-training language model
US7657089B2 (en) Automatic classification of photographs and graphics
CN110083700A (en) A kind of enterprise's public sentiment sensibility classification method and system based on convolutional neural networks
Naranjo-Alcazar et al. Acoustic scene classification with squeeze-excitation residual networks
CN106886580B (en) Image emotion polarity analysis method based on deep learning
CN107085581A (en) Short text classification method and device
CN109919252B (en) Method for generating classifier by using few labeled images
CN109344884A (en) The method and device of media information classification method, training picture classification model
CN110188195B (en) Text intention recognition method, device and equipment based on deep learning
CN104063683A (en) Expression input method and device based on face identification
CN104834940A (en) Medical image inspection disease classification method based on support vector machine (SVM)
CN110910283A (en) Method, device, equipment and storage medium for generating legal document
CN107480723B (en) Texture Recognition based on partial binary threshold learning network
CN109784368A (en) A kind of determination method and apparatus of application program classification
CN110751191A (en) Image classification method and system
CN112861524A (en) Deep learning-based multilevel Chinese fine-grained emotion analysis method
CN112990282B (en) Classification method and device for fine-granularity small sample images
CN112507800A (en) Pedestrian multi-attribute cooperative identification method based on channel attention mechanism and light convolutional neural network
CN113673607A (en) Method and device for training image annotation model and image annotation
CN116842194A (en) Electric power semantic knowledge graph system and method
CN112786160A (en) Multi-image input multi-label gastroscope image classification method based on graph neural network
CN115357220A (en) Industrial APP development-oriented crowd-sourcing demand acquisition method
CN115954001A (en) Speech recognition method and model training method
CN115345243A (en) Text classification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination