CN115994261B

CN115994261B - Numerical value recommendation method in form linkage change

Info

Publication number: CN115994261B
Application number: CN202211414329.3A
Authority: CN
Inventors: 何一帆; 陈林; 牟红兵; 鲁聪
Original assignee: Guangdong Hotent Software Co ltd
Current assignee: Guangdong Hotent Software Co ltd
Priority date: 2022-11-11
Filing date: 2022-11-11
Publication date: 2023-07-07
Anticipated expiration: 2042-11-11
Also published as: CN115994261A

Abstract

The application provides a numerical value recommendation method in form linkage change, comprising the following steps: acquiring form data entered by users and constructing a user input matrix; identifying user similarity from the user input matrix; determining the form association degree from the form topic relevance, the form structure and the form content; recommending numerical values according to the form association degree and the user similarity; acquiring the title attributes of forms and determining the master-slave relationships between forms; using the periodicity of the forms' title attributes and the master-slave relationships to determine whether a form under test and its master or slave forms need to be updated; after a form is updated, determining whether the recommended values need to be updated and obtaining the update period; acquiring user operation behavior, computing the numerical accuracy and estimating the risk level of recommended-value errors; and re-recommending numerical values according to the numerical accuracy and the error risk level.

Description

Numerical value recommendation method in form linkage change

Technical Field

The invention relates to the field of information technology, and in particular to a numerical value recommendation method in form linkage change.

Background

A low-code application development system can generate usable content directly once a form control is dragged onto a form. For form controls that hold numerical content, however, usually only the control type is generated, not the value itself: a date control is dragged in and the user then fills in the date, or an amount control is dragged in and the user fills in the amount. When there is a lot of numerical content to fill in, two problems arise: entering it all is laborious, and mistakes are easy to make, especially in forms consisting entirely of numbers or laid out with several kinds of numerical values. Values previously entered by the same user, or by different users in the same post at the same company, can support some recommendations for the content to be filled in, but it is difficult to discover which filled-in forms are related to one another, or how a main form relates to its sub-forms. In addition, some values are updated between forms at a certain frequency: a price may be updated monthly or weekly, whereas a birthday is rarely updated again; the names of customers buying consumer-facing (to-C) goods may differ every time, while customers buying business-facing (to-B) goods have higher repurchase rates and change infrequently. Values that are not updated can simply keep being recommended, but inferring which values need to be updated, and how to update them, remains an unsolved problem. Predicting how severe an error would be after a recommendation is also an urgent problem: a wrong birthday recommendation has little impact, whereas a wrong price recommendation that the user accepts by default may cause serious problems later. The error and risk of value-linked recommendations therefore also need to be estimated. It is thus desirable to propose a numerical recommendation method that takes form linkage changes into account.

Disclosure of Invention

The invention provides a numerical value recommendation method in form linkage change, which mainly comprises the following steps:

acquiring form data entered by users and constructing a user input matrix; identifying user similarity from the user input matrix; determining the form association degree from the form topic relevance, the form structure and the form content, which specifically comprises calculating the form topic relevance, classifying the form structure with a decision tree, and acquiring the title attributes of form controls to calculate the form content similarity; recommending numerical values according to the form association degree and the user similarity, which specifically comprises determining a recommended numerical sequence from the user similarity; acquiring the title attributes of forms and determining the master-slave relationships between forms; using the periodicity of the title attributes and the master-slave relationships to determine whether the form under test and its master or slave forms need to be updated; after a form is updated, determining whether the recommended values need to be updated and obtaining the update period; acquiring user operation behavior, computing the numerical accuracy and estimating the risk level of recommended-value errors; and re-recommending numerical values according to the numerical accuracy and the error risk level.

Further optionally, acquiring the form data entered by the user and constructing the user input matrix comprises:

acquiring all content entered by users in forms; and constructing a user input model that contains a user input matrix M. M records how many times each input value occurs when users fill in forms: it is an m×n matrix in which U is the user set, I is the set of input values in the form, and S_ij is the number of times user i has entered input value j when filling in the form.

Further optionally, identifying user similarity from the user input matrix comprises:

acquiring the historical filling records of a target user and converting them into the target user's user input matrix M; on the basis of M, dynamically obtaining, by the Pearson correlation coefficient method, the similarity set S between the target user Ua and the user set U, i.e. the set of similarity values between the historical filling records of Ua and those of each user in U. To do so, traverse U, calculate the similarity between Ua and each user, and collect the values in S. The elements Sm of S are sorted in descending order; the larger Sm, the more similar the two users, and the smaller Sm, the less similar.

Further optionally, determining the form association degree from the form topic relevance, the form structure and the form content comprises:

determining the form association degree comprises calculating the form topic relevance, classifying the form structure with a decision tree, and calculating the form content similarity. The topic of a form is determined from its title, and title similarity is used as the form topic similarity value. The decision-tree structure classification determines whether a form is a numerical entry form, marking it 1 if so and 0 otherwise. Form content similarity is determined from the title attributes of the corresponding numerical input controls in the forms. The three results are given weights w1, w2 and w3, and form association degree = w1 × form topic similarity + w2 × form structure classification result + w3 × form content similarity, where the three weights are tuned through repeated testing. The step comprises: calculating the form topic relevance; classifying the form structure with a decision tree; and acquiring the title attributes of the form controls and calculating the form content similarity;

calculating the form topic relevance specifically comprises:

acquiring the title of a form, segmenting it into words with jieba, removing stop words and outputting the remaining words; computing word similarity on the basis of a corpus, where the similarity of two words depends on the layer at which they diverge in the corpus hierarchy. First determine at which layer the two words, as leaf nodes of the corpus, differ: if they are the same the factor is 1, otherwise the coefficient of that layer is used, multiplied by an adjustment parameter and a control parameter, i.e. similarity of word A and word B = layer coefficient × adjustment parameter × control parameter, where the layer coefficients are determined by repeated experiments. The form topic relevance equals the average of the title word similarities.

Classifying the form structure with a decision tree specifically comprises:

acquiring a form set and dividing it into two main categories according to whether a form contains a numerical input box, namely numerical entry forms and non-numerical entry forms, and labelling each form with its category; training a decision tree with the labelled form set as the training set; extracting form structure features with the trained tree and generating a decision tree that determines the form structure type. The decision process is as follows: first count the two categories in the training set and calculate the probability P and the information entropy of each; then feed the feature values of the form to be classified into the decision tree, recalculating the information entropy at each decision and choosing the branch with the largest information gain. The generated decision tree classifies forms by the following rules: if the form contains no <input> tag, it is a non-numerical entry form; if it contains an <input> tag, the constraint type of the form control is extracted, and if the constraint type is numerical the form is a numerical entry form, otherwise it is a non-numerical entry form.

Acquiring the title attributes of the form controls and calculating the form content similarity comprises:

form content similarity is determined from the title attributes of the corresponding numerical input controls in the forms. The title attributes of the numerical input controls in numerical entry forms are taken as feature words. First preprocess them: remove bracketed content from each title attribute, segment any multi-word feature word with jieba, and remove meaningless stop words. Then calculate the form content similarity: use the processed title attributes as feature vectors and compute the similarity between title attributes on the basis of the synonym forest. Traverse the title attributes of the numerical input controls in the form under test, calculate the similarity between each of them and each title attribute in the form being compared, and take the average similarity over all these pairs as the content similarity value of the two forms.

Further optionally, recommending numerical values according to the form association degree and the user similarity comprises:

acquiring the historical filling data of the target user and the set of forms filled in by all users, where the historical filling data comprises the forms the user filled in and the content entered. When the same user fills in the same form again, values are recommended from that user's historical data; when different users have filled in partly the same form information, values are recommended according to the form association degree and the user similarity. First match the form the user is currently filling in against the user's historical filling data; if a match is found, recommend the most recent historical data. If no match is found, first obtain the association degree between the current form and each form in the background form set, and use the average association degree as a first threshold; then take the forms whose association degree with the current form exceeds the first threshold as associated forms, calculate the content similarity between the current form and each associated form, output the title attributes whose content similarity exceeds the average content similarity, obtain the set of users who filled in those title attributes, and determine a recommended numerical sequence according to user similarity. The step comprises: determining a recommended numerical sequence according to the user similarity;

Determining the recommended numerical sequence according to the user similarity specifically comprises:

obtaining the user similarities and selecting the top-K most similar users as the similar-user set, where K is set according to the total number of users; acquiring the input matrix of the target user and those of all similar users; and extracting the set of input values common to the target user and each user in the similar-user set as the recommended numerical sequence.
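The top-K selection and common-value extraction can be sketched as follows. This is a minimal illustration under stated assumptions: each user's row of the input matrix is held as a hypothetical `{input value: count}` dict, a value counts as "entered" when its count is positive, and shared values are ordered by user rank and then by that user's entry count (the text does not fix an ordering). The function name and the similarity figures in the usage example are illustrative only.

```python
def recommended_sequence(target, others, similarity, k):
    """target / others[u]: {input value: count}; similarity[u]: similarity of u to the target.

    Returns the values shared by the target and each of the top-k similar users,
    ordered by user rank and then by that user's entry count (assumed ordering).
    """
    ranked = sorted(others, key=lambda u: similarity[u], reverse=True)[:k]
    entered = {v for v, c in target.items() if c > 0}
    sequence = []
    for user in ranked:
        shared = [v for v, c in others[user].items() if c > 0 and v in entered]
        for value in sorted(shared, key=lambda x: -others[user][x]):
            if value not in sequence:  # keep each value once, at its best rank
                sequence.append(value)
    return sequence


target = {"5499": 9, "5999": 1, "5200": 0}
others = {
    "Zhang San": {"5499": 3, "5999": 12, "5200": 1},
    "Li Si": {"5499": 12, "5999": 3, "5200": 1},
}
print(recommended_sequence(target, others, {"Zhang San": -0.84, "Li Si": 0.84}, k=1))
# ['5499', '5999']
```

K = 1 here only because the example has two users; the text sets K from the total number of users.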

Further optionally, acquiring the title attributes of the forms and determining the master-slave relationship between forms comprises:

acquiring all title attributes of each form to be judged, together with its primary or foreign keys; and determining the master-slave relationship between the forms from these title attributes and keys. Let the forms be A and B: if a primary or foreign key of form B is contained in the title attribute set of form A, then form B is a child form of form A, and form A is the parent form of form B.
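The key-containment rule above amounts to a set intersection. A minimal sketch, assuming each form is described by a hypothetical dict holding its set of title attributes and its set of primary/foreign key names; the form and field names in the example are illustrative.

```python
def find_children(forms):
    """forms: {form id: {"titles": set of title attributes, "keys": set of primary/foreign keys}}.

    Form B is a child of form A when one of B's keys appears among A's title attributes.
    Returns {parent id: [child ids]}.
    """
    children = {form_id: [] for form_id in forms}
    for a, form_a in forms.items():
        for b, form_b in forms.items():
            if a != b and form_b["keys"] & form_a["titles"]:
                children[a].append(b)
    return children


forms = {
    "order": {"titles": {"order_no", "customer", "amount"}, "keys": {"order_no"}},
    "order_line": {"titles": {"line_no", "product", "price"}, "keys": {"order_no"}},
}
print(find_children(forms))  # {'order': ['order_line'], 'order_line': []}
```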

Further optionally, using the periodicity of the forms' title attributes and the master-slave relationships to determine whether the form under test and its master or slave forms need to be updated comprises:

acquiring the title attributes of the numerical input controls in all numerical entry forms, labelling each attribute as periodic or aperiodic, and finally labelling each form as needing or not needing updates; using the labelled data as a training set and determining whether a form needs to be updated with a naive Bayes model; acquiring the form the user is currently filling in, extracting the title attributes of its numerical entry controls, traversing the title attributes in the training set, calculating their similarity to the title attributes of the current form, and taking the periodicity (periodic or aperiodic) of the most similar training attribute as a feature value of the current form; feeding the feature values of the form under test into the trained naive Bayes model, which outputs the form's update category. If the form under test needs to be updated, its master-slave relationships are acquired: if it is a parent form, its child forms are also marked as needing updates; if it is a child form, its parent form does not need to be marked.
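The naive Bayes step and the parent-to-child marking can be sketched as below. The text names a naive Bayes model but not its variant; this sketch assumes a multinomial model with Laplace smoothing over the per-attribute flags "periodic"/"aperiodic", and it marks only direct children as the text describes. The training samples and form names are made-up illustrations, not data from the source.

```python
import math
from collections import Counter, defaultdict

FLAGS = ("periodic", "aperiodic")


def train_naive_bayes(samples):
    """samples: [(flags, label)], flags = periodicity of each numeric title attribute in a form."""
    priors = Counter(label for _, label in samples)
    flag_counts = defaultdict(Counter)
    for flags, label in samples:
        flag_counts[label].update(flags)
    return priors, flag_counts


def predict_update(model, flags):
    """Multinomial naive Bayes with Laplace smoothing; returns the most probable label."""
    priors, flag_counts = model
    total = sum(priors.values())

    def log_posterior(label):
        denom = sum(flag_counts[label].values()) + len(FLAGS)
        return math.log(priors[label] / total) + sum(
            math.log((flag_counts[label][f] + 1) / denom) for f in flags)

    return max(priors, key=log_posterior)


def forms_to_update(flagged, children):
    """Forms predicted to need updates plus their direct child forms; parents are not marked."""
    return set(flagged) | {c for form in flagged for c in children.get(form, [])}


samples = [
    (["periodic", "periodic"], "update"),
    (["periodic", "aperiodic"], "update"),
    (["aperiodic"], "no_update"),
    (["aperiodic", "aperiodic"], "no_update"),
]
model = train_naive_bayes(samples)
print(predict_update(model, ["periodic"]))                      # update
print(forms_to_update({"order"}, {"order": ["order_line"]}))    # {'order', 'order_line'}
```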

Further optionally, after a form is updated, determining whether the recommended values need to be updated and obtaining the update period comprises:

acquiring the title attributes of the numerical entry controls in the form to be updated and determining by similarity whether each is periodic or aperiodic; if a title attribute is aperiodic, its recommended value is not updated; if it is periodic, its recommended value needs to be updated. Acquire the user input data for each periodic title attribute, timestamp every record, and build a time-series data set from the timestamped data with one day as the time unit. Obtain the period of the time series with a Fourier transform: compute the frequency components of the series, sort them in descending order of magnitude, and convert the dominant frequency into the update period.
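A sketch of the daily series and the Fourier-based period estimate, under assumptions not stated in the text: days without an entry repeat the last entered value, the mean is subtracted to remove the zero-frequency term, and the period is n / k for the DFT bin k with the largest magnitude. A plain O(n²) DFT keeps the example dependency-free; `numpy.fft.rfft` would serve the same purpose on real data.

```python
import cmath
import math
from datetime import date, timedelta


def daily_series(entries):
    """entries: {date: value}. One point per day; days without an entry repeat the last value."""
    start, end = min(entries), max(entries)
    series, last = [], entries[start]
    for offset in range((end - start).days + 1):
        last = entries.get(start + timedelta(days=offset), last)
        series.append(last)
    return series


def dominant_period(series):
    """Period (in days) of the strongest non-zero frequency in a plain DFT of the series."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]          # remove the zero-frequency (DC) term
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):                 # bin k = k / n cycles per day
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centred))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k


weekly = [10 + 5 * math.cos(2 * math.pi * t / 7) for t in range(70)]
print(dominant_period(weekly))  # 7.0
```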

Further optionally, acquiring user operation behavior, computing the numerical accuracy and estimating the recommended-value error risk level comprises:

after values have been recommended, collecting the user's operation behavior data, counting the total number of recommendations and the number of times the user selected the recommended value, and calculating the numerical accuracy as the ratio of the two; then estimating the recommended-value error risk level: acquire all title attributes, discard the aperiodic ones, and extract the period of each remaining attribute; determine the trisection points of the periods of all these title attributes and divide the attributes into three error risk levels, where level one contains the attributes with the shortest periods and carries the highest error risk.
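The accuracy ratio and the three-level split can be sketched as follows. The text does not specify how the trisection points are interpolated, so this sketch assumes `statistics.quantiles(..., n=3)` with its default "exclusive" method; aperiodic attributes are represented by `None` and dropped. The attribute names and periods are hypothetical.

```python
from statistics import quantiles


def numerical_accuracy(selected, recommended):
    """Share of recommendations the user accepted."""
    return selected / recommended if recommended else 0.0


def risk_levels(periods):
    """periods: {title attribute: period in days, or None if aperiodic}.

    Aperiodic attributes are dropped; the two trisection points of the remaining periods
    split them into level 1 (shortest periods, highest risk), level 2 and level 3.
    """
    periodic = {t: p for t, p in periods.items() if p is not None}
    low, high = quantiles(periodic.values(), n=3)
    return {t: 1 if p <= low else 2 if p <= high else 3 for t, p in periodic.items()}


periods = {"stock": 1, "exchange_rate": 3, "price": 7, "discount": 30,
           "rent": 90, "tax_rate": 365, "birthday": None}
print(numerical_accuracy(45, 50))  # 0.9
print(risk_levels(periods))
```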

Further optionally, re-recommending values according to the numerical accuracy and the error risk level comprises:

acquiring the numerical accuracy and the risk level of each title attribute; if the numerical accuracy is greater than or equal to a second threshold, no action is taken; otherwise the risk level of the title attribute is checked. If the risk level is level one or level two, a prompt box is pushed to the user, asking the user to review the recommended value and showing the numerical accuracy, and the value is then re-recommended. Values are re-recommended with a Markov model: first acquire the user's historical form-filling data, generate the user input matrix and from it determine the one-step transition probability matrix; then compute the n-step transition probability matrix and derive the value the user is likely to enter after n steps, where n is determined by the period of the title attribute; finally re-recommend the predicted value to the user.
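The threshold check and the Markov re-recommendation can be sketched as below. Several details are assumptions not fixed by the text: the one-step transition matrix is estimated from consecutive entries of a single chronological history, a value never followed by another entry is treated as absorbing, and a low-accuracy level-three attribute is re-recommended without a prompt. All function names and data are hypothetical.

```python
def next_action(accuracy, threshold, risk_level):
    """Return (show_prompt, re_recommend). Level-three handling is an assumption."""
    if accuracy >= threshold:
        return False, False
    return risk_level in (1, 2), True


def transition_matrix(history):
    """One-step transition probabilities between consecutive entered values."""
    states = sorted(set(history))
    index = {s: i for i, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for prev, nxt in zip(history, history[1:]):
        counts[index[prev]][index[nxt]] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        # A state never followed by another entry is treated as absorbing (assumption).
        matrix.append([c / total for c in row] if total
                      else [float(i == j) for j in range(len(row))])
    return states, matrix


def mat_mul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def predict_after(history, n):
    """Most probable value n steps after the last entry (n >= 1), via the n-step matrix."""
    states, p = transition_matrix(history)
    pn = p
    for _ in range(n - 1):
        pn = mat_mul(pn, p)
    row = pn[states.index(history[-1])]
    return states[max(range(len(states)), key=lambda j: row[j])]


history = ["5499", "5999", "5499", "5999", "5499", "5999"]
print(next_action(0.6, 0.8, 1))     # (True, True)
print(predict_after(history, 1))    # 5499
print(predict_after(history, 2))    # 5999
```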

The technical scheme provided by the embodiment of the invention can have the following beneficial effects:

the invention performs association analysis on forms in a low-code platform. Once linkage changes are in place, when a user drags in a numerical control, the value to be filled in is predicted, sparing the user the trouble of entering it. At the same time, the risk and accuracy of each numerical recommendation are estimated, and if a recommendation is likely wrong and the risk is high, the user is prompted, so that recommendation errors are avoided.

Drawings

FIG. 1 is a flow chart of a method for recommending numerical values in form linkage changes according to the present invention.

FIG. 2 is a schematic diagram of a method for recommending numerical values in form linkage changes according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention.

The numerical value recommendation method in form linkage change in the embodiment specifically comprises the following steps:

Step 101: acquire the form data entered by users and construct a user input matrix.

Acquire all content entered by users in forms and construct a user input model containing a user input matrix M. M records how many times each input value occurs when users fill in forms: it is an m×n matrix in which U is the user set, I is the set of input values in the form, and S_ij is the number of times the i-th user has entered the j-th input value. For example, users Zhang San and Li Si fill in forms on site A. The form records the prices of a product sold by different suppliers, three prices in total: I1 = 5499, I2 = 5999 and I3 = 5200. Suppose Zhang San's filling records are I1: 3 times, I2: 12 times, I3: 1 time, and Li Si's are I1: 12 times, I2: 3 times, I3: 1 time. The user input matrix is then M = {(3, 12, 1), (12, 3, 1)}.
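Building M from raw entries is a per-user count. A minimal sketch, assuming the entries arrive as a hypothetical `{user: [entered value, ...]}` mapping and that the column order of the input values I is given; it reproduces the Zhang San / Li Si example.

```python
from collections import Counter


def build_input_matrix(records, values):
    """records: {user: [entered value, ...]}; values: column order of the input values I.

    Returns (users, M) where M[i][j] = S_ij, the number of times user i entered value j.
    """
    users = list(records)
    matrix = []
    for user in users:
        counts = Counter(records[user])              # occurrences of each entered value
        matrix.append([counts[v] for v in values])   # Counter yields 0 for unseen values
    return users, matrix


values = [5499, 5999, 5200]
records = {
    "Zhang San": [5499] * 3 + [5999] * 12 + [5200],
    "Li Si": [5499] * 12 + [5999] * 3 + [5200],
}
users, M = build_input_matrix(records, values)
print(M)  # [[3, 12, 1], [12, 3, 1]]
```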

Step 102: identify user similarity from the user input matrix.

Acquire the historical filling records of the target user and convert them into the target user's user input matrix M. On the basis of M, dynamically obtain, by the Pearson correlation coefficient method, the similarity set S between the target user Ua and the user set U, i.e. the set of similarity values between the historical filling records of Ua and those of each user in U: traverse U, calculate the similarity between Ua and each user, and collect the values in S. The elements Sm of S are sorted in descending order; the larger Sm, the more similar the users, and the smaller Sm, the less similar. For example, the input matrix of user Wang Wu is M = {(9, 1, 0)}, and the constructed user input matrix is {(3, 12, 1), (12, 3, 1)}; 7.5, 7.5 and 1 are the mean counts of I1, I2 and I3 over Zhang San and Li Si. The similarity S1 = |(9 - 7.5)(3 - 7.5) + (1 - 7.5)(12 - 7.5) + (0 - 1)(1 - 1)| / √(6.75² + 35.75² + 0) = 1.168. Similarly, S2 = |(9 - 7.5)(12 - 7.5) + (1 - 7.5)(3 - 7.5) + (0 - 1)(1 - 1)| / √(8.75² + 29.25² + 0) = 1.245. Then S = {S1, S2}. User similarity is calculated with the Pearson correlation coefficient method, using the following formula:
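A plausible reconstruction of this formula, assuming the standard Pearson form with the per-item mean s̄_j defined in the next paragraph (the worked example above additionally wraps the numerator in an absolute value and uses a different denominator):

```latex
\operatorname{sim}(a,b)=
\frac{\sum_{j=1}^{n}\left(S_{aj}-\bar{s}_j\right)\left(S_{bj}-\bar{s}_j\right)}
     {\sqrt{\sum_{j=1}^{n}\left(S_{aj}-\bar{s}_j\right)^{2}}\;
      \sqrt{\sum_{j=1}^{n}\left(S_{bj}-\bar{s}_j\right)^{2}}}
```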

where a and b denote user a and user b; S_aj and S_bj denote the number of times user a and user b, respectively, have entered the j-th input value when filling in the form; and s̄_j denotes the average number of times the j-th input value has been entered over all users.
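A sketch of the similarity set S, assuming the standard Pearson form centred on the per-item means s̄_j, with s̄_j taken over the users in U (Zhang San and Li Si), as the 7.5 / 7.5 / 1 of the worked example suggests. Under this reading Wang Wu's similarity is about 0.84 to Li Si and about -0.84 to Zhang San. That preserves the ordering S2 > S1 of the worked example, but not its printed S1 and S2 values, which use a different denominator.

```python
import math


def item_means(matrix):
    """s̄_j: mean count of input value j over the users in U."""
    return [sum(col) / len(col) for col in zip(*matrix)]


def pearson_similarity(a, b, means):
    """Pearson-style similarity of two count rows, centred on the per-item means."""
    da = [x - m for x, m in zip(a, means)]
    db = [y - m for y, m in zip(b, means)]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da)) * math.sqrt(sum(y * y for y in db))
    return num / den if den else 0.0


def similarity_set(target, users, matrix):
    """Similarity of the target row to every user, sorted from most to least similar."""
    means = item_means(matrix)
    scores = {u: pearson_similarity(target, row, means) for u, row in zip(users, matrix)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


users = ["Zhang San", "Li Si"]
M = [[3, 12, 1], [12, 3, 1]]
print(similarity_set([9, 1, 0], users, M))
# Li Si first (about 0.84), then Zhang San (about -0.84)
```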

Step 103: determine the form association degree from the form topic relevance, the form structure and the form content.

Determining the form association degree comprises calculating the form topic relevance, classifying the form structure with a decision tree, and calculating the form content similarity. The topic of a form is determined from its title, and title similarity is used as the form topic similarity value. The decision-tree structure classification determines whether a form is a numerical entry form, marking it 1 if so and 0 otherwise. Form content similarity is determined from the title attributes of the corresponding numerical input controls in the forms. The three results are given weights w1, w2 and w3, and form association degree = w1 × form topic similarity + w2 × form structure classification result + w3 × form content similarity, where the three weights are tuned through repeated testing. For example, if the topic relevance, structure classification result and content similarity of form 1 and form 2 are 0.71, 1 and 0.5, and the weights are 0.2, 0.3 and 0.5, then form association degree = 0.2 × 0.71 + 0.3 × 1 + 0.5 × 0.5 = 0.692.
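The weighted sum is a one-liner. The default weights below are only the example values from the text, which states that the real weights are tuned by repeated testing; the function name is illustrative.

```python
def form_association(topic_sim, structure_label, content_sim, weights=(0.2, 0.3, 0.5)):
    """w1 * topic similarity + w2 * structure label (0 or 1) + w3 * content similarity."""
    w1, w2, w3 = weights
    return w1 * topic_sim + w2 * structure_label + w3 * content_sim


print(round(form_association(0.71, 1, 0.5), 3))  # 0.692
```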

Calculating the form topic relevance.

Acquire the title of the form, segment it into words with jieba, remove stop words and output the remaining words. Word similarity is computed on the basis of a corpus, according to the layer at which the two words diverge in the corpus hierarchy: first determine at which layer the two words, as leaf nodes of the corpus, differ; if they are the same the factor is 1, otherwise the coefficient of that layer is used, multiplied by an adjustment parameter and a control parameter, i.e. similarity of word A and word B = layer coefficient × adjustment parameter × control parameter, where the layer coefficients are determined by repeated experiments. The form topic relevance equals the average of the title word similarities. For example, the corpus contains the entries:

Ac05B01= mailman, postman, messenger, courier
Ac05B02= correspondent, liaison courier
Ac05B03= liaison, contact person, liaison officer

The synonym forest is chosen as the corpus; it assigns every Chinese word an eight-character code. The first character is the first layer, the second character the second layer, the third and fourth characters the third layer, the fifth character the fourth layer, and the sixth and seventh characters the fifth layer; the eighth character is a word-relation flag and does not form a layer. Different layers represent the different categories a word belongs to. The layer coefficients are denoted a, b, c, d and e; assume a = 0.1, b = 0.65, c = 0.8, d = 0.9 and e = 0.96. The adjustment parameter is cos(n·π/180) and the control parameter is (n - k + 1)/n, where n is the total number of nodes in the branching layer and k is the distance between the two branches.

Let the similarity of "mailman" and "liaison" be Sim(A, B). From the codes, Ac05B01= and Ac05B03= differ only in the seventh character, so the branching layer is the fifth layer and the coefficient is e; the fifth layer has three branches in total, so n = 3; "mailman" is in the first branch and "liaison" in the third, so k = 3 - 1 = 2. Hence Sim(A, B) = e × cos(3π/180) × (3 - 2 + 1)/3 = 0.96 × 0.9986 × 2/3 ≈ 0.64. If the form topics are "salary level of Shanghai programmers" and "salary level of Beijing programmers", segmentation and stop-word removal yield "Shanghai, programmer, salary level" and "Beijing, programmer, salary level"; if the word similarities are 0.13, 1 and 1, the form topic relevance = (0.13 + 1 + 1)/3 = 0.71.
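The layer-based word similarity can be sketched from the eight-character codes. Assumptions: k is the difference between the branch ordinals at the divergent layer (matching k = 3 - 1 = 2 in the example), n must be supplied because it depends on the full dictionary, and identical codes through layer five score 1 regardless of the relation flag. The coefficient values are the ones assumed in the text; the function names are hypothetical.

```python
import math

# Character slices of the five layers in an 8-character code (8th char = relation flag).
LAYER_SLICES = [(0, 1), (1, 2), (2, 4), (4, 5), (5, 7)]
# Layer coefficients a..e as assumed in the text (tuned experimentally).
LAYER_COEFFS = [0.1, 0.65, 0.8, 0.9, 0.96]


def branch_position(segment):
    """Ordinal of a branch: digit segments as integers, letters as alphabet positions."""
    return int(segment) if segment.isdigit() else ord(segment.lower()) - ord("a") + 1


def word_similarity(code_a, code_b, n):
    """Similarity of two words from their codes; n = number of branches at the divergent layer."""
    for layer, (start, end) in enumerate(LAYER_SLICES):
        seg_a, seg_b = code_a[start:end], code_b[start:end]
        if seg_a != seg_b:
            k = abs(branch_position(seg_a) - branch_position(seg_b))  # branch distance
            adjustment = math.cos(n * math.pi / 180)
            control = (n - k + 1) / n
            return LAYER_COEFFS[layer] * adjustment * control
    return 1.0  # same code in all five layers (relation flag ignored)


def topic_relevance(word_sims):
    """Form topic relevance = mean of the aligned title-word similarities."""
    return sum(word_sims) / len(word_sims)


print(round(word_similarity("Ac05B01=", "Ac05B03=", n=3), 2))  # 0.64
print(round(topic_relevance([0.13, 1, 1]), 2))                  # 0.71
```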

Classifying the form structure with a decision tree.

Acquire a form set and divide it into two main categories according to whether a form contains a numerical input box, namely numerical entry forms and non-numerical entry forms, labelling each form with its category. Train a decision tree with the labelled form set as the training set, extract form structure features with the trained tree, and generate a decision tree that determines the form structure type. The decision process is as follows: first count the two categories in the training set and calculate the probability P and the information entropy of each; then feed the feature values of the form to be classified into the decision tree, recalculating the information entropy at each decision and choosing the branch with the largest information gain. The generated decision tree classifies forms by the following rules: if the form contains no <input> tag, it is a non-numerical entry form; if it contains an <input> tag, the constraint type of the form control is extracted, and if the constraint type is numerical the form is a numerical entry form, otherwise it is a non-numerical entry form. For example, suppose the forms have the features {<input>: yes, num: yes}, {<input>: yes, num: yes} and {<input>: yes, num: no}, and the two categories occur in the training set with probabilities 0.6 and 0.4. The information entropy is H = -Σ p·log p, so the initial entropy is H = -(0.6·log 0.6 + 0.4·log 0.4) = 0.29 (base-10 logarithm).

At the first decision, the entropy of the "contains <input>" branch is H1 = p1 × H = 1 × 0.29 = 0.29 and that of the "no <input>" branch is 0, so the first decision result is "contains <input>" and the constraint type is examined next. At the second decision, H2 = p2 × H1 = 2/3 × 0.29 = 0.19 and H3 = 1/3 × 0.29 = 0.097, so the second decision result is a numerical constraint and the form is a numerical entry form.
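The entropy, information gain and final rules can be sketched as follows. Assumptions: forms are reduced to hypothetical boolean features `input` (has an <input> tag) and `num` (numerical constraint), training rows carry a boolean `numeric` label, and logarithms are base 10 to match the 0.29 of the example. This shows ID3-style information gain in general, not the exact tree the text generates.

```python
import math


def entropy(probs, base=10):
    """H = -sum(p * log p); base-10 logs reproduce the 0.29 of the worked example."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def information_gain(samples, feature):
    """Label entropy minus the weighted entropy after splitting on a boolean feature."""
    def label_entropy(rows):
        if not rows:
            return 0.0
        p = sum(r["numeric"] for r in rows) / len(rows)
        return entropy([p, 1 - p])

    parts = [[r for r in samples if r[feature] == v] for v in (True, False)]
    remainder = sum(len(part) / len(samples) * label_entropy(part) for part in parts)
    return label_entropy(samples) - remainder


def classify_form(features):
    """Generated rules: no <input> -> non-numerical; <input> with numeric constraint -> numerical."""
    if not features.get("input"):
        return "non-numerical entry form"
    return "numerical entry form" if features.get("num") else "non-numerical entry form"


print(round(entropy([0.6, 0.4]), 2))  # 0.29
forms = [{"input": True, "num": True}, {"input": True, "num": True}, {"input": True, "num": False}]
print([classify_form(f) for f in forms])
```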

Acquiring the title attributes of the form controls and calculating the form content similarity.

Form content similarity is determined from the title attributes of the corresponding numerical input controls in the forms. The title attributes of the numerical input controls in numerical entry forms are taken as feature words. First preprocess them: remove bracketed content from each title attribute, segment any multi-word feature word with jieba, and remove meaningless stop words. Then calculate the form content similarity: use the processed title attributes as feature vectors and compute the similarity between title attributes on the basis of the synonym forest. Traverse the title attributes of the numerical input controls in the form under test, calculate the similarity between each of them and each title attribute in the form being compared, and take the average similarity over all these pairs as the content similarity value of the two forms. For example, the form under test has two numerical input controls whose title attributes are "Shanghai monthly salary" and "age"; the form to be compared has a single numerical input control whose title attribute is "age". After preprocessing, the feature vectors are 1 {Shanghai, monthly salary}, 2 {age} and 3 {age}. If the synonym-forest similarity gives vector 1 vs. vector 3 = 0 and vector 2 vs. vector 3 = 1, the content similarity of the two forms = (0 + 1)/2 = 0.5.
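The preprocessing and pairwise averaging can be sketched as below. For self-containment, whitespace splitting stands in for jieba segmentation, and the vector similarity is passed in as a function; the exact-match stand-in used here is a placeholder for the synonym-forest similarity, not a substitute for it. The titles in the example are illustrative.

```python
import re
from itertools import product

# Matches half-width (...) and full-width （...） bracketed content.
BRACKETED = re.compile(r"[(（][^)）]*[)）]")


def preprocess(title, stop_words=frozenset()):
    """Strip bracketed content, split into words (jieba stand-in), drop stop words."""
    cleaned = BRACKETED.sub(" ", title)
    return tuple(w for w in cleaned.split() if w not in stop_words)


def content_similarity(tested, compared, vector_sim):
    """Average similarity over every (tested, compared) pair of title-attribute vectors."""
    pairs = list(product(tested, compared))
    if not pairs:
        return 0.0
    return sum(vector_sim(a, b) for a, b in pairs) / len(pairs)


def exact_match(a, b):
    """Placeholder for the synonym-forest similarity."""
    return 1.0 if a == b else 0.0


tested = [preprocess("Shanghai monthly-salary (CNY)"), preprocess("age")]
compared = [preprocess("age (years)")]
print(content_similarity(tested, compared, exact_match))  # 0.5
```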

Step 104, recommending numerical values according to the form association degree and the user similarity.

The historical filling data of the target user and the set of forms filled in by all users are acquired; a user's historical filling data comprise the forms the user has filled in and the content entered. When the same user fills in the same form again, values are recommended from that user's historical filling data; when different users have filled in partly identical form information, values are recommended according to the form association degree and the user similarity. First, the form the user is currently filling in is matched against the user's historical filling data; if the match is not empty, the historical entry closest to the current time is recommended. If the match is empty, the association degree between the current form and each form in the background form set is acquired, and the average association degree is taken as the first threshold. Forms whose association degree with the current form exceeds the first threshold are extracted as associated forms; the content similarity between the current form and each associated form is calculated; the title attributes whose content similarity exceeds the average content similarity are output; the set of users who have filled in those title attributes is acquired; and the recommended value sequence is determined according to the user similarity. For example, Zhang San fills in Table 1 on October 1 and again on October 2, and then fills in Table 1 once more on October 3; because the same user is filling in the same form, the data entered on October 2 should be recommended. Wang Wu has never filled in Table 1; calculation shows that Tables 2 and 3 are associated with Table 1, and the title attributes "company code" and "post code", whose content similarity exceeds the average content similarity, are selected. The set of users who have filled in these two attributes is then acquired; if calculation shows that Li Si is the user most similar to Wang Wu, the set of input values common to Wang Wu and Li Si is used as the recommended values.
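The two branches of this step might be sketched as follows. The history tuple layout, the year in the dates and the stored values are hypothetical; `associated_forms` applies only the first threshold (the mean association degree), and the later filtering by content similarity and user similarity is left to the other steps.

```python
from datetime import date
from statistics import mean


def recommend_from_history(history, user, form):
    """Most recent entry of `user` for `form`, or None if there is none.

    history: list of (user, form, fill_date, values) tuples (hypothetical layout).
    """
    matches = [h for h in history if h[0] == user and h[1] == form]
    if not matches:
        return None                       # empty match: fall back to associated forms
    return max(matches, key=lambda h: h[2])[3]


def associated_forms(assoc: dict[str, float]) -> list[str]:
    """Forms whose association degree with the current form exceeds the mean (first threshold)."""
    threshold = mean(assoc.values())
    return [f for f, d in assoc.items() if d > threshold]


history = [
    ("Zhang San", "Table 1", date(2022, 10, 1), {"post code": 5499}),
    ("Zhang San", "Table 1", date(2022, 10, 2), {"post code": 5999}),
]
print(recommend_from_history(history, "Zhang San", "Table 1"))  # {'post code': 5999}
print(recommend_from_history(history, "Wang Wu", "Table 1"))    # None
print(associated_forms({"Table 2": 0.8, "Table 3": 0.7, "Table 4": 0.1}))  # ['Table 2', 'Table 3']
```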

Determining the recommended value sequence according to the user similarity.

The user similarities are acquired, and the top-K users ranked by similarity are selected as the similar user set, where K is set according to the total number of users. The input matrix of the target user and the input matrices of all similar users are acquired, and the set of input values common to the target user and each user in the similar user set is extracted as the recommended value sequence. For example, Wang Wu's user input matrix is M = {(9, 1, 0)}, and the input matrices of the candidate users are {(3, 12, 1), (12, 3, 1)}. Assume there are three post-code input values in total: I1 = 5499, I2 = 5999 and I3 = 5200. Suppose Zhang San's filling records are I1: 3 times, I2: 12 times and I3: 1 time, and Li Si's are I1: 12 times, I2: 3 times and I3: 1 time. The similarity between Wang Wu and Zhang San is 1.168, and that between Wang Wu and Li Si is 1.245. If K = 1, Li Si is the similar user. Since the set of input values common to Wang Wu and Li Si is {I1, I2}, 5499 and 5999 should be recommended. K determines the size of the similar user set; because each user has entered a different amount of data, different values of K yield different recommended value sets, so K should be set according to the total number of users to keep the recommended values easy for the user to view and select.
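A sketch of the top-K selection and common-value extraction, reusing the counts and similarity scores quoted in the example (the scores are taken as given, not recomputed). Two points are assumptions: a value counts as entered when its count is non-zero, and with K > 1 the per-user common sets are merged by union.

```python
def recommended_values(target_row, candidate_rows, similarity, values, k):
    """Values entered by both the target user and each of the top-k similar users.

    target_row: entry counts of the target user, aligned with `values`.
    candidate_rows: {user: entry-count row}; similarity: {user: score}.
    """
    top_k = sorted(similarity, key=similarity.get, reverse=True)[:k]
    entered = {v for v, n in zip(values, target_row) if n > 0}
    common = set()
    for u in top_k:
        mine = {v for v, n in zip(values, candidate_rows[u]) if n > 0}
        common |= entered & mine          # assumption: union over similar users
    return [v for v in values if v in common]  # keep the original value order


rows = {"Zhang San": [3, 12, 1], "Li Si": [12, 3, 1]}
scores = {"Zhang San": 1.168, "Li Si": 1.245}
print(recommended_values([9, 1, 0], rows, scores, [5499, 5999, 5200], k=1))  # [5499, 5999]
```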

Step 105, acquiring the title attributes of the forms and determining the master-slave relationship between the forms.

All title attributes of the forms to be judged, together with their primary keys or foreign keys, are acquired, and the master-slave relationship between the forms is determined from them: denote the forms as A and B; if a primary key or foreign key of form B is contained in the title attribute set of form A, form B is a child form of form A, and form A is the parent form of form B. For example, form A is a product information table and form B is an order table. Because the order table uses the product number as a foreign key to obtain product information, and the product number is among the title attributes of the product information table, form A is the parent form and form B is the child form. The parent form is the master form and the child form is the slave form.
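The parent/child test reduces to a set intersection. In this hypothetical sketch the titles and keys are plain strings, and the example passes the order table's keys (its own order number and the product-number foreign key).

```python
def is_child(parent_titles: set[str], child_keys: set[str]) -> bool:
    """Form B is a child of form A if any primary/foreign key of B is among A's title attributes."""
    return bool(parent_titles & child_keys)


product_info_titles = {"product number", "product name", "selling price"}
order_keys = {"order number", "product number"}   # primary key + foreign key of the order table
print(is_child(product_info_titles, order_keys))  # True: the order table is a child form
```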

Step 106, acquiring the periodicity of the title attributes of the form and the master-slave relationship of the form, and determining whether the form to be tested and its master and slave forms need to be updated.

The title attributes of the numerical input controls in all numerical entry forms are acquired and labeled as periodic or aperiodic, and each form is then labeled as a form that needs updating or a form that does not. The labeled data are used as the training set for a naive Bayes model that determines whether a form needs to be updated. For the form the user is currently filling in, the title attributes of its numerical input controls are extracted; the title attributes in the training set are traversed and their similarity to each title attribute of the current form is calculated, and the periodic or aperiodic label of the most similar training title attribute is taken as the feature value of that title attribute of the current form. The feature values of the form to be tested are input into the trained naive Bayes model, which outputs the update category of the form. If the form to be tested needs to be updated, its master-slave relationship is acquired: if it is a parent form, its child forms are also marked as needing updating; if it is a child form, its parent form is not marked. For example, age is aperiodic and the selling price of a product is periodic; when a whole form is labeled, a form containing even one periodic item should be marked as needing updating. To determine automatically whether a form needs updating, a data set must first be labeled manually to train the model. Suppose the form the user is currently filling in contains age, product selling price and product cost. Similarity calculation gives the periodicity of these three title attributes as aperiodic, periodic and periodic; these are the form's feature values and are input into the trained naive Bayes model. The model performs binary classification by computing the conditional probabilities P(update | aperiodic, periodic, periodic) and P(no update | aperiodic, periodic, periodic); the class with the higher probability is the result. If the form to be tested is a product information table that needs updating and is the parent form of an order table, the order table must also be updated after the product information table is updated; therefore, whether the order table needs updating is determined at the same time as for the product information table.
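The classification and the parent-to-child propagation could look like the following. This is a sketch that assumes a multinomial naive Bayes with Laplace smoothing over the periodic/aperiodic labels (the source does not specify the naive Bayes variant), a tiny hand-made training set that is purely illustrative, and a hypothetical `children` mapping from each parent form to its child forms.

```python
import math
from collections import Counter

VOCAB = ("periodic", "aperiodic")


def train(samples):
    """samples: list of (features, label); features such as ["aperiodic", "periodic"]."""
    priors = Counter(label for _, label in samples)
    counts = {label: Counter() for label in priors}
    for feats, label in samples:
        counts[label].update(feats)
    return priors, counts


def predict(model, feats):
    """Label with the highest log-posterior (multinomial NB with Laplace smoothing)."""
    priors, counts = model
    total = sum(priors.values())
    scores = {}
    for label, n in priors.items():
        denom = sum(counts[label].values()) + len(VOCAB)
        scores[label] = math.log(n / total) + sum(
            math.log((counts[label][f] + 1) / denom) for f in feats)
    return max(scores, key=scores.get)


def forms_to_update(form, needs_update, children):
    """A parent that needs updating marks its children too; a child never marks its parent."""
    if not needs_update:
        return set()
    return {form, *children.get(form, [])}


# Hypothetical, hand-labeled training data.
training = [
    (["aperiodic"], "no update"),
    (["aperiodic", "aperiodic"], "no update"),
    (["periodic", "aperiodic"], "update"),
    (["periodic"], "update"),
    (["periodic", "periodic"], "update"),
]
model = train(training)
label = predict(model, ["aperiodic", "periodic", "periodic"])  # age, selling price, cost
print(label)  # update
print(forms_to_update("product info", label == "update", {"product info": ["order"]}))
```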

Step 107, after the form is updated, determining whether the recommended values need to be updated and acquiring the update period.

The title attributes of the numerical input controls in the form to be updated are acquired, and each is judged periodic or aperiodic by similarity. If a title attribute is aperiodic, its recommended value is not updated; if it is periodic, its recommended value needs to be updated. The user input data corresponding to each periodic title attribute are acquired, a timestamp is attached to each record, and a time series data set is built from the timestamped data with one day as the time unit. The period of the time series is obtained by Fourier transform: the spectrum is computed, the frequency components are ranked in descending order of strength, and the dominant frequency is taken to give the update period. For example, similarity calculation on the two title attributes product number and product selling price shows that product number is aperiodic and product selling price is periodic, so the recommended value for product selling price needs to be updated. If the dominant frequency of the product selling price time series after Fourier transform is 0.035 (cycles per day), the period = 1/0.035 ≈ 28.57 days. The update period is calculated as period = 1/frequency.
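A sketch of the period estimate, assuming the daily series has already been built (one value per day, no gaps) and reading the selection rule as "the frequency component with the largest magnitude, excluding the mean". It uses a naive discrete Fourier transform from the standard library so the example is self-contained; the price series is synthetic.

```python
import cmath
import math


def dominant_period(series: list[float]) -> float:
    """Period (in days) of the strongest non-zero frequency in a daily series.

    Naive O(n^2) DFT; numpy.fft.rfft would be the usual choice for long series.
    """
    n = len(series)
    avg = sum(series) / n
    x = [v - avg for v in series]            # drop the DC (mean) component
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):           # positive frequencies k/n cycles per day
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    frequency = best_k / n
    return 1 / frequency                     # period = 1 / frequency


# Synthetic daily selling prices with a 28-day cycle (illustrative data only).
prices = [100 + 5 * math.sin(2 * math.pi * t / 28) for t in range(280)]
print(round(dominant_period(prices), 2))     # 28.0
print(round(1 / 0.035, 2))                   # 28.57, the example in the text
```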

Step 108, obtaining user operation behaviors, determining the numerical accuracy, and estimating the error risk level of the recommended values.

After values have been recommended, the users' operation behavior data are collected: the total number of recommendations and the number of times each user selected the recommended value are counted, and the numerical accuracy is calculated. The error risk level of the recommended values is then estimated: all title attributes are acquired, aperiodic title attributes are discarded, and the periods of the remaining ones are extracted. Two trisection points are determined from the periods of all title attributes, dividing the title attributes into three error risk levels; level one has the shortest periods and the highest risk of recommending a wrong value. The numerical accuracy is calculated as numerical accuracy = Σ(number of times the user selected the recommended value / total number of recommendations) / total number of users, where Σ sums over all users. For example, if the system makes 4 recommendations in total, user 1 selects the recommended value 2 times and user 2 selects it 3 times, then numerical accuracy = (2/4 + 3/4)/2 = 0.625. If the title attributes are age, product selling price and product cost, age is removed because it is aperiodic; the periods of product selling price and product cost are 28.57 and 21 days respectively, so the trisection points are (28.57 + 21)/3 ≈ 16.52 and (28.57 + 21) × 2/3 ≈ 33.05, and both product selling price and product cost belong to level two. Because recommended values can be wrong, their accuracy must be estimated from the users' operation behavior; if the recommended values are inaccurate, the recommendation method needs to be changed.

Step 109, re-recommending values according to the numerical accuracy and the error risk level of the recommended values.

The numerical accuracy and the risk level of the title attribute are acquired. If the numerical accuracy is greater than or equal to a second threshold, no action is taken; otherwise the risk level of the title attribute is checked. If the risk level is level one or level two, a prompt box is pushed to the user asking the user to double-check the recommended value and showing the numerical accuracy, and the value is then re-recommended. A Markov model is used for the re-recommendation: first, the data of the user's historically filled forms are acquired to generate the user input matrix, from which the one-step transition probability matrix is determined; the n-step transition probability matrix is then solved to calculate the value the user is likely to enter after n steps, where n is determined by the period of the title attribute; the predicted value is re-recommended to the user. For example, if the title attribute of the form control the user is filling in has risk level two and the numerical accuracy is 0.625, the user is prompted to double-check the recommended value, the numerical accuracy is shown, and a value is re-recommended. A Markov model predicts how a state will change over time from the current state of an event and is a basic prediction method. Because the values recommended by the existing method are not accurate enough in this case, a different recommendation method is needed. The second threshold is typically set to 90%, which indicates good accuracy in recommendation algorithms.
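A sketch of the re-recommendation trigger and the n-step Markov prediction, under these assumptions: the states are the distinct values the user has entered, transitions are counted between consecutive entries in chronological order, a value never followed by another entry is treated as absorbing, and n is passed in directly because the source does not say how it is derived from the title period. The 0.9 default mirrors the typical second threshold mentioned above.

```python
from collections import Counter


def should_rerecommend(accuracy, risk_level, threshold=0.9):
    """Re-recommend only when accuracy is below the second threshold and the level is one or two."""
    return accuracy < threshold and risk_level in (1, 2)


def transition_matrix(seq):
    """One-step transition probabilities estimated from consecutive entries."""
    states = sorted(set(seq))
    counts = Counter(zip(seq, seq[1:]))
    P = []
    for i, s in enumerate(states):
        row = [counts[(s, t)] for t in states]
        total = sum(row)
        # assumption: a state never left in the history is treated as absorbing
        P.append([c / total for c in row] if total else [float(j == i) for j in range(len(states))])
    return states, P


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def predict_after_n_steps(seq, n):
    """Most probable value n steps after the latest entry, using P^n."""
    states, P = transition_matrix(seq)
    Pn = [[float(i == j) for j in range(len(states))] for i in range(len(states))]  # identity
    for _ in range(n):
        Pn = mat_mul(Pn, P)
    row = Pn[states.index(seq[-1])]
    return states[max(range(len(states)), key=row.__getitem__)]


history = [5499, 5999, 5499, 5999]       # illustrative entry history
print(should_rerecommend(0.625, 2))      # True
print(predict_after_n_steps(history, 1)) # 5499
print(predict_after_n_steps(history, 2)) # 5999
```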

The foregoing is merely illustrative of some preferred embodiments of the present invention, but the invention is not limited thereto and many modifications and variations are possible. Any modifications or variations which are based on the basic principles of the present invention should be considered as falling within the scope of the present invention.

Claims

1. A method for recommending numerical values in form linkage changes, the method comprising:

acquiring form data input by a user, and constructing a user input matrix;

performing user similarity recognition according to the user input matrix;

judging the form association degree according to the form topic relevance, the form structure and the form content, which specifically comprises: calculating the topic relevance of the form, classifying the form structure based on a decision tree, obtaining the title attributes of the form controls, and calculating the form content similarity;

the calculating of the topic relevance of the form comprises determining the topic of the form from its title and taking the topic similarity as the form topic similarity value, and specifically comprises: acquiring the title of the form, segmenting the title with jieba, removing stop words, and outputting the words; calculating word similarity based on a corpus, the similarity between two words being calculated according to the layers at which they differ in the corpus: first determining at which layer the two words, as leaf nodes in the corpus, differ, multiplying by 1 for each layer at which they are the same and otherwise by the corresponding layer coefficient, and then multiplying by an adjustment parameter and a control parameter, i.e., similarity of word A and word B = layer coefficients × adjustment parameter × control parameter, wherein the layer coefficients are determined through multiple experiments; the form topic relevance equals the average of the topic word similarities; the form structure classification based on the decision tree comprises judging whether the form is a numerical entry form, marking it 1 if so and 0 otherwise; the calculating of the form content similarity comprises judging the form content similarity according to the title attributes of the corresponding numerical input controls in the forms; the three results are given weights w1, w2 and w3 respectively, and form association degree = w1 × form topic similarity + w2 × form structure classification result + w3 × form content similarity, wherein the three weights are adjusted through repeated testing;

recommending values according to the form association degree and the user similarity, which specifically comprises: acquiring historical filling data of a target user and the set of forms filled in by all users, the user's historical filling data comprising the forms filled in by the user and the content entered; when the same user fills in a form again, recommending from the historical filling data, and when different users have filled in partly identical form information, recommending according to the form association degree and the user similarity; first matching the form currently being filled in by the user against the user's historical filling data, and if the match is not empty, recommending the historical filling data closest to the current time to the user; if the match is empty, first acquiring the association degree between the current form and each form in the background form set and taking the average association degree as a first threshold; then extracting the forms whose association degree with the current form is greater than the first threshold as associated forms, calculating the content similarity between the current form and the associated forms, outputting the title attributes whose content similarity is greater than the average content similarity, acquiring the set of users who have filled in those title attributes, and determining a recommended value sequence according to the user similarity;

acquiring the title attributes of the forms, and determining the master-slave relationship between the forms;

acquiring the periodicity of the title attributes of the form and the master-slave relationship of the form, and determining whether the form to be tested and its master and slave forms need to be updated;

after the form is updated, determining whether the recommended values need to be updated and acquiring the update period;

obtaining user operation behaviors, determining the numerical accuracy and estimating the error risk level of the recommended values, which specifically comprises: after the values are recommended, collecting the users' operation behavior data, counting the total number of recommendations and the number of times each user selected the recommended value, and calculating the numerical accuracy; then estimating the error risk level of the recommended values: acquiring all title attributes, discarding aperiodic title attributes, and extracting the periods of the remaining ones; determining two trisection points from the periods of all title attributes and dividing all title attributes into three error risk levels, wherein level one has the shortest periods and the highest risk of recommending a wrong value;

re-recommending values according to the numerical accuracy and the error risk level of the recommended values, which specifically comprises: acquiring the numerical accuracy and the risk level of the title attribute; if the numerical accuracy is greater than or equal to a second threshold, taking no action, and otherwise checking the risk level of the title attribute; if the risk level is level one or level two, pushing a prompt box to the user asking the user to double-check the recommended value, showing the numerical accuracy, and then re-recommending the value; re-recommending the value for the user with a Markov model: first acquiring the data of the user's historically filled forms to generate the user input matrix and determining the one-step transition probability matrix; solving the n-step transition probability matrix and calculating the value the user is likely to enter after n steps, wherein n is determined by the period of the title attribute; and re-recommending the predicted value to the user.
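As an illustration of the word-similarity product and the weighted association degree in claim 1, a sketch follows. The thesaurus codes, the layer coefficients and the weights w1 to w3 are placeholders (the claim leaves them to experimentation), and the adjustment and control parameters default to 1.

```python
from statistics import mean


def word_similarity(code_a, code_b, layer_coeffs, adjust=1.0, control=1.0):
    """Multiply 1 for each layer where the two thesaurus codes agree and the layer
    coefficient where they differ, then the adjustment and control parameters."""
    sim = 1.0
    for a, b, coeff in zip(code_a, code_b, layer_coeffs):
        sim *= 1.0 if a == b else coeff
    return sim * adjust * control


def association_degree(topic_word_sims, structure_label, content_sim, w=(0.4, 0.2, 0.4)):
    """w1 * topic relevance + w2 * structure label (1 or 0) + w3 * content similarity."""
    w1, w2, w3 = w                        # placeholder weights
    topic = mean(topic_word_sims)        # topic relevance = mean topic-word similarity
    return w1 * topic + w2 * structure_label + w3 * content_sim


coeffs = (0.65, 0.8, 0.9)                # placeholder layer coefficients
s = word_similarity(("B", "a", "01"), ("B", "a", "02"), coeffs)
print(round(s, 2))                                     # 0.9
print(round(association_degree([s, 0.3], 1, 0.5), 2))  # 0.64
```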

2. The method of claim 1, wherein the acquiring of the form data entered by the user and the constructing of the user input matrix comprise:

acquiring all contents input by a user in a form;

constructing a user input model, wherein the user input model comprises a user input matrix M; M records how many times each input value occurs when users fill in forms, wherein U is the user set, I is the set of input values in the form, and Sij is the number of times user i has entered input value j when filling in the form.

3. The method of claim 1, wherein the performing of user similarity recognition according to the user input matrix comprises:

acquiring a history filling record of a target user, and converting the history filling record into a user input matrix M of the target user;

dynamically acquiring, on the basis of the constructed user input matrix M, a similarity set S between a target user Ua and the user set U by the Pearson correlation coefficient method, the similarity set S being the set of similarity values between the historical filling records of the target user Ua and those of each user in the user set U, as follows: traversing the user set U, calculating the similarity value between the target user Ua and each user in U, and representing the similarity values by the set S;

sorting the similarity values Sm in the set S in descending order, wherein a larger Sm indicates a higher user similarity and a smaller Sm a lower user similarity.
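A sketch of claims 2 and 3: building the entry-count matrix M (Sij = number of times user i entered value j) and ranking users by Pearson correlation. The count rows reuse the example from step 104; Pearson coefficients lie in [-1, 1], so they are on a different scale from the illustrative scores quoted there, although the resulting order (Li Si before Zhang San) is the same.

```python
import math
from collections import Counter


def build_entry_matrix(records):
    """records: (user, value) pairs -> (values, {user: count row aligned with values})."""
    values = sorted({v for _, v in records})
    users = sorted({u for u, _ in records})
    counts = Counter(records)
    return values, {u: [counts[(u, v)] for v in values] for u in users}


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def rank_similar_users(target, matrix):
    """Similarity set S of the target user against every other user, sorted descending."""
    S = {u: pearson(matrix[target], row) for u, row in matrix.items() if u != target}
    return sorted(S.items(), key=lambda kv: kv[1], reverse=True)


matrix = {"Wang Wu": [9, 1, 0], "Zhang San": [3, 12, 1], "Li Si": [12, 3, 1]}
print([u for u, _ in rank_similar_users("Wang Wu", matrix)])  # ['Li Si', 'Zhang San']
```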

4. The method of claim 1, further comprising:

obtaining a form set and dividing it into two main categories according to whether a numerical input box is included: numerical entry forms and non-numerical entry forms, and marking each form with a category label;

training a decision tree using the labeled form set as the training set;

extracting form structure features through the trained decision tree and generating a decision tree for determining the form structure type, wherein the decision process comprises: first counting the two main categories in the training set separately and calculating the probability P and information entropy of each category;

then inputting the feature values of the form to be classified into the decision tree, recalculating the information entropy at each decision, and selecting the branch with the largest information gain as the decision result;

the generated decision tree classifies using the following rules: if the form contains no <input> tag, it is a non-numerical entry form; if it contains an <input> tag, the constraint type of the form control is extracted, and if the constraint type is numerical, the form is a numerical entry form, otherwise it is a non-numerical entry form;

further, the obtaining of the title attributes of the form controls and the calculating of the form content similarity specifically comprise:

judging the form content similarity according to the title attributes of the corresponding numerical input controls in the forms;

acquiring the title attributes of the corresponding numerical input controls in the numerical entry forms as feature words;

first performing preprocessing: removing content in brackets from the title attributes, segmenting a feature word made up of several words with jieba, and removing meaningless stop words;

then calculating the form content similarity: using the processed title attributes as feature vectors and calculating the similarity between title attributes from the feature vectors based on the synonym forest;

traversing the title attributes of the corresponding numerical input controls of the form to be tested, calculating the similarity between each title attribute and the title attributes in the form to be compared, and taking the average similarity between the title attributes of the form to be tested and all title attributes of the form to be compared as the content similarity of the two forms.

5. The method of claim 1, wherein the determining a recommended numerical sequence according to user similarity specifically comprises:

obtaining the user similarities and selecting the top-K users by similarity as the similar user set, wherein K is set according to the total number of users;

acquiring the input matrix of the target user and the input matrices of all similar users;

extracting the set of input values common to the target user and each user in the similar user set as the recommended value sequence.

6. The method of claim 1, wherein the acquiring of the title attributes of the forms and the determining of the master-slave relationship between the forms comprise:

acquiring all title attributes of each form to be judged, together with its primary keys or foreign keys;

determining the master-slave relationship between the forms according to the title attributes and the primary keys or foreign keys, wherein, denoting the forms as A and B, if a primary key or foreign key of form B is contained in the title attribute set of form A, form B is a child form of form A and form A is the parent form of form B.

7. The method of claim 1, wherein the acquiring of the periodicity of the title attributes of the form and the master-slave relationship of the form, and the determining of whether the form to be tested and its master and slave forms need updating, comprise:

acquiring the title attributes of the numerical input controls in all numerical entry forms, labeling each attribute as periodic or aperiodic, and finally labeling each form as a form that needs updating or a form that does not;

using the labeled data as a training set and determining whether a form needs to be updated based on a naive Bayes model;

acquiring the form currently being filled in by the user, extracting the title attributes of its numerical input controls, traversing the title attributes in the training set, calculating their similarity to the title attributes of the current form, and taking the periodicity or aperiodicity of the most similar title attribute as the feature value of the current form;

inputting the feature values of the form to be tested into the trained naive Bayes model and outputting the update category of the form to be tested;

if the form to be tested needs updating, acquiring its master-slave relationship; if it is a parent form, also marking its child forms as needing updating, and if it is a child form, not marking its parent form.

8. The method of claim 1, wherein the determining, after the form is updated, of whether the recommended values need updating and the acquiring of the update period comprise:

acquiring the title attributes of the numerical input controls in the form to be updated and judging by similarity whether each is periodic or aperiodic; if a title attribute is aperiodic, not updating its recommended value; if it is periodic, updating its recommended value;

acquiring the user input data corresponding to the periodic title attributes, attaching a timestamp to each record, and building a time series data set from the timestamped user input data with one day as the time unit;

acquiring the period of the time series data set by Fourier transform: computing the spectrum, ranking the frequency components in descending order of strength, and taking the dominant frequency to give the update period.