CN110096710B - Article analysis and self-demonstration method - Google Patents

Article analysis and self-demonstration method

Info

Publication number
CN110096710B
CN110096710B (application CN201910382217.6A)
Authority
CN
China
Prior art keywords
article
database
central
verified
articles
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910382217.6A
Other languages
Chinese (zh)
Other versions
CN110096710A (en)
Inventor
董云鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201910382217.6A priority Critical patent/CN110096710B/en
Publication of CN110096710A publication Critical patent/CN110096710A/en
Application granted granted Critical
Publication of CN110096710B publication Critical patent/CN110096710B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods

Abstract

The invention discloses an article analysis and self-demonstration method, and relates to the field of intellectual property. On the basis of an established database, articles of different styles are verified from the two aspects of function and structure. By classifying articles, the boundaries between "spirit", "practice" and "spirit-practice combination", and between literature, philosophy and science, are determined. The effective overall central idea of the article to be verified is extracted and verified; the article type is judged by computing and analyzing the article to be verified against the database and against common knowledge; the article to be verified is input through an input interface, its contribution is verified, and the effective article together with its connotation and extension is output through an output interface; the reliability is determined and the accuracy is improved; the article is published automatically; and a self-demonstration method combining theory and practice is provided. The method can improve the efficiency of writing and publishing articles, increase their accuracy and precision, reduce the examination load, and automatically demonstrate and guide practical work in a spiral manner.

Description

Article analysis and self-demonstration method
Technical Field
The invention relates to the field of intellectual property, in particular to an article analysis and self-demonstration method.
Background
With the continuous development of science and technology and the increasing importance that organizations attach to academic knowledge, many scholars and researchers are encouraged to turn their scientific achievements into articles. However, because intellectual property is difficult to identify, the cost of infringement is low while the cost of defending one's rights is high, which greatly dampens the enthusiasm of researchers and scholars. Some novel and literature enthusiasts have unreasonable conceptions, so high-level articles cannot be written from the points of view of argument and literary quality, and a large number of articles are produced merely because of index or project requirements. As a result, during writing, the data of a paper are heavily processed by hand, that is, the arguments and data are manually adjusted to reach a preset target value, and pseudo-scientific papers appear.
As for protection, the collection of evidence of intellectual property and its logical relations is currently unclear; standardized writing and prescriptive guidance prevent good articles from standing out; and because written papers are long, the review process for the relevant personnel is lengthy, costly and inefficient, judgments of intellectual property infringement are difficult, judgments about literature or writing vary widely and are highly arbitrary, and paper examination largely remains a formality. The result is either a large number of infringing acts, or good works that cannot emerge, or papers filled with false data, so that the papers produced stay at the theoretical level and cannot be applied directly in practice. To address the protection of intellectual property, the needs of writing, and the identification of fake academic papers, a method for article analysis and self-demonstration is urgently needed.
Disclosure of Invention
The invention aims to provide a method for analyzing and self-demonstrating written articles, which addresses the existing problems of intellectual-property correlation analysis, acceptance difficulties caused by format issues, the difficulty of publishing valuable articles, and the difficulty of manually examining and demonstrating papers that are overly long or contain faked data.
In order to solve the technical problem, the invention adopts the following technical scheme. A method for article analysis and self-demonstration is characterized by comprising the following steps:
1) Establishing and applying a target database: the unit symbol and the format are determined by frequency of occurrence, and the target database is built by automatic induction through hyperlinks or fuzzy queries according to the format and the unit symbol; it should be understood that the means of "hyperlink" and "fuzzy query" described herein are only used for explaining the present invention and are not used for limiting it. The target database is managed by a database management system, and the same factors are extracted from the target database to establish an interface database. Data are divided into narrow and broad senses. Data in the narrow sense are specific sample numbers, for example 67 patients with severe infantile acute laryngitis, with 32 patients in a treatment group. Data in the broad sense are both representations and carriers of information, and may take the form of symbols, words, numbers, speech, images, video, and so on. Data and information are inseparable: data are the representation of information, and information is the connotation of data. Data themselves have no meaning; data become information only when they affect the behavior of an entity. Data may be continuous values, such as sound or images, called analog data, or discrete, such as symbols or text, called digital data.
According to the content of the article to be verified, a target article with a similar theme is found in the target database, and the format most used by such target articles is statistically analyzed; an input interface is formed in that format and can be extracted from the interface database.
The unit symbol is one of data, words, sentences, paragraphs or modules. If individual data are the unit symbols, a continuous and complete population is determined according to the data distribution. If words are the unit symbols, a complete sentence is determined by breaking the text at periods, exclamation marks, question marks and ellipses and combining this with sentences having a subject-predicate-object structure. If sentences are the unit symbols, a complete paragraph is determined according to the natural segmentation and the form induced from the target database. If paragraphs are the unit symbols, a module is determined according to the natural classification and the form summarized from the target database. If modules are the unit symbols, the article is determined according to the article itself and the same or similar articles summarized from the target database. A subject-predicate-object structure is a structure containing a subject, a predicate and an object.
Data acquisition is carried out through the input interface: in the data input interface, the article to be verified is entered according to the interface prompts and the format.
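As an illustrative, non-limiting sketch of step 1), the following Python fragment shows how an article might be split into the unit-symbol levels described above and how the most-used format could be found by simple counting; the function names and the use of blank lines as paragraph boundaries are assumptions made for explanation only.

```python
import re
from collections import Counter

def split_units(text, level):
    """Split text into unit symbols at the requested level."""
    if level == "word":
        return re.findall(r"\w+", text)
    if level == "sentence":
        # sentences end at periods, exclamation marks, question marks or ellipses
        return [s.strip() for s in re.split(r"[.!?\u2026]+", text) if s.strip()]
    if level == "paragraph":
        return [p.strip() for p in text.split("\n\n") if p.strip()]
    raise ValueError("unknown unit-symbol level: %s" % level)

def most_common_format(target_articles, level="paragraph"):
    """Statistically analyse the most used format (here: paragraph count) among target articles."""
    counts = Counter(len(split_units(a, level)) for a in target_articles)
    return counts.most_common(1)[0][0]
```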
2) Analyzing the article:
A. Calculating the article concentration degree: the key words/sentences, paragraphs, modules and the central idea of the whole article to be verified are extracted from the perspective of combining structure and function; from the angle of maximum unit-symbol probability density, analysis and calculation are carried out layer by layer, taking the central idea embodied after combination as the function and the mean of the data samples as the structure.
B. Calculating the article dispersion degree: the sentence composition of each paragraph in the article is analyzed from the perspective of combining structure and function; from the angle of the unit-symbol probability distribution range, analysis and calculation are carried out layer by layer, taking the background reflected after combination as the function and the variance or order of the data samples as the structure.
C. Relevance verification of the article to be verified: if the variances are uniform, the set formed by the unit symbols is determined, and the correlation between the unit symbols and the whole is analyzed layer by layer from the two angles of function and structure, using the characteristics and similarity of the unit symbols and the set. Functionally, correlation is analyzed from the belonging and inclusion relations of sufficient and necessary conditions; structurally, correlation is analyzed by a t test. At the same time, the required sample-size range is evaluated to determine the reliability of the test. If the variances are not uniform, a rank sum test is used instead.
3) Verifying layer by layer: the relevance of the data, words, sentences, paragraphs and modules to the article to be verified and to the target article is verified respectively.
4) Classifying the articles: according to the established or existing axis clues of the database, the clues of genre, sequence and property are determined through the hyperlinks and the database. Specifically, the fixed structure and sequence extracted from the target database are used for inspection to determine the genre and the axis clue; these are classified to form an interface database, which is output to an output interface.
According to the type and format, the author only needs to add, delete and modify to enter his or her own content, thereby forming the article to be verified, which is automatically classified, in a certain hierarchical order, into articles at the mental level, articles combining the mental level with practice, and articles at the practice level.
5) Outputting the analysis result: the full text of the effective article is output, together with the central idea of each paragraph, each module and the article as a whole; the article background; the paper data adopted; and finally the genre property and its classification.
In a further technical scheme, the article analysis in step 2) comprises the following specific steps:
A. finding article concentration
Functional concentration ratio:
Paragraph central idea: the background is a necessary condition of the central idea, that is, if there is no full-text background there is certainly no central idea; conversely, if there is a full-text background there is not necessarily a central idea. Words are marked as unit symbols, and the noun n and the verb v occurring most often among the unit symbols of the k-th paragraph are extracted, merging the occurrence counts of synonyms and near-synonyms. The nouns and verbs are determined jointly from a dictionary database and from a database of nouns and verbs determined by the subject-predicate-object positions established in the text, with the latter taking the lead, the two verifying each other. The noun n and the verb v are combined into a search element S(n, v); the number of occurrences of S(n, v) in each section is counted to obtain the search element S_max(n, v) with the most occurrences. The sentence with a subject-predicate-object structure in which the noun n and the verb v are combined is marked as the central idea, and the paragraph in which S_max(n, v) appears most often is marked as the key paragraph. Adjectives or adverbs marked by the structural particles 的, 地 and 得 supplement and complete the paragraph central idea according to the subject-predicate-object structure of the sentence.
Module central idea: the article to be verified is divided into different modules according to the statistically determined format; the noun n and the verb v with the most occurrences in the k-th module are extracted, merging the occurrence counts of synonyms and near-synonyms. Sentences containing the noun n and verb v combined into a subject-predicate structure are marked as central sentences, and the paragraph in which central sentences occur most often is marked as the key paragraph. Neighboring modifier words are added, and the original sentence found in the article is recorded as the central idea. If the central sentence contains too much data and the original sentence cannot be found in the article, it is finally output to the background for the author to summarize.
Article central idea: the noun n_max and the verb v_max occurring most often among the unit symbols of the full text are marked as central unit symbols, and the sentence in which the combined central unit symbols occur most often is taken as the article central sentence. The module in which the article central sentence occurs most often is marked as the key module. The central sentence is modified with the added neighboring words, and the original sentence found in the module is marked as the central idea; the central idea in the key module is the central idea of the article.
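A minimal sketch of the search element S(n, v) described above is given below; it assumes an external part-of-speech tagger (pos_tag) and does not merge synonyms, both simplifications made only for explanation and not for limitation.

```python
from collections import Counter

def central_sentence(sentences, pos_tag):
    """Return the search element S(n, v) and the first sentence containing it."""
    nouns, verbs = Counter(), Counter()
    for sentence in sentences:
        for word, tag in pos_tag(sentence):   # pos_tag: assumed external tagger
            if tag == "n":
                nouns[word] += 1
            elif tag == "v":
                verbs[word] += 1
    n_max = nouns.most_common(1)[0][0]        # most frequent noun
    v_max = verbs.most_common(1)[0][0]        # most frequent verb
    hits = [s for s in sentences if n_max in s and v_max in s]
    return (n_max, v_max), (hits[0] if hits else None)
```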
Structural concentration ratio:
Paragraph concentration: count the number of occurrences m_y of each word in the paragraph, the number of unit-symbol types y, and the total number of unit symbols M; m_ymax is the largest number of occurrences of any word. The average number of occurrences of a word is m̄ = M / y. The two words whose numbers of occurrences are approximately m̄, together with the word whose number of occurrences is m_ymax, define the distribution state of the center of the paragraph, that is, the range over which the "central idea" is distributed; this range is the concentration of the paragraph. Here m̄ = M / y is the average distribution range of each unit symbol in the article and corresponds to the definition of the standard deviation, so two words at m̄ are used, just as with μ ± σ: this identifies the positions containing this number of words and does not mean that the two words are identical, i.e., there are keywords or subject words between the pair of words. The central idea can then be sought within this range. From the perspective of probability density, m_ymax is the word at the mean; each type of unit symbol occurs on average m̄ = M / y times; words at the positions m̄ tend to appear in pairs, and the words within the range they bound are distributed most centrally. The steps for module concentration and article concentration are the same as the step for paragraph concentration.
B. Calculating the paper dispersion degree, that is, analyzing the sentence composition of each paragraph in the article:
Functional dispersion: the dispersion problem is the concept corresponding to concentration. It is known that if A implies B then A is a sufficient condition of B; when A holds, B certainly holds, whereas when B holds, A does not necessarily hold. In an article, the "clues" and "argumentation" can be inferred from the "central idea"; it should be understood that the "clues" and "argumentation" described herein are only used for explaining the present invention and are not used for limiting it. The dispersion problem therefore concerns the individual "data" and "clues". Dispersion is the sufficient condition contained in concentration: the concentration is the central idea, and the dispersion comprises the components of the whole article. The concentrated central idea is composed of the "central unit symbols"; therefore the dispersion is all the sentences formed from the unit symbols, and these sentences form a database. For this reason, a latent sentence structure is added in the design, and sentences are found in the article or the database for matching; the inclusion and included relations are applied to make inferences, and if the inference is correct, the "dispersion" can be derived.
Structural dispersion: count the number of occurrences m_y of each unit symbol in the paragraph, the total number of unit-symbol types y and the total number of unit symbols M; each type of unit symbol occurs on average M/y times, which represents a probability density threshold of about 68.27% of the distribution. The overall mean is x̄ = (x_1 + x_2 + … + x_n) / n, where x_1, x_2, …, x_n are the individual data and n is the number of data; the variance is s² = Σ(x_i − x̄)² / n. If the unit symbol is a word, the variance is over the number of words forming each sentence; if the unit symbol is a sentence, the variance is over the number of sentences forming each paragraph, and so on. The module and article dispersion and the database dispersion follow the same steps.
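A short sketch of the structural dispersion as reconstructed above (overall mean and variance of the unit-symbol counts); the example input is illustrative only.

```python
def dispersion(samples):
    """Overall mean and variance of a list of counts (e.g. words per sentence)."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    return mean, variance

# e.g. number of words in each sentence of a paragraph
print(dispersion([12, 9, 15, 11]))
```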
C. Relevance verification of the article to be verified:
Functionally, the central unit symbol is found from the database. From the article itself, this central unit symbol is a "subject word", or a "central idea", or a "key paragraph", or a "key module"; from the database, the central unit symbol is a "core article", progressing layer by layer. The determined central idea, the background and the explanation of the central idea are then found through sufficient and necessary conditions. The full-text background is a necessary condition of the "central unit symbol", i.e., if there is no object case A there is certainly no object case B, while if there is an object case A there is not necessarily an object case B; here B is the background. Therefore, functionally, the same "background" condition is selected and the "central unit symbols" are compared; whether the two are correlated can be obtained by analyzing the inclusion relations with the database, and a comprehensive judgment is then made in combination with the structural correlation analysis. From another point of view, if A is a proper subset of B, A is the "central unit symbol" and B is the dispersion (A is a sufficient condition of B), where B is the argument and material that proves, explains and illustrates A; it should be understood that the arguments and materials set forth herein are intended to illustrate the present invention and are not intended to limit it. If both directions hold, the condition is sufficient and necessary.
Structurally, with the number of occurrences m_y of a unit symbol, the total number of unit-symbol types y and the total number of unit symbols M, each type of unit symbol occurs on average M/y times. If the variances are uniform, i.e., the sentences contain approximately the same numbers of words or the paragraphs contain approximately the same numbers of sentences, and so on, the most frequently matched "unit symbols" are counted and combined into the "central idea" for comparison; a t test then shows whether each unit symbol is structurally related to the whole sentence, the whole natural paragraph, the whole module and the whole article. If the variances are not uniform, a rank sum test is adopted.
In the testing process, if the variances are uniform, α and β are generally set to 80% according to the actual situation; the sample size can be calculated from the 80% level, and when that sample size is reached the hypothesis test is carried out automatically and the result obtained. The sample-size formulas are:
n = (Z_{1−α/2} + Z_{1−β})² σ² / δ²  (normal distribution formula), or
n = (Z_{1−α/2} + Z_{1−β})² π(1 − π) / δ²  (binomial distribution formula),
where σ is the overall standard deviation; δ is the difference of the overall parameters, recorded in the two-mean Z test as δ = μ₁ − μ₂, with μ the overall mean, and the larger δ is, the more likely it is that two samples with a larger difference in means are obtained in sampling; π is the probability that result A occurs in each Bernoulli trial. The two sets of sample sizes obtained from 68.27% and 80% are used as a boundary against which known sample sizes are compared to determine the confidence level of the correlation.
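The two sample-size formulas reconstructed above can be evaluated as in the sketch below; the z quantiles come from scipy, and the default α and β values are examples only, not part of the original filing.

```python
from scipy.stats import norm

def n_normal(sigma, delta, alpha=0.05, beta=0.20):
    """Sample size for the two-mean (normal) case."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
    return (z * sigma / delta) ** 2

def n_binomial(pi, delta, alpha=0.05, beta=0.20):
    """Sample size for the binomial (proportion) case."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
    return z ** 2 * pi * (1 - pi) / delta ** 2
```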
In a further technical scheme, the layer-by-layer verification of step 3) comprises the following specific steps:
According to the central idea of the article, the database is searched, and similar or identical articles are classified and extracted as target articles. If the variances are uniform, the target articles are tested, according to a certain axis clue, by the χ² test and the t test, or by the z test and the χ² test with continuity correction of the statistic, to determine whether the article to be verified is related to the target article. Specifically, the axis clue is a certain order of the central unit symbols, or an order of time, of size, or of speed; it should be understood that the time order, size order and speed order described herein are only used for explaining the present invention and are not used for limiting it. Further, with the "central unit symbol" as the cornerstone and the database as the basis for the test, a t test or rank sum test is performed on S(n, v) of the article to be verified and S1(n, v) of the target article, and the similarity between the article to be verified and the target article is analyzed; if they are similar, whether the article to be verified is an invalid/valid article is analyzed and judged according to the contribution of function and structure.
Specifically, if the variances are not uniform, a rank sum test is performed after sorting. Structurally, the rank sum test shows whether each unit symbol is related to the structure of the whole sentence, the whole natural paragraph, the whole module and the whole article; if related, they concern the same subject and the structure is reasonable. Further, if the structure is related and the function is a sufficient condition, the article to be verified is flagged as invalid; if only one of functional relevance (sufficient or necessary) and structural relevance holds, rather than all of them, the article is valid; if both structure and function differ, the article belongs to a different field and is invalid.
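A hedged sketch of the structural relevance check described above: the t test is used when the variances are uniform and the rank sum test otherwise. Using Levene's test to decide uniformity of variance is an assumption made for illustration; the patent does not prescribe it.

```python
from scipy import stats

def relevance_test(sample_a, sample_b, alpha=0.05):
    """Compare two samples of unit-symbol counts and report whether their structures appear related."""
    equal_var = stats.levene(sample_a, sample_b).pvalue > alpha
    if equal_var:
        p = stats.ttest_ind(sample_a, sample_b, equal_var=True).pvalue
    else:
        p = stats.ranksums(sample_a, sample_b).pvalue
    # no significant difference is read here as "structurally related"
    return ("related" if p >= alpha else "not related", p)
```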
In a further technical scheme, the article classification of step 4) comprises the following specific steps: after automatic analysis, for mental-level genres such as myth, fairy tale and science fiction, as long as the idea and the writing are good, no logical argumentation is required and the article is output directly; the analysis result is a mental-level article, and after output a posting website is extracted and the article is automatically submitted to the corresponding website.
Then, according to the principle that literature derives from common knowledge and is sublimated into spirit, while philosophy comes from spirit and is restored to common knowledge, a "mental-level database (myth, fairy tale, science fiction, etc.)" and a "common-knowledge database" are established to form a comparison database. The categories of literature and philosophy are determined according to the cause-effect order of the comparison database; the analysis result is an article that goes from serving life toward spirit, or from spirit toward serving life, and after output a submission website is extracted for automatic submission. Science derives from common knowledge and is an unrestricted argumentation and application of common knowledge. The final output interface outputs the effective article in the required format, and synchronously outputs target articles with the same theme as the effective article.
A further technical scheme is the verification of the scientific category; the classification of argumentative papers in step 4) by matrix calculation is specifically as follows: first, the sample is decomposed into units, and with the structure as the model and the common-knowledge database as the basis, the corresponding data are formed into a matrix; the comparison paper and the paper to be verified are checked for philosophy/literature, scientificity and irrelevance.
The method comprises the following steps:
a. A common-knowledge base is established. The common-knowledge base refers to common knowledge that has already been demonstrated by modern natural science, based on mathematics, physics and chemistry textbooks, and is denoted G_ID. At the same time, a database of mental-level material such as myth, fairy tales and science fiction is established.
b. The j-th sentence G(i, j) in the i-th section of the paper to be verified is compared one by one with the related common knowledge G_ID to form a matrix K. When G(i, j), compared with G_ID, is contrary to the corresponding common knowledge, it is marked as negatively correlated, k(i, j) = −1; when no corresponding common-knowledge support is found, it is marked as irrelevant, k(i, j) = 0; when the corresponding common-knowledge support is found, it is marked as positively correlated, k(i, j) = 1;
matrix K = [k(1,1) … k(i, j) … k(l, m)];
c. The j-th sentence G1(i, j) in the i-th section of the target paper is compared one by one with the related common knowledge G_ID to obtain a matrix K1 = [k1(1,1) … k1(i, j) … k1(l, m)];
d. The matrix K and the matrix K1 are multiplied element by element and summed, Σ k(i, j) × k1(i, j), to obtain a value; if the value is positive, the two are positively correlated; if negative, negatively correlated; if 0, unrelated;
e. After positive correlation, negative correlation and irrelevance have been selected in d, the element-wise sum Σ [k(i, j) + k1(i, j)] is computed to obtain a value. When the signs are the same and the positive value is larger than that of matrix K, the two are positively correlated and the paper is scientific; when the signs are the same and the positive value is approximately equal to that of matrix K, the article to be verified is irrelevant and contains a large amount of irrelevant content, which suggests a philosophical/literary tendency, defined as "imagination", with its corresponding practical value; when the signs are opposite and a positive value is obtained, the paper has a certain philosophical/literary orientation; if the value is negative or zero, the two are negatively correlated, which is "pseudo-science" and belongs to the mental level. A relatively valuable paper is then selected, i.e., after the sign and the positive/negative correlation between the article to be tested and the comparison article are obtained, the likelihood that it is a relatively scientific article or a philosophical/literary article is judged;
f. The j-th sentence G(i, j) in the i-th section of the paper to be verified is compared one by one with the common knowledge G_ID; if, according to the scientific basis and the principle of simplification applicable to "axioms", fewer than 3 items of common-knowledge support are found after comparing G(i, j) with G_ID, the paper is judged to be a scientific article;
g. Similarly, methods a, b, c, d and e can also be used to judge articles at the mental level such as myth, fairy tale and science fiction, to determine whether a philosophical/literary article is an article combining the mental level with practice; specifically, a philosophical article or a scientific article is determined according to the cause-and-effect relation in the article's axis clue, i.e., whether the mental level or the common knowledge appears first, where a scientific article is an article of practice.
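Steps b to e above can be sketched as follows; score_sentence, which maps one sentence to −1/0/+1 against the common-knowledge base G_ID, is an assumed helper, and each section is assumed to contain the same number of sentences so that a rectangular matrix is formed.

```python
import numpy as np

def score_matrix(sections, score_sentence):
    """K or K1: one -1/0/+1 score per sentence; sections are given as lists of sentences."""
    return np.array([[score_sentence(s) for s in section] for section in sections])

def compare(K, K1):
    prod_sum = np.sum(K * K1)     # step d: element-wise product, then sum
    plain_sum = np.sum(K + K1)    # step e: element-wise sum, then sum
    if prod_sum > 0:
        relation = "positively correlated"
    elif prod_sum < 0:
        relation = "negatively correlated"
    else:
        relation = "unrelated"
    return relation, prod_sum, plain_sum
```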
In a further technical scheme, in step 3) the paper to be verified is output after the problem of multiple correlation has been overcome; specifically, scenario application is performed according to the statistical analysis result, and the one-sidedness caused by the crossing of multiple correlations is eliminated. Further, a linear evaluation is performed by means of the multiple correlation coefficient.
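The multiple-correlation evaluation mentioned above can be sketched with an ordinary least-squares fit; the exact formula used in the original filing is not reproduced here, so the coefficient R below is the standard multiple correlation coefficient and is offered only as an illustrative assumption.

```python
import numpy as np

def multiple_correlation(X, y):
    """Multiple correlation coefficient R of y on the columns of X."""
    X1 = np.column_stack([np.ones(len(y)), X])        # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    y_hat = X1 @ beta
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return np.sqrt(1.0 - ss_res / ss_tot)
```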
Compared with the prior art, the invention has the following beneficial effects. A method for article analysis and self-demonstration is provided, which offers a degree of protection for intellectual property. On the basis of an existing database, articles of different styles are verified from the aspects of function and structure; this not only saves effort for the owner of the intellectual property, but also reduces the workload of intellectual-property examiners and improves examination efficiency. The boundaries between literature, philosophy and science are determined by classifying the articles. Each paragraph, each module and the full-text central idea of the article to be verified are extracted and verified through digitized data and analysis; meanwhile, correlation verification is performed between the article to be verified and the target article, and their similarity is judged; the article type is judged by computing and analyzing the article to be verified against common knowledge. The article to be verified is input through an input interface, and the validity of the article, its central idea, its background and the correlation analysis result are output through an output interface. The method can improve the efficiency of writing, increase the accuracy and precision of articles, reduce unnecessary demonstration work, improve article quality, and automatically and progressively demonstrate and guide practical work in a spiral manner.
Brief Description of the Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a functional diagram of some modules of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the attached drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
A method for article analysis and self-demonstration, characterized by comprising the following steps:
1) Database establishment and application, including: determining the unit symbol and the format by frequency of occurrence, and automatically inducing, by means of hyperlinks or fuzzy search according to the format and the unit symbol, to establish a target database; managing the database through a database management system, and extracting the same factors from the target database to establish an interface database; finding articles with similar subjects in the target database according to the content of the article to be verified in the interface database, statistically analyzing the format most used by such similar articles, and forming an interface in that format. The unit symbol is defined as follows: if individual data are the unit symbols, a continuous and complete population is determined according to the data distribution; if words are the unit symbols, complete sentences are obtained by breaking the text at periods, exclamation marks, question marks and ellipses; if sentences are the unit symbols, a complete paragraph is determined according to the natural segmentation and the form induced from the database; if paragraphs are the unit symbols, a module is determined according to the natural classification and the form summarized from the database; similarly, if modules are the unit symbols, the article is determined according to the article itself and the same or similar articles summarized in the database. A "subject-predicate-object structure" refers to a structure containing a subject, a predicate and an object. Data acquisition is then performed through this interface: in the data input interface, the article to be verified is entered according to the interface prompts and the format.
2) Analyzing the article:
A. Calculating the article concentration degree, i.e., extracting the key data, words, sentences, paragraphs, modules and the central idea of the whole article from the perspective of combining structure and function: since every set is composed of unit symbols, from the perspective that the probability density of the unit symbols is largest within the "set", analysis and calculation are carried out layer by layer on the two aspects of concentration, taking the central idea shown after combination as the function and the mean of the data samples as the structure.
B. Calculating the article dispersion degree, i.e., analyzing the sentence composition of each paragraph in the article from the perspective of combining structure and function: from the angle that the "set" has its maximum probability density at the mean position formed by the unit symbols, and that the "dispersion" is the probability distribution range of the unit symbols, analysis and calculation are carried out layer by layer, taking the combined background as the function and the variance/order of the data samples as the structure. When the concentration degrees of the compared articles are the same/similar, the means are uniform; when the dispersion degrees are the same/similar, the variances are uniform; otherwise, they are not.
C. Checking the relevance of the article to be verified: a set problem has the characteristic (μ, σ). If the unit symbols have uniform variances, the set formed by the unit symbols is determined, and the correlation between the unit symbols and the set is analyzed layer by layer from the two aspects of function and structure, using the similarity of the unit symbols and the set characteristics. Functionally, correlation is analyzed from the belonging and inclusion relations of sufficient and necessary conditions; structurally, correlation is analyzed by the t test and the χ² test, or by the rank sum test. At the same time, the required sample-size range is evaluated to determine the reliability of the test. If the variances are not uniform, a rank sum test is used instead.
3) Verifying layer by layer, i.e., the relevance tests of the valid data, sentences, paragraphs and modules against the article to be verified and the target article respectively. The specific method is the same as step 2), except that the unit symbol differs, and a continuous and complete population is correspondingly determined according to the data distribution.
4) Classifying the articles: according to the axis clues of the established or existing database, the genre, the sequence, the property, etc. are determined through the hyperlinks and the database. The fixed structure and sequence extracted from the database are used for inspection to determine the genre and the axis clue; the data are classified and output to the interface. According to the type and the fixed interface, the author only needs to add, delete and modify to enter his or her own content, thereby forming the article to be verified, which is automatically classified, in a certain hierarchical order, into articles at the mental level, articles combining the mental level with practice, and articles of the practice type.
5) Outputting the analysis result: the full text of the effective article is output, together with the central idea of each paragraph, each module and the article as a whole; the article background; even the paper statements adopted; and finally the genre property and its classification.
In a further technical scheme, the article analysis in step 2) comprises calculating the article concentration degree and the article dispersion degree. The concentration degree extracts the central idea and the key words/sentences of meaningful article paragraphs, modules and the whole article, and includes functional and structural concentration. The functional concentration includes: for data, the functional concentration is the overall mean, and the structural concentration is also the overall mean; for sentences, paragraphs and modules, the central idea is the functional concentration. It should be understood that the sentences, paragraphs and modules described herein are only used for explaining the present invention and are not used for limiting it.
Paragraph central idea: the full-text background is a necessary condition of the central idea, that is, if there is no full-text background there is certainly no central idea, whereas if there is a full-text background there is not necessarily a "central idea". In everyday language, "only if …, then …" or "without …, there is no …" expresses a necessary but insufficient condition. The method for extracting the central idea is as follows: words are marked as unit symbols; the noun n and the verb v occurring most often among the unit symbols of the k-th paragraph are extracted and counted according to the subject, predicate and object structure, merging the occurrence counts of synonyms and near-synonyms; the nouns and verbs are determined jointly from a dictionary database and from a database of nouns and verbs determined by the subject-predicate-object positions established in the text, with the latter taking the lead, the two verifying each other. The noun n(i, j) and the verb v(i, j) are combined into a search element S(n, v); the number of occurrences of S(n, v) in each section is counted to obtain the search element S_max(n, v) with the most occurrences; the sentence containing the subject-predicate-object structure formed by combining the noun and the verb v is marked as the central idea, and the paragraph in which S_max(n, v) appears most often is marked as the key paragraph. A "subject-predicate-object structure" generally refers to a structure of subject, predicate and object. Further, adjectives or adverbs marked by the particles 的, 地 and 得 are used to supplement and improve the meaning of the sentence, so the main part of the sentence, i.e. the subject-predicate-object structure, is supplemented and the central idea perfected.
Module central idea: similar to the paragraph central idea, the paper is divided into different modules according to the statistically determined format; the noun and the verb v with the most occurrences in the k-th module are extracted and counted, merging the occurrence counts of synonyms and near-synonyms, in the same way as in the step "paragraph central idea". The sentence in which the noun and the verb v are combined into a subject-predicate structure is marked as the central sentence A, and the paragraph in which S_max(n, v) appears most often is marked as the key paragraph. Neighboring words are added for modification, and the original sentence found in the article is taken as the central idea A. If there are too many combined sentences and the original sentence cannot be found in the article, it is finally output to the background for the author to summarize. The step "article central idea" is likewise the same as described above. The logic is as follows: assuming A is the condition and B is the conclusion, if B cannot be inferred from A but A can be inferred from B, then A is the necessary but insufficient condition of B (A ⇏ B, B ⇒ A).
The structural concentration includes the following. Corresponding to the "central idea" above, it can be derived: paragraph concentration: count the number of occurrences m_y of each unit symbol in the paragraph, the number of unit-symbol types y and the total number of unit symbols M; in terms of probability density, m_ymax is the word at the mean. Each type of unit symbol occurs on average M/y times; words whose numbers of occurrences are approximately M/y tend to appear in pairs, and these, together with the word whose number of occurrences is m_ymax, describe the distribution state of the center of the paragraph, i.e. the range over which the "central idea" is distributed; this range is the concentration of the paragraph. Two words at M/y are used here, just as with μ ± σ: this identifies the positions containing this number of words and does not mean that the two words are identical, i.e. there are keywords or subject words between the pair of words, and the central idea can then be sought within this range. Paragraphs are composed of unit symbols, and the concentration of paragraphs is as above; modules are likewise composed of unit symbols, where the unit symbol is a paragraph/sentence, and the same applies. The concentration of modules, the concentration of articles and the concentration of the database follow the same steps.
The dispersion degree of the paragraphs in the article is calculated by analyzing the sentence composition of each paragraph. The dispersion includes functional dispersion and structural dispersion. Functional dispersion: the dispersion problem is the concept corresponding to concentration. It is known that if there is an object case A there is necessarily an object case B, while if there is no object case A there is not necessarily no object case B; then A is a sufficient but unnecessary condition of B. In everyday language, sufficient conditions are expressed by "if …, then …" or "as long as …, then …". In an article, the "argumentation", the "things" described and so on can be inferred from the "central idea"; therefore the dispersion problems are the "argumentation", the "things" and the like. Dispersion is the sufficient condition contained in concentration: the concentration is the central idea, and the dispersion comprises the components of the whole article. The concentrated central idea is composed of the central unit symbols; therefore the dispersion is all the data and sentences composed of the unit symbols, and these data and sentences form a database. For this reason, the latent structure is added in the design, and sentences are found in the article or the database for matching; the inclusion and included relations are applied to make inferences, and if the inference is correct, the "dispersion" can be derived. The logic is as follows: assuming A is the condition and B is the conclusion, if B can be inferred from A but A cannot be inferred from B, then A is the sufficient but unnecessary condition of B (A ⇒ B, B ⇏ A). The functional and structural dispersion of simple data are the same.
Structural dispersion: count the number of occurrences m_y of each unit symbol in the paragraph, the number of unit-symbol types y and the total number of unit symbols M; each type of unit symbol occurs on average M/y times, which, as above, represents an approximate threshold of about 68.27% of the probability density of the distribution. If the unit symbol is data, the variance of the data is obtained directly: the overall mean is x̄ = (x_1 + x_2 + … + x_n) / n, where x_1, x_2, …, x_n are the sample data and n is the number of data, and the variance is s² = Σ(x_i − x̄)² / n. If the unit symbol is not data, the calculation takes this formula as a model: if the unit symbol is a word, the variance is over the number of words forming each sentence; if the unit symbol is a sentence, the variance is over the number of sentences forming each paragraph, and so on. The dispersion of modules and articles and the dispersion of the database follow the same steps.
Further, the relevance verification of the article to be verified is carried out through the concentration and the dispersion:
functionally, finding out a central unit symbol from a database, wherein the central unit symbol is a 'subject word' for a sentence, a 'central idea' for a paragraph, a 'core paragraph' for a module, a 'core module' for a full text, or a 'core article' for the database, and so on; and finding out the determined central thought, background and explanation of the central thought through sufficient unnecessary conditions and necessary insufficient conditions. The full text background is a necessary condition of "central unit symbol", i.e. if there is no object case a, there is certainly no object case B; if there is an object case a but not necessarily an object case B, here B is the background. Therefore, the same 'background' condition is selected in function, the 'central unit symbol' comparison is carried out, whether the two have correlation or not can be obtained, and then the structural correlation analysis is combined to carry out comprehensive judgment.
From another perspective, if A is a proper subset of B, A is the 'central unit symbol', B is the dispersion, and A is a sufficient condition of B, wherein B is the argument and material for proving, explaining and explaining A, when the 'central unit symbol' is the same, the argument and material are compared to obtain whether the two are correlated, and then the comprehensive judgment is carried out by combining whether the structural variances are 'uniform'. It should be understood that the statements and materials set forth herein are intended to be illustrative of the present invention and are not intended to limit the present invention. Then, the module and article dispersion and the database dispersion are the same as the above steps.
If the condition is sufficient and necessary, the judgment is satisfied. The analysis is performed by the inclusion and included relations with the help of the database. The logic is as follows: assuming A is the condition and B is the conclusion, if B can be inferred from A and A can be inferred from B, then A is the sufficient and necessary condition of B (A ⇔ B). Conversely, if B cannot be inferred from A and A cannot be inferred from B, then A is a neither sufficient nor necessary condition of B (A ⇏ B and B ⇏ A).
Structurally, with the number of occurrences m_y of a unit symbol, the total number of unit-symbol types y and the total number of unit symbols M, each type of unit symbol occurs on average M/y times. If the variances are uniform, i.e., the data variances are similar or equal, or the sentences contain approximately the same numbers of words, or the paragraphs contain approximately the same numbers of sentences, and so on, the most frequently matched "unit symbols" are counted and combined into the "central idea" for comparison; the t test or the χ² test then determines whether each unit symbol is structurally related to the whole sentence, the whole natural paragraph, the whole module and the whole article, and the rationality of the structure is analyzed. In general, however, the variances are not uniform, so the rank sum test is adopted.
Further, a credibility judgment is required. In the testing process, for the t test, α and β are set to 80% according to the actual situation. The sample size can be obtained from the 80% level, and when that sample size is reached the hypothesis test is carried out automatically and the result obtained. Further, after verification, the reliability of the verification must be determined; reliability has three conditions: accuracy, precision and continuous distribution, the continuous distribution being determined by the sample size. The degree of credibility is judged through accuracy and precision, with random continuity as the criterion of judgment, so as the number of samples increases, the continuity increases. Here it is generally sufficient to set both α and β to 80% according to the actual situation; the sample size can be calculated from the 80% level, and when it is reached the result is obtained after the automatic hypothesis test. The sample-size formulas are:
n = (Z_{1−α/2} + Z_{1−β})² σ² / δ²  (normal distribution formula), or
n = (Z_{1−α/2} + Z_{1−β})² π(1 − π) / δ²  (binomial distribution formula),
where π is the probability that result A occurs in each Bernoulli trial; σ is the overall standard deviation; δ is the difference of the overall parameters, recorded in the two-mean Z test as δ = μ₁ − μ₂, with μ the overall mean, and the larger δ is, the more likely it is that two samples with a larger difference in means are obtained in sampling. The two sets of sample sizes obtained from 68.27% and 80% are used as a boundary against which known sample sizes are compared to determine the confidence level of the correlation.
Further, in the layer-by-layer verification of step 3), the relevance tests of the data, sentences, paragraphs and modules against the article to be verified and the target article are respectively as follows. As described above, each unit symbol and the whole formed by the unit symbols are verified in the same way as the article itself; taking article verification as an example, the database is searched according to the central idea of the article, and similar or identical articles are classified and extracted as target articles. If the variances are uniform, the target articles are tested, according to certain axis clues, by the χ² test and the t test, or by the z test and the χ² test with continuity correction of the statistic, to determine whether the article to be verified is related to the target article. Specifically, the axis clue is a certain order of the central unit symbols, or an order of time, of size, or of speed; it should be understood that the time, size and speed described herein are only used for explaining the present invention and are not used for limiting it. Further, according to the central idea of the article, the database is searched and similar or identical articles are classified and extracted as target articles; with the central unit symbol as the cornerstone and the database as the basis, the articles are checked according to a certain axis clue, a t test or χ² test is performed on S(n, v) of the article to be verified and S1(n, v) of the target article, and the similarity between the article to be verified and the target article is analyzed. If the variances are not uniform, a rank sum test is performed after sorting. Structurally, the rank sum test shows whether each unit symbol is related to the structure of the whole sentence, the whole natural paragraph, the whole module and the whole article; if related, they concern the same subject and the structure is reasonable. Further, if the structures are related, the unit-symbol composition of the article to be tested is similar to that of the target article; functionally, the central ideas, backgrounds and materials not only have "sufficient conditions and necessary conditions" within the article itself, but also exhibit inclusion and included relations in the process of mutual checking. As with sufficient and necessary conditions, if the central idea of the article to be verified is equal to that of the target article, the article to be verified is flagged as invalid; if only one of functional relevance (inclusion or being included) and structural relevance holds, rather than all of them, the article is valid; if both structure and function differ, the article belongs to a different field and is also invalid; if the structures are unrelated but the functions are similar, it is an article on the same subject with a different genre structure. This is because if two populations are correlated, they are different samples within the same larger population.
Further, if the variances of the two "populations" are uniform, the χ² test and the t test, or the z test and the χ² test with continuity correction of the statistic, are performed one by one on the populations to which they belong to prove whether the article to be verified is related to the target article.
According to the article central idea, the database is searched, and similar or identical articles are classified and extracted as target articles; the structural and functional checks are carried out according to the method above. Further, with the "central unit symbol" as the cornerstone and the database as the basis, S(n, v) of the article to be verified and S1(n, v) of the target article are checked; when the variances are uniform, whether the article to be verified is related to the target article is determined one by one, according to certain axis clues, by the χ² test and the t test, or by the z test and the χ² test with continuity correction of the statistic, and the similarity between the article to be verified and the target article is analyzed. If the variances are not uniform, a rank sum test is performed.
Further, under the condition of homogeneous variances, one of the article to be verified and the target article is selected and a χ² test table is established. Specifically, the structure formed by the central sentences is first roughly classified as narrative, descriptive or argumentative and screened accordingly; it should be understood that the narrative, descriptive or argumentative genres presented herein are intended to be illustrative only and are not intended to be limiting. After screening, the nouns and phrases in the central unit symbols are combined according to the central sentence and the subject-predicate-object structure; the sentences that can be so combined are then checked, that is, documents are found in the database through near-synonymous and synonymous unit symbols, and it is judged whether the relations between the near-synonymous unit symbols, the synonymous unit symbols and the subject-predicate-object elements are consistent. If a clue is found to coincide (the clue being a time axis, an experimental basis or a literature basis), a χ² or fourfold-table check is performed to determine whether the two articles are related; it should be understood that the time axis, experimental basis and literature basis described herein are merely illustrative and do not limit the present invention. For example, in the χ² test, the data in article A and article B show whether the target problem is effective or ineffective, as shown in Table 1.
TABLE 1
            Effective    Ineffective    Total
Article A   a            b              a + b
Article B   c            d              c + d
Total       a + c        b + d          n
The effective and ineffective counts are then compared by a t test, i.e. datum 1 in article A against datum 1 in article B, and so on, as shown in Table 2.
TABLE 2
Number    Data in article A    Data in article B    Difference d
1
2
3
Establish the test hypotheses H0 and H1 and set the test level α = 0.05, using the paired t statistic

t = d̄ / (S_d / √n),  ν = n − 1;

by looking up the boundary-value table at ν degrees of freedom, the effect in article A and the effect in article B are obtained and entered into the χ² test table.
Using the probability distributions, let the total effective rates of group A, group B and the pooled average be π1, π2 and π respectively; their estimates are

p1 = a / (a + b),  p2 = c / (c + d),  p = (a + c) / n.

Establish the test hypotheses H0: π1 = π2 = π and H1: π1 ≠ π2, with α = 0.05 (the normal approximation, i.e. the Z test, can also be used):

z = (p1 − p2) / √[ p(1 − p)(1/(a + b) + 1/(c + d)) ].
Then the χ² statistic of the fourfold-table data is given a continuity correction by subtracting 0.5 from the absolute value of the difference between the actual observed frequency O and the theoretical expected frequency E, i.e.

χ² = Σ (|O − E| − 0.5)² / E.
When the expected frequency E of every cell is ≥ 5 and the total number of cases n is ≥ 40, the ordinary χ² test is adopted; when n ≥ 40 but some cell has an expected frequency 1 ≤ E < 5, the continuity-corrected χ² test is adopted; when the expected frequency of any cell is E < 1, or n < 40, or the P value obtained by the test is close to the test level α, the exact probability method is used in place of the corrected χ² test.
Here a is the number of sentences of the article to be verified that contain the central unit symbol, so the proportion of such sentences in the article to be verified is a/(a + b), where a + b is the total number of sentences of the article to be verified; c is the number of sentences of the target article that contain the central unit symbol, so the corresponding proportion in the target article is c/(c + d), where c + d is the total number of sentences of the target article. The calculated values are substituted into the χ² test table.
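A minimal sketch of how the fourfold-table check described above might be carried out, assuming Python with scipy (the function name and example counts are illustrative assumptions and do not form part of the invention):

```python
# Illustrative sketch only: fourfold-table check of the kind described above.
# a/(a+b) = share of sentences containing the central unit symbol in the article
# to be verified; c/(c+d) = the same share in the target article.
from scipy.stats import chi2_contingency, fisher_exact

def fourfold_check(a, b, c, d, alpha=0.05):
    table = [[a, b], [c, d]]
    n = a + b + c + d
    # Expected frequencies under independence, used to pick the test variant.
    _, _, _, expected = chi2_contingency(table, correction=False)
    e_min = expected.min()
    if e_min >= 5 and n >= 40:
        _, p, _, _ = chi2_contingency(table, correction=False)   # plain chi-square
        method = "chi-square"
    elif e_min >= 1 and n >= 40:
        _, p, _, _ = chi2_contingency(table, correction=True)    # continuity correction
        method = "corrected chi-square"
    else:
        _, p = fisher_exact(table)                                # exact probability
        method = "Fisher exact"
    return method, p, p < alpha

# Example: 18 of 40 sentences vs. 25 of 45 sentences contain the central symbol.
print(fourfold_check(18, 22, 25, 20))
```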
If the variances are not homogeneous (which happens because the segments are of different sizes), the rank-sum test is applied, again guided by the "central idea" correspondence described above. In this case the following can be obtained. Concentration of a paragraph: the paragraph has m sentences, and each sentence composed of words yields, according to the subject-predicate-object structure, the observed values (x_i, y_i, z_i). For the i-th sentence (i = 1, …, m) the observed value (x_i, y_i, z_i) occurs at each of these positions either 0 or 1 times, so after classifying the words by the subject-predicate-object structure they can be sorted; for example, the word x_i at the subject position has some frequency s, and the frequencies with which the individual words appear in the article are s_1, s_2, …, s_i, …, s_m. The density function f(x) has the following properties: (1) f(x) ≥ 0; (2) ∫ f(x) dx = 1; (3) P(a < X ≤ b) = ∫_a^b f(x) dx, which is the defining property of a probability density function. The relative frequency of each word therefore plays the role of a probability density,

f(x_i) = s_i / Σ_j s_j.
The word with the highest frequency is the place where the probability density is highest, and the position of that word is the position of the mean. A random variable whose distribution function is continuous is called a continuous random variable. The probability density function of random data represents the probability that the instantaneous amplitude falls within a certain specified range, and is therefore a function of the amplitude; it varies with the range of amplitudes taken. A continuous random variable is often described intuitively through its probability density function: each word has a frequency in the article, and the sum of the word frequencies is the covered area.
The probability distribution function is

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt.
Thus the subject position is x_i and the predicate and object positions are y_i and z_i. Since the mean is exactly the place where the probability density is highest, the absolute values of the differences between each word's frequency and the highest-density frequency are ordered to form ranks, so the ranks can be represented by the order of the frequencies. As noted above, the most frequent words generally differ between sentences, paragraphs, modules and articles, and the probability distributions differ because sentences, paragraphs, modules and articles differ in length and size; they therefore all have different (μ, σ).
We want to verify whether any two sentences of the same paragraph are related, whether two paragraphs of the same module are related, whether two modules of the same article are related, and whether two articles in the same database are related. We need only treat them as two populations; then for any two populations A and B (Table 3):
TABLE 3
(Table 3 lists the word frequencies of the two populations A and B, from which the pooled ranks are assigned.)
Here we only need to list the correspondence of (x_i, y_i, z_i), as shown in Table 4:
TABLE 4
Subject        Predicate      Object
x_1            y_1            z_1
x_2            y_2            z_2
…              …              …
x_j            y_j            z_j
…              …              …
x_k            y_k            z_k
               y_{k+p}        z_{k+p}
                              z_{k+p+q}
We extract the subject, predicate or object column from Table 4. In Table 3, ranks are assigned from the smallest to the largest frequency, and equal values receive the average rank. For example, if "infant", "laryngitis" and "budesonide" all appear five times and would occupy ranks 61, 62 and 63, their average rank is

(61 + 62 + 63) / 3 = 62.
Calculate the rank sums: the ranks of each group are added to give the rank sums R_X and R_Y.
Calculate the statistic: if the two groups have equal numbers of cases, take the rank sum of either group as the statistic; if the numbers of cases are unequal, take the rank sum corresponding to the smaller group as the statistic.
Determine the P value and draw an inference: look up the T boundary-value table; first take the smaller of R_X and R_Y, then find the sample-size difference (n_2 − n_1) along the top of the table; the critical value of T is read at the intersection of the two. Compare the test statistic T with the critical value: if T lies inside the critical range, P is greater than the corresponding probability; if T equals the critical value or lies outside the range, P is equal to or less than the corresponding probability. Alternatively, the normal approximation is used: if n_1 or n_2 − n_1 lies outside the range covered by the T boundary-value table, the test statistic is

u = (|T − n_1(N + 1)/2| − 0.5) / √[ n_1 n_2 (N + 1) / 12 ].
Analysis then proceeds according to the normal distribution. If tied ranks exceed 25 % (for example), the correction is performed according to

u_c = u / √C,  where  C = 1 − Σ (t_j³ − t_j) / (N³ − N),

t_j is the number of observations in the j-th group of tied ranks and N = n_1 + n_2.
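A minimal sketch of the two-group rank-sum comparison described above, assuming Python with numpy and scipy (the function name and the example frequencies are illustrative assumptions; scipy's Mann-Whitney form of the rank-sum test handles ties by average ranks and uses the normal approximation for larger samples):

```python
# Illustrative sketch only: rank-sum comparison of two groups of word frequencies
# when the variances are not homogeneous. Ties receive the average rank.
import numpy as np
from scipy.stats import rankdata, mannwhitneyu

def rank_sum_check(freqs_a, freqs_b, alpha=0.05):
    pooled = np.concatenate([freqs_a, freqs_b])
    ranks = rankdata(pooled)                      # equal values get the mean rank
    r_a = ranks[: len(freqs_a)].sum()             # rank sum R_X
    r_b = ranks[len(freqs_a):].sum()              # rank sum R_Y
    # The Mann-Whitney U test is the usual computational form of the rank-sum test.
    _, p = mannwhitneyu(freqs_a, freqs_b, alternative="two-sided")
    return r_a, r_b, p, p < alpha

# Example: frequencies of subject-position words in two paragraphs.
print(rank_sum_check([5, 5, 5, 3, 2, 8], [4, 6, 7, 2, 9, 10, 1]))
```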
Obviously, within the same sentence we require the language to be clear; within the same paragraph we require the centre to be unambiguous; within the same module we require a single idea; and within the same article we require a single theme. It should be understood that the words, sentences, modules and articles described herein are only used to explain the present invention and are not used to limit it. Within the same database we require that any two articles be distinct, unrelated and characteristic, while still forming part of the database. Therefore, where two articles in the same database are required to be uncorrelated, and where several articles must be compared because each article is part of the database and correlated with it, the comparison of two articles is carried out by the rank-sum test as above, and the comparison of several articles is carried out as follows:
The statistic is given by the formula

H = [12 / (N(N + 1))] Σ R_i² / n_i − 3(N + 1),  ν = k − 1,

where R_i is the rank sum of the i-th group, n_i is the number of observations in each group, N = Σ n_i, and k is the number of groups compared.
Determine the P value and draw a conclusion:
When the number of groups is k = 3 and the number of cases in each group is n_i ≤ 5, the P value can be obtained by looking up the H boundary-value table in the appendix.
When the number of groups is k > 3, or the number of cases is n_i > 5, H approximately follows the χ² distribution with ν = k − 1 degrees of freedom, and the P value can be obtained from the χ² boundary-value table.
If tied ranks exceed 25 %, the corrected statistic is

H_c = H / C,  where  C = 1 − Σ (t_j³ − t_j) / (N³ − N),

t_j is the number of observations in the j-th group of tied ranks and N is the total number of observations (Σ n_i).
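A minimal sketch of the multi-group comparison described above, assuming Python with scipy (the function name and example data are illustrative assumptions; scipy's Kruskal-Wallis routine applies the tie correction automatically):

```python
# Illustrative sketch only: comparison of several articles at once with the
# Kruskal-Wallis H statistic described above.
from scipy.stats import kruskal

def multi_article_check(*frequency_groups, alpha=0.05):
    h, p = kruskal(*frequency_groups)
    return h, p, p < alpha

# Example: word-frequency samples drawn from three articles in the same database.
print(multi_article_check([3, 5, 5, 7], [2, 4, 4, 9, 6], [8, 1, 3, 5]))
```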
Similarly, the article concentration and the database concentration are the same as the above steps.
Correlation means that two samples belong to the same characteristic population. The relation between a population and a larger population is one of inclusion: a sample can itself be regarded as a population, and the populations of the samples together form a larger population, so between any two samples (or populations) there exist sufficient or necessary conditions. If the condition is both sufficient and necessary, the two are equal, which means that the article to be verified is the same as the comparison article; such an article has already appeared, and the repetition causes a waste of resources. If the condition is sufficient but not necessary, or necessary but not sufficient, the two stand in an inclusion relation, which means that one of them either demonstrates a point in more detail or considers the overall layout more broadly; examining the structure is then clearly meaningful, and the settings for accuracy and precision describe the degree to which the structures intersect, the higher the requirement, the greater the overlap. After the check it is therefore determined whether the two articles are related; if they are, the prompt "the articles are similar or identical in structure" is given, and otherwise the opposite. Further, if the article A to be verified is structurally related to the comparison article B, and the "central idea" and "background" of the article to be verified, and even its explanatory or demonstrative material, are functionally similar or identical, the article to be verified is flagged as an invalid article. If only one or two of these conditions hold, the differing items are the contribution items. For example: structurally related, but with a different central idea, the same background and the same demonstrative or explanatory material, the "central idea" is the contribution item; structurally related, with the same central idea, a different background and the same material, the "background" is the contribution item; structurally related, with the same central idea, the same background and different material, the demonstrative or explanatory material is the contribution item; structurally unrelated, but with the same central idea, the same background and the same material, the structure is the contribution item (in particular, if the structures are unrelated and the functions are similar, the two are articles on the same subject in different genres). By analogy there are fourteen combinations, and thus fourteen kinds of "contribution". If both the structure and the function differ, the articles belong to different fields and the article is likewise an invalid article.
Furthermore, the influence of multiple correlations needs to be eliminated before step 3), and the regression model used to eliminate multiple correlations relies on the χ² test. Specifically, the paper to be verified overcomes the multiple-correlation problem and only then is the analysis result output; the scenario application is carried out according to the statistical analysis result, and the one-sidedness caused by the crossing of multiply correlated factors is eliminated. Further, the multiple correlations are evaluated linearly with the regression model

Y = b_0 + b_1 X_1 + b_2 X_2 + … + b_m X_m + e,

and the χ² test is applied to the coefficients. The test can be performed with statistical software similar to SPSS, but is not limited to that software, and the analysis result is then output: the full text of the effective article, together with the central ideas of all paragraphs, modules and the article, and the article background. In the following description, reference is made to the accompanying drawings, which form a part hereof and in which the various elements, components and/or steps of the invention are shown by way of illustration.
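A minimal sketch of how such a linear evaluation of the coefficients might look, assuming Python with numpy and statsmodels rather than the SPSS-like software mentioned above (the function name, threshold and example data are illustrative assumptions):

```python
# Illustrative sketch only: fit Y = b0 + b1*X1 + ... + bm*Xm and test each
# coefficient, so weakly contributing (multiply correlated) factors can be
# screened out before step 3).
import numpy as np
import statsmodels.api as sm

def screen_factors(X, y, alpha=0.05):
    X = sm.add_constant(np.asarray(X, dtype=float))    # adds the intercept b0
    model = sm.OLS(np.asarray(y, dtype=float), X).fit()
    keep = [i for i, p in enumerate(model.pvalues[1:], start=1) if p < alpha]
    return model.params, model.pvalues, keep            # coefficients, p-values, retained factors

# Example: two candidate factors (e.g. dose and age) and one outcome measure.
X = [[3, 1], [3, 2], [4, 2], [5, 3], [6, 3], [7, 4]]
y = [2.1, 2.4, 2.9, 3.8, 4.1, 4.9]
print(screen_factors(X, y))
```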
Still further, the article classification of step 4): taking the functional and structural concentration as a clue and following the clues of the established or existing database, the type, order, nature and the like are determined through hyperlinks; concretely, the fixed structure and order extracted from the database are used as the check to determine the type and the clue. Specifically, after the automatic analysis, writing at the mental level is the expression of mystery, fairy tale and science fiction; as long as the idea is good and the literary expression is good, the analysis result is output directly without logical argumentation, and the piece is a mental-level article. After the output, the submission websites are extracted and the articles are automatically submitted to the corresponding websites. Further, according to the principle that "literature is derived from common general knowledge but sublimates into spirit, while philosophy is derived from spirit and returns to common general knowledge", a "database of the mental, fairy-tale and science-fiction level" and a "database of common general knowledge" are established; it should be understood that the mental, fairy-tale and science-fiction writing described herein is only used to explain the present invention and is not used to limit it. The categories of literature and philosophy are determined according to the cause-and-effect order found by comparison with the databases. If the analysis result is an article that moves from serving life to the spirit, or from the spirit to serving life, then after the article is output the submission website is extracted and the article is submitted automatically. Science is derived from common general knowledge, and common general knowledge is demonstrated and applied without limit. The output interface therefore outputs the effective paper in the required format and synchronously outputs the papers that share the same subject and aim as the effective paper.
Further, in the step 2), when no original sentence containing S_k(n, v) can be found in the article, this is shown on the output interface and the author summarizes the central idea. In addition, whenever the system flow cannot proceed, the existing submission interface pops up automatically and the traditional submission is carried out.
Furthermore, philosophy and hypothesis are also covered by the technical solution for classifying scientific papers. Specifically, matrix calculation is applied to the argumentative paper: the sample is decomposed into unit symbols, the structure is moulded, and the corresponding data are constructed into a matrix on the basis of the common-knowledge database. The comparison paper and the paper to be verified then verify each other's philosophical/literary character, scientific character and irrelevance.
The method comprises the following steps:
a. and establishing a common knowledge base, wherein the common knowledge base refers to common knowledge which is already demonstrated by modern natural science, and is marked as G _ ID on the basis of the teaching materials of 'counting, theory and chemistry'. Meanwhile, a database of the levels of psychiatry, fairy tales and science fiction is established. It should be understood that the psyche, fairy tales, and science fiction described herein are only intended to illustrate the present invention and are not intended to limit the present invention.
b. Comparing the jth sentence G (i, j) in the ith section of the paper to be verified with the related common sense G _ ID one by one to form a matrix K, and marking as negative correlation and marking as K (i, j) = -1 when G (i, j) is compared with the G _ ID and is opposite to the corresponding common sense; when G (i, j) is compared with G _ ID, no corresponding common sense support is found, and it is marked as irrelevant, and it is marked as k (i, j) =0; when G (i, j) is compared with G _ ID, finding the corresponding common sense support, then marking as positive correlation, and marking as k (i, j) =1;
matrix K = [ K (1,1) … … K (i, j) … … K (l, m) ];
c. comparing the j-th statement G1 (i, j) in the i-th section of the target paper with the related common knowledge G _ ID one by one to obtain a matrix K1, wherein the matrix K1= [ K1 (1,1) … … K1 (i, j) … … K1 (l, m) ];
d. multiplying the matrix K and the matrix K1 pairwise, summing the multiplied matrixes and obtaining a value after summing the sum sigma K (i, j) × K1 (i, j), and positively correlating the matrix K and the matrix K when the sum is a positive value; if the value is negative, the two are negatively correlated; when the value is 0, the two are not correlated.
e. After positive correlation, negative correlation and irrelevance have been distinguished in d, a further value is obtained by summing Σ k(i, j) + k1(i, j). When the signs of the two agree and the positive value is larger than that of matrix K, the two are positively correlated and the paper tends to be scientific; when the signs agree but the positive value is approximately equal to that of matrix K, the article to be verified is irrelevant and contains a large amount of irrelevant content, which suggests a philosophical/literary tendency; this tendency is defined as "imagination" and expresses its practical value. When the signs are opposite but a positive value is obtained, the paper has a certain philosophical/literary orientation; a negative value or zero indicates negative correlation, the two are "pseudo-scientific", and since no supporting "common knowledge" is found at all, the paper belongs to the mental level. A relatively valuable paper is then selected; that is, once the sign and the positive or negative correlation between the article to be verified and the comparison article are obtained, the likelihood that the paper is relatively scientific or philosophical/literary is judged.
f. The j-th sentence G(i, j) in the i-th section of the paper to be verified is compared one by one with the common knowledge G_ID; if, after the comparison, fewer than 3 items of common-sense support are found, the paper is judged to be a scientific article according to the scientific basis and the simplification principle applicable to "axioms";
g. Similarly, the methods of a, b, c, d and e above can also be used to identify articles of the "mental, fairy-tale and science-fiction level" and to identify philosophy/literature as "articles combining the mental level with practice"; it should be understood that the mental, fairy-tale and science-fiction writing described herein is only used to explain the present invention and is not used to limit it. Specifically, whether an article is a philosophical article or a scientific article is determined from the cause-and-effect relation between common knowledge and the mental level appearing along the axial clues of the article, a scientific article being a practice article.
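A minimal sketch of the matrix comparison of steps b-e, assuming Python with numpy (the function name, labels and example codes are illustrative assumptions; the interpretation of the score as a scientific, philosophical/literary or mental-level tendency follows the description above):

```python
# Illustrative sketch only: each sentence of the paper to be verified and of the
# target paper is coded -1 (contradicts common sense G_ID), 0 (no support found)
# or +1 (supported); the element-wise product is summed to classify the pair.
import numpy as np

def classify_pair(k, k1):
    k, k1 = np.asarray(k), np.asarray(k1)
    score = int(np.sum(k * k1))            # sum of k(i, j) * k1(i, j)
    if score > 0:
        return score, "positive correlation"
    if score < 0:
        return score, "negative correlation"
    return score, "no correlation"

# Example: codes for six sentences of the paper to be verified and the target paper.
K  = [1, 1, 0, -1, 1, 0]
K1 = [1, 0, 0, -1, 1, 1]
print(classify_pair(K, K1))
```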
Further, before step 3) the paper to be verified overcomes the multiple-correlation problem, and the analysis result is output after step 5).
Furthermore, the effective article is combined with the actually collected target data to guide those data, and the target data in turn verify the effective article, forming an intelligent, learning-type inspection system; combined with the intelligent-learning media interaction analysis method and device, the system rises continuously in a spiral.
The author then inputs his or her own content by adding, deleting and modifying, thereby forming the paper to be verified. After the automatic analysis the process is carried out and the analysis result is output: the full text of the effective article, together with the central idea of each paragraph, each module and the article; the article background; and the explanatory material and argumentation data (it should be understood that the explanatory material and argumentation data are mentioned for the purpose of explanation only and are not intended to limit the present invention). The literary, philosophical, hypothetical and scientific papers are then submitted automatically.
In particular, FIG. 1 is a flow chart of the article analysis and self-demonstration method.
Steps one to four cover the establishment and application of the database. When an author inputs an article to be verified, the format of that type of article can be called from the paper database according to the subject; the input interface automatically extracts and displays the modules of that type of article and their content, and the author forms the paper to be verified by adding, deleting and modifying his or her own content and determining the unit symbols. For example, the appendix uses the paper "Observation of the curative effect of budesonide suspension in treating acute laryngitis in children" as an illustration. "Roughly illustrated" means that the problem can be explained clearly by the example, while the simulations are computed programmatically by the calculations described above; there is no need to describe the calculation accurately and precisely here. The target paper is found according to format and content, the unit symbols are mainly data, and evidence-based medical analysis is applied to those data. The database is established by verification from three angles, the "central idea", the "background" and the "interaction between the central idea and the background". Here the central unit symbol is the phrase "treatment of acute laryngitis", and the central sentences obtained are "observation of the curative effect of local treatment of acute laryngitis in children with budesonide suspension", "observation of the curative effect of hormones on acute laryngitis in children" and "no obvious difference between the curative effect of locally applied budesonide suspension and that of systemically applied hormones". Articles with similar subjects are found in the target data according to the content of the article to be verified, and the formats most used by those similar articles are analysed statistically; the same factors are extracted through the target database to build the content of an interface database, including the format and part of the fixed knowledge; fixed structures of medical papers are found in the databases, for example the what-why-how structure. An interface is formed according to this structure, and the universally identical or similar content is screened out of the interface database to generate a paper automatically. Finally, data acquisition and input are carried out: in the data input interface, the article to be verified is entered according to the format following the prompts of the input interface; the author only needs to screen, modify, add and delete. For the specific technical scheme, refer to the article analysis in step 2).
A further technical scheme of steps five and six: for the input data of step 2), as shown by the module functional structure in FIG. 2, the verification module extracts the concentration and the discreteness from both the function and the structure, obtains the characteristics of the data from the concentration and discreteness, and verifies the data by the rule that data with similar characteristics belong to the same whole. The method is as follows:
the functionality is compared by utilizing the concentration and the discreteness; carrying out correlation test on the structure; the two are combined to comprehensively explain the problem. For example: the central idea of the article is as follows: the curative effect of the budesonide suspension for local treatment and the systemic hormone treatment for the infantile acute laryngitis has no obvious difference; the background is that the hormone has obvious effect of treating the infantile acute laryngitis but has great side effect. The hormone systemic treatment and the budesonide local treatment are compared and tested in the whole text, and no difference is found. The concentration is the maximum value of probability density, which is corresponding to the position of the average value of the samples, so the concentration is embodied as a subject word in an article; the discrete probability distribution is an effective range, and is a background relative to the central thought and a whole article relative to the subject word.
A. If data are the unit symbols of the text to be verified, evidence-based examination is applied directly; if the text to be verified takes words as unit symbols, semantic analysis and word segmentation are performed on the unit symbols, stop words, adverbs and adjectives are removed, non-semantic words are filtered out, and the nouns and verbs are kept.
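A minimal sketch of this filtering step, assuming Python with NLTK and English text (for the Chinese text of the original, a segmenter with POS tagging such as jieba would play the same role; the function name and example sentence are illustrative assumptions, and the NLTK resources must be downloaded once with nltk.download()):

```python
# Illustrative sketch only: segment the text, drop stop words, adverbs and
# adjectives, and keep only the nouns and verbs as unit symbols.
import nltk
from nltk.corpus import stopwords

def unit_symbols(text):
    tokens = nltk.word_tokenize(text.lower())
    tagged = nltk.pos_tag(tokens)
    stops = set(stopwords.words("english"))
    # Keep only nouns (NN*) and verbs (VB*); drop stop words and non-words.
    return [w for w, tag in tagged
            if w.isalpha() and w not in stops
            and (tag.startswith("NN") or tag.startswith("VB"))]

print(unit_symbols("Budesonide suspension treats acute laryngitis in children effectively."))
```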
B. Solving the degree of paper concentration, namely extracting the central ideas of paragraphs, modules and articles:
functionally centralizing, extracting 'central thought' or 'clue', extracting noun and verb v with the most occurrences in the unit symbol of the kth paragraph, counting the times, adding the times of occurrence of synonyms and synonymsTo merge; the noun n and the verb v are arranged and combined to form a central phrase S (n, v), the times of the central phrase S (n, v) appearing in the k-th paragraph are counted, and the central phrase S with the largest appearing times is obtained max (n, v) in
Figure BDA0002053706690000211
In scope or in articles, or with the complete appearance of S max The sentence of (n, v); or only the structure of the main object and the predicate is shown as the central idea of the paragraph by combining the adjectives and adverbs modified by 'ground, ground and arrow' in the article.
The module central idea: the article to be verified is divided into different modules according to the format counted in step 1); the noun n and verb v with the largest number of occurrences in the k-th module are extracted, the counts of synonyms and near-synonyms being merged during counting; the sentences containing a noun n and verb v combined into a subject-predicate structure are marked as central sentences, and the paragraph in which central sentences occur most often is marked as the key paragraph; the neighbouring modifying words are added and the original sentence is found in the article and recorded as the central idea; if there are many central sentences and no original sentence can be found in the article, the result is finally output to the background for the author to summarize.
The full-text central idea: the noun n_max and verb v_max that occur most often among the unit symbols of the full text are counted and recorded as the central unit symbols, and the sentence in which the combined central unit symbols occur most often is taken as the central sentence of the article; the module in which the central sentence of the article occurs most often is marked as the key module; the central sentence together with the added neighbouring modifying words is located as an original sentence in the module and recorded as a central idea; the central idea in the key module is the central idea of the article.
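A minimal sketch of this central-phrase extraction, assuming Python with NLTK and English text (the function name and example text are illustrative assumptions; merging of synonyms and near-synonyms is omitted for brevity):

```python
# Illustrative sketch only: count the most frequent noun n_max and verb v_max,
# form the central phrase S(n, v), and take as the central sentence a sentence
# that contains both.
from collections import Counter
import nltk

def central_sentence(text):
    sentences = nltk.sent_tokenize(text)
    tagged = nltk.pos_tag(nltk.word_tokenize(text.lower()))
    nouns = Counter(w for w, t in tagged if t.startswith("NN") and w.isalpha())
    verbs = Counter(w for w, t in tagged if t.startswith("VB") and w.isalpha())
    n_max = nouns.most_common(1)[0][0]
    v_max = verbs.most_common(1)[0][0]
    hits = [s for s in sentences if n_max in s.lower() and v_max in s.lower()]
    return (n_max, v_max), (hits[0] if hits else None)

print(central_sentence(
    "Budesonide treats laryngitis. Hormones also treat laryngitis. "
    "Budesonide treats laryngitis with fewer side effects."))
```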
For example, in the cited article "Observation of the curative effect of budesonide suspension in the local treatment of acute laryngitis in children" (see the appendix), most of what occurs is data, so the data are examined directly by evidence-based medicine. "Budesonide suspension" does not necessarily appear in every sentence, but it appears many times within a section, so unit symbols such as the number of cases and such words correspond to a section rather than to a sentence; these words are the unit symbols of the section, and the central phrase they form therefore corresponds to the section or module. Suppose that the phrase extracted most often in this article is "budesonide suspension for treating laryngitis". Against the background that hormone treatment of acute laryngitis is effective, the former may hold, whereas against the latter background, i.e. "hormone treatment of acute laryngitis", the former does not necessarily hold; likewise, against the background of "hormone treatment of acute laryngitis", this article can be compared with other comparison articles, as can the inclusion relations between the article and the target article. In this way the sentences with the greatest similarity in the full text can be retrieved and extracted through the combined sentences. A further technical scheme is that, if no original sentence containing S(n, v) can be found in the article, this is shown on the output interface and the author then summarizes the central idea.
Structurally, the position with the highest probability density is the centre of the structure, so, corresponding to the "central idea" above, the following can be obtained. Concentration of a paragraph: count the number of occurrences m_y of each unit symbol in the paragraph, the number of unit-symbol types y and the total number of unit symbols M; in terms of probability density, m_ymax is the word at the mean. The average count per symbol type is M/y, and words whose counts lie around M/y tend to occur in pairs; the words whose counts fall within this range around the overall average describe the distribution of the article's centre, i.e. the range over which the central idea is distributed, so the central idea is found in the stretch of words with the highest density. A paragraph is composed of unit symbols, the unit symbols here being sentences/words, and the paragraph concentration is then obtained as above. A module is likewise composed of unit symbols, the unit symbol here being a paragraph/sentence. Similarly, the module concentration, the article concentration and the database concentration follow the same steps.
For the module concentration, the average per unit-symbol type is M/y, where M is the total number of unit symbols or paragraphs in the module and y is the number of unit-symbol types or paragraphs; symbols often occur in pairs at the M/y positions. The article concentration is obtained in the same way, M being the total number of unit symbols or modules in the article; the database concentration likewise, M being the total number of unit symbols or articles in the database.
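A minimal sketch of this concentration calculation, assuming Python (the function name, tolerance parameter and example symbols are illustrative assumptions):

```python
# Illustrative sketch only: the average count per unit-symbol type M/y, the
# symbol at the probability-density peak, and the symbols whose counts lie
# near the average, as described above.
from collections import Counter

def concentration(unit_symbols, tolerance=1.0):
    counts = Counter(unit_symbols)
    M = sum(counts.values())            # total number of unit symbols
    y = len(counts)                     # number of distinct symbol types
    average = M / y
    peak_symbol, _ = counts.most_common(1)[0]
    near_average = [s for s, c in counts.items() if abs(c - average) <= tolerance]
    return average, peak_symbol, near_average

print(concentration(["laryngitis", "budesonide", "laryngitis", "treat",
                     "hormone", "treat", "laryngitis"]))
```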
Functional discreteness: in the example above, the correlation analysis of the data throughout the text, and the finding that hormones are effective in treating acute laryngitis, constitute the functional discreteness.
Structural discreteness is the question of whether the variances are homogeneous. For example, in the paper "Observation of the curative effect of budesonide suspension in treating acute laryngitis in children", variances can be obtained from the data and compared with the variances in a comparison article to decide whether they are homogeneous. However, because the paragraphs differ in size and the modules differ in function and size, the structures are obviously different when the unit symbol is taken to be a word. This article therefore applies the rank-sum test: after counting, each unit symbol, i.e. the word frequency, is ranked, and the frequencies of the article to be checked and the word frequencies of the articles in the target database are pooled and then sorted.
The flow chart of FIG. 1 shows step seven, in which verification is carried out. For example, with the central idea as the clue, the background can be obtained through the inclusion relations; through the word-structure distribution of the article, if the variances are homogeneous, the "effective and ineffective" counts produced by the comparison are subjected to the χ² test; if the variances are not homogeneous, the rank-sum test is carried out. In practice, because statistical software similar to SPSS exists (though the method is not limited to such software), the test runs automatically once a reasonable test is designed. For the specific technical scheme refer to step 2) of the article analysis; the module is shown in detail in FIG. 2. Specifically, the verification module proceeds according to whether the variances are homogeneous. Since function is combined with structure and the structure is adapted to the function, the verification module is classified structurally as shown in the module functional structure of FIG. 2. For example, a related article is found in the database for the paper to be verified, namely budesonide suspension for treating acute laryngitis in children, and is verified: from the data, the variance is calculated directly and its homogeneity determined; for the article itself, because of the variance question, the words can be sorted by frequency and a rank-sum check performed. The detailed procedure is as indicated above. Since the article compares two sets of data, local budesonide treatment and systemic hormone treatment, effective and ineffective results clearly arise; if the variances are first calculated, a χ² test table can be established with the comparison data, and the effectiveness or ineffectiveness of each central unit symbol is determined by the t test. Establish the test hypotheses H0 and H1 and set the test level α = 0.05, using

t = d̄ / (S_d / √n),  where  S_d = √[ (Σd² − (Σd)²/n) / (n − 1) ].
As shown in Table 2, the differences between the two sets of comparison data are formed and, with the degrees of freedom ν, the boundary-value table is consulted to obtain the effect of the paper to be verified and the effect of the target paper, which are entered into the χ² test table. Similarly, for the comparison article, one of the paper to be verified and the target paper is selected, a χ² test table is established, and the effectiveness or ineffectiveness of each central unit symbol is determined by the z test. Establish the test hypotheses H0: π1 = π2 = π and H1: π1 ≠ π2, with α = 0.05, and

z = (p1 − p2) / √[ p(1 − p)(1/(a + b) + 1/(c + d)) ],

where the total effective rates of the paper to be verified, the target paper and their average are π1, π2 and π respectively, with estimates

p1 = a / (a + b),  p2 = c / (c + d),  p = (a + c) / n.
Here a is the number of data items of the paper to be verified that contain the central unit symbol, so their proportion in the paper to be verified is a/(a + b), where a + b is the total number of data items of the paper to be verified; c is the number of sentences of the target paper that contain the central unit symbol, so the corresponding proportion in the target paper is c/(c + d), where c + d is the total number of data items of the target paper. The t test is the small-sample counterpart of the z test; as the sample size increases, or as the sample drawn from the population approaches the population, the two tests tend to coincide. If, after the test, the article to be verified is found to be equal to the comparison article, then the article has already appeared and the repetition causes a waste of resources. If the condition is sufficient but not necessary, or necessary but not sufficient, the two stand in an inclusion relation, meaning that one either demonstrates a point in more detail or considers the overall layout; the structural check is then clearly meaningful, and the settings for precision and accuracy describe the degree to which the structures intersect, the higher the requirement, the greater the overlap. After the check, it is therefore determined whether the two articles are related; if they are, a "similar or identical article" prompt is given, and otherwise the opposite. Further, if the article A to be verified is structurally related to the comparison article B, and the "central idea", the "background" and even the explanatory or demonstrative material of the article to be verified are functionally similar or identical, the article to be verified is flagged as invalid. If only one or two of these conditions hold, the differing items are the contribution items: structurally related, but with a different central idea, the same background and the same material, the "central idea" is the contribution item; structurally related, the same central idea, a different background and the same material, the "background" is the contribution item; structurally related, the same central idea, the same background and different material, the material is the contribution item; structurally unrelated, but the same central idea, background and material, the structure is the contribution item (in particular, if the structures are unrelated and the functions are similar, the two are articles on the same subject in different genres). By analogy, there are fourteen combinations, i.e. fourteen kinds of "contribution". If both the structure and the function differ, the article belongs to a different field and is an invalid article.
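A minimal sketch of the two-proportion z test set out above, assuming Python with scipy (the function name and example counts are illustrative assumptions):

```python
# Illustrative sketch only: compare the share of sentences (or data items)
# containing the central unit symbol in the paper to be verified, a/(a+b),
# with the share in the target paper, c/(c+d).
import math
from scipy.stats import norm

def two_proportion_z(a, b, c, d, alpha=0.05):
    p1, p2 = a / (a + b), c / (c + d)
    p = (a + c) / (a + b + c + d)                      # pooled estimate
    se = math.sqrt(p * (1 - p) * (1 / (a + b) + 1 / (c + d)))
    z = (p1 - p2) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))               # two-sided
    return z, p_value, p_value < alpha

print(two_proportion_z(18, 22, 25, 20))
```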
Step eight carries out the credibility analysis module. Credibility can be analysed through the probability and continuity of the assumed accuracy and precision. For example, the cited article contains data that are compared, one group for local treatment with budesonide and one for systemic treatment with hormones. If the target article contains the same kind of data, the unit symbols are those data, and as long as continuity exists the article has high credibility. Similarly, if the unit symbols are the words of an article, the words are sorted: high continuity means high credibility, low continuity low credibility. Combined with the database-building module of step one, meaningless clues are excluded, so the remaining unit symbols in the article are the experimental data, whose continuity can be analysed from the data; specifically, the sample size can be calculated at the 80 % level, and once that sample size is reached the hypothesis test is performed automatically and the result obtained. The sample-size formula is:
n = 2[ (z_(1−α/2) + z_(1−β)) σ / δ ]²

or

n = [ z_(1−α/2) √(2π̄(1−π̄)) + z_(1−β) √(π1(1−π1) + π2(1−π2)) ]² / (π1 − π2)²
where σ is the overall standard deviation; in this example the data are averaged as

x̄ = (1/n) Σ x_i,

and δ is the difference in the population parameter, written in the two-mean Z test as δ = μ1 − μ2; the larger δ is, the more likely it is that a larger difference between the two sample means will be obtained. The standard deviation is

σ = √[ Σ (x − μ)² / n ],

and the underlying distributions are

f(x) = 1/(σ√(2π)) · e^(−(x−μ)²/(2σ²))   and   P(X = k) = C(n, k) π^k (1 − π)^(n−k).
the former is a normal distribution formula, x is a sample, n is a sample amount, sigma is a standard deviation, and mu is a total mean; the latter is a binomial distribution formula, where π is the probability of occurrence of result A in each Bernoulli test, and overall is divided into two according to the position of greatest distribution density, π 1 A distribution state in the population bounded by half the maximum position of the distribution density 2 The distribution state of the other half; the number of samples is determined so that α and β are 68.27% or more, and if the number of samples is larger, the reliability is higher, and if the number of samples is 95% or more, the reliability is extremely high. Otherwise, if the reliability is less than 68.27%, the reliability is not sufficient.
If the variances are homogeneous, as in the example above (shown by a variance test giving similar or equal variances), the correlation of the paper to be verified with the target paper is tested: according to the central idea of the article, the paper library is searched, similar or identical papers are extracted as target papers, and the χ² test and t test determine one by one whether the paper to be verified is related to the target papers.
According to the flow chart of FIG. 1, step nine carries out the module that eliminates multiply correlated factors. In the attached paper, "Observation of the curative effect of budesonide suspension in treating acute laryngitis in children", only a simple comparison of local treatment with budesonide suspension and systemic treatment with hormones was chosen, precisely because of the multiple correlated factors. Although the paper appears to involve a single factor, there are also potential multiple factors that were not considered, such as the degree of psychological dependence produced after drug administration and its correlation with the therapeutic effect. In such cases we can also find other relevant factors: acute laryngitis in children is not only a question of systemic hormones versus local budesonide treatment, but also of weather, diet, living conditions and so on. Multi-factor research is carried out by extracting the data in related articles, the multiple correlations are evaluated linearly with the regression model

Y = b_0 + b_1 X_1 + b_2 X_2 + … + b_m X_m + e,

and the contribution weight of each factor is found from the coefficients through the χ² test, so that the irrationality caused by the multiple correlated factors is excluded. As shown in Table 1 and Table 2, the specific technical scheme is to eliminate the influence of the multiple correlations before performing step 3) in step five, using the regression model and χ² test method for eliminating multiple correlations. The module is shown in detail in FIG. 2.
According to the flow chart of FIG. 1, step ten is the classification analysis module; FIG. 2 is the module functional structure diagram of the present invention. It comprises a module for distinguishing the mental from the practical and a module for distinguishing the scientific from the philosophical. Because different kinds of articles have different structures and different contributions, their social contributions differ; the purposes of examination differ, so different databases must be established for combination and comparison; different causes and effects produce different types, which are then classified into the mental level, the combination of practice and spirit, and the practical level. The specific technical scheme is the article classification of step 4) above. Our example application is evidence-based medicine, but its discussion section is partly logical imagination. In a database at the mental level it is difficult to find articles related to it; in a common-sense database, however, similar statistical applications can be found, and in medical databases a large number of similar papers can be found. Because the paper takes statistical calculation as its guide and extracts practical data for verification, its method is scientific, so the classification analysis module verifies it as follows: the effective paper is classified as a philosophical paper or a scientific paper. The specific steps apply matrix calculation: first the sample is converted into units to form a specific quantified sample, aimed at the data collected in the paper; if the data are negative or cannot be converted into a quantified sample, the correlation is negative, the tendency lies in the philosophical category and the paper to be demonstrated is a philosophical paper; if the correlation is positive, the tendency lies in the scientific category and the paper to be demonstrated is a scientific paper.
The method comprises the following steps:
a. A common knowledge base is established; it refers to common knowledge already demonstrated by modern natural science, based on the mathematics, physics and chemistry teaching materials; in this paper the statistical knowledge is used, recorded as G_ID.
b. Comparing the jth sentence G (i, j) in the ith section of the paper to be verified with the related common sense G _ ID one by one to form a matrix K, and marking as negative correlation and marking as K (i, j) = -1 when G (i, j) is compared with the G _ ID and is opposite to the corresponding common sense; when G (i, j) is compared with G _ ID, if no corresponding common sense support is found, it is marked as irrelevant, and it is marked as k (i, j) =0; when G (i, j) is compared with G _ ID, finding the corresponding common sense support, then marking as positive correlation, and marking as k (i, j) =1;
matrix K = [ K (1,1) … … K (i, j) … … K (l, m) ];
c. comparing the jth statement G1 (i, j) in the ith section of the target paper with the related common knowledge G _ ID one by one to obtain a matrix K1, wherein the matrix K1= [ K1 (1,1) … … K1 (i, j) … … K1 (l, m) ];
d. The matrix K and the matrix K1 are multiplied element by element and the products are summed, Σ k(i, j) × k1(i, j); when the sum is positive, the two are positively correlated; when it is negative, the two are negatively correlated; when it is 0, the two are uncorrelated.
e. After positive correlation, negative correlation and irrelevance have been distinguished in d, a further value is obtained by summing Σ k(i, j) + k1(i, j). When the two (i.e. the element-wise product and the sum of the two matrices) have the same sign and the value obtained is larger than that of matrix K, the two matrices are positively correlated and the paper is scientific; when the signs are the same and the value obtained is approximately equal to that of matrix K, the article to be verified is irrelevant, pseudo-scientific and of no practical value; when the signs are opposite and a positive value is obtained, the article to be verified contains a large amount of irrelevant content and its practical value is small; when the value is negative or zero, the two are negatively correlated, indicating a philosophical tendency. A relatively valuable paper is then selected; that is, once the sign and the positive or negative correlation between the article to be verified and the comparison article are obtained, the likelihood that it is a relatively scientific or philosophical article is judged. If no "common sense" support is found at all, we define it as a "hypothesis" and state its practical value.
f. The j-th sentence G(i, j) in the i-th section of the paper to be verified is compared one by one with the common knowledge G_ID; if, after the comparison, fewer than 3 items of common-sense support are found, the paper is judged to be a scientific article according to the scientific basis and the simplification principle applicable to "axioms". In this article only statistical knowledge is applied for verification; the paper is without doubt a scientific article, but how scientific it is still needs to be verified.
According to the flow chart of FIG. 1, validity is verified automatically in step eleven, in order to determine the paper's function and value. For example, the discussion part of "Observation of the curative effect of budesonide suspension on acute laryngitis in children" is evidence-based medicine, but where no examination has been carried out it remains only a hypothesis. The specific technical solutions are those described above for classification into philosophical, hypothetical or scientific treatises, and the module is shown in detail in FIG. 2. For example, for this article the format of that type of paper can be called from the paper database according to the subject; the input interface automatically extracts and displays the modules of that type of paper and their content, and the author forms his or her own paper to be verified by adding, deleting and modifying. On the one hand it must be verified whether the article is identical to, or maximally similar to, an article in the database; on another hand it must be demonstrated whether it is an article from another field with no correlation to the subject. On another hand, to demonstrate the scientific character of the paper (as in the appendix), a database of the theoretical basis for treating acute laryngitis in children with hormones is extracted to guide the treatment of acute laryngitis; taking the cure rate as the standard, heat-clearing and detoxifying drugs, various physiotherapy treatments or antibiotics could equally be extracted for treating acute laryngitis. The theoretical basis, however, might be the Huangdi Neijing, the Shang Han Lun, internal medicine and so on; it should be understood that these works are mentioned only to explain the present invention and not to limit it. When the two factors of cure rate and toxic side effects are compared, local budesonide treatment may be found to be an effective method in that it is not inferior, and it is obviously effective compared with no treatment. On another hand, to demonstrate the reliability of the paper's data, similar data are analysed through the database and may contradict other data; the reliability of the comparison article is analysed through database selection and the degree of automatic learning and verification, and it is determined whether the paper's data are falsified or plagiarized. Finally, because the assumed probabilities of accuracy and precision differ, the continuity produced by the sample size also differs, and data continuity is required for both skewed and normal distributions: if the data are continuous, the paper's reliability is high; if not, it is low. In addition, accuracy and precision are themselves criteria for the treatise. These verification methods can be realized because they are already supported by the "multimedia intelligent learning system", the "management system of the database" and the like.
It should be understood that the "multimedia intelligent learning system" and the "management system of the database" described herein are only used to explain the present invention and are not used to limit it. The cited article contains only a few dozen data points, and obviously, without a sample-size evaluation, its credibility has no supporting strength. The method therefore not only checks articles but also provides judgement, support and supplementation of their value.
According to the flow chart of FIG. 1, step twelve automatically performs the guidance and verification of the self-learning theory and practice. For example, the paper proposes the theory of treating acute laryngitis in children with budesonide, and people can use it to guide medication in practice; in practice, however, some cases are infectious, some are "excessive internal heat", some are iatrogenic, and so on, and the treatment cannot be generalized from person to person. The collected data then show different effects, so the theory is continuously revised and practice is guided towards the benefit of health.
According to the flow chart of FIG. 1, publication is performed automatically in step thirteen, that is, the analysis result is output: the full text of the effective paper, together with the central idea of each paragraph, module and the article; the paper background; and the paper data. After the analysis result is output, the output interface outputs the effective paper in the required format, arranges the target papers in order of their similarity, and analyses the correlation between the effective paper and the target papers. Relevant submission websites are then selected automatically for submission; specifically, as in claim 4 and in the method for paper analysis and self-demonstration according to claim 1, the submission websites of target papers with the same theme as the effective paper are output synchronously and arranged in order of effectiveness, and the author can click the corresponding website to submit.
While the invention has been described herein with reference to a number of illustrative embodiments thereof, it should be understood that numerous other modifications and embodiments can be devised by those skilled in the art that will fall within the scope of the disclosure. More particularly, various variations and modifications are possible in the component parts or arrangements within the scope of the disclosure, the appended drawings and the claims. In addition to variations and modifications in the component parts or arrangements, other uses will also be apparent to those skilled in the art.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail in the following with reference to the attached "examples". It should be understood that the specific paper examples described herein are for the purpose of illustration only and are not intended to limit the invention.
Appendix:
Observation of the curative effect of budesonide suspension in the treatment of infantile acute laryngitis
Zhou Ai
Abstract Objective: to observe the curative effect of budesonide suspension in the treatment of infantile acute laryngitis. Methods: 67 children with severe acute laryngitis were divided into two groups; the treatment group, 32 cases, received aerosol inhalation of budesonide (BUD) suspension, and the control group, 35 cases, received hydrocortisone succinate by intravenous drip; the improvement of symptoms and signs was observed 5 hours after administration. Results: there was no significant difference in efficacy between the treatment group and the control group. Conclusion: there was no significant difference in efficacy between topical nebulized budesonide suspension and systemic glucocorticoids in the treatment of infantile acute laryngitis.
Keywords: infantile acute laryngitis; glucocorticoid; budesonide suspension
Infantile acute laryngitis is a common critical disease in pediatrics, occurring especially in children under 5 years of age; its onset is sudden and the condition serious, so timely and effective rescue treatment is of great clinical significance. We compared the curative effect of budesonide suspension nebulized and inhaled with an air-compression pump with that of intravenous hydrocortisone succinate in the treatment of moderate and severe acute laryngitis in children; glucocorticoids are an important means of treating infantile acute laryngitis. The report is as follows:
1 Data and methods
1.1 General data: 67 children with severe acute laryngitis admitted to our hospital from November to May, 2002, all meeting the diagnostic criteria for acute laryngitis in children [1]. They were randomly divided into two groups: the treatment group of 32 cases (21 male, 11 female; 8 cases aged ≤1 year, 15 cases aged 1-3 years, 9 cases aged 3-14 years) and the control group of 35 cases (24 male, 11 female; 12 cases aged ≤1 year, 15 cases aged 1-3 years, 8 cases aged 3-14 years). There was no significant difference between the two groups in the number of cases, sex, age, course of disease or clinical manifestations.
1.2 Method: the treatment group received budesonide suspension nebulized by an air-compression pump (PAR 2 BOT, Bairui Co., Ltd., Germany) at a single dose of 3 mg; the control group received hydrocortisone succinate 3-4 mg/kg/d by intravenous drip; antiviral drugs, antibiotics and symptomatic treatment could be used in both groups, and clinical observation was carried out within 5 hours after treatment.
1.3 Efficacy criteria: markedly effective: the symptoms of laryngeal obstruction are relieved within 2 to 3 hours, hoarseness is obviously improved, the complexion improves, the general symptoms improve, and the child can eat and rest; effective: the symptoms of laryngeal obstruction are relieved within 2 to 3 hours, hoarseness and complexion improve, and the child can eat and rest; ineffective: the symptoms are not improved or even worsen after 2 to 3 hours, and the child cannot eat or rest.
1.4 Statistical treatment: chi-square test.
2 Results
Efficacy was evaluated in the two groups of patients: in the treatment group, 17 cases were markedly effective, 9 effective and 6 ineffective; in the control group, 20 cases were markedly effective, 8 effective and 7 ineffective; the comparison between the two groups gave p > 0.05, so the difference was not significant. The ineffective patients were further treated with air-pump inhalation of budesonide suspension, intravenous dexamethasone and other symptomatic measures.
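For illustration only, and not as part of the appended paper, the chi-square comparison described in sections 1.4 and 2 can be reproduced with a short Python sketch; the scipy library is assumed to be available, and the three efficacy counts are those reported above.

from scipy.stats import chi2_contingency

# efficacy counts reported above: markedly effective / effective / ineffective
table = [
    [17, 9, 6],   # treatment group: nebulized budesonide suspension (n = 32)
    [20, 8, 7],   # control group: intravenous hydrocortisone succinate (n = 35)
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, dof = {dof}, p = {p:.3f}")
# p > 0.05 is consistent with the paper's conclusion that the difference
# between the two groups is not statistically significant.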
3 Discussion
The common pathogens of infantile acute laryngitis are viruses; primary bacterial infection is rare, but secondary bacterial infection occurs easily. The pathological basis is acute edema of the airway mucosa, especially the laryngeal mucosa, infiltration of inflammatory cells, smooth-muscle spasm and obstruction of the airway by secretions. Because of the structure of the larynx, children are more likely than adults to suffer laryngeal obstruction, so keeping the airway clear and relieving mucosal edema and inflammation are the keys to treating infantile acute laryngitis. Glucocorticoids maintain the stability of the cell membrane, which helps maintain the integrity of the capillaries, reduces capillary permeability and prevents autolysis and death of damaged cells. Systemic administration of glucocorticoids is positively advocated by most clinicians because of their potent non-specific anti-inflammatory effect [2]. However, large systemic doses of glucocorticoids have more side effects; in order to reduce the adverse effects of the hormone as much as possible while ensuring the curative effect, the significance of local glucocorticoid application has been studied in recent years. Budesonide, a non-halogenated adrenocortical hormone, has a high glucocorticoid-receptor binding affinity; jet nebulization can deliver drug particles of suitable diameter that deposit on the airway mucosa without increasing airway resistance, act directly on the airway, take effect quickly, and have high local selectivity for airway inflammatory cells and a strong anti-inflammatory effect.
A large number of studies at home and abroad have also demonstrated the significance of budesonide suspension in improving symptoms and shortening hospitalization time in the treatment of infantile acute laryngitis [3]. In this study, 67 children with severe acute laryngitis were treated in groups, and the results showed p > 0.05 for all indexes between the budesonide group and the control group, with no significant difference. Nebulized budesonide suspension can achieve the aims of anti-inflammation and improvement of the symptoms of infantile acute laryngitis; at the same time, because of its low systemic absorption rate, the adverse effects of large doses of hormone are reduced to the greatest extent, so it is worthy of popularization and application.
References
1. Wei Nengrun (chief editor). Otorhinolaryngology. Beijing: People's Health Press, 1987: 162-164.
2. Lin Zhaofen, Jing Bingwen, et al. Airway application of dexamethasone for the prevention and treatment of acute lung injury and acute respiratory distress syndrome. Journal of Chinese Emergency Medicine, 2004, 13(5): 306-308.
3. Godden CW, Campbell MJ, Hussey M, Cogswell JJ. Double blind placebo controlled trial of nebulised budesonide for croup. Arch Dis Child 1997; 76: 155-158.

Claims (6)

1. A method for article analysis and self-demonstration is characterized in that: the method comprises the following steps:
1) Establishing and applying a target database: determining a unit symbol and a format through the occurrence frequency, and establishing a target database by automatically inducing in a hyperlink or fuzzy query mode according to the format and the unit symbol; managing a target database through a database management system; extracting the same factors through a target database to establish an interface database;
finding a target article with a similar theme to the article content to be verified in a target database according to the article content to be verified, statistically analyzing a format which is most used by the target article, and forming an input interface by adopting the format;
the unit symbol is one of words, sentences, paragraphs or modules, and the specific expression form is one or more of symbols, characters, numbers and images;
data acquisition is carried out through an input interface: in a data input interface, inputting an article to be verified according to a format according to an input interface prompt;
2) Analyzing the article:
A. Obtaining the article concentration degree: extracting key words/sentences, paragraphs, modules and the central idea of the whole article from the article to be verified, from the perspective of combining structure and function; from the angle of the maximum probability density of the unit symbol, analyzing and calculating layer by layer, taking the central idea as the function and the mean value of the data sample as the structure;
B. finding article dispersion degree: analyzing sentence composition in each paragraph in the article from the perspective of combining structure and function; from the angle of the probability distribution range of the unit symbol, analyzing and calculating layer by layer respectively by taking the background as a function and taking the variance or order of a data sample as a structure;
C. Relevance verification of the article to be verified: if the variance is uniform, determining a set formed by the unit symbols, and analyzing the correlation between the unit symbols and the whole layer by layer from the two angles of function and structure by utilizing the characteristics and similarity of the unit symbols and the set; wherein the functionality is analyzed from belonging and inclusion in terms of sufficient conditions; the structural property is analyzed for correlation by the t test; meanwhile, the required sample-size range is evaluated to determine the test reliability of the sample size; if the variance is not uniform, the rank-sum test is used;
3) Verifying layer by layer: respectively verifying the relevance of data, words, sentences, paragraphs and modules and the article to be verified and the target article;
4) Classifying the articles: determining clues of genre, sequence and property through the hyperlink and the database according to the established or existing clues of the axis of the database; checking by using a fixed structure and a sequence extracted from the target database to determine the genre and the axis clue, classifying the genre and the axis clue to form an interface database, and outputting the interface database to the output interface;
the author only needs to input his or her own content by adding, deleting and modifying according to the type and format, thereby forming the article to be verified, and the articles are automatically classified, in a certain hierarchical order, into "spirit-level articles", "articles combining spirit and practice" and "practice-level articles";
5) Outputting an analysis result: the full text of the effective article is output, together with each paragraph, each module and the central idea of the article, the article background and the theory adopted by the paper; then the genre property and its classification are output.
2. The method of claim 1, wherein the method comprises: the step 2) of analyzing the article specifically comprises the following steps:
A. finding article concentration
Functional concentration ratio:
paragraph central idea: words are marked as unit symbols; the noun n and the verb v with the highest occurrence frequency among the unit symbols of the k-th paragraph are extracted, the occurrence frequencies of synonyms and near-synonyms being merged when the frequencies are counted; the nouns n and verbs v are arranged and combined to form search elements S(n, v), and the number of times the search elements S(n, v) appear in each paragraph is counted; for the search element S_max(n, v) with the highest occurrence frequency, a sentence with a subject/predicate/object structure in which the noun n and the verb v are combined is marked as the central idea, and the paragraph in which S_max(n, v) appears most often is marked as the key paragraph; adjectives or adverbs marked by the particles "的", "地" and "得" supplement and perfect the central idea of the paragraph according to the subject/predicate/object structure of the sentence;
the module central idea: the article to be verified is divided into different modules according to the counted format; the noun n and the verb v with the largest occurrence frequency in the k-th module are extracted, the occurrence frequencies of synonyms and near-synonyms being merged in the frequency statistics; sentences containing the noun n and the verb v combined into a subject/predicate structure are marked as central sentences, and the paragraph in which central sentences occur most frequently is marked as the key paragraph; neighbouring words are added for modification, and an original sentence is found from the article and recorded as the central idea; if there are many central sentences and no original sentence can be found in the article, they are finally output to the background for the author to summarize;
the article central idea: the noun n_max and the verb v_max that occur most often among the full-text unit symbols are counted and marked as central unit symbols, and the sentence with the highest frequency of occurrence after the central unit symbols are combined is taken as the article central sentence; the module in which the article central sentence occurs most often is marked as the key module; the central sentence is modified with the added neighbouring words, and an original sentence found in the module is marked as the central idea; the central idea in the key module is the central idea of the article;
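As an illustrative sketch only, and not part of the claim, the extraction of the search element S(n, v) and of the key paragraph described above can be written in Python roughly as follows; part-of-speech tagging and the merging of synonym frequencies are assumed to be done beforehand, and the joint count used for S(n, v) is a simplification.

from collections import Counter

def central_elements(paragraphs):
    """paragraphs: list of paragraphs, each a list of (token, pos) pairs,
    where pos is 'n' for nouns and 'v' for verbs (tagging assumed upstream)."""
    nouns = Counter(t for para in paragraphs for t, pos in para if pos == "n")
    verbs = Counter(t for para in paragraphs for t, pos in para if pos == "v")
    n = nouns.most_common(1)[0][0]     # most frequent noun
    v = verbs.most_common(1)[0][0]     # most frequent verb
    # count S(n, v) per paragraph: here the joint occurrence is approximated by
    # the smaller of the two individual counts in that paragraph
    hits = [min(sum(1 for t, _ in para if t == n),
                sum(1 for t, _ in para if t == v))
            for para in paragraphs]
    key = max(range(len(paragraphs)), key=hits.__getitem__)
    return n, v, key                   # search element and key-paragraph index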
structural concentration ratio:
concentration of a paragraph: the number of occurrences m_y of each word in the paragraph, the number of unit-symbol types y and the total number of unit symbols M are counted; m_ymax is the largest number of occurrences of any word, and the average number of occurrences of a word is m̄ = M/y; the two words whose occurrence counts lie closest to this average, together with the word occurring m_ymax times, form the distribution state of the centre of the paragraph, namely the distribution range of the central idea, and this range is the concentration ratio of the paragraph; the position of m_ymax is taken as the overall mean; words whose occurrence counts are close to the average m̄ = M/y tend to occur in pairs; the same then applies by analogy: a paragraph is made up of unit symbols, and a module is likewise made up of unit symbols, the unit symbols there being paragraphs/sentences; the concentration of the module, the concentration of the article and the concentration of the database are obtained in the same way as above;
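A minimal sketch, offered as an illustration rather than part of the claim, of the quantities used above: the per-word counts m_y, the number of types y, the total M, the average M/y, the maximum count m_ymax and the words whose counts lie closest to the average.

from collections import Counter

def paragraph_concentration(words):
    counts = Counter(words)             # m_y for every word type
    y = len(counts)                     # number of unit-symbol types
    M = sum(counts.values())            # total number of unit symbols
    mean = M / y                        # average occurrences per type
    m_ymax = max(counts.values())       # largest occurrence count
    # the two words whose counts sit closest to the average; per the claim,
    # such words tend to occur in pairs and bound the central idea's range
    near_mean = sorted(counts, key=lambda w: abs(counts[w] - mean))[:2]
    return mean, m_ymax, near_mean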
B. obtaining the paper discrete degree, namely analyzing the sentence composition in each paragraph in the article:
functional dispersion: concentration is the central idea, and dispersion is the composition of the whole article; the central idea of concentration is composed of "central unit symbols"; therefore, the dispersion is all sentences formed by the unit symbols, and the sentences form a database; therefore, the potential structure of the sentence is added in the design, and the sentence is found from the article or the database for matching; deducing by applying the relation between the inclusion and the included, if the relation is correct, obtaining the 'dispersion' of the relation;
structural dispersion: the number of occurrences m_y of each unit symbol in the paragraph, the number of unit-symbol types y and the total number of unit symbols M are counted; the average number of occurrences of each kind of word is m̄ = M/y; the overall mean and the variance are obtained by the formulas x̄ = (x₁ + x₂ + … + xₙ)/n, where x₁, x₂, …, xₙ are the individual data and n is the number of data, and s² = Σ(xᵢ − x̄)²/n, the variance value being the dispersion; if the unit symbol is a word, the variance of the number of words making up the sentence is formed; if the unit symbol is a sentence, the variance of the number of sentences making up the paragraph is formed, and so on; the dispersion of the module and of the article and the dispersion of the database are obtained in the same way as above;
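By way of illustration only, the structural dispersion can be computed as the variance of the sizes of the units one level down (words per sentence, sentences per paragraph, and so on); a minimal Python sketch under that reading:

def dispersion(lengths):
    """lengths: e.g. the number of words in each sentence of a paragraph."""
    n = len(lengths)
    mean = sum(lengths) / n                                   # overall mean
    variance = sum((x - mean) ** 2 for x in lengths) / n      # dispersion
    return mean, variance

# dispersion([12, 9, 15, 11]) returns the mean sentence length and its variance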
C. Relevance verification of the article to be verified:
functionally, a central unit symbol is found from the database, the central unit symbol being either a subject word, a central idea, a key paragraph or a key module, and it forms the concentration ratio of the database; then the determined central idea and background, and the explanation of the central idea, are found out through sufficient and necessary conditions; articles under the same "background" condition are selected, the "central unit symbols" are compared, the relation between inclusion and being included is analysed by means of the database to determine whether the two are related, and a comprehensive judgement is then made in combination with the structural correlation analysis;
structurally: the number of occurrences m_y of a unit symbol, the number of kinds of unit symbols y and the total number of unit symbols M are counted; on average each type of unit symbol occurs m̄ = M/y times, and words whose occurrence counts lie near this average tend to occur in pairs; the counts of the two words closest to the average, together with the word occurring m_ymax times, form the distribution state of the centre of the paragraph, namely the distribution range of the central idea; if the variances are uniform, that is, the numbers of words per sentence are approximately the same, or the numbers of sentences per paragraph are approximately the same, and so on, the "unit symbols" that are matched most frequently are counted and combined into a "central idea" for comparison, so that it can be known whether the structures of each unit symbol, the whole sentence, the whole natural paragraph, the whole module and the whole article are related;
if the variance is not uniform, adopting rank sum test;
during the test, if the variance is uniform, α and β are both set to 80%; the sample size can be calculated from this value, and when the sample size is reached, the hypothesis test is carried out automatically and the result is obtained; the sample-size formula is
n = 2[(Z_α + Z_β)σ/δ]²
or
n = (Z_α + Z_β)²[p₁(1 − p₁) + p₂(1 − p₂)]/δ²,
wherein σ is the overall standard deviation; δ is the difference in the overall parameter, noted in the two-mean Z test as δ = μ₁ − μ₂, μ being the overall average, and the larger δ is, the more likely it is that two sample means with a larger difference will be obtained in sampling; the former is the normal-distribution formula and the latter the binomial-distribution formula; the two sets of sample sizes obtained from 68.27% and 80% are used as a boundary against which known sample sizes are compared to determine the confidence level of the correlation.
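As an illustration, and with the caveat that the exact constants of the claimed formulas are shown only as images in the original filing, the textbook two-sample forms that match the description above (σ, δ = μ₁ − μ₂, a normal form and a binomial form) can be computed as follows; scipy is assumed to be available.

from scipy.stats import norm

def sample_size_two_means(sigma, delta, alpha=0.05, power=0.80):
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return 2 * ((z_a + z_b) * sigma / delta) ** 2      # normal-distribution form

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    # binomial (two-proportion) form
    return (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2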
3. The method of claim 1, wherein the method comprises: the step 3) of layer-by-layer verification comprises the following specific steps:
according to the central idea of the article, a search is carried out in the database, and similar or identical articles are classified and extracted as target articles; if the variances are uniform, the target articles are subjected, along a certain axis clue, to the chi-square test and the t test, or to the z test and the chi-square test with continuity correction of the statistic, to determine whether the article to be verified is related to the target article; specifically, the axis clue is a certain order of the central unit symbols, or a time order, a magnitude order or a speed order; further, with the central unit symbol as the cornerstone and the database as the basis, the database is checked, a t test is performed between S(n, v) of the article to be verified and S1(n, v) of the target article, and the similarity between the article to be verified and the target article is analysed;
if the variance is not uniform, the data are sorted and the rank-sum test is applied; structurally, whether each unit symbol is related to the structure of the whole sentence, the whole natural paragraph, the whole module and the whole article can be known through the rank-sum test; if they are related, the articles concern the same subject and the structure is reasonable; further, if the structure is related and the function is a sufficient condition, the article to be verified is indicated to be an invalid article; if only one of functional relevance (sufficient or necessary) and structural relevance holds, rather than all of them, the article is valid; if both the structure and the function differ, the article belongs to an invalid article in a different field.
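A minimal sketch (not part of the claim) of the test selection described above: a t test when the variances are homogeneous, a rank-sum (Mann-Whitney) test otherwise; Levene's test is used here for the homogeneity check, which the claim does not specify, and scipy is assumed to be available.

from scipy.stats import levene, mannwhitneyu, ttest_ind

def related(sample_a, sample_b, alpha=0.05):
    _, p_var = levene(sample_a, sample_b)
    if p_var > alpha:                                    # variances homogeneous
        _, p = ttest_ind(sample_a, sample_b, equal_var=True)
    else:                                                # variances not homogeneous
        _, p = mannwhitneyu(sample_a, sample_b)
    # following the claim's reading: no significant difference means the
    # article to be verified and the target article concern the same subject
    return p > alpha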
4. The method of claim 1, wherein the method comprises: the step 4) of article classification comprises the following specific steps:
after automatic analysis, according to the spirit level of "mythology, fairy tale and science fiction", as long as the idea is good and the literary quality is good, the article is output directly without the need for logical argumentation; the analysis result is a "spirit-level" article, and after output a submission website is extracted and the article is automatically submitted to the corresponding website;
further, according to the principle that "literature is derived from common knowledge but sublimates into spirit, while philosophy is derived from spirit and reverts to common knowledge", and combining the established "mythology, fairy tale and science fiction level database" and "common knowledge database", the categories of literature and philosophy are determined by comparing the cause-and-effect order against the database; the analysis result is an article at the level of "from life serving the spirit" or "from the spirit serving life", and after the article is output, a submission website is extracted for automatic submission; the scientific category is based on the principle that it is derived from common knowledge and can be demonstrated and applied to common knowledge without limit; the output interface outputs the effective articles according to the format, and synchronously outputs the target articles with the same subject as the effective articles.
5. The method of claim 1, wherein the method comprises: in the step 4), classification is performed on articles by applying matrix calculation to the discussion papers, and the method specifically comprises the following steps:
firstly, a sample is decomposed into units and the corresponding data are formed to construct a matrix, taking the structure as the model and the common knowledge database as the basis; the comparison paper and the paper to be verified are checked for philosophy/literature, scientificity and irrelevance;
the method comprises the following steps:
a. establishing a common knowledge base, wherein the common knowledge base refers to common knowledge that has already been demonstrated by modern natural science and is marked as G_ID, taking the teaching materials of "mathematics, physics and chemistry" as the basis; meanwhile, a database of the "mythology, fairy tale and science fiction" levels is established;
b. comparing the jth sentence G (i, j) in the ith section of the paper to be verified with the related common sense G _ ID one by one to form a matrix K, and marking as negative correlation and marking as K (i, j) = -1 when G (i, j) is compared with the G _ ID and is opposite to the corresponding common sense; when G (i, j) is compared with G _ ID, no corresponding common sense support is found, and it is marked as irrelevant, and it is marked as k (i, j) =0; when G (i, j) is compared with G _ ID, finding the corresponding common sense support, then marking as positive correlation, and marking as k (i, j) =1;
matrix K = [ K (1,1) … … K (i, j) … … K (l, m) ];
c. comparing the jth statement G1 (i, j) in the ith section of the target paper with the related common knowledge G _ ID one by one to obtain a matrix K1, wherein the matrix K1= [ K1 (1,1) … … K1 (i, j) … … K1 (l, m) ];
d. the matrix K and the matrix K1 are multiplied element by element and summed, giving the value Σ k(i, j) × k1(i, j); when the sum is a positive value, the two are positively correlated; if the value is negative, the two are negatively correlated; when the value is 0, the two are not related;
e. after the positive correlation, negative correlation or irrelevance has been determined in d, a value is obtained by summing Σ k(i, j) + k1(i, j); when the signs of the two are the same and the positive value obtained is larger than the value of the matrix K, the two are positively correlated and the article is scientific; when the signs of the two are the same and the positive value obtained is equal to the value of the matrix K, the article to be verified is irrelevant and contains a large amount of irrelevant content, indicating a philosophical/literary tendency, which is defined as "imagination" and expresses its practical value; when the signs of the two are opposite and a positive value is obtained, the article has a certain philosophical/literary guiding significance; if the value is negative or zero, the two are negatively correlated, which is "pseudo-science" and belongs to the mental level; then a relatively valuable paper is selected, that is, after the sign and the positive or negative correlation between the article to be examined and the comparison article have been obtained, the possibility that it is a relatively scientific article or a philosophical/literary article is judged;
f. the j-th sentence G(i, j) in the i-th section of the paper to be verified is compared with the common sense G_ID one by one; according to the scientific basis and the simplification principle applicable to "axioms", if fewer than 3 common-sense supports are found after G(i, j) and G_ID are compared, it is judged to be a scientific article;
g. similarly, the methods of a, b, c, d and e can also be used to judge articles at the "mythology, fairy tale and science fiction level", so as to determine whether a philosophical/literary article is an "article combining the spirit level and practice"; specifically, a philosophical article or a scientific article is determined according to the cause-and-effect relationship of whether the "spirit level" or the "common general knowledge" appears first in the axis clues of the article, wherein a scientific article is an "article at the practice level".
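For illustration only, the element-wise comparison of the matrices K and K1 in steps b to d can be sketched in Python as follows; the sentence-scoring function against the common-knowledge base G_ID is assumed to exist upstream and is hypothetical here.

def correlation_sign(paper_sentences, target_sentences, score):
    """score(sentence) returns -1, 0 or +1 against the common-knowledge base G_ID."""
    K  = [score(s) for s in paper_sentences]    # k(i, j) for the paper to be verified
    K1 = [score(s) for s in target_sentences]   # k1(i, j) for the target paper
    m = min(len(K), len(K1))                    # compare aligned sentences only
    product_sum = sum(K[i] * K1[i] for i in range(m))   # sigma k(i,j) * k1(i,j)
    if product_sum > 0:
        return "positively correlated"          # scientific tendency
    if product_sum < 0:
        return "negatively correlated"          # 'pseudo-scientific', mental level
    return "unrelated"                          # philosophical / literary tendency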
6. The method of claim 1, wherein: in said step 3), the analysis result is output after the paper to be verified has overcome the multiple-correlation problem; specifically, according to the statistical analysis result, scenario application is carried out to eliminate the one-sidedness caused by the crossing of multiple correlations, and a linear evaluation is further performed by means of a multiple-correlation formula.
CN201910382217.6A 2019-05-09 2019-05-09 Article analysis and self-demonstration method Active CN110096710B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910382217.6A CN110096710B (en) 2019-05-09 2019-05-09 Article analysis and self-demonstration method


Publications (2)

Publication Number Publication Date
CN110096710A CN110096710A (en) 2019-08-06
CN110096710B true CN110096710B (en) 2022-12-30

Family

ID=67447471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910382217.6A Active CN110096710B (en) 2019-05-09 2019-05-09 Article analysis and self-demonstration method

Country Status (1)

Country Link
CN (1) CN110096710B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114861641B (en) * 2022-07-05 2022-09-20 北京拓普丰联信息科技股份有限公司 Data extraction method and device, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2436740A1 (en) * 2001-01-23 2002-08-01 Educational Testing Service Methods for automated essay analysis
CN103440232A (en) * 2013-09-10 2013-12-11 青岛大学 Scientific paper standardization automatic detecting and editing method
CN103605729A (en) * 2013-11-19 2014-02-26 段炼 POI (point of interest) Chinese text categorizing method based on local random word density model
CN106663087A (en) * 2014-10-01 2017-05-10 株式会社日立制作所 Text generation system
CN107967257A (en) * 2017-11-20 2018-04-27 哈尔滨工业大学 A kind of tandem type composition generation method
CN108491429A (en) * 2018-02-09 2018-09-04 湖北工业大学 A kind of feature selection approach based on document frequency and word frequency statistics between class in class
CN108549625A (en) * 2018-02-28 2018-09-18 首都师范大学 A kind of Chinese chapter Behaviour theme analysis method based on syntax object cluster
CN108595407A (en) * 2018-03-06 2018-09-28 首都师范大学 Evaluation method based on the argumentative writing structure of an article and device
CN109376238A (en) * 2018-09-14 2019-02-22 大连理工大学 A kind of paper degree of correlation quantization method based on bibliography list degree of overlapping


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Application of genre analysis: a structural analysis of academic articles in applied linguistics; Yang Ruiying; Foreign Languages and Their Teaching; 2006-10-10 (No. 10); pp. 29-34 *

Also Published As

Publication number Publication date
CN110096710A (en) 2019-08-06

Similar Documents

Publication Publication Date Title
US10713440B2 (en) Processing text with domain-specific spreading activation methods
Abacha et al. MEANS: A medical question-answering system combining NLP techniques and semantic Web technologies
Foufi et al. Mining of textual health information from Reddit: Analysis of chronic diseases with extracted entities and their relations
Deléger et al. Detecting negation of medical problems in French clinical notes
Wang et al. Information needs mining of COVID-19 in Chinese online health communities
Wang et al. Sideeffectptm: An unsupervised topic model to mine adverse drug reactions from health forums
Yin et al. Question answering system based on knowledge graph in traditional Chinese medicine diagnosis and treatment of viral hepatitis B
CN110096710B (en) Article analysis and self-demonstration method
Liu et al. Extracting patient demographics and personal medical information from online health forums
Zhou et al. Context-sensitive spelling correction of consumer-generated content on health care
Su et al. Developing a cardiovascular disease risk factor annotated corpus of Chinese electronic medical records
Chirila et al. Named Entity Recognition and Classification for Medical Prospectuses.
Caliskan et al. First steps to evaluate an NLP tool’s medication extraction accuracy from discharge letters
Cho et al. Identifying medications that patients stopped taking in online health forums
Allen A local grammar of cause and effect: A corpus-driven study
Han et al. Weibo users perception of the COVID-19 pandemic on Chinese social networking service (Weibo): sentiment analysis and fuzzy-c-means model
Wu et al. Evaluation of the nursing intervention classification for use by flight nurses
Zhang et al. Natural language processing in medicine
Drosatos et al. DUTH at TREC 2015 Clinical Decision Support Track.
Desai et al. Su1538 clinical efficacy and outcomes of ercp for the management of bile duct leaks: a nationwide cohort study
Goodwin et al. Cohort Sherpherd II: Verifying Cohort Constraints from Hospital Visits.
Kokkinakis Medical Event Extraction using Frame Semantics-Challenges and Opportunities.
Hina et al. SnoMedTagger: A Semantic Tagger for Medical Narratives.
Kokkinakis Initial experiments of medication event extraction using frame semantics
Mesrahi et al. The effect of cognitive-behavioral group therapy on decrease in addiction relapse in randomly assigned addicts under drug therapy: A statistical analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant