WO2020191993A1 - Method for syntactic parsing of natural language - Google Patents

Method for syntactic parsing of natural language

Info

Publication number
WO2020191993A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
predicate
unit
infinitive
gerund
Prior art date
Application number
PCT/CN2019/100638
Other languages
French (fr)
Chinese (zh)
Inventor
秦一男
朱江
Original Assignee
北京语自成科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京语自成科技有限公司
Publication of WO2020191993A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • The invention relates to the field of computer data processing, and in particular to a method for syntactic parsing of natural language.
  • Natural language processing is a very important direction in the field of computer science and artificial intelligence. It studies various theories and methods that can realize effective communication between humans and computers using natural language.
  • Syntactic analysis is one of the key tasks in natural language processing (NLP).
  • The basic task of syntactic analysis is to determine the syntactic structure of a sentence or the dependency relations between the words within a sentence.
  • Probabilistic context-free grammar (the PCFG method for short) is a technique widely used in the field of computer science.
  • The PCFG method calculates the probability of each candidate analysis from the syntactic rules it uses and selects the syntactic analysis result with the highest probability as the final syntactic structure.
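The selection step of the PCFG method can be illustrated with a minimal Python sketch. The rule set and every probability value in `RULE_PROB` are hypothetical and chosen only for illustration; they are not taken from any real parser or corpus. The sketch scores a derivation as the product of its rule probabilities and keeps the highest-scoring candidate.

```python
from math import prod

# Hypothetical rule probabilities, chosen only for illustration.
RULE_PROB = {
    ("S", ("NP", "VP")): 0.9,    # noun phrase as subject
    ("S", ("SBAR", "VP")): 0.1,  # that-clause as subject
    ("NP", ("DT", "NNS")): 0.6,
    ("SBAR", ("IN", "S")): 1.0,
}

def derivation_probability(rules):
    """Probability of a derivation: the product of its rule probabilities."""
    return prod(RULE_PROB[rule] for rule in rules)

def best_parse(candidates):
    """Return the name of the candidate derivation with the highest probability."""
    return max(candidates, key=lambda name: derivation_probability(candidates[name]))

# Two toy candidate derivations, each listed as the rules it uses.
candidates = {
    "noun_subject": [("S", ("NP", "VP")), ("NP", ("DT", "NNS"))],
    "clause_subject": [("S", ("SBAR", "VP")), ("SBAR", ("IN", "S"))],
}
print(best_parse(candidates))  # prints noun_subject
```

With these assumed numbers the noun-subject reading scores 0.54 and the clause-subject reading scores 0.1, so a rare but grammatical structure loses whenever a more frequent one is available.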
  • Dependency parsing is also a syntactic analysis technique often used in the field of computer science.
  • Berkeley Parser and Stanford Parser are two internationally leading natural language syntactic parsers recognized by the computer science community today. Both use the lexicalized PCFG method (lexicalized probabilistic context-free grammars). In addition to the results it produces with the lexicalized PCFG method, Stanford Parser also gives syntactic analysis results produced with the dependency analysis method.
  • The erroneous syntactic analysis results of Stanford Parser mentioned below include both the results it produces with the lexicalized PCFG method and the results it produces with the dependency analysis method; that is, both methods give wrong results.
  • 1 is the main clause, which is the core clause of the whole sentence; 3 is the object of 1, that is, an object clause; 2 is the attributive clause, which modifies men; That is a determiner, which modifies men.
  • That modifying men is wrong, because That as a determiner cannot modify a plural noun; the liberals wasn't remarked is also wrong, because the subject and the predicate do not agree in number.
  • The correct result for this sentence should be: wasn't remarked upon by the press is the core clause of the whole sentence, that is, its core subject-predicate collocation; That men didn't bother the liberals is the subject of the core clause, that is, a subject clause; who were appointed is the attributive clause, which modifies men. That in this sentence should be parsed as the subordinating conjunction that introduces the subject clause. In English, unless the subject clause is enclosed in quotation marks, the subordinating conjunction that which introduces a subject clause cannot be omitted, even in spoken language.
  • 1 is the core clause of the whole sentence, that is, its core subject-predicate collocation; 2 is the attributive clause, which modifies the indefinite pronoun something; That is a determiner, which modifies something. In this result, That modifying something is wrong.
  • The indefinite pronoun something cannot be modified by any determiner, and so cannot be modified by the determiner that. In addition, learned and is wrong cannot be grouped under the same verb phrase: they are two different predicates that belong to two different clauses.
  • The correct result for this sentence should be: is known to the public is the core clause of the whole sentence, that is, its core subject-predicate collocation; That something is wrong is the subject of the core clause, that is, a subject clause; That is the subordinating conjunction that introduces the subject clause; you learned is the attributive clause, which modifies something.
  • Each of the two sentences has a subject clause introduced by the subordinating conjunction that, and each has an attributive clause that can be regarded as inserted as a whole into that subject clause.
  • English sentences with the above syntactic structure are often parsed with serious errors by Berkeley Parser and Stanford Parser.
  • The inventors of this patent application give the following mathematical model, which is denoted as the Q model.
  • The aforementioned two sentences conform to the Q model.
  • The specific meaning of the Q model is explained in the subsequent example operations.
  • S is an English sentence, and there are at least the following three subject-predicate collocations in S (represented by 6-element functions):
  • The inventors of this patent application used a syntactic parser developed in China for comparison with Berkeley Parser and Stanford Parser.
  • This parser uses the lexicalized PCFG method, follows the same technical principles as Berkeley Parser and Stanford Parser, and produces very similar parsing results.
  • The lexical analysis result of this example sentence is constrained to That/IN men/NNS who/WP were/VBD appointed/VBN did/VBD n't/RB bother/VB the/DT liberals/NNS was/VBD n't/RB remarked/VBN upon/IN by/IN the/DT press/NN ./. This is also a lexical analysis result that can be considered correct in English linguistics.
  • The lexical analysis result of this example sentence is constrained to That/IN something/NN you/PRP learned/VBD is/VBZ wrong/JJ is/VBZ known/VBN to/TO the/DT public/NN ./.
  • This is a lexical analysis result that can be considered correct in English linguistics. The parser is required to provide the 1000 syntactic analysis results with the highest probability, sorted in descending order of probability.
  • The highest-ranked syntactic analysis result that can be considered correct in English linguistics is ranked 52nd; all results ranked before it are incorrect.
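The two pieces of bookkeeping used in this experiment, reading a `word/TAG` string and finding the rank of the first acceptable result in an n-best list, can be sketched as follows. The function names and the `is_correct` callback are hypothetical; the source does not describe how correctness was judged beyond "can be considered correct in English linguistics".

```python
def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the last slash so that the token './.' yields ('.', '.').
        word, tag = token.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs

def first_correct_rank(nbest, is_correct):
    """Return the 1-based rank of the first acceptable parse, or None if there is none."""
    for rank, parse in enumerate(nbest, start=1):
        if is_correct(parse):
            return rank
    return None

tagged = parse_tagged(
    "That/IN something/NN you/PRP learned/VBD is/VBZ wrong/JJ "
    "is/VBZ known/VBN to/TO the/DT public/NN ./."
)
```

Under this reading, the reported result corresponds to `first_correct_rank` returning 52 for a 1000-best list.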
  • If a characteristic corpus is constructed and each sentence in it is analyzed with a syntactic parser based on the PCFG method (including the lexicalized PCFG method), for example Berkeley Parser or Stanford Parser, the recall rate will be very low.
  • The first sentence above was given by the linguist David R. Dowty in a linguistic monograph written by him; the second sentence was extracted from an English poem by the linguist.
  • The above 12 sentences all omit clause-introducing words. In English, clause-introducing words cannot be omitted arbitrarily; their omission must meet grammatical requirements. The omissions in the above 12 sentences all meet the requirements of English grammar.
  • The syntactic analysis results given by Berkeley Parser and Stanford Parser for the above 12 sentences are still wrong.
  • The inventors of this patent application believe that, after the first type of technical vulnerability, the second type is likely to be another technical blind spot of Berkeley Parser and Stanford Parser. It is also another theoretical and technical bottleneck of the current PCFG method (including the lexicalized PCFG method), which is difficult to break through completely within the existing theoretical and technical framework. Due to space limitations, this is not elaborated further here.
  • The probability that a subject clause introduced by that acts as the subject of a sentence is usually much lower than the probability that a noun does; but from the perspective of natural language, a subject clause introduced by that can act as the subject of a sentence, which is a basic syntactic function that follows from the definition of English itself. Therefore, between a subject clause introduced by that acting as the subject and a noun acting as the subject, the probability difference in linguistic theory is much smaller than the probability difference reflected in the English corpus.
  • The PCFG method (including the lexicalized PCFG method) has no adequate countermeasure for this.
  • Lexical analysis, syntactic analysis, and semantic analysis refer to and constrain one another.
  • Lexical analysis, syntactic analysis, and semantic analysis are usually carried out independently of each other, and lexical analysis is done without relying on syntactic analysis.
  • This arrangement is mainly motivated by computational complexity and model complexity in natural language processing engineering.
  • This arrangement is likely to seriously affect the accuracy of the syntactic analysis results: if the computer makes a misjudgment during lexical analysis, that misjudgment cannot be corrected in the syntactic analysis that follows.
  • constraints which have a negative impact on the accuracy of the syntactic analysis results.
  • The purpose of the present invention is to provide a natural language syntactic analysis method, including:
  • For each word list (i), read the preprocessed data structure of the sentence to be parsed: if there is a predicate verb unit in the sentence to be parsed, generate a word list (ii); if there is no predicate verb unit in the sentence, analyze the sentence by the method of probability combined with syntactic rules or by the dependency analysis method, take the result of that analysis as the final analysis result of the computer, then clear the corresponding word list (i) and do not generate a word list (ii);
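The branching in this step can be sketched in Python as follows. This is only an illustration under stated assumptions: the set `FINITE_VERB_TAGS` (finite-verb Penn Treebank tags standing in for "predicate verb unit"), the tuple layout of the word lists, and the names `dispatch` and `fallback_parser` are all hypothetical and are not defined in the source.

```python
# Assumption: finite-verb Penn Treebank tags stand in for "predicate verb unit".
FINITE_VERB_TAGS = {"VBD", "VBP", "VBZ", "MD"}

def has_predicate_verb_unit(word_list_i):
    """word_list_i is a list of (word, tag) pairs."""
    return any(tag in FINITE_VERB_TAGS for _, tag in word_list_i)

def build_word_list_ii(word_list_i):
    """Number the entries in sentence order: (number, word, tag)."""
    return [(n, word, tag) for n, (word, tag) in enumerate(word_list_i, start=1)]

def dispatch(word_list_i, fallback_parser):
    """Generate word list (ii) if a predicate verb unit exists, else fall back."""
    if has_predicate_verb_unit(word_list_i):
        return {"word_list_ii": build_word_list_ii(word_list_i), "final_result": None}
    # No predicate verb unit: the fallback result is final and word list (i) is cleared.
    result = fallback_parser(word_list_i)
    word_list_i.clear()
    return {"word_list_ii": None, "final_result": result}
```

Here `fallback_parser` is a placeholder for whichever rule-plus-probability or dependency parser is used for sentences without a predicate verb unit.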
  • The predicate vector includes a coordinate introductory element, a subordinate introductory element, a subject element, a predicate element, a first-position object element, and a second-position object element;
  • The predicate element is the corresponding predicate verb unit, or the corresponding adjacent predicate verb combination unit;
  • The predicate element number is the corresponding predicate verb unit number, or the corresponding adjacent predicate verb combination unit number;
  • The possible value of the coordinate introductory element is one of the coordinate related word units used to connect sentences whose number is less than the corresponding predicate element number, or an empty unit; a coordinate related word unit that is not used to connect sentences cannot be a possible value of the coordinate introductory element;
  • The possible value of the subordinate introductory element is one of the subordinate related word units whose number is less than the corresponding predicate element number, or one of the adjacent parallel subordinate related word combination units whose number is less than the corresponding predicate element number, or one of the interrogative units whose number is less than the corresponding predicate element number, or one of the adjacent parallel interrogative combination units whose number is less than the corresponding predicate element number, or an empty unit;
  • The possible value of the subject element is one of the basic noun units whose number is less than the corresponding predicate element number, or one of the adjacent parallel basic noun combination units whose number is less than the corresponding predicate element number, or one of the infinitive vectors corresponding to an infinitive element whose number is less than the corresponding predicate element number, or one of the gerund-present participle vectors corresponding to a gerund-present participle element whose number is less than the corresponding predicate element number, or one of the predicate vectors corresponding to a predicate element whose number is less than the corresponding predicate element number, or an empty unit;
  • The possible value of the first-position object element is one of the basic noun units whose number is greater than the corresponding predicate element number and less than the number of the first predicate element that appears after the predicate element, or one of the adjacent parallel basic noun combination units whose number is greater than the corresponding predicate element number and less than the number of the first predicate element that appears after the predicate element, or one of the … whose number is greater than the corresponding predicate element number and less than the number of the first predicate element that appears after the predicate element …
  • the corresponding predicate element is a unit composed of a verb that can accept a double object or a verb that can accept an object combined with an object complement
  • the corresponding first-position object element is a basic noun unit or an adjacent parallel basic noun combination unit
  • The possible value of the second-position object element is one of the basic noun units whose number is greater than the number of the corresponding first-position object element and less than the number of the first predicate element that appears after the predicate element, or one of the adjacent parallel basic noun combination units whose number is greater than the number of the corresponding first-position object element and less than the number of the first predicate element that appears after the predicate element, or one of the predicate vectors corresponding to a predicate element whose number is greater than the corresponding predicate element number, or an empty unit; if the corresponding predicate element is a unit composed of a verb that can accept a double object or a verb that can accept an object combined with an object complement …
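The number comparisons above can be made concrete with a short Python sketch. It is a simplification under stated assumptions: only basic noun units are enumerated (combination units and nested vectors are left out), `None` stands in for the empty unit, treating the empty unit as a possible first-position object is an assumption because that part of the text is truncated, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

EMPTY = None  # stands in for the "empty unit"

@dataclass(frozen=True)
class Unit:
    number: int  # position number in word list (ii)
    kind: str    # simplified label, e.g. "noun" or "predicate"
    text: str

@dataclass
class PredicateVector:
    """The six elements of a predicate vector."""
    coordinate_intro: Optional[Unit]
    subordinate_intro: Optional[Unit]
    subject: Optional[Unit]
    predicate: Unit
    first_object: Optional[Unit]
    second_object: Optional[Unit]

def subject_candidates(units, predicate):
    """Basic noun units numbered below the predicate, plus the empty unit."""
    nouns = [u for u in units if u.kind == "noun" and u.number < predicate.number]
    return nouns + [EMPTY]

def first_object_candidates(units, predicate):
    """Basic noun units between this predicate and the next predicate element."""
    later = [u.number for u in units
             if u.kind == "predicate" and u.number > predicate.number]
    upper = min(later) if later else float("inf")
    nouns = [u for u in units
             if u.kind == "noun" and predicate.number < u.number < upper]
    return nouns + [EMPTY]  # assumption: the empty unit is also allowed here

units = [
    Unit(1, "conjunction", "That"), Unit(2, "noun", "something"),
    Unit(3, "noun", "you"), Unit(4, "predicate", "learned"),
    Unit(5, "predicate", "is"), Unit(6, "adjective", "wrong"),
]
```

For the fragment "That something you learned is wrong", both something and you are subject candidates for learned, and learned has no basic noun unit available as a first-position object because the next predicate element follows immediately.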
  • The infinitive vector includes an infinitive element, an infinitive first-position object element, and an infinitive second-position object element;
  • The infinitive element is the corresponding infinitive verb unit, or the corresponding adjacent parallel infinitive verb combination unit;
  • The infinitive element number is the corresponding infinitive verb unit number, or the corresponding adjacent parallel infinitive verb combination unit number;
  • The possible value of the infinitive first-position object element is one of the basic noun units whose number is greater than the corresponding infinitive element number and less than the number of the first predicate element that appears after the infinitive element, or one of the adjacent parallel basic noun combination units whose number is greater than the corresponding infinitive element number and less than the number of the first predicate element that appears after the infinitive element, or one of the infinitive vectors corresponding to an infinitive element whose number is greater than the corresponding infinitive element number and less than the number of the first predicate element that appears after the infinitive element, or one of the gerund-present participle vectors corresponding to a gerund-present participle element whose number is greater than the corresponding infinitive element number and less than the number of the first predicate element that appears after the infinitive element, or one of the predicate vectors corresponding to the predicate element with …
  • The possible value of the infinitive second-position object element is one of the basic noun units whose number is greater than the number of the corresponding infinitive first-position object element and less than the number of the first predicate element that appears after the infinitive element …
  • the corresponding infinitive first-position object element is a basic noun unit or an adjacent parallel basic noun combination unit
  • The gerund-present participle vector includes a gerund-present participle element, a gerund-present participle first-position object element, and a gerund-present participle second-position object element;
  • The gerund-present participle element is the corresponding gerund-present participle unit, or the corresponding adjacent parallel gerund-present participle combination unit;
  • The gerund-present participle element number is the corresponding gerund-present participle unit number, or the corresponding adjacent parallel gerund-present participle combination unit number;
  • The possible value of the gerund-present participle first-position object element is one of the basic noun units whose number is greater than the corresponding gerund-present participle element number and less than the number of the first predicate element that appears after the gerund-present participle element …
  • The possible value of the gerund-present participle second-position object element is one of the basic noun units whose number is greater than the number of the corresponding gerund-present participle first-position object element and less than the number of the first predicate element that appears after the gerund-present participle element, or one of the adjacent parallel basic noun combination units whose number is greater than the number of the corresponding gerund-present participle first-position object element and less than the number of the first predicate element that appears after the gerund-present participle element, or one of the predicate vectors corresponding to the predicate …
  • The past participle vector includes a past participle element and a past participle object element;
  • The past participle element is the corresponding past participle unit, or the corresponding adjacent parallel past participle combination unit; the past participle element number is the corresponding past participle unit number, or the corresponding adjacent parallel past participle combination unit number;
  • The possible value of the past participle object element is one of the basic noun units whose number is greater than the corresponding past participle element number and less than the number of the first predicate element that appears after the past participle element, or one of the … whose number is greater than the corresponding past participle element number and less than the number of the first predicate element that appears after the past participle element …
  • If the corresponding past participle element is a unit composed of a verb that can accept neither a double object nor an object combined with an object complement, then the value of the past participle object element is the empty unit;
  • The verbs that can accept a double object or an object combined with an object complement, and the verbs that can accept neither, can be compiled in advance by consulting a dictionary or by statistics; defining these two classes of verbs helps to reduce the computational complexity;
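The pruning effect of the precompiled verb classes can be sketched as below. The lexicon passed in is a hypothetical stand-in for the dictionary-derived or statistics-derived list; the function name, and the choice to keep the empty unit as a candidate for verbs that are in the lexicon, are assumptions made for illustration.

```python
def past_participle_object_candidates(verb, noun_candidates, double_object_verbs):
    """Return the possible values of the past participle object element.

    double_object_verbs: verbs that can accept a double object or an object
    combined with an object complement (a hypothetical, precompiled lexicon).
    None stands in for the empty unit.
    """
    if verb not in double_object_verbs:
        # Such a past participle takes no object: only the empty unit remains.
        return [None]
    return list(noun_candidates) + [None]
```

With a toy lexicon `{"given"}`, the call for `"appointed"` returns only the empty unit, so no object candidates need to be enumerated for that past participle.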
  • The preposition vector includes a preposition element and a preposition object element;
  • The preposition element is the corresponding preposition unit, or the corresponding adjacent parallel preposition combination unit;
  • The preposition element number is the corresponding preposition unit number, or the corresponding adjacent parallel preposition combination unit number;
  • The possible value of the preposition object element is the first basic noun unit that appears after the preposition element with a number greater than the corresponding preposition element number, or the first adjacent parallel basic noun combination unit that appears after the preposition element with a number greater than the corresponding preposition element number, or the first gerund-present participle vector that appears after the preposition element with a number greater than the corresponding preposition element number, or the first infinitive vector that appears after the preposition element with a number greater than the corresponding preposition element number, or the preposition vector corresponding to the preposition element whose number is greater than the corresponding preposition element number and immediately follows it in number sequence, or the preposition vector that is greater than the corresponding preposition element number …
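The "first unit after the preposition" rule can be sketched in Python for the basic noun unit case only. The `(number, kind)` tuple layout and the function name are hypothetical, and the other possible values listed above (combination units and nested vectors) are not modeled.

```python
def preposition_object(units, preposition_number):
    """Return the first basic noun unit numbered after the preposition, or None.

    units is a list of (number, kind) pairs; only the kind "noun" is considered.
    """
    later_nouns = [u for u in units if u[1] == "noun" and u[0] > preposition_number]
    return min(later_nouns, key=lambda u: u[0]) if later_nouns else None

units = [(1, "preposition"), (2, "determiner"), (3, "noun"),
         (4, "preposition"), (5, "noun")]
```

Unlike the subject and object elements, which range over a set of candidates, this rule yields a single value per kind of unit, which keeps the number of preposition vector alternatives small.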
  • The infinitive vector, the gerund-present participle vector, the past participle vector, and the preposition vector are collectively referred to as auxiliary vectors; for each auxiliary vector in the sentence to be parsed, one of its possible values is selected, which yields one possible value for every auxiliary vector; this group of selected values, one per auxiliary vector, is regarded as a set and is called an auxiliary system;
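Because an auxiliary system picks one possible value per auxiliary vector, the set of all auxiliary systems is a Cartesian product. A minimal Python sketch, assuming the possible values have already been computed and are stored in a dictionary keyed by hypothetical vector names:

```python
from itertools import product

def auxiliary_systems(possible_values):
    """Enumerate every auxiliary system.

    possible_values maps each auxiliary vector name to the list of its
    possible values; each result picks exactly one value per vector.
    """
    names = list(possible_values)
    return [dict(zip(names, combo))
            for combo in product(*(possible_values[name] for name in names))]

systems = auxiliary_systems({"infinitive_1": ["a", "b"],
                             "preposition_1": ["x", "y", "z"]})
```

Two values for one vector and three for another give 2 × 3 = 6 auxiliary systems, which is the multiplication principle applied to this selection.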
  • A normative backbone system, a normative auxiliary system, and a residual noun system that correspond to each other in the manner described in S7.1 constitute an A-B-C joint system;
  • In one overall insertion operation, each slot can receive at most one vector, or no vector at all, in which case no insertion takes place;
  • A vector that constructs slots and receives other vectors into them is recorded as the receiving vector;
  • A vector that is inserted into a slot of another vector is recorded as the inserted vector;
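The one-vector-per-slot constraint can be sketched as a small Python class. The class and exception names are hypothetical, and modeling one slot per element of the receiving vector is an assumption made for illustration.

```python
class SlotOccupied(Exception):
    """Raised when a slot would receive a second vector."""

class ReceivingVector:
    def __init__(self, name, elements):
        self.name = name
        # Assumption: one slot per element; None means the slot is empty.
        self.slots = {element: None for element in elements}

    def insert(self, element, inserted_vector):
        """Insert a vector into the slot of the given element."""
        if self.slots[element] is not None:
            raise SlotOccupied(f"slot of {element!r} is already filled")
        self.slots[element] = inserted_vector

core = ReceivingVector("core", ["subject", "predicate"])
core.insert("subject", "attributive_clause")
```

A second call to `core.insert("subject", ...)` raises `SlotOccupied`, while the slot of `"predicate"` stays empty, which corresponds to a slot that receives no vector.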
  • The vectors in the A-B-C joint system before the equivalent substitution are called type I vectors, and the vectors in the A-B-C joint system after the equivalent substitution are called type II vectors; a given type I vector and a given type II vector can be the same vector, that is, a vector may be unchanged by the equivalent substitution;
  • The generated vector is denoted as [ ⁇ ] i + ⁇ ; the vectors obtained through the overall insertion operation in the A-B-C joint system are collectively referred to as type III vectors; the sequence values used to label the overall insertion in each round are valid only within that round of the overall insertion process;
  • The overall insertion operation given in S8.3 is executed repeatedly in the following way: the newly generated vector obtained from the previous round of the overall insertion operation is taken as the receiving vector of the new round, and any type II vector that has not been used in any previous step is taken as the inserted vector of the new round; the overall insertion operation is repeated until all type II vectors have been inserted, which is recorded as the exhaustion of the inserted vectors. The type III vector obtained when the inserted vectors are exhausted is recorded as the combined vector; S8.3 contains two overall insertion methods, and the previous and subsequent steps should be consistent;
  • Arrange the type II vectors used in each round of the overall insertion operation in order until all the inserted vectors are exhausted, which forms an insertion scheme corresponding to the A-B-C joint system; the operations from S8.2 to S8.4 are repeated to exhaust the slots corresponding to each element of the receiving vector in every round of the insertion scheme, that is, to exhaust every combined vector that the insertion scheme involves;
  • S8.7 Use the multiplication principle of combinatorics to exhaust all A-B-C joint systems corresponding to each word list (ii); further, by permuting all type II vectors in each A-B-C joint system, exhaust all insertion schemes corresponding to each A-B-C joint system; further, repeat the operations from S8.2 to S8.6 for each insertion scheme until all combined vectors corresponding to each insertion scheme are exhausted;
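The enumeration of insertion schemes and the round-by-round insertion can be sketched in Python as follows. This is a simplified illustration: `insert` is a hypothetical callable standing in for one overall insertion operation, the first vector of a scheme is assumed to act as the first receiving vector, and the choice among several slots within a round is not modeled.

```python
from itertools import permutations

def insertion_schemes(type_ii_vectors):
    """Every ordering of the type II vectors is one insertion scheme."""
    return list(permutations(type_ii_vectors))

def run_scheme(scheme, insert):
    """Apply the insertion rounds of one scheme and return the combined vector.

    The result of each round becomes the receiving vector of the next round,
    and the next unused type II vector becomes the inserted vector.
    """
    current = scheme[0]
    for inserted in scheme[1:]:
        current = insert(current, inserted)
    return current

schemes = insertion_schemes(["A", "B", "C"])
combined = run_scheme(("A", "B", "C"), lambda receiver, inserted: f"{receiver}[{inserted}]")
```

Three type II vectors give 3! = 6 schemes, so the search space grows factorially with the number of vectors; this is why the pruning checks described in the following steps matter in practice.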
  • Syntactic rule check: use the syntactic rules of natural language, with the method of probability combined with syntactic rules or the dependency analysis method, to check each reasonable combined vector and its corresponding A-B-C joint system; this check should include the use of event-object verbs and non-event-object verbs; event-object verbs are verbs in natural language that can take only events, and not people or things, as objects; non-event-object verbs are verbs in natural language that can take only people or things, and not events, as objects; event-object verbs and non-event-object verbs can be compiled in advance by consulting a dictionary or by statistics;
  • Residual noun check: use the method of probability combined with syntactic rules or the dependency analysis method to find reasonable and unreasonable residual nouns, and discard any A-B-C joint system that contains unreasonable residual nouns;
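The event-object verb part of the syntactic rule check above can be sketched as a lookup against two precompiled verb sets. The function name, the `"event"` and `"entity"` labels, and the placeholder verbs used in the example are hypothetical; real lexicons would come from a dictionary or from statistics, as the text states.

```python
def object_is_acceptable(verb, object_kind, event_object_verbs, non_event_object_verbs):
    """Check a verb-object pairing against the two verb classes.

    object_kind is "event" (for example a clause or a verb-headed vector)
    or "entity" (a person or a thing).
    """
    if verb in event_object_verbs:
        return object_kind == "event"
    if verb in non_event_object_verbs:
        return object_kind == "entity"
    return True  # verbs in neither class are not restricted by this check

# Placeholder lexicons, for illustration only.
event_verbs = {"VERB_TAKING_EVENTS"}
entity_verbs = {"VERB_TAKING_ENTITIES"}
```

A combined vector in which any verb-object pairing fails this check could then be rejected before the more expensive rule-based or dependency-based checks run.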
  • The step S1 includes:
  • The word list includes the words, the attributes corresponding to the words, the position information of the words in the sentence, and the punctuation marks and their positions in the sentence;
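A word list with these four kinds of information can be sketched in Python as below. The dictionary layout and the function name are hypothetical, and a token is treated as punctuation only if every character in it is an ASCII punctuation character, which is a simplifying assumption.

```python
import string

def build_word_list(tagged_tokens):
    """Build a word list from (token, attribute) pairs in sentence order."""
    words, punctuation = [], []
    for position, (token, attribute) in enumerate(tagged_tokens, start=1):
        if all(ch in string.punctuation for ch in token):
            punctuation.append({"mark": token, "position": position})
        else:
            words.append({"word": token, "attribute": attribute, "position": position})
    return {"words": words, "punctuation": punctuation}

word_list = build_word_list([("Tom", "NNP"), ("works", "VBZ"), (".", ".")])
```

Positions are counted over all tokens, so a punctuation mark keeps its place relative to the surrounding words.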
  • the step S2 includes:
  • Predicate verb unit, subordinate related word unit, basic noun unit, infinitive verb unit, gerund-present participle unit, past participle unit, preposition unit, adjacent predicate verb combination unit, adjacent parallel subordinate related word combination unit, adjacent parallel basic noun combination unit, adjacent parallel infinitive verb combination unit, adjacent parallel gerund-present participle combination unit, adjacent parallel past participle combination unit, adjacent parallel preposition combination unit;
  • The word list (ii) includes the aforementioned words, the attributes corresponding to the aforementioned words, and the numbers assigned to the aforementioned words and the main punctuation marks according to natural-language order.
  • the step S3 includes:
  • The predicate vector includes a coordinate introductory element, a subordinate introductory element, a subject element, a predicate element, a first-position object element, and a second-position object element;
  • The inspection program of S3.3 is executed synchronously to prevent the generation of unreasonable backbone systems.
  • Figure 1 is a screenshot of the wrong analysis result of the example sentence "That men who were appointed didn't bother the liberals wasn't remarked upon by the press." made by Berkeley Parser;
  • Figure 2 is a screenshot of the wrong analysis result of the example sentence "That something you learned is wrong is known to the public." made by Berkeley Parser;
  • Fig. 3 is a schematic diagram of the first correct analysis result for the example sentence "That men who were appointed didn't bother the liberals wasn't remarked upon by the press." provided by the present invention;
  • Fig. 4 is a schematic diagram of the second correct analysis result for the example sentence "That men who were appointed didn't bother the liberals wasn't remarked upon by the press." provided by the present invention;
  • Fig. 5 is a schematic diagram of the correct analysis result of the example sentence "That something you learned is wrong is known to the public." provided by the present invention.
  • Figure 6 is a screenshot of the wrong parsing result of the example sentence "That that men were appointed didn't bother the liberals wasn't remarked upon by the press." made by Berkeley Parser;
  • Figure 7 is a screenshot of the wrong parsing result of the example sentence "That that that men were appointed didn't bother the liberals wasn't remarked upon by the press upset many women." made by Berkeley Parser;
  • FIG. 8 is a schematic diagram of the correct analysis result of the example sentence "That that men were appointed didn't bother the liberals wasn't remarked upon by the press." provided by the present invention.
  • Fig. 9 is a schematic diagram of the correct analysis result of the example sentence "That that that men were appointed didn't bother the liberals wasn't remarked upon by the press upset many women." provided by the present invention.
  • Figure 10 is the correct analysis result of the example sentence "Behaviorists suggest the child who is raised in an environment where there are many stimuli which develop his or her capacity for appropriate response response greater" provided by the present invention.
  • FIG. 11 is a schematic diagram of the correct analysis result of the example sentence "Believing that what he wants, Tom works hard in the company.” provided by the present invention.
  • Figure 12 is a screenshot of the wrong analysis result of the example sentence "A study of travelers conducted by the website TripAdvisor names Yangshuo as one of the top 10 destinations in the world." made by Berkeley Parser;
  • Figure 13 is a schematic diagram of the correct analysis result of the example sentence "A study of travelers conducted by the website TripAdvisor names Yangshuo as one of the top 10 destinations in the world.” provided by the present invention.
  • Figure 14 is a screenshot of the wrong analysis result of the example sentence "That near all behavior is learned behavior is a basic assumption that has been put forward by the social scientists.” made by Berkeley Parser;
  • Fig. 15 is a schematic diagram of the correct analysis result of the example sentence "That near all behavior is learned behavior is a basic assumption that has been put forward by the social scientists.” provided by the present invention.
  • Figure 16 is a screenshot of the wrong analysis result of the example sentence "Jack met the patient the nurse the clinic had hired sent to the doctor." made by Berkeley Parser;
  • Figure 17 is a schematic diagram of the correct analysis result of the example sentence "Jack met the patient the nurse the clinic had hired sent to the doctor." provided by the present invention;
  • Figure 18 is a screenshot of the wrong analysis result of the example sentence "Jack met the boy the nurse the doctor the clinic had hired sent to the ward introduced to the patient." made by Berkeley Parser;
  • Figure 19 is a schematic diagram of the correct analysis result of the example sentence "Jack met the boy the nurse the doctor the clinic had hired sent to the ward introduced to the patient." provided by the present invention;
  • Figure 20 is a screenshot of the wrong analysis result of the example sentence "This is the malt the rat the cat the dog concerned killed ate." made by Berkeley Parser;
  • FIG. 21 is a schematic diagram of the correct analysis result of the example sentence "This is the malt the rat the cat the dog concerned killed ate.” provided by the present invention.
  • Figure 22 is a screenshot of the wrong analysis result of the example sentence "Part of the reason Charles Dickens loved his own novel was that it was rather closely modeled on his own life.” made by Berkeley Parser;
  • Figure 23 is a schematic diagram of the correct analysis result of the example sentence "Part of the reason Charles Dickens loved his own novel was that it was rather closely modeled on his own life." provided by the present invention;
  • Figure 24 is step diagram (1) of the first overall insertion method for Example 1;
  • Figure 25 is step diagram (2) of the first overall insertion method for Example 1;
  • Figure 26 is step diagram (3) of the first overall insertion method for Example 1;
  • Figure 27 is step diagram (4) of the first overall insertion method for Example 1;
  • Figure 28 is the basic frame diagram of the syntactic structure described by the A 1 -B 1 -C 1 joint system of Example 1;
  • Figure 29 is step diagram (1) of the second overall insertion method for Example 1;
  • Figure 30 is step diagram (2) of the second overall insertion method for Example 1;
  • Figure 31 is a step diagram of the optimization method for the first and second overall insertion methods of Example 1;
  • Figure 32 is the basic frame diagram of the syntactic structure described by the A 1 -B 1 -C 1 joint system of Example 2;
  • Figure 33 is the basic frame diagram of the syntactic structure described by the A 1 -B 1 -C 1 joint system of Example 3;
  • Figure 34 is the basic frame diagram of the syntactic structure described by the A 1 -B 1 -C 1 joint system of Example 4;
  • Figure 35 is a diagram of the five rounds of the overall insertion operation of Example 5;
  • Figure 36 is a basic frame diagram of the syntactic structure described by the A 1 -B 1 -C 1 joint system of Example 6;
  • Figure 37 is an intuitive morphological diagram of the complete syntactic structure corresponding to the A a -B a -C a joint system of Example 8;
  • Figure 38 is an intuitive morphological diagram of the complete syntactic structure corresponding to the A b -B b -C b joint system of Example 8;
  • Figure 39 is a semantic relationship diagram of the syntactic structure constraint corresponding to the A a -B a -C a joint system of Example 8;
  • Fig. 40 is a semantic relation diagram of syntactic structure constraints corresponding to the A b -B b -C b joint system of Example 8;
  • Figure 41 is a diagram of the overall insertion process of the complete syntax structure corresponding to the A a -B a -C a joint system of Example 9;
  • Figure 42 is a diagram of the overall insertion process of the complete syntax structure corresponding to the A 1 -B 1 -C 1 joint system of Example 10;
  • Figure 43 is a diagram of the overall insertion process of the complete syntax structure corresponding to the A 1 -B 1 -C 1 joint system of Example 11;
  • Figure 44 is a diagram of the overall insertion process of the complete syntax structure corresponding to the A 1 -B 1 -C 1 joint system of Example 17;
  • Figure 45 is a schematic diagram of the correct analysis result of the example sentence "That men the nurse the doctor the clinic had hired sent to the ward introduced to the cleaners didn't bother the patients wasn't marked up by the press." provided by the present invention;
  • Figure 46 is a screenshot of the analysis result of the example sentence "That men the nurse the clinic had hired sent to the cleaners didn't bother the patients wasn't marked up by the press." made by Berkeley Parser;
  • Figure 47 is a diagram of the overall insertion process of the complete syntax structure corresponding to the A 1 -B 1 -C 1 joint system of Example 18;
  • Figure 48 is a schematic diagram of the correct analysis result of the example sentence "That men the cleaner introduced to the nurses the doctor the clinic had hired sent to the ward didn't bother the patients wasn't marked up by the press." provided by the present invention;
  • Figure 49 is a screenshot of the analysis result of the example sentence "That men the cleaner introduced to the nurses the doctor the clinic had hired sent to the ward didn't bother the patients wasn't marked up by the press." made by Berkeley Parser;
  • Figure 50 is a schematic diagram of all the links and algorithms included in the second calculation area ( ⁇ area).
  • the natural language in the following explanations includes, but is not limited to, English.
  • the internal components of a sentence are divided into 4 categories: impurity components, main components, auxiliary components, and remaining noun components.
  • each simple sentence with a subject-predicate collocation is processed into a predicate vector;
  • all the predicate vectors form a matrix structure of n rows and 6 columns, which serves as the backbone system.
  • each auxiliary component, such as an infinitive structure, a past participle structure, or a preposition structure, is processed into an auxiliary vector, and all auxiliary vectors then form a set that serves as the auxiliary system.
  • Definition 1 Define + ⁇ as an ordered addition operation in mathematics: let S be an English sentence to be parsed, and let a and b be two different words in S. If (a, b) satisfies + ⁇ , then the number of word a in sentence S is less than the number of word b in sentence S; that is, a+ ⁇ b means that the number of word a in S is less than the number of word b in S.
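  • The ordering condition in Definition 1 can be illustrated with a minimal Python sketch. It assumes words are numbered from 1 in sentence order; the function names `number_words` and `is_ordered_chain` are hypothetical and do not come from the application.

```python
def number_words(sentence):
    """Assign 1-based numbers to the whitespace-separated words of a sentence."""
    return {i: word for i, word in enumerate(sentence.split(), start=1)}


def ordered_before(num_a, num_b):
    """True when the word numbered num_a may stand on the left of the ordered addition."""
    return num_a < num_b


def is_ordered_chain(numbers):
    """True when every adjacent pair in the chain satisfies the ordering condition."""
    return all(ordered_before(a, b) for a, b in zip(numbers, numbers[1:]))


words = number_words("That men who were appointed")
```

  • Under this assumption, a chain of word numbers such as 1, 2, 5 is a valid ordered addition, while 3, 1 is not.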
  • Definition 2 Let S be an English sentence to be parsed, and let f be any predicate vector in S. Define 6 variables c, l, x, r, y, z related to the predicate vector f: record c as the coordinating guide element in f; record l as the subordinate guide element in f; record x as the subject element in f; record r as the predicate element in f; record y as the first-position object element in f; and record z as the second-position object element in f.
  • each predicate vector corresponding to n predicates is expressed in the form of a 6-element function, and the sentence S to be analyzed can be expressed as a matrix structure of n rows and 6 columns. If each independent variable in the matrix is assigned a specific value, that is, each predicate vector in the matrix is assigned a specific value, then the matrix also obtains a set of specific values accordingly.
  • the set of specific values corresponding to the aforementioned matrix structure is called a backbone system of sentence S, which is also called an A system.
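  • As a rough illustration of the matrix structure described above, the following Python sketch stores each predicate vector as six named elements and stacks n of them into an n-row, 6-column backbone system. The class name, the helper name, and the element values assigned to f1, f2, and f3 are assumptions made for illustration; they are not quoted from the application.

```python
from dataclasses import dataclass, astuple
from typing import List

EMPTY = "e"  # the empty unit e used in the text


@dataclass
class PredicateVector:
    """One row of the backbone system: the six elements (c, l, x, r, y, z)."""
    c: str = EMPTY  # coordinating guide element
    l: str = EMPTY  # subordinate guide element
    x: str = EMPTY  # subject element
    r: str = EMPTY  # predicate element
    y: str = EMPTY  # first-position object element
    z: str = EMPTY  # second-position object element


def backbone_system(vectors: List[PredicateVector]) -> List[List[str]]:
    """Stack n predicate vectors into an n-row, 6-column matrix."""
    return [list(astuple(v)) for v in vectors]


# Illustrative assignment only; these values are assumed, not taken from the source.
f1 = PredicateVector(l="who", r="were appointed")
f2 = PredicateVector(l="That", x="men", r="didn't bother", y="the liberals")
f3 = PredicateVector(x="f2", r="wasn't remarked")  # f2 fills the subject element
A = backbone_system([f1, f2, f3])
```

  • Storing a reference such as "f2" in the subject slot of f3 is one simple way to represent one vector serving as an element of another.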
  • Definition 4 Define the 6 auxiliary vectors in the sentence.
  • record the infinitive vector as g[To VB](u,v); record the gerund-present participle vector as g[VBG](u,v); record the past participle vector as g[VBN](u,v); record the preposition vector as g[PREP](u).
  • auxiliary vectors of the same type are distinguished by number marks, such as: g[To VB,1](u,v), g[To VB,2](u,v), ..., or g[VBG,1](u,v), g[VBG,2](u,v), ..., or g[VBN,1](u), g[VBN,2](u), ..., or g[PREP,1](u), g[PREP,2](u), ....
  • the independent variables u and v in each auxiliary vector respectively represent the first-position object element or the second-position object element or the object element named after the name of the auxiliary vector.
  • various forms that belong to the category of verb infinitives are expressed by g[To VB](u,v), for example: the forms expressed using computational linguistic symbols To VB, To VB VBN, To VB VBN VBN, To VB VBG, etc.; various forms that belong to the category of gerunds-present participles are expressed by g[VBG](u,v), for example: forms expressed using computational linguistic symbols VBG, VBG VBN, VBG VBN VBN and many more.
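  • The grouping of surface forms into auxiliary vector types, together with the number marks, might be sketched as follows. Classifying a form by its leading tags is an assumption of this sketch, and the function names are hypothetical; only the tag sequences themselves (To VB, To VB VBN, VBG, VBG VBN, and so on) come from the text.

```python
def auxiliary_type(tags):
    """Return the auxiliary vector type for a tag sequence, or None if it is not auxiliary."""
    tags = tuple(tags)
    if len(tags) >= 2 and tags[0] == "To" and tags[1] == "VB":
        return "To VB"   # To VB, To VB VBN, To VB VBG, ...
    if tags and tags[0] in ("VBG", "VBN", "PREP"):
        return tags[0]   # VBG, VBG VBN, ... / VBN / PREP
    return None


def label_auxiliaries(tag_sequences):
    """Attach number marks so that vectors of the same type can be told apart."""
    counters = {}
    labels = []
    for tags in tag_sequences:
        kind = auxiliary_type(tags)
        if kind is None:
            labels.append(None)
            continue
        counters[kind] = counters.get(kind, 0) + 1
        labels.append(f"g[{kind},{counters[kind]}]")
    return labels


labels = label_auxiliaries([["To", "VB", "VBN"], ["VBG"], ["To", "VB"], ["PREP"]])
```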
  • Definition 5 Record all auxiliary vectors as a set, which is called the auxiliary system of the sentence S to be parsed, also called the B system. As follows:
  • Language vectors: the aforementioned predicate vectors, auxiliary vectors, and the remaining noun vectors mentioned in the solution of this application are collectively referred to as language vectors. Given any two language vectors α and β, neither of which is a remaining noun vector, if the language vector α serves in the language vector β as the subject element, the first-position object element, the second-position object element, the infinitive first-position object element, the infinitive second-position object element, the gerund-present participle first-position object element, the gerund-present participle second-position object element, the past participle object element, or the preposition object element, then the language vectors α and β are said to have a compound relationship, which is recorded as vector α compounding vector β, or vector β compounding vector α.
  • the compound relationship between language vectors is also referred to as "element substitution relationship" in the solution of this application.
  • Auxiliary vectors have a certain particularity. Usually the predicate vector is compounded with the auxiliary vector, but sometimes it is the other way around and the auxiliary vector is compounded with the predicate vector. The solution of this application applies corresponding technical processing to this case. (ii) The concept of overall insertion between language vectors mentioned below is subject to the explanation of S8 of the solution of this application.
  • the composition of sentences follows this rule: the main part of the syntactic structure of any complex sentence is formed by combining multiple language vectors through compounding and overall insertion.
  • the above rule is a deterministic event, which can be verified by statistics on a corpus; that is, in any English sentence sample space that takes standard sentences as samples, the probability that the above rule holds for a complex sentence is 1.
  • the above-mentioned law is the source of the common long-distance related problems and deep recursive nesting problems in computer natural language processing, and is also an important starting point for the present invention to solve the technical problems.
  • Example 1 That men who were appointed didn't bother the liberals wasn't remarked upon by the press.
  • This example sentence is preprocessed to generate a word list (i-a) and a word list (i-b). Since the word That in the example sentence is structurally ambiguous (it may be either a subordinate related word unit or a qualifier unit), two word lists (i) are generated and given different marks.
  • For the above word list (i-a) and word list (i-b), remove as impurities the adjective units, adverb units, adjacent parallel adjective units, adjacent parallel adverb units, non-sentence simple parenthesis units, particle units, adjacent parallel particle units, interjection units, and other such natural language elements; then read the preprocessed data structure of the sentence to be parsed and generate the corresponding word list (ii-a) and word list (ii-b), as shown below.
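  • The impurity-removal step can be pictured with a small Python sketch. The unit-level word list below is a simplified stand-in loosely based on Example 1, and the category labels and the name `remove_impurities` are assumptions of this sketch rather than the application's own data format.

```python
# Categories treated as impurities in this sketch (assumed labels).
IMPURITY_CATEGORIES = {"adjective", "adverb", "parenthesis", "particle", "interjection"}


def remove_impurities(word_list):
    """Split a word list (i) into kept units (word list (ii)) and removed impurities."""
    kept = [unit for unit in word_list if unit[2] not in IMPURITY_CATEGORIES]
    removed = [unit for unit in word_list if unit[2] in IMPURITY_CATEGORIES]
    return kept, removed


# Each unit is (number, text, category); numbering and categories are illustrative.
word_list_i = [
    (1, "That", "subordinator"),
    (2, "men", "noun"),
    (3, "who", "subordinator"),
    (4, "were appointed", "predicate"),
    (5, "didn't bother", "predicate"),
    (6, "the liberals", "noun"),
    (7, "wasn't remarked", "predicate"),
    (8, "upon", "particle"),
    (9, "by", "preposition"),
    (10, "the press", "noun"),
]
word_list_ii, impurities = remove_impurities(word_list_i)
```

  • Keeping the removed units alongside the filtered list makes it possible to restore them when the complete syntactic structure is assembled later.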
  • this patent application takes the word list (i-a) and the corresponding word list (ii-a) as examples to analyze and explain:
  • This example sentence has 3 predicate verb units: were appointed, didn't bother, and wasn't remarked. It therefore contains 3 predicate elements, which are recorded in turn as r 1 , r 2 , and r 3 ; for these 3 predicate elements, the corresponding predicate vectors f 1 , f 2 , f 3 are generated. The value of each element in the predicate vectors f 1 , f 2 , and f 3 is as follows:
  • ⁇ y 1 ⁇ ⁇ f 2, f 3, e ⁇ .
  • all possible values of the predicate vectors f 2 and f 3 can be obtained from all possible values of each element of f 2 and f 3 , respectively, through the relevant calculations of combinatorics.
  • this example sentence has 3 predicate verb units and therefore 3 predicate elements, for which the corresponding predicate vectors f 1 , f 2 , f 3 are generated; the value of each element in the predicate vectors f 1 , f 2 , f 3 is as follows:
  • all possible values of each of the three predicate vectors f 1 , f 2 , and f 3 can be obtained from all possible values of each of their elements through the relevant calculations of combinatorics.
  • since this example sentence has three predicate vectors, the backbone system of this example sentence is a matrix with 3 rows and 6 columns, and its abstract form is as follows:
  • a backbone system is also an A system. Denote the set of all backbone systems corresponding to this example as {A}; denote the cardinality of the set {A} as |{A}|.
  • the set of all possible values of the predicate vector f 1 is recorded as {f 1 }; the cardinality of the set {f 1 } is recorded as |{f 1 }|.
  • the same treatment is adopted for other predicate vectors and elements. Then use the multiplication principle in combinatorics:
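  • The multiplication principle used here can be sketched in Python. The candidate value sets below are hypothetical placeholders chosen only to show the arithmetic; they are not the element values of Example 1, and the helper names are not from the application.

```python
from itertools import product
from math import prod

ELEMENTS = ("c", "l", "x", "r", "y", "z")


def vector_cardinality(candidates):
    """Number of possible values of one predicate vector (multiplication principle)."""
    return prod(len(candidates[name]) for name in ELEMENTS)


def enumerate_vectors(candidates):
    """List every possible assignment of the six elements of one predicate vector."""
    pools = (candidates[name] for name in ELEMENTS)
    return [dict(zip(ELEMENTS, combo)) for combo in product(*pools)]


def system_cardinality(per_vector_candidates):
    """Number of candidate backbone matrices: product of the per-vector cardinalities."""
    return prod(vector_cardinality(c) for c in per_vector_candidates)


# Hypothetical candidate sets for a single predicate vector.
f1_candidates = {
    "c": ["e"],
    "l": ["who", "e"],
    "x": ["men", "e"],
    "r": ["were appointed"],
    "y": ["f2", "f3", "e"],
    "z": ["e"],
}
```

  • With these placeholder sets, one vector has 1 x 2 x 2 x 1 x 3 x 1 = 12 possible values, and the count for a whole system is the product of such counts over its rows.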
  • [The first, second, third, fourth, and fifth matrices: five candidate 3-row, 6-column backbone matrices, not reproduced in this text.]
  • the matrix omits the subordinate related word unit who, whose number is 3.
  • the matrix is unreasonable, that is, the backbone system is unreasonable, clear the backbone system;
  • the matrix is unreasonable, that is, the backbone system is unreasonable, clear the backbone system.
  • the fifth matrix does not violate any requirement in the application plan S3.3. Therefore, the aforementioned fifth matrix is a canonical backbone system, or a canonical A system.
  • among the 14,580 3-row, 6-column matrices generated above, there are other canonical backbone systems, which are not listed one by one.
  • this example sentence has only one preposition unit by, and for the preposition unit by, a corresponding auxiliary vector g[PREP](u) is generated.
  • the B 1 system and the B 2 system meet the requirements from S6.1 to S6.5 in the application plan. Since the structures of the B 1 system and the B 2 system are relatively simple, this is easy to verify and is not elaborated here.
  • the B 1 system and the B 2 system are both standard auxiliary systems.
  • the B 1 system and the B 2 system can be further denoted as the standard B 1 system and the standard B 2 system.
  • the A 1 -B 1 -C 1 combined system is as follows:
  • the vectors in the above A 1 -B 1 -C 1 joint system are all the type I vectors before the equivalent substitution. Through equivalent substitution, all the type I vectors in the A 1 -B 1 -C 1 joint system are converted into type II vectors, as shown below:
  • This vector is a type III vector obtained through the overall insertion operation in the A 1 -B 1 -C 1 joint system. This newly generated vector is marked as [ ⁇ ] 2 + ⁇ , and the first round of the overall insertion operation is complete.
  • the newly generated vector [ ⁇ ] 2 + ⁇ is taken as the receiving vector for the second round of the overall insertion operation. Each element of [ ⁇ ] 2 + ⁇ , from its first element up to who, the first element on the left side of the vector μ contained in it, is marked with an order value; the remaining elements of [ ⁇ ] 2 + ⁇ are not marked with order values. Take the third element of [ ⁇ ] 2 + ⁇ that has been marked with an order value, and construct a single gap on the right side of that element.
  • This vector is a type III vector obtained through the overall insertion operation in the A 1 -B 1 -C 1 joint system, and it is also a combined vector. Denote this vector as [[ ⁇ ] 2 ⁇ ] 3 + ⁇ ; it reads: That men didn't bother the liberals who by the press were appointed wasn't remarked.
  • This vector is a type III vector obtained through the overall insertion operation in the A 1 -B 1 -C 1 joint system, and it is also a combined vector. Denote this vector as [[ ⁇ ] 4 ⁇ ] 1 + ⁇ ; it reads: That men who were appointed didn't bother the liberals wasn't remarked by the press.
  • the above-mentioned overall blanking operation corresponds to the blanking scheme: ⁇ .
  • In summary, the general syntactic structure of Example 1 is obtained through the A 1 -B 1 -C 1 joint system; that is, the A 1 -B 1 -C 1 joint system describes the basic framework of the syntactic structure of Example 1, as shown in Figure 28.
  • the aforementioned A 1 -B 1 -C 1 joint system contains 3 type II vectors ⁇ , ⁇ , ⁇ ; applying the permutation formula of combinatorics to these 3 type II vectors gives all the insertion schemes corresponding to the A 1 -B 1 -C 1 joint system, as follows: ⁇ (plan 1), ⁇ (plan 3), ⁇ (plan 5) ),
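  • The count of insertion schemes follows from the permutation formula: 3 type II vectors can be ordered in 3! = 6 ways. A minimal Python sketch, using the placeholder names v1, v2, v3 because the original symbols are not legible in this text:

```python
from itertools import permutations
from math import factorial


def insertion_schemes(vectors):
    """Enumerate every ordering of the type II vectors as a candidate scheme."""
    return list(permutations(vectors))


# Placeholder names standing in for the three type II vectors.
schemes = insertion_schemes(["v1", "v2", "v3"])
scheme_count = len(schemes)
```

  • Here `scheme_count` equals `factorial(3)`, so each plan number can be read as an index into `schemes`, although the order in which the application numbers its plans is not recoverable from this text.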
  • the second overall insertion method marks every element in the receiving vector with an order value in each round of the overall insertion operation; any element that has been marked with an order value can then be taken, a gap constructed beside it, and the insertion performed.
  • in the second overall insertion method, there are no restrictions on the marking of order values or the selection of gaps in any round; in the first overall insertion method, every round from the second round onward is restricted: order values are marked and gaps are selected only up to the position of the first element on the second side of the previous round's insertion vector contained in the receiving vector.
  • the first overall insertion method will not produce identical combined vectors; the second overall insertion method may produce identical combined vectors, in which case the identical results are merged into one result.
  • the operation process of the second overall plug-in method is shown in Figure 29 and Figure 30.
  • S8.6 of the application plan is the optimization of the steps from S8.2 to S8.5 of the application plan, that is, the optimization of the first and second overall insertion methods mentioned above.
  • each type II vector in the A 1 -B 1 -C 1 joint system is replaced with a corresponding number, as shown below:
  • the vector (1 2 5 6 7) is the receiving vector
  • the vector (3 4) is the insertion vector. Take the right side as the first side, and construct a space on the right side of each element inside the receiving vector, as shown below:
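  • A minimal Python sketch of this gap construction, using the numbered vectors (1 2 5 6 7) and (3 4) from the text. It constructs one gap on the right side of each element of the receiving vector and inserts the whole insertion vector into a chosen gap; the function names are hypothetical, and the sketch does not model the order-value restrictions of the first method.

```python
def insert_at_gap(receiving, inserting, position):
    """Insert the whole vector into the gap to the right of the element at a 1-based position."""
    return receiving[:position] + inserting + receiving[position:]


def all_insertions(receiving, inserting):
    """Try every right-side gap; identical results are merged into one."""
    results = []
    for position in range(1, len(receiving) + 1):
        merged = insert_at_gap(receiving, inserting, position)
        if merged not in results:
            results.append(merged)
    return results


receiving = (1, 2, 5, 6, 7)
inserting = (3, 4)
candidates = all_insertions(receiving, inserting)
natural_order = insert_at_gap(receiving, inserting, 2)
```

  • Choosing the gap to the right of the second element restores the original word order, which is the combined vector one would expect a later check to keep.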
  • for an A system that can generate a reasonable combined vector, that is, a matrix that can generate a reasonable combined vector, use the method of probability combined with syntactic rules, or dependency analysis, to perform syntactic rule checking.
  • This set of syntactic rules can also be used in the syntactic structure fixes mentioned later.
  • the set of syntactic rules includes but not limited to the following syntactic rules:
  • if the predicate is in the passive voice and is a unit composed of verbs that can take neither a double object nor an object combined with an object complement, then the predicate can have neither a corresponding first-position object nor a corresponding second-position object.
  • apart from special syntactic phenomena, the subject and the predicate must agree in number. English has some nouns whose singular and plural forms are identical, which interferes with this check, but these nouns can be collected in advance from a dictionary or from statistics. The rule that the subject and the predicate must agree in number is easy to handle in the matrix structure.
  • use the rules of event object verbs and non-event object verbs to check;
  • "event object verbs" in this patent application refers to verbs in natural language that can only take events as objects, not people or things;
  • "non-event object verbs" in this patent application refers to verbs in natural language that can only take people or things as objects, not events.
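  • Two of the rules above might be checked along the following lines. This is a sketch under stated assumptions: the verb lexicons are passed in by the caller and are hypothetical, the predicate vector is a plain dictionary with the element names x, r, y, z, and neither function name comes from the application.

```python
EMPTY = "e"  # the empty unit


def passive_rule_ok(vector, is_passive, verb, multi_object_verbs):
    """A passive predicate whose verb takes no double object or object complement has no objects."""
    if is_passive and verb not in multi_object_verbs:
        return vector["y"] == EMPTY and vector["z"] == EMPTY
    return True


def object_type_rule_ok(verb, object_is_event, event_only_verbs, non_event_only_verbs):
    """Event object verbs need an event as object; non-event object verbs must not have one."""
    if verb in event_only_verbs and not object_is_event:
        return False
    if verb in non_event_only_verbs and object_is_event:
        return False
    return True


# Illustrative check: a passive predicate wrongly given a first-position object.
bad_vector = {"x": "men", "r": "wasn't remarked", "y": "the liberals", "z": EMPTY}
good_vector = {"x": "f2", "r": "wasn't remarked", "y": EMPTY, "z": EMPTY}
hypothetical_multi_object_verbs = {"give"}  # assumed lexicon entry
```

  • A backbone matrix containing a row like `bad_vector` would be rejected as unreasonable, while `good_vector` passes this particular rule.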
  • in any A-B-C joint system corresponding to the aforementioned word list (ii-b), the computer initially places the structurally ambiguous qualifier unit That and the basic noun unit men in the same language segment, which is processed as That modifying men. However, That modifying men is an obvious syntactic error, which the computer can easily recognize and eliminate in the subsequent syntactic rule checking, because according to English syntax rules That as a qualifier cannot modify men, the plural form of a countable noun.
  • all A-B-C joint systems generated by the word list (ii-b) will be treated as unreasonable A-B-C joint systems and removed.
  • the aforementioned A 1 -B 2 -C 2 combined system is as follows.
  • the aforementioned A 1 -B 2 -C 2 joint system also generates a reasonable combined vector.
  • Run the remaining noun checking program on the aforementioned A 1 -B 2 -C 2 joint system to check whether the remaining noun the press of the C 2 system is a reasonable remaining noun. If the remaining noun the press is a reasonable remaining noun, then the A 1 -B 2 -C 2 joint system is retained; if the remaining noun the press is an unreasonable remaining noun, then the A 1 -B 2 -C 2 joint system is discarded.
  • Use probability combined with syntactic rules, or dependency analysis, to check remaining nouns. For example, in English, appositives can use independent nouns, the independent nominative structure of non-predicate verbs can use independent nouns, titles of articles with colons often use independent nouns, and so on. If the method of combining probability with syntactic rules is used, then the aforementioned linguistic phenomena are the syntactic rules corresponding to reasonable remaining nouns. On the basis of these syntactic rules, statistics can also be gathered for them in a corpus, and the corresponding probabilities calculated.
  • the A 1 -B 1 -C 1 joint system depicts the basic framework of the syntactic structure of Example 1, as shown in Figure 28. Compared with the word list (i), an impurity component is still missing at present.
  • the basic framework of the syntactic structure obtained above can be completed by the method of combining probability with syntactic rules or by the method of dependency analysis. Specifically, if the method of combining probability with syntactic rules is adopted, then, according to the lexical marks of Example 1 given in the word list (i), the candidate results are sorted in descending order of probability, and the result that does not conflict with the basic framework of the aforementioned syntactic structure is obtained.
  • the method of combining probability with syntactic rules includes, but is not limited to: probabilistic context-free grammar and lexicalized probabilistic context-free grammar.
  • the result ranked 20th is the computer analysis result that does not conflict with the basic framework of the aforementioned syntactic structure and has the highest probability, and this result is regarded as the final correct result.
  • the results of several syntactic analysis above are expressed as follows:
  • the probability of the result ranked 1st is: 0.00010738
  • the probability of the result ranked 2nd is: 0.00010621
  • the probability of the result ranked 3rd is: 0.00010403
  • the probability of the result ranked 20th is: 0.00010196
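  • The selection step described above, taking the most probable result that does not conflict with the basic framework, can be sketched in Python. The four probabilities are the ones listed above; the `conflicts` flags are inferred from the statement that the 20th-ranked result is the first one compatible with the framework, and results ranked 4th to 19th are omitted because the text does not list them.

```python
def select_parse(candidates):
    """Return the highest-probability candidate that does not conflict with the framework."""
    compatible = [c for c in candidates if not c["conflicts"]]
    if not compatible:
        return None
    return max(compatible, key=lambda c: c["prob"])


# Conflict flags are an inference from the text, not reported parser output.
candidates = [
    {"rank": 1, "prob": 0.00010738, "conflicts": True},
    {"rank": 2, "prob": 0.00010621, "conflicts": True},
    {"rank": 3, "prob": 0.00010403, "conflicts": True},
    {"rank": 20, "prob": 0.00010196, "conflicts": False},
]
best = select_parse(candidates)
```

  • In this sketch the framework acts as a hard filter and the probability only ranks what survives it, which is why a lower-ranked result can be chosen over the top three.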
  • the syntactic analysis result of Example 1 is thus obtained.
  • the result can be considered correct in English linguistics.
  • the result is expressed as follows: [See Figure 3]
  • Figure 3 is a schematic diagram corresponding to the string form, and the same applies to the following.
  • S is an English sentence, and there are at least the following three subject-predicate collocations in S (represented by 6-element functions):
  • the meaning of f(c 1 ,l 1 ,g(c 2 ,l 2 ,x 2 ,r 2 ,y 2 ,z 2 ),r 1 ,y 1 ,z 1 ) is that the predicate vector g is the subject element of the predicate vector f.
  • the meaning of g[h(c 3 ,l 3 ,x 3 ,r 3 ,y 3 ,z 3 )] is that the predicate vector h is inserted into a certain position of the predicate vector g in a way of inserting the entire space.
  • the meaning of l 2 = that is that the leading word of the predicate vector g is that.
  • the meaning of the Q model is: the predicate vector g is the subject clause of the predicate vector f, and the leading word of the predicate vector g is that, and the predicate vector h is inserted into a certain position of the predicate vector g in a way of overall insertion.
  • Example 1 conforms to the above-mentioned Q model. The verification is as follows. The auxiliary components and the empty unit e are omitted:
  • g(c 2 ,l 2 ,x 2 ,r 2 ,y 2 ,z 2 ) That+ ⁇ men+ ⁇ didn't+ ⁇ bother+ ⁇ the+ ⁇ liberals;
  • Example 2 That something you learned is wrong is known to the public.
  • That in this example sentence creates structural ambiguity.
  • the word list (ii) that uses That as a subordinate related word unit for preprocessing is given, as shown below.
  • the adjective wrong in this example sentence serves as the predicative of the clause and is the main component of the sentence.
  • the adjective wrong is temporarily removed in the preprocessing step according to the operation of the application plan.
  • the predicative wrong of the subordinate sentence can be repaired in the subsequent syntactic structure repair link.
  • The complete syntactic analysis result of Example 2 is expressed as a string as follows: [See Figure 5]
  • Example 2 also conforms to the Q model mentioned above. As of the filing date of this patent application, March 22, 2019, Berkeley Parser and Stanford Parser gave wrong results for this example sentence.
  • Example 3 That that men were appointed didn't bother the liberals wasn't remarked upon by the press.
  • an A-B-C joint system can be generated as follows:
  • The complete syntactic analysis result of Example 3 is expressed as a string as follows: [See Figure 8]
  • Example 4 That that that men were appointed didn't bother the liberals wasn't remarked upon by the press upset many women.
  • an A-B-C joint system can be generated as follows:
  • The complete syntactic analysis result of Example 4 is expressed as a string as follows: [See Figure 9]
  • the computer initially places the structurally ambiguous qualifier unit that and the basic noun unit men in the same language fragment, but that modifying men is an obvious syntactic error, and this error can be easily identified and eliminated by the computer in the subsequent syntactic rule checking. Therefore, the word lists (ii) generated in Examples 3 and 4 that mark that as a structurally ambiguous qualifier unit will be cleared by the computer.
  • Example 5 Behaviorists suggest the child who is raised in an environment where there are many stimuli which develop his or her capacity for appropriate responses will experience greater intellectual development.
  • the example sentence has 5 predicate verb units suggest, is raised, there are, develop, and will experience; therefore, this example sentence contains 5 predicate elements, which are recorded as r 1 , r 2 , r 3 , r 4 , and r 5 in turn; These 5 predicate elements generate corresponding predicate vectors f 1 , f 2 , f 3 , f 4 , and f 5 ; according to the information of the application plan S3, the predicate vectors f 1 , f 2 , f 3 , f 4 , f 5 The value of each element of is as follows:
  • record the set of all possible values of the auxiliary vector g[PREP,1](u) as {g[PREP,1](u)}; record the cardinality of the set {g[PREP,1](u)} as |{g[PREP,1](u)}|.
  • the same treatment is applied to the auxiliary vector g[PREP,2](u).
  • the above-mentioned five rounds of the overall insertion operation yield the general syntactic structure of Example 5 through the A 1 -B 1 -C 1 joint system, that is, the basic framework of the syntactic structure of Example 5.
  • according to the basic framework of the syntactic structure provided by the A 1 -B 1 -C 1 joint system, the complete syntactic analysis result of Example 5 is obtained.
  • the result is a result that can be considered correct in English linguistics, expressed as a string as follows: [See Figure 10] (ROOT(S(NP(NNS Behaviorists))(VP(VBP suggest)(SBAR(S( NP(DT the)(NN child))(SBAR(WHNP(WP who))(S(VP(VBZ is)(VP(VBN raised)(PP(IN in)(NP(NP(DT an)( NN environment))(SBAR(WHADVP(WRB where))(S(NP(EX there))(VP(VBP are)(NP(NP(JJ many)(NNS stimuli))(SBAR(WHNP(WP which)) (S(VP(VBP develop)(NP(NP(PRP$his)(CC or)(PRP$her)
  • Example 6 Believing that what he wants will occur, Tom works hard in the company.
  • the example sentence has 3 predicate verb units: wants, will occur, and works; therefore, this example sentence contains 3 predicate elements, which are denoted in turn as r 1 , r 2 , and r 3 ; for these 3 predicate elements, the corresponding predicate vectors f 1 , f 2 , f 3 are generated;
  • This example sentence contains 1 gerund-present participle element.
  • record the gerund-present participle vector corresponding to this element as g[VBG](u,v); according to the application plan, the value of each element in the predicate vectors f 1 , f 2 , and f 3 is as follows:
  • this example sentence generates two auxiliary vectors g[VBG](u,v) and g[PREP](u):
  • the remaining noun he in the C 21 system is not an independent noun used as an appositive, not an independent noun used in the independent nominative structure of a non-predicate verb, not an independent noun used in an article title with a colon, and so on. Therefore, the remaining noun he in the C 21 system is an unreasonable remaining noun.
  • the A 2 -B 1 -C 21 joint system therefore contains an error and is discarded.
  • Example 7 A study of travelers conducted by the website TripAdvisor names Yangshuo as one of the top 10 destinations in the world.
  • the A a -B a -C a joint system is as follows:
  • the word list (ii-a) contains only one predicate, so the matrix structure of the standard A a system degenerates into a predicate vector.
  • the A b -B b -C b joint system generated according to the word list (ii-b) is as follows:
  • Example 8 That near all behavior is learned behavior is a basic assumption that has been put forward by the social scientists.
  • the A a -B a -C a joint system is as follows:
  • the A b -B b -C b joint system generated according to the word list (ii-b) is as follows:
  • the syntactic structure repair step can be used to distinguish and adjust the primary and secondary status of each vector in the syntactic structure of the A b -B b -C b joint system, so as to obtain A b -B b -C b
  • distinguishing and adjusting the primary and secondary status of each vector in the syntactic structure of the A b -B b -C b joint system specifically means determining which predicate vector serves as the main clause and which predicate vectors serve as subordinate clauses, adjusting the predicate vector serving as the main clause and the predicate vectors serving as subordinate clauses, and so on.
  • the semantic processing methods include, but are not limited to, semantic analysis methods based on λ-calculus, semantic analysis methods based on semantic fields and semantic networks, semantic analysis methods based on knowledge graphs, semantic analysis methods based on semantic graph models, and semantic analysis methods that calculate probabilities for semantic relationships and select the one with the largest probability, and so on.
  • the semantic processing method generally needs to be based on the sufficient restriction of the syntactic structure on the semantic relationship.
  • the premise that the syntactic structure fully restricts the semantic relationship means that the syntactic structure preliminarily determines the meaning of each word in the sentence and the mutual collocation relationship between the meanings of the words.
  • the first That in this example sentence is a subordinating conjunction introducing the subject clause, and the corresponding meaning of the first That is "no meaning"
  • according to the complete syntactic structure corresponding to the A b -B b -C b joint system,
  • the first That is a subordinating conjunction introducing an adverbial clause at the beginning of the sentence, and the corresponding meaning of the first That is "because";
  • learned is a past participle acting as an attributive, and the corresponding semantics of learned is "learned"; according to the A b -B b -C b joint system
  • in the corresponding complete syntactic structure, is and learned in this example sentence jointly act as the predicate, so the corresponding semantics of is learned is "to be learned"; etc.
  • a syntactic-semantic constraint relational database that meets the aforementioned requirements can be constructed in a targeted manner.
  • the computer will initially group the structurally ambiguous determiner unit That and the basic noun unit all behavior into the same language segment and treat That as modifying all behavior; however, That modifying all behavior is an obvious syntax error, which the computer can easily identify and eliminate in the subsequent syntactic rule check. Therefore, the word list (ii) that marks That as a structurally ambiguous determiner unit will be discarded by the computer.
  • Example 9 Jack met the patient the nurse the clinic had hired sent to the doctor.
  • the A a -B a -C a joint system is as follows:
  • the A b -B b -C b joint system generated according to the word list (ii-b) is as follows:
  • the method of probability combined with syntactic rules can be used. On inspection, the remaining noun the nurse of the C b system is found to be neither an independent noun serving as an appositive, nor an independent noun in an absolute (independent nominative) construction that contains no non-finite verb, nor an independent noun of the kind commonly used in article titles without an accompanying colon, and so on. Therefore, the remaining noun the nurse in the C b system is an unreasonable remaining noun.
  • the A b -B b -C b joint system has an error and is discarded.
  • Example 9 was mentioned in the first half of the description. As of the filing date of this patent application, March 22, 2019, the Berkeley Parser and the Stanford Parser gave wrong results for this example sentence!
  • the ambiguity between the past participle and the simple past tense of a predicate verb (sent in this example sentence) is a common structural ambiguity.
  • Example 10 Jack met the boy the nurse the doctor the clinic had hired sent to the ward introduced to the patient.
  • This example sentence can obtain the correct final syntactic analysis result through the following A 1 -B 1 -C 1 joint system.
  • Example 10 was mentioned in the first half of the description.
  • the computer analysis process of Example 10 is similar to that of Example 9.
  • Example 11 This is the malt the rat the cat the dog concerned killed ate.
  • This example sentence can obtain the correct final syntactic analysis result through the following A 1 -B 1 -C 1 joint system.
  • Example 11 was mentioned in the first half of the description. The computer analysis process of Example 11 is similar to that of Example 10.
  • Example 12 Part of the reason Charles Dickens loved his own novel was that it was rather closely modeled on his own life.
  • This example sentence can obtain the correct final syntactic analysis result through the following A 1 -B 1 -C 1 joint system.
  • Example 12 was mentioned in the first half of the description. For another example sentence, "Part of the reason why Charles Dickens loved his own novel was closely modeled on his own life.", the computer analysis process and results are similar to those of Example 12.
  • Example 13 He said he wanted to improve the vineyard to allow visitors to enjoy local food and that in this way, he could make more money.
  • This example sentence can obtain the correct final syntactic analysis result through the following A 1 -B 1 -C 1 joint system.
  • This example sentence contains two juxtaposed object clauses.
  • Example 14 I will buy the car which my father needs and the bike which my brother wants.
  • This example sentence can obtain the correct final syntactic analysis result through the following A 1 -B 1 -C 1 joint system.
  • Syntactic structure repair is another step in the solution of this application, carried out together with the syntactic rule check.
  • Syntactic structure repair uses the method of probability combined with syntactic rules or the dependency analysis method to re-mine missing syntactic information, such as missing complex inverted sentence patterns, missing long-distance verb-object relations, missing long-distance coordinated components, missing adjectives serving as predicatives, missing prepositional phrases serving as predicatives, missing infinitive structures serving as object complements, missing gerund-present participle structures serving as object complements, missing past participle structures serving as object complements, and missing prepositional phrases serving as object complements, and repairs the defects in the previously obtained syntactic structure accordingly.
  • the car and the bike are coordinated as the objects of will buy, and the car and the bike are separated by the attributive clause which my father needs.
  • the car and the bike can be combined into one object element.
  • the two attributive clauses which my father needs and which my brother wants, inserted after the car and the bike respectively, are treated as separately inserted into the two basic noun units within the same object element.
  • the and in this example sentence belongs to "coordinate related word units not used to connect sentences".
  • Example 15 Determining where we are in relation to our surroundings remains an essential skill for our survival.
  • This example sentence can obtain the correct final syntactic analysis result through the following A 1 -B 1 -C 1 joint system.
  • the in relation to in this example sentence has structural ambiguity.
  • in relation to is a complete compound preposition.
  • in relation to is a combination of the prepositional phrase in relation and the preposition to, which together constitute a whole.
  • in the A 1 -B 1 -C 1 joint system, in relation to is treated as a compound preposition.
  • the compound prepositional phrase in relation to our surroundings serves as the predicative of the clause and is the main component of the sentence.
  • the compound prepositional phrase in relation to our surroundings is not counted in the matrix, and can be followed
  • in relation to our surroundings is repaired as a clause predicative in the syntactic structure repair step.
  • Example 16 Tom washed and polished his car, after he gave his brother a present.
  • This example sentence can obtain the correct final syntactic analysis result through the following A 1 -B 1 -C 1 joint system.
  • washed and polished in this example sentence form a combined unit of adjacent predicate verbs, so washed and polished constitute one predicate element; gave is a verb that can take two objects, and such verbs can be compiled in advance by consulting a dictionary or by statistics.
  • Example 17 That men the nurse the clinic had hired sent to the ward introduced to the cleaners didn't bother the patients wasn't marked up by the press.
  • This example sentence can obtain the correct final syntactic analysis result through the following A 1 -B 1 -C 1 joint system.
  • Example 17 was mentioned in the first half of the description.
  • Example 17 conforms to the Q model mentioned above; the verification is omitted.
  • Example 18 That men the cleaner introduced to the nurses the doctor the clinic had hired sent to the ward didn't bother the patients wasn't marked up by the press.
  • This example sentence can obtain the correct final syntactic analysis result through the following A 1 -B 1 -C 1 joint system.
  • Example 18 was mentioned in the first half of the description.
  • Example 18 conforms to the Q model mentioned above; the verification is omitted.
  • the solution of this patent application aims to solve specific technical problems in computer natural language processing, and organically unifies the three aspects of computer-executed lexical analysis, syntactic analysis, and semantic analysis, so that these three aspects constrain and correct one another.
  • the inventor established a new set of mathematical models suitable for computer processing to describe sentences.
  • the mathematical model describing the sentence has a clear and accurate structure, strong expressive ability and practicability.
  • the length of each formula contained in the model is limited, conforms to the natural laws of mathematics and computer science, and helps improve the accuracy of computer processing of natural language.
  • the inventor gave a set of methods for using computers to analyze the syntactic structure of sentences.
  • the first calculation area ⁇ area
  • ⁇ area: read the data structure of the sentence to be parsed and perform preprocessing operations on it; read the preprocessed data structure of the sentence to be parsed; for a sentence to be parsed that contains no predicate verb unit, analyze the sentence using the method of probability combined with syntactic rules or the dependency analysis method, and take that analysis result as the computer's final analysis result; for a sentence to be parsed that contains a predicate verb unit, generate the relevant word lists, generate the predicate vectors, auxiliary vectors, and remaining noun vectors corresponding to those word lists, and then generate the A-B-C joint systems corresponding to those word lists.
  • the second calculation area ⁇ area
  • the overall insertion operation, the syntax rule check, the syntax structure repair, and the remaining noun check are performed.
  • This calculation area makes full use of natural laws and, through screening and checking, generates the general syntactic structure of the sentence to be parsed, that is, the basic framework of its syntactic structure.
  • the A-B-C joint system preserved in the ⁇ area depicts the general syntactic structure of the sentence to be parsed, that is, the basic framework of the syntactic structure of the sentence to be parsed.
  • the third calculation area ⁇ area
  • the basic framework of the syntactic structure of the sentence to be parsed, as described by the several A-B-C joint systems retained in the ⁇ region, is used as the criterion; among a sufficient number of complete syntactic structures obtained by analyzing the sentence to be parsed with the method of probability combined with syntactic rules or the dependency analysis method, the most suitable complete syntactic structure meeting that criterion is found.
  • the fourth calculation area ⁇ area
  • the method of semantic processing is adopted to find the most suitable semantic relationship subject to the aforementioned syntactic structure constraints; the complete syntactic structure corresponding to that semantic relationship is then taken as the final syntactic analysis result, and the result is output.
  • the semantic processing method generally needs to be based on the sufficient restriction of the syntactic structure on the semantic relationship.
  • the premise that the syntactic structure fully restricts the semantic relationship means that the syntactic structure preliminarily determines the meaning of each word in the sentence and the mutual collocation relationship between the meanings of the words.
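The remaining noun check performed in the second calculation area, and applied to the A b -B b -C b joint system of Example 9, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the class names, the `role` labels, and the idea that a justification for a remaining noun has already been determined elsewhere. Only the outcome for "the nurse" (no acceptable justification, so the joint system is discarded) comes from the text; the second, empty system is invented to show the opposite branch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical labels paraphrasing the acceptable uses of a remaining noun
# listed in the text (appositive, absolute construction, title noun).
ACCEPTED_ROLES = {"appositive", "absolute_construction", "title_noun"}


@dataclass
class RemainingNoun:
    text: str
    role: Optional[str] = None  # None: no justification was found


@dataclass
class JointSystem:
    name: str
    remaining_nouns: List[RemainingNoun] = field(default_factory=list)


def passes_remaining_noun_check(system: JointSystem) -> bool:
    """A joint system survives only if every remaining noun is justified."""
    return all(n.role in ACCEPTED_ROLES for n in system.remaining_nouns)


def filter_joint_systems(systems: List[JointSystem]) -> List[JointSystem]:
    """Discard joint systems that contain an unreasonable remaining noun."""
    return [s for s in systems if passes_remaining_noun_check(s)]


# From Example 9: "the nurse" is left over and has no acceptable role.
example_9_b = JointSystem("A_b-B_b-C_b", [RemainingNoun("the nurse")])
# Invented for illustration only: a system with no remaining nouns.
toy_clean = JointSystem("toy system without remaining nouns")

kept = filter_joint_systems([example_9_b, toy_clean])
print([s.name for s in kept])
```

The check is deliberately a pure filter: it never repairs a joint system, it only decides whether the system is kept for the later calculation areas.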

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed is a method for syntactic parsing of natural language. The present invention addresses technical drawbacks of the Berkeley Parser and the Stanford Parser, two leading natural language syntactic parsers that are internationally recognized in the field of computer science, and discloses a technical solution to resolve these drawbacks. The present invention establishes a novel mathematical model for describing sentences, and proposes a computer-based syntactic parsing method based thereon. The present invention integrates, by using technical means, three aspects of computer processing of natural language, namely, lexical analysis, syntactic parsing, and semantic analysis, in an organic manner, and strengthens mutual constraints of these three aspects, thereby improving the ability of a computer to resolve structural ambiguity. The present invention involves high technical complexity, achieves a high level of integration, has a wide range of applications, involves a large amount of computation, and complies with natural principles of mathematics and computer science, thereby enhancing accuracy of computer syntactic parsing.

Description

A method for syntactic parsing of natural language
This application claims priority to Chinese patent application No. 201910224013.X, filed on March 22, 2019 and entitled "A method for syntactic parsing of natural language", the entire content of which is incorporated herein by reference.
Technical field
The present invention relates to the field of computer data processing, and in particular to a method for syntactic parsing of natural language.
Background
Natural language processing (NLP) is a very important direction in computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers using natural language.
Syntactic parsing is one of the key tasks in natural language processing (NLP). Its basic task is to determine the syntactic structure of a sentence or the dependency relations among the words within the sentence. Among existing syntactic parsing techniques, probabilistic context-free grammars (the PCFG method) are widely used in computer science. The PCFG method computes the matching probabilities of syntactic rules and selects the parse with the highest probability as the final syntactic structure. Besides the PCFG method, dependency parsing is another syntactic parsing technique frequently used in computer science.
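The selection step of the PCFG method described above (score each candidate parse by its rule probabilities, keep the most probable one) can be sketched as follows. This is a toy illustration, not the implementation of any real parser: the grammar, the rule probabilities, and the function names are all hypothetical, lexical probabilities are ignored, and the two candidate trees are written by hand rather than produced by a parsing algorithm.

```python
# Hypothetical rule probabilities for a toy grammar (not from any real parser).
RULE_PROB = {
    ("VP", ("VB", "NP")): 0.7,
    ("VP", ("VB", "NP", "PP")): 0.3,
    ("NP", ("DT", "NN")): 0.8,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("IN", "NP")): 1.0,
}


def tree_probability(tree):
    """Product of the probabilities of all phrase-level rules used in `tree`.

    A tree is a tuple (label, child, ...); a preterminal is (tag, "word").
    Lexical probabilities are ignored in this sketch (treated as 1.0).
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return 1.0
    rhs = tuple(child[0] for child in children)
    p = RULE_PROB.get((label, rhs), 0.0)  # unknown rule: probability 0
    for child in children:
        p *= tree_probability(child)
    return p


def best_parse(candidates):
    """Return the candidate tree with the highest probability."""
    return max(candidates, key=tree_probability)


the_man = ("NP", ("DT", "the"), ("NN", "man"))
with_pp = ("PP", ("IN", "with"), ("NP", ("DT", "the"), ("NN", "telescope")))

# "saw the man with the telescope": PP attached to the verb phrase ...
vp_attach = ("VP", ("VB", "saw"), the_man, with_pp)
# ... or PP attached to the noun phrase.
np_attach = ("VP", ("VB", "saw"), ("NP", the_man, with_pp))

chosen = best_parse([np_attach, vp_attach])
print(chosen is vp_attach)
```

With these made-up numbers the verb-phrase attachment wins; different probabilities would flip the choice, which is why a purely probabilistic selection can settle on a structure that violates a hard syntactic rule.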
The Berkeley Parser and the Stanford Parser are two internationally leading natural language syntactic parsers recognized by the computer science community today. Both use the lexicalized PCFG method (lexicalized probabilistic context-free grammars). In addition to the parse produced with the lexicalized PCFG method, the Stanford Parser also outputs a parse produced with the dependency parsing method.
However, the Berkeley Parser and the Stanford Parser still have some serious technical flaws.
Special notes:
<1> The erroneous parsing results of the Stanford Parser mentioned below include both the results it produced with the lexicalized PCFG method and the results it produced with the dependency parsing method; that is, the results from both methods are wrong.
<2> The assessment and commentary below on existing computer syntactic parsing technology concerns only the PCFG method, not the dependency parsing method.
The first type of technical flaw:
Since January 2014, the inventor of this patent application has observed over a long period the parsing results of the online demos of the Berkeley Parser and the Stanford Parser, and found that the results of these two parsers for the English sentence "That men who were appointed didn't bother the liberals wasn't remarked upon by the press." were wrong throughout the period from January 2014 to the filing date of this patent application, March 22, 2019! This sentence is given by the linguist David R. Dowty in a linguistics monograph he wrote. It contains no grammatical or logical errors and fully conforms to the conventions of written English. The Berkeley Parser and the Stanford Parser give exactly the same result, as follows: [see Figure 1]
① That men didn't bother;
② who were appointed;
③ the liberals wasn't remarked upon by the press.
Here, ① is the main clause, i.e., the core clause of the whole sentence; ③ is the object of ①, i.e., an object clause; ② is an attributive clause modifying men; and That is a determiner modifying men. In this result, having That modify men is wrong: as a determiner, that cannot modify a plural noun. The reading "the liberals wasn't remarked" is also wrong: the subject and the predicate do not agree in number.
The correct result for this sentence should be: wasn't remarked upon by the press is the core clause of the whole sentence, i.e., the core subject-predicate pairing; That men didn't bother the liberals is the subject of the core clause, i.e., a subject clause; who were appointed is an attributive clause modifying men. That in this sentence should be parsed as the subordinating conjunction introducing the subject clause. In English, unless the subject clause is enclosed in quotation marks, the subordinating conjunction that introducing a subject clause cannot be omitted, even in spoken language.
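The two errors identified above (a singular-only determiner attached to a plural noun, and a plural subject paired with a singular verb form) are the kind of violation a rule check can detect mechanically. The following is a minimal sketch under stated assumptions: the tiny word sets, the dictionary layout of an analysis, and the function name are all hypothetical and cover only this one sentence; they are not the checking procedure of this application.

```python
# Hypothetical mini-lexicons, just large enough for the example sentence.
SINGULAR_ONLY_DETERMINERS = {"that", "this", "a", "an"}
PLURAL_NOUNS = {"men", "liberals"}
SINGULAR_VERB_FORMS = {"was", "wasn't", "is", "isn't"}


def rule_violations(analysis):
    """Return readable rule violations found in one candidate analysis.

    `analysis` is a dict with two illustrative keys:
      "determiner_of": list of (determiner, noun) pairs
      "subject_of":    list of (subject head, finite verb) pairs
    """
    problems = []
    for det, noun in analysis["determiner_of"]:
        if det.lower() in SINGULAR_ONLY_DETERMINERS and noun.lower() in PLURAL_NOUNS:
            problems.append(f"determiner '{det}' cannot modify plural noun '{noun}'")
    for subject, verb in analysis["subject_of"]:
        if subject.lower() in PLURAL_NOUNS and verb.lower() in SINGULAR_VERB_FORMS:
            problems.append(f"plural subject '{subject}' does not agree with '{verb}'")
    return problems


# The analysis reported for the two parsers: That modifies men, and
# "the liberals" is the subject of "wasn't remarked upon".
parser_output = {
    "determiner_of": [("That", "men")],
    "subject_of": [("men", "didn't"), ("liberals", "wasn't")],
}
# The correct analysis: That is a conjunction (no determiner link), and the
# whole subject clause, which counts as singular, is the subject of "wasn't".
corrected = {
    "determiner_of": [],
    "subject_of": [("men", "didn't"), ("<subject clause>", "wasn't")],
}

print(rule_violations(parser_output))
print(rule_violations(corrected))
```

The first call reports both violations and the second reports none, which matches the reasoning in the text: the reported analysis can be rejected on syntactic grounds alone, without any probability estimate.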
Furthermore, as of the filing date of this patent application, March 22, 2019, the online parsing results of the Berkeley Parser and the Stanford Parser for the English sentence "That something you learned is wrong is known to the public." were also wrong! This sentence contains no grammatical or logical errors and fully conforms to the conventions of written English. The Berkeley Parser and the Stanford Parser give exactly the same result, as follows: [see Figure 2]
① That something is known to the public;
② you learned is wrong.
Here, ① is the core clause of the whole sentence, i.e., the core subject-predicate pairing; ② is an attributive clause modifying the indefinite pronoun something; and That is a determiner modifying something. In this result, having That modify something is wrong: as an indefinite pronoun, something cannot be modified by any determiner, and so cannot be modified by the determiner that. Moreover, learned and is wrong cannot be placed under the same verb phrase; they are two different predicates belonging to two different clauses.
The correct result for this sentence should be: is known to the public is the core clause of the whole sentence, i.e., the core subject-predicate pairing; That something is wrong is the subject of the core clause, i.e., a subject clause; That is the subordinating conjunction introducing the subject clause; you learned is an attributive clause modifying something.
The two sentences above share the following syntactic feature: each contains a subject clause introduced by the subordinating conjunction that, and each contains an attributive clause that can be regarded as inserted as a whole into that subject clause. From the perspective of English linguistics, English sentences with this syntactic feature are often parsed by the Berkeley Parser and the Stanford Parser with seriously wrong results!
In the subsequent worked-examples part, the inventor gives the following mathematical model, denoted the Q model. Both of the aforementioned sentences conform to the Q model. The specific meaning of the Q model is explained in the subsequent worked examples.
Let S be an English sentence in which at least the following three subject-predicate pairings exist (each represented by a 6-ary function):
f(c_1, l_1, x_1, r_1, y_1, z_1);
g(c_2, l_2, x_2, r_2, y_2, z_2);
h(c_3, l_3, x_3, r_3, y_3, z_3).
Note: the subscripts 1, 2, and 3 on the arguments serve only to distinguish them from one another and do not imply any actual order.
f, g, and h satisfy the following three conditions:
① l_2 = that;
② f(c_1, l_1, g(c_2, l_2, x_2, r_2, y_2, z_2), r_1, y_1, z_1);
③ g[h(c_3, l_3, x_3, r_3, y_3, z_3)].
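The three conditions can be made concrete with a small data-structure sketch. The reading used here is an assumption inferred from the conditions themselves and from the surrounding text: l is taken to be the word introducing the clause (condition ①), x the subject slot (condition ② places g there), r the predicate, and g[h(...)] is taken to mean that h is inserted as a whole into g. The slots c, y, and z are left unset because their meaning is not given in this passage. The class, the `inserted` field, and the function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class PredicateVector:
    """One subject-predicate pairing, mirroring the 6-ary form (c, l, x, r, y, z)."""
    c: Any = None
    l: Optional[str] = None   # assumed: word introducing the clause
    x: Any = None             # assumed: subject slot (a word or another vector)
    r: Optional[str] = None   # assumed: predicate verb unit
    y: Any = None
    z: Any = None
    # Assumed encoding of g[h(...)]: vectors inserted as a whole into this one.
    inserted: List["PredicateVector"] = field(default_factory=list)


def matches_q_model(f: PredicateVector, g: PredicateVector, h: PredicateVector) -> bool:
    """Check conditions 1-3 of the Q model under the assumptions stated above."""
    led_by_that = g.l is not None and g.l.lower() == "that"   # condition 1
    g_is_subject_of_f = f.x is g                              # condition 2
    h_inserted_in_g = any(v is h for v in g.inserted)         # condition 3
    return led_by_that and g_is_subject_of_f and h_inserted_in_g


# "That something you learned is wrong is known to the public."
h = PredicateVector(x="you", r="learned")
g = PredicateVector(l="That", x="something", r="is", inserted=[h])
f = PredicateVector(x=g, r="is known")

print(matches_q_model(f, g, h))  # True
```

Identity comparison (`is`) is used instead of equality so that two clauses with identical slot values are not confused with one another.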
There are many example sentences for which the Berkeley Parser and the Stanford Parser produce wrong parsing results. Owing to the limited length of this patent application, the inventor cannot list them all; only some are listed below:
(1) That men who were appointed didn't bother the liberals wasn't remarked upon by the press.
(2) That something you learned is wrong is known to the public.
(3) That something you learned is now outdated is known to the public.
(4) That men didn't bother the liberals wasn't remarked upon by the press.
(5) That men didn't bother the liberal wasn't remarked upon by the press.
(6) That men who were appointed bothered the liberals wasn't remarked upon by the press.
(7) That men who were appointed didn't bother the liberal wasn't remarked upon by the press.
(8) That men who were appointed didn't bother the liberals was remarked upon by the press.
(9) That officials who were appointed didn't bother the liberals wasn't remarked upon by the press.
(10) That officials who were appointed didn't bother the liberals was remarked upon by the press.
(11) That men didn't think the liberals bothered the students wasn't remarked upon by the press.
(12) That men didn't think the liberal bothered the students wasn't remarked upon by the press.
(13) That men didn't think the liberals bothered the students was remarked upon by the press.
(14) That men didn't think the liberals bothered the students who studied hard wasn't remarked upon by the press.
(15) That men thought the liberals bothered the students wasn't remarked upon by the press.
(16) That men thought the liberals bothered the students was remarked upon by the press.
(17) That officials didn't think the liberals bothered the students wasn't remarked upon by the press.
(18) That officials didn't think the students bothered the liberals wasn't remarked upon by the press.
(19) That officials thought the liberals bothered the students who studied hard wasn't remarked upon by the press.
(20) That men thought the liberals didn't bother the musicians who worked hard was remarked upon by the press.
(21) That men thought the liberals didn't bother the diplomats who worked hard was remarked upon by the press.
(22) That boys thought the liberals didn't bother the musicians who worked hard was remarked upon by the press.
(23) That girls thought the liberals didn't bother the musicians who worked hard was remarked upon by the press.
(24) That men didn't bother the boys who studied hard wasn't remarked upon by the press.
(25) That men didn't bother the boys who studied hard was remarked upon by the press.
(26) That men didn't bother the students who studied hard wasn't remarked upon by the press.
(27) That men didn't bother the students who studied hard was remarked upon by the press.
(28) That men bothered the officials who were appointed wasn't remarked upon by the press.
(29) That men bothered the officials who were appointed was remarked upon by the press.
(30) That food which the company provided to the school attracted the attention of the public wasn't remarked upon by the press.
(31) That money which the company provided to the school attracted the attention of the public wasn't remarked upon by the press.
(32) That Jobs which the company provided to the college attracted the attention of the public wasn't remarked upon by the press.
(33) That food which the company provided to the school attracted the attention of the public was remarked upon by the press.
(34) That money which the company provided to the school attracted the attention of the public was remarked upon by the press.
(35) That Jobs which the company provided to the college attracted the attention of the public was remarked upon by the press.
(36) That something you learned about America's ancient history is wrong is likely.
(37) That something about America's ancient history is wrong is likely.
(38) That something Tom learned about America's ancient history is wrong is known to his classmates.
(39) That nuclear war would be madness does not mean that it will not occur.
(40) That nearly all behavior is learned behavior is a basic assumption that has been put forward by the social scientists.
(41),I don't know whether that girls are well protected represents something good.(41), I don't know whether that girls are well protected represents something good.
(42),I don't know whether that girls are well protected represents good manners.(42), I don't know whether that girls are well protected represent good men.
(43),I can understand what that food should be conserved indicates.(43), I can understand what that food should be conservative indicators.
(44),I can understand what that water should be conserved indicates.(44), I can understand what that water should be conserved indicators.
(45),That what you learned is wrong is known to the public.(45), That what you learned is wrong is known to the public.
(46),That what you learned is now outdated is known to the public.(46), That what you learned is now outdated is known to the public.
(47),What that women are amicably treated indicates is not clear.(47), What that women are amicably treated indicators is not clear.
(48),That what made the students happy didn't bother the teachers wasn't remarked upon by the press.(48) That what made the students happy didn't both the teachers wasn't remarked up by the press.
(49),That what made the students happy bothered the teachers wasn't remarked upon by the press.(49) That what made the students happy othered the teachers wasn't remarked up by the press.
(50),That what made the students happy bothered the teachers was remarked upon by the press.(50) That what made the students happy othered the teachers was remarked up by the press.
As of the filing date of this patent application, March 22, 2019, the syntactic analyses that Berkeley Parser and Stanford Parser produce for the above sentences are still wrong. None of the sentences contains a grammatical or logical error, and all of them conform to the conventions of written English. Each sentence contains a subject clause introduced by "that", in which "that" is a subordinating conjunction (part-of-speech tag IN); Berkeley Parser and Stanford Parser nevertheless misread every such "that" as a determiner (part-of-speech tag DT). From the standpoint of English linguistics, subordinating conjunctions and determiners are two parts of speech with entirely different syntactic functions, so this is a fairly serious error. The analyses of the above sentences contain many other errors as well, which are not listed one by one. Some of these sentences are used again in the worked-examples part of this patent application.
In addition, consider two very difficult sentences, shown below. Both are given by the linguist David R. Dowty in a linguistics monograph he wrote:
(51) That that men were appointed didn't bother the liberals wasn't remarked upon by the press.
(52) That that that men were appointed didn't bother the liberals wasn't remarked upon by the press upset many women.
Neither sentence contains a grammatical or logical error, and both conform to the conventions of written English. Both contain subject clauses introduced by "that", in which each "that" is a subordinating conjunction (part-of-speech tag IN); Berkeley Parser and Stanford Parser nevertheless both make serious errors in analyzing the "that" tokens in these two sentences. The two sentences are used again in the worked-examples part of this patent application. Note in particular that all of sentences (1) to (52) above can be parsed correctly using the scheme of this patent application.
The inventor of this patent application once compared a syntactic parser developed in China with Berkeley Parser and Stanford Parser. That parser uses the lexicalized PCFG method, rests on the same technical principles as Berkeley Parser and Stanford Parser, and produces very similar parsing results. Using it, the inventor ran the following parsing experiment. For the example sentence "That men who were appointed didn't bother the liberals wasn't remarked upon by the press.", the part-of-speech tagging was fixed as That/IN men/NNS who/WP were/VBD appointed/VBN did/VBD n't/RB bother/VB the/DT liberals/NNS was/VBD n't/RB remarked/VBN upon/RP by/IN the/DT press/NN ./., a tagging that can be considered correct in English linguistics. The parser was asked to return the 1,000 most probable parses, sorted in descending order of probability. The highest-ranked parse that can be considered linguistically correct was ranked 74th; every parse ranked above it was incorrect. For the same example sentence, the tagging was then fixed as That/IN men/NNS who/WP were/VBD appointed/VBN did/VBD n't/RB bother/VB the/DT liberals/NNS was/VBD n't/RB remarked/VBN upon/IN by/IN the/DT press/NN ./., which can also be considered a correct tagging in English linguistics. The parser was again asked for the 1,000 most probable parses in descending order of probability. This time the highest-ranked linguistically correct parse was ranked 52nd; every parse ranked above it was incorrect.
As another example, for the sentence "That something you learned is wrong is known to the public.", the part-of-speech tagging was fixed as That/IN something/NN you/PRP learned/VBD is/VBZ wrong/JJ is/VBZ known/VBN to/TO the/DT public/NN ./., a tagging that can be considered correct in English linguistics. The parser was asked for the 1,000 most probable parses, sorted in descending order of probability. The highest-ranked parse that can be considered linguistically correct was ranked 52nd; every parse ranked above it was incorrect.
Thus, with the parser developed in China, even when the correct part-of-speech tagging is imposed, the correct parse of each of the two example sentences ranks very low by probability, below 50th place. The inventor of this patent application has run many experiments on sentences whose syntactic structure resembles that of the two example sentences, and the outcome is similar: the correct parse is often one that ranks very low by probability.
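The measurement used in these experiments, the rank of the first acceptable parse in an n-best list, can be sketched as follows. This is only an illustrative sketch: the function name `rank_of_first_correct`, the placeholder parse labels, and the correctness check are assumptions introduced here, and no real parser API is called.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def rank_of_first_correct(
    nbest: Sequence[T], is_correct: Callable[[T], bool]
) -> Optional[int]:
    """Return the 1-based rank of the first acceptable parse, or None.

    `nbest` is assumed to be sorted by descending parser probability.
    """
    for rank, parse in enumerate(nbest, start=1):
        if is_correct(parse):
            return rank
    return None


# Hypothetical stand-in for a parser's 1,000-best output; a real experiment
# would hold parse trees here and compare each against a reference tree.
nbest = [f"parse_{i}" for i in range(1, 1001)]
reference = "parse_74"

print(rank_of_first_correct(nbest, lambda p: p == reference))
```

A rank of 1 would mean the parser's top answer is acceptable; the ranks reported above (74th and 52nd) mean that dozens of incorrect analyses outscore the correct one.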
Based on this comparative study, the inventor of this patent application has reason to believe that if Berkeley Parser and Stanford Parser were run on the two example sentences with the correct part-of-speech tags given above, the results would resemble those obtained with the parser developed in China; that is, the correct parse would rank fairly low by probability. Within the existing theoretical and technical framework, it is very hard to obtain the correct parse of these two sentences by making small adjustments to the statistical model and its parameters. Large adjustments, on the other hand, would sacrifice much of the parser's current good performance: after such an adjustment, the parser might well misparse sentences that it currently parses correctly, or produce no output for sentences that it can currently handle.
In summary, the inventor of this patent application believes that the first type of technical flaw described above is very likely a blind spot of Berkeley Parser and Stanford Parser, and also a theoretical and technical bottleneck of current PCFG methods (including lexicalized PCFG methods). Within the existing theoretical and technical framework of PCFG methods, this bottleneck is very hard to overcome completely. Consider the following: if a series of sentences with the features described above, such as a subject clause introduced by "that", were collected into a specialized corpus, and every sentence in that corpus were analyzed with a parser built on a PCFG method, for example Berkeley Parser or Stanford Parser, the recall would certainly be very low.
The second type of technical flaw: consider the following sentences:
(1) Jack met the patient the nurse the clinic had hired sent to the doctor.
(2) This is the malt the rat the cat the dog worried killed ate.
(3) Jack met the boy the nurse the doctor the clinic had hired sent to the ward introduced to the patient.
(4) Jack met the boy the patient introduced to the nurse the doctor the clinic had hired sent to the ward.
(5) Jack met the boy the patient took to the ward the doctor the clinic had hired sent the nurse to.
(6) Jack ate the food the patient the nurse the clinic had hired sent to the doctor took to the ward.
(7) Jack ate the food the patient took to the nurse the doctor the clinic had hired sent to the ward.
(8) Jack ate the food the patient took to the ward the doctor the clinic had hired sent the nurse to.
(9) That men the nurse the doctor the clinic had hired sent to the ward introduced to the cleaners didn't bother the patients wasn't remarked upon by the press.
(10) That men the cleaner introduced to the nurses the doctor the clinic had hired sent to the ward didn't bother the patients wasn't remarked upon by the press.
(11) That men the nurse the doctor sent to the ward introduced to the cleaners didn't bother the patients wasn't remarked upon by the press.
(12) That men the cleaner introduced to the nurses the doctor sent to the ward didn't bother the patients wasn't remarked upon by the press.
Sentence (1) above is given by the linguist David R. Dowty in a linguistics monograph he wrote; sentence (2) was distilled by linguists from an English poem. None of the 12 sentences contains a grammatical or logical error. All 12 sentences omit a clause introducer. In English, clause introducers cannot be omitted at will; an omission must satisfy the requirements of the grammar, and the omissions in these 12 sentences all do. As of the filing date of this patent application, March 22, 2019, the syntactic analyses that Berkeley Parser and Stanford Parser produce for these 12 sentences are still wrong.
All 12 sentences display deep recursive nesting and make flexible use of the English syntactic rules that permit a clause introducer to be omitted. On top of that, sentences (9) to (12) also incorporate the "that"-introduced subject clause discussed earlier, and all four conform to the Q model mentioned earlier. Admittedly, the syntactic structures in these sentences are very hard to analyze, and it would be unreasonable to demand that computers match human intelligence at the present stage; the problem nevertheless exists objectively. Many similar sentences exist and are not listed one by one. Note in particular that all of sentences (1) to (12) above can be parsed correctly using the scheme of this patent application.
Based on a large number of comparative experiments, the inventor of this patent application believes that, after the first type, the second type of technical flaw is very likely another blind spot of Berkeley Parser and Stanford Parser, and another theoretical and technical bottleneck of current PCFG methods (including lexicalized PCFG methods). Within the existing theoretical and technical framework of PCFG methods, this bottleneck is likewise very hard to overcome completely. For reasons of space, it is not discussed further here.
The third type of technical flaw: consider the following sentences:
① Part of the reason why Charles Dickens loved his own novel was that it was rather closely modeled on his own life.
② Part of the reason Charles Dickens loved his own novel was that it was rather closely modeled on his own life.
As of the filing date of this patent application, March 22, 2019, Berkeley Parser and Stanford Parser both parse sentence ① correctly, and both parse sentence ② incorrectly.
The two sentences share the same basic syntactic framework. The only difference is that sentence ① keeps the relative adverb "why" that introduces the attributive clause, whereas sentence ② omits it, and this omission fully conforms to the syntactic rules of English. For these two sentences with equivalent basic frameworks, Berkeley Parser and Stanford Parser produce two entirely different parses. This shows that Berkeley Parser, Stanford Parser, and the PCFG method on which they are based (including the lexicalized PCFG method) do not effectively distinguish primary from secondary relationships among the constituents within a sentence. That is why a minor variation of a roughly equivalent syntactic structure can cause a parsing failure.
Over long-term observation, the inventor of this patent application has often encountered similarly unstable parsing results. Sometimes changing even a single simple adverb that is syntactically insignificant in the original sentence causes the parses produced by Berkeley Parser and Stanford Parser before and after the change to differ greatly. Many similar sentences exist and are not listed one by one. Note in particular that both sentences above can be parsed correctly using the scheme of this patent application.
Reflection on and summary of the three types of technical flaw described above:
The inventor of this patent application believes that the three types of technical flaw described above are serious latent defects of Berkeley Parser and Stanford Parser, and that they also expose serious theoretical shortcomings of the PCFG method (including the lexicalized PCFG method). The causes are very likely the following:
[1] The randomness of a corpus conflicts with certain basic syntactic functions and definitions inherent in natural language itself.
Statistically, in any English corpus the probability that a clause serves as the subject of a sentence is usually far lower than the probability that a noun does. From the standpoint of the language itself, however, the ability of a clause to serve as a subject and the ability of a noun to serve as a subject are both basic syntactic functions defined by English, and both are possibilities defined by the language; in linguistic theory the two therefore stand on an equal footing. Further, statistically, in any English corpus the probability that a subject clause introduced by "that" serves as the subject of a sentence is usually far lower than the probability that a noun does. From the standpoint of the language itself, however, the ability of such a clause to serve as a subject also derives from a basic syntactic function defined by English. The difference between these two possibilities in linguistic theory is therefore far smaller than the difference in probability reflected in an English corpus. This is the source of the conflict between the randomness of a corpus and the basic syntactic functions and definitions inherent in natural language.
[2] The PCFG method (including the lexicalized PCFG method) deals inadequately with certain important structural features of natural language.
Distinguishing the core constituents of a sentence from its modifiers, distinguishing primary structures from secondary ones, and characterizing highly discontinuous long-distance dependencies and deep recursive nesting are all important issues that bear on the internal structural features of natural language. The PCFG method (including the lexicalized PCFG method), which is the technical basis of Berkeley Parser and Stanford Parser, addresses these issues inadequately and leaves some of them uncovered.
[3] Lexical analysis and syntactic analysis of natural language should constrain each other, but in practical natural language processing (NLP) engineering they are split into two independent stages.
Lexical analysis, syntactic analysis, and semantic analysis inform and constrain one another. In practical NLP engineering, however, the three are usually carried out independently, and lexical analysis is completed on its own without relying on syntactic analysis. This arrangement is motivated mainly by concerns about computational complexity and model complexity. It is nevertheless likely to harm the accuracy of the syntactic analysis seriously: if the computer makes an error during lexical analysis, that error cannot be corrected or constrained in any later step of the syntactic analysis, which degrades the accuracy of the result.
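Cause [1] above can be illustrated with a toy calculation. The sketch below is not the model used by Berkeley Parser or Stanford Parser; the rule probabilities are invented purely for illustration, and the two rule sequences are hypothetical partial derivations for the opening words "That men ..." of a sentence. It only shows how a PCFG, which multiplies rule probabilities estimated from corpus frequencies, penalizes a clausal subject once the rule S → SBAR VP is rare in the training corpus.

```python
import math

# Hypothetical rule probabilities, invented purely for illustration.
RULE_PROB = {
    ("S", ("NP", "VP")): 0.90,    # noun-phrase subject: frequent in a corpus
    ("S", ("SBAR", "VP")): 0.01,  # clausal subject: rare in a corpus
    ("SBAR", ("IN", "S")): 0.40,
    ("NP", ("DT", "NNS")): 0.20,
    ("NP", ("NNS",)): 0.10,
}


def log_prob(rules):
    """Sum of log rule probabilities, i.e. the log of their product."""
    return sum(math.log(RULE_PROB[rule]) for rule in rules)


# Reading A: "that" tagged DT, so "That men" is an ordinary noun-phrase subject.
that_as_determiner = [
    ("S", ("NP", "VP")),
    ("NP", ("DT", "NNS")),
]

# Reading B: "that" tagged IN, introducing a subject clause whose own
# subject is the bare plural noun "men".
that_as_conjunction = [
    ("S", ("SBAR", "VP")),
    ("SBAR", ("IN", "S")),
    ("S", ("NP", "VP")),
    ("NP", ("NNS",)),
]

ratio = math.exp(log_prob(that_as_determiner) - log_prob(that_as_conjunction))
print(f"Reading A outscores reading B by a factor of about {ratio:.0f}")
```

Under these assumed numbers the determiner reading wins by a wide margin even though it is linguistically wrong for the example sentences, which matches the behavior the inventor reports.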
Summary of the Invention
In view of this, the object of the present invention is to provide a method for syntactic parsing of natural language, comprising:
S1. Reading the data structure of the sentence to be parsed, and performing preprocessing operations on that data structure;
S2. For each word list (i), reading the preprocessed data structure of the sentence to be parsed: if the sentence contains a predicate verb unit, generating a word list (ii); if the sentence contains no predicate verb unit, analyzing the sentence instead with a method that combines probability with syntactic rules or with a dependency analysis method, taking the result of that analysis as the computer's final analysis result, and then clearing the corresponding word list (i) without generating a word list (ii);
S3. For each predicate element, generating a corresponding predicate vector; the predicate vector comprises a coordinate introductory element, a subordinate introductory element, a subject element, a predicate element, a first-position object element, and a second-position object element;
wherein the predicate element is the corresponding predicate verb unit, or the corresponding combination unit of adjacent coordinated predicate verbs; the predicate element number is the number of the corresponding predicate verb unit, or the number of the corresponding combination unit of adjacent coordinated predicate verbs;
wherein the possible values of the coordinate introductory element are one of the sentence-linking coordinating connective units whose number is smaller than the corresponding predicate element number, or the empty unit; a coordinating connective unit that does not link sentences cannot be a possible value of the coordinate introductory element;
wherein the possible values of the subordinate introductory element are one of the subordinating connective units whose number is smaller than the corresponding predicate element number, or one of the combination units of adjacent coordinated subordinating connectives whose number is smaller than the corresponding predicate element number, or one of the interrogative word units whose number is smaller than the corresponding predicate element number, or one of the combination units of adjacent coordinated interrogative words whose number is smaller than the corresponding predicate element number, or the empty unit;
wherein the possible values of the subject element are one of the basic noun units whose number is smaller than the corresponding predicate element number, or one of the combination units of adjacent coordinated basic nouns whose number is smaller than the corresponding predicate element number, or one of the infinitive vectors corresponding to infinitive elements whose number is smaller than the corresponding predicate element number, or one of the gerund-present participle vectors corresponding to gerund-present participle elements whose number is smaller than the corresponding predicate element number, or one of the predicate vectors corresponding to predicate elements whose number is smaller than the corresponding predicate element number, or the empty unit;
wherein the possible values of the first-position object element are one of the basic noun units whose number is greater than the corresponding predicate element number and smaller than the number of the first predicate element appearing after that predicate element, or one of the combination units of adjacent coordinated basic nouns whose number lies in the same range, or one of the infinitive vectors corresponding to infinitive elements whose number lies in the same range, or one of the gerund-present participle vectors corresponding to gerund-present participle elements whose number lies in the same range, or one of the predicate vectors corresponding to predicate elements whose number is greater than the corresponding predicate element number, or the empty unit; a predicative constituent that corresponds to the predicate element and meets the foregoing requirements is also treated as a first-position object element;
Wherein, if the corresponding predicate element is a unit formed by a verb that can take a double object or a verb that can take an object together with an object complement, and the corresponding first-position object element is a basic noun unit or an adjacent coordinated basic noun combination unit, then the possible values of the second-position object element are: one of the basic noun units whose number is greater than the number of the corresponding first-position object element and smaller than the number of the first predicate element appearing after said predicate element; or one of the adjacent coordinated basic noun combination units whose number is greater than the number of the corresponding first-position object element and smaller than the number of the first predicate element appearing after said predicate element; or one of the predicate vectors corresponding to a predicate element whose number is greater than the number of the corresponding predicate element; or the empty unit; if the corresponding predicate element is a unit formed by a verb that can take a double object or a verb that can take an object together with an object complement, and the corresponding first-position object element is neither a basic noun unit nor an adjacent coordinated basic noun combination unit, then the value of the second-position object element is the empty unit; if the corresponding predicate element is a unit formed by a verb that can take neither a double object nor an object together with an object complement, then the value of the second-position object element is the empty unit; wherein the verbs that can take a double object or an object together with an object complement, and the verbs that can take neither, may be compiled and provided in advance by consulting a dictionary or by statistical means; delimiting these two classes of verbs helps to reduce the computational complexity;
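The candidate ranges for the two object positions can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Unit` class, the `kind` labels, the function names and the `two_slot_verbs` set are assumptions introduced here for illustration. Only basic noun units are enumerated; vector-valued candidates (infinitive, gerund-present participle and predicate vectors) are omitted for brevity, and `None` stands in for the empty unit.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Unit:
    number: int  # position number of the unit within the sentence
    kind: str    # hypothetical labels: "noun" or "pred"
    text: str


def next_predicate_number(units: List[Unit], after: int) -> float:
    """Number of the first predicate element after `after`, or infinity if none."""
    later = [u.number for u in units if u.kind == "pred" and u.number > after]
    return min(later) if later else float("inf")


def first_object_candidates(units: List[Unit], pred: Unit) -> List[Optional[Unit]]:
    """Noun units strictly between the predicate and the next predicate, plus None."""
    upper = next_predicate_number(units, pred.number)
    nouns = [u for u in units if u.kind == "noun" and pred.number < u.number < upper]
    return nouns + [None]


def second_object_candidates(
    units: List[Unit],
    pred: Unit,
    first_obj: Optional[Unit],
    two_slot_verbs: Set[str],
) -> List[Optional[Unit]]:
    """Only the empty unit, unless the verb is in the pre-compiled list
    and the first-position object is a noun unit."""
    if pred.text not in two_slot_verbs or first_obj is None or first_obj.kind != "noun":
        return [None]
    upper = next_predicate_number(units, pred.number)
    nouns = [u for u in units if u.kind == "noun" and first_obj.number < u.number < upper]
    return nouns + [None]
```

Restricting the second position to verbs in the pre-compiled list is what keeps the number of candidate predicate vectors small, which matches the stated purpose of reducing computational complexity.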
S4. For each infinitive element, generate a corresponding infinitive vector; for each gerund-present participle element, generate a corresponding gerund-present participle vector; for each past participle element, generate a corresponding past participle vector; for each preposition element, generate a corresponding preposition vector; according to the possible values of the infinitive element, the infinitive first-position object element and the infinitive second-position object element, obtain all possible values of the infinitive vector corresponding to each infinitive element; according to the possible values of the gerund-present participle element, the gerund-present participle first-position object element and the gerund-present participle second-position object element, obtain all possible values of the gerund-present participle vector corresponding to each gerund-present participle element; according to the possible values of the past participle element and the past participle object element, obtain all possible values of the past participle vector corresponding to each past participle element; according to the possible values of the preposition element and the preposition object element, obtain all possible values of the preposition vector corresponding to each preposition element;
Wherein the infinitive vector comprises an infinitive element, an infinitive first-position object element and an infinitive second-position object element;
The infinitive element is the corresponding infinitive verb unit, or the corresponding adjacent coordinated infinitive verb combination unit; the number of the infinitive element is the number of the corresponding infinitive verb unit, or the number of the corresponding adjacent coordinated infinitive verb combination unit;
The possible values of the infinitive first-position object element are: one of the basic noun units whose number is greater than the number of the corresponding infinitive element and smaller than the number of the first predicate element appearing after said infinitive element; or one of the adjacent coordinated basic noun combination units whose number is greater than the number of the corresponding infinitive element and smaller than the number of the first predicate element appearing after said infinitive element; or one of the infinitive vectors corresponding to an infinitive element whose number is greater than the number of the corresponding infinitive element and smaller than the number of the first predicate element appearing after said infinitive element; or one of the gerund-present participle vectors corresponding to a gerund-present participle element whose number is greater than the number of the corresponding infinitive element and smaller than the number of the first predicate element appearing after said infinitive element; or one of the predicate vectors corresponding to a predicate element whose number is greater than the number of the corresponding infinitive element; or the empty unit; a predicative complement of the infinitive element that meets the foregoing requirements is also treated as an infinitive first-position object element;
If the corresponding infinitive element is a unit formed by a verb that can take a double object or a verb that can take an object together with an object complement, and the corresponding infinitive first-position object element is a basic noun unit or an adjacent coordinated basic noun combination unit, then the possible values of the infinitive second-position object element are: one of the basic noun units whose number is greater than the number of the corresponding infinitive first-position object element and smaller than the number of the first predicate element appearing after said infinitive element; or one of the adjacent coordinated basic noun combination units whose number is greater than the number of the corresponding infinitive first-position object element and smaller than the number of the first predicate element appearing after said infinitive element; or one of the predicate vectors corresponding to a predicate element whose number is greater than the number of the corresponding infinitive element; or the empty unit; if the corresponding infinitive element is a unit formed by a verb that can take a double object or a verb that can take an object together with an object complement, and the corresponding infinitive first-position object element is neither a basic noun unit nor an adjacent coordinated basic noun combination unit, then the value of the infinitive second-position object element is the empty unit; if the corresponding infinitive element is a unit formed by a verb that can take neither a double object nor an object together with an object complement, then the value of the infinitive second-position object element is the empty unit; wherein the verbs that can take a double object or an object together with an object complement, and the verbs that can take neither, may be compiled and provided in advance by consulting a dictionary or by statistical means; delimiting these two classes of verbs helps to reduce the computational complexity;
Wherein the gerund-present participle vector comprises a gerund-present participle element, a gerund-present participle first-position object element and a gerund-present participle second-position object element;
The gerund-present participle element is the corresponding gerund-present participle unit, or the corresponding adjacent coordinated gerund-present participle combination unit; the number of the gerund-present participle element is the number of the corresponding gerund-present participle unit, or the number of the corresponding adjacent coordinated gerund-present participle combination unit;
The possible values of the gerund-present participle first-position object element are: one of the basic noun units whose number is greater than the number of the corresponding gerund-present participle element and smaller than the number of the first predicate element appearing after said gerund-present participle element; or one of the adjacent coordinated basic noun combination units whose number is greater than the number of the corresponding gerund-present participle element and smaller than the number of the first predicate element appearing after said gerund-present participle element; or one of the infinitive vectors corresponding to an infinitive element whose number is greater than the number of the corresponding gerund-present participle element and smaller than the number of the first predicate element appearing after said gerund-present participle element; or one of the gerund-present participle vectors corresponding to a gerund-present participle element whose number is greater than the number of the corresponding gerund-present participle element and smaller than the number of the first predicate element appearing after said gerund-present participle element; or one of the predicate vectors corresponding to a predicate element whose number is greater than the number of the corresponding gerund-present participle element; or the empty unit; a predicative complement of the gerund-present participle element that meets the foregoing requirements is also treated as a gerund-present participle first-position object element;
If the corresponding gerund-present participle element is a unit formed by a verb that can take a double object or a verb that can take an object together with an object complement, and the corresponding gerund-present participle first-position object element is a basic noun unit or an adjacent coordinated basic noun combination unit, then the possible values of the gerund-present participle second-position object element are: one of the basic noun units whose number is greater than the number of the corresponding gerund-present participle first-position object element and smaller than the number of the first predicate element appearing after said gerund-present participle element; or one of the adjacent coordinated basic noun combination units whose number is greater than the number of the corresponding gerund-present participle first-position object element and smaller than the number of the first predicate element appearing after said gerund-present participle element; or one of the predicate vectors corresponding to a predicate element whose number is greater than the number of the corresponding gerund-present participle element; or the empty unit; if the corresponding gerund-present participle element is a unit formed by a verb that can take a double object or a verb that can take an object together with an object complement, and the corresponding gerund-present participle first-position object element is neither a basic noun unit nor an adjacent coordinated basic noun combination unit, then the value of the gerund-present participle second-position object element is the empty unit; if the corresponding gerund-present participle element is a unit formed by a verb that can take neither a double object nor an object together with an object complement, then the value of the gerund-present participle second-position object element is the empty unit; wherein the verbs that can take a double object or an object together with an object complement, and the verbs that can take neither, may be compiled and provided in advance by consulting a dictionary or by statistical means; delimiting these two classes of verbs helps to reduce the computational complexity;
Wherein the past participle vector comprises a past participle element and a past participle object element;
The past participle element is the corresponding past participle unit, or the corresponding adjacent coordinated past participle combination unit; the number of the past participle element is the number of the corresponding past participle unit, or the number of the corresponding adjacent coordinated past participle combination unit;
If the corresponding past participle element is a unit formed by a verb that can take a double object or a verb that can take an object together with an object complement, then the possible values of the past participle object element are: one of the basic noun units whose number is greater than the number of the corresponding past participle element and smaller than the number of the first predicate element appearing after said past participle element; or one of the adjacent coordinated basic noun combination units whose number is greater than the number of the corresponding past participle element and smaller than the number of the first predicate element appearing after said past participle element; or one of the predicate vectors corresponding to a predicate element whose number is greater than the number of the corresponding past participle element; or the empty unit; if the corresponding past participle element is a unit formed by a verb that can take neither a double object nor an object together with an object complement, then the value of the past participle object element is the empty unit; wherein the verbs that can take a double object or an object together with an object complement, and the verbs that can take neither, may be compiled and provided in advance by consulting a dictionary or by statistical means; delimiting these two classes of verbs helps to reduce the computational complexity;
Wherein the preposition vector comprises a preposition element and a preposition object element;
The preposition element is the corresponding preposition unit, or the corresponding adjacent coordinated preposition combination unit; the number of the preposition element is the number of the corresponding preposition unit, or the number of the corresponding adjacent coordinated preposition combination unit;
The possible values of the preposition object element are: the first basic noun unit that appears after said preposition element and whose number is greater than the number of the corresponding preposition element; or the first adjacent coordinated basic noun combination unit that appears after said preposition element and whose number is greater than the number of the corresponding preposition element; or the first gerund-present participle vector that appears after said preposition element and whose number is greater than the number of the corresponding preposition element; or the first infinitive vector that appears after said preposition element and whose number is greater than the number of the corresponding preposition element; or the preposition vector corresponding to the preposition element whose number is greater than the number of the corresponding preposition element and immediately follows it in numerical order; or one of the predicate vectors corresponding to a predicate element whose number is greater than the number of the corresponding preposition element; or the empty unit;
S5. The infinitive vectors, gerund-present participle vectors, past participle vectors and preposition vectors are collectively referred to as auxiliary vectors; for each auxiliary vector in the sentence to be parsed, arbitrarily select one of the possible values of that auxiliary vector, thereby obtaining one set of possible values covering all the auxiliary vectors; this set of possible values is regarded as a collection, called an auxiliary system;
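Read this way, the auxiliary systems of S5 are the Cartesian product of the per-vector candidate lists. The following Python sketch shows that enumeration under assumed names: `auxiliary_systems`, the dictionary layout and the tuple encoding of vectors are illustrative choices, not taken from the source.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple


def auxiliary_systems(
    possible: Dict[str, List[Tuple]],
) -> Iterator[Dict[str, Tuple]]:
    """Yield every auxiliary system: one chosen value per auxiliary vector.

    `possible` maps a (hypothetical) auxiliary-vector name to the list of
    possible values obtained for it in S4.
    """
    names = list(possible)
    for choice in product(*(possible[name] for name in names)):
        yield dict(zip(names, choice))
```

The number of auxiliary systems is the product of the candidate counts, which is why the checks in S6 are applied to prune unreasonable systems early.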
S6. Take any given canonical backbone system and pair it with a corresponding auxiliary system; within each auxiliary vector of that auxiliary system, replace every element that is not itself a vector with its corresponding number; after the replacement, check the auxiliary system; if any of the unreasonable situations described below occurs in the auxiliary system, discard the auxiliary system; if none of them occurs, retain the auxiliary system; an auxiliary system that is retained is called a canonical auxiliary system; the predicate vectors mentioned below all refer to the predicate vectors in the given canonical backbone system;
S6.1. If the same number, the same predicate vector, the same infinitive vector, the same gerund-present participle vector or the same preposition vector appears in two different auxiliary vectors, the auxiliary system is unreasonable and is discarded;
S6.2. If the same number, the same predicate vector, the same infinitive vector or the same gerund-present participle vector appears both inside an auxiliary vector and inside a predicate vector, the auxiliary system is unreasonable and is discarded;
S6.3. If two numbers in reversed order appear inside an auxiliary vector, the auxiliary system is unreasonable and is discarded;
S6.4. For every pair of auxiliary vectors between which an element-substitution relationship exists, perform equivalent substitution; if a crossing contradiction arises in the substitution between vectors, the auxiliary system is unreasonable and is discarded; if two numbers in reversed order appear after the equivalent substitution, the auxiliary system is unreasonable and is discarded;
S6.5. For every pair consisting of an auxiliary vector and a predicate vector between which an element-substitution relationship exists, perform equivalent substitution; if a crossing contradiction arises in the substitution between vectors, the auxiliary system is unreasonable and is discarded; if two numbers in reversed order appear after the equivalent substitution, the auxiliary system is unreasonable and is discarded;
S6.6. After the check, restore the state that existed before the check, for use in the subsequent operations;
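The number-level checks of S6.1 to S6.3 can be sketched as follows. This is a simplified illustration under stated assumptions: every vector is a flat tuple of unit numbers with `None` for the empty unit, nested vectors are not modelled, and the substitution checks of S6.4 and S6.5 are left out. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[Optional[int], ...]


def numbers(vec: Vector) -> List[int]:
    """Unit numbers in a vector, skipping empty units (None)."""
    return [n for n in vec if n is not None]


def has_reversed_order(vec: Vector) -> bool:
    """S6.3: two numbers inside one vector appear in reversed order."""
    nums = numbers(vec)
    return any(a > b for a, b in zip(nums, nums[1:]))


def share_number(v1: Vector, v2: Vector) -> bool:
    """True if the same number occurs in both vectors."""
    return bool(set(numbers(v1)) & set(numbers(v2)))


def is_canonical(
    aux_vectors: Sequence[Vector], predicate_vectors: Sequence[Vector]
) -> bool:
    """Retain the auxiliary system only if none of the three checks fails."""
    if any(has_reversed_order(v) for v in aux_vectors):
        return False  # S6.3
    if any(share_number(a, b) for a, b in combinations(aux_vectors, 2)):
        return False  # S6.1
    if any(share_number(a, p) for a in aux_vectors for p in predicate_vectors):
        return False  # S6.2
    return True
```

Because the functions only read their inputs, the state before the check is left untouched, which is the effect S6.6 asks for.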
S7. Generate the residual noun system and the A-B-C joint system;
S7.1. Given any canonical backbone system and a canonical auxiliary system corresponding to it, the basic noun units and adjacent coordinated basic noun combination units that have not entered either of these systems are together regarded as a collection, called a residual noun system; each element of the residual noun system is called a residual noun element; the number of a residual noun element is the number of the basic noun unit or basic noun combination unit to which it corresponds; for each residual noun element, generate a corresponding residual noun vector; a residual noun vector comprises only its residual noun element, that is, residual noun vectors and residual noun elements are in one-to-one correspondence;
S7.2. A canonical backbone system, a canonical auxiliary system and a residual noun system that correspond to one another in the manner described in S7.1 together constitute an A-B-C joint system;
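The residual noun system of S7.1 amounts to a set difference. A minimal Python sketch, assuming flat tuples of unit numbers with `None` for the empty unit and a hypothetical function name:

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Vector = Tuple[Optional[int], ...]


def residual_noun_vectors(
    noun_numbers: Iterable[int],
    backbone: Sequence[Vector],
    auxiliary: Sequence[Vector],
) -> List[Tuple[int]]:
    """One single-element vector per noun unit used by neither system."""
    used = {n for vec in (*backbone, *auxiliary) for n in vec if n is not None}
    return [(n,) for n in sorted(noun_numbers) if n not in used]
```

The backbone vectors, the auxiliary vectors and the returned residual noun vectors together would then form one A-B-C joint system in the sense of S7.2.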
S8. For any given A-B-C joint system, perform whole-vector slot-insertion operations on it; in one whole-vector slot-insertion operation, each slot can receive at most one vector, or may receive none, in which case no insertion is performed; before the whole-vector slot-insertion operations, remove the empty units; in a whole-vector slot-insertion operation, the vector that constructs a slot and receives another vector into that slot is called the receiving vector; the vector that is inserted into the slot of another vector is called the inserted vector;
S8.1. In said A-B-C joint system, every element inside a vector that can be replaced by another vector is replaced by the corresponding vector through equivalent substitution, regardless of whether the corresponding vector is a predicate vector or an auxiliary vector; the equivalent substitution is carried out until all other vectors referenced inside every vector have been substituted; after the equivalent substitution, if a vector has been substituted into another vector, the original position in the A-B-C joint system of the vector that was substituted in is cancelled, so that the two vectors involved are fully merged; through equivalent substitution, all the original vectors of the A-B-C joint system are converted into new vectors among which no element-substitution relationship exists; taking the equivalent substitution as the dividing line, the vectors of the A-B-C joint system before the equivalent substitution are called Type I vectors, and the vectors of the A-B-C joint system after the equivalent substitution are called Type II vectors; evidently, a given Type I vector and a given Type II vector may be the same vector, that is, a vector may remain unchanged by the equivalent substitution;
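The equivalent substitution of S8.1 can be pictured as recursively inlining referenced vectors and then dropping the inlined vectors from the top level. The Python sketch below is an assumption-laden illustration: vectors are stored in a dictionary under hypothetical names, an element is either a unit number (`int`) or the name of another vector (`str`), and the references are assumed to be acyclic, as the crossing-contradiction checks of S6.4 and S6.5 are meant to guarantee.

```python
from typing import Dict, List, Union

Element = Union[int, str]  # a unit number, or the name of another vector


def to_type_ii(type_i: Dict[str, List[Element]]) -> Dict[str, List[int]]:
    """Convert Type I vectors into Type II vectors by equivalent substitution."""

    def expand(name: str) -> List[int]:
        out: List[int] = []
        for element in type_i[name]:
            if isinstance(element, str):
                out.extend(expand(element))  # inline the referenced vector
            else:
                out.append(element)
        return out

    # Vectors substituted into another vector lose their own position.
    substituted = {e for vec in type_i.values() for e in vec if isinstance(e, str)}
    return {name: expand(name) for name in type_i if name not in substituted}
```

A vector that references no other vector and is referenced by none passes through unchanged, which corresponds to the remark that a Type I vector and a Type II vector may be the same vector.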
S8.2. Perform the first round of the whole-vector slot-insertion operation in the A-B-C joint system: arbitrarily select a Type II vector ω as the receiving vector of the first round; label the ordinal value of each element of ω one by one in a predetermined direction; according to the labelled ordinal values, arbitrarily select the i-th element of ω and construct a single slot only on the first side of that element; after the slot has been constructed, arbitrarily select a Type II vector μ other than ω as the inserted vector of the first round; insert μ as a whole into the slot corresponding to the i-th element, thereby generating a new vector, denoted [ω]_i+<μ; the vectors obtained in the A-B-C joint system through whole-vector slot-insertion operations are collectively called Type III vectors; the ordinal values labelled in each round are used only within that round;
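A single round of whole-vector slot insertion can be sketched in Python as below. Two assumptions are made for illustration only: the "first side" of an element is taken to be its left side, and ordinal values are 1-based from the left. The function name `insert_whole` is hypothetical.

```python
from typing import Tuple


def insert_whole(omega: Tuple[int, ...], i: int, mu: Tuple[int, ...]) -> Tuple[int, ...]:
    """Insert `mu` as one block into the slot on the left of the i-th element of `omega`.

    `i` is 1-based, matching the ordinal values labelled on the receiving vector.
    """
    if not 1 <= i <= len(omega):
        raise IndexError("i must refer to an existing element of omega")
    return omega[: i - 1] + mu + omega[i - 1 :]
```

For instance, `insert_whole((1, 2, 5), 3, (3, 4))` places the block `(3, 4)` directly before the third element. Under the left-side assumption no slot exists after the last element, so a vector that belongs at the end has to act as the receiving vector instead.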
S8.3. Perform the second round of the whole-vector slot-insertion operation in the A-B-C joint system: take the Type III vector [ω]_i+<μ as the receiving vector of the second round; in the predetermined direction, label an ordinal value on each element from the first element on the first side of [ω]_i+<μ up to the first element on the second side inside the vector μ contained in [ω]_i+<μ; none of the remaining elements of [ω]_i+<μ is labelled; according to the labelled ordinal values, select the j-th element and construct a single slot only on the first side of that element; after the slot has been constructed, arbitrarily select a Type II vector ξ that has not been used in any previous step as the inserted vector of the second round; insert ξ as a whole into the slot corresponding to the j-th element, thereby generating a new vector, denoted [[ω]_i\μ]_j+<ξ; or
take the Type III vector [ω]_i+<μ as the receiving vector of the second round; label an ordinal value on each element of [ω]_i+<μ in the predetermined direction; according to the labelled ordinal values, arbitrarily select the k-th element of [ω]_i+<μ and construct a single slot only on the first side of that element; after the slot has been constructed, arbitrarily select a Type II vector ξ that has not been used in any previous step as the inserted vector of the second round; insert ξ as a whole into the slot corresponding to the k-th element, thereby generating a new vector, denoted ([ω]_i+<μ)_k+<ξ; when the whole-vector slot-insertion operation is performed by this method, if identical results appear after S8.4 has been completed, the identical results are merged into one result, that is, identical combined vectors are merged into a single combined vector;
S8.4. In said A-B-C joint system, repeat the whole-vector slot-insertion operation given in S8.3 as follows: take the vector newly generated by the previous round as the receiving vector of the new round, and arbitrarily select a Type II vector that has not been used in any previous step as the inserted vector of the new round; repeat the operation until every Type II vector has been inserted into a slot, which is referred to as exhausting all inserted vectors, whereupon a Type III vector is obtained; the Type III vector obtained when all inserted vectors have been exhausted is called a combined vector; S8.3 contains two methods of whole-vector slot insertion, and whichever method is selected must be used consistently in all preceding and subsequent steps; arranging, in order, the Type II vectors used in each round until all inserted vectors have been exhausted yields one insertion scheme for the A-B-C joint system; repeat the operations from S8.2 to S8.4 so as to exhaust the slot corresponding to every element inside every receiving vector in every round of the insertion scheme, that is, to exhaust every combined vector that the insertion scheme can produce;
S8.5. Check the results generated by S8.4: replace the elements with their numbers. If two numbers appear in reversed order within a combined vector, the combined vector is invalid and is discarded; if no numbers appear in reversed order within a combined vector, the combined vector is valid and is retained;
S8.6. After all Type I vectors in the aforementioned A-B-C joint system have been converted into Type II vectors, first replace every Type II vector in the joint system with its corresponding numbers, and then perform the whole-vector gap insertion described above. Following any given insertion scheme of the joint system, in each round construct a gap on the first side of every element of the receiving vector, and then screen for valid gaps: compare the first number at the left or right end of the insertion vector with the number adjacent to the candidate gap on the left or right, and select as valid only those gaps whose greater-than or less-than relation avoids a reversal of number order. Insertion is performed at valid gaps; all other gaps are invalid and receive no insertion. If the receiving vector contains no valid gap, the given insertion scheme is invalid; terminate it and switch to another insertion scheme. With this optimization, the combined vectors obtained can be recorded directly as valid combined vectors, and no check for reversed number order is needed;
S8.7. Using the multiplication principle of combinatorics, exhaust all A-B-C joint systems corresponding to each word list (ii). Further, by permuting all Type II vectors in each A-B-C joint system, exhaust all insertion schemes corresponding to each joint system. Further still, repeat S8.2 through S8.6 for each insertion scheme until all combined vectors corresponding to each insertion scheme are exhausted;
S8.8. Syntactic rule check: using the syntactic rules of the natural language, apply a method combining probability with syntactic rules, or a dependency analysis method, to check each retained valid combined vector and its corresponding A-B-C joint system. This check shall include the rules for event-object verbs and non-event-object verbs. An event-object verb is a verb in the natural language that can take only an event, and not a person or thing, as its object. A non-event-object verb is a verb in the natural language that can take only a person or thing, and not an event, as its object. Event-object verbs and non-event-object verbs can be compiled in advance by dictionary lookup or by statistical means;
S8.9. While S8.8 is executed, perform syntactic structure repair. Syntactic structure repair uses a method combining probability with syntactic rules, or a dependency analysis method, to recover omitted syntactic information and, on that basis, to repair defects in the previously derived syntactic structure. Syntactic structure repair can also be used to distinguish and adjust the primary and secondary status, within the syntactic structure, of each vector in the retained A-B-C joint systems;
S8.10. Remaining noun check: using a method combining probability with syntactic rules, or a dependency analysis method, identify the valid and invalid remaining nouns, and discard every A-B-C joint system that contains an invalid remaining noun;
S9. Take as the standard the basic framework of the syntactic structure of the sentence to be parsed, as described by the A-B-C joint systems retained after S8. Among a sufficient number of complete syntactic structures obtained by analyzing the sentence with a method combining probability with syntactic rules, or with a dependency analysis method, find the most suitable complete syntactic structure that meets this standard;
S10. On the basis of the complete syntactic structures generated by S9, use semantic processing to find the most suitable semantic relation subject to the aforementioned syntactic structure constraints, and take the complete syntactic structure corresponding to that semantic relation as the final parsing result.
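The insertion, checking, and screening of steps S8.3 to S8.6 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes that every vector has already been reduced to a list of word numbers, that the "first side" of an element is its right side, and that both vectors are internally in ascending order. The function names `insert_whole`, `has_order_reversal`, and `valid_gaps` and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def insert_whole(receiver: Sequence[int], k: int, inserted: Sequence[int]) -> List[int]:
    """Insert `inserted` as one block into the gap just after the k-th element (1-based)."""
    if not 1 <= k <= len(receiver):
        raise ValueError("k must index an element of the receiving vector")
    return list(receiver[:k]) + list(inserted) + list(receiver[k:])


def has_order_reversal(numbers: Sequence[int]) -> bool:
    """S8.5-style check: True if any number is directly followed by a smaller one."""
    return any(a > b for a, b in zip(numbers, numbers[1:]))


def valid_gaps(receiver: Sequence[int], inserted: Sequence[int]) -> List[int]:
    """S8.6-style screening: gaps where insertion cannot create a reversal.

    Assumes both vectors are non-empty and already ascending internally,
    so only the two boundary comparisons are needed.
    """
    gaps = []
    for k in range(1, len(receiver) + 1):
        left_ok = receiver[k - 1] < inserted[0]
        right_ok = k == len(receiver) or inserted[-1] < receiver[k]
        if left_ok and right_ok:
            gaps.append(k)
    return gaps


receiver = [1, 2, 7, 8]  # hypothetical word numbers of a receiving vector
inserted = [3, 4, 5]     # hypothetical word numbers of a Type II vector

# Brute force (S8.3 to S8.5): try every gap, then discard reversed results.
candidates = [insert_whole(receiver, k, inserted) for k in range(1, len(receiver) + 1)]
kept = [v for v in candidates if not has_order_reversal(v)]

# Optimized (S8.6): screen the gaps first, then insert only at valid ones.
screened = [insert_whole(receiver, k, inserted) for k in valid_gaps(receiver, inserted)]
```

In this sketch both routes keep the same combined vector, which matches the statement in S8.6 that screening gaps in advance makes the later order check unnecessary.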
Preferably, step S1 includes:
S1.1. Automatically analyze and tag, by computer, the part of speech of each word in the sentence to be parsed, generating the lexical analysis result;
S1.2. Automatically analyze and tag, by computer, the natural-language elements of the sentence to be parsed, such as predicate verbs, base noun phrases, base adjective phrases, and base adverb phrases; likewise automatically analyze and tag adjacent coordinated noun phrases, adjacent coordinated adjective phrases, adjacent coordinated adverb phrases, and similar elements;
S1.3. Merge each group of adjacent coordinated part-of-speech units, and record the merged group as a single unit of the corresponding part of speech;
S1.4. From the linguistic information in the sentence to be parsed described in S1.2 and S1.3, draw up a word list, denoted word list (i). Word list (i) includes the words, the attributes of the words, the positions of the words in the sentence, and the punctuation marks together with their positions in the sentence;
S1.5. Because lexical analysis may yield several different results, use methods of combinatorics to generate multiple different word lists (i) so as to accommodate the various structural ambiguities, and distinguish these word lists by different numbers. In this preprocessing operation the constraints on the lexical analysis result are relaxed: the different lexical analysis results caused by structural ambiguity are preserved in the multiple word lists (i) and left for the subsequent syntactic analysis and semantic processing stages to discriminate and filter. In other words, those later stages constrain the competing lexical analysis results, which increases the likelihood that the correct lexical analysis result is ultimately selected;
S1.6. For each word list (i), use a method combining probability with syntactic rules, or a dependency analysis method, to detect special sentence patterns such as interrogative, elliptical, and inverted sentences, and apply the corresponding morphological processing to their predicates to facilitate the subsequent steps;
S1.7. For each word list (i), remove the impurity components of the sentence to be parsed, such as adverb units, adjective units, adjacent coordinated adverb units, adjacent coordinated adjective units, interjection units, simple parenthetical components not in sentence form, particle units, adjacent coordinated particle units, adjacent coordinated determiner units without structural ambiguity, and mixed modifier units; also remove the minor punctuation marks of the sentence to be parsed, such as the commas on both sides of a simple parenthetical unit not in sentence form.
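Steps S1.5 and S1.7 can be sketched in Python as follows. This is only an illustration under stated assumptions: each word carries a list of candidate tags, the tags are Penn Treebank-style labels chosen for the example, the "methods of combinatorics" are taken to be a Cartesian product over the candidates, and the set `IMPURITY_TAGS` is a hypothetical stand-in for the impurity categories listed in S1.7. None of these names comes from the patent.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Entry = Tuple[int, str, str]  # (position in sentence, word, tag)

# Hypothetical impurity tags: adverb, adjective, interjection, particle.
IMPURITY_TAGS = {"RB", "JJ", "UH", "RP"}


def enumerate_word_lists(
    tokens: Sequence[Tuple[str, Sequence[str]]]
) -> Dict[int, List[Entry]]:
    """Build one numbered word list (i) per combination of candidate tags."""
    tag_choices = [tags for _, tags in tokens]
    word_lists: Dict[int, List[Entry]] = {}
    for list_id, tags in enumerate(product(*tag_choices), start=1):
        word_lists[list_id] = [
            (pos, word, tag)
            for pos, ((word, _), tag) in enumerate(zip(tokens, tags), start=1)
        ]
    return word_lists


def strip_impurities(entries: Sequence[Entry]) -> List[Entry]:
    """Drop impurity entries while keeping the original sentence positions."""
    return [e for e in entries if e[2] not in IMPURITY_TAGS]


# Hypothetical input: two ambiguous words give 2 x 2 = 4 word lists.
tokens = [
    ("That", ["IN", "DT"]),
    ("men", ["NNS"]),
    ("really", ["RB"]),
    ("left", ["VBD", "VBN"]),
]
word_lists = enumerate_word_lists(tokens)
cleaned = strip_impurities(word_lists[1])
```

Keeping every combination, rather than committing to one tagging, mirrors the relaxed constraint described in S1.5: the later syntactic and semantic stages decide which word list survives.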
Preferably, step S2 includes:
S2.1. For each word list (i), read the data structure of the sentence to be parsed after the aforementioned preprocessing; this data structure includes the following information:
(1) coordinating connective units used to connect sentences;
(2) coordinating connective units not used to connect sentences, whose function is to connect the various coordinated components within a sentence;
(3) predicate verb units, subordinating connective units, base noun units, infinitive verb units, gerund-present participle units, past participle units, preposition units, adjacent coordinated predicate verb combination units, adjacent coordinated subordinating connective combination units, adjacent coordinated base noun combination units, adjacent coordinated infinitive verb combination units, adjacent coordinated gerund-present participle combination units, adjacent coordinated past participle combination units, and adjacent coordinated preposition combination units;
(4) interrogative word units, adjacent coordinated interrogative word combination units, and determiner units with structural ambiguity;
(5) parenthetical components that contain a predicate verb unit;
(6) the major punctuation marks;
S2.2. From the sentence data structure of S2.1, generate word list (ii). Word list (ii) includes the aforementioned words, their attributes, the numbers assigned to them in ascending numerical order following the order of the natural-language text, and the major punctuation marks.
Preferably, step S3 includes:
S3.1. From the possible values of the predicate element, coordinating introducer element, subordinating introducer element, subject element, first-position object element, and second-position object element, obtain all possible values of the predicate vector corresponding to each predicate element. The predicate vector comprises a coordinating introducer element, a subordinating introducer element, a subject element, a predicate element, a first-position object element, and a second-position object element;
S3.2. For each predicate vector in the sentence to be parsed, take any one of its possible values, thereby obtaining one set of possible values for all the predicate vectors. Arrange this set in a fixed order to form a matrix of n rows and 6 columns; such a matrix is called a backbone system;
S3.3. In any given backbone system, replace every element of every predicate vector, other than elements that are themselves vectors, with its corresponding number. After the replacement, check the backbone system: if any of the following invalid conditions occurs, discard the backbone system; if none occurs, retain it. A retained backbone system is called a canonical backbone system:
S3.3.1. Check the backbone system against word list (ii): if any coordinating connective unit used to connect sentences, subordinating connective unit, or adjacent coordinated subordinating connective combination unit has not entered the backbone system, the backbone system is invalid and is discarded;
S3.3.2. Check the backbone system: if the same number, the same predicate vector, the same infinitive vector, or the same gerund-present participle vector appears in two different predicate vectors, the backbone system is invalid and is discarded;
S3.3.3. Check the backbone system: if two numbers appear in reversed order within one predicate vector, the backbone system is invalid and is discarded;
S3.3.4. Check the backbone system: for every pair of predicate vectors between which an element substitution relation exists, carry out the substitution. If the substitutions between vectors cross and contradict one another, the backbone system is invalid and is discarded; if two numbers appear in reversed order after the substitution, the backbone system is invalid and is discarded;
S3.3.5. After the checks, restore the backbone system to its state before the checks, for use in the subsequent operations.
Preferably, the checks of S3.3 are performed concurrently with S3.2, so that invalid backbone systems are prevented from being generated.
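The generation of backbone systems in S3.2, together with the checks of S3.3.2 and S3.3.3, can be sketched as follows. The sketch assumes that every element is either a word number or empty (`None`); elements that are themselves vectors, the substitution check of S3.3.4, and the coverage check of S3.3.1 are left out. The names `backbone_systems`, `shares_number`, and `is_internally_ordered` and the candidate values are hypothetical.

```python
from itertools import product
from typing import Iterator, List, Optional, Sequence, Tuple

# One predicate vector as (c, l, x, r, y, z); None marks an empty slot.
PredicateVector = Tuple[Optional[int], ...]


def is_internally_ordered(vec: PredicateVector) -> bool:
    """S3.3.3-style check: numbers inside one vector must ascend."""
    nums = [n for n in vec if n is not None]
    return all(a < b for a, b in zip(nums, nums[1:]))


def shares_number(vectors: Sequence[PredicateVector]) -> bool:
    """S3.3.2-style check: True if a word number is used more than once."""
    seen = set()
    for vec in vectors:
        for n in vec:
            if n is None:
                continue
            if n in seen:
                return True
            seen.add(n)
    return False


def backbone_systems(
    candidates: Sequence[Sequence[PredicateVector]],
) -> Iterator[List[PredicateVector]]:
    """Yield every n-row, 6-column combination that passes both checks."""
    for combo in product(*candidates):
        if shares_number(combo):
            continue
        if not all(is_internally_ordered(v) for v in combo):
            continue
        yield list(combo)


# Hypothetical candidates for a sentence with two predicates (words 2 and 6).
candidates = [
    [(None, None, 1, 2, 3, None), (None, None, 1, 2, None, None)],
    [(None, 4, 5, 6, None, None), (None, 4, 3, 6, None, None)],
]
systems = list(backbone_systems(candidates))
```

The sketch filters each combination after it has been formed. An implementation that follows the preferred variant, in which the checks run concurrently with S3.2, would instead prune a partial combination as soon as one row violates a check.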
Description of the Drawings
The above and other objects, features, and advantages of the present invention will become clearer from the following description of embodiments of the invention with reference to the accompanying drawings, in which:
Figure 1 is a screenshot of the incorrect parse produced by the Berkeley Parser for the example sentence "That men who were appointed didn't bother the liberals wasn't remarked upon by the press.";
Figure 2 is a screenshot of the incorrect parse produced by the Berkeley Parser for the example sentence "That something you learned is wrong is known to the public.";
Figure 3 is a schematic diagram of the first correct parse provided by the present invention for the example sentence "That men who were appointed didn't bother the liberals wasn't remarked upon by the press.";
Figure 4 is a schematic diagram of the second correct parse provided by the present invention for the example sentence "That men who were appointed didn't bother the liberals wasn't remarked upon by the press.";
Figure 5 is a schematic diagram of the correct parse provided by the present invention for the example sentence "That something you learned is wrong is known to the public.";
Figure 6 is a screenshot of the incorrect parse produced by the Berkeley Parser for the example sentence "That that men were appointed didn't bother the liberals wasn't remarked upon by the press.";
Figure 7 is a screenshot of the incorrect parse produced by the Berkeley Parser for the example sentence "That that that men were appointed didn't bother the liberals wasn't remarked upon by the press upset many women.";
Figure 8 is a schematic diagram of the correct parse provided by the present invention for the example sentence "That that men were appointed didn't bother the liberals wasn't remarked upon by the press.";
Figure 9 is a schematic diagram of the correct parse provided by the present invention for the example sentence "That that that men were appointed didn't bother the liberals wasn't remarked upon by the press upset many women.";
Figure 10 is a schematic diagram of the correct parse provided by the present invention for the example sentence "Behaviorists suggest the child who is raised in an environment where there are many stimuli which develop his or her capacity for appropriate responses will experience greater intellectual development.";
Figure 11 is a schematic diagram of the correct parse provided by the present invention for the example sentence "Believing that what he wants will occur, Tom works hard in the company.";
Figure 12 is a screenshot of the incorrect parse produced by the Berkeley Parser for the example sentence "A study of travelers conducted by the website TripAdvisor names Yangshuo as one of the top 10 destinations in the world.";
Figure 13 is a schematic diagram of the correct parse provided by the present invention for the example sentence "A study of travelers conducted by the website TripAdvisor names Yangshuo as one of the top 10 destinations in the world.";
Figure 14 is a screenshot of the incorrect parse produced by the Berkeley Parser for the example sentence "That nearly all behavior is learned behavior is a basic assumption that has been put forward by the social scientists.";
Figure 15 is a schematic diagram of the correct parse provided by the present invention for the example sentence "That nearly all behavior is learned behavior is a basic assumption that has been put forward by the social scientists.";
Figure 16 is a screenshot of the incorrect parse produced by the Berkeley Parser for the example sentence "Jack met the patient the nurse the clinic had hired sent to the doctor.";
Figure 17 is a schematic diagram of the correct parse provided by the present invention for the example sentence "Jack met the patient the nurse the clinic had hired sent to the doctor.";
Figure 18 is a screenshot of the incorrect parse produced by the Berkeley Parser for the example sentence "Jack met the boy the nurse the doctor the clinic had hired sent to the ward introduced to the patient.";
Figure 19 is a schematic diagram of the correct parse provided by the present invention for the example sentence "Jack met the boy the nurse the doctor the clinic had hired sent to the ward introduced to the patient.";
Figure 20 is a screenshot of the incorrect parse produced by the Berkeley Parser for the example sentence "This is the malt the rat the cat the dog worried killed ate.";
Figure 21 is a schematic diagram of the correct parse provided by the present invention for the example sentence "This is the malt the rat the cat the dog worried killed ate.";
Figure 22 is a screenshot of the incorrect parse produced by the Berkeley Parser for the example sentence "Part of the reason Charles Dickens loved his own novel was that it was rather closely modeled on his own life.";
Figure 23 is a schematic diagram of the correct parse provided by the present invention for the example sentence "Part of the reason Charles Dickens loved his own novel was that it was rather closely modeled on his own life.";
Figure 24 is step diagram (1) of the first whole-vector gap insertion method for Example 1;
Figure 25 is step diagram (2) of the first whole-vector gap insertion method for Example 1;
Figure 26 is step diagram (3) of the first whole-vector gap insertion method for Example 1;
Figure 27 is step diagram (4) of the first whole-vector gap insertion method for Example 1;
Figure 28 is a diagram of the basic framework of the syntactic structure described by the A_1-B_1-C_1 joint system of Example 1;
Figure 29 is step diagram (1) of the second whole-vector gap insertion method for Example 1;
Figure 30 is step diagram (2) of the second whole-vector gap insertion method for Example 1;
Figure 31 is a step diagram of the optimized version of the first and second whole-vector gap insertion methods for Example 1;
Figure 32 is a diagram of the basic framework of the syntactic structure described by the A_1-B_1-C_1 joint system of Example 2;
Figure 33 is a diagram of the basic framework of the syntactic structure described by the A_1-B_1-C_1 joint system of Example 3;
Figure 34 is a diagram of the basic framework of the syntactic structure described by the A_1-B_1-C_1 joint system of Example 4;
Figure 35 is a diagram of the five rounds of whole-vector gap insertion of Example 5;
Figure 36 is a diagram of the basic framework of the syntactic structure described by the A_1-B_1-C_1 joint system of Example 6;
Figure 37 is an intuitive diagram of the complete syntactic structure corresponding to the A_a-B_a-C_a joint system of Example 8;
Figure 38 is an intuitive diagram of the complete syntactic structure corresponding to the A_b-B_b-C_b joint system of Example 8;
Figure 39 is a diagram of the semantic relations, constrained by syntactic structure, corresponding to the A_a-B_a-C_a joint system of Example 8;
Figure 40 is a diagram of the semantic relations, constrained by syntactic structure, corresponding to the A_b-B_b-C_b joint system of Example 8;
Figure 41 is a diagram of the whole-vector gap insertion process for the complete syntactic structure corresponding to the A_a-B_a-C_a joint system of Example 9;
Figure 42 is a diagram of the whole-vector gap insertion process for the complete syntactic structure corresponding to the A_1-B_1-C_1 joint system of Example 10;
Figure 43 is a diagram of the whole-vector gap insertion process for the complete syntactic structure corresponding to the A_1-B_1-C_1 joint system of Example 11;
Figure 44 is a diagram of the whole-vector gap insertion process for the complete syntactic structure corresponding to the A_1-B_1-C_1 joint system of Example 17;
Figure 45 is a schematic diagram of the correct parse provided by the present invention for the example sentence "That men the nurse the doctor the clinic had hired sent to the ward introduced to the cleaners didn't bother the patients wasn't remarked upon by the press.";
Figure 46 is a screenshot of the incorrect parse produced by the Berkeley Parser for the example sentence "That men the nurse the doctor the clinic had hired sent to the ward introduced to the cleaners didn't bother the patients wasn't remarked upon by the press.";
Figure 47 is a diagram of the whole-vector gap insertion process for the complete syntactic structure corresponding to the A_1-B_1-C_1 joint system of Example 18;
Figure 48 is a schematic diagram of the correct parse provided by the present invention for the example sentence "That men the cleaner introduced to the nurses the doctor the clinic had hired sent to the ward didn't bother the patients wasn't remarked upon by the press.";
Figure 49 is a screenshot of the incorrect parse produced by the Berkeley Parser for the example sentence "That men the cleaner introduced to the nurses the doctor the clinic had hired sent to the ward didn't bother the patients wasn't remarked upon by the press.";
Figure 50 is a schematic diagram of all the stages and algorithms contained in the second computation region (region β).
Detailed Description: Several important definitions are introduced first; they are used throughout the explanation that follows:
The natural languages addressed in the explanation below include, but are not limited to, English. The internal components of a sentence are divided into four categories: impurity components, backbone components, auxiliary components, and remaining noun components.
Computer syntactic analysis proceeds as follows. First, the impurity components of the sentence to be parsed are removed, such as adverb units, adjective units, interjection units, particle units, mixed modifier units, adjacent coordinated adverb units, and adjacent coordinated adjective units. Second, taking the predicate as the unit, each subject-predicate pairing (simple sentence) in the sentence to be parsed, together with its backbone components, is processed into a predicate vector, and all the predicate vectors together form a matrix of n rows and 6 columns, which serves as the backbone system. Third, each auxiliary component, such as an infinitive structure, a past participle structure, or a preposition structure, is processed into an auxiliary vector, and all the auxiliary vectors together form a set, which serves as the auxiliary system. Finally, the valid pairings are selected from the backbone systems and auxiliary systems that may be produced, giving the canonical backbone system and the canonical auxiliary system; each remaining noun component that cannot enter the canonical backbone system or the canonical auxiliary system is processed into a remaining noun vector, and all the remaining noun vectors together form a set, which serves as the remaining noun system.
Definition 1: Define +< as an ordered addition operation. Let S be an English sentence to be parsed, and let a and b be two different words in S. If (a, b) satisfies +<, then the number of word a in S is less than the number of word b in S; that is, a+<b means that the number of word a in sentence S is less than the number of word b in sentence S.
Definition 2: Let S be an English sentence to be parsed, and let f be any predicate vector in S. Define six variables c, l, x, r, y, z associated with the predicate vector f: c is the coordinating introducer element of f; l is the subordinating introducer element of f; x is the subject element of f; r is the predicate element of f; y is the first-position object element of f; and z is the second-position object element of f. If c, l, x, r, y, z are regarded as six independent variables, then the predicate vector f can be regarded as a function of these six variables. Thus, after the impurity components of f (adverb units, adjacent coordinated adverb units, mixed modifier units, interjection units, particle units, and so on) are removed, a six-variable function expression describing the backbone components of f is obtained: f(c,l,x,r,y,z)=c+<l+<x+<r+<y+<z. Using the notation of set theory, the predicate vector f can also be written as the ordered 6-tuple (c,l,x,r,y,z).
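Definitions 1 and 2 can be made concrete with a small Python sketch. It assumes that each element is represented by its word number in sentence S and that an absent element is `None`; the class name `PredicateVector`, the method `satisfies_order`, and the sample numbers are hypothetical and are not taken from the patent.

```python
from typing import NamedTuple, Optional


class PredicateVector(NamedTuple):
    """Ordered 6-tuple (c, l, x, r, y, z) of word numbers; None means absent."""

    c: Optional[int] = None  # coordinating introducer element
    l: Optional[int] = None  # subordinating introducer element
    x: Optional[int] = None  # subject element
    r: Optional[int] = None  # predicate element
    y: Optional[int] = None  # first-position object element
    z: Optional[int] = None  # second-position object element

    def satisfies_order(self) -> bool:
        """True if the present elements satisfy c +< l +< x +< r +< y +< z."""
        nums = [n for n in self if n is not None]
        return all(a < b for a, b in zip(nums, nums[1:]))


# Hypothetical clause: introducer at word 1, subject 2, predicate 5, object 6.
ordered = PredicateVector(l=1, x=2, r=5, y=6)
reversed_pair = PredicateVector(x=4, r=3)
```

Here `ordered.satisfies_order()` holds because every present element precedes the next one in the sentence, while `reversed_pair` violates x +< r.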
Definition 3: Suppose the sentence S to be parsed has n predicates. By the preceding definitions, each of the n corresponding predicate vectors can be written as a six-variable function, so S can be expressed as a matrix with n rows and 6 columns. If every variable in the matrix is assigned a specific value, that is, if every predicate vector in the matrix is assigned a specific value, then the matrix as a whole takes a set of specific values. Such a set of specific values for the matrix is called a backbone system of S, or an A system, as shown below:
Figure PCTCN2019100638-appb-000001
Definition 4: Define the six kinds of auxiliary vectors in a sentence. In the sentence S to be parsed, denote the infinitive vector by g[To VB](u,v), the gerund-present participle vector by g[VBG](u,v), the past participle vector by g[VBN](u,v), and the preposition vector by g[PREP](u). Multiple auxiliary vectors of the same kind in one sentence are distinguished by numeric labels, for example g[To VB,1](u,v), g[To VB,2](u,v), ...; g[VBG,1](u,v), g[VBG,2](u,v), ...; g[VBN,1](u), g[VBN,2](u), ...; or g[PREP,1](u), g[PREP,2](u), .... The variables u and v of each auxiliary vector denote, respectively, the first-position object element and the second-position object element named after that auxiliary vector, or its single object element.
Special note: all forms belonging to the infinitive category are expressed by g[To VB](u,v), for example the forms written in computational-linguistics notation as To VB, To VB VBN, To VB VBN VBN, To VB VBG, and so on. All forms belonging to the gerund-present participle category are expressed by g[VBG](u,v), for example VBG, VBG VBN, VBG VBN VBN, and so on.
Definition 5: Collect all auxiliary vectors into a set. This set is called the auxiliary system of the sentence S to be parsed, or the B system, as shown below:
Figure PCTCN2019100638-appb-000002
Note: The "numeric labels" in Definitions 3, 4, and 5 serve only to distinguish multiple vectors of the same kind. They are a different concept from the "numbers" used in the present solution and must not be confused with them.
Definition 6: The predicate vectors and auxiliary vectors above, together with the residual noun vectors mentioned in the present solution, are collectively called language vectors. Given any two language vectors α and β, neither of which is a residual noun vector, if β serves within α as the subject element, first-position object element, second-position object element, infinitive first-position object element, infinitive second-position object element, gerund-present participle first-position object element, gerund-present participle second-position object element, past participle object element, or preposition object element of α, then α and β are said to be in a compound relation: α compounds β, or equivalently, β is compounded by α. The compound relation between language vectors is also called the "element substitution relation" in the present solution.
Two special notes: (i) Auxiliary vectors are somewhat special. Usually a predicate vector compounds an auxiliary vector, but sometimes the reverse holds and an auxiliary vector compounds a predicate vector. The present solution handles this case accordingly. (ii) The concept of whole-insertion between language vectors, mentioned below, is as explained in S8 of the present solution.
The following rule is stated using English as an example. Sentences are formed according to this rule: the main part of the syntactic structure of any complex sentence is built on two ways of combining language vectors, compounding and whole-insertion, used in some combination. In terms of probability and statistics, the rule is a certain event that can be verified by corpus statistics: in any sample space of English sentences drawn from well-formed sentences, the probability that a complex sentence conforms to the rule is 1. This rule is the root cause of the long-distance dependency and deep recursive nesting problems common in computer natural language processing, and it is an important starting point from which the present invention solves the technical problem.
This patent application, relying on the relevant natural laws of mathematics and computer science, combines methods such as exhaustive enumeration, permutations and combinations, comparison of natural numbers, elimination of reversed orderings of natural numbers, and probability calculation to build the mathematical model needed to solve the problem.
Worked example:
Example 1: That men who were appointed didn't bother the liberals wasn't remarked upon by the press.
After preprocessing, this example sentence yields word list (i-a) and word list (i-b). Because the word that in the example is structurally ambiguous (it may be either a subordinating connective unit or a determiner unit), two word lists (i) are generated and labeled differently.
When a sentence contains structural ambiguity, multiple word lists (i) must be drawn up for it; the number of lists follows from the number of structural ambiguities by the multiplication principle of combinatorics. The example sentence contains one further structural ambiguity: upon may be either a particle or a preposition. For reasons of space, that ambiguity is not analyzed separately here.
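The multiplication principle for the number of word lists (i) can be sketched as follows. This is only an illustration: the reading labels and the function name enumerate_readings are assumptions, and the text itself expands only the ambiguity of that, producing two lists rather than the four that would result if upon were expanded as well.

```python
from itertools import product

# Hypothetical labels for the two ambiguous words named in the text.
ambiguities = {
    "That": ["subordinating connective", "determiner"],
    "upon": ["particle", "preposition"],
}


def enumerate_readings(ambiguities):
    """Return one dict per combination of readings (multiplication principle)."""
    words = list(ambiguities)
    combos = product(*(ambiguities[w] for w in words))
    return [dict(zip(words, combo)) for combo in combos]


readings = enumerate_readings(ambiguities)
print(len(readings))  # 2 readings x 2 readings = 4 candidate word lists
```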
Word list (i-a):
Figure PCTCN2019100638-appb-000003
Word list (i-b):
Figure PCTCN2019100638-appb-000004
From word lists (i-a) and (i-b), remove the natural-language elements treated as impurities: adjective units, adverb units, adjacent coordinated adjective units, adjacent coordinated adverb units, simple non-clausal parenthetical units, particle units, adjacent coordinated particle units, interjection units, and the like. Then read the preprocessed sentence data structure to be parsed and generate the corresponding word lists (ii-a) and (ii-b), shown below.
Word list (ii-a):
Figure PCTCN2019100638-appb-000005
Word list (ii-b):
Figure PCTCN2019100638-appb-000006
Next, word list (i-a) and its corresponding word list (ii-a) are used as the example for analysis and explanation:
The example sentence has three predicate verb units: were appointed, didn’t bother, and wasn’t remarked. It therefore contains three predicate elements, denoted r_1, r_2, r_3 in order, for which the corresponding predicate vectors f_1, f_2, f_3 are generated. The values of the elements of f_1, f_2, f_3 are as follows:
① For f_1:
Denote the set of all possible values of r_1 by {r_1}. From the information in S3 of the present solution, clearly {r_1} = {were appointed}.
Denote the set of all possible values of c_1 by {c_1}. From the information in S3: {c_1} = {e}.
Denote the set of all possible values of l_1 by {l_1}. From the information in S3: {l_1} = {That, who, e}.
Denote the set of all possible values of x_1 by {x_1}. From the information in S3: {x_1} = {men, e}.
Denote the set of all possible values of y_1 by {y_1}. From the information in S3: {y_1} = {f_2, f_3, e}.
Denote the set of all possible values of z_1 by {z_1}. Although the current predicate element were appointed is a unit formed from a verb that can take an object together with an object complement, the first-position object element of this predicate element is neither a basic noun unit nor an adjacent coordinated combination of basic noun units; therefore, from the information in S3, {z_1} = {e}. Verbs that can take double objects include give, buy, sell, and offer; verbs that can take an object with an object complement include make, name, call, and find. Such verbs can be compiled in advance by dictionary lookup or by statistics.
The process above has generated all possible values of each element of the predicate vector f_1. All possible values of f_1 itself can then be obtained by combinatorial calculation over the possible values of its elements.
The values of each element of the predicate vectors f_2 and f_3 are generated in the same way as for f_1:
② For f_2: {r_2} = {didn’t bother}; {c_2} = {e}, {l_2} = {That, who, e}, {x_2} = {men, f_1, e}, {y_2} = {the liberals, f_3, e}, {z_2} = {e}.
③ For f_3: {r_3} = {wasn’t remarked}; {c_3} = {e}, {l_3} = {That, who, e}, {x_3} = {men, the liberals, f_1, f_2, e}, {y_3} = {the press, e}, {z_3} = {e}.
With all possible values of each element of f_2 and f_3 generated, all possible values of f_2 and of f_3 can be obtained by combinatorial calculation over the possible values of their respective elements.
In summary, the example sentence has three predicate verb units and hence three predicate elements, for which the corresponding predicate vectors f_1, f_2, f_3 are generated. The values of the elements of f_1, f_2, f_3 are as follows:
① For f_1: {r_1} = {were appointed}; {c_1} = {e}, {l_1} = {That, who, e}, {x_1} = {men, e}, {y_1} = {f_2, f_3, e}, {z_1} = {e}.
② For f_2: {r_2} = {didn’t bother}; {c_2} = {e}, {l_2} = {That, who, e}, {x_2} = {men, f_1, e}, {y_2} = {the liberals, f_3, e}, {z_2} = {e}.
③ For f_3: {r_3} = {wasn’t remarked}; {c_3} = {e}, {l_3} = {That, who, e}, {x_3} = {men, the liberals, f_1, f_2, e}, {y_3} = {the press, e}, {z_3} = {e}.
With all possible values of each element of f_1, f_2, f_3 generated, all possible values of each of the three predicate vectors can be obtained by combinatorial calculation over the possible values of its elements.
According to S3.2 of the present solution, since the example sentence has three predicate vectors, its backbone system is a matrix with 3 rows and 6 columns, whose abstract form is:
Figure PCTCN2019100638-appb-000007
A backbone system is an A system. Denote the set of all backbone systems of the example sentence by {A}, and the cardinality of {A} by |A|. Denote the set of all possible values of the predicate vector f_1 by {f_1}, and the cardinality of {f_1} by |f_1|. The other predicate vectors and the individual elements are treated in the same way. By the multiplication principle of combinatorics:
|f_1| = |c_1| × |l_1| × |x_1| × |r_1| × |y_1| × |z_1| = 1 × 3 × 2 × 1 × 3 × 1 = 18
|f_2| = |c_2| × |l_2| × |x_2| × |r_2| × |y_2| × |z_2| = 1 × 3 × 3 × 1 × 3 × 1 = 27
|f_3| = |c_3| × |l_3| × |x_3| × |r_3| × |y_3| × |z_3| = 1 × 3 × 5 × 1 × 2 × 1 = 30
Hence |A| = |f_1| × |f_2| × |f_3| = 18 × 27 × 30 = 14580; a total of 14580 backbone systems are generated.
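The counts above can be reproduced with a short Python sketch. The candidate sets are copied from the text; the string "e" for the empty unit and the names cardinality and backbone_systems are illustrative assumptions, not the application's own implementation.

```python
from itertools import product
from math import prod

E = "e"  # the empty unit

# Candidate sets from the text, in element order (c, l, x, r, y, z).
candidates = {
    "f1": [[E], ["That", "who", E], ["men", E],
           ["were appointed"], ["f2", "f3", E], [E]],
    "f2": [[E], ["That", "who", E], ["men", "f1", E],
           ["didn't bother"], ["the liberals", "f3", E], [E]],
    "f3": [[E], ["That", "who", E], ["men", "the liberals", "f1", "f2", E],
           ["wasn't remarked"], ["the press", E], [E]],
}


def cardinality(element_sets):
    """Multiplication principle: product of the sizes of the six element sets."""
    return prod(len(s) for s in element_sets)


def backbone_systems(candidates):
    """Lazily yield every candidate matrix, one 6-tuple row per predicate vector."""
    rows = [list(product(*sets)) for sets in candidates.values()]
    return product(*rows)


sizes = {name: cardinality(sets) for name, sets in candidates.items()}
total = prod(sizes.values())
print(sizes, total)
```

Enumerating every matrix before checking it is the brute-force form; the simplification mentioned in the text, generating and checking together, would prune rows before the outer product is taken.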
The process above can be simplified according to claim 5 of this application by generating and checking backbone systems at the same time, which reduces the computational complexity.
From the backbone systems generated above, that is, from the 14580 matrices with 3 rows and 6 columns, take any five and check them against requirements S3.3.1 to S3.3.4 of the present solution. For ease of presentation, the words in these five matrices are replaced directly by their numbers, all of which correspond to word list (ii-a); the empty unit e is left unchanged during this replacement, as shown below.
Matrix 1:
Figure PCTCN2019100638-appb-000008
Matrix 2:
Figure PCTCN2019100638-appb-000009
Matrix 3:
Figure PCTCN2019100638-appb-000010
Matrix 4:
Figure PCTCN2019100638-appb-000011
Matrix 5:
Figure PCTCN2019100638-appb-000012
Check matrix 1 against requirement S3.3.1: the matrix omits the subordinating connective unit who, whose number is 3. The matrix is invalid, that is, this backbone system is invalid, and it is discarded.
Check matrix 2 against requirement S3.3.2: the same number 2 appears in two different predicate vectors, f_1 and f_2. The matrix is invalid, that is, this backbone system is invalid, and it is discarded.
Check matrix 3 against requirement S3.3.2: the same predicate vector f_1 appears in two different predicate vectors, f_2 and f_3. The matrix is invalid, that is, this backbone system is invalid, and it is discarded.
Check matrix 2 again, this time against requirement S3.3.3: inside the predicate vector f_1, the numbers 3 and 2 appear in reversed order. The matrix is invalid, that is, this backbone system is invalid, and it is discarded. Matrix 2 thus violates the requirements of the present solution twice.
Check matrix 4 against requirement S3.3.4: f_3 appears inside the predicate vector f_2, and f_2 appears inside the predicate vector f_3. Consequently, f_2 = e +< 1 +< 2 +< 5 +< f_3 +< e cannot be substituted into f_3, nor can f_3 = e +< e +< f_2 +< 7 +< e +< e be substituted into f_2. This is a cross-substitution contradiction. The matrix is invalid, that is, this backbone system is invalid, and it is discarded.
Check matrix 5 against the requirements above: it violates none of the requirements of S3.3. Matrix 5 is therefore a canonical backbone system, or canonical A system. The 14580 matrices generated above contain other canonical backbone systems as well, which are not listed one by one. Denote matrix 5 as the canonical A_1 system. Restoring the words of the canonical A_1 system gives the following form:
Figure PCTCN2019100638-appb-000013
According to S3.3.5 of the present solution, after the check everything is restored to its state before the check, for later use.
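The four checks applied to the sample matrices can be sketched in Python. The rules below are inferred only from how the text applies S3.3.1 to S3.3.4, not from the wording of those steps, and the function names are hypothetical. Of the example data, matrix_4 uses the rows of f_2 and f_3 quoted in the text with an assumed f_1 row; matrix_2_like is a hypothetical matrix matching the description of matrix 2; a1_like is an assumed reconstruction of a canonical system, since the actual matrices are given only as figures.

```python
from typing import Dict, Set, Tuple, Union

Element = Union[int, str, None]          # word number, vector name, or None for e
Matrix = Dict[str, Tuple[Element, ...]]  # vector name -> (c, l, x, r, y, z)


def omits_required(m: Matrix, required: Set[int]) -> bool:
    """S3.3.1 (as inferred): a required unit, e.g. who = 3, is missing."""
    used = {el for row in m.values() for el in row if isinstance(el, int)}
    return not required <= used


def has_repeats(m: Matrix) -> bool:
    """S3.3.2 (as inferred): a number or vector name occurs more than once."""
    seen = [el for row in m.values() for el in row if el is not None]
    return len(seen) != len(set(seen))


def has_reversed_numbers(m: Matrix) -> bool:
    """S3.3.3 (as inferred): numbers inside one vector are out of order."""
    for row in m.values():
        nums = [el for el in row if isinstance(el, int)]
        if any(a >= b for a, b in zip(nums, nums[1:])):
            return True
    return False


def has_substitution_cycle(m: Matrix) -> bool:
    """S3.3.4 (generalised here to cycles of any length): cross substitution."""
    def reaches(start: str, current: str, visited: Set[str]) -> bool:
        for el in m[current]:
            if isinstance(el, str) and el in m:
                if el == start:
                    return True
                if el not in visited:
                    visited.add(el)
                    if reaches(start, el, visited):
                        return True
        return False
    return any(reaches(name, name, set()) for name in m)


def is_canonical(m: Matrix, required: Set[int]) -> bool:
    return not (omits_required(m, required) or has_repeats(m)
                or has_reversed_numbers(m) or has_substitution_cycle(m))


# f2 and f3 rows as quoted in the text for matrix 4; the f1 row is assumed.
matrix_4 = {"f1": (None, 3, None, 4, None, None),
            "f2": (None, 1, 2, 5, "f3", None),
            "f3": (None, None, "f2", 7, None, None)}
# Hypothetical matrix matching the description of matrix 2.
matrix_2_like = {"f1": (None, 3, 2, 4, None, None),
                 "f2": (None, 1, 2, 5, 6, None),
                 "f3": (None, None, "f2", 7, None, None)}
# Assumed reconstruction of a canonical system.
a1_like = {"f1": (None, 3, None, 4, None, None),
           "f2": (None, 1, 2, 5, 6, None),
           "f3": (None, None, "f2", 7, None, None)}
```

Here `required` would hold the numbers of the units that must not be omitted; for this sentence the sketch assumes {1, 3}, the numbers of That and who.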
According to S4 of the present solution, the example sentence has only one preposition unit, by, for which one corresponding auxiliary vector g[PREP](u) is generated.
From the information in S4, clearly:
g[PREP](u) = by +< (u), where PREP = by and u ∈ {the press, e}.
The set of all possible values of g[PREP](u) = by +< (u) is {by +< the press, by +< e}.
According to S5 of the present solution, the set {by +< the press, by +< e} yields two auxiliary systems, that is, two B systems, denoted the B_1 system and the B_2 system. Let B_1 = {g[PREP](u) = by +< the press} and B_2 = {g[PREP](u) = by +< e}.
Now fix the canonical A_1 system above; the subsequent operations are carried out consistently with it.
Replace the words of the B_1 and B_2 systems with their numbers:
B_1 = {g[PREP](u) = 8 +< 9}, B_2 = {g[PREP](u) = 8 +< e}.
On checking, the B_1 and B_2 systems satisfy requirements S6.1 to S6.5 of the present solution. Both have simple structures, so this is easy to verify and is not elaborated here.
It follows that, given the canonical A_1 system, the B_1 and B_2 systems are both canonical auxiliary systems. They are henceforth called the canonical B_1 system and the canonical B_2 system.
According to S6.6 of the present solution, after the check everything is restored to its state before the check, for later use.
Generating the C system and the A-B-C joint systems:
Combining the canonical A_1 system with the canonical B_1 system leaves no residual noun; the corresponding residual noun system is denoted as
Figure PCTCN2019100638-appb-000014
Combining the canonical A_1 system with the canonical B_2 system yields a corresponding residual noun system, denoted the C_2 system: C_2 = {the press}.
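The derivation of the residual noun system can be sketched as a set difference. The word numbering below is inferred from the numbered sequences in the text, the a1 rows are an assumed reconstruction of the canonical A_1 system (the actual matrix appears only as a figure), and the function name residual_nouns is hypothetical.

```python
def residual_nouns(nouns, a_system, b_system):
    """Return the noun numbers used by neither the A system nor the B system."""
    used = {el for row in a_system.values() for el in row if isinstance(el, int)}
    used |= {el for vec in b_system for el in vec if isinstance(el, int)}
    return sorted(n for n in nouns if n not in used)


# Numbering inferred from the text (word list (ii-a) is given only as a figure).
words = {1: "That", 2: "men", 3: "who", 4: "were appointed", 5: "didn't bother",
         6: "the liberals", 7: "wasn't remarked", 8: "by", 9: "the press"}
noun_numbers = [2, 6, 9]

# Assumed reconstruction of the canonical A_1 system; None stands for e.
a1 = {"f1": (None, 3, None, 4, None, None),
      "f2": (None, 1, 2, 5, 6, None),
      "f3": (None, None, "f2", 7, None, None)}
b1 = [(8, 9)]     # g[PREP](u) = by +< the press
b2 = [(8, None)]  # g[PREP](u) = by +< e

c1 = [words[n] for n in residual_nouns(noun_numbers, a1, b1)]
c2 = [words[n] for n in residual_nouns(noun_numbers, a1, b2)]
print(c1, c2)
```

Under these assumptions, pairing A_1 with B_1 leaves nothing over, while pairing A_1 with B_2 leaves the press, which agrees with C_2 in the text.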
Two A-B-C joint systems have now been obtained: the A_1-B_1-C_1 joint system and the A_1-B_2-C_2 joint system.
Next, whole-insertion is performed on the A_1-B_1-C_1 joint system, which is as follows:
Figure PCTCN2019100638-appb-000015
B_1 = {g[PREP](u) = by +< the press};
Figure PCTCN2019100638-appb-000016
The vectors in the A_1-B_1-C_1 joint system above are all type I vectors, that is, vectors before equal-value substitution. Equal-value substitution converts all type I vectors of the A_1-B_1-C_1 joint system into type II vectors, as shown below:
Figure PCTCN2019100638-appb-000017
B_1 = {g[PREP](u) = by +< the press};
Figure PCTCN2019100638-appb-000018
After the empty units e are removed, all type II vectors of the A_1-B_1-C_1 joint system are as follows:
Figure PCTCN2019100638-appb-000019
B_1 = {g[PREP](u) = by +< the press};
Figure PCTCN2019100638-appb-000020
Whole-insertion method 1:
The whole-insertion operation now begins, as shown in Figure 24. Take the two vectors shown in the figure as the receiving vector and the inserted vector of the first round, denoted ω and μ respectively. Take the right side as the first side, and label the order value of each element of ω one by one from right to left. Then take the 2nd element of ω and construct a single gap, on the right side of that element only. Insert μ as a whole into the gap of the 2nd element, producing a new vector.
The newly generated vector is shown below. It is a type III vector obtained from the A_1-B_1-C_1 joint system by whole-insertion; denote it [ω]_2 +< μ. This completes the first round of whole-insertion.
That men didn’t bother the liberals who were appointed wasn’t remarked
As shown in Figure 25, take the newly generated vector [ω]_2 +< μ as the receiving vector of the second round of whole-insertion. Label an order value on each element from the rightmost element of [ω]_2 +< μ up to and including who, the leftmost element of the vector μ contained in [ω]_2 +< μ; none of the remaining elements of [ω]_2 +< μ is labeled. Take the 3rd labeled element of [ω]_2 +< μ and construct a single gap, on the right side of that element only. Then take the preposition vector g[PREP](u) = by +< the press as the inserted vector of the second round, denoted ξ. Insert ξ as a whole into the gap of the 3rd element, producing a new vector.
The newly generated vector is shown below. It is a type III vector obtained from the A_1-B_1-C_1 joint system by whole-insertion, and it is also a merged vector. Denote it [[ω]_2\μ]_3 +< ξ.
That men didn’t bother the liberals who by the press were appointed wasn’t remarked
Replacing the words of this merged vector with their numbers gives the sequence below. On checking, numbers appear in reversed order inside the vector, so the merged vector is invalid and is discarded.
1 2 5 6 3 8 9 4 7
重新进行第一轮整体插空操作,如图26所示。取前文所述的两个向量ω和μ,分别作为第一轮整体插空操作的接收向量和插入向量。取右侧作为第一侧,从右至左逐一标注向量ω中的每一个元素的顺序值。标注顺序值之后,取前述的向量ω中的第4个元素,仅在该元素的右侧构造唯一的空位。以整体插空的方式,将前述的向量μ插入前述第4个元素对应的空位,进而生成一个新的向量,该向量是A 1-B 1-C 1联合系统中经过整体插空操作而获得的一个第III类向量,将这个新生成的向量记为[ω] 4+<μ,第一轮整体插空操作完毕。 Perform the first round of the overall insertion operation again, as shown in Figure 26. Take the two vectors ω and μ mentioned above, respectively, as the reception vector and the insertion vector of the first round of the overall blanking operation. Take the right side as the first side, and label the order value of each element in the vector ω one by one from right to left. After the order value is marked, the fourth element in the aforementioned vector ω is taken, and a unique space is constructed only on the right side of the element. Insert the aforementioned vector μ into the space corresponding to the aforementioned fourth element in the way of overall insertion, and then generate a new vector, which is obtained by the overall insertion operation in the A 1 -B 1 -C 1 joint system A type III vector of, mark this newly generated vector as [ω] 4 +<μ, and the first round of overall interpolation is completed.
重新进行第二轮整体插空操作,如图27所示。取新生成的向量[ω] 4+<μ作为第二轮整体插空操作的接收向量。对从向量[ω] 4+<μ中的右侧第一个元素开始直到向量[ω] 4+<μ包含的向量μ内部的左侧第一个元素who为止的每一个元素,标注顺序值;向量[ω] 4+<μ中的其 余元素,全都不标注顺序值。取向量[ω] 4+<μ中已经标注顺序值的第1个元素,仅在该元素的右侧构造唯一的空位。造空之后,取介词向量g[PREP](u)=by+<the press作为第二轮整体插空操作的插入向量,将该插入向量记为ξ。以整体插空的方式将向量ξ插入前述的第1个元素对应的空位,进而生成一个新的向量。 Perform the second round of the overall insertion operation again, as shown in Figure 27. Take the newly generated vector [ω] 4 +<μ as the receiving vector for the second round of the overall nulling operation. From the vector of [ω] 4 + <a first element on the right side until the start of the vector [mu] [ω] 4 + <a first element of each element of the vector μ μ left inside contains up who, denoted by order value ; The rest of the elements in the vector [ω] 4 +<μ are not marked with order values. Take the first element in the vector [ω] 4 +<μ that has an order value, and only construct a unique space on the right side of the element. After making the empty space, take the preposition vector g[PREP](u)=by+<the press as the insertion vector for the second round of the overall empty insertion operation, and record the insertion vector as ξ. Insert the vector ξ into the space corresponding to the first element mentioned above in the way of overall insertion, and then generate a new vector.
前述的新生成的向量,如下所示。该向量是A 1-B 1-C 1联合系统中经过整体插空操作而获得的一个第III类向量,同时也是一个拼合向量。将该向量记为[[ω] 4\μ] 1+<ξ。That men who were appointed didn’t bother the liberals wasn’t remarked by the press The aforementioned newly generated vector is shown below. This vector is a type III vector obtained through the overall void operation in the A 1 -B 1 -C 1 joint system, and it is also a combined vector. Denote this vector as [[ω] 4 \μ] 1 +<ξ. That men who were appointed didn't bother the liberals wasn't remarked by the press
Replace the elements of this merged vector with their numbers, as shown below. On inspection, no numbers appear in reversed order inside the vector. The merged vector is therefore reasonable: keep it, keep the A_1-B_1-C_1 joint system, and wait for subsequent operations.
1 2 3 4 5 6 7 8 9
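The two rounds of overall insertion and the reversed-order check can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `insert_whole` and `is_reasonable` are hypothetical, the unit boundaries in the comments are inferred from the numbering the specification assigns to the three vectors, and a gap is always taken on the right side of the chosen element.

```python
def insert_whole(receiving, gap_after, insertion):
    """Insert `insertion` as one block into the gap to the right of index `gap_after`."""
    return receiving[:gap_after + 1] + insertion + receiving[gap_after + 1:]


def is_reasonable(numbers):
    """A merged vector is kept only if no numbers appear in reversed order."""
    return all(a < b for a, b in zip(numbers, numbers[1:]))


# Numbers of the three Type II vectors of example sentence 1
# (unit boundaries are an assumption made for this sketch).
omega = [1, 2, 5, 6, 7]   # That / men / didn't bother / the liberals / wasn't remarked
mu = [3, 4]               # who / were appointed
xi = [8, 9]               # by / the press

round1 = insert_whole(omega, omega.index(2), mu)     # gap to the right of "men"
round2 = insert_whole(round1, round1.index(7), xi)   # gap to the right of "wasn't remarked"
print(round2, is_reasonable(round2))  # [1, 2, 3, 4, 5, 6, 7, 8, 9] True
```

Choosing a different gap in the first round, for example the one to the right of number 7, gives 1 2 5 6 7 3 4, which `is_reasonable` rejects.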
The insertion scheme corresponding to the overall insertion operations above is: ω→μ→ξ.
The remaining work, namely exhausting the gap of every element inside every receiving vector in every round of insertion that this scheme involves (that is, exhausting every merged vector the scheme can produce), follows the same pattern as the operations above and is not enumerated here.
In summary, the A_1-B_1-C_1 joint system yields the approximate syntactic structure of example sentence 1; that is, it describes the basic framework of that syntactic structure, as shown in Figure 28.
Exhausting all insertion schemes of any A-B-C joint system:
For example, the A_1-B_1-C_1 joint system above contains three Type II vectors, ω, μ and ξ. Applying the permutation formula of combinatorics to these three vectors,
Figure PCTCN2019100638-appb-000021
gives all the insertion schemes of the A_1-B_1-C_1 joint system:
ω→μ→ξ (scheme 1), ω→ξ→μ (scheme 2), μ→ω→ξ (scheme 3), μ→ξ→ω (scheme 4), ξ→μ→ω (scheme 5), ξ→ω→μ (scheme 6).
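As a small illustration, the six schemes are simply the permutations of the three vector names, which Python's standard `itertools.permutations` can enumerate. The variable names are hypothetical, and the order in which the library yields the permutations need not match the scheme numbers used in the specification.

```python
from itertools import permutations

# The three Type II vectors of the A_1-B_1-C_1 joint system, by name only.
type_ii_vectors = ["ω", "μ", "ξ"]

# Each ordering is one insertion scheme: the first vector receives,
# the others are inserted in the listed order.
schemes = ["→".join(order) for order in permutations(type_ii_vectors)]

for scheme in schemes:
    print(scheme)
print(len(schemes))  # 3! = 6
```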
Subsequent exhaustive enumeration:
The remaining enumeration, namely exhausting all A-B-C joint systems of each word list (ii) and all insertion schemes and merged vectors of each A-B-C joint system, can be carried out step by step following the operations above, using the multiplication principle, permutations and combinations, and related counting methods of combinatorics. It is not listed item by item.
The second overall insertion method:
In the second overall insertion method, every element inside the receiving vector is marked with an order value in each round of the overall insertion operation, so any marked element may be chosen for constructing a gap and performing the insertion.
In the second method, marking order values and choosing gaps are unrestricted in every round. In the first method, every round from the second onward marks order values and chooses gaps only up to the position of the first element on the second side of the previous round's insertion vector contained in the receiving vector. Once all merged vectors of a joint system have been exhausted, the first method produces no identical merged vectors, whereas the second method may produce identical merged vectors; such duplicates are merged into a single result. The operation of the second method is shown in Figures 29 and 30.
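The duplicate results of the second method can be illustrated with a rough sketch that enumerates every scheme with unrestricted gap choice and then merges identical merged vectors with a set. The sketch makes simplifying assumptions that are not taken from the specification: gaps exist only on the right side of each element, vectors are represented by their numbers, and the helper names are hypothetical.

```python
from itertools import permutations


def insert_whole(receiving, gap_after, insertion):
    """Insert `insertion` as one block into the gap to the right of index `gap_after`."""
    return receiving[:gap_after + 1] + insertion + receiving[gap_after + 1:]


def merged_vectors(order):
    """All merged vectors of one insertion scheme when any element may host the gap."""
    results = [order[0]]
    for insertion in order[1:]:
        results = [insert_whole(r, i, insertion)
                   for r in results for i in range(len(r))]
    return results


omega, mu, xi = (1, 2, 5, 6, 7), (3, 4), (8, 9)
all_results = [v for order in permutations([omega, mu, xi])
               for v in merged_vectors(order)]
unique_results = set(all_results)   # identical merged vectors collapse into one
print(len(all_results), len(unique_results))
```

For instance, schemes ω→μ→ξ and ω→ξ→μ both reach 1 2 3 4 5 6 7 8 9, so the set is smaller than the raw list.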
Optimization of the first and second overall insertion methods:
The process above can be optimized further according to S8.6 of the scheme of this application. S8.6 optimizes steps S8.2 to S8.5 of the scheme, that is, it optimizes both the first and the second overall insertion methods described above.
According to S8.6, after the equivalent-substitution operation of S8.1 has been performed, every Type II vector in the A_1-B_1-C_1 joint system is replaced by its corresponding numbers, as follows:
That men didn't bother the liberals wasn't remarked: 1 2 5 6 7
who were appointed: 3 4
by the press: 8 9
Below, only the first round of the overall insertion operation is used to illustrate the optimization of S8.6. The optimization applies to both the first and the second overall insertion methods.
Now, given an insertion scheme of the A_1-B_1-C_1 joint system, suppose that in its first round of overall insertion the vector (1 2 5 6 7) is the receiving vector and the vector (3 4) is the insertion vector. Take the right side as the first side and construct a gap on the right of every element inside the receiving vector, as follows:
1_______ 2_______ 5_______ 6_______ 7_______
Reasonable gaps are then screened. Because 7>3, the ordered number group (3 4) cannot be inserted into the gap of number 7, and no insertion is performed at that gap. Because 6>3, the group cannot be inserted into the gap of number 6, and no insertion is performed there. Because 5>3, the group cannot be inserted into the gap of number 5, and no insertion is performed there. Because 2<3 and 4<5, the group can be inserted into the gap of number 2, and the insertion is performed at that gap, as shown in Figure 31.
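The screening step can be sketched as a two-sided comparison on the numbers. The function name `reasonable_gaps` is hypothetical, and applying the same comparison to the gap of number 1 (which the text does not discuss) is an assumption of this sketch.

```python
def reasonable_gaps(receiving, insertion):
    """Return the elements whose right-hand gap keeps all numbers in ascending order."""
    gaps = []
    for i, left in enumerate(receiving):
        # Right neighbour of the gap, or None when the gap is at the far right.
        right = receiving[i + 1] if i + 1 < len(receiving) else None
        fits_left = left < insertion[0]
        fits_right = right is None or insertion[-1] < right
        if fits_left and fits_right:
            gaps.append(left)
    return gaps


print(reasonable_gaps([1, 2, 5, 6, 7], [3, 4]))        # [2]
print(reasonable_gaps([1, 2, 3, 4, 5, 6, 7], [8, 9]))  # [7]
```

Only the surviving gaps then need an actual insertion, which is the saving the optimization aims at.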
Steps after the overall insertion operation and the check of whether the merged vector is reasonable:
Next, for each canonical A system, i.e. each matrix, that can generate a reasonable merged vector, the syntactic rules are checked using a method that combines probability with syntactic rules, or a dependency-analysis method. A set of syntactic rules containing finitely many rules may be constructed. This rule set can also be used by the syntactic-structure repair procedure mentioned later. The rule set includes, but is not limited to, the following syntactic rules:
① In English, unless a subject clause is enclosed in opening and closing quotation marks, the subordinating conjunction that introducing the subject clause cannot be omitted; more generally, unless the subject clause is enclosed in quotation marks, no word that introduces a subject clause can be omitted. In the matrix structure: if some element x_i of the matrix is filled by a predicate vector f_j, then the element l_j of that f_j cannot be an empty unit, i.e. l_j≠e.
② In English, provided no special syntactic phenomenon is involved, a predicate in the passive voice cannot have a corresponding second-position object. In the matrix structure: if some element r_i of the matrix is in the passive voice, then the z_i corresponding to that r_i must be an empty unit, i.e. z_i=e.
③ In English, provided no special syntactic phenomenon is involved, if the predicate is in the passive voice and is a unit formed from a verb that can take neither a double object nor an object plus object complement, then the predicate can have neither a first-position object nor a second-position object. In the matrix structure: if some element r_i of the matrix is in the passive voice and is a unit formed from such a verb, then the y_i and z_i corresponding to that r_i must both be empty units, i.e. y_i=e and z_i=e.
④ In English, provided no special syntactic phenomenon is involved, subject and predicate must agree in number (singular or plural). English has some nouns whose singular and plural forms are identical, which can interfere with this judgment, but such nouns can be collected in advance by dictionary lookup or statistics. The subject-predicate number-agreement rule is easy to handle in the matrix structure.
⑤ In English, most prepositions, for example in, on, at, to, with, for and about, cannot be followed by an object clause introduced by that or with that omitted; a few prepositions, for example except, besides and but, can be followed by such a clause.
⑥ Check using the rule of event-object verbs and non-event-object verbs. In this patent application, an event-object verb is a verb in natural language that can take only an event, not a person or thing, as its object; a non-event-object verb is a verb that can take only a person or thing, not an event, as its object. For example, bother in English is a typical non-event-object verb: it can take a person or thing as its object but cannot take an object clause introduced by that or with that omitted. Event-object verbs and non-event-object verbs can be collected in advance by dictionary lookup or statistics. These two concepts play an important role in computer syntactic analysis of natural language; this patent application lists them as a syntactic rule as well, and the check is performed according to that rule.
⑦ Certain special syntactic phenomena in English, such as inverted sentences and elliptical sentences, and so on; these are not listed one by one.
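Rules ① to ③ are stated directly on the matrix elements, so they lend themselves to a mechanical check. The following is a minimal sketch under stated assumptions, not the application's implementation: the `Row` class, its boolean flags (`passive`, `two_complements`, `quoted`) and the function `violations` are hypothetical, `None` stands for the empty unit e, and the c element is left out because these three rules do not refer to it.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

E = None  # stands for the empty unit e


@dataclass
class Row:
    l: Optional[str]                 # introducing word l_i
    x: Union[str, "Row", None]       # subject x_i (a word group or another row)
    r: str                           # predicate r_i
    y: Union[str, "Row", None] = E   # first-position object y_i
    z: Union[str, "Row", None] = E   # second-position object z_i
    passive: bool = False            # assumed flag: r_i is in the passive voice
    two_complements: bool = True     # assumed flag: verb allows double object or object + complement
    quoted: bool = False             # assumed flag: clause is enclosed in quotation marks


def violations(row: Row) -> List[int]:
    """Return the numbers of the rules (1, 2 or 3) that this row violates."""
    found = []
    if isinstance(row.x, Row) and not row.x.quoted and row.x.l is E:
        found.append(1)   # subject clause without its introducing word
    if row.passive and row.z is not E:
        found.append(2)   # passive predicate with a second-position object
    if row.passive and not row.two_complements and (row.y is not E or row.z is not E):
        found.append(3)   # such a passive predicate may take no object at all
    return found


g = Row(l="That", x="men", r="didn't bother", y="the liberals")
f = Row(l=E, x=g, r="wasn't remarked", passive=True, two_complements=False)
print(violations(f))  # []
```

Rules ④ to ⑥ would additionally need lexical resources (number information, preposition lists, verb classes), which are outside this sketch.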
Next, take another joint system that generated a reasonable merged vector, A_2-B_1-C_21, shown below. When the syntactic rules are checked on this joint system, its x=the liberals and r=wasn't remarked are found to violate syntactic rule ④ above. The A_2-B_1-C_21 joint system is discarded.
Figure PCTCN2019100638-appb-000022
B_1={g[PREP](u)=by+<the press};
Figure PCTCN2019100638-appb-000023
Next, take yet another joint system that generated a reasonable merged vector, A_3-B_1-C_31, shown below. When the syntactic rules are checked on this joint system, its x=the liberals and r=wasn't remarked are found to violate syntactic rule ④ above, and its r=didn't bother and y=f_3 violate syntactic rule ⑥ above. Because the A_3-B_1-C_31 joint system violates the syntactic rules in two places, it is discarded.
Figure PCTCN2019100638-appb-000024
B_1={g[PREP](u)=by+<the press};
Figure PCTCN2019100638-appb-000025
Special note: in any A-B-C joint system of the word list (ii-b) above, the computer initially places the structurally ambiguous determiner unit That and the basic noun unit men in the same language fragment, i.e. it treats That as modifying men. But That modifying men is an obvious syntactic error that the computer easily detects and eliminates in the subsequent syntactic rule check, because under English syntax That as a determiner cannot modify the plural countable noun men. Consequently, all A-B-C joint systems generated from word list (ii-b) are regarded as unreasonable and removed.
The A_1-B_2-C_2 joint system mentioned earlier is shown below. It also generated a reasonable merged vector. Run the remaining-noun check on it to determine whether the remaining noun the press of system C_2 is a reasonable remaining noun. If it is, keep the A_1-B_2-C_2 joint system; if it is not, discard the A_1-B_2-C_2 joint system.
Figure PCTCN2019100638-appb-000026
B_2={g[PREP](u)=by+<e}; C_2={the press}
The remaining-noun check uses the method that combines probability with syntactic rules, or the dependency-analysis method. For example, in English an appositive may be an independent noun, the absolute construction of a non-finite verb may use an independent noun, article titles with a colon often use independent nouns, and so on. Under the method combining probability with syntactic rules, these linguistic phenomena are the syntactic rules for reasonable remaining nouns. On top of these rules, dedicated statistics can be collected for them from a corpus and the corresponding probabilities computed.
With the method combining probability with syntactic rules, it is easy to determine that the remaining noun the press of system C_2 is unreasonable; the A_1-B_2-C_2 joint system is therefore discarded.
After all the processing above, only the A_1-B_1-C_1 joint system remains; every other joint system has been discarded because of its own unreasonable factors.
The A_1-B_1-C_1 joint system describes the basic framework of the syntactic structure of example sentence 1, as shown in Figure 28. Compared with word list 1, one impurity component, upon, is still missing. To obtain the complete syntactic analysis of example sentence 1, the basic framework obtained above can be fused with the method combining probability with syntactic rules, or with the dependency-analysis method. Specifically, with the method combining probability with syntactic rules, candidate analyses are generated from the lexical tags of example sentence 1 given in word list (i) and sorted by probability from high to low, and the analysis with the highest probability among those that do not conflict with the basic framework is taken. The methods combining probability with syntactic rules include, but are not limited to, probabilistic context-free grammar and lexicalized probabilistic context-free grammar.
For example, suppose that, according to the lexical tags of example sentence 1 in word list (i), the method combining probability with syntactic rules yields 10000 computer-generated syntactic analyses, sorted by probability from high to low. Suppose further that the results ranked 1st to 19th all conflict with the basic framework described by the A_1-B_1-C_1 joint system, while the result ranked 20th does not. Then the 20th result is the highest-probability analysis that does not conflict with the basic framework, and it is taken as the final correct result. Several of these analyses are expressed below in the bracketed string form commonly used in computer science:
1) (ROOT(S(NP(IN That)(NP(NNS men)(SBAR(WP who)(S(VBD were)(VBN appointed)))))(VP(VBD did)(RB n't)(VP(VB bother)(NP(NP(DT the)(NNS liberals))(VP(VBD was)(RB n't)(VP(VBN remarked)(ADVP(RP upon)(PP(IN by)(NP(DT the)(NN press)))))))))(..)))
Probability of the result ranked 1st: 0.00010738
2) (ROOT(S(IN That)(NP(NNS men)(SBAR(WP who)(S(VBD were)(VBN appointed))))(VP(VBD did)(RB n't)(VP(VB bother)(NP(NP(DT the)(NNS liberals))(VP(VBD was)(RB n't)(VP(VBN remarked)(ADVP(RP upon)(PP(IN by)(NP(DT the)(NN press)))))))))(..)))
Probability of the result ranked 2nd: 0.00010621
3) (ROOT(S(NP(IN That)(NP(NNS men)(SBAR(WP who)(S(VBD were)(VBN appointed)))))(VP(VBD did)(RB n't)(VP(VB bother)(NP(NP(DT the)(NNS liberals))(VP(VBD was)(RB n't)(VP(VBN remarked)(PP(RP upon)(PP(IN by)(NP(DT the)(NN press)))))))))(..)))
Probability of the result ranked 3rd: 0.00010403
Figure PCTCN2019100638-appb-000027
20) (ROOT(S(NP(IN That)(NP(NP(NNS men)(SBAR(WP who)(S(VBD were)(VBN appointed))))(VP(VBD did)(RB n't)(VP(VB bother)(NP(DT the)(NNS liberals))))))(VP(VBD was)(RB n't)(VP(VBN remarked)(ADVP(RP upon)(PP(IN by)(NP(DT the)(NN press))))))(..)))
Probability of the result ranked 20th: 0.00010196
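The selection of the highest-probability result that does not conflict with the basic framework can be sketched as follows. The sketch rests on an assumption that is not stated in the specification: "no conflict" is modelled as "every word span required by the framework is a constituent of the candidate parse", and the only required span used here is the ten tokens from That through liberals (the subject clause). The names `constituent_spans` and `best_compatible` are hypothetical, and only the results ranked 1st and 20th are included.

```python
import re


def constituent_spans(tree):
    """Return the (start, end) word spans of all constituents in a bracketed parse."""
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    spans, stack, n_words = set(), [], 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append(n_words)   # remember where this constituent starts
            i += 2                  # skip "(" and the label that follows it
            continue
        if tok == ")":
            spans.add((stack.pop(), n_words))
        else:
            n_words += 1            # a word at a leaf
        i += 1
    return spans


def best_compatible(ranked, required_spans):
    """Return (rank, probability) of the first parse containing every required span."""
    for rank, tree, prob in ranked:
        if required_spans <= constituent_spans(tree):
            return rank, prob
    return None


REL = "(NP(NNS men)(SBAR(WP who)(S(VBD were)(VBN appointed))))"
MAIN_VP = ("(VP(VBD was)(RB n't)(VP(VBN remarked)"
           "(ADVP(RP upon)(PP(IN by)(NP(DT the)(NN press))))))")
RANK_1 = ("(ROOT(S(NP(IN That)" + REL + ")"
          "(VP(VBD did)(RB n't)(VP(VB bother)(NP(NP(DT the)(NNS liberals))"
          + MAIN_VP + ")))(..)))")
RANK_20 = ("(ROOT(S(NP(IN That)(NP" + REL
           + "(VP(VBD did)(RB n't)(VP(VB bother)(NP(DT the)(NNS liberals))))))"
           + MAIN_VP + "(..)))")

ranked = [(1, RANK_1, 0.00010738), (20, RANK_20, 0.00010196)]
subject_clause = {(0, 10)}  # tokens "That ... liberals", with n't counted as its own token
print(best_compatible(ranked, subject_clause))  # (20, 0.00010196)
```

In the result ranked 1st, the tokens the liberals are grouped with wasn't remarked upon by the press, so the span (0, 10) is not a constituent and the candidate is skipped.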
In summary, the series of steps above yields the syntactic analysis of example sentence 1, a result that can be regarded as correct in English linguistics. It is expressed below in the bracketed string form commonly used in computer science: [see Figure 3]
Note: Figure 3 is the diagram corresponding to this string; the same applies below.
(ROOT(S(NP(IN That)(NP(NP(NNS men)(SBAR(WP who)(S(VBD were)(VBN appointed))))(VP(VBD did)(RB n't)(VP(VB bother)(NP(DT the)(NNS liberals))))))(VP(VBD was)(RB n't)(VP(VBN remarked)(ADVP(RP upon)(PP(IN by)(NP(DT the)(NN press))))))(..)))
The first half of the specification mentioned the following mathematical model, which is denoted the Q model.
Let S be an English sentence containing at least the following three subject-predicate collocations (each expressed as a 6-argument function):
f(c_1,l_1,x_1,r_1,y_1,z_1);
g(c_2,l_2,x_2,r_2,y_2,z_2);
h(c_3,l_3,x_3,r_3,y_3,z_3).
Note: the subscripts 1, 2 and 3 of the arguments only distinguish them from one another and carry no ordering meaning.
f, g and h satisfy the following three conditions:
① l_2=that;
② f(c_1,l_1,g(c_2,l_2,x_2,r_2,y_2,z_2),r_1,y_1,z_1);
③ g[h(c_3,l_3,x_3,r_3,y_3,z_3)].
Explanation: f(c_1,l_1,g(c_2,l_2,x_2,r_2,y_2,z_2),r_1,y_1,z_1) means that the predicate vector g is the subject clause of the predicate vector f. g[h(c_3,l_3,x_3,r_3,y_3,z_3)] means that the predicate vector h is inserted as a whole into some position of the predicate vector g. l_2=that means that the introducing word of the predicate vector g is that. The Q model therefore states: the predicate vector g is the subject clause of the predicate vector f, the introducing word of g is that, and the predicate vector h is inserted as a whole into some position of g.
Example sentence 1 conforms to the Q model, as verified below (auxiliary components and empty units e are omitted):
f(c_1,l_1,x_1,r_1,y_1,z_1)=g(c_2,l_2,x_2,r_2,y_2,z_2)+<wasn't+<remarked;
g(c_2,l_2,x_2,r_2,y_2,z_2)=That+<men+<didn't+<bother+<the+<liberals;
f(c_1,l_1,g(c_2,l_2,x_2,r_2,y_2,z_2),r_1,y_1,z_1)=(That+<men+<didn't+<bother+<the+<liberals)+<wasn't+<remarked;
h(c_3,l_3,x_3,r_3,y_3,z_3)=who+<were+<appointed;
g[h(c_3,l_3,x_3,r_3,y_3,z_3)]=That+<men+[who+<were+<appointed]+<didn't+<bother+<the+<liberals.
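The verification above can be mirrored by a small data-structure sketch. It is only an illustration under assumptions of its own: the `Pred` class and the functions `words` and `fits_q_model` are hypothetical, only the l, x, r and y elements are modelled, and an inserted vector is always placed directly after the subject, whereas the Q model only requires insertion at some position of g.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Pred:
    l: Optional[str]                  # introducing word
    x: Union[str, "Pred"]             # subject (a word group or a subject clause)
    r: str                            # predicate
    y: Optional[str] = None           # object
    inserted: List["Pred"] = field(default_factory=list)  # vectors inserted as a whole after x


def words(p: Pred) -> List[str]:
    """Flatten a predicate vector into its surface word groups."""
    out = [p.l] if p.l else []
    out += words(p.x) if isinstance(p.x, Pred) else [p.x]
    for q in p.inserted:
        out += words(q)
    out.append(p.r)
    if p.y:
        out.append(p.y)
    return out


def fits_q_model(f: Pred) -> bool:
    """Conditions: g is the subject clause of f, l_2 = that, and some h is inserted into g."""
    g = f.x
    return isinstance(g, Pred) and (g.l or "").lower() == "that" and len(g.inserted) > 0


h = Pred(l=None, x="who", r="were appointed")
g = Pred(l="That", x="men", r="didn't bother", y="the liberals", inserted=[h])
f = Pred(l=None, x=g, r="wasn't remarked")

print(" ".join(words(f)))
print(fits_q_model(f))  # True
```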
It should be pointed out in particular that, from a mathematical point of view, English sentences conforming to the Q model were, as of the filing date of this patent application (March 22, 2019), often parsed with seriously wrong results by the Berkeley Parser and the Stanford Parser!
Example 2: That something you learned is wrong is known to the public.
That in this example sentence is structurally ambiguous. Because of limited space, only the word list (ii) whose preprocessing treats That as a subordinating connective unit is given, as shown below. The adjective wrong serves as the predicative of the clause and is a core component of the sentence; however, to simplify computer processing, it is temporarily removed in the preprocessing step, as the scheme of this application prescribes. The predicative wrong of the clause can be restored later in the syntactic-structure repair step.
Figure PCTCN2019100638-appb-000028
According to the scheme of this application, the following A-B-C joint system can be generated for example sentence 2:
Figure PCTCN2019100638-appb-000029
B_1={to+<the public};
Figure PCTCN2019100638-appb-000030
The A_1-B_1-C_1 joint system above yields the basic framework of the syntactic structure of example sentence 2, as shown in Figure 32.
The complete syntactic analysis of example sentence 2 is expressed as a string as follows: [see Figure 5]
(ROOT(S(SBAR(IN That)(S(NP(NN something)(SBAR(PRP you)(VBD learned)))(VP(VBZ is)(JJ wrong))))(VP(VBZ is)(VP(VBN known)(PP(TO to)(NP(DT the)(NN public)))))(..)))
Special note: example sentence 2 also conforms to the Q model mentioned above. As of the filing date of this patent application (March 22, 2019), the Berkeley Parser and the Stanford Parser both gave wrong results for this example sentence!
Example 3: That that men were appointed didn't bother the liberals wasn't remarked upon by the press.
Both occurrences of that in this example sentence are structurally ambiguous. Because of limited space, only the word list (ii) whose preprocessing treats both as subordinating connective units is given, as shown below:
Figure PCTCN2019100638-appb-000031
According to the scheme of this application, the following A-B-C joint system can be generated for example sentence 3:
Figure PCTCN2019100638-appb-000032
B_1={g[PREP](u)=by+<the press};
Figure PCTCN2019100638-appb-000033
The A_1-B_1-C_1 joint system above yields the basic framework of the syntactic structure of example sentence 3, as shown in Figure 33.
The complete syntactic analysis of example sentence 3 is expressed as a string as follows: [see Figure 8]
(ROOT(S(SBAR(IN That)(S(SBAR(IN that)(S(NP(NNS men))(VP(VBD were)(VBN appointed))))(VP(VBD did)(RB n't)(VP(VB bother)(NP(DT the)(NNS liberals))))))(VP(VBD was)(RB n't)(VP(VBN remarked)(ADVP(RP upon)(PP(IN by)(NP(DT the)(NN press))))))(..)))
Special note: as of the filing date of this patent application (March 22, 2019), the Berkeley Parser and the Stanford Parser both gave wrong results for this example sentence!
Example 4: That that that men were appointed didn't bother the liberals wasn't remarked upon by the press upset many women.
All three occurrences of that in this example sentence are structurally ambiguous. Because of limited space, only the word list (ii) whose preprocessing treats all three as subordinating connective units is given, as shown below:
Figure PCTCN2019100638-appb-000034
According to the scheme of this application, the following A-B-C joint system can be generated for example sentence 4:
Figure PCTCN2019100638-appb-000035
B_1={g[PREP](u)=by+<the press};
Figure PCTCN2019100638-appb-000036
The A_1-B_1-C_1 joint system above yields the basic framework of the syntactic structure of example sentence 4, as shown in Figure 34.
The complete syntactic analysis of example sentence 4 is expressed as a string as follows: [see Figure 9]
(ROOT(S(SBAR(IN That)(S(SBAR(IN That)(S(SBAR(IN that)(S(NP(NNS men))(VP(VBD were)(VBN appointed))))(VP(VBD did)(RB n't)(VP(VB bother)(NP(DT the)(NNS liberals))))))(VP(VBD was)(RB n't)(VP(VBN remarked)(ADVP(RP upon)(PP(IN by)(NP(DT the)(NN press))))))))(VP(VBD upset)(NP(JJ many)(NNS women)))(..)))
Special note: as of the filing date of this patent application (March 22, 2019), the Berkeley Parser and the Stanford Parser both gave wrong results for this example sentence!
Supplementary note on Examples 3 and 4: Examples 3 and 4 were mentioned in the first half of the description. Neither sentence contains a grammatical or logical error, and both contain nested subject clauses introduced by "that". Every "that" in them is a subordinating conjunction (part-of-speech tag IN), yet the Berkeley Parser and the Stanford Parser both returned incorrect syntactic analyses for Examples 3 and 4. In particular, neither parser gave a fully correct result for the five subordinating conjunctions "that" that the two sentences contain in total. In addition, for the word list (ii) that records "that" as a structurally ambiguous determiner unit, and for its corresponding A-B-C joint system, the computer initially places the structurally ambiguous determiner unit "that" and the basic noun unit "men" in the same language fragment. However, "that" modifying "men" is an obvious syntactic error, which the computer easily identifies and eliminates in the subsequent syntactic-rule check. Therefore, every word list (ii) generated for Examples 3 and 4 that records "that" as a structurally ambiguous determiner unit is eliminated by the computer.
Example 5: Behaviorists suggest the child who is raised in an environment where there are many stimuli which develop his or her capacity for appropriate responses will experience greater intellectual development.
For reasons of space, only the preprocessed word list (ii) is given, as shown below:
Figure PCTCN2019100638-appb-000037
The example sentence has 5 predicate verb units: suggest, is raised, there are, develop, will experience. It therefore contains 5 predicate elements, denoted r1, r2, r3, r4, r5 in order, for which the corresponding predicate vectors f1, f2, f3, f4, f5 are generated. According to the information in S3 of the application scheme, the elements of the predicate vectors f1 to f5 take the following values:
① For f1: {r1} = {suggest}; {c1} = {e}, {l1} = {e}, {x1} = {Behaviorists, e}, {y1} = {the child, f2, f3, f4, f5, e}, {z1} = {e}.
② For f2: {r2} = {is raised}; {c2} = {e}, {l2} = {who, e}, {x2} = {Behaviorists, the child, f1, e}, {y2} = {an environment, f3, f4, f5, e}, {z2} = {e}.
③ For f3: {r3} = {there are}; {c3} = {e}, {l3} = {who, where, e}, {x3} = {Behaviorists, the child, an environment, f1, f2, e}, {y3} = {many stimuli, f4, f5, e}, {z3} = {e}.
④ For f4: {r4} = {develop}; {c4} = {e}, {l4} = {who, where, which, e}, {x4} = {Behaviorists, the child, an environment, many stimuli, f1, f2, f3, e}, {y4} = {capacity, responses, f5, e}, {z4} = {e}.
⑤ For f5: {r5} = {will experience}; {c5} = {e}, {l5} = {who, where, which, e}, {x5} = {Behaviorists, the child, an environment, many stimuli, capacity, responses, f1, f2, f3, f4, e}, {y5} = {development, e}, {z5} = {e}.
Special note: in English, the there-be construction is essentially an inverted construction whose subject is the language unit that follows the verb be. In this patent application, for convenience of computer processing, every language unit that follows the verb be is first treated as a unit in object position. In the subsequent syntactic-structure repair step, special syntactic phenomena, including there-be and inverted constructions, can be handled appropriately.
Denote the set of all backbone systems corresponding to this example sentence as {A} and the cardinality of {A} as |A|. Denote the set of all possible values of predicate vector f1 as {f1} and the cardinality of {f1} as |f1|. Treat the other predicate vectors and their elements in the same way. Then, by the multiplication principle of combinatorics:
|f1| = |c1| × |l1| × |x1| × |r1| × |y1| × |z1| = 1 × 1 × 2 × 1 × 6 × 1 = 12
|f2| = |c2| × |l2| × |x2| × |r2| × |y2| × |z2| = 1 × 2 × 4 × 1 × 5 × 1 = 40
|f3| = |c3| × |l3| × |x3| × |r3| × |y3| × |z3| = 1 × 3 × 6 × 1 × 4 × 1 = 72
|f4| = |c4| × |l4| × |x4| × |r4| × |y4| × |z4| = 1 × 4 × 8 × 1 × 4 × 1 = 128
|f5| = |c5| × |l5| × |x5| × |r5| × |y5| × |z5| = 1 × 4 × 11 × 1 × 2 × 1 = 88
|A| = |f1| × |f2| × |f3| × |f4| × |f5| = 12 × 40 × 72 × 128 × 88 = 389283840, so 389283840 backbone systems are generated in total.
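The count above can be reproduced mechanically. The following Python sketch is only an illustration and not part of the application scheme: the dictionary layout and the names `predicate_vectors`, `cardinality`, `per_vector` and `total_backbones` are assumptions introduced here, while the candidate sets are copied from items ① to ⑤ in the slot order c, l, x, r, y, z.

```python
from math import prod

E = "e"  # the empty placeholder written as e in the text

# Candidate sets copied from items (1)-(5), slot order: c, l, x, r, y, z.
# The data layout is an assumption made for this illustration.
predicate_vectors = {
    "f1": [[E], [E], ["Behaviorists", E], ["suggest"],
           ["the child", "f2", "f3", "f4", "f5", E], [E]],
    "f2": [[E], ["who", E], ["Behaviorists", "the child", "f1", E],
           ["is raised"], ["an environment", "f3", "f4", "f5", E], [E]],
    "f3": [[E], ["who", "where", E],
           ["Behaviorists", "the child", "an environment", "f1", "f2", E],
           ["there are"], ["many stimuli", "f4", "f5", E], [E]],
    "f4": [[E], ["who", "where", "which", E],
           ["Behaviorists", "the child", "an environment", "many stimuli",
            "f1", "f2", "f3", E],
           ["develop"], ["capacity", "responses", "f5", E], [E]],
    "f5": [[E], ["who", "where", "which", E],
           ["Behaviorists", "the child", "an environment", "many stimuli",
            "capacity", "responses", "f1", "f2", "f3", "f4", E],
           ["will experience"], ["development", E], [E]],
}


def cardinality(slots):
    """Multiplication principle: product of the sizes of the six slots."""
    return prod(len(candidates) for candidates in slots)


per_vector = {name: cardinality(slots) for name, slots in predicate_vectors.items()}
total_backbones = prod(per_vector.values())

print(per_vector)       # sizes of {f1} ... {f5}
print(total_backbones)  # size of {A}
```

Because the total is a plain product of slot sizes, adding one more candidate to a single slot multiplies the whole search space, which is why generating and checking candidates together is attractive.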
The above process can be simplified according to claim 5 of the application scheme: generating and checking the backbone systems at the same time reduces the computational complexity. According to the information in S4 of the application scheme, this example sentence generates 2 auxiliary vectors, as follows:
g[PREP,1](u) = in+<(u): PREP = in, u = {an environment, f3, f4, f5, e}.
g[PREP,2](u) = for+<(u): PREP = for, u = {responses, f5, e}.
Denote the set of all auxiliary systems corresponding to this example sentence as {B} and the cardinality of {B} as |B|. Denote the set of all possible values of the auxiliary vector g[PREP,1](u) as {g[PREP,1](u)} and the cardinality of that set as |g[PREP,1](u)|. Treat the auxiliary vector g[PREP,2](u) in the same way. By the multiplication principle of combinatorics: |B| = |g[PREP,1](u)| × |g[PREP,2](u)| = 5 × 3 = 15, so 15 auxiliary systems are generated in total.
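As an illustration of how the auxiliary systems could be enumerated rather than only counted, the sketch below takes the Cartesian product of the two candidate sets for u. The function name `enumerate_auxiliary_systems` and the dictionary layout are assumptions made for this example; the application scheme does not prescribe a data structure.

```python
from itertools import product

E = "e"  # the empty placeholder written as e in the text

# (preposition, candidate values of u) for each auxiliary vector, copied from the text.
aux_candidates = {
    "g[PREP,1]": ("in", ["an environment", "f3", "f4", "f5", E]),
    "g[PREP,2]": ("for", ["responses", "f5", E]),
}


def enumerate_auxiliary_systems(candidates):
    """Return every combination of one u value per auxiliary vector."""
    names = list(candidates)
    choice_lists = [candidates[name][1] for name in names]
    systems = []
    for combo in product(*choice_lists):
        systems.append(
            {name: f"{candidates[name][0]}+<{u}" for name, u in zip(names, combo)}
        )
    return systems


auxiliary_systems = enumerate_auxiliary_systems(aux_candidates)
print(len(auxiliary_systems))  # size of {B}
print(auxiliary_systems[0])    # first candidate auxiliary system
```

With the candidate order used in the text, the first enumerated system pairs "in" with "an environment" and "for" with "responses".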
Take one checked canonical backbone system and denote it the canonical A1 system; take one checked canonical auxiliary system that matches the canonical A1 system and denote it the canonical B1 system; denote the remaining-noun system that matches the canonical A1 and B1 systems as the C1 system. This yields an A-B-C joint system, denoted the A1-B1-C1 joint system, as follows:
Figure PCTCN2019100638-appb-000038
B1 = {g[PREP,1](u) = in+<an environment, g[PREP,2](u) = for+<responses}
Figure PCTCN2019100638-appb-000039
Taking the right side as the first side, five rounds of the overall gap-insertion operation are required. For reasons of space, the inventors present the five rounds together in simplified form, as shown in Figure 35.
After these five rounds of overall gap insertion, the combined vector is obtained, as shown below:
Behaviorists suggest the child who is raised in an environment where there are many stimuli which develop capacity for responses will experience development
Replace each unit of the combined vector with its number, as shown below. Inspection shows that no numbers appear in reversed order inside the combined vector, so the combined vector is reasonable and is retained.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
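The reversed-order check on the numbered combined vector can be sketched as follows. This is a minimal illustration under stated assumptions: the unit segmentation is read off the combined vector above, and the function name `is_in_sentence_order` and the `swapped` counterexample are hypothetical and do not come from the application.

```python
# Units of the combined vector for example sentence 5, in sentence order.
UNITS = [
    "Behaviorists", "suggest", "the child", "who", "is raised", "in",
    "an environment", "where", "there are", "many stimuli", "which",
    "develop", "capacity", "for", "responses", "will experience", "development",
]


def is_in_sentence_order(numbers):
    """True when no number is followed by a smaller or equal one."""
    return all(a < b for a, b in zip(numbers, numbers[1:]))


# Number each unit by its position in the original sentence (1-based).
numbering = {unit: i for i, unit in enumerate(UNITS, start=1)}

# The combined vector produced by gap insertion, expressed as numbers.
combined_vector = list(UNITS)
numbered = [numbering[unit] for unit in combined_vector]

# Hypothetical faulty result in which "who" and "where" changed places.
swapped = numbered[:]
swapped[3], swapped[7] = swapped[7], swapped[3]

print(numbered)
print(is_in_sentence_order(numbered))  # retained
print(is_in_sentence_order(swapped))   # would be rejected
```

A combined vector that fails this test cannot correspond to the original word order, so it can be discarded without any further syntactic checking.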
The presentation of the five rounds of overall gap insertion above is the approximate syntactic structure of example sentence 5 obtained through the A1-B1-C1 joint system, that is, the basic framework of the syntactic structure of example sentence 5.
The above process can be further optimized according to S8.6 of the application scheme. Replace the A1-B1-C1 joint system with numbers, as follows:
Figure PCTCN2019100638-appb-000040
B1 = {g[PREP,1](u) = 6+<7, g[PREP,2](u) = 14+<15};
Figure PCTCN2019100638-appb-000041
According to S8.6 of the application scheme, the numbered A1-B1-C1 joint system is optimized. The optimization gives the same result as the overall gap-insertion operation described above.
Based on the basic framework of the syntactic structure provided by the A1-B1-C1 joint system, the complete syntactic analysis of example sentence 5 is obtained. The result can be regarded as correct in terms of English linguistics and is expressed as a string as follows: [See Figure 10]
(ROOT(S(NP(NNS Behaviorists))(VP(VBP suggest)(SBAR(S(NP(NP(DT the)(NN child))(SBAR(WHNP(WP who))(S(VP(VBZ is)(VP(VBN raised)(PP(IN in)(NP(NP(DT an)(NN environment))(SBAR(WHADVP(WRB where))(S(NP(EX there))(VP(VBP are)(NP(NP(JJ many)(NNS stimuli))(SBAR(WHNP(WP which))(S(VP(VBP develop)(NP(NP(PRP$ his)(CC or)(PRP$ her)(NN capacity))(PP(IN for)(NP(JJ appropriate)(NNS responses))))))))))))))))))(VP(MD will)(VP(VB experience)(NP(JJR greater)(JJ intellectual)(NN development)))))))(. .)))
Example 6: Believing that what he wants will occur, Tom works hard in the company.
For reasons of space, only the preprocessed word list (ii) is given, as shown below:
Figure PCTCN2019100638-appb-000042
The example sentence has 3 predicate verb units: wants, will occur, works. It therefore contains 3 predicate elements, denoted r1, r2, r3 in order, for which the corresponding predicate vectors f1, f2, f3 are generated. The example sentence also contains 1 gerund/present-participle element; let the corresponding gerund/present-participle vector be g[VBG](u,v). According to the information in S3 of the application scheme, the elements of the predicate vectors f1, f2, f3 take the following values:
① For f1: {r1} = {wants}; {c1} = {e}, {l1} = {that, what, e}, {x1} = {he, g[VBG](u,v), e}, {y1} = {f2, f3, e}, {z1} = {e}.
② For f2: {r2} = {will occur}; {c2} = {e}, {l2} = {that, what, e}, {x2} = {he, g[VBG](u,v), f1, e}, {y2} = {Tom, f3, e}, {z2} = {e}.
③ For f3: {r3} = {works}; {c3} = {e}, {l3} = {that, what, e}, {x3} = {he, Tom, g[VBG](u,v), f1, f2, e}, {y3} = {the company, e}, {z3} = {e}.
Denote the set of all backbone systems corresponding to this example sentence as {A} and the cardinality of {A} as |A|. Denote the set of all possible values of predicate vector f1 as {f1} and the cardinality of {f1} as |f1|. Treat the other predicate vectors and their elements in the same way. Then, by the multiplication principle of combinatorics:
|f1| = |c1| × |l1| × |x1| × |r1| × |y1| × |z1| = 1 × 3 × 3 × 1 × 3 × 1 = 27
|f2| = |c2| × |l2| × |x2| × |r2| × |y2| × |z2| = 1 × 3 × 4 × 1 × 3 × 1 = 36
|f3| = |c3| × |l3| × |x3| × |r3| × |y3| × |z3| = 1 × 3 × 6 × 1 × 2 × 1 = 36
Thus |A| = |f1| × |f2| × |f3| = 27 × 36 × 36 = 34992, so 34992 backbone systems are generated in total.
According to the information in S4 of the application scheme, this example sentence generates 2 auxiliary vectors, g[VBG](u,v) and g[PREP](u):
g[VBG](u,v) = Believing+<(u)+<e: VBG = Believing, u = {he, f1, f2, f3, e}.
g[PREP](u) = in+<(u): PREP = in, u = {the company, e}.
Denote the set of all auxiliary systems corresponding to this example sentence as {B} and the cardinality of {B} as |B|. Denote the set of all possible values of the auxiliary vector g[VBG](u,v) as {g[VBG](u,v)} and the cardinality of that set as |g[VBG](u,v)|. Treat the auxiliary vector g[PREP](u) in the same way. By the multiplication principle of combinatorics: |B| = |g[VBG](u,v)| × |g[PREP](u)| = 5 × 2 = 10, so 10 auxiliary systems are generated in total.
Take one checked canonical backbone system and denote it the canonical A1 system; take one checked canonical auxiliary system that matches the canonical A1 system and denote it the canonical B1 system; denote the remaining-noun system that matches the canonical B1 system as the C1 system. This yields an A-B-C joint system, denoted the A1-B1-C1 joint system, as follows:
Figure PCTCN2019100638-appb-000043
B1 = {g[VBG](u,v) = Believing+<f2+<e, g[PREP](u) = in+<the company}
Figure PCTCN2019100638-appb-000044
Take another A-B-C joint system, shown below and denoted the A2-B2-C22 joint system.
Figure PCTCN2019100638-appb-000045
B2 = {g[VBG](u,v) = Believing+<he+<e, g[PREP](u) = in+<the company}
Figure PCTCN2019100638-appb-000046
Take another A-B-C joint system, shown below and denoted the A2-B1-C21 joint system.
Figure PCTCN2019100638-appb-000047
B1 = {g[VBG](u,v) = Believing+<f2+<e, g[PREP](u) = in+<the company};
C21 = {he}
Take the left side as the first side, construct the gaps, and then perform the overall gap-insertion operation. The A2-B2-C22 joint system produces no reasonable combined vector and is therefore eliminated at the gap-insertion stage. Next, the A1-B1-C1 and A2-B1-C21 joint systems, which survive the gap-insertion operation, can be subjected to the remaining-noun check using probability combined with syntactic rules. The check finds that the remaining noun "he" of the C21 system is not a standalone noun that can serve as an appositive, not a standalone noun that can be used in an absolute construction with a non-finite verb, not a standalone noun of the kind often used with a colon in article titles, and so on. The remaining noun "he" of the C21 system is therefore unreasonable; the A2-B1-C21 joint system contains an error and is discarded.
After all necessary processing, only the A1-B1-C1 joint system remains; the other joint systems are discarded because of their own unreasonable elements. The basic framework of the syntactic structure of example sentence 6 corresponding to the A1-B1-C1 joint system is shown in Figure 36.
Further, based on the basic framework of the syntactic structure described by the A1-B1-C1 joint system, probability is combined with syntactic rules: candidate analyses are ranked from highest to lowest probability, and the most probable computer analysis that does not conflict with the framework is selected. This series of steps yields the complete syntactic analysis of example sentence 6. The result can be regarded as correct in terms of English linguistics and is expressed as a string as follows: [See Figure 11]
(ROOT(S(S(VP(VBG Believing)(SBAR(IN that)(S(SBAR(WHNP(WP what))(S(NP(PRP he))(VP(VBZ wants))))(VP(MD will)(VP(VB occur)))))))(, ,)(NP(NNP Tom))(VP(VBZ works)(ADVP(RB hard))(PP(IN in)(NP(DT the)(NN company))))(. .)))
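A bracketed result string of this kind can be checked mechanically, for instance by confirming that the brackets balance and that the leaves reproduce the example sentence. The Python sketch below is an illustration only: the functions `read_tree` and `leaves` are hypothetical helpers written for this example, and the punctuation nodes are assumed to follow the Penn Treebank convention `(, ,)` and `(. .)`.

```python
import re

# Result string for example sentence 6; punctuation nodes written as (, ,) and (. .).
TREE_TEXT = (
    "(ROOT(S(S(VP(VBG Believing)(SBAR(IN that)(S(SBAR(WHNP(WP what))"
    "(S(NP(PRP he))(VP(VBZ wants))))(VP(MD will)(VP(VB occur)))))))"
    "(, ,)(NP(NNP Tom))(VP(VBZ works)(ADVP(RB hard))"
    "(PP(IN in)(NP(DT the)(NN company))))(. .)))"
)


def read_tree(text):
    """Parse a bracketed string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    tree = parse()
    if pos != len(tokens):
        raise ValueError("unbalanced brackets or trailing tokens")
    return tree


def leaves(node):
    """Collect the words of the tree from left to right."""
    _, children = node
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


tree = read_tree(TREE_TEXT)
print(tree[0])
print(" ".join(leaves(tree)))
```

Reading the leaves back in this way is a cheap sanity check that a result string has not lost or reordered any word of the input sentence.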
Example 7: A study of travelers conducted by the website TripAdvisor names Yangshuo as one of the top 10 destinations in the world.
The word "conducted" in this example sentence is structurally ambiguous. For reasons of space, only word lists (ii-a) and (ii-b) are given.
Word list (ii-a):
Figure PCTCN2019100638-appb-000048
Word list (ii-b):
Figure PCTCN2019100638-appb-000049
Figure PCTCN2019100638-appb-000050
The Aa-Ba-Ca joint system generated from word list (ii-a) is as follows:
Aa = [e, e, A study, names, Yangshuo, e]
Note: word list (ii-a) contains only one predicate, so the matrix structure of the canonical Aa system degenerates into a single predicate vector.
Ba = {g[PREP,1](u) = of+<travelers, g[PREP,2](u) = by+<the website, g[PREP,3](u) = as+<one, g[PREP,4](u) = of+<destinations, g[PREP,5](u) = in+<the world, g[VBN](u) = conducted+<e}
Ca = {TripAdvisor}
The Ab-Bb-Cb joint system generated from word list (ii-b) is as follows:
Figure PCTCN2019100638-appb-000051
Bb = {g[PREP,1](u) = of+<travelers, g[PREP,2](u) = by+<the website, g[PREP,3](u) = as+<one, g[PREP,4](u) = of+<destinations, g[PREP,5](u) = in+<the world}
Cb = {TripAdvisor}
After the overall gap-insertion operation, the syntactic-rule check finds that x2 = f1 in vector f2 of the Ab-Bb-Cb joint system, that is, vector f1 is the subject clause of f2, while l1 = e in vector f1 and f1 is not enclosed in opening and closing quotation marks. This violates an English syntactic rule mentioned earlier. The Ab-Bb-Cb joint system therefore contains an error and is discarded.
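The rule applied here (a predicate vector used as a subject clause needs either a subordinating connective in its l slot or enclosing quotation marks) can be sketched as a simple filter. The sketch is hypothetical: the function `bare_subject_clauses` and the dictionary layout are assumptions, and the slot values of `system_b` other than x2 = f1 and l1 = e are guessed for illustration, because the actual Ab system is given only in a figure.

```python
E = "e"  # the empty placeholder written as e in the text


def bare_subject_clauses(system, quoted=()):
    """Return the vectors used as subject clauses that have no
    subordinating connective (l == e) and are not quoted."""
    violations = []
    for vector in system.values():
        subject = vector["x"]
        if subject in system:  # the subject slot holds another predicate vector
            clause = system[subject]
            if clause["l"] == E and subject not in quoted:
                violations.append(subject)
    return violations


# Assumed contents for word list (ii-b): only x2 = f1 and l1 = e are stated in the text.
system_b = {
    "f1": {"c": E, "l": E, "x": "A study", "r": "conducted", "y": E, "z": E},
    "f2": {"c": E, "l": E, "x": "f1", "r": "names", "y": "Yangshuo", "z": E},
}

print(bare_subject_clauses(system_b))                 # f1 violates the rule
print(bare_subject_clauses(system_b, quoted={"f1"}))  # quoting f1 would satisfy it
```

A joint system for which this filter returns a non-empty list would be discarded at the syntactic-rule check stage.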
After all steps, the Aa-Ba-Ca joint system shows no error and is retained. The complete syntactic structure of example sentence 7 corresponding to the Aa-Ba-Ca joint system is finally obtained and is expressed as a string as follows: [See Figure 13]
(ROOT(S(NP(NP(DT A)(NN study))(ADJP(PP(IN of)(NP(NNS travelers)))(VP(VBN conducted)(PP(IN by)(NP(NP(DT the)(NN website))(NP(NNP TripAdvisor)))))))(VP(VBZ names)(NP(NNP Yangshuo))(PP(IN as)(NP(NP(CD one))(PP(IN of)(NP(NP(DT the)(JJ top)(CD 10)(NNS destinations))(PP(IN in)(NP(DT the)(NN world))))))))(. .)))
As of the filing date of this patent application (March 22, 2019), both the Berkeley Parser and the Stanford Parser returned incorrect results for this example sentence. The structural ambiguity between a past participle and the simple past form of a predicate verb, exemplified by "conducted" in this sentence, is a common one.
Example 8: That nearly all behavior is learned behavior is a basic assumption that has been put forward by the social scientists.
"That" and "is learned" in this example sentence are structurally ambiguous. For reasons of space, only word lists (ii-a) and (ii-b) are given.
Word list (ii-a):
Figure PCTCN2019100638-appb-000052
Figure PCTCN2019100638-appb-000053
Word list (ii-b):
Figure PCTCN2019100638-appb-000054
The Aa-Ba-Ca joint system generated from word list (ii-a) is as follows:
Figure PCTCN2019100638-appb-000055
Ba = {g[VBN](u) = learned+<e, g[PREP](u) = by+<scientists};
Figure PCTCN2019100638-appb-000056
The Ab-Bb-Cb joint system generated from word list (ii-b) is as follows:
Figure PCTCN2019100638-appb-000057
Bb = {g[PREP](u) = by+<scientists};
Figure PCTCN2019100638-appb-000058
The complete syntactic structure of example sentence 8 obtained from the Aa-Ba-Ca joint system and its corresponding reasonable combined vector is:
(ROOT(S(SBAR(IN That)(S(NP(ADJP(RB nearly)(DT all))(NN behavior))(VP(VBZ is)(NP(VBN learned)(NP(NN behavior))))))(VP(VBZ is)(NP(NP(DT a)(JJ basic)(NN assumption))(SBAR(WHNP(WP that))(S(VP(VBZ has)(VP(VBN been)(VP(VBN put)(ADVP(RB forward))(PP(IN by)(NP(DT the)(JJ social)(NNS scientists))))))))))(. .)))
The syntactic-structure repair step can be used to distinguish and adjust the primary and secondary syntactic roles of each vector in the Ab-Bb-Cb joint system, thereby obtaining the complete syntactic structure corresponding to that system. Distinguishing and adjusting these roles means, specifically, deciding which predicate vector serves as the main clause and which as a subordinate clause, making the corresponding adjustments to the predicate vectors that serve as main and subordinate clauses, and so on.
The complete syntactic structure of example sentence 8 obtained from the Ab-Bb-Cb joint system and its corresponding reasonable combined vector is:
(ROOT(S(SBAR(IN That)(S(NP(ADJP(RB nearly)(DT all))(NN behavior))(VP(VBZ is)(VP(VBN learned)))))(NP(NN behavior))(VP(VBZ is)(NP(NP(DT a)(JJ basic)(NN assumption))(SBAR(WHNP(WP that))(S(VP(VBZ has)(VP(VBN been)(VP(VBN put)(ADVP(RB forward))(PP(IN by)(NP(DT the)(JJ social)(NNS scientists))))))))))(. .)))
The visual form of the complete syntactic structure corresponding to the Aa-Ba-Ca joint system is shown in Figure 37.
The visual form of the complete syntactic structure corresponding to the Ab-Bb-Cb joint system is shown in Figure 38.
Then semantic processing is used to select the best syntactic analysis. Semantic-processing methods include, but are not limited to, semantic analysis based on the λ-calculus, semantic analysis based on semantic fields and semantic networks, semantic analysis based on knowledge graphs, semantic analysis based on semantic graph models, and semantic analysis that computes probabilities for semantic relations and selects the most probable result. These methods generally presuppose that the syntactic structure sufficiently constrains the semantic relations, meaning that the syntactic structure provisionally determines the meaning of each word in the sentence and how the word meanings combine with one another.
For example: under the complete syntactic structure corresponding to the Aa-Ba-Ca joint system, the first "That" in this example sentence is a subordinating conjunction introducing a subject clause, so its semantics is "no meaning"; under the complete syntactic structure corresponding to the Ab-Bb-Cb joint system, the first "That" is a subordinating conjunction introducing a sentence-initial adverbial clause, so its semantics is "because"; under the Aa-Ba-Ca structure, "learned" is a past participle serving as an attributive, so its semantics is the modifier reading "that has been learned"; under the Ab-Bb-Cb structure, "is" and "learned" together serve as the predicate, so the semantics of "is learned" is the passive predicate reading; and so on. It should be noted in particular that, to achieve this effect, a syntax-semantics constraint relation database meeting the above requirements can be purpose-built.
Assume that, on the premise that the syntactic structure sufficiently constrains the semantic relations, probabilities are computed for the semantic relations corresponding to the two complete syntactic structures above and the result with the highest probability is selected. The process is as follows:
The semantic relations of the A_a-B_a-C_a joint system, as constrained by the syntactic structure, are shown in Figure 39.
The semantic relations of the A_b-B_b-C_b joint system, as constrained by the syntactic structure, are shown in Figure 40.
The complete syntactic structure corresponding to the A_a-B_a-C_a joint system, which has the highest semantic-relation probability, is taken as the final syntactic analysis result for this example sentence. The result is shown again in string form below [see Figure 15]:
(ROOT(S(SBAR(IN That)(S(NP(ADJP(RB nearly)(DT all))(NN behavior))(VP(VBZ is)(NP(VBN learned)(NP(NN behavior))))))(VP(VBZ is)(NP(NP(DT a)(JJ basic)(NN assumption))(SBAR(WHNP(WP that))(S(VP(VBZ has)(VP(VBN been)(VP(VBN put)(ADVP(RB forward))(PP(IN by)(NP(DT the)(JJ social)(NNS scientists))))))))))(..)))
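The bracketed result strings in this description follow the familiar labeled-bracket convention, so they can be read back into a nested structure for inspection. The following is only a minimal sketch under that assumption; the names `parse_bracketed` and `leaves` are hypothetical helpers introduced here for illustration and are not part of the scheme of this application.

```python
def parse_bracketed(text):
    """Parse '(LABEL child child ...)' into nested (label, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        if text[pos] != "(":
            raise ValueError(f"expected '(' at offset {pos}")
        pos += 1
        start = pos
        while text[pos] not in " ()":      # a label ends at a space or a bracket
            pos += 1
        label, children = text[start:pos], []
        while True:
            while text[pos].isspace():
                pos += 1
            if text[pos] == ")":           # end of this constituent
                pos += 1
                return (label, children)
            if text[pos] == "(":           # nested constituent
                children.append(node())
            else:                          # a word (leaf)
                start = pos
                while text[pos] not in "()":
                    pos += 1
                children.append(text[start:pos].strip())

    tree = node()
    if text[pos:].strip():
        raise ValueError("trailing text after the root constituent")
    return tree


def leaves(tree):
    """Collect the words of a parsed tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend([child] if isinstance(child, str) else leaves(child))
    return words


# A short illustrative fragment written in the same bracket style.
fragment = "(S(NP(NNP Jack))(VP(VBD met)(NP(DT the)(NN patient))))"
tree = parse_bracketed(fragment)
print(tree[0])        # S
print(leaves(tree))   # ['Jack', 'met', 'the', 'patient']
```

Reading the words back in order is a quick way to confirm that a result string still covers the original sentence.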
Note: for the lexical analysis result that tags That as a structurally ambiguous determiner unit, the computer initially places the determiner unit That and the basic noun unit all behavior in the same language segment, treating That as a modifier of all behavior. That modifying all behavior is an obvious syntactic error, which the computer easily detects and rejects in the subsequent syntactic rule check. Consequently, every word list (ii) that tags That as a structurally ambiguous determiner unit is discarded by the computer.
As of the filing date of this patent application (March 22, 2019), both Berkeley Parser and Stanford Parser gave incorrect results for this example sentence.
Example 9: Jack met the patient the nurse the clinic had hired sent to the doctor.
The word sent in this example sentence is structurally ambiguous. For brevity, only word lists (ii-a) and (ii-b) are given.
Word list (ii-a):
No.         Unit          Unit type
1           Jack          Basic noun unit
2           met           Predicate verb unit
3           the patient   Basic noun unit
4           the nurse     Basic noun unit
5           the clinic    Basic noun unit
6           had hired     Predicate verb unit
7           sent          Predicate verb unit
8           to            Preposition unit
9           the doctor    Basic noun unit
No number   .             Period
Word list (ii-b):
No.         Unit          Unit type
1           Jack          Basic noun unit
2           met           Predicate verb unit
3           the patient   Basic noun unit
4           the nurse     Basic noun unit
5           the clinic    Basic noun unit
6           had hired     Predicate verb unit
7           sent          Past participle unit
8           to            Preposition unit
9           the doctor    Basic noun unit
No number   .             Period
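The two word lists differ only in how unit 7 (sent) is labeled. A minimal sketch of one way to hold such lists in memory and locate the point of ambiguity follows; the `Unit` record and the `make_list` helper are hypothetical names chosen for illustration, not structures defined by this application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Unit:
    text: str
    kind: str
    index: Optional[int]   # None for the unnumbered period


def make_list(sent_kind):
    """Build the word list for Example 9 with the given label for 'sent'."""
    rows = [
        ("Jack", "basic noun"), ("met", "predicate verb"),
        ("the patient", "basic noun"), ("the nurse", "basic noun"),
        ("the clinic", "basic noun"), ("had hired", "predicate verb"),
        ("sent", sent_kind), ("to", "preposition"),
        ("the doctor", "basic noun"),
    ]
    units = [Unit(text, kind, i) for i, (text, kind) in enumerate(rows, start=1)]
    units.append(Unit(".", "period", None))
    return units


list_a = make_list("predicate verb")    # word list (ii-a)
list_b = make_list("past participle")   # word list (ii-b)

# Report every position where the two lists disagree.
diff = [(a.index, a.text, a.kind, b.kind)
        for a, b in zip(list_a, list_b) if a.kind != b.kind]
print(diff)  # [(7, 'sent', 'predicate verb', 'past participle')]
```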
The A_a-B_a-C_a joint system generated from word list (ii-a) is as follows:
Figure PCTCN2019100638-appb-000059
B_a={g[PREP](u)=to+<the doctor};
Figure PCTCN2019100638-appb-000060
The A_b-B_b-C_b joint system generated from word list (ii-b) is as follows:
Figure PCTCN2019100638-appb-000061
B_b={g[VBN](u)=sent+<e, g[PREP](u)=to+<the doctor}; C_b={the nurse}
When the remaining-noun check is performed on the A_b-B_b-C_b joint system retained after the overall gap-insertion operation, a method combining probability with syntactic rules can be used. The check finds that the remaining noun the nurse in the C_b system is not an independent noun that can serve as an appositive, not an independent noun that can be used in an absolute construction with a non-finite verb, not an independent noun of the kind commonly used in article titles together with a colon, and so on. The remaining noun the nurse in the C_b system is therefore unreasonable. The A_b-B_b-C_b joint system contains an error and is discarded.
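The decision just described can be pictured as a small rule check. The sketch below is a simplification under stated assumptions: the role names in `LICENSED_ROLES` paraphrase the three cases named above, and `possible_roles` is a hypothetical stand-in for whatever the probability-plus-rules judgement would report for each leftover noun.

```python
# Roles that, per the passage above, can justify a noun left over in the C system.
LICENSED_ROLES = {"appositive", "absolute_construction", "title_with_colon"}


def remaining_nouns_ok(remaining, possible_roles):
    """Return True only if every leftover noun can play at least one licensed role.

    possible_roles maps a noun to the set of roles judged possible for it
    (hypothetical input; the application leaves the judgement method open).
    """
    return all(possible_roles.get(noun, set()) & LICENSED_ROLES
               for noun in remaining)


# C_b = {the nurse}: no licensed role was found, so the joint system is rejected.
print(remaining_nouns_ok(["the nurse"], {"the nurse": set()}))  # False
# A system with no leftover nouns passes this check trivially.
print(remaining_nouns_ok([], {}))  # True
```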
The overall gap-insertion process for the complete syntactic structure corresponding to the A_a-B_a-C_a joint system is shown in Figure 41.
After all the steps, the A_a-B_a-C_a joint system is found to contain no errors and is retained. The complete syntactic structure of Example 9 corresponding to the A_a-B_a-C_a joint system is thus obtained. In string form, the result is as follows [see Figure 17]:
(ROOT(S(NP(NNP Jack))(VP(VBD met)(NP(NP(DT the)(NN patient))(SBAR(S(NP(NP(DT the)(NN nurse))(SBAR(S(NP(DT the)(NN clinic))(VP(VBD had)(VP(VBN hired))))))(VP(VBD sent)(PP(TO to)(NP(DT the)(NN doctor))))))))(..)))
Example 9 was mentioned in the first half of the description. As of the filing date of this patent application (March 22, 2019), both Berkeley Parser and Stanford Parser gave incorrect results for this example sentence. The structural ambiguity in this sentence between a past participle and the simple past form of a predicate verb (sent in this example) is a common kind of structural ambiguity.
For brevity, the following example sentences are explained only briefly:
Example 10: Jack met the boy the nurse the doctor the clinic had hired sent to the ward introduced to the patient.
The correct final syntactic analysis result for this example sentence can be obtained through the following A_1-B_1-C_1 joint system.
Example 10 was mentioned in the first half of the description. The computer parsing process for Example 10 is similar to that for Example 9.
Figure PCTCN2019100638-appb-000062
B_1={g[PREP,1](u)=to+<the ward, g[PREP,2](u)=to+<the patient};
Figure PCTCN2019100638-appb-000063
The overall gap-insertion process for the complete syntactic structure corresponding to the A_1-B_1-C_1 joint system is shown in Figure 42.
After all the steps, the A_1-B_1-C_1 joint system is found to contain no errors and is retained. The complete syntactic structure of Example 10 corresponding to the A_1-B_1-C_1 joint system is thus obtained. In string form, the result is as follows [see Figure 19]:
(ROOT(S(NP(NNP Jack))(VP(VBD met)(NP(NP(DT the)(NN boy))(SBAR(S(NP(NP(DT the)(NN nurse))(SBAR(S(NP(NP(DT the)(NN doctor))(SBAR(S(NP(DT the)(NN clinic))(VP(VBD had)(VP(VBN hired))))))(VP(VBD sent)(PP(TO to)(NP(DT the)(NN ward)))))))(VP(VBD introduced)(PP(TO to)(NP(DT the)(NN patient))))))))(..)))
As of the filing date of this patent application (March 22, 2019), both Berkeley Parser and Stanford Parser gave incorrect results for this example sentence.
Example 11: This is the malt the rat the cat the dog worried killed ate.
The correct final syntactic analysis result for this example sentence can be obtained through the following A_1-B_1-C_1 joint system.
Example 11 was mentioned in the first half of the description. The computer parsing process for Example 11 is similar to that for Example 10.
Figure PCTCN2019100638-appb-000064
Figure PCTCN2019100638-appb-000065
The overall gap-insertion process for the complete syntactic structure corresponding to the A_1-B_1-C_1 joint system is shown in Figure 43.
After all the steps, the A_1-B_1-C_1 joint system is found to contain no errors and is retained. The complete syntactic structure of Example 11 corresponding to the A_1-B_1-C_1 joint system is thus obtained. In string form, the result is as follows [see Figure 21]:
(ROOT(S(NP(PRP This))(VP(VBZ is)(NP(NP(DT the)(NN malt))(SBAR(S(NP(NP(DT the)(NN rat))(SBAR(S(NP(NP(DT the)(NN cat))(SBAR(S(NP(DT the)(NN dog))(VP(VBD worried)))))(VP(VBD killed)))))(VP(VBD ate))))))(..)))
As of the filing date of this patent application (March 22, 2019), both Berkeley Parser and Stanford Parser gave incorrect results for this example sentence.
Example 12: Part of the reason Charles Dickens loved his own novel was that it was rather closely modeled on his own life.
The correct final syntactic analysis result for this example sentence can be obtained through the following A_1-B_1-C_1 joint system.
Example 12 was mentioned in the first half of the description. Another example sentence, "Part of the reason why Charles Dickens loved his own novel was that it was rather closely modeled on his own life.", has a computer parsing process and result similar to those of Example 12.
Figure PCTCN2019100638-appb-000066
B_1={g[PREP](u)=on+<life};
Figure PCTCN2019100638-appb-000067
The complete syntactic structure of Example 12 corresponding to the A_1-B_1-C_1 joint system is thus obtained. In string form, the result is as follows [see Figure 23]:
(ROOT(S(NP(NP(NN Part))(PP(IN of)(NP(NP(DT the)(NN reason))(SBAR(S(NP(NNP Charles)(NNP Dickens))(VP(VBD loved)(NP(PRP$ his)(JJ own)(NN novel))))))))(VP(VBD was)(SBAR(IN that)(S(NP(PRP it))(VP(VBD was)(VP(ADVP(RB rather)(RB closely))(VBN modeled)(PP(IN on)(NP(PRP$ his)(JJ own)(NN life))))))))(..)))
As of the filing date of this patent application (March 22, 2019), both Berkeley Parser and Stanford Parser gave incorrect results for this example sentence.
Example 13: He said he wanted to improve the vineyard to allow visitors to enjoy local food and that in this way, he could make more money.
The correct final syntactic analysis result for this example sentence can be obtained through the following A_1-B_1-C_1 joint system. This example sentence contains two coordinated object clauses.
Figure PCTCN2019100638-appb-000068
B_1={g[To VB,1](u,v)=to improve+<the vineyard+<e, g[To VB,2](u,v)=to allow+<visitors+<e, g[To VB,3](u,v)=to enjoy+<local food+<e, g[PREP](u)=in+<this way};
Figure PCTCN2019100638-appb-000069
Example 14: I will buy the car which my father needs and the bike which my brother wants.
The correct final syntactic analysis result for this example sentence can be obtained through the following A_1-B_1-C_1 joint system.
Syntactic structure repair is another step, carried out alongside the syntactic rule check in the scheme of this application. Using a method combining probability with syntactic rules, or a dependency analysis method, it recovers syntactic information that was missed, such as complex inverted sentence patterns, long-distance verb-object relations, long-distance coordinated constituents, adjectives serving as predicatives, prepositional phrases serving as predicatives, infinitive structures serving as object complements, gerund/present-participle structures serving as object complements, past-participle structures serving as object complements, and prepositional phrases serving as object complements, and it uses this information to repair defects in the syntactic structure obtained earlier. For example, in this sentence the car and the bike are coordinated objects of will buy, and they are separated by the attributive clause which my father needs. Syntactic structure repair merges the car and the bike into a single object element. The two attributive clauses which my father needs and which my brother wants, inserted after the car and the bike respectively, are treated as separate insertions on the two basic noun units inside that single object element. In addition, the and in this example sentence belongs to the class "coordinating connective units not used to join sentences".
Figure PCTCN2019100638-appb-000070
Figure PCTCN2019100638-appb-000071
C_1={the bike}; after syntactic structure repair this yields: will buy the car and the bike.
Example 15: Determining where we are in relation to our surroundings remains an essential skill for our survival.
The correct final syntactic analysis result for this example sentence can be obtained through the following A_1-B_1-C_1 joint system.
The phrase in relation to in this example sentence is structurally ambiguous: it can be understood as a single compound preposition, or as a whole formed jointly by the prepositional phrase in relation and the preposition to. In the A_1-B_1-C_1 joint system below, in relation to is treated as a compound preposition. The compound prepositional phrase in relation to our surroundings serves as the predicative of the clause and is a core constituent of the sentence. For ease of computer processing, however, under the scheme of this application it is not entered in the matrix; it can be restored as the predicative of the clause in the later syntactic structure repair step.
Figure PCTCN2019100638-appb-000072
B_1={g[VBG](u,v)=Determining+<f_1+<e, g[PREP,1](u)=in relation to+<our surroundings, g[PREP,2](u)=for+<our survival};
Figure PCTCN2019100638-appb-000073
Example 16: Tom washed and polished his car, after he gave his brother a present.
The correct final syntactic analysis result for this example sentence can be obtained through the following A_1-B_1-C_1 joint system.
In this example sentence, washed and polished is a unit of adjacent coordinated predicate verbs, and washed and polished together form one predicate element; gave is a verb that can take a double object, and such verbs can be compiled in advance by dictionary lookup or by statistics.
Figure PCTCN2019100638-appb-000074
Figure PCTCN2019100638-appb-000075
Example 17: That men the nurse the doctor the clinic had hired sent to the ward introduced to the cleaners didn't bother the patients wasn't remarked upon by the press.
The correct final syntactic analysis result for this example sentence can be obtained through the following A_1-B_1-C_1 joint system.
Example 17 was mentioned in the first half of the description. Example 17 conforms to the Q model mentioned earlier; the verification is omitted.
Figure PCTCN2019100638-appb-000076
B_1={g[PREP,1](u)=to+<the ward, g[PREP,2](u)=to+<the cleaners, g[PREP,3](u)=by+<the press};
Figure PCTCN2019100638-appb-000077
The overall gap-insertion process for the complete syntactic structure corresponding to the A_1-B_1-C_1 joint system is shown in Figure 44.
After all the steps, the A_1-B_1-C_1 joint system is found to contain no errors and is retained. The complete syntactic structure of Example 17 corresponding to the A_1-B_1-C_1 joint system is thus obtained. In string form, the result is as follows [see Figure 45]:
(ROOT(S(SBAR(IN That)(S(NP(NP(NNS men))(SBAR(S(NP(NP(DT the)(NN nurse))(SBAR(S(NP(NP(DT the)(NN doctor))(SBAR(S(NP(DT the)(NN clinic))(VP(VBD had)(VP(VBN hired))))))(VP(VBD sent)(PP(TO to)(NP(DT the)(NN ward)))))))(VP(VBD introduced)(PP(TO to)(NP(DT the)(NNS cleaners)))))))(VP(VBD did)(RB n't)(VP(VB bother)(NP(DT the)(NNS patients))))))(VP(VBD was)(RB n't)(VP(VBN remarked)(ADVP(RP upon)(PP(IN by)(NP(DT the)(NN press))))))(..)))
As of the filing date of this patent application (March 22, 2019), both Berkeley Parser and Stanford Parser gave incorrect results for this example sentence [see Figure 46].
Example 18: That men the cleaner introduced to the nurses the doctor the clinic had hired sent to the ward didn't bother the patients wasn't remarked upon by the press.
The correct final syntactic analysis result for this example sentence can be obtained through the following A_1-B_1-C_1 joint system.
Example 18 was mentioned in the first half of the description. Example 18 conforms to the Q model mentioned earlier; the verification is omitted.
Figure PCTCN2019100638-appb-000078
B_1={g[PREP,1](u)=to+<the nurses, g[PREP,2](u)=to+<the ward, g[PREP,3](u)=by+<the press};
Figure PCTCN2019100638-appb-000079
The overall gap-insertion process for the complete syntactic structure corresponding to the A_1-B_1-C_1 joint system is shown in Figure 47.
After all the steps, the A_1-B_1-C_1 joint system is found to contain no errors and is retained. The complete syntactic structure of Example 18 corresponding to the A_1-B_1-C_1 joint system is thus obtained. In string form, the result is as follows [see Figure 48]:
(ROOT(S(SBAR(IN That)(S(NP(NP(NNS men))(SBAR(S(NP(DT the)(NN cleaner))(VP(VBD introduced)(PP(TO to)(NP(NP(DT the)(NNS nurses))(SBAR(S(NP(NP(DT the)(NN doctor))(SBAR(S(NP(DT the)(NN clinic))(VP(VBD had)(VP(VBN hired))))))(VP(VBD sent)(PP(TO to)(NP(DT the)(NN ward))))))))))))(VP(VBD did)(RB n't)(VP(VB bother)(NP(DT the)(NNS patients))))))(VP(VBD was)(RB n't)(VP(VBN remarked)(ADVP(RP upon)(PP(IN by)(NP(DT the)(NN press))))))(..)))
As of the filing date of this patent application (March 22, 2019), both Berkeley Parser and Stanford Parser gave incorrect results for this example sentence [see Figure 49].
Summary of the invention:
The scheme of this patent application aims to solve specific technical problems in computer natural language processing. It organically unifies the lexical analysis, syntactic analysis, and semantic analysis performed by the computer, so that the three cross-reference, constrain, and correct one another. In this scheme, the inventors establish a new mathematical model, suitable for computer processing, for describing sentences. The model has a clear and precise structure and strong expressive power and practicality; every formula it contains is of finite length, which conforms to the natural laws of mathematics and computer science and helps improve the accuracy of computer processing of natural language. On this basis, the inventors give a method for analyzing the syntactic structure of sentences by computer. The method conforms to natural laws, has a wide range of application and high accuracy, and requires a very large amount of computation, so distributed computing is recommended. In particular, every sentence that appears in the description of this patent application can be given a correct syntactic analysis result using the scheme of this application. The scheme can be divided into the following four computation regions:
The first computation region: the α region
In the α region, the sentence data structure to be parsed is read and preprocessed, and the preprocessed sentence data structure is then read. For a sentence to be parsed that contains no predicate verb unit, the sentence is instead analyzed using a method combining probability with syntactic rules or a dependency analysis method, and that analysis result is taken as the computer's final analysis result. For a sentence to be parsed that contains a predicate verb unit, the relevant word lists are generated, together with the predicate vectors, auxiliary vectors, and remaining-noun vectors corresponding to each word list, and from these the A-B-C joint system corresponding to that word list is generated.
Note: for each word list (i), a method combining probability with syntactic rules, or a dependency analysis method, is used to detect wh-questions, elliptical sentences, partially inverted sentences, and the like, and to normalize the form of their predicates for subsequent operations.
For example: When did you leave the house?
Its form after conversion to declarative order is: When+<you+<(did)leave+<the house+<(.)
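The A-B-C joint system produced in the α region can be pictured as three collections: predicate vectors (A), auxiliary vectors (B), and remaining nouns (C). The sketch below is only an illustration. The six fields of `PredicateVector` mirror the six elements listed in step S3 of claim 1, but all class and field names are hypothetical, and the A system of Example 9 is left empty because it appears only in a figure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PredicateVector:
    coord_introducer: Optional[str] = None    # coordinating introducer element
    subord_introducer: Optional[str] = None   # subordinating introducer element
    subject: Optional[str] = None
    predicate: str = ""
    object1: Optional[str] = None             # first-position object element
    object2: Optional[str] = None             # second-position object element


@dataclass
class AuxiliaryVector:
    kind: str          # e.g. "PREP", "VBN", "To VB,1"
    head: str
    args: List[str]    # "e" marks an empty slot, as in the description

    def render(self):
        """Write the vector in the g[...](u,...)=head+<arg notation used above."""
        variables = "uvw"[:len(self.args)]
        return (f"g[{self.kind}]({','.join(variables)})="
                + "+<".join([self.head, *self.args]))


@dataclass
class JointSystem:
    A: List[PredicateVector] = field(default_factory=list)
    B: List[AuxiliaryVector] = field(default_factory=list)
    C: List[str] = field(default_factory=list)


# B_b and C_b of Example 9, copied from the description; A_b is not reproduced here.
system_b = JointSystem(
    B=[AuxiliaryVector("VBN", "sent", ["e"]),
       AuxiliaryVector("PREP", "to", ["the doctor"])],
    C=["the nurse"],
)
print([v.render() for v in system_b.B])
# ['g[VBN](u)=sent+<e', 'g[PREP](u)=to+<the doctor']
```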
The second computation region: the β region
In the β region, every A-B-C joint system generated by the α region undergoes the overall gap-insertion operation, the syntactic rule check and syntactic structure repair, and the remaining-noun check. This region makes full use of natural laws and, through screening and checking, produces the approximate syntactic structure of the sentence to be parsed, that is, the basic framework of its syntactic structure.
The multiplication principle of combinatorics is then applied to exhaust all A-B-C joint systems corresponding to each word list generated by the α region. Further, all insertion schemes for each A-B-C joint system are exhausted by permuting and combining all the relevant vectors in that system. Further still, the β-region computation is executed repeatedly for each insertion scheme until all gaps and all combined vectors involved in that scheme have been exhausted.
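The multiplication principle mentioned here says that independent choices multiply. As a small, hedged illustration, the sketch below applies the same principle one level earlier, to the labeling alternatives of the units of Example 9; `enumerate_labelings` is a hypothetical helper, and the actual enumeration of joint systems and insertion schemes in the application is considerably more involved.

```python
from itertools import product


def enumerate_labelings(units):
    """units: list of (text, [candidate labels]). Yield every full labeling."""
    texts = [text for text, _ in units]
    for labels in product(*(candidates for _, candidates in units)):
        yield list(zip(texts, labels))


example9 = [
    ("Jack", ["basic noun"]), ("met", ["predicate verb"]),
    ("the patient", ["basic noun"]), ("the nurse", ["basic noun"]),
    ("the clinic", ["basic noun"]), ("had hired", ["predicate verb"]),
    ("sent", ["predicate verb", "past participle"]),   # the ambiguous unit
    ("to", ["preposition"]), ("the doctor", ["basic noun"]),
]

labelings = list(enumerate_labelings(example9))
# One unit has 2 alternatives and every other unit has 1, so 2 labelings result.
print(len(labelings))  # 2
```

With two ambiguous units of two alternatives each, the same call would yield four labelings, which is why the description recommends distributed computing for longer sentences.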
β区域的所有环节和算法,参见说明书的附图50。其中,A、B、C三个环节构成A-B-C联合系统;D=ψ(A,B,C)是整体插空和排除自然数逆序的算法;E={σ(1),σ(2),……,σ(m)}是句法规则检查和句法结构修补所需的各个分项的算法;F=Φ(NP)是剩余名词检查的算法;G=ε(↑↓)是前述各种穷尽和前述反复执行β区域的算法。For all the links and algorithms of the β area, please refer to Figure 50 of the specification. Among them, the three links of A, B, and C constitute the ABC joint system; D=ψ(A,B,C) is an algorithm for overall insertion and elimination of the inverse order of natural numbers; E={σ(1),σ(2),... …,Σ(m)} is the algorithm for each sub-item required for syntactic rule checking and syntactic structure repair; F=Φ(NP) is the algorithm for checking remaining nouns; G=ε(↑↓) is the aforementioned exhaustive sum The aforementioned algorithm for the β region is repeatedly executed.
判断剩余名词是否合理,是本申请方案中控制计算机句法分析过程的技术平衡点。β区域保留下来的A-B-C联合系统,刻画了待解析语句的大致句法结构,即刻画了待解析语句的句法结构的基本框架。Judging whether the remaining nouns are reasonable or not is the technical balance point for controlling the computer syntax analysis process in the proposal of this application. The A-B-C joint system preserved in the β area depicts the general syntactic structure of the sentence to be parsed, and immediately depicts the basic framework of the syntactic structure of the sentence to be parsed.
第3个计算区域:γ区域The third calculation area: γ area
在γ区域中,以β区域保留下来的若干个A-B-C联合系统所刻画的待解析语句的句法结构 的基本框架作为标准,在采用概率结合句法规则的方法或依存分析方法对待解析语句进行分析而获得的数量充足的完整句法结构中,找出符合前述标准的且最合适的完整句法结构。In the γ region, the basic framework of the syntactic structure of the sentence to be parsed described by the several ABC joint systems retained in the β region is used as the standard, and obtained by analyzing the sentence to be parsed using the method of probability combined with syntactic rules or the dependency analysis method Among the sufficient number of complete syntactic structures, find the most suitable complete syntactic structure that meets the aforementioned criteria.
第4个计算区域:δ区域The fourth calculation area: δ area
在δ区域中,以γ区域生成的待解析语句的若干个完整句法结构为基础,采用语义处理的方法,找出经过前述的句法结构约束的最合适的语义关系,进而将该语义关系对应的前述的完整句法结构作为最终的句法分析结果,并输出该结果。所述语义处理的方法,通常需要以句法结构对语义关系的充分约束作为前提。所述以句法结构对语义关系的充分约束作为前提,是指由句法结构来初步决定语句中的每一个词语的含义以及各个词语含义之间的相互搭配关系。In the δ region, based on several complete syntactic structures of the sentence to be parsed generated in the γ region, the method of semantic processing is adopted to find the most suitable semantic relationship subject to the aforementioned syntactic structure constraints, and then the semantic relationship corresponds to The foregoing complete syntactic structure is used as the final syntactic analysis result, and the result is output. The semantic processing method generally needs to be based on the sufficient restriction of the syntactic structure on the semantic relationship. The premise that the syntactic structure fully restricts the semantic relationship means that the syntactic structure preliminarily determines the meaning of each word in the sentence and the mutual collocation relationship between the meanings of the words.
Note: The lowercase Greek letters from α to δ and the uppercase English letters from A to G used in the four calculation regions above are sequence markers; they indicate the order of operation of the calculation regions, the stages, and the algorithms.
The above are only preferred embodiments of the present invention and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (5)

  1. A method for syntactic parsing of natural language, comprising:
    S1. Read the data structure of the sentence to be parsed, and perform a preprocessing operation on it;
    S2. For each word list (i), read the preprocessed data structure of the sentence to be parsed: if a predicate verb unit exists in the sentence to be parsed, generate a word list (ii); if no predicate verb unit exists in the sentence to be parsed, instead analyze the sentence with a method that combines probability with syntactic rules, or with a dependency analysis method, take the result of that analysis as the computer's final analysis result, and then clear the corresponding word list (i) without generating a word list (ii);
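The branch in S2 can be sketched as follows. This is only an illustrative sketch: the `Unit` record, the `"predicate_verb"` label, the `fallback_parse` callable, and the choice to model word list (ii) as a plain copy of word list (i) are all assumptions made for the example, since the claim does not fix these representations.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Unit:
    number: int  # unit number within the sentence (hypothetical field)
    kind: str    # hypothetical label, e.g. "predicate_verb" or "basic_noun"
    text: str


def step_s2(word_list_i: List[Unit],
            fallback_parse: Callable[[List[Unit]], str]
            ) -> Tuple[Optional[List[Unit]], Optional[str]]:
    """Return (word_list_ii, final_result); exactly one of the two is None."""
    if any(u.kind == "predicate_verb" for u in word_list_i):
        # A predicate verb unit exists: continue with a word list (ii).
        # Modelling it as a copy is an assumption of this sketch.
        return list(word_list_i), None
    # No predicate verb unit: the fallback analysis result is final,
    # and no word list (ii) is generated.
    return None, fallback_parse(word_list_i)
```

Here `fallback_parse` stands in for the probability-plus-rules or dependency analysis named in S2; any parser with that role could be passed in.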
    S3. For each predicate element, generate a corresponding predicate vector; the predicate vector comprises a coordinating introducer element, a subordinating introducer element, a subject element, a predicate element, a first-position object element, and a second-position object element;
    wherein the predicate element is the corresponding predicate verb unit, or the corresponding adjacent coordinated predicate verb combination unit; the predicate element number is the number of the corresponding predicate verb unit, or the number of the corresponding adjacent coordinated predicate verb combination unit;
    wherein a possible value of the coordinating introducer element is one of the coordinating connective units that are used to connect sentences and whose numbers are smaller than the corresponding predicate element number, or the empty unit; a coordinating connective unit that is not used to connect sentences cannot be a possible value of the coordinating introducer element;
    wherein a possible value of the subordinating introducer element is one of the subordinating connective units whose numbers are smaller than the corresponding predicate element number, or one of the adjacent coordinated subordinating connective combination units whose numbers are smaller than the corresponding predicate element number, or one of the interrogative word units whose numbers are smaller than the corresponding predicate element number, or one of the adjacent coordinated interrogative word combination units whose numbers are smaller than the corresponding predicate element number, or the empty unit;
    wherein a possible value of the subject element is one of the basic noun units whose numbers are smaller than the corresponding predicate element number, or one of the adjacent coordinated basic noun combination units whose numbers are smaller than the corresponding predicate element number, or one of the infinitive vectors corresponding to infinitive elements whose numbers are smaller than the corresponding predicate element number, or one of the gerund-present participle vectors corresponding to gerund-present participle elements whose numbers are smaller than the corresponding predicate element number, or one of the predicate vectors corresponding to predicate elements whose numbers are smaller than the corresponding predicate element number, or the empty unit;
    wherein a possible value of the first-position object element is one of the basic noun units whose numbers are greater than the corresponding predicate element number and smaller than the number of the first predicate element appearing after said predicate element, or one of the adjacent coordinated basic noun combination units whose numbers are greater than the corresponding predicate element number and smaller than the number of the first predicate element appearing after said predicate element, or one of the infinitive vectors corresponding to infinitive elements whose numbers are greater than the corresponding predicate element number and smaller than the number of the first predicate element appearing after said predicate element, or one of the gerund-present participle vectors corresponding to gerund-present participle elements whose numbers are greater than the corresponding predicate element number and smaller than the number of the first predicate element appearing after said predicate element, or one of the predicate vectors corresponding to predicate elements whose numbers are greater than the corresponding predicate element number, or the empty unit; a predicative component of the predicate element that meets the foregoing requirements is also treated as a first-position object element;
    wherein, if the corresponding predicate element is a unit formed by a verb that can take a double object or by a verb that can take an object plus an object complement, and the corresponding first-position object element is a basic noun unit or an adjacent coordinated basic noun combination unit, then a possible value of the second-position object element is one of the basic noun units whose numbers are greater than the corresponding first-position object element number and smaller than the number of the first predicate element appearing after said predicate element, or one of the adjacent coordinated basic noun combination units whose numbers are greater than the corresponding first-position object element number and smaller than the number of the first predicate element appearing after said predicate element, or one of the predicate vectors corresponding to predicate elements whose numbers are greater than the corresponding predicate element number, or the empty unit; if the corresponding predicate element is a unit formed by a verb that can take a double object or by a verb that can take an object plus an object complement, and the corresponding first-position object element is neither a basic noun unit nor an adjacent coordinated basic noun combination unit, then the value of the second-position object element is the empty unit; if the corresponding predicate element is a unit formed by a verb that can take neither a double object nor an object plus an object complement, then the value of the second-position object element is the empty unit; wherein the verbs that can take a double object or an object plus an object complement, and the verbs that can take neither, may be compiled and given in advance by consulting a dictionary or by statistical means; delimiting these two classes of verbs helps reduce computational complexity;
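The numbering window that S3 uses for object candidates can be illustrated with a short sketch. It covers only the noun-type candidates of the first-position object element (basic noun units and adjacent coordinated basic noun combination units) plus the empty unit; the infinitive, gerund-present participle, and predicate vector candidates are left out. The `Unit` record and the `kind` labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Unit:
    number: int
    kind: str  # hypothetical labels: "predicate", "basic_noun", "noun_group"


NOUN_KINDS = {"basic_noun", "noun_group"}


def first_object_noun_candidates(units: List[Unit],
                                 predicate_no: int) -> List[Optional[Unit]]:
    """Noun-type candidates for the first-position object of one predicate.

    A candidate's number must be greater than the predicate's number and
    smaller than the number of the next predicate element, if there is one.
    """
    later = [u.number for u in units
             if u.kind == "predicate" and u.number > predicate_no]
    upper = min(later) if later else float("inf")
    nouns = [u for u in units
             if u.kind in NOUN_KINDS and predicate_no < u.number < upper]
    return nouns + [None]  # None stands for the empty unit
```

The second-position object window could be sketched the same way by replacing the lower bound with the number of the chosen first-position object element.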
    S4. For each infinitive element, generate a corresponding infinitive vector; for each gerund-present participle element, generate a corresponding gerund-present participle vector; for each past participle element, generate a corresponding past participle vector; for each preposition element, generate a corresponding preposition vector; from the possible values of the infinitive element, the infinitive first-position object element, and the infinitive second-position object element, obtain all possible values of the infinitive vector corresponding to each infinitive element; from the possible values of the gerund-present participle element, the gerund-present participle first-position object element, and the gerund-present participle second-position object element, obtain all possible values of the gerund-present participle vector corresponding to each gerund-present participle element; from the possible values of the past participle element and the past participle object element, obtain all possible values of the past participle vector corresponding to each past participle element; from the possible values of the preposition element and the preposition object element, obtain all possible values of the preposition vector corresponding to each preposition element;
    wherein the infinitive vector comprises an infinitive element, an infinitive first-position object element, and an infinitive second-position object element;
    the infinitive element is the corresponding infinitive verb unit, or the corresponding adjacent coordinated infinitive verb combination unit; the infinitive element number is the number of the corresponding infinitive verb unit, or the number of the corresponding adjacent coordinated infinitive verb combination unit;
    a possible value of the infinitive first-position object element is one of the basic noun units whose numbers are greater than the corresponding infinitive element number and smaller than the number of the first predicate element appearing after said infinitive element, or one of the adjacent coordinated basic noun combination units whose numbers are greater than the corresponding infinitive element number and smaller than the number of the first predicate element appearing after said infinitive element, or one of the infinitive vectors corresponding to infinitive elements whose numbers are greater than the corresponding infinitive element number and smaller than the number of the first predicate element appearing after said infinitive element, or one of the gerund-present participle vectors corresponding to gerund-present participle elements whose numbers are greater than the corresponding infinitive element number and smaller than the number of the first predicate element appearing after said infinitive element, or one of the predicate vectors corresponding to predicate elements whose numbers are greater than the corresponding infinitive element number, or the empty unit; a predicative component of the infinitive element that meets the foregoing requirements is also treated as an infinitive first-position object element;
    if the corresponding infinitive element is a unit formed by a verb that can take a double object or by a verb that can take an object plus an object complement, and the corresponding infinitive first-position object element is a basic noun unit or an adjacent coordinated basic noun combination unit, then a possible value of the infinitive second-position object element is one of the basic noun units whose numbers are greater than the corresponding infinitive first-position object element number and smaller than the number of the first predicate element appearing after said infinitive element, or one of the adjacent coordinated basic noun combination units whose numbers are greater than the corresponding infinitive first-position object element number and smaller than the number of the first predicate element appearing after said infinitive element, or one of the predicate vectors corresponding to predicate elements whose numbers are greater than the corresponding infinitive element number, or the empty unit; if the corresponding infinitive element is a unit formed by a verb that can take a double object or by a verb that can take an object plus an object complement, and the corresponding infinitive first-position object element is neither a basic noun unit nor an adjacent coordinated basic noun combination unit, then the value of the infinitive second-position object element is the empty unit; if the corresponding infinitive element is a unit formed by a verb that can take neither a double object nor an object plus an object complement, then the value of the infinitive second-position object element is the empty unit; wherein the verbs that can take a double object or an object plus an object complement, and the verbs that can take neither, may be compiled and given in advance by consulting a dictionary or by statistical means; delimiting these two classes of verbs helps reduce computational complexity;
    wherein the gerund-present participle vector comprises a gerund-present participle element, a gerund-present participle first-position object element, and a gerund-present participle second-position object element;
    the gerund-present participle element is the corresponding gerund-present participle unit, or the corresponding adjacent coordinated gerund-present participle combination unit; the gerund-present participle element number is the number of the corresponding gerund-present participle unit, or the number of the corresponding adjacent coordinated gerund-present participle combination unit;
    a possible value of the gerund-present participle first-position object element is one of the basic noun units whose numbers are greater than the corresponding gerund-present participle element number and smaller than the number of the first predicate element appearing after said gerund-present participle element, or one of the adjacent coordinated basic noun combination units whose numbers are greater than the corresponding gerund-present participle element number and smaller than the number of the first predicate element appearing after said gerund-present participle element, or one of the infinitive vectors corresponding to infinitive elements whose numbers are greater than the corresponding gerund-present participle element number and smaller than the number of the first predicate element appearing after said gerund-present participle element, or one of the gerund-present participle vectors corresponding to gerund-present participle elements whose numbers are greater than the corresponding gerund-present participle element number and smaller than the number of the first predicate element appearing after said gerund-present participle element, or one of the predicate vectors corresponding to predicate elements whose numbers are greater than the corresponding gerund-present participle element number, or the empty unit; a predicative component of the gerund-present participle element that meets the foregoing requirements is also treated as a gerund-present participle first-position object element;
    if the corresponding gerund-present participle element is a unit formed by a verb that can take a double object or by a verb that can take an object plus an object complement, and the corresponding gerund-present participle first-position object element is a basic noun unit or an adjacent coordinated basic noun combination unit, then a possible value of the gerund-present participle second-position object element is one of the basic noun units whose numbers are greater than the corresponding gerund-present participle first-position object element number and smaller than the number of the first predicate element appearing after said gerund-present participle element, or one of the adjacent coordinated basic noun combination units whose numbers are greater than the corresponding gerund-present participle first-position object element number and smaller than the number of the first predicate element appearing after said gerund-present participle element, or one of the predicate vectors corresponding to predicate elements whose numbers are greater than the corresponding gerund-present participle element number, or the empty unit; if the corresponding gerund-present participle element is a unit formed by a verb that can take a double object or by a verb that can take an object plus an object complement, and the corresponding gerund-present participle first-position object element is neither a basic noun unit nor an adjacent coordinated basic noun combination unit, then the value of the gerund-present participle second-position object element is the empty unit; if the corresponding gerund-present participle element is a unit formed by a verb that can take neither a double object nor an object plus an object complement, then the value of the gerund-present participle second-position object element is the empty unit; wherein the verbs that can take a double object or an object plus an object complement, and the verbs that can take neither, may be compiled and given in advance by consulting a dictionary or by statistical means; delimiting these two classes of verbs helps reduce computational complexity;
    wherein the past participle vector comprises a past participle element and a past participle object element;
    the past participle element is the corresponding past participle unit, or the corresponding adjacent coordinated past participle combination unit; the past participle element number is the number of the corresponding past participle unit, or the number of the corresponding adjacent coordinated past participle combination unit;
    if the corresponding past participle element is a unit formed by a verb that can take a double object or by a verb that can take an object plus an object complement, then a possible value of the past participle object element is one of the basic noun units whose numbers are greater than the corresponding past participle element number and smaller than the number of the first predicate element appearing after said past participle element, or one of the adjacent coordinated basic noun combination units whose numbers are greater than the corresponding past participle element number and smaller than the number of the first predicate element appearing after said past participle element, or one of the predicate vectors corresponding to predicate elements whose numbers are greater than the corresponding past participle element number, or the empty unit; if the corresponding past participle element is a unit formed by a verb that can take neither a double object nor an object plus an object complement, then the value of the past participle object element is the empty unit; wherein the verbs that can take a double object or an object plus an object complement, and the verbs that can take neither, may be compiled and given in advance by consulting a dictionary or by statistical means; delimiting these two classes of verbs helps reduce computational complexity;
    wherein the preposition vector comprises a preposition element and a preposition object element;
    the preposition element is the corresponding preposition unit, or the corresponding adjacent coordinated preposition combination unit; the preposition element number is the number of the corresponding preposition unit, or the number of the corresponding adjacent coordinated preposition combination unit;
    a possible value of the preposition object element is the first basic noun unit that appears after said preposition element and whose number is greater than the corresponding preposition element number, or the first adjacent coordinated basic noun combination unit that appears after said preposition element and whose number is greater than the corresponding preposition element number, or the first gerund-present participle vector that appears after said preposition element and whose number is greater than the corresponding preposition element number, or the first infinitive vector that appears after said preposition element and whose number is greater than the corresponding preposition element number, or the preposition vector corresponding to the preposition element whose number is greater than the corresponding preposition element number and immediately follows it in numerical order, or one of the predicate vectors corresponding to predicate elements whose numbers are greater than the corresponding preposition element number, or the empty unit;
    S5. Infinitive vectors, gerund-present participle vectors, past participle vectors, and preposition vectors are collectively called auxiliary vectors; for each auxiliary vector in the sentence to be parsed, arbitrarily take one possible value of that auxiliary vector, thereby obtaining one group of possible values covering all auxiliary vectors; this group of possible values, regarded as a set, is called an auxiliary system;
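Choosing one possible value per auxiliary vector, as S5 describes, amounts to enumerating a Cartesian product. The sketch below shows this with the standard library's `itertools.product`; the dictionary layout, the vector names such as `"inf@3"`, and the tuple representation of a value are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

Value = Tuple[object, ...]  # one possible value of an auxiliary vector


def auxiliary_systems(candidates: Dict[str, List[Value]]
                      ) -> Iterator[Dict[str, Value]]:
    """Yield every auxiliary system: one value chosen per auxiliary vector."""
    names = sorted(candidates)  # fixed order so the enumeration is repeatable
    for combo in product(*(candidates[n] for n in names)):
        yield dict(zip(names, combo))
```

The number of auxiliary systems is the product of the candidate counts of the individual vectors, which is why the later checks that discard unreasonable systems matter for keeping the computation manageable.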
    S6. Arbitrarily take a canonical backbone system and pair it with a corresponding auxiliary system; replace every element inside each auxiliary vector of that auxiliary system, other than elements that are themselves vectors, with its corresponding number; after the numbers have been substituted, check the auxiliary system; if any of the unreasonable situations below occurs in the auxiliary system, clear it; if none of them occurs, retain it; a retained auxiliary system is called a canonical auxiliary system; the predicate vectors mentioned below all refer to the predicate vectors in the given canonical backbone system;
    S6.1. If the same number, the same predicate vector, the same infinitive vector, the same gerund-present participle vector, or the same preposition vector appears in two different auxiliary vectors, the auxiliary system is unreasonable and is cleared;
    S6.2. If the same number, the same predicate vector, the same infinitive vector, or the same gerund-present participle vector appears both inside an auxiliary vector and inside a predicate vector, the auxiliary system is unreasonable and is cleared;
    S6.3. If two numbers in reversed order appear inside one auxiliary vector, the auxiliary system is unreasonable and is cleared;
    S6.4. Perform equivalent substitution for every pair of auxiliary vectors between which an element substitution relationship exists; if a cross-substitution contradiction arises between vectors, the auxiliary system is unreasonable and is cleared; if two numbers in reversed order appear after the equivalent substitution, the auxiliary system is unreasonable and is cleared;
    S6.5. Perform equivalent substitution for every pair consisting of an auxiliary vector and a predicate vector between which an element substitution relationship exists; if a cross-substitution contradiction arises between vectors, the auxiliary system is unreasonable and is cleared; if two numbers in reversed order appear after the equivalent substitution, the auxiliary system is unreasonable and is cleared;
    S6.6. After the check, restore the state that existed before the check, so that it is available for the subsequent operations;
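Two of the checks in S6 lend themselves to a compact sketch: the shared-number part of S6.1 and the reversed-order test of S6.3. The sketch assumes that each auxiliary vector has already been reduced to a list of unit numbers in element order, with empty units omitted; nested vectors, and therefore the substitution checks of S6.4 and S6.5, are not modelled. All names are hypothetical.

```python
from typing import Dict, List

NumberedSystem = Dict[str, List[int]]  # vector name -> unit numbers in order


def shares_number(system: NumberedSystem) -> bool:
    """S6.1, numbers only: True if two different vectors contain the same number."""
    seen = set()
    for numbers in system.values():
        current = set(numbers)
        if seen & current:
            return True
        seen |= current
    return False


def has_reversed_numbers(system: NumberedSystem) -> bool:
    """S6.3: True if, inside one vector, a number is followed by a smaller one."""
    return any(a > b
               for numbers in system.values()
               for a, b in zip(numbers, numbers[1:]))


def passes_number_checks(system: NumberedSystem) -> bool:
    """True if neither of the two modelled unreasonable situations occurs."""
    return not shares_number(system) and not has_reversed_numbers(system)
```

Because these functions only read their input, the system is left in its pre-check state, which is in the spirit of S6.6.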
    S7. Generate the residual noun system and the A-B-C joint system;
    S7.1. Arbitrarily take a canonical backbone system and a canonical auxiliary system corresponding to it; regard as a set all of the remaining basic noun units and adjacent coordinated basic noun combination units that have not entered that canonical backbone system or that canonical auxiliary system, and call this set a residual noun system; each element of the residual noun system is called a residual noun element; the number of a residual noun element is the number of the basic noun unit or basic noun combination unit corresponding to that residual noun element; for each residual noun element, generate a corresponding residual noun vector; the residual noun vector comprises only the residual noun element, that is, residual noun vectors and residual noun elements are in one-to-one correspondence;
    S7.2. A canonical backbone system, a canonical auxiliary system, and a residual noun system that correspond to one another in the manner described in S7.1 together constitute an A-B-C joint system;
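The residual noun system of S7.1 is, in effect, a set difference: the noun-type units that appear in neither the canonical backbone system nor the canonical auxiliary system. A minimal sketch follows, assuming that every vector has been reduced to a list of unit numbers and that the numbers of all basic noun units and basic noun combination units are known; the function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def residual_noun_system(noun_numbers: Iterable[int],
                         backbone: Dict[str, List[int]],
                         auxiliary: Dict[str, List[int]]) -> List[List[int]]:
    """Return one single-element residual noun vector per unused noun unit."""
    used: Set[int] = set()
    for vectors in (backbone, auxiliary):
        for numbers in vectors.values():
            used.update(numbers)
    # Each residual noun vector contains only its residual noun element.
    return [[n] for n in sorted(set(noun_numbers) - used)]
```

Under these assumptions, an A-B-C joint system in the sense of S7.2 could be represented simply as the triple `(backbone, auxiliary, residual)`.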
    S8. Arbitrarily take an A-B-C joint system and perform overall slot-insertion operations on it; in one overall slot-insertion operation, each slot can receive at most one vector, and may also receive no vector at all, which is the case of no insertion; before the overall slot-insertion operation, clear the empty units; in the overall slot-insertion operation, a vector that constructs a slot and receives another vector into that slot is called a receiving vector; a vector that is inserted into a slot of another vector is called an inserting vector;
    S8.1、在前述的A-B-C联合系统中,对每一个向量内部的每一个可以用其他向量进行代换的元素,全都使用对应的向量进行等量代换,无论对应的向量是谓语向量还是辅助向量;执行前述的等量代换,直至将每一个向量内部的其他向量全都替换完毕;经过前述的等量代换,如果某一个向量被代入另一个向量内部,那么取消代入另一个向量内部的向量在A-B-C联合系统中的原有位置,从而令经过前述的等量代换操作的两个向量完全融合;通过等量代换,将A-B-C联合系统中原有的向量,全都转化为相互之间不存在元素代入关系的新的向量;以等量代换为界限,将等量代换之前的A-B-C联合系统中的向量称为第I类向量,将等量代换之后的A-B-C联合系统中的向量称为第II类向量;显然,某一个第I类向量和某一个第II类向量,可以是同一个向量,即一个向量在等量代换的之前和之后可以不发生变化;S8.1. In the aforementioned ABC joint system, for each element in each vector that can be replaced by other vectors, all the corresponding vectors are used for equivalent substitution, regardless of whether the corresponding vector is a predicate vector or an auxiliary vector Vector; perform the aforementioned equal substitution until all the other vectors in each vector are replaced; after the aforementioned equal substitution, if a vector is substituted into another vector, then cancel the substitution into the other vector The original position of the vector in the ABC joint system, so that the two vectors after the aforementioned equal substitution operation are completely integrated; through equal substitution, all the original vectors in the ABC joint system are transformed into mutual differences. There is a new vector in which the elements are substituted; taking equal substitution as the limit, the vector in the ABC joint system before the equal substitution is called the I type vector, and the vector in the ABC joint system after the equal substitution It is called a type II vector; obviously, a certain type I vector and a certain type II vector can be the same vector, that is, a vector may not change before and after the equivalent substitution;
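The equivalent substitution of S8.1 can be pictured as recursively expanding nested vectors and then dropping the standalone copy of every vector that was absorbed. The following is a minimal sketch under assumed conventions that are not in the patent: vectors are stored in a dictionary, an integer is a unit number, and a string is the name of another vector. It also assumes there are no circular references, which S3.3.4 is meant to rule out.

```python
def equivalent_substitution(vectors):
    """Turn Type I vectors into Type II vectors.

    vectors: dict mapping a (hypothetical) vector name to a list whose
    items are unit numbers (int) or names (str) of other vectors.
    Assumption: references are acyclic.
    """
    absorbed = set()

    def expand(name):
        out = []
        for item in vectors[name]:
            if isinstance(item, str):      # a nested vector
                absorbed.add(item)
                out.extend(expand(item))
            else:                          # a plain unit number
                out.append(item)
        return out

    expanded = {name: expand(name) for name in vectors}
    # A vector substituted into another loses its own position.
    return {n: v for n, v in expanded.items() if n not in absorbed}


type_i = {"P1": [1, 2, "P2"], "P2": [3, 4, 5], "B1": [6, 7]}
type_ii = equivalent_substitution(type_i)
print(type_ii)  # {'P1': [1, 2, 3, 4, 5], 'B1': [6, 7]}
```

Here `B1` is both a Type I and a Type II vector, matching the remark that a vector may be left unchanged by the substitution.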
    S8.2. The first round of whole-vector slot insertion is performed in the A-B-C joint system: any Type II vector ω is taken as the receiving vector of the first round; an ordinal value is assigned to each element of ω, one by one, in a predetermined direction; according to the assigned ordinal values, the i-th element of ω is taken, and a single slot is constructed on the first side of that element only; after the slot is constructed, any Type II vector μ other than ω is taken as the inserted vector of the first round; μ is inserted as a whole into the slot corresponding to the i-th element, generating a new vector, denoted [ω]_i+<μ; the vectors obtained in the A-B-C joint system through whole-vector slot insertion are collectively called Type III vectors; the ordinal values assigned in a round of whole-vector slot insertion are used only within that round;
    S8.3. The second round of whole-vector slot insertion is performed in the A-B-C joint system: the Type III vector [ω]_i+<μ is taken as the receiving vector of the second round; in the predetermined direction, an ordinal value is assigned to each element from the first element on the first side of [ω]_i+<μ up to the first element on the second side inside the vector μ contained in [ω]_i+<μ; none of the remaining elements of [ω]_i+<μ is assigned an ordinal value; according to the assigned ordinal values, the j-th element is taken, and a single slot is constructed on the first side of that element only; after the slot is constructed, any Type II vector ξ that has not been used in any previous step is taken as the inserted vector of the second round; ξ is inserted as a whole into the slot corresponding to the j-th element, generating a new vector, denoted [[ω]_i\μ]_j+<ξ; or
    the Type III vector [ω]_i+<μ is taken as the receiving vector of the second round; an ordinal value is assigned to every element of [ω]_i+<μ in the predetermined direction; according to the assigned ordinal values, the k-th element of [ω]_i+<μ is taken, and a single slot is constructed on the first side of that element only; after the slot is constructed, any Type II vector ξ that has not been used in any previous step is taken as the inserted vector of the second round; ξ is inserted as a whole into the slot corresponding to the k-th element, generating a new vector, denoted ([ω]_i+<μ)_k+<ξ; when whole-vector slot insertion is performed by this method, if identical results appear after S8.4 has been executed, the identical results are merged into one, that is, identical spliced vectors are merged into a single spliced vector;
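The two rounds in S8.2 and S8.3 can be pictured with plain list splicing. The sketch below follows the second variant of S8.3 (every element of the receiving vector is numbered) and assumes that the "first side" of an element is its left side and that ordinal values start at 1; both are assumptions for illustration, and `insert_whole` is a hypothetical helper name.

```python
def insert_whole(receiver, i, inserted):
    """Insert `inserted` as a block into the slot on the left of the
    i-th element (1-based) of `receiver`; returns a new list."""
    if not 1 <= i <= len(receiver):
        raise ValueError("i must index an element of the receiver")
    return receiver[: i - 1] + inserted + receiver[i - 1:]


omega = [1, 2, 8, 9]   # receiving vector of round one (unit numbers)
mu = [4, 5]            # inserted vector of round one
xi = [6, 7]            # inserted vector of round two

round1 = insert_whole(omega, 3, mu)    # slot to the left of element 8
round2 = insert_whole(round1, 5, xi)   # ordinal values are recomputed per round

print(round1)  # [1, 2, 4, 5, 8, 9]
print(round2)  # [1, 2, 4, 5, 6, 7, 8, 9]
```

Because a slot exists only on one side of each element, this sketch never appends after the last element of the receiving vector.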
    S8.4. In the aforementioned A-B-C joint system, the whole-vector slot insertion operation given in S8.3 is executed repeatedly as follows: the vector newly generated by the previous round is taken as the receiving vector of the new round, and any Type II vector that has not been used in any previous step is taken as the inserted vector of the new round; whole-vector slot insertion is repeated until every Type II vector has been inserted into a slot, which is referred to as exhausting all inserted vectors, at which point a Type III vector is obtained; the Type III vector obtained when all inserted vectors are exhausted is called a spliced vector; S8.3 contains two methods of whole-vector slot insertion, and whichever method is chosen must be used consistently across all rounds; the Type II vectors used in the successive rounds, arranged in order of use until all inserted vectors are exhausted, constitute an insertion scheme for the A-B-C joint system; the operations from S8.2 to S8.4 are executed repeatedly so as to exhaust the slot corresponding to every element inside every receiving vector in every round of the insertion scheme, that is, to exhaust every spliced vector of the insertion scheme;
    S8.5. The results generated by S8.4 are checked after being converted to numbers: if two numbers in reversed order appear inside a spliced vector, the spliced vector is unreasonable and is discarded; if no numbers in reversed order appear inside a spliced vector, the spliced vector is reasonable and is retained;
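The check in S8.5 amounts to testing whether the numbers of a spliced vector are in ascending order. A reversed pair exists anywhere in a sequence exactly when some adjacent pair is reversed, so comparing neighbours is sufficient. The function names below are illustrative and not taken from the patent.

```python
def has_order_inversion(numbers):
    """True if any number is followed directly by a smaller one."""
    return any(a > b for a, b in zip(numbers, numbers[1:]))


def keep_reasonable(spliced_vectors):
    """Retain only spliced vectors without reversed numbers."""
    return [v for v in spliced_vectors if not has_order_inversion(v)]


candidates = [
    [1, 2, 4, 5, 8, 9],   # ascending, retained
    [1, 4, 5, 2, 8, 9],   # 5 precedes 2, discarded
]
print(keep_reasonable(candidates))  # [[1, 2, 4, 5, 8, 9]]
```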
    S8.6. After all Type I vectors in the aforementioned A-B-C joint system have been converted into Type II vectors, every Type II vector in the A-B-C joint system is first replaced with the corresponding numbers, and the whole-vector slot insertion described above is then performed; following any given insertion scheme of the A-B-C joint system, in each round a slot is constructed on the first side of every element inside the receiving vector, and reasonable slots are then selected: the first number on the left or right side of the inserted vector is compared with the number adjacent to the candidate slot on the left or right, and only slots whose greater-than or less-than relation avoids a reversal of number order are selected as reasonable slots and used for insertion; all other slots are unreasonable slots and receive no insertion; if the receiving vector contains no reasonable slot, the given insertion scheme is unreasonable, and it is terminated and replaced with another insertion scheme; with this optimization, the spliced vectors obtained can be recorded directly as reasonable spliced vectors without a check for reversed number order;
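The pruning in S8.6 can be sketched as a neighbour comparison made before any insertion happens. The example assumes the slot lies to the left of the chosen element, that both vectors already hold ascending unit numbers, and that ordinal values start at 1; `reasonable_slots` is a hypothetical name.

```python
def reasonable_slots(receiver, inserted):
    """Return the 1-based ordinals k such that inserting `inserted`
    to the left of receiver[k-1] keeps all numbers ascending."""
    slots = []
    for k in range(1, len(receiver) + 1):
        # Left neighbour of the slot (none for the first element).
        left_ok = k == 1 or receiver[k - 2] < inserted[0]
        # Right neighbour of the slot is the k-th element itself.
        right_ok = inserted[-1] < receiver[k - 1]
        if left_ok and right_ok:
            slots.append(k)
    return slots


print(reasonable_slots([1, 2, 8, 9], [4, 5]))    # [3]
print(reasonable_slots([1, 2, 8, 9], [10, 11]))  # []
```

An empty result corresponds to the case in which the insertion scheme is abandoned and another one is tried.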
    S8.7. Using the multiplication principle of combinatorics, all A-B-C joint systems corresponding to each word list (ii) are exhausted; further, by forming permutations and combinations of all the Type II vectors in each A-B-C joint system, all insertion schemes corresponding to each A-B-C joint system are exhausted; further still, the operations from S8.2 to S8.6 are executed repeatedly for each insertion scheme until all spliced vectors corresponding to each insertion scheme are exhausted;
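The exhaustive search described in S8.7 can be sketched by treating each ordering of the Type II vectors as one insertion scheme and trying every slot in every round. This is a brute-force illustration, not the patent's implementation: it assumes the slot lies to the left of an element, filters with the S8.5 order check instead of the S8.6 pruning, and uses a set to merge identical results.

```python
from itertools import permutations


def splice_all(vectors):
    """Enumerate every insertion scheme and slot choice; return the
    distinct spliced vectors whose numbers are ascending."""
    if not vectors:
        return []
    results = set()
    for scheme in permutations(vectors):
        partial = [list(scheme[0])]            # first vector receives
        for inserted in scheme[1:]:
            nxt = []
            for recv in partial:
                for k in range(len(recv)):     # slot left of recv[k]
                    nxt.append(recv[:k] + list(inserted) + recv[k:])
            partial = nxt
        for cand in partial:
            if all(a < b for a, b in zip(cand, cand[1:])):
                results.add(tuple(cand))       # set merges duplicates
    return sorted(results)


print(splice_all([(1, 2, 8, 9), (4, 5), (6, 7)]))
# [(1, 2, 4, 5, 6, 7, 8, 9)]
```

The number of schemes grows factorially with the number of vectors, which is why pruning of the kind described in S8.6 matters in practice.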
    S8.8. Syntactic rule check: using the syntactic rules of natural language, each retained reasonable spliced vector and its corresponding A-B-C joint system are checked by a method combining probability with syntactic rules or by a dependency analysis method; this check shall include a check using the rules for event-object verbs and non-event-object verbs; an event-object verb is a verb in natural language that can take only an event, and not a person or thing, as its object; a non-event-object verb is a verb in natural language that can take only a person or thing, and not an event, as its object; event-object verbs and non-event-object verbs can be compiled in advance by consulting dictionaries or by statistical means;
    S8.9. Syntactic structure repair is performed while S8.8 is executed; syntactic structure repair uses a method combining probability with syntactic rules, or a dependency analysis method, to recover omitted syntactic information and thereby repair defects in the syntactic structure obtained so far; syntactic structure repair may also be used to distinguish and adjust the primary or secondary status, within the syntactic structure, of each vector in the retained A-B-C joint systems;
    S8.10. Residual noun check: using a method combining probability with syntactic rules or a dependency analysis method, reasonable and unreasonable residual nouns are identified, and any A-B-C joint system containing an unreasonable residual noun is discarded;
    S9. Taking as the standard the basic framework of the syntactic structure of the sentence to be parsed, as characterized by the A-B-C joint systems retained after S8, the most suitable complete syntactic structure conforming to that standard is selected from among a sufficient number of complete syntactic structures obtained by analyzing the sentence to be parsed with a method combining probability with syntactic rules or with a dependency analysis method;
    S10. On the basis of the complete syntactic structures generated by S9, semantic processing is used to find the most suitable semantic relation under the constraint of those syntactic structures, and the complete syntactic structure corresponding to that semantic relation is taken as the final result of syntactic analysis.
  2. The method according to claim 1, wherein the preprocessing operation comprises:
    S1.1. The part of speech of each word in the sentence to be parsed is automatically analyzed and tagged by computer, generating the result of lexical analysis;
    S1.2. Natural-language elements of the sentence to be parsed, such as predicate verbs, basic noun phrases, basic adjective phrases and basic adverb phrases, are automatically analyzed and tagged by computer; natural-language elements such as adjacent coordinated noun phrases, adjacent coordinated adjective phrases and adjacent coordinated adverb phrases are likewise automatically analyzed and tagged by computer;
    S1.3. Adjacent coordinated part-of-speech units of each kind are merged, and each merged group is recorded as a single corresponding part-of-speech unit;
    S1.4. For the linguistic information of the sentence to be parsed described in S1.2 and S1.3, a word list is drawn up, denoted word list (i); word list (i) includes the words, the attributes of the words, the positions of the words in the sentence, and the punctuation marks together with their positions in the sentence;
    S1.5. For the multiple different results that lexical analysis may produce, methods of combinatorics are used to generate multiple different word lists (i), so as to accommodate multiple structural ambiguities; the different word lists (i) so generated are distinguished by different numbers; in the preprocessing operation, the restrictions on the results of lexical analysis are relaxed, and the different lexical analysis results caused by structural ambiguity are preserved in the different word lists (i) and left to the subsequent syntactic analysis and semantic processing stages to be discriminated and filtered; that is, the subsequent syntactic analysis and semantic processing stages constrain the different lexical analysis results, which increases the likelihood that the correct lexical analysis result is finally selected;
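One way to read the combinatorial step in S1.5 is as a Cartesian product over the alternative readings of each token, with each combination stored as a separately numbered word list (i). The tag names and the example sentence below are assumptions for illustration only.

```python
from itertools import product


def word_lists(alternatives):
    """alternatives: one list of candidate (word, tag) readings per
    token. Returns {list number: one fully tagged word list}."""
    return {n: list(combo)
            for n, combo in enumerate(product(*alternatives), start=1)}


# Hypothetical ambiguous input: both tokens have two readings.
sentence = [
    [("time", "NOUN"), ("time", "VERB")],
    [("flies", "VERB"), ("flies", "NOUN")],
]
lists = word_lists(sentence)

print(len(lists))  # 4
print(lists[1])    # [('time', 'NOUN'), ('flies', 'VERB')]
```

All four lists are kept at this stage; the later syntactic and semantic steps are what narrow them down.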
    S1.6. For each word list (i), special sentence patterns such as interrogative sentences, elliptical sentences and inverted sentences are detected by a method combining probability with syntactic rules or by a dependency analysis method, and their predicates are given the corresponding morphological treatment to facilitate subsequent steps;
    S1.7. For each word list (i), the impurity components of the sentence to be parsed are removed, namely adverb units, adjective units, adjacent coordinated adverb units, adjacent coordinated adjective units, interjection units, simple parenthetical components not in sentence form, particle units, adjacent coordinated particle units, adjacent coordinated determiner units without structural ambiguity, and mixed modifier units; the minor punctuation marks of the sentence to be parsed, such as the commas on either side of a simple parenthetical unit not in sentence form, are also removed.
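The removal of impurity components in S1.7 can be sketched as a filter over word-list entries that preserves the original position of each surviving word. The tag set below is a hypothetical stand-in covering only a few of the unit types listed in the claim, and the `(word, tag, position)` entry layout is likewise an assumption.

```python
# Hypothetical tags standing in for adverb, adjective, interjection
# and particle units; the claim lists further unit types.
IMPURITY_TAGS = {"ADV", "ADJ", "INTJ", "PART"}


def strip_impurities(word_list):
    """Drop entries whose tag marks an impurity component.

    Each entry is assumed to be (word, tag, position); positions of
    the remaining entries are left unchanged.
    """
    return [entry for entry in word_list if entry[1] not in IMPURITY_TAGS]


word_list_i = [
    ("she", "PRON", 1),
    ("quickly", "ADV", 2),
    ("read", "VERB", 3),
    ("old", "ADJ", 4),
    ("books", "NOUN", 5),
]
print(strip_impurities(word_list_i))
# [('she', 'PRON', 1), ('read', 'VERB', 3), ('books', 'NOUN', 5)]
```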
  3. The method according to claim 1, wherein step S2 comprises:
    S2.1. For each word list (i), the preprocessed data structure of the sentence to be parsed is read; the preprocessed sentence data structure includes the following information:
    (1) coordinating connective units used to connect sentences;
    (2) coordinating connective units not used to connect sentences, whose function is to connect the coordinated components within a sentence;
    (3) predicate verb units, subordinating connective units, basic noun units, infinitive verb units, gerund-present participle units, past participle units, preposition units, adjacent coordinated predicate verb combination units, adjacent coordinated subordinating connective combination units, adjacent coordinated basic noun combination units, adjacent coordinated infinitive verb combination units, adjacent coordinated gerund-present participle combination units, adjacent coordinated past participle combination units, and adjacent coordinated preposition combination units;
    (4) interrogative word units, adjacent coordinated interrogative word combination units, and structurally ambiguous determiner units;
    (5) parenthetical components containing a predicate verb unit;
    (6) major punctuation marks;
    S2.2. For the sentence data structure of S2.1, word list (ii) is generated; word list (ii) includes the aforementioned words, the attributes of those words, the numbers assigned to those words in ascending numerical order following the order in which they appear in the text, and the major punctuation marks.
  4. The method according to claim 1, wherein step S3 comprises:
    S3.1. According to the possible values of the predicate element, the coordinating introducer element, the subordinating introducer element, the subject element, the first-position object element and the second-position object element, all possible values of the predicate vector corresponding to each predicate element are obtained; the predicate vector comprises a coordinating introducer element, a subordinating introducer element, a subject element, a predicate element, a first-position object element and a second-position object element;
    S3.2. For each predicate vector of the sentence to be parsed, one of its possible values is chosen arbitrarily, yielding one set of possible values for all the predicate vectors; this set of values is arranged in a fixed order to form a matrix of n rows and 6 columns; such a matrix of n rows and 6 columns is called a backbone system;
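S3.2 picks one candidate value per predicate vector, so the full set of backbone systems is the Cartesian product of the per-predicate candidate lists. The sketch below assumes each candidate is a 6-tuple in the column order named in S3.1 and that `None` marks an empty element; the sentence data is hypothetical.

```python
from itertools import product


def backbone_systems(candidates_per_predicate):
    """Yield every n-by-6 matrix formed by choosing one candidate
    predicate vector for each of the n predicates."""
    for choice in product(*candidates_per_predicate):
        yield [list(row) for row in choice]


# Columns: coordinating introducer, subordinating introducer, subject,
# predicate, first-position object, second-position object.
candidates = [
    [(None, None, 1, 2, 3, None), (None, None, 1, 2, None, None)],  # predicate 2
    [(None, None, 3, 4, None, None)],                               # predicate 4
]
systems = list(backbone_systems(candidates))

print(len(systems))  # 2
print(systems[0])
# [[None, None, 1, 2, 3, None], [None, None, 3, 4, None, None]]
```

The first system reuses unit 3 in two rows, which is the kind of result the checks in S3.3 are meant to remove.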
    S3.3. In any given backbone system, every element inside every predicate vector, other than elements that are themselves vectors, is replaced with its corresponding number; after the replacement, the backbone system is checked; if any of the unreasonable situations described below occurs in the backbone system, the backbone system is discarded; if none of them occurs, the backbone system is retained; a retained backbone system is called a canonical backbone system:
    S3.3.1. The backbone system is checked against word list (ii): if there is a coordinating connective unit used to connect sentences, a subordinating connective unit, or an adjacent coordinated subordinating connective combination unit that has not entered the backbone system, the backbone system is unreasonable and is discarded;
    S3.3.2. The backbone system is checked: if the same number, the same predicate vector, the same infinitive vector or the same gerund-present participle vector appears in two different predicate vectors, the backbone system is unreasonable and is discarded;
    S3.3.3. The backbone system is checked: if two numbers in reversed order appear inside one predicate vector, the backbone system is unreasonable and is discarded;
    S3.3.4. The backbone system is checked: equivalent substitution is performed for every pair of predicate vectors between which an element-substitution relation exists; if the substitutions between vectors cross and contradict one another, the backbone system is unreasonable and is discarded; if two numbers in reversed order appear after equivalent substitution, the backbone system is unreasonable and is discarded;
    S3.3.5. After the checks, the backbone system is restored to its state before the checks, for use in subsequent operations.
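Two of the checks above, S3.3.2 (restricted here to repeated numbers) and S3.3.3, can be sketched directly on the n-by-6 matrix. The example assumes rows contain only unit numbers or `None`; nested vectors, and the checks in S3.3.1 and S3.3.4, are outside this sketch, and the function names are illustrative.

```python
def shares_number(system):
    """S3.3.2 (numbers only): the same number in two different rows."""
    seen = set()
    for row in system:
        ids = {e for e in row if e is not None}
        if ids & seen:
            return True
        seen |= ids
    return False


def has_inversion(system):
    """S3.3.3: two numbers in reversed order inside one row."""
    for row in system:
        ids = [e for e in row if e is not None]
        if any(a > b for a, b in zip(ids, ids[1:])):
            return True
    return False


def is_canonical(system):
    # Neither function modifies `system`, so nothing has to be
    # restored afterwards (compare S3.3.5).
    return not shares_number(system) and not has_inversion(system)


good = [[None, None, 1, 2, None, None], [None, None, 3, 4, None, None]]
shared = [[None, None, 1, 2, 3, None], [None, None, 3, 4, None, None]]
print(is_canonical(good), is_canonical(shared))  # True False
```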
  5. The method according to claim 4, wherein, while S3.2 is being executed, the checks of S3.3 are executed concurrently so as to prevent unreasonable backbone systems from being generated.
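Claim 5 can be read as interleaving generation and checking, for example with a backtracking search that abandons a partial backbone system as soon as a newly added row violates a check. The sketch below is one possible reading, not the patent's implementation; it applies only the repeated-number and reversed-order checks and assumes rows hold unit numbers or `None`.

```python
def canonical_systems(candidates_per_predicate):
    """Yield backbone systems row by row, pruning a branch as soon as
    a row repeats a used number or has numbers in reversed order."""

    def row_ok(row, used):
        ids = [e for e in row if e is not None]
        ascending = all(a < b for a, b in zip(ids, ids[1:]))
        return ascending and not (set(ids) & used)

    def extend(index, rows, used):
        if index == len(candidates_per_predicate):
            yield [list(r) for r in rows]
            return
        for row in candidates_per_predicate[index]:
            if row_ok(row, used):
                new_used = used | {e for e in row if e is not None}
                yield from extend(index + 1, rows + [row], new_used)

    yield from extend(0, [], set())


candidates = [
    [(None, None, 1, 2, 3, None), (None, None, 1, 2, None, None)],
    [(None, None, 3, 4, None, None)],
]
pruned = list(canonical_systems(candidates))
print(pruned)
# [[[None, None, 1, 2, None, None], [None, None, 3, 4, None, None]]]
```

Compared with generating every matrix first and filtering afterwards, this avoids building any system that extends an already invalid prefix.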
PCT/CN2019/100638 2019-03-22 2019-08-14 Method for syntactic parsing of natural language WO2020191993A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910224013 2019-03-22
CN201910224013.X 2019-03-22

Publications (1)

Publication Number Publication Date
WO2020191993A1 (en) 2020-10-01

Family

ID=67190451

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/100638 WO2020191993A1 (en) 2019-03-22 2019-08-14 Method for syntactic parsing of natural language

Country Status (2)

Country Link
CN (1) CN110020434B (en)
WO (1) WO2020191993A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020434B (en) * 2019-03-22 2021-02-12 北京语自成科技有限公司 Natural language syntactic analysis method
CN110399936A (en) * 2019-08-06 2019-11-01 北京先声智能科技有限公司 It is a kind of for training English Grammar to correct mistakes the text data generation method of model
CN112686024B (en) * 2020-12-31 2023-12-22 竹间智能科技(上海)有限公司 Syntax analysis method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040015342A1 (en) * 2002-02-15 2004-01-22 Garst Peter F. Linguistic support for a recognizer of mathematical expressions
CN102945230A (en) * 2012-10-17 2013-02-27 刘运通 Natural language knowledge acquisition method based on semantic matching driving
CN103927298A (en) * 2014-04-25 2014-07-16 秦一男 Natural language syntactic structure analyzing method and device based on computer
CN104156353A (en) * 2014-08-22 2014-11-19 秦一男 Computer-based method and device for analyzing natural language syntactic structures
CN107301172A (en) * 2017-06-22 2017-10-27 秦男 Data processing method and storage medium
CN108197107A (en) * 2017-12-29 2018-06-22 秦男 Data processing method
CN110020434A (en) * 2019-03-22 2019-07-16 北京语自成科技有限公司 A kind of method of natural language syntactic analysis

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6952666B1 (en) * 2000-07-20 2005-10-04 Microsoft Corporation Ranking parser for a natural language processing system
US9104780B2 (en) * 2013-03-15 2015-08-11 Kamazooie Development Corporation System and method for natural language processing
CN106030568B (en) * 2014-04-29 2018-11-06 乐天株式会社 Natural language processing system, natural language processing method and natural language processing program
CN104360994A (en) * 2014-12-04 2015-02-18 科大讯飞股份有限公司 Natural language understanding method and natural language understanding system
WO2017015231A1 (en) * 2015-07-17 2017-01-26 Fido Labs, Inc. Natural language processing system and method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114328848A (en) * 2022-03-16 2022-04-12 北京金山数字娱乐科技有限公司 Text processing method and device
CN117609518A (en) * 2024-01-17 2024-02-27 江西科技师范大学 Hierarchical Chinese entity relation extraction method and system for centering structure
CN117609518B (en) * 2024-01-17 2024-04-26 江西科技师范大学 Hierarchical Chinese entity relation extraction method and system for centering structure

Also Published As

Publication number Publication date
CN110020434B (en) 2021-02-12
CN110020434A (en) 2019-07-16

Legal Events

Code Title Description
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 19922080; Country of ref document: EP; Kind code of ref document: A1)