CN110674308A - Scientific and technological word list expansion method, device, terminal and medium based on grammar mode - Google Patents


Info

Publication number
CN110674308A
Authority
CN
China
Prior art keywords
entity
scientific
vocabulary
technological
word list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910785089.XA
Other languages
Chinese (zh)
Inventor
田欣
朱悦
翁泉飞
胡寅骏
杨磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Kehui Talent Service Co ltd
Shanghai Science And Technology Development Co ltd
Original Assignee
Shanghai Science And Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Science And Technology Development Co Ltd
Priority to CN201910785089.XA
Publication of CN110674308A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a scientific and technological word list expansion method, device, terminal and medium based on a grammar pattern. The method comprises the following steps: extracting a plurality of entity relationships from one or more texts based on a grammar pattern; using one or more words in the original scientific and technological vocabulary before expansion as search contents, determining, among the extracted entity relationships, one or more entity relationships associated with each search content; and expanding the original scientific and technological word list based on the entity relationships associated with each search content to form a new scientific and technological word list with a larger vocabulary hierarchy than the original. The technical solution of this application aims to provide a grammar-pattern-based scheme for automatic vocabulary expansion that can expand the STKOS vocabulary efficiently and intelligently and keep pace with scientific and technological development, thereby addressing the problems in the prior art.

Description

Scientific and technological word list expansion method, device, terminal and medium based on grammar mode
Technical Field
The present application relates to the technical field of scientific and technological vocabulary, and in particular, to a method, an apparatus, a terminal, and a medium for expanding a scientific and technological vocabulary based on a grammar pattern.
Background
The STKOS vocabulary is a computer-oriented scientific and technological super vocabulary built on internationally advanced knowledge organization technologies and methods and on the construction experience of existing knowledge organization systems in China and abroad. The STKOS vocabulary helps to better develop and utilize resources such as scientific and technical literature and patents, and is of great significance in areas such as promoting the national information industry and sharing literature.
However, the expansion of the STKOS vocabulary does not follow the pace of technology development, and there are situations that new words are not updated timely, the updating method is time-consuming, and the manpower input is high. Therefore, how to expand the STKOS vocabulary efficiently and intelligently becomes a technical problem which needs to be solved urgently in the field.
Summary of the Application
In view of the above-mentioned shortcomings of the prior art, the present application aims to provide a scientific vocabulary expansion method, apparatus, terminal and medium based on grammar patterns, which are used to solve the technical problem that the STKOS vocabulary in the prior art cannot be efficiently and intelligently expanded.
To achieve the above and other related objects, a first aspect of the present application provides a scientific vocabulary expansion method based on grammar patterns, which includes: extracting a plurality of entity relationships from one or more texts based on a grammar pattern; determining one or more entity relationships associated with each of the search contents among the extracted plurality of entity relationships using one or more words in the original scientific and technological vocabulary before expansion as the search contents; and expanding the original scientific and technological word list based on the entity relation related to each search content to form a new scientific and technological word list with a larger vocabulary hierarchy compared with the original scientific and technological word list.
In some embodiments of the first aspect of the present application, the method further comprises: determining, according to the frequency with which entity relationships occur in a large-scale corpus, the erroneously extracted entity relationships in the new scientific and technological vocabulary, so that they can be corrected.
In some embodiments of the first aspect of the present application, the method further comprises: taking any two lower entities among the plurality of lower entities corresponding to an upper entity, where each of the two forms an entity relationship with the upper entity (the first entity relationship and the second entity relationship) and the two lower entities form an entity relationship with each other (the third entity relationship); if the first and second entity relationships occur in the large-scale corpus more frequently than the third entity relationship, the third entity relationship is determined to be an erroneously extracted entity relationship.
In some embodiments of the first aspect of the present application, the method comprises: if the entity in an entity relationship can be divided into two or more independent words, judging whether the frequency with which the two or more independent words appear together is greater than a preset threshold; and if the frequency is greater than the preset threshold, determining that the two or more independent words belong to the same entity.
In some embodiments of the first aspect of the present application, the method comprises: if the entity in an entity relationship can be divided into two or more independent words, judging whether the frequency with which an independent word appears on its own is greater than the frequency with which the independent words appear together; if so, determining that the independent word does not belong to the entity.
In some embodiments of the first aspect of the present application, the scientific vocabulary comprises a STKOS vocabulary.
In some embodiments of the first aspect of the present application, the entities in the entity relationship have membership.
To achieve the above and other related objects, a second aspect of the present application provides a scientific vocabulary expansion apparatus based on grammar patterns, comprising: the entity relation extracting module is used for extracting a plurality of entity relations from one or more texts based on the grammar mode; the word list expansion module is used for determining one or more entity relations related to each search content in the extracted entity relations by taking one or more vocabularies in an original scientific and technological word list before expansion as the search content, and expanding the original scientific and technological word list based on the entity relations related to each search content to form a new scientific and technological word list with a larger vocabulary hierarchy compared with the original scientific and technological word list.
To achieve the above and other related objects, a third aspect of the present application provides a computer-readable storage medium having a computer program stored thereon, where the computer program is executed by a processor to implement the scientific vocabulary expansion method based on grammar patterns.
To achieve the above and other related objects, a fourth aspect of the present application provides an electronic terminal comprising: a processor and a memory; the memory is used for storing computer programs, and the processor is used for executing the computer programs stored in the memory so as to enable the terminal to execute the scientific and technological word list expansion method based on the grammar mode.
As described above, the scientific and technological vocabulary expansion method, apparatus, terminal, and medium based on the grammar pattern of the present application have the following beneficial effects: the technical solution provides a grammar-pattern-based scheme for automatic vocabulary expansion that can expand the STKOS vocabulary efficiently and intelligently and keep pace with scientific and technological development, thereby addressing the problems in the prior art.
Drawings
Fig. 1 is a flowchart illustrating a scientific vocabulary expansion method based on a grammar pattern according to an embodiment of the present application.
Fig. 2 is a flowchart illustrating a scientific vocabulary expansion method based on a grammar pattern according to an embodiment of the present application.
Fig. 3 is a flowchart illustrating a scientific vocabulary expansion method based on a grammar pattern according to an embodiment of the present application.
Fig. 4 is a flowchart illustrating a scientific vocabulary expansion method based on a grammar pattern according to an embodiment of the present application.
Fig. 5 is a flowchart illustrating a scientific vocabulary expansion method based on a grammar pattern according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of a scientific vocabulary expansion apparatus based on a grammar pattern according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of an electronic terminal according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and its several details are capable of modifications and/or changes in various respects, all without departing from the spirit of the present application. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It is noted that in the following description, reference is made to the accompanying drawings which illustrate several embodiments of the present application. It is to be understood that other embodiments may be utilized and that mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present application. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present application is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. Spatially relative terms, such as "upper," "lower," "left," "right," "below," "above," and the like, may be used herein to facilitate describing one element or feature's relationship to another element or feature as illustrated in the figures.
In this application, unless expressly stated or limited otherwise, the terms "mounted," "connected," "secured," "retained," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, operations, elements, components, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition will occur only when a combination of elements, functions or operations are inherently mutually exclusive in some way.
Aiming at the technical problems that the expansion of the STKOS vocabulary in the prior art does not follow the pace of scientific and technological development, the update of new words is not timely, the update method is time-consuming, the manpower input is high and the like, the invention provides a scientific and technological vocabulary expansion method, a device, a terminal and a medium based on a grammar mode, and the problems in the prior art are effectively solved. The technical scheme of the invention aims to provide a vocabulary automatic expansion scheme based on a grammar mode, so that efficient and intelligent vocabulary expansion is carried out on the STKOS vocabulary, and the pace of scientific and technological development is followed.
Fig. 1 is a flow chart of a scientific and technological vocabulary expansion method based on a grammar pattern according to an embodiment of the present application.
It should be noted that the scientific and technological vocabulary expansion method based on the grammar pattern in the present application can be applied to various types of hardware devices. Specifically, the hardware device may be a controller, such as an ARM (Advanced RISC Machines) controller, an FPGA (Field Programmable Gate Array) controller, an SoC (System on Chip) controller, a DSP (Digital Signal Processing) controller, or an MCU (Micro Controller Unit) controller; the hardware device may also be a computer device including components such as memory, memory controllers, one or more processing units (CPUs), peripheral interfaces, RF circuits, audio circuits, speakers, microphones, input/output (I/O) subsystems, display screens, other output or control devices, and external ports; the computer device includes, but is not limited to, a personal computer such as a desktop computer or a notebook computer, a tablet computer, a smart phone, a smart television, a Personal Digital Assistant (PDA), and the like; the hardware device may also be a server, and the server may be arranged on one or more physical servers according to factors such as function and load, or may be formed by a distributed or centralized server cluster, which is not limited in this embodiment.
In this embodiment, the scientific vocabulary expansion method based on the grammar pattern includes step S101, step S102, and step S103.
In step S101, a plurality of entity relationships are extracted from one or more texts based on a grammar schema.
In some alternative implementations, the grammar pattern approach is a syntactic-analysis method that discovers relationships between words by analyzing the syntactic structure of a sentence. Grammar patterns include, but are not limited to, Hearst patterns. A Hearst pattern is classified as a grammar pattern because it specifies grammar-related tags; it mines relationships from context by recognizing grammatical patterns in sentences.
In some alternative implementations, the entity relationships include isA entity relationships, i.e., there is a membership relation between the entities in an entity relationship. Preferably, the entity relationships are extracted from large-scale text derived from an English literature database and/or from scientific literature content crawled by a web crawler. Specifically, the isA entity relationships can be automatically extracted, based on Hearst patterns, from the textual content (titles, abstracts, full text, etc.) of large-scale document collections.
To facilitate understanding by those skilled in the art, the Hearst patterns mentioned in the present embodiment will now be explained in conjunction with Table 1 below. The left column of Table 1 gives the pattern type; the right column gives an example corresponding to the pattern type; NP denotes a noun phrase.
TABLE 1
[Table 1 appears as images in the original publication (Figure BDA0002177792150000041, Figure BDA0002177792150000051) and is not reproduced here.]
The method for extracting isA entity relationships in this embodiment matches Hearst patterns against the text. Taking the first pattern type in Table 1 as an example, the feature phrase "such as" is extracted, and the noun phrases located before and after the feature phrase are used as the upper and lower terms. For example, "NP such as {NP,} {or|and} NP" can be expressed as "NP0 such as {NP1, NP2, …, (or|and)} NPn", which means that for any NPi, "NPi isA NP0" holds.
Therefore, for words and sentences that match a pattern, entity relationships can be extracted easily. For example, the entity relationship "Linear Regression isA Learning Algorithms" is extracted from "Learning Algorithms like Linear Regression, processing Tree"; the entity relationship "processing Tree isA Learning Algorithms" is extracted from "Learning Algorithms such as processing Tree"; and so on. Many similar examples exist and are not enumerated here.
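The "such as" pattern above can be illustrated with a minimal Python sketch. It is only an approximation under stated assumptions: the function name `extract_isa` and the regular expressions are hypothetical, noun phrases are approximated by plain word sequences rather than by a tagger or chunker, and the sentence is assumed to begin with the upper noun phrase NP0.

```python
import re

# Assumption: NP0 is the run of words immediately before "such as", and the
# lower noun phrases are the comma/"and"/"or"-separated items after it.
# A real system would identify noun phrases with a POS tagger or chunker.
SUCH_AS = re.compile(
    r"(?P<upper>[A-Za-z][A-Za-z ]*?)\s+such as\s+(?P<lowers>[^.;]+)",
    re.IGNORECASE,
)
SEPARATORS = re.compile(r",\s*(?:and|or)\s+|,\s*|\s+(?:and|or)\s+")


def extract_isa(sentence):
    """Return (lower, upper) pairs, each meaning 'lower isA upper'."""
    match = SUCH_AS.search(sentence)
    if match is None:
        return []
    upper = match.group("upper").strip()
    lowers = SEPARATORS.split(match.group("lowers"))
    return [(lower.strip(), upper) for lower in lowers if lower.strip()]


pairs = extract_isa("Learning Algorithms such as Linear Regression, SVM and RF")
for lower, upper in pairs:
    print(f"{lower} isA {upper}")
```

Each of the other pattern types would need its own expression; this sketch covers only the first one.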
In step S102, one or more vocabularies in the original scientific and technological vocabulary before expansion are used as search contents, and one or more entity relationships associated with each of the search contents are determined among the extracted entity relationships.
Preferably, the words of the STKOS scientific and technological vocabulary before expansion, including the words at all levels, are used as search terms, and one or more isA entity relationships associated with each search term are found among those extracted from large-scale literature.
In step S103, the original scientific and technological vocabulary is expanded based on the entity relationship associated with each of the search contents to form a new scientific and technological vocabulary having a larger vocabulary hierarchy than the original scientific and technological vocabulary.
Specifically, the original scientific and technological vocabulary is expanded according to one or more isA entity relations which are found from large-scale literature and are associated with search words, so that a new scientific and technological vocabulary with a larger vocabulary hierarchy compared with the original scientific and technological vocabulary is formed, the automatic expansion function of the vocabulary is realized, and the updating method is timely, time-saving and low in labor investment, so that the method has very obvious advantages compared with the prior art.
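Steps S102 and S103 can be sketched as follows. This is a hedged illustration only: the source does not specify a data structure, so the vocabulary is assumed here to be a dictionary mapping each term to the set of its narrower terms, "associated" is assumed to mean an exact string match on either side of an isA pair, and the name `expand_vocabulary` is hypothetical.

```python
def expand_vocabulary(vocab, relations):
    """Return an expanded copy of vocab.

    vocab:     dict mapping a term to the set of its narrower terms (assumed layout)
    relations: iterable of (lower, upper) pairs, each meaning 'lower isA upper'
    """
    # Every word at every level of the original vocabulary is a search term.
    search_terms = set(vocab) | {n for narrower in vocab.values() for n in narrower}
    expanded = {term: set(narrower) for term, narrower in vocab.items()}
    for lower, upper in relations:
        # Keep only relations that touch the original vocabulary.
        if lower in search_terms or upper in search_terms:
            expanded.setdefault(upper, set()).add(lower)
    return expanded


original = {"algorithm": {"sorting algorithm"}}
extracted = [
    ("SVM", "algorithm"),
    ("quicksort", "sorting algorithm"),
    ("cat", "animal"),  # unrelated to the vocabulary, so it is ignored
]
print(expand_vocabulary(original, extracted))
```

Because the input dictionary is copied rather than modified, the original vocabulary remains available for comparison with the expanded one.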
Fig. 2 is a flow chart of a scientific vocabulary expansion method based on a grammar pattern according to an embodiment of the present application. In this embodiment, the scientific and technological vocabulary expansion method includes steps S201 to S204.
In step S201, a plurality of entity relationships are extracted from one or more texts based on a grammar schema.
In step S202, one or more vocabularies in the original scientific and technological vocabulary before expansion are used as search contents, and one or more entity relationships associated with each of the search contents are determined among the extracted entity relationships.
In step S203, the original scientific and technological vocabulary is expanded based on the entity relationship associated with each of the search contents to form a new scientific and technological vocabulary having a larger vocabulary hierarchy than the original scientific and technological vocabulary.
It should be noted that, the method flow steps S201 to S203 in this embodiment are similar to the method flow steps S101 to S103 in the above embodiment, and therefore, the detailed description thereof is omitted.
In step S204, the entity relationship extracted with errors in the new science and technology vocabulary is determined according to the frequency of occurrence of the entity relationship in the large-scale corpus, so as to correct the entity relationship.
It should be noted that the modification includes, but is not limited to, the following operations: deleting the entity relationship of the extraction error, modifying the entity relationship of the extraction error, or covering the entity relationship of the extraction error, and the like, which is not limited in this embodiment.
In some aspects, erroneously extracted entity relationships are caused by, for example, noisy vocabulary. For example, the erroneous entity relationship "cats isA dogs" is extracted from a sentence such as "animals other than dogs such as cats" because of interference from the noisy phrase "other than".
For the extraction error caused by the interference of the noise vocabulary, the method steps shown in fig. 3 can be executed to determine whether the entity relationship is the extraction error, as shown in steps S301 to S303 below:
In step S301, each of any two lower entities among the plurality of lower entities corresponding to an upper entity forms an entity relationship with that upper entity, giving a first entity relationship and a second entity relationship respectively.
In step S302, it is determined whether the frequency with which the first entity relationship and the second entity relationship appear in the large-scale corpus is higher than the frequency with which the third entity relationship, formed by the two lower entities themselves, appears in the large-scale corpus.
In step S303, if yes, it is determined that the third entity relationship is an entity relationship with an extraction error.
For example, let the first entity relationship t1 be "cats isA animal", the second entity relationship t2 be "dogs isA animal", and the third entity relationship t3 be "cats isA dogs". If t1 and t2 are extracted from the large-scale corpus more frequently than t3, then, combined with the rule that two lower entities of the same upper entity cannot themselves stand in an upper-lower relationship, t3 can be determined to be an erroneously extracted entity relationship.
It should be noted that the frequency with which an entity relationship occurs in the large-scale corpus directly reflects its reliability: the higher the frequency, the more reliable the entity relationship, and vice versa. When determining whether an entity relationship was extracted incorrectly, it is therefore preferable to determine whether the first and second entity relationships occur in the large-scale corpus significantly more frequently than the third entity relationship; if so, the third entity relationship is determined to be incorrect. Determining errors in this way improves the accuracy of the determination.
In addition, "significantly higher" in this embodiment can be decided by calculating whether the ratio of the frequency of the first entity relationship (or the second entity relationship) to the frequency of the third entity relationship in the large-scale corpus is greater than a preset threshold.
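The check in steps S301 to S303, together with the ratio test for "significantly higher", can be sketched in Python as follows. The sketch rests on assumptions not fixed by the source: relationship frequencies are given as a dictionary of positive counts keyed by (lower, upper), the smaller of the two sibling frequencies is the one compared against the third relationship, and the function name and the default threshold of 5.0 are hypothetical.

```python
def find_sibling_errors(freq, ratio_threshold=5.0):
    """Return relations (a, b) judged to be extraction errors.

    freq maps (lower, upper) pairs to positive corpus counts. A relation
    'a isA b' is flagged when a and b share an upper entity h and both
    'a isA h' and 'b isA h' are much more frequent than 'a isA b'.
    """
    uppers_of = {}
    for (lower, upper), count in freq.items():
        uppers_of.setdefault(lower, {})[upper] = count

    errors = []
    for (a, b), third_count in freq.items():
        shared = set(uppers_of.get(a, {})) & set(uppers_of.get(b, {}))
        for h in shared:
            if h in (a, b):
                continue
            # Assumption: use the weaker of the two sibling relations.
            sibling_count = min(uppers_of[a][h], uppers_of[b][h])
            if sibling_count / third_count > ratio_threshold:
                errors.append((a, b))
                break
    return errors


counts = {
    ("cats", "animal"): 100,  # t1
    ("dogs", "animal"): 120,  # t2
    ("cats", "dogs"): 2,      # t3, produced by a noisy sentence
}
print(find_sibling_errors(counts))
```

The flagged relations can then be deleted, modified, or overwritten, as described for the correction step.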
In other aspects, erroneously extracted entity relationships are caused by, for example, word segmentation errors. For example, in "algorithms including SVM, LR and RF", the word segmentation model has difficulty determining whether "LR and RF" is one entity or two entities, resulting in an extraction error.
For extraction errors caused by interference of word segmentation errors, whether the entity relationship is an extraction error can be determined by executing the method steps shown in fig. 4, specifically as shown in steps S401 to S402 below:
In step S401, if an entity in an entity relationship can be divided into two or more independent words, it is determined whether the frequency with which the two or more independent words appear together is greater than a preset threshold.
In step S402, if the frequency is greater than the preset threshold, it is determined that the two or more independent words belong to the same entity.
Taking the expression "LR and RF" as an example, it must be determined whether "LR and RF" is one entity or two entities, where "LR and RF" can be divided into two independent words (i.e., "LR" and "RF"). If, in a large-scale corpus, the frequency of occurrence of "LR and RF" is greater than a preset threshold, "LR and RF" is determined to be a whole, i.e., "LR" and "RF" belong to the same entity.
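Steps S401 and S402 reduce to a frequency count and a comparison, sketched below. The corpus is assumed to be a list of sentences, frequency is assumed to be a raw count of exact substring occurrences, and the function names and the threshold value are hypothetical.

```python
def phrase_frequency(phrase, sentences):
    """Count exact occurrences of phrase across all sentences."""
    return sum(sentence.count(phrase) for sentence in sentences)


def is_single_entity(phrase, sentences, threshold):
    """S401/S402: the words of phrase form one entity if they co-occur often enough."""
    return phrase_frequency(phrase, sentences) > threshold


corpus = [
    "research and development budgets grew",
    "the research and development team expanded",
    "LR and RF were compared",
]
print(is_single_entity("research and development", corpus, threshold=1))
print(is_single_entity("LR and RF", corpus, threshold=1))
```

In practice the threshold would depend on corpus size, which the source leaves as a preset value.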
For extraction errors caused by interference of word segmentation errors, the method steps shown in fig. 5 may also be performed to determine whether the entity relationship is an extraction error, specifically as follows:
In step S501, if an entity in an entity relationship can be divided into two or more independent words, it is determined whether the frequency with which an independent word appears on its own is greater than the frequency with which the independent words appear together.
In step S502, if yes, it is determined that the independent word does not belong to the entity.
Still taking the expression "algorithms including SVM, LR and RF" as an example, it must be determined whether "LR and RF" is one entity or two entities, where "LR and RF" can be divided into two independent words (i.e., "LR" and "RF"). If, in a large-scale corpus, the independent word "LR" occurs more frequently than "LR and RF" occur together, or the independent word "RF" occurs more frequently than "LR and RF" occur together, it may be determined that "LR and RF" is not a whole, i.e., "LR" and "RF" are separate entities.
It should be noted that the frequency of occurrence in the large-scale corpus directly reflects the reliability of an entity relationship: the higher the frequency, the more reliable the entity relationship, and vice versa. When determining whether an entity relationship was extracted incorrectly, it is therefore preferable to determine whether the frequency with which an independent word appears on its own is significantly higher than the frequency with which the independent words appear together; if so, the independent word is determined not to be a component of the entity. Determining errors in this way improves the accuracy of the determination.
In addition, "significantly higher" in this embodiment can be decided by calculating whether the ratio of the frequency with which the independent word appears on its own to the frequency with which the independent words appear together in the large-scale corpus is greater than a preset threshold.
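Steps S501 and S502, combined with the ratio test, can be sketched as follows. This is an illustration under assumptions: the counts are taken as already computed from the corpus, a phrase never seen as a whole is treated as separable, and the function name `should_split` and the default threshold of 3.0 are hypothetical.

```python
def should_split(words, joint_count, word_counts, ratio_threshold=3.0):
    """Return True if the words should be treated as separate entities.

    words:       the independent words of the candidate entity
    joint_count: how often the words appear together as one phrase
    word_counts: dict mapping each word to how often it appears on its own
    """
    if joint_count == 0:
        # Assumption: a phrase never observed as a whole is not one entity.
        return True
    # Split if any word is significantly more frequent alone than the phrase.
    return any(
        word_counts.get(word, 0) / joint_count > ratio_threshold
        for word in words
    )


print(should_split(["LR", "RF"], joint_count=4,
                   word_counts={"LR": 200, "RF": 150}))
print(should_split(["research", "development"], joint_count=90,
                   word_counts={"research": 100, "development": 95}))
```

Using `any` follows the "or" in the description: one word being clearly more frequent on its own is enough to split the phrase.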
In an embodiment, the present application further provides a computer storage medium having a computer program stored thereon, which when executed by a processor, performs the above-mentioned method steps.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Fig. 6 is a schematic structural diagram of a scientific and technological vocabulary expansion apparatus based on a grammar pattern according to an embodiment of the present application. The scientific and technological vocabulary expansion device comprises an entity relation extraction module 61 and a vocabulary expansion module 62.
The entity relationship extracting module 61 is configured to extract a plurality of entity relationships from one or more texts based on a grammar pattern; the vocabulary expansion module 62 is configured to determine one or more entity relationships associated with each of the search contents from among the extracted plurality of entity relationships, using one or more vocabularies in an original scientific vocabulary before expansion as search contents, and expand the original scientific vocabulary based on the entity relationships associated with each of the search contents to form a new scientific vocabulary having a larger vocabulary hierarchy than the original scientific vocabulary.
The implementation of the scientific and technological vocabulary expansion apparatus based on the grammar pattern provided in this embodiment is similar to the implementation of the scientific and technological vocabulary expansion method based on the grammar pattern, and is not repeated here.
It should be further understood that the division of the modules of the above apparatus is only a division of logical functions, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated.
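As a rough illustration of how the two modules could cooperate, the following sketch extracts upper/lower entity relationships with a single grammar pattern and then uses the terms of the original vocabulary as search contents. The pattern "X is a kind of Y", the function names, and the dictionary representation of the vocabulary are all assumptions made for this example; the grammar patterns and data structures actually used by the application are not reproduced here.

```python
import re

# Hypothetical grammar pattern: "<lower> is a kind of <upper>".
PATTERN = re.compile(r"(?P<lower>[\w ]+?) is a kind of (?P<upper>[\w ]+)")


def extract_relations(texts):
    """Entity relation extraction: return (upper, lower) pairs found in texts."""
    relations = []
    for text in texts:
        for sentence in text.split("."):
            match = PATTERN.fullmatch(sentence.strip())
            if match:
                relations.append((match.group("upper"), match.group("lower")))
    return relations


def expand_vocabulary(vocabulary, relations):
    """Vocabulary expansion: attach lower entities to terms already in the list.

    vocabulary maps each term to the set of its lower (child) terms.
    Only terms of the original vocabulary are used as search contents.
    """
    expanded = {term: set(children) for term, children in vocabulary.items()}
    for upper, lower in relations:
        if upper in vocabulary:
            expanded[upper].add(lower)
            expanded.setdefault(lower, set())
    return expanded
```

With the text "graphene is a kind of nanomaterial. lidar is a kind of sensor." and an original vocabulary containing only "nanomaterial", the sketch adds "graphene" below "nanomaterial" and ignores the "sensor" relationship, because "sensor" is not a search content.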
Fig. 7 is a schematic structural diagram of another electronic terminal according to an embodiment of the present application. This embodiment provides an electronic terminal including a processor 71 and a memory 72. The memory 72 is connected to the processor 71 through a system bus, over which the two communicate. The memory 72 is used for storing a computer program, and the processor 71 is used for running the computer program so that the electronic terminal executes the steps of the scientific and technological vocabulary expansion method based on the grammar pattern.
The above-mentioned system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface is used for realizing communication between the database access device and other equipment (such as a client, a read-write library and a read-only library). The memory may include a Random Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk memory.
The processor may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In summary, the present application provides a scientific and technological vocabulary expansion method, apparatus, terminal, and medium based on a grammar pattern. The technical scheme provides an automatic vocabulary expansion scheme based on grammar patterns that can expand the STKOS vocabulary efficiently and intelligently, keeping pace with scientific and technological development, thereby effectively solving the problems in the prior art. The application therefore effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims (10)

1. A scientific and technological word list expansion method based on grammar mode is characterized by comprising the following steps:
extracting a plurality of entity relationships from one or more texts based on a grammar pattern;
determining one or more entity relationships associated with each of the search contents among the extracted plurality of entity relationships using one or more words in the original scientific and technological vocabulary before expansion as the search contents;
and expanding the original scientific and technological word list based on the entity relation related to each search content to form a new scientific and technological word list with a larger vocabulary hierarchy compared with the original scientific and technological word list.
2. The method of claim 1, further comprising:
and determining the entity relationship extracted by mistake in the new science and technology vocabulary for correction according to the frequency of the entity relationship in the large-scale corpus.
3. The method of claim 2, further comprising:
if the frequency of the occurrence of the first entity relationship and the second entity relationship in the large-scale corpus, which are respectively formed by any two lower entities in the plurality of lower entities corresponding to the upper entity, is higher than the frequency of the occurrence of the third entity relationship in the large-scale corpus, which is formed by any two lower entities, the third entity relationship is determined to be the entity relationship with the extraction error.
4. The method of claim 2, wherein the method comprises:
if the entity in an entity relationship can be divided into two or more independent words, judging whether the frequency of the two or more independent words appearing together is greater than a preset threshold value;
and if the frequency is greater than the preset threshold value, determining that the two or more independent words belong to the same entity.
5. The method of claim 2, wherein the method comprises:
if the entity in the entity relationship can be divided into two or more independent words, judging whether the frequency with which an independent word appears on its own is greater than the frequency with which the independent words appear together;
if so, determining that the independent term does not belong to the entity.
6. The method of claim 1, wherein the scientific vocabulary comprises a STKOS vocabulary.
7. The method of claim 1, wherein entities in the entity relationship have membership.
8. A scientific and technological vocabulary expansion device based on grammar mode is characterized by comprising:
the entity relation extracting module is used for extracting a plurality of entity relations from one or more texts based on the grammar mode;
the word list expansion module is used for determining one or more entity relations related to each search content in the extracted entity relations by taking one or more vocabularies in an original scientific and technological word list before expansion as the search content, and expanding the original scientific and technological word list based on the entity relations related to each search content to form a new scientific and technological word list with a larger vocabulary hierarchy compared with the original scientific and technological word list.
9. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, implements the method for augmenting a scientific vocabulary based on a grammar pattern according to any one of claims 1 to 7.
10. An electronic terminal, comprising: a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program stored in the memory to enable the terminal to execute the scientific vocabulary expansion method based on the grammar pattern according to any one of claims 1 to 7.
CN201910785089.XA 2019-08-23 2019-08-23 Scientific and technological word list expansion method, device, terminal and medium based on grammar mode Pending CN110674308A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910785089.XA CN110674308A (en) 2019-08-23 2019-08-23 Scientific and technological word list expansion method, device, terminal and medium based on grammar mode

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910785089.XA CN110674308A (en) 2019-08-23 2019-08-23 Scientific and technological word list expansion method, device, terminal and medium based on grammar mode

Publications (1)

Publication Number Publication Date
CN110674308A true CN110674308A (en) 2020-01-10

Family

ID=69075643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910785089.XA Pending CN110674308A (en) 2019-08-23 2019-08-23 Scientific and technological word list expansion method, device, terminal and medium based on grammar mode

Country Status (1)

Country Link
CN (1) CN110674308A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105653706A (en) * 2015-12-31 2016-06-08 北京理工大学 Multilayer quotation recommendation method based on literature content mapping knowledge domain
CN106815293A (en) * 2016-12-08 2017-06-09 中国电子科技集团公司第三十二研究所 System and method for constructing knowledge graph for information analysis
CN106933787A (en) * 2017-03-20 2017-07-07 上海智臻智能网络科技股份有限公司 Adjudicate the computational methods of document similarity, search device and computer equipment
CN109215798A (en) * 2018-10-09 2019-01-15 北京科技大学 A kind of construction of knowledge base method towards Chinese medicine ancient Chinese prose
CN109308323A (en) * 2018-12-07 2019-02-05 中国科学院长春光学精密机械与物理研究所 A kind of construction method, device and the equipment of causality knowledge base
CN109992689A (en) * 2019-03-26 2019-07-09 华为技术有限公司 Searching method, terminal and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
梁冰等: "国家科技图书文献中心科技词表的构建与应用", 《中国科技资源导刊》 *
马雨萌等: "STKOS中领域本体模型框架研究", 《图书情报工作》 *

Similar Documents

Publication Publication Date Title
US20200081899A1 (en) Automated database schema matching
CN106844341B (en) Artificial intelligence-based news abstract extraction method and device
US10242670B2 (en) Syntactic re-ranking of potential transcriptions during automatic speech recognition
WO2021012519A1 (en) Artificial intelligence-based question and answer method and apparatus, computer device, and storage medium
CN110334209B (en) Text classification method, device, medium and electronic equipment
EP3115907A1 (en) Common data repository for improving transactional efficiencies of user interactions with a computing device
CN106610931B (en) Topic name extraction method and device
CN105446986A (en) Web page processing method and device
CN110929509B (en) Domain event trigger word clustering method based on louvain community discovery algorithm
JP5317061B2 (en) A simultaneous classifier in multiple languages for the presence or absence of a semantic relationship between words and a computer program therefor.
CN112287077A (en) Statement extraction method and device for combining RPA and AI for document, storage medium and electronic equipment
CN115858776B (en) Variant text classification recognition method, system, storage medium and electronic equipment
CN109300550B (en) Medical data relation mining method and device
CN111354354B (en) Training method, training device and terminal equipment based on semantic recognition
CN116797195A (en) Work order processing method, apparatus, computer device, and computer readable storage medium
CN101533391A (en) System for searching similar matched sentences and method thereof
CN110795562A (en) Map optimization method, device, terminal and storage medium
CN110674308A (en) Scientific and technological word list expansion method, device, terminal and medium based on grammar mode
CN108733702B (en) Method, device, electronic equipment and medium for extracting upper and lower relation of user query
CN112380348B (en) Metadata processing method, apparatus, electronic device and computer readable storage medium
CN115098061A (en) Software development document optimization method and device, computer equipment and storage medium
WO2021056740A1 (en) Language model construction method and system, computer device and readable storage medium
CN111625579B (en) Information processing method, device and system
CN113535938A (en) Standard data construction method, system, device and medium based on content identification
CN114528824A (en) Text error correction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211201

Address after: 200052 rooms 514, 516, 518, 519, 520, 522, building 3, 1634 Huaihai Middle Road, Xuhui District, Shanghai

Applicant after: Shanghai Science and Technology Development Co.,Ltd.

Applicant after: Shanghai Kehui Talent Service Co.,Ltd.

Address before: 200052 rooms 514, 516, 518, 519, 520, 522, building 3, 1634 Huaihai Middle Road, Xuhui District, Shanghai

Applicant before: Shanghai Science and Technology Development Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200110
