CN114254177A - Language processing method and system based on word sense distribution hypothesis construction - Google Patents
Language processing method and system based on word sense distribution hypothesis construction
- Publication number
- CN114254177A (application CN202111461699.8A)
- Authority
- CN
- China
- Prior art keywords
- word
- sentence
- component
- word component
- meaning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention provides a language processing method and system based on word sense distribution hypothesis construction. A sentence is input into a syntactic model for preliminary sentence segmentation to obtain first word components. The first word components are input one by one into a semantic analysis model to obtain second word components. The preceding text of the current sentence is then obtained, a next candidate phrase is predicted from the meaning of the preceding text, and the candidate phrase is matched against the second word components. A meaning is assigned to each second word component according to the matching result, and from these the meaning of the sentence is obtained.
Description
Technical Field
The present application relates to the field of network multimedia, and in particular, to a method and system for language processing based on word sense distribution hypothesis construction.
Background
With the rapid development of networks, there is a need for automatic machines that can understand the meaning of language quickly and accurately. Existing language-understanding machines struggle to do so, and in particular they cope poorly with Chinese vocabulary in which a single word is ambiguous. A machine that can interpret the meaning of a word in light of its context therefore needs to be developed.
Therefore, there is a need for a targeted language processing method and system based on word sense distribution hypothesis construction.
Disclosure of Invention
The invention aims to provide a language processing method and system based on word sense distribution hypothesis construction. A sentence is input into a syntactic model for preliminary sentence segmentation to obtain first word components. The first word components are input one by one into a semantic analysis model to obtain second word components. The preceding text of the current sentence is then obtained, a next candidate phrase is predicted from the meaning of the preceding text, and the candidate phrase is matched against the second word components. A meaning is assigned to each second word component according to the matching result, and from these the meaning of the sentence is obtained.
In a first aspect, the present application provides a method for language processing based on word sense distribution hypothesis construction, the method comprising:
acquiring a network data stream, extracting sentences from the network data stream, inputting the sentences into a syntactic model, and performing preliminary sentence segmentation to obtain first word components, wherein the syntactic model sets an extraction window of a different width for each word type, the extraction window serves as the basis for segmentation, and the words falling within the window width form a first word component;
inputting the first word components into a semantic analysis model one by one; if a first word component can be identified as a short sentence, determining that its preliminary segmentation was unsuccessful, inputting it into the syntactic model again, and segmenting it again to obtain second word components; if it can be identified neither as a short sentence nor as a phrase, determining that its preliminary segmentation was successful and marking it directly as a second word component; wherein a phrase consists of a plurality of words and has no syntactic structure;
setting the width of the preceding text to N, where N is a positive integer; acquiring the preceding text of the current sentence according to that width; inputting the preceding text into the semantic analysis model, which analyzes its meaning and predicts the candidate phrase that would follow it; matching the candidate phrase against the second word component; and assigning a meaning to the second word component according to the matching result;
the matching refers to comparing words in the candidate phrase with words in the second word component one by one, calculating the number of the same words, and when the number is greater than a preset threshold value, determining that the candidate phrase is matched with the second word component;
and recombining the second word components to form a new sentence, and obtaining the meaning of the new sentence.
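The context and matching steps above can be illustrated with a minimal Python sketch. It is not the applicant's implementation. It assumes that the width N counts sentences, that "the number of the same words" is a multiset overlap, and that each predicted candidate phrase arrives with a meaning label; the function names and the candidate format are hypothetical.

```python
from collections import Counter


def preceding_text(sentences, current_index, n):
    """Return up to n sentences immediately before the current one (assumes N counts sentences)."""
    start = max(0, current_index - n)
    return sentences[start:current_index]


def count_shared_words(candidate, component):
    """Count words present in both word lists, respecting repeated words."""
    overlap = Counter(candidate) & Counter(component)
    return sum(overlap.values())


def is_match(candidate, component, threshold):
    """A candidate matches only when the shared-word count is strictly greater than the threshold."""
    return count_shared_words(candidate, component) > threshold


def assign_meaning(component, candidates, threshold):
    """Return the meaning of the first matching candidate, or None if none matches.

    `candidates` is a hypothetical list of (phrase_words, meaning) pairs standing in
    for the output of the semantic analysis model.
    """
    for phrase_words, meaning in candidates:
        if is_match(phrase_words, component, threshold):
            return meaning
    return None
```

The strict comparison mirrors the wording "greater than a preset threshold value", so a shared-word count equal to the threshold does not count as a match.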
With reference to the first aspect, in a first possible implementation manner of the first aspect, the setting of the extraction windows with different widths according to each word type includes updating the word type, and establishing a corresponding relationship between the new word type and the width of the extraction window.
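One way to picture this implementation manner is a lookup table from word type to window width that accepts new word types at run time. The sketch below is illustrative only; the class name `WindowTable`, the default width of 1, and the example word types are assumptions that do not come from the application.

```python
DEFAULT_WIDTH = 1  # assumed fallback width for word types that have no entry


class WindowTable:
    """Maps each word type to the width of its extraction window."""

    def __init__(self, widths=None):
        self._widths = dict(widths or {})

    def width_for(self, word_type):
        """Look up the window width for a word type, falling back to the default."""
        return self._widths.get(word_type, DEFAULT_WIDTH)

    def register(self, word_type, width):
        """Add a new word type, or update an existing one, with its window width."""
        if width < 1:
            raise ValueError("window width must be a positive integer")
        self._widths[word_type] = width


table = WindowTable({"noun": 2, "verb": 1})
table.register("idiom", 4)  # a newly introduced word type gets its own width
```

Keeping the mapping in one place means the segmentation step never needs to change when a word type is added.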
With reference to the first aspect, in a second possible implementation manner of the first aspect, the semantic analysis model performs semantic analysis according to sentence syntax requirements.
With reference to the first aspect, in a third possible implementation manner of the first aspect, the kernels of the semantic analysis model and the syntax model both use a neural network model.
In a second aspect, the present application provides a system for language processing based on word sense distribution hypothesis construction, the system comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any one of the four possibilities of the first aspect according to instructions in the program code.
In a third aspect, the present application provides a computer readable storage medium for storing program code for performing the method of any one of the four possibilities of the first aspect.
The invention provides a language processing method and system based on word sense distribution hypothesis construction. A sentence is input into a syntactic model for preliminary sentence segmentation to obtain first word components. The first word components are input one by one into a semantic analysis model to obtain second word components. The preceding text of the current sentence is then obtained, a next candidate phrase is predicted from the meaning of the preceding text, and the candidate phrase is matched against the second word components. A meaning is assigned to each second word component according to the matching result, and from these the meaning of the sentence is obtained.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that those skilled in the art can more easily understand the advantages and features of the invention and so that its scope is more clearly defined.
Fig. 1 is a flowchart of a method for language processing based on word sense distribution hypothesis construction provided in the present application, including:
acquiring a network data stream, extracting sentences from the network data stream, inputting the sentences into a syntactic model, and performing preliminary sentence segmentation to obtain first word components, wherein the syntactic model sets an extraction window of a different width for each word type, the extraction window serves as the basis for segmentation, and the words falling within the window width form a first word component;
inputting the first word components into a semantic analysis model one by one; if a first word component can be identified as a short sentence, determining that its preliminary segmentation was unsuccessful, inputting it into the syntactic model again, and segmenting it again to obtain second word components; if it can be identified neither as a short sentence nor as a phrase, determining that its preliminary segmentation was successful and marking it directly as a second word component; wherein a phrase consists of a plurality of words and has no syntactic structure;
setting the width of the preceding text to N, where N is a positive integer; acquiring the preceding text of the current sentence according to that width; inputting the preceding text into the semantic analysis model, which analyzes its meaning and predicts the candidate phrase that would follow it; matching the candidate phrase against the second word component; and assigning a meaning to the second word component according to the matching result;
the matching refers to comparing words in the candidate phrase with words in the second word component one by one, calculating the number of the same words, and when the number is greater than a preset threshold value, determining that the candidate phrase is matched with the second word component;
and recombining the second word components to form a new sentence, and obtaining the meaning of the new sentence.
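The first two steps of the flow can be sketched in Python as follows. This is a simplified illustration under several assumptions: words arrive already tagged with a word type, the type of the word that opens a component chooses the window width, and the short-sentence check and the re-segmentation are passed in as plain callables standing in for the semantic analysis model and the syntactic model. The application does not say how a component identified as a phrase is handled, so this sketch simply keeps it unchanged.

```python
def segment(tagged_words, widths, default_width=1):
    """Group (word, word_type) pairs into first word components.

    The word type at the start of each component selects the window width,
    and the words inside that window form one component.
    """
    components = []
    i = 0
    while i < len(tagged_words):
        _, word_type = tagged_words[i]
        width = max(1, widths.get(word_type, default_width))
        components.append([word for word, _ in tagged_words[i:i + width]])
        i += width
    return components


def finalize(components, is_short_sentence, resegment):
    """Turn first word components into second word components.

    A component recognized as a short sentence was segmented too coarsely,
    so it is segmented again; every other component is kept as it is.
    """
    second = []
    for component in components:
        if is_short_sentence(component):
            second.extend(resegment(component))
        else:
            second.append(component)
    return second


# Hypothetical input and stand-in models, for illustration only.
tagged = [("the", "det"), ("river", "noun"), ("bank", "noun"), ("flooded", "verb")]
first = segment(tagged, {"det": 2, "noun": 1, "verb": 1})
second = finalize(first, lambda c: len(c) > 1, lambda c: [[w] for w in c])
```

In a real system both callables would be backed by trained models; the lambdas here exist only so the control flow can be run end to end.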
In some preferred embodiments, the setting of the extraction window with different widths according to each word type includes updating the word type, and establishing a corresponding relationship between the new word type and the width of the extraction window.
In some preferred embodiments, the semantic analysis model performs semantic analysis according to sentence grammar requirements.
In some preferred embodiments, the kernels of the semantic analysis model and the syntactic model both use a neural network model.
The present application provides a system for language processing based on word sense distribution hypothesis construction, the system comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method according to any of the embodiments of the first aspect according to instructions in the program code.
The present application provides a computer readable storage medium for storing program code for performing the method of any of the embodiments of the first aspect.
In a specific implementation, the present invention further provides a computer storage medium that may store a program; when executed, the program may perform some or all of the steps in the embodiments of the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
For parts that are the same or similar across the embodiments of this specification, the embodiments may be referred to one another. In particular, the system and storage-medium embodiments are substantially similar to the method embodiments, so they are described only briefly; for the relevant details, refer to the description of the method embodiments.
The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention.
Claims (6)
1. A method for language processing based on word sense distribution hypothesis construction, the method comprising:
acquiring a network data stream, extracting sentences from the network data stream, inputting the sentences into a syntactic model, and performing preliminary sentence segmentation to obtain first word components, wherein the syntactic model sets an extraction window of a different width for each word type, the extraction window serves as the basis for segmentation, and the words falling within the window width form a first word component;
inputting the first word components into a semantic analysis model one by one; if a first word component can be identified as a short sentence, determining that its preliminary segmentation was unsuccessful, inputting it into the syntactic model again, and segmenting it again to obtain second word components; if it can be identified neither as a short sentence nor as a phrase, determining that its preliminary segmentation was successful and marking it directly as a second word component; wherein a phrase consists of a plurality of words and has no syntactic structure;
setting the width of the preceding text to N, where N is a positive integer; acquiring the preceding text of the current sentence according to that width; inputting the preceding text into the semantic analysis model, which analyzes its meaning and predicts the candidate phrase that would follow it; matching the candidate phrase against the second word component; and assigning a meaning to the second word component according to the matching result;
the matching refers to comparing words in the candidate phrase with words in the second word component one by one, calculating the number of the same words, and when the number is greater than a preset threshold value, determining that the candidate phrase is matched with the second word component;
and recombining the second word components to form a new sentence, and obtaining the meaning of the new sentence.
2. The method of claim 1, wherein setting extraction windows of different widths according to each word type includes updating the word type and establishing a correspondence between the new word type and the width of the extraction window.
3. The method according to any one of claims 1-2, wherein the semantic analysis model performs semantic analysis according to sentence grammar requirements.
4. A method according to any one of claims 1-3, characterized in that: the kernels of the semantic analysis model and the syntactic model both use a neural network model.
5. A language processing system constructed based on word sense distribution hypotheses, the system comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform, according to instructions in the program code, the method of any one of claims 1-4.
6. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code for performing the method of any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111461699.8A CN114254177A (en) | 2021-12-02 | 2021-12-02 | Language processing method and system based on word sense distribution hypothesis construction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111461699.8A CN114254177A (en) | 2021-12-02 | 2021-12-02 | Language processing method and system based on word sense distribution hypothesis construction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114254177A true CN114254177A (en) | 2022-03-29 |
Family
ID=80793860
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111461699.8A Pending CN114254177A (en) | 2021-12-02 | 2021-12-02 | Language processing method and system based on word sense distribution hypothesis construction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114254177A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5901001B1 (en) | Method and device for acoustic language model training | |
EP3819785A1 (en) | Feature word determining method, apparatus, and server | |
CN111753531A (en) | Text error correction method and device based on artificial intelligence, computer equipment and storage medium | |
CN107341143B (en) | Sentence continuity judgment method and device and electronic equipment | |
CN111079408A (en) | Language identification method, device, equipment and storage medium | |
CN115544240B (en) | Text sensitive information identification method and device, electronic equipment and storage medium | |
US8880391B2 (en) | Natural language processing apparatus, natural language processing method, natural language processing program, and computer-readable recording medium storing natural language processing program | |
CN112528641A (en) | Method and device for establishing information extraction model, electronic equipment and readable storage medium | |
CN114639386A (en) | Text error correction and text error correction word bank construction method | |
CN112084769A (en) | Dependency syntax model optimization method, device, equipment and readable storage medium | |
KR20150092879A (en) | Language Correction Apparatus and Method based on n-gram data and linguistic analysis | |
CN115858776B (en) | Variant text classification recognition method, system, storage medium and electronic equipment | |
CN112633007A (en) | Semantic understanding model construction method and device and semantic understanding method and device | |
CN111492364B (en) | Data labeling method and device and storage medium | |
CN113807091B (en) | Word mining method and device, electronic equipment and readable storage medium | |
CN115620726A (en) | Voice text generation method, and training method and device of voice text generation model | |
CN114254177A (en) | Language processing method and system based on word sense distribution hypothesis construction | |
CN112307183B (en) | Search data identification method, apparatus, electronic device and computer storage medium | |
CN113268588A (en) | Text abstract extraction method, device, equipment, storage medium and program product | |
CN113011162A (en) | Reference resolution method, device, electronic equipment and medium | |
CN114519357B (en) | Natural language processing method and system based on machine learning | |
CN117271778B (en) | Insurance outbound session information output method and device based on generation type large model | |
CN113326691B (en) | Data processing method and device, electronic equipment and computer readable medium | |
CN115455179B (en) | Sensitive vocabulary detection method, device, equipment and storage medium | |
CN114186552B (en) | Text analysis method, device and equipment and computer storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 607a, 6/F, No. 31, Fuchengmenwai Street, Xicheng District, Beijing 100037; Applicant after: Beijing Guorui Digital Intelligence Technology Co.,Ltd.; Address before: 607a, 6/F, No. 31, Fuchengmenwai Street, Xicheng District, Beijing 100037; Applicant before: Beijing Zhimei Internet Technology Co.,Ltd. |