CN114254177A

CN114254177A - Language processing method and system based on word sense distribution hypothesis construction

Info

Publication number: CN114254177A
Application number: CN202111461699.8A
Authority: CN
Inventors: 苏长君; 曾祥禄
Original assignee: Beijing Zhimei Internet Technology Co ltd
Current assignee: Beijing Zhimei Internet Technology Co ltd
Priority date: 2021-12-02
Filing date: 2021-12-02
Publication date: 2022-03-29

Abstract

The invention provides a language processing method and system based on word sense distribution hypothesis construction. A sentence is input into a syntactic model for preliminary sentence segmentation to obtain first word components; the first word components are input one by one into a semantic analysis model to obtain second word components; the preceding text of the current sentence is acquired, and the candidate phrase that would follow it is predicted from the meaning of that preceding text; the candidate phrase is matched against the second word components, each second word component is assigned a meaning according to the matching result, and the meaning of the sentence is thereby obtained.

Description

Language processing method and system based on word sense distribution hypothesis construction

Technical Field

The present application relates to the field of network multimedia, and in particular, to a method and system for language processing based on word sense distribution hypothesis construction.

Background

With the rapid development of networks, there is a need for machines that can understand the meaning of language quickly and accurately. Existing language understanding machines struggle to do so, particularly when Chinese words are ambiguous, so a machine that can interpret the meaning of a word in light of its context needs to be developed.

Therefore, there is a need for a targeted language processing method and system based on word sense distribution hypothesis construction.

Disclosure of Invention

The invention aims to provide a language processing method and system based on word sense distribution hypothesis construction. A sentence is input into a syntactic model for preliminary sentence segmentation to obtain first word components; the first word components are input one by one into a semantic analysis model to obtain second word components; the preceding text of the current sentence is acquired, and the candidate phrase that would follow it is predicted from the meaning of that preceding text; the candidate phrase is matched against the second word components, each second word component is assigned a meaning according to the matching result, and the meaning of the sentence is thereby obtained.

In a first aspect, the present application provides a method for language processing based on word sense distribution hypothesis construction, the method comprising:

acquiring a network data stream, extracting sentences from the network data stream, inputting the sentences into a syntactic model, and performing preliminary sentence segmentation to obtain first word components, wherein the syntactic model sets an extraction window of a different width for each word type, the extraction windows serve as the basis for segmentation, and the words falling within a window's width form a first word component;

inputting the first word components one by one into a semantic analysis model; if a first word component can be identified as a short sentence, determining that the preliminary sentence segmentation of that first word component was unsuccessful, inputting it into the syntactic model again, and segmenting it again to obtain second word components; if it can be identified neither as a short sentence nor as a phrase, determining that the preliminary sentence segmentation of that first word component was successful and directly marking it as a second word component; wherein a phrase consists of a plurality of words and has no syntactic structure;

setting a preceding-text width N, where N is a positive integer; acquiring the preceding text of the current sentence according to that width; inputting the preceding text into the semantic analysis model, analyzing its meaning, and predicting the candidate phrase that would follow it; matching the candidate phrase against the second word component; and assigning a meaning to the second word component according to the matching result;

wherein the matching comprises comparing the words in the candidate phrase with the words in the second word component one by one and counting the words they have in common, and, when that count is greater than a preset threshold, determining that the candidate phrase matches the second word component;

and recombining the second word components to form a new sentence, and obtaining the meaning of the new sentence.
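
By way of illustration only, the width-N selection of preceding text and the threshold-based matching rule described above might be sketched as follows. The function names, the representation of word components as lists of word strings, and the reading of "the number of the same words" as the count of distinct shared words are assumptions made for this sketch; the application does not specify them, and the prediction of the candidate phrase by the semantic analysis model is not modelled here.

```python
from typing import List, Sequence


def preceding_context(sentences: Sequence[str], index: int, n: int) -> List[str]:
    """Return up to n sentences immediately before sentences[index]."""
    if n < 1:
        raise ValueError("N must be a positive integer")
    return list(sentences[max(0, index - n):index])


def shared_word_count(candidate: Sequence[str], component: Sequence[str]) -> int:
    """Count distinct words present in both sequences (an assumed reading)."""
    return len(set(candidate) & set(component))


def is_match(candidate: Sequence[str], component: Sequence[str], threshold: int) -> bool:
    """A candidate phrase matches when the shared-word count exceeds the threshold."""
    return shared_word_count(candidate, component) > threshold
```

For example, under these assumptions a candidate phrase and a second word component that share two distinct words match when the preset threshold is 1 and do not match when it is 2.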

With reference to the first aspect, in a first possible implementation manner of the first aspect, setting extraction windows of different widths according to word type includes updating the word types and establishing a correspondence between each new word type and an extraction window width.
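
As a non-limiting sketch, an updatable correspondence between word types and extraction window widths might look like the following. The class name `WindowSegmenter`, the representation of input as (word, word type) pairs, the default width for unlisted types, and the rule that the window is opened by the type of its first word are all hypothetical choices made for illustration; in the application this role is played by the syntactic model, whose internals are not disclosed.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, word_type); this tagging format is assumed


class WindowSegmenter:
    """Groups tagged words into components using a per-type window width."""

    def __init__(self, widths: Dict[str, int], default_width: int = 1) -> None:
        if default_width < 1:
            raise ValueError("window width must be at least 1")
        self.default_width = default_width
        self.widths: Dict[str, int] = {}
        for word_type, width in widths.items():
            self.register_type(word_type, width)

    def register_type(self, word_type: str, width: int) -> None:
        """Add or update the window width associated with a word type."""
        if width < 1:
            raise ValueError("window width must be at least 1")
        self.widths[word_type] = width

    def segment(self, tokens: List[Token]) -> List[List[str]]:
        """Split tokens into components; each window is sized by its first word's type."""
        components: List[List[str]] = []
        i = 0
        while i < len(tokens):
            _, word_type = tokens[i]
            width = self.widths.get(word_type, self.default_width)
            components.append([word for word, _ in tokens[i:i + width]])
            i += width
        return components
```

In this sketch, calling `register_type` with a new word type changes how subsequent sentences are segmented without altering the segmentation routine itself.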

With reference to the first aspect, in a second possible implementation manner of the first aspect, the semantic analysis model performs semantic analysis according to sentence syntax requirements.

With reference to the first aspect, in a third possible implementation manner of the first aspect, the cores of the semantic analysis model and the syntactic model are both neural network models.

In a second aspect, the present application provides a system for language processing based on word sense distribution hypothesis construction, the system comprising a processor and a memory:

the memory is used for storing program codes and transmitting the program codes to the processor;

the processor is configured to perform the method of any one of the four possibilities of the first aspect according to instructions in the program code.

In a third aspect, the present application provides a computer readable storage medium for storing program code for performing the method of any one of the four possibilities of the first aspect.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the present invention can be more easily understood by those skilled in the art and the scope of the present invention can be more clearly defined.

Fig. 1 is a flowchart of a method for language processing based on word sense distribution hypothesis construction provided in the present application, including:

In some preferred embodiments, setting extraction windows of different widths according to word type includes updating the word types and establishing a correspondence between each new word type and an extraction window width.

In some preferred embodiments, the semantic analysis model performs semantic analysis according to sentence grammar requirements.

In some preferred embodiments, the cores of the semantic analysis model and the syntactic model are both neural network models.

The present application provides a system for language processing based on word sense distribution hypothesis construction, the system comprising a processor and a memory:

the processor is configured to perform the method according to any of the embodiments of the first aspect according to instructions in the program code.

The present application provides a computer readable storage medium for storing program code for performing the method of any of the embodiments of the first aspect.

In a specific implementation, the present invention further provides a computer storage medium, where the computer storage medium may store a program which, when executed, may perform some or all of the steps in the embodiments of the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.

For parts that are the same or similar across the embodiments of this specification, the embodiments may be referred to one another. In particular, since the other embodiments are substantially similar to the method embodiments, they are described only briefly, and the relevant points can be found in the description of the method embodiments.

The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention.

Claims

1. A method for language processing based on word sense distribution hypothesis construction, the method comprising:

2. The method of claim 1, wherein: setting extraction windows of different widths according to word type includes updating the word types and establishing a correspondence between each new word type and an extraction window width.

3. The method according to any one of claims 1-2, wherein: the semantic analysis model performs semantic analysis according to sentence grammar requirements.

4. The method according to any one of claims 1-3, wherein: the cores of the semantic analysis model and the syntactic model are both neural network models.

5. A language processing system constructed based on word sense distribution hypotheses, the system comprising a processor and a memory:

the processor is configured to perform, according to instructions in the program code, the method of any one of claims 1-4.

6. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code for performing the method of any one of claims 1-4.