CN114254177A - Language processing method and system based on word sense distribution hypothesis construction - Google Patents

Language processing method and system based on word sense distribution hypothesis construction

Info

Publication number
CN114254177A
Authority
CN
China
Prior art keywords
word
sentence
component
word component
meaning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111461699.8A
Other languages
Chinese (zh)
Inventor
苏长君
曾祥禄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhimei Internet Technology Co ltd
Original Assignee
Beijing Zhimei Internet Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhimei Internet Technology Co ltd
Priority to CN202111461699.8A
Publication of CN114254177A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention provides a language processing method and system based on word sense distribution hypothesis construction. A sentence is input into a syntactic model for preliminary segmentation to obtain first word components; the first word components are input one by one into a semantic analysis model to obtain second word components; the preceding text of the current sentence is acquired, and the next candidate phrase is predicted from the meaning of that preceding text; the candidate phrase is matched against the second word components, each second word component is assigned a meaning according to the matching result, and the meaning of the sentence is thereby obtained.

Description

Language processing method and system based on word sense distribution hypothesis construction
Technical Field
The present application relates to the field of network multimedia, and in particular, to a method and system for language processing based on word sense distribution hypothesis construction.
Background
With the rapid development of networks, there is a need for machines that can understand the meaning of language quickly and accurately. Existing language-understanding machines struggle to do so, particularly with Chinese vocabulary in which a single word can be ambiguous. A machine that can determine the meaning of a word from its context therefore needs to be developed.
Therefore, there is a need for a language processing method and system, based on word sense distribution hypothesis construction, that targets this problem.
Disclosure of Invention
The invention aims to provide a language processing method and system based on word sense distribution hypothesis construction, in which a sentence is input into a syntactic model for preliminary segmentation to obtain first word components; the first word components are input one by one into a semantic analysis model to obtain second word components; the preceding text of the current sentence is acquired, and the next candidate phrase is predicted from the meaning of that preceding text; the candidate phrase is matched against the second word components, each second word component is assigned a meaning according to the matching result, and the meaning of the sentence is thereby obtained.
In a first aspect, the present application provides a method for language processing based on word sense distribution hypothesis construction, the method comprising:
acquiring a network data stream, extracting sentences from the network data stream, and inputting each sentence into a syntactic model for preliminary segmentation to obtain first word components, wherein the syntactic model sets an extraction window of a different width for each word type, the extraction windows serve as the basis for segmentation, and the words falling within a window's width form a first word component;
inputting the first word components into a semantic analysis model one by one; if a first word component can be recognized as a short sentence, determining that its preliminary segmentation was unsuccessful, inputting it into the syntactic model again, and segmenting it again to obtain second word components; if it can be recognized neither as a short sentence nor as a phrase, determining that its preliminary segmentation was successful and marking the first word component directly as a second word component; wherein a phrase consists of a plurality of words and has no syntactic structure;
setting a preceding-text width N, where N is a positive integer; acquiring the preceding text of the current sentence according to that width; inputting the preceding text into the semantic analysis model to analyze its meaning and predict the candidate phrase that would follow it; matching the candidate phrase against the second word component; and assigning a meaning to the second word component according to the matching result;
wherein the matching means comparing the words in the candidate phrase with the words in the second word component one by one and counting the identical words, the candidate phrase being considered to match the second word component when the count is greater than a preset threshold;
and recombining the second word components to form a new sentence, and obtaining the meaning of the new sentence.
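As an illustration of the matching rule described above, the following Python sketch counts the words shared by a candidate phrase and a second word component and applies the preset threshold. It is a minimal sketch under stated assumptions, not the applicant's implementation: the function names (`count_shared_words`, `is_match`, `assign_meaning`) are hypothetical, "identical words" is read as distinct words present in both sequences regardless of position, and attaching a sense label to each candidate phrase is an assumption made only so that a meaning can be assigned.

```python
from typing import List, Optional, Sequence, Tuple


def count_shared_words(candidate: Sequence[str], component: Sequence[str]) -> int:
    """Count distinct words that occur in both sequences.

    Assumption: position is ignored; the source only says the words are
    compared one by one and the identical ones are counted.
    """
    return len(set(candidate) & set(component))


def is_match(candidate: Sequence[str], component: Sequence[str], threshold: int) -> bool:
    """A candidate matches when the shared-word count exceeds the threshold."""
    return count_shared_words(candidate, component) > threshold


def assign_meaning(
    candidates: List[Tuple[List[str], str]],
    component: Sequence[str],
    threshold: int,
) -> Optional[str]:
    """Return the sense label of the best-matching candidate phrase.

    Assumption: each predicted candidate phrase carries a sense label, and
    the candidate with the most shared words wins. Returns None when no
    candidate exceeds the threshold.
    """
    best_sense: Optional[str] = None
    best_count = threshold
    for words, sense in candidates:
        shared = count_shared_words(words, component)
        if shared > best_count:
            best_sense, best_count = sense, shared
    return best_sense


if __name__ == "__main__":
    predicted = [
        (["river", "bank", "water"], "bank/shore"),
        (["bank", "loan", "money"], "bank/finance"),
    ]
    print(assign_meaning(predicted, ["bank", "loan", "approved"], threshold=1))
```

Because the comparison is "greater than" rather than "greater than or equal to", a threshold of 1 requires at least two shared words before a candidate counts as a match.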
With reference to the first aspect, in a first possible implementation of the first aspect, setting extraction windows of different widths according to word type includes updating the word types and establishing a correspondence between each new word type and an extraction window width.
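The window mechanism and its update step can be pictured with the short Python sketch below. It is only a sketch under assumptions the source does not state: tokens are assumed to arrive already tagged with a word type, each window is assumed to start at the current word and extend to the right, unknown word types fall back to a default width of 1, and the names `WindowTable` and `segment` are hypothetical.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, word_type); the tagging itself is assumed


class WindowTable:
    """Maps each word type to an extraction window width."""

    def __init__(self, widths: Dict[str, int], default_width: int = 1) -> None:
        self._widths = dict(widths)
        self._default = default_width

    def update(self, word_type: str, width: int) -> None:
        """Register a new word type, or change the width of an existing one."""
        if width < 1:
            raise ValueError("window width must be a positive integer")
        self._widths[word_type] = width

    def width_for(self, word_type: str) -> int:
        return self._widths.get(word_type, self._default)


def segment(tokens: List[Token], table: WindowTable) -> List[List[str]]:
    """Preliminary segmentation: the words inside each window form one
    first word component."""
    components: List[List[str]] = []
    i = 0
    while i < len(tokens):
        width = table.width_for(tokens[i][1])
        components.append([word for word, _ in tokens[i:i + width]])
        i += width
    return components


if __name__ == "__main__":
    table = WindowTable({"det": 3, "verb": 1})
    sentence = [("the", "det"), ("old", "adj"), ("bank", "noun"), ("closed", "verb")]
    print(segment(sentence, table))
    table.update("det", 2)  # establish a new type-to-width correspondence
    print(segment(sentence, table))
```

Keeping the type-to-width correspondence in a separate table means word types can be added or adjusted without touching the segmentation loop.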
With reference to the first aspect, in a second possible implementation manner of the first aspect, the semantic analysis model performs semantic analysis according to sentence syntax requirements.
With reference to the first aspect, in a third possible implementation manner of the first aspect, the kernels of the semantic analysis model and the syntax model both use a neural network model.
In a second aspect, the present application provides a system for language processing based on word sense distribution hypothesis construction, the system comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform, according to instructions in the program code, the method of the first aspect or of any of its possible implementations.
In a third aspect, the present application provides a computer readable storage medium for storing program code for performing the method of the first aspect or of any of its possible implementations.
The invention thus provides a language processing method and system based on word sense distribution hypothesis construction. A sentence is input into a syntactic model for preliminary segmentation to obtain first word components; the first word components are input one by one into a semantic analysis model to obtain second word components; the preceding text of the current sentence is acquired, and the next candidate phrase is predicted from the meaning of that preceding text; the candidate phrase is matched against the second word components, each second word component is assigned a meaning according to the matching result, and the meaning of the sentence is thereby obtained.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that those skilled in the art can more readily understand the advantages and features of the invention and so that its scope of protection is more clearly defined.
Fig. 1 is a flowchart of a method for language processing based on word sense distribution hypothesis construction provided in the present application, including:
acquiring a network data stream, extracting sentences from the network data stream, and inputting each sentence into a syntactic model for preliminary segmentation to obtain first word components, wherein the syntactic model sets an extraction window of a different width for each word type, the extraction windows serve as the basis for segmentation, and the words falling within a window's width form a first word component;
inputting the first word components into a semantic analysis model one by one; if a first word component can be recognized as a short sentence, determining that its preliminary segmentation was unsuccessful, inputting it into the syntactic model again, and segmenting it again to obtain second word components; if it can be recognized neither as a short sentence nor as a phrase, determining that its preliminary segmentation was successful and marking the first word component directly as a second word component; wherein a phrase consists of a plurality of words and has no syntactic structure;
setting a preceding-text width N, where N is a positive integer; acquiring the preceding text of the current sentence according to that width; inputting the preceding text into the semantic analysis model to analyze its meaning and predict the candidate phrase that would follow it; matching the candidate phrase against the second word component; and assigning a meaning to the second word component according to the matching result;
wherein the matching means comparing the words in the candidate phrase with the words in the second word component one by one and counting the identical words, the candidate phrase being considered to match the second word component when the count is greater than a preset threshold;
and recombining the second word components to form a new sentence, and obtaining the meaning of the new sentence.
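The step that turns first word components into second word components, and the retrieval of the preceding text of width N, can be sketched as follows. This is a hedged illustration rather than the method as implemented by the applicant: the recognizers and the re-segmentation routine are passed in as plain callables because the source only says that neural network models perform them, the width N is assumed to be counted in sentences, and the names `to_second_components` and `preceding_text` are hypothetical. The source does not say what happens to a component recognized as a phrase but not as a short sentence; the sketch keeps it unchanged, which is purely an assumption.

```python
from typing import Callable, List, Sequence

Component = List[str]


def to_second_components(
    first_components: Sequence[Component],
    is_short_sentence: Callable[[Component], bool],
    is_phrase: Callable[[Component], bool],
    resegment: Callable[[Component], List[Component]],
) -> List[Component]:
    """Derive second word components from first word components."""
    second: List[Component] = []
    for component in first_components:
        if is_short_sentence(component):
            # Preliminary segmentation unsuccessful: segment again.
            second.extend(resegment(component))
        elif not is_phrase(component):
            # Neither short sentence nor phrase: segmentation successful.
            second.append(component)
        else:
            # Phrase only: not covered by the source; kept as-is (assumption).
            second.append(component)
    return second


def preceding_text(sentences: Sequence[str], current_index: int, n: int) -> List[str]:
    """Return up to N sentences before the current one (N counted in
    sentences, which is an assumption)."""
    if n < 1:
        raise ValueError("N must be a positive integer")
    start = max(0, current_index - n)
    return list(sentences[start:current_index])


if __name__ == "__main__":
    # Toy stand-ins for the semantic analysis and syntactic models.
    looks_like_sentence = lambda c: len(c) >= 3
    looks_like_phrase = lambda c: len(c) == 2
    split_into_words = lambda c: [[w] for w in c]

    first = [["it", "rained", "today"], ["umbrella"]]
    print(to_second_components(first, looks_like_sentence, looks_like_phrase, split_into_words))
    print(preceding_text(["s0", "s1", "s2", "s3"], current_index=3, n=2))
```

Passing the recognizers in as callables keeps the control flow separate from whatever models actually perform the recognition.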
In some preferred embodiments, setting extraction windows of different widths according to word type includes updating the word types and establishing a correspondence between each new word type and an extraction window width.
In some preferred embodiments, the semantic analysis model performs semantic analysis according to sentence grammar requirements.
In some preferred embodiments, the kernels of the semantic analysis model and the syntactic model both use a neural network model.
The present application provides a system for language processing based on word sense distribution hypothesis construction, the system comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method according to any of the embodiments of the first aspect according to instructions in the program code.
The present application provides a computer readable storage medium for storing program code for performing the method of any of the embodiments of the first aspect.
In a specific implementation, the present invention further provides a computer storage medium that may store a program which, when executed, may perform some or all of the steps in the embodiments of the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
Identical or similar parts of the embodiments in this specification may be referred to one another. In particular, the system and storage-medium embodiments are described only briefly because they are substantially similar to the method embodiments; for the relevant points, refer to the description of the method embodiments.
The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention.

Claims (6)

1. A method for language processing based on word sense distribution hypothesis construction, the method comprising:
acquiring a network data stream, extracting sentences from the network data stream, and inputting each sentence into a syntactic model for preliminary segmentation to obtain first word components, wherein the syntactic model sets an extraction window of a different width for each word type, the extraction windows serve as the basis for segmentation, and the words falling within a window's width form a first word component;
inputting the first word components into a semantic analysis model one by one; if a first word component can be recognized as a short sentence, determining that its preliminary segmentation was unsuccessful, inputting it into the syntactic model again, and segmenting it again to obtain second word components; if it can be recognized neither as a short sentence nor as a phrase, determining that its preliminary segmentation was successful and marking the first word component directly as a second word component; wherein a phrase consists of a plurality of words and has no syntactic structure;
setting a preceding-text width N, where N is a positive integer; acquiring the preceding text of the current sentence according to that width; inputting the preceding text into the semantic analysis model to analyze its meaning and predict the candidate phrase that would follow it; matching the candidate phrase against the second word component; and assigning a meaning to the second word component according to the matching result;
wherein the matching means comparing the words in the candidate phrase with the words in the second word component one by one and counting the identical words, the candidate phrase being considered to match the second word component when the count is greater than a preset threshold;
and recombining the second word components to form a new sentence, and obtaining the meaning of the new sentence.
2. The method of claim 1, wherein setting extraction windows of different widths according to word type includes updating the word types and establishing a correspondence between each new word type and an extraction window width.
3. The method according to any one of claims 1-2, wherein the semantic analysis model performs semantic analysis according to sentence grammar requirements.
4. The method according to any one of claims 1-3, wherein the kernels of the semantic analysis model and the syntactic model both use a neural network model.
5. A language processing system constructed based on word sense distribution hypotheses, the system comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform, according to instructions in the program code, the method of any one of claims 1-4.
6. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code for performing the method of any one of claims 1-4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111461699.8A CN114254177A (en) 2021-12-02 2021-12-02 Language processing method and system based on word sense distribution hypothesis construction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111461699.8A CN114254177A (en) 2021-12-02 2021-12-02 Language processing method and system based on word sense distribution hypothesis construction

Publications (1)

Publication Number Publication Date
CN114254177A (en) 2022-03-29

Family

ID=80793860

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
CN202111461699.8A Pending CN114254177A (en) 2021-12-02 2021-12-02 Language processing method and system based on word sense distribution hypothesis construction

Country Status (1)

Country Link
CN (1) CN114254177A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 607a, 6/F, No. 31, Fuchengmenwai street, Xicheng District, Beijing 100037
Applicant after: Beijing Guorui Digital Intelligence Technology Co.,Ltd.
Address before: 607a, 6/F, No. 31, Fuchengmenwai street, Xicheng District, Beijing 100037
Applicant before: Beijing Zhimei Internet Technology Co.,Ltd.