CN114254631A - Natural language analysis method and system based on data stream

Info

Publication number: CN114254631A
Application number: CN202111461882.8A
Authority: CN
Inventors: 苏长君; 曾祥禄
Original assignee: Beijing Zhimei Internet Technology Co ltd
Current assignee: Beijing Zhimei Internet Technology Co ltd
Priority date: 2021-12-02
Filing date: 2021-12-02
Publication date: 2022-03-29

Abstract

The invention provides a data-stream-oriented natural language analysis method and system. The data stream is processed into a form better suited to natural language analysis and arranged as a structured vector sequence in a tree structure. The vector sequence is input into a syntactic model for sentence segmentation to obtain first word components; the first word components are input one by one into a semantic analysis model to obtain second word components; and a new sentence is formed according to a preset mapping between phrase types and weight values, so that the meaning of the new sentence can be identified.

Description

Natural language analysis method and system based on data stream

Technical Field

The present application relates to the field of network multimedia, and in particular, to a natural language analysis method and system based on data stream.

Background

When existing natural language analysis algorithms face massive data streams, they suffer from high energy consumption and slow operation, and therefore need to be improved. How to process the data stream appropriately is a problem that those skilled in the art must consider.

Therefore, there is a need for a natural language analysis method and system designed specifically for data streams.

Disclosure of Invention

The invention aims to provide a data-stream-oriented natural language analysis method and system. The data stream is processed into a form better suited to natural language analysis and arranged as a structured vector sequence in a tree structure. The vector sequence is input into a syntactic model for sentence segmentation to obtain first word components; the first word components are input one by one into a semantic analysis model to obtain second word components; and a new sentence is formed according to a preset mapping between phrase types and weight values, so that the meaning of the new sentence can be identified.
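As a rough illustration of the first two stages (string mapping with vectorization, then arrangement in a tree), the following minimal Python sketch may help. It is not the patented implementation: the record field names (`sentence`, `sentence_id`, `source_id`), the `Node` class, and the code-point encoding are all assumptions made for illustration, and the source does not specify how vectorization is performed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One tree node holding a vectorized string (hypothetical layout)."""
    kind: str                      # "root", "sentence" or "element"
    vector: List[int]
    children: List["Node"] = field(default_factory=list)


def vectorize(text: str) -> List[int]:
    # Placeholder encoding: Unicode code points of the string.
    # The source does not say which vectorization is used.
    return [ord(ch) for ch in text]


def build_tree(records: List[Dict[str, str]]) -> Node:
    """Attach sentence vectors to the root in arrival order; the
    identifiers of each sentence become leaf nodes under it."""
    root = Node("root", [])
    for rec in records:
        sentence = Node("sentence", vectorize(rec["sentence"]))
        for key in ("sentence_id", "source_id"):
            sentence.children.append(Node("element", vectorize(rec[key])))
        root.children.append(sentence)
    return root
```

Calling `build_tree` on a list of records yields a root whose children are sentence nodes in arrival order, each carrying its identifier vectors as leaves.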

In a first aspect, the present application provides a data-stream-oriented natural language analysis method, the method including:

acquiring a network data stream, and extracting from it the carried sentences and additional element information, wherein the additional element information refers to identifiers used for distinguishing different sentences and different sources; mapping the sentences and the additional element information respectively into data with string-type attributes, and vectorizing the data to obtain a first vector sequence;

assigning the first vector sequences to a tree structure in head-to-tail order, wherein the vector sequences corresponding to the additional element information are placed at the leaves of the subtree of the vector sequence corresponding to the sentence from the same source, to obtain a tree-structured second vector sequence;

inputting the second vector sequence into a syntactic model and performing preliminary sentence segmentation to obtain first word components, wherein the syntactic model is provided with extraction windows of different widths for each word type, the extraction windows serve as the basis for segmentation, and the words falling within a window's width form a first word component;

inputting the first word components into a semantic analysis model one by one; if a first word component can be recognized as a short sentence, determining that its preliminary segmentation was unsuccessful, inputting it into the syntactic model again, and re-segmenting it to obtain second word components; if it can be recognized neither as a short sentence nor as a phrase, determining that its preliminary segmentation was successful and directly marking it as a second word component; wherein a phrase consists of a plurality of words and has no syntactic structure;

repeatedly inputting the second word components into the semantic analysis model one by one until the segmentation of every second word component is determined to be successful;

and analyzing all the successfully segmented second word components according to a preset mapping between phrase types and weight values, clustering the second word components whose weight values are greater than a threshold to form a new sentence, and identifying the meaning of the new sentence.
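The segmentation and re-check loop in the steps above can be pictured with the following minimal Python sketch. Everything in it is a stand-in chosen for illustration: the toy lexicon `WORD_TYPE`, the widths in `WINDOW_WIDTH`, the rule that a verb plus another word counts as a "short sentence", and the idea of narrowing the windows on each retry are assumptions, since the source leaves both models unspecified beyond saying they may be neural networks.

```python
from collections import deque
from typing import Deque, List, Tuple

# Toy lexicon and window widths; both are illustrative assumptions.
WORD_TYPE = {"the": "det", "cat": "noun", "dog": "noun", "sees": "verb"}
WINDOW_WIDTH = {"det": 2, "noun": 1, "verb": 3}


def segment(words: List[str], shrink: int = 0) -> List[List[str]]:
    """Syntactic-model stand-in: the type of the word at the start of each
    window selects the window width; `shrink` narrows every window."""
    components, i = [], 0
    while i < len(words):
        word_type = WORD_TYPE.get(words[i], "noun")
        width = max(1, WINDOW_WIDTH.get(word_type, 1) - shrink)
        components.append(words[i:i + width])
        i += width
    return components


def is_short_sentence(component: List[str]) -> bool:
    """Semantic-model stand-in: a verb plus at least one other word."""
    return len(component) > 1 and any(
        WORD_TYPE.get(w) == "verb" for w in component
    )


def analyse(words: List[str]) -> List[List[str]]:
    """Re-segment components until none is recognized as a short sentence."""
    queue: Deque[Tuple[List[str], int]] = deque(
        (c, 0) for c in segment(words)
    )
    second_components: List[List[str]] = []
    while queue:
        component, shrink = queue.popleft()
        if is_short_sentence(component):
            # Segmentation failed: re-segment with narrower windows and
            # queue the pieces, in order, for another semantic check.
            pieces = segment(component, shrink + 1)
            queue.extendleft((p, shrink + 1) for p in reversed(pieces))
        else:
            second_components.append(component)
    return second_components
```

With this toy setup, `analyse(["the", "cat", "sees", "the", "dog"])` returns `[["the", "cat"], ["sees"], ["the"], ["dog"]]`. The loop terminates because every retry narrows the windows until each piece is a single word, which the stand-in semantic check always accepts.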

With reference to the first aspect, in a first possible implementation manner of the first aspect, providing extraction windows of different widths for each word type includes updating the word types and establishing a correspondence between each new word type and an extraction window width.
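One way to picture the updatable correspondence between word types and window widths is a small lookup table, sketched below in Python. The class name `WindowTable`, its methods, and the fallback width of 1 are hypothetical and are not taken from the source.

```python
from typing import Dict


class WindowTable:
    """Hypothetical lookup from word type to extraction-window width."""

    def __init__(self, widths: Dict[str, int], default: int = 1) -> None:
        self._widths = dict(widths)   # copy so the caller's dict is untouched
        self._default = default

    def register(self, word_type: str, width: int) -> None:
        """Add a new word type, or change the width of an existing one."""
        if width < 1:
            raise ValueError("window width must be at least 1")
        self._widths[word_type] = width

    def width_for(self, word_type: str) -> int:
        """Return the width for a word type, or the default if unknown."""
        return self._widths.get(word_type, self._default)
```

A new word type is registered with `table.register("emoji", 2)`, after which `table.width_for("emoji")` returns the new width; unregistered types fall back to the default.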

With reference to the first aspect, in a second possible implementation manner of the first aspect, the semantic analysis model performs semantic analysis according to sentence syntax requirements.

With reference to the first aspect, in a third possible implementation manner of the first aspect, the kernels of the semantic analysis model and the syntactic model both use a neural network model.

In a second aspect, the present application provides a data-stream-oriented natural language analysis system, the system comprising a processor and a memory:

the memory is used for storing program codes and transmitting the program codes to the processor;

the processor is configured to perform the method of any one of the four possibilities of the first aspect according to instructions in the program code.

In a third aspect, the present application provides a computer readable storage medium for storing program code for performing the method of any one of the four possibilities of the first aspect.

The invention provides a data-stream-oriented natural language analysis method and system. The data stream is processed into a form better suited to natural language analysis and arranged as a structured vector sequence in a tree structure. The vector sequence is input into a syntactic model for sentence segmentation to obtain first word components; the first word components are input one by one into a semantic analysis model to obtain second word components; and a new sentence is formed according to a preset mapping between phrase types and weight values, so that the meaning of the new sentence can be identified.
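The final step, forming a new sentence from weighted components, might look like the following Python sketch. The phrase-type names, the weights in `PHRASE_WEIGHT`, and the threshold of 0.5 are invented for illustration, and the "clustering" is reduced here to collecting the surviving components in their original order, which is only one possible reading of the source.

```python
from typing import Dict, List, Tuple

# Hypothetical preset mapping from phrase type to weight value.
PHRASE_WEIGHT: Dict[str, float] = {
    "noun_phrase": 0.9,
    "verb_phrase": 0.8,
    "filler": 0.1,
}


def form_new_sentence(
    components: List[Tuple[str, str]], threshold: float = 0.5
) -> str:
    """Keep the components whose phrase-type weight exceeds the threshold
    and join them, in original order, into a new sentence.

    Each component is a (text, phrase_type) pair; unknown phrase types
    are given weight 0.0 and are therefore dropped.
    """
    kept = [
        text
        for text, phrase_type in components
        if PHRASE_WEIGHT.get(phrase_type, 0.0) > threshold
    ]
    return " ".join(kept)
```

For example, the components `("well", "filler")`, `("the cat", "noun_phrase")`, and `("sees the dog", "verb_phrase")` would yield `"the cat sees the dog"`, with the low-weight filler discarded.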

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. Those skilled in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that those skilled in the art can more easily understand the advantages and features of the present invention and so that its scope is more clearly defined.

FIG. 1 is a flowchart of the data-stream-oriented natural language analysis method, which includes:

In some preferred embodiments, providing extraction windows of different widths for each word type includes updating the word types and establishing a correspondence between each new word type and an extraction window width.

In some preferred embodiments, the semantic analysis model performs semantic analysis according to sentence grammar requirements.

In some preferred embodiments, the kernels of the semantic analysis model and the syntactic model both use a neural network model.

The application provides a data-stream-oriented natural language analysis system, the system including a processor and a memory:

the processor is configured to perform the method according to any of the embodiments of the first aspect according to instructions in the program code.

The present application provides a computer readable storage medium for storing program code for performing the method of any of the embodiments of the first aspect.

In a specific implementation, the present invention further provides a computer storage medium, which may store a program that, when executed, performs some or all of the steps in the embodiments of the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented by software together with the necessary general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method described in the embodiments or in some parts of the embodiments.

For the same or similar parts, the embodiments in this specification may be referred to one another. In particular, the embodiments other than the method embodiments are described only briefly because they are substantially similar to the method embodiments; for the relevant points, refer to the description of the method embodiments.

The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention.

Claims

1. A data-stream-oriented natural language analysis method, the method comprising:

2. The method of claim 1, wherein providing extraction windows of different widths for each word type includes updating the word types and establishing a correspondence between each new word type and an extraction window width.

3. The method according to any one of claims 1-2, wherein the semantic analysis model performs semantic analysis according to sentence grammar requirements.

4. The method according to any one of claims 1-3, wherein the kernels of the semantic analysis model and the syntactic model both use a neural network model.

5. A data-stream-oriented natural language analysis system, the system comprising a processor and a memory:

the processor is configured to perform, according to instructions in the program code, the method of any one of claims 1-4.

6. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code for performing the method of any one of claims 1-4.