CN117251551B - Natural language processing system and method based on large language model - Google Patents


Info

Publication number
CN117251551B
CN117251551B (application CN202311467688.XA)
Authority
CN
China
Prior art keywords
data
module
natural language
processing
picture
Prior art date
Legal status
Active
Application number
CN202311467688.XA
Other languages
Chinese (zh)
Other versions
CN117251551A (en)
Inventor
邹一荣
黄自才
Current Assignee
China Unicom Guangdong Industrial Internet Co Ltd
Original Assignee
China Unicom Guangdong Industrial Internet Co Ltd
Priority date
Filing date
Publication date
Application filed by China Unicom Guangdong Industrial Internet Co Ltd
Priority to CN202311467688.XA
Publication of CN117251551A
Application granted
Publication of CN117251551B
Status: Active
Anticipated expiration

Classifications

    • G06F16/00 Information retrieval; database structures therefor; file system structures therefor
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/3346 Query execution using probabilistic model
    • G06F16/35 Clustering; classification (of unstructured textual data)
    • G06F16/55 Clustering; classification (of still image data)
    • G06F16/583 Retrieval using metadata automatically derived from the content
    • G06F40/216 Parsing using statistical methods
    • G06F40/295 Named entity recognition
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Library & Information Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application discloses a natural language processing system and method based on a large language model. The system comprises a data type identification module, a data processing module and a data output module. The data type identification module receives search information input by a user and identifies its data type; the data processing module matches the type-identified search information based on a preset deep learning model; and the data output module performs privacy processing on the data output by the data processing module and outputs the target retrieval data. Because the data processing module matches the type-identified search information against a preset deep learning model and the data output module further controls the data output by the data processing module, the system improves both the accuracy and the efficiency of information retrieval.

Description

Natural language processing system and method based on large language model
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a natural language processing system and method based on a large language model.
Background
Natural language processing is a branch of artificial intelligence and linguistics that studies theories and methods for efficient natural-language communication between humans and computers. As natural language processing has developed, it has increasingly been used as a basis for information retrieval. For example, one related-art natural language processing system improves its understanding of natural language information by linking context and expanding its parameter dictionary and rule set, giving the system good domain extensibility. Language information that is recognized poorly can be handled simply by abstracting new rules from the grammatical structure and adding them to the system, which improves sentence recognition and, in turn, retrieval efficiency.
However, although this solution improves retrieval and recognition efficiency, its processing of natural language data and search information is incomplete, resulting in low accuracy and low efficiency of data processing.
Therefore, how to improve the accuracy and efficiency of information retrieval is a problem to be solved.
Disclosure of Invention
In view of this, the embodiment of the application provides a natural language processing system and method based on a large language model, which can improve the accuracy and efficiency of information retrieval. The system and method provided by the embodiment of the application are realized as follows.
The embodiment of the application provides a natural language processing system based on a large language model, comprising a data type identification module, a data processing module and a data output module. The data type identification module receives search information input by a user and identifies its data type, where the data type includes at least one of text, picture, audio and video. The data processing module matches the type-identified search information based on a preset deep learning model, where the preset deep learning model is obtained by training an initial neural network model on at least one item of natural language data. The data output module performs privacy processing on the data output by the data processing module and outputs target retrieval data, where privacy processing indicates the handling of sensitive data. The data processing module comprises a data type classification module, a data preprocessing module, a deep learning module and an information matching module. The data type classification module classifies the data type of the input natural language data; the data preprocessing module preprocesses the type-classified natural language data, where preprocessing includes blurring processing and integrity processing; the deep learning module trains the initial neural network model on the type-classified and preprocessed natural language data to obtain the preset deep learning model; and the information matching module matches the type-identified search information based on the preset deep learning model.
In some embodiments, when the at least one item of natural language data includes text data, the data type classification module performs word segmentation, part-of-speech tagging, named entity recognition and semantic analysis on the text data using natural language processing techniques. When it includes picture data, the module performs classification, target detection and object recognition on the picture data using a convolutional neural network. When it includes audio data, the module extracts spectral features of the audio data using a fast Fourier transform. When it includes video data, the module performs motion detection, face recognition and behavior analysis on the video data using motion analysis and object tracking techniques.
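The FFT-based spectral feature extraction for audio can be sketched as follows. This is a minimal illustrative sketch only; the function name, the Hamming-window choice and the test tone are assumptions, not details from the patent.

```python
import numpy as np

def spectral_features(samples, sample_rate):
    """Magnitude spectrum of one audio frame via the FFT (illustrative)."""
    window = np.hamming(len(samples))            # taper to reduce spectral leakage
    spectrum = np.fft.rfft(samples * window)     # real-input fast Fourier transform
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return freqs, np.abs(spectrum)

# Sanity check: a pure 440 Hz tone should peak near 440 Hz.
sr = 8000
t = np.arange(sr) / sr
freqs, mags = spectral_features(np.sin(2 * np.pi * 440 * t), sr)
peak = freqs[np.argmax(mags)]
```

A pure tone producing a single dominant peak at its own frequency is a quick sanity check for this kind of feature extractor.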
In some embodiments, when the at least one item of natural language data includes text data, the blurring process replaces part of the personal identification information in the text data with a preset identifier, where the personal identification information includes at least one of a name, an identification card number and a phone number. When it includes picture data, the blurring process blurs sensitive personal information in the pictures, deletes the geographical location information the pictures contain, identifies copyrighted pictures and adds a text watermark to them to mark each picture's source and copyright information, where the sensitive personal information includes at least one of a face, a name and an identification card number. When it includes audio data, the blurring process deletes information related to personal identity using an audio editing tool, alters the vocal characteristics of the audio, and replaces sensitive vocabulary or personal information with generic placeholders or randomly generated replacement words. When it includes video data, the blurring process blurs faces, license plate numbers and identification card number regions in the video, deletes information related to personal identity, and processes the sound in the video.
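The text-side step of replacing personal identification information with preset identifiers can be sketched as below. The regular expressions are illustrative assumptions (mainland-China style 18-digit ID numbers and 11-digit mobile numbers) and do not come from the patent text.

```python
import re

# Illustrative patterns: each maps a PII regex to a preset identifier.
PATTERNS = {
    r"\b\d{17}[\dXx]\b": "[ID_NUMBER]",   # 18-character identification card number
    r"\b1\d{10}\b": "[PHONE]",            # 11-digit mobile phone number
}

def redact(text):
    """Replace matched personal identifiers with placeholder tokens."""
    for pattern, placeholder in PATTERNS.items():
        text = re.sub(pattern, placeholder, text)
    return text

sample = "Call 13800138000 about ID 11010519491231002X."
print(redact(sample))  # Call [PHONE] about ID [ID_NUMBER].
```

A production system would also need name recognition (e.g. via NER, as in the classification module above) since names cannot be matched by simple patterns.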
In some embodiments, when the at least one item of natural language data includes text data, the integrity process converts the text into a digital representation using a preset algorithm, which employs at least one of a bag-of-words model, term frequency-inverse document frequency (TF-IDF) vectorization and word embedding. When it includes picture data, the integrity process resizes the pictures to fit the model's input requirements and extracts picture features using a pre-trained convolutional neural network. When it includes audio data, the integrity process converts the audio into a time-frequency representation using a short-time Fourier transform. When it includes video data, the integrity process extracts a sequence of frames from the video as input, or obtains timing information by an optical flow method.
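The bag-of-words conversion of text to a digital representation can be sketched as follows; the vocabulary-building scheme (lowercased whitespace tokens, sorted vocabulary) is an assumption for illustration, not the patent's specification.

```python
from collections import Counter

def bag_of_words(documents):
    """Map each document to a count vector over a shared vocabulary."""
    vocab = sorted({w for doc in documents for w in doc.lower().split()})
    vectors = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        vectors.append([counts.get(w, 0) for w in vocab])  # one count per vocab word
    return vocab, vectors

docs = ["the cat sat", "the cat ate the fish"]
vocab, vecs = bag_of_words(docs)
```

Each document becomes a fixed-length vector, which is the "digital representation" a downstream model can consume.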
In some embodiments, the preset deep learning model includes a convolutional neural network model, a residual network model, a recurrent neural network model and a three-dimensional convolutional neural network model. The information matching module performs text classification and named entity recognition on text data based on the convolutional neural network model to match the type-identified search information; it processes picture data with residual blocks based on the residual network model, alleviating the vanishing-gradient problem of deep network training, to match the type-identified search information; when the at least one item of natural language data includes audio data, it performs audio sequence processing, speech recognition and speech generation on the audio data based on the recurrent neural network model to match the type-identified search information; and it performs video classification and action recognition on video data based on the three-dimensional convolutional neural network model to match the type-identified search information.
In some embodiments, the motion analysis and object tracking techniques include at least one of the Lucas-Kanade optical flow algorithm, a Gaussian mixture model algorithm, a single-target tracking algorithm and a multi-target tracking algorithm.
In some embodiments, the TF-IDF vectorization process calculates, for each document in the text data, the number of occurrences of each word in that document and the word's inverse document frequency over the whole corpus, and combines them into the word's TF-IDF weight. The inverse document frequency of a word over the whole corpus is the logarithm of the ratio of the total number of documents in the corpus to the number of documents containing the word:
IDF=log(N/DF)
where IDF is the word's inverse document frequency over the whole corpus, N is the total number of documents in the corpus, and DF is the number of documents containing the word.
The TF-IDF weight of each word is obtained by multiplying its term frequency (the number of occurrences of the word in the document) by its inverse document frequency.
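The TF-IDF weighting described above can be sketched directly from the formula IDF = log(N/DF); function and variable names here are illustrative.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Per-document TF-IDF weights: weight(w) = count(w in doc) * log(N / DF(w))."""
    n_docs = len(documents)
    tokenized = [doc.lower().split() for doc in documents]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))      # DF counts documents, not total occurrences
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)              # raw term frequency within this document
        weights.append({
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in tf.items()
        })
    return weights

docs = ["large language model", "language model retrieval", "image retrieval"]
w = tf_idf(docs)
```

Words that appear in every document get IDF = log(1) = 0, so common words are down-weighted while distinctive ones dominate, which is the point of the scheme.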
In some embodiments, the short-time Fourier transform (STFT) process uses a Hamming window or a rectangular window as the window function, segments the original signal into fixed-length windows by overlapping sliding, multiplies the signal in each window by the window function, applies a fast Fourier transform to each windowed segment to convert the time-domain signal into a frequency-domain signal, and obtains the spectrum of each window from the Fourier transform result.
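The windowed, overlapping STFT just described can be sketched as below. The window length and hop size are illustrative choices, not values from the patent.

```python
import numpy as np

def stft(signal, win_len=256, hop=128):
    """STFT sketch: Hamming-windowed overlapping frames, one FFT per frame."""
    window = np.hamming(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = signal[start:start + win_len] * window   # window each segment
        frames.append(np.fft.rfft(frame))                # time domain -> frequency domain
    return np.array(frames)   # shape: (n_frames, win_len // 2 + 1)

sr = 8000
t = np.arange(sr) / sr
spec = stft(np.sin(2 * np.pi * 1000 * t))
```

The resulting matrix is the time-frequency representation the integrity process feeds to the model: rows index time windows, columns index frequency bins (here 1000 Hz falls exactly in bin 1000 / (8000/256) = 32).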
In some embodiments, the processing of sensitive data indicates blurring the sensitive data or deleting it.
In some embodiments, the natural language processing system further comprises a storage module for storing process data of the data type identification module, the data processing module, and the data output module.
The embodiment of the application provides a natural language processing method based on a large language model, comprising: receiving, by a data type identification module, search information input by a user and identifying its data type, where the data type includes at least one of text, picture, audio and video; matching, by a data processing module, the type-identified search information based on a preset deep learning model, where the preset deep learning model is obtained by training an initial neural network model on at least one item of natural language data; and performing, by a data output module, privacy processing on the data output by the data processing module and outputting target retrieval data, where privacy processing indicates the handling of sensitive data. The data processing module comprises a data type classification module, a data preprocessing module, a deep learning module and an information matching module, and the method further comprises: classifying, by the data type classification module, the data type of the input natural language data; preprocessing, by the data preprocessing module, the type-classified natural language data, where the preprocessing includes blurring processing and integrity processing; training, by the deep learning module, the initial neural network model on the type-classified and preprocessed natural language data to obtain the preset deep learning model; and matching, by the information matching module, the type-identified search information based on the preset deep learning model.
The computer device provided by the embodiment of the application comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor realizes the method of the embodiment of the application when executing the program.
The computer readable storage medium provided by the embodiment of the present application stores a computer program thereon, which when executed by a processor implements the method provided by the embodiment of the present application.
In the natural language processing system and method based on a large language model provided by the embodiment of the application, the data processing module matches the type-identified search information based on the preset deep learning model, and the data output module further controls the data output by the data processing module. This improves the accuracy and efficiency of information retrieval and solves the technical problems described in the background.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of a natural language processing system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a data processing module according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a natural language processing system according to another embodiment of the present application;
FIG. 4 is a schematic diagram of an exemplary architecture of a natural language processing system according to another embodiment of the present application;
Fig. 5 is a flowchart of a natural language processing method according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more apparent, the specific technical solutions of the present application will be described in further detail below with reference to the accompanying drawings in the embodiments of the present application. The following examples are illustrative of the application and are not intended to limit the scope of the application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
It should be noted that the terms "first/second/third" in the embodiments of the present application are used to distinguish similar or different objects and do not denote a particular ordering; where permitted, "first/second/third" may be interchanged so that the embodiments described herein can be implemented in an order other than that illustrated or described.
A large language model is a deep learning model trained on large amounts of text data that can generate natural language text or understand the meaning of language text. Natural language processing (NLP) is a branch of artificial intelligence and linguistics that studies theories and methods for efficient natural-language communication between humans and computers. Research in this field therefore involves natural language, the language people use daily, and is closely related to, though importantly different from, linguistics. Natural language processing is not the general study of natural language; rather, it develops computer systems, in particular software systems, that can communicate effectively in natural language, and it is therefore part of computer science.
As natural language processing has developed, it has increasingly been used as a basis for information retrieval. For example, one current related-art natural language processing system, comprising an input module, a context processing module, a preprocessing module, a word segmentation module, a post-processing module, a parameter labeling module, a grammar database, a grammar rule matching module, a knowledge database, a logic reasoning model library and an output module, improves its understanding of natural language information by linking context and expanding its parameter dictionary and rule set, giving the system good domain extensibility. Language information that is recognized poorly can be handled simply by abstracting new rules from the grammatical structure and adding them to the system, which improves sentence recognition and, in turn, retrieval efficiency.
However, although this solution improves retrieval and recognition efficiency, its processing of natural language data and search information is incomplete, resulting in low accuracy and low efficiency of data processing.
Therefore, how to improve the accuracy and efficiency of information retrieval is a problem to be solved.
In view of the above, the embodiment of the application provides a natural language processing system and method based on a large language model. The system comprises a data type identification module, a data processing module and a data output module. The data type identification module receives search information input by a user and identifies its data type, where the data type includes at least one of text, picture, audio and video. The data processing module matches the type-identified search information based on a preset deep learning model, which is obtained by training an initial neural network model on at least one item of natural language data. The data output module performs privacy processing on the data output by the data processing module and outputs target retrieval data, where privacy processing indicates the handling of sensitive data. Because the data processing module matches the type-identified search information against a preset deep learning model and the data output module further controls the output data, the system improves both the accuracy and the efficiency of information retrieval.
In order to make the purpose and technical scheme of the application clearer, the natural language processing system and method based on a large language model provided by the embodiment of the application are described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for illustration only and are not intended to limit the scope of the application.
Referring to FIG. 1, a schematic structural diagram of a natural language processing system according to an embodiment of the application is shown. As shown in FIG. 1, the natural language processing system 100 includes a data type recognition module 110, a data processing module 120, and a data output module 130.
The data type identifying module 110 is configured to receive search information input by a user, and identify a data type of the search information, where the data type includes at least one of text, picture, audio and video.
It should be understood that the search information input by the user may be text data of plain text, or may be pictures, or may be audio or video, or may be a combination of text, pictures, audio and video, etc., which is not limited in the present application.
The data processing module 120 is configured to match the search information after the data type identification based on a preset deep learning model, where the preset deep learning model is obtained by training an initial neural network model based on at least one natural language data.
It should be understood that the preset deep learning model includes a convolutional neural network model, a residual network model, a recurrent neural network model and a three-dimensional convolutional neural network model. A convolutional neural network is a neural network in which at least one layer uses the mathematical operation of convolution in place of the matrix multiplication of a traditional artificial neural network; this structure lets the network exploit the two-dimensional structure of its input data. A residual network is also a convolutional neural network; it is easy to optimize and can improve accuracy by adding considerable depth, and its residual blocks use skip connections, which alleviate the vanishing-gradient problem caused by increasing depth in deep neural networks. A recurrent neural network takes sequence data as input, recurses in the evolution direction of the sequence, and connects all nodes in a chain; its memory, parameter sharing and Turing completeness give it advantages in learning the nonlinear characteristics of sequences. A three-dimensional convolutional neural network model can capture the spatio-temporal characteristics of electroencephalogram (EEG) features during motion and retain the vital temporal components of brain-induced activity.
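The skip connection that makes residual blocks easy to optimize can be shown in a few lines of plain numpy: the block computes f(x) and outputs f(x) + x, so even when f contributes nothing, the identity path carries the signal (and, during training, the gradient) through. This is a minimal sketch with placeholder weights, not the patent's architecture.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Residual block sketch: ReLU(f(x) + x), where f is two linear layers."""
    h = np.maximum(0, x @ w1)          # first transform + ReLU
    return np.maximum(0, h @ w2 + x)   # add the skip connection, then ReLU

rng = np.random.default_rng(0)
dim = 8
x = rng.normal(size=(1, dim))
# With zero weights, f(x) = 0 and the block reduces to ReLU(x):
# the identity path alone carries the input through.
out = residual_block(x, np.zeros((dim, dim)), np.zeros((dim, dim)))
```

The zero-weight case is exactly why residual networks tolerate great depth: an untrained or unhelpful block degrades gracefully to (approximately) the identity instead of destroying the signal.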
In some embodiments, the data processing module 120 is configured to match the search information after the data type identification, for example, identifying that the search information is text, picture, audio or video, based on the preset deep learning model, and output the matched related search data.
The data output module 130 is configured to perform privacy processing on the data output by the data processing module, and output target retrieval data, where the privacy processing is used to instruct processing on sensitive data.
In some embodiments, the data output module may process the sensitive data in a manner that obscures the sensitive data or deletes the sensitive data.
The sensitive data generally refers to information such as the user's face, name, identification card number, telephone number, geographic location, speaking voice, and license plate number appearing in the text, pictures, audio, or video.
The blurring process may replace information such as the user's face, name, identification card number, and telephone number in text data, for example with a corresponding identifier; it may also blur faces, names, and identification card numbers in pictures, delete geographic locations contained in pictures, modify the user's speaking voice in audio, blur the user's face, license plate number, and identification card number area in video, delete information related to personal identity, and the like.
According to the natural language processing system, the data processing module matches the search information after data type identification based on the preset deep learning model, and the data output module further controls the data output by the data processing module through privacy processing, so that the output target search data is more accurate and well protected in terms of privacy.
In a possible implementation manner, please refer to fig. 2, which is a schematic diagram of a data processing module according to an embodiment of the present application. As shown in fig. 2, the data processing module 120 includes a data type classification module 121, a data preprocessing module 122, a deep learning module 123, and an information matching module 124.
The data type classification module 121 is configured to perform data type classification on at least one input natural language data.
It should be understood that natural language data generally refers to language data that naturally evolves with culture. The data type includes at least one of text, picture, audio, and video.
In some embodiments, where the at least one natural language data includes text data, the data type classification module 121 is configured to perform word segmentation, part-of-speech tagging, named entity recognition, and semantic analysis on the text data based on natural language processing techniques.
The natural language processing technology is a technology that uses the electronic computer as a tool to process the various kinds of textual information peculiar to humans. It is an important branch in the field of artificial intelligence and aims to enable the computer to understand, process, and generate natural language, providing a more intelligent language interaction experience for humans.
In some embodiments, word segmentation of text data is the process of segmenting the text data into individual words or phrases with semantic units according to a certain rule algorithm. Because natural language takes many forms of expression, including Chinese, English, numerals, symbols, and the like, word segmentation needs to consider the characteristics of different languages and contexts and the relations among words in order to obtain a reasonable and accurate segmentation result. Word segmentation methods include rule-based, statistics-based, and deep-learning-based methods, among others: the rule-based method segments text data with manually defined rules or templates; the statistics-based method learns a probability model of the text by statistically analyzing a large text corpus, so as to infer the most probable segmentation; and the deep-learning-based method performs end-to-end learning and prediction on the text with a neural network model, so as to obtain an optimized segmentation result.
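By way of a non-limiting illustration, the rule-based word segmentation described above can be sketched as a forward maximum-matching procedure (the dictionary and all names here are hypothetical):

```python
def max_match_segment(text, dictionary, max_len=4):
    """Greedy forward maximum-matching word segmentation.

    Scans left to right; at each position it takes the longest
    dictionary word that matches, and unmatched characters fall
    back to single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Hypothetical mini-dictionary for illustration.
vocab = {"naturallanguage", "processing"}
print(max_match_segment("naturallanguageprocessing", vocab, max_len=15))
# → ['naturallanguage', 'processing']
```

Statistics-based and deep-learning-based segmenters replace the fixed dictionary lookup with a learned scoring model, but the sliding-scan structure is similar.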
In some embodiments, part-of-speech tagging of text data is the process of determining the grammatical category of each word in given text data, determining its part of speech, and tagging it. Colloquially, after the text data is segmented, each segmented word is marked with its part of speech, such as noun (n) or verb (v). Part-of-speech tagging methods include a method based on a statistical model and a method combining statistics and rules. The core idea of the statistical-model-based method is to treat part-of-speech tagging as a sequence labeling problem: given a sequence of words with their respective tags, the most probable part of speech of the next word can be estimated. Typical statistical models are hidden Markov models, conditional random fields, and the like, which can be trained on a large labeled corpus. The method combining statistics and rules computes the probability that a word is tagged with each part of speech so as to assign a confidence to each statistical tagging result; all corpora are statistically tagged and the results screened, and only the results whose confidence falls below a threshold are manually checked and disambiguated with rules, rather than applying both the statistical and rule methods in every situation.
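By way of a non-limiting illustration, the statistical idea above can be sketched with a minimal unigram tagger that assigns each word its most frequent tag from a labeled corpus (a simple baseline, not a hidden Markov model; the corpus and names are hypothetical):

```python
from collections import Counter, defaultdict

def train_unigram_tagger(tagged_sentences):
    """Count (word, tag) frequencies and keep each word's most common tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag_label in sentence:
            counts[word][tag_label] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(words, model, default="n"):
    """Tag each word with its most frequent tag; unknown words default to noun."""
    return [(w, model.get(w, default)) for w in words]

corpus = [[("the", "det"), ("cat", "n"), ("runs", "v")],
          [("the", "det"), ("dog", "n"), ("runs", "v")]]
model = train_unigram_tagger(corpus)
print(tag(["the", "cat", "jumps"], model))
# → [('the', 'det'), ('cat', 'n'), ('jumps', 'n')]
```

A hidden Markov model or conditional random field improves on this by also scoring tag-to-tag transitions across the sequence.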
In some embodiments, named entities generally refer to entities in text that have a particular meaning or strong referential force, typically including person names, place names, facility names, dates and times, proper nouns, etc., and generally comprise two parts: the boundary of the entity and its type, such as person name, place name, organization name, or other. Named entity recognition of text data identifies the named entities that may exist in a piece of text data. Named entity recognition methods include rule-and-dictionary-based methods, statistical-learning-based methods, and the like. The rule-based methods mostly rely on linguistic experts manually constructing rule templates, with features including statistical information, punctuation marks, keyword indicator words, direction words, position words, center words, and the like, using pattern and string matching as the main means; most such systems depend on the establishment of a knowledge base and a dictionary. The rule-and-dictionary-based method is the earliest used in named entity recognition; it relies on a manual rule system with a named entity library, assigns a weight to each rule, and, when rules conflict, selects the rule with the highest weight to decide the type of the named entity. Statistical-learning-based methods mainly include hidden Markov models, maximum entropy models, support vector machines, conditional random fields, and the like.
In some embodiments, semantic analysis of text data refers to selecting a representation of the text and its feature items; text representation is a fundamental problem of text mining and information retrieval. It quantifies the feature words extracted from the text to represent the textual information. Text, whose meaning is the same as that of a message, refers to an information structure composed of certain symbols or codes and can take different forms of expression such as language, characters, and images.
In another possible implementation manner, in a case where the at least one natural language data includes picture data, the data type classification module 121 is configured to perform image classification, target detection, and object recognition on the picture data based on a convolutional neural network.
It will be appreciated that convolutional neural networks may classify images, such as classifying images into different object categories, and that features in images may be learned by training the convolutional neural network to classify images.
In some embodiments, the convolutional neural network may also detect objects in the image, such as faces, vehicles, traffic signs, etc. in the image. Wherein the target detection generally comprises two steps: the image is firstly subjected to region extraction, and then the extracted regions are classified and positioned, that is, the convolutional neural network can detect the target in the image by extracting the characteristics of the image.
It should be noted that, the picture in the embodiment of the present application may also be referred to as an image, which is not limited in this aspect of the present application.
In a possible implementation, in case that the at least one natural language data includes audio data, the data type classification module 121 is configured to extract spectral features of the audio data based on a fast Fourier transform.
It should be appreciated that a fast Fourier transform is a method of converting a time-domain signal into a frequency-domain signal and may be used to extract the spectral information of the signal. Spectrum extraction refers to the process of separating individual frequency components from a signal segment and can be used in a variety of applications such as signal analysis, filtering, modulation, and demodulation. The fast Fourier transform decomposes the time-domain signal into a sum of sine and cosine functions, which are then expressed as amplitude and phase with respect to frequency; this representation is called the frequency-domain representation and can graphically show the intensity and phase of the signal at different frequencies.
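By way of a non-limiting illustration, the decomposition described above can be sketched with the classic recursive radix-2 Cooley-Tukey algorithm (a minimal version for power-of-two lengths; production systems use optimized libraries):

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # transform of even-indexed samples
    odd = fft(x[1::2])    # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-size transforms.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

# A pure tone completing one cycle every 8 samples peaks at frequency bin 1.
signal = [math.sin(2 * math.pi * i / 8) for i in range(8)]
magnitudes = [abs(v) for v in fft(signal)]
```

The magnitude of each output bin gives exactly the per-frequency intensity that the paragraph above describes; the phase is `cmath.phase` of the same complex value.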
In another possible implementation, in case that the at least one natural language data includes video data, the data type classification module 121 is configured to perform motion detection, face recognition, and behavior analysis on the video data based on motion analysis and object tracking technology.
The motion analysis and object tracking technology comprises at least one of a Lucas-Kanade optical flow algorithm, a Gaussian mixture model algorithm, a single-target tracking algorithm and a multi-target tracking algorithm.
The Lucas-Kanade optical flow algorithm is a two-frame differential optical flow estimation algorithm. Optical flow is a motion pattern: the apparent movement of an object, surface, or edge formed between an observer and the background under a given view angle. Optical flow techniques, such as motion detection and image segmentation, time-to-collision estimation, motion-compensated coding, and three-dimensional stereo parallax, all make use of this motion of edges or surfaces. The motion of a two-dimensional image is the projection of three-dimensional object motion onto the image plane relative to the observer, and from ordered images the instantaneous image velocity or discrete image displacement can be estimated. The Lucas-Kanade algorithm is the most common and most popular; it calculates the displacement of each pixel position between two frames over a time interval, and since it is based on a Taylor series expansion of the image signal, the method is called differential, using partial derivatives with respect to the spatial and temporal coordinates.
In the Gaussian mixture model algorithm, the characteristics of each pixel point in an image are represented by K (typically 3 to 5) Gaussian models. The Gaussian mixture model is updated after a new frame of image is obtained, and each pixel point in the current image is matched against the Gaussian mixture model; if the match succeeds, the point is judged to be a background point, otherwise a foreground point. The Gaussian model mainly involves two parameters, the mean and the variance, and the choice of learning mechanism for them directly affects the stability, accuracy, and convergence of the model. Since the model is built for background extraction of moving objects, the variance and mean in the Gaussian model need to be updated in real time. To improve the learning capacity of the model, an improved method adopts different learning rates for updating the mean and the variance; to improve the detection of large, slow-moving objects in a busy scene, the concept of a weighted mean is introduced, a background image is established and updated in real time, and the foreground and background are then classified by combining the weights, the weighted mean, and the background image.
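By way of a non-limiting illustration, the per-pixel matching and online parameter update described above can be sketched for a single Gaussian (the full algorithm maintains K such models per pixel; the parameter values here are hypothetical):

```python
import math

def gaussian_match(mean, var, pixel, lr=0.05, k=2.5):
    """Single-Gaussian sketch of the background match.

    A pixel within k standard deviations of the mean is labelled
    background, and the mean and variance are then updated online
    with learning rate lr; otherwise the pixel is foreground.
    """
    is_background = abs(pixel - mean) <= k * math.sqrt(var)
    if is_background:
        mean = (1 - lr) * mean + lr * pixel
        var = (1 - lr) * var + lr * (pixel - mean) ** 2
    return is_background, mean, var

# A pixel close to the model mean is background; a distant one is foreground.
print(gaussian_match(100.0, 4.0, 101.0)[0])  # → True
print(gaussian_match(100.0, 4.0, 200.0)[0])  # → False
```

Using separate learning rates for the mean and the variance, as the paragraph notes, amounts to passing different `lr` values to the two update lines.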
In the single-target tracking algorithm, single-target tracking is one of the widely applicable tasks in the field of computer vision. The tracking method acquires the feature information of the target area in the first frame of a video, estimates the state of the target in subsequent frames from that feature information, and locates the target accurately. The tracking framework of single-target tracking can be divided into 5 parts: the motion model, feature extraction, the observation model, model updating, and integrated processing. The observation model is generally the key to whether target tracking succeeds and can be divided into generative models and discriminative models.
The multi-target tracking algorithm may be classified into offline and online multi-target tracking algorithms according to the order in which tracks are generated. The offline multi-target tracking algorithm is generally constructed as a graph model of target detection relations, in which designing and calculating the similarity or distance measure between detections is the key to building the graph model correctly. The online multi-target tracking algorithm calculates, from the current detection observation, the matching relation with the existing tracks, and computing a suitable matching measure determines the matching accuracy. Therefore, whether in offline or online mode, learning the features of the detection results and calculating the matching similarity or distance metric are the key steps of a multi-target tracking algorithm.
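By way of a non-limiting illustration, one common similarity measure for matching detections to existing tracks is intersection-over-union (IoU), sketched here for axis-aligned boxes (choosing IoU is an assumption; appearance features or learned metrics are equally possible):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes.

    Returns a value in [0, 1]; higher means the detection and the
    track's predicted box overlap more and are more likely a match.
    """
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7 → 0.142857…
```

An online tracker would build a matrix of such scores between current detections and existing tracks and solve the resulting assignment problem each frame.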
In one possible implementation, the data preprocessing module 122 is configured to perform preprocessing on at least one natural language data after the type classification, where the preprocessing includes blurring processing and integrity processing.
In some embodiments, where the at least one natural language data includes text data, the blurring process is used to indicate that part of the personal identity information in the text data, including at least one of a name, an identification card number, and a telephone number, is replaced with a preset identifier. For example, the name in the text data is replaced with "XX".
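By way of a non-limiting illustration, the replacement with preset identifiers can be sketched with regular expressions (the patterns for 18-digit identification card numbers and 11-digit telephone numbers are hypothetical simplifications; real systems need locale-specific rules):

```python
import re

# Hypothetical patterns for illustration only.
ID_PATTERN = re.compile(r"\b\d{17}[\dXx]\b")   # 17 digits + digit/X checksum
PHONE_PATTERN = re.compile(r"\b1\d{10}\b")     # 11 digits starting with 1

def redact(text, names=()):
    """Replace names, ID numbers, and phone numbers with preset identifiers."""
    for name in names:
        text = text.replace(name, "XX")
    text = ID_PATTERN.sub("[ID]", text)
    text = PHONE_PATTERN.sub("[PHONE]", text)
    return text

print(redact("Zhang San, ID 110101199003074518, tel 13800138000",
             names=["Zhang San"]))
# → "XX, ID [ID], tel [PHONE]"
```

Named entity recognition (described earlier) would normally supply the `names` list automatically rather than requiring it as input.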
In some embodiments, in the case where the at least one natural language data includes picture data, the blurring process is used to instruct blurring of sensitive personal information in the picture data, including at least one of a face, a name, and an identification card number, delete geographic location information contained in a picture in the picture data, and identify a copyrighted picture, and add a text watermark on the copyrighted picture to identify a source of the picture and the copyrighted information. For example, the name and the identification card number in the picture data are subjected to mosaic processing.
In some embodiments, where the at least one natural language data includes audio data, the blurring process is used to instruct deletion of information related to the identity of the individual in the audio data based on an audio editing tool, and to modify the speaker's voice by changing the voice characteristics of the audio in the audio data, to replace sensitive vocabulary or personal information with generic placeholder words or randomly generated replacement words, or the like. Illustratively, the original sound in the audio data is subjected to a sound-changing process, such as changing male sounds to female sounds.
In some embodiments, where the at least one natural language data includes video data, blurring is used to indicate blurring of faces, license numbers, identification number areas in the video data, deleting information related to the identity of the person, and processing sounds in the video in the natural language data. Illustratively, the license plate number and the identification card number area in the video are subjected to mosaic processing.
In some embodiments, where the at least one natural language data includes text data, the integrity process is used to instruct converting text in the text data into a digital representation based on a preset algorithm, the preset algorithm employing at least one of a bag-of-words model, term frequency-inverse document frequency (TF-IDF) vectorization, and word embedding.
Wherein the bag-of-words model is a simplified representation model used in natural language processing and information retrieval. Under the bag-of-words model, a piece of text, such as a sentence or a document, can be represented as a bag holding its words; this representation disregards grammar and word order.
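By way of a non-limiting illustration, the bag-of-words representation can be sketched as follows (whitespace tokenization is an assumed simplification; real text would first pass through the word segmentation described earlier):

```python
from collections import Counter

def bag_of_words(docs):
    """Represent each document as word counts over a shared, sorted
    vocabulary, ignoring grammar and word order."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        vec = [0] * len(vocab)
        for w, c in Counter(d.split()).items():
            vec[index[w]] = c
        vectors.append(vec)
    return vocab, vectors

vocab, vecs = bag_of_words(["the cat sat", "the cat ate the fish"])
print(vocab)  # → ['ate', 'cat', 'fish', 'sat', 'the']
print(vecs)   # → [[0, 1, 0, 1, 1], [1, 1, 1, 0, 2]]
```

Each document becomes a fixed-length count vector, which is exactly the digital representation that the integrity process requires.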
And the TF-IDF vectorization processing is used for indicating that for each document in the text data, the occurrence frequency of each word in each document and the inverse document frequency of each word in the whole corpus are calculated to obtain the TF-IDF weight of each word.
Wherein the inverse document frequency of each word in the whole corpus is obtained according to the logarithmic result of the ratio of the total number of documents in the corpus to the number of documents comprising each word, and the formula is as follows:
IDF=log(N/DF)
Where IDF is the inverse document frequency of each word in the whole corpus, N is the total number of documents in the corpus, and DF is the number of documents comprising each word.
Further, the TF-IDF weight of each word is obtained by multiplying the inverse document frequency by the number of occurrences of the word in each document, as follows:
TF-IDF=TF*IDF
Wherein, TF-IDF is the weight of TF-IDF, TF is the number of occurrences of each word in each document, and IDF is the inverse document frequency. Each document is represented as a vector, where each dimension of the vector corresponds to a word and the value is the TF-IDF weight of the word.
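By way of a non-limiting illustration, the two formulas above can be combined into a small TF-IDF computation (whitespace tokenization is an assumed simplification):

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF = TF * log(N / DF) for each word of each document."""
    n = len(docs)
    tokenized = [d.split() for d in docs]
    df = Counter()                 # DF: number of documents containing a word
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)       # TF: occurrences of the word in this doc
        weights.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return weights

w = tfidf(["cat sat", "cat ate fish"])
# "cat" appears in both documents, so its IDF is log(2/2) = 0 and its weight vanishes.
```

This matches the vector interpretation above: each document maps every vocabulary word to its TF-IDF weight (absent words implicitly weigh zero).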
It should be understood that word embedding is a generic term for language models and representation learning techniques in natural language processing. Conceptually, it refers to embedding a high-dimensional space, whose dimension is the number of all words, into a continuous vector space of much lower dimension, in which each word or phrase is mapped to a vector over the real numbers. Word embedding methods include artificial neural networks, dimensionality reduction of the word co-occurrence matrix, probability models, explicit representation of the context of words, and the like. Used as the bottom-layer input representation of word groups, word embedding has greatly improved the effectiveness of syntactic parsers, text sentiment analysis, and the like in NLP.
In some embodiments, where the at least one natural language data includes picture data, the integrity process is used to indicate resizing the pictures in the picture data to fit the input requirements of the model, and extracting features of the pictures based on a pre-trained convolutional neural network.
In some embodiments, where the at least one natural language data includes audio data, the integrity process is used to instruct converting the audio data into a time-frequency representation based on a short-time Fourier transform.
The short-time Fourier transform processing is used to indicate the following steps performed in order: use a Hamming window or a rectangular window as the window function of the short-time Fourier transform; divide the original signal into windows of fixed length using overlapping sliding windows; multiply the signal in each window by the window function; apply the fast Fourier transform to the signal in each window to convert the time-domain signal into a frequency-domain signal; and obtain the spectrum information corresponding to each window from the Fourier transform result. In this way the audio data can be converted into a time-frequency representation.
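By way of a non-limiting illustration, the windowing steps above can be sketched as follows (a direct DFT is used per window for brevity where a fast Fourier transform would be used in practice; the window length and hop size are hypothetical):

```python
import cmath
import math

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def stft(signal, win_len, hop):
    """Short-time Fourier transform with overlapping Hamming windows:
    frame the signal, multiply each frame by the window, then transform it."""
    window = hamming(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(win_len)]
        # Direct DFT of one window (an FFT would be used in practice).
        spectrum = [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / win_len)
                        for t in range(win_len)) for k in range(win_len)]
        frames.append(spectrum)
    return frames

# 32 samples, window of 8, hop of 4 → 7 overlapping spectra.
spec = stft([math.sin(2 * math.pi * i / 8) for i in range(32)], win_len=8, hop=4)
```

The list of per-window spectra is exactly the time-frequency representation the paragraph describes: one axis indexes time (window position), the other frequency bins.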
In some embodiments, in the case where the at least one natural language data comprises video data, an integrity process is used to instruct extraction of a sequence of frames from the video data as input, or to obtain timing information by an optical flow method.
Optical flow is a concept in object motion detection in the field of vision, describing the motion of an observed object, surface, or edge caused by movement relative to an observer. Optical flow methods are very useful in pattern recognition, computer vision, and other image processing fields, for motion detection, object segmentation, computation of time-to-collision and object expansion, motion-compensated coding, and stereo measurement through object surfaces and edges, among others.
In a possible implementation manner, the deep learning module 123 is configured to train the initial neural network model based on the at least one natural language data after the type classification and the preprocessing, to obtain a preset deep learning model, and further, the information matching module 124 is configured to match the search information after the data type identification based on the preset deep learning model.
The preset deep learning model comprises a convolutional neural network model, a residual network model, a recurrent neural network model, and a three-dimensional convolutional neural network model.
In some embodiments, where the at least one natural language data includes text data, the information matching module 124 is configured to perform text classification and named entity recognition on the text data based on the convolutional neural network model to match the retrieved information after the data type recognition.
It should be appreciated that text classification of text data can be divided into three steps: 1) text preprocessing, namely word segmentation, stop-word removal, and normalization; 2) text representation, i.e., representing the text as a vector; 3) classification model construction, e.g., naive Bayes, SVM, TextCNN, and so on. The implementation methods can be roughly classified into two types: text classification based on traditional machine learning and text classification based on deep learning.
In text preprocessing, word segmentation is the process of segmenting text data into words or phrases with semantic units according to a certain rule algorithm. Stop-word removal means removing in advance words that have no effect on the classification task; common stop-word dictionaries currently contain about 2,000 words, mainly adverbs, adjectives, and conjunctions. Normalization refers to normalizing all digits to "DIGIT", times to "TIME", and URL links to "URL"; although different digits carry different meanings, they are equivalent for many classification tasks, and this reduces the dictionary size.
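By way of a non-limiting illustration, the DIGIT/TIME/URL normalization can be sketched with regular expressions (the exact patterns are hypothetical simplifications):

```python
import re

def normalize(text):
    """Map URLs, times, and digit runs to placeholder tokens to shrink
    the dictionary, as described above. Order matters: URLs and times
    are replaced before bare digit runs so their digits survive intact."""
    text = re.sub(r"https?://[^\s,]+", "URL", text)
    text = re.sub(r"\b\d{1,2}:\d{2}\b", "TIME", text)
    text = re.sub(r"\d+", "DIGIT", text)
    return text

print(normalize("meet at 10:30, see http://example.com, room 42"))
# → "meet at TIME, see URL, room DIGIT"
```

After normalization, "room 42" and "room 7" map to the same token sequence, which is exactly the dictionary-size reduction the paragraph motivates.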
Named entities generally refer to entities with a specific meaning or strong referential force in text, generally including person names, place names, organization names, dates and times, proper nouns, and the like; named entity recognition of text data identifies the named entities that may exist in a piece of text data. Named entity recognition methods include rule-and-dictionary-based methods, statistical-learning-based methods, and the like.
In some embodiments, in the case that the at least one natural language data includes picture data, the information matching module 124 is configured to process the picture data with residual blocks based on the residual network model, alleviating the vanishing gradient in deep network training, so as to match the search information after the data type identification.
It should be understood that, generally, the input to an activation function is the output computed layer by layer in the neural network. As the network keeps deepening, gradient instability such as gradient explosion or gradient vanishing easily occurs, so the error no longer keeps shrinking as the network deepens. The residual block addresses this by adding the input directly to the output through a skip connection, which mitigates the gradient instability and gradient vanishing of deep network training.
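By way of a non-limiting illustration, the skip connection of a residual block computes y = F(x) + x; here a toy dense layer stands in for the convolutional layers of an actual residual network:

```python
def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in v]

def linear(v, weight, bias):
    """Dense layer: weight is a list of rows, one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weight, bias)]

def residual_block(v, weight, bias):
    """y = F(x) + x: the skip connection adds the input back onto the
    transformed output, easing gradient flow in deep networks."""
    return [f + x for f, x in zip(relu(linear(v, weight, bias)), v)]

# With zero weights F(x) = 0, so the block reduces to the identity map:
print(residual_block([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]))
# → [1.0, 2.0]
```

Because the identity path always exists, gradients can flow straight through the addition even when the transformed branch saturates, which is the mitigation described above.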
In some embodiments, where the at least one natural language data includes audio data, the information matching module 124 is configured to perform audio sequence data processing, speech recognition, and speech generation on the audio data based on the recurrent neural network model, so as to match the search information after the data type identification.
Among them, the recurrent neural network is a special neural network structure that exhibits excellent results in processing sequence data (e.g., audio data). Unlike a feed-forward neural network, a recurrent neural network can process not only the current input but also the previous information in the input sequence, thereby modeling the contextual information of the input data. Specifically, the recurrent neural network combines, through one recurrent unit, the output of the previous time step with the input of the current time step to obtain the output of the current time step.
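By way of a non-limiting illustration, the recurrent unit described above can be sketched as follows (a scalar input per time step and a diagonal recurrent weight are simplifying assumptions; a full RNN uses matrices for both):

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent unit: combine the current input with the previous
    hidden state, h_t = tanh(w_x * x_t + w_h * h_prev + b)."""
    return [math.tanh(w_x[i] * x_t + w_h[i] * h_prev[i] + b[i])
            for i in range(len(h_prev))]

def rnn_forward(sequence, w_x, w_h, b):
    """Run the unit over a sequence, carrying the hidden state forward,
    so each output depends on all earlier inputs."""
    h = [0.0] * len(b)
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
    return h

# Hypothetical weights; training would optimize these connection weights.
h = rnn_forward([1.0, 0.5, -0.5], w_x=[0.5, -0.3], w_h=[0.8, 0.8], b=[0.0, 0.0])
```

The carried hidden state `h` is the "memory" that lets the final output reflect the entire input sequence, not just the last sample.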
When audio sequence data processing, speech recognition, and speech generation are performed on audio data based on the recurrent neural network model, a batch of known speech sample data is first needed in the training stage, and a specific training algorithm optimizes the internal parameters of the recurrent neural network, namely the connection weights. These optimized parameters are used in the subsequent speech generation process. The trained recurrent neural network then takes the speech text to be generated as input, and after a series of calculations the network produces an audio output for the text.
In some embodiments, where the at least one natural language data includes video data, the information matching module 124 is configured to perform video classification and motion recognition on the video data based on the three-dimensional convolutional neural network model to match the retrieved information after the data type recognition.
In terms of video classification technology, features of the video are first extracted and then classified. Video classification with convolutional neural networks mainly falls into two methods: one classifies using visual features, the other using multi-modal features; for single-modal features, a common method is dynamic global pooling based on 3D convolutional layers. When the three-dimensional convolutional neural network model performs action recognition on video data, it can automatically extract the features and temporal information of the image or video, with good results.
In some embodiments, please refer to fig. 3, which is a schematic diagram of a natural language processing system according to another embodiment of the present application, as shown in fig. 3, the natural language processing system 100 further includes a storage module 140, where the storage module 140 is configured to store process data of the data type identification module 110, the data processing module 120, and the data output module 130.
Illustratively, the storage module 140 may include a text data storage module for storing related text data, a picture data storage module for storing related picture data, an audio data storage module for storing related audio data, and a video data storage module for storing related video data.
In this embodiment, the data processing module in the natural language processing system is configured to match the search information after data type recognition based on a preset deep learning model. The preset deep learning model is established by having the data type classification module, the data preprocessing module, and the deep learning module in the data processing module process the at least one input natural language data in sequence: the data type classification module performs data type classification on the input at least one natural language data; the data preprocessing module performs preprocessing, that is, blurring processing and integrity processing, on the at least one natural language data after type classification; the deep learning module trains the initial neural network model based on the at least one natural language data after type classification and preprocessing to obtain the preset deep learning model; and the information matching module matches the search information after data type identification based on the preset deep learning model. Further, the data output module performs privacy processing on the data output by the data processing module and outputs the target retrieval data.
In the natural language processing system, the data type recognition module performs data type recognition on the search information input by the user, and when a preset deep learning model is established, the data type classification module performs data type classification on at least one input natural language data, namely, a classification mode is adopted, so that the classification processing of the data is realized, and further, the better processing efficiency and the better processing effect are achieved. In addition, the natural language processing system is also provided with a data output module for further controlling the output information, and the retrieval precision is improved. Therefore, the natural language processing system in the embodiment of the application can improve the accuracy and the retrieval efficiency of information retrieval.
On the basis of the above embodiment, please refer to fig. 4, which is a schematic diagram illustrating an exemplary structure of a natural language processing system according to another embodiment of the present application, in which data types include text, pictures, audio and video, as shown in fig. 4, the data type recognition module 110 includes a text data recognition module 111, a picture data recognition module 112, an audio data recognition module 113 and a video data recognition module 114; the data processing module 120 includes a data type classification module 121, a data preprocessing module 122, a deep learning module 123, and an information matching module 124; the data output module 130 includes a text data output module 131, a picture data output module 132, an audio data output module 133, and a video data output module 134; the storage module 140 may include a text data storage module 141, a picture data storage module 142, an audio data storage module 143, and a video data storage module 144.
Wherein, the data type classification module 121 includes a text data type classification module 1211, a picture data type classification module 1212, an audio data type classification module 1213, and a video data type classification module 1214; the data preprocessing module 122 includes a text data preprocessing module 1221, a picture data preprocessing module 1222, an audio data preprocessing module 1223, and a video data preprocessing module 1224; the deep learning module 123 includes a text deep learning module 1231, a picture deep learning module 1232, an audio deep learning module 1233, and a video deep learning module 1234; and the information matching module 124 includes a text information matching module 1241, a picture information matching module 1242, an audio information matching module 1243, and a video information matching module 1244.
In some embodiments, as shown in fig. 4, the data type recognition module 110 is configured to receive search information input by a user and perform data type recognition on the search information; the data processing module 120 is configured to match the search information after data type recognition based on a preset deep learning model; the data output module 130 is configured to perform privacy processing on data output by the data processing module and output target search data; and the storage module 140 is configured to store process data of the data type recognition module 110, the data processing module 120, and the data output module 130.
Wherein the text data recognition module 111 is used for recognizing text data in the search information, the picture data recognition module 112 is used for recognizing picture data in the search information, the audio data recognition module 113 is used for recognizing audio data in the search information, and the video data recognition module 114 is used for recognizing video data in the search information.
The data type classification module 121 is configured to perform data type classification on the input at least one natural language data, the data preprocessing module 122 is configured to perform preprocessing on the at least one natural language data after the type classification, the preprocessing includes fuzzy processing and integrity processing, the deep learning module 123 is configured to train the initial neural network model based on the at least one natural language data after the type classification and the preprocessing, to obtain a preset deep learning model, and the information matching module 124 is configured to match the search information after the data type recognition based on the preset deep learning model.
The text data output module 131 is used for privacy processing of the text data output by the data processing module 120 and outputting target search data, the picture data output module 132 is used for privacy processing of the picture data output by the data processing module 120 and outputting target search data, the audio data output module 133 is used for privacy processing of the audio data output by the data processing module 120 and outputting target search data, and the video data output module 134 is used for privacy processing of the video data output by the data processing module 120 and outputting target search data.
The text data storage module 141 is used for storing text data in a natural language processing system, the picture data storage module 142 is used for storing picture data in the natural language processing system, the audio data storage module 143 is used for storing audio data in the natural language processing system, and the video data storage module 144 is used for storing video data in the natural language processing system.
In some embodiments, the text data type classification module 1211 is for classifying text data in the input at least one natural language data, the picture data type classification module 1212 is for classifying picture data in the input at least one natural language data, the audio data type classification module 1213 is for classifying audio data in the input at least one natural language data, and the video data type classification module 1214 is for classifying video data in the input at least one natural language data.
The text data preprocessing module 1221 is used for preprocessing the text data output by the text data type classification module 1211, the picture data preprocessing module 1222 is used for preprocessing the picture data output by the picture data type classification module 1212, the audio data preprocessing module 1223 is used for preprocessing the audio data output by the audio data type classification module 1213, and the video data preprocessing module 1224 is used for preprocessing the video data output by the video data type classification module 1214.
The text deep learning module 1231 is configured to train the initial neural network model based on the text data output by the text data preprocessing module 1221 to obtain a convolutional neural network model, the picture deep learning module 1232 is configured to train the initial neural network model based on the picture data output by the picture data preprocessing module 1222 to obtain a residual network model, the audio deep learning module 1233 is configured to train the initial neural network model based on the audio data output by the audio data preprocessing module 1223 to obtain a cyclic neural network model, and the video deep learning module 1234 is configured to train the initial neural network model based on the video data output by the video data preprocessing module 1224 to obtain a three-dimensional convolutional neural network model.
The text information matching module 1241 is configured to match the search information after data type recognition based on the convolutional neural network model and output matched text data, the picture information matching module 1242 is configured to match the search information after data type recognition based on the residual network model and output matched picture data, the audio information matching module 1243 is configured to match the search information after data type recognition based on the cyclic neural network model and output matched audio data, and the video information matching module 1244 is configured to match the search information after data type recognition based on the three-dimensional convolutional neural network model and output matched video data.
In this embodiment, taking the example that the data types include text, picture, audio and video, the processing procedure of the natural language processing system is described in detail, the data processing module matches the search information after the data type is identified based on the preset deep learning model, and the data output module further controls the data output by the data processing module, so that the accuracy and the search efficiency of information search can be improved.
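The modality-specific matching performed by modules 1241 to 1244 can be sketched as a simple dispatcher that routes a recognized data type to the corresponding trained model. The `make_matcher` factory and the stand-in matcher functions below are illustrative assumptions, not part of the patented system:

```python
# Hypothetical sketch: route recognized search data to a modality-specific
# matcher, mirroring the four information matching modules 1241-1244.
def make_matcher(text_model, picture_model, audio_model, video_model):
    matchers = {
        "text": text_model,        # e.g. a convolutional neural network model
        "picture": picture_model,  # e.g. a residual network model
        "audio": audio_model,      # e.g. a cyclic (recurrent) neural network model
        "video": video_model,      # e.g. a three-dimensional convolutional model
    }

    def match(data_type, search_data):
        if data_type not in matchers:
            raise ValueError(f"unsupported data type: {data_type}")
        return matchers[data_type](search_data)

    return match

# Usage with stand-in matcher functions in place of trained models:
match = make_matcher(
    text_model=lambda q: f"text match for {q!r}",
    picture_model=lambda q: f"picture match for {q!r}",
    audio_model=lambda q: f"audio match for {q!r}",
    video_model=lambda q: f"video match for {q!r}",
)
```

Keeping the dispatch table in one place means adding a new modality only requires registering one more model.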
Based on the above embodiments, please refer to fig. 5, which is a flowchart illustrating a natural language processing method according to an embodiment of the present application. As shown in fig. 5, the method may include the following steps 501 to 503:
Step 501, receiving search information input by a user through a data type identification module, and performing data type identification on the search information, wherein the data type comprises at least one of text, picture, audio and video.
Step 502, matching, by a data processing module, the search information after the data type recognition based on a preset deep learning model, where the preset deep learning model is obtained by training an initial neural network model based on at least one natural language data.
Step 503, performing privacy processing on the data output by the data processing module through the data output module, and outputting target search data, where the privacy processing is used for indicating processing of sensitive data.
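Steps 501 to 503 can be sketched as a three-stage pipeline; the recognizer, matcher, and privacy filter passed in below are hypothetical stand-ins for the corresponding modules:

```python
def process_query(search_info, recognize_type, match, privacy_filter):
    # Step 501: data type recognition on the user's search information
    data_type = recognize_type(search_info)
    # Step 502: match against the preset deep learning model for that type
    matched = match(data_type, search_info)
    # Step 503: privacy processing before outputting target search data
    return privacy_filter(matched)

# Usage with toy stand-ins for each module:
result = process_query(
    "find my photo",
    recognize_type=lambda s: "text",
    match=lambda t, s: {"type": t, "hits": [s.upper()]},
    privacy_filter=lambda m: {**m, "hits": [h.replace("MY", "[REDACTED]") for h in m["hits"]]},
)
```

The pipeline shape makes each module independently replaceable, which matches the modular division described in the embodiments.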
It should be noted that, the natural language processing method is executed by the natural language processing system in the above embodiment, and specific execution steps of each module have been described in the foregoing, which is not repeated herein.
In some embodiments, the data processing module includes a data type classification module, a data preprocessing module, a deep learning module, and an information matching module, the method further comprising: classifying the data type of the input at least one natural language data by a data type classifying module; preprocessing at least one piece of natural language data subjected to type classification through a data preprocessing module, wherein the preprocessing comprises fuzzy processing and integrity processing; training the initial neural network model by a deep learning module based on at least one natural language data subjected to the type classification and the preprocessing to obtain the preset deep learning model; and matching the search information after the data type identification based on the preset deep learning model through an information matching module.
In some embodiments, in the case that the at least one natural language data includes text data, word segmentation, part-of-speech tagging, named entity recognition, and semantic analysis are performed on the text data by the data type classification module based on natural language processing techniques; in the case that the at least one natural language data includes picture data, classification, target detection, and object recognition are performed on the picture data by the data type classification module based on a convolutional neural network; in the case that the at least one natural language data includes audio data, spectral features of the audio data are extracted by the data type classification module based on a fast Fourier transform; and in the case that the at least one natural language data includes video data, action detection, face recognition, and behavior analysis are performed on the video data by the data type classification module based on motion analysis and object tracking techniques.
In some embodiments, where the at least one natural language data comprises text data, the blurring process comprises: replacing the personal identity information part in the text data with a preset identifier, wherein the personal identity information comprises at least one of a name, an identity card number, and a telephone number. In the case where the at least one natural language data includes picture data, the blurring process includes: blurring sensitive personal information in the picture data, deleting geographic location information contained in the picture data, identifying copyrighted pictures, and adding a text watermark to a copyrighted picture to identify the picture source and copyright information, wherein the sensitive personal information comprises at least one of a face, a name, and an identity card number. In the case where the at least one natural language data includes audio data, the blurring process includes: deleting information related to personal identity in the audio data based on an audio editing tool, changing the sound characteristics of the audio in the audio data, and replacing sensitive vocabulary or personal information with generic placeholder words or randomly generated replacement words. In the case where the at least one natural language data includes video data, the blurring process includes: blurring the face, license plate number, and identity card number areas in the video data, deleting information related to personal identity, and processing the sound in the video.
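For the text case, replacing personal identity information with a preset identifier can be sketched with regular expressions. The patterns below (an 18-character ID card number and an 11-digit mobile number) and the `[MASKED]` identifier are illustrative assumptions; a real deployment would need locale-aware rules and name recognition:

```python
import re

# Illustrative patterns only; real PII detection needs far more coverage.
PATTERNS = [
    re.compile(r"\b\d{17}[\dXx]\b"),  # 18-character ID card number (last may be X)
    re.compile(r"\b1\d{10}\b"),       # 11-digit mobile phone number
]
PLACEHOLDER = "[MASKED]"  # the "preset identifier" from the embodiment

def blur_text(text):
    """Replace matched personal identity information with the preset identifier."""
    for pattern in PATTERNS:
        text = pattern.sub(PLACEHOLDER, text)
    return text
```

Applying the longer ID pattern first avoids the phone pattern partially matching inside an ID number.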
In some embodiments, where the at least one natural language data comprises text data, the integrity process comprises: converting text in the text data into a digital representation based on a preset algorithm, wherein the preset algorithm comprises at least one of a bag-of-words model, word frequency-inverse document frequency (TF-IDF) vectorization, and word embedding. In the case where the at least one natural language data includes picture data, the integrity process includes: adjusting the size of the pictures in the picture data so that they meet the input requirements of the model, and extracting features of the pictures based on a pre-trained convolutional neural network. In the case where the at least one natural language data includes audio data, the integrity process includes: converting the audio data into a time-frequency representation based on a short-time Fourier transform. In the case where the at least one natural language data comprises video data, the integrity process comprises: extracting a frame sequence from the video data as input, or acquiring time sequence information through an optical flow method.
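For the video case, extracting a fixed-length frame sequence as model input can be sketched with uniform sampling. Representing the video as a NumPy array of frames is an assumption made for illustration; the sampling count of 8 is likewise arbitrary:

```python
import numpy as np

def sample_frames(video, num_frames=8):
    """Uniformly sample a fixed-length frame sequence from a video,
    represented here as an array of shape (n_frames, H, W, C)."""
    idx = np.linspace(0, len(video) - 1, num_frames).astype(int)
    return video[idx]

# Usage: a 120-frame stand-in clip of 64x64 RGB frames.
video = np.zeros((120, 64, 64, 3), dtype=np.uint8)
clip = sample_frames(video)
```

Uniform sampling keeps the input length fixed regardless of clip duration, which is what a three-dimensional convolutional model typically requires.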
In some embodiments, the preset deep learning model includes a convolutional neural network model, a residual network model, a cyclic neural network model, and a three-dimensional convolutional neural network model, and the method further comprises: in the case that the at least one natural language data comprises text data, performing text classification and named entity recognition on the text data by the information matching module based on the convolutional neural network model, so as to match the search information after data type recognition; in the case that the at least one natural language data comprises picture data, processing the picture data by the information matching module using residual blocks based on the residual network model, which mitigates the vanishing-gradient problem in deep network training, so as to match the search information after data type recognition; in the case that the at least one natural language data includes audio data, performing audio sequence data processing, speech recognition, and speech generation on the audio data by the information matching module based on the cyclic neural network model, so as to match the search information after data type recognition; and in the case that the at least one natural language data comprises video data, performing video classification and action recognition on the video data by the information matching module based on the three-dimensional convolutional neural network model, so as to match the search information after data type recognition.
In some embodiments, the motion analysis, object tracking technique includes at least one of a Lucas-Kanade optical flow algorithm, a gaussian mixture model algorithm, a single-target tracking algorithm, and a multi-target tracking algorithm.
In some embodiments, the TF-IDF vectorization process comprises: for each document in the text data, calculating the occurrence frequency of each word in each document and the inverse document frequency of each word in the whole corpus to obtain TF-IDF weight of each word;
wherein the inverse document frequency of each word in the whole corpus is obtained according to the logarithmic result of the ratio of the total number of documents in the corpus to the number of documents comprising each word, and the formula is as follows:
IDF=log(N/DF)
Wherein IDF is the inverse document frequency of each word in the whole corpus, N is the total number of documents in the corpus, DF is the number of documents comprising each word;
The TF-IDF weight of each word is obtained by multiplying the inverse document frequency by the number of occurrences of each word in each document.
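The weighting above, IDF = log(N/DF) with the TF-IDF weight being TF multiplied by IDF, can be sketched directly. The whitespace tokenization below is an assumption for the example:

```python
import math
from collections import Counter

def tfidf(docs):
    """Per-document TF-IDF weights; TF is the raw occurrence count,
    IDF = log(N / DF) exactly as in the formula above."""
    n = len(docs)
    # DF: number of documents containing each word
    df = Counter(word for doc in docs for word in set(doc.split()))
    weights = []
    for doc in docs:
        tf = Counter(doc.split())
        weights.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return weights

# Usage: "cat" appears in 2 of 3 documents, "sat" in only 1,
# so "sat" receives the higher weight in the first document.
docs = ["cat sat", "cat ran", "dog ran"]
w = tfidf(docs)
```

Note that a word appearing in every document gets IDF = log(1) = 0, which is exactly the intended suppression of uninformative words.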
In some embodiments, the short-time fourier transform process includes: using a hamming window or a rectangular window as a window function in the short-time fourier transform STFT; dividing the original signal into windows with fixed length by adopting a mode of overlapping sliding windows; multiplying the signal within each window with the window function; applying a fast fourier transform to the signals within each window, converting the time domain signals into frequency domain signals; and obtaining the frequency spectrum information corresponding to each window according to the Fourier transform result.
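The windowed procedure above can be sketched with NumPy using a Hamming window. The window length of 256 samples and 50% overlap are assumptions; the embodiment only requires fixed-length overlapping windows:

```python
import numpy as np

def stft(signal, win_len=256, hop=128):
    """Short-time Fourier transform: split the signal into overlapping
    fixed-length windows, multiply each by a Hamming window, and apply
    a fast Fourier transform to convert each to the frequency domain."""
    window = np.hamming(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = signal[start:start + win_len] * window
        frames.append(np.fft.rfft(frame))  # spectrum for this window
    return np.array(frames)  # shape: (n_windows, win_len // 2 + 1)

# Usage: a 64 Hz sine sampled at 1024 Hz for one second; its energy
# should concentrate in frequency bin 64 / (1024 / 256) = 16.
t = np.linspace(0, 1, 1024, endpoint=False)
spec = stft(np.sin(2 * np.pi * 64 * t))
```

The stacked per-window spectra form the time-frequency representation used by the integrity processing for audio data.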
In some embodiments, the processing of sensitive data includes: and blurring processing is carried out on the sensitive data, or the sensitive data is deleted.
In some embodiments, the natural language processing system further comprises a storage module, the method further comprising: and storing the process data of the data type identification module, the data processing module and the data output module through a storage module.
An embodiment of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method provided in the above-described embodiment.
Embodiments of the present application provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of the method provided by the method embodiments described above.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" or "some embodiments" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" or "in some embodiments" in various places throughout this specification are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application. The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments. The foregoing description of various embodiments is intended to highlight differences between the various embodiments, which may be the same or similar to each other by reference, and is not repeated herein for the sake of brevity.
The term "and/or" is herein merely an association relation describing associated objects, meaning that there may be three relations, e.g. object a and/or object B, may represent: there are three cases where object a alone exists, object a and object B together, and object B alone exists.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described embodiments are merely illustrative, and the division of the modules is merely a logical function division; other divisions may be used in practice, for example: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or modules may be electrical, mechanical, or in other forms.
The modules described above as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; can be located in one place or distributed to a plurality of network units; some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated in one processing unit, or each module may be separately used as one unit, or two or more modules may be integrated in one unit; the integrated modules may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, and the foregoing program may be stored in a computer readable storage medium, where the program, when executed, performs steps including the above method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read Only Memory (ROM), a magnetic disk or an optical disk, or the like, which can store program codes.
Or the above-described integrated units of the application may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially or partly contributing to the related art, embodied in the form of a software product stored in a storage medium, including several instructions for causing an electronic device to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
The features disclosed in the several product embodiments provided by the application can be combined arbitrarily under the condition of no conflict to obtain new product embodiments.
The features disclosed in the embodiments of the method or the apparatus provided by the application can be arbitrarily combined without conflict to obtain new embodiments of the method or the apparatus.
The foregoing is merely an embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. The natural language processing system based on the large language model is characterized by comprising a data type identification module, a data processing module and a data output module;
The data type identification module is used for receiving search information input by a user and carrying out data type identification on the search information, wherein the data type comprises at least one of text, pictures, audio and video;
The data processing module is used for matching the search information after the data type identification based on a preset deep learning model, and the preset deep learning model is obtained by training an initial neural network model based on at least one natural language data;
The data output module is used for carrying out privacy processing on the data output by the data processing module and outputting target retrieval data, and the privacy processing is used for indicating the processing of sensitive data;
the data processing module comprises a data type classification module, a data preprocessing module, a deep learning module and an information matching module;
The data type classification module is used for classifying data types of at least one input natural language data;
The data preprocessing module is used for preprocessing at least one piece of natural language data subjected to type classification, the preprocessing comprising fuzzy processing and integrity processing, wherein, in the case that the at least one natural language data comprises text data, the integrity processing is used for indicating that text in the text data is converted into a digital representation based on a preset algorithm, the preset algorithm comprising at least one of a bag-of-words model, word frequency-inverse document frequency TF-IDF vectorization, and word embedding; in the case that the at least one natural language data comprises picture data, the integrity processing is used for indicating that the size of a picture in the picture data is adjusted so that the picture meets the input requirement of a model, and that features of the picture are extracted based on a pre-trained convolutional neural network; in the case that the at least one natural language data comprises audio data, the integrity processing is used for indicating that the audio data is converted into a time-frequency representation based on a short-time Fourier transform; and in the case that the at least one natural language data comprises video data, the integrity processing is used for indicating that a frame sequence is extracted from the video data as input, or that time sequence information is acquired through an optical flow method;
the deep learning module is used for training the initial neural network model based on at least one natural language data subjected to the type classification and the preprocessing to obtain the preset deep learning model;
the information matching module is used for matching the search information after the data type identification based on the preset deep learning model.
2. The system of claim 1, wherein, in the case where the at least one natural language data includes text data, the data type classification module is configured to perform word segmentation, part-of-speech tagging, named entity recognition, semantic analysis on the text data based on natural language processing techniques;
The data type classification module is used for classifying the picture data, detecting targets and identifying objects based on a convolutional neural network under the condition that the at least one natural language data comprises the picture data;
In the case that the at least one natural language data includes audio data, the data type classification module is configured to extract spectral features of the audio data based on a fast fourier transform;
And the data type classification module is used for performing motion detection, face recognition and behavior analysis on the video data based on motion analysis and object tracking technology when the at least one natural language data comprises video data.
3. The system according to claim 1, wherein in the case where the at least one natural language data includes text data, the blurring process is for instructing to replace a part of personal identification information in the text data, which includes at least one of a name, an identification card number, and a telephone number, with a preset identifier;
when the at least one natural language data includes picture data, the blurring process is configured to instruct blurring processing to sensitive personal information in the picture data, delete geographical location information included in a picture in the picture data, identify a copyrighted picture, and add a text watermark to the copyrighted picture to identify a source of the picture and the copyrighted information, where the sensitive personal information includes at least one of a face, a name, and an identification card number;
In the case that the at least one natural language data includes audio data, the blurring process is configured to instruct deletion of information related to a person's identity in the audio data based on an audio editing tool, and replace sensitive vocabulary or personal information with a general placeholder or a randomly generated replacement word by changing a sound feature of audio in the audio data;
In the case that the at least one natural language data includes video data, the blurring process is used for instructing blurring process of a face, a license plate number, an identification card number area in the video data, deleting information related to personal identity, and processing sound in a video in the natural language data.
4. The system of claim 1, wherein the pre-set deep learning model comprises a convolutional neural network model, a residual network model, a cyclic neural network model, and a three-dimensional convolutional neural network model;
The information matching module is used for carrying out text classification and named entity recognition on the text data based on the convolutional neural network model so as to match the search information after the data type recognition;
The information matching module is used for processing the picture data by adopting a residual block based on the residual network model to solve the problem that the gradient of deep network training disappears so as to match the search information after the data type identification;
in the case that the at least one natural language data includes audio data, the information matching module is configured to perform audio sequence data processing, voice recognition and voice generation on the audio data based on the recurrent neural network model, so as to match the search information after the data type recognition;
And the information matching module is used for carrying out video classification and action recognition on the video data based on the three-dimensional convolutional neural network model so as to match the search information after the data type recognition.
5. The system of claim 2, wherein the motion analysis, object tracking technique comprises at least one of a Lucas-Kanade optical flow algorithm, a gaussian mixture model algorithm, a single-target tracking algorithm, and a multi-target tracking algorithm.
6. The system of claim 1, wherein the TF-IDF vectorization process is configured to instruct, for each document in the text data, calculating a number of occurrences of each word in each document and an inverse document frequency of each word in the entire corpus, and obtaining TF-IDF weights of each word;
wherein the inverse document frequency of each word in the whole corpus is obtained according to the logarithmic result of the ratio of the total number of documents in the corpus to the number of documents comprising each word, and the formula is as follows:
IDF=log(N/DF)
Wherein IDF is the inverse document frequency of each word in the whole corpus, N is the total number of documents in the corpus, DF is the number of documents comprising each word;
The TF-IDF weight of each word is obtained by multiplying the inverse document frequency by the number of occurrences of each word in each document.
7. The system of claim 1, wherein the short-time fourier transform process is configured to instruct segmentation of an original signal into windows of a fixed length using hamming windows or rectangular windows as window functions in the short-time fourier transform STFT by overlapping sliding windows, multiplying signals in each window by the window functions, applying a fast fourier transform to the signals in each window, converting a time-domain signal into a frequency-domain signal, and obtaining spectral information corresponding to each window according to fourier transform results.
8. The system of claim 1, wherein the processing of sensitive data indicates blurring the sensitive data or deleting the sensitive data.
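A minimal sketch of the blur-or-delete choice in claim 8; the regular expressions for phone numbers and e-mail addresses are hypothetical examples of what might count as sensitive data:

```python
import re

# Hypothetical sensitive-data patterns: 11-digit phone numbers, e-mail addresses.
SENSITIVE_PATTERNS = [r"\b\d{11}\b", r"\b[\w.]+@[\w.]+\.\w+\b"]

def privacy_process(text, mode="blur"):
    """Per claim 8: either blur each sensitive match with asterisks of the
    same length, or delete it outright."""
    for pat in SENSITIVE_PATTERNS:
        if mode == "blur":
            text = re.sub(pat, lambda m: "*" * len(m.group()), text)
        else:  # mode == "delete"
            text = re.sub(pat, "", text)
    return text

masked = privacy_process("contact: 13800138000, a@b.com")
deleted = privacy_process("call 13800138000 now", mode="delete")
```

Blurring preserves the layout of the output (useful when the retrieval result is displayed verbatim), while deletion removes the field entirely.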
9. The system of any one of claims 1 to 8, wherein the natural language processing system further comprises a storage module for storing process data of the data type identification module, the data processing module, and the data output module.
10. A method of natural language processing based on a large language model, applied to the natural language processing system of any one of claims 1 to 9, the method comprising:
receiving search information input by a user through a data type identification module, and carrying out data type identification on the search information, wherein the data type comprises at least one of text, picture, audio and video;
Matching the search information after the data type identification based on a preset deep learning model through a data processing module, wherein the preset deep learning model is obtained by training an initial neural network model based on at least one natural language data;
The data output by the data processing module is subjected to privacy processing through the data output module, target retrieval data is output, and the privacy processing is used for indicating the processing of sensitive data;
the data processing module comprises a data type classification module, a data preprocessing module, a deep learning module and an information matching module;
classifying the data type of the input at least one natural language data by the data type classification module;
preprocessing, by the data preprocessing module, the at least one piece of natural language data subjected to type classification, wherein the preprocessing comprises fuzzy processing and integrity processing; when the at least one piece of natural language data comprises text data, the integrity processing is used for indicating that text in the text data is converted into a digital representation based on a preset algorithm, the preset algorithm comprising at least one of a bag-of-words model, term frequency-inverse document frequency (TF-IDF) vectorization, and word embedding; when the at least one piece of natural language data comprises picture data, the integrity processing is used for indicating that the size of a picture in the picture data is adjusted so that the picture meets the input requirement of a model, and that features of the picture are extracted based on a pre-trained convolutional neural network; when the at least one piece of natural language data comprises audio data, the integrity processing is used for indicating that the audio data is converted into a time-frequency representation based on a Fourier transform; and when the at least one piece of natural language data comprises video data, the integrity processing is used for indicating that complete video data is acquired as time sequence information through a time sequence method;
Training the initial neural network model by the deep learning module based on at least one natural language data subjected to the type classification and the preprocessing to obtain the preset deep learning model;
and matching the search information after the data type identification based on the preset deep learning model through the information matching module.
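The first step of the method of claim 10, identifying the data type of the search information, might be sketched as a simple dispatcher; the extension-based check below is a hypothetical stand-in for a trained classifier:

```python
def identify_type(search_info):
    """Toy stand-in for the data type identification module of claim 10:
    route a query to text / picture / audio / video handling based on a
    hypothetical file-extension check rather than a learned model."""
    ext = search_info.rsplit(".", 1)[-1].lower() if "." in search_info else ""
    if ext in {"jpg", "jpeg", "png", "bmp"}:
        return "picture"
    if ext in {"wav", "mp3", "flac"}:
        return "audio"
    if ext in {"mp4", "avi", "mkv"}:
        return "video"
    return "text"  # default: treat the query as plain text
```

Each branch then selects the matching preprocessing of claim 10 (TF-IDF or word embedding for text, resizing plus CNN features for pictures, STFT for audio, time-sequence extraction for video) before the preset deep learning model performs the match.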
CN202311467688.XA 2023-11-06 2023-11-06 Natural language processing system and method based on large language model Active CN117251551B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311467688.XA CN117251551B (en) 2023-11-06 2023-11-06 Natural language processing system and method based on large language model

Publications (2)

Publication Number Publication Date
CN117251551A CN117251551A (en) 2023-12-19
CN117251551B true CN117251551B (en) 2024-05-07

Family

ID=89129664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311467688.XA Active CN117251551B (en) 2023-11-06 2023-11-06 Natural language processing system and method based on large language model

Country Status (1)

Country Link
CN (1) CN117251551B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840287A (en) * 2019-01-31 2019-06-04 中科人工智能创新技术研究院(青岛)有限公司 A neural-network-based cross-modal information retrieval method and device
CN111460494A (en) * 2020-03-24 2020-07-28 广州大学 Multi-mode deep learning-oriented privacy protection method and system
CN111598214A (en) * 2020-04-02 2020-08-28 浙江工业大学 Cross-modal retrieval method based on graph convolution neural network
CN111680173A (en) * 2020-05-31 2020-09-18 西南电子技术研究所(中国电子科技集团公司第十研究所) CMR model for uniformly retrieving cross-media information
CN111949806A (en) * 2020-08-03 2020-11-17 中电科大数据研究院有限公司 Cross-media retrieval method based on Resnet-Bert network model
WO2020238061A1 (en) * 2019-05-28 2020-12-03 平安科技(深圳)有限公司 Natural language classification method and apparatus, computer device, and storage medium
CN112201228A (en) * 2020-09-28 2021-01-08 苏州贝果智能科技有限公司 Multimode semantic recognition service access method based on artificial intelligence
WO2021021330A1 (en) * 2019-07-30 2021-02-04 Intuit Inc. Neural network system for text classification
CN115714692A (en) * 2022-11-18 2023-02-24 联通(广东)产业互联网有限公司 Model training method for monitoring network card, application and system thereof, and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on cross-media semantic retrieval methods for digital libraries based on deep learning; Peng Xin; Information Research (情报探索); 2018-02-15 (No. 02); pp. 20-23 *

Also Published As

Publication number Publication date
CN117251551A (en) 2023-12-19

Similar Documents

Publication Publication Date Title
Gao et al. Video captioning with attention-based LSTM and semantic consistency
Li et al. Visual to text: Survey of image and video captioning
CN109271537B (en) Text-to-image generation method and system based on distillation learning
CN112084331A (en) Text processing method, text processing device, model training method, model training device, computer equipment and storage medium
CN111079444A (en) Network rumor detection method based on multi-modal relationship
Lin et al. A post-processing method for detecting unknown intent of dialogue system via pre-trained deep neural network classifier
Bu et al. A hybrid deep learning system of CNN and LRCN to detect cyberbullying from SNS comments
CN111159409A (en) Text classification method, device, equipment and medium based on artificial intelligence
CN112434164A (en) Network public opinion analysis method and system considering topic discovery and emotion analysis
CN113392179A (en) Text labeling method and device, electronic equipment and storage medium
Lv et al. Storyrolenet: Social network construction of role relationship in video
Zayed et al. Phrase-level metaphor identification using distributed representations of word meaning
Adler et al. Real-time claim detection from news articles and retrieval of semantically-similar factchecks
Elatawy et al. Recognition system for alphabet Arabic sign language using neutrosophic and fuzzy c-means
CN114942994A (en) Text classification method, text classification device, electronic equipment and storage medium
Lin et al. Detecting multimedia generated by large ai models: A survey
CN117251551B (en) Natural language processing system and method based on large language model
Qi et al. Video captioning via a symmetric bidirectional decoder
Gomes Jr et al. Framework for knowledge discovery in educational video repositories
Nagendraswamy et al. LBPV for recognition of sign language at sentence level: An approach based on symbolic representation
Tan et al. Sentiment analysis of chinese short text based on multiple features
Yanagisawa et al. Automatic classification of manga characters using density-based clustering
Preethi et al. Video Captioning using Pre-Trained CNN and LSTM
Baskota Classification of ad tone in political video advertisements under class imbalance and low data samples
Golovko et al. Neural network approach for semantic coding of words

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant