CN112463927A

CN112463927A - Efficient intelligent semantic matching method

Info

Publication number: CN112463927A
Application number: CN202011427260.9A
Authority: CN
Inventors: 陈丽园
Original assignee: Shanghai Hi Kuqiang Supply Chain Information Technology Co ltd
Current assignee: Shanghai Hi Kuqiang Supply Chain Information Technology Co ltd
Priority date: 2020-12-09
Filing date: 2020-12-09
Publication date: 2021-03-09

Abstract

The invention discloses an efficient intelligent semantic matching method, namely an intelligent mapping method for a natural language processing vocabulary library based on a fuzzy algorithm. The method uses a voice data acquisition software and hardware system, a network storage database, an intelligent mapping vocabulary library system and a user-specific natural language database. The voice data acquisition software and hardware system comprises voice acquisition software and/or a voice acquisition software and hardware system, and the voice acquisition software is installed in a host. The intelligent mapping vocabulary library system comprises a keyword setting and capturing program, a word semantic comparison program, a word semantic mapping program and a user-specific vocabulary library generation system. The invention builds a multi-level intelligent mapping vocabulary library for natural language processing that is suitable for a variety of complex application scenarios, can be upgraded continuously and automatically through deep learning, automatically adapts to dialects, common expressions and speech habits, is convenient for users, and improves accuracy and processing efficiency.

Description

Efficient intelligent semantic matching method

Technical Field

The invention relates to the field of natural language processing, in particular to an efficient intelligent semantic matching method.

Background

Natural language processing (NLP) is an important direction in computer science and artificial intelligence (AI). It studies the theories and methods that enable effective communication between humans and computers in natural language, and it integrates linguistics, computer science and mathematics. Because the field deals with natural language, that is, the language people use every day, it is closely related to linguistics, but with an important difference: natural language processing does not study natural language in general; it aims to develop computer systems, and in particular software systems, that can effectively carry out natural language communication. It is therefore part of computer science.

When modern logic is used as a tool for analyzing natural language, the shortcomings of natural language are: (1) the hierarchical structure of expressions is not clear enough; (2) the individuation cognition mode is not clear enough; (3) the scope of quantifiers is not exact; (4) the word order of sentence components is not fixed; and (5) linguistic form and semantics do not correspond. Measured from the perspective of natural language, the shortcomings of logical languages are: (1) the initial terms are not diverse enough; (2) the variety of quantifiers is poor; (3) the domain of a quantifier cannot extend dynamically across a sequence of formulas; and (4) the lack of context makes the transmission of linguistic information inefficient.

The prior-art invention patent with publication number CN110780878A discloses a method for JavaScript type inference using deep learning, which relates to deep learning on source code in the field of artificial intelligence. The method comprises collecting and processing data, constructing a model, training the model, evaluating the model and inferring types. The type of neural network is determined first, then the number of layers, and finally the number of neurons in each layer. Collecting and processing the data comprises: downloading a quantity of source code from GitHub; selecting the source code that is rich in types as the final data set; converting the data into a format in which words (tokens) and types are aligned; generating the token and type vocabulary libraries; and finally using the token-type mapping to express the source code in a data format suitable for learning. The loss function value and the classification error are tracked and the model parameters are updated until a model with higher accuracy is obtained; the model is output as a file, and certain low-level program processing is carried out.

The invention patent with patent number CN201711204633.4 discloses a natural language question answering method and system based on structural analysis of questions and knowledge graphs, belonging to the field of graph computing and natural language question answering. Addressing the problems of existing question answering systems, it proposes a method based on constructing, analyzing and matching the structures of questions and knowledge graphs, so as to expand the types of natural language questions that can be answered while improving answer accuracy. The method represents the query question as a graph, by traversal centered on entity nodes and by designed extraction rules. Based on the query graph, subgraphs covering all answers are searched for in the knowledge graph by mapping; semantic vectors of paths and phrases are built, matching similarity is computed while answer subgraphs are filtered and screened, and candidate answers are obtained. The method is clearly effective for graph-based natural language processing. However, the range of fields in which such graphs can actually be generated is narrow, and graphs that meet the requirements are available in few settings, so large-scale practical application is difficult. The method is still at the research stage, cannot adapt to complex environments and cannot be applied in a voice environment; it can provide some basic technical support for the underlying logic but no inventive inspiration.

The invention patent with publication number CN111444700A discloses a text similarity measurement method based on semantic document expression. Two texts to be compared are acquired and the sentences of each are preprocessed by word segmentation; the preprocessed words of the two texts are mapped to word vectors; each text is processed by a convolutional neural network (CNN) model and a bidirectional long short-term memory (BiLSTM) network model to extract the CNN and BiLSTM sentence semantic features of each text; an attention mechanism model captures the attention features of each sentence semantic feature, generates weight vectors and computes weighted sums to produce the CNN and BiLSTM semantic expression vectors, and the two semantic expression vectors of each text are concatenated into a vocabulary semantic association feature vector; a similarity function is constructed from the vocabulary semantic association feature vectors of the two texts, and the similarity of their sentences is calculated. The present invention also uses a software program with a similar algorithm, but that method belongs to the level of basic technical support for the present invention; it is only a calculation tool and cannot provide inventive inspiration for the present invention.

The invention patent with publication number CN109684482A discloses a cluster analysis method for national culture resources based on a deep neural network model, belonging to the technical field of national culture resource mining. The method first obtains national culture resource text data with distributed web crawler technology, then analyzes and preprocesses the text with natural language processing techniques, extracts and vectorizes feature words of the text based on doc2vec, clusters the vectorized text with the K-means clustering algorithm, determines the optimal number of clusters with the elbow method, and finally obtains the association relationships among national culture resource texts, providing technical support for storage and intelligent services.

The invention patent with publication number CN110413627A discloses an information optimization method, an information optimization device, a server and a storage medium. The method comprises: acquiring request information sent by a client, and determining the response information corresponding to the request information from a preset information database; sending the response information to the client, and acquiring from the client the feedback information entered in response; and determining the satisfaction degree of the response information from the feedback information and, when the satisfaction degree is below a preset satisfaction threshold, optimizing the response information so that the satisfaction degree determined for the optimized response information is greater than or equal to the threshold. The method can improve the efficiency of updating information data in the server.

The invention patent with patent number CN201510260549.9 discloses a Smith predictor parameter estimation method based on a fuzzy algorithm, which can estimate Smith predictor parameters in real time and help the Smith predictor improve the PID control performance of a time-delay system accurately and quickly. The method estimates the proportionality coefficient Kp(t) and the hysteresis coefficient tau(t) of the Smith predictor in real time from the output and input of the time-delay system, and estimates the inertia coefficient T(t) in real time with a fuzzy algorithm. Kp(t) and tau(t) can be estimated effectively in real time with greatly improved accuracy; establishing a fuzzy algorithm model for the inertial link coefficient T(t) improves the accuracy and adaptability of its real-time estimation and removes the influence of external interference signals on that estimation. The method improves the control quality and accuracy of a time-delay system based on PID control and ensures the real-time performance and stability of PID control. However, its specific application range is relatively narrow, and applying it elsewhere requires study of the specific application scenarios, innovative research into the internal logic, and discovery of regularities.

In view of the above, the existing technologies are incomplete. It is therefore necessary to develop an intelligent mapping method for a natural language processing vocabulary library based on a fuzzy algorithm that establishes multiple hierarchy levels, is suitable for a variety of complex application scenarios, is convenient for users, and improves accuracy and processing efficiency.

Disclosure of Invention

The invention aims to overcome the problems in the prior art and provides an intelligent mapping method for a natural language processing vocabulary library based on a fuzzy algorithm. The method has strong applicability and a wide application range, and does not require conventional dedicated surveys that are time-consuming, laborious and costly. Later users benefit from the work of earlier users: the library fits later users more closely, and they do not need to collect, map, eliminate and bind everything from scratch. Through repeated mapping, binding, unbinding and optimization, the established vocabulary library has strong relevance and high accuracy. The dialects of all regions are supported, and mutual translation between dialects can be realized automatically through region-level and dialect settings.

In order to achieve the technical purpose and achieve the technical effect, the invention is realized by the following technical scheme:

An intelligent mapping method for a natural language processing vocabulary library based on a fuzzy algorithm comprises a network storage database, an intelligent mapping vocabulary library system, an upstream and downstream system docking intelligent recognition automatic mapping system, a deep learning automatic optimization upgrading system and a user-specific natural language database. The intelligent mapping vocabulary library system comprises a keyword setting and capturing program, a word semantic comparison program, a word semantic mapping program and a user-specific vocabulary library generation system. The deep learning automatic optimization upgrading system operates through the following specific steps:

a. The user installs the voice acquisition software and the voice acquisition software and hardware system in a target working area, connects the software and hardware, and debugs them together;

b. The user starts the system, logs in to the management account, and sets the user requirements and the keywords that are specific to and commonly used by the user;

c. Audio-visual data of the user's target area are collected through the voice acquisition software and the voice acquisition software and hardware system;

d. The keyword setting and capturing program is started in the background to capture and screen the audio-visual data fragments containing the keywords;

e. The audio-visual data fragments are converted into text sentences, their semantics are analyzed by the word semantic comparison program, the fragments meeting the user's requirements are screened out, and fragments that obviously do not belong to the application environment set by the user are excluded;

f. The word semantic mapping program is started, and a preliminary mapping correspondence is established among the natural language, the dictionary semantics, the screened audio-visual data fragments, the user's dialect and the keywords associated with the user's spoken language;

g. The user-specific vocabulary library generation system calls the preliminary mapping correspondences at high frequency to generate everyday expressions that fit the environment set by the user; when interaction takes place, these expressions are automatically added to the interaction link; according to feedback from the user and/or the user's customers, preliminary mapping correspondences that obviously do not match reality after interaction are eliminated automatically or manually by the user; the natural language processing mapping correspondences are thus generated and stored in the user-specific vocabulary library generation system, completing the construction of the specific vocabulary library.
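Steps d to g can be illustrated with a minimal Python sketch. It assumes the fragments have already been transcribed to text, and it stands in for the word semantic comparison program with a simple check against a list of domain terms. The names `Fragment`, `capture`, `screen`, `preliminary_mapping` and `finalize`, the `min_count` threshold and the sample phrases are hypothetical and do not come from the specification.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str  # transcript of one captured audio-visual fragment (assumed input)


def capture(fragments, keywords):
    """Step d: keep fragments whose transcript contains a configured keyword."""
    return [f for f in fragments if any(k in f.text for k in keywords)]


def screen(fragments, domain_terms):
    """Step e (simplified): drop fragments with no term from the user's domain."""
    return [f for f in fragments if any(t in f.text for t in domain_terms)]


def preliminary_mapping(fragments, keywords):
    """Step f: count the expressions that occur together with each keyword."""
    mapping = {k: Counter() for k in keywords}
    for f in fragments:
        for k in keywords:
            if k in f.text:
                mapping[k][f.text] += 1
    return mapping


def finalize(mapping, rejected, min_count=2):
    """Step g: keep frequent expressions, minus those rejected by feedback."""
    return {
        k: sorted(e for e, n in counts.items()
                  if n >= min_count and e not in rejected)
        for k, counts in mapping.items()
    }


fragments = [Fragment("eat noodles"), Fragment("eat noodles"),
             Fragment("eat melon crowd"), Fragment("eat melon crowd"),
             Fragment("red paint"), Fragment("eat quickly")]
screened = screen(capture(fragments, ["eat"]), ["noodles", "melon"])
library = finalize(preliminary_mapping(screened, ["eat"]),
                   rejected={"eat melon crowd"})
print(library)
```

In this sketch the feedback loop is reduced to a `rejected` set; a real system would presumably collect that set from the interaction link described in step g.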

Preferably, the host is a computer host which is connected or not connected with the network, or a virtual host based on the internet, and is connected with the display screen device and the operation input device.

Preferably, the display screen device is a computer display screen, a television with a network connection function, a mobile phone or a PDA, and the operation input device is a computer keyboard, a control button, a touch screen, a mouse or a microphone.

Preferably, the interaction link comprises display screen and touch screen question answering, voice calls, pushing of information that fits the environment scene, game interaction with or without rewards, and customer experience consultation.

Preferably, the system is applied to fields with complex customer sources, in which large differences exist in dialect, common expressions and speech habits.

Preferably, the fields with complex customer sources include food, chain catering, new retail and OCR applications.

Preferably, the upstream and downstream system docking intelligent recognition automatic mapping system is configured with hierarchy levels and regions; when system interfaces are docked between upper and lower levels or between regions, the user-specific vocabulary library fields of the counterpart system are intelligently recognized and automatically mapped, and are translated to fit the expression habits of the local population.

Preferably, an OCR application module is further provided; when OCR recognizes a customer order, the corresponding order fields are intelligently recognized and automatically mapped.

Preferably, the system also comprises a voice data acquisition software and hardware system, wherein the voice data acquisition software and hardware system comprises voice acquisition software and/or a voice acquisition software and hardware system, and the voice acquisition software is installed in the host.

Preferably, the voice acquisition software and hardware system is a microphone, an audio-visual data transmission receiver or a high-speed camera; the microphone may be a standalone microphone, a sound sensor or a digital electronic product with a built-in microphone.

The invention has the beneficial effects that:

1. The invention has an original concept, a reasonable and simple logical structure and a simple data acquisition method; interaction follows naturally after preliminary data processing, and no conventional dedicated survey that is time-consuming, laborious and costly is needed;

2. Through repeated mapping, binding, unbinding and optimization, the established vocabulary library has strong relevance and high accuracy;

3. The applicability is strong and the application range is wide; later users benefit from the work of earlier users, the library fits later users more closely, and they do not need to collect, map, eliminate and bind everything from scratch;

4. The dialects of all regions are supported, and mutual translation between dialects can be realized automatically through region-level and dialect settings, which fits later users more closely without collecting, mapping, eliminating and binding everything from scratch.

The foregoing is a summary of the technical solution of the present invention. So that the technical means of the invention can be understood clearly and implemented in accordance with this specification, preferred embodiments of the invention are described in detail below with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a schematic diagram of the technical components and operation of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to FIG. 1, an intelligent mapping method for a natural language processing vocabulary library based on a fuzzy algorithm comprises a voice data acquisition software and hardware system, a network storage database, an intelligent mapping vocabulary library system and a user-specific natural language database. Voice data are collected through the voice acquisition software or the voice acquisition software and hardware system, and the voice acquisition software is installed in an independent computer host or an internet virtual host. The intelligent mapping vocabulary library system consists of technical modules such as the keyword setting and capturing program, the word semantic comparison program, the word semantic mapping program and the user-specific vocabulary library generation system. The method operates according to the following procedure:

A user installs the voice acquisition software and the voice acquisition software and hardware system in a target working area. For example, in application scenarios such as food, chain catering, new retail and OCR, a microphone, an audio-visual data transmission receiver, or a high-speed camera with a microphone is installed, or digital electronic products such as mobile phones and PDAs are used to scan, register and log in to the application scenario system; the software and hardware are then connected and debugged together. After a successful test, the user starts the system, logs in to the management account, and sets the user requirements and the keywords that are specific to and commonly used by the user. Audio-visual data of the user's target area are then collected through the voice acquisition software and the voice acquisition software and hardware system. Next, the keyword setting and capturing program is started to capture and screen the audio-visual data fragments containing the keywords. For example, with the capture keyword set to "eating", the captured fragments are converted into text sentences, and the word semantic comparison program analyzes their semantics to screen out the fragments that meet the user's requirements and to exclude fragments that obviously do not belong to the application environment set by the user: sentences and phrases obviously unrelated to food, new retail, chain catering and OCR are excluded, fragments containing only the meanings of "eating, eating and eating" are excluded, and homophones and near-homophones of "CHI", such as "red, red", are excluded. However, if a sound such as "red melon island" occurs that could be a related expression such as "eat melon", "eat melon to XX" or "eat melon crowd", the fragment is temporarily treated as ambiguous and set as related. Finally, the word semantic mapping program is started, and a preliminary mapping correspondence is established among the natural language, the dictionary semantics, the screened audio-visual data fragments and the keywords associated with the user's spoken language.
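The homophone handling in this example can be sketched as follows. The sketch assumes that a speech recognizer supplies each fragment as a list of (word, sound) pairs with romanized sounds; the constants `KEYWORD_SOUND`, `RELATED_WORDS` and `RELATED_PHRASE_SOUNDS`, the function `classify` and the sample syllables are hypothetical illustrations, not part of the specification.

```python
KEYWORD_SOUND = "chi"                      # sound of the capture keyword
RELATED_WORDS = {"eat"}                    # recognized words that carry the keyword meaning
RELATED_PHRASE_SOUNDS = [("chi", "gua")]   # sound pattern of a related phrase ("eat melon")


def classify(tokens):
    """Classify one fragment given as a list of (word, sound) pairs."""
    sounds = tuple(sound for _, sound in tokens)
    hits = {word for word, sound in tokens if sound == KEYWORD_SOUND}
    if not hits:
        return "excluded"                  # keyword sound never occurs
    if hits & RELATED_WORDS:
        return "related"                   # recognized word matches the keyword meaning
    # Only a homophone was recognized: keep the fragment if its sound sequence
    # could still be a related phrase, otherwise exclude it.
    for phrase in RELATED_PHRASE_SOUNDS:
        n = len(phrase)
        if any(sounds[i:i + n] == phrase for i in range(len(sounds) - n + 1)):
            return "related (ambiguous)"
    return "excluded"


print(classify([("eat", "chi"), ("rice", "fan")]))
print(classify([("red", "chi"), ("flag", "qi")]))
print(classify([("red", "chi"), ("melon", "gua"), ("island", "dao")]))
```

Fragments marked "related (ambiguous)" stay in the candidate set, which matches the passage's choice to treat such sounds as related for the time being rather than discard them.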

With reference to the fuzzy-algorithm-based parameter estimation method disclosed in the invention patent with patent number CN201510260549.9, and in order to check the accuracy of the fuzzy mapping parameter estimation, a PID-controlled natural language control system based on the fuzzy algorithm is used to test the mapping relations of a controlled object in practice. The program flow control target of the natural language control system based on fuzzy mapping PID control is set to an accuracy above 50%, and the natural language data acquisition period is 3.2 s. To show the role of fuzzy mapping in natural language control: the initial accuracy of the mapping relations in the natural language control system based on conventional PID control is about 10% or more, whereas the initial accuracy in the PID-controlled natural language control system based on fuzzy mapping rises to more than 30%. Once the mapping relations have been applied stably in several scenarios without obvious errors or omissions, the mapping correction operation is started; at the same time the upper computer starts receiving data and records in real time the mapping relation data output by the conventional PID control system and by the optimized control system based on fuzzy mapping PID control.

A dynamic comparison and a steady-state comparison are carried out between the natural language control system based on conventional PID control and the one based on fuzzy mapping control. Because of the delay characteristic of the controlled object, the time lag of the system based on conventional PID control is too pronounced, so the peak accuracy output by the system is above 50% or below 10%; the actual mapping relations must be trained repeatedly in a targeted manner 3 to 5 times before they tend to stabilize, yet small oscillations always remain, and the steady state is truly reached only when the deviation amplitude is less than 5%. The natural language control system based on fuzzy mapping formally establishes or eliminates each mapping relation, the actual mapping relations are stable when the deviation amplitude is less than 5%, and the number of steady-state experiments is reduced by 5 to 10 compared with conventional PID control. The natural language control system based on fuzzy mapping control can therefore effectively improve PID control quality, and the method can accurately estimate the proportional, lag and inertia parameters of the controlled object. Moreover, the fuzzy mapping can correct the internal control parameters in real time according to the estimated parameters and calculate the PID control correction quantity. The method greatly improves the control quality of PID control and markedly improves its accuracy and stability, and the parameters it estimates are more accurate and more generally applicable.
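The steady-state criterion used in this comparison (a deviation amplitude below 5%) can be illustrated with a short check. The sketch assumes that the deviation is the absolute difference between the measured accuracy of a training round and a target accuracy; the function name `rounds_to_steady_state`, the target value and the sample series are hypothetical and are not measurements from the specification.

```python
def rounds_to_steady_state(accuracies, target, tolerance=0.05):
    """Return the index of the first round from which every later value stays
    within `tolerance` of `target`, or None if that never happens."""
    for start in range(len(accuracies)):
        if all(abs(a - target) < tolerance for a in accuracies[start:]):
            return start
    return None


# Hypothetical accuracy series for two controllers converging on a 0.50 target.
conventional = [0.10, 0.62, 0.44, 0.53, 0.51]
fuzzy_mapping = [0.30, 0.48, 0.52, 0.51, 0.50]
print(rounds_to_steady_state(conventional, target=0.50))
print(rounds_to_steady_state(fuzzy_mapping, target=0.50))
```

Comparing the two returned indices gives a simple way to express "fewer rounds to reach the steady state" for one controller relative to the other.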

The user-specific vocabulary library generation system calls the preliminary mapping correspondences at high frequency to generate everyday expressions that fit the environment set by the user. When interaction takes place, these expressions are automatically added to the interaction link. According to feedback from the user and/or the user's customers, preliminary mapping correspondences that obviously do not match reality after interaction are eliminated automatically or manually by the user. The natural language processing mapping correspondences are thus generated and stored in the user-specific vocabulary library generation system, completing the construction of the specific vocabulary library.

The host may be a computer host that is or is not connected to a network, or an internet-based virtual host. It is connected to a display screen device, such as a computer display screen, an all-in-one machine, a television with a network connection function, a mobile phone or a PDA, and to an operation input device; input is performed through a computer keyboard, control buttons, a touch screen, a mouse or a microphone.

DETAILED DESCRIPTION OF EMBODIMENT(S) OF INVENTION

Example 1: Because regions and dialects differ, the user can select the existing natural language processing vocabulary library that suits them best. If no existing library is suitable, or the existing library is empty, no library is selected and the user starts to build or refine the natural language processing vocabulary library best suited to the current environment and application. Voice data acquisition hardware such as a camera, a headset, a display screen and a PDA is installed at positions suitable for collecting voice and images, and a network test is carried out; the system is put into formal use after passing the test. Information that fits the environment scene is pushed through the display screen, touch screen, question answering, voice call phrases, game interaction with or without rewards, customer experience consultation and the like, and the natural language specialized vocabulary library best suited to the current application scenario is built and refined. Because customer sources and dialect habits differ, the specialized vocabulary libraries of users located close to one another can still be very different, for example those of food, chain catering, new retail and OCR application users near a transportation hub. To expand and enrich the specialized vocabulary libraries, the invention divides the upstream and downstream hierarchy according to territorial division and dialect affiliation, shares the specialized vocabulary libraries of the users within a certain range, and maps, collects and integrates them. When upstream and downstream system interfaces are docked, label matching can be performed based on a recurrent neural network model combined with NLTK and WordNet NLP algorithms; on the basis of traditional OCR recognition, the industry vocabulary library is used for intelligent error correction to improve the matching rate, and the fields of the counterpart system are intelligently recognized and automatically mapped. The natural language processing vocabulary library of the corresponding dialect is downloaded and upgraded according to how often that dialect occurs in the user's own natural language processing vocabulary library, so that targeted service can be provided to different customers.
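The error-correction step, in which OCR output is checked against an industry vocabulary library, can be sketched as below. The specification names a recurrent neural network model combined with NLTK and WordNet; this sketch swaps in fuzzy string matching from Python's standard-library `difflib` purely for illustration. The function `correct_field`, the `cutoff` value and the sample vocabulary are hypothetical.

```python
from difflib import get_close_matches


def correct_field(raw, vocabulary, cutoff=0.8):
    """Return the closest vocabulary entry for an OCR-read field,
    or the raw text unchanged when nothing is similar enough."""
    matches = get_close_matches(raw, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else raw


industry_vocabulary = ["beef noodles", "pork dumplings"]  # hypothetical entries
print(correct_field("beef n0odles", industry_vocabulary))  # OCR read "0" for "o"
print(correct_field("xyz", industry_vocabulary))           # no close entry, left as-is
```

Leaving unmatched fields unchanged keeps the correction conservative: a field is only rewritten when a vocabulary entry is clearly close to what the OCR tool read.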

Example 2: In cross-regional online new retail, a customer order is recognized with an OCR tool, the customer's region and spoken dialect characteristics, or the dialect expression characteristics of the region, are identified and judged, and the order fields are intelligently recognized and automatically mapped using words and phrases that conform to local habits and user habits. For dialects and languages such as Cantonese, Northeastern dialect and Tibetan, the expressions and word order of sentences differ greatly, and mutual translation that matches the user's habits makes the customer feel better cared for, so the method can be widely applied.
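The region-aware mapping of order fields can be sketched as a lookup that walks up a region hierarchy until a local expression is found. The sketch assumes that each level of the hierarchy holds its own vocabulary table; the names `PARENT`, `VOCAB` and `localize`, the region identifiers and the sample terms are hypothetical placeholders.

```python
# Hypothetical region hierarchy: each region points to its upstream level.
PARENT = {"city_a": "province_a", "province_a": "national", "city_b": "national"}

# Hypothetical vocabulary tables: standard term -> local expression per level.
VOCAB = {
    "national": {"potato": "potato"},
    "province_a": {"potato": "spud"},
}


def localize(term, region):
    """Map a standard order-field term to the expression used in `region`,
    falling back to upstream levels and finally to the term itself."""
    current = region
    while current is not None:
        local = VOCAB.get(current, {}).get(term)
        if local is not None:
            return local
        current = PARENT.get(current)
    return term


print(localize("potato", "city_a"))  # resolved at the province level
print(localize("potato", "city_b"))  # resolved at the national level
print(localize("carrot", "city_a"))  # unknown term is returned unchanged
```

The fallback order reflects the idea that a later user inherits upstream vocabulary and only needs local entries where local usage actually differs.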

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. An efficient intelligent semantic matching method comprises a network storage database, and is characterized in that: the system also comprises an intelligent mapping vocabulary library system, an upstream and downstream system butt joint intelligent recognition automatic mapping system, a deep learning automatic optimization upgrading system and a user-specific natural language database; the intelligent mapping vocabulary library system comprises a key word setting and capturing program, a word semantic comparison program, a word semantic mapping program and a user-specific vocabulary library generating system; the deep learning automatic optimization upgrading system comprises the following specific operation steps:

the user installs the voice acquisition software and the voice acquisition software and hardware system in the target working area, connects the software and hardware, and debugs them together;

the user starts the system, logs in to a management account, and sets the user requirements and the user's specific and commonly used keywords;

audio-visual data of the user's target area is collected through the voice acquisition software and the voice acquisition software and hardware system;

the keyword setting and capturing program is started in the background to capture and screen the audio-visual data fragments containing the keywords;

the audio-visual data fragments are converted into text sentences, their semantics are analyzed by the word semantic comparison program, the fragments meeting the user's requirements are retained, and the fragments that clearly do not belong to the application environment set by the user are excluded;

the word semantic mapping program is started, and a preliminary mapping correspondence is established among natural language, dictionary semantics, the screened audio-visual data fragments, the user's dialect, and the keywords associated with the user's spoken language;

2. The efficient intelligent semantic matching method according to claim 1, characterized in that: the host is a computer host, with or without a network connection, or an Internet-based virtual host, and is connected to a display screen device and an operation input device.

3. The efficient intelligent semantic matching method according to claim 2, characterized in that: the display screen device is a computer display screen, a television with a network connection function, a mobile phone or a PDA, and the operation input device is a computer keyboard, a control button, a touch screen, a mouse or a microphone.

4. The efficient intelligent semantic matching method according to step g of claim 1, characterized in that: the interaction link comprises display-screen or touch-screen question and answer, voice calls, information push suited to the environment scene, game interaction with or without rewards, and customer experience consultation.

5. The efficient intelligent semantic matching method according to claim 1, characterized in that: the method is applied in fields with complex customer sources, where dialects, common expressions, and speech expression habits differ greatly.

6. The efficient intelligent semantic matching method according to claim 5, characterized in that: the fields with complex customer sources include food, chain catering, new retail, and OCR applications.

7. The efficient intelligent semantic matching method according to claim 1, characterized in that: the intelligent recognition and automatic mapping system for docking upstream and downstream systems is organized by level and by region; when system interfaces are docked between upper and lower levels or between regions, the user-specific vocabulary library fields of the other party's system are intelligently identified and automatically mapped, so that translation conforming to the expression habits of the local population is performed.

8. The efficient intelligent semantic matching method according to claim 7, characterized in that: an OCR application module is further provided; when a customer order is recognized by OCR, the corresponding order fields are intelligently identified and automatically mapped.

9. The efficient intelligent semantic matching method according to claim 1, characterized in that: it further comprises a host and a voice data acquisition software and hardware system, wherein the voice data acquisition software and hardware system comprises voice acquisition software and/or a voice acquisition software and hardware system, and the voice acquisition software is installed in the host.

10. The efficient intelligent semantic matching method according to claim 9, characterized in that: the voice acquisition software and hardware system is a sound pickup, an audio-video data transmission receiver, or a high-speed camera; the sound pickup comprises a microphone, a sound sensor, or a digital electronic product with a built-in microphone.