CN117056488A - Data complement method, device, equipment and storage medium based on artificial intelligence - Google Patents

Data complement method, device, equipment and storage medium based on artificial intelligence

Info

Publication number
CN117056488A
Authority
CN
China
Prior art keywords
word
data
standard
words
upstream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311071484.4A
Other languages
Chinese (zh)
Inventor
杨辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd filed Critical Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202311071484.4A priority Critical patent/CN117056488A/en
Publication of CN117056488A publication Critical patent/CN117056488A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the application belong to the field of financial technology and relate to an artificial-intelligence-based data complement method, device, computer equipment and storage medium. The method comprises the following steps: segmenting sample data to obtain standard words; arranging the association relations among the standard words to obtain word upstream relations; classifying the standard words according to the word upstream relations to obtain related words and independent words; inputting the related words into a preset algorithm model for training to obtain a word matching model; segmenting nonstandard data to obtain keywords, and detecting whether each keyword is an independent word or a related word; if the keyword is an independent word, performing data complement on the nonstandard data; and if the keyword is a related word, inputting the keyword into the word matching model to obtain matching words, and sending the matching words to a selection component for the corresponding completion. The application can effectively complete nonstandard data entered into a system.

Description

Data complement method, device, equipment and storage medium based on artificial intelligence
Technical Field
The application relates to the technical field of financial technology, in particular to data processing, and more particularly to an artificial-intelligence-based data complement method, device, computer equipment and storage medium.
Background
In the era of big data, processing data effectively is a problem that must be solved. Because text data differs from machine language, it has to be entered into a processing system before it can be processed effectively.
When text data is entered, data that is incomplete or inaccurate is called nonstandard data. A system has difficulty recognizing nonstandard data, so the entered text is invalid and yet still occupies storage space.
Some text data consists of words or keywords that stand in superior-subordinate relations to one another. An address text is an example: the province, city, county and so on in the address form such a hierarchy. If only a subordinate word, or a short text made up of subordinate words, is entered into a text data entry system, the system cannot effectively complete the missing data from that input; alternatively, the system accepts only complete and correct text data or superior words. Both cases make text data entry inconvenient.
Disclosure of Invention
The embodiments of the application aim to provide an artificial-intelligence-based data complement method, device, computer equipment and storage medium, so as to solve the problem that entered lower-level data cannot be effectively completed.
In order to solve the technical problems, the embodiment of the application provides a data complement method based on artificial intelligence, which adopts the following technical scheme:
acquiring sample data, and performing word segmentation on the sample data according to a preset word segmentation rule to obtain standard words;
performing association relation arrangement on the standard words to obtain word upstream relations;
acquiring an upstream word corresponding to the standard word according to the word upstream relation, and classifying the standard word according to the upstream word to obtain a related word and an independent word;
inputting the related words into a preset algorithm model for training to obtain a word matching model;
obtaining nonstandard data, segmenting the nonstandard data to obtain a keyword, and detecting whether the keyword is the independent word or the related word;
if the keyword is the independent word, acquiring an independent upstream word corresponding to the keyword, performing data complementation on the nonstandard data according to the independent upstream word to obtain standard data, and sending the standard data to a preset display component for display; and
if the keyword is the related word, inputting the keyword into the word matching model to obtain a matching word, and sending the matching word to a preset selection component for display.
Further, the word segmentation rule includes a word recognition rule, a word ordering rule, and a word marking rule, and the step of obtaining sample data, and segmenting the sample data according to a preset word segmentation rule to obtain a standard word specifically includes:
acquiring historical data, and preprocessing the historical data to obtain the sample data;
identifying the sample data according to the word identification rule to obtain an initial word;
acquiring a first position sequence corresponding to the initial word in the sample data according to the word ordering rule; and
marking the sample data according to the word marking rule and the first position sequence to obtain the standard word.
Further, the step of performing association relation arrangement on the standard words to obtain word upstream relations specifically includes:
acquiring a standard text corresponding to the standard word, and acquiring first position information of the standard word in the standard text;
acquiring an upstream word corresponding to the standard word according to the first position information; and
carrying out association marking on the upstream word and the standard word to obtain the word upstream relation.
Further, the step of classifying the standard word according to the upstream word to obtain a related word and an independent word specifically includes:
detecting whether the upstream word is a unique value;
if the upstream word is a unique value, classifying the standard word as the independent word; and
if the upstream word is not a unique value, classifying the standard word as the related word.
Further, the step of obtaining nonstandard data and segmenting the nonstandard data to obtain a keyword specifically includes:
acquiring the nonstandard data, and performing word segmentation on the nonstandard data according to the word segmentation rule to obtain a nonstandard word;
detecting whether the nonstandard word is correctly entered;
if the nonstandard word is correctly entered, using the nonstandard word as the keyword; and
if the nonstandard word is incorrectly entered, analyzing the nonstandard word to obtain word information, acquiring the standard word with the highest similarity to the nonstandard word according to the word information, and using that standard word as the keyword.
Further, the step of performing data complement on the nonstandard data according to the independent upstream word to obtain standard data specifically includes:
obtaining second position information of the independent upstream word and the keyword, and marking the independent upstream word and the keyword according to the second position information to obtain a second position sequence;
and sorting the independent upstream words and the keyword according to the second position order to obtain the standard data.
Further, the step of sending the matching word to a preset selection component for display specifically includes:
calculating the matching degree of the matching words and the keyword;
sorting the matching words according to the matching degree to obtain a matched text sequence list; and
sending the matched text sequence list to the selection component for display.
In order to solve the technical problems, the embodiment of the application also provides a data complement device based on artificial intelligence, which adopts the following technical scheme:
the first data word segmentation module is used for obtaining sample data, and segmenting the sample data according to a preset word segmentation rule to obtain standard words;
The relation arrangement module is used for carrying out association relation arrangement on the standard words to obtain word upstream relations;
the word classification module is used for acquiring upstream words corresponding to the standard words according to the word upstream relation, classifying the standard words according to the upstream words, and obtaining related words and independent words;
the model training module is used for inputting the related words into a preset algorithm model for training to obtain a word matching model;
the second data word segmentation module is used for acquiring nonstandard data, segmenting the nonstandard data to obtain a keyword, and detecting whether the keyword is the independent word or the related word;
the first data processing module is used for acquiring an independent upstream word corresponding to the keyword if the keyword is the independent word, carrying out data complementation on the nonstandard data according to the independent upstream word to obtain standard data, and sending the standard data to a preset display component for display; and
the second data processing module is used for inputting the keyword into the word matching model to obtain a matching word if the keyword is the related word, and sending the matching word to a preset selection component for display.
In order to solve the above technical problems, the embodiment of the present application further provides a computer device, which adopts the following technical schemes:
a computer device comprising a memory having stored therein computer readable instructions which when executed by a processor implement the steps of the artificial intelligence based data complement method of any one of the preceding claims.
In order to solve the above technical problems, an embodiment of the present application further provides a computer readable storage medium, which adopts the following technical schemes:
a computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the artificial intelligence based data complement method of any one of the above.
Compared with the prior art, the embodiments of the application have the following main beneficial effects: sample data is acquired and segmented according to a preset word segmentation rule, so that accurate standard words are obtained; the association relations among the standard words are arranged, so that the word upstream relation describing each standard word's superiors is obtained; the upstream words corresponding to each standard word are acquired according to the word upstream relation, and the standard words are classified accordingly into related words, which have several upstream word results, and independent words, which have only one; the related words are input into a preset algorithm model for training, so that a trained word matching model is obtained; nonstandard data is acquired and segmented to obtain a keyword, and whether the keyword is an independent word or a related word is detected so that the keyword can be processed according to the result; if the keyword is an independent word, the corresponding independent upstream word is acquired, data complementation is performed on the nonstandard data according to the independent upstream word to obtain standard data, and the standard data is sent to a preset display component for display; if the keyword is a related word, the keyword is input into the word matching model to obtain a matching word, and the matching word is sent to a preset selection component for display. In this way, nonstandard data can be effectively completed.
The embodiment can be applied to a vehicle insurance information system: when hierarchical data such as an address is entered, the system can complete the entered address automatically, which improves the efficiency of entering vehicle insurance data.
Drawings
In order to more clearly illustrate the solution of the present application, a brief description will be given below of the drawings required for the description of the embodiments of the present application, it being apparent that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without the exercise of inventive effort for a person of ordinary skill in the art.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of an artificial intelligence based data completion method according to the present application;
FIG. 3 is a flow chart of one embodiment of step S10 of FIG. 2;
FIG. 4 is a flow chart of one embodiment of step S20 of FIG. 2;
FIG. 5 is a flow chart of one embodiment of step S30 of FIG. 2;
FIG. 6 is a flow chart of one embodiment of step S50 of FIG. 2;
FIG. 7 is a flow chart of one embodiment of step S60 of FIG. 2;
FIG. 8 is a flow chart of one embodiment of step S70 of FIG. 2;
FIG. 9 is a schematic diagram of one embodiment of an artificial intelligence based data completion device in accordance with the present application;
FIG. 10 is a schematic structural view of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are non-related or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the data complement method based on artificial intelligence provided by the embodiment of the application is generally executed by a server, and correspondingly, the data complement device based on artificial intelligence is generally arranged in the server.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow chart of one embodiment of a method of artificial intelligence based data completion in accordance with the present application is shown. The artificial intelligence-based data complement method comprises the following steps:
Step S10, obtaining sample data, and performing word segmentation on the sample data according to a preset word segmentation rule to obtain standard words;
In this embodiment, the sample data is text data randomly extracted from historical entry data. The text data includes a plurality of sample texts, all of which contain correctly entered standard data, and the standard words are the words in that standard data. A word here covers both key characters and keywords: a key character is typically a single character, while a keyword may consist of two or three characters. Because a standard word may end in a noun or proper term denoting an attribute, such as city, county or district, segmentation automatically merges that attribute word with the name preceding it, giving, for example, Shenzhen City, Guangzhou City and Futian District.
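The merging of an attribute word with the name that precedes it can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function name `segment_address`, the `LEVEL_SUFFIXES` set, and the use of whitespace-separated English tokens in place of Chinese characters are all assumptions made for the example.

```python
# Hypothetical attribute words that close a hierarchical unit (assumption).
LEVEL_SUFFIXES = {"Province", "City", "District", "County", "Street", "Community"}


def segment_address(text: str) -> list[str]:
    """Split an address into standard words, merging each attribute word
    (e.g. "City") with the name tokens that precede it."""
    words: list[str] = []
    pending: list[str] = []
    for token in text.split():
        pending.append(token)
        if token in LEVEL_SUFFIXES:
            # The attribute word ends the current unit, e.g. "Shenzhen City".
            words.append(" ".join(pending))
            pending = []
    if pending:
        # Trailing tokens without an attribute word, e.g. a house number.
        words.append(" ".join(pending))
    return words


print(segment_address("Guangdong Province Shenzhen City Futian District"))
```

A production segmenter for Chinese text would work on characters and a gazetteer rather than on spaces; the sketch only shows the merge rule.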
Step S20, carrying out association relation arrangement on the standard words to obtain word upstream relations;
In this embodiment, the word upstream relation is a relation chain that records upstream words, with a plurality of upstream words connected along the chain. For example, if the standard word is No. 29 Tairan 4th Road, the word upstream relation is Guangdong Province - Shenzhen City - Futian District - Shatou Street - Tian'an Community. The chain form allows the upstream words of a standard word to be obtained quickly, which improves the efficiency of data entry in the system.
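One way to picture the relation chain is as a mapping from each standard word to the tuples of upstream words observed for it. The sketch below assumes that each sample has already been segmented into a list ordered from the highest level to the lowest; the function name `build_upstream_relations` and the data layout are hypothetical.

```python
from collections import defaultdict


def build_upstream_relations(samples):
    """Map each standard word to the set of upstream chains seen for it.

    `samples` is an iterable of word lists ordered from highest to lowest
    level (an assumption of this sketch)."""
    relations = defaultdict(set)
    for words in samples:
        for position, word in enumerate(words):
            # Everything before the word in the text is its upstream chain.
            relations[word].add(tuple(words[:position]))
    return relations


samples = [
    ["Guangdong Province", "Shenzhen City", "Futian District", "Shatou Street",
     "Tian'an Community", "No. 29 Tairan 4th Road"],
]
relations = build_upstream_relations(samples)
print(" - ".join(next(iter(relations["No. 29 Tairan 4th Road"]))))
```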
Step S30, obtaining upstream words corresponding to the standard words according to the word upstream relation, and classifying the standard words according to the upstream words to obtain related words and independent words;
In this embodiment, an upstream word is the upper-level data of a standard word. A related word is a standard word that corresponds to a plurality of upstream words; for example, the keyword No. 22 Jiefang Road has upstream words including Guangzhou, Shenzhen, Dagao and the like, so it is a related word. An independent word is a standard word that corresponds to only one upstream word result; for example, the keyword No. 3 Gulangyu has the upstream words Xiamen City, Siming District and Gulangyu Street. Classifying the standard words in this way separates independent words from related words, so that the two kinds can be processed differently and processing efficiency is improved.
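The classification rule reduces to counting distinct upstream results per word. The following sketch assumes a hypothetical `relations` mapping from each standard word to the set of upstream chains observed for it; the name `classify_words` and the sample chains are illustrative only.

```python
def classify_words(relations):
    """Split standard words into independent words (exactly one upstream
    chain) and related words (several upstream chains)."""
    independent, related = set(), set()
    for word, chains in relations.items():
        if len(chains) == 1:
            independent.add(word)
        else:
            related.add(word)
    return independent, related


# Illustrative data (assumption): the same road name occurs in two cities.
relations = {
    "No. 22 Jiefang Road": {("Guangzhou City",), ("Shenzhen City",)},
    "No. 3 Gulangyu": {("Xiamen City", "Siming District", "Gulangyu Street")},
}
independent, related = classify_words(relations)
print(independent, related)
```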
Step S40, inputting the related words into a preset algorithm model for training to obtain a word matching model;
In this embodiment, the preset algorithm model is a model that includes a data matching algorithm, such as a model based on the KMP (Knuth-Morris-Pratt) algorithm.
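The source does not say how the KMP-based model is trained or applied, so the following is only a sketch of the classic Knuth-Morris-Pratt string search, which could serve as the matching primitive when looking for a keyword inside a candidate text. The function names `build_failure_table` and `kmp_find` are assumptions.

```python
def build_failure_table(pattern: str) -> list[int]:
    """For each prefix of pattern, the length of its longest proper prefix
    that is also a suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    table = build_failure_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]  # fall back without re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


print(kmp_find("Shenzhen City Jiefang Road", "Jiefang"))
```

The failure table lets the search run in time linear in the combined length of text and pattern, which matters when one keyword is matched against many stored addresses.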
S50, obtaining nonstandard data, segmenting the nonstandard data to obtain key words, and detecting whether the key words are independent words or related words;
In this embodiment, the nonstandard data is obtained by recognizing the data entered by the user. After the user enters data in a text box of the system, the system evaluates the entered content. If the content is recognized as complete data that conforms to the format, the content is checked further and, once verified, entered into the system. If the content is recognized as data that does not conform to the format, or as data with part of its information missing, the system first corrects the format. For example, the input "Guangdong Shenzhen Futian District Shatou Street Tian'an Community No. 29 Tairan 4th Road" is corrected to "Guangdong Province Shenzhen City Futian District Shatou Street Tian'an Community No. 29 Tairan 4th Road": redundant or repeated parts of the input are removed and implicit hierarchy words are supplemented so that the format of the entered data is uniform. Data with part of its information missing is judged to be nonstandard data. When nonstandard data is acquired, if the keyword is followed by lower-level data, only the upper-level data can be completed.
Step S60, if the keyword is the independent word, acquiring an independent upstream word corresponding to the keyword, performing data complementation on the nonstandard data according to the independent upstream word to obtain standard data, and sending the standard data to a preset display component for display; and
In this embodiment, the standard data is the complete text after completion; for example, after No. 3 Gulangyu is completed, the standard data is Xiamen City, Siming District, Gulangyu Street, No. 3 Gulangyu. The display component is a functional component of the system that renders and displays data on a page with a preset routing address.
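Completion for an independent word amounts to prepending its single upstream chain. The sketch below assumes a hypothetical `relations` mapping from a keyword to the set of upstream chains recorded for it, and it orders the result with the upstream words first and the keyword last; the name `complete_independent` is illustrative.

```python
def complete_independent(keyword, relations):
    """Return the completed text for an independent keyword, or None when
    the keyword is unknown or has more than one upstream chain."""
    chains = relations.get(keyword, set())
    if len(chains) != 1:
        return None  # a related or unknown word needs the selection path
    (chain,) = chains
    # Upstream words come first, ordered by level, followed by the keyword.
    return " ".join([*chain, keyword])


relations = {
    "No. 3 Gulangyu": {("Xiamen City", "Siming District", "Gulangyu Street")},
    "No. 22 Jiefang Road": {("Guangzhou City",), ("Shenzhen City",)},
}
print(complete_independent("No. 3 Gulangyu", relations))
```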
And step S70, if the keyword is the related word, inputting the keyword into the word matching model to obtain a matching word, and sending the matching word to a preset selection component for display.
In this embodiment, the selection component is a functional component of the system that renders and displays data on a page with a preset routing address; when matching words are displayed, the user can select one of them to replace the content of the input box.
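Before the matching words reach the selection component they can be ordered by how well they match the keyword. The source does not define the matching degree, so this sketch substitutes the similarity ratio from Python's standard `difflib.SequenceMatcher` as a stand-in; the function name `rank_matches` and the candidate strings are assumptions.

```python
from difflib import SequenceMatcher


def rank_matches(keyword, candidates):
    """Return candidates sorted from highest to lowest matching degree.

    The matching degree here is difflib's similarity ratio, used only as a
    placeholder for whatever measure the system actually applies."""
    scored = [(SequenceMatcher(None, keyword, c).ratio(), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored]


ranked = rank_matches("Jiefang Road", ["Renmin Road", "Jiefang Road", "Jiefang Rd"])
print(ranked)
```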
In this embodiment, sample data is acquired and segmented according to a preset word segmentation rule, so that accurate standard words are obtained; the association relations among the standard words are arranged, so that the word upstream relation describing each standard word's superiors is obtained; the upstream words corresponding to each standard word are acquired according to the word upstream relation, and the standard words are classified accordingly into related words, which have several upstream word results, and independent words, which have only one; the related words are input into a preset algorithm model for training, so that a trained word matching model is obtained; nonstandard data is acquired and segmented to obtain a keyword, and whether the keyword is an independent word or a related word is detected so that the keyword can be processed according to the result; if the keyword is an independent word, the corresponding independent upstream word is acquired, data complementation is performed on the nonstandard data according to the independent upstream word to obtain standard data, and the standard data is sent to a preset display component for display; if the keyword is a related word, the keyword is input into the word matching model to obtain a matching word, and the matching word is sent to a preset selection component for display. In this way, nonstandard data can be effectively completed. The embodiment can be applied to a vehicle insurance information system: when hierarchical data such as an address is entered, the system can complete the entered address automatically, which improves the efficiency of entering vehicle insurance data.
With continued reference to fig. 3, in some alternative implementations of the present embodiment, step S10 includes the steps of:
step S101, acquiring historical data, and preprocessing the historical data to obtain sample data;
In this embodiment, the historical data is data the system has collected from past entries; it meets the system's format requirements and was entered correctly. When the historical data is acquired, it is checked for duplicates and entries with repeated content are removed, yielding valid sample data.
Step S102, recognizing the sample data according to the word recognition rule to obtain an initial word;
In this embodiment, the word recognition rule recognizes and extracts the word information in the sample data, and the word information is collected and stored at a predetermined location.
Step S103, acquiring a first position sequence corresponding to the initial word in the sample data according to the word ordering rule; and
Step S104, marking the sample data according to the word marking rule and the first position sequence to obtain the standard word.
In this embodiment, the word ordering rule obtains the first position sequence, within the sample data, of the word information stored at the predetermined location, and the word marking rule marks that sequence. For example, Guangdong Province has position 1, Shenzhen City position 2, Futian District position 3, Shatou Street position 4, Tian'an Community position 5, and Tairan 4th Road No. 29 position 6. Marking the positions with sequence numbers orders the initial words and yields the standard words.
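Steps S101 to S104 can be sketched as follows. This is a simplified illustration, not the rule set of the patent: it assumes each record is a comma-separated string, and the names `preprocess` and `segment` are hypothetical.

```python
from typing import List, Tuple

def preprocess(history: List[str]) -> List[str]:
    """Drop empty and repeated records while keeping first-seen order."""
    cleaned = (record.strip() for record in history)
    return list(dict.fromkeys(r for r in cleaned if r))

def segment(record: str, sep: str = ",") -> List[Tuple[int, str]]:
    """Split a record into words and mark each with its 1-based position."""
    words = [w.strip() for w in record.split(sep) if w.strip()]
    return list(enumerate(words, start=1))
```

For the record "Guangdong Province, Shenzhen City, Futian District", `segment` pairs Guangdong Province with position 1, Shenzhen City with position 2, and Futian District with position 3, mirroring the sequence numbers described above.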
With continued reference to fig. 4, in some alternative implementations of the present embodiment, step S20 includes the steps of:
Step S201, acquiring a standard text corresponding to the standard word, and acquiring first position information of the standard word in the standard text;
In this embodiment, the first position information is the position number of the standard word in the standard text, and it is consistent with the marked position sequence. From the first position information, the data level of the standard word can be determined, so that its upstream word can be obtained accurately.
Step S202, acquiring the upstream word corresponding to the standard word according to the first position information; and
Step S203, carrying out association marking on the upstream word and the standard word to obtain the word upstream relation.
In this embodiment, the association mark and the word upstream relation are obtained through steps S201 to S203 described above.
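One way to picture the word upstream relation is a mapping from each standard word to the set of upstream chains (all higher-level words that precede it) observed in the sample data. The sketch below assumes each record is already an ordered list of words; `build_upstream` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def build_upstream(records: List[List[str]]) -> Dict[str, Set[Tuple[str, ...]]]:
    """Map every word to the distinct chains of words that precede it."""
    upstream: Dict[str, Set[Tuple[str, ...]]] = defaultdict(set)
    for words in records:
        for position, word in enumerate(words):
            # Everything before `position` is upstream of this word.
            upstream[word].add(tuple(words[:position]))
    return dict(upstream)
```

A top-level word such as a province gets the empty chain `()`, and a word that occurs under several different parents accumulates several chains.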
With continued reference to fig. 5, in some alternative implementations of the present embodiment, step S30 includes the steps of:
Step S301, detecting whether the upstream word is a unique value;
In this embodiment, a unique value means that the upstream word has exactly one content result. Uniqueness is assessed per attribute noun, that is, per level such as province, city, or district (for example Guangdong Province, Shenzhen City, Futian District). The upstream word is a unique value if, for each such attribute, only one name corresponds to the standard word. For example, if the standard word is Gulangyu No. 3, its only upstream words are Fujian Province, Xiamen City, Siming District, and Gulangyu Street, so the upstream word is a unique value. If the standard word is Jiefang Road No. 22, its upstream words include several alternatives, for example both Shenzhen City and Guangzhou City under Guangdong Province, and more than one district (such as Zengcheng District) within Guangzhou City. Because more than one name appears for attributes such as city and district, the upstream word is not a unique value.
Step S302, if the upstream word is a unique value, classifying the standard word as an independent word; and
Step S303, if the upstream word is not a unique value, classifying the standard word as a related word.
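The classification in steps S301 to S303 reduces to counting distinct upstream chains per standard word. The following sketch assumes the upstream relation is held as a dictionary from word to a set of chains; the name `classify` and that data layout are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Chain = Tuple[str, ...]

def classify(
    upstream: Dict[str, Set[Chain]]
) -> Tuple[Dict[str, Chain], Set[str]]:
    """Split standard words into independent words and related words."""
    independent: Dict[str, Chain] = {}
    related: Set[str] = set()
    for word, chains in upstream.items():
        if len(chains) == 1:
            # Unique upstream value: remember the single chain for completion.
            independent[word] = next(iter(chains))
        else:
            related.add(word)
    return independent, related
```

Keeping the single chain alongside each independent word means later completion is a plain lookup, while related words are left for the matching model.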
With continued reference to fig. 6, in some alternative implementations of the present embodiment, step S50 includes the steps of:
Step S501, acquiring non-standard data, and segmenting the non-standard data according to the word segmentation rule to obtain a non-standard word;
Step S502, detecting whether the non-standard word was entered correctly;
Step S503, if the non-standard word was entered correctly, taking the non-standard word as the keyword; and
Step S504, if the non-standard word was entered incorrectly, analyzing the non-standard word to obtain word information, acquiring the standard word with the highest similarity to the non-standard word according to the word information, and taking that standard word as the keyword.
In this embodiment, the non-standard data may have been copied and pasted directly from content filled in by a user, so it may contain input errors. An obvious error, such as a miswritten Guangzhou City in a Guangdong address, can be corrected directly after input. When the intended word is hard to judge, the correct words are fetched from the database and the judgment is made by the similarity between the entered word and each correct word. For example, where the correct word is Mao Cun, the entered data may read Mao Maocun. The correct standard word therefore has to be obtained from the word information of the entered data to ensure that the entered data is accurate.
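The fallback in step S504 can be approximated with the Python standard library. The patent does not name a similarity measure, so `difflib.get_close_matches` is used here purely as a stand-in; the function name `to_keyword` and the 0.6 cutoff are assumptions.

```python
import difflib
from typing import List, Optional

def to_keyword(
    word: str, standard_words: List[str], cutoff: float = 0.6
) -> Optional[str]:
    """Return the word itself if it is standard, else its closest standard word."""
    if word in standard_words:
        return word  # entered correctly, use as the keyword
    # Pick the single most similar standard word above the cutoff, if any.
    matches = difflib.get_close_matches(word, standard_words, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Returning `None` when nothing is similar enough lets the caller decide whether to ask the user to re-enter the word rather than guessing.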
With continued reference to fig. 7, in some alternative implementations of the present embodiment, step S60 includes the steps of:
Step S601, acquiring second position information of the independent upstream word and the keyword, and marking the independent upstream word and the keyword according to the second position information to obtain a second position sequence;
In this embodiment, the second position information is the inferred position of the independent upstream word and the keyword in the input text. For example, if the keyword is Tairan 4th Road No. 29, the format of the system's input data indicates that community, street, district, city, and province information precedes it. The second position sequence marks the independent upstream words and the keyword with sequence numbers: for example, Guangdong Province has position 1, Shenzhen City position 2, Futian District position 3, Shatou Street position 4, Tian'an Community position 5, and Tairan 4th Road No. 29 position 6.
Step S602, sorting and arranging the independent upstream word and the keyword according to the second position sequence to obtain the standard data.
In this embodiment, the standard data is the complete data that, after sorting and arranging, meets the preset input format of the system, for example: Guangdong Province, Shenzhen City, Futian District, Shatou Street, Tian'an Community, Tairan 4th Road No. 29.
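Once every word carries a position number, step S602 is a sort followed by a join. The sketch below assumes the second position sequence is available as a dictionary from word to position; the name `assemble` and the comma separator are illustrative choices, not part of the patent.

```python
from typing import Dict

def assemble(positions: Dict[str, int], sep: str = ", ") -> str:
    """Order words by their position number and join them into one record."""
    ordered = sorted(positions.items(), key=lambda item: item[1])
    return sep.join(word for word, _ in ordered)
```

Because the ordering comes only from the position numbers, the words can be collected in any order (keyword first, upstream words afterwards) before being assembled.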
With continued reference to fig. 8, in some alternative implementations of the present embodiment, step S70 includes the steps of:
Step S701, calculating the matching degree between each matching word and the keyword;
In this embodiment, the matching degree is calculated by a matching algorithm, which may be a commonly used string matching algorithm such as the brute-force (BF), Rabin-Karp (RK), Knuth-Morris-Pratt (KMP), or Boyer-Moore (BM) algorithm.
Step S702, sorting the matching words according to the matching degree to obtain a matching text sequence list; and
Step S703, sending the matching text sequence list to the selection component for display.
In this embodiment, the selection component displays the matching text sequence list at a predetermined text box in the page. Clicking the arrow button on the text box pops up a selection box whose content corresponds to the matching text sequence list, and the words in the selection box are arranged in the same order as in the list.
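Steps S701 and S702 amount to scoring each candidate and sorting by score. The sketch below does not implement BF, RK, KMP, or BM; it substitutes `difflib.SequenceMatcher.ratio()` from the Python standard library as a convenient similarity score, and the name `rank` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List

def rank(keyword: str, candidates: List[str]) -> List[str]:
    """Sort candidates from highest to lowest similarity to the keyword."""
    scored = [
        (SequenceMatcher(None, keyword, candidate).ratio(), candidate)
        for candidate in candidates
    ]
    # Stable sort: candidates with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored]
```

The resulting list plays the role of the matching text sequence list: its first entry is what a selection box would show at the top.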
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by computer readable instructions stored in a computer readable storage medium, and that the instructions, when executed, may carry out the steps of the method embodiments described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or it may be a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
With further reference to fig. 9, as an implementation of the method shown in fig. 1, the present application provides an embodiment of an artificial intelligence-based data complement apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 1, and the apparatus is particularly applicable to various electronic devices.
As shown in fig. 9, the artificial intelligence based data complement apparatus 800 according to the present embodiment includes: a first data word segmentation module 801, a relationship arrangement module 802, a word classification module 803, a model training module 804, a second data word segmentation module 805, a first data processing module 806, and a second data processing module 807. Wherein:
The first data word segmentation module 801 is configured to obtain sample data, segment the sample data according to a preset word segmentation rule, and obtain a standard word;
the relationship arrangement module 802 is configured to perform association relationship arrangement on the standard word to obtain a word upstream relationship;
the word classification module 803 is configured to obtain an upstream word corresponding to the standard word according to the word upstream relationship, and classify the standard word according to the upstream word, so as to obtain a related word and an independent word;
the model training module 804 is configured to input the related word into a preset algorithm model for training, so as to obtain a word matching model;
the second data word segmentation module 805 is configured to obtain non-standard data, segment the non-standard data to obtain a keyword, and detect whether the keyword is the independent word or the related word;
the first data processing module 806 is configured to obtain an independent upstream word corresponding to the keyword if the keyword is the independent word, perform data complementation on the nonstandard data according to the independent upstream word, obtain standard data, and send the standard data to a preset display component for display; and
the second data processing module 807 is configured to, if the keyword is the related word, input the keyword into the word matching model to obtain a matching word, and send the matching word to a preset selection component for display.
By adopting the device, the embodiment can effectively complement the nonstandard data input in the system.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 10, fig. 10 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 9 comprises a memory 91, a processor 92, and a network interface 93 communicatively connected to each other via a system bus. It should be noted that only the computer device 9 having components 91-93 is shown in the figure, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 91 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments, the memory 91 may be an internal storage unit of the computer device 9, such as a hard disk or internal memory of the computer device 9. In other embodiments, the memory 91 may be an external storage device of the computer device 9, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 9. Of course, the memory 91 may also comprise both an internal storage unit of the computer device 9 and an external storage device. In this embodiment, the memory 91 is typically used to store the operating system and the various application software installed on the computer device 9, such as the computer readable instructions of the artificial intelligence based data complement method. Further, the memory 91 may be used to temporarily store various types of data that have been output or are to be output.
The processor 92 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 92 is typically used to control the overall operation of the computer device 9. In this embodiment, the processor 92 is configured to execute computer readable instructions stored in the memory 91 or process data, for example, execute computer readable instructions of the artificial intelligence based data complement method.
The network interface 93 may comprise a wireless network interface or a wired network interface, which network interface 93 is typically used to establish a communication connection between the computer device 9 and other electronic devices.
By adopting the computer equipment, the embodiment can effectively complement the nonstandard data input in the system.
The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the artificial intelligence-based data complement method as described above.
The embodiment can effectively complement the nonstandard data input in the system by adopting the computer readable storage medium.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
It is apparent that the embodiments described above are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the application but do not limit the scope of the patent. The application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be understood thoroughly and completely. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their technical features. All equivalent structures made using the content of the specification and drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the application.

Claims (10)

1. The data complement method based on artificial intelligence is characterized by comprising the following steps:
acquiring sample data, and performing word segmentation on the sample data according to a preset word segmentation rule to obtain standard words;
performing association relation arrangement on the standard words to obtain word upstream relations;
acquiring an upstream word corresponding to the standard word according to the word upstream relation, and classifying the standard word according to the upstream word to obtain a related word and an independent word;
inputting the related words into a preset algorithm model for training to obtain a word matching model;
obtaining nonstandard data, segmenting the nonstandard data to obtain a keyword, and detecting whether the keyword is the independent word or the related word;
if the keyword is the independent word, acquiring an independent upstream word corresponding to the keyword, performing data complementation on the nonstandard data according to the independent upstream word to obtain standard data, and sending the standard data to a preset display component for display; and
if the keyword is the related word, inputting the keyword into the word matching model to obtain a matching word, and sending the matching word to a preset selection component for display.
2. The artificial intelligence based data complement method according to claim 1, wherein the word segmentation rule includes a word recognition rule, a word ordering rule, and a word marking rule, the step of obtaining sample data, and segmenting the sample data according to a preset word segmentation rule to obtain standard words specifically includes:
acquiring historical data, and preprocessing the historical data to obtain the sample data;
identifying the sample data according to the word identification rule to obtain an initial word;
acquiring a first position sequence corresponding to the initial word in the sample data according to the word ordering rule; and
marking the sample data according to the word marking rule and the first position sequence to obtain the standard word.
3. The artificial intelligence based data complement method according to claim 1, wherein the step of performing association relation arrangement on the standard words to obtain word upstream relation specifically comprises:
acquiring a standard text corresponding to the standard word, and acquiring first position information of the standard word in the standard text;
acquiring an upstream word corresponding to the standard word according to the first position information; and
carrying out association marking on the upstream word and the standard word to obtain the word upstream relation.
4. The artificial intelligence based data complement method as set forth in claim 1, wherein the step of classifying the standard word according to the upstream word to obtain a related word and an independent word specifically includes:
detecting whether the upstream word is a unique value;
if the upstream word is a unique value, classifying the standard word as the independent word; and
if the upstream word is not a unique value, classifying the standard word as the related word.
5. The artificial intelligence based data complement method according to claim 1, wherein the step of obtaining nonstandard data and segmenting the nonstandard data to obtain a keyword specifically comprises:
acquiring the nonstandard data, and segmenting the nonstandard data according to the word segmentation rule to obtain a non-standard word;
detecting whether the non-standard word is correctly input;
if the non-standard word is correctly input, taking the non-standard word as the keyword; and
if the non-standard word is incorrectly input, analyzing the non-standard word to obtain word information, acquiring the standard word with the highest similarity to the non-standard word according to the word information, and taking the standard word with the highest similarity as the keyword.
6. The artificial intelligence based data complement method as set forth in claim 1, wherein the step of performing data complement on the nonstandard data according to the independent upstream word to obtain standard data specifically includes:
obtaining second position information of the independent upstream word and the keyword, and marking the independent upstream word and the keyword according to the second position information to obtain a second position sequence;
and sorting the independent upstream word and the keyword according to the second position sequence to obtain the standard data.
7. The artificial intelligence based data complement method of claim 1, wherein the step of sending the matching word to a preset selection component for display specifically comprises:
calculating the matching degree of the matching words and the keyword;
sorting the matching words according to the matching degree to obtain a matching text sequence list; and
sending the matching text sequence list to the selection component for display.
8. An artificial intelligence based data completion device, comprising:
the first data word segmentation module is used for obtaining sample data, and segmenting the sample data according to a preset word segmentation rule to obtain standard words;
the relation arrangement module is used for carrying out association relation arrangement on the standard words to obtain word upstream relations;
the word classification module is used for acquiring upstream words corresponding to the standard words according to the word upstream relation, classifying the standard words according to the upstream words, and obtaining related words and independent words;
the model training module is used for inputting the related words into a preset algorithm model for training to obtain a word matching model;
the second data word segmentation module is used for acquiring nonstandard data, segmenting the nonstandard data to obtain a keyword, and detecting whether the keyword is the independent word or the related word;
the first data processing module is used for acquiring an independent upstream word corresponding to the keyword if the keyword is the independent word, carrying out data complementation on the nonstandard data according to the independent upstream word to obtain standard data, and sending the standard data to a preset display component for display; and
the second data processing module is used for inputting the keyword into the word matching model to obtain a matching word if the keyword is the related word, and sending the matching word to a preset selection component for display.
9. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, implement the steps of the artificial intelligence based data complement method of any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the artificial intelligence based data complement method of any one of claims 1 to 7.
CN202311071484.4A 2023-08-24 2023-08-24 Data complement method, device, equipment and storage medium based on artificial intelligence Pending CN117056488A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311071484.4A CN117056488A (en) 2023-08-24 2023-08-24 Data complement method, device, equipment and storage medium based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311071484.4A CN117056488A (en) 2023-08-24 2023-08-24 Data complement method, device, equipment and storage medium based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN117056488A true CN117056488A (en) 2023-11-14

Family

ID=88658727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311071484.4A Pending CN117056488A (en) 2023-08-24 2023-08-24 Data complement method, device, equipment and storage medium based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN117056488A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination