CN111651452A - Data storage method and device, computer equipment and storage medium - Google Patents

Data storage method and device, computer equipment and storage medium

Info

Publication number
CN111651452A
Authority
CN
China
Prior art keywords
data
list
item
stored
value
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010358876.9A
Other languages
Chinese (zh)
Inventor
温林祥
蔡智晓
林梓棱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202010358876.9A
Publication of CN111651452A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and provides a data storage method, a data storage device, computer equipment and a storage medium. The method comprises: when the list to be stored is a list without a header, acquiring a data dictionary in a Mongo database, wherein the value of a key in the data dictionary is the same as a key field in a relational database; in response to item data in the list to be stored successfully matching data in the data dictionary, acquiring the target key corresponding to the successfully matched item data in the data dictionary and the value of the target key; and storing the item data in the list to be stored in the data column corresponding to the key field in the relational database that is the same as the value of the target key. The invention also relates to the technical field of blockchain, wherein the relational database is located in a blockchain so that the data is stored on the blockchain. By means of the data dictionary, the invention converts a list without a header into a list with a header, thereby realizing automatic storage of unstructured data in the relational database.

Description

Data storage method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of data processing in artificial intelligence, is applied to intelligent medical scenarios, and particularly relates to a blockchain-based data storage method and device, computer equipment and a storage medium.
Background
As intelligent medical informatization strategies continue to show results, a large amount of medical data, such as lists, is generated. These lists are recognized by OCR, and the data in the recognized lists are then stored in a relational database.
For different medical systems, the lists are different, so that the data formats recognized by the OCR are divided into two types, namely list data with a header and list data without the header. List data with a header can be very easily stored in a regular relational database, but list data without a header is unstructured data and cannot be directly stored in a relational database.
Therefore, it is necessary to provide a data storage method for automatically and rapidly storing unstructured data in a relational database.
Disclosure of Invention
In view of the above, there is a need for a data storage method, apparatus, computer device and storage medium, which implement conversion from a list without a header to a list with a header through a data dictionary, thereby implementing automatic storage of unstructured data in a relational database.
A first aspect of the present invention provides a data storage method applied in a computer device or in a block chain, where the data storage method includes:
acquiring a list to be stored;
identifying whether the list to be stored is a list with a header;
when the list to be stored is identified to be a list without a header, acquiring a data dictionary in a Mongo database, wherein the value of a key in the data dictionary is the same as the key field in the relational database;
matching the item data in the list to be stored with the data in the data dictionary;
responding to the successful matching of the item data in the list to be stored and the data in the data dictionary, and acquiring a target key corresponding to the item data successfully matched in the data dictionary and the value of the target key;
determining a first key field in the relational database with the same value as the target key;
and storing the item data in the list to be stored in a data column corresponding to the first key field in the relational database, wherein the relational database is located in a blockchain, thereby realizing blockchain storage of the data.
According to an optional embodiment of the present invention, before the obtaining the list to be stored, the data storage method further includes:
acquiring a plurality of historical list pictures;
recognizing the plurality of historical list pictures by adopting an OCR (optical character recognition) to obtain a plurality of historical lists;
extracting item types and corresponding item data in the plurality of history lists;
and creating the data dictionary according to the item types and the corresponding item data.
According to an optional embodiment of the present invention, the identifying whether the list to be stored is a list with a header includes:
acquiring a target character string in the list to be stored;
judging whether the value of the target character string is null or not;
when the value of the target character string is null, determining that the list to be stored is a list without a header;
and when the value of the target character string is not null, determining that the list to be stored is a list with a header.
According to an alternative embodiment of the present invention, the data storage method further comprises:
responding to failure of matching of the item data in the list to be stored and the data in the data dictionary, and inputting the item data in the list to be stored into a pre-trained item type recognition model;
identifying the item type corresponding to the item data through the item type identification model;
and storing the item types and the corresponding item data in the relational database.
According to an alternative embodiment of the present invention, the training process of the item type recognition model includes:
acquiring a plurality of project types and project data;
constructing a plurality of data pairs, wherein each data pair comprises an item type and corresponding item data;
and training a convolutional neural network based on the plurality of data pairs to obtain the item type recognition model.
According to an optional embodiment of the present invention, when the list to be stored is identified as a list with a header, the data storage method further includes:
acquiring the value of the target character string in the list with the header;
searching a second key field which has the same value as the target character string in the relational database;
and storing the item data in the list to be stored in a data column of the relational database corresponding to the second key field.
According to an alternative embodiment of the present invention, the data storage method further comprises:
acquiring first item data corresponding to a preset first character string and second item data corresponding to a preset second character string in the list to be stored;
deleting a preset identifier in the first item data;
performing numerical value conversion on the second item data;
determining the maximum value in the second item data subjected to numerical value conversion as a designated field, and processing the rest values according to a preset format;
and storing the processed list to be stored as a target list.
A second aspect of the present invention provides a data storage apparatus, running on a computer device or applied in a block chain, comprising:
the list acquisition module is used for acquiring a list to be stored;
the list head identification module is used for identifying whether the list to be stored is a list with a list head;
the dictionary obtaining module is used for obtaining a data dictionary in a Mongo database when the list to be stored is identified to be a list without a header, wherein the value of a key in the data dictionary is the same as the value of a key field in a relational database;
the data matching module is used for matching the item data in the list to be stored with the data in the data dictionary;
the key value acquisition module is used for responding to the successful matching of the item data in the list to be stored and the data in the data dictionary and acquiring a target key corresponding to the item data successfully matched in the data dictionary and the value of the target key;
a field determination module, configured to determine a first key field in the relational database that has a same value as the target key;
and the data storage module is used for storing the item data in the list to be stored in a data column corresponding to the first key field in the relational database, wherein the relational database is located in a blockchain, thereby realizing blockchain storage of the data.
A third aspect of the invention provides a computer apparatus comprising a processor for implementing the data storage method when executing a computer program stored in a memory.
A fourth aspect of the present invention provides a computer-readable storage medium including a storage data area storing data created according to use of a blockchain node and a storage program area storing a computer program which implements the data storage method when executed by a processor.
In summary, the present invention relates to the technical field of data processing in artificial intelligence, and is particularly applicable to the technical field of block chains, and practical application scenarios include but are not limited to smart medical scenarios in smart cities to promote the construction of smart cities. And when the target key corresponding to the item data cannot be matched through the data dictionary, the item type identification model identifies the item type corresponding to the item data, so that the list without the head is converted into the list with the head.
Drawings
Fig. 1 is a flowchart of a data storage method according to an embodiment of the present invention.
Fig. 2 is a structural diagram of a data storage device according to a second embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention.
The following detailed description will further illustrate the invention in conjunction with the above-described figures.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention, and the described embodiments are merely a subset of the embodiments of the present invention, rather than a complete embodiment. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Example one
Fig. 1 is a flowchart of a data storage method according to an embodiment of the present invention.
In this embodiment, the data storage method may be applied to a computer device. For a computer device that needs to store data, the data storage function provided by the method of the present invention may be integrated directly on the computer device, or may run on the computer device in the form of a Software Development Kit (SDK).
As shown in fig. 1, the data storage method specifically includes the following steps, and the order of the steps in the flowchart may be changed and some steps may be omitted according to different requirements.
The data storage method described in this embodiment may be applicable not only to medical checklist storage scenarios, but also to other checklist (for example, a shipping checklist) storage scenarios.
S11, acquiring the list to be stored.
The user can send the list to be stored to the computer device, or can send a picture of the list to be stored to the computer device. When a picture of the list to be stored is received, character recognition is first performed on the picture using Optical Character Recognition (OCR) technology to obtain the list to be stored. OCR is prior art and is not elaborated herein.
In an optional embodiment, before the obtaining the list to be stored, the data storage method further includes:
acquiring a plurality of historical list pictures;
recognizing the plurality of historical list pictures by adopting an OCR (optical character recognition) to obtain a plurality of historical lists;
extracting item types and corresponding item data in the plurality of history lists;
and creating the data dictionary according to the item types and the corresponding item data.
In this alternative embodiment, a data dictionary is first created and stored in MongoDB.
When creating the data dictionary, history list pictures need to be collected in advance. Each history list picture includes, but is not limited to, item types such as the name of the medicine, the type of the medicine, the nature of the medical insurance, and the department. One item type corresponds to one key 'sql_key', and one key 'sql_key' corresponds to a plurality of item data.
Preferably, the storage format of the data dictionary is as follows:
[Figure BDA0002474383280000061: storage format of the data dictionary]
Here, 'id' is an index in the data dictionary; a plurality of item data are stored in 'name', separated by a spacer character; the plurality of item data stored in the same 'name' belong to the same key 'sql_key'; and the value of the key 'sql_key' is the same as a key field in the relational database.
By corresponding the value of the key in the data dictionary to the key field in the relational database, the key field in the relational database can be determined according to the value of the target key when the target key is searched from the data dictionary.
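The dictionary layout described above can be illustrated with a minimal Python sketch. The field names 'id', 'name' and 'sql_key' come from the text; the sample values, the column-style 'sql_key' values, the "," spacer and the helper name `build_lookup` are assumptions made only for illustration, and the entries are held in memory rather than read from MongoDB.

```python
# Hypothetical dictionary entries. Field names follow the description;
# the values and the "," spacer are assumptions for illustration only.
SPACER = ","

data_dictionary = [
    {"id": 1, "name": "Annao tablet,Aspirin tablet", "sql_key": "item_name"},
    {"id": 2, "name": "western medicine fee,examination fee", "sql_key": "type"},
    {"id": 3, "name": "tablet,box", "sql_key": "unit"},
]


def build_lookup(entries, spacer=SPACER):
    """Map each item datum to the sql_key value of the entry that lists it."""
    lookup = {}
    for entry in entries:
        for item in entry["name"].split(spacer):
            item = item.strip()
            if item:  # ignore empty fragments left by stray spacers
                lookup[item] = entry["sql_key"]
    return lookup


lookup = build_lookup(data_dictionary)
print(lookup["Annao tablet"])  # item_name
```

Because each 'sql_key' value is assumed to equal a key field (column name) of the relational table, the lookup result can be used directly to choose the destination column.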
S12, identifying whether the list to be stored is a list with a header.
The list to be stored may be a list with a header or a list without a header.
Illustratively, a list with a header is as follows:
{"item_type": "item name",
"value": "Annao tablet"},
{"item_type": "type",
"value": "western medicine fee"},
{"item_type": "specification/description",
"value": "0.5g*24"},
{"item_type": "unit",
"value": "tablet"},
……
Here, item name, type, specification/description and unit are the item types, and "Annao tablet", "western medicine fee", "0.5g*24" and "tablet" are the item data. It can be seen that the list with a header includes a plurality of pairs of item_type and value; the value of item_type corresponds to a key field in the relational database, and the value of value is stored in the data column of the relational database corresponding to that key field.
Illustratively, a list without a header is as follows:
{"item_type": "",
"value": "Annao tablet"},
{"item_type": "",
"value": "western medicine fee"},
{"item_type": "",
"value": "0.5g*24"},
{"item_type": "",
"value": "tablet"},
……
It can be seen that the value of item_type in the list without a header is empty, and such unstructured data cannot be stored directly in the relational database: because no key field in the relational database corresponds to the value of value, it cannot be determined in which data column the value of value should be stored.
In an optional embodiment, the identifying whether the list to be stored is a list with a header includes:
acquiring a target character string in the list to be stored;
judging whether the value of the target character string is null or not;
when the value of the target character string is null, determining that the list to be stored is a list without a header;
and when the value of the target character string is not null, determining that the list to be stored is a list with a header.
In this alternative embodiment, the target character string is "item_type", and whether the list to be stored is a list with a header can be determined by identifying whether the value of the target character string is empty.
It should be noted that when some item types in a list to be stored are empty and others are not, the item data whose item type is empty may be extracted as a new list without a header, and the item data whose item type is not empty may be extracted as a new list with a header.
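The header check and the splitting of a mixed list can be sketched as follows. This is a minimal illustration, assuming the OCR result is already available as a list of Python dictionaries with the keys "item_type" and "value"; the function name `split_by_header` is hypothetical.

```python
def split_by_header(entries, target="item_type"):
    """Split OCR entries into (with_header, without_header).

    An entry counts as headerless when the value of the target
    character string is missing or empty.
    """
    with_header, without_header = [], []
    for entry in entries:
        value = entry.get(target)
        if value is None or str(value).strip() == "":
            without_header.append(entry)
        else:
            with_header.append(entry)
    return with_header, without_header


sample = [
    {"item_type": "item name", "value": "Annao tablet"},
    {"item_type": "", "value": "western medicine fee"},
    {"item_type": "unit", "value": "tablet"},
    {"item_type": "", "value": "0.5g*24"},
]
headed, headerless = split_by_header(sample)
```

A list in which `headerless` comes back empty is a list with a header; a list in which `headed` comes back empty is a list without a header.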
S13, when the list to be stored is identified to be a list without a header, acquiring a data dictionary in the Mongo database.
For item data in a list without headers, the corresponding key and the value of the key may be looked up from the data dictionary.
S14, matching the item data in the list to be stored with the data in the data dictionary.
Each item data in the list without a header is matched one by one against all data in the data dictionary. If data identical to the item data is found in the data dictionary, the matching is considered successful. If no data identical to the item data is found in the data dictionary, the matching is considered to have failed.
S15, responding to the successful matching between the item data in the list to be stored and the data in the data dictionary, and acquiring the target key corresponding to the item data successfully matched in the data dictionary and the value of the target key.
In the case of a successful match, the target key "sql_key" corresponding to the successfully matched item data is obtained from the data dictionary, and then the value of the target key "sql_key" is obtained. The value of the target key "sql_key" is the item type of the item data.
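Steps S14 and S15 can be sketched as an exact-match scan over the dictionary entries. This is only an illustrative sketch: the entries are plain Python dictionaries rather than MongoDB documents, the "," spacer and the sample values are assumptions, and `match_items` is a hypothetical helper name.

```python
def match_items(headerless_entries, dictionary_entries, spacer=","):
    """Return (matched, unmatched).

    Matched entries get their item_type filled with the value of the
    target key 'sql_key'; unmatched entries are returned unchanged.
    """
    matched, unmatched = [], []
    for entry in headerless_entries:
        target_value = None
        for dict_entry in dictionary_entries:
            names = [n.strip() for n in dict_entry["name"].split(spacer)]
            if entry["value"] in names:  # only identical data counts as a match
                target_value = dict_entry["sql_key"]
                break
        if target_value is None:
            unmatched.append(entry)
        else:
            matched.append({"item_type": target_value, "value": entry["value"]})
    return matched, unmatched


# Hypothetical sample data.
dictionary_entries = [
    {"id": 1, "name": "Annao tablet,Aspirin tablet", "sql_key": "item_name"},
    {"id": 2, "name": "western medicine fee", "sql_key": "type"},
]
headerless = [
    {"item_type": "", "value": "Annao tablet"},
    {"item_type": "", "value": "western medicine fee"},
    {"item_type": "", "value": "0.5g*24"},
]
matched, unmatched = match_items(headerless, dictionary_entries)
```

Here "0.5g*24" has no identical entry in the sample dictionary, so it ends up in `unmatched`, which corresponds to the matching-failure case.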
S16, determining the first key field in the relational database with the same value as the target key.
A key field that is the same as the value of the target key "sql_key" is found in the relational database and taken as the first key field.
S17, storing the item data in the list to be stored in a data column corresponding to the first key field in the relational database, wherein the relational database is located in a blockchain, thereby realizing blockchain storage of the data.
The item data in the list without a header is stored in the data column corresponding to the first key field in the relational database, so that business systems can conveniently obtain the data from the relational database for analysis and application.
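Steps S16 and S17 can be sketched with Python's built-in sqlite3 module standing in for the relational database. The table name `list_items`, its columns and the helper `store_row` are hypothetical, and the sketch does not model the blockchain placement of the database.

```python
import sqlite3


def store_row(conn, table, row):
    """Insert one list row; every key of `row` must be an existing column.

    `table` is assumed to be a trusted identifier, not user input.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    columns = {info[1] for info in conn.execute(f"PRAGMA table_info({table})")}
    unknown = set(row) - columns
    if unknown:
        raise KeyError(f"no key field for: {sorted(unknown)}")
    names = sorted(row)
    quoted = ", ".join(f'"{n}"' for n in names)
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})',
        [row[n] for n in names],
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list_items (item_name TEXT, type TEXT, unit TEXT)")

# Item data whose item_type was filled from the target key value.
matched = [
    {"item_type": "item_name", "value": "Annao tablet"},
    {"item_type": "type", "value": "western medicine fee"},
    {"item_type": "unit", "value": "tablet"},
]
row = {m["item_type"]: m["value"] for m in matched}
store_row(conn, "list_items", row)
conn.commit()
```

Checking the column names before inserting mirrors the step of determining the first key field: an item type with no matching key field is rejected rather than written to an arbitrary column.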
In an optional embodiment, the data storage method further comprises:
responding to failure of matching of the item data in the list to be stored and the data in the data dictionary, and inputting the item data in the list to be stored into a pre-trained item type recognition model;
identifying the item type corresponding to the item data through the item type identification model;
and storing the item types and the corresponding item data in the relational database.
When no data identical to the item data in the list without a header can be matched in the data dictionary, the list without a header cannot be converted into a list with a header through the data dictionary. In this case, a pre-trained item type recognition model can be used to recognize the item type corresponding to the item data, so that the list without a header is converted into a list with a header.
In this embodiment, the item types matched through the data dictionary are all accurate, whereas the item type identified by the item type recognition model is associated with a probability value and therefore carries uncertainty. Consequently, if data identical to the item data can be matched directly in the data dictionary, the obtained key and its value are used to find the key field in the relational database that is the same as the value of the key; that is, converting the list without a header into a list with a header by means of the data dictionary gives higher accuracy.
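The priority described here (dictionary first, model only on failure) can be sketched as below. The function `resolve_item_type` and the rule-based `toy_classifier` are hypothetical; the classifier is only a placeholder with the same call shape as a trained model and is not the convolutional neural network of this embodiment.

```python
def resolve_item_type(value, lookup, classify):
    """Return (item_type, source), preferring an exact dictionary match."""
    if value in lookup:
        return lookup[value], "dictionary"
    item_type, probability = classify(value)  # probabilistic, hence less certain
    return item_type, f"model(p={probability:.2f})"


def toy_classifier(value):
    """Placeholder for the pre-trained item type recognition model."""
    if any(ch.isdigit() for ch in value):
        return "specification", 0.80
    return "item_name", 0.55


lookup = {"tablet": "unit"}
print(resolve_item_type("tablet", lookup, toy_classifier))   # ('unit', 'dictionary')
print(resolve_item_type("0.5g*24", lookup, toy_classifier))  # ('specification', 'model(p=0.80)')
```

Recording the source alongside the item type keeps the less certain model predictions distinguishable from exact dictionary matches.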
Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated with one another by cryptographic methods, where each data block contains information on a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In an alternative embodiment, the training process of the item type recognition model may include:
acquiring a plurality of item types and item data;
constructing a plurality of data pairs, wherein each data pair comprises an item type and corresponding item data;
and training a convolutional neural network based on the plurality of data pairs to obtain the item type recognition model.
In this optional embodiment, a plurality of historical item types and corresponding item data are acquired when the data dictionary is created; these historical item types and corresponding item data are used as a data set, and the item type recognition model is trained based on the data set.
In an alternative embodiment, the plurality of data pairs may be divided into a training set based on which the item type identification model is trained and a test set based on which the test pass rate of the item type identification model is tested. And when the test passing rate is higher than a preset passing rate threshold value, determining the trained item type recognition model as the optimal one. And when the test passing rate is lower than the preset passing rate threshold value, dividing the plurality of data pairs into a training set and a testing set again, and training the item type recognition model based on the new training set until the test passing rate of the item type recognition model based on the new testing set is higher than the preset passing rate threshold value.
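The split, train, test and re-split loop can be sketched as follows. The function `train_until_pass`, the 20% test fraction, the 0.9 threshold and the `max_rounds` cap (added so the loop always terminates) are assumptions; `toy_train` is a stand-in that memorises the last word of each item datum and is not a convolutional neural network.

```python
import random


def train_until_pass(pairs, train_fn, threshold=0.9, test_fraction=0.2,
                     max_rounds=10, seed=0):
    """Re-split and retrain until the test pass rate exceeds the threshold."""
    rng = random.Random(seed)
    for round_no in range(1, max_rounds + 1):
        shuffled = pairs[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        test_set, train_set = shuffled[:n_test], shuffled[n_test:]
        model = train_fn(train_set)
        passed = sum(1 for item_type, data in test_set if model(data) == item_type)
        pass_rate = passed / len(test_set)
        if pass_rate > threshold:
            return model, pass_rate, round_no
    raise RuntimeError("pass rate threshold not reached within max_rounds")


def toy_train(train_set):
    """Stand-in for training: map the last word of each datum to its item type."""
    suffix_to_type = {data.split()[-1]: item_type for item_type, data in train_set}
    return lambda data: suffix_to_type.get(data.split()[-1])


# Each data pair is (item type, item data); the values are hypothetical.
pairs = [
    ("item_name", "Annao tablet"), ("item_name", "Aspirin tablet"),
    ("type", "western medicine fee"), ("type", "examination fee"),
    ("department", "cardiology department"), ("department", "surgery department"),
]
model, pass_rate, rounds = train_until_pass(pairs, toy_train)
```

In practice `train_fn` would wrap the actual network training, and the pass rate threshold would be the preset value chosen for the deployment.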
And after each subsequent storage of the list to be stored in the relational database, updating the data set based on the item data and the item type in the list to be stored, and retraining the item type recognition model based on the updated data set, so as to improve the recognition performance of the item type recognition model.
In an optional embodiment, when the list to be stored is identified as a list with a header, the data storage method further includes:
acquiring the value of the target character string in the list with the header;
searching a second key field which has the same value as the target character string in the relational database;
and storing the item data in the list to be stored in a data column of the relational database corresponding to the second key field.
And for the list with the header, taking the value of the target character string as an item type, taking the item type as a key field, taking the item data corresponding to the item type as a value corresponding to the key field, and storing the value in a relational database.
In an optional embodiment, the data storage method further includes:
acquiring first item data corresponding to a preset first character string and second item data corresponding to a preset second character string in the list to be stored;
deleting a preset identifier in the first item data;
performing numerical value conversion on the second item data;
determining the maximum value in the second item data subjected to numerical value conversion as a designated field, and processing the rest values according to a preset format;
and storing the processed list to be stored as a target list.
For example, the preset first character string may be "specification/description", "free payment rate", and the like, and the preset identifier may be "star"; the item data corresponding to "specification/description", "free payment rate", and the like are checked for this special identifier, which is deleted if present. The preset second character string may be "quantity", "unit price" or "amount"; the item data corresponding to "quantity", "unit price" and "amount" are converted to numerical values and then compared: the largest is the amount and the other two are the quantity and the unit price, which can be stored in the relational database in the form of "quantity plus unit price".
Due to the fact that formats of the item data of different lists are different, the item data in the list to be stored can be normalized through the embodiment, and the style of the item data stored in the relational database is kept uniform. Since ASCII conversion is required when a meaningless identifier is written into the database, removal of the meaningless identifier can increase the speed of writing the item data in the list to be stored into the relational database.
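The normalization steps can be sketched as two small helpers. The identifier symbol "★" is an assumption standing in for the preset identifier, and the function names `strip_identifier` and `label_numeric_fields` are hypothetical. The sketch only labels the largest converted value as the amount; the preset storage format for the remaining values is left out.

```python
def strip_identifier(text, identifier):
    """Delete every occurrence of the preset identifier from the item data."""
    return text.replace(identifier, "").strip()


def label_numeric_fields(raw_values):
    """Convert numeric strings and treat the largest value as the amount."""
    numbers = [float(v) for v in raw_values]
    amount = max(numbers)
    rest = numbers[:]
    rest.remove(amount)  # drop one occurrence of the maximum
    return {"amount": amount, "rest": rest}


print(strip_identifier("★0.5g*24", "★"))             # 0.5g*24
print(label_numeric_fields(["2", "12.50", "25.00"]))  # {'amount': 25.0, 'rest': [2.0, 12.5]}
```

Treating the largest value as the amount relies on the amount being the product of quantity and unit price with both at least 1; data that violates this would need an additional check.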
In summary, in the data storage method according to this embodiment, when the list to be stored is a list without a header, a target key corresponding to item data in the list to be stored and a value of the target key are matched through a data dictionary; determining a first key field in the relational database with the same value as the target key; and finally, storing the item data in the list to be stored in a data column corresponding to the first key field in the relational database. And matching a target key corresponding to the item data and the value of the target key through the data dictionary, and using the value of the target key as an item type, so that a list without a header is converted into a list with a header, and the list is stored in a relational database.
And when the target key corresponding to the item data cannot be matched through the data dictionary, the item type identification model identifies the item type corresponding to the item data, so that the list without the head is converted into the list with the head.
This embodiment combines the data dictionary with the item type recognition model to convert a list without a header into a list with a header, turning unstructured data into structured data and thereby achieving automatic storage of unstructured data in a relational database. This better supports later application and extension of the data.
Example two
Fig. 2 is a structural diagram of a data storage device according to a second embodiment of the present invention.
In some embodiments, the data storage device 20 may include a plurality of functional modules comprised of program code segments. The program code of the various program segments in the data storage device 20 may be stored in the memory of the computer apparatus and executed by the at least one processor to perform the functions of data storage (described in detail in fig. 1).
In this embodiment, the data storage device 20 may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a list acquisition module 201, a dictionary creation module 202, a header recognition module 203, a dictionary acquisition module 204, a data matching module 205, a key value acquisition module 206, a field determination module 207, a data storage module 208, a model training module 209, and a data processing module 210. A module as referred to herein is a series of computer program segments that are stored in memory, can be executed by at least one processor, and perform a fixed function. The functions of the modules are described in detail below.
The list obtaining module 201 is configured to obtain a list to be stored.
The user may send the list to be stored to the computer device, or may send a picture of the list to be stored. When a picture of the list to be stored is received, character recognition is first performed on the picture using Optical Character Recognition (OCR) to obtain the list to be stored. OCR is prior art and is not elaborated herein.
The dictionary creating module 202 is configured to create a data dictionary.
Before the list to be stored is obtained, a data dictionary is created and stored in the MongoDB.
In an alternative embodiment, the dictionary creation module 202 creating the data dictionary includes:
acquiring a plurality of historical list pictures;
recognizing the plurality of historical list pictures by adopting an OCR (optical character recognition) to obtain a plurality of historical lists;
extracting item types and corresponding item data in the plurality of historical lists;
and creating the data dictionary according to the item types and the corresponding item data.
When creating the data dictionary, historical list pictures need to be collected in advance. Each historical list picture includes item types such as, but not limited to, medicine name, medicine type, medical insurance property, and department. One item type corresponds to one key 'sql_key', and one key 'sql_key' corresponds to a plurality of item data.
Preferably, the storage format of the data dictionary is as follows:
[Figure BDA0002474383280000131: storage format of the data dictionary; image not reproduced]
Here, 'id' is the index in the data dictionary. A plurality of item data are stored in 'name', separated by a delimiter; the item data stored in the same 'name' belong to the same key 'sql_key'; and the value of the key 'sql_key' is the same as the key field in the relational database.
By corresponding the value of the key in the data dictionary to the key field in the relational database, the key field in the relational database can be determined according to the value of the target key when the target key is searched from the data dictionary.
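As an illustration of the dictionary layout described above, the following minimal Python sketch models each entry as a record with 'id', 'name' and 'sql_key' and builds a reverse index from item data to the value of the key. The field names follow the description; the comma delimiter, the sample entries, and the key values such as "item_name" are assumptions made for illustration and are not taken from the figure.

```python
# Hypothetical data dictionary entries. Field names follow the description
# ('id', 'name', 'sql_key'); the values and the ',' delimiter are assumed.
DATA_DICTIONARY = [
    {"id": 1, "name": "Annao tablet,Aspirin tablet", "sql_key": "item_name"},
    {"id": 2, "name": "western medicine fee,examination fee", "sql_key": "type"},
    {"id": 3, "name": "tablet,box,bottle", "sql_key": "unit"},
]


def build_reverse_index(entries, delimiter=","):
    """Map every item data string to the value of its key 'sql_key'."""
    index = {}
    for entry in entries:
        for item in entry["name"].split(delimiter):
            item = item.strip()
            if item:
                index[item] = entry["sql_key"]
    return index


REVERSE_INDEX = build_reverse_index(DATA_DICTIONARY)
```

With such an index, finding the target key for a piece of item data is a single exact-match lookup, and the returned value can be used directly as the name of the key field in the relational database.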
The header identification module 203 is configured to identify whether the list to be stored is a list with a header.
The list to be stored may be a list with a header or a list without a header.
Illustratively, a list with a header is as follows:
{"item_type": "item name",
"value": "Annao tablet"},
{"item_type": "type",
"value": "western medicine fee"},
{"item_type": "specification/description",
"value": "0.5g*24"},
{"item_type": "unit",
"value": "tablet"},
……
Here, item name, type, specification/description, and unit are item types, and "Annao tablet", "western medicine fee", "0.5g*24", and "tablet" are item data. It can be seen that the list with a header includes a plurality of pairs of item_type and value; the value of item_type corresponds to a key field in the relational database, and the value of value is stored in the data column of the relational database corresponding to that key field.
Illustratively, a list without a header is as follows:
{"item_type": "",
"value": "Annao tablet"},
{"item_type": "",
"value": "western medicine fee"},
{"item_type": "",
"value": "0.5g*24"},
{"item_type": "",
"value": "tablet"},
……
It can be seen that the value of item_type in the list without a header is empty, and such unstructured data cannot be stored directly in the relational database: since no key field in the relational database corresponds to the value of value, it cannot be determined in which data column the value of value should be stored.
In an optional embodiment, the identifying, by the header identifying module 203, whether the list to be stored is a list with a header includes:
acquiring a target character string in the list to be stored;
judging whether the value of the target character string is null or not;
when the value of the target character string is null, determining that the list to be stored is a list without a header;
and when the value of the target character string is not null, determining that the list to be stored is a list with a header.
In this alternative embodiment, the target character string is "item_type", and whether the list to be stored is a list with a header can be determined by identifying whether the value of the target character string is empty.
It should be noted that when some item types in a list to be stored are empty and others are not, the item data whose item type is empty may be extracted as a new list without a header, and the item data whose item type is not empty may be extracted as a new list with a header.
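The header check and the splitting of a mixed list can be sketched as follows. This is a minimal illustration that assumes the list to be stored has already been parsed into Python dictionaries with the keys "item_type" and "value"; the function names are hypothetical.

```python
def has_header(entry):
    """An entry has a header when the value of 'item_type' is not empty."""
    return bool((entry.get("item_type") or "").strip())


def split_by_header(list_to_store):
    """Split a mixed list into a list with a header and a list without one."""
    with_header = [e for e in list_to_store if has_header(e)]
    without_header = [e for e in list_to_store if not has_header(e)]
    return with_header, without_header


sample = [
    {"item_type": "item name", "value": "Annao tablet"},
    {"item_type": "", "value": "western medicine fee"},
    {"item_type": "", "value": "0.5g*24"},
]
with_header, without_header = split_by_header(sample)
```

In this sketch a missing or whitespace-only "item_type" is treated the same as an empty one, which is a design choice of the example rather than something stated in the embodiment.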
The dictionary obtaining module 204 is configured to obtain a data dictionary in the Mongo database when the list to be stored is identified as a list without a header.
For item data in a list without headers, the corresponding key and the value of the key may be looked up from the data dictionary.
The data matching module 205 is configured to match the item data in the list to be stored with the data in the data dictionary.
Each item data in the list without a header is matched one by one against all data in the data dictionary. If data identical to the item data is found in the data dictionary, the matching is considered successful; otherwise, the matching is considered to have failed.
The key value obtaining module 206 is configured to, in response to that item data in the list to be stored is successfully matched with data in the data dictionary, obtain a target key corresponding to the item data successfully matched in the data dictionary and a value of the target key.
When matching succeeds, the target key "sql_key" corresponding to the successfully matched item data is obtained from the data dictionary, and then the value of the target key "sql_key" is obtained. The value of the target key "sql_key" is the item type of the item data.
The field determination module 207 is configured to determine a first key field in the relational database, where the first key field has the same value as the target key.
A key field that is the same as the value of the target key 'sql_key' is found in the relational database and taken as the first key field.
The data storage module 208 is configured to store the item data in the list to be stored in a data column corresponding to the first key field in the relational database.
The item data in the list without a header is stored in the data column corresponding to the first key field in the relational database.
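The match, key lookup, field determination, and storage steps can be sketched end to end as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the relational database (the blockchain hosting is not modeled), and the table name bill_items, its columns, and the reverse index contents are hypothetical.

```python
import sqlite3

# Hypothetical reverse index derived from the data dictionary:
# item data -> value of 'sql_key' (assumed to equal a column name).
REVERSE_INDEX = {"Annao tablet": "item_name", "tablet": "unit"}


def store_matched_items(conn, table, columns, items, reverse_index):
    """Store each matched item in the column named by its target key.

    Returns the item data that could not be matched. 'columns' is the set of
    key fields of the table and doubles as an allow-list, because column
    names cannot be passed as SQL parameters. 'table' is assumed to be a
    trusted, caller-supplied name.
    """
    row, unmatched = {}, []
    for item in items:
        key_value = reverse_index.get(item)   # exact match only
        if key_value in columns:              # first key field found
            row[key_value] = item
        else:
            unmatched.append(item)
    if row:
        names = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        conn.execute(f"INSERT INTO {table} ({names}) VALUES ({marks})",
                     list(row.values()))
    return unmatched


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bill_items (item_name TEXT, type TEXT, unit TEXT)")
left_over = store_matched_items(
    conn, "bill_items", {"item_name", "type", "unit"},
    ["Annao tablet", "0.5g*24", "tablet"], REVERSE_INDEX)
stored = conn.execute("SELECT item_name, type, unit FROM bill_items").fetchall()
```

Item data returned in left_over is what would be passed on to the item type recognition model.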
The data storage module 208 is further configured to: in response to a failure to match the item data in the list to be stored with the data in the data dictionary, input the item data in the list to be stored into a pre-trained item type recognition model; identify the item type corresponding to the item data through the item type recognition model; and store the item type and the corresponding item data in the relational database. The relational database is located in a blockchain, so that blockchain storage of the data is realized.
When no data identical to the item data in the list without a header can be matched in the data dictionary, the list cannot be converted into a list with a header through the data dictionary. In this case, a pre-trained item type recognition model can be used to recognize the item type corresponding to the item data, so that the list without a header is converted into a list with a header.
In this embodiment, the item types matched through the data dictionary are all accurate, whereas the item type identified by the item type recognition model is associated with a probability value and is therefore uncertain. Accordingly, if data identical to the item data can be matched directly in the data dictionary, the obtained key and key value are used to find the key field in the relational database that is the same as the key value. That is, the data dictionary is preferred for converting a list without a header into a list with a header, because its accuracy is higher.
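The "dictionary first, model second" order described above can be sketched as follows. The classifier here is a toy stand-in for the pre-trained item type recognition model, and the function names and the reverse index are hypothetical; only the control flow is meant to reflect the embodiment.

```python
def resolve_item_type(item, reverse_index, classify):
    """Return (item_type, source) for one piece of item data.

    The data dictionary is consulted first because an exact match is
    deterministic; the model is used only when the lookup fails.
    """
    key_value = reverse_index.get(item)
    if key_value is not None:
        return key_value, "dictionary"
    return classify(item), "model"


def to_list_with_header(items, reverse_index, classify):
    """Convert a list without a header into a list with a header."""
    result = []
    for item in items:
        item_type, _source = resolve_item_type(item, reverse_index, classify)
        result.append({"item_type": item_type, "value": item})
    return result


def toy_classifier(item):
    """Stand-in for the pre-trained model; this rule is purely illustrative."""
    return "specification" if any(ch.isdigit() for ch in item) else "unknown"


converted = to_list_with_header(
    ["Annao tablet", "0.5g*24"], {"Annao tablet": "item_name"}, toy_classifier)
```

Returning the source alongside the item type makes it possible to treat model predictions with more caution than dictionary matches, for example by logging them for review.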
The model training module 209 is used for training the item type recognition model.
In an alternative embodiment, the model training module 209 training the item type recognition model comprises:
acquiring a plurality of item types and item data;
constructing a plurality of data pairs, wherein each data pair comprises an item type and corresponding item data;
and training a convolutional neural network based on the plurality of data pairs to obtain the item type recognition model.
In this optional embodiment, a plurality of historical item types and corresponding item data are already acquired when the data dictionary is created; they are used as a data set, and the item type recognition model is trained based on this data set.
In an alternative embodiment, the plurality of data pairs may be divided into a training set based on which the item type identification model is trained and a test set based on which the test pass rate of the item type identification model is tested. And when the test passing rate is higher than a preset passing rate threshold value, determining the trained item type recognition model as the optimal one. And when the test passing rate is lower than the preset passing rate threshold value, dividing the plurality of data pairs into a training set and a testing set again, and training the item type recognition model based on the new training set until the test passing rate of the item type recognition model based on the new testing set is higher than the preset passing rate threshold value.
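The split, train, test, and re-split loop can be sketched as follows. This is a minimal illustration, not the convolutional neural network of the embodiment: a toy rule-based trainer stands in for the network, and the test ratio, the max_rounds safeguard, the fixed seed, and the sample pairs are all assumptions added so that the example terminates and is reproducible.

```python
import random
from collections import Counter, defaultdict


def has_digit(data):
    return any(ch.isdigit() for ch in data)


def train_toy_model(training_pairs):
    """Stand-in for CNN training: majority item type per 'has digit' feature."""
    votes = defaultdict(Counter)
    for item_type, data in training_pairs:
        votes[has_digit(data)][item_type] += 1
    rules = {feature: c.most_common(1)[0][0] for feature, c in votes.items()}
    return lambda data: rules.get(has_digit(data))


def pass_rate(model, test_pairs):
    """Fraction of test pairs whose predicted item type is correct."""
    hits = sum(1 for item_type, data in test_pairs if model(data) == item_type)
    return hits / len(test_pairs)


def train_until_pass(pairs, train_fn, threshold, test_ratio=0.2,
                     max_rounds=10, seed=0):
    """Re-split and retrain until the test pass rate exceeds the threshold."""
    rng = random.Random(seed)
    for _ in range(max_rounds):
        shuffled = list(pairs)
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_ratio))
        test_set, training_set = shuffled[:n_test], shuffled[n_test:]
        model = train_fn(training_set)
        rate = pass_rate(model, test_set)
        if rate > threshold:
            return model, rate
    raise RuntimeError("pass rate threshold not reached")


PAIRS = [
    ("specification", "0.5g*24"), ("specification", "10ml*6"),
    ("specification", "5mg*12"), ("unit", "tablet"),
    ("unit", "box"), ("unit", "bottle"),
]
model, rate = train_until_pass(PAIRS, train_toy_model, threshold=0.9)
```

Passing the training function as a parameter keeps the loop independent of the model type, so a real network trainer could be substituted without changing the splitting logic.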
Each time a list to be stored is subsequently stored in the relational database, the data set is updated based on the item data and item types in that list, and the item type recognition model is retrained based on the updated data set, so as to improve its recognition performance.
In an optional embodiment, when the to-be-stored list is identified as a list with a header, the data storage module 208 is further configured to obtain a value of a target character string in the list with the header; searching a second key field which has the same value as the target character string in the relational database; and storing the item data in the list to be stored in a data column of the relational database corresponding to the second key field.
For a list with a header, the value of the target character string is taken as the item type, the item type is taken as the key field, and the item data corresponding to the item type is stored in the relational database as the value corresponding to that key field.
The data processing module 210 is configured to process the list to be stored.
In an optional embodiment, the processing, by the data processing module 210, the list to be stored includes:
acquiring first item data corresponding to a preset first character string and second item data corresponding to a preset second character string in the list to be stored;
deleting a preset identifier in the first item data;
performing numerical value conversion on the second item data;
determining the maximum value in the second item data subjected to numerical value conversion as a designated field, and processing the rest values according to a preset format;
and storing the processed list to be stored as a target list.
For example, the preset first character string may be "specification/description", "free payment rate", and the like, and the preset identifier may be a "star" symbol; the item data corresponding to "specification/description", "free payment rate", and the like is checked for such a special identifier. The preset second character string may be "quantity", "unit price", or "amount". The item data corresponding to "quantity", "unit price", and "amount" is converted to numerical values and then compared: the largest value is the amount, and the other two are the quantity and the unit price, which may be stored in the relational database in the form "quantity plus unit price".
Because the formats of item data differ between lists, this embodiment normalizes the item data in the list to be stored so that the style of the item data stored in the relational database remains uniform. Since ASCII conversion is required when a meaningless identifier is written into the database, removing meaningless identifiers can also increase the speed at which the item data in the list to be stored is written into the relational database.
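The normalization steps can be sketched as follows. The concrete identifier character, the handling of thousands separators and currency marks, and the function names are assumptions for illustration; the embodiment only specifies that a preset identifier is deleted, that the values are converted to numbers, and that the largest value is treated as the amount.

```python
def strip_identifier(text, identifier="★"):
    """Delete the preset identifier (assumed here to be '★') from item data."""
    return text.replace(identifier, "").strip()


def to_number(text):
    """Numerical conversion: drop thousands separators and a currency mark."""
    cleaned = text.replace(",", "").replace("¥", "").strip()
    return float(cleaned)


def split_amount(raw_values):
    """Treat the largest converted value as the amount; keep the rest as-is."""
    numbers = sorted(to_number(v) for v in raw_values)
    return {"amount": numbers[-1], "others": numbers[:-1]}


spec = strip_identifier("★0.5g*24")
fields = split_amount(["2", "12.50", "25.00"])
```

The sketch does not try to decide which of the remaining two values is the quantity and which is the unit price, since the description gives no rule for that. Note also that the "largest value is the amount" heuristic only holds when both the quantity and the unit price are at least 1.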
In summary, in the data storage device according to this embodiment, when the list to be stored is a list without a header, a data dictionary is used to match a target key corresponding to item data in the list to be stored and a value of the target key; determining a first key field in the relational database with the same value as the target key; and finally, storing the item data in the list to be stored in a data column corresponding to the first key field in the relational database. And matching a target key corresponding to the item data and the value of the target key through the data dictionary, and using the value of the target key as an item type, so that a list without a header is converted into a list with a header, and the list is stored in a relational database.
When the target key corresponding to the item data cannot be matched through the data dictionary, the item type identification model identifies the item type corresponding to the item data, so that the list without a header is converted into a list with a header.
In this embodiment, the data dictionary is combined with the item type identification model to convert a list without a header into a list with a header. Unstructured data is thereby converted into structured data and can be stored automatically in a relational database, which facilitates later application and extension of the data.
Example three
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention. In the preferred embodiment of the present invention, the computer device 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
It will be appreciated by those skilled in the art that the configuration of the computer device shown in fig. 3 does not limit the embodiments of the present invention; it may be a bus-type or a star-type configuration, and the computer device 3 may include more or fewer hardware or software components than those shown, or a different arrangement of components.
In some embodiments, the computer device 3 is a computer device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and the hardware thereof includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The computer device 3 may also include a client device, which includes, but is not limited to, any electronic product capable of interacting with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, etc.
It should be noted that the computer device 3 is only an example; other existing or future electronic products that can be adapted to the present invention should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In some embodiments, the memory 31 is used for storing program code and various data, such as the data storage device 20 installed in the computer device 3, and provides high-speed, automatic access to programs or data during the operation of the computer device 3. The memory 31 includes a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-Time Programmable Read-Only Memory (OTPROM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc memory, a magnetic disk memory, a tape memory, or any other computer-readable medium that can be used to carry or store data.
In some embodiments, the at least one processor 32 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The at least one processor 32 is a Control Unit (Control Unit) of the computer device 3, connects various components of the entire computer device 3 by using various interfaces and lines, and executes various functions of the computer device 3 and processes data, such as performing functions of data storage, by running or executing programs or modules stored in the memory 31 and calling data stored in the memory 31.
In some embodiments, the at least one communication bus 33 is arranged to enable connection communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the computer device 3 may further include a power supply (such as a battery) for supplying power to each component, and according to a preferred embodiment of the present invention, the power supply may be logically connected to the at least one processor 32 through a power management device, so as to implement functions of managing charging, discharging, and power consumption through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The computer device 3 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer or a network device) or a processor to execute parts of the data storage method according to the embodiments of the present invention.
Further, the computer usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
In a further embodiment, in conjunction with fig. 2, the at least one processor 32 may execute the operating system of the computer device 3 as well as various installed application programs (such as the data storage device 20), program code, and the like, for example the modules described above.
The memory 31 has program code stored therein, and the at least one processor 32 can call the program code stored in the memory 31 to perform related functions. For example, the respective modules illustrated in fig. 2 are program codes stored in the memory 31 and executed by the at least one processor 32, so as to realize the functions of the respective modules for the purpose of data storage.
In one embodiment of the invention, the memory 31 stores a plurality of instructions that are executed by the at least one processor 32 to implement the functions of data storage.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or that the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A data storage method is applied to a computer device or a block chain, and comprises the following steps:
acquiring a list to be stored;
identifying whether the list to be stored is a list with a header;
when the list to be stored is identified to be a list without a header, acquiring a data dictionary in a Mongo database, wherein the value of a key in the data dictionary is the same as the key field in the relational database;
matching the item data in the list to be stored with the data in the data dictionary;
responding to the successful matching of the item data in the list to be stored and the data in the data dictionary, and acquiring a target key corresponding to the item data successfully matched in the data dictionary and the value of the target key;
determining a first key field in the relational database with the same value as the target key;
and storing the item data in the list to be stored in a data column corresponding to the first key field in the relational database, wherein the relational database is positioned in a block chain, and data block chain storage is realized.
2. The data storage method of claim 1, wherein prior to said acquiring the list to be stored, the data storage method further comprises:
acquiring a plurality of historical list pictures;
recognizing the plurality of historical list pictures by adopting an OCR (optical character recognition) to obtain a plurality of historical lists;
extracting item types and corresponding item data in the plurality of historical lists;
and creating the data dictionary according to the item types and the corresponding item data.
3. The data storage method of claim 1, wherein the identifying whether the list to be stored is a list with a header comprises:
acquiring a target character string in the list to be stored;
judging whether the value of the target character string is null or not;
when the value of the target character string is null, determining that the list to be stored is a list without a header;
and when the value of the target character string is not null, determining that the list to be stored is a list with a header.
4. The data storage method of claim 1, wherein the data storage method further comprises:
responding to failure of matching of the item data in the list to be stored and the data in the data dictionary, and inputting the item data in the list to be stored into a pre-trained item type recognition model;
identifying the item type corresponding to the item data through the item type identification model;
and storing the item types and the corresponding item data in the relational database.
5. The data storage method of claim 1, wherein the training process of the item type recognition model comprises:
acquiring a plurality of item types and item data;
constructing a plurality of data pairs, wherein each data pair comprises an item type and corresponding item data;
and training a convolutional neural network based on the plurality of data pairs to obtain the item type recognition model.
6. The data storage method according to any one of claims 1 to 5, wherein when the list to be stored is identified as a list with a header, the data storage method further comprises:
acquiring the value of the target character string in the list with the header;
searching a second key field which has the same value as the target character string in the relational database;
and storing the item data in the list to be stored in a data column of the relational database corresponding to the second key field.
7. The data storage method of claim 6, wherein the data storage method further comprises:
acquiring first item data corresponding to a preset first character string and second item data corresponding to a preset second character string in the list to be stored;
deleting a preset identifier in the first item data;
performing numerical value conversion on the second item data;
determining the maximum value in the second item data subjected to numerical value conversion as a designated field, and processing the rest values according to a preset format;
and storing the processed list to be stored as a target list.
8. A data storage device, operable on a computer device or for use in a blockchain, the data storage device comprising:
the list acquisition module is used for acquiring a list to be stored;
the list head identification module is used for identifying whether the list to be stored is a list with a list head;
the dictionary obtaining module is used for obtaining a data dictionary in a Mongo database when the list to be stored is identified to be a list without a header, wherein the value of a key in the data dictionary is the same as the value of a key field in a relational database;
the data matching module is used for matching the item data in the list to be stored with the data in the data dictionary;
the key value acquisition module is used for responding to the successful matching of the item data in the list to be stored and the data in the data dictionary and acquiring a target key corresponding to the item data successfully matched in the data dictionary and the value of the target key;
a field determination module, configured to determine a first key field in the relational database that has a same value as the target key;
and the data storage module is used for storing the item data in the list to be stored in a data column corresponding to the first key field in the relational database, wherein the relational database is positioned in a block chain, and the data block chain storage is realized.
9. A computer device comprising a processor for implementing a data storage method as claimed in any one of claims 1 to 7 when executing a computer program stored in a memory.
10. A computer-readable storage medium, comprising a stored data area storing data created according to use of a blockchain node and a stored program area storing a computer program which, when executed by a processor, implements a data storage method according to any one of claims 1 to 7.
CN202010358876.9A 2020-04-29 2020-04-29 Data storage method and device, computer equipment and storage medium Pending CN111651452A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010358876.9A CN111651452A (en) 2020-04-29 2020-04-29 Data storage method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010358876.9A CN111651452A (en) 2020-04-29 2020-04-29 Data storage method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111651452A true CN111651452A (en) 2020-09-11

Family

ID=72346587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010358876.9A Pending CN111651452A (en) 2020-04-29 2020-04-29 Data storage method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111651452A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434024A (en) * 2020-10-23 2021-03-02 杭州未名信科科技有限公司 Relational database-oriented data dictionary generation method, device, equipment and medium
CN112434024B (en) * 2020-10-23 2024-04-02 杭州未名信科科技有限公司 Relational database-oriented data dictionary generation method, device, equipment and medium
CN113627892A (en) * 2021-08-16 2021-11-09 深圳市云采网络科技有限公司 BOM data identification method and electronic equipment thereof
CN113627892B (en) * 2021-08-16 2023-09-01 深圳市云采网络科技有限公司 BOM data identification method and electronic equipment thereof

CN114595321A (en) Question marking method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination