CN111177387A - User list information processing method, electronic device and computer readable storage medium - Google Patents

User list information processing method, electronic device and computer readable storage medium

Info

Publication number
CN111177387A
Authority
CN
China
Prior art keywords
classification
text string
user list
information
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911388144.8A
Other languages
Chinese (zh)
Inventor
黄忆丁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN201911388144.8A
Publication of CN111177387A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of intelligent decision-making and discloses a user list information processing method comprising: acquiring a user list and performing character recognition on it to obtain text string information to be classified; inputting the text string information to be classified into a classification model and outputting a plurality of classification groups, each comprising a field name and a corresponding text string; substituting the field name and the text string of each classification group into a cosine similarity formula to calculate a cosine similarity value; and, when the calculated value exceeds a preset threshold, judging that the field name and the text string in the classification group are correctly matched, thereby determining all classification group information of the user list. The invention also provides an electronic device and a computer-readable storage medium. By using the classification model to automatically identify and classify the classification groups of the user list, and by calculating the cosine similarity of each group to confirm how well its text string matches its field name, the invention improves the accuracy of user list information classification.

Description

User list information processing method, electronic device and computer readable storage medium
Technical Field
The present invention relates to the field of intelligent decision making technologies, and in particular, to a user list information processing method, an electronic apparatus, and a computer-readable storage medium.
Background
A user list is a card or roster that presents user information, and collecting and organizing user list information helps enterprises meet their business needs.
One existing way to acquire user list information is for staff to screen and classify the content of the user list and enter the information item by item; manual labeling and classification are prone to omissions and errors, which lead to mismatched information. Another way is to scan the user list, extract the text, enumerate keywords exhaustively, and then label and classify the text; because the text may not be understood correctly and the keywords cannot be fully enumerated, some text cannot be classified, and the user list information can be matched correctly only with additional manual labeling and classification.
Therefore, the existing ways of classifying user list information still depend on manual operation, and their automation efficiency is low.
Disclosure of Invention
In view of the above, there is a need to provide a method for processing user list information, which aims to solve the problems of low automation efficiency and low accuracy of user list information classification.
The user list information processing method provided by the invention comprises the following steps:
an acquisition step: acquiring a user list and performing character recognition to obtain text string information to be classified, wherein the text string information to be classified comprises a plurality of field names and character strings;
a classification step: inputting the obtained text string information to be classified into a pre-trained classification model, and outputting a plurality of classification groups of the text string information to be classified, wherein each classification group comprises a field name and a text string representing the field name;
a confirmation step: substituting the field name and the text string of each classification group into a cosine similarity formula, calculating the cosine similarity value of the field name and the text string of each classification group, judging that the field name and the text string in a classification group are correctly matched when the calculated cosine similarity value exceeds a preset threshold value, and determining all classification group information of the user list.
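The three steps above can be sketched as a small Python pipeline. This is only an illustrative skeleton, not the patented implementation: the names `ClassificationGroup`, `process_user_list`, `recognize`, `classify` and `similarity` are hypothetical, the character recognizer, classification model and similarity function are passed in as plain callables, and the default threshold of 0.98 is the example value the description gives for the confirmation step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClassificationGroup:
    """One field name paired with the text string assigned to it."""
    field_name: str
    text_string: str


def process_user_list(
    image: bytes,
    recognize: Callable[[bytes], List[str]],
    classify: Callable[[List[str]], List[ClassificationGroup]],
    similarity: Callable[[str, str], float],
    threshold: float = 0.98,
) -> List[ClassificationGroup]:
    """Run the acquisition, classification and confirmation steps in order."""
    # Acquisition step: character recognition yields the strings to classify.
    strings = recognize(image)
    # Classification step: the (pre-trained) model pairs field names and text strings.
    groups = classify(strings)
    # Confirmation step: keep only groups whose similarity exceeds the threshold.
    return [
        g for g in groups
        if similarity(g.field_name, g.text_string) > threshold
    ]
```

Passing the three stages in as callables keeps the sketch independent of any particular OCR engine or model; how groups that fail the check are handled is not stated in the source, so this sketch simply leaves them out of the result.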
Optionally, the acquisition step further includes:
when the user list contains data of a plurality of users, performing character recognition on each user's data separately to obtain the text string information to be classified for each user.
Optionally, the training process of the classification model includes:
acquiring sample text string information;
classifying sample field names and sample text strings according to the sample text string information, and labeling the sample text strings corresponding to the sample field names with attributes; and
training a classification model on the attribute-labeled sample text strings corresponding to the sample field names to obtain a plurality of classification groups of the sample text string information, wherein each classification group comprises a sample field name and the sample text string corresponding to that sample field name.
Optionally, the cosine similarity formula is:
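$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
where A denotes the vector of the text string of a classification group, B denotes the vector of the field name of that classification group, $A_i$ denotes each component of the vector A, $B_i$ denotes each component of the vector B, and n is the number of components.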
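As a minimal sketch, the cosine similarity formula can be written directly in Python. The function name `cosine_similarity` and the choice to raise an error for zero or mismatched vectors are assumptions made here for illustration; the source does not say how such cases are handled.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a (text string) and b (field name)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    # Numerator: sum of A_i * B_i over all components.
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: product of the two Euclidean norms.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as an error, since the ratio is undefined.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

The result depends only on the direction of the two vectors, not on their lengths, so scaling either vector by a positive factor leaves the value unchanged.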
Optionally, the method further includes, after the confirming step:
a generation step: according to the determined classification group information of the user list, entering the field names and the text strings of the classification groups of the user list into a list system, and generating a list information table corresponding to the user list.
In addition, to achieve the above object, the present invention also provides an electronic device comprising a memory and a processor, wherein the memory stores a user list information processing program executable on the processor, and the user list information processing program, when executed by the processor, implements the following steps:
an acquisition step: acquiring a user list and performing character recognition to obtain text string information to be classified, wherein the text string information to be classified comprises a plurality of field names and character strings;
a classification step: inputting the obtained text string information to be classified into a pre-trained classification model, and outputting a plurality of classification groups of the text string information to be classified, wherein each classification group comprises a field name and a text string representing the field name;
a confirmation step: substituting the field name and the text string of each classification group into a cosine similarity formula, calculating the cosine similarity value of the field name and the text string of each classification group, judging that the field name and the text string in a classification group are correctly matched when the calculated cosine similarity value exceeds a preset threshold value, and determining all classification group information of the user list.
Optionally, the acquisition step further includes:
when the user list contains data of a plurality of users, performing character recognition on each user's data separately to obtain the text string information to be classified for each user.
Optionally, the training process of the classification model includes:
acquiring sample text string information;
classifying sample field names and sample text strings according to the sample text string information, and labeling the sample text strings corresponding to the sample field names with attributes; and
training a classification model on the attribute-labeled sample text strings corresponding to the sample field names to obtain a plurality of classification groups of the sample text string information, wherein each classification group comprises a sample field name and the sample text string corresponding to that sample field name.
Optionally, the cosine similarity formula is:
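$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
where A denotes the vector of the text string of a classification group, B denotes the vector of the field name of that classification group, $A_i$ denotes each component of the vector A, $B_i$ denotes each component of the vector B, and n is the number of components.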
In addition, to achieve the above object, the present invention also provides a computer readable storage medium, on which a user list information processing program is stored, where the user list information processing program is executable by one or more processors to implement the steps of the user list information processing method.
Compared with the prior art, the invention acquires a user list and performs character recognition to obtain text string information to be classified; inputs that information into a pre-trained classification model, which outputs a plurality of classification groups, each comprising a field name and the text string corresponding to it; substitutes the field name and the text string of each classification group into a cosine similarity formula to calculate their cosine similarity value; and, when the calculated value exceeds a preset threshold value, judges that the field name and the text string in the classification group are correctly matched, thereby determining all classification group information of the user list. Because the classification model automatically identifies and classifies the classification groups of the user list, and the cosine similarity calculation confirms how well the text string in each group matches its field name, the accuracy of user list information classification is improved.
Drawings
FIG. 1 is a diagram of an electronic device according to an embodiment of the invention;
FIG. 2 is a block diagram of an embodiment of the user list information processing program in FIG. 1;
FIG. 3 is a flowchart of an embodiment of the user list information processing method according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the description relating to "first", "second", etc. in the present invention is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
FIG. 1 is a schematic diagram of an electronic device 1 according to an embodiment of the invention. The electronic device 1 is a device capable of automatically performing numerical calculation and/or information processing according to instructions that are preset or stored in advance. The electronic device 1 may be a computer, a single network server, a server group composed of a plurality of network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing in which a group of loosely coupled computers acts as one super virtual computer.
In the present embodiment, the electronic device 1 includes, but is not limited to, a memory 11, a processor 12, and a network interface 13, which are communicatively connected to each other through a system bus, wherein the memory 11 stores a user list information processing program 10, and the processor 12 can execute the user list information processing program 10. Fig. 1 shows only the electronic apparatus 1 having the components 11 to 13 and the user list information processing program 10, and it will be understood by those skilled in the art that the structure shown in fig. 1 does not constitute a limitation of the electronic apparatus 1, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
The memory 11 includes an internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the electronic device 1; the readable storage medium may be a storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1; in other embodiments, the readable storage medium may be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 1. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and the various application software installed on the electronic device 1, for example the code of the user list information processing program 10 in an embodiment of the present invention. The memory 11 may also be used to temporarily store various types of data that have been output or are to be output.
Processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 12 is generally used for controlling the overall operation of the electronic apparatus 1, such as performing control and processing related to data interaction or communication with other devices. In this embodiment, the processor 12 is configured to run the program code stored in the memory 11 or process data, for example, run the user list information processing program 10.
The network interface 13 may comprise a wireless network interface or a wired network interface, and the network interface 13 is used for establishing a communication connection between the electronic device 1 and a client (not shown).
Optionally, the electronic device 1 may further include a user interface, which may include a display and an input unit such as a keyboard, and may further include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used to display information processed in the electronic device 1 and to display a visualized user interface.
In an embodiment of the present invention, the user list information processing program 10, when executed by the processor 12, implements the following acquisition, classification and confirmation steps.
An acquisition step: acquiring a user list and performing character recognition to obtain text string information to be classified, wherein the text string information to be classified comprises a plurality of field names and character strings.
Character recognition here means optical character recognition (OCR), a technology for processing text information in which text material is scanned into an image file and the image file is analyzed and processed to obtain the characters and layout information.
In this embodiment, the user list may be a personal business card, a company business card, an address book, or a list of names.
In one embodiment, a personal business card is obtained and scanned or photographed for character recognition, and the text string information to be classified contained in the personal business card is obtained, wherein the text string information to be classified comprises a plurality of field names (such as name, position, address and the like) and text strings (such as Li Jiahao, chief financial officer, Jiangsu Province xxx and the like).
In an embodiment of the present invention, the acquiring step further includes:
when the user list contains data of a plurality of users, performing character recognition on each user's data separately to obtain the text string information to be classified for each user.
A classification step: inputting the obtained text string information to be classified into a pre-trained classification model, and outputting a plurality of classification groups of the text string information to be classified, wherein each classification group comprises a field name and a text string representing the field name.
In this embodiment, a classification model is used to classify the field names and text strings in the text string information to be classified. The classification model is a neural network model trained in advance; it automatically identifies the field names in the text string information to be classified and the text string representing each field name, and takes each field name together with its text string as one classification group.
For example, in one embodiment, the text string information to be classified obtained by character recognition of a personal business card (e.g., name, position, address, telephone number, business scope, Li Jiahao, chief financial officer, xxx, Nantong City, Jiangsu Province, 130xxx, dealing in electronic and electrical products, etc.) is input into the pre-trained classification model, which outputs a plurality of classification groups of the personal business card, each comprising a field name and a text string representing the field name. For example, the first classification group: the field name "name" corresponds to the text string "Li Jiahao"; the second classification group: the field name "position" corresponds to the text string "chief financial officer"; the third classification group: the field name "business scope" corresponds to the text string "dealing in electronic and electrical products"; and so on.
Wherein the training process of the classification model comprises the following steps:
acquiring sample text string information;
classifying sample field names and sample text strings according to the sample text string information, and labeling the sample text strings corresponding to the sample field names with attributes; and
training a classification model on the attribute-labeled sample text strings corresponding to the sample field names to obtain a plurality of classification groups of the sample text string information, wherein each classification group comprises a sample field name and the sample text string corresponding to that sample field name.
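The data preparation part of this training process can be sketched as follows. This covers only the labeling of sample text strings with their sample field names and their arrangement into training pairs; it does not train the neural network itself. The function names `label_samples` and `to_training_pairs`, the normalization of field names, and the sample values used below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def label_samples(annotated: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group sample text strings under their annotated sample field name."""
    labeled: Dict[str, List[str]] = defaultdict(list)
    for field_name, text_string in annotated:
        # Assumption: field names are normalized so "Name " and "name" share a label.
        labeled[field_name.strip().lower()].append(text_string.strip())
    return dict(labeled)


def to_training_pairs(labeled: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Flatten the labeled samples into (text string, field name) pairs."""
    return [
        (text, field)
        for field, texts in sorted(labeled.items())
        for text in texts
    ]
```

The resulting pairs are the kind of input a supervised classifier would be fitted on, with the text string as the example and the field name as its label.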
The pre-trained classification model is a deep learning model based on a neural network. Deep learning (DL) learns the inherent regularities of sample data, gives the model analytical learning ability, and can recognize data such as text, images and sounds. In this embodiment, the classification model is trained by deep learning on the sample field names and sample text strings in the sample text string information, so that it automatically recognizes and classifies each group of sample field name and sample text string. Compared with the prior art, in which keywords are difficult to enumerate exhaustively and manual labeling causes classification errors, this achieves intelligent classification, shortens the process, and improves the efficiency of user list classification.
A confirmation step: substituting the field name and the text string of each classification group into a cosine similarity formula, calculating the cosine similarity value of the field name and the text string of each classification group, judging that the field name and the text string in a classification group are correctly matched when the calculated cosine similarity value exceeds a preset threshold value, and determining all classification group information of the user list.
In this embodiment, although the classification model can identify and classify the classification groups in the text string information to be classified, the model itself cannot confirm whether the field name in each classification group is correct or how well the corresponding text string matches it. To overcome this defect and improve the accuracy of the text string assigned to each field name, a cosine similarity formula is introduced: the field name and the text string of each classification group are substituted into the formula, the cosine similarity value between them is calculated, and when the calculated value exceeds a preset threshold (for example, 0.98), the field name and the text string in the classification group are judged to be correctly matched, thereby determining all classification group information of the user list.
The cosine similarity formula is as follows:
$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
where A denotes the vector of the text string of a classification group, B denotes the vector of the field name of that classification group, $A_i$ denotes each component of the vector A, $B_i$ denotes each component of the vector B, and n is the number of components. For example, in one embodiment, the field name and the text string of each classification group (e.g., the first classification group: the field name "name" corresponds to the text string "Li Jiahao"; the second classification group: the field name "position" corresponds to the text string "chief financial officer"; the third classification group: the field name "business scope" corresponds to the text string "dealing in electronic and electrical products"; etc.) are converted into vectors, and the vectors of each classification group are substituted into the cosine similarity formula to obtain the cosine similarity value of the field name and the text string of that group. For example, in the first classification group, A is the vector of the text string "Li Jiahao" and B is the vector of the field name "name"; in the second classification group, A is the vector of the text string "chief financial officer" and B is the vector of the field name "position"; in the third classification group, A is the vector of the text string "dealing in electronic and electrical products" and B is the vector of the field name "business scope"; and so on.
To calculate the cosine similarity, the field name and the text string of each classification group are first converted into vectors that express the word lengths of the field name and the text string, and the cosine similarity value of the field name and the text string of each classification group is then calculated with the cosine similarity formula. A cosine similarity value of -1 indicates that the two vectors point in opposite directions and do not match; a value of 1 indicates that the two vectors point in exactly the same direction and match; a value of 0 indicates that the two vectors are uncorrelated, that is, independent of each other.
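The three boundary values can be checked by hand with small two-component vectors. The vectors below are chosen purely for illustration and do not come from the source.

```latex
% Same direction: A = (1, 2), B = (2, 4)
\cos\theta = \frac{1\cdot 2 + 2\cdot 4}{\sqrt{1^2+2^2}\,\sqrt{2^2+4^2}}
           = \frac{10}{\sqrt{5}\,\sqrt{20}} = \frac{10}{10} = 1

% Orthogonal (uncorrelated): A = (1, 0), B = (0, 1)
\cos\theta = \frac{1\cdot 0 + 0\cdot 1}{\sqrt{1^2+0^2}\,\sqrt{0^2+1^2}}
           = \frac{0}{1} = 0

% Opposite direction: A = (1, 2), B = (-1, -2)
\cos\theta = \frac{1\cdot(-1) + 2\cdot(-2)}{\sqrt{1^2+2^2}\,\sqrt{(-1)^2+(-2)^2}}
           = \frac{-5}{\sqrt{5}\,\sqrt{5}} = -1
```

In the first case B is simply A scaled by 2, which shows that the value depends on direction only and not on vector length.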
In another embodiment of the present invention, the user list information processing program 10 when executed by the processor 12 further implements the following steps after the confirming step:
a generation step: according to the determined classification group information of the user list, entering the field names and the text strings of the classification groups of the user list into a list system, and generating a list information table corresponding to the user list.
In this embodiment, the list system is a standardized user list management system that manages the field names and text strings entered into it in a standardized, unified way and generates the list information table corresponding to the user list. The list system may also hold a list book composed of a plurality of list information tables, so that a user can quickly query and read it.
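One way to picture the generation step is to write the confirmed classification groups of several users into a single table with one column per field name. The sketch below uses CSV as a stand-in for the list system, which the source does not describe in detail; `build_list_table` and the sample rows in the usage note are hypothetical.

```python
import csv
import io
from typing import List, Tuple

Group = Tuple[str, str]  # (field name, text string)


def build_list_table(users: List[List[Group]]) -> str:
    """Render each user's classification groups as one row of a CSV table."""
    # Collect field names in first-seen order so the columns are stable.
    columns: List[str] = []
    for groups in users:
        for field_name, _ in groups:
            if field_name not in columns:
                columns.append(field_name)

    buffer = io.StringIO()
    # restval="" leaves a cell empty when a user has no value for that field.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    for groups in users:
        writer.writerow(dict(groups))
    return buffer.getvalue()
```

For example, calling `build_list_table` with one user holding a name and a position and a second user holding a name and an address produces a three-column table in which the missing cells are left empty.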
As can be seen from the foregoing embodiment, in the electronic device 1 provided by the present invention, firstly, a user list is obtained to perform character recognition to obtain text string information to be classified; then, inputting the information of the text string to be classified into a pre-trained classification model, and outputting a plurality of classification groups of the information of the text string to be classified, wherein each classification group comprises a field name and a text string corresponding to the field name; and finally, substituting the field names and the text strings in each classification group into a cosine similarity formula, calculating the cosine similarity value of the field names and the text strings of each classification group, judging that the field names and the text strings in the classification groups are correctly matched when the calculated cosine similarity value exceeds a preset threshold value, and determining all classification group information of the user list. According to the electronic device 1, the classification model is adopted to automatically identify and classify the plurality of classification groups of the user list, and cosine similarity calculation is carried out on each classification group to confirm the matching degree of text strings and field names in the classification groups, so that the accuracy of user list information classification is improved.
In other embodiments, the user list information processing program 10 may be further divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to implement the present invention, where the modules referred to in the present invention refer to a series of computer program instruction segments capable of performing specific functions to describe the execution process of the user list information processing program 10 in the electronic device 1.
Fig. 2 is a block diagram of an embodiment of the user list information processing program 10 in fig. 1.
In an embodiment of the present invention, the user list information processing program 10 includes an obtaining module 110, a classifying module 120, and a confirming module 130. For example:
The obtaining module 110 is configured to obtain a user list and perform character recognition to obtain text string information to be classified, where the text string information to be classified includes a plurality of field names and character strings.
The classification module 120 is configured to input the obtained text string information to be classified into a pre-trained classification model, and output a plurality of classification groups of the text string information to be classified, where each classification group includes a field name and a text string representing the field name.
The determining module 130 is configured to substitute the field names and the text strings in each of the classification groups into a cosine similarity formula, calculate a cosine similarity value between the field names and the text strings in each of the classification groups, and when the calculated cosine similarity value exceeds a preset threshold, determine that the field names and the text strings in the classification groups are correctly matched, and determine all classification group information of the user list.
The functions or operation steps of the obtaining module 110, the classifying module 120, and the confirming module 130 are substantially the same as those of the above embodiments, and are not described herein again.
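As an illustration only, the three-module division described above might be wired together as in the following sketch. The class name `UserListProcessor`, the `Group` alias, and the signatures of the three callables are hypothetical; the patent does not prescribe any of them.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A classification group as a (field name, text string) pair.
# This representation is an assumption made for the sketch.
Group = Tuple[str, str]


@dataclass
class UserListProcessor:
    """Hypothetical wiring of modules 110, 120 and 130 as injected callables."""

    obtain: Callable[[bytes], List[str]]           # module 110: image -> text strings
    classify: Callable[[List[str]], List[Group]]   # module 120: text strings -> groups
    confirm: Callable[[List[Group]], List[Group]]  # module 130: keep matching groups

    def process(self, image: bytes) -> List[Group]:
        # Run the three modules in the order the description gives them.
        strings = self.obtain(image)
        groups = self.classify(strings)
        return self.confirm(groups)
```

Passing the modules in as callables keeps the sketch independent of any particular OCR engine or model.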
Referring to FIG. 3, a flowchart of an embodiment of a user list information processing method according to the present invention is shown, and the user list information processing method includes steps S1-S3.
S1, obtaining a user list and carrying out character recognition to obtain text string information to be classified, wherein the text string information to be classified comprises a plurality of field names and character strings.
Character recognition here refers to optical character recognition (OCR), a text-processing technology in which text material is scanned into an image file and the image file is then analyzed and processed to obtain the characters and layout information.
In this embodiment, the user list may be a personal business card, a company business card, an address book, or a list of names.
In one embodiment, a personal business card is obtained and scanned or photographed for character recognition, and the text string information to be classified contained in the personal business card is obtained, wherein the text string information to be classified comprises a plurality of field names (such as name, position, address and the like) and text strings (such as Lianhao, finance director, Jiangsu province xxx and the like).
In an embodiment of the present invention, the step S1 further includes:
When the user list contains data for a plurality of users, character recognition is performed on each user's data separately to obtain the text string information to be classified for each user.
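A minimal sketch of this per-user recognition follows. The function `recognize_each_user` and the injected `recognize` callable are hypothetical; `recognize` stands in for whatever OCR engine is actually used, which the patent does not name.

```python
from typing import Callable, Dict, List


def recognize_each_user(
    user_images: Dict[str, bytes],
    recognize: Callable[[bytes], List[str]],
) -> Dict[str, List[str]]:
    """Run character recognition separately on each user's data.

    `recognize` is a hypothetical stand-in for an OCR engine: it takes raw
    image bytes and returns the recognized text strings.
    """
    return {user_id: recognize(image) for user_id, image in user_images.items()}
```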
And S2, inputting the obtained text string information to be classified into a pre-trained classification model, and outputting a plurality of classification groups of the text string information to be classified, wherein each classification group comprises a field name and a text string representing the field name.
In this embodiment, a classification model is used to classify the field names and text strings in the text string information to be classified. The classification model is a neural network model trained in advance; it can automatically identify the field names in the text string information to be classified and the text strings representing those field names, and it treats each field name together with its text string as one classification group.
For example, in one embodiment, the text string information to be classified that is obtained by character recognition of a personal business card (e.g., name, position, address, telephone number, business scope, Liaohao, finance director, Jiangsu province Nantong city xxx, 130xxx, business electronics, etc.) is input into the pre-trained classification model, which outputs a plurality of classification groups of the personal business card, each comprising a field name and a text string representing the field name. For example, the first classification group: the field name "name" corresponds to the text string "Liyihao"; the second classification group: the field name "position" corresponds to the text string "finance director"; the third classification group: the field name "business scope" corresponds to the text string "operating electronic and electric products"; and so on.
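The classification groups in the example above could be represented as in the following sketch. `ClassificationGroup`, `build_groups`, and the injected `predict_field` callable are hypothetical names; `predict_field` merely stands in for the pre-trained model, whose interface the patent does not specify.

```python
from typing import Callable, List, NamedTuple, Set


class ClassificationGroup(NamedTuple):
    field_name: str
    text_string: str


def build_groups(
    strings: List[str],
    field_names: Set[str],
    predict_field: Callable[[str], str],
) -> List[ClassificationGroup]:
    """Pair each value string with the field name predicted for it.

    Strings that are themselves field names are skipped; `predict_field`
    stands in for the pre-trained model and is purely hypothetical.
    """
    return [
        ClassificationGroup(predict_field(s), s)
        for s in strings
        if s not in field_names
    ]
```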
Wherein the training process of the classification model comprises the following steps:
acquiring sample text string information;
classifying sample field names and sample text strings according to the sample text string information, and labeling the sample text strings corresponding to the sample field names with attributes; and
training a classification model according to the attribute-labeled sample text strings corresponding to the sample field names, to obtain a plurality of classification groups of the sample text string information, wherein each classification group comprises a sample field name and a sample text string corresponding to the sample field name.
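The labeling and training steps above can be illustrated with a deliberately small stand-in. The patent's model is a neural network; the sketch below swaps in a character-bigram naive Bayes classifier purely to show the shape of the labeled data and the train/predict cycle. All function names and the sample data are assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def char_bigrams(text: str) -> List[str]:
    # Pad with spaces so the first and last characters also form bigrams.
    padded = f" {text.lower()} "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def train(samples: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """samples: (sample field name, sample text string) pairs, i.e. the labels."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for field_name, text in samples:
        counts[field_name].update(char_bigrams(text))
    return dict(counts)


def predict(model: Dict[str, Counter], text: str) -> str:
    """Return the field name whose bigram statistics best explain `text`."""
    vocab = {gram for counter in model.values() for gram in counter}

    def score(field_name: str) -> float:
        counter = model[field_name]
        # Laplace smoothing; the extra +1 reserves mass for unseen bigrams.
        total = sum(counter.values()) + len(vocab) + 1
        return sum(math.log((counter[g] + 1) / total) for g in char_bigrams(text))

    return max(model, key=score)
```

For instance, after training on a few labeled telephone numbers and job titles, `predict` assigns a new digit string to the telephone field because it shares bigrams with the telephone samples.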
The pre-trained classification model is a deep learning model based on a neural network. Deep learning (DL) learns the inherent regularities of sample data, gives the model an analytical learning capability, and enables it to recognize data such as text, images, and sound. In this embodiment, the classification model is trained by deep learning on the sample field names and sample text strings in the sample text string information, so that it can automatically recognize and classify each group of sample field names and sample text strings. In the prior art, keywords are difficult to enumerate exhaustively and manual labeling leads to information classification errors; by contrast, this approach achieves intelligent classification, shortens the process, and improves the efficiency of user list classification.
And S3, substituting the field names and the text strings in each classification group into a cosine similarity formula, calculating the cosine similarity value between the field names and the text strings of each classification group, and when the cosine similarity value obtained by calculation exceeds a preset threshold value, judging that the field names and the text strings in the classification groups are correctly matched, and determining all classification group information of the user list.
In this embodiment, although the classification model can identify and classify the classification groups in the text string information to be classified, the model itself cannot confirm whether the field name in each classification group is correct or how well it matches its corresponding text string. To overcome this limitation and improve the accuracy of the text string assigned to each field name, a cosine similarity formula is introduced. The field name and the text string in each classification group are substituted into the formula, and the cosine similarity value between them is calculated. When the calculated value exceeds a preset threshold (for example, 0.98), the field name and the text string in that classification group are judged to be correctly matched, and all classification group information of the user list is thereby determined.
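The threshold check described above might look like the following sketch. `confirm_groups` and the injected `similarity` callable are hypothetical, and returning the rejected groups as well is an assumption of the sketch: the patent does not say what happens to groups that fall below the threshold.

```python
from typing import Callable, List, Tuple

Group = Tuple[str, str]  # (field name, text string); representation assumed


def confirm_groups(
    groups: List[Group],
    similarity: Callable[[str, str], float],
    threshold: float = 0.98,
) -> Tuple[List[Group], List[Group]]:
    """Split groups into (confirmed, rejected) by a similarity threshold.

    `similarity(field_name, text)` is a hypothetical callable that returns
    the cosine similarity value for one classification group.
    """
    confirmed: List[Group] = []
    rejected: List[Group] = []
    for field_name, text in groups:
        # "Exceeds" is read as strictly greater than the threshold.
        target = confirmed if similarity(field_name, text) > threshold else rejected
        target.append((field_name, text))
    return confirmed, rejected
```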
The cosine similarity formula is as follows:
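$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert\,\lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$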
Here, A denotes the vector of the text string of one classification group, B denotes the vector of the field name of that classification group, Ai denotes the components of the vector A, and Bi denotes the components of the vector B. For example, in one embodiment, the field names and the text strings in each of the classification groups (e.g., a first classification group: the field name "name" corresponds to the text string "Liaohao"; a second classification group: the field name "position" corresponds to the text string "finance director"; a third classification group: the field name "business scope" corresponds to the text string "business electronics"; etc.) are converted into vectors, and the vectors corresponding to each classification group are substituted into the cosine similarity formula to obtain the cosine similarity value of the field name and the text string of each classification group. For example, for the first classification group, A represents the vector of the text string "Lijiahao" and B represents the vector of the field name "name"; for the second classification group, A represents the vector of the text string "finance director" and B represents the vector of the field name "position"; for the third classification group, A represents the vector of the text string "operating electronic and electric products" and B represents the vector of the field name "business scope"; and so on.
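The cosine similarity of two vectors A and B as defined above can be computed as in this sketch. The function name and the decision to raise an error for zero or mismatched vectors are choices of the sketch, not something the patent states; how the field names and text strings are turned into vectors is left outside the function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return sum(Ai * Bi) / (sqrt(sum(Ai^2)) * sqrt(sum(Bi^2)))."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The formula divides by the norms, so a zero vector has no defined value.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```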
When the cosine similarity is calculated, the field name and the text string in each classification group are first converted into vectors that represent the word lengths of the field name and the text string, and the cosine similarity value of the field name and the text string of each classification group is then calculated. A cosine similarity value of -1 indicates that the two vectors point in opposite directions and do not match; a value of 1 indicates that the two vectors point in exactly the same direction and match; a value of 0 indicates that the two vectors are uncorrelated and independent of each other.
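As a worked illustration of these three boundary values, consider the following two-component vectors, which are chosen arbitrarily and are not taken from the patent:

$$
\begin{aligned}
A=(1,2),\ B=(2,4):&\quad \cos\theta=\frac{1\cdot 2+2\cdot 4}{\sqrt{1^{2}+2^{2}}\,\sqrt{2^{2}+4^{2}}}=\frac{10}{\sqrt{5}\,\sqrt{20}}=1\\
A=(1,0),\ B=(0,1):&\quad \cos\theta=\frac{1\cdot 0+0\cdot 1}{\sqrt{1}\,\sqrt{1}}=0\\
A=(1,2),\ B=(-1,-2):&\quad \cos\theta=\frac{1\cdot(-1)+2\cdot(-2)}{\sqrt{5}\,\sqrt{5}}=\frac{-5}{5}=-1
\end{aligned}
$$

With a preset threshold such as 0.98, only vector pairs that are nearly parallel, as in the first case, would be judged correctly matched.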
In another embodiment of the present invention, after step S3, the method for processing user list information further includes:
a generation step: according to the determined classification group information of the user list, entering the field names and the text strings in the classification groups of the user list into a list system, and generating a list information table corresponding to the user list.
In this embodiment, the list system is a standardized user list management system that manages the field names and text strings entered into it in a standardized, unified way and generates a list information table corresponding to the user list. The list system may also be a list book composed of a plurality of list information tables, so that a user can quickly query and read it.
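One possible form of the list information table is sketched below as CSV text with one row per user and one column per field name. The patent does not define the list system's storage format, so the CSV layout, the `user` column, and the function name `build_list_table` are all assumptions.

```python
import csv
import io
from typing import Dict, List, Tuple

Group = Tuple[str, str]  # (field name, text string); representation assumed


def build_list_table(users: Dict[str, List[Group]]) -> str:
    """Render confirmed groups as CSV: one row per user, one column per field name."""
    # Collect column names in first-seen order across all users.
    columns: List[str] = []
    for groups in users.values():
        for field_name, _ in groups:
            if field_name not in columns:
                columns.append(field_name)

    buffer = io.StringIO()
    # restval="" leaves a cell empty when a user lacks that field.
    writer = csv.DictWriter(buffer, fieldnames=["user"] + columns, restval="")
    writer.writeheader()
    for user, groups in users.items():
        writer.writerow({"user": user, **dict(groups)})
    return buffer.getvalue()
```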
As the above embodiment shows, the user list information processing method provided by the present invention first obtains a user list and performs character recognition on it to obtain the text string information to be classified. It then inputs the text string information to be classified into a pre-trained classification model, which outputs a plurality of classification groups, each comprising a field name and the text string corresponding to that field name. Finally, it substitutes the field name and the text string of each classification group into a cosine similarity formula, calculates the cosine similarity value between them, judges that the field name and the text string are correctly matched when the calculated value exceeds a preset threshold, and thereby determines all classification group information of the user list. By using the classification model to automatically identify and classify the classification groups of the user list, and by performing the cosine similarity calculation on each group to confirm how well the text string matches the field name, the method improves the accuracy of user list information classification.
In addition, the embodiment of the present invention further provides a computer-readable storage medium, which may be any one of or any combination of a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), a USB memory, and the like. The computer readable storage medium includes a user list information processing program 10, and when executed by a processor, the user list information processing program 10 implements the following operations:
A1, acquiring a user list and performing character recognition to obtain text string information to be classified, wherein the text string information to be classified comprises a plurality of field names and character strings;
A2, inputting the obtained text string information to be classified into a pre-trained classification model, and outputting a plurality of classification groups of the text string information to be classified, wherein each classification group comprises a field name and a text string representing the field name;
A3, substituting the field names and the text strings in each classification group into a cosine similarity formula, calculating the cosine similarity value of the field names and the text strings of each classification group, and when the calculated cosine similarity value exceeds a preset threshold value, judging that the field names and the text strings in the classification groups are correctly matched, and determining all classification group information of the user list.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the specific implementation of the user list information processing method and the electronic device, and will not be described herein again.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A user list information processing method, applied to an electronic device, characterized by comprising the following steps:
an acquisition step: acquiring a user list and performing character recognition to obtain text string information to be classified, wherein the text string information to be classified comprises a plurality of field names and character strings;
a classification step: inputting the obtained text string information to be classified into a pre-trained classification model, and outputting a plurality of classification groups of the text string information to be classified, wherein each classification group comprises a field name and a text string representing the field name;
a confirmation step: substituting the field names and the text strings in each classification group into a cosine similarity formula, calculating the cosine similarity value of the field names and the text strings of each classification group, judging that the field names and the text strings in the classification groups are correctly matched when the calculated cosine similarity value exceeds a preset threshold value, and determining all classification group information of the user list.
2. The method for processing the user list information according to claim 1, wherein the obtaining step further comprises:
when the user list contains data for a plurality of users, performing character recognition on each user's data separately to obtain the text string information to be classified for each user.
3. The user list information processing method according to claim 1, wherein the training process of the classification model comprises:
acquiring sample text string information;
classifying sample field names and sample text strings according to the sample text string information, and labeling the sample text strings corresponding to the sample field names with attributes; and
training a classification model according to the attribute-labeled sample text strings corresponding to the sample field names, to obtain a plurality of classification groups of the sample text string information, wherein each classification group comprises a sample field name and a sample text string corresponding to the sample field name.
4. The method of claim 1, wherein the cosine similarity formula is:
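$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert\,\lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$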
where a denotes a vector of a text string of a classification group, B denotes a vector of a field name of the classification group, Ai denotes each component of the vector a, and Bi denotes each component of the vector B.
5. The method according to any one of claims 1 to 4, wherein the method further comprises, after the step of confirming:
a generation step: according to the determined classification group information of the user list, entering the field names and the text strings in the classification groups of the user list into a list system, and generating a list information table corresponding to the user list.
6. An electronic device, comprising a memory and a processor, wherein the memory stores a user list information processing program that can run on the processor, and the user list information processing program, when executed by the processor, implements the following steps:
an acquisition step: acquiring a user list and performing character recognition to obtain text string information to be classified, wherein the text string information to be classified comprises a plurality of field names and character strings;
a classification step: inputting the obtained text string information to be classified into a pre-trained classification model, and outputting a plurality of classification groups of the text string information to be classified, wherein each classification group comprises a field name and a text string representing the field name;
a confirmation step: substituting the field names and the text strings in each classification group into a cosine similarity formula, calculating the cosine similarity value of the field names and the text strings of each classification group, judging that the field names and the text strings in the classification groups are correctly matched when the calculated cosine similarity value exceeds a preset threshold value, and determining all classification group information of the user list.
7. The electronic device of claim 6, wherein the obtaining step further comprises:
when the user list contains data for a plurality of users, performing character recognition on each user's data separately to obtain the text string information to be classified for each user.
8. The electronic device of claim 6, wherein the training process of the classification model comprises:
acquiring sample text string information;
classifying sample field names and sample text strings according to the sample text string information, and labeling the sample text strings corresponding to the sample field names with attributes; and
training a classification model according to the attribute-labeled sample text strings corresponding to the sample field names, to obtain a plurality of classification groups of the sample text string information, wherein each classification group comprises a sample field name and a sample text string corresponding to the sample field name.
9. The electronic device of claim 6, wherein the cosine similarity formula is:
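$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert\,\lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$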
where a denotes a vector of a text string of a classification group, B denotes a vector of a field name of the classification group, Ai denotes each component of the vector a, and Bi denotes each component of the vector B.
10. A computer-readable storage medium, on which a user list information processing program is stored, the user list information processing program being executable by one or more processors to implement the steps of the user list information processing method according to any one of claims 1 to 5.
CN201911388144.8A 2019-12-25 2019-12-25 User list information processing method, electronic device and computer readable storage medium Pending CN111177387A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911388144.8A CN111177387A (en) 2019-12-25 2019-12-25 User list information processing method, electronic device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111177387A true CN111177387A (en) 2020-05-19

Family

ID=70654182

Country Status (1)

Country Link
CN (1) CN111177387A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801099A (en) * 2020-06-02 2021-05-14 腾讯科技(深圳)有限公司 Image processing method, device, terminal equipment and medium
CN112801099B (en) * 2020-06-02 2024-05-24 腾讯科技(深圳)有限公司 Image processing method, device, terminal equipment and medium
CN112288039A (en) * 2020-11-26 2021-01-29 深源恒际科技有限公司 Sample labeling method and system for OCR model training
CN112288039B (en) * 2020-11-26 2024-01-23 深源恒际科技有限公司 Sample labeling method and system for OCR model training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination