CN110781227A - Information processing method and device - Google Patents

Information processing method and device

Info

Publication number
CN110781227A
CN110781227A (application CN201911044241.5A)
Authority
CN
China
Prior art keywords
information
similarity
node
database
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911044241.5A
Other languages
Chinese (zh)
Other versions
CN110781227B (en)
Inventor
李亚梦
王泽林
叶晓斌
刘永生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN201911044241.5A
Publication of CN110781227A
Application granted
Publication of CN110781227B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 - Querying
    • G06F 16/245 - Query processing
    • G06F 16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F 16/2462 - Approximate or statistical queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides an information processing method and device, which relate to the field of computer technologies and are used for determining the target feature information of each piece of information in an information database. The method includes the following steps: acquiring the current feature information of each piece of information in an information database; calculating the similarity between any two pieces of information in the information database to obtain at least one similarity; for first information in the information database, determining the target feature information of the first information according to the acquired current feature information and the at least one similarity; and finally updating the current feature information of the first information to the target feature information of the first information, thereby effectively reducing erroneous feature information.

Description

Information processing method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an information processing method and apparatus.
Background
In natural language processing, data usually need to be tagged so that the semantics of the data can be understood. Data carrying such labels can be used to train models such as neural networks, and the correctness of the labels is crucial to model training.
The data used to train a model often include erroneous labels, and no effective model can be trained from such data. How to quickly and effectively reduce erroneous labels is therefore a problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the invention provides an information processing method and device, which are used for quickly and effectively reducing erroneous labels in data.
To achieve the above object, the embodiment of the invention adopts the following technical solutions:
in a first aspect, an information processing method is provided, including: firstly, acquiring current characteristic information of each piece of information in an information database; then calculating the similarity between any two information in the information database to obtain at least one similarity; then, for the first information in the information database, determining the target characteristic information of the first information according to the obtained current characteristic information and at least one similarity, and finally updating the current characteristic information of the first information into the target characteristic information of the first information; the first information is any one of information in an information database.
It can be seen that, in the embodiment of the present invention, the information processing apparatus determines the target feature information of each piece of information in the information database according to the current feature information of each piece of information in the database and the similarity between any two pieces of information in the information database. Compared with the prior art, in the scheme provided by the embodiment of the invention, the target characteristic information of each piece of information in the information database is determined by calculating the similarity between any two pieces of information, so that the embodiment of the invention can quickly and accurately determine the characteristic information of each piece of information in the information database, and effectively reduces the error characteristic information.
In a second aspect, there is provided an information processing apparatus comprising: the device comprises an acquisition unit, a calculation unit and a processing unit; the acquisition unit is used for acquiring the current characteristic information of each piece of information in the information database; the calculating unit is used for calculating the similarity between any two pieces of information in the information database acquired by the acquiring unit to obtain at least one similarity; for the first information in the information database, the processing unit is configured to determine target feature information of the first information according to the current feature information acquired by the acquisition unit and at least one similarity calculated by the calculation unit, and update the current feature information of the first information to the target feature information of the first information; the first information is any one of information in the information database.
In a third aspect, an information processing apparatus is provided, including a memory and a processor; the memory is used for storing computer execution instructions, and the processor is connected with the memory through a bus; when the information processing apparatus is operating, the processor executes computer-executable instructions stored in the memory to cause the information processing apparatus to perform the information processing method according to the first aspect.
The information processing apparatus may be a network device, or may be a part of an apparatus in the network device, such as a system on chip in the network device. The system on chip is configured to support the network device to implement the functions involved in the first aspect and any one of the possible implementations thereof, for example, to receive, determine, and offload data and/or information involved in the information processing method. The chip system includes a chip and may also include other discrete devices or circuit structures.
In a fourth aspect, a computer storage medium is provided, which includes computer executable instructions, which when executed on a computer, cause the computer to perform the information processing method of the first aspect.
In a fifth aspect, there is also provided a computer program product comprising computer instructions which, when run on an information processing apparatus, cause the information processing apparatus to perform the information processing method according to the first aspect described above.
It should be noted that all or part of the computer instructions may be stored on the first computer storage medium. The first computer storage medium may be packaged together with the processor of the information processing apparatus, or may be packaged separately from the processor of the information processing apparatus, which is not limited in this embodiment of the present invention.
For the description of the second, third, fourth and fifth aspects of the present invention, reference may be made to the detailed description of the first aspect; in addition, for the beneficial effects of the second aspect, the third aspect, the fourth aspect and the fifth aspect, reference may be made to the beneficial effect analysis of the first aspect, and details are not repeated here.
In the embodiment of the present invention, the names of the above-mentioned information processing apparatuses do not limit the devices or the functional modules themselves, and in actual implementation, the devices or the functional modules may appear by other names. Insofar as the functions of the respective devices or functional blocks are similar to those of the present invention, they are within the scope of the claims of the present invention and their equivalents.
These and other aspects of the invention will be more readily apparent from the following description.
Drawings
Fig. 1 is a schematic diagram of a hardware structure of an information processing apparatus according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of an information processing method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a graph model of an information database according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart illustrating a graph model for updating an information database according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating another information processing method according to an embodiment of the present invention;
fig. 6 is a flowchart illustrating an information processing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, in the embodiments of the present invention, words such as "exemplary" or "for example" are used to indicate examples, illustrations, or explanations. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present invention is not necessarily to be construed as preferred or more advantageous than other embodiments or designs. Rather, the use of the words "exemplary" or "for example" is intended to present related concepts in a concrete fashion.
For the convenience of clearly describing the technical solutions of the embodiments of the present invention, in the embodiments of the present invention, the words "first", "second", and the like are used for distinguishing the same items or similar items with basically the same functions and actions, and those skilled in the art can understand that the words "first", "second", and the like are not limited in number or execution order.
The embodiment of the invention provides an information processing method and device. The information processing device acquires the current characteristic information of each piece of information in the information database, calculates the similarity between any two pieces of information in the information database, and subsequently determines the target characteristic information of each piece of information according to the acquired current characteristic information and the calculated similarity. Therefore, the information processing device can quickly and accurately determine the characteristic information of each piece of information in the information database, and effectively reduces the error characteristic information.
The information processing apparatus may be a device for processing information, a chip in the device, or a system on a chip in the device.
Optionally, the device may be a physical machine, for example a terminal device such as a desktop computer (desktop PC), a mobile phone, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA).
Optionally, the information processing apparatus may also implement a function to be implemented by the information processing apparatus through a Virtual Machine (VM) deployed on a physical machine.
For ease of understanding, the structure of the information processing apparatus in the embodiment of the present invention will now be described.
Fig. 1 is a schematic diagram of a hardware structure of an information processing apparatus according to an embodiment of the present invention. The information processing apparatus includes a processor 21, a memory 22, a communication interface 23, and a bus 24. The processor 21, the memory 22 and the communication interface 23 may be connected by a bus 24.
The processor 21 is a control center of the information processing apparatus, and may be a single processor or a collective term for a plurality of processing elements. For example, the processor 21 may be a Central Processing Unit (CPU), other general-purpose processors, or the like. Wherein a general purpose processor may be a microprocessor or any conventional processor or the like.
For one embodiment, processor 21 may include one or more CPUs, such as CPU 0 and CPU 1 shown in FIG. 1.
The memory 22 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
In a possible implementation, the memory 22 may exist separately from the processor 21, and the memory 22 may be connected to the processor 21 via a bus 24 for storing instructions or program codes. The processor 21 can implement the information processing method provided by the following embodiments of the present invention when it calls and executes the instructions or program codes stored in the memory 22.
In the embodiment of the present invention, the processing module 11, the mapping module 12, the training module 13, and the updating module 14 have different functions because the software programs stored in the memory 22 are different. The functions performed by the devices will be described in connection with the following flow charts.
In another possible implementation, the memory 22 may also be integrated with the processor 21.
The communication interface 23 is used for connecting the information processing apparatus and other devices through a communication network, where the communication network may be an ethernet, a radio access network, a Wireless Local Area Network (WLAN), or the like. The communication interface 23 may include a receiving unit for receiving data, and a transmitting unit for transmitting data.
The bus 24 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an extended ISA (enhanced industry standard architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 1, but it is not intended that there be only one bus or one type of bus.
It is to be noted that the configuration shown in fig. 1 does not constitute a limitation on the information processing apparatus; the information processing apparatus may include more or fewer components than those shown in fig. 1, may combine some components, or may have a different arrangement of components.
Fig. 2 is a schematic flowchart of an information processing method according to an embodiment of the present invention. In the embodiment of the present invention, the information processing method is described by taking a computer as an example of the information processing apparatus; all of the steps described below may be performed by the computer. The information processing method includes S301-S304.
S301, the computer acquires the current characteristic information of each piece of information in the information database.
A computer includes at least one information database, each information database includes a plurality of pieces of information, and each piece of information is configured with one or more marks (for example, labels for text information). A mark is used to identify the information, so that the mark can concisely express the meaning of the information. For example, suppose information E is: "I arrived in Beijing, the capital of China." Since information E mainly expresses that the speaker has come to Beijing, a mark may be configured for information E: Beijing. Since the capital of China is Beijing, a second mark may also be configured for information E: capital.
In the embodiment of the invention, the computer converts the mark of each piece of information in the information database into corresponding feature information. Optionally, the computer may convert the marks through a one-hot encoding algorithm, or through another algorithm, which is not limited herein. The one-hot encoding algorithm converts the mark of each piece of information in the information database into a 0-1 vector, and this 0-1 vector is the feature information corresponding to the mark of that information.
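The one-hot conversion described above can be illustrated with a short sketch. The following Python snippet is not taken from the patent; the data layout, function name, and example marks are assumptions used purely for illustration.

```python
# A minimal sketch (assumed names and data layout) of converting the mark of each
# piece of information into a 0-1 vector via one-hot encoding, as described above.
from typing import Dict, List

def one_hot_encode(marks: Dict[str, str]) -> Dict[str, List[int]]:
    """Map each information ID to the 0-1 vector of its mark."""
    vocabulary = sorted(set(marks.values()))           # all distinct marks in the database
    index = {mark: i for i, mark in enumerate(vocabulary)}
    vectors = {}
    for info_id, mark in marks.items():
        vec = [0] * len(vocabulary)
        vec[index[mark]] = 1                           # the single slot of this mark is set to 1
        vectors[info_id] = vec
    return vectors

# Example: information E marked "Beijing", information F marked "capital".
initial_features = one_hot_encode({"E": "Beijing", "F": "capital"})
```

Each resulting 0-1 vector plays the role of the initial feature information of the corresponding piece of information.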
The embodiment of the invention refers to the feature information determined by the computer for the first time as the initial feature information. That is, the initial feature information of a piece of information is its most original feature information.
In the information database, the mark corresponding to the feature information of a piece of information may accurately express the meaning of that information, or it may make the meaning ambiguous. For example, suppose information E is: "I arrived in Beijing, the capital of China." If the mark corresponding to the feature information of information E is "Beijing", information E can be understood as referring to Beijing, the capital city of China; in this case, the mark "Beijing" accurately expresses the meaning of information E. If the mark corresponding to the feature information of information E is "capital", information E could, according to "capital", be understood as referring to the capital of a country other than China; in this case, the mark "capital" cannot accurately express the meaning of information E.
In view of the above, the feature information of each piece of information in the information database may remain unchanged (for example, when its mark accurately expresses the meaning of the information) or may be changed according to actual requirements. When the mark corresponding to the feature information of a piece of information may make its meaning ambiguous, the method provided by the embodiment of the invention is used to update the feature information of that information, so that the mark corresponding to the updated feature information accurately expresses its meaning.
Since the feature information of each piece of information in the information database may change, in this embodiment, the current feature information of the piece of information is used to represent the feature information of the piece of information in a certain period of time. The current feature information of the information may specifically be initial feature information, feature information obtained after updating the initial feature information once, or feature information obtained after updating the initial feature information multiple times. In a previous time period (if the current feature information is the initial feature information, the previous time period does not exist) or a subsequent time period of the time period, the feature information of the information may be the same as or different from the current feature information.
The information related to the embodiment of the present invention may be text information, picture information, voice information, or the like, and is not limited herein. Correspondingly, the information database can be a text database, a picture database or a voice database.
S302, the computer calculates the similarity between any two pieces of information in the information database to obtain at least one similarity.
The computer processes (that is, calculates the similarity of) any two pieces of information in the same way. For convenience of description, the first information and the second information are mainly used as an example. The first information and the second information are any two different pieces of information in the information database.
Optionally, take the information as text information; correspondingly, the information database is a text database. The computer determines the target word mark of the first text information and the target word mark of the second text information, and then determines the similarity S between the first text information and the second text information according to a formula over the two target word marks (the formula is shown only as an image in the original publication).
Specifically, when determining the target word mark of the first text information, the computer first performs word segmentation on the first text information according to a second preset algorithm (implemented, for example, in the programming language Python) to obtain a word segmentation list. The computer then filters the word segmentation list to remove stop words and similar uninformative words, obtaining a target word segmentation list whose words fully express the meaning of the first text information. Next, the computer determines the word frequency score of each word in the target word segmentation list according to a third preset algorithm (for example, the term frequency-inverse document frequency algorithm, TF-IDF). Finally, the computer selects the word with the highest score as the target word of the first text information and converts the target word into the target word mark.
The computer may determine the target word mark of the second text information in the same manner as described above for the first text information.
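The target-word and similarity computation described above can be sketched as follows. The similarity formula in the original publication is shown only as an image, so this Python sketch substitutes cosine similarity over TF-IDF scores as an illustrative stand-in; the stop-word list, whitespace segmentation, and function names are likewise assumptions, not the patent's own implementation.

```python
# Illustrative sketch of S302 for text information: segment, drop stop words,
# score words by TF-IDF, pick the highest-scoring word as the target word, and
# compare two texts. Cosine similarity is an assumed stand-in for the patent's
# (image-only) similarity formula.
import math
from collections import Counter
from typing import Dict, List

STOP_WORDS = {"the", "a", "of", "to", "i"}             # assumed stop-word list

def segment(text: str) -> List[str]:
    # Whitespace segmentation stands in for the "second preset algorithm".
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def tfidf(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    if not doc:
        return {}
    tf = Counter(doc)
    n = len(corpus)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in corpus if word in d)       # documents containing the word
        scores[word] = (count / len(doc)) * math.log((1 + n) / (1 + df))
    return scores

def target_word(doc: List[str], corpus: List[List[str]]) -> str:
    scores = tfidf(doc, corpus)
    return max(scores, key=scores.get) if scores else ""

def similarity(doc_a: List[str], doc_b: List[str], corpus: List[List[str]]) -> float:
    a, b = tfidf(doc_a, corpus), tfidf(doc_b, corpus)
    dot = sum(a.get(w, 0.0) * b.get(w, 0.0) for w in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

texts = ["I arrived in Beijing the capital of China", "Beijing is a beautiful city"]
corpus = [segment(t) for t in texts]
s = similarity(corpus[0], corpus[1], corpus)           # similarity S between the two texts
```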
S303, the computer determines the target characteristic information of the first information according to the acquired current characteristic information and the at least one similarity.
After determining at least one similarity of the first information, the computer obtains at least one similar weight value of the first information according to the at least one similarity, and determines target feature information of the first information according to the obtained at least one similar weight value and the current feature information of the first information obtained in S301.
Specifically, the computer determines a first set for the first information and a second set for the second information. The computer then determines whether the first set includes the second information and whether the second set includes the first information. If the first set includes the second information and the second set includes the first information, the computer determines that the relationship level of the first information and the second information is the first level, and the similar weight value between the first information and the second information is the similarity between them. If the first set does not include the second information, or the second set does not include the first information, the computer determines that the relationship level of the first information and the second information is the second level, and the similar weight value between the first information and the second information is zero.
The first set may include m pieces of first candidate information, namely the information corresponding to the top m first similarities when the first similarities are sorted in descending order. A first similarity is a similarity, among the at least one similarity, that corresponds to the first information; the first candidate information does not include the first information; and m is a positive integer.
The computer may determine the second set in the manner described above for "determining the first set".
Illustratively, suppose the information database includes four pieces of information: information E, information F, information G, and information H. When the first set is the set corresponding to information E, the similarity E1 between information E and information F, the similarity E2 between information E and information G, and the similarity E3 between information E and information H are obtained, and then E1, E2, and E3 are sorted in descending order. When m is 2, the information corresponding to the top 2 similarities is taken. When E1 > E2 > E3, information F (corresponding to E1) and information G (corresponding to E2) are taken, and the set formed by them is the first set corresponding to information E. When the second set is the set corresponding to information F, the similarity F1 between information F and information E, the similarity F2 between information F and information G, and the similarity F3 between information F and information H are obtained, and F1, F2, and F3 are sorted in descending order; since m is 2, the information corresponding to the top 2 similarities is taken. When F2 > F1 > F3, information G (corresponding to F2) and information E (corresponding to F1) are taken, and the set formed by them is the second set corresponding to information F. When the second set includes information E and information G, and the first set includes information F and information G, information E and information F are each in the other's corresponding set, so the relationship level of information E and information F is determined to be the first level; otherwise, the relationship level of information E and information F is determined to be the second level.
The first level and the second level are each used to indicate a correlation between the first information and the second information, and the second level is lower than the first level.
Illustratively, the relationship between the first information and the second information is divided into "related" and "unrelated", where "related" may represent a first level and "unrelated" may represent a second level. When the first set includes the second information and the second set includes the first information, then the relationship between the first information and the second information is "related". Conversely, the relationship between the first information and the second information is "irrelevant".
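A short sketch of the mutual top-m rule just described may help. The following Python snippet is an illustration with assumed names and data structures, not the patent's code; it assumes a symmetric similarity table keyed by ordered pairs of information identifiers.

```python
# Illustrative sketch: the similar weight value between two pieces of information
# is their similarity only when each appears in the other's top-m most-similar
# set ("related", the first level); otherwise the weight is zero ("unrelated").
from typing import Dict, List, Set, Tuple

def top_m_set(sim: Dict[Tuple[str, str], float], info_id: str,
              ids: List[str], m: int) -> Set[str]:
    others = [x for x in ids if x != info_id]
    others.sort(key=lambda x: sim[(info_id, x)], reverse=True)   # descending similarity
    return set(others[:m])

def similar_weights(sim: Dict[Tuple[str, str], float],
                    ids: List[str], m: int = 2) -> Dict[Tuple[str, str], float]:
    top = {i: top_m_set(sim, i, ids, m) for i in ids}
    weights = {}
    for a in ids:
        for b in ids:
            if a == b:
                continue
            related = (b in top[a]) and (a in top[b])            # first level
            weights[(a, b)] = sim[(a, b)] if related else 0.0
    return weights

# Usage with the E/F/G/H example above (sim must hold both orderings of each pair):
# weights = similar_weights(sim, ["E", "F", "G", "H"], m=2)
```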
After determining at least one similar weight value, the computer establishes a graph model for representing the similarity between any two pieces of information in the information database according to the at least one similar weight value and the current characteristic information of the first information. Subsequently, the computer may determine target feature information of the first information according to an energy function of the graph model.
Specifically, the computer represents each piece of information in the information database by using a node, and connects the nodes by straight lines to generate the graph model. Because the nodes corresponding to any two pieces of information with the similar weight values not being zero are connected, and the nodes corresponding to any two pieces of information with the similar weight values being zero are not connected, the graph model can be used for representing the similarity between any two pieces of information.
Exemplarily, fig. 3 is a schematic structural diagram of a graph model of an information database according to an embodiment of the present invention. The information database includes information E, information F, information G, and information H, corresponding to four nodes: node E, node F, node G, and node H. Each node includes the initial feature information corresponding to it: e0, f0, g0, and h0. The similar weight values between information E and each of information F and information G are not zero, the similar weight values between information F and each of information G and information H are not zero, and the similar weight value between information G and information H is not zero. Thus, node E is connected to node F and node G, node F is connected to node G and node H, and node G is connected to node H. That is, the neighboring nodes of node E are node F and node G; the neighboring nodes of node F are node E, node G, and node H; the neighboring nodes of node G are node E, node F, and node H; and the neighboring nodes of node H are node F and node G.
The energy function of the graph model (shown as a formula image in the original publication) involves the following quantities: i and j each represent a node in the graph model, and j is a neighboring node of i; X_i is the current feature information of the information corresponding to node i; Y_i is the target feature information of the information corresponding to node i; Y_j is the current feature information of the information corresponding to node j; Y_s is the set of all neighboring nodes of node i; NA represents a particular node in the graph model; Y_NA is the probability that the current feature information of the information corresponding to node i belongs to none of the categories of the current feature information of the information corresponding to the neighboring nodes of node i; W_ij is the similar weight value between node i and node j; Y_i^T is the transpose of Y_i; α ≥ 0 and β ≥ 0.
Taking the information as text information as an example (correspondingly, the information database is a text database), the computer may divide the text information in the text database into a plurality of categories, for example a city category, a sports category, and a movie category. When a certain text information in the text database is "I came to Beijing, the beautiful capital of China", the computer can determine, according to the current feature information of that text information, that its category belongs to the city category. In the graph model, when the current feature information corresponding to the neighboring nodes of a node belongs to category A, the category of the current feature information corresponding to that node is also likely to belong to category A. When the categories of the current feature information corresponding to the neighboring nodes of a node are evenly distributed among category A, category B, and category C, the current feature information corresponding to that node may not belong to any particular one of them, but rather to a separate category: others. In a specific text classification task, for example when training with categories such as city, movie, and sports, the current feature information corresponding to a certain text information may well belong to none of the trained categories and should be classified as "others". In the embodiment of the invention, Y_NA is the probability that the current feature information of the text information corresponding to node i does not belong to the category of the current feature information of the text information corresponding to any neighboring node of node i.
In the graph model, E represents the energy function, which characterizes the stability of the graph model: the lower the energy, the more stable the graph model. The most stable state of the graph model is obtained by solving for the minimum of the energy function, and when the graph model is in its most stable state, the target feature information corresponding to the nodes no longer changes. That is, when the energy function E takes its minimum value, Y_i is the target feature information of the information corresponding to node i. For the first information in the information database, when i = 1, Y_1 is the target feature information of the first information.
The computer may determine the target characteristic information of each information in the information database in the manner of "determining the target characteristic information of the first information" described above.
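Since the energy function itself appears only as a formula image in the original publication, the exact expression cannot be reproduced here. Purely as an illustration of the iterative idea of S303-S304 (fit each node's target feature information to its current feature information while smoothing it toward its weighted neighbors and an assumed "others" term), the following Python sketch minimizes one plausible quadratic energy of this shape; the formula, parameter names, and update rule are assumptions, not the patent's own definition.

```python
# Hedged sketch: coordinate-descent minimization of an assumed quadratic energy
#   E = sum_i ||Y_i - X_i||^2 + alpha * sum_{i,j} W_ij * ||Y_i - Y_j||^2
#       + beta * sum_i ||Y_i - y_others||^2
# which is NOT the patent's (image-only) formula, only an illustration. Updates
# repeat until no node's feature vector changes, i.e. the most stable state.
from typing import Dict, List, Tuple

def propagate(X: Dict[str, List[float]],
              W: Dict[Tuple[str, str], float],
              alpha: float = 1.0,
              beta: float = 0.0,
              y_others: List[float] = None,
              max_iter: int = 100,
              tol: float = 1e-6) -> Dict[str, List[float]]:
    dim = len(next(iter(X.values())))
    y_others = y_others or [0.0] * dim                 # assumed vector for the "others" category
    Y = {i: list(v) for i, v in X.items()}             # start from the current feature information
    for _ in range(max_iter):
        max_change = 0.0
        for i in X:
            nbrs = [(j, w) for (a, j), w in W.items() if a == i and w > 0.0]
            denom = 1.0 + alpha * sum(w for _, w in nbrs) + beta
            new = [(X[i][d]
                    + alpha * sum(w * Y[j][d] for j, w in nbrs)
                    + beta * y_others[d]) / denom
                   for d in range(dim)]
            max_change = max(max_change, max(abs(u - v) for u, v in zip(new, Y[i])))
            Y[i] = new
        if max_change < tol:                           # every node stable: stop updating
            break
    return Y

# Usage: target = propagate(initial_features, weights, alpha=1.0, beta=0.1)
```

The returned vectors play the role of the target feature information in S304; a node whose vector stops changing between iterations corresponds to the convergence condition described for fig. 4 below.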
S304, the computer updates the current characteristic information of the first information into the target characteristic information of the first information.
After determining the target characteristic information of the first information, the computer updates the current characteristic information of the first information in the information database into the target characteristic information so as to update the error label of the first information in the information database.
Of course, for each information in the information database, the method of S303-S304 can be used to determine the target feature information.
To further improve accuracy, the computer may repeat the above operations to determine the target feature information of each piece of information multiple times. Taking the first information as an example, the computer repeats the above steps until the target feature information of the first information determined the nth time is the same as the target feature information determined the (n-1)th time (for the nth determination, the previously determined target feature information serves as the current feature information). Here n is a natural number greater than 1.
Exemplarily, fig. 4 is a schematic flowchart of updating the graph model of an information database according to an embodiment of the present invention, which updates the graph model of the information database shown in fig. 3. In the first update, the feature information e1, f1, g1, and h1 of information E, information F, information G, and information H after the first update is determined according to their initial feature information e0, f0, g0, and h0 and the similarities among information E, information F, information G, and information H. This is repeated until the (n-1)th update, which yields the feature information e(n-1), f(n-1), g(n-1), and h(n-1) of information E, information F, information G, and information H. The nth update then yields the feature information en, fn, gn, and hn of information E, information F, information G, and information H. When en equals e(n-1) and fn equals f(n-1), the feature information en after the nth update of information E is determined as the target feature information corresponding to information E, and the feature information fn after the nth update of information F is determined as the target feature information corresponding to information F. Since gn does not equal g(n-1) and hn does not equal h(n-1), the updating continues until the (m-1)th update, which yields the feature information g(m-1) and h(m-1) of information G and information H, and the mth update then yields the feature information gm and hm of information G and information H. When gm equals g(m-1) and hm equals h(m-1), the feature information gm after the mth update of information G is determined as the target feature information corresponding to information G, and the feature information hm after the mth update of information H is determined as the target feature information corresponding to information H. The update of the information database is then complete. Here m > n > 2, and m and n are integers.
In the embodiment of the invention, the information processing device determines the target characteristic information of each piece of information in the information database according to the current characteristic information of each piece of information in the database and the similarity between any two pieces of information in the information database. Compared with the prior art, in the scheme provided by the embodiment of the invention, the target characteristic information of each piece of information in the information database is determined by calculating the similarity between any two pieces of information, so that the embodiment of the invention can quickly and accurately determine the characteristic information of each piece of information in the information database, and effectively reduces the error characteristic information.
Optionally, with reference to fig. 2, as shown in fig. 5, S303 may be replaced by S600 to S605.
S600, the computer determines a first set according to at least one similarity of the first information.
S601, the computer determines a second set according to at least one similarity of the second information.
S602, the computer determines the relation level of the first information and the second information according to the first set and the second set.
S603, the computer determines the similar weight values of the first information and the second information according to the relation level of the first information and the second information.
S604, the computer establishes a graph model according to at least one similar weight value and the current characteristic information of each piece of information in the information database.
S605, the computer determines target characteristic information of the first information according to the energy function of the graph model.
The scheme provided by the embodiment of the invention is mainly introduced from the perspective of a method. To implement the above functions, it includes hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the present invention can be implemented in hardware or a combination of hardware and computer software, with the exemplary elements and algorithm steps described in connection with the embodiments disclosed herein. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The embodiment of the present invention may perform the division of the functional modules on the terminal according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. Optionally, the division of the modules in the embodiment of the present invention is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Fig. 6 is a schematic structural diagram of an information processing apparatus (referred to as an information processing apparatus 700) according to an embodiment of the present invention. The information processing apparatus 700 may be used to execute the information processing method shown in fig. 2 or fig. 5. The information processing apparatus 700 shown in fig. 6 includes: an acquisition unit 701, a calculation unit 702, and a processing unit 703.
An obtaining unit 701 is configured to obtain current feature information of each piece of information in the information database. For example, in conjunction with fig. 2, the acquisition unit 701 may be configured to perform S301.
A calculating unit 702, configured to calculate a similarity between any two pieces of information in the information database acquired by the acquiring unit 701 to obtain at least one similarity. For example, in conjunction with fig. 2, the computing unit 702 may be configured to perform S302.
For the first information in the information database, the processing unit 703 is configured to determine target feature information of the first information according to the current feature information acquired by the acquisition unit 701 and at least one similarity calculated by the calculation unit 702, and update the current feature information of the first information to the target feature information of the first information; the first information is any one of information in the information database. For example, in conjunction with fig. 2, the processing unit 703 may be configured to perform S303 and S304.
Optionally, the processing unit 703 is specifically configured to: determine, according to the at least one similarity calculated by the calculating unit 702, a similar weight value between the first information and each piece of information except the first information in the information database to obtain at least one similar weight value; and determine the target feature information of the first information according to the at least one similar weight value and the current feature information of each piece of information in the information database.
Optionally, the processing unit 703 is specifically configured to perform the following operations on each piece of information except the first information in the information database to determine the at least one similar weight value: for second information in the information database, determining a relationship level between the first information and the second information, where the second information is any one of the pieces of information except the first information in the information database, and the relationship level includes a first level and a second level; if the relationship level of the first information and the second information is the first level, determining the similarity between the first information and the second information as the similar weight value between the first information and the second information; and if the relationship level of the first information and the second information is the second level, determining that the similar weight value between the first information and the second information is zero. For example, in conjunction with fig. 5, the processing unit 703 may be configured to execute S603.
Optionally, the processing unit 703 is specifically configured to: determine a first set, where the first set includes m pieces of first candidate information, namely the information corresponding to the top m first similarities when the first similarities are sorted in descending order; a first similarity is a similarity, among the at least one similarity, that corresponds to the first information; the first candidate information does not include the first information; and m is a positive integer; determine a second set, where the second set includes m pieces of second candidate information, namely the information corresponding to the top m second similarities when the second similarities are sorted in descending order; a second similarity is a similarity, among the at least one similarity, that corresponds to the second information; and the second candidate information does not include the second information; if the first set includes the second information and the second set includes the first information, determine the relationship level of the first information and the second information as the first level; otherwise, determine the relationship level of the first information and the second information as the second level. For example, in conjunction with fig. 5, the processing unit 703 may be configured to perform S600, S601, and S602.
Optionally, the processing unit 703 is specifically configured to: establish a graph model, where the graph model is used to represent the similarity between any two pieces of information in the information database; the graph model includes a plurality of nodes, each node is used to represent one piece of information in the information database, and the nodes corresponding to any two pieces of information whose similar weight value is not zero are connected; and determine the target feature information of the first information according to an energy function of the graph model. The energy function (shown as a formula image in the original publication) involves the following quantities: E represents the energy function; i and j each represent a node in the graph model, and j is a neighboring node of i; X_i represents the current feature information of the information corresponding to node i; Y_i represents the target feature information of the information corresponding to node i; Y_j represents the current feature information of the information corresponding to node j; Y_s represents the set of all neighboring nodes of node i; NA represents a particular node in the graph model; Y_NA represents the probability that the current feature information of the information corresponding to node i does not belong to the category of the current feature information of the information corresponding to any neighboring node of node i; W_ij represents the similar weight value between node i and node j; Y_i^T represents the transpose of Y_i; α ≥ 0; and β ≥ 0. For example, in conjunction with fig. 5, the processing unit 703 may be configured to perform S604 and S605.
An embodiment of the present invention further provides a computer storage medium, where the computer storage medium includes computer execution instructions, and when the computer execution instructions run on a computer, the computer is enabled to execute the information processing method provided in the foregoing embodiment.
The embodiment of the present invention further provides a computer program, which can be directly loaded into the memory and contains a software code, and the computer program can be loaded and executed by the computer to implement the information processing method provided by the above embodiment.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
Through the above description of the embodiments, it is clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules or units is only one logical function division, and there may be other division ways in actual implementation. For example, various elements or components may be combined or may be integrated into another device, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form. Units described as separate parts may or may not be physically separate, and parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed to a plurality of different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present invention may be essentially or partially contributed to by the prior art, or all or part of the technical solution may be embodied in the form of a software product, where the software product is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. An information processing method characterized by comprising:
acquiring current characteristic information of each piece of information in an information database;
calculating the similarity between any two pieces of information in the information database to obtain at least one similarity;
for first information in the information database, determining target feature information of the first information according to the acquired current feature information and the at least one similarity, and updating the current feature information of the first information into the target feature information of the first information; the first information is any one of the information in the information database.
2. The information processing method according to claim 1, wherein the determining, according to the obtained current feature information and the at least one similarity, target feature information of the first information includes:
according to the at least one similarity, determining a similar weight value between the first information and each piece of information except the first information in the information database to obtain at least one similar weight value;
and determining target characteristic information of the first information according to the at least one similar weight value and the current characteristic information of each information in the information database.
3. The information processing method according to claim 2, wherein the determining a similarity weight value between the first information and each piece of information in the information database other than the first information according to the at least one similarity degree, resulting in at least one similarity weight value, includes:
performing the following operations on each information except the first information in the information database to determine at least one similar weight value:
for second information in the information database, determining a relationship level between the first information and the second information, where the second information is any one of the information in the information database except the first information, and the relationship level includes: a first level and a second level;
if the relation level of the first information and the second information is the first level, determining the similarity between the first information and the second information as a similarity weight value between the first information and the second information;
and if the relation level of the first information and the second information is the second level, determining that the similar weight value between the first information and the second information is zero.
4. The information processing method according to claim 3, wherein the determining the relationship level of the first information and the second information includes:
determining a first set, the first set comprising m pieces of first candidate information, the m pieces of first candidate information being the information corresponding to the first similarities ranked in the top m positions when the first similarities are arranged in descending order; wherein the first similarities are the similarities, among the at least one similarity, corresponding to the first information, the first candidate information does not include the first information, and m is a positive integer;
determining a second set, the second set comprising m pieces of second candidate information, the m pieces of second candidate information being the information corresponding to the second similarities ranked in the top m positions when the second similarities are arranged in descending order; wherein the second similarities are the similarities, among the at least one similarity, corresponding to the second information, and the second candidate information does not include the second information;
if the first set comprises the second information and the second set comprises the first information, determining the relationship level of the first information and the second information as the first level; otherwise, determining the relationship level of the first information and the second information as the second level.
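A sketch of the relationship-level test in claim 4, assuming sim is a symmetric similarity matrix over the information database: the pair (i, j) is at the first level only when j lies among the m pieces of information most similar to i and i lies among the m most similar to j, each candidate set excluding the piece of information itself; the function names are illustrative.

```python
import numpy as np

def top_m_candidates(i: int, sim: np.ndarray, m: int) -> set:
    """Indices of the m pieces of information most similar to item i,
    taken in descending order of similarity and excluding item i itself."""
    order = [int(k) for k in np.argsort(-sim[i]) if k != i]
    return set(order[:m])

def is_first_level(i: int, j: int, sim: np.ndarray, m: int) -> bool:
    """First level iff each item appears in the other's top-m candidate set;
    otherwise the pair is at the second level."""
    return j in top_m_candidates(i, sim, m) and i in top_m_candidates(j, sim, m)
```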
5. The information processing method according to any one of claims 2 to 4, wherein the determining the target feature information of the first information according to the at least one similar weight value and the current feature information of each piece of information in the information database includes:
establishing a graph model, wherein the graph model is used for representing the similarity between any two pieces of information in the information database; the graph model comprises a plurality of nodes, each node is used for representing one piece of information in the information database, and the nodes corresponding to any two pieces of information in the information database whose similar weight value is not zero are connected with each other;
determining target characteristic information of the first information according to an energy function of the graph model; the energy function is:
[energy function formula as shown in Figure FDA0002253708540000021]
wherein E represents the energy function; i and j each represent a node in the graph model, j being an adjacent node of i; X_i represents the current feature information of the information corresponding to the node i; Y_i represents the target feature information of the information corresponding to the node i; Y_j represents the current feature information of the information corresponding to the node j; Y_s represents the set of all adjacent nodes of the node i; NA represents a particular node in the graph model; Y_NA represents the probability that the current feature information of the information corresponding to the node i does not belong to the category of the current feature information of the information corresponding to any similar node of the node i; W_ij represents the similar weight value between the node i and the node j; Y_i^T (Figure FDA0002253708540000022) represents the transpose of Y_i; and α ≥ 0, β ≥ 0.
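The energy function itself survives here only as an image reference, so it cannot be reproduced exactly. Purely as an assumption about the general family of such graph energies, the sketch below minimizes E = Σ_i [ α·‖Y_i − X_i‖² + Σ_{j∈Y_s} W_ij·‖Y_i − Y_j‖² + β·‖Y_i − Y_NA‖² ] by iterating its fixed-point update; this assumed objective and the function below are illustrative and are not the patented formula.

```python
import numpy as np

def propagate_target_features(X: np.ndarray, W: np.ndarray, y_na: np.ndarray,
                              alpha: float = 1.0, beta: float = 0.1,
                              iters: int = 20) -> np.ndarray:
    """Jacobi-style fixed-point iteration for the assumed energy above.
    X: current feature matrix, one row per node; W: similar-weight matrix,
    zero where two nodes are not connected; y_na: feature prior attached to
    the special node NA, broadcast against the rows of X; alpha, beta >= 0."""
    Y = X.copy()
    for _ in range(iters):
        degree = W.sum(axis=1, keepdims=True)        # sum_j W_ij per node
        Y = (alpha * X + W @ Y + beta * y_na) / (alpha + degree + beta)
    return Y
```

With a large alpha the target features stay close to the current features; with a small alpha they are pulled toward the weighted average of the features of the connected neighbors and of the NA prior.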
6. An information processing apparatus characterized by comprising: the device comprises an acquisition unit, a calculation unit and a processing unit;
the acquisition unit is used for acquiring the current characteristic information of each piece of information in the information database;
the calculating unit is used for calculating the similarity between any two pieces of information in the information database acquired by the acquiring unit to obtain at least one similarity;
for the first information in the information database, the processing unit is configured to determine target feature information of the first information according to the current feature information acquired by the acquisition unit and the at least one similarity calculated by the calculation unit, and update the current feature information of the first information to the target feature information of the first information; the first information is any one of the information in the information database.
7. The information processing apparatus according to claim 6, wherein the processing unit is specifically configured to:
according to the at least one similarity, determining a similar weight value between the first information and each piece of information except the first information in the information database to obtain at least one similar weight value;
and determining target characteristic information of the first information according to the at least one similar weight value and the current characteristic information of each piece of information in the information database.
8. The information processing apparatus according to claim 7, wherein the processing unit is specifically configured to:
performing the following operations on each piece of information in the information database other than the first information, so as to determine the at least one similar weight value:
for second information in the information database, determining a relationship level between the first information and the second information, where the second information is any one of the information in the information database except the first information, and the relationship level includes: a first level and a second level;
if the relation level of the first information and the second information is the first level, determining the similarity between the first information and the second information as the similar weight value between the first information and the second information;
and if the relation level of the first information and the second information is the second level, determining that the similar weight value between the first information and the second information is zero.
9. The information processing apparatus according to claim 8, wherein the processing unit is specifically configured to:
determining a first set, the first set comprising m pieces of first candidate information, the m pieces of first candidate information being the information corresponding to the first similarities ranked in the top m positions when the first similarities are arranged in descending order; wherein the first similarities are the similarities, among the at least one similarity, corresponding to the first information, the first candidate information does not include the first information, and m is a positive integer;
determining a second set, the second set comprising m pieces of second candidate information, the m pieces of second candidate information being the information corresponding to the second similarities ranked in the top m positions when the second similarities are arranged in descending order; wherein the second similarities are the similarities, among the at least one similarity, corresponding to the second information, and the second candidate information does not include the second information;
if the first set comprises the second information and the second set comprises the first information, determining the relationship level of the first information and the second information as the first level; otherwise, determining the relationship level of the first information and the second information as the second level.
10. The information processing apparatus according to any one of claims 7 to 9, wherein the processing unit is specifically configured to:
establishing a graph model, wherein the graph model is used for representing the similarity between any two pieces of information in the information database; the graph model comprises a plurality of nodes, each node is used for representing one piece of information in the information database, and the nodes corresponding to any two pieces of information in the information database whose similar weight value is not zero are connected with each other;
determining target characteristic information of the first information according to an energy function of the graph model; the energy function is:
[energy function formula as shown in Figure FDA0002253708540000041]
wherein E represents the energy function; i and j each represent a node in the graph model, j being an adjacent node of i; X_i represents the current feature information of the information corresponding to the node i; Y_i represents the target feature information of the information corresponding to the node i; Y_j represents the current feature information of the information corresponding to the node j; Y_s represents the set of all adjacent nodes of the node i; NA represents a particular node in the graph model; Y_NA represents the probability that the current feature information of the information corresponding to the node i does not belong to the category of the current feature information of the information corresponding to any similar node of the node i; W_ij represents the similar weight value between the node i and the node j; Y_i^T (Figure FDA0002253708540000042) represents the transpose of Y_i; and α ≥ 0, β ≥ 0.
11. An information processing apparatus characterized by comprising a memory and a processor; the memory is used for storing computer execution instructions, and the processor is connected with the memory through a bus; when the information processing apparatus is operating, the processor executes the computer-executable instructions stored in the memory to cause the information processing apparatus to perform the information processing method according to any one of claims 1 to 5.
12. A computer storage medium characterized by comprising computer-executable instructions that, when executed on a computer, cause the computer to perform the information processing method according to any one of claims 1 to 5.
CN201911044241.5A 2019-10-30 2019-10-30 Information processing method and device Active CN110781227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911044241.5A CN110781227B (en) 2019-10-30 2019-10-30 Information processing method and device

Publications (2)

Publication Number Publication Date
CN110781227A true CN110781227A (en) 2020-02-11
CN110781227B CN110781227B (en) 2022-07-08

Family

ID=69387786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911044241.5A Active CN110781227B (en) 2019-10-30 2019-10-30 Information processing method and device

Country Status (1)

Country Link
CN (1) CN110781227B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5666442A (en) * 1993-05-23 1997-09-09 Infoglide Corporation Comparison system for identifying the degree of similarity between objects by rendering a numeric measure of closeness, the system including all available information complete with errors and inaccuracies
CN101751425A (en) * 2008-12-10 2010-06-23 北京大学 Method for acquiring document set abstracts and device
CN108009152A (en) * 2017-12-04 2018-05-08 陕西识代运筹信息科技股份有限公司 A kind of data processing method and device of the text similarity analysis based on Spark-Streaming
CN109657129A (en) * 2018-12-26 2019-04-19 北京百度网讯科技有限公司 For obtaining the method and device of information

Also Published As

Publication number Publication date
CN110781227B (en) 2022-07-08

Similar Documents

Publication Publication Date Title
CN108804641B (en) Text similarity calculation method, device, equipment and storage medium
WO2017215370A1 (en) Method and apparatus for constructing decision model, computer device and storage device
JP7430820B2 (en) Sorting model training method and device, electronic equipment, computer readable storage medium, computer program
CN112732870B (en) Word vector based search method, device, equipment and storage medium
CN113222942A (en) Training method of multi-label classification model and method for predicting labels
CN110874396B (en) Keyword extraction method and device and computer storage medium
EP4113376A1 (en) Image classification model training method and apparatus, computer device, and storage medium
CN115761339A (en) Image processing method, apparatus, device, medium, and program product
CN116204672A (en) Image recognition method, image recognition model training method, image recognition device, image recognition model training device, image recognition equipment, image recognition model training equipment and storage medium
CN113052246B (en) Method and related apparatus for training classification model and image classification
CN113592590A (en) User portrait generation method and device
CN109635004A (en) A kind of object factory providing method, device and the equipment of database
CN113033194A (en) Training method, device, equipment and storage medium of semantic representation graph model
CN110781227B (en) Information processing method and device
CN115062783B (en) Entity alignment method and related device, electronic equipment and storage medium
CN113032251B (en) Method, device and storage medium for determining service quality of application program
CN110688508B (en) Image-text data expansion method and device and electronic equipment
CN115878989A (en) Model training method, device and storage medium
CN112597208A (en) Enterprise name retrieval method, enterprise name retrieval device and terminal equipment
CN111858917A (en) Text classification method and device
CN111831130A (en) Input content recommendation method, terminal device and storage medium
CN116257760B (en) Data partitioning method, system, equipment and computer readable storage medium
CN116166961B (en) Super network model, training method of deep learning model and information recommendation method
KR102449831B1 (en) Electronic device for providing information regarding new text, server for identifying new text and operation method thereof
CN111783813A (en) Image evaluation method, image model training device, image model training equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant