CN117892137A - Information processing and large model training method, device, equipment and storage medium


Info

Publication number: CN117892137A
Application number: CN202410149441.1A
Authority: CN (China)
Prior art keywords: information, sample, tag, candidate, label
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 李飞, 范中吉, 石磊, 黄荣升
Current assignee: Shanghai Xiaodu Technology Co Ltd
Original assignee: Shanghai Xiaodu Technology Co Ltd
Application filed by Shanghai Xiaodu Technology Co Ltd


Landscapes

  • Machine Translation (AREA)

Abstract

The disclosure provides an information processing method, a large model training method, and corresponding apparatus, device and storage medium, relating to the field of computer technology, and in particular to artificial intelligence and large language models. The implementation scheme is as follows: determining candidate labels corresponding to input information from a hierarchical label system, the candidate labels comprising at least one candidate aggregation label and at least two candidate initial labels; determining semantic description information corresponding to the candidate labels; generating information to be processed based on the input information, the candidate labels and the semantic description information; and determining a target label corresponding to the input information from the at least two candidate initial labels according to the information to be processed. By building the hierarchical label system through aggregation of initial labels and processing the input information and the candidate labels with natural language processing techniques, the target label matching the input information can be accurately predicted from among the initial labels, ensuring both information processing efficiency and label prediction accuracy.

Description

Information processing and large model training method, device, equipment and storage medium
Technical Field
The disclosure relates to the field of computer technology, in particular to artificial intelligence, deep learning, generative large models and the like, and specifically to an information processing method, a training method for an information processing large model, and corresponding apparatus, device and storage medium.
Background
A large language model (LLM) is an artificial intelligence model consisting of a neural network with many parameters (typically billions of weights or more) that can generate natural language text and understand the meaning of language text. Large language models can handle a variety of natural language tasks, such as text classification, question answering and dialogue, and are an important path toward artificial intelligence.
Disclosure of Invention
The disclosure provides an information processing method, a training method for an information processing large model, and corresponding apparatus, device and storage medium, which improve the accuracy of information processing.
According to a first aspect of the present disclosure, there is provided an information processing method including:
determining candidate labels corresponding to input information from a hierarchical label system, wherein the hierarchical label system comprises n layers of labels, the layer-1 labels are initial labels, the labels of the other layers are aggregation labels, n is an integer greater than 1, and the candidate labels comprise at least one candidate aggregation label and at least two candidate initial labels;
Determining semantic description information corresponding to the candidate tag;
generating information to be processed based on the input information, the candidate tag and the semantic description information;
and determining a target label corresponding to the input information from at least two candidate initial labels according to the information to be processed.
According to a second aspect of the present disclosure, there is provided a training method of an information processing large model, including:
determining a sample label corresponding to sample information from a hierarchical label system, wherein the hierarchical label system comprises n layers of labels, the layer-1 labels are initial labels, the labels of the other layers are aggregation labels, n is an integer greater than 1, and the sample label comprises at least one sample aggregation label and one sample initial label;
determining sample description information corresponding to the sample label;
generating sample data based on the sample information, the sample tag and the sample description information;
and training an initial large model with the sample data as input and the sample initial label as expected output to obtain the information processing large model.
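The training steps above reduce to assembling one supervised example per sample: the sample tags (with their descriptions) plus the sample information form the model input, and the layer-1 sample initial label is the expected output. A minimal sketch follows; the label names, description sentences, and prompt layout are illustrative assumptions, since the disclosure does not fix a concrete text format.

```python
# Assemble one supervised fine-tuning example as described in the second
# aspect. All concrete strings here are hypothetical placeholders.
def build_training_example(sample_info, sample_tags, descriptions):
    """sample_tags are ordered from the highest aggregation layer down to
    the layer-1 sample initial tag; descriptions maps tag -> description."""
    lines = [f"[{tag}] {descriptions[tag]}" for tag in sample_tags]
    lines.append(sample_info)
    model_input = "\n".join(lines)
    expected_output = sample_tags[-1]  # the sample initial (layer-1) tag
    return model_input, expected_output

x, y = build_training_example(
    "play a lullaby for the baby",
    ["audio content", "children's music"],  # hypothetical aggregation + initial tag
    {"audio content": "requests about playable audio",
     "children's music": "songs intended for children"},
)
```

A real pipeline would feed many such (input, expected-output) pairs into the fine-tuning loop of the initial large model.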
According to a third aspect of the present disclosure, there is provided an information processing apparatus including:
the first determining module is configured to determine candidate tags corresponding to input information from a hierarchical tag system, wherein the hierarchical tag system comprises n layers of tags, the 1 st layer of tags are initial tags, the other layers of tags are aggregation tags, n is an integer greater than 1, and the candidate tags comprise at least one candidate aggregation tag and at least two candidate initial tags;
The second determining module is configured to determine semantic description information corresponding to the candidate tag;
the first generation module is configured to generate information to be processed based on the input information, the candidate labels and the semantic description information;
and the third determining module is configured to determine a target tag corresponding to the input information from at least two candidate initial tags according to the information to be processed.
According to a fourth aspect of the present disclosure, there is provided a training apparatus of an information processing large model, comprising:
a fourth determining module configured to determine a sample tag corresponding to sample information from a hierarchical tag system, where the hierarchical tag system includes n layers of tags, the layer-1 tags are initial tags, the tags of the other layers are aggregation tags, n is an integer greater than 1, and the sample tag includes at least one sample aggregation tag and one sample initial tag;
a fifth determining module configured to determine sample description information corresponding to the sample tag;
a second generation module configured to generate sample data based on the sample information, the sample tag, and the sample description information;
the training module is configured to train the initial large model by taking sample data as input and sample initial labels as expected output to obtain the information processing large model.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method provided in the first aspect or the second aspect.
According to a sixth aspect of the present disclosure there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method as provided in the first or second aspect.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method provided according to the first or second aspect.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram to which the information processing methods of the present disclosure may be applied;
FIG. 2 is a schematic diagram of a first embodiment of an information processing method according to the present disclosure;
FIG. 3 is a schematic diagram of a second embodiment of an information processing method according to the present disclosure;
FIG. 4 is a schematic diagram of a third embodiment of an information processing method according to the present disclosure;
FIG. 5 is a schematic diagram of a first embodiment of a training method for an information processing large model according to the present disclosure;
FIG. 6 is a schematic structural view of one embodiment of an information processing apparatus according to the present disclosure;
FIG. 7 is a schematic diagram of a structure of one embodiment of a training device of an information processing large model according to the present disclosure;
fig. 8 is a block diagram of an electronic device used to implement an information processing method or a training method of an information processing large model of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
A large language model (LLM) is an artificial intelligence model consisting of a neural network with many parameters (typically billions of weights or more), trained on large amounts of unlabeled text using self-supervised or semi-supervised learning, that can generate natural language text and understand the meaning of language text. Large language models can handle a variety of natural language tasks, such as text classification, question answering and dialogue, and are an important path toward artificial intelligence.
In the information processing method of the present disclosure, a hierarchical label system is first constructed from an initial label set; then the input information and the candidate labels in the hierarchical label system are analyzed using the natural language processing capability of a generative large language model, so that the target label in the initial label set can be accurately predicted for the input information, improving the accuracy of both information processing and label prediction.
It should be noted that, in the technical solution of the present disclosure, the acquisition, storage and application of the user's personal information comply with relevant laws and regulations and do not violate public order and good morals.
FIG. 1 illustrates an exemplary system architecture 100 in which embodiments of the information processing methods or apparatus of the present disclosure may be applied.
As shown in fig. 1, system architecture 100 may include a terminal device 101, a network 102, and a server 103. The network 102 is used to provide a communication link between the terminal device 101 and the server 103, and may include various connection types, for example, a wired communication link, a wireless communication link, or an optical fiber cable, etc.
A user can interact with the server 103 through the network 102 using the terminal device 101 to receive or transmit information or the like. Various client applications, such as social applications, human-computer interaction applications, leisure or entertainment applications, etc., may be installed on the terminal device 101.
The terminal device 101 may be hardware or software. When the terminal device 101 is hardware, it may be any of a variety of electronic devices, including but not limited to smartphones, tablet computers, laptop computers and desktop computers. When the terminal device 101 is software, it may be installed in the electronic devices described above and implemented either as multiple pieces of software or software modules, or as a single piece of software or software module. This is not specifically limited herein.
The server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed cluster of multiple servers or as a single server. When the server 103 is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. This is not specifically limited herein.
The information processing method provided by the embodiments of the present disclosure is generally executed by the server 103, and accordingly, the information processing apparatus is generally provided in the server 103.
It should be noted that the numbers of the terminal device 101, the network 102, and the server 103 in fig. 1 are merely illustrative. There may be any number of terminal devices 101, networks 102, and servers 103, as desired for implementation.
Fig. 2 shows a flow 200 of one embodiment of an information processing method according to the present disclosure, with reference to fig. 2, comprising the steps of:
step S201, determining candidate labels corresponding to the input information from the hierarchical label system.
In the embodiment of the present disclosure, an execution subject of the information processing method, for example, the server 103 shown in fig. 1, determines, from the hierarchical tag system, a candidate tag corresponding to the input information according to the received input information.
The hierarchical label system in the embodiment of the disclosure includes n layers of labels, where n is an integer greater than 1, the layer-1 labels are initial labels, and the labels of the other layers are aggregation labels, obtained for example by aggregating the initial labels.
Illustratively, all labels outside layer 1 of the hierarchical label system are obtained by layer-by-layer aggregation starting from the initial labels.
In some optional implementations of embodiments of the present disclosure, the layer-x labels are aggregated from the layer-(x-1) labels, where x is an integer greater than 1 and less than or equal to n.
For example, layer 2 tags are based on layer 1 initial tag aggregation, layer 3 tags are based on layer 2 tag aggregation, and so on.
Illustratively, the layer-x aggregation tags may be aggregated based on at least one of the semantic similarity and the semantic relevance of the layer-(x-1) tags. For example, layer-(x-1) tags whose semantic relevance is greater than a preset relevance threshold are aggregated into one layer-x aggregation tag. As another example, two or more layer-(x-1) tags whose semantic similarity is greater than a preset similarity threshold and whose semantic relevance is greater than a preset relevance threshold may be aggregated into one layer-x tag.
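One aggregation round of the kind just described can be sketched as follows. The disclosure leaves the similarity and relevance measures unspecified, so word-overlap (Jaccard) similarity is used here purely as a stand-in, and the example labels are hypothetical.

```python
# Minimal sketch of one aggregation round: layer-(x-1) labels whose pairwise
# similarity exceeds a threshold are merged into one layer-x group.
def similarity(a, b):
    # Jaccard word overlap as a placeholder for a semantic similarity score.
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def aggregate_layer(labels, threshold=0.3):
    groups = []
    for label in labels:
        for group in groups:
            if any(similarity(label, member) > threshold for member in group):
                group.append(label)
                break
        else:
            groups.append([label])  # start a new aggregation group
    return groups

groups = aggregate_layer(["pop music", "rock music", "weather forecast"])
```

Each resulting group would then be named with a layer-x aggregation label; with a real semantic model the threshold comparison is the only part that changes.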
In some alternative implementations, there is no semantic overlap between the aggregation labels within a layer. For example, for any two aggregation labels among the layer-2 labels, the corresponding layer-1 initial labels do not overlap.
In an embodiment of the present disclosure, the candidate tags include at least one candidate aggregate tag and at least two candidate initial tags.
According to the embodiment of the disclosure, through the hierarchical relation among the labels in the hierarchical label system, the auxiliary execution body precisely matches the initial labels for the input information, so that the candidate labels determined for the input information at least comprise one candidate aggregation label and at least two candidate initial labels.
For example, after receiving the input information, the executing body may combine the semantic information of the input information, determine at least one candidate aggregation tag according to the order of the hierarchical structure from high to low in the hierarchical tag system, and determine at least two candidate initial tags under the corresponding aggregation tags according to the candidate aggregation tags.
For example, the executing body determines at least one candidate aggregation tag from the layer-n tags according to the received or acquired input information, then determines at least one candidate aggregation tag from the layer-(n-1) tags, and so on, until at least two candidate initial tags are determined from the layer-1 tags.
Illustratively, the candidate aggregated tags are aggregated based at least on the at least two candidate initial tags, thereby ensuring the screening assistance of the candidate aggregated tags to the candidate initial tags.
Step S202, determining semantic description information corresponding to the candidate labels.
In the embodiment of the present disclosure, the execution subject of the information processing method, such as the server 103 shown in fig. 1, determines, for the candidate tags determined in step S201, semantic description information corresponding to each candidate tag.
The semantic description information corresponding to the candidate tag can be description information capable of performing grammatical interpretation and semantic interpretation on the candidate tag.
In some alternative implementations, each layer of tags in the hierarchy of tags includes an aggregate tag or an initial tag in the hierarchy, and may also include semantic description information of the aggregate tag or the initial tag. Therefore, after the candidate labels corresponding to the input information are determined, the semantic description information corresponding to each candidate label can be obtained from the hierarchical label system according to the candidate labels.
In some alternative implementations, the execution body may generate corresponding tag description information for each aggregate tag and initial tag in the hierarchical tag system, and perform association record with the hierarchical tag system. Illustratively, the recording is performed in a preset mapping relationship. Then, after determining the candidate labels corresponding to the input information, the label description information corresponding to each candidate label can be further determined according to the preset mapping relation and used as the corresponding semantic description information.
In some alternative implementations, the execution body may also determine corresponding candidate tags for the input information first, and then generate corresponding semantic description information for each candidate tag separately. For example, corresponding semantic description information may be generated for each candidate tag by a pre-trained neural network model.
In some optional implementations of embodiments of the present disclosure, determining semantic description information corresponding to a candidate tag includes: and inputting the candidate labels into a pre-trained description model to obtain semantic description information corresponding to the candidate labels.
In this implementation manner, the executing body inputs the candidate tag determined in step S201 into a pre-trained description model, and the output of the description model is the semantic description information corresponding to the candidate tag. Wherein the pre-trained description model has semantic generation capabilities.
The description model may be a deep-learning semantic analysis neural network model, or a generative large language model with natural language processing capability, for example.
In the embodiment of the disclosure, the execution body generates the corresponding semantic description information for each candidate tag by utilizing the semantic analysis capability or the natural language processing capability of the pre-trained description model, so that the accuracy of the semantic description information, namely, the description accuracy of the semantic description information on the candidate tag, can be effectively ensured.
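The description model in this implementation is a pre-trained semantic-analysis or generative model; the sketch below fixes only the interface (candidate tag in, one sentence of semantic description information out) and uses a template as a stand-in, so the prompt and the generated wording are assumptions rather than the disclosure's own.

```python
# Interface sketch for the description model of step S202. The `model`
# parameter is any callable mapping a prompt string to generated text.
def describe(tag, model=None):
    """Return semantic description information for a candidate tag."""
    if model is not None:
        # A real pre-trained description model would answer this prompt.
        return model(f"Explain the label '{tag}' in one sentence.")
    # Template fallback so the sketch runs without a model; purely illustrative.
    return f"The label '{tag}' marks input whose topic is {tag}."

desc = describe("pop music")
```

Swapping in a deep-learning semantic analysis model or a generative large language model, as the text suggests, requires only passing it as `model`.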
Step S203, generating information to be processed based on the input information, the candidate tag, and the semantic description information.
In the embodiment of the present disclosure, an execution subject of the information processing method, such as the server 103 shown in fig. 1, generates information to be processed based on the received input information, the candidate tags corresponding to the input information, and the semantic description information corresponding to each candidate tag.
Illustratively, the executing body associates and stores the input information, the candidate tags and the semantic description information corresponding to the candidate tags to obtain the information to be processed. For example, the candidate tags and their corresponding semantic description information are prepended to the input information and recorded in association to obtain the information to be processed.
In some optional implementations of embodiments of the present disclosure, generating the information to be processed based on the input information, the candidate tag, and the semantic description information includes: and carrying out association record on the candidate labels, the semantic description information and the input information based on the hierarchical order of the candidate labels in the hierarchical label system to obtain information to be processed.
In the implementation mode, the execution main body arranges each candidate label and the corresponding semantic description information according to the hierarchical order of each candidate label in the hierarchical label system, and carries out association record with the input information to obtain the information to be processed.
Illustratively, the executing body records each candidate tag and its corresponding semantic description information in front of the input information, ordered so that the distance to the input information grows with the tag's level in the hierarchical tag system (layer 1 nearest the input, layer n farthest), to obtain the information to be processed. For example, the generated information to be processed contains, in order: the layer-n candidate aggregation tag and its semantic description information, the layer-(n-1) candidate aggregation tag and its semantic description information, ..., the layer-2 candidate aggregation tag and its semantic description information, the at least two candidate initial tags and their semantic description information, and finally the input information.
In some alternative implementations, the executing body may instead record each candidate tag and its corresponding semantic description information in front of or behind the input information in order of level from high to low, with the distance to the input information increasing accordingly.
In the embodiment of the disclosure, the execution body carries out association record on each candidate tag and the corresponding semantic description information thereof according to the hierarchical order of the hierarchical tag system, so that the sequence of the candidate tags and the semantic description information thereof in the information to be processed can be effectively ensured, the information logic sequence of the information to be processed is effectively ensured, logic guarantee is provided for the subsequent determination of the target tag, and the determination accuracy of the target tag is improved.
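The layer-ordered assembly of step S203 can be sketched directly. The bracketed line format below is an assumption; the disclosure fixes only the relative order of the candidate tags, their semantic description information, and the input information.

```python
# Assemble the information to be processed: candidate tags and their
# descriptions from layer n down to layer 1, then the input information.
def build_prompt(layered_tags, descriptions, input_info):
    """layered_tags: list of (layer, tag) pairs, in any order."""
    parts = []
    for layer, tag in sorted(layered_tags, key=lambda p: -p[0]):
        parts.append(f"layer {layer} [{tag}]: {descriptions[tag]}")
    parts.append(f"input: {input_info}")
    return "\n".join(parts)

prompt = build_prompt(
    [(1, "pop music"), (2, "music")],           # hypothetical candidate tags
    {"pop music": "popular songs", "music": "audio content"},
    "play a song",
)
```

The sort key guarantees that the candidate initial tags (layer 1) end up adjacent to the input information regardless of the order in which candidates were collected.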
Step S204, determining a target label corresponding to the input information from at least two candidate initial labels according to the information to be processed.
In the embodiment of the present disclosure, the execution subject of the information processing method, for example, the server 103 shown in fig. 1, determines, according to the information to be processed generated in step S203, a target initial tag from at least two candidate initial tags among the candidate tags, as a target tag corresponding to the input information.
Illustratively, the executing body performs semantic analysis on the information to be processed, determines the target aggregation tags corresponding to the input information layer by layer in order from layer n down the hierarchical tag system, and then determines a target initial tag from the at least two candidate initial tags according to the target aggregation tags and the semantic description information of the candidate initial tags, taking it as the target tag corresponding to the input information.
In some optional implementations, the executing body determines, according to the semantic description information of each candidate aggregation tag, the target aggregation tag corresponding to the input information at each level, proceeding layer by layer down the hierarchical tag system until the target aggregation tag among the layer-2 tags is determined. It then determines, according to the layer-2 target aggregation tag, the semantic description information of the at least two candidate initial tags, and the input information, a target initial tag from the at least two candidate initial tags, and takes that target initial tag as the target tag corresponding to the input information.
In some optional implementation manners, the execution body may analyze and process the information to be processed by adopting a pre-trained neural network model, so as to obtain a target tag corresponding to the input information. The neural network model may be a deep-learning semantic analysis model, or may be a generative large language model, for example.
In some optional implementations of the embodiments of the present disclosure, determining, according to information to be processed, a target tag corresponding to input information from at least two candidate initial tags includes: inputting the information to be processed into a pre-trained information processing large model to obtain target labels corresponding to the input information in at least two candidate initial labels.
The pre-trained information processing large model is a generated large language model.
In this implementation, the executing body directly inputs the generated information to be processed into the pre-trained information processing large model, and the information processing large model outputs the target tag corresponding to the input information from among the at least two candidate initial tags.
In the implementation mode, the execution main body adopts the pre-trained information processing large model to analyze and process the information to be processed, so that the determination efficiency and accuracy of the target label can be effectively improved.
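Step S204 can be sketched as a constrained choice: whatever model processes the information to be processed, its answer must be one of the candidate initial tags. A scoring function stands in for the information processing large model below so the control flow is runnable; the lambda scorer and example strings are illustrative assumptions.

```python
# Choose the target tag from the candidate initial tags only. `score` is a
# stand-in for the pre-trained information processing large model.
def predict_target(prompt, candidate_initials, score):
    return max(candidate_initials, key=lambda tag: score(prompt, tag))

# Hypothetical scorer: count how many of the tag's words occur in the prompt.
score = lambda prompt, tag: sum(w in prompt.split() for w in tag.split())

target = predict_target("play some pop music",
                        ["pop music", "kids music"], score)
```

Restricting the output to the candidate initial tags is what ties the prediction back to the layer-1 labels of the hierarchical tag system.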
In the information processing method provided by the embodiments of the present disclosure, the executing body determines at least one candidate aggregation tag and at least two candidate initial tags corresponding to the input information from the hierarchical tag system as the candidate tags, determines the semantic description information corresponding to each candidate tag, records the input information in association with each candidate tag and its semantic description information to generate the information to be processed, and processes the information to be processed to determine the target tag corresponding to the input information from among the at least two candidate initial tags. Through the hierarchical order of the candidate tags in the hierarchical tag system and the correspondence between candidate aggregation tags and candidate initial tags, at least two candidate initial tags can be narrowed down from the layer-1 initial tags before the target tag is determined, which effectively improves the match between the determined target tag and the input information and ensures both the accuracy and the efficiency of target tag determination.
In the technical solution of the present disclosure, the user's personal information is acquired, stored and applied only with the user's authorization, in compliance with relevant laws and regulations and without violating public order and good morals.
Fig. 3 shows a flow 300 of a second embodiment of an information processing method according to the present disclosure, which, referring to fig. 3, includes the steps of:
Step S301, determining historical processing information of the user corresponding to the input information.
In the embodiment of the present disclosure, the execution subject of the information processing method, such as the server 103 shown in fig. 1, determines, according to the acquired or received input information, the history processing information of the user to which the input information corresponds.
The historical processing information includes at least one piece of historical input information from the historical information processing records of the user corresponding to the input information, the historical target tag corresponding to each piece of historical input information, the historical candidate tags corresponding to each piece of historical input information, and the like.
In some optional implementations, after receiving or acquiring the input information, the executing body determines a user who sends the input information, then acquires a history information processing record of the user, and acquires at least one piece of history input information and a corresponding history target tag thereof from the history information processing record as history processing information of the user corresponding to the input information.
In some optional implementations, after receiving the input information, the executing body determines a sending user of the input information, then obtains user portrait information of the sending user, then determines candidate users from a user set according to the user portrait information, wherein the similarity between the user portrait information of the candidate users and the user portrait information of the sending user is greater than a preset similarity threshold, and then obtains a plurality of pieces of historical input information and corresponding historical target labels from historical information processing records of the sending user and each candidate user as historical processing information corresponding to the input information.
Accordingly, after determining the candidate users, the executing body may further acquire a history candidate tag corresponding to each history input information from the history information processing record of the sending user and each candidate user, as a part of the history processing information.
In some alternative implementations, the historical input information in the historical processing information is information that is semantically similar or analogous to the input information. Illustratively, the execution body performs similarity analysis between each piece of history input information in the history information processing record and the current input information, for example on at least one of semantics, the field to which the information belongs, and the like. History input information whose every similarity score exceeds a preset similarity threshold, or whose weighted sum of similarity scores (such as a total or an average) exceeds a preset threshold, is taken as the history input information corresponding to the current input information. The execution body then acquires the history target tag corresponding to that history input information, and takes the determined history input information and its history target tags as the history processing information corresponding to the current input information.
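The similarity-based filtering described above can be sketched as follows. The similarity measure and thresholds are illustrative assumptions (the disclosure leaves them unspecified); simple token overlap stands in for a semantic comparison that a real system would perform with embeddings:

```python
# Sketch of filtering history records by similarity to the current input.
# The similarity function and threshold are assumptions, not part of the
# disclosure; token overlap stands in for a semantic/field comparison.

def weighted_similarity(current: str, historical: str) -> float:
    # Placeholder measure: Jaccard overlap of whitespace tokens.
    a, b = set(current.split()), set(historical.split())
    return len(a & b) / max(len(a | b), 1)

def select_history(current_input: str, records: list, threshold: float = 0.3) -> list:
    """Keep (history input, history target tag) records whose similarity
    to the current input exceeds the preset threshold."""
    return [
        r for r in records
        if weighted_similarity(current_input, r["input"]) > threshold
    ]
```

A record such as `{"input": "play some jazz music", "tag": "music"}` would survive the filter for the input "play jazz music now", while an unrelated record would be dropped.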
Step S302, based on the input information and the historical processing information, a candidate label corresponding to the input information is determined from the hierarchical label system.
In the embodiment of the present disclosure, an execution subject of the information processing method, for example, the server 103 shown in fig. 1, determines, from the hierarchical label system, a candidate label corresponding to the input information based on the input information and the history processing information of the user corresponding to the input information.
Illustratively, the executing body determines candidate labels corresponding to the input information from the hierarchical label system according to the corresponding relation between the historical input information and the historical target labels in the historical processing information and the corresponding historical aggregation labels of the historical target labels in each hierarchy of the hierarchical label system by combining the semantic information of the input information.
The hierarchical label system comprises n layers of labels, n is an integer greater than 1, the 1 st layer of labels are initial labels, and other layers of labels are aggregation labels, for example, the other layers of labels are obtained according to initial label aggregation.
Illustratively, the labels outside layer 1 in the hierarchical label system are all obtained by layer-by-layer aggregation on the basis of the initial labels. For example, the layer-x labels are aggregated from the layer x-1 labels, x being an integer greater than 1 and less than or equal to n.
In some alternative implementations, there is no semantic overlap between the aggregation labels of each layer. For example, for any two aggregation labels among the layer-2 labels, their corresponding initial labels in layer 1 do not overlap.
According to the embodiment of the disclosure, the hierarchical relationship among the labels in the hierarchical label system assists the execution body in precisely matching initial labels to the input information, so that the candidate labels determined for the input information include at least one candidate aggregation label and at least two candidate initial labels.
For example, based on the history aggregation tags corresponding in the hierarchical tag system to the history target tags of the history input information, and in combination with the input information, the execution body may determine at least one candidate aggregation tag from the layer-n tags, proceeding through the hierarchy from high to low, then determine at least one layer n-1 candidate aggregation tag corresponding to the layer-n candidate aggregation tag from the layer n-1 tags, and so on, until at least two candidate initial tags are determined from the layer-1 tags.
Wherein a layer-2 candidate aggregation label is aggregated at least from the at least two candidate initial labels, a layer-3 candidate aggregation label is aggregated at least from the layer-2 candidate aggregation label, and so on, thereby ensuring that the candidate aggregation labels assist in screening the candidate initial labels.
In some alternative implementations, when determining the candidate labels, the execution body may select all levels in the hierarchical label system, or may select only a portion of the levels, where the selected portion includes the layer-1 initial labels. For example, the execution body may, according to the input information, select a certain number of layers upward from layer 1 as candidate layers, and then determine the candidate labels from those candidate layers.
In some optional implementations of embodiments of the present disclosure, determining candidate tags from the hierarchical tag system based on the input information and the historical processing information includes: determining layers 1 to m as candidate layers from the hierarchical label system, wherein m is an integer greater than 1 and less than or equal to n; in response to m being equal to 2, determining a candidate aggregation tag from the layer-m tags based on the input information and the history processing information; in response to m being greater than 2, determining candidate aggregation tags for each layer from the layer-m tags down to the layer-2 tags in turn based on the input information and the history processing information; and determining at least two candidate initial tags from the layer-1 tags based on the layer-2 candidate aggregation tags and the input information.
In this implementation, the execution body determines, according to the input information, layers 1 to m of the hierarchical label system as the candidate layers, for example according to the semantic information of the input information and the technical field to which it belongs.
In some optional implementations, the execution body selects layers 1 to m from the hierarchical tag system as the candidate layers according to the input information and the input limit information of the information processing large model selected for subsequently processing the information to be processed.
Wherein the input limit information of the information processing large model includes its limit input length, for example, the limit input length of the selected information processing large model is 2048 characters.
Illustratively, the execution body determines the maximum number of selectable candidate labels according to the input length of the input information, the limited input length of the selected model, and the average length of each label (or the sum of the average lengths of each label and its semantic description information) in the hierarchical label system, and then determines the candidate layers from the hierarchical label system according to that maximum number.
After determining the candidate layers, if the number of candidate layers is 2, the execution body determines at least one candidate aggregation label from layer 2 and then determines at least two candidate initial labels from layer 1 according to the layer-2 candidate aggregation label and the input information. If the number of candidate layers is greater than 2, the corresponding candidate aggregation labels are determined layer by layer from layer m down to layer 2: for example, the candidate aggregation label in layer m is determined first, then the candidate aggregation label in layer m-1 is determined according to the layer-m candidate aggregation label and the input information, and so on, until the candidate aggregation label in layer 2 is determined; at least two candidate initial labels are then determined from layer 1 according to the layer-2 candidate aggregation labels and the input information.
In the embodiment of the disclosure, the execution body determines the candidate layer including the layer 1 tag from the hierarchical tag system according to the input information, and then determines the candidate tag from the candidate layer, so that low processing efficiency caused by excessive number of the candidate tags can be effectively avoided, and the determination efficiency and the information processing efficiency of the candidate tag are improved.
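The top-down narrowing from layer m to the layer-1 initial labels can be sketched as follows. The `children` mapping (aggregate tag to the tags it was aggregated from), the relevance `score` function, and keeping one aggregate per layer are all illustrative assumptions, not details fixed by the disclosure:

```python
# Sketch of layer-by-layer candidate determination: pick the most relevant
# aggregate tag at the top candidate layer, descend through its children,
# and keep at least two candidate initial tags at layer 1.
# `children` and `score` are assumed interfaces, not part of the patent.

def determine_candidates(input_info, layers, children, score, m, keep=2):
    """layers[i] holds the tags of layer i+1 (layers[0] is layer 1).
    Returns (candidate aggregate tags per layer, candidate initial tags)."""
    current = sorted(layers[m - 1], key=lambda t: score(input_info, t),
                     reverse=True)[:1]
    per_layer = {m: list(current)}
    for level in range(m - 1, 1, -1):          # layers m-1 down to 2
        pool = [c for tag in current for c in children[tag]]
        current = sorted(pool, key=lambda t: score(input_info, t),
                         reverse=True)[:1]
        per_layer[level] = list(current)
    # Layer 1: keep at least two candidate initial tags.
    pool = [c for tag in current for c in children[tag]]
    initial = sorted(pool, key=lambda t: score(input_info, t),
                     reverse=True)[:max(keep, 2)]
    return per_layer, initial
```

With a three-layer system whose layer-2 tag "ab" groups initial tags "a" and "b", an input matching "ab" narrows to that aggregate and then yields both of its initial tags as candidates.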
Step S303, determining semantic description information corresponding to the candidate labels.
In the embodiment of the present disclosure, the execution subject of the information processing method, such as the server 103 shown in fig. 1, determines, for each candidate tag determined in step S302, its corresponding semantic description information.
In the embodiment of the present disclosure, the specific operation of step S303 is described in detail in step S202 in the embodiment shown in fig. 2, and will not be described herein.
Step S304, generating information to be processed based on the input information, the candidate labels and the semantic description information.
In the embodiment of the present disclosure, an execution subject of the information processing method, such as the server 103 shown in fig. 1, generates information to be processed based on the received input information, each candidate tag corresponding to the determined input information, and semantic description information corresponding to each candidate tag.
In the embodiment of the present disclosure, the specific operation of step S304 is described in detail in step S203 in the embodiment shown in fig. 2, and will not be described herein.
Step S305, determining a target label corresponding to the input information from at least two candidate initial labels according to the information to be processed.
In the embodiment of the present disclosure, the execution body of the information processing method, for example, the server 103 shown in fig. 1, determines, according to the information to be processed generated in step S304, a target tag corresponding to the input information from at least two candidate initial tags of the candidate tags.
In the embodiment of the present disclosure, the specific operation of step S305 is described in detail in step S204 in the embodiment shown in fig. 2, and will not be described herein.
According to the information processing method provided by the embodiment of the disclosure, the history processing information of the user corresponding to the input information is first determined; the candidate labels corresponding to the input information are then determined from the hierarchical label system by combining the input information with the corresponding history processing information; the semantic description information corresponding to each candidate label is determined; the information to be processed is generated based on the input information, the candidate labels, and their semantic description information; and the target label corresponding to the input information is obtained by analyzing and processing the information to be processed. By combining the input information with the history processing information of its corresponding user to determine the candidate labels, this scheme improves both the accuracy and the efficiency of candidate label determination, and thereby the accuracy of target label determination.
Fig. 4 is an implementation flow 400 of a third embodiment of an information processing method according to the present disclosure, which, referring to fig. 4, includes the steps of:
step S401, input information of a user and a domain label set corresponding to the input information are obtained.

In the embodiment of the present disclosure, an execution body of the information processing method, for example, the server 103 shown in fig. 1, obtains input information of a user and a domain label set corresponding to the input information.

The execution body may obtain the input information of the user either by taking input information in sequence from an input information queue, or by directly receiving the input information from the user.

After the input information of the user is acquired, the domain label set corresponding to the input information is acquired. For example, the execution body determines the target domain to which the input information belongs by performing semantic analysis on the input information, and then obtains the corresponding domain label set according to the target domain.
Step S402, determining a hierarchical label system according to the domain label set.
In the embodiment of the present disclosure, an execution body of the information processing method, for example, the server 103 shown in fig. 1, uses the domain label in the obtained domain label set as an initial label of layer 1, and performs layer-by-layer aggregation to construct a hierarchical label system.
Illustratively, the number of layers n of the hierarchical label system is determined according to the number of labels in the domain label set, n being an integer greater than 1. The labels in the domain label set are then taken as the layer-1 initial labels and aggregated to obtain a plurality of layer-2 aggregation labels; further aggregation is performed based on the layer-2 aggregation labels to obtain the layer-3 aggregation labels, and so on, aggregating layer by layer to obtain n layers of labels and construct the hierarchical label system.
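The layer-by-layer construction can be sketched as follows. The `group` function that decides which labels aggregate together is an assumption: the disclosure only requires that each higher layer be aggregated from the layer below, for example by semantic similarity:

```python
# Sketch of building the n-layer hierarchical label system from a domain
# label set. `group` is an assumed pluggable aggregation strategy that
# returns (aggregate_tag, member_tags) pairs for one layer.

def build_hierarchy(domain_labels, n, group):
    """Layer 1 is the initial labels; each of the n-1 higher layers
    aggregates the layer below it via `group`. Returns the list of
    layers and a child mapping from aggregate tag to member tags."""
    layers = [list(domain_labels)]            # layer 1: initial tags
    children = {}
    for _ in range(n - 1):
        grouped = group(layers[-1])
        layers.append([agg for agg, _ in grouped])
        for agg, members in grouped:
            children[agg] = members
    return layers, children
```

For instance, with a toy `group` that merges adjacent pairs, four domain labels collapse to two layer-2 aggregates and one layer-3 aggregate.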
Step S403, determining candidate labels corresponding to the input information from the hierarchical label system.
In the embodiment of the present disclosure, the execution subject of the information processing method, such as the server 103 shown in fig. 1, determines at least one candidate aggregate tag and at least two candidate initial tags corresponding to the input information as candidate tags from the hierarchical tag system constructed in step S402.
For example, the executing body may input the hierarchical label system and the input information into a pre-trained large language model, and output at least one candidate aggregate label and at least two candidate initial labels corresponding to the input information through the pre-trained large language model as candidate labels corresponding to the input information.
Step S404, inputting the candidate labels into a pre-trained description model to obtain semantic description information corresponding to the candidate labels.
In the embodiment of the present disclosure, the execution body of the information processing method, for example, the server 103 shown in fig. 1, inputs the candidate labels into the pre-trained description model, which directly outputs the semantic description information corresponding to each candidate label.
Step S405, generating information to be processed based on the input information, the candidate tag, and the semantic description information.
In the embodiment of the present disclosure, the specific operation of step S405 is described in detail in step S203 in the embodiment shown in fig. 2, and will not be described herein.
Step S406, determining a target label corresponding to the input information from at least two candidate initial labels according to the information to be processed.
In the embodiment of the present disclosure, the specific operation of step S406 is described in detail in step S204 in the embodiment shown in fig. 2, and will not be described herein.
In the embodiment of the disclosure, the execution body determines a corresponding domain label set according to the input information of the user, constructs a hierarchical label system by aggregation with the labels in the domain label set as the initial labels, and then determines the candidate labels corresponding to the input information by using the hierarchical relationships among the labels in the hierarchical label system, thereby determining the target label corresponding to the input information from the initial labels. By identifying the domain of the input information to select the corresponding domain label set and building the hierarchical label system from it, this scheme improves the suitability of the hierarchical label system to the input information; the hierarchical relationships between the aggregation labels and the initial labels in the constructed system then assist the execution body in determining the target label from the initial labels, effectively improving the accuracy of target label determination.
FIG. 5 illustrates an implementation flow 500 of one embodiment of a training method for an information handling large model according to the present disclosure, with reference to FIG. 5, comprising the steps of:
in step S501, a sample tag corresponding to the sample information is determined from the hierarchical tag system.
In the embodiment of the present disclosure, an execution subject of the training method of the information processing large model, such as the server 103 shown in fig. 1, first obtains sample information, and then determines a sample tag corresponding to the sample information from the hierarchical tag system.
The hierarchical label system comprises n layers of labels, n is an integer greater than 1, the 1 st layer of labels are initial labels, and other layers of labels are aggregation labels, for example, the other layers of labels are obtained according to initial label aggregation.
Illustratively, the labels outside layer 1 in the hierarchical label system are all obtained by layer-by-layer aggregation on the basis of the initial labels.
In some alternative implementations of embodiments of the present disclosure, a hierarchical label system is employed in which the x-th layer labels are aggregated based on the x-1-th layer labels, where x is an integer greater than 1 and less than or equal to n. For example, layer 2 tags are based on layer 1 initial tag aggregation, layer 3 tags are based on layer 2 tag aggregation, and so on.
Illustratively, the layer-x aggregation labels may be aggregated based on at least one of the semantic similarity, semantic relevance, and the like of the layer x-1 labels. For example, layer x-1 labels whose semantic relevance is greater than a preset relevance threshold are aggregated into one layer-x aggregation label. For another example, two or more layer x-1 labels whose semantic similarity is greater than a preset similarity threshold and whose semantic relevance is greater than a preset relevance threshold may be aggregated into one layer-x label.
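One way to realize the threshold rule above can be sketched as follows. The `similarity` and `relevance` functions and the greedy single-pass grouping strategy are illustrative assumptions; the disclosure only specifies the threshold conditions:

```python
# Sketch of aggregating one layer of labels into groups whose members all
# exceed preset similarity and relevance thresholds relative to the group
# seed. Greedy single-pass grouping is an assumption for illustration.

def aggregate_layer(tags, similarity, relevance, sim_th, rel_th):
    """Each tag joins the first existing group whose seed it is both
    sufficiently similar and sufficiently relevant to; otherwise it
    starts a new group. Each group becomes one higher-layer label."""
    groups = []
    for tag in tags:
        for g in groups:
            seed = g[0]
            if similarity(tag, seed) > sim_th and relevance(tag, seed) > rel_th:
                g.append(tag)
                break
        else:
            groups.append([tag])
    return groups
```

With a toy measure that compares first letters, "apple" and "avocado" aggregate together while "banana" forms its own group.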
In some alternative implementations, there is no semantic overlap between the aggregation labels of each layer. For example, for any two aggregation labels among the layer-2 labels, their corresponding initial labels in layer 1 do not overlap.
In an embodiment of the present disclosure, the sample tags include at least one sample aggregate tag and at least two sample initial tags.
According to the embodiment of the disclosure, model training is assisted through the hierarchical relationship among the labels in the hierarchical label system, so that the information processing large model obtained through training can be accurately matched with the initial label for any input information by using the hierarchical relationship in the hierarchical label system.
For example, after the execution body obtains the sample information, the execution body may combine the semantic information of the sample information, determine the initial sample labels according to the order of the hierarchical structure from low to high in the hierarchical label system, and then determine the sample aggregation labels in each layer of aggregation labels layer by layer based on the initial sample labels according to the aggregation relation among the labels in different levels in the hierarchical label system.
For example, the execution body determines, according to the sample information, a sample initial tag corresponding to the input information from the initial tags of layer 1, then determines, from the tags of layer 2, a sample aggregate tag obtained at least based on the sample initial tag aggregate, and so on, until a corresponding sample aggregate tag is determined from the tags of layer n.
In some alternative implementations, to prevent the number of determined sample labels from being so large as to affect model training efficiency, or from exceeding the limited input length of the initial large model, the execution body may determine, according to the information length of the sample information and the limited input length of the initial large model, the sample layers corresponding to the sample information from the hierarchical label system, and then determine the sample labels from those sample layers.
It should be noted that, since the trained large model is mainly used to predict target tags from initial tags, the sample layer selected by the execution subject from the hierarchical tag hierarchy includes layer 1 tags.
In some optional implementations of embodiments of the present disclosure, determining a sample tag corresponding to sample information from a hierarchical tag hierarchy includes: determining layers 1 to k as sample layers from a hierarchical tag system, wherein k is an integer greater than 1 and less than or equal to n; determining a sample initial tag from the layer 1 tags based on the sample information; determining a sample aggregate tag from the kth layer tags based on the sample initial tag and the sample information in response to k=2; and in response to k being greater than 2, determining a sample aggregation label of each layer from the 2 nd layer to the k th layer labels in turn based on the sample initial label and the sample information.
In the embodiment of the disclosure, the execution body determines the number of sample tags allowed to be input by the initial large model according to the information length of the sample information and the limited input length of the initial large model, and determines the 1 st layer to the k th layer as the sample layer from the hierarchical tag system according to the number of the sample tags.
Illustratively, the number k of sample layers is less than the number of sample tags.
In this implementation, the execution body determines the sample layers from the hierarchical label system according to the sample information and the initial large model, and then determines the sample labels from the sample layers, which effectively ensures that the determined sample labels conform to the input limit of the initial large model and improves model training efficiency.
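The length-budget check that selects layers 1 to k can be sketched as follows. Measuring lengths in characters, budgeting roughly one tag-plus-description per layer, and the specific formula are simplifying assumptions:

```python
# Sketch of choosing the number of sample layers k so that the assembled
# sample data fits the initial large model's limited input length.
# Character-count budgeting is a simplifying assumption.

def choose_sample_layers(sample_len, limit_len, avg_tag_len,
                         tags_per_layer=1, n=5):
    """Return the largest k (at least 2, at most n) such that the sample
    information plus roughly `tags_per_layer` tag-and-description entries
    per layer stays within the limit; 0 if the sample alone exceeds it."""
    budget = limit_len - sample_len
    if budget <= 0:
        return 0
    max_tags = budget // avg_tag_len
    return min(n, max(2, max_tags // tags_per_layer))
```

For example, with a 2048-character limit, a 1000-character sample, and 100-character tag entries, all five layers of a five-layer system fit; a sample longer than the limit yields 0.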
Step S502, determining sample description information corresponding to the sample tag.
In the embodiment of the present disclosure, the execution subject of the training method of the information processing large model, for example, the server 103 shown in fig. 1, determines corresponding sample description information for the sample label determined in step S501.
The sample description information corresponding to a sample label may be description information capable of interpreting the sample label grammatically and semantically.
In some alternative implementations, each layer of tags in the hierarchy of tags includes an aggregate tag or an initial tag in the hierarchy, and may also include semantic description information of the aggregate tag or the initial tag. Therefore, after determining the sample label corresponding to the sample information, the semantic description information corresponding to each sample label can be obtained from the hierarchical label system according to the sample label and used as the corresponding sample description information.
In some alternative implementations, the execution body may generate corresponding tag description information for each aggregate tag and initial tag in the hierarchical tag system, and perform association record with the hierarchical tag system. Illustratively, the recording is performed in a preset mapping relationship. Then, after determining the sample label corresponding to the sample information, the label description information corresponding to each sample label can be further determined according to the preset mapping relation and used as the corresponding sample description information.
In some alternative implementations, the execution body may also determine corresponding sample tags for the sample information first, and then generate corresponding sample description information for each sample tag separately. For example, corresponding sample description information may be generated for each sample tag by a pre-trained neural network model.
In some optional implementations of embodiments of the present disclosure, determining sample description information corresponding to a sample tag includes: and inputting the sample label into a pre-trained description model to obtain sample description information corresponding to the sample label.
In this implementation, the execution body inputs the sample tag determined in step S501 into the pre-trained description model, and the output of the description model is the sample description information corresponding to the sample tag. The pre-trained description model has semantic generation capability.
The description model may be a deep-learning semantic analysis neural network model, or a generative large language model with natural language processing capability, for example.
In the embodiment of the disclosure, the execution body generates the corresponding sample description information for each sample tag by utilizing the semantic analysis capability or the natural language processing capability of the pre-trained description model, so that the accuracy of the sample description information, namely, the description accuracy of the sample description information on the sample tag, can be effectively ensured.
Step S503, generating sample data based on the sample information, the sample tag, and the sample description information.
In the embodiment of the present disclosure, an execution subject of the training method of the information processing large model, such as the server 103 shown in fig. 1, correlates and records sample information, sample tags, and sample description information, and generates sample data.
The execution body performs association recording and storage on the sample information, the sample label and sample description information corresponding to the sample label to obtain sample data. For example, the sample label and the corresponding sample description information are added in front of the sample information, and association recording is performed to obtain sample data.
In some optional implementations of embodiments of the present disclosure, generating sample data based on the sample information, the sample tag, and the sample description information includes: and based on the hierarchical order of the sample tags in the hierarchical tag system, carrying out association record on the sample tags, sample description information of the sample tags and sample information to obtain sample data.
In this implementation, the execution body arranges each sample label and its corresponding sample description information according to the hierarchical order of the sample labels in the hierarchical label system, and associates and records them with the sample information to obtain the sample data.
Illustratively, the execution body records each sample label and its corresponding sample description information in front of the sample information, ordered from near to far by distance to the sample information and from low to high (layer 1 to layer n) by the hierarchical order of the sample labels, so as to obtain the sample data. For example, the generated sample data sequentially includes, in order of information content: the layer-n sample aggregation tag and its corresponding sample description information, the layer n-1 sample aggregation tag and its corresponding sample description information, ..., the layer-2 sample aggregation tag and its corresponding sample description information, at least two sample initial tags and their corresponding sample description information, and the sample information.
In some alternative implementations, the execution body may also record each sample tag and corresponding sample description information in front of or behind the sample information in an order of from high to low and from near to far according to the hierarchical order of each sample tag in the hierarchical tag system.
In the embodiment of the disclosure, the execution body carries out association record on each sample label and the corresponding sample description information according to the hierarchical order of the hierarchical label system, so that the sequence of the sample labels and the sample description information in the sample data can be effectively ensured, the information logic sequence of the sample data is effectively ensured, logic guarantee is provided for the subsequent determination of the target labels, and the determination accuracy of the target labels is improved.
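The association recording described above can be sketched as follows; the `"tag: description"` joining format and newline separators are illustrative assumptions for how the record might be serialized:

```python
# Sketch of assembling sample data: tags are emitted from layer n down to
# layer 1, each with its description, and the sample information comes
# last so the layer-1 tags sit nearest to it. The text format is assumed.

def assemble_sample_data(tags_by_layer, descriptions, sample_info):
    """tags_by_layer maps layer number -> list of sample tags for that
    layer; descriptions maps each tag to its sample description."""
    parts = []
    for layer in sorted(tags_by_layer, reverse=True):   # layer n .. layer 1
        for tag in tags_by_layer[layer]:
            parts.append(f"{tag}: {descriptions[tag]}")
    parts.append(sample_info)
    return "\n".join(parts)
```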
Step S504, training an initial large model by taking sample data as input and a sample initial label as expected output to obtain an information processing model.
In the embodiment of the present disclosure, an execution subject of a training method of an information processing large model, for example, the server 103 shown in fig. 1, trains an initial large model with sample data as input and a sample initial tag as desired output, and obtains an information processing model.
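Step S504 can be viewed schematically as supervised training on (sample data, sample initial label) pairs. The `model.step` interface below is hypothetical and merely stands in for one fine-tuning update of the initial large model:

```python
# Schematic view of step S504: each training example pairs the assembled
# sample data (input) with the sample initial tag (expected output).
# The model's `step` method is a hypothetical stand-in for one
# supervised update; a real system would fine-tune a large model.

def build_training_pairs(samples):
    """samples: list of dicts holding 'sample_data' and 'initial_tag'."""
    return [(s["sample_data"], s["initial_tag"]) for s in samples]

def train(model, samples, epochs=1):
    for _ in range(epochs):
        for x, y in build_training_pairs(samples):
            model.step(x, y)    # one supervised update toward the tag
    return model
```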
According to the training method of the information processing large model provided by the embodiment of the disclosure, the execution body determines, from the hierarchical label system, at least one sample aggregation label and a sample initial label corresponding to the sample information as the sample labels corresponding to the sample information, determines the sample description information corresponding to each sample label, associates and records the sample information with each sample label and its corresponding sample description information to generate the sample data, and then performs model training with the sample data as input and the sample initial label as the expected output to obtain the information processing large model. By determining a sample initial label from the layer-1 initial labels of the hierarchical label system and then, using the hierarchical order of the labels, determining the sample aggregation labels corresponding to the sample initial label layer by layer from low to high, this method effectively improves the match between the determined sample labels and the sample information; using the sample initial label as the expected output effectively ensures the accuracy of model training, so that the trained information processing large model can accurately predict the corresponding target label for any input information.
In the technical solution of the embodiments of the present disclosure, the acquisition, storage, application, and other processing of the personal information of the user involved are all performed with the authorization of the user, comply with the requirements of relevant laws and regulations, and do not violate public order and good customs.
As an implementation of the information processing method shown in the above-described figures, fig. 6 shows an embodiment of an information processing apparatus according to the present disclosure. The information processing apparatus 600 corresponds to the method embodiments shown in figs. 2 and 3, and can be applied to various electronic devices.
Referring to fig. 6, an information processing apparatus 600 provided by an embodiment of the present disclosure includes: a first determination module 601, a second determination module 602, a first generation module 603, and a third determination module 604. The first determining module 601 is configured to determine a candidate tag corresponding to the input information from a hierarchical tag system, where the hierarchical tag system includes n layers of tags, the 1st layer tags are initial tags, the tags of the other layers are aggregation tags, n is an integer greater than 1, and the candidate tag includes at least one candidate aggregation tag and at least two candidate initial tags; the second determining module 602 is configured to determine semantic description information corresponding to the candidate tag; the first generation module 603 is configured to generate information to be processed based on the input information, the candidate tag, and the semantic description information; the third determining module 604 is configured to determine, according to the information to be processed, a target tag corresponding to the input information from the at least two candidate initial tags.
In the information processing apparatus 600 of the embodiment of the present disclosure, the specific processes of the first determining module 601, the second determining module 602, the first generating module 603, and the third determining module 604 and the technical effects thereof may refer to the relevant descriptions of steps S201 to S204 in the corresponding embodiment of fig. 2, and are not repeated herein.
In some optional implementations of embodiments of the present disclosure, the first determining module 601 includes a first determining unit and a second determining unit, where the first determining unit is configured to determine historical processing information of the user corresponding to the input information; the second determining unit is configured to determine the candidate tag corresponding to the input information from the hierarchical tag system based on the input information and the historical processing information.
In the information processing apparatus according to the embodiment of the present disclosure, the specific processes of the first determining unit and the second determining unit and the technical effects thereof may refer to the descriptions related to steps S301 to S302 in the corresponding embodiment of fig. 3, and are not repeated herein.
In some optional implementations of embodiments of the present disclosure, the second determining unit is configured to: determining layers 1 to m from the hierarchical tag system as candidate layers, wherein m is an integer greater than 1 and less than or equal to n; determining a candidate aggregation tag from the m-th layer tag based on the input information and the history processing information in response to m being equal to 2; in response to m being greater than 2, determining candidate aggregation tags for each layer from the m-th layer to the 2-th layer tags in turn based on the input information and the history processing information; at least two candidate initial tags are determined from the layer 1 tags based on the candidate aggregate tags in layer 2 and the input information.
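The layer-by-layer selection performed by the second determining unit can be sketched as a top-down walk over a toy hierarchical label system. The hierarchy, the tag names, and the `score` callable (a stand-in for matching a tag against the input information and the historical processing information) are all assumptions made for illustration only.

```python
# Toy hierarchical tag system: layer 1 = initial tags, higher layers =
# aggregation tags built from the layer below. All tag names are illustrative.
HIERARCHY = {
    3: {"media": ["audio", "video"]},       # layer-3 tag -> its layer-2 children
    2: {"audio": ["music", "podcast"],      # layer-2 tag -> its layer-1 children
        "video": ["movie", "series"]},
}

def select_candidates(score, m=3):
    """Walk layers m..2 top-down, keeping the best-scoring aggregation tag at
    each layer, then return the candidate initial tags from layer 1.
    `score(tag)` stands in for matching against input + history."""
    best = max(HIERARCHY[m], key=score)     # candidate aggregation tag, layer m
    chosen = [best]
    for layer in range(m, 2, -1):           # descend until layer 2 is reached
        children = HIERARCHY[layer][chosen[-1]]
        chosen.append(max(children, key=score))
    initial = HIERARCHY[2][chosen[-1]]      # layer-1 candidates under layer-2 tag
    return chosen, initial
```

A tag that scores higher at layer 2 thus constrains which layer-1 initial tags become candidates, which is the pruning effect the candidate-layer traversal relies on.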
In some optional implementations of embodiments of the present disclosure, the x-th layer tags are aggregated based on the (x-1)-th layer tags, where x is an integer greater than 1 and less than or equal to n.
In some optional implementations of embodiments of the present disclosure, the second determining module is configured to: and inputting the candidate labels into a pre-trained description model, and obtaining semantic description information corresponding to the candidate labels.
In some optional implementations of embodiments of the present disclosure, the first generation module is configured to: and based on the hierarchical order of the candidate labels in the hierarchical label system, associating and recording the candidate labels, the semantic description information and the input information to obtain the information to be processed.
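A minimal sketch of the association record performed by the first generation module, assuming the candidate tags are already ordered by layer from high to low and that a simple one-line-per-tag text layout is acceptable; the field layout and names are hypothetical, not prescribed by the disclosure.

```python
def build_pending_info(input_info, candidates, descriptions):
    """Associate candidate tags (assumed sorted by layer, high to low) with
    their semantic descriptions and the input information, producing the
    information to be processed. Field layout is illustrative."""
    lines = [f"input: {input_info}"]
    for tag in candidates:
        lines.append(f"tag: {tag} | description: {descriptions[tag]}")
    return "\n".join(lines)
```

Serializing the hierarchy in order lets the downstream model see each candidate tag together with its semantic description, which is the stated purpose of the association record.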
In some optional implementations of embodiments of the present disclosure, the third determination module is configured to: inputting the information to be processed into a pre-trained information processing large model to obtain the target label corresponding to the input information from among the at least two candidate initial labels.
As an implementation of the training method of the information processing large model shown in the above-described figures, fig. 7 shows an embodiment of a training apparatus of the information processing large model according to the present disclosure. The training apparatus 700 of the information processing large model corresponds to the embodiment of the method shown in fig. 5, and can be applied to various electronic devices.
Referring to fig. 7, a training apparatus 700 for information processing large models provided in an embodiment of the present disclosure includes: a fourth determination module 701, a fifth determination module 702, a second generation module 703, and a training module 704. The fourth determining module 701 is configured to determine a sample tag corresponding to sample information from a hierarchical tag system, where the hierarchical tag system includes n layers of tags, a 1 st layer tag is an initial tag, other layers of tags are aggregate tags, n is an integer greater than 1, and the sample tag includes at least one sample aggregate tag and one sample initial tag; the fifth determining module 702 is configured to determine sample description information corresponding to the sample tag; the second generation module 703 is configured to generate sample data based on the sample information, the sample tag, and the sample description information; training module 704 is configured to train the initial large model with sample data as input and sample initial tags as desired output to obtain an information processing large model.
In the training device for information processing large models in the embodiments of the present disclosure, specific processes and technical effects of the fourth determining module 701, the fifth determining module 702, the second generating module 703 and the training module 704 may refer to the relevant descriptions of steps S501 to S504 in the corresponding embodiment of fig. 5, which are not described herein again.
In some optional implementations of embodiments of the present disclosure, the fourth determination module is configured to: determining layers 1 to k as sample layers from a hierarchical tag system, wherein k is an integer greater than 1 and less than or equal to n; determining a sample initial tag from the layer 1 tags based on the sample information; determining a sample aggregate tag from the kth layer tag based on the sample initial tag and the sample information in response to k being equal to 2; and in response to k being greater than 2, determining a sample aggregation label of each layer from the 2 nd layer to the k th layer labels in turn based on the sample initial label and the sample information.
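The bottom-up determination performed by the fourth determining module can be sketched with a toy parent map: starting from the layer-1 sample initial tag, the aggregation tag of each layer from 2 up to k is collected by following parent links. The tag names and the `PARENT` structure are illustrative assumptions.

```python
# Toy parent map: each layer-1 initial tag points to its layer-2 aggregation
# tag, each layer-2 tag to its layer-3 tag, and so on. Names are illustrative.
PARENT = {"music": "audio", "podcast": "audio", "audio": "media"}

def sample_labels(initial_tag, k=3):
    """Starting from a layer-1 sample initial tag, collect the sample
    aggregation tag of each layer from 2 up to k (low to high)."""
    labels = [initial_tag]
    for _ in range(2, k + 1):
        labels.append(PARENT[labels[-1]])
    return labels
```

Because each layer is aggregated from the layer below, following parent links layer by layer yields exactly one consistent chain of sample labels for a given initial tag.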
In some optional implementations of embodiments of the present disclosure, the x-th layer tags are aggregated based on the (x-1)-th layer tags, where x is an integer greater than 1 and less than or equal to n.
In some optional implementations of embodiments of the present disclosure, the fifth determination module is configured to: and inputting the sample label into a pre-trained description model to obtain sample description information corresponding to the sample label.
In some optional implementations of embodiments of the present disclosure, the second generation module is configured to: associate and record the sample tag, the sample description information, and the sample information based on the hierarchical order of the sample tag in the hierarchical tag system, to obtain the sample data.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the respective methods and processes described above, for example, the information processing method or the training method of the information processing large model. For example, in some embodiments, the information processing method or the training method of the information processing large model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the above-described information processing method or training method of the information processing large model may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the information processing method or the training method of the information processing large model in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special-purpose or general-purpose programmable processor and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (27)

1. An information processing method, comprising:
determining candidate labels corresponding to input information from a hierarchical label system, wherein the hierarchical label system comprises n layers of labels, the 1 st layer of labels are initial labels, the other layers of labels are aggregation labels, n is an integer greater than 1, and the candidate labels comprise at least one candidate aggregation label and at least two candidate initial labels;
determining semantic description information corresponding to the candidate tag;
Generating information to be processed based on the input information, the candidate tag and the semantic description information;
and determining a target label corresponding to the input information from the at least two candidate initial labels according to the information to be processed.
2. The method of claim 1, wherein the determining candidate tags corresponding to the input information from the hierarchical tag hierarchy comprises:
determining historical processing information of a user corresponding to the input information;
and determining candidate labels corresponding to the input information from the hierarchical label system based on the input information and the historical processing information.
3. The method of claim 2, wherein the determining, based on the input information and the historical processing information, a candidate tag corresponding to the input information from the hierarchical tag hierarchy comprises:
determining layers 1 to m from the hierarchical label system as candidate layers, wherein m is an integer greater than 1 and less than or equal to n;
determining the candidate aggregated tag from the m-th layer tags based on the input information and the history processing information in response to m being equal to 2;
determining the candidate aggregation label of each layer from the m-th layer label to the 2-th layer label in turn based on the input information and the history processing information in response to m being greater than 2;
The at least two candidate initial tags are determined from the layer 1 tags based on the candidate aggregate tags in the layer 2 and the input information.
4. The method of claim 1, wherein the x-th layer tag is aggregated based on the (x-1)-th layer tag, x being an integer greater than 1 and less than or equal to n.
5. The method of claim 1, wherein the determining semantic description information corresponding to the candidate tag comprises:
and inputting the candidate labels into a pre-trained description model to obtain semantic description information corresponding to the candidate labels.
6. The method of claim 1, wherein the generating the information to be processed based on the input information, the candidate tag, and the semantic description information comprises:
and based on the hierarchical order of the candidate labels in the hierarchical label system, carrying out association record on the candidate labels, the semantic description information and the input information to obtain the information to be processed.
7. The method of claim 1, wherein the determining, according to the information to be processed, a target tag corresponding to the input information from the at least two candidate initial tags includes:
Inputting the information to be processed into a pre-trained information processing large model to obtain a target label corresponding to the input information in the at least two candidate initial labels.
8. A training method of an information processing large model, comprising:
determining a sample label corresponding to sample information from a hierarchical label system, wherein the hierarchical label system comprises n layers of labels, the 1 st layer of labels are initial labels, the other layers of labels are aggregation labels, n is an integer greater than 1, and the sample label comprises at least one sample aggregation label and one sample initial label;
determining sample description information corresponding to the sample label;
generating sample data based on the sample information, the sample tag, and the sample description information;
and training an initial large model by taking the sample data as input and the sample initial label as expected output to obtain an information processing large model.
9. The method of claim 8, wherein the determining, from the hierarchical label hierarchy, a sample label corresponding to the sample information comprises:
determining layers 1 to k as sample layers from the hierarchical tag system, wherein k is an integer greater than 1 and less than or equal to n;
Determining a sample initial tag from the layer 1 tags based on the sample information;
determining a sample aggregate tag from a kth layer tag based on the sample initial tag and the sample information in response to k being equal to 2;
and in response to k being greater than 2, determining the sample aggregation label of each layer from the labels of the 2 nd layer to the k th layer in sequence based on the sample initial label and the sample information.
10. The method of claim 8, wherein the x-th layer tag is aggregated based on the (x-1)-th layer tag, x being an integer greater than 1 and less than or equal to n.
11. The method of claim 8, wherein the determining the sample description information corresponding to the sample tag comprises:
and inputting the sample label into a pre-trained description model to obtain sample description information corresponding to the sample label.
12. The method of claim 8, wherein the generating sample data based on the sample information, the sample tag, and the sample description information comprises:
and based on the hierarchical order of the sample tags in the hierarchical tag system, carrying out association record on the sample tags, the sample description information and the sample information to obtain the sample data.
13. An information processing apparatus comprising:
the first determining module is configured to determine candidate tags corresponding to input information from a hierarchical tag system, wherein the hierarchical tag system comprises n layers of tags, the 1 st layer of tags are initial tags, the other layers of tags are aggregation tags, n is an integer greater than 1, and the candidate tags comprise at least one candidate aggregation tag and at least two candidate initial tags;
the second determining module is configured to determine semantic description information corresponding to the candidate tag;
a first generation module configured to generate information to be processed based on the input information, the candidate tag, and the semantic description information;
and the third determining module is configured to determine a target tag corresponding to the input information from the at least two candidate initial tags according to the information to be processed.
14. The apparatus of claim 13, wherein the first determination module comprises:
a first determination unit configured to determine history processing information of a user to which the input information corresponds;
and a second determining unit configured to determine a candidate tag corresponding to the input information from the hierarchical tag system based on the input information and the history processing information.
15. The apparatus of claim 14, wherein the second determination unit is configured to:
determining layers 1 to m from the hierarchical label system as candidate layers, wherein m is an integer greater than 1 and less than or equal to n;
determining the candidate aggregated tag from the m-th layer tags based on the input information and the history processing information in response to m being equal to 2;
determining the candidate aggregation label of each layer from the m-th layer label to the 2-th layer label in turn based on the input information and the history processing information in response to m being greater than 2;
at least two candidate initial tags are determined from the layer 1 tags based on the candidate aggregate tags in layer 2 and the input information.
16. The apparatus of claim 13, wherein the x-th layer tag is aggregated based on the (x-1)-th layer tag, x being an integer greater than 1 and less than or equal to n.
17. The apparatus of claim 13, wherein the second determination module is configured to:
and inputting the candidate labels into a pre-trained description model to obtain semantic description information corresponding to the candidate labels.
18. The apparatus of claim 13, wherein the first generation module is configured to:
And based on the hierarchical order of the candidate labels in the hierarchical label system, carrying out association record on the candidate labels, the semantic description information and the input information to obtain the information to be processed.
19. The apparatus of claim 13, wherein the third determination module is configured to:
inputting the information to be processed into a pre-trained information processing large model to obtain a target label corresponding to the input information in the at least two candidate initial labels.
20. A training device for information processing large models, comprising:
a fourth determining module, configured to determine a sample tag corresponding to sample information from a hierarchical tag system, where the hierarchical tag system includes n layers of tags, a 1 st layer tag is an initial tag, other layers of tags are aggregation tags, n is an integer greater than 1, and the sample tag includes at least one sample aggregation tag and one sample initial tag;
a fifth determining module configured to determine sample description information corresponding to the sample tag;
a second generation module configured to generate sample data based on the sample information, the sample tag, and the sample description information;
And the training module is configured to train the initial large model by taking the sample data as input and the sample initial label as expected output to obtain an information processing large model.
21. The apparatus of claim 20, wherein the fourth determination module is configured to:
determining layers 1 to k as sample layers from the hierarchical tag system, wherein k is an integer greater than 1 and less than or equal to n;
determining a sample initial tag from the layer 1 tags based on the sample information;
determining a sample aggregate tag from a kth layer tag based on the sample initial tag and the sample information in response to k being equal to 2;
and in response to k being greater than 2, determining the sample aggregation label of each layer from the labels of the 2 nd layer to the k th layer in sequence based on the sample initial label and the sample information.
22. The apparatus of claim 20, wherein the x-th layer tag is aggregated based on the (x-1)-th layer tag, x being an integer greater than 1 and less than or equal to n.
23. The apparatus of claim 20, wherein the fifth determination module is configured to:
and inputting the sample label into a pre-trained description model to obtain sample description information corresponding to the sample label.
24. The apparatus of claim 20, wherein the second generation module is configured to:
and based on the hierarchical order of the sample tags in the hierarchical tag system, carrying out association record on the sample tags, the sample description information and the sample information to obtain the sample data.
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-12.
26. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-12.
27. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-12.
CN202410149441.1A 2024-02-01 2024-02-01 Information processing and large model training method, device, equipment and storage medium Pending CN117892137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410149441.1A CN117892137A (en) 2024-02-01 2024-02-01 Information processing and large model training method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117892137A true CN117892137A (en) 2024-04-16

Family

ID=90651904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410149441.1A Pending CN117892137A (en) 2024-02-01 2024-02-01 Information processing and large model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117892137A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination