WO2023071120A1

WO2023071120A1 - Method for recognizing proportion of green assets in digital assets and related product

Info

Publication number: WO2023071120A1
Application number: PCT/CN2022/090224
Authority: WO
Inventors: 诸世卓; 崔伟旗; 刘琛
Original assignee: 平安科技（深圳）有限公司
Priority date: 2021-10-30
Filing date: 2022-04-29
Publication date: 2023-05-04
Also published as: CN113902569A

Abstract

The present application relates to the technical field of artificial intelligence, and in particular, to a method for recognizing a proportion of green assets in digital assets and a related product. The method comprises: performing text recognition on obtained position holding data of digital assets to be recognized to obtain a plurality of first digital assets and second digital assets; obtaining at least one first text segment according to asset information of the first digital assets; determining a similarity between the first text segment and a plurality of second text segments; determining a target first text segment according to the similarity between the first text segment and the plurality of second text segments; determining a proportion of green assets in the first digital assets according to an asset distribution described by the target first text segment; and determining, according to the proportion of the green assets in the first digital assets and a proportion of green assets in the second digital assets, a proportion of green assets in the digital assets to be recognized.

Description

Identification method of the proportion of green assets in digital assets and related products

Priority statement

This application claims priority to the Chinese patent application filed with the China Patent Office on October 30, 2021, with application number 202111280770.2 and the invention title "Method for Identifying the Proportion of Green Assets in Digital Assets and Related Products", the entire contents of which are incorporated by reference in this application.

Technical field

This application relates to the field of artificial intelligence technology, and specifically relates to a method for identifying the proportion of green assets in digital assets and related products.

Background

In the context of global climate change cooperation, various management departments need to clarify the scale of green and non-green assets within their jurisdiction in order to more scientifically deploy the path to achieve carbon peaking and carbon neutrality.

Investment institutions play a very important role in the process of achieving carbon peaking and carbon neutrality. The choice of investment targets will actually guide enterprises to develop in the direction of green industry and carbon neutrality.

The inventor realized that when investment institutions count their green investment ratios, supervision and confidentiality requirements prevent cross-departmental sharing, so each department compiles its statistics manually, which is highly subjective and low in accuracy.

Summary of the invention

The embodiments of the present application provide a method for identifying the proportion of green assets in digital assets and related products, so as to improve the identification accuracy of the proportion of green assets in digital assets.

In a first aspect, an embodiment of the present application provides a method for identifying the proportion of green assets in digital assets based on text recognition, including: performing text recognition on acquired position data of a digital asset to be identified to obtain a plurality of first digital assets and a second digital asset, wherein the asset information of each first digital asset is disclosed in the position data and the asset information of the second digital asset is not disclosed in the position data; obtaining disclosure data of each first digital asset according to the asset information of that first digital asset, and inputting the disclosure data of each first digital asset into a machine reading comprehension model for text segmentation to obtain at least one first text segment, wherein the at least one first text segment is used to describe the asset distribution of each first digital asset; determining, according to a similarity model, the similarity between each first text segment and a plurality of second text segments, wherein the plurality of second text segments are used to describe a plurality of fund distributions with green attributes; determining a target first text segment among the at least one first text segment according to the similarity between each first text segment and the plurality of second text segments; determining the proportion of green assets in each first digital asset according to the asset distribution described by the target first text segment and the total amount of each first digital asset; obtaining, according to a portrait of the manager of the digital asset to be identified, all digital assets managed by the manager, obtaining the average proportion of green assets in those managed digital assets whose asset information is disclosed, and taking the average proportion as the proportion of green assets in the second digital asset; and determining the proportion of green assets in the digital asset to be identified according to the proportion of green assets in each first digital asset and the proportion of green assets in the second digital asset.

In a second aspect, an embodiment of the present application provides a device for identifying the proportion of green assets, including an acquisition unit and a processing unit. The acquisition unit is configured to acquire position data of a digital asset to be identified. The processing unit is configured to perform text recognition on the acquired position data to obtain a plurality of first digital assets and a second digital asset, wherein the asset information of each first digital asset is disclosed in the position data and the asset information of the second digital asset is not disclosed in the position data. The acquisition unit is further configured to obtain disclosure data of each first digital asset according to the asset information of that first digital asset. The processing unit is further configured to: input the disclosure data of each first digital asset into a machine reading comprehension model for text segmentation to obtain at least one first text segment, wherein the at least one first text segment is used to describe the asset distribution of each first digital asset; determine, according to a similarity model, the similarity between each first text segment and a plurality of second text segments, wherein the plurality of second text segments are used to describe a plurality of fund distributions with green attributes; determine a target first text segment among the at least one first text segment according to the similarity between each first text segment and the plurality of second text segments; determine the proportion of green assets in each first digital asset according to the asset distribution described by the target first text segment and the total amount of each first digital asset; obtain, according to a portrait of the manager of the digital asset to be identified, all digital assets managed by the manager, obtain the average proportion of green assets in those managed digital assets whose asset information is disclosed, and take the average proportion as the proportion of green assets in the second digital asset; and determine the proportion of green assets in the digital asset to be identified according to the proportion of green assets in each first digital asset and the proportion of green assets in the second digital asset.

In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor and a memory, wherein the processor is connected to the memory, the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory to cause the electronic device to perform the following steps:

performing text recognition on acquired position data of a digital asset to be identified to obtain a plurality of first digital assets and a second digital asset, wherein the asset information of each first digital asset is disclosed in the position data and the asset information of the second digital asset is not disclosed in the position data;

obtaining disclosure data of each first digital asset according to the asset information of that first digital asset, and inputting the disclosure data of each first digital asset into a machine reading comprehension model for text segmentation to obtain at least one first text segment, wherein the at least one first text segment is used to describe the asset distribution of each first digital asset;

determining, according to a similarity model, the similarity between each first text segment and a plurality of second text segments, wherein the plurality of second text segments are used to describe a plurality of fund distributions with green attributes;

determining a target first text segment among the at least one first text segment according to the similarity between each first text segment and the plurality of second text segments;

determining the proportion of green assets in each first digital asset according to the asset distribution described by the target first text segment and the total amount of each first digital asset;

obtaining, according to a portrait of the manager of the digital asset to be identified, all digital assets managed by the manager, obtaining the average proportion of green assets in those managed digital assets whose asset information is disclosed, and taking the average proportion as the proportion of green assets in the second digital asset;

determining the proportion of green assets in the digital asset to be identified according to the proportion of green assets in each first digital asset and the proportion of green assets in the second digital asset.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by a processor so that the computer performs the following steps:

In a fifth aspect, an embodiment of the present application provides a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, wherein the computer program is operable to cause a computer to execute the method described in the first aspect.

Implementing the embodiment of the present application has the following beneficial effects:

It can be seen that, in the embodiments of this application, the position data of the digital asset to be identified is obtained, and the first digital assets and the second digital asset are split out based on the position data. Then, based on text recognition technology and the models, the proportion of green assets in the first digital assets and in the second digital asset is identified automatically. Finally, based on these proportions, the proportion of green assets in the digital asset to be identified is identified automatically. There is no need to manually count the proportion of green assets in the digital asset (fund) to be identified, which saves labor costs, avoids the subjectivity of manual statistics, and improves the accuracy of identifying the proportion of green assets in funds.

Brief description of the drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic flowchart of a method for identifying the proportion of green assets in digital assets based on text recognition provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of the position data of a fund provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of a method for identifying the proportion of green assets in stocks provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of a similarity model training method provided by an embodiment of the present application;

FIG. 5 is a schematic flowchart of a method for identifying the proportion of green assets in a bond provided by an embodiment of the present application;

FIG. 6 is a block diagram of functional units of an identification device for the proportion of green assets provided by the embodiment of the present application;

FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed description

The following will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, not all of them. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of this application.

The terms "first", "second", "third" and "fourth" in the specification, claims and drawings of the present application are used to distinguish different objects, rather than to describe a specific order. Furthermore, the terms "include" and "have", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes unlisted steps or units, or optionally further includes other steps or units inherent in such a process, method, product or device.

Reference herein to an "embodiment" means that a particular feature, result, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The occurrences of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is understood explicitly and implicitly by those skilled in the art that the embodiments described herein can be combined with other embodiments.

The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is a theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.

Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.

First, the application scenario of this application is identifying the green assets in a fund. Therefore, the digital assets to be identified in this application are funds to be identified. A fund is generally composed of multiple stocks, multiple bonds, and other fixed income. When positions are disclosed, the asset information of each stock, such as its name, proportion, and net value, is fully disclosed. For bonds, however, not all asset information is disclosed. Some bonds disclose their name, proportion, net value, and so on; this application refers to such bonds as disclosed bonds. Other bonds do not disclose any such information, for example their proportion or net value, and are reported collectively as "other bonds"; this application refers to them collectively as undisclosed bonds and treats them as a whole without subdividing them. Other fixed income usually consists of bank deposits and other fixed-income assets that have no green component or whose green component cannot be judged, so this part is not counted. Therefore, this application identifies the green proportion of a fund mainly from the stocks and bonds in the fund.

For ease of description, among the digital assets to be identified, the multiple digital assets that disclose asset information (including multiple stocks and multiple disclosed bonds) are referred to in this application as multiple first digital assets, and the digital assets that do not disclose asset information, that is, the undisclosed bonds, are referred to as the second digital asset.

The following will introduce how to obtain the proportion of green assets in each first digital asset and the proportion of green assets in the second digital asset respectively with reference to the accompanying drawings.

Referring to FIG. 1, FIG. 1 is a schematic flowchart of a method for identifying the proportion of green assets in digital assets based on text recognition provided by an embodiment of the present application. The method is applied to a device for identifying the proportion of green assets and includes the following steps:

101: Obtain the position data of the digital asset to be identified.

Exemplarily, the position data of the digital asset to be identified is acquired from the platform of the issuing company of the digital asset to be identified or from the third-party management platform of the digital asset to be identified by crawler technology.

102: Perform text recognition on the position data to obtain a plurality of first digital assets and second digital assets.

Exemplarily, as shown in FIG. 2, text recognition is performed on the position data to obtain the keyword "stock name", and text recognition is then performed on each element under the stock name to obtain one part of the multiple first digital assets, such as the stocks "China CDFG" and "Wuliangye" shown in FIG. 2. Similarly, text recognition is performed on the position data to obtain the keyword "bond name", and text recognition is then performed on each element under the bond name to obtain the other part of the multiple first digital assets, such as the bonds "20 Agricultural Development 09" and "21 National Development 01" shown in FIG. 2. For the second digital asset, the asset information is not disclosed in the position data, so it is impossible to know from the position data what these digital assets are. This application collectively refers to these undisclosed bonds as the second digital asset. The second digital asset is therefore included in the digital asset to be identified by default, and no text recognition on the position data is needed to obtain it.

103: Obtain the disclosure data of each first digital asset according to the asset information of each first digital asset, and input the disclosure data of each first digital asset into the machine reading comprehension model for text segmentation to obtain at least one first text segment, wherein the at least one first text segment is used to describe the asset distribution of each first digital asset.

Exemplarily, the asset name of each first digital asset is obtained according to the position data, and then the disclosure data of each first digital asset is obtained through crawler technology based on the asset name.

Optionally, when the first digital asset is a stock, the disclosure data of the first digital asset is the disclosure document of the stock, that is, the annual report issued by the enterprise to which the stock belongs, and the asset distribution described by the first text segment is the proportion of each sub-product of the enterprise. Text segmentation is performed on the annual report of the enterprise based on the machine reading comprehension model to obtain the at least one first text segment. How the annual report is segmented and how the proportion of green assets in a stock is obtained are described in detail later and are not elaborated here.

Optionally, when the first digital asset is a bond (a disclosed bond), the disclosure data of the first digital asset is the disclosure data of the bond, that is, the data disclosed by the bond issuer when the bond was raised, and the asset distribution described by the first text segment is the use of funds for the bond. The at least one first text segment is therefore obtained by performing text segmentation on the bond disclosure data through the machine reading comprehension model. How the disclosure data is segmented and how the proportion of green assets in a disclosed bond is obtained are described in detail later and are not elaborated here.

104: According to the similarity model, determine the similarity between each first text segment and multiple second text segments, where the multiple second text segments are used to describe multiple fund distributions with green attributes.

Exemplarily, when calculating the proportion of green assets in stocks, the multiple fund distributions described by the multiple second text segments are multiple industries with green attributes, referred to as multiple first industries.

Exemplarily, when calculating the proportion of green assets in disclosed bonds, the multiple fund distributions described by the multiple second text segments are multiple fund uses with green attributes.

105: Determine a target first text segment in at least one first text segment according to similarities between each first text segment and multiple second text segments.

Exemplarily, for each first text segment, determine the maximum of its similarities with the multiple second text segments; if the maximum similarity is greater than a preset threshold, use that first text segment as a target first text segment.
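The selection rule in step 105 can be sketched as follows. This is a minimal illustration, not the application's implementation: the `jaccard` word-overlap function is only a stand-in for the trained similarity model, and the function names, the example segments, and the threshold value 0.5 are all assumptions.

```python
from typing import Callable, List, Sequence


def jaccard(a: str, b: str) -> float:
    """Toy word-overlap similarity; a placeholder for the trained similarity model."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def select_target_segments(
    first_segments: Sequence[str],
    green_segments: Sequence[str],
    similarity: Callable[[str, str], float] = jaccard,
    threshold: float = 0.5,  # hypothetical preset threshold
) -> List[str]:
    """Keep each first text segment whose best match among the green
    (second) text segments exceeds the preset threshold."""
    targets = []
    for segment in first_segments:
        best = max((similarity(segment, g) for g in green_segments), default=0.0)
        if best > threshold:
            targets.append(segment)
    return targets


first = ["solar power generation equipment", "liquor production and sales"]
green = ["solar power generation", "wind power generation"]
targets = select_target_segments(first, green)
```

Passing the similarity function as a parameter keeps the thresholding logic independent of whichever model actually scores the segment pairs.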

106: Determine the proportion of green assets in each first digital asset according to the asset distribution described in the target first text paragraph and the total amount of each first digital asset.

Exemplarily, when the first digital asset is a stock, the proportion of the sub-product described by the target first text segment is used as the proportion of green assets in that first digital asset. When the first digital asset is a bond, the ratio of the funds planned for the fund use described by the target first text segment to the total amount of the first digital asset is taken as the proportion of green assets in that first digital asset.
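A short sketch of step 106 for the two asset types follows. The function names are hypothetical, and two details are assumptions not stated in the text: when several target segments are found their shares or amounts are summed, and the result is capped at 1.

```python
from typing import Sequence


def stock_green_proportion(green_product_shares: Sequence[float]) -> float:
    """Stock: use the revenue share of the sub-product(s) matched as green.
    Summing several matched sub-products and capping at 1 are assumptions."""
    return min(sum(green_product_shares), 1.0)


def bond_green_proportion(green_planned_funds: Sequence[float], total_amount: float) -> float:
    """Bond: funds planned for green uses divided by the bond's total amount."""
    if total_amount <= 0:
        raise ValueError("total_amount must be positive")
    return min(sum(green_planned_funds) / total_amount, 1.0)


# A stock whose two green sub-products each account for 25% of turnover.
stock_share = stock_green_proportion([0.25, 0.25])
# A bond raising 1,000 million, of which 300 million is planned for a green use.
bond_share = bond_green_proportion([300.0], 1000.0)
```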

107: According to the portrait of the manager of the digital asset to be identified, obtain all digital assets managed by the manager, obtain the average proportion of green assets in those managed digital assets that disclose asset information, and use the average proportion as the proportion of green assets in the second digital asset.

Exemplarily, according to the portrait of the manager of the digital asset to be identified (which can be understood as a fund manager in this application), all digital assets managed by the manager are obtained; the average proportion of green assets in those digital assets that disclose asset information is then obtained and taken as the proportion of green assets in the second digital asset.

Specifically, all funds managed by the fund manager are obtained. Then, for any disclosed bond in any fund managed by the fund manager, the proportion of green assets in that disclosed bond is obtained using the above method, and the proportions of green assets in all disclosed bonds in that fund are summed to obtain the bond-related proportion of that fund. Finally, the bond-related proportions of all managed funds are averaged to obtain the average proportion.
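The averaging in step 107 can be sketched as below, following the description literally: sum the green proportions of the disclosed bonds within each fund, then average those per-fund sums over all funds of the manager. The function name, the dictionary layout, and the fund names are hypothetical.

```python
from typing import Dict, Sequence


def manager_average_green_proportion(funds: Dict[str, Sequence[float]]) -> float:
    """funds maps a fund name to the green proportions of its disclosed bonds.

    Returns the mean of the per-fund bond-related proportions, which is then
    used as the green proportion of the undisclosed bonds (second digital asset).
    """
    if not funds:
        raise ValueError("the manager must manage at least one fund")
    per_fund = [sum(proportions) for proportions in funds.values()]
    return sum(per_fund) / len(per_fund)


average = manager_average_green_proportion({
    "fund A": [0.25, 0.25],  # bond-related proportion 0.5
    "fund B": [0.5, 0.5],    # bond-related proportion 1.0
})
```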

108: Determine the proportion of green assets in the digital assets to be identified according to the proportions of green assets in the first digital assets and the proportions of green assets in the second digital assets.

Exemplarily, the first ratio of the net value of each first digital asset relative to the net value of the digital asset to be identified is obtained; as shown in FIG. 2, the first ratio can be obtained by performing text recognition on the position data. Then, according to the first ratio of each first digital asset and its proportion of green assets, the first proportion of the green assets in each first digital asset relative to the net value of the digital asset to be identified is determined.

Exemplarily, the first proportion of each first digital asset can be expressed by formula (1):

$FG_i = HP_i \times G_i$ formula (1);

Among them, $FG_i$ is the first proportion of the i-th first digital asset among the multiple first digital assets, $HP_i$ is the first ratio of the i-th first digital asset, and $G_i$ is the proportion of green assets in the i-th first digital asset.

Exemplarily, since the asset information of the second digital asset is not disclosed, it is impossible to directly obtain the second ratio of the net value of the second digital asset relative to the net value of the digital asset to be identified from the position data. However, the position data will disclose the total ratio of each first digital asset relative to the net value of the digital asset to be identified. Therefore, the second ratio of the net value of the second digital asset relative to the net value of the digital asset to be identified can be determined according to the position data and the first ratio of each first digital asset.

Exemplarily, the second ratio of the second digital asset can be expressed by formula (2):

$HP^{b2} = 1 - \sum_{i=1}^{m} HP_i$ formula (2);

Among them, $HP^{b2}$ is the second ratio of the net value of the second digital asset relative to the net value of the digital asset to be identified, $HP_i$ is the ratio of the net value of the i-th first digital asset to the net value of the digital asset to be identified, and m is the number of the multiple first digital assets.

Further, according to the second ratio of the second digital asset and its proportion of green assets, determine the second proportion of the green assets in the second digital asset relative to the net value of the digital asset to be identified.

Exemplarily, the second proportion of the second digital asset can be expressed by formula (3):

$FG^{b2} = HP^{b2} \times G^{b2}$ formula (3);

Among them, $FG^{b2}$ is the second proportion of the second digital asset, $HP^{b2}$ is the second ratio of the second digital asset, and $G^{b2}$ is the proportion of green assets in the second digital asset.

Exemplarily, the first proportion of each first digital asset and the second proportion of the second digital asset are summed to obtain the proportion of green assets among the digital assets to be identified.

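Exemplarily, the proportion of green assets in the digital asset to be identified can be expressed by formula (4):

$FG = \sum_{i=1}^{m} FG_i + FG^{b2}$ formula (4);

Among them, FG is the proportion of green assets in the digital asset to be identified, $FG_i$ is the first proportion of the i-th first digital asset, $FG^{b2}$ is the second proportion of the second digital asset, and m is the number of the multiple first digital assets.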
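As a worked illustration of step 108, the sketch below multiplies each holding's net-value ratio by its green proportion, does the same for the undisclosed bonds, and sums the results. The `Holding` class, the function name, and the sample numbers are hypothetical; the second ratio is passed in as an input rather than derived here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Holding:
    name: str
    net_value_ratio: float   # first ratio: share of the fund's net value
    green_proportion: float  # proportion of green assets in this holding


def fund_green_proportion(
    holdings: Sequence[Holding],
    second_ratio: float,             # net-value share of the undisclosed bonds
    second_green_proportion: float,  # manager-average green proportion
) -> float:
    """Sum the green contribution of every disclosed holding and of the
    undisclosed bonds, each expressed relative to the fund's net value."""
    first = sum(h.net_value_ratio * h.green_proportion for h in holdings)
    second = second_ratio * second_green_proportion
    return first + second


holdings = [
    Holding("stock A", net_value_ratio=0.5, green_proportion=0.5),
    Holding("disclosed bond B", net_value_ratio=0.25, green_proportion=1.0),
]
fg = fund_green_proportion(holdings, second_ratio=0.25, second_green_proportion=0.5)
```

With these sample values the disclosed holdings contribute 0.25 each and the undisclosed bonds contribute 0.125.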

In one embodiment of the present application, before the proportion of green assets in the digital asset to be identified is determined, text recognition can also be performed on the position data to obtain the total amount of a part of the multiple first digital assets, the total amount of the second digital asset, and the total amount of the digital asset to be identified, where this part of the first digital assets is the disclosed bonds among the multiple first digital assets. Text recognition is likewise performed on the position data to obtain the total net value of this part of the first digital assets, the total net value of the second digital asset, and the total net value of the digital asset to be identified. The third ratio $Pb^{v}$ is the sum of the total amount of this part of the first digital assets and the total amount of the second digital asset, relative to the total amount of the digital asset to be identified. The fourth ratio $Pb^{npv}$ is the sum of the total net value of this part of the first digital assets and the total net value of the second digital asset, relative to the total net value of the digital asset to be identified. The leverage ratio of the bonds (including disclosed bonds and undisclosed bonds) is then determined according to the third ratio and the fourth ratio; for example, the leverage ratio is $Pb^{v} / Pb^{npv}$.

The leverage ratio is calculated because the bond assets used when calculating the proportion of green assets in bonds are bond assets after leverage has been added, which makes the calculated proportion too high, so the impact of leverage needs to be removed. Therefore, according to the leverage ratio, the first proportions of this part of the first digital assets and the second proportion of the second digital asset are each deleveraged to obtain the first target proportions of this part of the first digital assets and the second target proportion of the second digital asset. Finally, the first proportions of the other part of the multiple first digital assets (that is, the stocks among the multiple first digital assets), the first target proportions of this part of the first digital assets, and the second target proportion of the second digital asset are summed to obtain the proportion of green assets in the digital asset to be identified.
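The deleveraging step can be sketched as follows. The text does not state how a proportion is deleveraged, so dividing the bond-related proportions by the leverage ratio is an assumption made here for illustration; the function and parameter names are hypothetical.

```python
from typing import Sequence


def leverage_ratio(third_ratio: float, fourth_ratio: float) -> float:
    """Bond leverage: amount-based bond share divided by net-value-based bond share."""
    if fourth_ratio <= 0:
        raise ValueError("fourth_ratio must be positive")
    return third_ratio / fourth_ratio


def deleveraged_fund_green_proportion(
    stock_proportions: Sequence[float],           # first proportions of the stocks
    disclosed_bond_proportions: Sequence[float],  # first proportions of disclosed bonds
    undisclosed_bond_proportion: float,           # second proportion
    leverage: float,
) -> float:
    """Stocks are summed as-is; bond proportions are divided by the leverage
    ratio (ASSUMED deleveraging rule) before being added."""
    if leverage <= 0:
        raise ValueError("leverage must be positive")
    bond_total = sum(disclosed_bond_proportions) + undisclosed_bond_proportion
    return sum(stock_proportions) + bond_total / leverage


lev = leverage_ratio(third_ratio=0.5, fourth_ratio=0.25)
fg = deleveraged_fund_green_proportion([0.25], [0.25], 0.25, lev)
```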

Exemplarily, the green ratio of digital assets to be identified can be expressed by formula (5):

Among them, $m_1$ is the number of the other part of the first digital assets (the stocks), $m_2$ is the number of this part of the first digital assets (the disclosed bonds), and $m_1 + m_2 = m$.

In one embodiment of the present application, the digital asset to be identified is any one of multiple digital assets to be identified held by an investment institution at time t, that is, any one of the multiple funds held by the investment institution. Optionally, based on the method shown in FIG. 1, the proportion of green assets in each of the multiple digital assets to be identified may be determined.

Exemplarily, the net value of each digital asset to be identified at time t and the share of each digital asset to be identified held by the investment institution are obtained; the green scale of each digital asset to be identified held by the investment institution is then determined according to the net value of that digital asset at time t, the share of that digital asset held by the investment institution, and the proportion of green assets in that digital asset.

Exemplarily, the green scale of each digital asset to be identified held by an investment institution can be expressed by formula (6):

S_i = FG_i × V_i × R_i    formula (6);

Where S_i is the green scale of the i-th digital asset to be identified held by the investment institution, FG_i is the proportion of green assets in the i-th digital asset to be identified, V_i is the net value of the i-th digital asset to be identified at time t, and R_i is the share of the i-th digital asset to be identified held by the investment institution at time t.
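As an illustration of formula (6), the following minimal Python sketch computes the green scale of each held fund. The `Holding` class, the fund names, and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class Holding:
    """One digital asset (fund) held by the investment institution at time t."""
    name: str
    green_proportion: float  # FG_i: proportion of green assets in the fund
    net_value: float         # V_i: net value of the fund at time t
    share: float             # R_i: share of the fund held at time t


def green_scale(holding: Holding) -> float:
    """Formula (6): S_i = FG_i * V_i * R_i."""
    return holding.green_proportion * holding.net_value * holding.share


# Hypothetical holdings, for illustration only.
holdings = [
    Holding("fund_a", 0.25, 2.0, 1000.0),
    Holding("fund_b", 0.5, 1.5, 400.0),
]
scales = {h.name: green_scale(h) for h in holdings}
print(scales)
```

Summing the values of `scales` would give a total green scale across the holdings, although the text above only defines the per-asset quantity.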

Referring to FIG. 3, FIG. 3 is a schematic flowchart of a method for identifying the proportion of green assets in stocks provided by an embodiment of the present application. Content in this embodiment that is the same as in the embodiment shown in FIG. 1 is not described again here. The method of this embodiment comprises the following steps:

301: Perform text recognition on the disclosure document of each first digital asset to obtain a target chapter in the disclosure document, wherein the target chapter describes the main products of the enterprise to which each first digital asset belongs, and the target chapter includes a target table and a target text segment.

Wherein, the disclosure document is the annual report of the company that issued the first digital asset. Generally speaking, the subsection "I. Overview" of the chapter "Section Four Discussion and Analysis of Business Situation" in a company's annual report describes the company's main products. Therefore, text recognition is performed on the disclosure document to locate the chapter "Section Four Discussion and Analysis of Business Situation"; then, text recognition is performed on this chapter to obtain its subsection "I. Overview", and this subsection is used as the target chapter.

Exemplarily, the target chapter includes a target table and a target text segment, wherein the target text segment describes the main product of the enterprise, and the target table describes the main product and the ratio of the main product's turnover to the total turnover of the enterprise, that is, the proportion of the main product.

It should be noted that for an enterprise, there may be one or more main products. In this application, one main product is used as an example for illustration. The situation for multiple main products is similar and will not be described again.

302: Perform entity recognition on both the target text segment and the target table to obtain the main product and the proportion of the main product, where the proportion of the main product is the ratio of the turnover of the main product to the total turnover of the enterprise.

Exemplarily, the entity recognition is performed on the target text segment, the entity related to the product is obtained, and the product corresponding to the entity is used as the main product of the enterprise to which it belongs.

For example, if the target text segment describes that the main product of the affiliated enterprise is "new energy battery", then through entity recognition, it can be obtained that the main product of the affiliated enterprise is "new energy battery".

Further, entity recognition is performed on the target table to determine the location of "new energy battery" in the target table, and, based on that location, the ratio of the turnover of "new energy battery" to the total turnover of the enterprise is read from the table.

303: Input the target text segment into the machine reading comprehension model for text segmentation to obtain at least one first text segment, and the at least one first text segment is used to describe at least one sub-product under the main product.

Exemplarily, the machine reading comprehension (Machine Reading Comprehension, MRC) model is pre-trained, and this application does not describe the process of training the MRC model. For the text segmentation of the present application, the question of the MRC model is first set to "Which products are the sub-products (i.e., subdivided products) of the main product", where the main product is the one obtained by the above entity recognition on the target text segment, and the article input to the MRC model is set to the target text segment. Then, the question is encoded by the encoding layer of the MRC model to obtain a first vector, and each sub-text segment in the target text segment is encoded to obtain a second vector corresponding to each sub-text segment. The first vector and the second vector of each sub-text segment are then input to the interaction layer of the MRC model for interaction to obtain the similarity between the question and each sub-text segment, and the sub-text segments whose similarity is greater than a preset threshold are used as the at least one first text segment.
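The selection logic of step 303 (encode the question, encode each sub-text segment, score each segment against the question, keep segments above a threshold) can be sketched as follows. This is only a toy illustration: the bag-of-words `encode` function and the cosine score are stand-ins assumed here in place of the trained MRC encoding and interaction layers, and the question wording, the segments, and the threshold value are all hypothetical.

```python
import math
from collections import Counter
from typing import List


def encode(text: str) -> Counter:
    """Toy stand-in for the encoding layer: a bag-of-words count vector."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    return Counter(cleaned.split())


def cosine(a: Counter, b: Counter) -> float:
    """Toy stand-in for the interaction layer: cosine similarity."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_segments(question: str, segments: List[str], threshold: float) -> List[str]:
    """Keep the sub-text segments whose similarity to the question exceeds the threshold."""
    first_vector = encode(question)
    return [s for s in segments if cosine(first_vector, encode(s)) > threshold]


# Hypothetical question and segments, for illustration only.
question = "new energy battery sub-products"
segments = [
    "The company produces lithium battery packs as new energy battery products.",
    "The board of directors met four times during the year.",
]
selected = select_segments(question, segments, threshold=0.3)
print(selected)
```

A real MRC model would replace both stand-in functions with learned layers; only the thresholding step carries over unchanged.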

Further, by performing entity recognition on at least one first text segment, at least one sub-product under the main product can be obtained.

For example, the target text segment may describe multiple main products and the sub-products under each main product. If the main products described include "new energy battery" and "wind power generation", then for the main product "new energy battery", after the target text segment is input into the MRC model, the output first text segments are text segments describing batteries; for example, the identified at least one first text segment respectively describes "lithium battery", "nuclear battery", and other new energy batteries.

304: According to the proportion of the main product, determine the proportion of each sub-product in the main product.

Exemplarily, according to the quantity of at least one sub-product, the proportion of the main product can be evenly split to the at least one sub-product, to obtain the proportion of each sub-product in the at least one sub-product.

It should be noted that if a certain sub-product can still be split, the sub-product can be further split, and the proportion of the sub-product can be split to finer-grained products. In this application, the main product is split once as an example, and multiple splits are not performed.

For example, if the proportion of main product A is 50%, and the main product A includes sub-product b and sub-product c, then the proportion of sub-product b and sub-product c are both 25%. Further, if sub-product b includes sub-product d and sub-product e, the proportion of sub-product b can be divided equally, and the proportions of sub-product d and sub-product e are 12.5% and 12.5% respectively.
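The even split of step 304 can be sketched in a few lines of Python. The function name `split_evenly` is hypothetical; the numbers reproduce the example above (main product A at 50%, split into b and c, with b further split into d and e).

```python
from typing import Dict, List


def split_evenly(proportion: float, sub_products: List[str]) -> Dict[str, float]:
    """Divide a product's proportion equally among its sub-products."""
    if not sub_products:
        raise ValueError("at least one sub-product is required")
    share = proportion / len(sub_products)
    return {name: share for name in sub_products}


# Main product A accounts for 50% and has sub-products b and c.
first_split = split_evenly(0.5, ["b", "c"])
# Sub-product b can be split once more into d and e.
second_split = split_evenly(first_split["b"], ["d", "e"])
print(first_split, second_split)
```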

305: According to the similarity model, determine the similarity between each first text segment and multiple second text segments, wherein the products described by the multiple second text segments are products with a green attribute.

Exemplarily, a first preset document is obtained; for example, the first preset document may be "Explanation of the Green Industry Guidance Catalog", and the products recorded in the first preset document all have green attributes. Entity recognition is performed on the first preset document to obtain the industries (that is, the products) recorded in it, and the products so read are regarded as products with green attributes.

In an embodiment of the present application, the first preset document may not record a product with a green attribute directly, but may instead record it by referring to another document. Therefore, text recognition is first performed on the first preset document to obtain multiple third text segments, wherein the multiple third text segments are used to describe the products recorded in the first preset document; a given third text segment, however, may not describe its product directly and may instead refer to another document that describes the product. If any third text segment among the multiple third text segments refers to another document, text recognition is performed on that document to obtain a fourth text segment corresponding to the third text segment, wherein the fourth text segment is the text in that document used to describe the product with the green attribute, and entity recognition is performed on the fourth text segment to obtain the product it describes. The multiple third text segments and the referenced fourth text segment can therefore be used as the multiple second text segments, and the products described by the multiple third text segments and by the fourth text segment are all products with green attributes.

Exemplarily, the similarity model is obtained by training multiple pairs of target training samples constructed in advance. The process of constructing multiple pairs of target training samples and the model training process will be described in detail later, and no further description will be given here. In one embodiment of the present application, the similarity model may be a RoFormer model.

Therefore, each first text segment and each second text segment are input into the RoFormer model to obtain the similarity between each first text segment and each second text segment.

306: Determine a target first text segment in at least one first text segment according to similarities between each first text segment and multiple second text segments.

Exemplarily, the maximum similarity corresponding to each first text segment is determined according to the similarities between that first text segment and the multiple second text segments; if the maximum similarity is greater than the similarity threshold, the first text segment is used as a target first text segment, that is, the sub-product described by the target first text segment is determined to be the product with the green attribute described by the second text segment corresponding to the maximum similarity.

307: Determine the proportion of green assets in each first digital asset according to the asset distribution described in the target first text paragraph and the total amount of each first digital asset.

Exemplarily, the proportion of the sub-product described in the target first text paragraph is used as the proportion of green assets in each first digital asset. It should be noted that the number of the target first text segment may be one or more, that is to say, one or more sub-products in the at least one sub-product have a green attribute.

Exemplarily, when there are multiple target first text segments, the proportions of the sub-products described by the multiple target first text segments are summed, and the summation result is used as the proportion of green assets in each first digital asset.
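Steps 306 and 307 can be sketched together: a sub-product counts as green when its maximum similarity to the second text segments exceeds the threshold, and the proportions of all such sub-products are summed. In this minimal sketch the similarity scores are hard-coded hypothetical values standing in for the output of the similarity model, and the threshold values are likewise assumptions.

```python
from typing import Dict, List


def green_proportion(
    sub_product_proportions: Dict[str, float],
    similarities: Dict[str, List[float]],
    threshold: float,
) -> float:
    """Sum the proportions of sub-products whose maximum similarity exceeds the threshold."""
    total = 0.0
    for name, proportion in sub_product_proportions.items():
        # A first text segment is a target segment if its best match is above the threshold.
        if max(similarities[name]) > threshold:
            total += proportion
    return total


# Hypothetical proportions and similarity scores, for illustration only.
proportions = {"lithium battery": 0.25, "nuclear battery": 0.25}
scores = {
    "lithium battery": [0.91, 0.40],  # similarity to each second text segment
    "nuclear battery": [0.35, 0.20],
}
stock_green_proportion = green_proportion(proportions, scores, threshold=0.8)
print(stock_green_proportion)
```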

Referring to FIG. 4, FIG. 4 is a schematic flowchart of a similarity model training method provided by an embodiment of the present application. Content in this embodiment that is the same as in the embodiment shown in FIG. 3 is not described again here. The method of this embodiment comprises the following steps:

401: Obtain a second preset document, wherein the products recorded in the second preset document include products with green attributes and products with non-green attributes.

Exemplarily, the second preset document is obtained through crawler technology; for example, the second preset document may be "2017 National Economic Industry Classification Catalog 2021 Revised First Edition". All products currently on the market are recorded in the second preset document. Therefore, the products recorded in the second preset document include products with green attributes and products with non-green attributes.

402: Perform text recognition on the second preset document to obtain multiple fifth text segments, where the multiple fifth text segments are used to describe products recorded in the second preset document.

Exemplarily, entity recognition is performed on the second preset document to obtain each product recorded in the second preset document; text segments describing each product are extracted from the second preset document through text recognition to obtain multiple fifth text segment.

403: Construct multiple pairs of target training samples according to multiple fifth text segments and multiple second text segments.

Exemplarily, synonym replacement is performed on the entities in each second text segment among the multiple second text segments to obtain a sixth text segment corresponding to each second text segment; then, each second text segment and its corresponding sixth text segment are used as a pair of training samples to obtain multiple pairs of first training samples. In this application, the multiple pairs of first training samples may also be referred to as multiple pairs of similar samples.

It should be noted that, during training, the two training samples in a pair of first training samples are pulled relatively close to each other. Constructing multiple pairs of first training samples therefore lets the model recognize industries that look different on the surface but are actually the same green industry, so that green industries with diverse text expressions can be accurately identified.

Exemplarily, multiple target fifth text segments are removed from the multiple fifth text segments to obtain multiple seventh text segments, wherein the products described by the multiple target fifth text segments are the same as the products described by the multiple second text segments, and the multiple target fifth text segments are in one-to-one correspondence with the multiple second text segments.

Specifically, the difference set of the multiple fifth text segments and the multiple second text segments is taken to obtain the multiple seventh text segments. The difference set referred to in this application is essentially a difference set over the industries described by the text segments; that is, the target fifth text segments are removed from the multiple fifth text segments to obtain the multiple seventh text segments.

It should be understood that by making a difference between the plurality of fifth text segments and the plurality of second text segments, the products described in the obtained plurality of seventh text segments are all products with non-green attributes.

Further, the second text segment corresponding to each seventh text segment is determined, wherein the product described by the seventh text segment is of the same kind as the product described by the second text segment, but the product described by the seventh text segment has a non-green attribute, while the product described by the second text segment has a green attribute. For example, the product described by the second text segment is "energy-saving industrial boiler", while the product described by the seventh text segment is "industrial boiler". Both text segments describe boilers, but "energy-saving industrial boilers" have green attributes, while "industrial boilers" have non-green attributes. Therefore, each seventh text segment and its corresponding second text segment are used as a pair of training samples to obtain multiple pairs of second training samples. In this application, the multiple pairs of second training samples may be referred to as multiple pairs of dissimilar samples.

It should be explained that dissimilar samples are constructed because the model needs to recognize product names that look similar but actually denote products with different attributes, and to learn which keywords in such similar product names are genuinely related to green attributes. For example, with the above "energy-saving industrial boiler" and "industrial boiler", the model can learn during training that only a boiler described as "energy-saving" is a product with a green attribute. In this way, the model can identify that "energy-saving" is the keyword closely related to the green attribute in such similar expressions.

Finally, multiple pairs of first training samples and multiple pairs of second training samples are used as the multiple pairs of target training samples.
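The pair construction of step 403 can be sketched as follows. This is a simplified illustration under stated assumptions: the synonym table and the `counterpart` mapping (which links each non-green segment to its corresponding green segment) are hypothetical inputs, and the difference set is taken on exact strings, whereas the text above describes a difference set over the industries the segments describe.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def build_similar_pairs(green_segments: List[str], synonyms: Dict[str, str]) -> List[Pair]:
    """Pair each green (second) segment with a synonym-replaced copy (sixth segment)."""
    pairs = []
    for segment in green_segments:
        for word, synonym in synonyms.items():
            if word in segment:
                pairs.append((segment, segment.replace(word, synonym)))
                break  # one replacement per segment is enough for this sketch
    return pairs


def build_dissimilar_pairs(
    all_segments: List[str], green_segments: List[str], counterpart: Dict[str, str]
) -> List[Pair]:
    """Pair each non-green (seventh) segment with its corresponding green segment."""
    green = set(green_segments)
    non_green = [s for s in all_segments if s not in green]  # difference set
    return [(s, counterpart[s]) for s in non_green if s in counterpart]


# Hypothetical inputs, for illustration only.
green_segments = ["energy-saving industrial boiler"]
all_segments = ["energy-saving industrial boiler", "industrial boiler", "textile manufacturing"]
synonyms = {"energy-saving": "energy-efficient"}
counterpart = {"industrial boiler": "energy-saving industrial boiler"}

similar = build_similar_pairs(green_segments, synonyms)
dissimilar = build_dissimilar_pairs(all_segments, green_segments, counterpart)
target_training_pairs = similar + dissimilar
print(target_training_pairs)
```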

404: Train the initial model according to multiple pairs of target training samples to obtain a similarity model.

Exemplarily, each training sample in each pair of target training samples among the multiple pairs of target training samples is input into the initial model to obtain a feature vector of each training sample, wherein the feature vector is used to determine the probability that the product described by the training sample has a green attribute; then, the first loss corresponding to each training sample is determined according to the feature vector and the label of each training sample, wherein the label of each training sample identifies the ground truth of whether the product described by the training sample has the green attribute. It should be understood that for similar samples, the labels of the two training samples in each pair of similar samples are the same, and for dissimilar samples, the labels of the two training samples in each pair of dissimilar samples are different.

Specifically, the classifier of the initial model determines, according to the feature vector of each training sample, the probability that the product described by each training sample has the green attribute; the first loss corresponding to each training sample is then determined according to this probability and the label of each training sample.

Further, the second loss of each pair of target training samples is determined according to the feature vectors of the training samples; that is, the similarity between the two training samples in each pair of target training samples is determined according to their feature vectors, and this similarity is used as the second loss of each pair of target training samples.

Finally, according to the first loss of each training sample in each pair of target training samples and the corresponding second loss of each pair of target training samples, the initial model is trained to obtain the similarity model.

Specifically, firstly, according to the first loss of each training sample in each pair of target training samples, the first target loss of the initial model in the process of classifying the green attributes is determined. Exemplarily, weighted summation is performed on the first losses of all training samples in multiple pairs of target training samples to obtain the first target loss.

Exemplarily, the first target loss can be expressed by formula (7):

Where L₁ is the first target loss, avg is the averaging operation, n is the number of pairs of first training samples, m is the number of pairs of second training samples, W is the weight of the classifier of the initial model, f_t' is the t-th training sample among all the training samples in the multiple pairs of target training samples (i.e., 2(n+m) training samples), and l_t is the label of the t-th training sample.

Specifically, according to the second loss of each pair of target training samples, the loss of the initial model in the process of feature extraction for each pair of first training samples is determined to obtain the second target loss. Exemplarily, the second loss of each pair of first training samples is obtained, and the second loss of multiple pairs of first training samples is averaged to obtain the second target loss. Exemplarily, the second target loss can be expressed by formula (8):

Where L_sim is the second target loss, avg is the averaging operation, n is the number of pairs of first training samples, S_i is the i-th pair of first training samples among the n pairs of first training samples, the two feature-vector symbols in formula (8) denote, respectively, the feature vector of one training sample and the feature vector of the other training sample in the i-th pair of first training samples, and ||·||₂ is an operation for calculating the similarity (distance) between the vectors.

Specifically, according to the second loss of each pair of target training samples, the loss of the initial model in the process of feature extraction for each pair of second training samples is determined to obtain the third target loss. Exemplarily, the second loss of each pair of second training samples is obtained, and the second loss of multiple pairs of second training samples is averaged to obtain the third target loss. Exemplarily, the third target loss can be expressed by formula (9):

Where L_dissim is the third target loss, avg is the averaging operation, m is the number of pairs of second training samples, S_j is the j-th pair of second training samples among the m pairs of second training samples, the two feature-vector symbols in formula (9) denote, respectively, the feature vector of one training sample and the feature vector of the other training sample in the j-th pair of second training samples, and ||·||₂ is an operation for calculating the similarity (distance) between the vectors.

Finally, a fourth target loss is determined according to the second target loss and the third target loss. Exemplarily, the fourth target loss is expressed by formula (10):

Where L₄ is the fourth target loss, and k is a preset stability parameter, which is used to prevent the fourth target loss L₄ from being zero when L_sim is 0, thereby preventing model degradation.

The loss function of formula (10) is set because the way the training sample pairs are constructed requires the second target loss L_sim to be optimized toward smaller values and the third target loss L_dissim toward larger values, so a simple weighted summation cannot unify the two. With the loss function of formula (10), optimizing only toward a smaller fourth target loss L₄ meets the optimization requirements of both the second target loss L_sim and the third target loss L_dissim, and thus the optimization requirements of the entire backpropagation process.

Finally, the fourth target loss and the first target loss are weighted to obtain the final target loss; the initial model is updated by backpropagation based on the target loss and the gradient descent method until the initial model converges, to obtain the similarity model.
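The pairwise losses described above can be illustrated with a small Python sketch. Formulas (8) to (10) are not reproduced in this text, so everything below is an assumption made for illustration: L_sim and L_dissim are taken as the average Euclidean distance over similar and dissimilar pairs, the fourth target loss is taken as the ratio (L_sim + k) / L_dissim, which is merely one form consistent with the stated role of k, and the final combination is taken as a weighted sum with hypothetical weights `w1` and `w4`.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
VectorPair = Tuple[Vector, Vector]


def l2_distance(u: Vector, v: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_pair_distance(pairs: List[VectorPair]) -> float:
    """Average distance over a list of feature-vector pairs."""
    return sum(l2_distance(u, v) for u, v in pairs) / len(pairs)


def fourth_target_loss(similar: List[VectorPair], dissimilar: List[VectorPair], k: float) -> float:
    """ASSUMED form: (L_sim + k) / L_dissim; requires L_dissim to be non-zero."""
    l_sim = mean_pair_distance(similar)
    l_dissim = mean_pair_distance(dissimilar)
    return (l_sim + k) / l_dissim


def final_target_loss(l1: float, l4: float, w1: float = 1.0, w4: float = 1.0) -> float:
    """ASSUMED weighted sum of the first and fourth target losses."""
    return w1 * l1 + w4 * l4


# Hypothetical feature vectors: the similar pairs coincide, the dissimilar pair is far apart.
similar_pairs = [((0.0, 0.0), (0.0, 0.0)), ((1.0, 0.0), (1.0, 0.0))]
dissimilar_pairs = [((0.0, 0.0), (3.0, 4.0))]
l4 = fourth_target_loss(similar_pairs, dissimilar_pairs, k=0.5)
print(l4)
```

Under this assumed form, decreasing L_sim or increasing L_dissim both lower L₄, and k keeps L₄ above zero even when every similar pair coincides, which matches the motivation given for formula (10).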

In one embodiment of the present application, when constructing similar training samples, sentence pattern replacement can be performed in addition to synonym replacement. Exemplarily, entity recognition is performed on the multiple second text segments to obtain multiple target entities, wherein the multiple target entities are in one-to-one correspondence with the multiple second text segments; that is, the target entity used to describe the product is extracted from each second text segment. Then, each second text segment and the target entity extracted from it are used as a pair of training samples to obtain multiple pairs of similar samples, thus constructing similar samples containing different sentence patterns. For example, if the second text segment is "this bond will be used to repay the loan of the previous hydropower station construction project", then this second text segment and "hydropower station" are used as a pair of similar samples. Such similar samples are constructed so that the model learns to identify both "this bond will be used to repay the loan of the previous hydropower station construction project" and "hydropower station" as green products; the model is thus not affected by sentence patterns during learning and attends only to the words genuinely related to the green attribute, thereby improving the recognition accuracy of the model.

In one embodiment of the present application, when constructing dissimilar samples, for each second text segment, a target entity is randomly selected from the remaining target entities and paired with the second text segment as a pair of dissimilar samples, so that multiple pairs of dissimilar samples can be constructed, wherein the remaining target entities are all entities among the multiple target entities except the target entity of the second text segment. For example, by randomly replacing the above "hydropower station" with another target entity, such as "wind power station" or "other project construction", multiple pairs of dissimilar samples can be constructed. Constructing such dissimilar samples lets the model learn that what needs attention is the entity in the sentence pattern, and that segments with different entities need to be classified as different products. As a result, the model recognizes "this bond will be used to repay the loan of the previous hydropower station construction project" on the one hand and "wind power station" and "other project construction" on the other as products with different attributes, so that even among such similar expressions the most similar industry, the hydropower station, can be matched accurately, thereby improving the recognition accuracy of the model.
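The entity-based pair construction can be sketched as follows. The mapping from each second text segment to its extracted target entity is assumed to be given (entity recognition itself is not shown), the second and third example sentences are hypothetical, and the fixed random seed is used only to make the sketch reproducible.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def entity_pairs(segment_entities: Dict[str, str], seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Build (segment, own entity) similar pairs and (segment, other entity) dissimilar pairs."""
    rng = random.Random(seed)
    similar, dissimilar = [], []
    for segment, entity in segment_entities.items():
        similar.append((segment, entity))
        # Remaining target entities: every extracted entity except this segment's own.
        remaining = [e for e in segment_entities.values() if e != entity]
        if remaining:
            dissimilar.append((segment, rng.choice(remaining)))
    return similar, dissimilar


# Hypothetical segments and entities, for illustration only.
segment_entities = {
    "This bond will be used to repay the loan of the previous hydropower station construction project": "hydropower station",
    "The raised funds will be used for a wind power station project": "wind power station",
    "The raised funds will be used for sewage treatment plant upgrades": "sewage treatment",
}
similar_samples, dissimilar_samples = entity_pairs(segment_entities)
print(similar_samples)
print(dissimilar_samples)
```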

Referring to FIG. 5, FIG. 5 is a schematic flowchart of a method for identifying the proportion of green assets in a bond provided by an embodiment of the present application. Content in this embodiment that is the same as in the embodiments shown in FIG. 1, FIG. 3, and FIG. 4 is not described again here. The method of this embodiment comprises the following steps:

501: Input the disclosure data of each first digital asset into the machine reading comprehension model for text segmentation to obtain multiple first text segments, wherein the multiple first text segments are used to describe multiple fund uses of each first digital asset .

It should be noted that the first digital assets here are a part of the multiple first digital assets, that is, the disclosed bonds among the multiple first digital assets.

First, before determining the proportion of green assets in each first digital asset, it can be determined whether the first digital asset as a whole has a green attribute. If it is determined that the first digital asset does not have a green attribute, the proportion of green assets in the first digital asset can be directly determined to be 0; if it is determined that the first digital asset has a green attribute, the proportion of green assets in the first digital asset is then determined.

The following describes in detail the process of how to determine whether the first digital asset has green attributes.

Exemplarily, the asset name of each first digital asset, that is, the bond name, is determined according to the above position data; keyword identification is then performed on the asset name of each first digital asset to obtain first keywords, wherein the number of first keywords is one or more; finally, if a first keyword is a keyword in a preset keyword set, it is determined that the first digital asset has a green attribute. The preset keyword set is a set of keywords that have green attributes and are related to bonds, that is, a set of keywords obtained by extracting keywords from the bond names of green bonds. For example, the preset keyword set may include "green bond", "carbon neutral", "energy efficient", etc. That is, whether each bond has a green attribute, in other words whether each bond is a green bond, is determined from the bond name.

Exemplarily, the enterprise to which each first digital asset belongs is determined according to the above position data, that is, the issuing company of each bond is identified from the position data; then, the industry to which the enterprise belongs is determined, for example, the industry of the enterprise's main business product is taken as the industry to which the enterprise belongs. Finally, it is determined whether this industry is an industry in a preset industry set, and if so, it is determined that the first digital asset has a green attribute, wherein the preset industry set is a set composed of industries with green attributes. Specifically, a preset document, such as "Green Bond Support Project Catalogue", can be obtained, and entity extraction can be performed on the preset document to obtain one or more green industries, such as public transportation and sewage treatment; these green industries are then combined into a set to obtain the preset industry set. That is, whether the bond is a green bond is determined from the industry to which the bond belongs.

For example, if the disclosure data of the first digital asset shows that the bond is "Guangzhou Metro Group Co., Ltd. 2020 Phase II Super-short-term Financing Bond", then it can be determined from the disclosure data that the issuing company of the bond is Guangzhou Metro Group Co., Ltd., and that the industry of the issuing company is public transportation. Since public transportation is an industry in the preset industry set, it is determined that the first digital asset has a green attribute.
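The two rule-based checks described above (bond name keywords and issuer industry) can be sketched as follows. This is a simplified illustration: substring matching on the lower-cased bond name stands in for the keyword identification step, the issuer-to-industry lookup table is a hypothetical input, combining the two checks with a logical OR is an assumption, and the second bond name in the usage lines is invented for the example.

```python
from typing import Dict

# Example entries taken from the description; a real set would be extracted from documents.
GREEN_KEYWORDS = ("green bond", "carbon neutral", "energy efficient")
GREEN_INDUSTRIES = {"public transportation", "sewage treatment"}


def name_has_green_keyword(bond_name: str) -> bool:
    """Check whether the bond name contains any keyword from the preset keyword set."""
    lowered = bond_name.lower()
    return any(keyword in lowered for keyword in GREEN_KEYWORDS)


def issuer_in_green_industry(issuer: str, issuer_industry: Dict[str, str]) -> bool:
    """Check whether the issuer's industry belongs to the preset industry set."""
    return issuer_industry.get(issuer) in GREEN_INDUSTRIES


def has_green_attribute(bond_name: str, issuer: str, issuer_industry: Dict[str, str]) -> bool:
    """ASSUMED combination: the bond is treated as green if either check passes."""
    return name_has_green_keyword(bond_name) or issuer_in_green_industry(issuer, issuer_industry)


issuer_industry = {"Guangzhou Metro Group Co., Ltd.": "public transportation"}
metro_bond = "Guangzhou Metro Group Co., Ltd. 2020 Phase II Super-short-term Financing Bond"
metro_is_green = has_green_attribute(metro_bond, "Guangzhou Metro Group Co., Ltd.", issuer_industry)
# Hypothetical bond name, used only to exercise the keyword check.
named_green = has_green_attribute("Example Corp 2021 Green Bond", "Example Corp", issuer_industry)
print(metro_is_green, named_green)
```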

Exemplarily, text recognition is performed on the disclosure data of each first digital asset, and a sixth text segment is identified from the disclosure data, wherein the sixth text segment is the text segment in the disclosure data of the first digital asset that describes the multiple fund uses of the first digital asset. That is, the text segment describing each fund use of the bond is located in the disclosure data and extracted to obtain the sixth text segment. Further, semantic information extraction is performed on the sixth text segment to obtain a third feature vector of the sixth text segment; then, the probability that the first digital asset has a green attribute is predicted according to the third feature vector; if the probability is greater than a second threshold, it is determined that the first digital asset has a green attribute.

In one embodiment of the present application, the above method of determining whether the first digital asset has a green attribute can be implemented by a trained model, which can be a fastText, TextCNN, or BERT model, etc.; this application does not limit it. Specifically, the text used to describe the use of funds is extracted from bond samples, the extracted text is used as a sample, and a label is added to the sample, where the label identifies whether the bond sample has a green attribute. It should be understood that, when selecting bond samples, bond samples with green attributes and bond samples with non-green attributes should both be selected to ensure that the constructed samples contain positive samples and negative samples. Model training is then carried out based on the extracted samples and their labels to obtain a prediction model for predicting whether a bond has a green attribute. Finally, the prediction model is used to extract semantic information from the sixth text segment to obtain the third feature vector of the sixth text segment, and the prediction model processes the third feature vector to predict the probability that the first digital asset has a green attribute.

It should be noted that in practical applications, the name of the bond or the industry to which the bond belongs can be considered first to determine whether the bond has green attributes.

It should be understood that after it is determined that each first digital asset has a green attribute, the proportion of green assets in each second digital asset can be identified. Exemplarily, a machine reading comprehension (Machine Reading Comprehension, MRC) model is trained in advance, and then the disclosure data of each first digital asset is input into the MRC model for text segmentation to obtain at least one first text segment.

Specifically, the question to be answered by the MRC model is first set as "which texts are used to describe the use of funds", and the input article is the disclosure data of each first digital asset; then, the question is encoded through the encoding layer of the MRC model to obtain a first vector; each text segment in the disclosure data is encoded through the encoding layer of the MRC model to obtain a second vector corresponding to each text segment; then, the first vector and the second vector of each text segment are input into the interaction layer of the MRC model for interaction to obtain the similarity between the question and each text segment, and the text segments whose similarity is greater than a preset threshold are used as the at least one first text segment.
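The threshold-based selection described above can be illustrated with a minimal sketch. It assumes the question vector and the segment vectors have already been produced by some encoder, and it uses cosine similarity as a stand-in for the interaction layer of the MRC model, which the application does not specify in detail. The function names, vectors, and threshold are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; a stand-in for the MRC interaction layer (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_segments(question_vec, segment_vecs, threshold):
    """Return indices of segments whose similarity to the question exceeds the threshold."""
    return [
        i
        for i, vec in enumerate(segment_vecs)
        if cosine(question_vec, vec) > threshold
    ]


# Hypothetical 2-dimensional vectors, for illustration only.
question_vec = [1.0, 0.0]
segment_vecs = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
selected = select_segments(question_vec, segment_vecs, threshold=0.8)
```

With these illustrative vectors only the first segment passes the threshold, so it alone would be kept as a first text segment.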

For example, by performing text segmentation on the disclosure data of each first digital asset through the MRC model, at least one first text segment as shown in Table 1 can be obtained.

Table 1:

502: Input each first text segment into a semantic information extraction model to extract semantic information, and obtain a first feature vector of each first text segment.

Wherein, the semantic information extraction model is pre-trained. The training process of the semantic information extraction model is described below.

Exemplarily, training samples are constructed first. For example, text segments related to the use of funds are extracted from the disclosure data of multiple bonds, and each text segment is labeled, where the label identifies whether the use of funds described in the text segment has a green attribute, that is, whether the funds are used for a green industry or a non-green industry. For example, the fund use shown in Table 1, "for the construction of the Yalong River Kara Hydropower Station", is used for the industrial project "Construction of the Yalong River Kara Hydropower Station", so this fund use has a green attribute, that is, the funds are used for a green industry; then, each labeled text segment is used as a training sample. Further, an initial model is constructed, wherein the initial model can be a BERT model comprising a semantic information extraction model and a multilayer perceptron (Multilayer Perceptron, MLP), and the model parameters of the semantic information extraction model and the multilayer perceptron are obtained by random initialization; then, the training samples are input into the semantic information extraction model for semantic information extraction to obtain a fourth feature vector of each training sample; the fourth feature vector is input into the multilayer perceptron to obtain the probability that the training sample belongs to an industry with green attributes; finally, the initial model is trained according to this probability and the label of the training sample, that is, the model parameters of the semantic information extraction model and the multilayer perceptron are adjusted to obtain the target model, and the multilayer perceptron in the target model is deleted to obtain the semantic information extraction model.

Exemplarily, each first text segment may be input into a semantic information extraction model for semantic information extraction to obtain a first feature vector of each first text segment.

In practical applications, after the target model is obtained, the multilayer perceptron need not be deleted, and the entire target model may be retained directly; then, each fifth text segment is input into the target model for probability prediction to obtain the probability that the fund use described in each fifth text segment belongs to a green industry; if the probability is greater than a probability threshold, the fifth text segment is determined to be a target fifth text segment. In this way, the target first text segment can be determined directly without similarity calculation, which improves the efficiency of identifying the proportion of green assets.

503: Input each second text segment into the semantic information extraction model for semantic information extraction, and obtain the second feature vector of each second text segment, wherein the multiple second text segments are used to describe multiple first industries, and the multiple first industries are industries with green attributes.

Exemplarily, multiple industries with green attributes, i.e. green industries, are obtained. Specifically, entity recognition (where the entity is an industry) is performed on the PDF document of the "Green Bond Support Project Catalogue" to obtain multiple industries, the multiple industries are taken as the multiple first industries, and multiple second text segments used to describe the multiple first industries are extracted from the PDF document; similarly, each second text segment is input into the above-mentioned semantic information extraction model for semantic information extraction to obtain a second feature vector of each second text segment.

504: According to the first feature vector of each first text segment and the second feature vector of each second text segment, determine the similarity between each first text segment and multiple second text segments.

Exemplarily, the similarity between the first feature vector of each first text segment and the second feature vector of each second text segment can be determined; for example, the similarity can be represented by the Euclidean distance between the two feature vectors. The similarity between the two feature vectors is then used as the similarity between each first text segment and each second text segment.
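A minimal sketch of this pairwise computation follows. The application only states that the similarity can be represented by the Euclidean distance; because a smaller distance means a closer match, the sketch maps the distance d to 1 / (1 + d) so that larger values mean more similar. That mapping and all function names are assumptions made for illustration.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def similarity(u, v):
    """Map distance to a similarity in (0, 1]; this mapping is an assumption."""
    return 1.0 / (1.0 + euclidean_distance(u, v))


def similarity_matrix(first_vecs, second_vecs):
    """Entry [i][j] is the similarity between first segment i and second segment j."""
    return [[similarity(f, s) for s in second_vecs] for f in first_vecs]
```

Identical feature vectors receive a similarity of 1.0, and the value decreases toward 0 as the vectors move apart, so a "greater than threshold" test remains meaningful.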

505: Determine a target first text segment among the multiple first text segments according to the similarity between each first text segment and each second text segment.

Exemplarily, according to the similarity between each first text segment and each second text segment, the maximum similarity corresponding to each first text segment is determined, and if the maximum similarity is greater than a threshold, the first text segment is taken as a target first text segment. Specifically, if the maximum similarity is greater than the threshold, the industry to which the fund use described in the first text segment belongs is the first industry described in the second text segment corresponding to the maximum similarity; that is, the industry supported by the fund use is a green industry, and it can therefore be determined that the fund use has a green attribute.
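The selection rule in step 505 can be sketched as follows, assuming the similarities are already available as a matrix in which row i holds the similarities between first text segment i and every second text segment. The function name, the matrix values, and the threshold are hypothetical.

```python
def select_target_segments(similarities, threshold):
    """Map each qualifying first-segment index to its best-matching second-segment index.

    A first segment qualifies when its maximum similarity exceeds the threshold.
    """
    targets = {}
    for i, row in enumerate(similarities):
        if not row:
            continue  # no second text segments to compare against
        best_j = max(range(len(row)), key=row.__getitem__)
        if row[best_j] > threshold:
            targets[i] = best_j
    return targets


# Hypothetical similarity values, for illustration only.
sims = [
    [0.20, 0.90, 0.40],
    [0.10, 0.30, 0.20],
    [0.85, 0.60, 0.10],
]
targets = select_target_segments(sims, threshold=0.8)
```

Here the first and third segments are kept as target first text segments, matched to the second and first green industries respectively; the middle segment is dropped because its best similarity stays below the threshold.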

506: Use the ratio of the planned fund amount in the fund use described in the target first text segment to the total amount of each first digital asset as the proportion of green assets in each first digital asset.

Exemplarily, the planned fund amount in the fund use described in the target first text segment is obtained, and the total amount of each first digital asset, that is, the total size of the first digital asset, is obtained; then, the ratio of the planned fund amount in the fund use described in the target first text segment to the total amount is used as the proportion of green assets in each first digital asset.

It should be noted that the number of target first text segments is one or more; that is, the industries to which several of the fund uses of each first digital asset belong may have green attributes. In that case, the ratio of the planned fund amount in the fund use described in each target first text segment to the total amount of each first digital asset can be used as the green proportion corresponding to that target first text segment; then, the green proportions of all target first text segments are summed to obtain the proportion of green assets in the first digital asset.
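The arithmetic of step 506 can be written out as a short sketch. It assumes the planned fund amounts of the target first text segments and the total amount of the first digital asset have already been extracted and are expressed in the same unit; the function name and the figures are hypothetical.

```python
def green_proportion(planned_amounts, total_amount):
    """Sum the per-segment green proportions (planned amount / total amount)."""
    if total_amount <= 0:
        raise ValueError("total_amount must be positive")
    return sum(amount / total_amount for amount in planned_amounts)


# Hypothetical bond: total size 1000, of which 300 and 200 are planned for
# two fund uses that were matched to green industries.
proportion = green_proportion([300.0, 200.0], 1000.0)
```

For this illustrative bond the two green proportions are 0.3 and 0.2, so the proportion of green assets comes to about one half of the total amount.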

Referring to FIG. 6 , FIG. 6 is a block diagram of functional units of a device for identifying the proportion of green assets provided by an embodiment of the present application. The device 600 for identifying the proportion of green assets includes: an acquisition unit 601 and a processing unit 602;

An acquisition unit 601, configured to acquire position data of digital assets to be identified;

The processing unit 602 is configured to perform text recognition on the acquired position data of the digital assets to be identified to obtain a plurality of first digital assets and second digital assets, wherein the asset information of each of the first digital assets is disclosed in the position data, and the asset information of the second digital asset is not disclosed in the position data;

The obtaining unit 601 is further configured to obtain the disclosure data of each of the first digital assets according to the asset information of each of the first digital assets;

The processing unit 602 is further configured to input the disclosure data of each of the first digital assets into the machine reading comprehension model for text segmentation to obtain at least one first text segment, wherein the at least one first text segment is used to describe the asset distribution of each of the first digital assets;

In some possible implementations, when the disclosure data of each of the first digital assets is the annual report of the enterprise to which each of the first digital assets belongs, the asset distribution of each of the first digital assets is the proportion of the sub-products of the enterprise to which each of the first digital assets belongs, and the fund distribution described in each of the second text segments is a product with green attributes; in terms of inputting the disclosure data of each of the first digital assets into the machine reading comprehension model for text segmentation to obtain at least one first text segment, the processing unit 602 is specifically configured to:

Perform text recognition on the annual report to obtain target chapters in the annual report, wherein the target chapters are used to describe the main products of the companies to which each of the first digital assets belongs, and the target chapters include target tables and the target text segment;

Inputting the target text segment into the machine reading comprehension model for text segmentation to obtain the at least one first text segment, each of the first text segments is used to describe a sub-product of the main product;

In terms of determining the proportion of green assets in each of the first digital assets according to the asset distribution described in the target first text segment and the total amount of each of the first digital assets, the processing unit 602 is specifically configured to:

Perform entity identification on both the target text segment and the target table to obtain the proportion of the main product, where the proportion of the main product is the ratio of the turnover of the main product to the total turnover of the enterprise to which it belongs;

Determine the proportion of each sub-product in the main product according to the proportion of the main product;

Determine the proportion of the sub-product described in the target first text paragraph according to the proportion of each of the sub-products;

Determine the proportion of green assets in each of the first digital assets according to the proportion of the sub-products described in the target first text segment.

In some possible implementation manners, before determining the similarity between each of the first text segments and multiple second text segments according to the similarity model, the acquiring unit 601 is further configured to acquire a first preset document, wherein the products recorded in the first preset document all have green attributes;

The processing unit 602 is further configured to perform text recognition on the first preset document to obtain multiple third text segments, wherein the multiple third text segments are used to describe the product;

If any third text segment in the plurality of third text segments refers to other documents, perform text recognition on the other documents to obtain a fourth text segment corresponding to that third text segment, wherein the fourth text segment is the text used to describe products with green attributes in the other documents;

Using the plurality of third text segments and a fourth text segment corresponding to any one of the third text segments as the plurality of second text segments;

performing entity extraction on each of the plurality of second text segments respectively to obtain a plurality of target entities;

Using any one of the second text segments in the plurality of second text segments and the target entity extracted from the any one of the second text segments as a pair of training samples to obtain multiple pairs of first training samples;

Randomly select a target entity from the target entities, among the plurality of target entities, other than the target entity corresponding to the any one second text segment, and use the randomly selected target entity and the any one second text segment as a pair of training samples to obtain multiple pairs of second training samples;

using the multiple pairs of first training samples and the multiple pairs of second training samples as multiple pairs of target training samples;

The initial model is trained according to the multiple pairs of target training samples to obtain the similarity model.
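The construction of the first (matching) and second (randomly mismatched) training pairs described above can be sketched as follows. The sketch assumes that entity extraction has already produced one target entity per second text segment, and it attaches labels 1 and 0 to matching and mismatched pairs; the labels, the function name, and the fixed random seed are assumptions, since the application does not state how the pairs are labeled.

```python
import random


def build_training_pairs(segments, entities, seed=0):
    """Build (segment, entity, label) triples.

    segments[i] is a second text segment and entities[i] is the target entity
    extracted from it. Label 1 marks a matching pair (first training samples);
    label 0 marks a pair with a randomly chosen other entity (second training
    samples).
    """
    rng = random.Random(seed)
    positives = [(seg, ent, 1) for seg, ent in zip(segments, entities)]
    negatives = []
    for seg, ent in zip(segments, entities):
        candidates = [e for e in entities if e != ent]
        if candidates:  # skip when no other entity is available
            negatives.append((seg, rng.choice(candidates), 0))
    return positives + negatives
```

Sampling the mismatched entity only from entities extracted from other segments keeps every second training sample a true negative, which is what allows the similarity model to learn to separate matching from non-matching text and entity pairs.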

In some possible implementations, when the asset distribution of each of the first digital assets is the fund use of each of the first digital assets, the fund distribution described in each of the second text segments is a fund use with a green attribute; in terms of determining the similarity between each of the first text segments and multiple second text segments according to the similarity model, the processing unit 602 is specifically configured to:

Inputting each of the first text segments into the semantic information extraction model to extract the semantic information to obtain a first feature vector of each of the first text segments;

Inputting each of the second text segments into the semantic information extraction model for semantic information extraction to obtain a second feature vector of each of the second text segments;

According to the first feature vector of each of the first text segments and the second feature vector of each of the second text segments, determine the similarity between each of the first text segments and a plurality of second text segments;

The ratio of the planned fund amount in the fund use described in the target first text segment to the total amount of each of the first digital assets is taken as the proportion of green assets in each of the first digital assets.

In some possible implementation manners, in terms of determining the proportion of green assets in the digital assets to be identified according to the proportion of green assets in each of the first digital assets and the proportion of green assets in the second digital asset, the processing unit 602 is specifically configured to:

Obtain a first ratio of the net value of each of the first digital assets relative to the net value of the digital asset to be identified;

According to the first ratio of each of the first digital assets and the proportion of green assets in each of the first digital assets, determine the first proportion of the green assets of each of the first digital assets relative to the net value of the digital assets to be identified;

Determine a second ratio of the net value of the second digital asset relative to the net value of the digital asset to be identified according to the position data and the first ratio of each of the first digital assets;

According to the second ratio of the second digital asset and the proportion of green assets in the second digital asset, determine the second proportion of the green assets of the second digital asset relative to the net value of the digital asset to be identified;

Summing the first proportion of each of the first digital assets and the second proportion of the second digital asset to obtain the proportion of green assets in the digital assets to be identified.
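The aggregation above amounts to a weighted sum: the green proportion of each holding is weighted by that holding's share of the net value of the digital asset to be identified. A minimal sketch follows, assuming the net-value shares and green proportions have already been determined; the function name, parameter names, and figures are hypothetical, and leverage adjustment is not covered.

```python
def portfolio_green_proportion(first_assets, second_share, second_green):
    """Weighted sum of green proportions.

    first_assets: list of (net_value_share, green_proportion) tuples, one per
        first digital asset.
    second_share: net-value share of the second digital asset.
    second_green: proportion of green assets in the second digital asset.
    """
    first_part = sum(share * green for share, green in first_assets)
    return first_part + second_share * second_green


# Hypothetical holdings: two disclosed assets and one undisclosed remainder.
result = portfolio_green_proportion(
    [(0.5, 0.4), (0.25, 0.0)],
    second_share=0.25,
    second_green=0.2,
)
```

With these illustrative figures the disclosed holdings contribute 0.2 and the undisclosed remainder contributes 0.05, giving a green proportion of roughly 0.25 for the whole digital asset.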

In some possible implementation manners, before summing the first proportion of each of the first digital assets and the second proportion of the second digital asset, the processing unit 602 is further configured to perform text recognition on the position data to obtain the total amount of some of the first digital assets among the plurality of first digital assets, the total amount of the second digital assets, and the total amount of the digital assets to be identified;

performing text recognition on the position data to obtain the total net value of the part of the first digital asset, the total net value of the second digital asset, and the total net value of the digital asset to be identified;

determining a third ratio of the sum of the total amount of the part of the first digital asset and the total amount of the second digital asset relative to the total amount of the digital asset to be identified;

determining a fourth ratio of the sum of the total net value of the part of the first digital asset and the total net value of the second digital asset relative to the total net value of the digital asset to be identified;

determining the leverage ratio according to the third ratio and the fourth ratio;

According to the leverage ratio, deleveraging is performed on the first proportion of the part of the first digital assets and the second proportion of the second digital asset to obtain the first target proportion of the part of the first digital assets and the second target proportion of the second digital asset;

In terms of summing the first proportion of each of the first digital assets and the second proportion of the second digital asset to obtain the proportion of green assets in the digital assets to be identified, the processing unit 602 is specifically configured to:

Sum the first proportion of the other part of the first digital assets among the plurality of first digital assets, the first target proportion of the part of the first digital assets, and the second target proportion of the second digital asset to obtain the proportion of green assets in the digital assets to be identified.

Referring to FIG. 7, FIG. 7 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. As shown in FIG. 7 , an electronic device 700 includes a transceiver 701 , a processor 702 and a memory 703 . They are connected through a bus 704 . The memory 703 is used to store computer programs and data, and can transmit the data stored in the memory 703 to the processor 702 .

The processor 702 is used to read the computer program in the memory 703 to perform the following operations:

Controlling the transceiver 701 to obtain the position data of the digital asset to be identified;

Controlling the transceiver 701 to obtain the disclosure data of each of the first digital assets according to the asset information of each of the first digital assets;

Inputting the disclosure data of each of the first digital assets into the machine reading comprehension model for text segmentation to obtain at least one first text segment, wherein the at least one first text segment is used to describe the asset distribution of each of the first digital assets;

In some possible implementations, when the disclosure data of each of the first digital assets is the annual report of the enterprise to which each of the first digital assets belongs, the asset distribution of each of the first digital assets is the proportion of the sub-products of the enterprise to which each of the first digital assets belongs, and the fund distribution described in each of the second text segments is a product with green attributes; in terms of inputting the disclosure data of each of the first digital assets into the machine reading comprehension model for text segmentation to obtain at least one first text segment, the processor 702 is specifically configured to perform the following operations:

In terms of determining the proportion of green assets in each of the first digital assets according to the asset distribution described in the target first text segment and the total amount of each of the first digital assets, the processor 702 is specifically configured to perform the following operations:

In some possible implementations, according to the similarity model, before determining the similarity between each of the first text segments and multiple second text segments, the processor 702 is further configured to perform the following operations:

Controlling the transceiver 701 to obtain a first preset document, the products recorded in the first preset document all have green attributes;

performing text recognition on the first preset document to obtain a plurality of third text segments, wherein the plurality of third text segments are used to describe the products recorded in the first preset document;

In some possible implementations, when the asset distribution of each of the first digital assets is the fund use of each of the first digital assets, the fund distribution described in each of the second text segments is a fund use with a green attribute; in terms of determining the similarity between each of the first text segments and multiple second text segments according to the similarity model, the processor 702 is specifically configured to perform the following operations:

In terms of determining the proportion of green assets in each of the first digital assets according to the asset distribution described in the target first text segment and the total amount of each of the first digital assets, the processor 702 is specifically configured to perform the following operations:

In some possible implementation manners, in terms of determining the proportion of green assets in the digital assets to be identified according to the proportion of green assets in each of the first digital assets and the proportion of green assets in the second digital asset, the processor 702 is specifically configured to perform the following operations:

In some possible implementation manners, before summing the first percentages of the first digital assets and the second percentages of the second digital assets, the processor 702 is further configured to perform the following operations:

performing text recognition on the position data to obtain the total amount of some of the first digital assets among the plurality of first digital assets, the total amount of the second digital assets, and the total amount of the digital assets to be identified;

In terms of summing the first proportion of each of the first digital assets and the second proportion of the second digital asset to obtain the proportion of green assets in the digital assets to be identified, the processor 702 is specifically configured to perform the following operations:

Specifically, the above-mentioned transceiver 701 may be the acquisition unit 601 of the green ratio recognition device 600 of the embodiment shown in FIG. 6, and the above-mentioned processor 702 may be the processing unit 602 of the green ratio recognition device 600 of the embodiment shown in FIG. 6 .

It should be understood that the electronic devices in this application may include smart phones (such as Android phones, iOS phones, Windows Phone phones, etc.), tablet computers, palmtop computers, notebook computers, mobile Internet devices (Mobile Internet Devices, MID), wearable devices, etc. The above-mentioned electronic devices are only examples and are not exhaustive. In practical applications, the electronic devices may also include smart vehicle-mounted terminals, computer equipment, and the like.

The embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement part or all of the steps of any method for identifying the proportion of green assets in digital assets based on text recognition described in the above method embodiments. The computer-readable storage medium may be non-volatile or volatile.

The embodiment of the present application also provides a computer program product, the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps of any method for identifying the proportion of green assets in digital assets based on text recognition described in the above method embodiments.

The embodiments of the present application have been introduced in detail above, and specific examples have been used herein to illustrate the principles and implementation methods of the present application. The descriptions of the above embodiments are only used to help understand the methods and core ideas of the present application; meanwhile, those skilled in the art may, based on the ideas of the present application, make changes to the specific implementation methods and the scope of application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims

A method for identifying the proportion of green assets in digital assets based on text recognition, including:

performing text recognition on the acquired position data of digital assets to be identified, and obtaining a plurality of first digital assets and second digital assets, wherein the asset information of each of the first digital assets is disclosed in the position data, and the asset information of the second digital asset is not disclosed in the position data;

According to the asset information of each of the first digital assets, obtain the disclosure data of each of the first digital assets, and input the disclosure data of each of the first digital assets into a machine reading comprehension model for text segmentation to obtain at least one first text segment, wherein the at least one first text segment is used to describe the asset distribution of each of the first digital assets;

According to the similarity model, determine the similarity between each of the first text segments and a plurality of second text segments, wherein the plurality of second text segments are used to describe a plurality of capital distributions with green attributes;

determining a target first text segment in the at least one first text segment according to the similarity between each of the first text segments and the plurality of second text segments;

Determine the proportion of green assets in each of the first digital assets according to the asset distribution described in the target first text paragraph and the total amount of each of the first digital assets;

According to the profile of the manager of the digital asset to be identified, obtain all the digital assets managed by the manager, obtain the average proportion of green assets among those of the digital assets whose asset information is disclosed, and take the average proportion as the proportion of green assets in the second digital asset;

Determine the proportion of green assets in the digital assets to be identified according to the proportion of green assets in each of the first digital assets and the proportion of green assets in the second digital assets.
The method according to claim 1, wherein,

When the disclosure data of each of the first digital assets is the annual report of the enterprise to which each of the first digital assets belongs, the asset distribution of each of the first digital assets is the proportion of the sub-products of the enterprise to which each of the first digital assets belongs, and the fund distribution described in each of the second text segments is a product with green attributes;

The disclosure data of each of the first digital assets is input into the machine reading comprehension model for text segmentation to obtain at least one first text segment, including:

Perform text recognition on the annual report to obtain target chapters in the annual report, wherein the target chapters are used to describe the main products of the companies to which each of the first digital assets belongs, and the target chapters include target tables and the target text segment;

Inputting the target text segment into the machine reading comprehension model for text segmentation to obtain the at least one first text segment, each of the first text segments is used to describe a sub-product of the main product;

According to the asset distribution described in the target first text paragraph and the total amount of each of the first digital assets, determining the proportion of green assets in each of the first digital assets includes:

Perform entity identification on both the target text segment and the target table to obtain the proportion of the main product, where the proportion of the main product is the ratio of the turnover of the main product to the total turnover of the enterprise to which it belongs;

Determine the proportion of each sub-product in the main product according to the proportion of the main product;

Determine the proportion of the sub-product described in the target first text paragraph according to the proportion of each of the sub-products;

Determine the proportion of green assets in each of the first digital assets according to the proportion of the sub-products described in the target first text segment.
The method according to claim 2, wherein, according to the similarity model, before determining the similarity between each of the first text segments and a plurality of second text segments, the method further comprises:

Obtaining a first preset document, the products recorded in the first preset document all have green attributes;

performing text recognition on the first preset document to obtain a plurality of third text segments, wherein the plurality of third text segments are used to describe the products recorded in the first preset document;

If any third text segment in the plurality of third text segments refers to other documents, perform text recognition on the other documents to obtain a fourth text segment corresponding to that third text segment, wherein the fourth text segment is the text used to describe products with green attributes in the other documents;

Using the plurality of third text segments and the fourth text segment corresponding to any one of the third text segments as the plurality of second text segments;

performing entity extraction on each of the plurality of second text segments respectively to obtain a plurality of target entities;

Using any one of the plurality of second text segments and the target entity extracted from that second text segment as a pair of training samples, to obtain multiple pairs of first training samples;

Randomly selecting a target entity from among the plurality of target entities other than the target entity corresponding to that second text segment, and using the randomly selected target entity and that second text segment as a pair of training samples, to obtain multiple pairs of second training samples;

using the multiple pairs of first training samples and the multiple pairs of second training samples as multiple pairs of target training samples;

The initial model is trained according to the multiple pairs of target training samples to obtain the similarity model.
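The construction of the first and second training samples described above might look like the following sketch. The labels 1 and 0 for matched and randomly mismatched pairs are an assumption (the claim only says both kinds of pairs are used for training), and `build_training_pairs` is a hypothetical name.

```python
import random


def build_training_pairs(second_segments, target_entities, seed=0):
    """Build (segment, entity, label) triples from green text segments.

    second_segments[i] is a second text segment and target_entities[i] is the
    entity extracted from it. Labels are an ASSUMPTION: 1 for a segment paired
    with its own entity, 0 for a segment paired with a random other entity.
    """
    if len(second_segments) != len(target_entities):
        raise ValueError("each segment needs exactly one extracted entity")
    rng = random.Random(seed)

    # First training samples: each segment with the entity extracted from it.
    positives = [(seg, ent, 1) for seg, ent in zip(second_segments, target_entities)]

    # Second training samples: each segment with a randomly chosen other entity.
    negatives = []
    for seg, own_entity in zip(second_segments, target_entities):
        candidates = [ent for ent in target_entities if ent != own_entity]
        if candidates:
            negatives.append((seg, rng.choice(candidates), 0))

    return positives + negatives


pairs = build_training_pairs(
    ["Wind turbines generate power.", "Sewage is treated on site.", "Buses run on batteries."],
    ["wind turbine", "sewage treatment", "electric bus"],
)
for segment, entity, label in pairs:
    print(label, entity, "|", segment)
```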
The method according to claim 1, wherein,

When the asset distribution of each of the first digital assets is the fund use of each of the first digital assets, the fund distribution described in each of the second text paragraphs is a fund use with a green attribute;

According to the similarity model, determining the similarity between each of the first text segments and a plurality of second text segments respectively includes:

Inputting each of the first text segments into the semantic information extraction model to extract the semantic information to obtain a first feature vector of each of the first text segments;

Inputting each of the second text segments into the semantic information extraction model for semantic information extraction to obtain a second feature vector of each of the second text segments;

According to the first feature vector of each of the first text segments and the second feature vector of each of the second text segments, determine the similarity between each of the first text segments and a plurality of second text segments;

According to the asset distribution described in the target first text paragraph and the total amount of each of the first digital assets, determining the proportion of green assets in each of the first digital assets includes:

The ratio of the planned fund amount in the fund use described in the target first text segment to the total amount of each of the first digital assets is taken as the proportion of green assets in each of the first digital assets.
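A minimal sketch of the fund-use variant follows. The claim does not name a similarity measure or a selection rule, so cosine similarity over the feature vectors and a fixed threshold of 0.8 are assumptions here, as are all function names; the feature vectors themselves would come from the semantic information extraction model, which is not reproduced.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors (ASSUMED measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_target_segment(first_vector, second_vectors, threshold=0.8):
    """Treat a first text segment as green if it is close to any green reference.

    ASSUMPTION: a segment becomes a target first text segment when its best
    similarity reaches the threshold; the threshold value is illustrative.
    """
    best = max((cosine_similarity(first_vector, v) for v in second_vectors), default=0.0)
    return best >= threshold


def green_proportion_from_fund_use(planned_green_amounts, total_amount):
    """Planned funds of the target segments divided by the asset's total amount."""
    if total_amount <= 0:
        raise ValueError("total_amount must be positive")
    return sum(planned_green_amounts) / total_amount


print(is_target_segment([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]]))
print(green_proportion_from_fund_use([30.0], 100.0))
```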
The method according to claim 4, wherein determining the proportion of green assets in the digital assets to be identified according to the proportion of green assets in each of the first digital assets and the proportion of green assets in the second digital assets comprises:

Obtain a first ratio of the net value of each of the first digital assets relative to the net value of the digital asset to be identified;

According to the first proportion of each of the first digital assets and the proportion of green assets, determine the first proportion of the green assets of each of the first digital assets relative to the net value of the digital assets to be identified;

Determine a second ratio of the net value of the second digital asset relative to the net value of the digital asset to be identified according to the position data and the second ratio of each of the first digital assets;

According to the second proportion of the second digital asset and the proportion of the green asset, determine the second proportion of the green asset of the second digital asset relative to the net value of the digital asset to be identified;

Summing the first ratio of each of the first digital assets and the second ratio of the second digital asset to obtain the ratio of green assets in the digital assets to be identified.
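The aggregation above amounts to a net-value-weighted sum. The sketch below illustrates it under one assumption that the claim does not state: the net-value ratio of the undisclosed second digital asset is taken as whatever remains after the disclosed first digital assets. The function name and input layout are hypothetical.

```python
def portfolio_green_proportion(first_assets, second_green_proportion):
    """Net-value-weighted green proportion of the digital asset to be identified.

    first_assets: list of (net_value_ratio, green_proportion) per first digital asset.
    second_green_proportion: green proportion estimated for the second digital asset.
    """
    disclosed_weight = sum(weight for weight, _ in first_assets)
    # ASSUMPTION: the second digital asset accounts for the remaining net value.
    second_weight = max(0.0, 1.0 - disclosed_weight)

    # Green contribution of each first digital asset relative to the portfolio.
    total = sum(weight * green for weight, green in first_assets)
    # Green contribution of the second digital asset relative to the portfolio.
    total += second_weight * second_green_proportion
    return total


result = portfolio_green_proportion([(0.5, 0.4), (0.3, 0.1)], 0.2)
print(round(result, 4))
```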
The method according to claim 5, wherein, before summing the first proportion of each of the first digital assets and the second proportion of the second digital asset, the method further comprises:

performing text recognition on the position data to obtain the total amount of some of the first digital assets among the plurality of first digital assets, the total amount of the second digital assets, and the total amount of the digital assets to be identified;

performing text recognition on the position data to obtain the total net value of the part of the first digital asset, the total net value of the second digital asset, and the total net value of the digital asset to be identified;

determining a third ratio of the sum of the total amount of the part of the first digital asset and the total amount of the second digital asset relative to the total amount of the digital asset to be identified;

determining a fourth ratio of the sum of the total net value of the part of the first digital asset and the total net value of the second digital asset relative to the total net value of the digital asset to be identified;

determining the leverage ratio according to the third ratio and the fourth ratio;

According to the leverage ratio, deleveraging is performed on the first ratio of the part of the first digital asset and the second ratio of the second digital asset to obtain the first target ratio of the part of the first digital asset and the second target ratio of the second digital asset;

The summing of the first proportion of each of the first digital assets and the second proportion of the second digital asset to obtain the proportion of green assets in the digital assets to be identified includes:

Summing the first proportion of another part of the first digital assets among the plurality of first digital assets, the first target proportion of the part of the first digital assets, and the second target proportion of the second digital asset, to obtain the proportion of green assets in the digital assets to be identified.
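The deleveraging step can be illustrated as below. The claim says only that the leverage ratio is determined "according to the third ratio and the fourth ratio" and does not give a formula, so both the quotient used for the leverage ratio and the division used for deleveraging are assumptions made purely for illustration; all names are hypothetical.

```python
def leverage_ratio(third_ratio, fourth_ratio):
    """ASSUMED formula: net-value-based ratio divided by amount-based ratio."""
    if third_ratio <= 0:
        raise ValueError("third_ratio must be positive")
    return fourth_ratio / third_ratio


def deleverage(proportion, leverage):
    """ASSUMED deleveraging rule: scale a green proportion down by the leverage."""
    if leverage <= 0:
        raise ValueError("leverage must be positive")
    return proportion / leverage


def total_green_proportion(unlevered_first, levered_first, second, leverage):
    """Sum untouched first proportions with the deleveraged target proportions.

    unlevered_first: first proportions of the other part of the first digital assets.
    levered_first: first proportions of the part subject to deleveraging.
    second: second proportion of the second digital asset.
    """
    first_targets = [deleverage(p, leverage) for p in levered_first]
    second_target = deleverage(second, leverage)
    return sum(unlevered_first) + sum(first_targets) + second_target


lev = leverage_ratio(third_ratio=0.8, fourth_ratio=1.0)
print(lev)
print(round(total_green_proportion([0.1], [0.25], 0.125, lev), 4))
```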
An identification device for the proportion of green assets, including: an acquisition unit and a processing unit;

The obtaining unit is used to obtain position data of digital assets to be identified;

The processing unit is configured to perform text recognition on the acquired position data of the digital asset to be identified to obtain a plurality of first digital assets and second digital assets, wherein the asset information of each of the first digital assets is disclosed in the position data, and the asset information of the second digital asset is not disclosed in the position data;

The obtaining unit is further configured to obtain the disclosure data of each of the first digital assets according to the asset information of each of the first digital assets;

The processing unit is further configured to input the disclosure data of each of the first digital assets into a machine reading comprehension model for text segmentation to obtain at least one first text segment, wherein the at least one first text segment is used to describe asset distribution of each of the first digital assets;

According to the similarity model, determine the similarity between each of the first text segments and a plurality of second text segments, wherein the plurality of second text segments are used to describe a plurality of capital distributions with green attributes;

determining a target first text segment in the at least one first text segment according to the similarity between each of the first text segments and the plurality of second text segments;

Determine the proportion of green assets in each of the first digital assets according to the asset distribution described in the target first text paragraph and the total amount of each of the first digital assets;

According to the portrait of the manager of the digital asset to be identified, obtain all the digital assets managed by the manager, obtain the average proportion of green assets among those of the digital assets whose asset information is disclosed, and take the average proportion as the proportion of green assets in the second digital asset;

Determine the proportion of green assets in the digital assets to be identified according to the proportion of green assets in each of the first digital assets and the proportion of green assets in the second digital assets.
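The estimate for the undisclosed second digital asset can be sketched as a simple average over the manager's disclosed assets. The record layout and the fallback of 0.0 when the manager has no disclosed assets are assumptions; the claim does not describe that case.

```python
from statistics import mean


def estimate_second_asset_green_proportion(manager_assets):
    """Average green proportion over the manager's disclosed digital assets.

    manager_assets: list of dicts with keys "disclosed" (bool) and
    "green_proportion" (float or None). This layout is hypothetical.
    """
    disclosed = [
        asset["green_proportion"]
        for asset in manager_assets
        if asset.get("disclosed") and asset.get("green_proportion") is not None
    ]
    # ASSUMPTION: with nothing disclosed, fall back to a green proportion of 0.
    return mean(disclosed) if disclosed else 0.0


assets = [
    {"disclosed": True, "green_proportion": 0.2},
    {"disclosed": True, "green_proportion": 0.4},
    {"disclosed": False, "green_proportion": None},
]
print(round(estimate_second_asset_green_proportion(assets), 4))
```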
The apparatus according to claim 7, wherein,

When the disclosure data of each of the first digital assets is the annual report of the enterprise to which each of the first digital assets belongs, the asset distribution of each of the first digital assets is the sub-product proportion of the enterprise to which each of the first digital assets belongs, and the fund distribution described in each of the second text segments is a product with green attributes;

In terms of inputting the disclosure data of each of the first digital assets into the machine reading comprehension model for text segmentation to obtain at least one first text segment, the processing unit is specifically configured to:

Perform text recognition on the annual report to obtain target chapters in the annual report, wherein the target chapters are used to describe the main products of the companies to which each of the first digital assets belongs, and the target chapters include target tables and the target text segment;

Inputting the target text segment into the machine reading comprehension model for text segmentation to obtain the at least one first text segment, each of the first text segments is used to describe a sub-product of the main product;

In terms of determining the proportion of green assets in each of the first digital assets according to the asset distribution described in the target first text paragraph and the total amount of each of the first digital assets, the processing unit is specifically configured to:

Performing entity recognition on both the target text segment and the target table to obtain the proportion of the main product, wherein the proportion of the main product is the ratio of the turnover of the main product to the total turnover of the enterprise to which it belongs;

Determine the proportion of each sub-product in the main product according to the proportion of the main product;

Determine the proportion of the sub-product described in the target first text paragraph according to the proportion of each of the sub-products;

Determine the proportion of green assets in each of the first digital assets according to the proportion of the sub-products described in the target first text segment.
An electronic device, including: a processor and a memory, wherein the processor is connected to the memory, the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the electronic device executes the following steps:

performing text recognition on the acquired position data of digital assets to be identified, and obtaining a plurality of first digital assets and second digital assets, wherein the asset information of each of the first digital assets is disclosed in the position data, and the asset information of the second digital asset is not disclosed in the position data;

According to the asset information of each of the first digital assets, obtain the disclosure data of each of the first digital assets, and input the disclosure data of each of the first digital assets into a machine reading comprehension model for text segmentation to obtain at least one first text segment, wherein the at least one first text segment is used to describe the asset distribution of each of the first digital assets;

According to the similarity model, determine the similarity between each of the first text segments and a plurality of second text segments, wherein the plurality of second text segments are used to describe a plurality of capital distributions with green attributes;

determining a target first text segment in the at least one first text segment according to the similarity between each of the first text segments and the plurality of second text segments;

Determine the proportion of green assets in each of the first digital assets according to the asset distribution described in the target first text paragraph and the total amount of each of the first digital assets;

According to the portrait of the manager of the digital asset to be identified, obtain all the digital assets managed by the manager, obtain the average proportion of green assets among those of the digital assets whose asset information is disclosed, and take the average proportion as the proportion of green assets in the second digital asset;

Determine the proportion of green assets in the digital assets to be identified according to the proportion of green assets in each of the first digital assets and the proportion of green assets in the second digital assets.
The electronic device according to claim 9, wherein,

When the disclosure data of each of the first digital assets is the annual report of the enterprise to which each of the first digital assets belongs, the asset distribution of each of the first digital assets is the sub-product proportion of the enterprise to which each of the first digital assets belongs, and the fund distribution described in each of the second text segments is a product with green attributes;

The disclosure data of each of the first digital assets is input into the machine reading comprehension model for text segmentation to obtain at least one first text segment, including:

Perform text recognition on the annual report to obtain target chapters in the annual report, wherein the target chapters are used to describe the main products of the companies to which each of the first digital assets belongs, and the target chapters include target tables and the target text segment;

Inputting the target text segment into the machine reading comprehension model for text segmentation to obtain the at least one first text segment, each of the first text segments is used to describe a sub-product of the main product;

According to the asset distribution described in the target first text paragraph and the total amount of each of the first digital assets, determining the proportion of green assets in each of the first digital assets includes:

Performing entity recognition on both the target text segment and the target table to obtain the proportion of the main product, wherein the proportion of the main product is the ratio of the turnover of the main product to the total turnover of the enterprise to which it belongs;

Determine the proportion of each sub-product in the main product according to the proportion of the main product;

Determine the proportion of the sub-product described in the target first text paragraph according to the proportion of each of the sub-products;

Determine the proportion of green assets in each of the first digital assets according to the proportion of the sub-products described in the target first text segment.
The electronic device according to claim 10, wherein, before determining the similarity between each of the first text segments and a plurality of second text segments according to the similarity model, the steps further include:

Obtaining a first preset document, the products recorded in the first preset document all have green attributes;

performing text recognition on the first preset document to obtain a plurality of third text segments, wherein the plurality of third text segments are used to describe the products recorded in the first preset document;

If any third text segment in the plurality of third text segments refers to other documents, performing text recognition on the other documents to obtain a fourth text segment corresponding to that third text segment, wherein the fourth text segment is the text used to describe products with green attributes in the other documents;

Using the plurality of third text segments and a fourth text segment corresponding to any one of the third text segments as the plurality of second text segments;

performing entity extraction on each of the plurality of second text segments respectively to obtain a plurality of target entities;

Using any one of the plurality of second text segments and the target entity extracted from that second text segment as a pair of training samples, to obtain multiple pairs of first training samples;

Randomly selecting a target entity from among the plurality of target entities other than the target entity corresponding to that second text segment, and using the randomly selected target entity and that second text segment as a pair of training samples, to obtain multiple pairs of second training samples;

using the multiple pairs of first training samples and the multiple pairs of second training samples as multiple pairs of target training samples;

The initial model is trained according to the multiple pairs of target training samples to obtain the similarity model.
The electronic device according to claim 9, wherein,

When the asset distribution of each of the first digital assets is the fund use of each of the first digital assets, the fund distribution described in each of the second text paragraphs is a fund use with a green attribute;

According to the similarity model, determining the similarity between each of the first text segments and a plurality of second text segments respectively includes:

Inputting each of the first text segments into the semantic information extraction model to extract the semantic information to obtain a first feature vector of each of the first text segments;

Inputting each of the second text segments into the semantic information extraction model for semantic information extraction to obtain a second feature vector of each of the second text segments;

According to the first feature vector of each of the first text segments and the second feature vector of each of the second text segments, determine the similarity between each of the first text segments and a plurality of second text segments;

According to the asset distribution described in the target first text paragraph and the total amount of each of the first digital assets, determining the proportion of green assets in each of the first digital assets includes:

The ratio of the planned fund amount in the fund use described in the target first text segment to the total amount of each of the first digital assets is taken as the proportion of green assets in each of the first digital assets.
The electronic device according to claim 12, wherein determining the proportion of green assets in the digital assets to be identified according to the proportion of green assets in each of the first digital assets and the proportion of green assets in the second digital assets comprises:

Obtain a first ratio of the net value of each of the first digital assets relative to the net value of the digital asset to be identified;

According to the first proportion of each of the first digital assets and the proportion of green assets, determine the first proportion of the green assets of each of the first digital assets relative to the net value of the digital assets to be identified;

Determine a second ratio of the net value of the second digital asset relative to the net value of the digital asset to be identified according to the position data and the second ratio of each of the first digital assets;

According to the second proportion of the second digital asset and the proportion of the green asset, determine the second proportion of the green asset of the second digital asset relative to the net value of the digital asset to be identified;

Summing the first ratio of each of the first digital assets and the second ratio of the second digital asset to obtain the ratio of green assets in the digital assets to be identified.
The electronic device according to claim 13, wherein, before summing the first proportion of each of the first digital assets and the second proportion of the second digital asset, the step further comprises:

performing text recognition on the position data to obtain the total amount of some of the first digital assets among the plurality of first digital assets, the total amount of the second digital assets, and the total amount of the digital assets to be identified;

performing text recognition on the position data to obtain the total net value of the part of the first digital asset, the total net value of the second digital asset, and the total net value of the digital asset to be identified;

determining a third ratio of the sum of the total amount of the portion of the first digital asset and the total amount of the second digital asset relative to the total amount of the digital asset to be identified;

determining a fourth ratio of the sum of the total net value of the part of the first digital asset and the total net value of the second digital asset relative to the total net value of the digital asset to be identified;

determining the leverage ratio according to the third ratio and the fourth ratio;

According to the leverage ratio, deleveraging is performed on the first ratio of the part of the first digital asset and the second ratio of the second digital asset to obtain the first target ratio of the part of the first digital asset and the second target ratio of the second digital asset;

The summing of the first proportion of each of the first digital assets and the second proportion of the second digital asset to obtain the proportion of green assets in the digital assets to be identified includes:

Summing the first proportion of another part of the first digital assets among the plurality of first digital assets, the first target proportion of the part of the first digital assets, and the second target proportion of the second digital asset, to obtain the proportion of green assets in the digital assets to be identified.
A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to cause the computer to perform the following steps:

performing text recognition on the acquired position data of digital assets to be identified, and obtaining a plurality of first digital assets and second digital assets, wherein the asset information of each of the first digital assets is disclosed in the position data, and the asset information of the second digital asset is not disclosed in the position data;

According to the asset information of each of the first digital assets, obtain the disclosure data of each of the first digital assets, and input the disclosure data of each of the first digital assets into a machine reading comprehension model for text segmentation to obtain at least one first text segment, wherein the at least one first text segment is used to describe the asset distribution of each of the first digital assets;

According to the similarity model, determine the similarity between each of the first text segments and a plurality of second text segments, wherein the plurality of second text segments are used to describe a plurality of capital distributions with green attributes;

determining a target first text segment in the at least one first text segment according to the similarity between each of the first text segments and the plurality of second text segments;

Determine the proportion of green assets in each of the first digital assets according to the asset distribution described in the target first text paragraph and the total amount of each of the first digital assets;

According to the portrait of the manager of the digital asset to be identified, obtain all the digital assets managed by the manager, obtain the average proportion of green assets among those of the digital assets whose asset information is disclosed, and take the average proportion as the proportion of green assets in the second digital asset;

Determine the proportion of green assets in the digital assets to be identified according to the proportion of green assets in each of the first digital assets and the proportion of green assets in the second digital assets.
The computer readable storage medium of claim 15, wherein:

When the disclosure data of each of the first digital assets is the annual report of the enterprise to which each of the first digital assets belongs, the asset distribution of each of the first digital assets is the sub-product proportion of the enterprise to which each of the first digital assets belongs, and the fund distribution described in each of the second text segments is a product with green attributes;

The disclosure data of each of the first digital assets is input into the machine reading comprehension model for text segmentation to obtain at least one first text segment, including:

Perform text recognition on the annual report to obtain target chapters in the annual report, wherein the target chapters are used to describe the main products of the companies to which each of the first digital assets belongs, and the target chapters include target tables and the target text segment;

Inputting the target text segment into the machine reading comprehension model for text segmentation to obtain the at least one first text segment, each of the first text segments is used to describe a sub-product of the main product;

According to the asset distribution described in the target first text paragraph and the total amount of each of the first digital assets, determining the proportion of green assets in each of the first digital assets includes:

Performing entity recognition on both the target text segment and the target table to obtain the proportion of the main product, wherein the proportion of the main product is the ratio of the turnover of the main product to the total turnover of the enterprise to which it belongs;

Determine the proportion of each sub-product in the main product according to the proportion of the main product;

Determine the proportion of the sub-product described in the target first text paragraph according to the proportion of each of the sub-products;

Determine the proportion of green assets in each of the first digital assets according to the proportion of the sub-products described in the target first text segment.
The computer-readable storage medium according to claim 16, wherein, before determining the similarity between each of the first text segments and a plurality of second text segments according to the similarity model, the steps further include:

Obtaining a first preset document, the products recorded in the first preset document all have green attributes;

performing text recognition on the first preset document to obtain a plurality of third text segments, wherein the plurality of third text segments are used to describe the products recorded in the first preset document;

If any third text segment in the plurality of third text segments refers to other documents, performing text recognition on the other documents to obtain a fourth text segment corresponding to that third text segment, wherein the fourth text segment is the text used to describe products with green attributes in the other documents;

Using the plurality of third text segments and a fourth text segment corresponding to any one of the third text segments as the plurality of second text segments;

performing entity extraction on each of the plurality of second text segments respectively to obtain a plurality of target entities;

Using any one of the plurality of second text segments and the target entity extracted from that second text segment as a pair of training samples, to obtain multiple pairs of first training samples;

Randomly selecting a target entity from among the plurality of target entities other than the target entity corresponding to that second text segment, and using the randomly selected target entity and that second text segment as a pair of training samples, to obtain multiple pairs of second training samples;

using the multiple pairs of first training samples and the multiple pairs of second training samples as multiple pairs of target training samples;

The initial model is trained according to the multiple pairs of target training samples to obtain the similarity model.
The computer readable storage medium of claim 15, wherein:

When the asset distribution of each of the first digital assets is the fund use of each of the first digital assets, the fund distribution described in each of the second text paragraphs is a fund use with a green attribute;

According to the similarity model, determining the similarity between each of the first text segments and a plurality of second text segments respectively includes:

Inputting each of the first text segments into the semantic information extraction model to extract the semantic information to obtain a first feature vector of each of the first text segments;

Inputting each of the second text segments into the semantic information extraction model for semantic information extraction to obtain a second feature vector of each of the second text segments;

According to the first feature vector of each of the first text segments and the second feature vector of each of the second text segments, determine the similarity between each of the first text segments and a plurality of second text segments;

According to the asset distribution described in the target first text paragraph and the total amount of each of the first digital assets, determining the proportion of green assets in each of the first digital assets includes:

The ratio of the planned fund amount in the fund use described in the target first text segment to the total amount of each of the first digital assets is taken as the proportion of green assets in each of the first digital assets.
The computer-readable storage medium according to claim 18, wherein determining the proportion of green assets in the digital assets to be identified according to the proportion of green assets in each of the first digital assets and the proportion of green assets in the second digital assets comprises:

Obtain a first ratio of the net value of each of the first digital assets relative to the net value of the digital asset to be identified;

According to the first proportion of each of the first digital assets and the proportion of green assets, determine the first proportion of the green assets of each of the first digital assets relative to the net value of the digital assets to be identified;

Determine a second ratio of the net value of the second digital asset relative to the net value of the digital asset to be identified according to the position data and the second ratio of each of the first digital assets;

According to the second proportion of the second digital asset and the proportion of the green asset, determine the second proportion of the green asset of the second digital asset relative to the net value of the digital asset to be identified;

Summing the first ratio of each of the first digital assets and the second ratio of the second digital asset to obtain the ratio of green assets in the digital assets to be identified.
The computer-readable storage medium according to claim 19, wherein, before summing the first proportion of each of the first digital assets and the second proportion of the second digital asset, the steps further comprise:

performing text recognition on the position data to obtain the total amount of some of the first digital assets among the plurality of first digital assets, the total amount of the second digital assets, and the total amount of the digital assets to be identified;

performing text recognition on the position data to obtain the total net value of the part of the first digital asset, the total net value of the second digital asset, and the total net value of the digital asset to be identified;

determining a third ratio of the sum of the total amount of the part of the first digital asset and the total amount of the second digital asset relative to the total amount of the digital asset to be identified;

determining a fourth ratio of the sum of the total net value of the part of the first digital asset and the total net value of the second digital asset relative to the total net value of the digital asset to be identified;

determining the leverage ratio according to the third ratio and the fourth ratio;

According to the leverage ratio, deleveraging is performed on the first ratio of the part of the first digital asset and the second ratio of the second digital asset to obtain the first target ratio of the part of the first digital asset and the second target ratio of the second digital asset;

The summing of the first proportion of each of the first digital assets and the second proportion of the second digital asset to obtain the proportion of green assets in the digital assets to be identified includes:

Summing the first proportion of another part of the first digital assets among the plurality of first digital assets, the first target proportion of the part of the first digital assets, and the second target proportion of the second digital asset, to obtain the proportion of green assets in the digital assets to be identified.