CN113705192A

CN113705192A - Text processing method, device and storage medium

Info

Publication number: CN113705192A
Application number: CN202111014282.7A
Authority: CN
Inventors: 陈悦竹; 拓万敏; 张博
Original assignee: Ping An Bank Co Ltd
Current assignee: Ping An Bank Co Ltd
Priority date: 2021-08-31
Filing date: 2021-08-31
Publication date: 2021-11-26
Anticipated expiration: 2041-08-31

Abstract

The application relates to the technical field of artificial intelligence and provides a text processing method, a device and a storage medium. The text processing method comprises the following steps: determining an application enterprise applying for a loan and an associated enterprise having an association relationship with the application enterprise; acquiring a first text describing characteristic information of the application enterprise and a second text describing characteristic information of the associated enterprise; extracting semantic features of the first text to obtain a first feature vector representing the semantic features of the first text; extracting semantic features of the second text to obtain a second feature vector representing the semantic features of the second text; determining the similarity between the first text and the second text according to the first feature vector and the second feature vector; and determining, according to the similarity between the first text and the second text, whether the application enterprise has diversified operation. By implementing the method and the device, the accuracy of identifying diversified operation of enterprises can be improved.

Description

Text processing method, device and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a text processing method, apparatus, and storage medium.

Background

Small and micro enterprises, a collective term for small enterprises, micro enterprises and family-workshop-type enterprises, play an important role in the national economy, and credit business for them has long received close attention in China. Credit review must investigate whether a small or micro enterprise has diversified operation, that is, mixed, unrelated diversification in which the main business is unclear and the enterprise extends into business scopes unrelated to its original products, technologies and markets. Diversified operation mainly exposes small and micro enterprises to the following risks: the original line of business is weakened; capital and management attention are dispersed; and after entering a new industry the enterprise must keep injecting follow-up resources, learn the industry and build its own staff team, which further disperses its resources. In the credit scenarios of financial institutions, existing algorithmic schemes for judging whether a small or micro enterprise involves diversified operation mainly match the words in the enterprise's business-scope text against common keywords of its industry to obtain a matching rate, and decide according to that rate. However, words with the same meaning are often phrased in very different ways, so determining diversified operation by keyword matching has relatively low accuracy.

Disclosure of Invention

Therefore, to solve the above technical problems, it is necessary to provide a text processing method, device and storage medium that perform semantic feature extraction on a text of the application enterprise and on a text of an associated enterprise, mine the deeper semantics of the texts, determine the similarity between the texts, and then determine whether the application enterprise has diversified operation, thereby improving the accuracy of identifying diversified operation of the application enterprise.

In a first aspect, the present application provides a text processing method, including:

determining an application enterprise applying for loan and an associated enterprise having an association relationship with the application enterprise;

acquiring a first text for describing the characteristic information of the application enterprise and a second text for describing the characteristic information of the associated enterprise;

extracting semantic features of the first text to obtain a first feature vector for representing the semantic features of the first text;

extracting semantic features of the second text to obtain a second feature vector for representing the semantic features of the second text;

determining the similarity between the first text and the second text according to the first feature vector and the second feature vector;

and determining whether the application enterprise has diversified operation according to the similarity between the first text and the second text.

With reference to the first aspect, in some embodiments, performing semantic feature extraction on the first text to obtain a first feature vector for representing semantic features of the first text, includes:

performing word segmentation on the first text to obtain m first word segments, where m is an integer greater than or equal to 1;

inputting the m first word segments into a word vector model to obtain m first word vectors corresponding to the m first word segments, and determining, according to the m first word vectors, a first feature vector for representing semantic features of the first text;

the extracting semantic features of the second text to obtain a second feature vector for representing the semantic features of the second text includes:

performing word segmentation on the second text to obtain n second word segments, where n is an integer greater than or equal to 1;

inputting the n second word segments into the word vector model to obtain n second word vectors corresponding to the n second word segments, and determining, according to the n second word vectors, a second feature vector for representing semantic features of the second text.

With reference to the first aspect, in some embodiments, the determining, from the m first word vectors, a first feature vector for representing semantic features of the first text includes:

determining the m first word vectors as first feature vectors for representing semantic features of the first text;

the determining, from the n second word vectors, a second feature vector for representing semantic features of the second text, includes:

determining the n second word vectors as second feature vectors for representing semantic features of the second text;

the determining a similarity between the first text and the second text according to the first feature vector and the second feature vector comprises:

calculating the Euclidean distance between each first word vector and each second word vector to obtain m × n Euclidean distances;

and calculating the average of the m × n Euclidean distances, and determining the similarity between the first text and the second text according to the average Euclidean distance.
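The averaging step above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: word vectors are plain lists of floats, and because the text only says that a smaller average distance means a greater similarity, the mapping in the hypothetical helper `similarity_from_distance` (1 / (1 + d)) is an assumed choice.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean (L2) distance between two word vectors of the same dimension."""
    if len(u) != len(v):
        raise ValueError("word vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_pairwise_distance(first: List[Vector], second: List[Vector]) -> float:
    """Average of the m x n distances between every first and every second word vector."""
    if not first or not second:
        raise ValueError("each text needs at least one word vector")
    total = sum(euclidean(u, v) for u in first for v in second)
    return total / (len(first) * len(second))


def similarity_from_distance(distance: float) -> float:
    """Hypothetical monotone mapping: 0 -> 1.0, larger distances approach 0."""
    return 1.0 / (1.0 + distance)


# Usage: m = 2 first word vectors, n = 1 second word vector -> 2 distances (0 and 5).
first_vectors = [[0.0, 0.0], [3.0, 4.0]]
second_vectors = [[0.0, 0.0]]
avg = average_pairwise_distance(first_vectors, second_vectors)
print(avg)  # 2.5
```

Averaging all m × n pairs makes the score independent of the two texts' lengths, but note that it also averages in unrelated word pairs, so the absolute value of the first threshold would have to be tuned for whichever mapping is used.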

With reference to the first aspect, in some embodiments, determining the first feature vector from the m first word vectors includes: splicing the m first word vectors in the order in which their corresponding first word segments appear in the first text, to obtain the first feature vector for representing semantic features of the first text;

and determining the second feature vector from the n second word vectors includes: splicing the n second word vectors in the order in which their corresponding second word segments appear in the second text, to obtain the second feature vector for representing semantic features of the second text.
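A minimal Python sketch of the splicing step follows, assuming each word vector is a list of floats already ordered by the position of its word segment in the text. Spliced vectors of texts with different segment counts have different lengths, which the text does not address; the zero-padding helper `pad_to_same_length` is a hypothetical addition made only so that a distance or cosine similarity can be computed afterwards.

```python
from typing import List, Sequence, Tuple


def splice(word_vectors: List[Sequence[float]]) -> List[float]:
    """Concatenate word vectors in the order their word segments appear in the text."""
    spliced: List[float] = []
    for vector in word_vectors:
        spliced.extend(vector)
    return spliced


def pad_to_same_length(a: List[float], b: List[float]) -> Tuple[List[float], List[float]]:
    """Zero-pad the shorter spliced vector (assumption: not specified in the text)."""
    size = max(len(a), len(b))
    return a + [0.0] * (size - len(a)), b + [0.0] * (size - len(b))


# Usage: 3 first word vectors and 2 second word vectors, each of dimension 2.
first = splice([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])  # length 6
second = splice([[0.1, 0.2], [0.3, 0.4]])             # length 4
first, second = pad_to_same_length(first, second)
print(len(first), len(second))  # 6 6
```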

With reference to the first aspect, in some embodiments, the determining a similarity between the first text and the second text according to the first feature vector and the second feature vector includes:

calculating the Euclidean distance between the first feature vector and the second feature vector, and determining the similarity between the first text and the second text according to the Euclidean distance; or,

calculating the cosine similarity between the first feature vector and the second feature vector, and determining the similarity between the first text and the second text according to the cosine similarity.
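The two alternatives can be sketched as below. The sketch assumes the two feature vectors have equal length (for example, text vectors produced by the same model); `math.dist` is the standard-library Euclidean distance (Python 3.8+), and `cosine_similarity` is a helper written out for illustration. Note the opposite directions: a smaller Euclidean distance, but a larger cosine similarity, indicates more similar texts.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Smaller value -> more similar texts."""
    # Standard library; raises ValueError if the dimensions differ.
    return math.dist(u, v)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Larger value (at most 1.0) -> more similar texts."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0
```

Cosine similarity ignores vector magnitude, which can matter for spliced or summed vectors whose norms grow with text length.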

With reference to the first aspect, in some embodiments, the determining whether the application enterprise has diversified operation according to the similarity between the first text and the second text includes:

if the similarity between the first text and the second text is greater than a first threshold, determining that the application enterprise does not have diversified operation;

if the similarity between the first text and the second text is less than or equal to the first threshold, determining a first industry to which the application enterprise belongs and a second industry to which the associated enterprise belongs;

looking up the intimacy between the first industry and the second industry in a pre-stored correlation matrix, where the correlation matrix contains the intimacy between industries;

if the intimacy between the first industry and the second industry is greater than a second threshold value, determining that the application enterprise does not have diversified operation;

and if the intimacy between the first industry and the second industry is smaller than or equal to the second threshold value, determining that the application enterprise has diversified operation.
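The two-stage decision can be expressed as a short function. In this sketch the pre-stored correlation matrix is a hypothetical dictionary keyed by industry pairs, a pair missing from it is treated as intimacy 0.0, an industry is treated as fully related to itself, and the threshold values are caller-supplied placeholders; none of these details come from the text. Industries are resolved through a callback so that, as in the steps above, they are determined only when the text similarity does not exceed the first threshold.

```python
from typing import Callable, Dict, Tuple

# Hypothetical pre-stored correlation matrix: (industry, industry) -> intimacy.
CorrelationMatrix = Dict[Tuple[str, str], float]


def lookup_intimacy(matrix: CorrelationMatrix, a: str, b: str) -> float:
    """Symmetric lookup; a missing pair counts as 0.0 (assumption)."""
    if a == b:
        return 1.0  # assumption: an industry is fully related to itself
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix.get((b, a), 0.0)


def has_diversified_operation(
    text_similarity: float,
    determine_industries: Callable[[], Tuple[str, str]],
    matrix: CorrelationMatrix,
    first_threshold: float,
    second_threshold: float,
) -> bool:
    # Stage 1: texts similar enough -> no diversified operation.
    if text_similarity > first_threshold:
        return False
    # Stage 2: only now determine the two industries and compare their intimacy.
    first_industry, second_industry = determine_industries()
    intimacy = lookup_intimacy(matrix, first_industry, second_industry)
    return intimacy <= second_threshold


matrix = {("catering", "food processing"): 0.8, ("catering", "real estate"): 0.1}
print(has_diversified_operation(0.3, lambda: ("real estate", "catering"), matrix, 0.7, 0.5))  # True
```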

With reference to the first aspect, in some embodiments, the first text includes the enterprise name and the business scope of the application enterprise, and the second text includes the enterprise name and the business scope of the associated enterprise;

the determining the first industry to which the application enterprise belongs and the determining the second industry to which the associated enterprise belongs include:

respectively acquiring a third text corresponding to the application enterprise and a fourth text corresponding to the associated enterprise, wherein the third text comprises annual report information of the application enterprise, and the fourth text comprises annual report information of the associated enterprise;

classifying the industry to which the application enterprise belongs based on the third text by using a classification model to obtain a first industry to which the application enterprise belongs;

and classifying the industry to which the associated enterprise belongs based on the fourth text by using a classification model to obtain a second industry to which the associated enterprise belongs.
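The text does not specify the classification model. Purely as an illustration, the sketch below substitutes a multinomial naive Bayes classifier with Laplace smoothing, trained on annual-report texts that have already been segmented into tokens; the class name, toy labels and training data are hypothetical. Any model that maps the third or fourth text to an industry label could take its place.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set, Tuple


class NaiveBayesIndustryClassifier:
    """Multinomial naive Bayes over pre-segmented tokens; an illustrative stand-in
    for the unspecified classification model."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing constant
        self.doc_counts: Counter = Counter()  # industry -> number of training texts
        self.token_counts: DefaultDict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def fit(self, samples: List[Tuple[List[str], str]]) -> "NaiveBayesIndustryClassifier":
        for tokens, industry in samples:
            self.doc_counts[industry] += 1
            self.token_counts[industry].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens: List[str]) -> str:
        if not self.doc_counts:
            raise ValueError("classifier has not been fitted")
        total_docs = sum(self.doc_counts.values())
        best_industry, best_score = "", -math.inf
        for industry, n_docs in self.doc_counts.items():
            counts = self.token_counts[industry]
            denominator = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus the log likelihood of each token under this industry.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + self.alpha) / denominator)
            if score > best_score:
                best_industry, best_score = industry, score
        return best_industry


# Toy, hypothetical training data: segmented annual-report snippets -> industry label.
train = [
    (["restaurant", "dishes", "catering"], "catering"),
    (["housing", "property", "sales"], "real estate"),
    (["catering", "takeaway"], "catering"),
]
classifier = NaiveBayesIndustryClassifier().fit(train)
print(classifier.predict(["catering", "dishes"]))  # catering
```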

In a second aspect, the present application provides a text processing apparatus, comprising:

a first determining unit, configured to determine an application enterprise applying for a loan and an associated enterprise having an association relationship with the application enterprise;

an acquiring unit, configured to acquire a first text for describing the characteristic information of the application enterprise and a second text for describing the characteristic information of the associated enterprise;

a first feature extraction unit, configured to extract semantic features of the first text to obtain a first feature vector for representing the semantic features of the first text;

a second feature extraction unit, configured to extract semantic features of the second text to obtain a second feature vector for representing the semantic features of the second text;

a second determining unit, configured to determine a similarity between the first text and the second text according to the first feature vector and the second feature vector;

and a third determining unit, configured to determine whether the application enterprise has diversified operation according to the similarity between the first text and the second text.

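With reference to the second aspect, in some embodiments, the first feature extraction unit is specifically configured to: perform word segmentation on the first text to obtain m first word segments, where m is an integer greater than or equal to 1; input the m first word segments into a word vector model to obtain m corresponding first word vectors; and determine, according to the m first word vectors, the first feature vector for representing semantic features of the first text;

the second feature extraction unit is specifically configured to: perform word segmentation on the second text to obtain n second word segments, where n is an integer greater than or equal to 1; input the n second word segments into the word vector model to obtain n corresponding second word vectors; and determine, according to the n second word vectors, the second feature vector for representing semantic features of the second text.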

With reference to the second aspect, in some embodiments, the first feature extraction unit is specifically configured to: determine the m first word vectors as the first feature vector for representing semantic features of the first text;

the second feature extraction unit is specifically configured to: determine the n second word vectors as the second feature vector for representing semantic features of the second text;

the second determining unit is specifically configured to: calculate the Euclidean distance between each first word vector and each second word vector to obtain m × n Euclidean distances, calculate the average of the m × n Euclidean distances, and determine the similarity between the first text and the second text according to the average Euclidean distance.

With reference to the second aspect, in some embodiments, the first feature extraction unit is specifically configured to: splice the m first word vectors in the order in which their corresponding first word segments appear in the first text, to obtain the first feature vector for representing semantic features of the first text;

the second feature extraction unit is specifically configured to: splice the n second word vectors in the order in which their corresponding second word segments appear in the second text, to obtain the second feature vector for representing semantic features of the second text.

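With reference to the second aspect, in some embodiments, the second determining unit is specifically configured to: calculate the Euclidean distance between the first feature vector and the second feature vector and determine the similarity between the first text and the second text according to the Euclidean distance; or calculate the cosine similarity between the first feature vector and the second feature vector and determine the similarity between the first text and the second text according to the cosine similarity.

With reference to the second aspect, in some embodiments, the third determining unit is specifically configured to: if the similarity between the first text and the second text is greater than a first threshold, determine that the application enterprise does not have diversified operation; if the similarity is less than or equal to the first threshold, determine a first industry to which the application enterprise belongs and a second industry to which the associated enterprise belongs; look up the intimacy between the first industry and the second industry in a pre-stored correlation matrix, where the correlation matrix contains the intimacy between industries; if the intimacy is greater than a second threshold, determine that the application enterprise does not have diversified operation; and if the intimacy is less than or equal to the second threshold, determine that the application enterprise has diversified operation.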

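With reference to the second aspect, in some embodiments, the first text includes the enterprise name and the business scope of the application enterprise, and the second text includes the enterprise name and the business scope of the associated enterprise;

the third determining unit is specifically configured to: acquire a third text corresponding to the application enterprise and a fourth text corresponding to the associated enterprise, where the third text includes annual report information of the application enterprise and the fourth text includes annual report information of the associated enterprise; classify, by using a classification model, the industry to which the application enterprise belongs based on the third text to obtain the first industry; and classify, by using the classification model, the industry to which the associated enterprise belongs based on the fourth text to obtain the second industry.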

In a third aspect, the present application provides a text processing apparatus, including a processor, a memory, and a communication interface, where the processor, the memory, and the communication interface are connected to each other, where the communication interface is configured to receive and send data, the memory is configured to store program code, and the processor is configured to call the program code to perform a method as described in the first aspect and any possible implementation manner of the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program; the computer program, when run on one or more processors, causes a terminal device to perform the method according to the first aspect or any possible implementation of the first aspect.

In the embodiments of the application, a first text of an application enterprise applying for a loan and a second text of an associated enterprise having an association relationship with the application enterprise are obtained; semantic features are extracted from the first text to obtain a first feature vector and from the second text to obtain a second feature vector; the similarity between the two texts is determined from the two feature vectors; and whether the application enterprise has diversified operation is determined based on that similarity. Because semantic features, rather than literal keywords, are extracted from the texts of both enterprises, the deeper semantics of the texts are mined, which improves the accuracy of identifying diversified operation of the application enterprise.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments will be briefly described below.

Fig. 1 is a schematic flowchart of a text processing method according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of another text processing method according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a text processing apparatus according to an embodiment of the present application;

Fig. 4 is a schematic diagram of another text processing apparatus according to an embodiment of the present application.

Detailed Description

The present invention is described in further detail below with reference to the attached drawing figures.

The embodiments of the application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) comprises the theories, methods, techniques and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.

Artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operating/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.

In order to more clearly describe the scheme of the present application, some drawings related to the present application are further described below.

Referring to fig. 1, fig. 1 is a schematic flowchart of a text processing method according to an embodiment of the present disclosure. As shown in fig. 1, the method comprises the steps of:

S101, determining an application enterprise applying for a loan and an associated enterprise having an association relationship with the application enterprise;

In this embodiment of the application, the application enterprise may be a medium-sized enterprise or a small enterprise. Associated enterprises having an association relationship with the application enterprise may include, but are not limited to, enterprises having an equity relationship or a business relationship with the application enterprise. Enterprises having an equity relationship with the application enterprise may be enterprises in a shareholding or controlling relationship with the application enterprise and enterprises sharing the same legal representative as the application enterprise, and may further include enterprises in which the legal representative of the application enterprise invests or holds a position. Enterprises having a business relationship with the application enterprise may be enterprises in a supply chain relationship with the application enterprise.

For example, the association relationships between enterprises may be stored in advance, such as in a knowledge graph. After the application enterprise applying for the loan is determined, the associated enterprises having an association relationship with it can be looked up in the pre-stored association relationships.
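As a rough sketch of the pre-stored lookup, the association relationships can be held as labelled edges between enterprises and queried by enterprise name. The class name, the relation labels (`shareholding`, `supply_chain`, `same_legal_representative`), the enterprise names, and the choice to store every relationship in both directions and return only one-hop neighbours are all assumptions for illustration; a production knowledge graph might instead live in a graph database.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Optional, Set, Tuple


class AssociationGraph:
    """Pre-stored association relationships between enterprises (hypothetical layout)."""

    def __init__(self) -> None:
        # enterprise -> {(relation, other_enterprise), ...}
        self._edges: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add(self, a: str, relation: str, b: str) -> None:
        # Assumption: every relationship is stored in both directions.
        self._edges[a].add((relation, b))
        self._edges[b].add((relation, a))

    def associated(self, enterprise: str, relations: Optional[Iterable[str]] = None) -> List[str]:
        """Directly associated enterprises, optionally filtered by relation type."""
        wanted = set(relations) if relations is not None else None
        return sorted(
            {other for rel, other in self._edges.get(enterprise, set())
             if wanted is None or rel in wanted}
        )


graph = AssociationGraph()
graph.add("Applicant Co", "shareholding", "Holding Co")
graph.add("Applicant Co", "supply_chain", "Supplier Co")
graph.add("Applicant Co", "same_legal_representative", "Sister Co")
print(graph.associated("Applicant Co"))  # ['Holding Co', 'Sister Co', 'Supplier Co']
```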

S102, acquiring a first text for describing the characteristic information of the application enterprise and a second text for describing the characteristic information of the associated enterprise;

In this embodiment of the application, the first text describing the feature information of the application enterprise may include text information such as the enterprise name and business scope of the application enterprise, and the second text describing the feature information of the associated enterprise may include text information such as the enterprise name and business scope of the associated enterprise. Optionally, the first text of the application enterprise and the second text of the associated enterprise may be obtained by calling a query interface of the business administration system and querying the system through that interface.

S103, extracting semantic features of the first text to obtain a first feature vector for representing the semantic features of the first text;

S104, extracting semantic features of the second text to obtain a second feature vector for representing the semantic features of the second text;

In this embodiment of the application, semantic features are extracted from the first text and the second text respectively, to obtain a first feature vector representing the semantic features of the first text and a second feature vector representing the semantic features of the second text. The two feature vectors can be obtained in several ways; three optional modes are described below as examples:

In the first mode, semantic features are extracted from the first text by a semantic analysis model to obtain the first feature vector, and from the second text by the same semantic analysis model to obtain the second feature vector. The semantic analysis model may be a natural language processing model based on the BERT pre-trained language model, or a natural language processing model based on TextCNN. It should be noted that the idea of BERT is to pre-train a language model on a large-scale text corpus and to capture word-level and sentence-level representations through two pre-training tasks, the masked language model (Masked Language Model) and next sentence prediction (Next Sentence Prediction), so that the pre-trained BERT model can serve as a general-purpose feature extraction model. Because the first feature vector represents the semantic features of the first text and the second feature vector represents those of the second text, the application determines the similarity between the two texts from the perspective of text semantics and mines their deep semantics to determine whether the application enterprise has diversified operation, rather than merely capturing surface differences between the application enterprise and the associated enterprise through keyword matching, which improves the accuracy of identifying diversified operation of the enterprise.
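The text does not say how the output of a BERT-based model is reduced to a single feature vector per text. One common option is mean pooling over the token embeddings while skipping padding positions; the sketch below shows only that pooling step, on token embeddings supplied by the caller, and does not load or call any model. Using the [CLS] token vector instead would be an equally plausible reading, so treat `mean_pool` as a hypothetical helper.

```python
from typing import List, Sequence


def mean_pool(token_embeddings: List[Sequence[float]], attention_mask: Sequence[int]) -> List[float]:
    """Average the embeddings of real tokens (mask value 1), ignoring padding positions."""
    kept = [emb for emb, mask in zip(token_embeddings, attention_mask) if mask]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dim = len(kept[0])
    return [sum(emb[i] for emb in kept) / len(kept) for i in range(dim)]


# Three token positions from some encoder; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Because both texts pass through the same encoder and pooling, the resulting first and second feature vectors have equal length and can be compared directly by Euclidean distance or cosine similarity.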

In the second mode, a word segmentation tool is used to segment the first text into a plurality of first segments and the second text into a plurality of second segments. The part of speech of each segment is identified, stop words with no practical meaning such as "limited company" are removed, and segments denoting geographical areas or locations are removed, finally yielding m first word segments and n second word segments. A word2vec word vector model converts the m first word segments into m first word vectors, which may serve as first feature data representing the industry characteristics of the application enterprise, and converts the n second word segments into n second word vectors, which may serve as second feature data representing the industry characteristics of the associated enterprise.

Further, the m first word vectors are determined as the first feature vector for representing semantic features of the first text, and the n second word vectors are determined as the second feature vector for representing semantic features of the second text.

In the third mode, the first text is segmented into m first word segments and the second text into n second word segments, and the word vector model converts them into m first word vectors and n second word vectors respectively. For the specific word segmentation and word vector conversion process, refer to the description of the second mode; details are not repeated here. Further, the m first word vectors may be spliced into the first feature vector in the order in which their corresponding first word segments appear in the first text, and the n second word vectors may be spliced into the second feature vector in the order in which their corresponding second word segments appear in the second text.
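The filtering and lookup in the second mode might look like the sketch below. It assumes the segmenter has already produced (word, part-of-speech) pairs, as a tagger such as jieba's part-of-speech mode does, and that place names carry the tag `ns` as in that tag set; the stop-word list is a hypothetical example built from the "limited company" case in the text. A plain dict stands in for the trained word2vec vectors (gensim's `KeyedVectors` supports the same `in` and `[]` access), and skipping out-of-vocabulary words is an assumption.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stop words; "有限公司" means "limited company".
STOP_WORDS = {"有限公司", "有限责任公司"}
# Assumption: jieba-style tags, where "ns" marks place names.
LOCATION_TAGS = {"ns"}


def filter_segments(tagged: List[Tuple[str, str]]) -> List[str]:
    """Drop stop words and location segments, keeping the original order."""
    return [word for word, tag in tagged if word not in STOP_WORDS and tag not in LOCATION_TAGS]


def to_word_vectors(words: List[str], embeddings: Dict[str, Sequence[float]]) -> List[Sequence[float]]:
    """Look up each remaining segment; out-of-vocabulary words are skipped (assumption)."""
    return [embeddings[word] for word in words if word in embeddings]


# "深圳" = Shenzhen (place name), "餐饮" = catering, "管理" = management.
tagged_first_text = [("深圳", "ns"), ("餐饮", "n"), ("管理", "vn"), ("有限公司", "n")]
word2vec = {"餐饮": [0.1, 0.2], "管理": [0.3, 0.1]}  # toy stand-in for trained vectors
first_words = filter_segments(tagged_first_text)      # ['餐饮', '管理'], so m = 2
first_word_vectors = to_word_vectors(first_words, word2vec)
print(len(first_word_vectors))  # 2
```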

S105, determining the similarity between the first text and the second text according to the first feature vector and the second feature vector;

S106, determining whether the application enterprise has diversified operation according to the similarity between the first text and the second text.

In this embodiment of the application, after the first feature vector and the second feature vector are obtained, the similarity between the first text and the second text may be determined from them, and whether the application enterprise has diversified operation is then determined according to that similarity. How the text similarity is computed depends on how the feature vectors were obtained, as illustrated below:

If the first and second feature vectors are obtained in the second mode above, that is, the first feature vector contains m first word vectors and the second feature vector contains n second word vectors, the similarity between the first text and the second text may be calculated from these word vectors. For example, the Euclidean distance between each of the m first word vectors and each of the n second word vectors is calculated, giving m × n Euclidean distances in total; these are averaged to obtain an average Euclidean distance, from which the similarity between the first text and the second text is determined: the smaller the average Euclidean distance, the greater the similarity, and the larger the distance, the smaller the similarity. If the similarity is greater than a first threshold, it is determined that the application enterprise does not have diversified operation. If the similarity is less than or equal to the first threshold, the first industry to which the application enterprise belongs and the second industry to which the associated enterprise belongs may be further obtained; if the intimacy between the first industry and the second industry is greater than a second threshold, it is determined that the application enterprise does not have diversified operation, and if it is less than or equal to the second threshold, it is determined that the application enterprise has diversified operation. For details, refer to the embodiment of Fig. 2; they are not repeated here.

If the first and second feature vectors are obtained in the first or third mode, the Euclidean distance between the first feature vector and the second feature vector may be calculated and the similarity between the first text and the second text determined from it. Optionally, the cosine similarity between the first feature vector and the second feature vector may instead be calculated and used to determine the similarity: the larger the cosine similarity, the greater the similarity between the first text and the second text, and the smaller the cosine similarity, the smaller the similarity. If the similarity between the first text and the second text is greater than the first threshold, it is determined that the application enterprise does not have diversified operation. If it is less than or equal to the first threshold, the first industry to which the application enterprise belongs and the second industry to which the associated enterprise belongs may be further obtained; if the intimacy between the two industries is greater than the second threshold, it is determined that the application enterprise does not have diversified operation, and if it is less than or equal to the second threshold, it is determined that the application enterprise has diversified operation. For details, refer to the description of the embodiment of Fig. 2; they are not repeated here.

In this embodiment of the application, the first text of the application enterprise applying for a loan and the second text of its associated enterprise are obtained, semantic features are extracted from each text to obtain the first and second feature vectors, the similarity between the two texts is determined from these feature vectors, and whether the application enterprise has diversified operation is then determined based on that similarity. Extracting semantic features from the texts of both enterprises mines the deeper semantics of the texts, so determining diversified operation from the resulting feature vectors improves the accuracy of identifying diversified operation of the application enterprise.

Referring to fig. 2, fig. 2 is a schematic flowchart of another text processing method according to an embodiment of the present application. As shown in fig. 2, the method includes the following steps:

S201, determining an application enterprise applying for a loan and an associated enterprise having an association relationship with the application enterprise;

S202, acquiring a first text describing feature information of the application enterprise and a second text describing feature information of the associated enterprise;

S203, performing semantic feature extraction on the first text to obtain a first feature vector representing the semantic features of the first text;

S204, performing semantic feature extraction on the second text to obtain a second feature vector representing the semantic features of the second text;

S205, determining the similarity between the first text and the second text according to the first feature vector and the second feature vector;

For steps S201 to S205 in this embodiment, refer to steps S101 to S105 in the embodiment of fig. 1; details are not repeated here.

S206, if the similarity between the first text and the second text is greater than a first threshold, determining that the application enterprise does not have diversified operation;

S207, if the similarity between the first text and the second text is less than or equal to the first threshold, determining a first industry to which the application enterprise belongs and a second industry to which the associated enterprise belongs;

Specifically, if the similarity between the first text and the second text is less than or equal to the first threshold, the first industry to which the application enterprise belongs and the second industry to which the associated enterprise belongs need to be further determined. These industries may be determined as follows: a third text of the application enterprise and a fourth text of the associated enterprise are acquired, where each of the third text and the fourth text may include one or more of the enterprise's yearbook data, supply chain relationship data, or research report data.

A word segmentation tool is then used to segment the third text and remove meaningless symbols such as punctuation, yielding a plurality of third segmented words, which form a first word sequence. Likewise, the fourth text is segmented with the word segmentation tool and meaningless symbols such as punctuation are removed, yielding a plurality of fourth segmented words, which form a second word sequence. Next, a word vector model converts each third segmented word in the first word sequence into a third word vector, and the resulting third word vectors form a first vector sequence. Similarly, the word vector model converts each fourth segmented word in the second word sequence into a fourth word vector, and the resulting fourth word vectors form a second vector sequence.
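The segmentation and vectorization steps above can be sketched as follows, under loudly stated assumptions. Real Chinese text needs a dedicated word segmentation tool and a trained word vector model; here `segment` simply splits on non-word characters (which also drops punctuation), and `TOY_WORD_VECTORS` is a hypothetical lookup table standing in for the word vector model. Skipping out-of-vocabulary words is likewise an assumption.

```python
import re
from typing import Dict, List

# Hypothetical toy table standing in for a trained word vector model.
TOY_WORD_VECTORS: Dict[str, List[float]] = {
    "software": [0.9, 0.1, 0.0],
    "development": [0.7, 0.2, 0.1],
    "consulting": [0.5, 0.4, 0.1],
}


def segment(text: str) -> List[str]:
    """Stand-in for a word segmentation tool: lower-case the text and split on
    runs of non-word characters, which also discards punctuation."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def to_vector_sequence(words: List[str], table: Dict[str, List[float]]) -> List[List[float]]:
    """Map each segmented word to its word vector, keeping word order.
    Assumption: words missing from the table are skipped."""
    return [table[word] for word in words if word in table]


word_sequence = segment("Software development, IT consulting.")
vector_sequence = to_vector_sequence(word_sequence, TOY_WORD_VECTORS)
```

The word order is preserved from the text to the vector sequence, which matters for the order-dependent concatenation design described for the apparatus.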

The first vector sequence is input into a pre-trained FastText model to reclassify the application enterprise by industry. The input of the FastText model is a vector sequence (the vector sequence corresponding to the word sequence obtained by segmenting a passage of text or a sentence), and its output is the probability that the word sequence belongs to each category. The words and phrases in the sequence form a feature vector, which is mapped to the middle (hidden) layer through a linear transformation, and the middle layer is in turn mapped to the labels. The FastText model normalizes the values of the output layer into a distribution between 0 and 1 with a softmax function, so that the outputs of the neurons form a probability distribution. This gives the probability that the application enterprise belongs to each preset industry, and the preset industry with the highest probability is taken as the first industry to which the application enterprise belongs. Similarly, the second vector sequence is input into the pre-trained FastText model to reclassify the associated enterprise by industry and obtain the second industry to which the associated enterprise belongs.
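The classification step above can be illustrated with a FastText-style forward pass. This is a simplified sketch, not the pre-trained model the text refers to: the weights, biases and labels are hypothetical and untrained, and the real FastText library also learns the embeddings and can add n-gram features, both of which are omitted here. The word vectors are averaged into the hidden representation, a linear layer produces one score per preset industry, and softmax turns the scores into probabilities.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize raw scores into a probability distribution between 0 and 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_industry(
    vector_sequence: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],  # one row per preset industry (hypothetical, untrained)
    biases: Sequence[float],
    labels: Sequence[str],
) -> Tuple[str, List[float]]:
    """FastText-style forward pass: average the word vectors into a hidden
    representation, apply a linear layer, then softmax over industry labels."""
    if not vector_sequence:
        raise ValueError("vector sequence is empty")
    dim = len(vector_sequence[0])
    count = len(vector_sequence)
    hidden = [sum(vec[i] for vec in vector_sequence) / count for i in range(dim)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda k: probs[k])
    return labels[best], probs


industry, probabilities = classify_industry(
    [[0.9, 0.1], [0.7, 0.3]],
    weights=[[1.0, 0.0], [0.0, 1.0]],
    biases=[0.0, 0.0],
    labels=["software", "catering"],
)
```

Taking the label with the maximum probability mirrors the rule in the text that the preset industry with the highest probability is the enterprise's industry.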

S208, looking up the intimacy between the first industry and the second industry in a pre-stored correlation matrix, where the correlation matrix contains the intimacy between industries;

S209, if the intimacy between the first industry and the second industry is greater than a second threshold, determining that the application enterprise does not have diversified operation;

S210, if the intimacy between the first industry and the second industry is less than or equal to the second threshold, determining that the application enterprise has diversified operation.
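Steps S206 to S210 amount to a two-stage threshold decision, sketched below. The threshold values, the industry names and the example correlation matrix are hypothetical. The industries are obtained through a callable so that, as in S207, they are only determined when the text similarity does not already settle the question.

```python
from typing import Callable, Dict, List, Tuple


def has_diversified_operation(
    text_similarity: float,
    get_industries: Callable[[], Tuple[str, str]],
    industry_index: Dict[str, int],
    correlation_matrix: List[List[float]],
    first_threshold: float,
    second_threshold: float,
) -> bool:
    """Return True if the application enterprise is judged to have diversified operation."""
    # S206: the texts are similar enough, so there is no diversified operation.
    if text_similarity > first_threshold:
        return False
    # S207: only now determine the two industries (e.g. with an industry classifier).
    first_industry, second_industry = get_industries()
    # S208: look up the intimacy in the pre-stored correlation matrix.
    intimacy = correlation_matrix[industry_index[first_industry]][industry_index[second_industry]]
    # S209 / S210: closely related industries mean no diversified operation.
    return intimacy <= second_threshold


# Hypothetical example data: three industries and a symmetric correlation matrix.
INDUSTRY_INDEX = {"software": 0, "it_services": 1, "catering": 2}
CORRELATION = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
```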

In this embodiment of the application, research report data of each industry and supply chain data between industries may be obtained in advance. Based on the research report data of the industries, a first matrix representing the correlation between industries is obtained, and based on the supply chain data between industries, a knowledge graph representing the supply relationships between industries is obtained. The first matrix is then corrected based on the knowledge graph to obtain a second matrix. For example, if the path between two industries in the knowledge graph is short but the correlation between the two industries in the first matrix is small, the correlation value between the two industries needs to be increased. The correlation value between two industries in the resulting correlation matrix represents the intimacy between those industries, so the correlation matrix gives the degree of correlation between any two industries.
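One way to realize the correction rule above is sketched below. The text does not specify what counts as a "short" path, a "small" correlation, or how much to increase it, so `max_hops`, `low_corr` and `boost` are hypothetical parameters, and the supply-chain knowledge graph is assumed to be stored as an undirected adjacency map. Path length is measured in edges with a breadth-first search.

```python
from collections import deque
from typing import Dict, List, Sequence, Set


def shortest_hops(graph: Dict[str, Set[str]], start: str) -> Dict[str, int]:
    """Breadth-first search: number of supply-chain edges from start to each reachable industry."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, set()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


def correct_correlation(
    first_matrix: List[List[float]],
    industries: Sequence[str],
    graph: Dict[str, Set[str]],
    max_hops: int = 2,      # hypothetical: what counts as a "short" path
    low_corr: float = 0.3,  # hypothetical: what counts as a "small" correlation
    boost: float = 0.3,     # hypothetical: how much to increase it
) -> List[List[float]]:
    """Return a corrected copy (the second matrix); the first matrix is not modified."""
    second_matrix = [row[:] for row in first_matrix]
    for i, source in enumerate(industries):
        hops = shortest_hops(graph, source)
        for j, target in enumerate(industries):
            if i == j:
                continue
            distance = hops.get(target)  # None if the industries are not connected
            if distance is not None and distance <= max_hops and first_matrix[i][j] < low_corr:
                second_matrix[i][j] = min(1.0, first_matrix[i][j] + boost)
    return second_matrix


INDUSTRIES = ["farming", "food_processing", "software"]
SUPPLY_GRAPH = {"farming": {"food_processing"}, "food_processing": {"farming"}}
FIRST = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.05], [0.0, 0.05, 1.0]]
SECOND = correct_correlation(FIRST, INDUSTRIES, SUPPLY_GRAPH)
```

Because the second matrix is computed once and stored, the per-application cost at decision time is a single table lookup.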

The second matrix may be calculated and stored in advance and is used as the correlation matrix in this application. After the first industry to which the application enterprise belongs and the second industry to which the associated enterprise belongs are obtained, the intimacy between the first industry and the second industry can be read from the second matrix as the industry correlation value at the corresponding position. If the intimacy is greater than the second threshold, the application enterprise does not have diversified operation; if the intimacy is less than or equal to the second threshold, it is determined that the application enterprise has diversified operation.

Referring to fig. 3, a schematic structural diagram of a text processing apparatus is provided in an embodiment of the present application. As shown in fig. 3, the text processing apparatus may include:

a first determination unit 10, configured to determine an application enterprise applying for a loan and an associated enterprise having an association relationship with the application enterprise;

an obtaining unit 11, configured to obtain a first text for describing feature information of the application enterprise and a second text for describing feature information of the associated enterprise;

a first feature extraction unit 12, configured to perform semantic feature extraction on the first text, and obtain a first feature vector used for representing semantic features of the first text;

a second feature extraction unit 13, configured to perform semantic feature extraction on the second text, and obtain a second feature vector used for representing semantic features of the second text;

a second determining unit 14, configured to determine a similarity between the first text and the second text according to the first feature vector and the second feature vector;

a third determining unit 15, configured to determine whether the application enterprise has diversified operation according to the similarity between the first text and the second text.

In one possible design, the first feature extraction unit 12 is specifically configured to:

the second feature extraction unit 13 is specifically configured to:

In one possible design, the first feature extraction unit 12 is specifically configured to: determine the m first word vectors as the first feature vector representing semantic features of the first text;

the second feature extraction unit 13 is specifically configured to: determine the n second word vectors as the second feature vector representing semantic features of the second text;

the second determining unit 14 is specifically configured to: calculate the Euclidean distance between each first word vector and each second word vector to obtain m × n Euclidean distances;
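The m × n distance computation in this design can be sketched as below. The example vectors are toy values, and how the m × n distances are then reduced to a single text similarity is not recoverable from this passage, so the sketch stops at the distance matrix. It uses `math.dist` (Python 3.8 or later), which raises `ValueError` if the two word vectors differ in dimension.

```python
import math
from typing import List, Sequence


def pairwise_euclidean(
    first_vectors: Sequence[Sequence[float]],
    second_vectors: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return an m x n matrix whose entry [i][j] is the Euclidean distance
    between the i-th first word vector and the j-th second word vector."""
    return [[math.dist(u, v) for v in second_vectors] for u in first_vectors]


first_word_vectors = [[0.0, 0.0], [1.0, 1.0]]                # m = 2, toy values
second_word_vectors = [[3.0, 4.0], [0.0, 0.0], [1.0, 1.0]]   # n = 3, toy values
distances = pairwise_euclidean(first_word_vectors, second_word_vectors)
```

Comparing word vectors pairwise, rather than whole-text vectors, lets texts of different lengths (m ≠ n) be compared without padding.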

In one possible design, the first feature extraction unit 12 is specifically configured to: concatenate the m first word vectors in the order in which their corresponding first segmented words appear in the first text, to obtain the first feature vector representing semantic features of the first text;

the second feature extraction unit 13 is specifically configured to: concatenate the n second word vectors in the order in which their corresponding second segmented words appear in the second text, to obtain the second feature vector representing semantic features of the second text.
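The order-preserving concatenation in this design is sketched below. Note that concatenating m word vectors of dimension d gives a vector of length m·d, so when m ≠ n the first and second feature vectors differ in length; this passage does not say how that is handled, so the `pad_or_truncate` helper (zero-padding or truncating to a fixed length) is a hypothetical assumption, not part of the described method.

```python
from typing import List, Sequence


def concat_in_order(word_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate word vectors in the order their words appear in the text."""
    feature: List[float] = []
    for vec in word_vectors:
        feature.extend(vec)
    return feature


def pad_or_truncate(feature: Sequence[float], length: int, pad_value: float = 0.0) -> List[float]:
    """Hypothetical helper: force a fixed length so texts with m != n words
    yield feature vectors that can be compared directly."""
    trimmed = list(feature[:length])
    return trimmed + [pad_value] * (length - len(trimmed))
```

Unlike averaging, concatenation keeps word order information, at the cost of a length that grows with the number of words.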

In one possible design, the second determining unit 14 is specifically configured to:

In one possible design, the third determining unit 15 is specifically configured to:

In one possible design, the first text includes the enterprise name and business scope of the application enterprise, and the second text includes the enterprise name and business scope of the associated enterprise;

the third determining unit 15 is specifically configured to: acquire a third text corresponding to the application enterprise and a fourth text corresponding to the associated enterprise, where the third text includes annual report information of the application enterprise and the fourth text includes annual report information of the associated enterprise;

For a specific description of the embodiment of the apparatus shown in fig. 3, reference may be made to the specific description of the embodiment of the method shown in fig. 1 or fig. 2, which is not repeated herein.

Referring to fig. 4, which is a schematic structural diagram of another text processing apparatus according to an embodiment of the present application, the text processing apparatus 1000 may include: at least one processor 1001 (such as a CPU), at least one communication interface 1003, a memory 1004, and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The communication interface 1003 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1004 may be a high-speed RAM or a non-volatile memory (e.g., at least one disk memory), and may optionally be at least one storage device located remotely from the processor 1001. As shown in fig. 4, the memory 1004, as a computer storage medium, may include an operating system, a network communication unit, and program instructions.

In the text processing apparatus 1000 shown in fig. 4, the processor 1001 may be configured to load program instructions stored in the memory 1004 and specifically perform the following operations:

It should be noted that, for a specific implementation process, reference may be made to specific descriptions of the method embodiment shown in fig. 1 or fig. 2, which is not described herein again.

An embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are suitable for being loaded by a processor and executing the method steps in the embodiment shown in fig. 1 or fig. 2, and a specific execution process may refer to specific descriptions of the embodiment shown in fig. 1 or fig. 2, which is not described herein again.

In the above embodiments, implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present application are produced wholly or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid state drives), among others.

A person of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Claims

1. A method of text processing, comprising:

2. The method of claim 1, wherein performing semantic feature extraction on the first text to obtain a first feature vector for representing semantic features of the first text comprises:

3. The method of claim 2, wherein said determining a first feature vector from the m first word vectors for representing semantic features of the first text comprises:

4. The method of claim 2, wherein said determining a first feature vector from the m first word vectors for representing semantic features of the first text comprises:

5. The method of claim 4, wherein said determining a similarity between the first text and the second text from the first feature vector and the second feature vector comprises:

6. The method of any one of claims 1-5, wherein determining whether the application enterprise has diversified operation according to the similarity between the first text and the second text comprises:

7. The method of claim 6, wherein the first text comprises the enterprise name and business scope of the application enterprise, and the second text comprises the enterprise name and business scope of the associated enterprise;

8. A text processing apparatus, comprising:

9. A text processing apparatus comprising a processor, a memory and a communication interface, the processor, the memory and the communication interface being interconnected, wherein the communication interface is configured to receive and transmit data, the memory is configured to store program code, and the processor is configured to invoke the program code to perform the method of any of claims 1 to 7.

10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium; the computer program, when run on one or more processors, performs the method of any one of claims 1-7.