CN113761175A - Text processing method and device, electronic equipment and storage medium - Google Patents

Text processing method and device, electronic equipment and storage medium

Info

Publication number
CN113761175A
Authority
CN
China
Prior art keywords
abstract
commodity
sentence
abstracts
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110139483.3A
Other languages
Chinese (zh)
Inventor
李浩然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202110139483.3A priority Critical patent/CN113761175A/en
Publication of CN113761175A publication Critical patent/CN113761175A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/34 Browsing; Visualisation therefor
    • G06F 16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a text processing method, apparatus, electronic device and storage medium, applied to the technical field of text processing. The method includes: acquiring an introduction text of a commodity, the introduction text including a plurality of sentences; dividing the introduction text into a plurality of parts; determining a central sentence of each of the plurality of parts; sampling in each part according to a random variable P ~ E(λ) to obtain a sampling sentence corresponding to each part; and combining the sampling sentences corresponding to each part to obtain an input text.

Description

Text processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of text processing technologies, and in particular, to a text processing method and apparatus, an electronic device, and a storage medium.
Background
In practical applications, training data are severely imbalanced and long-tailed across product categories. Some categories, such as common electrical appliances, coats and skirts, have tens of thousands of samples, while others, such as clothing accessories like ties, hems and buttons, have fewer than 100; a category with so little data is generally called a small-sample category.
In the process of implementing the disclosed concept, the inventor found that categories with little data are numerous, and that without data enhancement processing, high-quality commodity abstracts cannot be generated for these categories.
Disclosure of Invention
In view of the above, the present disclosure provides a text processing method, an apparatus, an electronic device, and a storage medium, which can generate diverse input texts for a commodity.
One aspect of the present disclosure provides a text processing method, including:
acquiring an introduction text of a commodity, wherein the introduction text comprises a plurality of sentences;
dividing the introduction text into a plurality of parts, and determining a central sentence of each part of the plurality of parts;
sampling in each part according to a random variable P ~ E(λ) to obtain a sampling sentence corresponding to each part, wherein λ = 1/(the number of sentences in the part to be sampled), P represents the probability of selecting the sentence at distance rank i from the central sentence of the part, and E(·) represents the exponential distribution;
and combining the sampling sentences corresponding to each part to obtain an input text, wherein the input text is used for generating the abstract of the commodity.
According to an embodiment of the present disclosure, the determining the central sentence of each of the plurality of portions comprises:
in each part, randomly acquiring any one sentence in the part as a corresponding central sentence.
According to an embodiment of the present disclosure, the determining the central sentence of each of the plurality of portions comprises:
calculating the similarity between every two sentences included in each part,
in each part, calculating the sum of the similarity of each sentence with other sentences respectively to obtain the total similarity of each sentence in each part;
and selecting the sentence with the highest total similarity as a corresponding central sentence in each part.
According to an embodiment of the present disclosure, the dividing the introduction text into a plurality of parts includes:
dividing at least some of the plurality of sentences into the plurality of parts by using a clustering algorithm.
According to an embodiment of the present disclosure, further comprising:
acquiring a plurality of abstracts of the commodity based on the input text;
constructing a plurality of training samples according to the plurality of abstracts, wherein each training sample comprises two different abstracts, one abstract of the two different abstracts is used as the input of a preset abstract generating model, and the other abstract is used as the output of the abstract generating model;
and training the abstract generation model by using the plurality of training samples to obtain a trained abstract generation model, wherein the trained abstract generation model is used for generating the abstract of the commodity which is the same as or different from the plurality of abstracts of the commodity according to any one abstract of the plurality of abstracts of the commodity.
According to an embodiment of the present disclosure, further comprising:
inputting any abstract of the plurality of abstracts of the commodity into the trained abstract generating model to generate the abstract of the commodity which is the same as or different from the plurality of abstracts of the commodity.
According to an embodiment of the present disclosure, the abstract generation model is a sequence-to-sequence model based on an RNN, a CNN, or a Transformer.
Another aspect of the present disclosure provides a text processing apparatus including:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring an introduction text of a commodity, and the introduction text comprises a plurality of sentences;
the dividing module is used for dividing the introduction text into a plurality of parts;
a determining module, configured to determine a central sentence of each portion;
a sampling module, configured to sample in each part according to a random variable P ~ E(λ) to obtain a sampling sentence corresponding to each part, where λ = 1/(the number of sentences in the part to be sampled), P represents the probability of selecting the sentence at distance rank i from the central sentence of the part, and E(·) represents the exponential distribution;
and the merging module is used for merging the sampling sentences corresponding to each part to obtain an input text, and the input text is used for generating the abstract of the commodity.
According to an embodiment of the present disclosure, the text processing apparatus further includes:
the generation module is used for generating a plurality of abstracts of the commodity based on the input text;
the building module is used for building a plurality of training samples according to the plurality of abstracts, each training sample comprises two different abstracts, one abstract of the two different abstracts is used as the input of a preset abstract generating model, and the other abstract is used as the output of the abstract generating model;
and the training module is used for training the abstract generating model by utilizing the plurality of training samples to obtain a trained abstract generating model, and the trained abstract generating model is used for generating the abstract of the commodity which is the same as or different from the plurality of abstracts of the commodity according to any one abstract of the plurality of abstracts of the commodity.
Another aspect of the present disclosure provides an electronic device including:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
According to the embodiments of the present disclosure, the introduction text of a commodity is acquired, the introduction text is divided into a plurality of parts, the central sentence of each of the plurality of parts is determined, sampling is performed in each part according to a random variable P ~ E(λ) to obtain a sampling sentence corresponding to each part, and the sampling sentences corresponding to each part are combined to obtain an input text. Because each sampling sentence forming the input text is randomly extracted according to P ~ E(λ), sentences closer to the central sentence are more likely to be extracted as sampling sentences; at the same time, diverse input texts can be generated for the commodity, expanding the input text of the commodity.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an exemplary system architecture to which a text processing method may be applied, according to an embodiment of the disclosure;
FIG. 2 schematically shows a flow diagram of a text processing method according to an embodiment of the present disclosure;
FIG. 3A schematically illustrates a flow chart of determining a central sentence according to an embodiment of the present disclosure;
FIG. 3B schematically illustrates a flow chart of determining a central sentence according to an embodiment of the present disclosure;
FIG. 4 schematically shows a flow diagram of a text processing method according to an embodiment of the present disclosure;
FIG. 5 schematically shows a block diagram of a text processing apparatus according to an embodiment of the present disclosure;
fig. 6 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
The embodiments of the present disclosure provide a text processing method. The method includes: acquiring an introduction text of a commodity; dividing the introduction text into a plurality of parts, each of which includes at least one sentence; determining a central sentence of each part; randomly sampling one sentence in each part by a random-variable method to obtain a sampling sentence corresponding to each part; and combining the sampling sentences corresponding to each part to obtain an input text, which is used for generating an abstract of the commodity.
Fig. 1 schematically illustrates an exemplary system architecture 100 to which a text processing method may be applied, according to an embodiment of the disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a shopping-like application, a web browser application, a search-like application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the text processing method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the text processing apparatus provided by the embodiment of the present disclosure may be generally disposed in the server 105. The text processing method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the text processing apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Alternatively, the text processing method provided by the embodiment of the present disclosure may also be executed by the terminal device 101, 102, or 103, or may also be executed by another terminal device different from the terminal device 101, 102, or 103. Accordingly, the text processing apparatus provided by the embodiment of the present disclosure may also be disposed in the terminal device 101, 102, or 103, or disposed in another terminal device different from the terminal device 101, 102, or 103.
For example, the introduction text of the commodity may be originally stored in any one of the terminal devices 101, 102, or 103 (for example, but not limited to, the terminal device 101), or stored on an external storage device from which it can be imported into the terminal device 101. The terminal device 101 may then execute the text processing method provided by the embodiments of the present disclosure locally, or send the introduction text of the commodity to another terminal device, server, or server cluster, which then executes the text processing method on the received introduction text.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
FIG. 2 schematically shows a flow diagram of a text processing method according to an embodiment of the disclosure.
As shown in fig. 2, the method includes operations S201 to S205.
In operation S201, an introduction text of a commodity is acquired, the introduction text including a plurality of sentences.
In operation S202, the introduction text is divided into a plurality of sections.
In operation S203, a central sentence of each of the plurality of portions is determined.
In operation S204, in each part, sampling is performed according to a random variable P ~ E(λ) to obtain a sampling sentence corresponding to each part, where λ = 1/(the number of sentences in the part to be sampled), P represents the probability of selecting the sentence at distance rank i from the central sentence of the part, and E(·) represents the exponential distribution.
In operation S205, the sampling sentences corresponding to each portion are combined to obtain an input text, where the input text is used to generate an abstract of the product.
In the present disclosure, the commodity may be a commodity under a small-sample category, and its introduction text may be taken from the detail page of the commodity, which may present commodity information in the form of pictures, text, video, audio, and the like. Using text extraction techniques, the commodity information presented in these various forms on the detail page is extracted to obtain a full introduction text of the commodity. Further, when the number of words of the full introduction text is less than a certain threshold, the full introduction text itself serves as the introduction text of the commodity; when the number of words is not less than the threshold, a compression algorithm may be adopted to compress the full introduction text into the introduction text of the commodity. The specific value of the threshold is not limited in the present disclosure and can be set by a person skilled in the art according to the actual situation. Likewise, the number of parts is not limited in this disclosure and can be 10, 20, 50, etc. The number of sentences in each part may be the same or different, and the present disclosure is not limited thereto.
In the present disclosure, the introduction text may be divided into a plurality of parts arbitrarily, or at least some of the plurality of sentences in the introduction text may be divided into a plurality of parts by using a clustering algorithm, as sketched below. The clustering algorithm may be K-Means, K-Medoids, CLARA, CLARANS, etc.; the present disclosure does not limit the type of the clustering algorithm.
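As an illustration of this partitioning step, the following is a minimal Python sketch that clusters sentences with K-Means, one of the algorithms named above. The TF-IDF sentence representation and the number of parts are our assumptions; the disclosure fixes neither.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def split_into_parts(sentences, num_parts=10):
    """Divide the sentences of an introduction text into num_parts parts
    by clustering them; sentences in the same cluster form one part."""
    vectors = TfidfVectorizer().fit_transform(sentences)  # sentence features
    labels = KMeans(n_clusters=num_parts, n_init=10).fit_predict(vectors)
    parts = [[] for _ in range(num_parts)]
    for sentence, label in zip(sentences, labels):
        parts[label].append(sentence)
    return parts
```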
In the present disclosure, sampling is performed in each part according to a random variable P ~ E(λ) to obtain the sampling sentence corresponding to each part, where λ = 1/(the number of sentences in the part to be sampled), P represents the probability of selecting the sentence at distance rank i from the central sentence of the part, and E(·) represents the exponential distribution. Random sampling that follows an exponential distribution both preserves the randomness of the sampling sentence extracted from each part and makes sentences closer to the central sentence more likely to be extracted as sampling sentences.
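To make the sampling step concrete, here is a minimal Python sketch. The disclosure does not say how the continuous draw from E(λ) is mapped onto discrete sentences, so weighting each distance rank i by the exponential density e^(-λi) and drawing with random.choices is our assumption.

```python
import math
import random

def sample_sentence(part, center_index):
    """Sample one sentence from a part so that sentences closer to the
    central sentence are extracted with higher probability (P ~ E(lambda))."""
    n = len(part)
    lam = 1.0 / n  # lambda = 1 / number of sentences in the part to be sampled
    # Rank sentences by distance from the central sentence: rank 0 is the
    # central sentence itself, rank 1 its nearest neighbour, and so on.
    ranked = sorted(range(n), key=lambda j: abs(j - center_index))
    # Exponential weights over the ranks: nearer ranks get larger weights.
    weights = [math.exp(-lam * i) for i in range(n)]
    i = random.choices(range(n), weights=weights, k=1)[0]
    return part[ranked[i]]

# Example: with five sentences and the middle one as the centre, the central
# sentence is drawn most often and the farthest sentence least often.
print(sample_sentence(["s0", "s1", "s2", "s3", "s4"], center_index=2))
```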
According to the method, the introduction text of the commodity is acquired, the introduction text is divided into a plurality of parts, the central sentence of each of the plurality of parts is determined, sampling is performed in each part according to the random variable P ~ E(λ) to obtain the sampling sentence corresponding to each part, and the sampling sentences corresponding to each part are combined to obtain the input text.
Fig. 3A and 3B schematically illustrate flow charts of determining a central sentence according to embodiments of the present disclosure.
As shown in fig. 3A, operation S203 includes:
s301a, in each part, randomly acquiring any one sentence in the part as a corresponding central sentence. That is, the central sentence of each part is not limited to each part, and may be any one sentence.
As shown in fig. 3B, operation S203 includes:
and S301b, calculating the similarity between every two sentences contained in each part.
S302b, in each part, calculating the sum of the similarity of each sentence with other sentences, respectively, to obtain the total similarity of each sentence in each part.
S303b, selecting the sentence with the highest total similarity as the corresponding central sentence in each part.
For example, suppose one of the portions includes sentences M1, M2 and M3. The pairwise similarities S12, S13 and S23 between M1, M2 and M3 are calculated; the total similarity of M1 is S12+S13, that of M2 is S12+S23, and that of M3 is S13+S23. The largest of S12+S13, S12+S23 and S13+S23 is then found; if it is S12+S13, sentence M1 is taken as the central sentence of the portion.
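A short sketch of operations S301b to S303b, matching the M1/M2/M3 example above. The pairwise similarity measure is left as a parameter because the disclosure does not fix one; the word-overlap measure in the usage example is purely illustrative.

```python
def central_sentence(part, similarity):
    """Return the sentence whose summed similarity to all the other
    sentences in the part is highest."""
    totals = [
        sum(similarity(s, other) for other in part if other is not s)
        for s in part
    ]
    return part[totals.index(max(totals))]

# Toy usage with a hypothetical similarity: shared-word overlap.
overlap = lambda a, b: len(set(a.split()) & set(b.split()))
print(central_sentence(["big red coat", "red coat", "blue hat"], overlap))
```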
FIG. 4 schematically shows a flow diagram of a text processing method according to an embodiment of the disclosure.
As shown in fig. 4, the method includes:
in operation S401, a plurality of abstracts of goods based on an input text are acquired.
In operation S402, a plurality of training samples are constructed according to the plurality of digests.
In operation S403, the abstract generating model is trained using a plurality of training samples, so as to obtain a trained abstract generating model.
Each training sample comprises two different abstracts; one of the two is used as the input of a preset abstract generation model, and the other as the output of the model. For example, if the abstracts of the commodity for one input text are y1, y2 and y3, the training samples may be (y1, y2), (y2, y1), (y2, y3), (y3, y2), (y1, y3), (y3, y1), and so on, as in the sketch below.
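The pair construction amounts to taking all ordered pairs of distinct abstracts; a minimal sketch matching the (y1, y2), (y2, y1), ... example:

```python
from itertools import permutations

def build_training_samples(abstracts):
    """Each ordered pair (input_abstract, target_abstract) of two different
    abstracts of the same commodity becomes one training sample."""
    return list(permutations(abstracts, 2))

print(build_training_samples(["y1", "y2", "y3"]))
# [('y1', 'y2'), ('y1', 'y3'), ('y2', 'y1'),
#  ('y2', 'y3'), ('y3', 'y1'), ('y3', 'y2')]
```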
The trained abstract generation model is used for generating, from any one of the plurality of abstracts of the commodity, an abstract of the commodity that is the same as or different from those abstracts. For example, for an input text whose commodity abstracts are y1, y2 and y3, more abstracts yi can be generated with the trained abstract generation model, such as y1, y2, y3, ..., ym.
In the present disclosure, any one of the plurality of abstracts of the commodity is input into the trained abstract generation model, and the abstract of the commodity, which is the same as or different from the plurality of abstracts of the commodity, is generated.
In the present disclosure, the abstract generation model is a sequence-to-sequence model based on a Recurrent Neural Network (RNN), a Convolutional Neural Network (CNN), or a Transformer.
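Since the disclosure names the architecture family without fixing details, the following PyTorch sketch of a Transformer-based variant is illustrative only; the vocabulary size and model dimensions are placeholders, not values from the patent.

```python
import torch
from torch import nn

class AbstractGenerator(nn.Module):
    """Minimal Transformer sequence-to-sequence model: one abstract in,
    token logits for another abstract out. Positional encodings and target
    masking, required in practice, are omitted for brevity."""
    def __init__(self, vocab_size=30000, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8, batch_first=True)
        self.project = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # src_ids: (batch, src_len) token ids of the input abstract;
        # tgt_ids: (batch, tgt_len) token ids of the target abstract.
        hidden = self.transformer(self.embed(src_ids), self.embed(tgt_ids))
        return self.project(hidden)  # (batch, tgt_len, vocab_size)
```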
In the present disclosure, the plurality of abstracts of the commodity may be written manually from the input text, or the input text may be fed into various existing abstract models to obtain a plurality of abstracts of the commodity.
Fig. 5 schematically shows a block diagram of a text processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the text processing apparatus 500 includes an acquisition module 510, a dividing module 520, a determination module 530, a sampling module 540, and a merging module 550.
An obtaining module 510, configured to obtain an introduction text of a commodity, where the introduction text includes a plurality of sentences;
a dividing module 520, configured to divide the introduction text into a plurality of parts;
a determining module 530, configured to determine a central sentence of each portion;
a sampling module 540, configured to sample in each part according to a random variable P ~ E(λ) to obtain a sampling sentence corresponding to each part, where λ = 1/(the number of sentences in the part to be sampled), P represents the probability of selecting the sentence at distance rank i from the central sentence of the part, and E(·) represents the exponential distribution;
and a merging module 550, configured to merge the sampling sentences corresponding to each part to obtain an input text, where the input text is used to generate an abstract of the product.
In one embodiment of the present disclosure, the determining module 530 is specifically configured to randomly acquire any one sentence in each portion as a corresponding central sentence.
In one embodiment of the present disclosure, the determining module 530 includes:
a similarity calculation submodule, configured to calculate the similarity between every two sentences included in each part;
a summation submodule, configured to calculate, in each part, the sum of the similarities of each sentence with the other sentences to obtain the total similarity of each sentence in each part;
and a selection submodule, configured to select, in each part, the sentence with the highest total similarity as the corresponding central sentence.
In one embodiment of the present disclosure, the dividing module 520 is specifically configured to divide at least a part of the sentences in the plurality of sentences into a plurality of parts by using a clustering algorithm.
In one embodiment of the present disclosure, the text processing apparatus 500 further includes:
the generation module is used for generating a plurality of abstracts of the commodity based on the input text;
the building module is used for building a plurality of training samples according to the plurality of abstracts, each training sample comprises two different abstracts, one abstract of the two different abstracts is used as the input of a preset abstract generating model, and the other abstract is used as the output of the abstract generating model;
and the training module is used for training the abstract generating model by utilizing the plurality of training samples to obtain a trained abstract generating model, and the trained abstract generating model is used for generating the abstract of the commodity which is the same as or different from the plurality of abstracts of the commodity according to any one abstract of the plurality of abstracts of the commodity.
In one embodiment of the present disclosure, the text processing apparatus 500 further includes:
and the input module is used for inputting any one abstract of the plurality of abstracts of the commodity into the trained abstract generation model to generate the abstract of the commodity which is the same as or different from the plurality of abstracts of the commodity.
In one embodiment of the present disclosure, the abstract generation model is a sequence-to-sequence model based on an RNN, a CNN, or a Transformer.
Any number of the modules, sub-modules, units, or sub-units according to embodiments of the present disclosure, or at least part of the functionality of any number of them, may be implemented in one module. Any one or more of them may also be split into a plurality of modules. Any one or more of the modules, sub-modules, units, or sub-units may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system in a package, or an Application Specific Integrated Circuit (ASIC); by any other reasonable means of integrating or packaging a circuit in hardware or firmware; or by any one of, or a suitable combination of, software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, or sub-units may be at least partially implemented as a computer program module which, when executed, performs the corresponding functions.
For example, any number of the obtaining module 510, the dividing module 520, the determining module 530, the sampling module 540, and the combining module 550 may be combined in one module/unit/sub-unit to be implemented, or any one of the modules/units/sub-units may be split into a plurality of modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the present disclosure, at least one of the obtaining module 510, the dividing module 520, the determining module 530, the sampling module 540, and the combining module 550 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or may be implemented in any one of three implementations of software, hardware, and firmware, or in a suitable combination of any several of them. Alternatively, at least one of the obtaining module 510, the dividing module 520, the determining module 530, the sampling module 540 and the combining module 550 may be at least partially implemented as a computer program module, which when executed, may perform a corresponding function.
It should be noted that the text processing apparatus portion in the embodiment of the present disclosure corresponds to the text processing method portion in the embodiment of the present disclosure, and the description of the text processing apparatus portion specifically refers to the text processing method portion, which is not described herein again.
Fig. 6 schematically shows a block diagram of an electronic device adapted to implement the above described method according to an embodiment of the present disclosure. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, an electronic device 600 according to an embodiment of the present disclosure includes a processor 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The processor 601 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 601 may also include onboard memory for caching purposes. The processor 601 may include a single processing unit or multiple processing units for performing the different actions of a method flow according to embodiments of the disclosure.
In the RAM 603, various programs and data necessary for the operation of the system 600 are stored. The processor 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. The processor 601 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 602 and/or RAM 603. It is to be noted that the programs may also be stored in one or more memories other than the ROM 602 and RAM 603. The processor 601 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the system 600 may also include an input/output (I/O) interface 605, which is likewise connected to bus 604. The system 600 may also include one or more of the following components connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as necessary, so that a computer program read therefrom is installed into the storage section 608 as needed.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program, when executed by the processor 601, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 602 and/or RAM 603 described above and/or one or more memories other than the ROM 602 and RAM 603.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or sub-combinations of the features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations are not expressly recited in the present disclosure. In particular, various combinations and/or sub-combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit and teachings of the present disclosure. All such combinations and/or sub-combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (11)

1. A text processing method, comprising:
acquiring an introduction text of a commodity, wherein the introduction text comprises a plurality of sentences;
dividing the introduction text into a plurality of parts, and determining a central sentence of each part of the plurality of parts;
sampling in each part according to a random variable P ~ E(λ) to obtain a sampling sentence corresponding to each part, wherein λ = 1/(the number of sentences in the part to be sampled), P represents the probability of selecting the sentence at distance rank i from the central sentence of the part, and E(·) represents the exponential distribution;
and combining the sampling sentences corresponding to each part to obtain an input text, wherein the input text is used for generating the abstract of the commodity.
2. The method of claim 1, the determining a central sentence for each of the plurality of portions comprising:
in each part, randomly acquiring any one sentence in the part as a corresponding central sentence.
3. The method of claim 1, the determining a central sentence for each of the plurality of portions comprising:
calculating the similarity between every two sentences included in each part,
in each part, calculating the sum of the similarity of each sentence with other sentences respectively to obtain the total similarity of each sentence in each part;
and selecting the sentence with the highest total similarity as a corresponding central sentence in each part.
4. The method of claim 1, the dividing the introductory text into a plurality of portions comprising:
dividing at least some of the plurality of sentences into the plurality of parts by using a clustering algorithm.
5. The method of any of claims 1 to 4, further comprising:
acquiring a plurality of abstracts of the commodity based on the input text;
constructing a plurality of training samples according to the plurality of abstracts, wherein each training sample comprises two different abstracts, one abstract of the two different abstracts is used as the input of a preset abstract generating model, and the other abstract is used as the output of the abstract generating model;
and training the abstract generation model by using the plurality of training samples to obtain a trained abstract generation model, wherein the trained abstract generation model is used for generating the abstract of the commodity which is the same as or different from the plurality of abstracts of the commodity according to any one abstract of the plurality of abstracts of the commodity.
6. The method of claim 5, further comprising:
inputting any abstract of the plurality of abstracts of the commodity into the trained abstract generating model to generate the abstract of the commodity which is the same as or different from the plurality of abstracts of the commodity.
7. The method of claim 5, wherein the abstract generation model is a sequence-to-sequence model based on RNN, CNN, or Transformer.
8. A text processing apparatus comprising:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring an introduction text of a commodity, and the introduction text comprises a plurality of sentences;
the dividing module is used for dividing the introduction text into a plurality of parts;
a determining module, configured to determine a central sentence of each portion;
a sampling module, configured to sample in each part according to a random variable P ~ E(λ) to obtain a sampling sentence corresponding to each part, where λ = 1/(the number of sentences in the part to be sampled), P represents the probability of selecting the sentence at distance rank i from the central sentence of the part, and E(·) represents the exponential distribution;
and the merging module is used for merging the sampling sentences corresponding to each part to obtain an input text, and the input text is used for generating the abstract of the commodity.
9. The apparatus of claim 8, further comprising:
the generation module is used for generating a plurality of abstracts of the commodity based on the input text;
the building module is used for building a plurality of training samples according to the plurality of abstracts, each training sample comprises two different abstracts, one abstract of the two different abstracts is used as the input of a preset abstract generating model, and the other abstract is used as the output of the abstract generating model;
and the training module is used for training the abstract generating model by utilizing the plurality of training samples to obtain a trained abstract generating model, and the trained abstract generating model is used for generating the abstract of the commodity which is the same as or different from the plurality of abstracts of the commodity according to any one abstract of the plurality of abstracts of the commodity.
10. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 7.
CN202110139483.3A 2021-02-01 2021-02-01 Text processing method and device, electronic equipment and storage medium Pending CN113761175A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110139483.3A CN113761175A (en) 2021-02-01 2021-02-01 Text processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110139483.3A CN113761175A (en) 2021-02-01 2021-02-01 Text processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113761175A true CN113761175A (en) 2021-12-07

Family

ID=78786575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110139483.3A Pending CN113761175A (en) 2021-02-01 2021-02-01 Text processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113761175A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination