CN115759048A - Script text processing method and device - Google Patents

Info

Publication number
CN115759048A
Authority
CN
China
Prior art keywords
scene
text
field
role
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211521036.5A
Other languages
Chinese (zh)
Inventor
蒋松岐
周红喆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Haima Light Sail Entertainment Technology Co ltd
Original Assignee
Beijing Haima Light Sail Entertainment Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Haima Light Sail Entertainment Technology Co ltd
Priority to CN202211521036.5A
Publication of CN115759048A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The script text processing method and device provided by the present disclosure can, for the field content text of any field: obtain a scene keyword vector, a title scene word vector, role association data of a first role, and a role scene word vector of a second role of the field content text, and further obtain a first scene word similarity, a second scene word similarity, and a third scene word similarity of the field content text; and input the role association data, the first scene word similarity, the second scene word similarity, and the third scene word similarity into a pre-trained plot segmentation model to obtain plot segmentation points that are output by the plot segmentation model and correspond to the script text. By applying the pre-trained plot segmentation model to the similarity data and role association data corresponding to each field in the script text, the present disclosure can predict the plot segmentation points in the script text quickly and accurately, thereby improving the efficiency and accuracy of plot segmentation of the script text.

Description

Script text processing method and device
Technical Field
The present disclosure relates to the field of text processing, and in particular, to a script text processing method and apparatus.
Background
A script is a literary genre: a text composed mainly of lines and stage directions. A script text generally includes flashback content, montage content, scene description content, character dialogue sentences, character behavior sentences, and the like.
A script text can contain a plurality of plots, which serve as the dramatic and emotional components of a narrative story. At present, dividing a script text into plots relies mainly on the manual judgment and operation of drama professionals. When the script text is large, manual plot division is inefficient and error-prone.
Therefore, how to process script text efficiently and correctly has become a technical problem to be solved urgently by those skilled in the art.
Disclosure of Invention
In view of the foregoing problems, the present disclosure provides a script text processing method and apparatus that overcome the foregoing problems or at least partially solve them. The technical solution is as follows:
A script text processing method, comprising:
obtaining a field content text corresponding to each field in the script text;
for the field content text of any field: obtaining a scene keyword vector, a title scene word vector, role association data of a first role, and a role scene word vector of a second role of the field content text, wherein the first role is the several roles that occur most frequently in the script text, and the second role is a role that first appears in the field and has lines in the next field;
obtaining a first scene word similarity of the field content text by using the scene keyword vector;
obtaining a second scene word similarity of the field content text by using the title scene word vector;
obtaining a third scene word similarity of the field content text by using the role scene word vector;
and inputting the role association data, the first scene word similarity, the second scene word similarity, and the third scene word similarity corresponding to each field into a pre-trained plot segmentation model to obtain plot segmentation points that are output by the plot segmentation model and correspond to the script text.
Optionally, after obtaining the plot segmentation points that are output by the plot segmentation model and correspond to the script text, the method further includes:
counting the number of plot segmentation points in the script text;
and generating the number of plots corresponding to the script text based on the number of plot segmentation points.
Optionally, obtaining the field content text corresponding to each field in the script text includes:
identifying title texts in the script text;
and dividing the script text into fields according to the title texts to obtain the field content text corresponding to each field.
Optionally, obtaining the first scene word similarity of the field content text by using the scene keyword vector includes:
performing similarity calculation by using the scene keyword vectors corresponding to the field and to other fields adjacent to the field to obtain the first scene word similarity of the field content text.
Optionally, obtaining the second scene word similarity of the field content text by using the title scene word vector includes:
performing similarity calculation by using the title scene word vectors corresponding to the field and to other fields having a first association relationship with the field to obtain the second scene word similarity of the field content text;
and/or performing similarity calculation by using the title scene word vectors corresponding to the field and to other fields having a second association relationship with the field to obtain the second scene word similarity of the field content text.
Optionally, the first association relationship is being adjacent to the field, and the second association relationship is being adjacent to another field that has the first association relationship with the field.
Optionally, the role association data includes the number of occurrences of the first role in the field content text, the number of role association sentences, and the number of role interactions.
Optionally, obtaining the third scene word similarity of the field content text by using the role scene word vector includes:
performing similarity calculation by using the role scene word vectors of the field and of the next field adjacent to and after the field to obtain the third scene word similarity of the field content text.
Optionally, the plot segmentation model is a CatBoost algorithm model.
A script text processing apparatus, comprising: a field content text obtaining unit, a text data obtaining unit, a first scene word similarity obtaining unit, a second scene word similarity obtaining unit, a third scene word similarity obtaining unit, and a plot segmentation point obtaining unit, wherein
the field content text obtaining unit is configured to obtain the field content text corresponding to each field in the script text;
the text data obtaining unit is configured to, for the field content text of any field: obtain a scene keyword vector, a title scene word vector, role association data of a first role, and a role scene word vector of a second role of the field content text, wherein the first role is the several roles that occur most frequently in the script text, and the second role is a role that first appears in the field and has lines in the next field;
the first scene word similarity obtaining unit is configured to obtain a first scene word similarity of the field content text by using the scene keyword vector;
the second scene word similarity obtaining unit is configured to obtain a second scene word similarity of the field content text by using the title scene word vector;
the third scene word similarity obtaining unit is configured to obtain a third scene word similarity of the field content text by using the role scene word vector;
the plot segmentation point obtaining unit is configured to input the role association data, the first scene word similarity, the second scene word similarity, and the third scene word similarity corresponding to each field into a pre-trained plot segmentation model, and obtain plot segmentation points that are output by the plot segmentation model and correspond to the script text.
By means of the above technical solution, the script text processing method and apparatus provided by the present disclosure can obtain the field content text corresponding to each field in the script text; for the field content text of any field: obtain a scene keyword vector, a title scene word vector, role association data of a first role, and a role scene word vector of a second role of the field content text, wherein the first role is the several roles that occur most frequently in the script text, and the second role is a role that first appears in the field and has lines in the next field; obtain a first scene word similarity of the field content text by using the scene keyword vector; obtain a second scene word similarity of the field content text by using the title scene word vector; obtain a third scene word similarity of the field content text by using the role scene word vector; and input the role association data, the first scene word similarity, the second scene word similarity, and the third scene word similarity into a pre-trained plot segmentation model to obtain plot segmentation points that are output by the plot segmentation model and correspond to the script text. By applying the pre-trained plot segmentation model to the similarity data and role association data corresponding to each field in the script text, the present disclosure can predict the plot segmentation points in the script text quickly and accurately, which facilitates plot segmentation of the script text and improves its efficiency and accuracy.
The foregoing description is only an overview of the technical solutions of the present disclosure, and the embodiments of the present disclosure are described below in order to make the technical means of the present disclosure more clearly understood and to make the above and other objects, features, and advantages of the present disclosure more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the disclosure. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
Fig. 1 is a schematic flowchart of one implementation of the script text processing method provided by an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of another implementation of the script text processing method provided by an embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of yet another implementation of the script text processing method provided by an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of the script text processing apparatus provided by an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As shown in Fig. 1, one implementation of the script text processing method provided in an embodiment of the present disclosure may include:
S100, obtaining a field content text corresponding to each field in the script text.
The script text may be composed of lines and stage directions. The script text may include flashback content, montage content, scene description content, character dialogue sentences, character behavior sentences, and the like.
The embodiment of the present disclosure may divide the script text into a plurality of fields and obtain the field content text corresponding to each field. Specifically, the embodiment of the present disclosure may use the title texts in the script text as the basis for field division, and divide the script text into fields to obtain the field content text corresponding to each field.
Optionally, based on the method shown in Fig. 1, another implementation of the script text processing method provided in an embodiment of the present disclosure, shown in Fig. 2, may include:
S110, identifying the title texts in the script text.
A title text is a short sentence that indicates part of the text content in the script text. It will be appreciated that a title text typically has distinguishing features in the script text, such as a particular text format and particular content.
Optionally, the embodiment of the present disclosure may use a preset title recognition algorithm to classify each text paragraph in the script text and determine whether the paragraph is a title text, thereby recognizing the title texts in the script text. The preset title recognition algorithm may be an algorithm that recognizes title texts based on the features of title texts.
S120, dividing the script text into fields according to the title texts to obtain the field content text corresponding to each field.
Specifically, the embodiment of the present disclosure may determine the text content between two adjacent title texts in the script text as the field content text of one field.
By dividing the script text into fields according to the title texts, the embodiment of the present disclosure can accurately obtain the field content text of each field in the script text, which facilitates subsequent text analysis and processing performed on the field content texts and improves the accuracy of plot segmentation of the script text.
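Steps S110 and S120 can be sketched as follows. This is only an illustrative sketch: the disclosure does not specify the title recognition algorithm, so `TITLE_PATTERN` and `is_title` are hypothetical stand-ins that treat any line beginning with a field number as a title text. The sketch also assumes that the text after the last title text runs to the end of the script.

```python
import re

# Hypothetical title feature (an assumption, not from the disclosure):
# a title text starts with a field number such as "1." or "12、".
TITLE_PATTERN = re.compile(r"^\s*\d+[.、]\s*\S+")


def is_title(paragraph):
    """Stand-in for the preset title recognition algorithm."""
    return bool(TITLE_PATTERN.match(paragraph))


def split_into_fields(script_text):
    """Group the paragraphs that follow each title text into one field."""
    fields = []
    current = None
    for paragraph in script_text.splitlines():
        if not paragraph.strip():
            continue  # skip blank lines
        if is_title(paragraph):
            current = {"title": paragraph.strip(), "content": []}
            fields.append(current)
        elif current is not None:
            current["content"].append(paragraph.strip())
    return fields


sample = """1. Garden, day
Li Lei waters the flowers.
Han Mei: Good morning.
2. Factory, night
The machines are silent."""

fields = split_into_fields(sample)
for field in fields:
    print(field["title"], "->", len(field["content"]), "paragraph(s)")
```

A real script would likely need a trained paragraph classifier in place of the regular expression, since title formats vary between scripts.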
S200, for the field content text of any field: obtaining a scene keyword vector, a title scene word vector, role association data of a first role, and a role scene word vector of a second role of the field content text, wherein the first role is the several roles that occur most frequently in the script text, and the second role is a role that first appears in the field and has lines in the next field.
The scene keyword vector is the word vector of the scene words extracted from the field content text. A scene word may be a word in the field content text that describes the scene, for example: garden, house, factory, and the like.
Optionally, the embodiment of the present disclosure may use a preset text preprocessing algorithm to preprocess the field content text so as to filter out special content in the field content text. Optionally, the special content may include flashback content and montage content.
Optionally, the embodiment of the present disclosure may use a TF-IDF (Term Frequency-Inverse Document Frequency) algorithm to identify and extract the scene words in the preprocessed field content text. Jieba word segmentation is then performed on the scene words and single-character words are cleaned out to obtain the word segmentation result of the scene words. The word segmentation result is converted into the word vector of the scene words by using the word2vec algorithm, and this word vector is the scene keyword vector of the field content text.
The title scene word vector is the word vector of the scene words extracted from the title text corresponding to the field content text. It will be appreciated that the embodiment of the present disclosure may determine the title text that is adjacent to and precedes the field content text in the script text as the title text corresponding to the field content text.
Optionally, the embodiment of the present disclosure may use the TF-IDF algorithm to identify and extract the scene words in the title text corresponding to the field content text. Jieba word segmentation is then performed on the scene words and single-character words are cleaned out to obtain the word segmentation result of the scene words. The word segmentation result is converted into the word vector of the scene words by using the word2vec algorithm, and this word vector is the title scene word vector of the field content text.
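The keyword extraction and vectorization described above can be illustrated with a small standard-library sketch. Several points are assumptions rather than details from the disclosure: the input is already tokenized, TF-IDF is computed by hand instead of with a library, `TOY_EMBEDDINGS` is a made-up lookup table standing in for a trained word2vec model, and the keyword vectors are averaged into a single vector. The filter that keeps only scene-describing words is not modeled.

```python
import math
from collections import Counter


def tfidf_top_words(docs, top_k=2):
    """Return the top_k highest-scoring TF-IDF words of each tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document
    results = []
    for tokens in docs:
        term_freq = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:top_k])
    return results


def mean_vector(words, embeddings):
    """Average the embeddings of the given words (averaging is an assumption)."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical 2-dimensional embeddings standing in for a word2vec model.
TOY_EMBEDDINGS = {
    "garden": [1.0, 0.0],
    "flowers": [0.8, 0.2],
    "factory": [0.0, 1.0],
    "machine": [0.2, 0.8],
}

field_tokens = [
    ["garden", "flowers", "garden", "walk"],
    ["factory", "machine", "factory", "walk"],
    ["house", "door", "walk", "house"],
]

keywords = tfidf_top_words(field_tokens)
scene_keyword_vector = mean_vector(keywords[0], TOY_EMBEDDINGS)
print(keywords)
print(scene_keyword_vector)
```

The word "walk" appears in every toy field, so its inverse document frequency is zero and it never ranks among the keywords, which is the behavior TF-IDF is chosen for.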
The first role is a main role in the script text. The embodiment of the present disclosure may use a BERT (Bidirectional Encoder Representations from Transformers) algorithm in advance to identify and extract all role names in the script text together with the positions of the role names. The number of occurrences of each role name in the script text is counted, and the several roles with the largest number of occurrences are determined as the main roles, namely the first roles. Optionally, the embodiment of the present disclosure may determine the three role names that occur most frequently in the script text as the first roles.
The BERT algorithm can convert the characters of the script text into numeric vectors according to a preset model vocabulary file, obtain a classification result for each character based on the numeric vectors, and determine from the classification result whether the character belongs to a role name.
For dialogue sentences, the classification task of the BERT algorithm splits a sentence into characters, converts them into numeric vectors, and feeds them into a trained dialogue sentence classification model to obtain the label predicted by the model, which indicates whether the sentence is a dialogue sentence.
For behavior sentences, the classification task of the BERT algorithm splits a sentence into characters, converts them into numeric vectors, and feeds them into a trained behavior sentence classification model to obtain the label predicted by the model, which indicates whether the sentence is a behavior sentence.
It will be appreciated that the BERT algorithm may also output the text position corresponding to an entity, so that the entity can subsequently be extracted by using the text position.
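Once the role names have been extracted, selecting the first roles is a frequency count. The sketch below assumes the name mentions are already available as a list (the BERT-based recognition itself is not shown), and `find_first_roles` is a hypothetical helper name.

```python
from collections import Counter


def find_first_roles(name_mentions, top_n=3):
    """Return the top_n role names with the most occurrences in the script."""
    counts = Counter(name_mentions)
    return [name for name, _ in counts.most_common(top_n)]


# Hypothetical output of the role name recognition step.
mentions = ["Li Lei"] * 5 + ["Han Mei"] * 4 + ["Lucy"] * 2 + ["Jim"] * 3
first_roles = find_first_roles(mentions)
print(first_roles)
```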
Optionally, the role association data includes the number of occurrences of the first role in the field content text, the number of role association sentences, and the number of role interactions.
Optionally, in the embodiment of the present disclosure, for any first role, the number of occurrences of the first role in the field content text is counted.
The role association sentences include the dialogue sentences and behavior sentences in the field content text in which the first role is present. Optionally, in the embodiment of the present disclosure, for any first role, the number of dialogue sentences and behavior sentences in which the first role is present is counted and determined as the number of role association sentences.
A dialogue sentence is a sentence in the script text that expresses dialogue content to be spoken.
A behavior sentence is a sentence in the script text that describes the actions and visuals a role is required to perform and express.
The number of role interactions is the number of times that interactions occur between first roles in the field content text. Optionally, in the embodiment of the present disclosure, for any first role, the number of times the first role interacts with other first roles is counted.
It will be appreciated that the embodiment of the present disclosure may determine an interaction behavior description text between the first role and other first roles in the field content text as one interaction.
The embodiment of the present disclosure may use the BERT algorithm in advance to identify the dialogue sentences, behavior sentences, and interaction behavior description texts in the script text.
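The three counts that make up the role association data can be sketched as follows. The `Sentence` structure, its `kind` labels, and the `is_interaction` flag are hypothetical: they stand in for the output of the sentence classifiers, which are not shown. Matching a role by substring is also a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    kind: str  # "dialogue", "behavior", or "other" (assumed labels)
    is_interaction: bool = False  # assumed flag for interaction descriptions


def role_association_data(sentences, role, first_roles):
    """Count occurrences, association sentences, and interactions for one first role."""
    occurrences = sum(s.text.count(role) for s in sentences)
    association_sentences = sum(
        1 for s in sentences
        if s.kind in ("dialogue", "behavior") and role in s.text
    )
    interactions = sum(
        1 for s in sentences
        if s.is_interaction
        and role in s.text
        and any(other in s.text for other in first_roles if other != role)
    )
    return {
        "occurrences": occurrences,
        "association_sentences": association_sentences,
        "interactions": interactions,
    }


field_sentences = [
    Sentence("Li Lei: Han Mei, wait for me.", "dialogue"),
    Sentence("Li Lei hands Han Mei the umbrella.", "behavior", True),
    Sentence("The garden is quiet.", "other"),
    Sentence("Han Mei smiles.", "behavior"),
]
first_roles = ["Li Lei", "Han Mei", "Jim"]

li_lei = role_association_data(field_sentences, "Li Lei", first_roles)
han_mei = role_association_data(field_sentences, "Han Mei", first_roles)
print(li_lei)
print(han_mei)
```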
Optionally, the second role may be a role that has a dialogue sentence both in the field content text and in the field content text adjacent to and after it. The embodiment of the present disclosure may identify a role appearing in the field content text, determine whether that role has a dialogue sentence in the field content text adjacent to and after the field content text, and if so, determine the role as the second role of the field content text.
Optionally, the second role may be a first role that first appears in the field and has lines in the next field.
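One possible reading of the second-role rule is sketched below: walk through the roles in the order they appear in the current field and return the first one that speaks in the next field. The function name and the input shapes are assumptions; role recognition and dialogue classification are taken as already done.

```python
from typing import Iterable, List, Optional


def find_second_role(roles_in_order: List[str],
                     next_field_speakers: Iterable[str]) -> Optional[str]:
    """Return the earliest-appearing role of this field that speaks in the next field."""
    speakers = set(next_field_speakers)
    for role in roles_in_order:
        if role in speakers:
            return role
    return None  # no second role exists for this field


# Roles in order of first appearance in the current field (hypothetical data).
current_roles = ["Gardener", "Li Lei", "Han Mei"]
# Roles that have dialogue sentences in the next field (hypothetical data).
next_speakers = {"Han Mei", "Li Lei"}

second_role = find_second_role(current_roles, next_speakers)
print(second_role)
```

To restrict the candidates to main roles, as in the alternative described above, `roles_in_order` can be filtered against the list of first roles before the call.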
The role scene word vector is the word vector of the scene words corresponding to the second role in the field content text.
Optionally, when the second role exists in the field content text, the embodiment of the present disclosure may extract the first five lines and the last five lines of the field content text, that is, the head text and the tail text. The TF-IDF algorithm is used to identify and extract the scene words corresponding to the second role in the head text and in the tail text respectively. Jieba word segmentation is then performed on the scene words and single-character words are cleaned out to obtain the word segmentation result of the scene words. The word segmentation result is converted into the word vector of the scene words by using the word2vec algorithm, and this word vector is the role scene word vector of the field content text. Optionally, the embodiment of the present disclosure may determine the scene words that appear together with the second role in both the head text and the tail text as the scene words corresponding to the second role.
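The head and tail extraction can be sketched as follows. This is a simplified illustration under stated assumptions: `scene_vocab` is a hypothetical, pre-built set of scene words (replacing the TF-IDF step), and a scene word is treated as "corresponding to" the second role when it occurs on the same line as the role name.

```python
def role_scene_words(lines, role, scene_vocab, n=5):
    """Scene words that co-occur with the role in both the first n and last n lines."""
    head, tail = lines[:n], lines[-n:]

    def words_near_role(part):
        found = set()
        for line in part:
            if role in line:
                lowered = line.lower()
                found.update(w for w in scene_vocab if w in lowered)
        return found

    return words_near_role(head) & words_near_role(tail)


field_lines = [
    "Li Lei enters the garden.",
    "Birds sing near the pond.",
    "Li Lei sits on a bench.",
    "A gardener passes.",
    "Wind blows.",
    "Han Mei calls from the house.",
    "Nobody answers.",
    "Li Lei crosses the garden again.",
    "Li Lei opens the door of the house.",
    "The light goes out.",
]
scene_vocab = {"garden", "pond", "house"}

shared = role_scene_words(field_lines, "Li Lei", scene_vocab)
print(shared)
```

When a field has fewer than ten lines, the head and tail slices overlap; whether that is acceptable is left open by the disclosure.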
S300, obtaining the first scene word similarity of the field content text by using the scene keyword vector.
The first scene word similarity is the similarity between the scene keyword vector corresponding to the field and the scene keyword vectors corresponding to other fields adjacent to the field. It will be appreciated that the other fields adjacent to any field may include the preceding field and/or the succeeding field of that field.
Optionally, the embodiment of the present disclosure may perform similarity calculation by using the scene keyword vectors corresponding to the field and to other fields adjacent to the field to obtain the first scene word similarity of the field content text.
Specifically, the embodiment of the present disclosure may use a cosine similarity algorithm to calculate the similarity between the scene keyword vectors corresponding to the field and to other fields adjacent to the field, thereby obtaining the first scene word similarity of the field content text.
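A minimal sketch of the cosine similarity calculation between a field and its adjacent fields is given below. The toy vectors are made up, and returning the previous and next similarities as separate values is an assumption; the disclosure does not say how the two are combined.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_scene_word_similarity(vectors, index):
    """Similarity of field `index` to its preceding and succeeding fields."""
    result = {}
    if index > 0:
        result["previous"] = cosine_similarity(vectors[index], vectors[index - 1])
    if index < len(vectors) - 1:
        result["next"] = cosine_similarity(vectors[index], vectors[index + 1])
    return result


# Hypothetical scene keyword vectors for three consecutive fields.
scene_keyword_vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

middle = first_scene_word_similarity(scene_keyword_vectors, 1)
first = first_scene_word_similarity(scene_keyword_vectors, 0)
print(middle)
print(first)
```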
S400, obtaining the second scene word similarity of the field content text by using the title scene word vector.
The second scene word similarity may include the similarity between the title scene word vector corresponding to the field and the title scene word vectors corresponding to other fields having the first association relationship with the field, and may also include the similarity between the title scene word vector corresponding to the field and the title scene word vectors corresponding to other fields having the second association relationship with the field.
Optionally, the first association relationship is being adjacent to the field, and the second association relationship is being adjacent to another field that has the first association relationship with the field. It will be appreciated that the other fields having the first association relationship with the field may include the preceding field and/or the succeeding field of the field. The other fields having the second association relationship with the field may include the field two positions before the field and may also include the field two positions after the field. For ease of understanding, an example is given here: suppose the script text contains, in order, field 1, field 2, field 3, field 4, and field 5. For field 3, field 2 and field 4 have the first association relationship with field 3, and field 1 and field 5 have the second association relationship with field 3.
Optionally, the embodiment of the present disclosure may perform similarity calculation by using the title scene word vectors corresponding to the field and to other fields having the first association relationship with the field to obtain the second scene word similarity of the field content text.
Specifically, the embodiment of the present disclosure may use a cosine similarity algorithm to calculate the similarity between the title scene word vectors corresponding to the field and to other fields having the first association relationship with the field, thereby obtaining the second scene word similarity of the field content text.
Optionally, the embodiment of the present disclosure may perform similarity calculation by using the title scene word vectors corresponding to the field and to other fields having the second association relationship with the field to obtain the second scene word similarity of the field content text.
Specifically, the embodiment of the present disclosure may use a cosine similarity algorithm to calculate the similarity between the title scene word vectors corresponding to the field and to other fields having the second association relationship with the field, thereby obtaining the second scene word similarity of the field content text.
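The two association relationships amount to picking neighbors at distance 1 and distance 2. The sketch below reproduces the five-field example from the text; `associated_fields` is a hypothetical helper that uses 1-based field numbers to match that example.

```python
def associated_fields(field_number, total_fields, distance):
    """1-based numbers of the fields that lie `distance` positions before or after."""
    candidates = (field_number - distance, field_number + distance)
    return [n for n in candidates if 1 <= n <= total_fields]


total = 5
first_association = associated_fields(3, total, distance=1)
second_association = associated_fields(3, total, distance=2)
print("first association with field 3:", first_association)
print("second association with field 3:", second_association)
```

Fields near the start or end of the script simply have fewer associated fields, for example field 1 has only field 2 as a first association.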
S500, obtaining the third scene word similarity of the field content text by using the role scene word vector.
The third scene word similarity is the similarity between the role scene word vector corresponding to the field and the role scene word vector of the next field adjacent to and after the field.
Optionally, the embodiment of the present disclosure may perform similarity calculation by using the role scene word vectors of the field and of the next field adjacent to and after the field to obtain the third scene word similarity of the field content text.
Specifically, the embodiment of the present disclosure may use a cosine similarity algorithm to calculate the similarity between the role scene word vectors of the field and of the next field adjacent to and after the field, thereby obtaining the third scene word similarity of the field content text.
S600, inputting the role association data, the first scene word similarity, the second scene word similarity, and the third scene word similarity corresponding to each field into a pre-trained plot segmentation model to obtain plot segmentation points that are output by the plot segmentation model and correspond to the script text.
Optionally, the plot segmentation model is a CatBoost algorithm model. The embodiment of the present disclosure may use the CatBoost algorithm model, which applies an ensemble learning method to classify the scene word similarities according to ranking proportions of 20%, 40%, 60%, and 80%, calculates the degree of dispersion in combination with the role association data, and finally outputs the plot segmentation points corresponding to the script text.
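The inputs to the plot segmentation model can be pictured as one numeric feature row per field. The sketch below only assembles such a row; the feature order, the dictionary keys, and the use of a single value per similarity are all assumptions, since the disclosure does not describe the input layout. Training and prediction with a gradient boosting library are not shown.

```python
def build_feature_row(role_data, first_sim, second_sim, third_sim):
    """Flatten the per-role counts and the three similarities into one feature row."""
    row = []
    for role in sorted(role_data):  # fixed role order keeps columns aligned
        counts = role_data[role]
        row.extend([
            counts["occurrences"],
            counts["association_sentences"],
            counts["interactions"],
        ])
    row.extend([first_sim, second_sim, third_sim])
    return row


# Hypothetical values for a single field.
role_data = {
    "Li Lei": {"occurrences": 2, "association_sentences": 2, "interactions": 1},
    "Han Mei": {"occurrences": 3, "association_sentences": 3, "interactions": 1},
}
feature_row = build_feature_row(role_data, first_sim=0.71, second_sim=0.5, third_sim=0.9)
print(feature_row)
```

Rows built this way for every field, together with labels marking which fields are segmentation points, would form a training set for a binary classifier such as a CatBoost model.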
A plot segmentation point is a breakpoint at which the plot does not continue to develop according to the plot logic.
It will be appreciated that the content between any two adjacent plot segmentation points in the script text is the text content of one plot. The embodiment of the present disclosure can divide the script text into plots quickly and correctly by means of the plot segmentation points.
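How segmentation points partition the fields into plots can be sketched as follows. The representation is an assumption: each segmentation point is taken to be the 0-based index of the field at which a new plot begins, and the start and end of the script act as implicit boundaries.

```python
def split_into_plots(num_fields, segmentation_points):
    """Group field indices into plots, starting a new plot at each segmentation point."""
    plots = []
    start = 0
    for point in sorted(set(segmentation_points)):
        if 0 < point < num_fields:  # ignore points outside the script
            plots.append(list(range(start, point)))
            start = point
    plots.append(list(range(start, num_fields)))
    return plots


# Six fields with new plots starting at fields 2 and 4 (hypothetical model output).
plots = split_into_plots(6, [2, 4])
print(plots)
print("number of plots:", len(plots))
```

Under this representation the number of plots is one more than the number of valid segmentation points; a different convention for the points would change that relationship.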
Optionally, based on the method shown in Fig. 1, another implementation of the script text processing method provided in an embodiment of the present disclosure, shown in Fig. 3, may further include, after step S600:
S700, counting the number of plot segmentation points in the script text.
S800, generating the number of plots corresponding to the script text based on the number of plot segmentation points.
By generating the number of plots in the script text, the embodiment of the present disclosure can display that number intuitively, which makes it easier for professionals to understand the plot data of the script text.
The script text processing method provided by the present disclosure can obtain the field content text corresponding to each field in the script text; for the field content text of any field: obtain a scene keyword vector, a title scene word vector, role association data of a first role, and a role scene word vector of a second role of the field content text, wherein the first role is the several roles that occur most frequently in the script text, and the second role is a role that first appears in the field and has lines in the next field; obtain a first scene word similarity of the field content text by using the scene keyword vector; obtain a second scene word similarity of the field content text by using the title scene word vector; obtain a third scene word similarity of the field content text by using the role scene word vector; and input the role association data, the first scene word similarity, the second scene word similarity, and the third scene word similarity into a pre-trained plot segmentation model to obtain plot segmentation points that are output by the plot segmentation model and correspond to the script text. By applying the pre-trained plot segmentation model to the similarity data and role association data corresponding to each field in the script text, the present disclosure can predict the plot segmentation points in the script text quickly and accurately, which facilitates plot segmentation of the script text and improves its efficiency and accuracy.
Although the operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
Corresponding to the above method embodiment, an embodiment of the present disclosure further provides a scenario text processing apparatus, whose structure is shown in fig. 4, and may include: a scene content text obtaining unit 100, a text data obtaining unit 200, a first scene word similarity obtaining unit 300, a second scene word similarity obtaining unit 400, a third scene word similarity obtaining unit 500, and a scenario segmentation point obtaining unit 600.
The scene content text obtaining unit 100 is configured to obtain the scene content text corresponding to each scene in the script text.
The text data obtaining unit 200 is configured to, for the scene content text of any scene, obtain a scene keyword vector, a title scene word vector, role association data of a first role, and a role scene word vector of a second role of the scene content text, where the first role refers to the several roles that occur most frequently in the script text, and the second role is a role that first appears in the scene and has lines of dialogue in the next scene.
The first scene word similarity obtaining unit 300 is configured to obtain a first scene word similarity of the scene content text by using the scene keyword vector.
The second scene word similarity obtaining unit 400 is configured to obtain a second scene word similarity of the scene content text by using the title scene word vector.
The third scene word similarity obtaining unit 500 is configured to obtain a third scene word similarity of the scene content text by using the role scene word vector.
The plot segmentation point obtaining unit 600 is configured to input the role association data, the first scene word similarity, the second scene word similarity, and the third scene word similarity corresponding to each scene into a pre-trained plot segmentation model, and to obtain the plot segmentation points that the plot segmentation model outputs for the script text.
Optionally, the script text processing apparatus may further include: a plot segmentation point counting unit and a plot number generating unit.
The plot segmentation point counting unit is configured to count the number of plot segmentation points in the script text after the plot segmentation point obtaining unit 600 obtains the plot segmentation points that the plot segmentation model outputs for the script text.
The plot number generating unit is configured to generate the number of plots of the script text based on the number of plot segmentation points.
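The counting step can be illustrated with a minimal sketch. It assumes the model output is available as one boolean per scene (True when a plot segmentation point is predicted after that scene) and that k segmentation points divide the script into k + 1 plots. The source only states that the number of plots is generated "based on" the number of points, so the `+ 1` rule and the names `split_flags` and `count_plots` are illustrative assumptions.

```python
def count_plots(split_flags):
    """Count predicted segmentation points and derive a plot count.

    split_flags[i] is True when a plot segmentation point is predicted
    after scene i (hypothetical representation of the model output).
    """
    num_points = sum(1 for flag in split_flags if flag)
    # Assumption: k boundaries split the script into k + 1 plots.
    num_plots = num_points + 1
    return num_points, num_plots


if __name__ == "__main__":
    print(count_plots([False, True, False, True, False]))
```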
Optionally, the scene content text obtaining unit 100 includes: a title text identifying subunit and a scene content text obtaining subunit.
The title text identifying subunit is configured to identify the title texts in the script text.
The scene content text obtaining subunit is configured to divide the script text into scenes according to the title texts, so as to obtain the scene content text corresponding to each scene.
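As a rough illustration of title-based scene division, the sketch below splits a script at heading lines. The heading pattern (`Scene <number> ...`) is a hypothetical convention chosen for the example; the source does not specify how title texts are recognised, and real scripts would need a pattern or classifier suited to their layout.

```python
import re

# Hypothetical title pattern: a line that starts with "Scene <number>".
TITLE_RE = re.compile(r"^Scene\s+\d+\b.*$", re.MULTILINE)


def split_scenes(script_text):
    """Return a list of (title_text, content_text) pairs, one per scene.

    Text that precedes the first title line is ignored in this sketch.
    """
    matches = list(TITLE_RE.finditer(script_text))
    scenes = []
    for i, match in enumerate(matches):
        # The content of a scene runs up to the next title (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(script_text)
        content = script_text[match.end():end].strip()
        scenes.append((match.group(0).strip(), content))
    return scenes


if __name__ == "__main__":
    demo = "Scene 1 Kitchen, day\nAnna cooks.\nScene 2 Street, night\nBen runs.\n"
    for title, content in split_scenes(demo):
        print(title, "->", content)
```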
Optionally, the first scene word similarity obtaining unit 300 is specifically configured to perform similarity calculation using the scene keyword vectors of the scene and of the other scenes adjacent to it, so as to obtain the first scene word similarity of the scene content text.
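A minimal sketch of this adjacent-scene comparison follows. The source says only "similarity calculation"; cosine similarity is used here as an assumed measure, and comparing each scene with the scene that follows it is one possible reading of "adjacent". The function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_scene_word_similarity(keyword_vectors):
    """For each scene i, similarity between its keyword vector and that of scene i + 1."""
    sims = []
    for i in range(len(keyword_vectors)):
        if i + 1 < len(keyword_vectors):
            sims.append(cosine(keyword_vectors[i], keyword_vectors[i + 1]))
        else:
            sims.append(None)  # the last scene has no following neighbour
    return sims


if __name__ == "__main__":
    print(first_scene_word_similarity([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```

A low value between two neighbouring scenes suggests a change of setting, which is the kind of signal the segmentation model can use.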
Optionally, the second scene word similarity obtaining unit 400 is configured to perform similarity calculation using the title scene word vectors of the scene and of the other scenes having a first association relationship with the scene, so as to obtain the second scene word similarity of the scene content text, and/or to perform similarity calculation using the title scene word vectors of the scene and of the other scenes having a second association relationship with the scene, so as to obtain the second scene word similarity of the scene content text.
Optionally, the first association relationship is being adjacent to the scene, and the second association relationship is being adjacent to another scene that has the first association relationship with the scene.
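The two association relationships can be read as index offsets: the first association covers a scene one position away, the second a scene two positions away. The sketch below applies that reading in the forward direction only, with cosine similarity as an assumed measure; the direction, the measure and the function names are assumptions rather than details given in the source.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def second_scene_word_similarity(title_vectors, i):
    """Title scene word similarities of scene i to scenes i + 1 and i + 2."""

    def sim_to(j):
        # None marks a neighbour that does not exist (end of script).
        if 0 <= j < len(title_vectors):
            return cosine(title_vectors[i], title_vectors[j])
        return None

    return {
        "first_association": sim_to(i + 1),   # adjacent scene
        "second_association": sim_to(i + 2),  # adjacent to the adjacent scene
    }


if __name__ == "__main__":
    vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    print(second_scene_word_similarity(vectors, 0))
```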
Optionally, the role association data includes the number of occurrences of the first role in the scene content text, the number of role association sentences, and the number of role interactions.
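To make the three counts concrete, here is a simple sketch under stated assumptions: an "occurrence" is a literal mention of a first-role name, a "role association sentence" is a sentence mentioning at least one first role, and an "interaction" is a sentence mentioning two or more first roles. These definitions, the sentence splitter and the function name are illustrative guesses; this section of the source does not define how the counts are computed.

```python
import re


def role_association_data(content_text, first_roles):
    """Count mentions, role-related sentences and co-mentions for the first roles."""
    # Naive sentence split on Western and Chinese sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"[.!?。！？]+", content_text) if s.strip()]

    # Literal substring counts; a real system would need proper name matching.
    occurrences = sum(content_text.count(name) for name in first_roles)

    association_sentences = 0
    interactions = 0
    for sentence in sentences:
        present = [name for name in first_roles if name in sentence]
        if present:
            association_sentences += 1
        if len(present) >= 2:
            interactions += 1

    return {
        "occurrences": occurrences,
        "association_sentences": association_sentences,
        "interactions": interactions,
    }


if __name__ == "__main__":
    text = "Anna greets Ben. Ben leaves. The rain starts."
    print(role_association_data(text, ["Anna", "Ben"]))
```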
Optionally, the third scene word similarity obtaining unit 500 is specifically configured to perform similarity calculation using the role scene word vectors of the scene and of the next scene immediately following it, so as to obtain the third scene word similarity of the scene content text.
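The third similarity depends on the second role, defined as a role that first appears in the scene and has lines in the next scene. The sketch below shows one way to select such roles, reading "first appears" as "has not appeared in any earlier scene". That reading, the set-based inputs and the function name are assumptions made for illustration.

```python
def second_roles(roles_by_scene, speakers_by_scene, i):
    """Roles whose first appearance is scene i and who speak in scene i + 1.

    roles_by_scene[k]    : set of roles appearing in scene k
    speakers_by_scene[k] : set of roles with dialogue in scene k
    """
    if i + 1 >= len(roles_by_scene):
        return set()  # the last scene has no following scene
    seen_before = set().union(*roles_by_scene[:i])
    newly_introduced = roles_by_scene[i] - seen_before
    return newly_introduced & speakers_by_scene[i + 1]


if __name__ == "__main__":
    roles = [{"Anna"}, {"Anna", "Ben", "Cara"}, {"Ben"}]
    speakers = [{"Anna"}, {"Anna"}, {"Ben"}]
    print(second_roles(roles, speakers, 1))
```

The role scene word vectors of the selected roles in scene i and scene i + 1 would then be compared with a similarity measure to give the third scene word similarity.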
Optionally, the plot segmentation model is a CatBoost algorithm model.
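The sketch below shows how the per-scene features might be assembled and passed to a classifier. To stay self-contained it uses a small stand-in class instead of a trained model; any object with a `predict(rows)` method returning 0/1 labels would fit, and a trained `CatBoostClassifier` from the `catboost` package offers such a method. The feature order, the dictionary keys, the stub's threshold and all function names are assumptions for illustration, not the patent's trained model.

```python
def build_feature_rows(role_data, sim1, sim2, sim3):
    """One feature row per scene: three role counts followed by three similarities."""
    rows = []
    for role, a, b, c in zip(role_data, sim1, sim2, sim3):
        rows.append([
            role["occurrences"],
            role["association_sentences"],
            role["interactions"],
            a,  # first scene word similarity
            b,  # second scene word similarity
            c,  # third scene word similarity
        ])
    return rows


def predict_split_points(model, rows):
    """Indices of scenes after which the model predicts a plot segmentation point."""
    labels = model.predict(rows)
    return [i for i, label in enumerate(labels) if int(label) == 1]


class ThresholdStub:
    """Stand-in for a trained classifier: flags a split when any similarity is low."""

    def predict(self, rows):
        return [1 if min(row[3:]) < 0.3 else 0 for row in rows]


if __name__ == "__main__":
    role_data = [
        {"occurrences": 5, "association_sentences": 3, "interactions": 1},
        {"occurrences": 2, "association_sentences": 2, "interactions": 0},
    ]
    rows = build_feature_rows(role_data, [0.9, 0.1], [0.8, 0.2], [0.7, 0.5])
    print(predict_split_points(ThresholdStub(), rows))
```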
The script text processing apparatus can obtain the scene content text corresponding to each scene in the script text. For the scene content text of any scene, it obtains a scene keyword vector, a title scene word vector, role association data of a first role, and a role scene word vector of a second role, where the first role refers to the several roles that occur most frequently in the script text, and the second role is a role that first appears in the scene and has lines of dialogue in the next scene. A first scene word similarity of the scene content text is obtained using the scene keyword vector; a second scene word similarity is obtained using the title scene word vector; and a third scene word similarity is obtained using the role scene word vector. The role association data, the first scene word similarity, the second scene word similarity and the third scene word similarity are then input into a pre-trained plot segmentation model to obtain the plot segmentation points that the model outputs for the script text. By combining the similarity data and role association data of each scene with the pre-trained plot segmentation model, the apparatus can predict the plot segmentation points in the script text quickly and accurately, which facilitates plot segmentation of the script text and improves its efficiency and accuracy.
With regard to the apparatus in the above-described embodiment, the specific manner in which each unit performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
The script text processing apparatus includes a processor and a memory. The scene content text obtaining unit 100, the text data obtaining unit 200, the first scene word similarity obtaining unit 300, the second scene word similarity obtaining unit 400, the third scene word similarity obtaining unit 500, the plot segmentation point obtaining unit 600 and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
The processor includes a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, the similarity data and role association data of each scene in the script text are used together with the pre-trained plot segmentation model to predict the plot segmentation points in the script text quickly and accurately, which facilitates plot segmentation of the script text and improves its efficiency and accuracy.
The disclosed embodiments provide a computer-readable storage medium having stored thereon a program that, when executed by a processor, implements the script text processing method.
The embodiment of the disclosure provides a processor, and the processor is used for running a program, wherein the program executes the script text processing method during running.
An embodiment of the present disclosure provides an electronic device, which includes at least one processor, and at least one memory and a bus connected to the processor. The processor and the memory communicate with each other via the bus, and the processor is configured to call the program instructions in the memory to execute the script text processing method. The electronic device herein may be a server, a PC, a tablet (PAD), a mobile phone, etc.
The present disclosure also provides a computer program product that, when executed on an electronic device, is adapted to execute a program initialized with the steps of the script text processing method.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, an electronic device includes one or more processors (CPUs), memory, and a bus. The electronic device may also include input/output interfaces, network interfaces, and the like.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip. The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
In the description of the present disclosure, it is to be understood that directions or positional relationships indicated by terms such as "upper", "lower", "front", "rear", "left" and "right" are based on the directions or positional relationships shown in the drawings. They are used only for convenience and simplicity of description, and do not indicate or imply that the positions or elements referred to must have a specific orientation or be constructed and operated in a specific orientation; they are therefore not to be construed as limiting the present disclosure.
It is noted that, herein, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual relationship or order between such entities or actions. It should also be noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one of skill in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the same. Various modifications and variations of this disclosure will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the scope of the claims of the present disclosure.

Claims (10)

1. A script text processing method, comprising:
obtaining a scene content text corresponding to each scene in a script text;
for the scene content text of any scene: obtaining a scene keyword vector, a title scene word vector, role association data of a first role, and a role scene word vector of a second role of the scene content text, wherein the first role refers to the several roles that occur most frequently in the script text, and the second role is a role that first appears in the scene and has lines of dialogue in the next scene;
obtaining a first scene word similarity of the scene content text by using the scene keyword vector;
obtaining a second scene word similarity of the scene content text by using the title scene word vector;
obtaining a third scene word similarity of the scene content text by using the role scene word vector;
and inputting the role association data, the first scene word similarity, the second scene word similarity and the third scene word similarity corresponding to each scene into a pre-trained plot segmentation model to obtain plot segmentation points that are output by the plot segmentation model and correspond to the script text.
2. The method of claim 1, wherein after obtaining the plot segmentation points that are output by the plot segmentation model and correspond to the script text, the method further comprises:
counting the number of plot segmentation points in the script text;
and generating the number of plots corresponding to the script text based on the number of plot segmentation points.
3. The method according to claim 1, wherein obtaining the scene content text corresponding to each scene in the script text comprises:
identifying title texts in the script text;
and dividing the script text into scenes according to the title texts to obtain the scene content text corresponding to each scene.
4. The method according to claim 1, wherein obtaining the first scene word similarity of the scene content text by using the scene keyword vector comprises:
performing similarity calculation by using the scene keyword vectors of the scene and of other scenes adjacent to the scene to obtain the first scene word similarity of the scene content text.
5. The method of claim 1, wherein obtaining the second scene word similarity of the scene content text by using the title scene word vector comprises:
performing similarity calculation by using the title scene word vectors of the scene and of other scenes having a first association relationship with the scene to obtain the second scene word similarity of the scene content text;
and/or performing similarity calculation by using the title scene word vectors of the scene and of other scenes having a second association relationship with the scene to obtain the second scene word similarity of the scene content text.
6. The method according to claim 5, wherein the first association relationship is being adjacent to the scene, and the second association relationship is being adjacent to another scene that has the first association relationship with the scene.
7. The method of claim 1, wherein the role association data includes the number of occurrences of the first role in the scene content text, the number of role association sentences, and the number of role interactions.
8. The method of claim 1, wherein obtaining the third scene word similarity of the scene content text by using the role scene word vector comprises:
performing similarity calculation by using the role scene word vectors of the scene and of the next scene immediately following the scene to obtain the third scene word similarity of the scene content text.
9. The method of claim 1, wherein the plot segmentation model is a CatBoost algorithm model.
10. A script text processing apparatus, comprising: a scene content text obtaining unit, a text data obtaining unit, a first scene word similarity obtaining unit, a second scene word similarity obtaining unit, a third scene word similarity obtaining unit and a plot segmentation point obtaining unit, wherein
the scene content text obtaining unit is configured to obtain a scene content text corresponding to each scene in a script text;
the text data obtaining unit is configured to, for the scene content text of any scene: obtain a scene keyword vector, a title scene word vector, role association data of a first role, and a role scene word vector of a second role of the scene content text, wherein the first role refers to the several roles that occur most frequently in the script text, and the second role is a role that first appears in the scene and has lines of dialogue in the next scene;
the first scene word similarity obtaining unit is configured to obtain a first scene word similarity of the scene content text by using the scene keyword vector;
the second scene word similarity obtaining unit is configured to obtain a second scene word similarity of the scene content text by using the title scene word vector;
the third scene word similarity obtaining unit is configured to obtain a third scene word similarity of the scene content text by using the role scene word vector;
the plot segmentation point obtaining unit is configured to input the role association data, the first scene word similarity, the second scene word similarity, and the third scene word similarity corresponding to each scene into a pre-trained plot segmentation model, and to obtain plot segmentation points that are output by the plot segmentation model and correspond to the script text.
CN202211521036.5A 2022-11-30 2022-11-30 Script text processing method and device Pending CN115759048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211521036.5A CN115759048A (en) 2022-11-30 2022-11-30 Script text processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211521036.5A CN115759048A (en) 2022-11-30 2022-11-30 Script text processing method and device

Publications (1)

Publication Number Publication Date
CN115759048A true CN115759048A (en) 2023-03-07

Family

ID=85341251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211521036.5A Pending CN115759048A (en) 2022-11-30 2022-11-30 Script text processing method and device

Country Status (1)

Country Link
CN (1) CN115759048A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237486A (en) * 2023-09-27 2023-12-15 深圳市黑屋文化创意有限公司 Cartoon scene construction system and method based on text content
CN117237486B (en) * 2023-09-27 2024-05-28 深圳市黑屋文化创意有限公司 Cartoon scene construction system and method based on text content

Similar Documents

Publication Publication Date Title
CN110096570B (en) Intention identification method and device applied to intelligent customer service robot
CN108052577B (en) Universal text content mining method, device, server and storage medium
CN111046656B (en) Text processing method, text processing device, electronic equipment and readable storage medium
CN109034203B (en) Method, device, equipment and medium for training expression recommendation model and recommending expression
CN109582948B (en) Method and device for extracting evaluation viewpoints
CN116108857B (en) Information extraction method, device, electronic equipment and storage medium
CN113158656B (en) Ironic content recognition method, ironic content recognition device, electronic device, and storage medium
CN111159375A (en) Text processing method and device
CN108205524B (en) Text data processing method and device
CN112613306A (en) Method, device, electronic equipment and storage medium for extracting entity relationship
CN115408488A (en) Segmentation method and system for novel scene text
CN109408175B (en) Real-time interaction method and system in general high-performance deep learning calculation engine
CN115759048A (en) Script text processing method and device
CN113901838A (en) Dialog detection method and device, electronic equipment and storage medium
CN111739537B (en) Semantic recognition method and device, storage medium and processor
CN113849623A (en) Text visual question answering method and device
CN113220854A (en) Intelligent dialogue method and device for machine reading understanding
CN113342935A (en) Semantic recognition method and device, electronic equipment and readable storage medium
CN117275466A (en) Business intention recognition method, device, equipment and storage medium thereof
CN110851597A (en) Method and device for sentence annotation based on similar entity replacement
CN110852103A (en) Named entity identification method and device
CN114528851B (en) Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium
CN113807920A (en) Artificial intelligence based product recommendation method, device, equipment and storage medium
CN112905752A (en) Intelligent interaction method, device, equipment and storage medium
CN112036188A (en) Method and device for recommending quality test example sentences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination