CN110895557B - Text feature judgment method and device based on neural network and storage medium - Google Patents


Publication number
CN110895557B
CN110895557B CN201911185402.2A CN201911185402A CN110895557B CN 110895557 B CN110895557 B CN 110895557B CN 201911185402 A CN201911185402 A CN 201911185402A CN 110895557 B CN110895557 B CN 110895557B
Authority
CN
China
Prior art keywords
text
training
identification
characters
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911185402.2A
Other languages
Chinese (zh)
Other versions
CN110895557A (en)
Inventor
邓立邦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Zhimeiyuntu Tech Corp ltd
Original Assignee
Guangdong Zhimeiyuntu Tech Corp ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Zhimeiyuntu Tech Corp ltd
Priority to CN201911185402.2A
Publication of CN110895557A
Application granted
Publication of CN110895557B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention discloses a text feature judgment method, a text feature judgment device and a storage medium based on a neural network. The method comprises: allocating identification information to identification characters and generating character identification association data, wherein the identification characters comprise at least one of Chinese characters, punctuation characters, numeric characters and English characters; acquiring a text training set, wherein the text training set comprises a plurality of training texts and a training text feature corresponding to each training text, and each training text is composed of one or more identification characters; generating a curve coordinate graph corresponding to each training text according to the character identification association data; taking the curve coordinate graph as input and the corresponding training text feature as output, and training a neural network to obtain a text feature judgment template; and acquiring a text to be recognized and judging it according to the text feature judgment template to determine its text feature. The scheme enables in-depth analysis of text content, yields text features with high accuracy, and improves judgment efficiency.

Description

Text feature judgment method and device based on neural network and storage medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a text feature judgment method and device based on a neural network and a storage medium.
Background
With the development of social networks, e-commerce, the mobile internet and other technologies, blogs, forums and social service networks such as public comment sites attract a great deal of user participation. Valuable, emotionally colored text comment data about people, events, products and the like is expanding rapidly, and it expresses people's various emotional colors and emotional tendencies, such as happiness, anger, grief and joy, or criticism and praise. Fully mining and deeply analyzing this text comment data makes it possible to better understand the viewpoints and standpoints of netizens, and to better assist decisions in fields such as public opinion management and control, business decision-making, viewpoint search, information prediction and emotion management.
In the prior art, when a text is processed to determine a text characteristic (such as an emotional characteristic represented by the text), the accuracy is low, and the determination effect and the determination efficiency are poor.
Disclosure of Invention
The embodiment of the invention provides a text feature judgment method, device, equipment and storage medium based on a neural network, which improve the accuracy of the text feature judgment result, offer high processing efficiency and a good processing effect, and realize in-depth mining and analysis of text.
In a first aspect, an embodiment of the present invention provides a text feature determination method based on a neural network, where the method includes:
allocating identification information to the identification characters to generate character identification association data, wherein the identification characters comprise at least one of Chinese characters, punctuation characters, numeric characters and English characters;
acquiring a text training set, wherein the text set comprises a plurality of training texts and training text characteristics corresponding to each training text, and each training text is composed of one or more identification characters;
generating a curve coordinate graph corresponding to each training text according to the character identification association data;
taking the curve coordinate graph as input, taking corresponding training text characteristics as output, and training by utilizing a neural network to obtain a text characteristic judgment template;
and acquiring a text to be recognized, and judging the text to be recognized according to the text characteristic judging template to determine text characteristics.
In a second aspect, an embodiment of the present invention further provides a text feature determination apparatus based on a neural network, where the apparatus includes:
the data generation module is used for distributing identification information for the identification characters and generating character identification association data, wherein the identification characters comprise at least one of Chinese characters, punctuation characters, numeric characters and English characters;
a training set obtaining module, configured to obtain a text training set, where the text set includes multiple training texts and training text features corresponding to each training text, and each training text is composed of one or more identification characters;
a coordinate graph generating module, configured to generate a curve coordinate graph corresponding to each training text according to the character identifier association data;
the template generation module is used for taking the curve coordinate graph as input, taking the corresponding training text characteristic as output and training by utilizing a neural network to obtain a text characteristic judgment template;
and the text characteristic determining module is used for acquiring the text to be recognized and judging the text to be recognized according to the text characteristic judging template to determine the text characteristic.
In a third aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for determining text features based on a neural network according to the embodiment of the present invention.
In a fourth aspect, the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the method for determining text features based on a neural network according to the present invention.
In the embodiment of the invention, identification information is allocated to identification characters to generate character identification association data, the identification characters comprising at least one of Chinese characters, punctuation characters, numeric characters and English characters. A text training set is obtained, comprising a plurality of training texts and a training text feature corresponding to each training text, each training text being composed of one or more identification characters. A curve coordinate graph corresponding to each training text is generated according to the character identification association data; the curve coordinate graph is used as input and the corresponding training text feature as output to train a neural network and obtain a text feature judgment template. A text to be recognized is then obtained and judged according to the text feature judgment template to determine its text feature. In this way, in-depth analysis of the text content is realized, the obtained text features have high accuracy, and the judgment efficiency is improved.
Drawings
Fig. 1 is a flowchart of a text feature determination method based on a neural network according to an embodiment of the present invention;
Fig. 2 is a flowchart of another text feature determination method based on a neural network according to an embodiment of the present invention;
Fig. 3 is a flowchart of another text feature determination method based on a neural network according to an embodiment of the present invention;
Fig. 4 is a first schematic diagram of a curve coordinate graph according to an embodiment of the present invention;
Fig. 5 is a flowchart of another text feature determination method based on a neural network according to an embodiment of the present invention;
Fig. 6 is a second schematic diagram of a curve coordinate graph according to an embodiment of the present invention;
Fig. 7 is a third schematic diagram of a curve coordinate graph according to an embodiment of the present invention;
Fig. 8 is a block diagram of a text feature determination apparatus based on a neural network according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad invention. It should be further noted that, for convenience of description, only some structures, not all structures, relating to the embodiments of the present invention are shown in the drawings.
Fig. 1 is a flowchart of a text feature determination method based on a neural network according to an embodiment of the present invention, where this embodiment is applicable to determining text features, for example, corresponding emotional features may be determined for a piece of comment data, and the method may be executed by a computing device such as a server computer, and specifically includes the following steps:
Step S101, allocating identification information to identification characters, and generating character identification association data, wherein the identification characters comprise at least one of Chinese characters, punctuation characters, numeric characters and English characters.
The identification characters are the characters that users use to comment or make records. They may be at least one of Chinese characters, punctuation characters, numeric characters and English characters, but are not limited to the listed characters and may include any other content that can be entered and displayed.
The identification information is data that serves as an identifier, and each identification character is assigned its own unique identification information. Illustratively, the identification information may be the serial numbers 1 to 10000, each corresponding to one identification character; for example, the identification characters "day", "ground" and "person" correspond to identification information 1, 2 and 3, respectively.
The character identification association data can be stored in a manner of associating fields in a mapping table or a database, that is, the identification characters are associated with the allocated identification information, and after the identification characters are determined, the identification information corresponding to the identification characters can be uniquely determined.
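As a minimal sketch of such a mapping table, the association data can be held in a dictionary from character to serial number. The function name `build_char_id_table` and the first-seen numbering order are assumptions made for illustration, and the characters 天, 地 and 人 are assumed to be the originals behind the translated examples "day", "ground" and "person".

```python
def build_char_id_table(characters):
    """Assign serial numbers 1..N to distinct characters in first-seen order.

    The numbering order is an illustrative assumption; the text only requires
    that every identification character gets a unique identifier.
    """
    table = {}
    for ch in characters:
        if ch not in table:
            table[ch] = len(table) + 1
    return table


# Repeated characters keep the identifier they were given first.
char_to_id = build_char_id_table(["天", "地", "人", "天"])
print(char_to_id)
```

Once the table exists, looking up a character returns exactly one identifier, which is the uniqueness property described above.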
Step S102, a text training set is obtained, the text set comprises a plurality of training texts and training text characteristics corresponding to each training text, and each training text is composed of one or more identification characters.
The text training set is a collection of texts used for learning and training. It comprises a plurality of training texts, each composed of identification characters, and the training texts can be comment data acquired through a network, such as comments on a news item, an entertainment event or a film or television drama.
Step S103, generating a curve coordinate graph corresponding to each training text according to the character identification association data.
As described above, each training text is composed of a plurality of identification characters, and each identification character and corresponding identification information are recorded in the character identification association data. In one embodiment, a curve coordinate graph corresponding to each training text is generated according to the character identification association data, wherein the curve coordinate graph is marked with the occurrence frequency of each character in the training text and the corresponding identification information.
Step S104, taking the curve coordinate graph as input, taking the corresponding training text feature as output, and training by utilizing a neural network to obtain a text feature judgment template.
In the training process, text features can be set in advance; for example, ten text feature grades are set, and each training text is matched with one unique text feature grade. The curve coordinate graph corresponding to each training text is used as input and the corresponding text feature grade as output to train a neural network and obtain the text feature judgment template. The training can be implemented with an existing mature neural network, such as a convolutional neural network (CNN). The trained text feature judgment template records standard feature curve coordinate graphs and their corresponding text features. Illustratively, the template records: feature curve 1 corresponds to text feature 1; feature curve 2 corresponds to text feature 2; feature curve 3 corresponds to text feature 3. This record is only an illustration: the template records a large number of feature curve coordinate graphs and corresponding text features, and each text feature may correspond to several different feature curves, for example feature curve 100, feature curve 125 and feature curve 306 all corresponding to text feature 5.
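The input/output arrangement of this training step can be sketched with a deliberately small stand-in. The code below is not the CNN named in the text: it swaps in a single-layer softmax classifier written in plain Python, and it assumes each curve coordinate graph is represented as a vector of per-identifier counts rather than as an image. The names `train_single_layer` and `predict`, the hyperparameters and the toy data are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_single_layer(curves, labels, num_classes, epochs=200, lr=0.1, seed=0):
    """Train a one-layer softmax network with stochastic gradient descent.

    curves: list of count vectors (one per training text).
    labels: list of integer text feature grades in range(num_classes).
    """
    rng = random.Random(seed)
    dim = len(curves[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
               for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(curves, labels):
            scores = [sum(w * xi for w, xi in zip(row, x)) + b
                      for row, b in zip(weights, biases)]
            probs = softmax(scores)
            for k in range(num_classes):
                # Gradient of the cross-entropy loss with respect to score k.
                grad = probs[k] - (1.0 if k == y else 0.0)
                biases[k] -= lr * grad
                for j in range(dim):
                    weights[k][j] -= lr * grad * x[j]
    return weights, biases


def predict(model, x):
    """Return the index of the highest-scoring text feature grade."""
    weights, biases = model
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: four count vectors over three identifiers, two feature grades.
curves = [[3, 0, 1], [2, 0, 0], [0, 3, 1], [0, 2, 0]]
labels = [0, 0, 1, 1]
model = train_single_layer(curves, labels, num_classes=2)
print([predict(model, x) for x in curves])
```

A real implementation along the lines of the text would replace this stand-in with a convolutional network from a deep learning framework and feed it the rendered graphs or the count vectors.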
Step S105, acquiring a text to be recognized, and judging the text to be recognized according to the text feature judgment template to determine text features.
The text to be recognized is an acquired text which needs text feature judgment, such as a section of comment text, and the section of text is judged according to the obtained text feature judgment template to obtain a corresponding text feature, where the text feature includes set emotional features, exemplarily, the emotional features include happiness, calmness, pessimism, and the like, and each emotional feature corresponds to one of the determined text feature levels.
According to the scheme, identification information is allocated to identification characters to generate character identification association data; a text training set is obtained, comprising a plurality of training texts and the training text feature corresponding to each training text; a curve coordinate graph corresponding to each training text is generated according to the character identification association data; the curve coordinate graph is used as input and the corresponding training text feature as output to train a neural network and obtain a text feature judgment template; and a text to be recognized is obtained and judged according to the text feature judgment template to determine its text feature. Because each identification character is associated with its allocated identification information and the judgment is made by comparison against the trained text feature judgment template, text features can be judged efficiently and accurately, with higher efficiency than manual or other intelligent approaches.
Fig. 2 is a flowchart of another text feature determination method based on a neural network according to an embodiment of the present invention, and provides an optimized text feature determination method. As shown in fig. 2, the technical solution is as follows:
Step S201, determining a preset number of different identification characters, allocating a unique identification serial number to each identification character, and storing each identification character in association with its identification serial number to generate a character identification association table or a character identification association matrix.
In one embodiment, a preset number of different identification characters is determined; for example, the preset number may be 6000. The characters may be commonly used Chinese characters and commonly used punctuation characters, that is, identification information is allocated only to commonly used characters. The identification information is an identification serial number, for example the 6000 serial numbers from 1 to 6000. Each identification character is stored in association with its identification serial number to generate the character identification association table or character identification association matrix, so that the identification serial number corresponding to any identification character can be determined by querying the table or matrix. In one embodiment, a matrix operation is performed on the 6000 identification characters, and the identification information generated for each character is stored in matrix form.
Specifically, the preset number of different identification characters is determined as follows: text such as webpages and comments is randomly obtained by a web crawler, the characters in the text are recognized, and the recognition results are counted, including the number of occurrences and the frequency of each character. All characters that occur more than a preset number of times (for example, 10 times) are recorded, their total is taken as the preset number, and a unique identification serial number is allocated to each of them. Punctuation marks and English words are counted in the same way.
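The counting-and-threshold selection described above can be sketched as follows. This is a minimal illustration that assumes the crawled text is already available as a list of strings and counts per character; the function name `select_identification_characters`, the skipping of whitespace and the ordering of serial numbers by descending count are assumptions, not details given in the text.

```python
from collections import Counter


def select_identification_characters(corpus_texts, min_count=10):
    """Keep characters occurring more than min_count times and number them.

    Serial numbers start at 1. Ordering by descending count (ties broken by
    code point) is an assumption chosen only to make the result deterministic.
    """
    counts = Counter()
    for text in corpus_texts:
        counts.update(ch for ch in text if not ch.isspace())
    frequent = [ch for ch, n in counts.items() if n > min_count]
    frequent.sort(key=lambda ch: (-counts[ch], ch))
    return {ch: serial for serial, ch in enumerate(frequent, start=1)}


# Tiny stand-in corpus with a lowered threshold so the effect is visible.
corpus = ["good good good", "bad"]
table = select_identification_characters(corpus, min_count=2)
print(table)        # characters seen more than twice, most frequent first
print(len(table))   # this total plays the role of the "preset number"
```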
Step S202, a text training set is obtained, the text set comprises a plurality of training texts and training text characteristics corresponding to each training text, and each training text is composed of one or more identification characters.
Step S203, generating a curve coordinate graph corresponding to each training text according to the character identification association data.
Step S204, taking the curve coordinate graph as input, taking the corresponding training text feature as output, and training by utilizing a neural network to obtain a text feature judgment template.
Step S205, acquiring a text to be recognized, and judging the text to be recognized according to the text feature judgment template to determine text features.
According to the scheme, the specific character identification association table or the specific character identification association matrix is generated by determining the different identification characters with the preset number, allocating the unique identification serial number to each identification character and associating and storing each identification character and the corresponding identification serial number, so that the pertinence of the text feature judgment template obtained by training is improved, and the text feature judgment efficiency is improved.
Fig. 3 is a flowchart of another text feature determination method based on a neural network according to an embodiment of the present invention, which provides a specific example of generating the curve coordinate graph. As shown in fig. 3, the technical solution is as follows:
Step S301, allocating identification information to identification characters, and generating character identification association data, wherein the identification characters comprise at least one of Chinese characters, punctuation characters, numeric characters and English characters.
Step S302, a text training set is obtained, wherein the text set comprises a plurality of training texts and training text characteristics corresponding to each training text, and each training text is composed of one or more identification characters.
Step S303, determining each training identification character contained in the training text, and determining identification information corresponding to each training identification character according to the character identification association data.
In one embodiment, the training text is subjected to character recognition to determine each training identification character, i.e., the specific character content in the text, such as specific Chinese characters, punctuation marks, English words, and the like. After each training identification character is determined, the identification information corresponding to each training identification character is determined according to the character identification association data.
Step S304, counting the identification information of each training identification character, and generating a curve coordinate graph corresponding to the training text, wherein the abscissa of the curve coordinate graph is the identification information, and the ordinate is the frequency of occurrence of each training identification character.
Fig. 4 is a first schematic diagram of a curve coordinate graph provided in an embodiment of the present invention. As shown in fig. 4, the abscissa of the curve coordinate graph is the identification information and the ordinate is the number of occurrences of each training identification character, although the quantities on the two axes may be interchanged. In fig. 4, the abscissa covers identification information 1 to 6000 and the ordinate is marked with counts from 1 to 30; the upper limit of the ordinate can be adjusted to the length of the actual training text. In this statistical manner, each training text corresponds to one curve coordinate graph. In one embodiment, the abscissa may be the identification information and the ordinate may be the frequency of occurrence of each identification character.
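The data behind such a curve coordinate graph can be sketched as a count vector indexed by identification information. This is a minimal illustration that assumes the graph is kept as a list of counts rather than rendered as an image; the function name `text_to_curve` and the choice to skip characters that have no identifier are assumptions.

```python
def text_to_curve(text, char_to_id, num_ids):
    """Return counts where counts[i - 1] is how often the character with
    identification information i occurs in the text."""
    counts = [0] * num_ids
    for ch in text:
        ident = char_to_id.get(ch)
        if ident is not None:  # characters without an identifier are skipped (assumption)
            counts[ident - 1] += 1
    return counts


# Hypothetical three-entry association table; a real one would have about 6000 entries.
char_to_id = {"好": 1, "很": 2, "!": 3}
curve = text_to_curve("很好很好!?", char_to_id, num_ids=3)
print(curve)  # position = identification information (abscissa), value = count (ordinate)
```

Plotting the positions on the abscissa against the counts on the ordinate would give a graph of the kind the text describes for fig. 4.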
Step S305, taking the curve coordinate graph as input, taking the corresponding training text feature as output, and training by utilizing a neural network to obtain a text feature judgment template.
Step S306, acquiring a text to be recognized, and judging the text to be recognized according to the text feature judgment template to determine text features.
According to the scheme, after each training identification character contained in the training text is determined, the identification information corresponding to each training identification character is determined according to the character identification association data, and the identification information is counted to generate the curve coordinate graph corresponding to the training text, whose abscissa is the identification information and whose ordinate is the number of occurrences of each training identification character. Neural network training is then performed on the curve coordinate graphs to obtain the text feature judgment template. This way of judging text features requires no complex computation and is based on the text content itself, so both the efficiency and the accuracy of text feature judgment are high.
Fig. 5 is a flowchart of another text feature determination method based on a neural network according to an embodiment of the present invention, and provides a method for determining text features by specifically determining a text to be recognized. As shown in fig. 5, the technical solution is as follows:
Step S401, allocating identification information to identification characters, and generating character identification association data, wherein the identification characters comprise at least one of Chinese characters, punctuation characters, numeric characters and English characters.
Step S402, a text training set is obtained, the text set comprises a plurality of training texts and training text characteristics corresponding to each training text, and each training text is composed of one or more identification characters.
Step S403, generating a curve coordinate graph corresponding to each training text according to the character identification association data.
Step S404, taking the curve coordinate graph as input, taking the corresponding training text feature as output, and training by utilizing a neural network to obtain a text feature judgment template.
Step S405, a text to be recognized is obtained, identification characters in the text to be recognized are determined, and a corresponding curve coordinate graph is generated according to the character identification association data.
In an embodiment, after the text to be recognized is obtained, the text to be recognized is recognized to obtain the identification characters therein, and the specific text recognition mode may be a template matching method or a geometric feature extraction method. And after the identification characters are identified, determining a curve coordinate graph of the text to be identified according to the character identification association data.
Step S406, comparing the curve coordinate graph corresponding to the text to be recognized with the characteristic curve coordinate graph in the text characteristic judgment template, and determining the text characteristic of the text to be recognized according to the comparison result.
In one embodiment, one or more text feature determination templates are stored for different text features, and each text feature determination template can be embodied by a specific curve coordinate graph. And comparing the curve coordinate graph corresponding to the text to be recognized with the characteristic curve coordinate graph in the text characteristic judgment template, and determining the text characteristic of the text to be recognized according to the comparison result.
Fig. 6 is a second schematic diagram of a curve coordinate graph provided in an embodiment of the present invention. For example, fig. 6 shows the curve coordinate graphs of the text feature judgment templates corresponding to four text features, where the first text feature can represent happiness, the second calmness, the third sadness and the fourth anger.
Fig. 7 is a third schematic diagram of a curve coordinate graph provided in an embodiment of the present invention; the graph shown in fig. 7 illustratively corresponds to a text to be recognized. This graph is compared with the curve coordinate graphs in fig. 6 to find the one with the highest similarity, which is taken as the curve coordinate graph matched with the text to be recognized, and the text feature corresponding to the matched graph is determined as the text feature of the text to be recognized. Illustratively, text feature three is determined as the text feature of the text to be recognized, and the text is accordingly determined to express a pessimistic emotion.
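The highest-similarity comparison can be sketched as below. The text does not say how similarity is measured, so cosine similarity between count vectors is an assumption here, as are the function names, the toy template curves and the label strings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_text_feature(curve, template):
    """Return the text feature whose feature curve is most similar to curve.

    template: list of (feature_curve, text_feature) pairs.
    """
    _, best_feature = max(
        template, key=lambda pair: cosine_similarity(curve, pair[0])
    )
    return best_feature


# Hypothetical template with one feature curve per text feature.
template = [
    ([3, 0, 0, 1], "happy"),
    ([1, 1, 1, 1], "calm"),
    ([0, 3, 1, 0], "sad"),
    ([0, 0, 3, 1], "angry"),
]
result = match_text_feature([0, 2, 1, 0], template)
print(result)
```

Because a text feature may correspond to several feature curves, the template list can simply contain several pairs with the same label.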
According to the scheme, the curve coordinate graph corresponding to the text to be recognized is matched with the curve graph in the text feature judgment template, the text feature corresponding to the text to be recognized is output according to the matching result, and rapid judgment of the text feature can be achieved so as to better assist decisions in various fields such as public opinion management and control, business decision, viewpoint search, information prediction, emotion management and the like.
On the basis of the above technical solution, after the text training set is obtained, the method further includes: grouping the training texts in the text training set according to the number of identification characters. Correspondingly, training by using the neural network to obtain the text feature judgment template includes: training on the training texts in each group by using a neural network to obtain text feature judgment templates corresponding to different numbers of identification characters. Illustratively, the training texts are grouped by the number of identification characters they contain, for example into a group with fewer than 25 identification characters, a group with more than 25 and fewer than 50, a group with more than 50 and fewer than 100, and a group with more than 100. After grouping, a neural network is trained on the training texts in each group to obtain a text feature judgment template for that range of identification character counts. When a text to be recognized is judged, the number of its identification characters is determined first, and matching is performed against the text feature judgment template for the corresponding count, which further improves the accuracy of the determined text features.
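The grouping by character count can be sketched as follows. The text leaves the boundary values (exactly 25, 50 or 100 characters) unassigned, so the half-open intervals used here are an assumption, as are the names `BUCKET_UPPER_BOUNDS`, `length_bucket` and `group_by_length` and the use of plain string length as the identification character count.

```python
from collections import defaultdict

# Upper bounds of the first three groups; longer texts fall into a final group.
BUCKET_UPPER_BOUNDS = (25, 50, 100)


def length_bucket(num_chars):
    """Map a character count to a group index: [0, 25), [25, 50), [50, 100), [100, inf)."""
    for index, upper in enumerate(BUCKET_UPPER_BOUNDS):
        if num_chars < upper:
            return index
    return len(BUCKET_UPPER_BOUNDS)


def group_by_length(training_texts):
    """Group training texts by the bucket of their character count."""
    groups = defaultdict(list)
    for text in training_texts:
        groups[length_bucket(len(text))].append(text)
    return dict(groups)


groups = group_by_length(["a" * 10, "b" * 30, "c" * 75, "d" * 120, "e" * 25])
print({index: len(texts) for index, texts in sorted(groups.items())})
```

At judgment time the same `length_bucket` call on the text to be recognized would select which group's template to match against.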
Fig. 8 is a block diagram of a structure of a text feature determination device based on a neural network according to an embodiment of the present invention, where the text feature determination device is configured to execute the text feature determination method based on a neural network according to the foregoing embodiment, and has corresponding functional modules and beneficial effects of the execution method. As shown in fig. 8, the apparatus specifically includes: a data generation module 101, a training set acquisition module 102, a coordinate graph generation module 103, a template generation module 104, and a text feature determination module 105, wherein,
the data generating module 101 is configured to allocate identification information to an identification character and generate character identification association data, where the identification character includes at least one of a chinese character, a punctuation character, a numeric character, and an english character;
a training set obtaining module 102, configured to obtain a text training set, where the text training set includes multiple training texts and training text features corresponding to each training text, and each training text is composed of one or more identification characters;
a coordinate graph generating module 103, configured to generate a curve coordinate graph corresponding to each training text according to the character identifier association data;
the template generating module 104 is configured to use the curve coordinate graph as an input, use a corresponding training text feature as an output, and train by using a neural network to obtain a text feature judgment template;
the text feature determining module 105 is configured to obtain a text to be recognized, and determine a text feature by judging the text to be recognized according to the text feature judging template.
According to the scheme, identification information is allocated to identification characters and character identification association data is generated; a text training set is obtained, where the text training set includes a plurality of training texts and the training text features corresponding to each training text; a curve coordinate graph corresponding to each training text is generated according to the character identification association data; with the curve coordinate graph as input and the corresponding training text features as output, a neural network is trained to obtain a text feature judgment template; and a text to be recognized is obtained and judged according to the text feature judgment template to determine its text features. In this way, a neural network model is used for training, each identification character is associated with its allocated identification information, and the final judgment result is determined through comparison: the trained text feature judgment template is used to judge the features of the acquired text to be recognized. Text features can thus be judged efficiently and accurately, with higher efficiency than manual or other intelligent approaches.
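As a rough illustration of the training step, the sketch below fits a single-layer softmax network that takes a curve as input and a text feature label as output. It rests on several assumptions: this section does not specify the network architecture, the curve is represented here as a plain vector of occurrence counts, the text features are encoded as integer class labels, and the toy data and all names (`train`, `predict`, `curves`, `labels`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, biases, x):
    """Linear score of input vector x for every class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def train(curves, labels, n_classes, epochs=200, lr=0.1, seed=0):
    """Fit a single-layer softmax network with per-sample gradient descent."""
    rng = random.Random(seed)
    n_inputs = len(curves[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_inputs)]
               for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(curves, labels):
            probs = softmax(class_scores(weights, biases, x))
            for k in range(n_classes):
                # Gradient of the cross-entropy loss w.r.t. the class-k score.
                grad = probs[k] - (1.0 if k == y else 0.0)
                for j in range(n_inputs):
                    weights[k][j] -= lr * grad * x[j]
                biases[k] -= lr * grad
    return weights, biases


def predict(model, x):
    """Return the index of the highest-scoring class for curve x."""
    weights, biases = model
    scores = class_scores(weights, biases, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data (hypothetical): each curve is a vector of occurrence counts,
# label 0 might stand for one emotion feature and label 1 for another.
curves = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 2, 1]]
labels = [0, 0, 1, 1]
model = train(curves, labels, n_classes=2)
```

Calling `predict(model, curve)` on a curve built from a text to be recognized would then return the predicted label; a real system would use far more data and likely a deeper network.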
In a possible embodiment, the data generating module 101 is specifically configured to:
determining different identification characters with preset quantity, and allocating a unique identification serial number to each identification character;
and storing each identification character and the corresponding identification serial number in an associated manner to generate a character identification association table or a character identification association matrix.
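A character identification association table of this kind could be built as in the following sketch. The function name is hypothetical, and starting the serial numbers at 1 is an assumption; the text only requires that each identification character receive a unique serial number.

```python
def build_association_table(characters):
    """Map each distinct identification character to a unique serial number."""
    table = {}
    for ch in characters:
        if ch not in table:
            # Assumption: serial numbers start at 1 and follow first appearance.
            table[ch] = len(table) + 1
    return table


# Mixed Chinese, punctuation, numeric and English characters; the repeated
# character keeps the serial number it was given first.
table = build_association_table("好坏,!1a好")
```

A dictionary plays the role of the association table here; an association matrix would store the same pairs as rows.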
In a possible embodiment, the coordinate graph generating module 103 is specifically configured to:
determining each training identification character contained in the training text;
determining identification information corresponding to each training identification character according to the character identification association data;
and counting the identification information of each training identification character to generate a curve coordinate graph corresponding to the training text, wherein the abscissa of the curve coordinate graph is the identification information, and the ordinate is the occurrence frequency of each training identification character.
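The counting step above can be sketched as follows, representing the curve as a list whose positions follow the abscissa (identification serial number) and whose values are the ordinate (occurrence count). The function name and the sample table are hypothetical, and the sketch assumes the serial numbers run contiguously from 1 to the table size and that characters missing from the table are skipped.

```python
from collections import Counter


def curve_coordinates(text, table):
    """Return occurrence counts indexed by identification serial number.

    Position i holds the count for the character whose serial number is
    i + 1; characters that are not in the table are ignored.
    """
    counts = Counter(table[ch] for ch in text if ch in table)
    return [counts.get(serial, 0) for serial in range(1, len(table) + 1)]


table = {"a": 1, "b": 2, "c": 3}  # hypothetical association table
curve = curve_coordinates("abacab?", table)
```

Plotting the list index against the values would give the curve coordinate graph described in the text.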
In one possible embodiment, the text feature determination template includes:
different characteristic curve coordinate graphs and respective corresponding text features, wherein the text features comprise emotion features.
In a possible embodiment, the text feature determination module 105 is specifically configured to:
determining identification characters in the text to be recognized, and generating a corresponding curve coordinate graph according to the character identification association data;
comparing the curve coordinate graph corresponding to the text to be recognized with the characteristic curve coordinate graph in the text characteristic judgment template;
and determining the text characteristics of the text to be recognized according to the comparison result.
In a possible embodiment, the text feature determination module 105 is specifically configured to:
determining the similarity between a curve coordinate graph corresponding to the text to be recognized and a characteristic curve coordinate graph in the text characteristic judgment template;
and determining the characteristic curve coordinate graph with the highest similarity as a comparison result.
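The comparison step can be illustrated with the sketch below. The text does not name a similarity measure, so cosine similarity is used here purely as an assumption; the template layout (a list of feature curve and text feature pairs), the function names and the sample values are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(curve, template):
    """Return the (feature_curve, text_feature) entry most similar to curve."""
    return max(template, key=lambda entry: cosine_similarity(curve, entry[0]))


# Hypothetical template: characteristic curves paired with text features.
template = [([3, 0, 1], "positive"), ([0, 4, 1], "negative")]
feature_curve, feature = best_match([2, 0, 1], template)
```

The entry with the highest similarity serves as the comparison result, and its associated text feature is reported for the text to be recognized.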
In a possible embodiment, the data generating module 101 is further configured to: after a text training set is obtained, grouping training texts in the text training set according to the number of identification characters;
the template generating module 104 is specifically configured to:
and training the training texts in each group by using a neural network to obtain text characteristic judgment templates corresponding to different identification character numbers.
Fig. 9 is a schematic structural diagram of an apparatus according to an embodiment of the present invention, as shown in fig. 9, the apparatus includes a processor 201, a memory 202, an input device 203, and an output device 204; the number of the processors 201 in the device may be one or more, and one processor 201 is taken as an example in fig. 9; the processor 201, the memory 202, the input device 203 and the output device 204 in the apparatus may be connected by a bus or other means, and the connection by the bus is exemplified in fig. 9.
The memory 202 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the text feature determination method based on neural network in the embodiment of the present invention. The processor 201 executes various functional applications of the device and data processing by running software programs, instructions and modules stored in the memory 202, that is, implements the above-described text feature determination method based on a neural network.
The memory 202 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 202 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 202 may further include memory located remotely from the processor 201, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 203 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function controls of the apparatus. The output device 204 may include a display device such as a display screen.
Embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a method for determining text features based on a neural network, the method including:
allocating identification information to the identification characters to generate character identification association data, wherein the identification characters comprise at least one of Chinese characters, punctuation characters, numeric characters and English characters;
acquiring a text training set, wherein the text training set comprises a plurality of training texts and training text characteristics corresponding to each training text, and each training text is composed of one or more identification characters;
generating a curve coordinate graph corresponding to each training text according to the character identification association data;
taking the curve coordinate graph as input, taking corresponding training text characteristics as output, and training by utilizing a neural network to obtain a text characteristic judgment template;
and acquiring a text to be recognized, and judging the text to be recognized according to the text characteristic judging template to determine text characteristics.
From the above description of the embodiments, it is obvious for those skilled in the art that the embodiments of the present invention can be implemented by software and necessary general hardware, and certainly can be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device) perform the methods described in the embodiments of the present invention.
It should be noted that, in the embodiment of the text feature determination device based on a neural network, the included units and modules are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the embodiment of the invention.
It should be noted that the foregoing is only a preferred embodiment of the present invention and the technical principles applied. Those skilled in the art will appreciate that the embodiments of the present invention are not limited to the specific embodiments described herein, and that various obvious changes, adaptations, and substitutions are possible, without departing from the scope of the embodiments of the present invention. Therefore, although the embodiments of the present invention have been described in more detail through the above embodiments, the embodiments of the present invention are not limited to the above embodiments, and many other equivalent embodiments may be included without departing from the concept of the embodiments of the present invention, and the scope of the embodiments of the present invention is determined by the scope of the appended claims.

Claims (8)

1. The text feature judgment method based on the neural network is characterized by comprising the following steps of:
allocating identification information to identification characters and generating character identification association data, wherein the identification characters comprise at least one of Chinese characters, punctuation characters, numeric characters and English characters;
acquiring a text training set, wherein the text training set comprises a plurality of training texts and training text characteristics corresponding to each training text, and each training text is composed of one or more identification characters;
generating a curve coordinate graph corresponding to each training text according to the character identification association data, wherein each training identification character contained in the training text is determined, identification information corresponding to each training identification character is determined according to the character identification association data, the identification information of each training identification character is counted, and the curve coordinate graph corresponding to the training text is generated, the abscissa of the curve coordinate graph is the identification information, and the ordinate is the occurrence frequency of each training identification character;
taking the curve coordinate graph as input, taking corresponding training text characteristics as output, and training by utilizing a neural network to obtain a text characteristic judgment template;
the method comprises the steps of obtaining a text to be recognized, judging the text to be recognized according to the text characteristic judging template to determine text characteristics, wherein identification characters in the text to be recognized are determined, a corresponding curve coordinate graph is generated according to the character identification association data, the curve coordinate graph corresponding to the text to be recognized is compared with a characteristic curve coordinate graph in the text characteristic judging template, and the text characteristics of the text to be recognized are determined according to a comparison result.
2. The method according to claim 1, wherein said assigning identification information to the identification character and generating character identification association data comprises:
determining different identification characters with preset quantity, and allocating a unique identification serial number to each identification character;
and storing each identification character and the corresponding identification serial number in an associated manner to generate a character identification association table or a character identification association matrix.
3. The method according to any one of claims 1-2, wherein the text feature determination template comprises:
different characteristic curve coordinate graphs and respective corresponding text features, wherein the text features comprise emotional features.
4. The method according to claim 1, wherein comparing the curve coordinate graph corresponding to the text to be recognized with the characteristic curve coordinate graph in the text characteristic judgment template comprises:
determining the similarity between a curve coordinate graph corresponding to the text to be recognized and a characteristic curve coordinate graph in the text characteristic judgment template;
and determining the characteristic curve coordinate graph with the highest similarity as a comparison result.
5. The method of claim 1, after obtaining the text training set, further comprising:
grouping the training texts in the text training set according to the number of the identification characters;
correspondingly, the training by using the neural network to obtain the text feature judgment template includes:
and training the training texts in each group by using a neural network to obtain text characteristic judgment templates corresponding to different identification character numbers.
6. A text feature determination device based on a neural network includes:
the data generation module is used for distributing identification information for the identification characters and generating character identification association data, wherein the identification characters comprise at least one of Chinese characters, punctuation characters, numeric characters and English characters;
a training set obtaining module, configured to obtain a text training set, where the text training set includes multiple training texts and training text features corresponding to each training text, and each training text is composed of one or more identification characters;
a coordinate graph generating module, configured to generate a curve coordinate graph corresponding to each training text according to the character identification association data, where each training identification character contained in the training text is determined, identification information corresponding to each training identification character is determined according to the character identification association data, the identification information of each training identification character is counted, and a curve coordinate graph corresponding to the training text is generated, where an abscissa of the curve coordinate graph is the identification information, and an ordinate is the number of times that each training identification character appears;
the template generation module is used for taking the curve coordinate graph as input, taking the corresponding training text characteristic as output and training by utilizing a neural network to obtain a text characteristic judgment template;
the text feature determination module is used for acquiring a text to be recognized and judging the text to be recognized according to the text feature judgment template to determine text features, wherein identification characters in the text to be recognized are determined, a corresponding curve coordinate graph is generated according to the character identification association data, the curve coordinate graph corresponding to the text to be recognized is compared with the characteristic curve coordinate graph in the text feature judgment template, and the text features of the text to be recognized are determined according to the comparison result.
7. A text feature determination device based on a neural network, the text feature determination device comprising: one or more processors; storage means for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the neural network-based text feature determination method of any one of claims 1-5.
8. A storage medium containing computer-executable instructions for performing the neural network-based text feature determination method of any one of claims 1-5 when executed by a computer processor.
CN201911185402.2A 2019-11-27 2019-11-27 Text feature judgment method and device based on neural network and storage medium Active CN110895557B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911185402.2A CN110895557B (en) 2019-11-27 2019-11-27 Text feature judgment method and device based on neural network and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911185402.2A CN110895557B (en) 2019-11-27 2019-11-27 Text feature judgment method and device based on neural network and storage medium

Publications (2)

Publication Number Publication Date
CN110895557A CN110895557A (en) 2020-03-20
CN110895557B true CN110895557B (en) 2022-06-21

Family

ID=69788473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911185402.2A Active CN110895557B (en) 2019-11-27 2019-11-27 Text feature judgment method and device based on neural network and storage medium

Country Status (1)

Country Link
CN (1) CN110895557B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1158460A (en) * 1996-12-31 1997-09-03 复旦大学 Multiple languages automatic classifying and searching method
CN102216941A (en) * 2008-08-19 2011-10-12 数字标记公司 Methods and systems for content processing
CN104021115A (en) * 2014-06-13 2014-09-03 北京理工大学 Chinese comparative sentence recognizing method and device based on neural network
CN108920545A (en) * 2018-06-13 2018-11-30 四川大学 The Chinese affective characteristics selection method of sentiment dictionary and Ka Fang model based on extension
CN109101597A (en) * 2018-07-31 2018-12-28 中电传媒股份有限公司 A kind of electric power news data acquisition system
CN109947914A (en) * 2019-02-21 2019-06-28 扬州大学 A kind of software defect automatic question-answering method based on template
CN110096591A (en) * 2019-04-04 2019-08-06 平安科技(深圳)有限公司 Long text classification method, device, computer equipment and storage medium based on bag of words
CN110135414A (en) * 2019-05-16 2019-08-16 京北方信息技术股份有限公司 Corpus update method, device, storage medium and terminal

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488527B (en) * 2015-11-27 2020-01-10 小米科技有限责任公司 Image classification method and device
EP3557439A1 (en) * 2018-04-16 2019-10-23 Tata Consultancy Services Limited Deep learning techniques based multi-purpose conversational agents for processing natural language queries

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
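Title
"Multi-dimensional feature microblog sentiment analysis based on deep learning"; Jin Zhigang et al.; Journal of Central South University (Science and Technology); 2018-05-26; Vol. 49 (No. 05); 1135-1140 *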

Also Published As

Publication number Publication date
CN110895557A (en) 2020-03-20

Similar Documents

Publication Publication Date Title
Bansal et al. On predicting elections with hybrid topic based sentiment analysis of tweets
AU2017408800B2 (en) Method and system of mining information, electronic device and readable storable medium
US9348934B2 (en) Systems and methods for facilitating open source intelligence gathering
CN110457672B (en) Keyword determination method and device, electronic equipment and storage medium
CN107679144A (en) News sentence clustering method, device and storage medium based on semantic similarity
CN108846138B (en) Question classification model construction method, device and medium fusing answer information
CN111061838B (en) Text feature keyword determination method and device and storage medium
CN108920649B (en) Information recommendation method, device, equipment and medium
CN110147552B (en) Education resource quality evaluation mining method and system based on natural language processing
Liu et al. Mining urban perceptions from social media data
CN104077417A (en) Figure tag recommendation method and system in social network
US20140297628A1 (en) Text Information Processing Apparatus, Text Information Processing Method, and Computer Usable Medium Having Text Information Processing Program Embodied Therein
CN112651236B (en) Method and device for extracting text information, computer equipment and storage medium
CN111767393A (en) Text core content extraction method and device
CN111061837A (en) Topic identification method, device, equipment and medium
CN111984589A (en) Document processing method, document processing device and electronic equipment
CN113343108A (en) Recommendation information processing method, device, equipment and storage medium
CN114387061A (en) Product pushing method and device, electronic equipment and readable storage medium
CN111552798A (en) Name information processing method and device based on name prediction model and electronic equipment
CN111488501A (en) E-commerce statistical system based on cloud platform
CN113254651B (en) Method and device for analyzing referee document, computer equipment and storage medium
CN112417875B (en) Configuration information updating method and device, computer equipment and medium
CN116402166B (en) Training method and device of prediction model, electronic equipment and storage medium
CN110442696B (en) Query processing method and device
CN110895557B (en) Text feature judgment method and device based on neural network and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant