WO2018205178A1

WO2018205178A1 - Text exploration and measurement system and method

Info

Publication number: WO2018205178A1
Application number: PCT/CN2017/083848
Authority: WO
Inventors: 曹修源; 苏辛词
Original assignee: 曹修源; 苏辛词
Priority date: 2017-05-10
Filing date: 2017-05-10
Publication date: 2018-11-15

Abstract

A text exploration and measurement system (1) comprising: an automation system (110) that may obtain a first data set (111) to be compared, wherein the first data set (111) has at least one data component (112); a second data set (120), comprising a specific topic subset (122) of at least one specific topic set (121) and a weighting component (124), wherein the specific topic subset (122) has a feature component (123) corresponding to the content of the first data set (111); an analysis server (130) that is informationally connected to the first data set (111) and the second data set (120), carries out a comparison operation between the first data set (111) and the second data set (120), and performs a weighting operation on the basis of the results of the feature component (123) and the weighting component (124) corresponding to the at least one data component (112) so as to obtain a measurement reference value for the content of the at least one data component (112) in the at least one specific topic set (121) and/or the specific topic subset (122).

Description

Text exploration and measurement system and method

[Technical Field]

The present invention relates to a text exploration and measurement system and method, and more particularly to a system for exploring and measuring text on the Internet.

[Background Art]

At present, traditional market research companies cannot monitor the effectiveness of corporate advertising in real time. According to statistics, a typical advertising period runs 3 to 4 advertisements within 2 months, and a single advertisement affects consumers' perception of and attitude toward the advertised content within 2 weeks. A traditional market survey, however, must wait out the advertisement release period and then issue questionnaires, collect them, and perform statistical analysis. By the time these tasks are complete, the advertising cycle has passed, so the results can only be evaluated after the fact and the impact of the advertising content on consumer opinion cannot be understood in real time.

With Internet word of mouth and online media leading market trends, the research and analysis of big data from online communities is also regarded as a source of consumer opinion. However, advertisements used to be evaluated mostly on the number of website visits and the dwell time, and more recently mostly on view counts as the target; these evaluation bases and methods still suffer from the problems described above.

To address these shortcomings, some operators have developed various automated systems for evaluating online reviews. However, these analysis systems and methods merely assign a weighted score to each word in each online review and then aggregate all the scores into a single rough result. They cannot judge whether a word in the review is representative of the opinion expressed, or which facet of opinion the content belongs to. The evaluation results obtained in this way therefore do not truly present the perception and attitude of consumers.

Conventional text exploration and measurement systems and methods thus still leave considerable room for improvement. After careful study, the applicants have developed the present invention so that an Internet text exploration and measurement system and method can be more complete, more accurate, and easier to operate, and can more closely reflect the real state of the market and the effect of its reactions.

[Summary of the Invention]

The invention constructs a multi-faceted text data set to separate different word systems, so that the score of the meaning that each specific word carries in a text can be analyzed specifically, and uses a classification scheme of feature words and weighting words to distinguish whether a word expresses a target or a perception or attitude. This improves the validity of the survey results, truly reflects consumer opinion, and makes it possible to respond to consumer opinion in real time.

In one aspect, the present invention provides a text exploration and measurement system comprising: a first data set having at least one data component to be compared; a second data set comprising a specific topic subset of at least one specific topic set, and a weighting component, wherein the specific topic subset includes a feature component corresponding to the content of the first data set; and an analysis server that is informationally connected to the first data set and the second data set, performs a comparison step between the first data set and the second data set, and performs a weighting operation according to the results of the feature component and the weighting component corresponding to the at least one data component, to obtain a measurement reference value for the content of the at least one data component in the at least one specific topic set and/or the specific topic subset.

According to the above concept, the at least one data component comprises the result of an automation system.

According to the above concept, the automation system further includes a segmentation system that is informationally connected to the at least one data component and divides the at least one data component into at least one block to obtain a target word of the at least one block.

According to the above concept, the at least one specific topic set and/or the specific topic subset has its corresponding weighting component.

According to the above concept, the system further comprises a third data set, wherein the third data set includes the result of the weighting operation and the measurement reference value for the content of the at least one specific topic set and/or the specific topic subset.

According to the above concept, the second data set is preset, or the specific topic subset, the feature component, and/or the weighting component can be set by a user through a usage interface.

According to the above concept, the weighting component has a measurement reference value ranging between -5 and +5.

According to the above concept, the feature component is selected from the group consisting of academic journals, papers, questionnaires, market reports, interviews, and machine learning algorithms.

In another aspect, the present invention provides a method for verifying the validity of the text exploration and measurement system, comprising a step of statistically verifying the results obtained by the text exploration and measurement system against a comparison result.

According to the above concept, the statistical verification is selected from one or more statistical methods comprising a paired-sample t-test.

In another aspect, the present invention further provides a method for text exploration and measurement, comprising: step 1, obtaining, by an automation system, a first data set to be compared, wherein the first data set includes at least one data component; step 2, establishing a second data set, wherein the second data set comprises a specific topic subset of at least one specific topic set, and a weighting component, and the specific topic subset includes a feature component corresponding to the content of the first data set; step 3, performing, by an analysis server, a comparison step between the first data set and the second data set; and step 4, performing, by the analysis server, a weighting operation according to the results of the feature component and the weighting component corresponding to the data component, to obtain a measurement reference value for the content of the at least one data component in the at least one specific topic set and/or the specific topic subset.

According to the above concept, the comparison step comprises: a first comparison step of comparing the data component with the feature component, and a second comparison step of comparing the result of the first comparison step with the weighting component.

According to the above concept, the method further includes a step in which a segmentation system divides the at least one data component into at least one block and obtains a target word of the at least one block.

According to the above concept, there is further included a step: a user sets the specific topic set, the specific topic subset, the feature component, and/or the weighting component through a usage interface.

According to the above concept, the method further includes a step of establishing a third data set comprising the result of the weighting operation and the measurement reference value for the content of the at least one specific topic set and/or the specific topic subset.

According to the above concept, the method further includes a step of performing the foregoing verification method on the result of the third data set, adjusting the content of the feature component according to the result of the statistical verification, and performing the foregoing text exploration and measurement method again until the result of the statistical verification is statistically significant.

The following drawings and description of embodiments are provided so that those of ordinary skill in the art can more easily understand the spirit of the invention.

[Description of the Drawings]

FIG. 1 is a schematic diagram of an embodiment of the text exploration and measurement system of the present invention.

FIG. 2 is a schematic diagram of another embodiment of the text exploration and measurement system of the present invention.

FIG. 3 is a schematic diagram of an embodiment of a facet subset of a specific topic set of the present invention.

FIG. 4 is a schematic diagram of another embodiment of a facet subset of a specific topic set of the present invention.

FIG. 5 is a schematic diagram of an embodiment of the weighting values of the weighting component of the present invention.

FIG. 6 is a schematic diagram of an embodiment of the measurement reference values obtained for the data sentence content to be compared.

FIG. 7 is a schematic diagram of the text exploration and measurement method of the present invention.

[Detailed Description]

The present invention is described through the following embodiments so that those of ordinary skill in the art can understand the spirit of the invention and carry it out. The implementation of the present invention is not, however, limited to the following embodiments.

Please refer to FIG. 1, which is a schematic diagram of an embodiment of the text exploration and measurement system 1. As shown, in the present embodiment the text exploration and measurement system 1 includes an automation system 110, a second data set 120, an analysis server 130, and a usage interface 140. In an embodiment, the automation system 110 includes an exploration program 113 and/or a segmentation system 114, which may be a crawler program and/or a word segmentation system, and which collects text data from social network platforms such as Facebook, YouTube, PTT, and Twitter to obtain a first data set 111 to be compared. The second data set 120 includes a multi-level text data set 121 of a specific topic, a facet subset 122 of the specific topic, and a feature text component 123 and a weighted text component 124 used for comparison. The analysis server 130 is informationally connected to the data set 111 to be compared and the second data set 120. It performs a weighting operation according to the results of the feature text component 123 and the weighted text component 124 corresponding to the data sentence content 112 to be compared, assigns each target word in the data sentence content 112 a weighting value of -5 to +5 points, and thereby obtains a measurement reference value of the data sentence content 112 for the specific topic or for the different facets of the specific topic.

In another embodiment, the segmentation system 114 is informationally connected to the at least one data sentence content 112 to be compared, and divides the data sentence content 112 into a plurality of blocks to obtain the target words in those blocks.
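
The division into blocks and target words can be illustrated with a minimal sketch. It assumes that blocks are cut at sentence-ending punctuation and that a target word is either a run of Latin letters or digits or a single CJK character; the source does not specify the segmentation rules, and a production Chinese-English segmenter would group characters into multi-character words. The names `split_into_blocks` and `target_words` are hypothetical.

```python
import re

# ASSUMPTION: blocks end at ASCII or full-width sentence punctuation.
_BLOCK_BOUNDARY = re.compile(r"[.!?;。！？；]+")
# ASSUMPTION: a token is a run of ASCII letters/digits (optionally with an
# inner apostrophe) or one CJK character. A real segmenter would do better.
_TOKEN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[\u4e00-\u9fff]")


def split_into_blocks(sentence_content: str) -> list:
    """Divide one data sentence content into non-empty blocks."""
    pieces = _BLOCK_BOUNDARY.split(sentence_content)
    return [p.strip() for p in pieces if p.strip()]


def target_words(block: str) -> list:
    """Return the lower-cased candidate target words of one block."""
    return [t.lower() for t in _TOKEN.findall(block)]


if __name__ == "__main__":
    for block in split_into_blocks("The ad is great! I want to buy it."):
        print(block, "->", target_words(block))
```

The output of `target_words` is what a later comparison step would match against the feature and weighted text components.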

In one embodiment, the feature text component 123 is selected from word components validated in national journals and papers (e.g., the 42 brand personality trait words of Aaker, 1997) together with their translations into the national language, or from topic-related words drawn from national journals and papers that are then subjected to quantitative (questionnaire) and qualitative (expert interview) study to obtain the word components.

In one embodiment, the multi-level text data set 121 of a specific topic includes a facet subset 122 of the specific topic; in one embodiment, the second data set 120 contains both Chinese and English text.

In an embodiment, the second data set 120 includes a language dictionary database, a slang database, and a self-built language database.

In an embodiment, the analysis server 130 may apply a machine learning algorithm to give weight values.

In another embodiment, the multi-level text data set 121 of the specific topic is preset, or the multi-level text data set 121, its facet subset 122, and the feature text component 123 or the weighted text component 124 can be set or built by the user through the usage interface 140.

Please refer to FIG. 2, which is a schematic diagram of another embodiment, the text exploration and measurement system 2 of the present invention. As shown, in an embodiment, the second data set 220 includes a specific topic set 221 and a weighted text component 223 corresponding to the specific topic set 221; the specific topic set 221 includes a facet subset 222 of the specific topic and a weighted text component 224 corresponding to the facet subset 222.
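
One way to picture the two levels of weighted text components in FIG. 2 is the sketch below. It is illustrative only: the source states that the topic set and the facet subset each have a corresponding weighted text component, but it does not say how the two interact. The sketch assumes that a facet-level weight, when present, overrides the topic-level weight, and that unknown words score 0. The class names, the `weight_of` method, and the example words are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FacetSubset:
    """Facet subset 222 with its own weighted text component 224."""
    name: str
    weights: Dict[str, int] = field(default_factory=dict)


@dataclass
class TopicSet:
    """Specific topic set 221 with its weighted text component 223."""
    name: str
    weights: Dict[str, int] = field(default_factory=dict)
    facets: Dict[str, FacetSubset] = field(default_factory=dict)

    def weight_of(self, word: str, facet: Optional[str] = None) -> int:
        """Look up a word's weight, preferring the facet-level component.

        ASSUMPTION: facet-level weights override topic-level weights and
        unknown words score 0; the source does not specify either rule.
        """
        if facet is not None and facet in self.facets:
            facet_weights = self.facets[facet].weights
            if word in facet_weights:
                return facet_weights[word]
        return self.weights.get(word, 0)


if __name__ == "__main__":
    topic = TopicSet(
        name="brand",
        weights={"cheap": -1},
        facets={"Price": FacetSubset("Price", {"cheap": 3})},
    )
    print(topic.weight_of("cheap", "Price"), topic.weight_of("cheap"))
```

Under these assumptions the same word can carry a different weight depending on the facet in which it is evaluated, which is one plausible reason for keeping separate components 223 and 224.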

In an embodiment, the feature text component 225 is constructed according to the different facets of different cultures and different industries; in an embodiment, users can build the feature text component 225 themselves.

In another embodiment, the text exploration and measurement system further includes a comparison result data set 240 produced by the comparison that the analysis server 230 performs, which includes the results of the weighting operation and the measurement reference values of the data sentence content 212 to be compared for a specific topic or its different facets.

Please refer to FIG. 3, which is a schematic diagram of an embodiment of a facet subset of a specific topic set of the present invention. As shown, in one embodiment, a specific topic set can be subdivided into a number of facet subsets. For example, the facet items for marketing effectiveness include Action, Awareness, Desire, Excited, Happy, and so on, and each facet item has corresponding feature text components under it.
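
The relationship between facet items and their feature text components can be sketched as a simple lookup structure. The facet names below follow FIG. 3, but the feature words listed under them are invented examples, and the helper names `build_feature_index` and `facets_of` are hypothetical; the source does not prescribe any particular data structure.

```python
from collections import defaultdict

# Facet names follow FIG. 3; the feature words are invented placeholders.
FACETS = {
    "Action": {"buy", "order", "purchase"},
    "Awareness": {"saw", "heard", "noticed"},
    "Desire": {"want", "wish"},
}


def build_feature_index(facets):
    """Invert {facet: feature words} into {feature word: set of facets}."""
    index = defaultdict(set)
    for facet, words in facets.items():
        for word in words:
            index[word.lower()].add(facet)
    return dict(index)


def facets_of(words, index):
    """Return the facets whose feature words occur among the given words."""
    hit = set()
    for word in words:
        hit |= index.get(word.lower(), set())
    return hit


if __name__ == "__main__":
    index = build_feature_index(FACETS)
    print(sorted(facets_of(["I", "want", "to", "buy", "it"], index)))
```

An inverted index of this kind lets each target word be resolved to its facets in a single dictionary lookup, which matters when many sentences are compared against many facets.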

Please refer to FIG. 4, which is a schematic diagram of another embodiment of a facet subset of a specific topic set of the present invention. The facet subsets and feature text components are built in three steps: extracting keywords from journals, papers, and questionnaires; gathering topic-related keywords from focus interviews; and obtaining keywords or popular online terms through machine learning algorithms.

Please refer to FIG. 5 , which is a schematic diagram of an embodiment of weighting values of the weighting component of the present invention. As shown, the different weighting components represent different weighting values, respectively.

In one embodiment, each target word in the data sentence content to be compared is assigned a weighting value of -5 to +5 points, with the polarity subdivided proportionally into graded levels in the manner of a Likert scale.
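
The graded polarity described above can be sketched as a mapping from the -5 to +5 range onto a small number of labeled levels. The five labels and the cut points between them are assumptions made for illustration; the source gives only the range and the comparison to a Likert scale. The function names are hypothetical.

```python
def clamp_weight(value: int) -> int:
    """Keep a weighting value inside the -5..+5 range given in the text."""
    return max(-5, min(5, value))


def polarity_label(value: int) -> str:
    """Map a -5..+5 weighting value to a five-level, Likert-style label.

    ASSUMPTION: the cut points below are illustrative only; the source
    does not define where one level ends and the next begins.
    """
    v = clamp_weight(value)
    if v <= -4:
        return "strongly negative"
    if v <= -1:
        return "negative"
    if v == 0:
        return "neutral"
    if v <= 3:
        return "positive"
    return "strongly positive"


if __name__ == "__main__":
    for w in (-5, -2, 0, 3, 5):
        print(w, polarity_label(w))
```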

Please refer to FIG. 6, which is a schematic diagram of an embodiment of the measurement reference values of the data sentence content to be compared. In an embodiment, the measurement reference value may represent the score of the data sentence content to be compared for a specific topic or for the different facets of the specific topic.

Please refer to FIG. 7, which is a schematic diagram of the text exploration and measurement method of the present invention. In an embodiment, the method includes: step S101, obtaining, by the crawler program, a data set to be compared that includes at least one data sentence content to be compared; step S102, establishing a text database that includes a multi-level text data set of a specific topic, a facet subset of the specific topic, a feature text component, and a weighted text component; step S103, dividing, by the Chinese-English word segmentation system, the at least one data sentence content to be compared into one or more blocks to obtain the target word of at least one block; step S104, comparing, by the analysis server, the data set to be compared with the multi-level text data set of the specific topic; and step S105, performing, by the analysis server, a weighting operation according to the results of the feature component and the weighting component corresponding to the at least one data sentence content to be compared, to obtain a measurement reference value of the data sentence content for the specific topic and/or its different facets.

In an embodiment, step S104 further includes step S1041, comparing the data sentence content to be compared with the feature text component, and step S1042, comparing the comparison result of step S1041 with the weighted text component.
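
Steps S1041, S1042, and the subsequent weighting operation can be sketched as follows. The source does not define how a feature match and a weight match are combined, so the sketch makes two assumptions: a sentence counts toward a facet when it contains at least one of that facet's feature words, and every weighted word in that sentence then contributes its weight, by summation, to each matched facet. All function names and example words are hypothetical.

```python
from collections import defaultdict


def first_comparison(target_words, feature_words_by_facet):
    """S1041: find which facets a sentence touches via its feature words."""
    words = set(target_words)
    return {
        facet
        for facet, features in feature_words_by_facet.items()
        if words & features
    }


def second_comparison(target_words, matched_facets, weights):
    """S1042 plus weighting: score each facet matched in S1041.

    ASSUMPTION: every weighted word in the sentence contributes to every
    matched facet, and contributions are summed.
    """
    sentence_score = sum(weights.get(w, 0) for w in target_words)
    return {facet: sentence_score for facet in matched_facets}


def measure(sentences, feature_words_by_facet, weights):
    """Accumulate a per-facet measurement reference value over sentences."""
    totals = defaultdict(int)
    for words in sentences:
        facets = first_comparison(words, feature_words_by_facet)
        for facet, score in second_comparison(words, facets, weights).items():
            totals[facet] += score
    return dict(totals)


if __name__ == "__main__":
    features = {"Desire": {"want"}, "Action": {"buy"}}
    weights = {"love": 4, "boring": -3}   # values within -5..+5
    sentences = [
        ["i", "love", "it", "and", "want", "one"],
        ["boring", "would", "not", "buy"],
        ["nice", "weather"],              # matches no facet, so ignored
    ]
    print(measure(sentences, features, weights))
```

Because the first comparison filters by facet before any weight is applied, sentences that contain opinion words but no feature word of the topic do not affect the result, which is the distinction the background section draws against systems that score every word.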

In another embodiment, the data sentence content to be compared includes the results of the crawler program and/or the word segmentation system.

In an embodiment, the multi-level text data set of a particular topic and the facet sub-set of a particular topic have their corresponding weighted text components.

In another embodiment, the text exploration and measurement method further includes a step of setting the text data set of a specific topic, the facet subset of the specific topic, the feature text component, or the weighted text component through the usage interface.

In one embodiment, the text exploration and measurement method further comprises a step of establishing a comparison result data set that includes the result of the weighting operation and the measurement reference values of the data sentence content to be compared for a specific topic and/or its different facets.

In another embodiment, the text exploration and measurement method further comprises a step of statistically verifying the comparison result data set, correcting the content of the feature text component according to the result of the statistical verification, and performing the text exploration and measurement method and the statistical verification again until the result of the statistical verification is statistically significant.
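
The repeat-until-significant procedure can be sketched as a loop over three pluggable operations. The callables `measure`, `verify`, and `adjust` are hypothetical placeholders for the measurement method, the statistical verification, and the correction of the feature text component; the source does not describe how the correction is chosen. The `max_rounds` limit is an added safeguard and is not part of the source.

```python
def refine_until_significant(feature_words, measure, verify, adjust,
                             max_rounds=10):
    """Repeat measure -> verify -> adjust until verify reports significance.

    All three callables are hypothetical placeholders:
      measure(feature_words)          -> results
      verify(results)                 -> True when the statistical check passes
      adjust(feature_words, results)  -> revised feature words
    max_rounds is an added safety limit, not part of the source.
    """
    for round_no in range(1, max_rounds + 1):
        results = measure(feature_words)
        if verify(results):
            return feature_words, results, round_no
        feature_words = adjust(feature_words, results)
    raise RuntimeError("no statistically significant result within max_rounds")


if __name__ == "__main__":
    # Toy stand-ins: "significant" once three feature words are present.
    words, result, rounds = refine_until_significant(
        {"want"},
        measure=lambda fw: len(fw),
        verify=lambda r: r >= 3,
        adjust=lambda fw, r: fw | {"word%d" % len(fw)},
    )
    print(sorted(words), result, rounds)
```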

The present invention further provides a method of verifying the validity of the text exploration and measurement system. In one embodiment, the method includes a step of comparing the results of the weighting operation of the text exploration and measurement system, and the measurement reference values of the data sentence content to be compared for a specific topic and/or its different facets, with the corresponding results of a conventional questionnaire, and performing a statistical verification such as a paired-sample t-test to confirm validity.
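
The paired-sample t-test named above can be sketched with the standard formula t = mean(d) / (s_d / sqrt(n)), where d is the list of per-item differences between the system's measurement reference values and the questionnaire results, s_d is their sample standard deviation, and n is the number of pairs. The sketch returns only the statistic and the degrees of freedom; how the two score lists are paired (per facet, per advertisement, or otherwise) and which significance threshold is applied are not stated in the source and are left to the caller. The function name and the example numbers are hypothetical.

```python
import math
import statistics


def paired_t_statistic(system_scores, questionnaire_scores):
    """Return (t, degrees of freedom) for a paired-sample t-test."""
    if len(system_scores) != len(questionnaire_scores):
        raise ValueError("paired samples must have the same length")
    diffs = [s - q for s, q in zip(system_scores, questionnaire_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)      # sample standard deviation
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Invented example values, one pair per measured item.
    t, df = paired_t_statistic([3, 4, 5, 6], [2, 4, 3, 5])
    print(round(t, 3), df)
```

The resulting statistic would be compared with a critical value of the t distribution at the returned degrees of freedom, taken from a table or a statistics library.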

The above description covers only preferred embodiments of the present invention and is not intended to limit its scope of implementation; any changes and modifications made by those of ordinary skill in the art without departing from the spirit and scope of the present invention remain within the scope of the claims.

[Description of Reference Numerals]

1 text exploration and measurement system

2 text exploration and measurement system

100 method

S101, S102, S103, S104, S105, S1041, S1042 steps

110 automation system

111 data set to be compared

112 data sentence content to be compared

113 exploration program

114 segmentation system

120 text database

121 text data set of a specific topic

122 facet subset of a specific topic

123 feature text component

124 weighted text component

130 analysis server

140 usage interface

210 automation system

211 data set to be compared

212 data sentence content to be compared

220 text database

221 text data set of a specific topic

222 facet subset of a specific topic

223 weighted text component

224 weighted text component

225 feature text component

230 analysis server

240 comparison result data set

Claims

1. A text exploration and measurement system comprising:

a first data set having at least one data component to be compared;

a second data set comprising a specific topic subset of at least one specific topic set, and a weighting component, wherein the specific topic subset includes a feature component corresponding to the content of the first data set; and

an analysis server that is informationally connected to the first data set and the second data set, performs a comparison step between the first data set and the second data set, and performs a weighting operation according to the results of the feature component and the weighting component corresponding to the at least one data component, to obtain a measurement reference value for the content of the at least one data component in the at least one specific topic set and/or the specific topic subset.
2. The text exploration and measurement system of claim 1, wherein the at least one data component comprises a result of an automation system.
3. The text exploration and measurement system of claim 2, wherein the automation system further comprises a segmentation system that is informationally connected to the at least one data component and divides the at least one data component into at least one block to obtain a target word of the at least one block.
4. The text exploration and measurement system of claim 1, wherein the at least one specific topic set and/or the specific topic subset has its corresponding weighting component.
5. The text exploration and measurement system of claim 1, further comprising a third data set, wherein the third data set includes the result of the weighting operation and the measurement reference value for the content of the at least one specific topic set and/or the specific topic subset.
6. The text exploration and measurement system of claim 1, wherein the second data set is preset, or the specific topic subset, the feature component, and/or the weighting component can be set by a user through a usage interface.
7. The text exploration and measurement system of claim 1, wherein the weighting component has a measurement reference value ranging between -5 and +5.
8. The text exploration and measurement system of claim 1, wherein the feature component is selected from the group consisting of academic journals, papers, questionnaires, market reports, interviews, and machine learning algorithms.
9. A method of verifying the validity of the text exploration and measurement system according to any of claims 1-8, comprising a step of statistically verifying the results obtained by the text exploration and measurement system against a comparison result.
10. The method of verifying according to claim 9, wherein the statistical verification is selected from one or more statistical methods comprising a paired-sample t-test.
11. A method of text exploration and measurement, comprising:

Step 1: obtaining, by an automation system, a first data set to be compared, wherein the first data set includes at least one data component;

Step 2: establishing a second data set, wherein the second data set includes a specific topic subset of at least one specific topic set, and a weighting component, and the specific topic subset includes a feature component corresponding to the content of the first data set;

Step 3: performing, by an analysis server, a comparison step between the first data set and the second data set; and

Step 4: performing, by the analysis server, a weighting operation according to the results of the feature component and the weighting component corresponding to the data component, to obtain a measurement reference value for the content of the at least one data component in the at least one specific topic set and/or the specific topic subset.
12. The method of claim 11, wherein the at least one data component comprises the result of an automation system.
13. The method of claim 11, wherein the at least one specific topic set and/or the specific topic subset has its corresponding weighting component.
14. The method of any of claims 11-13, wherein the comparison step comprises:

a first comparison step of comparing the data component with the feature component; and

a second comparison step of comparing the result of the first comparison step with the weighting component.
15. The method according to any of claims 11-13, further comprising a step in which:

a segmentation system divides the at least one data component into at least one block and obtains a target word of the at least one block.
16. The method according to any of claims 11-13, further comprising a step in which:

a user sets the specific topic set, the specific topic subset, the feature component, and/or the weighting component through a usage interface.
17. The method according to any of claims 11-13, further comprising a step in which:

a third data set is established that includes the result of the weighting operation and the measurement reference value for the content of the at least one specific topic set and/or the specific topic subset.
18. The method of claim 17, further comprising a step of:

performing the method according to claim 9 or 10 on the result of the third data set, adjusting the content of the feature component according to the result of the statistical verification, and performing the method according to claim 11 again until the result of the statistical verification is statistically significant.