CN107016050A - Data processing method and device - Google Patents

Publication number
CN107016050A
Authority
CN
China
Prior art keywords
data
target
type
target data
data type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710106370.7A
Other languages
Chinese (zh)
Other versions
CN107016050B (en
Inventor
杨柳
何伟
胡红艳
索娟
李雅洁
高阳
马斌
李志刚
王天军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Xinjiang Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Xinjiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC and Information and Telecommunication Branch of State Grid Xinjiang Electric Power Co Ltd
Priority to CN201710106370.7A
Publication of CN107016050A
Application granted
Publication of CN107016050B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method and device. The method includes: obtaining pending data, where the pending data is data that reflects the work order information of a target object and the data types of the pending data include a target data type; determining the target data dividing mode corresponding to the target data type; and dividing the data whose data type is the target data type according to the target data dividing mode. The invention solves the technical problem in the related art that processing multiple types of data uniformly leads to low data processing efficiency.

Description

Data processing method and device
Technical field
The present invention relates to the field of data processing, and in particular to a data processing method and device.
Background
As enterprises have come to better understand and adjust their core competitiveness, customer service capability has become one of their core values, and the customer service center has emerged accordingly. A customer service center, also known as a call center or telemarketing center, is a complete, integrated information service system that is based on CTI (Computer Telephony Integration) technology, makes full use of the combined functions of the communication network and the computer network, and is integrated with the enterprise. The customer service center is the direct window for communication between the enterprise and its customers, and the information-exchange data produced during that communication plays a very important coordinating role in the enterprise's sales, scheduling, management, personnel assessment, and value-added services.
Therefore, to make effective use of the information-exchange data produced during communication, the data must be processed so that it can be analyzed and the useful information in it extracted.
In the prior art, the information-exchange data produced during communication is usually processed uniformly, in the chronological order in which the exchanges occurred. However, the collected data also includes image and voice data, and the common approach is to store each kind of data independently and manage it in a decentralized way, which creates "data silos" and hinders data processing and utilization.
No effective solution has yet been proposed for the problem that uniform processing of multiple types of data in the related art leads to low data processing efficiency.
Summary of the invention
The embodiments of the present invention provide a data processing method and device, to at least solve the technical problem in the related art that processing multiple types of data uniformly leads to low data processing efficiency.
According to one aspect of the embodiments of the present invention, a data processing method is provided, including: obtaining pending data, where the pending data is data that reflects the work order information of a target object and the data types of the pending data include at least a target data type; determining the target data dividing mode corresponding to the target data type; and dividing the data whose data type is the target data type according to the target data dividing mode.
Further, the target data type includes at least one of the following: image format data; voice format data; structured text format data.
Further, when the target data type is image format data, the target data dividing mode is to cut the image format data by geometric shape; when the target data type is voice format data, the target data dividing mode is to merge voice format data whose data volume is less than a predetermined threshold; and when the target data type is structured text format data, the target data dividing mode is to split the data table corresponding to the structured text format data.
Further, when the target data type is voice format data, dividing the data whose data type is the target data type according to the target data dividing mode includes: obtaining the data volume of the voice format data; judging whether the data volume of the voice format data is less than the predetermined threshold; if it is, determining the voice format data to be voice format data to be merged; and merging the voice format data to be merged.
Further, merging the voice format data to be merged includes: performing the following merge operation on the voice format data to be merged to obtain a voice format data block, until the data volume of the voice format data block is not less than the predetermined threshold, where the voice format data to be merged is marked as the current voice format data when the merge operation is performed: merging the current voice format data into the voice format data block; judging whether the data volume of the voice format data block is less than the predetermined threshold; and, if it is, taking the next voice format data as the current voice format data.
Further, after the data whose data type is the target data type is divided according to the target data dividing mode, the method also includes: storing the target data blocks obtained by the division in a target database.
Further, after the target data blocks obtained by the division are stored in the target database, the method also includes: setting, in the target database, a target index mode for the data whose data type is the target data type.
According to another aspect of the embodiments of the present invention, a data processing device is also provided, including: an acquiring unit, for obtaining pending data, where the pending data is data that reflects the work order information of a target object and the data types of the pending data include at least a target data type; a determining unit, for determining the target data dividing mode corresponding to the target data type; and a division unit, for dividing the data whose data type is the target data type according to the target data dividing mode.
Further, the target data type includes at least one of the following: image format data; voice format data; structured text format data.
Further, the device includes: an image division module, for the case where the target data type is image format data, in which the target data dividing mode is to cut the image format data by geometric shape; a voice division module, for the case where the target data type is voice format data, in which the target data dividing mode is to merge voice format data whose data volume is less than a predetermined threshold; and a text division module, for the case where the target data type is structured text format data, in which the target data dividing mode is to split the data table corresponding to the structured text format data.
Further, when the target data type is voice format data, the division unit includes: an acquisition module, for obtaining the data volume of the voice format data; a judging module, for judging whether the data volume of the voice format data is less than the predetermined threshold; a determining module, for determining the voice format data to be voice format data to be merged when its data volume is less than the predetermined threshold; and a merging module, for merging the voice format data to be merged.
Further, the merging module performs the following merge operation on the voice format data to be merged to obtain a voice format data block, until the data volume of the voice format data block is not less than the predetermined threshold, where the voice format data to be merged is marked as the current voice format data when the merge operation is performed. The merging module includes: a merging submodule, for merging the current voice format data into the voice format data block; a judging submodule, for judging whether the data volume of the voice format data block is less than the predetermined threshold; and a determining submodule, for taking the next voice format data as the current voice format data when the data volume of the voice format data block is less than the predetermined threshold.
Further, the device also includes a storage module that operates after the division unit, for storing in a target database the target data blocks obtained by dividing the data whose data type is the target data type.
Further, the device also includes an index module that operates after the storage module, for setting, in the target database, a target index mode for the data whose data type is the target data type.
In the embodiments of the present invention, pending data that reflects the work order information of a target object is obtained together with its target data type; the target data dividing mode corresponding to that target data type is determined; and the data whose data type is the target data type is then divided according to the target data dividing mode. With the present invention, each type of data is processed separately according to its own dividing mode, so that different types of data receive different processing. This improves data processing efficiency and thereby solves the technical problem in the related art that processing multiple types of data uniformly leads to low data processing efficiency.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of an optional data processing method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an optional Hadoop cluster environment according to an embodiment of the present invention;
Fig. 3(a) is a schematic diagram of an optional horizontal cutting of image format data according to an embodiment of the present invention;
Fig. 3(b) is a schematic diagram of an optional vertical cutting of image format data according to an embodiment of the present invention;
Fig. 3(c) is a schematic diagram of an optional rectangular block cutting of image format data according to an embodiment of the present invention;
Fig. 3(d) is a schematic diagram of an optional irregular cutting of image format data according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of an optional way of merging voice format data according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of an optional index mode for image format data according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of an optional way of storing work order information data according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of an optional data processing device according to an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments, without creative effort, shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and do not describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
According to an embodiment of the present invention, an embodiment of a data processing method is provided. It should be noted that the steps shown in the flowcharts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described can be executed in a different order.
Fig. 1 is a flowchart of an optional data processing method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: obtain pending data, where the pending data is data that reflects the work order information of a target object and the data types of the pending data include at least a target data type;
Step S104: determine the target data dividing mode corresponding to the target data type;
Step S106: divide the data whose data type is the target data type according to the target data dividing mode.
Through the above steps, pending data that reflects the work order information of a target object is obtained together with its target data type; the target data dividing mode corresponding to that target data type is determined; and the data whose data type is the target data type is then divided according to the target data dividing mode. With the present invention, each type of data is processed separately according to its own dividing mode, so that different types of data receive different processing. This improves data processing efficiency and thereby solves the technical problem in the related art that processing multiple types of data uniformly leads to low data processing efficiency.
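Steps S102 to S106 amount to a lookup from data type to dividing mode followed by a per-type call. The following is a minimal sketch of that control flow only; the type labels, the function names, and the placeholder dividers are all hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type labels; the source names the three formats but no identifiers.
IMAGE, VOICE, TEXT = "image", "voice", "structured_text"


def divide_image(items: List[str]) -> List[str]:
    # Placeholder standing in for cutting by geometric shape.
    return [f"image-part:{x}" for x in items]


def divide_voice(items: List[str]) -> List[str]:
    # Placeholder standing in for merging small recordings into one block.
    return ["voice-block:" + "+".join(items)] if items else []


def divide_text(items: List[str]) -> List[str]:
    # Placeholder standing in for splitting a data table.
    return [f"sub-table:{x}" for x in items]


# S104: each target data type maps to its own dividing mode.
DIVIDERS: Dict[str, Callable[[List[str]], List[str]]] = {
    IMAGE: divide_image,
    VOICE: divide_voice,
    TEXT: divide_text,
}


def process(pending: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """S102: take (data_type, item) pairs; S104: look up the mode; S106: divide."""
    by_type: Dict[str, List[str]] = {}
    for data_type, item in pending:
        by_type.setdefault(data_type, []).append(item)
    return {t: DIVIDERS[t](items) for t, items in by_type.items()}
```

Keeping the mapping in one table means a new data type only needs a new entry, which is one way to read the patent's point that each type gets its own processing.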
In the scheme provided by step S102, the pending data is data that reflects the work order information of a target object. Work order information is information that conveys work instructions or actions within an enterprise. For example, when an enterprise communicates with a customer through its customer service center, the communication produces data such as customer voice files, screenshots of error interfaces, work order description text, and customer feedback text. This data is the work order information data, and the object that sends it is the target object.
In an optional embodiment, the pending data includes at least a target data type, and the target data type can be a predetermined data format. For example, the target data type can be image format data, structured text format data, or voice format data.
In the scheme provided by step S104, different target data types correspond to different target data dividing modes, so the dividing mode can be determined from the type of the target data. For example, when the target data type is image format data, the corresponding target data dividing mode can be horizontal cutting.
In the scheme provided by step S106, the data whose data type is the target data type is divided according to the target data dividing mode corresponding to that target data type.
As an optional embodiment, the target data type can include at least one of the following: image format data; voice format data; structured text format data. With the present invention, the target data type is determined from the format of the pending data. Data in different formats has different characteristics and therefore calls for different processing, so determining the target data type from the data format makes it possible to set a dividing mode for each format and to match data of each target data type with its dividing mode.
As an optional embodiment, when the target data type is image format data, the target data dividing mode is to cut the image format data by geometric shape; when the target data type is voice format data, the target data dividing mode is to merge voice format data whose data volume is less than a predetermined threshold; and when the target data type is structured text format data, the target data dividing mode is to split the data table corresponding to the structured text format data. With the present invention, a different data processing mode can be determined for each target data type, which makes different types of data easier to handle and improves processing efficiency.
Optionally, when the target data type is image format data, the image format data can be divided into multiple geometric pieces by horizontal cutting, vertical cutting, rectangular block cutting, or irregular cutting, so that data with a large information capacity becomes multiple pieces with smaller information capacity, which facilitates processing.
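The three regular cutting modes can be sketched on a decoded image held as rows of pixel values. This is only an illustration under assumptions: the `Image` alias, the function names, and the choice of near-equal parts are hypothetical, the part counts are assumed not to exceed the image dimensions, and irregular cutting is left out because the source does not define it.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values; a stand-in for decoded image data


def cut_horizontal(img: Image, parts: int) -> List[Image]:
    """Split the image into `parts` horizontal strips (groups of whole rows)."""
    height = len(img)
    bounds = [height * i // parts for i in range(parts + 1)]
    return [img[bounds[i]:bounds[i + 1]] for i in range(parts)]


def cut_vertical(img: Image, parts: int) -> List[Image]:
    """Split the image into `parts` vertical strips (groups of whole columns)."""
    width = len(img[0])
    bounds = [width * i // parts for i in range(parts + 1)]
    return [[row[bounds[i]:bounds[i + 1]] for row in img] for i in range(parts)]


def cut_blocks(img: Image, rows: int, cols: int) -> List[Image]:
    """Rectangular block cutting: cut into strips, then cut each strip vertically."""
    return [
        block
        for strip in cut_horizontal(img, rows)
        for block in cut_vertical(strip, cols)
    ]
```

For a 4 x 4 image, `cut_blocks(img, 2, 2)` yields four 2 x 2 blocks in row-major order, each of which could be handled by a separate worker.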
Optionally, when the target data type is structured text format data, the structured text format data can be stored in a corresponding data table, and that data table can then be split into multiple sub-tables, so that data with a large information capacity becomes multiple pieces with smaller information capacity, which facilitates processing.
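The source does not say whether the table is split by rows or by columns. The sketch below assumes a row-wise split with a fixed maximum number of rows per sub-table; `split_table`, `max_rows`, and the column names in the usage note are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]  # one record of the data table, keyed by column name


def split_table(rows: List[Row], max_rows: int) -> List[List[Row]]:
    """Split a data table into sub-tables holding at most `max_rows` rows each."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]
```

For example, five rows of the form `{"work_order_no": "WO-1", "description": "..."}` split with `max_rows=2` give sub-tables of two, two, and one rows, and the original row order is preserved.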
As an optional embodiment, when the target data type is voice format data, dividing the data whose data type is the target data type according to the target data dividing mode can include: obtaining the data volume of the voice format data; judging whether the data volume of the voice format data is less than a predetermined threshold; if it is, determining the voice format data to be voice format data to be merged; and merging the voice format data to be merged. With the present invention, voice format data whose data volume is below the predetermined threshold is merged into voice format data whose data volume is not below the threshold. This consolidates small voice format data, reduces the number of voice format data items, and facilitates processing.
As an optional embodiment, merging the voice format data to be merged includes: performing the following merge operation on the voice format data to be merged to obtain a voice format data block, until the data volume of the voice format data block is not less than the predetermined threshold, where the voice format data to be merged is marked as the current voice format data when the merge operation is performed: merging the current voice format data into the voice format data block; judging whether the data volume of the voice format data block is less than the predetermined threshold; and, if it is, taking the next voice format data as the current voice format data. With the present invention, voice format data below the predetermined threshold is merged into voice format data blocks that reach the threshold, which consolidates small voice format data, reduces the number of voice format data items, and facilitates processing.
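The merge operation described above can be sketched as a single pass that accumulates small recordings into the current block until the block reaches the threshold. The function name and the `(name, size)` input are hypothetical, and two details are assumptions the source leaves open: recordings already at or above the threshold are kept as their own block, and a trailing block that never reaches the threshold is still emitted.

```python
from typing import List, Tuple


def merge_voice_files(files: List[Tuple[str, int]], threshold: int) -> List[List[str]]:
    """Group (name, size) recordings into blocks whose total size reaches `threshold`."""
    blocks: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        if size >= threshold:
            # Not below the threshold, so not a merge candidate (assumption: kept alone).
            blocks.append([name])
            continue
        # Merge the current recording into the block being built.
        current.append(name)
        current_size += size
        # Stop extending this block once it is no longer below the threshold.
        if current_size >= threshold:
            blocks.append(current)
            current, current_size = [], 0
    if current:
        # Assumption: flush a final undersized block rather than dropping it.
        blocks.append(current)
    return blocks
```

With sizes `a=2, b=2, c=2, d=6, e=1` and a threshold of 5, the result is `[["a", "b", "c"], ["d"], ["e"]]`: the first three reach the threshold together, `d` stands alone, and `e` is left as a trailing block.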
As an optional embodiment, after the data whose data type is the target data type is divided according to the target data dividing mode, the embodiment can also include: storing the target data blocks obtained by the division in a target database. With the present invention, the data of the target data type is divided according to its target data dividing mode to obtain target data blocks, and those blocks are stored in the target database, which facilitates processing.
Optionally, when the target data blocks obtained by the division are stored, the target database for each target data block can be determined from the block's target data type, and the block is stored in that database.
As an optional embodiment, after the target data blocks obtained by the division are stored in the target database, the embodiment can also include: setting, in the target database, a target index mode for the data whose data type is the target data type. With the present invention, an index mode can be set according to the target data type of the target data blocks stored in the target database; querying the target data blocks through an index mode chosen for that type can speed up indexing.
The present invention also provides a preferred embodiment: a data processing method applied to a distributed big-data storage and analysis platform for multi-source heterogeneous work orders.
In an enterprise, as business develops, the number of work orders grows geometrically, and the customer service center accumulates a large amount of work order information data, including multi-source heterogeneous data such as customer voice files, screenshots of error interfaces, work order description text, and customer feedback text. This data serves as the main data source and provides support for data analysis. For example, by analyzing customer feedback text and the total number of work orders handled, the service quality of customer service staff can be evaluated objectively and their ability graded, which plays a very important coordinating role.
However, if there are no unified standards and specifications for collecting and storing work order information data, each study of the data will store it separately and manage it in a decentralized way, which creates "data silos" and hinders data processing and utilization.
To address the low processing efficiency and low data availability caused by storing work order information data separately and managing it in a decentralized way, data processing can be designed around the characteristics of multi-source heterogeneous work order information data: massive volume, heterogeneity, complexity, and dynamism. The process is as follows:
1. Based on the characteristics of multi-source heterogeneous work order information data, build a fusion storage model for the work order information data in a Hadoop cluster environment;
2. On the basis of the fusion storage model, set up a suitable index mode for each class of data to improve query efficiency;
3. Perform data analysis based on the fusion storage model and the corresponding index modes, and display the analysis results visually on a Web interface.
It should be noted that Hadoop is an open-source distributed system infrastructure that provides high-throughput access to application data, is suited to applications with very large data sets, and makes full use of the cluster's computing power for high-speed computation and storage.
Fig. 2 is a schematic diagram of an optional Hadoop cluster environment according to an embodiment of the present invention. As shown in Fig. 2, the cluster environment can include a server side and a Web side. The Hadoop cluster on the server side can include one master server and multiple slave servers, connected over a network. The Web interface displays analysis results such as statistical analysis of work order types, statistical analysis of work order causes, a ranking of module failure counts, a ranking of customer order volumes, and a customer service ranking.
As an optional embodiment, the data processing is carried out as follows:
(1) A Hadoop cluster environment based on the HDFS distributed file system can be built under Linux, and the work order information data is classified by data format.
It should be noted that Linux is a multi-user network operating system with stable performance. HDFS, in full the Hadoop Distributed File System, is a distributed file system designed to run on commodity hardware.
Optionally, classifying the work order information data by data format can yield voice format data in wav format, image format data in jpg format, and structured text format data.
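Classification by data format can be sketched as a lookup on the file extension. The wav and jpg extensions come from the text; treating `.jpeg` as an image and `.csv` as structured text, and returning `"unknown"` for anything else, are assumptions made for illustration, as are the labels and the function name.

```python
from pathlib import PurePath

# ".wav" and ".jpg" come from the text; ".jpeg" and ".csv" are hypothetical additions.
FORMAT_BY_EXTENSION = {
    ".wav": "voice",
    ".jpg": "image",
    ".jpeg": "image",
    ".csv": "structured_text",
}


def classify(filename: str) -> str:
    """Return the data-format label for a work order file, or "unknown"."""
    extension = PurePath(filename).suffix.lower()
    return FORMAT_BY_EXTENSION.get(extension, "unknown")
```

Lower-casing the suffix lets `call_001.WAV` and `call_001.wav` land in the same class.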
(2) In parallel computation over massive data, data block partitioning is an important part of parallel processing; the partitioning mode and the block size are closely tied to parallel computing efficiency. To improve the retrieval speed of work order data, different data block partitioning modes can be used for different types of work order information data.
1) Partition the image format data in jpg format.
Fig. 3(a) is a schematic diagram of optional horizontal cutting of image format data according to an embodiment of the present invention. As shown in Fig. 3(a), image data in jpg format can be cut horizontally.
Fig. 3(b) is a schematic diagram of optional vertical cutting of image format data according to an embodiment of the present invention. As shown in Fig. 3(b), image data in jpg format can be cut vertically.
Fig. 3(c) is a schematic diagram of optional rectangular-block cutting of image format data according to an embodiment of the present invention. As shown in Fig. 3(c), image data in jpg format can be cut into rectangular blocks.
Fig. 3(d) is a schematic diagram of optional irregular cutting of image format data according to an embodiment of the present invention. As shown in Fig. 3(d), image data in jpg format can be cut irregularly.
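The three regular cutting modes can be sketched as computations of pixel bounding boxes. This is an illustration under assumptions: the function names are hypothetical, boxes are given as (left, top, right, bottom), and the irregular cutting of Fig. 3(d) is not covered because the text does not specify it.

```python
def horizontal_strips(width, height, n):
    """Cut an image into n horizontal strips; returns (left, top, right, bottom) boxes."""
    edges = [i * height // n for i in range(n + 1)]
    return [(0, edges[i], width, edges[i + 1]) for i in range(n)]


def vertical_strips(width, height, n):
    """Cut an image into n vertical strips."""
    edges = [i * width // n for i in range(n + 1)]
    return [(edges[i], 0, edges[i + 1], height) for i in range(n)]


def rectangular_blocks(width, height, rows, cols):
    """Cut an image into rows x cols rectangular blocks, in row-major order."""
    boxes = []
    for _, top, _, bottom in horizontal_strips(width, height, rows):
        for left, _, right, _ in vertical_strips(width, height, cols):
            boxes.append((left, top, right, bottom))
    return boxes
```

The boxes cover the image without overlap, so each one can be cropped and stored as an independent data block.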
2) Partition the voice format data in wav format.
Wav voice files are generally small: if a customer call lasts less than 5 minutes, the corresponding voice format data are less than 5 MB. A Hadoop cluster stores the metadata of the cluster's data blocks on the NameNode master node, so when "small files" under 5 MB are stored, the load on the NameNode rises sharply. Therefore, a data merging strategy is used to merge the wav voice "small files".
It should be noted that the NameNode manages the file system namespace; it maintains the file system tree and all the files and directories in the tree.
Fig. 4 is a schematic diagram of an optional voice format data merging mode according to an embodiment of the present invention. As shown in Fig. 4, numbers 1 to 7 denote voice format data below the threshold, and the height of each bar represents the data volume of the voice format data. Merging voice files 1, 2 and 3, merging voice files 5 and 6, and merging voice files 4 and 7 yields voice format data blocks whose data volume is higher than the threshold.
Optionally, metadata such as the merge information of the voice format data blocks and the corresponding work order numbers are stored in the HBase database.
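The merging strategy and its metadata can be sketched as follows. This is a simplified illustration, not the patented implementation: it merges small files in arrival order, whereas the grouping in Fig. 4 (1+2+3, 5+6, 4+7) suggests that the files to merge may be chosen differently. The record fields and the function name are hypothetical.

```python
def merge_small_voice_files(files, threshold):
    """Merge voice files smaller than `threshold` into blocks.

    files: list of (name, size, work_order_no) tuples.
    Returns (blocks, metadata): blocks is a list of lists of file names;
    metadata holds one record per merged file, of the kind that could be
    kept in a metadata store.
    """
    blocks, metadata = [], []
    current, current_size = [], 0
    for name, size, work_order_no in files:
        if size >= threshold:
            continue  # not a "small file"; it is stored on its own
        metadata.append({
            "block": len(blocks),      # index of the block being filled
            "file": name,
            "offset": current_size,    # position of the file inside the block
            "length": size,
            "work_order": work_order_no,
        })
        current.append(name)
        current_size += size
        if current_size >= threshold:  # block is no longer below the threshold
            blocks.append(current)
            current, current_size = [], 0
    if current:
        blocks.append(current)         # a trailing block may stay below the threshold
    return blocks, metadata
```

Recording the offset and length of each file lets a reader recover an individual recording from a merged block by work order number.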
It should be noted that HBase, short for Hadoop Database, is a highly reliable, high-performance, column-oriented, scalable distributed storage system.
3) Partition the structured text data.
Structured work order information data are stored directly in the HBase database to generate a data table; the data table is split and then stored on HDFS in HFile format.
It should be noted that HFile is the file format in which HBase stores its data.
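Splitting a data table can be pictured as cutting the rows, sorted by row key, into contiguous key ranges. The sketch below is only an in-memory illustration under that assumption; it does not use the HBase API, and the fixed row limit per sub-table is a hypothetical splitting rule.

```python
def split_table(rows, max_rows_per_split):
    """Split a table into sub-tables over contiguous row-key ranges.

    rows: dict mapping row key -> record.
    Returns a list of (start_key, end_key, sub_table) tuples in key order.
    """
    keys = sorted(rows)
    splits = []
    for i in range(0, len(keys), max_rows_per_split):
        chunk = keys[i:i + max_rows_per_split]
        sub_table = {key: rows[key] for key in chunk}
        splits.append((chunk[0], chunk[-1], sub_table))
    return splits
```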
(3) Study sharded indexing under HDFS and build a suitable index mode for each class of data.
Fig. 5 is a schematic diagram of an optional index mode for image format data according to an embodiment of the present invention. As shown in Fig. 5, for image format data, a quadtree spatial index oriented toward image pyramids is tried, with indexing performed layer by layer in units of Blocks (the smallest storage and processing unit in the database). The data include multiple Blocks, and each Block is numbered layer by layer in hierarchical order. For example, at layer n (Level n) the Block number is B0; at layer n+1 (Level n+1) the Block numbers are B01, B02, B03 and B04.
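The layer-by-layer numbering can be sketched as follows, assuming that each Block has exactly four children whose numbers are formed by appending the digits 1 to 4 to the parent number, as in the B0 to B01..B04 example. The function names are hypothetical.

```python
def child_blocks(block_id):
    """Return the four Block numbers one layer below block_id (B0 -> B01..B04)."""
    return [f"{block_id}{i}" for i in range(1, 5)]


def build_pyramid_index(root_id, levels):
    """Return a list of layers; layer k holds the Block numbers k layers below the root."""
    index = [[root_id]]
    for _ in range(levels):
        index.append([child for block in index[-1] for child in child_blocks(block)])
    return index


def parent_block(block_id):
    """Return the parent Block number by dropping the last digit (not valid for the root)."""
    return block_id[:-1]
```

With this numbering the parent of a Block can be derived from its number alone, so moving between pyramid layers needs no extra lookup table.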
Optionally, for structured work order data, a multi-level index can be built in the HBase database. For example, before indexing by work order number, an index by area is applied first, then an index by system, and then an index by module, forming a 3-level index of the work order information that speeds up index lookups.
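A 3-level index of this kind can be sketched with nested dictionaries. This is an in-memory illustration only; the field names 'area', 'system', 'module' and 'number' are assumptions, and an actual HBase design might instead encode the three levels in the row key.

```python
from collections import defaultdict


def build_three_level_index(work_orders):
    """Index work order numbers by area, then system, then module.

    work_orders: iterable of dicts with keys 'number', 'area', 'system', 'module'.
    """
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for order in work_orders:
        index[order["area"]][order["system"]][order["module"]].append(order["number"])
    return index


def lookup(index, area, system, module):
    """Return the work order numbers under (area, system, module), or an empty list."""
    return index.get(area, {}).get(system, {}).get(module, [])
```

Each level narrows the candidate set before work order numbers are compared, which is the point of indexing by area, system and module first.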
(4) By analyzing the work order information data, data analyses such as work order type statistics, work order cause statistics, module failure count ranking, customer service work order volume ranking and customer service ranking are carried out, and the analysis results are displayed at the Web end.
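The statistics and rankings named above all reduce to counting work orders per value of some field. A minimal sketch, assuming each work order is a dict and that the field names are hypothetical:

```python
from collections import Counter


def rank_by(work_orders, field, top=None):
    """Count work orders per value of `field`.

    Returns (value, count) pairs, most frequent first; ties are ordered by value.
    """
    counts = Counter(order[field] for order in work_orders)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]
```

Calling `rank_by(orders, "module", top=10)` would give a module failure count ranking, and `rank_by(orders, "type")` the work order type statistics.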
Fig. 6 is a schematic diagram of optional storage of work order information data according to an embodiment of the present invention. As shown in Fig. 6, the HDFS client issues a request to read a data block and sends the request to HDFS. According to the request, HDFS obtains from HBase the data block name and the node names of the data nodes where the data block resides, and then reads the requested data block from multiple data nodes through the HDFS access interface. After the read completes, the HDFS access interface sends the HDFS client an indication to close the connection.
As shown in Fig. 6, HDFS stores work order information data in multiple formats, such as voice format data, image format data and structured text format data. The image format data include screenshots of error interfaces, the voice format data include customer inquiry voice and customer service voice, and the structured text format data include work order text information and work order text information generated from data tables.
As shown in Fig. 6, HBase records information about the multiple formats of the work order information data, such as voice format data, image format data and structured text format data. The information for image format data includes data block information, row and column numbers, and the corresponding work order number; the information for structured text format data includes data sub-table information and sub-table numbers; the information for voice format data includes voice block information, voice block numbers and the corresponding work order number.
As shown in Fig. 6, each data node includes data blocks and replicas; a data block is the smallest storage and processing unit in the database.
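The read flow of Fig. 6 can be mimicked with plain dictionaries: a metadata table stands in for HBase, and a mapping of node names to stored blocks stands in for the data nodes. This is a conceptual sketch only; it does not use the HDFS or HBase client APIs, and all names in it are hypothetical.

```python
def read_block(block_key, hbase_meta, data_nodes):
    """Resolve a block through the metadata table, then read it from a node holding a copy.

    hbase_meta: dict mapping block_key -> {'block_name': str, 'nodes': [node names]}.
    data_nodes: dict mapping node name -> {block_name: bytes}.
    """
    entry = hbase_meta[block_key]
    for node in entry["nodes"]:
        block = data_nodes.get(node, {}).get(entry["block_name"])
        if block is not None:
            return block  # the first available replica wins
    raise KeyError(f"block {entry['block_name']!r} not found on any listed node")
```

Because every block has replicas on several data nodes, the read can fall back to another node when the first one does not hold the block.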
According to an embodiment of the present invention, an embodiment of a data processing device is also provided. It should be noted that the data processing device can be used to perform the data processing method of the embodiments of the present invention, and that the data processing method of the embodiments of the present invention can be performed in this data processing device.
Fig. 7 is a schematic diagram of an optional data processing device according to an embodiment of the present invention. As shown in Fig. 7, the device may include: an acquiring unit 71, configured to acquire data to be processed, where the data to be processed are data reflecting the work order information of a target object, and the data types of the data to be processed include at least a target data type; a determining unit 73, configured to determine the target data partitioning mode corresponding to the target data type; and a partitioning unit 75, configured to partition, according to the target data partitioning mode, the data whose data type is the target data type.
It should be noted that the acquiring unit 71 in this embodiment can be used to perform step S102 in the embodiments of the present application, the determining unit 73 can be used to perform step S104, and the partitioning unit 75 can be used to perform step S106. The examples and application scenarios implemented by the above modules are the same as those of the corresponding steps, but are not limited to the content disclosed in the above embodiments.
With the above embodiment, the data to be processed, which reflect the work order information of the target object, and the corresponding target data type are acquired; the target data partitioning mode corresponding to that target data type is determined; and the data whose data type is the target data type are then partitioned according to the target data partitioning mode. With the present invention, the various types of data are processed separately according to the partitioning modes corresponding to the respective data types, so that different types of data receive different processing. This achieves the technical effect of improving data processing efficiency and thereby solves the technical problem in the related art that uniform processing of multiple types of data leads to low data processing efficiency.
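The cooperation of the three units can be sketched as a lookup followed by a dispatch. The mode names and the `handlers` mapping are hypothetical placeholders for whatever partitioning routines an implementation provides; the sketch only shows how a data type selects its partitioning mode.

```python
# Hypothetical names for the partitioning modes described in the text.
PARTITION_MODES = {
    "image": "cut_by_geometry",
    "voice": "merge_below_threshold",
    "structured_text": "split_table",
}


def determine_mode(target_type):
    """Determining unit: look up the partitioning mode for the target data type."""
    return PARTITION_MODES[target_type]


def process(records, target_type, handlers):
    """Select the records of the target type and partition them with the matching handler.

    records: list of dicts with a 'type' key (the acquired data to be processed).
    handlers: dict mapping mode name -> function(list of records) -> partitions.
    """
    mode = determine_mode(target_type)
    selected = [record for record in records if record["type"] == target_type]
    return handlers[mode](selected)
```

Adding a new data type then only requires a new entry in the mode table and a matching handler, without touching the other partitioning routines.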
As an optional embodiment, the target data type may include at least one of: image format data; voice format data; structured text format data.
As an optional embodiment, the device may include: an image partitioning module, for which, when the target data type is image format data, the target data partitioning mode is to cut the image format data according to geometric shapes; a voice partitioning module, for which, when the target data type is voice format data, the target data partitioning mode is to merge the voice format data whose data volume is less than a predetermined threshold; and a text partitioning module, for which, when the target data type is structured text format data, the target data partitioning mode is to split the data table corresponding to the structured text format data.
As an optional embodiment, when the target data type is voice format data, the partitioning unit may include: an acquiring module, configured to acquire the data volume of the voice format data; a judging module, configured to judge whether the data volume of the voice format data is less than the predetermined threshold; a determining module, configured to determine the voice format data as voice format data to be merged when the data volume of the voice format data is less than the predetermined threshold; and a merging module, configured to merge the voice format data to be merged.
As an optional embodiment, the merging module performs the following merge operation on the voice format data to be merged to obtain a voice format data block, until the data volume of the voice format data block is not less than the predetermined threshold, where the voice format data to be merged are marked as the current voice format data when the merge operation is performed. The merging module may include: a merging submodule, configured to merge the current voice format data into the voice format data block; a judging submodule, configured to judge whether the data volume of the voice format data block is less than the predetermined threshold; and a determining submodule, configured to determine the next voice format data as the current voice format data when the data volume of the voice format data block is less than the predetermined threshold.
As an optional embodiment, after the partitioning unit, the device may further include: a storage module, configured to store, in a target database, the target data blocks obtained by partitioning the data whose data type is the target data type.
As an optional embodiment, after the storage module, the device may further include: an index module, configured to set, in the target database, a target index mode for the data whose data type is the target data type.
The above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection between units or modules through some interfaces, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
The above are only preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (14)

1. A data processing method, characterized by comprising:
acquiring data to be processed, wherein the data to be processed are data reflecting work order information of a target object, and the data types of the data to be processed include at least a target data type;
determining a target data partitioning mode corresponding to the target data type; and
partitioning, according to the target data partitioning mode, the data whose data type is the target data type.
2. The method according to claim 1, characterized in that the target data type includes at least one of:
image format data;
voice format data;
structured text format data.
3. The method according to claim 2, characterized in that:
when the target data type is the image format data, the target data partitioning mode is to cut the image format data according to geometric shapes;
when the target data type is the voice format data, the target data partitioning mode is to merge the voice format data whose data volume is less than a predetermined threshold;
when the target data type is the structured text format data, the target data partitioning mode is to split the data table corresponding to the structured text format data.
4. The method according to claim 2, characterized in that, when the target data type is the voice format data, the partitioning, according to the target data partitioning mode, of the data whose data type is the target data type comprises:
acquiring the data volume of the voice format data;
judging whether the data volume of the voice format data is less than a predetermined threshold;
when the data volume of the voice format data is less than the predetermined threshold, determining the voice format data as voice format data to be merged; and
merging the voice format data to be merged.
5. The method according to claim 4, characterized in that the merging of the voice format data to be merged comprises:
performing the following merge operation on the voice format data to be merged to obtain a voice format data block, until the data volume of the voice format data block is not less than the predetermined threshold, wherein the voice format data to be merged are marked as current voice format data when the merge operation is performed:
merging the current voice format data into the voice format data block;
judging whether the data volume of the voice format data block is less than the predetermined threshold;
when the data volume of the voice format data block is less than the predetermined threshold, determining the next voice format data as the current voice format data.
6. The method according to claim 1, characterized in that, after the partitioning, according to the target data partitioning mode, of the data whose data type is the target data type, the method further comprises:
storing, in a target database, the target data blocks obtained by partitioning the data whose data type is the target data type.
7. The method according to claim 6, characterized in that, after the storing, in the target database, of the target data blocks obtained by partitioning the data whose data type is the target data type, the method further comprises:
setting, in the target database, a target index mode for the data whose data type is the target data type.
8. A data processing device, characterized by comprising:
an acquiring unit, configured to acquire data to be processed, wherein the data to be processed are data reflecting work order information of a target object, and the data types of the data to be processed include at least a target data type;
a determining unit, configured to determine a target data partitioning mode corresponding to the target data type; and
a partitioning unit, configured to partition, according to the target data partitioning mode, the data whose data type is the target data type.
9. The device according to claim 8, characterized in that the target data type includes at least one of:
image format data;
voice format data;
structured text format data.
10. The device according to claim 9, characterized by comprising:
an image partitioning module, for which, when the target data type is the image format data, the target data partitioning mode is to cut the image format data according to geometric shapes;
a voice partitioning module, for which, when the target data type is the voice format data, the target data partitioning mode is to merge the voice format data whose data volume is less than a predetermined threshold;
a text partitioning module, for which, when the target data type is the structured text format data, the target data partitioning mode is to split the data table corresponding to the structured text format data.
11. The device according to claim 9, characterized in that, when the target data type is the voice format data, the partitioning unit comprises:
an acquiring module, configured to acquire the data volume of the voice format data;
a judging module, configured to judge whether the data volume of the voice format data is less than a predetermined threshold;
a determining module, configured to determine the voice format data as voice format data to be merged when the data volume of the voice format data is less than the predetermined threshold; and
a merging module, configured to merge the voice format data to be merged.
12. The device according to claim 11, characterized in that the merging module comprises:
performing the following merge operation on the voice format data to be merged to obtain a voice format data block, until the data volume of the voice format data block is not less than the predetermined threshold, wherein the voice format data to be merged are marked as current voice format data when the merge operation is performed:
a merging submodule, configured to merge the current voice format data into the voice format data block;
a judging submodule, configured to judge whether the data volume of the voice format data block is less than the predetermined threshold;
a determining submodule, configured to determine the next voice format data as the current voice format data when the data volume of the voice format data block is less than the predetermined threshold.
13. The device according to claim 8, characterized in that, after the partitioning unit, the device further comprises:
a storage module, configured to store, in a target database, the target data blocks obtained by partitioning the data whose data type is the target data type.
14. The device according to claim 13, characterized in that, after the storage module, the device further comprises:
an index module, configured to set, in the target database, a target index mode for the data whose data type is the target data type.
CN201710106370.7A 2017-02-24 2017-02-24 Data processing method and device Active CN107016050B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710106370.7A CN107016050B (en) 2017-02-24 2017-02-24 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710106370.7A CN107016050B (en) 2017-02-24 2017-02-24 Data processing method and device

Publications (2)

Publication Number Publication Date
CN107016050A CN107016050A (en) 2017-08-04
CN107016050B CN107016050B (en) 2019-12-20

Family

ID=59440506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710106370.7A Active CN107016050B (en) 2017-02-24 2017-02-24 Data processing method and device

Country Status (1)

Country Link
CN (1) CN107016050B (en)

Also Published As

Publication number Publication date
CN107016050B (en) 2019-12-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant