
CN107368501A - The processing method and processing device of data - Google Patents


Info

Publication number
CN107368501A
Authority
CN
China
Prior art keywords
data
commodity
node
spark
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610319458.2A
Other languages
Chinese (zh)
Other versions
CN107368501B (en
Inventor
高春光
蒋佳涛
鲁艳阳
陈艺天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201610319458.2A
Publication of CN107368501A
Application granted
Publication of CN107368501B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The present invention discloses a data processing method and a data processing device. The method includes: querying data and generating a data exchange table summarized over a predetermined time period; and processing the data exchange table using the Spark computing framework, including: reading the data exchange table and partitioning the data into blocks by a data attribute; distributing the data blocks to the nodes of a server cluster; performing, on each node, the computation related to the data attribute; and collecting the execution results of the nodes. Each node is configured with an R-language or Python-language computing module for handling the parts that cannot be computed in the Spark computing framework. The data processing method and device of the present invention are based on the Spark computing framework and, combined with the R language or the Python language, make data processing more accurate and complete.

Description

The processing method and processing device of data
Technical field
The present disclosure relates generally to the field of computer technology, and in particular to a data processing method and device.
Background
In today's big data processing, the data volume is large and the content to be computed is sometimes complex, so a single processing technology can hardly meet the requirements. For example, a large e-commerce enterprise generally uses the concept of the key value item (KVI) to evaluate the value of commodities, and computing the KVI index requires processing massive amounts of data. Massive data either cannot be processed on a single machine or takes too long to compute there, so real-time computation cannot be guaranteed; at the same time, a single processing technology can hardly guarantee the completeness and accuracy of the computation.
Therefore, a new method is needed for computing over massive data, for example for computing the KVI index.
The above information disclosed in this Background section is only intended to enhance understanding of the background of the disclosure, and it may therefore contain information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
The present disclosure provides a data processing method and device that can process computations over massive data quickly and accurately.
Other features and advantages of the disclosure will become apparent from the following detailed description, or will be learned in part through practice of the disclosure.
According to a first aspect of the disclosure, a data processing method includes: querying data and generating a data exchange table summarized over a predetermined time period; and processing the data exchange table using the Spark computing framework, including:
reading the data exchange table and partitioning the data into blocks by a data attribute;
distributing the data blocks to the nodes of a server cluster;
performing, on each node, the computation related to the data attribute; and collecting the execution results of the nodes;
wherein each node is configured with an R-language or Python-language computing module for handling the parts that cannot be computed in the Spark computing framework.
According to an embodiment of the disclosure, querying the data includes: querying a data warehouse using the Hive query language HQL.
According to an embodiment of the disclosure, reading the data exchange table includes: reading the data exchange table using Spark SQL.
According to an embodiment of the disclosure, the partitioned data blocks are stored in the form of resilient distributed datasets.
According to an embodiment of the disclosure, the Spark computing framework and the R-language or Python-language computing module exchange data through a pipe.
According to an embodiment of the disclosure, the data attribute is the commodity category, the data exchange table includes a commodity table, an order table and a traffic table, and the computation related to the data attribute performed on each node includes computing a merchandise valuation index; the merchandise valuation index is computed by applying different weights to the user visits, the average dwell time on the commodity page, the sales volume and the pull amount within a predetermined time period.
According to an embodiment of the disclosure, the predetermined time period is 7 days or 30 days.
According to an embodiment of the disclosure, the commodities are sorted by their computed merchandise valuation index scores in descending order; the commodities in the top 20% of the ranking are determined to be the most important commodities, those after the top 20% and up to 50% are determined to be key commodities, those after 50% and up to 80% are determined to be general commodities, and those after 80% are determined to be unimportant commodities.
According to a second aspect of the disclosure, a data processing device includes: a summarizing module for querying data and generating a data exchange table summarized over a predetermined time period; and a processing module for processing the data exchange table using the Spark computing framework, including:
reading the data exchange table and partitioning the data into blocks by a data attribute;
distributing the data blocks to the nodes of a server cluster;
performing, on each node, the computation related to the data attribute;
collecting the execution results of the nodes;
and an R-language or Python-language computing module for handling the parts that cannot be computed in the Spark computing framework.
According to an embodiment of the disclosure, the data attribute is the commodity category, the data exchange table includes a commodity table, an order table and a traffic table, and the computation related to the data attribute performed on each node includes computing a merchandise valuation index; the merchandise valuation index is computed by applying different weights to the user visits, the average dwell time on the commodity page, the sales volume and the pull amount within a predetermined time period.
According to an embodiment of the disclosure, the predetermined time period is 7 days or 30 days.
According to an embodiment of the disclosure, the commodities are sorted by their computed merchandise valuation index scores in descending order; the commodities in the top 20% of the ranking are determined to be the most important commodities, those after the top 20% and up to 50% are determined to be key commodities, those after 50% and up to 80% are determined to be general commodities, and those after 80% are determined to be unimportant commodities.
The data processing method and device of the present embodiments are based on the Spark computing framework, are capable of processing massive data and can do so quickly, and are combined with an R-language or Python-language computing module, which makes data processing more accurate and complete. When the merchandise valuation index KVI (Key Value Item) is computed, the two factors of average dwell time on the commodity page and pull amount are also taken into account, which makes the merchandise valuation index KVI more accurate and provides a basis for better managing the commodity pricing strategy.
It should be understood that the above general description and the following detailed description are only exemplary and do not limit the disclosure.
Brief description of the drawings
The above and other objects, features and advantages of the disclosure will become more apparent from the detailed description of its example embodiments with reference to the accompanying drawings.
Fig. 1 shows a system architecture diagram according to an example embodiment of the disclosure.
Fig. 2 shows a flowchart of a data processing method according to an example embodiment of the disclosure.
Fig. 3 shows a flowchart of another data processing method according to an example embodiment of the disclosure.
Fig. 4 shows a block diagram of a data processing device according to an example embodiment of the disclosure.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The drawings are only schematic illustrations of the disclosure and are not necessarily drawn to scale. Identical reference numerals in the drawings denote identical or similar parts, and their repeated description will therefore be omitted.
In addition, the described features, structures or characteristics may be combined in one or more embodiments in any suitable manner. In the following description, many specific details are provided to give a full understanding of the embodiments of the disclosure. Those skilled in the art will appreciate, however, that the technical solutions of the disclosure may be practiced with one or more of the specific details omitted, or with other methods, components, steps and so on. In other cases, well-known structures, methods, implementations or operations are not shown or described in detail, to avoid distracting from and obscuring aspects of the disclosure.
Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 shows a system architecture diagram according to an example embodiment of the disclosure.
As shown in Fig. 1, the system architecture used by the present invention is based on the Spark computing framework. The data exchange table 100 is obtained by querying the data warehouse with the Hive query language HQL. The data exchange table 100 is read using Spark SQL; it may include a data record table 101, a record table 102 and a record table 103, which record the respective computation parameters.
The collected record tables are processed and partitioned into blocks as the computation requires. The data blocks are stored in the form of resilient distributed datasets (RDDs). An RDD is an abstraction of distributed memory that provides a highly restricted shared-memory model: an RDD is a read-only collection of partitioned records, and it can only be created by performing deterministic transformations on other RDDs. These restrictions, however, make fault tolerance very cheap to implement. For a developer, an RDD can be regarded as a Spark object that itself runs in memory: reading a file yields an RDD, a computation on the file yields an RDD, and the result set is also an RDD; the different partitions, the dependencies between data, and key-value map data can all be regarded as RDDs.
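The two restrictions named in the paragraph above (read-only partitions, creation only through deterministic transformations of other datasets) can be illustrated with a small toy model in plain Python. This is only a sketch: the class `ToyRDD` and its methods are hypothetical names made up for illustration and are not the Spark API.

```python
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Toy read-only partitioned collection; NOT the Spark RDD API."""

    def __init__(self, partitions: Optional[List[tuple]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable] = None):
        self._partitions = partitions  # set only for source datasets
        self._parent = parent          # lineage: the dataset this one derives from
        self._fn = fn                  # deterministic per-record transformation

    @classmethod
    def from_records(cls, records: Iterable, num_partitions: int) -> "ToyRDD":
        records = list(records)
        # Round-robin split into immutable tuples, one per partition.
        parts = [tuple(records[i::num_partitions]) for i in range(num_partitions)]
        return cls(partitions=parts)

    def map(self, fn: Callable) -> "ToyRDD":
        # A transformation never mutates its parent; it returns a new dataset.
        return ToyRDD(parent=self, fn=fn)

    def num_partitions(self) -> int:
        if self._parent is None:
            return len(self._partitions)
        return self._parent.num_partitions()

    def compute_partition(self, index: int) -> tuple:
        # A lost partition can be rebuilt at any time by replaying the lineage.
        if self._parent is None:
            return self._partitions[index]
        return tuple(self._fn(x) for x in self._parent.compute_partition(index))

    def collect(self) -> list:
        out = []
        for i in range(self.num_partitions()):
            out.extend(self.compute_partition(i))
        return out


source = ToyRDD.from_records([1, 2, 3, 4, 5], num_partitions=2)
doubled = source.map(lambda x: x * 2)
```

Because `doubled` stores only its parent and the function, any single partition can be recomputed without copying data, which is the reason the text gives for cheap fault tolerance.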
The data blocks are distributed to the node servers of the server cluster for computation, and each node server has an R or Python language module preinstalled. If a part that cannot be implemented in Spark is encountered during computation, the preinstalled R or Python language module can be used to compute it. After the computation is complete, Spark collects the execution results of the data blocks on each node and aggregates them into one large result file 200; a Hive import statement is then called to import the result file 200 into the Hive data warehouse, where it is available as the result for queries. Spark and R or Python exchange data through an operating system pipe.
Fig. 2 shows a flowchart of a data processing method according to an example embodiment of the disclosure.
As shown in Fig. 2, the data processing method is based on the system architecture of Fig. 1 and includes steps S202 to S204:
In step S202, data is queried and a data exchange table summarized over a predetermined time period is generated.
The data to be processed is queried and collected, and the collected data is used to generate a data exchange table summarized over a predetermined time period; the predetermined time period can be set manually according to business needs.
In step S204, the data exchange table is processed using the Spark computing framework.
The data exchange table is read, and the data is partitioned into blocks by a data attribute. The data attribute can be a parameter involved in the computation. For example, computing the key value item (KVI) index for an e-commerce enterprise's sales involves commodity categories, so the data attribute can be the commodity category and the data is partitioned by commodity category. Because the volume of data to be processed is large, partitioning divides the massive data into many small data blocks.
Under the Spark computing framework, the data blocks are distributed to the node servers of the server cluster.
Each node server performs the computation related to the data attribute. For example, when the KVI is computed for an e-commerce enterprise's sales, each node server classifies by commodity category and computes the KVI index of each commodity from its sales volume and click volume.
After the node servers finish computing, their execution results are collected and all the computation results are aggregated into one large result file, which is imported into the data warehouse. For example, a Hive import statement can be called to import the result data into a Hive data warehouse table for use.
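The flow of step S204 (partition by one attribute, distribute, compute per node, collect) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: a thread pool stands in for the node servers, and the field names `category` and `sales` and the per-block computation are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by(records, attribute):
    """Group records into blocks keyed by one data attribute."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record[attribute]].append(record)
    return dict(blocks)


def compute_block(item):
    """Per-node work; here, total sales of one category (illustrative only)."""
    category, block = item
    return category, sum(r["sales"] for r in block)


def process(records, attribute, workers=2):
    blocks = partition_by(records, attribute)
    # Threads stand in for the node servers of the cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(compute_block, blocks.items()))
    return dict(results)  # the collected result of all nodes


records = [
    {"category": "phones", "sales": 3},
    {"category": "books", "sales": 5},
    {"category": "phones", "sales": 4},
]
summary = process(records, "category")
```

Partitioning on the attribute the computation depends on means each worker sees every record it needs, so no data has to move between workers while they compute.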
In the above computation, each node server can be configured with an R-language or Python-language computing module for handling the parts that cannot be computed in the Spark computing framework. If an algorithm package is missing from Spark's scientific computing libraries during computation, the R-language or Python-language computing module can supply it so that the computation is complete.
The data processing method of the present embodiment is based on the Spark computing framework, is capable of processing massive data and can do so quickly, and is combined with an R-language or Python-language computing module, which makes data processing more accurate and complete.
According to an example embodiment, when the data is queried, the Hive query language HQL can be used to query the data warehouse.
According to an example embodiment, the Spark computing framework and the R-language or Python-language computing module exchange data through an operating system pipe.
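Exchanging data with an external computing module over an operating system pipe can be sketched with the Python standard library. The patent does not say which API performs the exchange, so the following is only an assumption-laden illustration: the worker script, the line-per-record text format and the function name `pipe_through_worker` are all hypothetical. (Spark offers a comparable mechanism in `RDD.pipe`, which streams partition elements to an external command.)

```python
import subprocess
import sys

# Hypothetical external module: reads one number per line on stdin
# and writes its square to stdout, one result per line.
WORKER_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    line = line.strip()\n"
    "    if line:\n"
    "        print(float(line) ** 2)\n"
)


def pipe_through_worker(values):
    """Send values to the external worker over a pipe and read the results back."""
    payload = "".join(f"{v}\n" for v in values)
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=payload,        # written to the worker's stdin pipe
        capture_output=True,  # the worker's stdout is read from a pipe
        text=True,
        check=True,
    )
    return [float(line) for line in completed.stdout.splitlines() if line]


squares = pipe_through_worker([1, 2, 3])
```

A text pipe keeps the two sides loosely coupled: the worker could equally be an R script, as long as it reads and writes the same line format.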
Fig. 3 shows a flowchart of another data processing method according to an example embodiment of the disclosure.
A large e-commerce enterprise sells a huge number of commodities and very much needs to know which commodities have the greater influence on users' impression of the store, so as to keep an advantage over competitors. To evaluate the value of commodities, the concept of the key value item (KVI) is used. KVI commodities are price-sensitive commodities: a change in their price has a large effect on the sales of the commodity itself and of other related commodities. Whether a commodity is a KVI commodity can be measured along several dimensions, including browsing volume, purchase volume and so on. Considering these aspects of each commodity together shows which commodities most attract users to browse and are most readily bought by customers; these KVI commodities influence users' impression of the store more than other commodities do. The existing approach evaluates the importance of commodities by dividing them into four classes A, B, C and D according to sales volume and click volume, where A is the most important and D the least important. A value representing the importance of a commodity is computed from the sales volume and the click volume with certain weights. The sales volume of a commodity is obtained by querying the data table in the database that records sales, and the click volume can be obtained from the page-code records made when users visit the page. The sales volume and the click volume are then normalized to values from 0 to 1, and a combined value is computed with certain weights:
K=w1*sales_quantity+w2*traffic,
where sales_quantity denotes the sales volume and traffic denotes the click volume. The prior art does not evaluate commodities comprehensively: it does not consider the pull effect and the page dwell time, indicators that strongly affect the importance of a commodity. The sales volume of a commodity may itself be small, yet the commodity may pull the sales of other commodities.
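As a worked example of the prior-art formula, the sketch below normalizes both inputs to the range 0 to 1 and combines them with the weights w1 and w2. The text does not say which normalization is used, so min-max scaling is an assumption here, as are the equal default weights and the function names.

```python
def min_max(values):
    """Scale values linearly to [0, 1]; a constant input maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def prior_art_score(sales, traffic, w1=0.5, w2=0.5):
    """K = w1 * sales_quantity + w2 * traffic on normalized inputs."""
    s = min_max(sales)
    t = min_max(traffic)
    return [w1 * a + w2 * b for a, b in zip(s, t)]


# Three commodities: sales volumes and click volumes in the same order.
scores = prior_art_score(sales=[10, 20, 30], traffic=[100, 300, 200])
```

With these inputs the second and third commodities tie: one has the highest click volume and middling sales, the other the reverse, which shows how the weights alone decide the ordering.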
To compute the KVI index of commodities well, a series of data processing technologies is needed for support. The R language is a very powerful data analysis and processing language, well suited to the data preprocessing and algorithm implementation of the KVI index computation; the Python language also has strong scientific computing capability and rich scientific computing libraries, which help ensure the accuracy and performance of the KVI index computation; the Hadoop platform provides the underlying support for the data storage and computation that the KVI index computation requires; and the Hive platform is a database platform based on Hadoop.
As shown in Fig. 3, the merchandise valuation index KVI (Key Value Item) is computed using the above data processing method. The data attribute can be the commodity category SKU (Stock Keeping Unit), the data exchange table includes a commodity table, an order table and a traffic table, and each node server performs the computation related to the commodity category SKU, including computing the merchandise valuation index KVI. The SKU is the unit in which stock movements are measured, and can be a piece, a box, a pallet and so on. The SKU is a necessary tool for the logistics management of the distribution centers of large chain stores; it has now been extended to mean a unified product number, with each product corresponding to a unique SKU number. The method includes steps S302 to S304:
In step S302, data is queried and a commodity table, an order table and a traffic table summarized over a predetermined time period are generated.
The data related to commodity sales is queried and collected; the Hive query language HQL can be used to collect and summarize the data items in the data warehouse, and the collected data is used to generate a data exchange table summarized over a predetermined time period. The generated data exchange table includes a commodity table, an order table and a traffic table. The predetermined time period can be set manually according to business needs: it can be 7 days (one week) or 30 days (one month), as the analysis requires. When the data exchange table is processed using the Spark computing framework, the data can be partitioned into blocks by the commodity category SKU (Stock Keeping Unit) in the commodity table, the data blocks are distributed to the node servers of the server cluster, and each node server performs the related computation.
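Summarizing order records over a 7-day or 30-day period can be sketched as follows. In the described system this step would be an HQL query against the data warehouse; the plain-Python version below only illustrates the windowing logic, and the record layout `(sku, day, quantity)`, the inclusive window boundaries and the function name are assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta


def summarize_period(orders, end_day, days=7):
    """Sum quantity per SKU over the `days` days ending on end_day (inclusive)."""
    start_day = end_day - timedelta(days=days - 1)
    totals = defaultdict(int)
    for sku, day, quantity in orders:
        if start_day <= day <= end_day:
            totals[sku] += quantity
    return dict(totals)


orders = [
    ("sku-1", date(2016, 5, 1), 2),
    ("sku-1", date(2016, 5, 7), 1),
    ("sku-2", date(2016, 5, 8), 4),
    ("sku-1", date(2016, 4, 20), 9),
]
weekly = summarize_period(orders, end_day=date(2016, 5, 8), days=7)
monthly = summarize_period(orders, end_day=date(2016, 5, 8), days=30)
```

The same records give different totals for the two periods, which is why the period length is left as a business setting.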
In step S304, the commodity table, the order table and the traffic table are processed using the Spark computing framework.
When the merchandise valuation index KVI (Key Value Item) is computed, the information in the commodity table, the order table and the traffic table is used to derive the user visits, the average dwell time on the commodity page, the sales volume and the pull amount. These factors are considered together, and each can be given a different weight; the weights can take many needs into account and are set according to business requirements.
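A four-factor index of this kind can be sketched as a weighted sum. The text only says that the factors receive different weights, so the linear form, the assumption that each factor is already normalized to the range 0 to 1, the factor names and the example weights are all illustrative choices rather than values from the patent.

```python
def kvi_score(factors, weights):
    """Weighted sum of factors that are assumed to be normalized to [0, 1]."""
    return sum(weights[name] * factors[name] for name in weights)


# Hypothetical weights set by the business; they sum to 1 here by choice.
weights = {"visits": 0.5, "dwell_time": 0.25, "sales": 0.125, "pull_amount": 0.125}

# Hypothetical normalized factor values for one commodity.
factors = {"visits": 0.5, "dwell_time": 0.25, "sales": 1.0, "pull_amount": 0.5}

score = kvi_score(factors, weights)
```

Keeping the weights in a dictionary separate from the scoring function lets them be changed per business requirement without touching the computation.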
After the node servers finish computing, their execution results are collected and all the computation results are aggregated into one large result file, which is imported into the data warehouse. For example, a Hive import statement can be called to import the result data into a Hive data warehouse table, where it is available as the result.
In the above computation, each node server can be configured with an R-language or Python-language computing module for handling the parts that cannot be computed in the Spark computing framework. If an algorithm package is missing from Spark's scientific computing libraries, the R-language or Python-language computing module can supply it so that the computation is complete.
The data processing method of the present embodiment is based on the Spark computing framework, is capable of processing massive data and can do so quickly, and is combined with an R-language or Python-language computing module, which makes data processing more accurate and complete. When the merchandise valuation index KVI (Key Value Item) is computed, the two factors of average dwell time on the commodity page and pull amount are also taken into account, which makes the merchandise valuation index KVI more accurate and provides a basis for better managing the commodity pricing strategy.
According to an example embodiment, the commodities can be sorted by the computed merchandise valuation index KVI (Key Value Item) score in descending order. The commodities in the top 20% of the ranking can be determined to be the most important commodities, those after the top 20% and up to 50% key commodities, those after 50% and up to 80% general commodities, and those after 80% unimportant commodities. When sales strategies are formulated, different strategies can be set according to the importance of the commodities, with emphasis on the most important and key commodities. The above division of commodity importance is only an example; when the merchandise valuation index KVI is used, the ranges can be determined according to the actual situation.
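The ranking into four tiers at the 20%, 50% and 80% marks can be sketched as below. The thresholds come from the text; the tie handling, the treatment of commodities that fall exactly on a boundary and the function name `classify` are assumptions.

```python
def classify(scores):
    """Map each commodity id to a tier by rank position, highest score first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    tiers = {}
    for position, sku in enumerate(ranked):
        # Integer arithmetic avoids floating-point surprises at the boundaries.
        percent_above = position * 100  # compared against threshold * n
        if percent_above < 20 * n:
            tiers[sku] = "most important"
        elif percent_above < 50 * n:
            tiers[sku] = "key"
        elif percent_above < 80 * n:
            tiers[sku] = "general"
        else:
            tiers[sku] = "unimportant"
    return tiers


# Ten hypothetical commodities; sku-0 has the highest score.
scores = {f"sku-{i}": 10 - i for i in range(10)}
tiers = classify(scores)
```

With ten commodities this yields two most important, three key, three general and two unimportant commodities; the cut-off percentages could be made parameters if the ranges are to be tuned.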
Fig. 4 shows a block diagram of a data processing device according to an example embodiment of the disclosure.
As shown in Fig. 4, a data processing device includes:
A summarizing module 402, for querying data and generating a data exchange table summarized over a predetermined time period.
The data to be processed is queried and collected, and the collected data is used to generate a data exchange table summarized over a predetermined time period; the predetermined time period can be set manually according to business needs.
A processing module 404, for processing the data exchange table using the Spark computing framework.
The data exchange table is read, and the data is partitioned into blocks by a data attribute. The data attribute can be a parameter involved in the computation. For example, computing the key value item (KVI) index for an e-commerce enterprise's sales involves commodity categories, so the data attribute can be the commodity category and the data is partitioned by commodity category. Because the volume of data to be processed is large, partitioning divides the massive data into many small data blocks.
Under the Spark computing framework, the data blocks are distributed to the node servers of the server cluster. Each node server performs the computation related to the data attribute; for example, when the KVI is computed for an e-commerce enterprise's sales, each node server computes the KVI index from the sales volume and click volume.
After the node servers finish computing, their execution results are collected and all the computation results are aggregated into one large result file, which is imported into the data warehouse. For example, a Hive import statement can be called to import the result data into a Hive data warehouse table, where it is available as the result.
An R-language or Python-language computing module 406, for handling the parts that cannot be computed in the Spark computing framework.
In the above computation, each node server can be configured with an R-language or Python-language computing module for handling the parts that cannot be computed in the Spark computing framework. If an algorithm package is missing from Spark's scientific computing libraries during computation, the R-language or Python-language computing module can supply it so that the computation is complete.
The data processing device of the present embodiment is based on the Spark computing framework, is capable of processing massive data and can do so quickly, and is combined with an R-language or Python-language computing module, which makes data processing more accurate and complete.
In a large e-commerce enterprise, the above data processing device is used to compute the merchandise valuation index KVI (Key Value Item). The data attribute can be the commodity category SKU (Stock Keeping Unit), the data exchange table includes a commodity table, an order table and a traffic table, and each node server computes the merchandise valuation index KVI of the commodities. Each module specifically performs the following functions:
The summarizing module 402 queries and collects the data related to commodity sales and uses the collected data to generate a data exchange table summarized over a predetermined time period. The predetermined time period can be set manually according to business needs: it can be 7 days (one week) or 30 days (one month), as the analysis requires.
The Hive query language HQL can be used to collect and summarize the data items in the data warehouse and generate the data exchange table, which includes a commodity table, an order table and a traffic table.
The processing module 404 processes the data exchange table using the Spark computing framework and distributes the data blocks to the node servers of the server cluster; each node server computes the merchandise valuation index KVI (Key Value Item).
When the merchandise valuation index KVI (Key Value Item) is computed, factors such as the user visits, the average dwell time on the commodity page, the sales volume and the pull amount are considered together. Each factor can be given a different weight; the weights can take many needs into account and are set according to business requirements.
The R-language or Python-language computing module 406 handles the parts that cannot be computed in the Spark computing framework.
The data processing device of the present embodiment is based on the Spark computing framework, is capable of processing massive data and can do so quickly, and is combined with an R-language or Python-language computing module, which makes data processing more accurate and complete. When the merchandise valuation index KVI (Key Value Item) is computed, the two factors of average dwell time on the commodity page and pull amount are also taken into account, which makes the merchandise valuation index KVI more accurate and provides a basis for better managing the commodity pricing strategy.
The example embodiments of the disclosure have been specifically shown and described above. It should be understood that the disclosure is not limited to the detailed structures, arrangements or implementation methods described herein; on the contrary, the disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (12)

  1. A data processing method, characterized by comprising:
    querying data and generating a data exchange table summarized over a predetermined time period;
    processing the data exchange table using the Spark computing framework, including:
    reading the data exchange table and cutting the data into blocks according to a data attribute;
    distributing the blocks of data to the nodes of a server cluster;
    each node performing a calculation related to the data attribute;
    collecting the execution results of the nodes;
    wherein each node is configured with an R language or Python language computing module for handling the parts that cannot be calculated within the Spark computing framework.
  2. The processing method of claim 1, characterized in that querying the data includes: querying a data warehouse using the Hive query language HQL.
  3. The processing method of claim 1, characterized in that reading the data exchange table includes: reading the data exchange table using the Spark SQL language.
  4. The processing method of claim 1, characterized in that the blocks of data are stored in the form of resilient distributed datasets.
  5. The processing method of claim 1, characterized in that the Spark computing framework exchanges data with the R language or Python language computing module through a pipe.
  6. The processing method of claim 1, characterized in that the data attribute is commodity category; the data exchange table includes a commodity table, an order table, and a traffic table; the calculation related to the data attribute performed by each node includes calculating a commodity pricing index; and the commodity pricing index is calculated from the user visits, average commodity-page dwell time, sales volume, and pull amount within a predetermined time period, each with a different weight.
  7. The processing method of claim 6, characterized in that the predetermined time period is 7 days or 30 days.
  8. The processing method of claim 6 or 7, characterized in that the commodities are sorted in descending order of the calculated commodity pricing index score; the top 20% of commodities in the ranking are defined as most important commodities, those after the top 20% and up to 50% as key commodities, those after 50% and up to 80% as general commodities, and those after 80% as unimportant commodities.
  9. A data processing device, characterized by comprising:
    a summarizing module for querying data and generating a data exchange table summarized over a predetermined time period;
    a processing module for processing the data exchange table using the Spark computing framework, including:
    reading the data exchange table and cutting the data into blocks according to a data attribute;
    distributing the blocks of data to the nodes of a server cluster;
    each node performing a calculation related to the data attribute;
    collecting the execution results of the nodes;
    an R language or Python language computing module for handling the parts that cannot be calculated within the Spark computing framework.
  10. The processing device of claim 9, characterized in that the data attribute is commodity category; the data exchange table includes a commodity table, an order table, and a traffic table; the calculation related to the data attribute performed by each node includes calculating a commodity pricing index; and the commodity pricing index is calculated from the user visits, average commodity-page dwell time, sales volume, and pull amount within a predetermined time period, each with a different weight.
  11. The processing device of claim 10, characterized in that the predetermined time period is 7 days or 30 days.
  12. The processing device of claim 10 or 11, characterized in that the commodities are sorted in descending order of the calculated commodity pricing index score; the top 20% of commodities in the ranking are defined as most important commodities, those after the top 20% and up to 50% as key commodities, those after 50% and up to 80% as general commodities, and those after 80% as unimportant commodities.
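The ranking rule recited in claims 8 and 12 can be sketched as follows. The claims give the 20%, 50%, and 80% cut points but not how a commodity sitting exactly on a boundary or tied scores are handled; this sketch assumes that a boundary position belongs to the higher tier and that ties keep their sorted order. The function name `classify_by_rank` and the sample scores are hypothetical.

```python
def classify_by_rank(scores):
    """Map each sku to a tier based on its position in the descending ranking."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    total = len(ranked)
    tiers = {}
    for rank, sku in enumerate(ranked, start=1):
        # Integer arithmetic avoids floating-point issues at the cut points.
        if rank * 100 <= 20 * total:
            tiers[sku] = "most important"
        elif rank * 100 <= 50 * total:
            tiers[sku] = "key"
        elif rank * 100 <= 80 * total:
            tiers[sku] = "general"
        else:
            tiers[sku] = "unimportant"
    return tiers

# Ten hypothetical commodities with strictly decreasing scores.
sample_scores = {f"sku{i}": 100 - i for i in range(10)}
tiers = classify_by_rank(sample_scores)
```

With ten commodities this yields two most important, three key, three general, and two unimportant commodities.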
CN201610319458.2A 2016-05-13 2016-05-13 Data processing method and device Active CN107368501B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610319458.2A CN107368501B (en) 2016-05-13 2016-05-13 Data processing method and device

Publications (2)

Publication Number Publication Date
CN107368501A true CN107368501A (en) 2017-11-21
CN107368501B CN107368501B (en) 2020-06-30

Family

ID=60304167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610319458.2A Active CN107368501B (en) 2016-05-13 2016-05-13 Data processing method and device

Country Status (1)

Country Link
CN (1) CN107368501B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684399A (en) * 2018-12-24 2019-04-26 成都四方伟业软件股份有限公司 Data bank access method, database access device and Data Analysis Platform
CN110359919A (en) * 2019-07-26 2019-10-22 中铁隧道局集团有限公司 A kind of shield machine construction risk prevention system method and system
CN110413631A (en) * 2018-04-25 2019-11-05 中移(苏州)软件技术有限公司 A data query method and device
CN111191792A (en) * 2019-12-11 2020-05-22 平安医疗健康管理股份有限公司 Data distribution method and device and computer equipment
CN111400299A (en) * 2020-06-04 2020-07-10 成都四方伟业软件股份有限公司 Method and system for testing fusion quality of multiple data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2945823A2 (en) * 2013-01-18 2015-11-25 Serge V. Monros Microcontroller for pollution control system for an internal combustion engine
CN105354336A (en) * 2015-12-07 2016-02-24 Tcl集团股份有限公司 Method and apparatus for processing transactional database data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吴阳平: "Spark Streaming订单关联案例剖析", 《HTTPS://WWW.IBM.COM/DEVELOPERWORKS/CN/OPENSOURCE/OS-CN-SPARK-ORDER/?LANG=ZH_CN&CA=DWCHINA-_-BLUEMIX-_-WEB-_-CSDN》 *

Also Published As

Publication number Publication date
CN107368501B (en) 2020-06-30

Similar Documents

Publication Publication Date Title
TWI778481B (en) Computer-implemented system for ai-based product integration and deduplication and method integrating and deduplicating products using ai
US10841743B2 (en) Branching mobile-device to system-namespace identifier mappings
US6408292B1 (en) Method of and system for managing multi-dimensional databases using modular-arithmetic based address data mapping processes on integer-encoded business dimensions
CN109840730B (en) Method and device for data prediction
CN107368501A (en) The processing method and processing device of data
CN107146089A (en) The single recognition methods of one kind brush and device, electronic equipment
Deng et al. Solving a Closed‐Loop Location‐Inventory‐Routing Problem with Mixed Quality Defects Returns in E‐Commerce by Hybrid Ant Colony Optimization Algorithm
WO2019169050A1 (en) Inventory placement recommendation system
Lappas et al. Efficient and domain-invariant competitor mining
US20210109906A1 (en) Clustering model analysis for big data environments
US20230177545A1 (en) Systems for management of location-aware market data
CN116029637A (en) Cross-border electronic commerce logistics channel intelligent recommendation method and device, equipment and storage medium
CN116308684A (en) Online shopping platform store information pushing method and system
US20140129269A1 (en) Forecasting Business Entity Characteristics Based on Planning Infrastructure
CN116777508B (en) Medical supply analysis management system and method based on big data
CN118917781A (en) Logistics storage management method and system based on digital twinning
Sahu et al. The thematic landscape of literature on supply chain management in India: a systematic literature review
CN110020918A (en) Recommendation information generation method and system
US10235711B1 (en) Determining a package quantity
CN112819404A (en) Data processing method and device, electronic equipment and storage medium
CN112561559A (en) Merchant portrait model generation method, device, equipment and storage medium
CN103838775A (en) Data analysis method and data analysis device
JP2010277571A (en) Product selection system and method, and product selection computer program
RU2480828C1 (en) Method of predicting target value of events based on unlimited number of characteristics
KR102848300B1 (en) Method for simulating of merchandise price in e-commerce service, and device thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant