CN107368501A - Data processing method and data processing device - Google Patents
Data processing method and data processing device
- Publication number
- CN107368501A (application number CN201610319458.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- commodity
- node
- spark
- language
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The present invention discloses a data processing method and device. The method includes: querying data and generating a data exchange table aggregated over a predetermined time period; and processing the data exchange table using the Spark computing framework, which includes: reading the data exchange table and partitioning the data into blocks by a data attribute; distributing the data blocks to the nodes of a server cluster; performing, on each node, the calculation related to the data attribute; and aggregating the execution results of the nodes. Each node is configured with an R or Python computing module for handling the parts that cannot be computed within the Spark computing framework. The data processing method and device of the invention are based on the Spark computing framework combined with the R or Python language, which makes data processing more accurate and complete.
Description
Technical field
The present disclosure relates generally to the field of computer technology, and in particular to a data processing method and device.
Background
In current big data processing, the data volume is large and the content to be calculated is sometimes complex, so a single processing technology can hardly meet the requirements. For example, large e-commerce enterprises generally use the concept of the key value item (KVI) to evaluate the value of commodities, and calculating KVI indexes requires processing massive amounts of data. Massive data either cannot be processed on a single machine or takes too long to compute there, so real-time calculation cannot be guaranteed; a single processing technology, in turn, can hardly guarantee the completeness and accuracy of the calculation.
Therefore, a new method is needed for calculations over massive data, such as the calculation of KVI indexes.
The above information disclosed in this Background section is only intended to enhance understanding of the background of the disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
The disclosure provides a data processing method and device that can handle calculations over massive data quickly and accurately.
Other features and advantages of the disclosure will become apparent from the following detailed description, or may be learned in part by practice of the disclosure.
According to a first aspect of the disclosure, a data processing method includes: querying data and generating a data exchange table aggregated over a predetermined time period; and processing the data exchange table using the Spark computing framework, including:
reading the data exchange table and partitioning the data into blocks by a data attribute;
distributing the data blocks to the nodes of a server cluster;
performing, on each node, the calculation related to the data attribute; and aggregating the execution results of the nodes;
wherein each node is configured with an R or Python computing module for handling the parts that cannot be computed within the Spark computing framework.
According to an embodiment of the disclosure, querying the data includes: querying in a data warehouse using the Hive query language HQL.
According to an embodiment of the disclosure, reading the data exchange table includes: reading the data exchange table using the Spark SQL language.
According to an embodiment of the disclosure, the partitioned data are stored in the form of resilient distributed datasets.
According to an embodiment of the disclosure, the Spark computing framework and the R or Python computing module exchange data through a pipe.
According to an embodiment of the disclosure, the data attribute is the commodity category, the data exchange table includes a commodity table, an order table and a traffic table, and the calculation related to the data attribute performed by each node includes calculating a commodity pricing index. The commodity pricing index is calculated by applying different weights to the user visits, the average dwell time on the commodity page, the sales volume and the pull-through amount within a predetermined time period.
According to an embodiment of the disclosure, the predetermined time period is 7 days or 30 days.
According to an embodiment of the disclosure, the commodities are sorted in descending order of the calculated commodity pricing index score. The top 20% of commodities in the ranking are defined as the most important commodities, those ranked after 20% and up to 50% as key commodities, those ranked after 50% and up to 80% as general commodities, and those ranked after 80% as unimportant commodities.
According to a second aspect of the disclosure, a data processing device includes: a summarizing module for querying data and generating a data exchange table aggregated over a predetermined time period; and a processing module for processing the data exchange table using the Spark computing framework, including:
reading the data exchange table and partitioning the data into blocks by a data attribute;
distributing the data blocks to the nodes of a server cluster;
performing, on each node, the calculation related to the data attribute;
aggregating the execution results of the nodes;
and an R or Python computing module for handling the parts that cannot be computed within the Spark computing framework.
According to an embodiment of the disclosure, the data attribute is the commodity category, the data exchange table includes a commodity table, an order table and a traffic table, and the calculation related to the data attribute performed by each node includes calculating a commodity pricing index. The commodity pricing index is calculated by applying different weights to the user visits, the average dwell time on the commodity page, the sales volume and the pull-through amount within a predetermined time period.
According to an embodiment of the disclosure, the predetermined time period is 7 days or 30 days.
According to an embodiment of the disclosure, the commodities are sorted in descending order of the calculated commodity pricing index score. The top 20% of commodities in the ranking are defined as the most important commodities, those ranked after 20% and up to 50% as key commodities, those ranked after 50% and up to 80% as general commodities, and those ranked after 80% as unimportant commodities.
The data processing method and device of the present embodiments are based on the Spark computing framework and can therefore process massive data quickly; combined with an R or Python computing module, they make data processing more accurate and complete. When calculating the commodity pricing index KVI (Key Value Item), the average dwell time on the commodity page and the pull-through amount are also taken into account, which makes the KVI more accurate and provides a basis for a better pricing strategy.
It should be understood that the above general description and the following detailed description are exemplary only and do not limit the disclosure.
Brief description of the drawings
The above and other objects, features and advantages of the disclosure will become more apparent from the detailed description of example embodiments with reference to the accompanying drawings.
Fig. 1 shows a system architecture diagram according to an example embodiment of the disclosure.
Fig. 2 shows a flowchart of a data processing method according to an example embodiment of the disclosure.
Fig. 3 shows a flowchart of another data processing method according to an example embodiment of the disclosure.
Fig. 4 shows a block diagram of a data processing device according to an example embodiment of the disclosure.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The drawings are only schematic illustrations of the disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so their repeated description is omitted.
Furthermore, the described features, structures or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a thorough understanding of the embodiments of the disclosure. Those skilled in the art will appreciate, however, that the technical solutions of the disclosure may be practiced with one or more of the specific details omitted, or with other methods, components, steps and so on. In other cases, well-known structures, methods, implementations or operations are not shown or described in detail, to avoid distracting from the main points and obscuring aspects of the disclosure.
Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 shows a system architecture diagram according to an example embodiment of the disclosure.
As shown in Fig. 1, the system architecture used by the invention is based on the Spark computing framework. Data are queried in the data warehouse using the Hive query language HQL to form data exchange table 100. Data exchange table 100 is read using Spark SQL; it may include data record table 101, record table 102 and record table 103, which record the individual calculation parameters.
The aggregated record tables are processed and partitioned into blocks as the calculation requires. The partitioned data are stored in the form of resilient distributed datasets (RDDs). An RDD is an abstraction of distributed memory that provides a highly restricted shared-memory model: an RDD is a read-only, partitioned collection of records that can only be created by applying deterministic transformations to other RDDs. These restrictions, however, make fault tolerance very cheap to implement. To a developer, an RDD can be regarded as a Spark object that itself lives in memory: reading a file yields an RDD, a computation on that file yields an RDD, and the result set is also an RDD. Different partitions, the dependencies between data, and key-value map data can all be regarded as RDDs.
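The read-only, lineage-based model described above can be illustrated with a toy sketch. This is not Spark's implementation: `ToyRDD` and its methods are hypothetical names introduced only for illustration, and the sketch assumes a single `map` transformation is enough to show why a lost partition can be rebuilt cheaply.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: read-only partitions plus recorded lineage."""

    def __init__(self, partitions=None, parent=None, fn=None):
        # Source datasets hold data; derived datasets hold only (parent, fn)
        self._partitions = (
            None if partitions is None else tuple(tuple(p) for p in partitions)
        )
        self._parent = parent
        self._fn = fn

    def num_partitions(self):
        if self._partitions is not None:
            return len(self._partitions)
        return self._parent.num_partitions()

    def map(self, fn):
        # A transformation never mutates its input; it returns a new dataset
        return ToyRDD(parent=self, fn=fn)

    def compute_partition(self, index):
        # Any single partition can be rebuilt from its lineage alone
        if self._partitions is not None:
            return list(self._partitions[index])
        return [self._fn(x) for x in self._parent.compute_partition(index)]

    def collect(self):
        out = []
        for i in range(self.num_partitions()):
            out.extend(self.compute_partition(i))
        return out


base = ToyRDD(partitions=[[1, 2], [3, 4]])
doubled = base.map(lambda x: x * 2)
print(doubled.collect())
print(doubled.compute_partition(1))
```

Because a derived dataset stores only its parent and the function applied, recomputing one partition touches only the matching parent partition, which is the intuition behind the low fault-tolerance cost mentioned in the text.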
The small data blocks are distributed to the node servers of the server cluster for calculation, and each node server has an R or Python language module preinstalled. If part of the calculation cannot be implemented in Spark, the preinstalled R or Python module can be used to compute it. After the calculation is complete, Spark collects the execution results for the data blocks on each node and aggregates them into one large result file 200. A Hive import statement is then called to import result file 200 into the Hive data warehouse, where the results are available for querying. Spark and R or Python exchange data through operating-system pipes.
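The pipe-based exchange described above can be sketched without Spark: the records of one data block are written line by line to the standard input of an external process, and the results are read back from its standard output. Spark's RDD API offers a `pipe` operation built on the same idea. In the sketch below, `EXTERNAL_MODULE` and `pipe_partition` are hypothetical names, and a small Python script stands in for the preinstalled R or Python computing module; the square-root calculation is an arbitrary placeholder.

```python
import subprocess
import sys

# Hypothetical external computing module: reads one number per line from
# stdin and writes its square root to stdout.
EXTERNAL_MODULE = (
    "import sys, math\n"
    "for line in sys.stdin:\n"
    "    line = line.strip()\n"
    "    if line:\n"
    "        print(math.sqrt(float(line)))\n"
)


def pipe_partition(records):
    """Send one data block through an operating-system pipe and read the results."""
    payload = "".join(f"{record}\n" for record in records)
    completed = subprocess.run(
        [sys.executable, "-c", EXTERNAL_MODULE],
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # raise if the external module exits with an error
    )
    return completed.stdout.splitlines()


result = pipe_partition([4, 9, 16])
print(result)
```

A line-oriented text protocol keeps the two sides independent: the external module needs no knowledge of Spark, only of the record format agreed on for the pipe.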
Fig. 2 shows a flowchart of a data processing method according to an example embodiment of the disclosure.
As shown in Fig. 2, the data processing method is based on the system architecture of Fig. 1 and includes steps S202 to S204:
In step S202, data are queried and a data exchange table aggregated over a predetermined time period is generated.
The data to be processed are queried and collected, and a data exchange table aggregated over a predetermined time period is generated from the collected data. The predetermined time period can be set manually according to business needs.
In step S204, the data exchange table is processed using the Spark computing framework.
The data exchange table is read and the data are partitioned into blocks by a data attribute. The data attribute can be a parameter involved in the calculation. For example, calculating the key value item (KVI) index for an e-commerce enterprise's sales involves commodity categories, so the data attribute can be the commodity category and the data are partitioned by commodity category. Because the volume of data to be processed is large, partitioning divides the massive data into multiple small data blocks.
Under the Spark computing framework, the small data blocks are distributed to the node servers of the server cluster.
Each node server performs the calculation related to the data attribute. For example, when calculating the KVI index for an e-commerce enterprise's sales, each node server classifies the data by commodity category and calculates the KVI index of each commodity from its sales volume and click volume.
After the node servers finish calculating, their execution results are aggregated: all calculation results are combined into one large result file and imported into the data warehouse. For example, a Hive import statement can be called to import the result data into a Hive data warehouse table for use.
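The partition, distribute, compute and aggregate sequence of step S204 can be sketched in plain Python. This is only an illustration under stated assumptions: a thread pool stands in for the server cluster, summing sales stands in for the per-node calculation, and the names `partition_by`, `node_compute` and the sample rows are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by(rows, attribute):
    """Split rows into blocks keyed by one data attribute."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[row[attribute]].append(row)
    return dict(blocks)


def node_compute(item):
    """Per-node calculation on one block (placeholder: total sales)."""
    key, block = item
    return key, sum(row["sales"] for row in block)


rows = [
    {"category": "books", "sales": 2},
    {"category": "toys", "sales": 7},
    {"category": "books", "sales": 3},
]

blocks = partition_by(rows, "category")

# The pool plays the role of the cluster: each block is handled independently
with ThreadPoolExecutor(max_workers=2) as pool:
    results = dict(pool.map(node_compute, blocks.items()))

print(results)
```

Because each block contains every row for its attribute value, a node can finish its calculation without data from any other node, and aggregation reduces to merging the per-block results.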
In the above calculation, each node server can be configured with an R or Python computing module for handling the parts that cannot be computed within the Spark computing framework. If an algorithm package needed during the calculation is missing from Spark's scientific computing libraries, the R or Python computing module can supply it so that the calculation can be completed.
The data processing method of this embodiment is based on the Spark computing framework and can therefore process massive data quickly; combined with an R or Python computing module, it makes data processing more accurate and complete.
According to an example embodiment, the data can be queried in the data warehouse using the Hive query language HQL.
According to an example embodiment, the Spark computing framework and the R or Python computing module exchange data through operating-system pipes.
Fig. 3 shows a flowchart of another data processing method according to an example embodiment of the disclosure.
Large e-commerce enterprises sell a huge number of commodities and need to understand which commodities most influence users' impression of the store, in order to keep their advantage over competitors. To evaluate the value of commodities, the concept of the key value item (KVI) is used. KVI commodities are price-sensitive commodities: a change in their price has a large effect on their own sales and on the sales of related commodities. Whether a commodity is a KVI commodity can be measured along several dimensions, including page views and purchase volume. Considering these aspects for each commodity shows which commodities most attract users to browse and are most readily bought by customers; these KVI commodities influence users' impression of the store more than other commodities do. Existing approaches evaluate commodity value from sales volume and click volume, dividing commodities into four classes, A, B, C and D, where A is the most important and D the least important. Sales volume and click volume are combined with certain weights into a value representing the importance of a commodity. The sales volume is obtained by querying the database table that records sales, and the click volume can be recorded by page code when users visit the page. The sales volume and click volume are then normalized to values between 0 and 1, and a composite value is calculated with certain weights:
K = w1 * sales_quantity + w2 * traffic,
where sales_quantity is the sales volume and traffic is the click volume. The prior art evaluates commodities incompletely: it does not consider indicators that materially affect a commodity, such as its pull-through effect and page dwell time. A commodity's own sales volume may be small, yet it may drive the sales of other commodities.
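The prior-art score can be sketched as follows. The text only says the inputs are normalized to the range 0 to 1; min-max scaling is assumed here as one common way to do that, and the weights 0.6 and 0.4, the sample figures and the function names are hypothetical.

```python
def min_max(values):
    """Scale values linearly into [0, 1] (assumed normalization method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: no spread to scale, so map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def prior_art_scores(sales_quantity, traffic, w1=0.6, w2=0.4):
    """K = w1 * sales_quantity + w2 * traffic on normalized inputs."""
    norm_sales = min_max(sales_quantity)
    norm_traffic = min_max(traffic)
    return [w1 * s + w2 * t for s, t in zip(norm_sales, norm_traffic)]


# Three commodities: raw sales volumes and click volumes
scores = prior_art_scores([10, 20, 30], [100, 300, 200])
print(scores)
```

Normalizing first keeps a quantity with a large raw range, such as click volume, from dominating the weighted sum regardless of the weights chosen.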
Calculating the KVI index of commodities well requires the support of a series of data processing technologies. R is a powerful language for data analysis and processing and is well suited to implementing the data preprocessing and algorithms of the KVI index calculation. Python also has strong scientific computing capabilities and rich scientific computing libraries, which helps ensure the accuracy and performance of the KVI index calculation. The Hadoop platform provides the underlying support for the data storage and computation that the KVI index calculation requires. Hive is a database platform built on Hadoop.
As shown in Fig. 3, the commodity pricing index KVI (Key Value Item) is calculated using the above data processing method. The data attribute can be the commodity category SKU (Stock Keeping Unit), and the data exchange table includes a commodity table, an order table and a traffic table. Each node server performs the calculation related to the commodity category SKU, including calculating the commodity pricing index KVI. An SKU is the unit of measure for stock moving in and out of inventory, and can be a piece, a box, a pallet and so on. SKUs were originally an essential tool for logistics management in the distribution centers of large chain stores; the term has since been extended to mean a unified product number, and every product corresponds to a unique SKU number.
The method includes steps S302 to S304:
In step S302, data are queried and a commodity table, an order table and a traffic table aggregated over a predetermined time period are generated.
The data related to commodity sales are queried and collected; the Hive query language HQL can be used to collect and aggregate each item of data in the data warehouse. A data exchange table aggregated over a predetermined time period is generated from the collected data, and it includes a commodity table, an order table and a traffic table. The predetermined time period can be set manually according to business needs: it can be 7 days (one week) or 30 days (one month), depending on the needs of the analysis. The data exchange table is processed using the Spark computing framework: the data can be partitioned by the commodity category SKU (Stock Keeping Unit) in the commodity table, the small data blocks are distributed to the node servers of the server cluster, and each node server performs the related calculation.
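Aggregation over the predetermined time period can be sketched as a simple windowed sum. The patent performs this step with HQL in the data warehouse; the Python below only illustrates the logic, and the record fields (`sku`, `day`, `quantity`), the function name `summarize_orders` and the choice to include the end day in the window are assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta


def summarize_orders(orders, end_day, window_days=7):
    """Total ordered quantity per SKU over the last window_days days."""
    start_day = end_day - timedelta(days=window_days)
    totals = defaultdict(int)
    for order in orders:
        # Window is (start_day, end_day]: exactly window_days calendar days
        if start_day < order["day"] <= end_day:
            totals[order["sku"]] += order["quantity"]
    return dict(totals)


orders = [
    {"sku": "A", "day": date(2016, 5, 12), "quantity": 2},
    {"sku": "A", "day": date(2016, 5, 7), "quantity": 1},
    {"sku": "A", "day": date(2016, 5, 6), "quantity": 5},
    {"sku": "B", "day": date(2016, 4, 20), "quantity": 3},
]

weekly = summarize_orders(orders, end_day=date(2016, 5, 13), window_days=7)
monthly = summarize_orders(orders, end_day=date(2016, 5, 13), window_days=30)
print(weekly)
print(monthly)
```

The same pattern applies to the traffic table, with visits or dwell time summed or averaged per SKU instead of order quantity.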
In step S304, the commodity table, order table and traffic table are processed using the Spark computing framework.
When calculating the commodity pricing index KVI (Key Value Item), the information in the commodity table, order table and traffic table is used to obtain the user visits, the average dwell time on the commodity page, the sales volume and the pull-through amount. These factors are considered together and can be given different weights. The weights can take many needs into account and are set according to business requirements.
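A four-factor weighted index of the kind described above might look like the following sketch. The patent does not give the weights or the normalization, so min-max scaling, the weight values, the factor keys (`visits`, `dwell`, `sales`, `pull`) and the sample SKUs are all assumptions made for illustration.

```python
def min_max(values):
    """Scale values linearly into [0, 1] (assumed normalization method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pricing_index(products, weights):
    """Weighted sum of normalized factors for every SKU."""
    skus = list(products)
    scores = {sku: 0.0 for sku in skus}
    for factor, weight in weights.items():
        # Normalize each factor across all SKUs before weighting it
        normalized = min_max([products[sku][factor] for sku in skus])
        for sku, value in zip(skus, normalized):
            scores[sku] += weight * value
    return scores


# Hypothetical per-SKU factors for one time period
products = {
    "sku-1": {"visits": 1000, "dwell": 40, "sales": 50, "pull": 200},
    "sku-2": {"visits": 500, "dwell": 20, "sales": 100, "pull": 0},
    "sku-3": {"visits": 0, "dwell": 0, "sales": 0, "pull": 100},
}
# Hypothetical weights, to be set according to business requirements
weights = {"visits": 0.2, "dwell": 0.2, "sales": 0.4, "pull": 0.2}

kvi = pricing_index(products, weights)
print(kvi)
```

In this sample, sku-1 outranks sku-2 despite lower sales because its visits, dwell time and pull-through amount are all higher, which is the effect the two additional factors are meant to capture.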
After the node servers finish calculating, their execution results are aggregated: all calculation results are combined into one large result file and imported into the data warehouse. For example, a Hive import statement can be called to import the result data into a Hive data warehouse table as the result for use.
In the above calculation, each node server can be configured with an R or Python computing module for handling the parts that cannot be computed within the Spark computing framework. If an algorithm package is missing from Spark's scientific computing libraries, the R or Python computing module can supply it so that the calculation can be completed.
The data processing method of this embodiment is based on the Spark computing framework and can therefore process massive data quickly; combined with an R or Python computing module, it makes data processing more accurate and complete. When calculating the commodity pricing index KVI (Key Value Item), the average dwell time on the commodity page and the pull-through amount are also taken into account, which makes the KVI more accurate and provides a basis for a better pricing strategy.
According to an example embodiment, the commodities can be sorted in descending order of the calculated commodity pricing index KVI (Key Value Item) score. The top 20% of commodities in the ranking can be defined as the most important commodities, those ranked after 20% and up to 50% as key commodities, those ranked after 50% and up to 80% as general commodities, and those ranked after 80% as unimportant commodities. When formulating sales strategies, different strategies can be set according to the importance of the commodities, with emphasis on the most important and key commodities. This division by importance is only an example; when using the commodity pricing index KVI, the ranges can be chosen according to the actual situation.
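The example tiering can be sketched as a rank-based classification. The function name `classify` and the handling of boundary positions are assumptions: here a commodity's tier depends on the share of commodities ranked strictly above it, compared using integers to avoid floating-point edge cases.

```python
def classify(scores):
    """Assign each SKU a tier from its rank in descending score order."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    total = len(ranked)
    tiers = {}
    for position, sku in enumerate(ranked):
        # position * 100 < p * total  <=>  fewer than p% are ranked above
        if position * 100 < 20 * total:
            tiers[sku] = "most important"
        elif position * 100 < 50 * total:
            tiers[sku] = "key"
        elif position * 100 < 80 * total:
            tiers[sku] = "general"
        else:
            tiers[sku] = "unimportant"
    return tiers


# Ten hypothetical SKUs with strictly decreasing scores
scores = {f"sku-{i}": 100 - i for i in range(10)}
tiers = classify(scores)
print(tiers)
```

With ten commodities this yields two most important, three key, three general and two unimportant commodities; the thresholds are parameters of the example and can be changed to suit the actual situation.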
Fig. 4 shows a block diagram of a data processing device according to an example embodiment of the disclosure.
As shown in Fig. 4, a data processing device includes:
Summarizing module 402, for querying data and generating a data exchange table aggregated over a predetermined time period.
The data to be processed are queried and collected, and a data exchange table aggregated over a predetermined time period is generated from the collected data. The predetermined time period can be set manually according to business needs.
Processing module 404, for processing the data exchange table using the Spark computing framework.
The data exchange table is read and the data are partitioned into blocks by a data attribute. The data attribute can be a parameter involved in the calculation. For example, calculating the key value item (KVI) index for an e-commerce enterprise's sales involves commodity categories, so the data attribute can be the commodity category and the data are partitioned by commodity category. Because the volume of data to be processed is large, partitioning divides the massive data into multiple small data blocks.
Under the Spark computing framework, the small data blocks are distributed to the node servers of the server cluster. Each node server performs the calculation related to the data attribute. For example, when calculating the KVI index for an e-commerce enterprise's sales, each node server calculates the KVI index from the sales volume and click volume.
After the node servers finish calculating, their execution results are aggregated: all calculation results are combined into one large result file and imported into the data warehouse. For example, a Hive import statement can be called to import the result data into a Hive data warehouse table as the result for use.
R or Python computing module 406, for handling the parts that cannot be computed within the Spark computing framework.
In the above calculation, each node server can be configured with an R or Python computing module for handling the parts that cannot be computed within the Spark computing framework. If an algorithm package needed during the calculation is missing from Spark's scientific computing libraries, the R or Python computing module can supply it so that the calculation can be completed.
The data processing device of this embodiment is based on the Spark computing framework and can therefore process massive data quickly; combined with an R or Python computing module, it makes data processing more accurate and complete.
In a large e-commerce enterprise, the above data processing device is used to calculate the commodity pricing index KVI (Key Value Item). The data attribute can be the commodity category SKU (Stock Keeping Unit), the data exchange table includes a commodity table, an order table and a traffic table, and each node server calculates the commodity pricing index KVI. The modules perform the following functions:
Summarizing module 402 queries and collects the data related to commodity sales and generates from the collected data a data exchange table aggregated over a predetermined time period. The predetermined time period can be set manually according to business needs: it can be 7 days (one week) or 30 days (one month), depending on the needs of the analysis.
The Hive query language HQL can be used to collect and aggregate each item of data in the data warehouse and to generate the data exchange table, which includes a commodity table, an order table and a traffic table.
Processing module 404 processes the data exchange table using the Spark computing framework, distributes the small data blocks to the node servers of the server cluster, and each node server calculates the commodity pricing index KVI (Key Value Item).
When calculating the commodity pricing index KVI, factors such as the user visits, the average dwell time on the commodity page, the sales volume and the pull-through amount are considered together. Each factor can be given a different weight; the weights can take many needs into account and are set according to business requirements.
R or Python computing module 406 handles the parts that cannot be computed within the Spark computing framework.
The data processing device of this embodiment is based on the Spark computing framework and can therefore process massive data quickly; combined with an R or Python computing module, it makes data processing more accurate and complete. When calculating the commodity pricing index KVI (Key Value Item), the average dwell time on the commodity page and the pull-through amount are also taken into account, which makes the KVI more accurate and provides a basis for a better pricing strategy.
Exemplary embodiments of the disclosure have been specifically shown and described above. It should be understood that the disclosure is not limited to the detailed structures, arrangements or implementations described herein; on the contrary, the disclosure is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims (12)
- A kind of 1. processing method of data, it is characterised in that including:Data are inquired about, the generation data exchange table that section collects on schedule;The data exchange table is handled using spark computings framework, including:The data exchange table is read, stripping and slicing is carried out to data by a data attribute;By each node of the data distribution of stripping and slicing to server cluster;Each node carries out the calculating related to the data attribute;Collect the implementing result of each node;Wherein, each node is configured with R language or python language computing modules, for handling spark Imponderable part in computing framework.
- 2. The processing method of claim 1, characterized in that querying the data includes: querying a data warehouse using the Hive query language HQL.
- 3. The processing method of claim 1, characterized in that reading the data exchange tables includes: reading the data exchange tables using Spark SQL.
- 4. The processing method of claim 1, characterized in that the partitioned data is stored in the form of resilient distributed datasets (RDDs).
- 5. The processing method of claim 1, characterized in that the Spark computing framework exchanges data with the R or Python computing module through a pipe.
- 6. The processing method of claim 1, characterized in that the data attribute is the commodity category; the data exchange tables include a commodity table, an order table, and a traffic table; the calculation related to the data attribute performed by each node includes calculating a commodity value index; and the commodity value index is calculated from the user visits, the average commodity page dwell time, the sales volume, and the pull amount within a predetermined time period, each with a different weight.
- 7. The processing method of claim 6, characterized in that the predetermined time period is 7 days or 30 days.
- 8. The processing method of claim 6 or 7, characterized in that the commodities are sorted in descending order of the calculated commodity value index score; the top 20% of commodities in the ranking are defined as the most important commodities, those after 20% and up to 50% as key commodities, those after 50% and up to 80% as general commodities, and those after 80% as unimportant commodities.
- 9. A data processing device, characterized by comprising: a summarizing module for querying data and generating data exchange tables aggregated by time period; a processing module for processing the data exchange tables using the Spark computing framework, including: reading the data exchange tables and partitioning the data by a data attribute; distributing the partitioned data to the nodes of a server cluster; performing, on each node, the calculation related to the data attribute; and collecting the execution results of the nodes; and an R or Python computing module for handling the parts that cannot be computed in the Spark computing framework.
- 10. The processing device of claim 9, characterized in that the data attribute is the commodity category; the data exchange tables include a commodity table, an order table, and a traffic table; the calculation related to the data attribute performed by each node includes calculating a commodity value index; and the commodity value index is calculated from the user visits, the average commodity page dwell time, the sales volume, and the pull amount within a predetermined time period, each with a different weight.
- 11. The processing device of claim 10, characterized in that the predetermined time period is 7 days or 30 days.
- 12. The processing device of claim 10 or 11, characterized in that the commodities are sorted in descending order of the calculated commodity value index score; the top 20% of commodities in the ranking are defined as the most important commodities, those after 20% and up to 50% as key commodities, those after 50% and up to 80% as general commodities, and those after 80% as unimportant commodities.
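The ranking rule recited in claims 8 and 12 can be illustrated with a short Python sketch. It is only one possible reading of the claims: the function name `classify_by_rank`, the tier labels, the tie-breaking (input order), and the choice to place an item that falls exactly on a boundary in the higher tier are assumptions.

```python
def classify_by_rank(scores):
    """Map each item to a tier based on its position in the descending ranking.

    scores: dict of item id -> KVI score.
    Returns: dict of item id -> tier label.
    """
    # Sort ids from highest to lowest score; ties keep input order.
    ranked = sorted(scores, key=scores.get, reverse=True)
    total = len(ranked)
    tiers = {}
    for rank, item_id in enumerate(ranked, start=1):
        # Integer comparison of rank/total against 20%, 50%, 80%
        # avoids floating-point boundary issues.
        if rank * 100 <= 20 * total:
            tiers[item_id] = "most important"
        elif rank * 100 <= 50 * total:
            tiers[item_id] = "key"
        elif rank * 100 <= 80 * total:
            tiers[item_id] = "general"
        else:
            tiers[item_id] = "unimportant"
    return tiers


if __name__ == "__main__":
    demo = {f"sku{i}": float(10 - i) for i in range(10)}
    for sku, tier in classify_by_rank(demo).items():
        print(sku, tier)
```

With very small catalogs the top tier can be empty (for example, with four items no rank falls within the first 20%), so a real deployment would need to decide how to round the boundaries.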
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610319458.2A CN107368501B (en) | 2016-05-13 | 2016-05-13 | Data processing method and device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610319458.2A CN107368501B (en) | 2016-05-13 | 2016-05-13 | Data processing method and device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN107368501A (en) | 2017-11-21 |
| CN107368501B (en) | 2020-06-30 |
Family
ID=60304167
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201610319458.2A Active CN107368501B (en) | 2016-05-13 | 2016-05-13 | Data processing method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN107368501B (en) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109684399A (en) * | 2018-12-24 | 2019-04-26 | 成都四方伟业软件股份有限公司 | Data bank access method, database access device and Data Analysis Platform |
| CN110359919A (en) * | 2019-07-26 | 2019-10-22 | 中铁隧道局集团有限公司 | A kind of shield machine construction risk prevention system method and system |
| CN110413631A (en) * | 2018-04-25 | 2019-11-05 | 中移(苏州)软件技术有限公司 | A data query method and device |
| CN111191792A (en) * | 2019-12-11 | 2020-05-22 | 平安医疗健康管理股份有限公司 | Data distribution method and device and computer equipment |
| CN111400299A (en) * | 2020-06-04 | 2020-07-10 | 成都四方伟业软件股份有限公司 | Method and system for testing fusion quality of multiple data |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP2945823A2 (en) * | 2013-01-18 | 2015-11-25 | Serge V. Monros | Microcontroller for pollution control system for an internal combustion engine |
| CN105354336A (en) * | 2015-12-07 | 2016-02-24 | Tcl集团股份有限公司 | Method and apparatus for processing transactional database data |
Non-Patent Citations (1)
| Title |
|---|
| 吴阳平: "Spark Streaming订单关联案例剖析", 《HTTPS://WWW.IBM.COM/DEVELOPERWORKS/CN/OPENSOURCE/OS-CN-SPARK-ORDER/?LANG=ZH_CN&CA=DWCHINA-_-BLUEMIX-_-WEB-_-CSDN》 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN107368501B (en) | 2020-06-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| TWI778481B (en) | Computer-implemented system for ai-based product integration and deduplication and method integrating and deduplicating products using ai | |
| US10841743B2 (en) | Branching mobile-device to system-namespace identifier mappings | |
| US6408292B1 (en) | Method of and system for managing multi-dimensional databases using modular-arithmetic based address data mapping processes on integer-encoded business dimensions | |
| CN109840730B (en) | Method and device for data prediction | |
| CN107368501A (en) | The processing method and processing device of data | |
| CN107146089A (en) | The single recognition methods of one kind brush and device, electronic equipment | |
| Deng et al. | Solving a Closed‐Loop Location‐Inventory‐Routing Problem with Mixed Quality Defects Returns in E‐Commerce by Hybrid Ant Colony Optimization Algorithm | |
| WO2019169050A1 (en) | Inventory placement recommendation system | |
| Lappas et al. | Efficient and domain-invariant competitor mining | |
| US20210109906A1 (en) | Clustering model analysis for big data environments | |
| US20230177545A1 (en) | Systems for management of location-aware market data | |
| CN116029637A (en) | Cross-border electronic commerce logistics channel intelligent recommendation method and device, equipment and storage medium | |
| CN116308684A (en) | Online shopping platform store information pushing method and system | |
| US20140129269A1 (en) | Forecasting Business Entity Characteristics Based on Planning Infrastructure | |
| CN116777508B (en) | Medical supply analysis management system and method based on big data | |
| CN118917781A (en) | Logistics storage management method and system based on digital twinning | |
| Sahu et al. | The thematic landscape of literature on supply chain management in India: a systematic literature review | |
| CN110020918A (en) | Recommendation information generation method and system | |
| US10235711B1 (en) | Determining a package quantity | |
| CN112819404A (en) | Data processing method and device, electronic equipment and storage medium | |
| CN112561559A (en) | Merchant portrait model generation method, device, equipment and storage medium | |
| CN103838775A (en) | Data analysis method and data analysis device | |
| JP2010277571A (en) | Product selection system and method, and product selection computer program | |
| RU2480828C1 (en) | Method of predicting target value of events based on unlimited number of characteristics | |
| KR102848300B1 (en) | Method for simulating of merchandise price in e-commerce service, and device thereof |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |