CN105912274A - Streaming data positioning method and apparatus - Google Patents
Streaming data positioning method and apparatus
- Publication number
- CN105912274A (application CN201610252499.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- sampling
- described data
- locking parameter
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G PHYSICS > G06 COMPUTING OR CALCULATING; COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING
- G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F16/13: File access structures, e.g. distributed indices
- G06F16/148: File search processing
- G06F16/182: Distributed file systems
- G06F3/064: Management of blocks
- G06F3/0655: Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Library & Information Science (AREA)
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present invention provide a streaming data positioning method and apparatus. The method comprises: receiving a data positioning parameter, querying the description information of the corresponding data blocks according to the data positioning parameter, and determining the data segment in which the target data is located; sampling the data segment with a preset step size and obtaining the data identifier of the data sampling result; and determining, according to the data identifier and the data positioning parameter, whether the data sampling result contains the target data and, if so, checking the records in the data sampling result one by one until the target data is located. The disclosed method and apparatus achieve accurate and efficient data positioning.
Description
Technical field
The embodiments of the present invention relate to the field of data processing, and in particular to a streaming data positioning method and apparatus.
Background art
After a large number of real-time messages are saved in a message queue, they are saved to HDFS (Hadoop Distributed File System) by means of distributed processing. In an Internet environment, for example, the number of messages produced at any moment is enormous, and these messages can be collected in the background through a message pipeline.
This real-time message data is transmitted as a stream: it arrives intermittently, like flowing water, and it is stored in fragments. Because the volume of streaming data is large, processing it efficiently requires ample resources. A distributed system has ample resources, so the stored streaming data must be uploaded to a distributed file system before it is processed.
Received streaming data is not necessarily kept in memory; it may also be persisted to disk, where it is stored block by block. As time passes and the data volume grows, a new data block is generated once the amount of data stored in the current block reaches a certain size.
When streaming data is saved to a distributed file system, the difficulty lies in positioning the streaming data according to the data requirement sent by the user, for example locating the data of a particular day, or of particular hours within a day. The unit of data storage is the block, not a unit of time, and the block is the minimum granularity of data storage. Therefore, if a certain kind of data occupies only one block per month, that block contains all 30 days of data for the month, and positioning can only reach month granularity, not day granularity; if a certain kind of data occupies 10 blocks per day, the positioning precision is finer than one day. Existing streaming positioning solutions cannot provide precise data positioning; they yield only a relatively coarse position, and their accuracy is low.
For example, a user wants the data of March 1, but that data is stored among the data for March. Coarse positioning can only establish that the data segment the user needs is stored in the data blocks for March; it cannot tell exactly which data within those blocks is required. Uploading all the data in the blocks to the processing system would involve an enormous workload.
Therefore, a streaming data positioning method is urgently needed.
Summary of the invention
The embodiments of the present invention provide a streaming data positioning method and apparatus to overcome the defects of the prior art, namely inaccurate data positioning and inefficient uploading of data to a distributed system, and to achieve accurate and efficient data positioning.
An embodiment of the present invention provides a streaming data positioning method, comprising:
receiving a data positioning parameter, querying the description information of the corresponding data blocks according to the data positioning parameter, and determining the data segment in which the target data is located;
sampling the data segment with a preset step size, and obtaining the data identifier of the data sampling result;
determining, according to the data identifier and the data positioning parameter, whether the data sampling result contains the target data and, if so, checking the records in the data sampling result one by one until the target data is located.
An embodiment of the present invention provides a streaming data positioning apparatus, comprising:
a first positioning module, configured to receive a data positioning parameter, query the description information of the corresponding data blocks according to the data positioning parameter, and determine the data segment in which the target data is located;
a sampling module, configured to sample the data segment with a preset step size and obtain the data identifier of the data sampling result;
a second positioning module, configured to determine, according to the data identifier and the data positioning parameter, whether the data sampling result contains the target data and, if so, check the records in the data sampling result one by one until the target data is located.
The streaming data positioning method and apparatus provided by the embodiments of the present invention obtain the data segment containing the target data according to the data positioning parameter, sample that data segment, and position the data according to the sampling result. This replaces the cumbersome prior-art operation of comparing data segments one by one during streaming data positioning, achieves accurate data positioning, and improves the efficiency of uploading data to a distributed file system.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a technical flowchart of Embodiment 1 of the present application;
Fig. 2 is a technical flowchart of Embodiment 2 of the present application;
Fig. 3 is a schematic structural diagram of the apparatus of Embodiment 3 of the present application.
Detailed description of the invention
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
The embodiments of the present application may be used in the following scenario: streaming data must be uploaded to a distributed system for parallel processing. Before uploading, the currently stored data must be positioned according to the target data requirement; that is, the target data must be found accurately within a large amount of stored streaming data and, once found, uploaded to the distributed system for processing.
Fig. 1 is a technical flowchart of Embodiment 1 of the present application. With reference to Fig. 1, the data positioning method of Embodiment 1 mainly comprises the following steps:
Step S110: receive a data positioning parameter, and query the description information of the corresponding data blocks according to the data positioning parameter to determine the data segment in which the target data is located;
Step S120: sample the data segment with a preset step size, and obtain the data identifier of the data sampling result;
Step S130: determine, according to the data identifier and the data positioning parameter, whether the data sampling result contains the target data and, if so, check the records in the data sampling result one by one until the target data is located.
Specifically, in step S110 the data positioning parameter is sent by the party requesting the data and is used to position the data. The data positioning parameter may include the time tag, line number, index number, offset and so on of the data. For example, if a user needs the data of day x of month x of year xx for analysis, "day x of month x of year xx" is the data positioning parameter; if the user needs the data whose index number is xxx, "index number xxx" is the data positioning parameter.
In this step, querying the description information according to the data positioning parameter specifically means using the data positioning parameter as a reference to query the description information of each existing data block. The description information of a data block is data that describes the data block, covering both the block itself and the information it holds; for example, it records from which day to which day of a certain month the data stored in the block runs.
Because each data block holds a large amount of data, comparing every record against the positioning requirement parameter one by one would waste a great deal of time and computing resources. The description information of a block, however, is tiny compared with the data in the block. In the embodiments of the present application, therefore, the description information of every data block is first traversed according to the positioning requirement parameter to coarsely locate the data blocks that contain the target data; finer positioning is then carried out within those blocks. This coarse query greatly narrows the positioning range and saves positioning time.
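The coarse query described above might be sketched as follows. This is a minimal illustration, not the patent's implementation: the `BlockDescription` record, its fields and the `candidate_blocks` function are hypothetical names, and the sketch assumes that each block's description information records the time range of the data it stores.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BlockDescription:
    """Hypothetical description record kept for one data block."""
    block_id: int
    first_time: datetime  # time tag of the earliest record in the block
    last_time: datetime   # time tag of the latest record in the block


def candidate_blocks(descriptions, start, end):
    """Return the ids of blocks whose time range overlaps [start, end].

    Only the small description records are read; block contents are untouched.
    """
    return [d.block_id for d in descriptions
            if d.first_time <= end and d.last_time >= start]


descriptions = [
    BlockDescription(0, datetime(2016, 2, 20), datetime(2016, 3, 28)),
    BlockDescription(1, datetime(2016, 3, 28), datetime(2016, 4, 5)),
    BlockDescription(2, datetime(2016, 4, 5), datetime(2016, 4, 30)),
    BlockDescription(3, datetime(2016, 5, 1), datetime(2016, 6, 2)),
]
blocks = candidate_blocks(descriptions, datetime(2016, 4, 1), datetime(2016, 4, 7))
print(blocks)
```

Blocks that do not overlap the requested period are never opened, which is where the saving in positioning time comes from.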
Specifically, in step S120 the preset step size may be a variable that is adjusted according to the total amount of data in the data block and the result of each sampling. For example, if a data block contains a very large amount of data and the sampling step size is too small, the number of samplings increases and the efficiency of data positioning cannot be improved; if a data block contains a small amount of data and the sampling step size is too large, the sampled data is likely to account for a large share of the whole block, which increases the amount of data that must be compared one by one during the subsequent precise positioning. The preset step size is therefore an empirical value related to the amount of data in the data block.
In addition, after the data block has first been sampled with a smaller step size, the sampling result is examined. If the data identifier of the sampled data shows that the sampling result is already sufficiently close to the target data, sampling can continue with the original step size, or the step size can be increased appropriately, so that the data segment containing the target data is found quickly and with as little time as possible.
The data identifier is of the same kind as the data positioning parameter in step S110. The identifier is carried by the data itself and may include the time tag, line number, index number, offset and so on of the data.
Specifically, in step S130, determining according to the data identifier and the data positioning parameter whether the data sampling result contains the target data means comparing the data positioning parameter received from the user with the data identifiers read from the sampled data to see whether they are consistent. For example, when the data positioning parameter is a time tag, the time tags in the data identifiers of the sampled data are searched for one that matches; if one is found, the data sampling result is judged to contain the target data.
If the data sampling result does not contain the target data, sampling of the data segment continues. Specifically, starting from the end position of the data sampling result, the data segment is resampled with the preset step size and the data identifier of the resampling result is obtained, until the resampling result is judged, according to the data identifier and the data positioning parameter, to contain the target data.
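The sample-then-resample loop can be illustrated with a short sketch. It assumes, purely for illustration, that each record is a dictionary whose `line` field serves as the data identifier and that the positioning parameter is a line number; `find_batch` and `locate_record` are hypothetical names.

```python
def find_batch(segment, target_line, step):
    """Sample `step` records at a time, each sample starting where the previous
    one ended, and return the (start, end) bounds of the first sample whose
    identifiers include `target_line`, or None if no sample contains it."""
    if step <= 0:
        raise ValueError("step must be positive")
    start = 0
    while start < len(segment):
        batch = segment[start:start + step]
        # Compare the identifiers carried by the sampled records with the parameter.
        if any(rec["line"] == target_line for rec in batch):
            return start, start + len(batch)
        start += len(batch)  # resample from the end of the previous sample
    return None


def locate_record(segment, target_line, step):
    """Find the matching sample, then check its records one by one."""
    bounds = find_batch(segment, target_line, step)
    if bounds is None:
        return None
    begin, end = bounds
    for index in range(begin, end):
        if segment[index]["line"] == target_line:
            return index
    return None


segment = [{"line": n, "text": f"record {n}"} for n in range(100, 130)]
position = locate_record(segment, 117, step=10)
print(position)
```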
Whether the data sampling result contains the target data may be determined as follows.
When the data positioning parameter is a time tag, the time tag in the data identifier of the first record in the data sampling result is looked up, and the two time tags are compared to see which is earlier. Streaming data is stored in the order in which it arrives, so it is ordered in time; comparing the time tag of the sampled data is therefore enough to judge whether the sampled data contains the target data. If the first record of the sampling result is later than the time in the positioning requirement, the target data must lie before the current sampling result; none of the data after that first record is target data, and no further sampling beyond it is needed. Conversely, if the first record of the sampling result is earlier than the time in the positioning requirement, the target data lies after that first record, and sampling, possibly several times, must continue in the remaining data segment. During data sampling the preset step size can be updated continuously; the size of each update is determined empirically, and the embodiments of the present application do not limit the specific value.
For example, the step size may be updated as follows. The located data segment is first sampled with a fixed step size N (say N=5000), so that the sampling positions are 0+5000*n. One record is sampled at 0+5000*1 and matched; if it does not contain the target data, one record is sampled at 0+5000*2. If this one is found to contain the target data, the target data can be judged to lie between the 5000th and 10000th records. Sampling must now take place within the segment from 5000 to 10000, and the step size must be updated, for example to (10000-5000)/2=2500, half the number of records in the sampled interval. That is, one record is sampled at the 7500th record and compared with the positioning parameter; if it does not match, the sampling step size is updated (reduced) again. When the number of records in the sampling result falls below a preset data-volume threshold, all of them are fetched at once and checked one by one for a match; in effect the step size is now 1. In the embodiments of the present application the data-volume threshold may be 200, because below 200 records the time cost of network access is close to that of a local search, so stopping sampling at this point is most efficient.
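The step-size update in this example resembles a bisection search that falls back to a linear scan once the interval is small. The sketch below is one possible reading, not the patent's definitive procedure: it assumes time tags are integers sorted in ascending order, treats a probe as "reaching the target" when its time tag is not earlier than the target, and uses the hypothetical function name `locate`. The values 5000 and 200 are taken from the text.

```python
def locate(timestamps, target, step=5000, threshold=200):
    """Return the index of the first record whose time tag is >= target,
    or None if every record is earlier than target."""
    n = len(timestamps)
    lo = 0             # every record before index lo is earlier than target
    hi = min(step, n)  # next probe position (n means "past the end")

    # Phase 1: probe at step, 2*step, ... until a probe reaches the target.
    while hi < n and timestamps[hi] < target:
        lo = hi + 1
        hi = min(hi + step, n)

    # Phase 2: halve the interval, as in (10000-5000)/2=2500, until it is small.
    while hi - lo > threshold:
        mid = lo + (hi - lo) // 2
        if timestamps[mid] < target:
            lo = mid + 1
        else:
            hi = mid

    # Phase 3: fetch the remaining records and check them one by one.
    for index in range(lo, min(hi + 1, n)):
        if timestamps[index] >= target:
            return index
    return None


timestamps = list(range(20000))  # stand-in for ascending time tags
print(locate(timestamps, 7321))
```

With these inputs the fixed-step probes at 5000 and 10000 bracket the target, the interval is then halved via probes at 7500, 6250, 6875, 7188 and 7344, and the final 155-record interval is scanned record by record.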
Note that in the embodiments of the present application, when the data positioning parameter is compared with the data identifier, the data identifier of the first record in the current sampling result is usually the one compared. If the data identifier of the first record is inconsistent with the data positioning parameter, the remaining records in the current sampling result need not be compared, and the current sampling result can be judged directly not to contain the target data.
When one sampling (or resampling) result is confirmed to contain the target data and the next resampling result does not, sampling stops; all sampled data containing the target data has then been obtained. Each record in the sampled data is then compared with the data positioning parameter one by one to decide whether it is target data, which makes positioning efficient. In this embodiment, the data segment containing the target data is obtained according to the data positioning parameter, the data segment is sampled, and the data is positioned according to the sampling result. This replaces the cumbersome prior-art operation of comparing data segments one by one during streaming data positioning, achieves accurate data positioning, and improves the efficiency of uploading data to a distributed file system.
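As a rough illustration of collecting the samples that contain target data and then filtering them record by record, the sketch below assumes records are `(time_tag, payload)` tuples sorted by time tag and that the positioning parameter is a closed time range. The function name `collect_and_filter` and the overlap test are assumptions made for this example.

```python
def collect_and_filter(records, start, end, step):
    """Keep every sample that may contain data in [start, end], stop at the
    first sample that lies entirely after the range, then filter one by one."""
    kept = []
    for pos in range(0, len(records), step):
        batch = records[pos:pos + step]
        first_time, last_time = batch[0][0], batch[-1][0]
        if first_time > end:
            break               # this sample and all later ones are past the range
        if last_time >= start:
            kept.extend(batch)  # the sample overlaps the requested range
    # Record-by-record comparison against the positioning parameter.
    return [rec for rec in kept if start <= rec[0] <= end]


records = [(t, f"comment {t}") for t in range(100)]
result = collect_and_filter(records, 42, 57, step=10)
print([t for t, _ in result])
```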
Fig. 2 is a technical flowchart of Embodiment 2 of the present application. With reference to Fig. 2, the data positioning method of the present application may further be implemented with the following steps:
Step S210: receive a data positioning parameter, and query the description information of the corresponding data blocks according to the data positioning parameter to determine the data segment in which the target data is located;
Step S220: sample the data segment with a preset step size, and obtain the data identifier of the data sampling result;
Step S230: determine, according to the data identifier and the data positioning parameter, whether the data sampling result contains the target data and, if so, check the records in the data sampling result one by one until the target data is located;
Step S240: segment the target data according to a preset strategy;
Step S250: encapsulate the segmentation result, generate a distributed parallel task, and upload the data to the distributed file system.
Steps S210 to S230 are the same as steps S110 to S130 of Embodiment 1 and are not repeated here.
Specifically, in step S240 the preset strategy may include the following:
First, dividing the target data equally according to the number of computing nodes in the distributed cluster;
Second, calculating a data segmentation threshold from the number of computing nodes in the distributed cluster, the computational efficiency of the computing nodes and the required computation time, and segmenting the target data according to that threshold.
Equal division does not take into account the remaining computing resources or the computing capability of each computing node in the server cluster; the target data to be processed is simply divided equally by the number of computing nodes. Its advantage is that no analysis of each node's computing capability is needed, which saves time.
In the other segmentation mode, the computing capability of each node in the server cluster, such as its computational efficiency and required computation time, is obtained before segmentation. The segmentation is then adjusted appropriately according to these reference values, which allows a more reasonable distribution of computing tasks.
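The two segmentation strategies might look like the sketch below. The patent does not give a formula for the segmentation threshold, so `segmentation_threshold` is purely an assumed example: it caps a segment at what one node can process within the time budget and at an even share of the data. All function and parameter names here are hypothetical.

```python
def split_evenly(total, nodes):
    """Strategy 1: divide `total` records into `nodes` contiguous (start, end)
    segments whose sizes differ by at most one record."""
    base, extra = divmod(total, nodes)
    segments, start = [], 0
    for i in range(nodes):
        size = base + (1 if i < extra else 0)
        segments.append((start, start + size))
        start += size
    return segments


def segmentation_threshold(total, nodes, records_per_second, time_budget_seconds):
    """Assumed formula: the smaller of one node's capacity within the time
    budget and an even share of the data, and never less than one record."""
    capacity = int(records_per_second * time_budget_seconds)
    even_share = -(-total // nodes)  # ceiling division
    return max(1, min(capacity, even_share))


def split_by_threshold(total, threshold):
    """Strategy 2: cut the data into segments of at most `threshold` records."""
    return [(s, min(s + threshold, total)) for s in range(0, total, threshold)]


print(split_evenly(10, 3))
threshold = segmentation_threshold(10000, nodes=4, records_per_second=50,
                                   time_budget_seconds=30)
segments = split_by_threshold(10000, threshold)
print(threshold, len(segments), segments[-1])
```

The even split needs no knowledge of the nodes beyond their count, while the threshold variant trades a little up-front analysis for segments sized to what a node can finish in time.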
Specifically, in step S250 the located target data must be encapsulated before it is uploaded to the distributed file system.
Through the preceding steps the target data has been precisely located and segmented. Each segment has a start position, an end position, the location of the server storing the data, metadata information about the data, and so on.
In this step, encapsulation means packing the start position, end position, storage server location, metadata information and so on of each segment into a data object that a Hadoop MapReduce distributed task can understand. Once a MapReduce task receives this segment information, it can access the actual data in the segment and store it in the distributed file system. Because MapReduce tasks are distributed and have ample resources, processing efficiency is high.
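A segment descriptor of this kind could be modelled as below. This is only an illustrative Python stand-in: in a real Hadoop job the role is normally played by a Java class implementing Hadoop's `InputSplit` abstraction, and the `SegmentDescriptor` name, its fields and the JSON serialization are assumptions made for this sketch.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SegmentDescriptor:
    """Hypothetical description of one segment, handed to a worker task."""
    start: int                 # position of the first record in the segment
    end: int                   # position just past the last record
    host: str                  # server that stores the segment's data
    metadata: dict = field(default_factory=dict)

    def to_json(self):
        """Serialize the descriptor so it can be shipped to a task."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Rebuild the descriptor on the worker side."""
        return cls(**json.loads(text))


descriptor = SegmentDescriptor(2000, 2500, "storage-01.example",
                               {"topic": "comments"})
restored = SegmentDescriptor.from_json(descriptor.to_json())
print(restored)
```

The worker only needs the descriptor, not the data itself, to know which records to read and from which server.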
In this embodiment, after the data requested by the user has been precisely located, the data is segmented according to the preset strategy, taking full account of the server resource utilization and computing resource utilization of the distributed processing system. This further improves the efficiency of uploading data to the distributed file system.
Fig. 3 is a schematic structural diagram of the apparatus of Embodiment 3 of the present application. With reference to Fig. 3, the data positioning apparatus of the present application comprises the following modules:
a first positioning module 310, configured to receive a data positioning parameter and query the description information of the corresponding data blocks according to the data positioning parameter to determine the data segment in which the target data is located;
a sampling module 320, configured to sample the data segment with a preset step size and obtain the data identifier of the data sampling result;
a second positioning module 330, configured to determine, according to the data identifier and the data positioning parameter, whether the data sampling result contains the target data and, if so, check the records in the data sampling result one by one until the target data is located.
The data positioning parameter specifically includes the time tag, line number, index number or offset of the data.
The sampling module 320 is specifically configured to take, starting from the start position of the data segment, the corresponding number of records from the data segment with the preset step size.
The second positioning module 330 is further specifically configured to: if the judgment is negative, resample the data segment with the preset step size starting from the end position of the data sampling result, and obtain the data identifier of the resampling result, until the resampling result is judged, according to the data identifier and the data positioning parameter, to contain the target data.
The second positioning module 330 is further configured to update the preset step size during data sampling according to the result of comparing the data identifier with the data positioning parameter.
The apparatus further comprises a segmentation module 340, configured to segment the target data according to a preset strategy, encapsulate the segmentation result, generate a distributed parallel task, and upload the data to the distributed file system.
The preset strategy includes: dividing the target data equally according to the number of computing nodes in the distributed cluster; or calculating a data segmentation threshold from the number of computing nodes in the distributed cluster, the computational efficiency of the computing nodes and the required computation time, and segmenting the target data according to that threshold.
The apparatus shown in Fig. 3 can perform the methods of the embodiments shown in Fig. 1 and Fig. 2; for its principle and technical effect, see those embodiments, which are not repeated here.
Application example
The following part further explains the technical solutions of the embodiments of the present application through a practical example in a specific application scenario.
A storage medium holds user comment data from a video website, and a data consumer needs to process this comment data to analyse users' interest in videos and their viewing focus. User comment data is streaming data that is generated continuously; on the server side it is kept on the storage medium in the form of data blocks. Data arrives in an endless stream, and once the current data block is full, a new data block is created to store the incoming data.
Assume that the video website premiered a film on April 1, 2016, and that the user comment data from the week after the premiere is now wanted in order to analyse how the film was received. All comment data is stored in blocks, however, and the number of blocks is large; picking out the data the consumer needs from this mass of data is the difficulty. Under the technical solution of the embodiments of the present application, to save time the data blocks are not all scanned; instead, the addresses of the data blocks produced during a certain period are first identified from the data consumer's requirement, for example which data blocks hold the comment data of 2016, or which data blocks hold the data of April 2016.
Each data block carries descriptive information, and reading the descriptive information of a data block reveals which periods of comment data that block holds. Assume that in this embodiment the positioning parameter given by the user, namely the time tag April 2016, locates data block 1 and data block 2 on the storage medium as the blocks holding the comment data of April 2016. The next step is to query data block 1 and data block 2 further to find which records belong to April 2016, and more precisely which records fall between April 1, 2016 and April 7, 2016. At this point, all data blocks on the storage medium other than data block 1 and data block 2 can be disregarded and are not queried. This overcomes the drawback of the prior art, in which all data blocks are scanned, and improves the efficiency of coarse data positioning.
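The coarse positioning step described above can be illustrated with a minimal Python sketch. The `BlockInfo` structure, its field names, and the sample time ranges are assumptions made for illustration only; the application does not specify what the descriptive information of a data block contains beyond indicating which periods the block covers.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BlockInfo:
    """Hypothetical descriptive information for one data block."""
    block_id: int
    first_time: datetime  # time tag of the earliest record in the block
    last_time: datetime   # time tag of the latest record in the block


def coarse_locate(blocks, start, end):
    """Return the ids of blocks whose time range overlaps [start, end].

    Only the descriptive information is read; block contents are not scanned.
    """
    return [b.block_id for b in blocks
            if b.first_time <= end and b.last_time >= start]


# Illustrative blocks; the time ranges are invented for this sketch.
blocks = [
    BlockInfo(0, datetime(2016, 3, 1), datetime(2016, 3, 31, 23, 59)),
    BlockInfo(1, datetime(2016, 4, 1), datetime(2016, 4, 5, 12, 0)),
    BlockInfo(2, datetime(2016, 4, 5, 12, 0), datetime(2016, 4, 30, 23, 59)),
    BlockInfo(3, datetime(2016, 5, 1), datetime(2016, 5, 31, 23, 59)),
]

# Positioning parameter: the time tag "April 2016".
selected = coarse_locate(blocks, datetime(2016, 4, 1),
                         datetime(2016, 4, 30, 23, 59, 59))
print(selected)  # [1, 2]
```

With these invented ranges, only data block 1 and data block 2 are selected, and the remaining blocks are never opened.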
After it has been determined that data block 1 and data block 2 contain the required target data, data block 1 and data block 2 are each sampled with a certain sampling step. The sampling process is described in detail below, taking data block 1 as an example.
Assume the sampling step is 1000 records per sample. Starting from the start position of data block 1, 1000 records are extracted, the time tag of the first of these 1000 records is obtained, and the time corresponding to that time tag is looked up. If the time is before April 1, 2016, these 1000 sampled records are discarded and sampling of data block 1 continues. The step of the next sample may be 1000 records or 5000 records at a time. Each new sample starts from the end of the previous one. Assume another 1000 records are sampled and the first of these 1000 records is taken. If comparing time tags shows that target data is still included, the sampling step may be increased again; for example, starting from the end of the second sample, another 1000 records are sampled. The first of these 1000 records is taken, and comparing time tags shows that the time tag of the first record obtained by this sample lags behind April 2016. It can then be determined that the target data lies between the 2000th record and the 3000th record, and the sampling step is updated to (3000-2000)/2=500. Next, with 500 as the sampling step, the range between the 2000th record and the 3000th record is sampled, that is, it is judged whether the 2500th record is later than April 7, 2016. If so, the sampling step is reduced further to (2500-2000)/2=250, that is, it is judged whether the 2250th record is later than April 7, 2016. If so, the 250 records between 2000 and 2250 are extracted and compared one by one to find which of them are data from April 1, 2016 to April 7, 2016. Because streaming data is ordered, the above method can find all user comment data from April 1, 2016 to April 7, 2016.
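The stepped sampling and step halving in this example can be sketched in Python as follows. This is a simplified variant under stated assumptions, not the exact procedure of the embodiment: records are assumed to be sorted by time tag, only the first time tag of each sample is read, the forward pass uses a fixed step, and the final narrowing halves the window down to a single index instead of comparing the remaining records one by one. The function names `step_until`, `narrow`, and `locate_range`, and the hourly test data, are hypothetical.

```python
from datetime import datetime, timedelta


def step_until(times, pos, step, is_before):
    """Advance in fixed steps, reading only the first time tag of the next sample."""
    n = len(times)
    while pos + step < n and is_before(times[pos + step]):
        pos += step  # the whole sample is still before the boundary; skip it
    return pos, min(pos + step, n)


def narrow(times, lo, hi, is_before):
    """Halve the window [lo, hi) until the first index that is not 'before' is found."""
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if is_before(times[mid]):
            lo = mid + 1
        else:
            hi = mid
    return lo


def locate_range(times, start, end, step=1000):
    """Return (lo, hi) such that times[lo:hi] are the records with start <= t <= end."""
    # Find the first record at or after the start of the target period.
    pos, win_end = step_until(times, 0, step, lambda t: t < start)
    lo = narrow(times, pos, win_end, lambda t: t < start)
    # Find the first record after the end of the target period.
    pos, win_end = step_until(times, lo, step, lambda t: t <= end)
    hi = narrow(times, pos, win_end, lambda t: t <= end)
    return lo, hi


# Hypothetical data block: one comment per hour starting on March 25, 2016.
times = [datetime(2016, 3, 25) + timedelta(hours=i) for i in range(720)]
start = datetime(2016, 4, 1)
end = datetime(2016, 4, 7, 23, 59, 59)
lo, hi = locate_range(times, start, end, step=100)
print(lo, hi, times[lo], times[hi - 1])
```

Because the records are ordered, each skipped sample can be discarded after reading a single time tag, which is the property the embodiment relies on.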
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
From the description of the above embodiments, a person skilled in the art can clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (14)
1. A data positioning method, characterized by comprising the following steps:
Receiving a data locking parameter, querying the descriptive information of the corresponding data blocks according to the data locking parameter, and determining the data segment in which the target data is located;
Performing data sampling on the data segment with a preset step, and obtaining the data identification mark of the data sampling result;
Judging, according to the data identification mark and the data locking parameter, whether the data sampling result contains the target data, and, when the judgment is yes, judging the records in the data sampling result one by one until the target data is located.
2. The method according to claim 1, characterized in that performing data sampling on the data segment with the preset step specifically comprises:
Starting from the start position of the data segment, taking out of the data segment a quantity of data corresponding to the preset step.
3. The method according to claim 1, characterized in that judging, according to the data identification mark and the data locking parameter, whether the data sampling result contains the target data further comprises:
If not, performing data re-sampling on the data segment with the preset step, starting from the end position of the data sampling result, and obtaining the data identification mark of the data re-sampling result, until it is determined according to the data identification mark and the data locking parameter that the data re-sampling result contains the target data.
4. The method according to claim 1, characterized in that the method further comprises:
During the data sampling, adjusting the preset step according to the result of comparing the data identification mark with the data locking parameter.
5. The method according to claim 1, characterized in that the data locking parameter specifically comprises:
The time tag, line number, index number, and offset corresponding to the data.
6. The method according to claim 1, characterized in that the method further comprises:
Segmenting the target data according to a preset method, encapsulating the result of the segmentation to generate distributed parallel tasks, and uploading them to a distributed file system.
7. The method according to claim 6, characterized in that the preset method comprises:
Dividing the target data equally according to the number of computing nodes in the distributed cluster; or,
Calculating a data segmentation threshold according to the number of computing nodes in the distributed cluster, the computational efficiency of the computing nodes, and the computation time requirement, and segmenting the target data according to the data segmentation threshold.
8. A data positioning device, characterized by comprising the following modules:
A first locating module, configured to receive a data locking parameter, query the descriptive information of the corresponding data blocks according to the data locking parameter, and determine the data segment in which the target data is located;
A sampling module, configured to perform data sampling on the data segment with a preset step and obtain the data identification mark of the data sampling result;
A second locating module, configured to judge, according to the data identification mark and the data locking parameter, whether the data sampling result contains the target data, and, when the judgment is yes, to judge the records in the data sampling result one by one until the target data is located.
9. The device according to claim 7, characterized in that the sampling module is specifically configured to:
Starting from the start position of the data segment, take out of the data segment a quantity of data corresponding to the preset step.
10. The device according to claim 7, characterized in that the second locating module is further specifically configured to:
If not, perform data re-sampling on the data segment with the preset step, starting from the end position of the data sampling result, and obtain the data identification mark of the data re-sampling result, until it is determined according to the data identification mark and the data locking parameter that the data re-sampling result contains the target data.
11. The device according to claim 7, characterized in that the second locating module is further configured to:
During the data sampling, update the preset step according to the result of comparing the data identification mark with the data locking parameter.
12. The device according to claim 8, characterized in that the data locking parameter specifically comprises:
The time tag, line number, index number, and offset corresponding to the data.
13. The device according to claim 8, characterized in that the device further comprises a segmentation module, the segmentation module being configured to:
Segment the target data according to a preset method, encapsulate the result of the segmentation to generate distributed parallel tasks, and upload them to a distributed file system.
14. The device according to claim 13, characterized in that the preset method comprises:
Dividing the target data equally according to the number of computing nodes in the distributed cluster; or,
Calculating a data segmentation threshold according to the number of computing nodes in the distributed cluster, the computational efficiency of the computing nodes, and the computation time requirement, and segmenting the target data according to the data segmentation threshold.
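The two segmentation options named in claims 7 and 14 can be illustrated with a short Python sketch. The function names `split_evenly` and `split_by_threshold` are hypothetical. The claims do not state how the data segmentation threshold is calculated from the node count, computational efficiency, and time requirement, so the sketch simply takes the threshold as an input rather than assuming a formula.

```python
def split_evenly(records, node_count):
    """Divide records into node_count contiguous segments of near-equal size."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    base, extra = divmod(len(records), node_count)
    segments, start = [], 0
    for i in range(node_count):
        # The first `extra` segments take one additional record each.
        size = base + (1 if i < extra else 0)
        segments.append(records[start:start + size])
        start += size
    return segments


def split_by_threshold(records, threshold):
    """Cut records into consecutive segments of at most `threshold` items."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return [records[i:i + threshold] for i in range(0, len(records), threshold)]


target_data = list(range(10))  # stand-in for the located target records
print(split_evenly(target_data, 3))        # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(split_by_threshold(target_data, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each resulting segment could then be wrapped as one parallel task; how tasks are encapsulated and uploaded depends on the distributed file system in use and is not shown here.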
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610252499.4A CN105912274A (en) | 2016-04-21 | 2016-04-21 | Streaming data positioning method and apparatus |
| PCT/CN2016/101092 WO2017181614A1 (en) | 2016-04-21 | 2016-09-30 | Streaming data positioning method, apparatus and electronic device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610252499.4A CN105912274A (en) | 2016-04-21 | 2016-04-21 | Streaming data positioning method and apparatus |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN105912274A true CN105912274A (en) | 2016-08-31 |
Family
ID=56747677
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201610252499.4A Pending CN105912274A (en) | 2016-04-21 | 2016-04-21 | Streaming data positioning method and apparatus |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN105912274A (en) |
| WO (1) | WO2017181614A1 (en) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2017181614A1 (en) * | 2016-04-21 | 2017-10-26 | 乐视控股(北京)有限公司 | Streaming data positioning method, apparatus and electronic device |
| CN110147384A (en) * | 2019-04-17 | 2019-08-20 | 平安科技(深圳)有限公司 | Data search method for establishing model, device, computer equipment and storage medium |
| CN112883064A (en) * | 2021-03-02 | 2021-06-01 | 清华大学 | Self-adaptive sampling and query method and system |
| CN114764932A (en) * | 2020-12-31 | 2022-07-19 | 腾讯科技(深圳)有限公司 | Organism anti-counterfeiting identification method, device, equipment and storage medium |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109657294B (en) * | 2018-11-29 | 2023-04-21 | 中国航空工业集团公司沈阳飞机设计研究所 | Automatic test flight data analysis method and system based on characteristic parameters |
| CN111159506B (en) * | 2019-12-26 | 2023-11-14 | 广州信天翁信息科技有限公司 | Data validity identification method, device, equipment and readable storage medium |
| CN112199249B (en) * | 2020-09-16 | 2024-10-01 | 中国建设银行股份有限公司 | Method, device, equipment and medium for processing monitoring data |
| CN115220073B (en) * | 2022-07-01 | 2025-03-14 | 千寻位置网络有限公司 | Positioning method, device, equipment, medium and product |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101178693A (en) * | 2007-12-14 | 2008-05-14 | 沈阳东软软件股份有限公司 | Data cache method and system |
| US20080256143A1 (en) * | 2007-04-11 | 2008-10-16 | Data Domain, Inc. | Cluster storage using subsegmenting |
| CN102054000A (en) * | 2009-10-28 | 2011-05-11 | 中国移动通信集团公司 | Data querying method, device and system |
| CN102841860A (en) * | 2012-08-17 | 2012-12-26 | 珠海世纪鼎利通信科技股份有限公司 | Large data volume information storage and access method |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105354242A (en) * | 2015-10-15 | 2016-02-24 | 北京航空航天大学 | Distributed data processing method and device |
| CN105912274A (en) * | 2016-04-21 | 2016-08-31 | 乐视控股(北京)有限公司 | Streaming data positioning method and apparatus |
- 2016
  - 2016-04-21 CN CN201610252499.4A patent/CN105912274A/en active Pending
  - 2016-09-30 WO PCT/CN2016/101092 patent/WO2017181614A1/en active Application Filing
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2017181614A1 (en) * | 2016-04-21 | 2017-10-26 | 乐视控股(北京)有限公司 | Streaming data positioning method, apparatus and electronic device |
| CN110147384A (en) * | 2019-04-17 | 2019-08-20 | 平安科技(深圳)有限公司 | Data search method for establishing model, device, computer equipment and storage medium |
| CN110147384B (en) * | 2019-04-17 | 2023-06-20 | 平安科技(深圳)有限公司 | Data search model establishment method, device, computer equipment and storage medium |
| CN114764932A (en) * | 2020-12-31 | 2022-07-19 | 腾讯科技(深圳)有限公司 | Organism anti-counterfeiting identification method, device, equipment and storage medium |
| CN114764932B (en) * | 2020-12-31 | 2025-07-11 | 腾讯科技(深圳)有限公司 | Biometric anti-counterfeiting identification method, device, equipment and storage medium |
| CN112883064A (en) * | 2021-03-02 | 2021-06-01 | 清华大学 | Self-adaptive sampling and query method and system |
| CN112883064B (en) * | 2021-03-02 | 2022-11-15 | 清华大学 | An adaptive sampling and query method and system |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2017181614A1 (en) | 2017-10-26 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN105912274A (en) | | Streaming data positioning method and apparatus |
| CN110717536B (en) | | A method and device for generating training samples |
| JP6162781B2 (en) | | Method and apparatus for determining the location of a new point of interest |
| CN109828993B (en) | | Statistical data query method and device |
| WO2015050567A1 (en) | | System and method for performing set operations with defined sketch accuracy distribution |
| CN105389311B (en) | | It is a kind of for determining the method and apparatus of query result |
| CN109977135A (en) | | A kind of data query method, apparatus and server |
| US10430015B2 (en) | | Image analysis |
| CN113850837B (en) | | Video processing method and device, electronic equipment, storage medium and computer product |
| CN101425088A (en) | | Key frame extracting method and system based on chart partition |
| CN111078818A (en) | | Address analysis method and device, electronic equipment and storage medium |
| CN106648839B (en) | | Data processing method and device |
| CN111427976B (en) | | Road freshness obtaining method and device |
| CA3075730C (en) | | Cold matching by automatic content recognition |
| CN111258974A (en) | | A kind of vehicle offline scene data processing method and system |
| CN115576973B (en) | | Service deployment method, device, computer equipment and readable storage medium |
| US20170337192A1 (en) | | Network node, indexing server and methods performed thereby for supporting indexing of audio visual content |
| CN106326439B (en) | | A kind of storage of real-time video, search method and device |
| CN105468728A (en) | | Cross-section data acquisition method and system |
| CN109710827B (en) | | Picture attribute management method and device, picture server and business processing terminal |
| CN113362090A (en) | | User behavior data processing method and device |
| CN110082794B (en) | | Vehicle GPS track data filtering method |
| CN110008269B (en) | | Data reflow method, device, equipment and system |
| CN115297342B (en) | | Multi-camera video processing method and device, storage medium and computer equipment |
| CN110909072B (en) | | Data table establishment method, device and equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160831 |