
CN105912274A - Streaming data positioning method and apparatus - Google Patents

Streaming data positioning method and apparatus

Info

Publication number
CN105912274A
Authority
CN
China
Prior art keywords
data
sampling
described data
locking parameter
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610252499.4A
Other languages
Chinese (zh)
Inventor
赵富欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LeTV Holding Beijing Co Ltd
LeTV Information Technology Beijing Co Ltd
Original Assignee
LeTV Holding Beijing Co Ltd
LeTV Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LeTV Holding Beijing Co Ltd and LeTV Information Technology Beijing Co Ltd
Priority to CN201610252499.4A
Publication of CN105912274A
Priority to PCT/CN2016/101092 (WO2017181614A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a streaming data positioning method and apparatus. The method comprises: receiving a data positioning parameter, querying the descriptive information of the corresponding data block according to the data positioning parameter, and determining the data segment in which the target data is located; sampling the data segment with a preset step size and obtaining the data identification mark of the data sampling result; and determining, according to the data identification mark and the data positioning parameter, whether the data sampling result contains the target data and, if so, checking the records in the data sampling result one by one until the target data is located. The method and apparatus disclosed by the present invention achieve accurate and efficient data positioning.

Description

Streaming data positioning method and apparatus
Technical field
The embodiments of the present invention relate to the field of data processing, and in particular to a streaming data positioning method and apparatus.
Background
After a large number of real-time messages are saved in a message queue, they are stored in HDFS (Hadoop Distributed File System) by means of distributed processing. In an Internet environment, for example, the number of messages generated at every moment is enormous, and these messages are collected in the background through a message pipeline.
These real-time message data are transmitted as a stream: they arrive without interruption, like flowing water, and are stored in fragments. When streaming data is processed, its large volume means that ample resources are needed to process it efficiently. A distributed system has ample resources; therefore, before the data is processed, the saved streaming data needs to be uploaded to a distributed file system.
Received streaming data is not necessarily kept in memory; it may also be persisted to disk, where it is stored block by block. As time passes and the data volume grows, once the amount of data stored in a data block reaches a certain size, a new data block is generated.
When streaming data is saved to a distributed file system, the difficulty lies in positioning the streaming data according to the data demand sent by the user, for example, locating the data of a certain day, or the data of certain hours within a certain day. This is because the unit of data storage is the block; data is not stored by time granularity. The block is the smallest unit of data storage. Therefore, if a certain kind of data has only one block for a month, that block contains the 30 days of data for that month, and positioning can only reach month granularity, not day granularity; if a certain kind of data has 10 blocks for one day, the positioning precision is finer than one day. Existing streaming positioning solutions cannot provide precise data positioning; they yield only a rough location, and the accuracy is low.
For example, a user wants to find the data of March 1, but that data is stored within the data for March. With coarse positioning, the data segment the user needs can only be narrowed down to the data blocks that store the March data; which specific data in those blocks is required cannot be determined. If all the data in the entire data block were uploaded to the processing system, the workload would be enormous.
Therefore, a streaming data positioning method is urgently needed.
Summary of the invention
The embodiments of the present invention provide a streaming data positioning method and apparatus to overcome the defects in the prior art of inaccurate data positioning and inefficient uploading of data to a distributed system, thereby achieving accurate and efficient data positioning.
An embodiment of the present invention provides a streaming data positioning method, including:
receiving a data positioning parameter, querying the descriptive information of the corresponding data block according to the data positioning parameter, and determining the data segment in which the target data is located;
sampling the data segment with a preset step size, and obtaining the data identification mark of the data sampling result;
determining, according to the data identification mark and the data positioning parameter, whether the data sampling result contains the target data, and, if so, checking the records in the data sampling result one by one until the target data is located.
An embodiment of the present invention provides a streaming data positioning apparatus, including:
a first positioning module, configured to receive a data positioning parameter, query the descriptive information of the corresponding data block according to the data positioning parameter, and determine the data segment in which the target data is located;
a sampling module, configured to sample the data segment with a preset step size and obtain the data identification mark of the data sampling result;
a second positioning module, configured to determine, according to the data identification mark and the data positioning parameter, whether the data sampling result contains the target data, and, if so, to check the records in the data sampling result one by one until the target data is located.
The streaming data positioning method and apparatus provided by the embodiments of the present invention obtain the data segment containing the target data according to the data positioning parameter, sample that data segment, and position the data according to the sampling result. This replaces the cumbersome operation in the prior art of comparing data segments one by one when positioning streaming data, achieves accurate data positioning, and improves the efficiency of uploading data to a distributed file system.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is the technical flowchart of Embodiment 1 of the present application;
Fig. 2 is the technical flowchart of Embodiment 2 of the present application;
Fig. 3 is the schematic structural diagram of the apparatus of Embodiment 3 of the present application.
Detailed description
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
The embodiments of the present application may have the following application scenario: streaming data needs to be uploaded to a distributed system for parallel processing. Before uploading, the currently saved data must be positioned according to the target data demand; that is, the target data must be found accurately within a large amount of stored streaming data and, once found, uploaded to the distributed system for processing.
Fig. 1 is the technical flowchart of Embodiment 1 of the present application. With reference to Fig. 1, the data positioning method of Embodiment 1 mainly includes the following steps:
Step S110: receive a data positioning parameter, and query the descriptive information of the corresponding data block according to the data positioning parameter, thereby determining the data segment in which the target data is located;
Step S120: sample the data segment with a preset step size, and obtain the data identification mark of the data sampling result;
Step S130: determine, according to the data identification mark and the data positioning parameter, whether the data sampling result contains the target data, and, if so, check the records in the data sampling result one by one until the target data is located.
Specifically, in step S110, the data positioning parameter is sent by the party that needs the data and is used to position the data. The data positioning parameter may include the time tag, row number, index number, offset, and so on, corresponding to the data. For example, if a user needs the data of year xx, month x, day x for analysis, then "year xx, month x, day x" is the data positioning parameter; if a user needs the data whose index number is xxx, then "index number xxx" is the data positioning parameter.
In this step, querying the descriptive information of the data block according to the data positioning parameter means using the data positioning parameter as a reference to query the descriptive information of each existing data block. The descriptive information of a data block is data that describes the data block; it covers the block and its information resources, for example, the date range of the data stored in a given block within a certain month.
Each data block holds a large amount of data, so comparing every record against the positioning parameter during data positioning would waste a great deal of time and computing resources. The descriptive information of a data block, however, is tiny compared with the amount of data in the block. In the embodiments of the present application, therefore, the descriptive information of each data block is first traversed according to the positioning parameter to coarsely locate the data blocks containing the target data, and finer positioning is then carried out on those blocks. This coarse query greatly narrows the scope of data positioning and saves positioning time.
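The coarse query of step S110 can be sketched as a filter over per-block descriptive information. This is a minimal illustration, not the patent's implementation: the `BlockInfo` class, its field names, and the `coarse_locate` function are hypothetical, and the sketch assumes the descriptive information records the earliest and latest time tag in each block.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BlockInfo:
    """Hypothetical descriptive information for one data block."""
    block_id: str
    first_time: datetime  # time tag of the earliest record in the block
    last_time: datetime   # time tag of the latest record in the block


def coarse_locate(blocks, start, end):
    """Return the blocks whose time range overlaps [start, end].

    Only the small descriptive records are inspected; no block data is read.
    """
    return [b for b in blocks if b.first_time <= end and b.last_time >= start]


blocks = [
    BlockInfo("block0", datetime(2016, 3, 1), datetime(2016, 3, 31, 23, 59)),
    BlockInfo("block1", datetime(2016, 4, 1), datetime(2016, 4, 5, 12)),
    BlockInfo("block2", datetime(2016, 4, 5, 12), datetime(2016, 4, 30)),
]
hits = coarse_locate(blocks, datetime(2016, 4, 1), datetime(2016, 4, 7, 23, 59))
print([b.block_id for b in hits])
```

Only the blocks returned here would go on to the sampling step; every other block is skipped without being scanned.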
Specifically, in step S120, the preset step size may be a variable that is adjusted according to the total amount of data in the data block and the result of each sampling. For example, if a data block contains a very large amount of data and the sampling step size is too small, the number of samplings increases and positioning efficiency cannot improve; if a data block contains a relatively small amount of data and the sampling step size is too large, the sampled data easily accounts for a large share of the whole block, which increases the amount of data that must be compared one by one during the subsequent precise positioning. The preset step size is therefore an empirical value related to the amount of data contained in the data block.
In addition, after a data block is first sampled with a smaller step size, the sampling result is identified, and how close it is to the target data can be judged from the data identification mark of the sampled data. Sampling can then continue with the original step size, or the step size can be suitably increased, so that the data segment containing the target data is found in less time.
The data identification mark is of the same kind as the data positioning parameter in step S110. These marks are carried by the data itself and may include the time tag, row number, index number, offset, and so on, corresponding to the data.
Specifically, in step S130, determining whether the data sampling result contains the target data according to the data identification mark and the data positioning parameter means checking whether the data positioning parameter received from the user is consistent with the data identification mark read from the sampled data. For example, when the data positioning parameter is a time tag, the time tags contained in the data identification marks of the sampled data are searched for one that matches it; if there is a match, the data sampling result is judged to contain the target data.
If the data sampling result does not contain the target data, sampling of the data segment continues. Specifically, starting from the end position of the data sampling result, the data segment is resampled with the preset step size and the data identification mark of the resampling result is obtained, until the resampling result is judged, according to the data identification mark and the data positioning parameter, to contain the target data.
Whether the data sampling result contains the target data may be determined as follows:
When the data positioning parameter is a time tag, the time tag contained in the data identification mark of the first record in the data sampling result is queried, and the two time tags are compared to see which is earlier. Because streaming data is stored in the order in which it arrives, it is ordered in time; comparing the time tag of the sampled data is therefore enough to determine whether the sampled data contains the target data. If the first record of the sampling result is later than the time given by the positioning demand, the target data must lie before the current sampling result, none of the data after that first record is target data, and there is no need to sample any further. Conversely, if the first record of the sampling result is earlier than the time given by the positioning demand, the target data lies after that first record, and sampling must continue in the remaining data segment, possibly several times. During data sampling the preset step size may be updated continually; the size of each update is determined empirically, and the embodiments of the present application do not limit its specific value.
For example, the embodiments of the present application may use the following step-size update process. The located data segment is sampled with a fixed step size N (say N = 5000), so that the sampling positions follow 0 + 5000*n. First, one record is sampled at 0 + 5000*1 and matched; if it does not contain the target data, one record is then sampled at 0 + 5000*2. If this one is found to contain the target data, the target data can be determined to lie between the 5000th record and the 10000th record. Sampling is now needed within the segment from 5000 to 10000, and the step size must be updated, for example to (10000-5000)/2 = 2500, that is, half the number of records in the sampled range. One record is then sampled at the 7500th record and compared with the positioning parameter; if it does not match, the sampling step size continues to be updated (reduced). Once the number of records in the sampling result falls below a preset data-volume threshold, all of those records are fetched at once and checked one by one for a match; in effect the step size is now 1. In the embodiments of the present application the data-volume threshold may be 200, because below 200 records the time overhead of network access is close to that of a local search, so stopping sampling at this point gives the highest efficiency.
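The step-size update process described above can be sketched as three phases: fixed-step sampling, repeated halving of the bracketing range, and a one-by-one scan once the range is below the threshold. This is a minimal sketch under stated assumptions: the function name `locate_first` is hypothetical, time tags are modelled as integers in an in-memory list sorted in ascending order, and the halving phase is written as a plain bisection, which is one reading of the "half the number of records" update rather than the patent's exact procedure.

```python
def locate_first(records, target, step=5000, threshold=200):
    """Return the index of the first record whose time tag is >= target.

    `records` is assumed to be sorted by time tag (streaming data is stored
    in arrival order). Returns None if no such record exists.
    """
    n = len(records)
    lo, hi = 0, n - 1  # the answer, if any, lies in [lo, hi]

    # Phase 1: sample one record at step, 2*step, ... until a sample
    # reaches the target time tag.
    pos = step
    while pos < n and records[pos] < target:
        lo = pos + 1
        pos += step
    if pos < n:
        hi = pos

    # Phase 2: halve the step inside the bracketing range until fewer
    # than `threshold` records remain.
    while hi - lo + 1 > threshold:
        mid = lo + (hi - lo) // 2
        if records[mid] < target:
            lo = mid + 1
        else:
            hi = mid

    # Phase 3: compare the remaining records one by one (step size 1).
    for i in range(lo, hi + 1):
        if records[i] >= target:
            return i
    return None


# Hypothetical data: 20000 records whose time tag is twice their index.
records = list(range(0, 40000, 2))
print(locate_first(records, 15001))
```

With these inputs the fixed-step phase brackets the target between records 5001 and 10000, mirroring the 5000/10000 example in the text, before the range is halved and finally scanned.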
It should be noted that, in the embodiments of the present application, when the data positioning parameter is compared with the data identification mark, the mark of the first record in the current sampling result is usually the one chosen for comparison. If the data identification mark of the first record is inconsistent with the data positioning parameter, the remaining data in the current sampling result need not be compared, and the current sampling result can be judged directly not to contain the target data.
When a sampling (or resampling) result is confirmed to contain the target data and the result of the next resampling does not, sampling stops; at that point all the sampled data containing the target data has been obtained. Each record in the obtained sampled data is then compared one by one against the data positioning parameter to check whether it is target data, which makes positioning efficient. In this embodiment, the data segment containing the target data is obtained according to the data positioning parameter, the data segment is sampled, and the data is positioned according to the sampling result. This replaces the cumbersome operation in the prior art of comparing data segments one by one when positioning streaming data, achieves accurate data positioning, and improves the efficiency of uploading data to a distributed file system.
Fig. 2 is the technical flowchart of Embodiment 2 of the present application. With reference to Fig. 2, the data positioning method of the present application may further include the following feasible implementation steps:
Step S210: receive a data positioning parameter, and query the descriptive information of the corresponding data block according to the data positioning parameter, thereby determining the data segment in which the target data is located;
Step S220: sample the data segment with a preset step size, and obtain the data identification mark of the data sampling result;
Step S230: determine, according to the data identification mark and the data positioning parameter, whether the data sampling result contains the target data, and, if so, check the records in the data sampling result one by one until the target data is located.
Step S240: segment the target data according to a preset strategy;
Step S250: encapsulate the segmentation result, generate a distributed parallel task, and upload to the distributed file system.
Steps S210 to S230 are the same as steps S110 to S130 of Embodiment 1 and are not repeated here.
Specifically, in step S240, the preset strategy may include the following approaches:
First, divide the target data equally according to the number of computing nodes in the distributed cluster;
Second, calculate a data segmentation threshold according to the number of computing nodes in the distributed cluster, the computational efficiency of the computing nodes, and the computing-time demand, and segment the target data according to that threshold.
Equal division does not take into account the remaining computing resources or the computing capability of each computing node in the server cluster; the target data to be processed is simply divided equally by the number of computing nodes. Its advantage is that it skips the analysis of each node's computing capability and so saves time.
In the other segmentation approach, the computing capability of each node in the server cluster, such as its computational efficiency and computing-time demand, is obtained before segmentation. The data is then segmented with reference to these figures, which allows a more reasonable distribution of computing tasks.
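The two segmentation strategies of step S240 can be sketched as follows. This is an illustration under assumptions, not the patent's formula: the function names, the half-open `(start, end)` index ranges, and in particular the threshold rule (the smaller of a node's capacity within the time budget and an equal share per node) are all hypothetical choices, since the text does not specify how the threshold is calculated.

```python
import math


def split_evenly(start, end, node_count):
    """Strategy 1: divide the range [start, end) equally among the nodes."""
    base, extra = divmod(end - start, node_count)
    segments, pos = [], start
    for i in range(node_count):
        size = base + (1 if i < extra else 0)  # spread the remainder
        if size:
            segments.append((pos, pos + size))
        pos += size
    return segments


def split_by_threshold(start, end, node_count, records_per_second, time_budget_seconds):
    """Strategy 2: cap each segment at a computed threshold.

    Assumed rule: a segment holds no more than one node can process within
    the time budget, and no more than an equal share per node.
    """
    capacity = int(records_per_second * time_budget_seconds)
    equal_share = math.ceil((end - start) / node_count)
    threshold = max(1, min(capacity, equal_share))
    return [(s, min(s + threshold, end)) for s in range(start, end, threshold)]


print(split_evenly(2000, 2250, 4))
print(split_by_threshold(0, 1000, 4, records_per_second=50, time_budget_seconds=3))
```

The first function needs only the node count, which matches the time saving described for equal division; the second needs per-node figures up front but can produce more, smaller segments when nodes are slow relative to the time budget.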
Specifically, in step S250, the located target data needs to be encapsulated before it is uploaded to the distributed file system.
After the preceding steps, the target data has been precisely located and segmented. Each data segment has a start position, an end position, the location of the server storing the data, the metadata information of the data, and so on.
In this step, encapsulation means packaging the start position, end position, storage server location, metadata information, and so on of each segment into a data object that a Hadoop MapReduce distributed task understands. A MapReduce task that receives this segment information can then access the concrete data in the segment and store it in the distributed file system. Because MapReduce tasks are distributed and have ample resources, processing efficiency is very high.
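The fields that the encapsulation step packages together can be modelled as a small serializable record. This Python sketch only illustrates the shape of the data; the class name `SegmentDescriptor`, its field names, and the JSON encoding are assumptions. In Hadoop itself this role is typically played by a Java `InputSplit` implementation, which the sketch does not attempt to reproduce.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SegmentDescriptor:
    """Hypothetical description of one data segment handed to a parallel task."""
    start: int     # start position of the segment
    end: int       # end position of the segment (exclusive)
    server: str    # location of the server storing the data
    metadata: dict = field(default_factory=dict)  # metadata about the data

    def length(self):
        """Number of records covered by the segment."""
        return self.end - self.start

    def to_json(self):
        """Serialize the descriptor so it can be shipped to a worker."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Rebuild a descriptor on the worker side."""
        return cls(**json.loads(text))


seg = SegmentDescriptor(2000, 2250, "storage-01", {"kind": "comments"})
restored = SegmentDescriptor.from_json(seg.to_json())
print(restored.length(), restored.server)
```

A worker that receives such a descriptor knows where the segment lives and which records it covers, which is all it needs to read the segment and write it to the distributed file system.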
In this embodiment, after the data requested by the user has been precisely located, the data is segmented according to a preset strategy that takes full account of the server resource utilization and computing resource utilization of the distributed processing system, which further improves the efficiency of uploading data to the distributed file system.
Fig. 3 is the schematic structural diagram of the apparatus of Embodiment 3 of the present application. With reference to Fig. 3, the data positioning apparatus of this embodiment includes the following modules:
a first positioning module 310, configured to receive a data positioning parameter and query the descriptive information of the corresponding data block according to the data positioning parameter, thereby determining the data segment in which the target data is located;
a sampling module 320, configured to sample the data segment with a preset step size and obtain the data identification mark of the data sampling result;
a second positioning module 330, configured to determine, according to the data identification mark and the data positioning parameter, whether the data sampling result contains the target data, and, if so, to check the records in the data sampling result one by one until the target data is located.
The data positioning parameter specifically includes: the time tag, row number, index number, and offset corresponding to the data.
The sampling module 320 is specifically configured to: starting from the start position of the data segment, take the corresponding number of records from the data segment with the preset step size.
The second positioning module 330 is further configured to: if the result is negative, resample the data segment with the preset step size starting from the end position of the data sampling result, and obtain the data identification mark of the resampling result, until the resampling result is judged, according to the data identification mark and the data positioning parameter, to contain the target data.
The second positioning module 330 is further configured to: during data sampling, update the preset step size according to the result of comparing the data identification mark with the data positioning parameter.
The apparatus further includes a segmentation module 340, configured to: segment the target data according to a preset strategy, encapsulate the segmentation result, generate a distributed parallel task, and upload to the distributed file system.
The preset strategy includes: dividing the target data equally according to the number of computing nodes in the distributed cluster; or calculating a data segmentation threshold according to the number of computing nodes in the distributed cluster, the computational efficiency of the computing nodes, and the computing-time demand, and segmenting the target data according to that threshold.
The apparatus shown in Fig. 3 can perform the methods of the embodiments shown in Fig. 1 and Fig. 2; for the implementation principle and technical effects, refer to those embodiments, which are not repeated here.
Application example
A concrete application scenarios will be combined, with the actual example technology to the embodiment of the present application with lower part Scheme is further elaborated.
On a storage medium, storage has the user comment data from video website, and data consumer needs to comment these Opinion data carry out processing thus analyze video-see interest-degree and the viewing focus of user.The comment data of user is not pregnancy ceased Raw stream data, at server end, these comment data leave in storage medium with the form of data block.Storage data source Source is constantly come, and after current data block is filled with, new data can set up another a data block to store data.
Assume this video website in premiere on April 1 in 2016 one film, now want to obtain movie premiere after one week User comment data thus the result of broadcast of film is analyzed.But all of comment data is all that piecemeal is deposited, number Quantity according to block is more, and the data needed for orienting data consumer from these substantial amounts of data are difficult points.According to this The technical scheme of application embodiment, for saving efficiency, it is impossible to all of data block is all scanned one time, can only be according to data consumption First the data of person's demand tell in obtaining certain a period of time the address of the data block produced, and the comment data of such as 2016 is deposited The data being placed on which data block or in April, 2016 leave those data blocks in.
Each data block carries descriptive information; reading the descriptive information of a data block reveals which periods of comment data the block holds. Assume that in this embodiment the locating parameter supplied by the user, namely the time tag April 2016, locates data block 1 and data block 2 in the storage medium as the blocks holding the April 2016 comment data. The next step is to query within data block 1 and data block 2 to find which records belong to April 2016, and more precisely which belong to April 1, 2016 through April 7, 2016. At this point every data block in the storage medium other than data block 1 and data block 2 can be discarded and need not be queried. This overcomes the defect of the prior art, which scans all data blocks, and improves the efficiency of coarse positioning.
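The coarse positioning step described above can be sketched as a filter over the descriptive information of the blocks. This is a minimal sketch under stated assumptions: the `BlockInfo` structure, its fields, and the function name `coarse_locate` are hypothetical, and the descriptive information is assumed to record the first and last day covered by each block.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class BlockInfo:
    """Hypothetical descriptive information of one data block."""
    block_id: int
    first_day: date  # time tag of the earliest record in the block
    last_day: date   # time tag of the latest record in the block


def coarse_locate(blocks: List[BlockInfo], start: date, end: date) -> List[int]:
    """Return the ids of blocks whose period overlaps [start, end].

    Only the descriptive information is read; the records inside the
    blocks are not scanned.
    """
    return [b.block_id for b in blocks
            if b.first_day <= end and b.last_day >= start]
```

With four blocks of which only blocks 1 and 2 cover days in April 2016, a query for April 1, 2016 through April 30, 2016 returns those two block ids, and the remaining blocks are never opened.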
After it is determined that data block 1 and data block 2 contain the required target data, data block 1 and data block 2 are each sampled with a given sampling step. The following describes the sampling of data block 1 in detail as an example.
Assume the sampling step is 1000 records per sample. Starting from the beginning of data block 1, 1000 records are extracted, the time tag of the first of these records is read, and the corresponding time is looked up. If that time is before April 1, 2016, these 1000 sampled records are discarded and sampling of data block 1 continues. The step for the next sample may again be 1000 records, or it may be larger, for example 5000 records. Each sample starts where the previous sample ended. Suppose another 1000 records are sampled and comparing the time tag of the first of them shows that the sample still does not contain target data; sampling then continues, optionally with a larger step, for example with a further 1000 records starting from the end of the second sample. Comparing the time tag of the first record of this third sample shows that it is later than the target time in April 2016, so the target data can be determined to lie between the 2000th and the 3000th record, and the sampling step is updated to (3000-2000)/2=500. With a step of 500, the range between the 2000th and the 3000th record is sampled, that is, the 2500th record is checked to see whether it is later than April 7, 2016. If it is, the step is reduced further to (2500-2000)/2=250, and the 2250th record is checked in the same way. If it too is later than April 7, 2016, the 250 records between the 2000th and the 2250th are extracted and compared one by one to determine which of them belong to April 1, 2016 through April 7, 2016. Because streaming data is ordered, this method can find all user comment data from April 1, 2016 through April 7, 2016.
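The sampling procedure in the worked example can be sketched as follows. This is a simplified sketch, not the implementation of the application: it assumes the time tags of a block are available as a list sorted in non-decreasing order, it examines only the first record of each sample, and it halves the window until at most `scan_limit` records remain before comparing them one by one. The names `first_index_at_or_after`, `locate_range`, and `scan_limit` are hypothetical.

```python
from datetime import date
from typing import Sequence, Tuple


def first_index_at_or_after(tags: Sequence[date], target: date,
                            step: int = 1000, scan_limit: int = 250) -> int:
    """Index of the first record whose time tag is >= target (len(tags) if none)."""
    n = len(tags)
    lo = 0
    # Coarse sampling: the first record of the next sample is tags[lo + step].
    # If it is still before the target, the whole current sample is discarded.
    while lo + step < n and tags[lo + step] < target:
        lo += step
    hi = min(lo + step, n)  # the boundary now lies within [lo, hi]
    # Refinement: halve the window, as in (3000-2000)/2=500, then 250.
    while hi - lo > scan_limit:
        mid = lo + (hi - lo) // 2
        if tags[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    # Final step: compare the remaining records one by one.
    for i in range(lo, hi):
        if tags[i] >= target:
            return i
    return hi


def locate_range(tags: Sequence[date], start: date, end_exclusive: date,
                 step: int = 1000) -> Tuple[int, int]:
    """Half-open index range [first, stop) of records with start <= tag < end_exclusive."""
    first = first_index_at_or_after(tags, start, step)
    stop = first_index_at_or_after(tags, end_exclusive, step)
    return first, stop
```

For a block of 3250 ordered records in which records 2100 through 2249 carry time tags from April 1, 2016 through April 7, 2016, calling `locate_range(tags, date(2016, 4, 1), date(2016, 4, 8))` samples at records 1000, 2000, and 3000, narrows the window to records 2000 through 2250, and scans only those 250 records. The approach relies on the records being ordered by time tag, as the text notes for streaming data.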
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solution, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (14)

1. A data positioning method, characterized by comprising the following steps:
receiving a data locating parameter, querying the descriptive information of the corresponding data blocks according to the data locating parameter, and determining the data segment in which the target data is located;
sampling the data segment with a preset step, and obtaining the data identifier of the data sampling result;
judging, according to the data identifier and the data locating parameter, whether the data sampling result contains the target data, and if so, examining the records in the data sampling result one by one until the target data is located.
2. The method according to claim 1, characterized in that sampling the data segment with the preset step specifically comprises:
starting from the start position of the data segment, taking a corresponding quantity of data out of the data segment according to the preset step.
3. The method according to claim 1, characterized in that judging, according to the data identifier and the data locating parameter, whether the data sampling result contains the target data further comprises:
if not, resampling the data segment with the preset step starting from the end position of the data sampling result, and obtaining the data identifier of the data resampling result, until it is determined, according to the data identifier and the data locating parameter, that the data resampling result contains the target data.
4. The method according to claim 1, characterized in that the method further comprises:
during the data sampling, adjusting the preset step according to the result of comparing the data identifier with the data locating parameter.
5. The method according to claim 1, characterized in that the data locating parameter specifically comprises:
a time tag, line number, index number, or offset corresponding to the data.
6. The method according to claim 1, characterized in that the method further comprises:
segmenting the target data according to a preset method, encapsulating the segmentation result to generate a distributed parallel task, and uploading it to a distributed file system.
7. The method according to claim 6, characterized in that the preset method comprises:
dividing the target data evenly according to the number of compute nodes in the distributed cluster; or
calculating a data segmentation threshold from the number of compute nodes in the distributed cluster, the computational efficiency of the compute nodes, and the required computation time, and segmenting the target data according to the data segmentation threshold.
8. A data positioning apparatus, characterized by comprising the following modules:
a first locating module, configured to receive a data locating parameter, query the descriptive information of the corresponding data blocks according to the data locating parameter, and determine the data segment in which the target data is located;
a sampling module, configured to sample the data segment with a preset step and obtain the data identifier of the data sampling result;
a second locating module, configured to judge, according to the data identifier and the data locating parameter, whether the data sampling result contains the target data, and if so, to examine the records in the data sampling result one by one until the target data is located.
9. The apparatus according to claim 7, characterized in that the sampling module is specifically configured to:
starting from the start position of the data segment, take a corresponding quantity of data out of the data segment according to the preset step.
10. The apparatus according to claim 7, characterized in that the second locating module is further specifically configured to:
if not, resample the data segment with the preset step starting from the end position of the data sampling result, and obtain the data identifier of the data resampling result, until it is determined, according to the data identifier and the data locating parameter, that the data resampling result contains the target data.
11. The apparatus according to claim 7, characterized in that the second locating module is further configured to:
during the data sampling, update the preset step according to the result of comparing the data identifier with the data locating parameter.
12. The apparatus according to claim 8, characterized in that the data locating parameter specifically comprises:
a time tag, line number, index number, or offset corresponding to the data.
13. The apparatus according to claim 8, characterized in that the apparatus further comprises a segmentation module, the segmentation module being configured to:
segment the target data according to a preset method, encapsulate the segmentation result to generate a distributed parallel task, and upload it to a distributed file system.
14. The apparatus according to claim 13, characterized in that the preset method comprises:
dividing the target data evenly according to the number of compute nodes in the distributed cluster; or
calculating a data segmentation threshold from the number of compute nodes in the distributed cluster, the computational efficiency of the compute nodes, and the required computation time, and segmenting the target data according to the data segmentation threshold.
CN201610252499.4A 2016-04-21 2016-04-21 Streaming data positioning method and apparatus Pending CN105912274A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610252499.4A CN105912274A (en) 2016-04-21 2016-04-21 Streaming data positioning method and apparatus
PCT/CN2016/101092 WO2017181614A1 (en) 2016-04-21 2016-09-30 Streaming data positioning method, apparatus and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610252499.4A CN105912274A (en) 2016-04-21 2016-04-21 Streaming data positioning method and apparatus

Publications (1)

Publication Number Publication Date
CN105912274A true CN105912274A (en) 2016-08-31

Family

ID=56747677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610252499.4A Pending CN105912274A (en) 2016-04-21 2016-04-21 Streaming data positioning method and apparatus

Country Status (2)

Country Link
CN (1) CN105912274A (en)
WO (1) WO2017181614A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657294B (en) * 2018-11-29 2023-04-21 中国航空工业集团公司沈阳飞机设计研究所 Automatic test flight data analysis method and system based on characteristic parameters
CN111159506B (en) * 2019-12-26 2023-11-14 广州信天翁信息科技有限公司 Data validity identification method, device, equipment and readable storage medium
CN112199249B (en) * 2020-09-16 2024-10-01 中国建设银行股份有限公司 Method, device, equipment and medium for processing monitoring data
CN115220073B (en) * 2022-07-01 2025-03-14 千寻位置网络有限公司 Positioning method, device, equipment, medium and product

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354242A (en) * 2015-10-15 2016-02-24 北京航空航天大学 Distributed data processing method and device
CN105912274A (en) * 2016-04-21 2016-08-31 乐视控股(北京)有限公司 Streaming data positioning method and apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080256143A1 (en) * 2007-04-11 2008-10-16 Data Domain, Inc. Cluster storage using subsegmenting
CN101178693A (en) * 2007-12-14 2008-05-14 沈阳东软软件股份有限公司 Data cache method and system
CN102054000A (en) * 2009-10-28 2011-05-11 中国移动通信集团公司 Data querying method, device and system
CN102841860A (en) * 2012-08-17 2012-12-26 珠海世纪鼎利通信科技股份有限公司 Large data volume information storage and access method

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017181614A1 (en) * 2016-04-21 2017-10-26 乐视控股(北京)有限公司 Streaming data positioning method, apparatus and electronic device
CN110147384A (en) * 2019-04-17 2019-08-20 平安科技(深圳)有限公司 Data search method for establishing model, device, computer equipment and storage medium
CN110147384B (en) * 2019-04-17 2023-06-20 平安科技(深圳)有限公司 Data search model establishment method, device, computer equipment and storage medium
CN114764932A (en) * 2020-12-31 2022-07-19 腾讯科技(深圳)有限公司 Organism anti-counterfeiting identification method, device, equipment and storage medium
CN114764932B (en) * 2020-12-31 2025-07-11 腾讯科技(深圳)有限公司 Biometric anti-counterfeiting identification method, device, equipment and storage medium
CN112883064A (en) * 2021-03-02 2021-06-01 清华大学 Self-adaptive sampling and query method and system
CN112883064B (en) * 2021-03-02 2022-11-15 清华大学 An adaptive sampling and query method and system

Also Published As

Publication number Publication date
WO2017181614A1 (en) 2017-10-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160831