CN109800208B - Network traceability system and its data processing method, computer storage medium - Google Patents

Network traceability system and its data processing method, computer storage medium

Info

Publication number
CN109800208B
CN109800208B CN201910046934.1A CN201910046934A CN109800208B CN 109800208 B CN109800208 B CN 109800208B CN 201910046934 A CN201910046934 A CN 201910046934A CN 109800208 B CN109800208 B CN 109800208B
Authority
CN
China
Prior art keywords
file
index
layer
datanode
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910046934.1A
Other languages
Chinese (zh)
Other versions
CN109800208A (en)
Inventor
张武斌
彭闯
袁敏洵
袁小坊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Tomomichi Information Technology Co Ltd
Original Assignee
Hunan Tomomichi Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Tomomichi Information Technology Co Ltd
Priority to CN201910046934.1A
Publication of CN109800208A
Application granted
Publication of CN109800208B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of network tracing technology and discloses a network traceability system, its data processing method, and a computer storage medium, in order to improve HDFS resource utilization and further improve data access efficiency. The method comprises: dividing the network traceability system into a client layer, a preprocessing layer, and a storage layer, with the preprocessing layer placed between the client layer and the storage layer; during storage of data files, the preprocessing layer merges at least two small data files into one large file and generates the corresponding SMI and MDI indexes, and the storage layer correspondingly transfers the location metadata of the merged file's blocks from NameNode memory to the memory of each corresponding DataNode; when reading data, the target content is located quickly using the SMI index, the MDI index, and the transferred location metadata of the merged file's blocks.

Description

Network traceability system and its data processing method, computer storage medium
Technical field
The invention relates to the field of network tracing technology, and in particular to a network traceability system, its data processing method, and a computer storage medium.
Background
As the Internet keeps growing in scale and applications become more deeply embedded in it, network security incidents are becoming more concealed, complex, and diverse. Monitoring and analyzing massive volumes of real-time network data is therefore an increasingly important application.
In general, a network traceability system needs long-term mass data storage: it must save raw packets in real time over long periods, together with statistics such as data flows, sessions, and application logs. It needs fast data retrieval and backtracking analysis of past network behavior, application data, and host data. It must allow data from any time period to be viewed and retrieved by category at any time, and provide backtracking analysis over a certain time range (depending on device storage) when a problem is found, giving a fuller basis for quickly locating the cause and strong data analysis support for network security.
Network traceability systems offer mass data storage, convenient interface display, fine-grained fault location, comprehensive data backtracking, and real-time performance monitoring. To keep decoding of the system's data files efficient, each data file is designed to be 40-50 MB. Because the data volume is very large, HDFS can be used to store the data. However, HDFS was originally designed to handle large files (typically larger than one HDFS block, 128 MB). In addition, to bring the data transfer rate close to the disk transfer rate, HDFS minimizes the relative cost of seek time by using large blocks, so that the time spent reading or writing a block far exceeds the seek time. Consequently, deploying a network traceability system directly on HDFS is inefficient, for the following reasons:
Metadata management in HDFS is time-consuming. For small-file I/O, most of the time goes to managing metadata and little to transferring data, so a large number of small files increases the metadata overhead in HDFS. Moreover, file metadata is kept on the name node and block information on the data nodes, and all of this information is loaded into physical memory. As the number of small files grows, memory usage therefore rises sharply.
Existing techniques for optimizing the HDFS small-file problem include:
The patent disclosed in CN105183839A, entitled "A storage optimization method for Hadoop-based hierarchical indexing of small files". And
The patent disclosed in CN106909651A, entitled "A method for reading and writing small files based on HDFS".
Both patents disclose mechanisms for merging small files and indexing them. When reading a file, however, the client must still interact with the NameNode, query the file distribution metadata cached on the NameNode to obtain the file's actual location, and then interact with a DataNode to obtain the file data. They also add a small-file processing server with a two-level index and a prefetching mechanism, which raises cost and complicates the read interaction. For example, when the files queried are always different, the memory contents must be replaced frequently, which lowers system efficiency. Data read efficiency therefore still needs to be improved.
Summary of the invention
The object of the invention is to disclose a network traceability system, its data processing method, and a computer storage medium, so as to improve HDFS resource utilization and further improve data access efficiency.
To achieve the above object, the invention discloses a data processing method of a network traceability system, the network traceability system storing data on HDFS, the method comprising:
The network traceability system is divided into a client layer, a preprocessing layer, and a storage layer. The client layer generates the data files captured by the underlying layer of the traceability system; the storage layer comprises an HDFS-based NameNode and at least two DataNodes;
The preprocessing layer is placed between the client layer and the storage layer and, during storage of the data files, performs the following steps:
Step S1: merge at least two small data files captured by the client layer into one large file and generate a corresponding SMI index, where the SMI index characterizes the relationship between the name of the merged large file and the name, size, and offset of each small file merged into it; and
Step S2: after the merged large file has been uploaded successfully, generate an MDI index from the location metadata of the merged file's blocks, where the MDI index characterizes the correspondence between the large file's name and the DataNode that stores it;
Step S3: cache the SMI index and the MDI index on the NameNode, and cache the SMI index on the corresponding DataNode;
Correspondingly, during storage of the data files, the storage layer also performs the following step:
Step S10: transfer the location metadata of the merged file's blocks from NameNode memory to the memory of each corresponding DataNode across which the blocks are distributed;
When reading data, the method further comprises:
Step S100: the client layer sends a first read-file request to the NameNode, the first read-file request carrying the name of the target small file;
Step S200: based on the first read-file request and the cached SMI and MDI indexes, the NameNode returns to the client layer the address information of the DataNode corresponding to the target small file;
Step S300: the client layer sends a second read-file request according to the DataNode address information, the second read-file request carrying the name of the target small file;
Step S400: based on the second read-file request, the cached SMI index, and the location metadata of the merged file's blocks, the DataNode looks up the content of the target small file and returns it to the client layer.
Correspondingly, the invention further discloses a network traceability system that stores data on HDFS, comprising:
A client layer, which generates the data files captured by the underlying layer of the traceability system;
A storage layer, comprising an HDFS-based NameNode (master node) and at least two DataNodes (slave nodes); and
A preprocessing layer between the client layer and the storage layer, which performs the following steps during storage of the data files:
Step S1: merge at least two small data files captured by the client layer into one large file and generate a corresponding SMI index, where the SMI index characterizes the relationship between the name of the merged large file and the name, size, and offset of each small file merged into it; and
Step S2: after the merged large file has been uploaded successfully, generate an MDI index from the location metadata of the merged file's blocks, where the MDI index characterizes the correspondence between the large file's name and the DataNode that stores it;
Step S3: cache the SMI index and the MDI index on the NameNode, and cache the SMI index on the corresponding DataNode;
Correspondingly, during storage of the data files, the storage layer also performs the following step:
Step S10: transfer the location metadata of the merged file's blocks from NameNode memory to the memory of each corresponding DataNode across which the blocks are distributed;
When reading data, the network traceability system also performs the following steps:
Step S100: the client layer sends a first read-file request to the NameNode, the first read-file request carrying the name of the target small file;
Step S200: based on the first read-file request and the cached SMI and MDI indexes, the NameNode returns to the client layer the address information of the DataNode corresponding to the target small file;
Step S300: the client layer sends a second read-file request according to the DataNode address information, the second read-file request carrying the name of the target small file;
Step S400: based on the second read-file request, the cached SMI index, and the location metadata of the merged file's blocks, the DataNode looks up the content of the target small file and returns it to the client layer.
To achieve the above object, the invention further discloses a network traceability system that stores data on HDFS and comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
To achieve the above object, the invention further discloses a computer storage medium storing a computer program, wherein the program implements the steps of the above method when executed by a processor.
The invention has the following beneficial effects:
On the one hand, all file indexes are loaded into memory, so no time is spent on prefetching. On the other hand, the invention improves the storage structure of HDFS: the location metadata of the merged file's blocks is transferred from NameNode memory to the memory of each corresponding DataNode, which reduces NameNode memory consumption, and the cooperation of the SMI and MDI indexes avoids the inconvenience that relocating this metadata would otherwise cause. In addition, a preprocessing layer is added for data storage but takes no part in data reads. Together these measures allow a network traceability system built on small files to run efficiently on HDFS, improving HDFS resource utilization and data access efficiency.
The invention is described in further detail below with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which form part of this application, provide a further understanding of the invention. The illustrative embodiments and their descriptions explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the system structure disclosed in an embodiment of the invention.
Fig. 2 is a schematic diagram of the SMI index structure disclosed in an embodiment of the invention.
Fig. 3 is a schematic diagram of the MDI index structure disclosed in an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below with reference to the drawings, but the invention can be implemented in many different ways as defined and covered by the claims.
Embodiment one
This embodiment discloses a network traceability system which, as shown in Fig. 1, comprises:
A client layer, which generates the data files captured by the underlying layer of the traceability system.
A storage layer, comprising an HDFS-based NameNode and at least two DataNodes. And
A preprocessing layer between the client layer and the storage layer.
The preprocessing layer performs the following steps during storage of the data files:
Step S1: merge at least two small data files captured by the client layer into one large file and generate a corresponding SMI index. The SMI index characterizes the relationship between the name of the merged large file and the name, size, and offset of each small file merged into it.
As shown in Fig. 2, the SMI index may take the form "hash: <key, value>". The key records the name of the small file, and the value has the form megerFileName_offset_length, that is, the merged file name followed by the small file's offset and size. To read the content of a file, the key is first looked up in the cache to obtain the corresponding value, and the value then gives the start and end positions of that file within the merged file.
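The SMI lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patent does not specify the separator handling or any API, so the helper names (`encode_value`, `decode_value`, `byte_range`) and the sample file names are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class SmiEntry(NamedTuple):
    merged_file: str  # name of the merged large file
    offset: int       # byte offset of the small file inside the merged file
    length: int       # size of the small file in bytes


def encode_value(merged_file: str, offset: int, length: int) -> str:
    """Build the value string in the mergedFileName_offset_length layout."""
    return f"{merged_file}_{offset}_{length}"


def decode_value(value: str) -> SmiEntry:
    """Split from the right so underscores inside the merged name survive."""
    merged_file, offset, length = value.rsplit("_", 2)
    return SmiEntry(merged_file, int(offset), int(length))


def byte_range(smi: Dict[str, str], small_file: str) -> Tuple[str, int, int]:
    """Return (merged_file, start, end) for a small file; end is exclusive."""
    entry = decode_value(smi[small_file])
    return entry.merged_file, entry.offset, entry.offset + entry.length


# Hypothetical index contents, for illustration only.
smi_index = {
    "cap_0001.pcap": encode_value("merge_20190118_0001", 0, 45_000_000),
    "cap_0002.pcap": encode_value("merge_20190118_0001", 45_000_000, 42_000_000),
}
```

Splitting the value from the right is a design choice of this sketch: it keeps the lookup correct even when the merged file name itself contains underscores.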
Step S2: after the merged large file has been uploaded successfully, generate an MDI index from the location metadata of the merged file's blocks. The MDI index characterizes the correspondence between the large file's name and the DataNode that stores it.
As shown in Fig. 3, the MDI index may take the form "hash: <merged file name, DataNode IP>". The MDI index indicates the relationship between a merged file and its DataNode. As described later, obtaining the MDI information from the NameNode yields the specific location of the merged file.
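As a minimal sketch, the MDI index can be modeled as a plain mapping from merged file name to DataNode IP. The function name `locate_merged_file` and the addresses below are hypothetical and serve only to illustrate the lookup.

```python
from typing import Dict, Optional

# MDI: merged file name -> IP of the DataNode holding it (hypothetical values).
mdi_index: Dict[str, str] = {
    "merge_20190118_0001": "10.0.0.11",
    "merge_20190118_0002": "10.0.0.12",
}


def locate_merged_file(mdi: Dict[str, str], merged_file: str) -> Optional[str]:
    """Return the DataNode IP for a merged file, or None if it is unknown."""
    return mdi.get(merged_file)
```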
Step S3: cache the SMI index and the MDI index on the NameNode, and cache the SMI index on the corresponding DataNode.
Correspondingly, during storage of the data files, the storage layer also performs the following step:
Step S10: transfer the location metadata of the merged file's blocks from NameNode memory to the memory of each corresponding DataNode across which the blocks are distributed.
When reading data, the network traceability system also performs the following steps:
Step S100: the client layer sends a first read-file request to the NameNode, the first read-file request carrying the name of the target small file.
Step S200: based on the first read-file request and the cached SMI and MDI indexes, the NameNode returns to the client layer the address information of the DataNode corresponding to the target small file. When necessary, the NameNode also returns to the client layer, from the cached SMI index, the information matching the small file to its merged file.
Step S300: the client layer sends a second read-file request according to the DataNode address information, the second read-file request carrying the name of the target small file.
Step S400: based on the second read-file request, the cached SMI index, and the location metadata of the merged file's blocks, the DataNode looks up the content of the target small file and returns it to the client layer. In this process, the offset and size of the target small file are obtained from the information matching the small file to its merged file, so the content required by the client layer is determined quickly and returned to it.
Thus, in this embodiment, when a client in the client layer initiates a request to read a small file, it sends one request to obtain the SMI and MDI information cached on the NameNode, rather than querying the NameNode for distributed file location metadata as in the original HDFS, which speeds up file reads.
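The two-request read path of steps S100 to S400 can be simulated in memory as follows. This is a sketch under assumptions, not the patented implementation: the classes `NameNodeCache` and `DataNodeStore`, the tuple layout of the SMI entries, and the sample data are all hypothetical, and no real HDFS API is involved.

```python
from typing import Dict, Tuple

Smi = Dict[str, Tuple[str, int, int]]  # small file -> (merged file, offset, length)


class NameNodeCache:
    """Holds the SMI and MDI indexes cached on the NameNode (steps S100-S200)."""

    def __init__(self, smi: Smi, mdi: Dict[str, str]) -> None:
        self.smi = smi
        self.mdi = mdi  # merged file -> DataNode address

    def resolve(self, small_file: str) -> str:
        merged_file, _, _ = self.smi[small_file]
        return self.mdi[merged_file]


class DataNodeStore:
    """Holds merged file bytes plus its own SMI copy (steps S300-S400)."""

    def __init__(self, smi: Smi, merged_files: Dict[str, bytes]) -> None:
        self.smi = smi
        self.merged_files = merged_files

    def read(self, small_file: str) -> bytes:
        merged_file, offset, length = self.smi[small_file]
        return self.merged_files[merged_file][offset:offset + length]


def read_small_file(name_node: NameNodeCache,
                    data_nodes: Dict[str, DataNodeStore],
                    small_file: str) -> bytes:
    address = name_node.resolve(small_file)      # first read-file request
    return data_nodes[address].read(small_file)  # second read-file request


# Hypothetical sample data: two small files packed into one merged file.
smi = {"a.log": ("merge_0001", 0, 5), "b.log": ("merge_0001", 5, 6)}
mdi = {"merge_0001": "10.0.0.11"}
name_node = NameNodeCache(smi, mdi)
data_nodes = {"10.0.0.11": DataNodeStore(smi, {"merge_0001": b"helloworld!"})}
```

Note that the preprocessing layer does not appear anywhere in this path, which matches the statement that it takes no part in reads.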
Based on the system of this embodiment, optionally, as shown in Fig. 1, the preprocessing layer comprises:
A file judging unit, which judges whether the size of the merged large file meets the upload threshold; if it does, the file is sent to the HDFS Client; otherwise, the file is sent to the file processing unit;
A file processing unit, which calculates the size of each file received from the file judging unit, obtains its offset from the merge order of the small files, generates a temporary index file, and then passes the index file and the data file to the file merging unit;
A file merging unit, which merges the files, in the order received from the file processing unit, into a file of a special format, and merges the temporary index files to generate the SMI index;
An HDFS Client, which writes the merged file to the HDFS cluster: it connects to the NameNode and DataNodes through a distributed file system instance, notifies the NameNode to allocate DataNodes for writing the data blocks, obtains the location metadata of the merged file's blocks and generates the MDI index, caches the SMI index and the MDI index on the NameNode, and caches the SMI index on the corresponding DataNode.
Based on the system of this embodiment, the file write procedure may be as follows:
Step 1: Set the threshold (the HDFS block size, 128 MB). When the preprocessing layer receives a file write request from a client in the client layer, the file judging unit first checks the size of the current merged file. If the merged file is not empty and is smaller than the HDFS block size, go to Step 2. If the current merged file is empty, go to Step 4. If the current merged file is larger than the HDFS block size, go to Step 5;
Step 2: When the file processing unit receives the file from the client, it calculates the size of the file and builds a temporary index for it, then goes to Step 3;
Step 3: The file merging unit appends the file content to the current merged file and merges the file's index into the index file, then returns to Step 1;
Step 4: A blank file is used as the current merged file, so that it can be built up into a complete data block and submitted to the HDFS client. A new temporary index file and a new merged file are created, into which the temporary indexes and small files are subsequently merged, then go to Step 2.
Step 5: The current merged file is passed to the HDFS client, which connects to the HDFS cluster through the distributed file system and stores the merged file in the storage layer; the current merged file is then deleted locally, and the procedure goes to Step 4.
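The five steps above can be sketched as a small in-memory merger. The sketch makes several assumptions that the text leaves open: the class name `Merger`, the `upload` callback standing in for the HDFS client, the `merge_NNNN` naming, and treating a merged file whose size equals the threshold as full are all hypothetical choices.

```python
from typing import Callable, Dict, Optional, Tuple

BLOCK_SIZE = 128 * 1024 * 1024  # threshold from Step 1 (HDFS block size)
Index = Dict[str, Tuple[int, int]]  # small file -> (offset, length)


class Merger:
    def __init__(self, upload: Callable[[str, bytes, Index], None],
                 threshold: int = BLOCK_SIZE) -> None:
        self.upload = upload        # stand-in for the HDFS client (assumption)
        self.threshold = threshold
        self.seq = 0
        self.name: Optional[str] = None
        self.buffer = bytearray()
        self.index: Index = {}

    def _start_new(self) -> None:
        """Step 4: start a blank merged file with a fresh temporary index."""
        self.seq += 1
        self.name = f"merge_{self.seq:04d}"
        self.buffer = bytearray()
        self.index = {}

    def flush(self) -> None:
        """Step 5: hand the current merged file to the uploader and drop it."""
        if self.name is not None and self.buffer:
            self.upload(self.name, bytes(self.buffer), dict(self.index))
        self.name = None
        self.buffer = bytearray()
        self.index = {}

    def write(self, small_name: str, data: bytes) -> None:
        # Step 1: check the current merged file before accepting the new file.
        if self.name is not None and len(self.buffer) >= self.threshold:
            self.flush()
        if self.name is None:
            self._start_new()
        # Step 2: compute the size and the offset for the temporary index.
        offset, length = len(self.buffer), len(data)
        # Step 3: append the content and record the index entry.
        self.buffer += data
        self.index[small_name] = (offset, length)
```

Because the size check happens before the append, as in Step 1, a merged file can end up somewhat larger than the threshold; a stricter variant would check the size after adding the incoming file.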
Embodiment two
Corresponding to the system embodiment above, this embodiment discloses a data processing method of a network traceability system, comprising:
The network traceability system is divided into a client layer, a preprocessing layer, and a storage layer. The client layer generates the data files captured by the underlying layer of the traceability system; the storage layer comprises an HDFS-based NameNode and at least two DataNodes;
The preprocessing layer is placed between the client layer and the storage layer and, during storage of the data files, performs the following steps:
Step S1: merge at least two small data files captured by the client layer into one large file and generate a corresponding SMI index, where the SMI index characterizes the relationship between the name of the merged large file and the name, size, and offset of each small file merged into it; and
Step S2: after the merged large file has been uploaded successfully, generate an MDI index from the location metadata of the merged file's blocks, where the MDI index characterizes the correspondence between the large file's name and the DataNode that stores it;
Step S3: cache the SMI index and the MDI index on the NameNode, and cache the SMI index on the corresponding DataNode;
Correspondingly, during storage of the data files, the storage layer also performs the following step:
Step S10: transfer the location metadata of the merged file's blocks from NameNode memory to the memory of each corresponding DataNode across which the blocks are distributed;
When reading data, the method further comprises:
Step S100: the client layer sends a first read-file request to the NameNode, the first read-file request carrying the name of the target small file;
Step S200: based on the first read-file request and the cached SMI and MDI indexes, the NameNode returns to the client layer the address information of the DataNode corresponding to the target small file;
Step S300: the client layer sends a second read-file request according to the DataNode address information, the second read-file request carrying the name of the target small file;
Step S400: based on the second read-file request, the cached SMI index, and the location metadata of the merged file's blocks, the DataNode looks up the content of the target small file and returns it to the client layer.
Preferably, the method of this embodiment further comprises:
After the merged large file has been uploaded successfully, the preprocessing layer deletes the local copy of the large file.
Further, the method of this embodiment further comprises:
The preprocessing module names each merged large file by its creation time, and saves and updates timestamp information for the large file in an HBase database; when the timestamp information of a large file reaches the preset retention period, the large file and its SMI and MDI index information are deleted from the storage layer.
For example, suppose the data retention period of the network traceability system is one week. Merged files are first named by creation time so that they are ordered. At the end of each day, the names of the large files merged that day are recorded in the HBase database with a matching timestamp of 1; the timestamp is then incremented by 1 for each day that passes. An automatic timestamp detection module triggers the corresponding deletion mechanism when a timestamp exceeds 7.
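The day-counter retention check in this example can be sketched as below. A plain dictionary stands in for the HBase table, and the function name `advance_one_day` is hypothetical; the sketch only shows the counting rule (start at 1, add 1 per day, delete once the value exceeds 7).

```python
from typing import Dict, List

RETENTION_DAYS = 7  # one-week retention period from the example


def advance_one_day(ages: Dict[str, int]) -> List[str]:
    """Add one day to every counter and return the merged files that expired.

    `ages` maps merged file name -> day counter and is updated in place;
    expired entries are removed, mimicking the deletion trigger.
    """
    expired = []
    for name in list(ages):
        ages[name] += 1
        if ages[name] > RETENTION_DAYS:
            expired.append(name)
            del ages[name]
    return expired
```

In a real deployment the caller would also delete the expired merged files and their SMI and MDI entries from the storage layer; that part is omitted here.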
Further, the method of this embodiment further comprises:
The SMI index and the MDI index are backed up in the HBase database in the format key, value, where the key is the name of the merged large file and the value contains the small file names in merge order, their sizes, the name of the merged large file, and the location metadata of the merged file's blocks. This makes it convenient to restore the indexes efficiently while saving storage space.
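One way to picture this backup record, and why storing the small files in merge order is enough to restore the index, is the following sketch. The JSON encoding, the field names, and the helper names `build_backup_record` and `restore_smi` are assumptions made for illustration; the patent only lists what the value contains.

```python
import json
from typing import Dict, List, Tuple


def build_backup_record(merged_file: str,
                        small_files: List[Tuple[str, int]],
                        block_locations: List[str]) -> Tuple[str, str]:
    """Return (key, value): key is the merged file name, value is JSON text."""
    value = {
        "merged_file": merged_file,
        "small_files": [{"name": n, "size": s} for n, s in small_files],
        "block_locations": block_locations,
    }
    return merged_file, json.dumps(value)


def restore_smi(key: str, value: str) -> Dict[str, Tuple[str, int, int]]:
    """Rebuild small file -> (merged file, offset, length) from a backup record.

    Offsets are not stored; they are recomputed as running sums of the sizes,
    which works because the small files are listed in merge order.
    """
    record = json.loads(value)
    smi: Dict[str, Tuple[str, int, int]] = {}
    offset = 0
    for item in record["small_files"]:
        smi[item["name"]] = (key, offset, item["size"])
        offset += item["size"]
    return smi
```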
Further, the method of this embodiment further comprises:
The preprocessing layer is provided with a cache pool and a balancing policy for selecting files from the cache pool for merging.
For example, to ensure that files are merged promptly, the cache pool size is usually set to 5. Based on the size of the current merged file, a suitable file is chosen from the cache pool so that the merged size is as close as possible to the HDFS block size of 128 MB. In addition, when a file has been in the cache pool for more than 3 minutes, it is the next file to be merged. This keeps file storage balanced and makes file reads more efficient.
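The balancing policy in this example can be sketched as a selection function. The names `Pending` and `pick_next` are hypothetical, and two details are assumptions of the sketch: "closest to the block size" is measured as absolute distance (so a pick may overshoot 128 MB), and when several files are overdue the oldest one goes first.

```python
from typing import List, NamedTuple, Optional

BLOCK_SIZE = 128 * 1024 * 1024  # HDFS block size used as the merge target
MAX_WAIT_SECONDS = 180          # a file waiting longer than 3 minutes goes next


class Pending(NamedTuple):
    name: str
    size: int          # bytes
    arrived_at: float  # time the file entered the cache pool, in seconds


def pick_next(pool: List[Pending], current_size: int, now: float,
              block_size: int = BLOCK_SIZE) -> Optional[Pending]:
    """Choose the next file to merge from the cache pool, or None if empty."""
    if not pool:
        return None
    overdue = [f for f in pool if now - f.arrived_at > MAX_WAIT_SECONDS]
    if overdue:
        # Starvation guard: the file that has waited longest is merged next.
        return min(overdue, key=lambda f: f.arrived_at)
    # Otherwise pick the file that brings the merged size closest to one block.
    return min(pool, key=lambda f: abs(block_size - (current_size + f.size)))
```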
Embodiment three
This embodiment discloses a network traceability system that stores data on HDFS and comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the steps of the method of Embodiment two when executing the computer program.
Embodiment four
This embodiment discloses a computer storage medium storing a computer program; the program implements the steps of the method of Embodiment two when executed by a processor.
In summary, the network traceability system, its data processing method, and the computer storage medium disclosed in the above embodiments have at least the following beneficial effects:
On the one hand, all file indexes are loaded into memory, so no time is spent on prefetching. On the other hand, the invention improves the storage structure of HDFS: the location metadata of the merged file's blocks is transferred from NameNode memory to the memory of each corresponding DataNode, which reduces NameNode memory consumption, and the cooperation of the SMI and MDI indexes avoids the inconvenience that relocating this metadata would otherwise cause. In addition, a preprocessing layer is added for data storage but takes no part in data reads. Together these measures allow a network traceability system built on small files to run efficiently on HDFS, improving HDFS resource utilization and data access efficiency.
The above are only preferred embodiments of the invention and are not intended to limit it; those skilled in the art may make various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (9)

1. A data processing method of a network traceability system, the network traceability system storing data on HDFS, characterized in that the method comprises:
The network traceability system is divided into a client layer, a preprocessing layer, and a storage layer. The client layer generates the data files captured by the underlying layer of the traceability system; the storage layer comprises an HDFS-based NameNode and at least two DataNodes;
The preprocessing layer is placed between the client layer and the storage layer and, during storage of the data files, performs the following steps:
Step S1: merge at least two small data files captured by the client layer into one large file and generate a corresponding SMI index, where the SMI index characterizes the relationship between the name of the merged large file and the name, size, and offset of each small file merged into it; and
Step S2: after the merged large file has been uploaded successfully, generate an MDI index from the location metadata of the merged file's blocks, where the MDI index characterizes the correspondence between the large file's name and the DataNode that stores it;
Step S3: cache the SMI index and the MDI index on the NameNode, and cache the SMI index on the corresponding DataNode;
Correspondingly, during storage of the data files, the storage layer also performs the following step:
Step S10: transfer the location metadata of the merged file's blocks from NameNode memory to the memory of each corresponding DataNode across which the blocks are distributed;
When reading data, the method further comprises:
Step S100: the client layer sends a first read-file request to the NameNode, the first read-file request carrying the name of the target small file;
Step S200: based on the first read-file request and the cached SMI and MDI indexes, the NameNode returns to the client layer the address information of the DataNode corresponding to the target small file;
Step S300: the client layer sends a second read-file request according to the DataNode address information, the second read-file request carrying the name of the target small file;
Step S400: based on the second read-file request, the cached SMI index, and the location metadata of the merged file's blocks, the DataNode looks up the content of the target small file and returns it to the client layer.
2. the data processing method of network traceability system according to claim 1, which is characterized in that further include:
The pretreatment layer is locally deleting the big file after the big file uploading success after merging.
3. the data processing method of network traceability system according to claim 1 or 2, which is characterized in that further include:
The preprocessing module names the big file after merging with creation time, and in HBase database save and The timestamp information for updating the big file, when the corresponding timestamp information of the big file reaches preset storage timeliness, The big file and relevant SMI index and MDI index information are deleted in the accumulation layer.
4. The data processing method of the network traceability system according to claim 3, characterized in that it further comprises:
The SMI index and the MDI index are backed up in the HBase database in the format: key, value; wherein key denotes the name of the merged big file, and value comprises the names of the small files in merge order, their sizes, the name of the merged big file, and the location metadata of the merged file blocks.
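One possible shape for the key/value backup record of claim 4 is sketched below. The claim lists the fields but not their encoding, so the JSON serialization, the field names, and the function name `make_backup_record` are assumptions; no HBase client call is shown.

```python
import json


def make_backup_record(big_name, small_files, block_locations):
    """Build a (key, value) pair for backing up the SMI and MDI indexes.

    big_name:        name of the merged big file (used as the key)
    small_files:     list of (small-file name, size) tuples in merge order
    block_locations: location metadata of the merged file blocks
    """
    value = {
        "big_file": big_name,
        "small_files": [{"name": n, "size": s} for n, s in small_files],
        "block_locations": block_locations,
    }
    # A JSON list preserves the merge order of the small files.
    return big_name, json.dumps(value, sort_keys=True)
```

Keeping the small files in an ordered list means the offsets can be recomputed from the sizes alone if the cached SMI index is lost.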
5. The data processing method of the network traceability system according to claim 4, characterized in that it further comprises:
The pretreatment layer is provided with a cache pool and with a balancing policy for extracting files from the cache pool for merging.
6. A network traceability system that stores data based on HDFS, characterized by comprising:
a client layer, configured to generate the data files captured at the bottom layer of the traceability system;
an accumulation layer, comprising an HDFS-based NameNode and at least two DataNodes; and
a pretreatment layer, arranged between the client layer and the accumulation layer and configured to execute the following steps during storage of the data files:
Step S1: merging at least two small data files captured by the client layer into one big file and generating a corresponding SMI index, the SMI index characterizing the relationship between the name of the merged big file and the name, size and offset of each merged small file; and
Step S2: after the merged big file has been uploaded successfully, generating an MDI index according to the location metadata information of the merged file blocks, the MDI index characterizing the correspondence between the big file name and the DataNode storing the big file;
Step S3: caching the SMI index and the MDI index on the NameNode, and caching the SMI index on the corresponding DataNode;
Correspondingly, during storage of the data files, the accumulation layer is further configured to cooperatively execute the following step:
Step S10: transferring the location metadata information of the merged file blocks from the NameNode memory to the memory of each corresponding distributed DataNode;
When reading data, the network traceability system is further configured to execute the following steps:
Step S100: the client layer sends a first read-file request to the NameNode, the first read-file request carrying the name of the target small file;
Step S200: according to the first read-file request and the cached SMI index and MDI index, the NameNode returns to the client layer the address information of the DataNode corresponding to the target small file;
Step S300: the client layer sends a second read-file request according to the DataNode address information, the second read-file request carrying the name of the target small file;
Step S400: according to the second read-file request, the cached SMI index and the location metadata information of the merged file blocks, the DataNode looks up the content of the target small file and returns it to the client layer.
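Steps S1 and S2 of the pretreatment layer can be sketched as follows. This is a simplified illustration under assumed names (`merge_small_files`, `build_mdi`) with dictionaries as the index representation; the actual on-disk format of the merged file and of the indexes is not specified in the claims.

```python
def merge_small_files(big_name, small_files):
    """Step S1: concatenate small files and record name, size and offset.

    small_files: list of (name, content_bytes) tuples in merge order.
    Returns (merged_bytes, smi_index).
    """
    merged = bytearray()
    smi = {}
    for name, content in small_files:
        # The offset of each small file is the merged length before it is appended.
        smi[name] = {"big_file": big_name, "offset": len(merged), "size": len(content)}
        merged.extend(content)
    return bytes(merged), smi


def build_mdi(big_name, datanode_addresses):
    """Step S2: map the big file name to the DataNodes that store it.

    datanode_addresses is assumed to come from the location metadata
    reported after a successful upload.
    """
    return {big_name: list(datanode_addresses)}
```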
7. The network traceability system according to claim 6, characterized in that the pretreatment layer comprises:
a file judging unit, configured to judge whether the size of the merged big file meets the upload threshold; if so, the file is sent to the HDFS Client; otherwise, the file is sent to the file processing unit;
a file processing unit, configured to calculate the size of the file received from the file judging unit, obtain the offset according to the merge order of the small files, generate a temporary index file, and then pass the index file and the data file to the file merging unit;
a file merging unit, configured to merge files into a file of a special format in the order given by the file processing unit and, at the same time, merge the temporary index files to generate the SMI index;
an HDFS Client, configured to write the merged file into the HDFS cluster, establish connections with the NameNode and the DataNodes through a distributed file system instance, notify the NameNode to allocate the DataNodes for writing data blocks, obtain the location metadata information of the merged file blocks and generate the MDI index, cache the SMI index and the MDI index on the NameNode, and cache the SMI index on the corresponding DataNode.
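The interplay of the file judging, file processing and file merging units in claim 7 can be approximated by a small accumulator. The class name `PretreatmentBuffer`, the tuple-based temporary index, and the choice to hand the batch over as soon as the merged size reaches the threshold are assumptions; the claim does not fix the threshold value or the exact hand-off protocol.

```python
class PretreatmentBuffer:
    """Accumulates small files until the merged size meets the upload threshold."""

    def __init__(self, upload_threshold):
        self.upload_threshold = upload_threshold  # bytes; value chosen by the caller
        self.pending = bytearray()                # merged data collected so far
        self.temp_index = []                      # temporary index: (name, offset, size)

    def add(self, name, content):
        """Add one small file; return (merged_bytes, index) when ready to upload."""
        # File processing unit: the offset follows from the merge order.
        self.temp_index.append((name, len(self.pending), len(content)))
        # File merging unit: append the data in the same order.
        self.pending.extend(content)
        # File judging unit: hand over to the HDFS Client once the threshold is met.
        if len(self.pending) >= self.upload_threshold:
            ready = (bytes(self.pending), list(self.temp_index))
            self.pending = bytearray()
            self.temp_index = []
            return ready
        return None
```

A caller would pass each returned batch to whatever component performs the HDFS write and treat a return value of `None` as "keep merging".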
8. A network traceability system that stores data based on HDFS, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.
9. A computer storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
CN201910046934.1A 2019-01-18 2019-01-18 Network traceability system and its data processing method, computer storage medium Active CN109800208B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910046934.1A CN109800208B (en) 2019-01-18 2019-01-18 Network traceability system and its data processing method, computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910046934.1A CN109800208B (en) 2019-01-18 2019-01-18 Network traceability system and its data processing method, computer storage medium

Publications (2)

Publication Number Publication Date
CN109800208A CN109800208A (en) 2019-05-24
CN109800208B true CN109800208B (en) 2019-09-27

Family

ID=66559697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910046934.1A Active CN109800208B (en) 2019-01-18 2019-01-18 Network traceability system and its data processing method, computer storage medium

Country Status (1)

Country Link
CN (1) CN109800208B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321349B (en) * 2019-06-13 2021-11-12 暨南大学 Self-adaptive data merging and storing method for data origin system
CN110515920A (en) * 2019-08-30 2019-11-29 北京浪潮数据技术有限公司 A kind of mass small documents access method and system based on Hadoop

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855239B (en) * 2011-06-28 2016-04-20 清华大学 A kind of distributed geographical file system
CN102332027A (en) * 2011-10-15 2012-01-25 西安交通大学 A method for associative storage of massive non-independent small files based on Hadoop
CN103577123B (en) * 2013-11-12 2016-06-22 河海大学 A kind of small documents optimization based on HDFS stores method
WO2018133762A1 (en) * 2017-01-17 2018-07-26 广州市动景计算机科技有限公司 File merging method and apparatus

Also Published As

Publication number Publication date
CN109800208A (en) 2019-05-24

Similar Documents

Publication Publication Date Title
US10552287B2 (en) Performance metrics for diagnosing causes of poor performing virtual machines
US8301588B2 (en) Data storage for file updates
US9858303B2 (en) In-memory latch-free index structure
US10691687B2 (en) Pruning of columns in synopsis tables
CN102722449B (en) Key-Value local storage method and system based on solid state disk (SSD)
CN101576915B (en) A distributed B+ tree index system and construction method
US20070239747A1 (en) Methods, systems, and computer program products for providing read ahead and caching in an information lifecycle management system
US20040205044A1 (en) Method for storing inverted index, method for on-line updating the same and inverted index mechanism
US10417265B2 (en) High performance parallel indexing for forensics and electronic discovery
CN104657459A (en) Massive data storage method based on file granularity
CN105243155A (en) Big data extracting and exchanging system
CN103037004A (en) Implement method and device of cloud storage system operation
US20240086362A1 (en) Key-value store and file system
CN103038742A (en) Method and system for dynamically replicating data within a distributed storage system
CN108984686A (en) A kind of distributed file system indexing means and device merged based on log
US11182260B1 (en) Avoiding recovery log archive access in database accelerator environments
CN114328601A (en) Data down-sampling and data query method, system and storage medium
US7844596B2 (en) System and method for aiding file searching and file serving by indexing historical filenames and locations
CN114416742A (en) Key-Value storage engine implementation method and system
CN109800208B (en) Network traceability system and its data processing method, computer storage medium
El Alami et al. Supply of a key value database redis in-memory by data from a relational database
Ma et al. A retrieval optimized surveillance video storage system for campus application scenarios
CN105912877A (en) Data processing method of medicine product
CN112540954A (en) Multi-level storage construction and online migration method in directory unit
Qian et al. An evaluation of Lucene for keywords search in large-scale short text storage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant