CN113312355B - A method and device for data management - Google Patents
- Publication number
- CN113312355B (CN202110660666.XA)
- Authority
- CN
- China
- Prior art keywords
- metadata
- grouping
- index
- identifier
- query entry
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2255—Hash tables
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2264—Multidimensional index structures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and a device for data management, and relates to the field of computer technology. The method comprises: obtaining a query request for a metadata table; determining a grouping identifier and a fragment identifier of the target data corresponding to the query entry according to the query entry and a constructed secondary index model; and determining the data storage range of the target data according to the grouping identifier and the fragment identifier. In this embodiment, a secondary index model built from the metadata table is used for retrieval, which improves data retrieval efficiency and user experience.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for data management.
Background
ES (Elasticsearch, a distributed full-text search engine) supports distributed deployment, uses an inverted index, and provides rich search APIs. It can serve as a storage tool for massive data and is widely used for massive-data retrieval, aggregation, log analysis, and other services in the Internet field.
In the prior art, ES is generally adopted as a storage engine, with data synchronized from a database, to solve the list-query problem caused by splitting data across databases and tables. However, when the data volume is large, the service retrieval range is wide, and the query scenarios are complex, the retrieval efficiency of ES is low, and service requirements cannot be supported efficiently.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a method and an apparatus for data management, which can implement quick search of data, improve search efficiency, and improve user experience.
To achieve the above object, according to one aspect of an embodiment of the present invention, there is provided a method of data management, including:
acquiring a query request for a metadata table, wherein the query request comprises a query entry;
determining a grouping identifier and a fragment identifier of target data corresponding to the query entry according to the query entry and a constructed secondary index model;
and determining the data storage range of the target data according to the grouping identifier and the fragment identifier.
Optionally, before determining the grouping identifier and the fragment identifier of the target data corresponding to the query entry according to the query entry and the constructed secondary index model, the method further includes:
selecting index fields from the fields of the metadata table, and constructing a secondary index table according to the index fields;
and grouping the metadata in the metadata table according to the index fields in the secondary index table, and fragmenting the metadata in each group, to construct the secondary index model.
Optionally, grouping metadata in the metadata table according to an index field in the secondary index table, and fragmenting metadata in each group, including:
selecting one index field from the index fields of the secondary index table as a grouping field, and selecting one field from the index fields as a fragment key value;
grouping the metadata according to the grouping field to obtain one or more groups;
fragmenting the metadata in each group according to the fragment key value to obtain one or more fragments;
and storing, for each piece of metadata, the grouping identifier of the group and the fragment identifier of the fragment in which it is located.
Optionally, after storing the grouping identifier of the group and the fragment identifier of the fragment corresponding to each piece of metadata, the method further includes:
storing the storage path information of each piece of metadata.
Optionally, before storing the storage path information of each piece of metadata, for any group containing a plurality of fragments, the following is performed:
determining whether the difference or ratio of the data amounts of the metadata in any two fragments of the group exceeds a preset threshold;
and when the preset threshold is exceeded, hashing all metadata in the group and then re-fragmenting, and storing the storage path information of all metadata in the group after re-fragmenting.
Optionally, hashing and then re-fragmenting all metadata in the group includes:
performing a secondary hash on the metadata according to the fragment identifier of the metadata before re-fragmenting, determining the fragment identifier of the metadata after re-fragmenting, and re-fragmenting all the metadata in the group according to the result of the secondary hash.
Optionally, determining the grouping identifier and the fragment identifier of the target data corresponding to the query entry includes:
determining an index field corresponding to the query entry from the secondary index model according to the query entry, and determining the grouping field and the fragment key value corresponding to the query entry according to the index field;
and determining a grouping identifier and a fragmentation identifier of the target data corresponding to the query entry according to the corresponding grouping field and the fragmentation key value.
According to still another aspect of an embodiment of the present invention, there is provided an apparatus for data management, including:
an acquisition module, configured to acquire a query request for a metadata table, wherein the query request comprises a query entry;
a query module, configured to determine a grouping identifier and a fragment identifier of target data corresponding to the query entry according to the query entry and a constructed secondary index model;
and a determining module, configured to determine the data storage range of the target data according to the grouping identifier and the fragment identifier.
According to another aspect of an embodiment of the present invention, there is provided an electronic apparatus including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of data management provided by the present invention.
According to yet another aspect of an embodiment of the present invention, there is provided a computer readable medium having stored thereon a computer program which when executed by a processor implements the method of data management provided by the present invention.
The embodiments of the invention have the following advantages. Through the secondary index model constructed from the metadata table, each piece of metadata in the metadata table has a corresponding grouping identifier and fragment identifier. The query entry is obtained from the query request, the grouping identifier and fragment identifier of the target data corresponding to the query entry are obtained by combining the query entry with the secondary index model, and the storage range of the target data is then determined. The data management method provided by the embodiment of the invention addresses the low retrieval efficiency of ES when the data volume is large, the service retrieval range is wide, and the query scenarios are complex; it enables quick retrieval of data, meets service requirements, improves retrieval efficiency, and improves user experience.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main flow of a method of data management according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a main flow of constructing a secondary index model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of another method of data management according to an embodiment of the present invention;
FIG. 4 is a flow chart of a method of data management according to an embodiment of the invention;
FIG. 5 is a schematic diagram of the main modules of an apparatus for data management according to an embodiment of the present invention;
FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 7 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a schematic diagram of the main flow of a method for data management according to an embodiment of the present invention. As shown in FIG. 1, the method for data management includes:
Step S101, acquiring a query request for a metadata table, wherein the query request comprises a query entry;
Step S102, determining a grouping identifier and a fragment identifier of target data corresponding to the query entry according to the query entry and the constructed secondary index model;
Step S103, determining the data storage range of the target data according to the grouping identifier and the fragment identifier.
The method for data management provided by the embodiment of the invention addresses the problem that ES cannot retrieve quickly when the data volume is large, the retrieval range is wide, and the query scenarios are complex, and achieves efficient data retrieval by extending the storage and retrieval modes of ES.
In the embodiment of the invention, the query request includes a query entry, which can be obtained by parsing the query request. The query entry can be a query field; in the field of e-commerce, the query field can be a field such as order number, order time, commodity name, or account number.
In the embodiment of the invention, before step S102, the method includes: selecting index fields from the fields of the metadata table, constructing a secondary index table according to the index fields, grouping the metadata in the metadata table according to the index fields in the secondary index table, and fragmenting the metadata in each group, to construct a secondary index (SecondaryIndex) model.
The retrieval scenarios or historical retrieval records for the metadata in the metadata table are reviewed, and one or more frequently retrieved fields are selected from the fields of the metadata table as index fields, from which the secondary index table is constructed. The index fields can be selected as the fields whose retrieval count exceeds a preset count threshold, and the storage range of the corresponding metadata can be queried through one or more index fields. The secondary index table includes each index field and the value (Value) of the index field. Optionally, the index fields further include a routing time; using the routing time as an index field of the secondary index table allows the metadata to be archived by time later. The routing time can be the order time.
For example, the fields in the metadata table include order number, order time, bill number, commodity code, commodity name, selling price, tax rate, order account number, organization ID, invoice number, and so on. After the retrieval scenarios of the metadata are reviewed, the frequently retrieved fields (order number, order account number, bill number, organization ID, and invoice number) are encapsulated into the secondary index table as index fields. Generally, the routing time, such as the order time, also needs to be encapsulated into the secondary index table, so that the secondary index table is constructed from the index fields.
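The field selection described above can be illustrated with a minimal sketch. It assumes a hypothetical retrieval log (`RETRIEVAL_LOG`), a hypothetical count threshold (`min_count`), and hypothetical field names; none of these come from the patent, which only states that fields retrieved more often than a preset threshold become index fields and that the routing time is usually added.

```python
from collections import Counter

# Hypothetical retrieval log: each entry lists the fields used by one search.
RETRIEVAL_LOG = [
    ["order_number"],
    ["order_number", "organization_id"],
    ["invoice_number"],
    ["order_number", "invoice_number"],
    ["commodity_name"],
    ["organization_id", "invoice_number"],
]


def select_index_fields(log, min_count, routing_field="order_time"):
    """Return the fields retrieved more than min_count times, plus the routing time."""
    counts = Counter(field for query in log for field in query)
    selected = sorted(f for f, n in counts.items() if n > min_count)
    if routing_field not in selected:
        selected.append(routing_field)  # routing time is added for later archiving
    return selected


def build_secondary_index_row(record, index_fields):
    """Project one metadata record onto the index fields (field -> value)."""
    return {field: record[field] for field in index_fields if field in record}


index_fields = select_index_fields(RETRIEVAL_LOG, min_count=1)
```

With this sample log, `commodity_name` is retrieved only once and is therefore left out of the secondary index table, while the routing time is always included.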
In the embodiment of the invention, grouping the metadata in the metadata table according to the index fields in the secondary index table and fragmenting the metadata in each group includes: selecting one index field from the index fields of the secondary index table as a grouping field, and selecting one field from the index fields as a fragment key value; grouping the metadata according to the grouping field to obtain one or more groups; fragmenting the metadata in each group according to the fragment key value to obtain one or more fragments; and storing, for each piece of metadata, the grouping identifier of the group and the fragment identifier of the fragment in which it is located.
In an optional implementation of the embodiment of the invention, after storing the grouping identifier of the group and the fragment identifier of the fragment corresponding to each piece of metadata, the method further includes storing the storage path information of each piece of metadata.
A grouping field and a fragment key value are selected from the index fields in the secondary index table; each of them is one of the index fields in the secondary index table. Optionally, the selection rule may be determined according to the service retrieval and query scenarios: a reasonable mapping file is generated by reviewing the service's "read" scenarios, so that the selected grouping field and fragment key value suit as many retrieval fields as possible. The data can be split reasonably according to the existing data volume and the expected growth rate of the business, taking both retrieval efficiency and retrieval range into account, so as to determine a suitable grouping field for archiving the data. The fragment in which a piece of metadata is located is determined from the fragment key value. For example, the routing time can be selected as the grouping field so that the metadata is archived by year, by month, and so on. When determining the fragment key value, if the service query scenario retrieves only by order number, the order number can be selected as the fragment key value. When the service query scenario is complex, a field with a larger data dimension can be selected as the fragment key value. For example, in Internet e-commerce, users frequently retrieve data along multiple dimensions such as order number, order PIN, and invoice number; in that case the organization ID (an attribute shared by all data) can be selected as the fragment key value, which makes it easy to find the fragment in which the data is stored.
According to the selected grouping field, the metadata in the metadata table can be grouped to obtain one or more groups. Each group has a corresponding grouping identifier (e.g., es.index#0, es.index#1, etc.), which can be determined according to the grouping field. The metadata in each group is then fragmented according to the fragment key value (partitionkey) to obtain one or more fragments, each with a corresponding fragment identifier (e.g., Shard0, Shard1, Shard2, ..., ShardN, where N is greater than or equal to 0). One group corresponds to one or more fragments. The grouping identifier and the fragment identifier corresponding to each piece of metadata are stored, together with the storage path information of each piece of metadata. The storage path information is the path from the grouping identifier to the fragment identifier of the metadata, that is, it records which group, and which fragment of that group, holds the metadata, so that a retrieval can go directly to the fragment holding the metadata. The group in which the metadata is located serves as the storage structure of the outer-layer index of the secondary index model, and the fragment in which the metadata is located is the storage structure inside ES of the secondary index model.
FIG. 2 is a schematic diagram of the main flow of constructing the secondary index model, which involves an ES data storage engine and an ES cluster. On the ES data storage engine side, the routing time (such as the order time) is selected from the index fields of the secondary index table as the grouping field; for example, when the metadata table holds order data, the order time can serve as the index field by which the order data is archived. An index field (such as the organization ID) is then selected from the secondary index table as the fragment key value, and the determined grouping field and fragment key value are written into the metadata table and the secondary index table, for example by writing the order ID, the order time, and the organization ID into both tables. On the ES cluster side, the metadata is grouped according to the grouping field, for example archived by year according to the order time, so that the grouping identifiers can be index_2019, index_2020, and index_2021; that is, by reading the value of the order time, metadata whose order time falls in 2019 gets the grouping identifier index_2019. The metadata in each group is then fragmented according to the organization ID. Each group corresponds to a plurality of fragments, whose fragment identifiers can be Shard0, Shard1, Shard2, ..., ShardN, where N is greater than or equal to 0, so that metadata of different organization IDs can correspond to different fragment identifiers. The grouping identifier, the fragment identifier, and the storage path information of each piece of metadata are stored, so that the fragment holding the metadata can be accessed quickly according to the storage path information during later retrieval.
For example, if one ES index includes 16 Shards, then once the path information is stored, each retrieval can go directly to the fragment that holds the metadata instead of accessing all 16 Shards. The number of Shards accessed drops by 15, and the retrieval range is theoretically 1/16 of what it was before, which achieves quick retrieval and improves retrieval efficiency.
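The grouping and fragmenting steps can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: `stable_hash` uses MD5 purely as a deterministic stand-in (it is not the routing hash Elasticsearch itself uses), and the record layout, `NUM_SHARDS`, and the function names are hypothetical.

```python
import hashlib
from datetime import date

NUM_SHARDS = 16  # assumed shard count per group, matching the 16-Shard example


def stable_hash(value):
    """Deterministic stand-in hash; Python's built-in hash() is salted per process."""
    return int(hashlib.md5(str(value).encode("utf-8")).hexdigest(), 16)


def group_id(order_time):
    """Archive by year: the grouping identifier is derived from the order time."""
    return f"index_{order_time.year}"


def shard_id(organization_id, num_shards=NUM_SHARDS):
    """Fragment by organization ID: the same organization always maps to the same shard."""
    return f"Shard{stable_hash(organization_id) % num_shards}"


def build_routing_table(records):
    """Map each order ID to its storage path (grouping identifier, fragment identifier)."""
    return {
        r["order_id"]: (group_id(r["order_time"]), shard_id(r["organization_id"]))
        for r in records
    }


records = [
    {"order_id": "A1", "order_time": date(2019, 3, 1), "organization_id": "org-7"},
    {"order_id": "A2", "order_time": date(2020, 8, 9), "organization_id": "org-7"},
    {"order_id": "A3", "order_time": date(2021, 1, 5), "organization_id": "org-9"},
]
routing_table = build_routing_table(records)
```

Looking up `routing_table["A1"]` yields one group and one shard, so a retrieval for that order touches a single shard rather than all 16.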
In the embodiment of the invention, before storing the storage path information of each piece of metadata, the following is performed for any group containing a plurality of fragments: determining whether the difference or ratio of the data amounts of the metadata in any two fragments of the group exceeds a preset threshold; when the preset threshold is exceeded, hashing all metadata in the group and then re-fragmenting; and storing the storage path information of all metadata in the group after re-fragmenting.
When a group contains a plurality of fragments, the data amounts of the metadata in different fragments may differ, and an uneven distribution of data reduces retrieval efficiency. In this case, the data in the group can be hashed and then re-fragmented so that the distribution of data tends toward uniform. The method judges whether the difference or ratio of the data amounts of the metadata in any two fragments of the group exceeds a preset threshold. If it does not, no re-fragmenting is needed; if it does, all metadata in the group is hashed again and re-fragmented. The preset threshold may be set according to the service scenario or service requirement. For example, if the preset threshold for the ratio of the data amounts in any two fragments is 50, the group needs to be re-fragmented when the ratio exceeds 50.
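The skew check can be sketched in a few lines. Comparing every pair of fragments is equivalent to comparing the largest and smallest fragment, which is what this sketch does. The function name, the input format (a mapping from fragment identifier to record count), and the treatment of empty fragments are assumptions made for illustration; the threshold of 50 follows the example above.

```python
def needs_resharding(shard_counts, max_ratio=50):
    """Return True when the largest/smallest shard size ratio exceeds max_ratio.

    shard_counts maps a fragment identifier to its number of metadata records.
    """
    counts = list(shard_counts.values())
    if len(counts) < 2:
        return False  # a group with a single fragment cannot be skewed
    largest, smallest = max(counts), min(counts)
    if smallest == 0:
        # Assumption: an empty shard next to a non-empty one counts as skew.
        return largest > 0
    return largest / smallest > max_ratio


balanced = needs_resharding({"Shard0": 100, "Shard1": 2})  # ratio is exactly 50
skewed = needs_resharding({"Shard0": 102, "Shard1": 2})    # ratio is 51
```

A ratio equal to the threshold does not trigger re-fragmenting, since the text requires the threshold to be exceeded.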
In the embodiment of the invention, hashing all metadata in the group and then re-fragmenting includes: performing a secondary hash on the metadata according to the fragment identifier of the metadata before re-fragmenting, determining the fragment identifier of the metadata after re-fragmenting, and re-fragmenting all the metadata in the group according to the result of the secondary hash.
After all metadata in the group is hashed, a secondary hash is applied to the metadata, all metadata in the group is re-fragmented, the fragment holding each piece of metadata after re-fragmenting is determined, and the storage path information of the re-fragmented metadata is stored for quick positioning in subsequent retrieval.
The secondary hash works as follows. By analyzing all metadata in the group, a secondary hash key value (SecondHashKey) and a hash width (HASHRANGE) are determined. The secondary hash key value is the field used for re-fragmenting, and the hash width is the range over which the metadata is spread after re-fragmenting. The secondary hash key value defaults to the _id of ES; a random UUID may also be selected. The hash width is determined according to how the data volume is distributed across the fragments produced by the fragment key value, so that after the secondary hash the data density of each Shard is as close as possible, that is, the data volume is evenly distributed across the Shards. A larger hash width gives a more uniform data distribution, but this must be weighed against retrieval performance.
The fragment identifier of the fragment in which the metadata is located after re-fragmenting is determined according to the secondary hash key value, the hash width, and the fragment identifier of the fragment in which the metadata was located before re-fragmenting. The fragment identifier ShardA after re-fragmenting can be obtained by the following formula: ShardA = hash(_partitionkey) % numPrimaryShards + hash(_SecondHashKey) % HASHRANGE, where hash(_partitionkey) % numPrimaryShards is the fragment in which the metadata is located before re-fragmenting, numPrimaryShards is the number of fragments of the group before re-fragmenting, and hash(_SecondHashKey) % HASHRANGE is the offset of the metadata.
For example, suppose the metadata in one group is divided into 8 Shards (Shard0, Shard1, Shard2, ..., Shard7) with the organization ID as the fragment key value, that is, numPrimaryShards = 8. Because the order volumes of the organizations differ greatly, the metadata is unevenly distributed across the Shards. A UUID is selected as the secondary hash key value and the hash width is 3, so all metadata in the group is hashed and then re-fragmented across 3 Shards. If the fragment holding a piece of metadata before re-fragmenting is Shard2, then hash(_partitionkey) % 8 = 2, and offset = hash(_SecondHashKey) % 3 is a random number between 0 and 2. The fragment holding the metadata after re-fragmenting is therefore Shard(2 + offset), which is the fragment position where the metadata is stored.
By hashing the data in the group and then re-fragmenting, the data distribution across the fragments becomes more uniform than before re-fragmenting, the number of fragments is appropriately reduced, and retrieval efficiency can be improved.
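The re-fragmenting formula can be exercised with a short sketch. As assumptions for illustration: MD5 stands in for the unspecified hash function, the function and variable names are hypothetical, and the sketch does not wrap the result when the base shard plus the offset exceeds the original shard count, because the text does not say how that case is handled.

```python
import hashlib
import uuid


def stable_hash(value):
    """Deterministic stand-in for the hash function, which the text does not specify."""
    return int(hashlib.md5(str(value).encode("utf-8")).hexdigest(), 16)


def reshard(partition_key, second_hash_key, num_primary_shards=8, hash_range=3):
    """Apply ShardA = hash(partitionkey) % numPrimaryShards + hash(SecondHashKey) % HASHRANGE.

    Returns (base shard before re-fragmenting, offset, shard after re-fragmenting).
    """
    base = stable_hash(partition_key) % num_primary_shards
    offset = stable_hash(second_hash_key) % hash_range
    return base, offset, base + offset


# One organization's records spread over up to hash_range neighbouring shards,
# because each record carries its own random UUID as the secondary hash key.
placements = [reshard("org-7", uuid.uuid4().hex) for _ in range(200)]
shards_used = sorted({shard for _, _, shard in placements})
```

Every record of `org-7` keeps the same base shard, and only the offset varies, so a later retrieval for that organization needs to look at no more than `hash_range` shards.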
In an embodiment of the present invention, as shown in FIG. 3, determining the grouping identifier and the fragment identifier of the target data corresponding to the query entry includes:
Step S301, determining an index field corresponding to the query entry from the secondary index model according to the query entry;
Step S302, determining the grouping field and the fragment key value corresponding to the query entry according to the index field;
Step S303, determining the grouping identifier and the fragment identifier of the target data corresponding to the query entry according to the corresponding grouping field and fragment key value.
According to the query entry, the index field corresponding to the query entry is determined from the secondary index model, and then the grouping field (such as the order time) and the fragment key value (such as the organization ID) are determined. From the query entry and its corresponding grouping field and fragment key value, it can be found which group and which fragment hold the target data, that is, the grouping identifier and the fragment identifier of the target data are determined. The grouping identifier and the fragment identifier of the target data are encapsulated into Routinginfo (an entity that encapsulates the grouping identifier and the fragment identifier), and the data storage range of the target data is then determined according to the grouping identifier and the fragment identifier. For example, if the secondary index model groups by year and then fragments by organization ID, the order time and the organization ID are determined from the query entry; aggregating on the order time determines the group, and aggregating on the organization ID determines the fragment.
After the query request is received, the query entry is identified through a default initialization strategy (QueryParam), and it is judged whether the grouping field and the fragment key value of the secondary index table are present in the query entry. If so, the grouping identifier and the fragment identifier corresponding to the query entry are determined according to the grouping field and the fragment key value and encapsulated into Routinginfo, and the data storage range of the target data is determined from the metadata table according to the grouping identifier and the fragment identifier.
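The routing step can be sketched as below. The `RoutingInfo` dataclass mirrors the Routinginfo entity named in the text, but its fields, the `resolve_routing` function, the MD5 stand-in hash, and the decision to return `None` when the query entry lacks the grouping field or fragment key value are all assumptions; the text does not state what happens in that case.

```python
import hashlib
from dataclasses import dataclass
from datetime import date

NUM_SHARDS = 16  # assumed shard count per group


def stable_hash(value):
    """Deterministic stand-in hash, used only for illustration."""
    return int(hashlib.md5(str(value).encode("utf-8")).hexdigest(), 16)


@dataclass(frozen=True)
class RoutingInfo:
    """Hypothetical entity holding the grouping identifier and fragment identifier."""
    group: str
    shard: str


def resolve_routing(query_entry, grouping_field="order_time", shard_key="organization_id"):
    """Return RoutingInfo when the query entry carries both routing fields, else None."""
    if grouping_field not in query_entry or shard_key not in query_entry:
        return None  # assumption: the caller falls back to an unrouted search
    group = f"index_{query_entry[grouping_field].year}"
    shard = f"Shard{stable_hash(query_entry[shard_key]) % NUM_SHARDS}"
    return RoutingInfo(group=group, shard=shard)


routing = resolve_routing(
    {"order_number": "A1", "order_time": date(2020, 8, 9), "organization_id": "org-7"}
)
unrouted = resolve_routing({"commodity_name": "kettle"})
```

Here `routing` narrows the search to one shard of `index_2020`, while `unrouted` signals that the query entry cannot be mapped to a storage range through the secondary index.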
In the embodiment of the invention, to improve retrieval efficiency, a limit is placed on the number of fragment key values determined from the query entry, that is, a set threshold. However, when the number of determined fragment key values exceeds the set threshold, part of the data would be missed by the retrieval. Therefore, weighing user experience against retrieval efficiency for the service scenario, the user's retrieval range can be restricted at the front end, for example by disallowing retrieval across organizations. Alternatively, after the grouping field and fragment key value corresponding to the query entry are determined and before the grouping identifier and fragment identifier of the target data are determined, the method includes: judging whether the number of fragment key values exceeds the set threshold; if not, retrieving the target data from the metadata table according to the grouping identifier and the fragment identifier; if so, resetting the fragments in Routinginfo so that more Shards or all Shards are retrieved, which avoids data loss and improves user experience.
FIG. 4 is a flowchart of a method for data management according to an embodiment of the present invention. The secondary index model includes M groups, whose grouping identifiers are es.index#0, es.index#1, ..., es.index#i, ..., es.index#M; the group es.index#i includes N fragments, whose fragment identifiers are Shard0, Shard1, ..., Shardj, ..., ShardN. A user sends a query request, which includes a query entry, to the application layer (such as a Client). The application layer calls the ES Client, which determines from the secondary index model that the grouping identifier corresponding to the query entry is es.index#i and the corresponding fragment identifier is Shardj. The data storage range of the target data is determined from these two identifiers, that is, the target data is stored in Shardj of es.index#i. The target data is then acquired from the metadata table according to the data storage range and returned to the user, completing the query.
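The threshold fallback can be sketched as follows. The threshold value (`MAX_SHARD_KEYS`), the function name, and the MD5 stand-in hash are hypothetical; the sketch implements only the "retrieve all Shards" variant of the fallback, which is one of the two options the text mentions.

```python
import hashlib

NUM_SHARDS = 16     # assumed shard count per group
MAX_SHARD_KEYS = 3  # hypothetical set threshold on the number of fragment key values


def stable_hash(value):
    """Deterministic stand-in hash, used only for illustration."""
    return int(hashlib.md5(str(value).encode("utf-8")).hexdigest(), 16)


def shards_to_search(organization_ids, max_keys=MAX_SHARD_KEYS, num_shards=NUM_SHARDS):
    """Route to specific shards, or fall back to every shard above the threshold."""
    unique_ids = set(organization_ids)
    if len(unique_ids) > max_keys:
        numbers = range(num_shards)  # reset routing: search all shards to avoid data loss
    else:
        numbers = sorted({stable_hash(org) % num_shards for org in unique_ids})
    return [f"Shard{n}" for n in numbers]


narrow = shards_to_search(["org-1", "org-2"])
wide = shards_to_search(["org-1", "org-2", "org-3", "org-4"])
```

With two organization IDs the search stays on at most two shards; with four, which exceeds the hypothetical threshold of 3, the search widens to all 16.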
According to the data management method provided by the embodiment of the invention, the grouping identification and the fragmentation identification of the target data corresponding to the query entry are determined according to the query entry by constructing the secondary index model, so that the data storage range of the target data is determined. The method provided by the embodiment of the invention can improve the efficiency of data retrieval and further improve the user experience aiming at the situations of large data base, wide retrieval range and complex retrieval scene.
According to another aspect of an embodiment of the present invention, as shown in fig. 5, there is provided an apparatus 500 for data management, including:
an acquisition module 501, configured to acquire a query request for the metadata table, wherein the query request comprises a query entry;
a query module 502, configured to determine a grouping identifier and a fragment identifier of target data corresponding to the query entry according to the query entry and the constructed secondary index model; and
a determining module 503, configured to determine the data storage range of the target data according to the grouping identifier and the fragment identifier.
In the embodiment of the invention, the data management device 500 further comprises a construction module, wherein the construction module is used for screening index fields from fields of the metadata table, constructing a secondary index table according to the index fields, grouping metadata in the metadata table according to the index fields in the secondary index table, and fragmenting metadata in each group to construct a secondary index model.
In the embodiment of the invention, the construction module is further used for screening one index field from the index fields of the secondary index table as a grouping field, screening one field from the index fields as a fragment key value, grouping the metadata according to the grouping field to obtain one or more groups, fragmenting the metadata in each group according to the fragment key value to obtain one or more fragments, and storing, for each piece of metadata, the grouping identifier of the group in which it is located and the fragment identifier of the fragment in which it is located.
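The grouping and fragmenting performed by the construction module might look like the following sketch. It is an illustration only: the row layout, the field names, the `es.index#i` / `Shardj` naming of the stored identifiers, and the CRC32-modulo placement are assumptions, since the source does not fix a concrete hash or data layout.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def build_index_model(rows: List[dict], group_field: str, shard_key: str,
                      num_shards: int) -> Dict[str, Dict[str, List[str]]]:
    """Group rows by `group_field`, then place each row in a fragment by
    hashing its `shard_key` value. Returns {grouping id: {fragment id: [row ids]}}."""
    group_numbers: Dict[str, int] = {}
    model = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # Assign group numbers in order of first appearance of each value.
        group_no = group_numbers.setdefault(row[group_field], len(group_numbers))
        shard_no = zlib.crc32(str(row[shard_key]).encode("utf-8")) % num_shards
        model[f"es.index#{group_no}"][f"Shard{shard_no}"].append(row["id"])
    return {group: dict(shards) for group, shards in model.items()}


# Hypothetical metadata rows.
rows = [
    {"id": "m1", "institution": "bank-a", "customer_id": "C1"},
    {"id": "m2", "institution": "bank-a", "customer_id": "C2"},
    {"id": "m3", "institution": "bank-b", "customer_id": "C3"},
]
model = build_index_model(rows, "institution", "customer_id", num_shards=2)
```

Storing the two identifiers alongside each piece of metadata is what later allows a query to be narrowed to a single fragment of a single group.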
In the embodiment of the invention, the construction module is further used for storing the storage path information of each piece of metadata.
In the embodiment of the invention, the construction module is further used for executing the following steps, before storing the storage path information of each piece of metadata, for any group containing a plurality of fragments: determining whether the difference or the ratio between the data amounts of the metadata in any two fragments of the group exceeds a preset threshold; and, when the difference or the ratio exceeds the preset threshold, hashing all metadata in the group, re-fragmenting the metadata, and storing the storage path information of all metadata in the group after re-fragmenting.
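A balance check of this kind can be sketched as below. The function name `needs_reshard` and the decision to test both the difference and the ratio in one call are assumptions; the source only states that the difference or the ratio is compared against a preset threshold.

```python
from typing import Dict


def needs_reshard(shard_sizes: Dict[str, int], max_diff: int,
                  max_ratio: float) -> bool:
    """Report whether any two fragments of a group are too unevenly filled.

    Comparing the largest and smallest fragment is enough, because that
    pair has the largest difference and the largest ratio of any pair.
    """
    sizes = list(shard_sizes.values())
    if len(sizes) < 2:
        return False  # a single fragment cannot be unbalanced
    largest, smallest = max(sizes), min(sizes)
    if largest - smallest > max_diff:
        return True
    # Written as a multiplication to avoid dividing by an empty fragment.
    return largest > max_ratio * smallest


balanced = needs_reshard({"Shard0": 100, "Shard1": 110}, max_diff=50, max_ratio=2.0)
skewed = needs_reshard({"Shard0": 10, "Shard1": 500}, max_diff=50, max_ratio=2.0)
```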
In the embodiment of the invention, the construction module is further used for performing a secondary hash on the metadata according to the fragment identifier that the metadata had before re-fragmenting, determining the fragment identifier of the metadata after re-fragmenting, and re-fragmenting all the metadata in the group according to the result of the secondary hash.
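One possible reading of the secondary hash is sketched below. The source says only that the hash is taken according to the previous fragment identifier; combining that identifier with the metadata identifier, and using SHA-256, are assumptions made here so that metadata from one overfull fragment can spread across several new fragments.

```python
import hashlib
from typing import Dict, List


def rehash_shard(metadata_id: str, old_shard_id: str, num_shards: int) -> str:
    """Derive the new fragment identifier from the old one plus the metadata id."""
    digest = hashlib.sha256(f"{old_shard_id}:{metadata_id}".encode("utf-8")).digest()
    return f"Shard{int.from_bytes(digest[:8], 'big') % num_shards}"


def reshard_group(group: Dict[str, List[str]], num_shards: int) -> Dict[str, List[str]]:
    """Re-fragment every piece of metadata in a group using the secondary hash."""
    new_group: Dict[str, List[str]] = {f"Shard{i}": [] for i in range(num_shards)}
    for old_shard_id, metadata_ids in group.items():
        for metadata_id in metadata_ids:
            new_group[rehash_shard(metadata_id, old_shard_id, num_shards)].append(metadata_id)
    return new_group


# Hypothetical skewed group: everything landed in Shard0.
skewed_group = {"Shard0": [f"m{i}" for i in range(100)], "Shard1": []}
resharded = reshard_group(skewed_group, num_shards=2)
```

After re-fragmenting, the stored fragment identifiers and storage path information of the affected metadata would need to be updated to match the new placement.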
In the embodiment of the present invention, the query module 502 is further configured to determine an index field corresponding to the query entry from the secondary index model according to the query entry, determine a packet field and a fragment key value corresponding to the query entry according to the index field, and determine a packet identifier and a fragment identifier of the target data corresponding to the query entry according to the corresponding packet field and fragment key value.
According to yet another aspect of the embodiment of the present invention, there is provided an electronic device including one or more processors, and a storage device for storing one or more programs, which when executed by the one or more processors, cause the one or more processors to implement a method of data management of the embodiment of the present invention.
Yet another aspect of the embodiments of the present invention provides a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements a method of data management of the embodiments of the present invention.
Fig. 6 illustrates an exemplary system architecture 600 of a data management method or apparatus to which embodiments of the present invention may be applied.
As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 is used as a medium to provide communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 605 via the network 604 using the terminal devices 601, 602, 603 to receive or send messages, etc. Various communication client applications such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 601, 602, 603.
The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 605 may be a server providing various services, such as a background management server (by way of example only) providing support for shopping-type websites browsed by users using terminal devices 601, 602, 603. The background management server may analyze and process the received data such as the query request, and feed back target data (e.g., product information—only an example) corresponding to the query request to the terminal device.
It should be noted that, the method for data management provided in the embodiment of the present invention is generally executed by the server 605, and accordingly, the device for data management is generally disposed in the server 605.
It should be understood that the number of terminal devices, networks and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Reference is now made to the schematic structural diagram of fig. 7. The terminal device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the system 700 are also stored. The CPU 701, ROM 702, and RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Connected to the I/O interface 705 are: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, and the like. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read therefrom is installed into the storage section 708 as necessary.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 709, and/or installed from the removable medium 711. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 701.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described as, for example, a processor comprising an acquisition module, a query module, and a determination module. The names of these modules do not constitute a limitation on the module itself in some cases, and for example, the acquisition module may also be described as "a module that acquires a query request for a metadata table".
As a further aspect, the invention also provides a computer readable medium which may be comprised in the device described in the above embodiments or may be present alone without being fitted into the device. The computer readable medium carries one or more programs which, when executed by the device, cause the device to perform: obtaining a query request for a metadata table, the query request including a query entry; determining a grouping identifier and a fragment identifier of target data corresponding to the query entry according to the query entry and a constructed secondary index model; and determining a data storage range of the target data according to the grouping identifier and the fragment identifier.
According to the technical scheme of the embodiment of the invention, a secondary index model is constructed according to the metadata table, so that each piece of metadata in the metadata table has a corresponding grouping identifier and a corresponding fragment identifier. The query entry is acquired from the query request and combined with the secondary index model to obtain the grouping identifier and the fragment identifier of the target data corresponding to the query entry, from which the storage range of the target data is obtained. The data management method provided by the embodiment of the invention can solve the problem of low ES retrieval efficiency under the conditions of a large data volume, a wide service retrieval range and complex query scenes, realize quick retrieval of data, meet service requirements, improve retrieval efficiency and improve user experience.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (9)
1. A method of data management, comprising:
Acquiring a query request aiming at a metadata table, wherein the query request comprises a query entry;
determining a grouping identifier and a fragmentation identifier of target data corresponding to the query entry according to the query entry and the constructed secondary index model;
Determining a data storage range of the target data according to the grouping identifier and the fragment identifier;
Before determining the grouping identification and the slicing identification of the target data corresponding to the query entry according to the query entry and the constructed secondary index model, the method further comprises the steps of screening index fields from fields of the metadata table, constructing a secondary index table according to the index fields, grouping metadata in the metadata table according to the index fields in the secondary index table, and slicing metadata in each grouping to construct the secondary index model.
2. The method of claim 1, wherein grouping metadata in the metadata table according to index fields in the secondary index table and fragmenting metadata within each group comprises:
Screening an index field from the index fields of the secondary index table to serve as a grouping field, and screening a field from the index fields to serve as a fragment key value;
grouping the metadata according to the grouping field to obtain one or more groups;
Fragmenting the metadata in each group according to the fragment key value to obtain one or more fragments;
And storing the grouping identification of the grouping and the slicing identification of the slicing corresponding to each piece of metadata.
3. The method of claim 2, further comprising, after storing the packet identifier of the packet and the fragment identifier of the fragment corresponding to each piece of metadata:
the storage path information of each piece of metadata is stored.
4. A method according to claim 3, wherein, before storing the storage path information of each piece of metadata, for any packet containing a plurality of fragments, performing:
determining whether the difference or ratio of the data amounts of the metadata in any two fragments of the group exceeds a preset threshold; and
when the preset threshold is exceeded, hashing all metadata in the group, then re-fragmenting, and storing the storage path information of all metadata in the group after re-fragmenting.
5. The method of claim 4, wherein the re-fragmenting after hashing all metadata in the packet comprises:
and performing secondary hash on the metadata according to the fragment identification of the metadata before re-fragmenting, determining the fragment identification of the metadata after re-fragmenting, and re-fragmenting all the metadata in the packet according to the result of the secondary hash.
6. The method of claim 2, wherein determining the packet identity and the fragment identity of the target data corresponding to the query entry comprises:
determining an index field corresponding to the query entry from the secondary index model according to the query entry, and determining the grouping field and the fragment key value corresponding to the query entry according to the index field;
and determining a grouping identifier and a fragmentation identifier of the target data corresponding to the query entry according to the corresponding grouping field and the fragmentation key value.
7. An apparatus for data management, comprising:
the acquisition module acquires a query request aiming at the metadata table, wherein the query request comprises a query entry;
The query module determines a grouping identifier and a fragmentation identifier of target data corresponding to the query entry according to the query entry and the constructed secondary index model;
the determining module is used for determining the data storage range of the target data according to the grouping identifier and the fragment identifier;
The query module is further configured to, before determining a grouping identifier and a slicing identifier of target data corresponding to the query entry according to the query entry and the constructed secondary index model, screen an index field from fields of the metadata table, construct a secondary index table according to the index field, group metadata in the metadata table according to the index field in the secondary index table, and slice metadata in each group to construct the secondary index model.
8. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-6.
9. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-6.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110660666.XA CN113312355B (en) | 2021-06-15 | 2021-06-15 | A method and device for data management |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113312355A (en) | 2021-08-27 |
| CN113312355B (en) | 2025-03-18 |
Family
ID=77378730
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110660666.XA Active CN113312355B (en) | 2021-06-15 | 2021-06-15 | A method and device for data management |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113312355B (en) |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114168560A (en) * | 2021-10-28 | 2022-03-11 | 中国建设银行股份有限公司 | Data management method and device |
| CN114090622A (en) * | 2021-11-22 | 2022-02-25 | 中国建设银行股份有限公司 | Secondary index unloading method, device, equipment and storage medium |
| CN115080684B (en) * | 2022-07-28 | 2023-01-06 | 天津联想协同科技有限公司 | Network disk document indexing method and device, network disk and storage medium |
| CN115168409B (en) * | 2022-09-05 | 2023-02-28 | 金蝶软件(中国)有限公司 | Data query method and device for database sub-tables and computer equipment |
| CN116049307A (en) * | 2022-12-30 | 2023-05-02 | 天翼云科技有限公司 | A distributed database broadcasting method and system |
| CN116383255B (en) * | 2023-03-30 | 2025-08-15 | 阿里云计算有限公司 | Aggregation query method, system, equipment and storage medium |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103488687A (en) * | 2013-09-02 | 2014-01-01 | 用友软件股份有限公司 | Searching system and searching method of big data |
| US10318491B1 (en) * | 2015-03-31 | 2019-06-11 | EMC IP Holding Company LLC | Object metadata query with distributed processing systems |
| CN112783835A (en) * | 2021-03-11 | 2021-05-11 | 百果园技术(新加坡)有限公司 | Index management method and device and electronic equipment |
Family Cites Families (21)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8180789B1 (en) * | 2005-12-05 | 2012-05-15 | Teradata Us, Inc. | Techniques for query generation, population, and management |
| US8266135B2 (en) * | 2009-01-05 | 2012-09-11 | International Business Machines Corporation | Indexing for regular expressions in text-centric applications |
| CN103020204B (en) * | 2012-12-05 | 2018-09-25 | 北京普泽创智数据技术有限公司 | A kind of method and its system carrying out multi-dimensional interval query to distributed sequence list |
| GB2509240A (en) * | 2012-12-20 | 2014-06-25 | Bae Systems Plc | Transaction record data storage and retrieval |
| CN103631951A (en) * | 2013-12-12 | 2014-03-12 | 用友软件股份有限公司 | Batch access function merging method and device based on metadata |
| CN104731516B (en) * | 2013-12-18 | 2019-03-01 | 腾讯科技(深圳)有限公司 | A kind of method, apparatus and distributed memory system of accessing file |
| CN103729471B (en) * | 2014-01-21 | 2017-03-08 | 华为软件技术有限公司 | Data base query method and device |
| US9846703B2 (en) * | 2014-09-30 | 2017-12-19 | Vivint, Inc. | Page-based metadata system for distributed filesystem |
| CN108170726A (en) * | 2015-10-21 | 2018-06-15 | 华为技术有限公司 | Data query method and apparatus |
| CN105740405B (en) * | 2016-01-29 | 2020-06-26 | 华为技术有限公司 | Method and apparatus for storing data |
| US10585867B2 (en) * | 2016-05-25 | 2020-03-10 | Mongodb, Inc. | Systems and methods for generating partial indexes in distributed databases |
| CN108271420B (en) * | 2016-11-02 | 2020-11-27 | 华为技术有限公司 | Method, file system and server system for managing files |
| CN107291889A (en) * | 2017-06-20 | 2017-10-24 | 郑州云海信息技术有限公司 | A kind of date storage method and system |
| WO2019055282A1 (en) * | 2017-09-14 | 2019-03-21 | Savizar, Inc. | Database engine |
| CN108228799B (en) * | 2017-12-29 | 2021-09-28 | 北京奇虎科技有限公司 | Object index information storage method and device |
| CN108664223B (en) * | 2018-05-18 | 2021-07-02 | 百度在线网络技术(北京)有限公司 | Distributed storage method and device, computer equipment and storage medium |
| CN108897859A (en) * | 2018-06-29 | 2018-11-27 | 郑州云海信息技术有限公司 | A kind of metadata retrieval method, apparatus, equipment and computer readable storage medium |
| CN110083605A (en) * | 2019-04-24 | 2019-08-02 | 天津中新智冠信息技术有限公司 | Traffic table querying method, device, server and computer readable storage medium |
| CN110389940B (en) * | 2019-07-19 | 2022-02-18 | 苏州浪潮智能科技有限公司 | Data equalization method and device and computer readable storage medium |
| CN110716965B (en) * | 2019-09-25 | 2022-02-25 | 蚂蚁区块链科技(上海)有限公司 | Query method, device and equipment in block chain type account book |
| CN112800023B (en) * | 2020-12-11 | 2023-01-10 | 北京计算机技术及应用研究所 | Multi-model data distributed storage and hierarchical query method based on semantic classification |
Also Published As
| Publication number | Publication date |
|---|---|
| CN113312355A (en) | 2021-08-27 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |