CN114253992A - Data aggregation method, device, equipment and storage medium - Google Patents

Data aggregation method, device, equipment and storage medium

Info

Publication number
CN114253992A
Authority
CN
China
Prior art keywords
data
array
target
hash value
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111551970.7A
Other languages
Chinese (zh)
Inventor
陈尊
张然
杜俊
刘一阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202111551970.7A
Publication of CN114253992A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/244 Grouping and aggregation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application disclose a data aggregation method, device, equipment and storage medium. The method comprises: acquiring first data of a preset primary key field in a first data table, and determining a first hash value corresponding to the first data based on a preset hash function; determining, according to the first hash value and a preset bitmap array construction algorithm, the array elements corresponding to the first hash value to obtain a target bitmap array; acquiring second data of the preset primary key field in a second data table, and determining a second hash value corresponding to the second data based on the preset hash function; determining, according to the second hash value, whether a target array element meeting a preset array search condition exists in the target bitmap array; and if so, determining the target second data of the second hash value, and aggregating the row data in which the target second data is located in the second data table with the first data table. The embodiments of the present application improve data aggregation efficiency and reduce both the complexity of data aggregation and memory usage.

Description

Data aggregation method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of data processing, in particular to a data aggregation method, a data aggregation device, data aggregation equipment and a storage medium.
Background
In large-scale data processing, associating and aggregating data is a common requirement across business domains. In the prior art, the data tables or data files to be aggregated are typically aggregated using an MPP (Massively Parallel Processing) database or a distributed computing framework. However, when no MPP database is available or a distributed cluster cannot be built, the data can only be aggregated by local computation.
If the data is aggregated by local computation, the data tables or data files to be aggregated must be held locally in a large-capacity cache, which places high demands on the software and hardware environment and makes the aggregation process inefficient and complex.
Disclosure of Invention
The embodiment of the application provides a data aggregation method, a data aggregation device, data aggregation equipment and a storage medium, so as to improve data aggregation efficiency.
In a first aspect, an embodiment of the present application provides a data aggregation method, where the method includes:
acquiring first data of a preset primary key field in a first data table, and determining a first hash value corresponding to the first data based on a preset hash function;
determining, according to the first hash value and a preset bitmap array construction algorithm, the array elements corresponding to the first hash value to obtain a target bitmap array;
acquiring second data of the preset primary key field in a second data table, and determining a second hash value corresponding to the second data based on the preset hash function;
determining, according to the second hash value, whether a target array element meeting a preset array search condition exists in the target bitmap array;
and if so, determining the target second data of the second hash value, and aggregating the row data in which the target second data is located in the second data table with the first data table.
In a second aspect, an embodiment of the present application further provides a data aggregation apparatus, where the apparatus includes:
the first hash value determining module is used for acquiring first data of a preset primary key field in a first data table and determining a first hash value corresponding to the first data based on a preset hash function;
the target bitmap array determining module is used for determining array elements corresponding to the first hash value according to the first hash value and a preset bitmap array construction algorithm to obtain a target bitmap array;
the second hash value determining module is used for acquiring second data of the preset primary key field in a second data table and determining a second hash value corresponding to the second data based on the preset hash function;
the target array element determining module is used for determining, according to the second hash value, whether a target array element meeting a preset array search condition exists in the target bitmap array;
and the data aggregation module is used for, if a target array element meeting the preset array search condition exists in the target bitmap array, determining the target second data of the second hash value and aggregating the row data in which the target second data is located in the second data table with the first data table.
In a third aspect, an embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the data aggregation method according to any one of the embodiments of the present application when executing the program.
In a fourth aspect, this application further provides a computer-readable storage medium, on which a computer program is stored, where the program, when executed by a processor, implements the data aggregation method according to any one of the embodiments of the application.
In the embodiments of the present application, first data of a preset primary key field in a first data table is acquired, and a first hash value corresponding to the first data is determined based on a preset hash function; the array elements corresponding to the first hash value are determined according to the first hash value and a preset bitmap array construction algorithm to obtain a target bitmap array; second data of the preset primary key field in a second data table is acquired, and a second hash value corresponding to the second data is determined based on the preset hash function; whether a target array element meeting a preset array search condition exists in the target bitmap array is determined according to the second hash value; and if so, the target second data of the second hash value is determined, and the row data in which the target second data is located in the second data table is aggregated with the first data table. In this scheme each array element of the bitmap array occupies only 1 bit, so memory usage is small and the data tables to be aggregated do not need to be stored locally. Dependence on an MPP database or a distributed computing framework is reduced, as are the requirements on software and hardware infrastructure. An approximate association between the data to be aggregated can be established merely by traversing the bitmap array, and the data to be aggregated is obtained according to the association result, which reduces the complexity of data aggregation and improves its efficiency.
Drawings
Fig. 1 is a schematic flow chart of a data aggregation method in the first embodiment of the present application;
Fig. 2 is a schematic flow chart of a data aggregation method in the second embodiment of the present application;
Fig. 3 is a schematic flow chart of a data aggregation method in the third embodiment of the present application;
Fig. 4 is a block diagram of a data aggregation apparatus in the fourth embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device in the fifth embodiment of the present application.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a schematic flow chart of a data aggregation method provided in the first embodiment of the present application. This embodiment is applicable to aggregating a plurality of data tables or data files. The method may be executed by a data aggregation device, which may be implemented in software and/or hardware. As shown in Fig. 1, the method specifically includes the following steps:
S101, acquiring first data of a preset primary key field in a first data table, and determining a first hash value corresponding to the first data based on a preset hash function.
The first data table may be any data table to be aggregated, and the primary key field may be a unique identifier in the first data table by which a record can be determined. One row of data in the first data table is one record. The primary key field may be preset by a person skilled in the art; for example, the preset primary key field may be a unique user identifier or a user name. It should be noted that a user name can serve as the preset primary key field only if the user names in the first data table are not duplicated. The first data may be the primary key values corresponding to the preset primary key field in the first data table; for example, if the preset primary key field is the user name, the first data may be "Zhang San", "Li Si", "Wang Wu" and so on.
The hash function may be preset by a person skilled in the art; for example, the hash function may be MD5 (Message-Digest Algorithm 5). The first hash value may be the value obtained by applying the hash function to the first data. It should be noted that the hash values calculated for the same data with the same hash function are necessarily the same. Different data may also produce the same hash value under the same hash function, and how often this happens depends on the accuracy of the hash function. Ideally, different data yield different hash values under the same hash function.
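As a minimal sketch of this step: the text does not say how a digest is reduced to a bitmap index, so the helper name `hash_to_index` and the reduction modulo the array length are illustrative assumptions, not the method claimed here.

```python
import hashlib


def hash_to_index(key: str, size: int) -> int:
    """Map a primary key value to a bitmap index (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Assumption: read the digest as a big-endian integer and reduce it
    # modulo the bitmap length to obtain an index in [0, size).
    return int.from_bytes(digest, "big") % size


# The same data always produces the same index under the same hash function.
same = hash_to_index("Zhang San", 20) == hash_to_index("Zhang San", 20)
print(same)  # True
```

Two different keys may still land on the same index, which is the collision case described above.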
The first hash value calculated by the hash function represents the index position of the first data in the bitmap array. For example, if the preset primary key field is the user name and the first data table contains records of 20 users, the first data comprises 20 items, and the hash values corresponding to these 20 items may range from 00 to 19. Calculated based on the hash function, the first hash value corresponding to "Zhang San" is "00", the first hash value corresponding to "Li Si" is "01", the first hash value corresponding to "Wang Wu" is "02", ..., and the first hash value corresponding to "Zhao Liu" is "19", where 00 to 19 each represent an index position in the bitmap array. The bitmap array may be constructed in advance by a person skilled in the art. The value of an array element in the bitmap array is 0 or 1, where 0 indicates that the array element position is unoccupied and 1 indicates that it is occupied. For example, the first hash value corresponding to "Zhang San" is "00"; when this hash value is hashed into the bitmap array, the value of the first array element is set to "1". The first hash value corresponding to "Li Si" is "01"; when it is hashed into the bitmap array, the value of the second array element is set to "1". If the first hash value corresponding to "Wang Wu" is also "01", it is hashed to the same position, the second array element, whose value is already "1". Had the first hash value of "Wang Wu" been "02", the third array element would have been occupied; in this case the third array element remains unoccupied and its value stays "0".
S102, according to the first hash value and a preset bitmap array construction algorithm, determining an array element corresponding to the first hash value to obtain a target bitmap array.
The bitmap array construction algorithm may be preset by a person skilled in the art. For example, the number of bitmap arrays is determined according to the number of preset hash functions, and the number of array elements in each bitmap array is determined according to the number of items of first data in the first data table. For example, if there is one preset hash function and there are 100 items of first data, one bitmap array is constructed and it contains 100 array elements. It should be noted that after the bitmap array is constructed based on the preset bitmap array construction algorithm, it is initialized, that is, every array element value in the bitmap array is set to "0". The initialized bitmap array may be used as the original bitmap array.
The target bitmap array may be the bitmap array obtained after the array elements of the original bitmap array have been set according to the first hash values. For example, if the original bitmap array is {0,0,0,0,0} and the first hash values corresponding to the first data are "00", "01", "02", "03" and "04", then the array elements corresponding to these first hash values are all set to "1", and the target bitmap array is {1,1,1,1,1}. If the first hash values corresponding to the first data are "00", "01", "02", "01" and "04", the array elements at positions 00, 01, 02 and 04 are set to "1" while the array element at position 03 remains "0", and the target bitmap array is {1,1,1,0,1}. This situation arises mainly because the preset hash function has low accuracy, so that different data yield the same result under the same hash function. Further, the larger the amount of data and/or the lower the accuracy of the hash function, the more likely this situation is; the smaller the amount of data and/or the higher the accuracy of the hash function, the less likely it is.
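The two cases above can be reproduced with a short sketch. The function name `build_target_bitmap` is hypothetical, and a plain list of integers is used for readability; an implementation that stores one bit per element, as the scheme intends, would pack the values instead.

```python
def build_target_bitmap(first_hash_values, size):
    """Set the element at each first hash value's index to 1 (illustrative)."""
    bitmap = [0] * size  # original bitmap array: every position unoccupied
    for value in first_hash_values:
        bitmap[int(value)] = 1  # "00" -> index 0, "01" -> index 1, ...
    return bitmap


# No collisions: every position is occupied.
print(build_target_bitmap(["00", "01", "02", "03", "04"], 5))  # [1, 1, 1, 1, 1]
# "01" occurs twice, so position 03 is never set.
print(build_target_bitmap(["00", "01", "02", "01", "04"], 5))  # [1, 1, 1, 0, 1]
```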
S103, second data of a preset primary key field in a second data table are obtained, and a second hash value corresponding to the second data is determined based on a preset hash function.
The second data table may be any data table to be aggregated with the first data table, and its primary key field may be the same field as the preset primary key field of the first data table. For example, if the preset primary key field of the first data table is the user name, the preset primary key field of the second data table is also the user name. The preset hash function is the same as the hash function used for calculating the first hash value of the first data; for example, if the hash function selected for the first hash value is MD5, the hash function selected for the second hash value is also MD5. The second data may be the primary key values corresponding to the preset primary key field in the second data table. The second hash value may be the value obtained by applying the hash function to the second data. For example, if the first data includes "Zhang San" and the hash value determined for it based on the preset hash function is "00", then if the second data also includes "Zhang San", the hash value determined for it based on the preset hash function is also "00".
S104, determining, according to the second hash value, whether a target array element meeting a preset array search condition exists in the target bitmap array.
The array search condition may be preset by a person skilled in the art. For example, the preset array search condition may be: determine the array element position corresponding to the second hash value in the target bitmap array, and determine whether the array element value at that position equals the target array element; if so, the second hash value satisfies the preset array search condition. The target array element may be a preset array element value in the target bitmap array; for example, the target array element may be "1". For example, if the target bitmap array is {1,1}, the second hash values are "00" and "01", and the target array element is "1", then the array elements of the target bitmap array corresponding to the second hash values "00" and "01" are both "1", and it can therefore be determined that target array elements satisfying the preset array search condition exist in the target bitmap array.
If the target bitmap array is {1,0}, the second hash value is "00" and "01", and the target array element is "1", the array element of the target bitmap array corresponding to the second hash value "00" is "1"; the array element of the target bitmap array corresponding to the second hash value "01" is "0", and therefore, it can be determined that the target bitmap array has the target array element satisfying the preset array search condition. If the target bitmap array is {0,0}, the second hash value is "00" and "01", and the target array element is "1", the array element of the target bitmap array corresponding to the second hash value "00" is "0"; the array element of the target bitmap array corresponding to the second hash value "01" is "0", and therefore, it can be determined that no target array element satisfying the preset array search condition exists in the target bitmap array.
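The three cases ({1,1}, {1,0} and {0,0}) can be checked with a small sketch. The function name `has_target_element` is hypothetical, and the sketch assumes every second hash value falls inside the array, as in these examples.

```python
def has_target_element(bitmap, second_hash_values, target=1):
    """Return True if any second hash value points at a target element."""
    return any(bitmap[int(value)] == target for value in second_hash_values)


second_hash_values = ["00", "01"]
print(has_target_element([1, 1], second_hash_values))  # True
print(has_target_element([1, 0], second_hash_values))  # True
print(has_target_element([0, 0], second_hash_values))  # False
```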
In an optional embodiment, determining whether a target array element meeting a preset array search condition exists in the target bitmap array according to the second hash value includes: determining a target position of the second hash value in the target bitmap array according to the second hash value; determining whether the array elements at the target position meet a preset array searching condition; if so, determining that the array element at the target position is the target array element.
The target position may be an arrangement position in the target bitmap array determined according to the second hash value, for example, if the second hash value is "00", the target position in the target bitmap array is "00". The preset array search condition may be that the array element at the target position is "1", and if the array element at the target position is "1", it is determined that the array element at the target position satisfies the preset array search condition, and it may be determined that the array element at the target position is the target array element; if the array element at the target position is not "1", it is determined that the array element at the target position does not satisfy the preset array search condition, and it may be determined that the array element at the target position is not the target array element.
In this optional embodiment, the target position of the second hash value in the target bitmap array is determined, and whether the array element at the target position is the target array element is decided by checking whether that array element meets the preset array search condition. The target array element is thus located through the target position, which improves the accuracy of determining the target array element.
In an optional embodiment, determining whether the array element at the target position satisfies a preset array search condition includes: judging whether the target position exists in the target bitmap array; if so, acquiring the array element at the target position, and judging whether the array element at the target position meets the preset array search condition.
For example, if the second hash value is "05", "06", and "07", and the target bitmap array is {1,1,1,1,1}, the target positions of the second hash value in the target bitmap array may be determined to be 05, 06, and 07, and the array element positions corresponding to the array elements in the target bitmap array are 00, 01, 02, 03, and 04, respectively, so that it may be determined that the target position corresponding to the second hash value does not exist in the target bitmap array. If the second hash value is "00", "01", and "02", and the target bitmap array is {1,1,1,1,1}, the target positions of the second hash value in the target bitmap array may be determined to be 00, 01, and 02, and the positions of the array elements corresponding to the array elements in the target bitmap array are 00, 01, 02, 03, and 04, respectively, so that it may be determined that the target position corresponding to the second hash value exists in the target bitmap array.
For example, if the second hash value is "00", "01", and "05", and the target bitmap array is {1,1,1,1,1}, the target positions of the second hash value in the target bitmap array may be determined to be 00, 01, and 05, and the array element positions corresponding to the array elements in the target bitmap array are 00, 01, 02, 03, and 04, respectively, so that it may be determined that there is a target position corresponding to a portion of the second hash value in the target bitmap array, which is "00" and "01", respectively, and a target position corresponding to the second hash value "05" does not exist in the target bitmap array.
Whether the target position exists in the target bitmap array is judged first. If it exists, the array element at the target position is acquired, and whether that array element meets the preset array search condition is then judged. If the target position does not exist in the target bitmap array, the second data corresponding to the second hash value associated with the target position can be considered absent from the first data table; it cannot be aggregated with any first data in the first data table, and the row data in which it is located can be discarded. If the array element at the target position meets the preset array search condition, the array element at the target position is determined to be the target array element; if it does not meet the preset array search condition, the array element at the target position is determined not to be the target array element.
In this optional embodiment, whether the target position exists in the target bitmap array is judged first, and only if it exists is the array element at the target position acquired and checked against the preset array search condition. This additional check improves the accuracy of determining the target array element at the target position.
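A minimal sketch of this two-stage check follows. The function name `lookup` and the use of `None` to signal a missing target position are illustrative assumptions.

```python
from typing import List, Optional


def lookup(bitmap: List[int], second_hash: str) -> Optional[bool]:
    """Check one second hash value against the target bitmap array.

    Returns None when the target position does not exist in the array,
    True when the element there is 1, and False when it is 0.
    """
    position = int(second_hash)
    if not 0 <= position < len(bitmap):
        return None  # no such position: the row can be discarded
    return bitmap[position] == 1


target_bitmap = [1, 1, 1, 1, 1]
for value in ["00", "01", "05"]:
    print(value, lookup(target_bitmap, value))
```

With the target bitmap array {1,1,1,1,1}, the values "00" and "01" are found, while "05" has no target position.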
S105, if so, determining the target second data of the second hash value, and aggregating the row data in which the target second data is located in the second data table with the first data table.
The target second data may be data that can be aggregated with the first data table. If a target array element meeting the preset array search condition exists in the target bitmap array, the target second data of the second hash value is determined, and the row data in which the target second data is located in the second data table is aggregated with the first data table. If no target array element meeting the preset array search condition exists in the target bitmap array, the second data table and the first data table share no primary key value for the preset primary key field, and the second data table therefore cannot be aggregated with the first data table.
Optionally, before the row data in which the target second data is located in the second data table is aggregated with the first data table, an aggregation function, for example an AVG function, a COUNT function or a MAX function, may be preset according to actual requirements, and an aggregation result storage area may be set up to store the aggregation result obtained by data aggregation.
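One way to picture the preset aggregation function and the result storage area is sketched below. The names `AGGREGATORS`, `aggregate` and `result_store`, and the sample values, are all hypothetical; the text does not prescribe how the functions are registered or where results are stored.

```python
# Hypothetical registry of preset aggregation functions.
AGGREGATORS = {
    "COUNT": len,
    "MAX": max,
    "AVG": lambda values: sum(values) / len(values),
}


def aggregate(values, function_name):
    """Apply the preset aggregation function to a non-empty list of values."""
    return AGGREGATORS[function_name](values)


# A dict stands in for the aggregation result storage area.
result_store = {}
amounts = [10, 20, 30]  # made-up values taken from matched rows
for name in AGGREGATORS:
    result_store[name] = aggregate(amounts, name)

print(result_store)  # {'COUNT': 3, 'MAX': 30, 'AVG': 20.0}
```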
In a specific embodiment, the second hash values are "00", "01", "02", "03" and "04", the target bitmap array is {1,1,0,1}, and the target array element is "1". The target position corresponding to the second hash value "00" is 00, the target position corresponding to "01" is 01, the target position corresponding to "02" is 02, the target position corresponding to "03" is 03, and the target position corresponding to "04" is 04. Given the number of array elements in the target bitmap array, target positions 00, 01, 02 and 03 exist in the target bitmap array and target position 04 does not, so the second data corresponding to the second hash value associated with target position 04 cannot be aggregated with the first data. The array elements at target positions 00, 01, 02 and 03 in the target bitmap array are "1", "1", "0" and "1" respectively. Therefore, the second data corresponding to the second hash values associated with target positions 00, 01 and 03 is the target second data, and the row data in which the target second data is located can be aggregated with the first data table.
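This specific embodiment can be traced with the following sketch. The function name `rows_to_aggregate` and the placeholder row labels are hypothetical. Because the association is approximate, a selected row may in principle stem from a hash collision rather than a true key match.

```python
def rows_to_aggregate(bitmap, second_rows):
    """Select rows of the second table whose hash hits a 1 in the bitmap.

    second_rows is a list of (second_hash, row) pairs; the row contents
    are placeholders for illustration.
    """
    selected = []
    for second_hash, row in second_rows:
        position = int(second_hash)
        if position >= len(bitmap):
            continue  # target position does not exist: discard the row
        if bitmap[position] == 1:
            selected.append(row)  # target second data: keep for aggregation
    return selected


target_bitmap = [1, 1, 0, 1]
second_rows = [
    ("00", "row A"),
    ("01", "row B"),
    ("02", "row C"),
    ("03", "row D"),
    ("04", "row E"),
]
print(rows_to_aggregate(target_bitmap, second_rows))  # ['row A', 'row B', 'row D']
```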
In the embodiments of the present application, first data of a preset primary key field in a first data table is acquired, and a first hash value corresponding to the first data is determined based on a preset hash function; the array elements corresponding to the first hash value are determined according to the first hash value and a preset bitmap array construction algorithm to obtain a target bitmap array; second data of the preset primary key field in a second data table is acquired, and a second hash value corresponding to the second data is determined based on the preset hash function; whether a target array element meeting a preset array search condition exists in the target bitmap array is determined according to the second hash value; and if so, the target second data of the second hash value is determined, and the row data in which the target second data is located in the second data table is aggregated with the first data table. In this scheme each array element of the bitmap array occupies only 1 bit, so memory usage is small and the data tables to be aggregated do not need to be stored locally. Dependence on an MPP database or a distributed computing framework is reduced, as are the requirements on software and hardware infrastructure. An approximate association between the data to be aggregated can be established merely by traversing the bitmap array, and the data to be aggregated is obtained according to the association result, which reduces the complexity of data aggregation and improves its efficiency.
Example two
Fig. 2 is a schematic flow chart of a data aggregation method provided in the second embodiment of the present application, and the second embodiment of the present application performs optimization and improvement on the basis of the foregoing technical solutions.
Further, there are at least two preset hash functions. Correspondingly, before the first data of the preset primary key field in the first data table is acquired, the following steps are added: creating at least two original bitmap arrays according to the number of function types of the preset hash functions; and determining the number of array elements of the at least two original bitmap arrays according to the number of record rows of the first data table. The step of "determining array elements corresponding to the first hash value according to the first hash value and a preset bitmap array construction algorithm to obtain a target bitmap array" is refined into: "determining, according to the first hash value, the array element positions of the preset primary key field of any row in the first data table in the at least two original bitmap arrays; and determining the array element values at the array element positions according to a preset array element filling rule to obtain at least two target bitmap arrays". The step of "determining a second hash value corresponding to the second data based on the preset hash function" is refined into: "determining candidate second hash values corresponding to the second data based on the preset hash functions". The step of "determining whether a target array element meeting the preset array search condition exists in the target bitmap array according to the second hash value" is refined into: "determining any candidate second hash value of the second data as a target second hash value; determining, from the at least two target bitmap arrays, the target bitmap array corresponding to the target second hash value; determining the target position of the target second hash value in the corresponding target bitmap array; determining whether the array element at the target position corresponding to the target second hash value meets the preset array search condition; and if so, determining that the array element of any candidate second hash value of the second data at the target position is the target array element", so as to refine data aggregation for the case of at least two hash functions.
As shown in fig. 2, the method comprises the following specific steps:
S201, creating at least two original bitmap arrays according to the number of function types of the preset hash function.
The number of hash function types may be preset by a person skilled in the art. For example, two different types of hash function may be set, such as MD5 and SHA (Secure Hash Algorithm). The original bitmap array may be a bitmap array whose array elements have been initialized; for example, every array element of the original bitmap array is initialized to 0.
It should be noted that the number of original bitmap arrays created is related to the number of preset hash functions. Specifically, the number of original bitmap arrays may be equal to the number of function types of the preset hash function; for example, if there are two preset hash functions, there are correspondingly two original bitmap arrays.
S202, determining the number of array elements of the at least two original bitmap arrays according to the number of record rows of the first data table.
The number of record rows is the number of rows of the first data table that contain first data. The number of array elements of each original bitmap array may be the same as the number of record rows of the first data table. For example, if the first data table has 100 record rows, each original bitmap array has 100 array elements.
For example, if there are 2 preset hash function types and the first data table has 100 record rows, 2 original bitmap arrays are created, each with 100 array elements.
Optionally, only one original bitmap array may be constructed, with twice as many array elements as the first data table has record rows. For example, if the first data table has 100 record rows, the constructed original bitmap array has 200 array elements.
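As a minimal sketch of S201 and S202, the bitmap arrays can be held as plain Python lists of 0/1 integers. The function names `create_bitmaps` and `create_combined_bitmap` are hypothetical and are not taken from the application; the list representation is an assumption made for readability.

```python
def create_bitmaps(num_hash_functions: int, num_rows: int) -> list[list[int]]:
    """One zero-initialized bitmap per hash function, each with num_rows elements."""
    return [[0] * num_rows for _ in range(num_hash_functions)]


def create_combined_bitmap(num_hash_functions: int, num_rows: int) -> list[int]:
    """Optional variant: a single bitmap with num_hash_functions * num_rows elements."""
    return [0] * (num_hash_functions * num_rows)


# Two hash functions and a first data table with 100 record rows.
bitmaps = create_bitmaps(2, 100)
combined = create_combined_bitmap(2, 100)
print(len(bitmaps), len(bitmaps[0]), len(combined))  # 2 100 200
```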
S203, obtaining first data of a preset primary key field in the first data table, and determining a first hash value corresponding to the first data based on a preset hash function.
Specifically, the first data of the preset primary key field in the first data table is obtained, and the first hash value corresponding to the first data is determined according to the at least two preset hash functions. Each item of the first data corresponds to at least two candidate first hash values, each calculated by one of the at least two hash functions, so the number of candidate first hash values for any item of the first data equals the number of preset hash functions.
Illustratively, suppose there are two preset hash functions, a first hash function and a second hash function, and the first data are "Zhang San", "Li Si" and "Wang Wu". The first data is hashed with the first hash function and the second hash function respectively to obtain the first hash values, so each item of the first data corresponds to two candidate first hash values. The first hash value corresponding to the first data "Zhang San" may be "00, 00", where the former "00" represents the candidate first hash value calculated by the first hash function and the latter "00" represents the candidate first hash value calculated by the second hash function. Accordingly, the candidate first hash values corresponding to the first data "Li Si" may be "01, 01", and those corresponding to the first data "Wang Wu" may be "01, 02". It should be noted that the results obtained by hashing the same data with different hash functions may be the same or different.
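The following sketch illustrates how two candidate hash values per key might be computed. It assumes MD5 as the first hash function and SHA-256 (one member of the SHA family) as the second, and it assumes that a digest is reduced to an array position with a modulo operation; the application does not specify the reduction, and the names `hash1`, `hash2` and `candidate_hashes` are hypothetical.

```python
import hashlib


def hash1(key: str) -> int:
    """First hash function: MD5 digest interpreted as an integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def hash2(key: str) -> int:
    """Second hash function: SHA-256 digest interpreted as an integer."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest(), "big")


def candidate_hashes(key: str, size: int) -> tuple[int, int]:
    """Assumption: each digest is mapped to a position in [0, size) by modulo."""
    return hash1(key) % size, hash2(key) % size


for key in ["Zhang San", "Li Si", "Wang Wu"]:
    # Each key yields one candidate position per hash function.
    print(key, candidate_hashes(key, size=10))
```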
S204, determining array element positions of the first data in the first data table in the at least two original bitmap arrays according to the first hash value.
Illustratively, suppose there are two preset hash functions, a first hash function and a second hash function. The first hash value then includes two candidate first hash values, obtained by applying the first hash function and the second hash function to the first data respectively. If the first data table has 10 record rows, there are 2 original bitmap arrays, each with 10 array elements. If the first hash value corresponding to the first data "Zhang San" is "00, 00", the former "00" is the candidate first hash value calculated by the first hash function and the latter "00" is the candidate first hash value calculated by the second hash function. Accordingly, the array element positions of the first data "Zhang San" in the two original bitmap arrays are 00 and 00 respectively, that is, the first array element of each of the two original bitmap arrays.
S205, determining array element values at array element positions according to preset array element filling rules to obtain at least two target bitmap arrays.
The array element filling rule may be preset by a person skilled in the art. For example, the array element filling rule may be to set the array element value at the array element position of the first data in the original bitmap array to "1". The target bitmap array is the bitmap array whose array element values have been filled in according to the array element filling rule.
Illustratively, if the first hash values corresponding to the first data are "00, 00", "01, 01" and "02, 02" respectively, the two corresponding target bitmap arrays are {1,1,1} and {1,1,1}. If the first hash values corresponding to the first data are "00, 00", "01, 01" and "02, 01" respectively, the two corresponding target bitmap arrays are {1,1,1} and {1,1,0}.
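The filling rule in S204 and S205 can be sketched as follows, taking the candidate first hash values from the illustration above as given positions. The function name `fill_bitmaps` is hypothetical, and the list-of-integers representation is an assumption.

```python
def fill_bitmaps(candidate_pairs: list[tuple[int, int]], size: int) -> tuple[list[int], list[int]]:
    """Set the element at each candidate position to 1 in the matching bitmap."""
    first_bitmap = [0] * size
    second_bitmap = [0] * size
    for pos1, pos2 in candidate_pairs:
        first_bitmap[pos1] = 1   # position from the first hash function
        second_bitmap[pos2] = 1  # position from the second hash function
    return first_bitmap, second_bitmap


print(fill_bitmaps([(0, 0), (1, 1), (2, 2)], size=3))  # ([1, 1, 1], [1, 1, 1])
print(fill_bitmaps([(0, 0), (1, 1), (2, 1)], size=3))  # ([1, 1, 1], [1, 1, 0])
```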
S206, second data of a preset primary key field in a second data table are obtained, and at least two candidate second hash values corresponding to the second data are determined according to at least two preset hash functions.
Each item of the second data corresponds to at least two candidate second hash values, each calculated by one of the at least two hash functions, so the number of candidate second hash values for any item of the second data equals the number of preset hash functions.
For example, suppose the two hash functions used for the first data are a first hash function and a second hash function, and the second data are "Zhang xx", "Li xx" and "Wang xx". The second data is hashed with the first hash function and the second hash function respectively to obtain the second hash values, so each item of the second data corresponds to two candidate second hash values. The second hash value corresponding to the second data "Zhang xx" may be "04, 04", where the former "04" represents the candidate second hash value calculated by the first hash function and the latter "04" represents the candidate second hash value calculated by the second hash function. Accordingly, the candidate second hash values corresponding to the second data "Li xx" may be "05, 05", and those corresponding to the second data "Wang xx" may be "06, 06".
S207, determining any candidate second hash value of the second data as a target second hash value.
The target second hash value may be any one of the at least two candidate second hash values. For example, if the second hash value corresponding to the second data is "02, 03", the candidate second hash values of the second data are 02 and 03, and either 02 or 03 may be used as the target second hash value of the second data.
S208, determining a target bitmap array corresponding to the target second hash value from the at least two target bitmap arrays.
For example, if a candidate first hash value is calculated from the first data with the first hash function, and the first bitmap array is obtained from that candidate first hash value and the preset array element filling rule, then when the candidate second hash value calculated from the second data with the first hash function is taken as the target second hash value, the corresponding target bitmap array is the first bitmap array.
S209, determining the target position of the target second hash value in the corresponding target bitmap array.
For example, suppose there are three items of second data whose candidate second hash values are "00, 00", "01, 01" and "02, 01" respectively. The target second hash values may then be "00, 01, 02", obtained by the first hash function, or "00, 01, 01", obtained by the second hash function. If the target second hash values are "00, 01, 02", the target positions in the corresponding target bitmap array are 00, 01 and 02, that is, the first, second and third array element positions in the corresponding target bitmap array.
S210, determining whether the array elements at the target position corresponding to the target second hash value meet a preset array searching condition.
The array search condition may be preset by a person skilled in the art. For example, the array search condition may be that the array elements at the target positions of the target second hash values are all "1". If so, the array elements at the target positions are determined to satisfy the preset array search condition and may be determined to be the target array elements.
For example, if the candidate second hash values of the second data are "00, 00", the target second hash value may be the first "00" or the second "00". The target bitmap array corresponding to each of the two target second hash values "00" is determined; if the array element at target position 00 of the target bitmap array corresponding to the first "00" is "1", and the array element at target position 00 of the target bitmap array corresponding to the second "00" is also "1", the second hash value "00, 00" of the second data is considered to satisfy the array search condition. Similarly, if the candidate second hash values of the second data are "02, 01", the array element at target position 02 of the target bitmap array corresponding to "02" is "1", and the array element at target position 01 of the target bitmap array corresponding to "01" is "1", the second hash value "02, 01" of the second data is considered to satisfy the array search condition.
S211, if yes, determining that the array element of any candidate second hash value of the second data at the target position is the target array element.
For example, if the array element at the target position corresponding to the target second hash value satisfies the preset array search condition, the array element of any candidate second hash value of the second data at the target position is determined to be the target array element. Otherwise, the array element of any candidate second hash value of the second data at the target position is determined not to be the target array element, and the second data whose second hash value is not associated with a target array element is discarded.
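The lookup in S207 to S211 can be sketched as a check that every candidate position holds "1" in its own target bitmap array. The function name `is_match` is hypothetical, and treating a position outside the array as a failed lookup is an assumption consistent with the position check described for the apparatus.

```python
def is_match(candidate_pair: tuple[int, ...], bitmaps: list[list[int]]) -> bool:
    """Return True only if every candidate position holds 1 in its own bitmap."""
    for position, bitmap in zip(candidate_pair, bitmaps):
        if not 0 <= position < len(bitmap):
            return False  # the target position does not exist in this bitmap
        if bitmap[position] != 1:
            return False  # the array search condition is not satisfied
    return True


target_bitmaps = [[1, 1, 1], [1, 1, 0]]
print(is_match((0, 0), target_bitmaps))  # True
print(is_match((2, 1), target_bitmaps))  # True
print(is_match((1, 2), target_bitmaps))  # False: second bitmap holds 0 at position 2
print(is_match((5, 5), target_bitmaps))  # False: position outside both bitmaps
```

Second data for which the check returns False would be discarded; the rest proceeds to aggregation in S212.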
S212, if the target array elements meeting the preset array searching condition exist, determining target second data of the second hash value, and performing data aggregation on the data of the row where the target second data is located in the second data table and the first data table.
In a specific example, two hash functions are preset, a first hash function and a second hash function, and two original bitmap arrays are created, a first bitmap array and a second bitmap array. The candidate first hash values obtained by applying the first hash function to the first data correspond to the first bitmap array, and the candidate first hash values obtained by applying the second hash function to the first data correspond to the second bitmap array.
The first data table has 5 record rows, that is, there are 5 items of first data, so each of the two original bitmap arrays has 5 array elements. If the candidate first hash values calculated from the first data with the first hash function are "00, 01, 02, 03, 04", the candidate first hash values calculated with the second hash function are "00, 01, 02, 01, 04", and the array element value at each array element position corresponding to a first hash value is set to "1", then the first bitmap array is {1,1,1,1,1} and the second bitmap array is {1,1,1,0,1}.
The second data in the second data table is then obtained. Suppose the candidate second hash values calculated with the first hash function are "00, 01, 02, 03, 04, 05" and the candidate second hash values calculated with the second hash function are "00, 01, 02, 01, 04, 05". Comparing the former against the first bitmap array {1,1,1,1,1} and the latter against the second bitmap array {1,1,1,0,1} gives the following results. For the second hash value "00, 00", the values at the first array element position of the first bitmap array and of the second bitmap array are both "1". Similarly, for "01, 01", the values at the second array element position of both bitmap arrays are "1"; for "02, 02", the values at the third array element position of both bitmap arrays are "1"; for "03, 01", the value at the fourth array element position of the first bitmap array and the value at the second array element position of the second bitmap array are both "1"; and for "04, 04", the values at the fifth array element position of both bitmap arrays are "1". The second hash value "05, 05" has no corresponding array element position in either the first bitmap array or the second bitmap array.
Therefore, the second data corresponding to the second hash values "00, 00", "01, 01", "02, 02", "03, 01" and "04, 04" may be aggregated with the row data of the first data table.
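The specific example above can be reproduced with a short script. The hash values are taken as given from the example rather than computed, and the variable names are hypothetical.

```python
size = 5  # record rows of the first data table

# (first hash function, second hash function) pairs from the example.
first_hashes = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 4)]
second_hashes = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 4), (5, 5)]

# Build the two target bitmap arrays from the first data.
bitmap1 = [0] * size
bitmap2 = [0] * size
for h1, h2 in first_hashes:
    bitmap1[h1] = 1
    bitmap2[h2] = 1

# Keep second data whose two positions exist and both hold 1.
matched = [
    (p1, p2)
    for p1, p2 in second_hashes
    if p1 < size and p2 < size and bitmap1[p1] == 1 and bitmap2[p2] == 1
]

print(bitmap1)  # [1, 1, 1, 1, 1]
print(bitmap2)  # [1, 1, 1, 0, 1]
print(matched)  # [(0, 0), (1, 1), (2, 2), (3, 1), (4, 4)]
```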
In the scheme of this embodiment, at least two original bitmap arrays are created according to the number of function types of the preset hash function, and the number of array elements of the at least two original bitmap arrays is determined according to the number of record rows of the first data table, so that the original bitmap arrays can be created when at least two hash functions are preset. The array element positions of the first data in the first data table in the at least two original bitmap arrays are determined according to the first hash value; the array element values at those positions are determined according to the preset array element filling rule to obtain at least two target bitmap arrays; at least two candidate second hash values corresponding to the second data are determined according to the at least two preset hash functions; the target bitmap array and the target position of the target second hash value in the corresponding target bitmap array are determined; and whether the array element at the target position corresponding to the target second hash value meets the preset array search condition is determined. By presetting at least two hash functions, the scheme determines more accurately whether the array element at the target position is the target array element, which improves the accuracy of data aggregation between the first data table and the second data table. The scheme of this embodiment avoids occupying a large amount of memory space, and the data tables to be aggregated do not need to be stored locally. Dependence on an MPP database and a distributed computing framework is reduced, as are the requirements on software and hardware infrastructure. An approximate association between the data to be aggregated is achieved merely by traversing the bitmap arrays, and the data to be aggregated is obtained according to the association result, which reduces the complexity of data aggregation and improves its efficiency.
Example three
Fig. 3 is a schematic flow chart of a data aggregation method provided in the third embodiment of the present application, and the third embodiment of the present application provides a preferred implementation manner based on the technical solutions of the foregoing embodiments.
S301, setting an aggregation function, an aggregation parameter storage area and an aggregation result storage area according to scene requirements;
S302, acquiring the number m of record rows in a first data table, presetting a primary key field, and acquiring first data corresponding to the primary key field;
S303, constructing an original bitmap array of twice the size, that is, with 2m array elements, wherein the first m array elements correspond to the hash values of one hash function and the last m array elements correspond to the hash values of the other hash function;
S304, initializing the constructed original bitmap array by setting the value of the array element at each array element position to 0;
S305, creating hash functions HASH1 and HASH2, wherein the outputs of the hash functions are of a numerical type;
S306, traversing the first data of the first data table and judging whether the first data has been completely traversed; if yes, executing S309, otherwise executing S307;
S307, inputting the first data into the hash functions HASH1 and HASH2 respectively, and acquiring the first hash values H1 and H2 corresponding to the first data;
S308, determining the array positions corresponding to H1 and H2 in the bitmap array, setting the array element values at the array positions corresponding to H1 and H2 to 1, and continuing to execute S306;
S309, acquiring a second data table, and acquiring second data corresponding to the primary key field according to the preset primary key field;
S310, traversing the second data of the second data table and judging whether the second data has been completely traversed; if yes, executing S315, otherwise executing S311;
S311, inputting the second data into the hash functions HASH1 and HASH2 respectively, and acquiring the second hash values P1 and P2 corresponding to the second data;
S312, determining the array positions of the second hash values P1 and P2 in the bitmap array, and acquiring the array element values Bit1 and Bit2 at those array positions;
S313, judging whether Bit1 and Bit2 are both 1; if so, executing S314; if not, discarding the second data and continuing to execute S310;
S314, taking the row data in which the second data is located and the first data in the first data table corresponding to the second data, storing them in the aggregation parameter storage area, and continuing to execute S310;
S315, outputting the aggregation result to the aggregation result storage area.
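The flow S301 to S315 can be sketched end to end as below. This is a simplified illustration under several assumptions: MD5 and SHA-256 stand in for HASH1 and HASH2, a modulo maps each digest into its half of the 2m-element bitmap array, tables are lists of dictionaries, and only the matching rows of the second data table are placed in the aggregation parameter storage area. The names `aggregate`, `_digest_int`, `first_table` and `second_table` are hypothetical.

```python
import hashlib
from typing import Any, Callable


def _digest_int(algorithm: str, key: str) -> int:
    """Numerical hash output (S305) from a named hashlib algorithm."""
    return int.from_bytes(hashlib.new(algorithm, key.encode("utf-8")).digest(), "big")


def aggregate(first_rows: list[dict], second_rows: list[dict], key_field: str,
              aggregate_fn: Callable[[list[dict]], Any]) -> Any:
    m = len(first_rows)                       # S302: number of record rows
    if m == 0:
        return aggregate_fn([])               # nothing can match an empty table
    bitmap = [0] * (2 * m)                    # S303-S304: 2m elements, all 0

    def hash1(key: str) -> int:               # positions in the first m elements
        return _digest_int("md5", key) % m

    def hash2(key: str) -> int:               # positions in the last m elements
        return m + _digest_int("sha256", key) % m

    for row in first_rows:                    # S306-S308: fill the bitmap
        key = row[key_field]
        bitmap[hash1(key)] = 1
        bitmap[hash2(key)] = 1

    parameter_area = []                       # aggregation parameter storage area
    for row in second_rows:                   # S310-S314: probe the bitmap
        key = row[key_field]
        if bitmap[hash1(key)] == 1 and bitmap[hash2(key)] == 1:
            parameter_area.append(row)
    return aggregate_fn(parameter_area)       # S315: produce the result


first_table = [{"name": "Zhang San"}, {"name": "Li Si"}, {"name": "Wang Wu"}]
second_table = [
    {"name": "Zhang San", "amount": 10},
    {"name": "Li Si", "amount": 20},
    {"name": "Zhao Liu", "amount": 5},
]
total = aggregate(first_table, second_table, "name",
                  lambda rows: sum(r["amount"] for r in rows))
print(total)
```

Keys present in the first data table always pass the check, so the total is at least 30. A key absent from the first data table, such as "Zhao Liu" here, may still pass when both of its positions happen to be set by other keys, which is why the association is approximate.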
Example four
Fig. 4 is a schematic structural diagram of a data aggregation apparatus according to a fourth embodiment of the present application. The data aggregation apparatus provided by this embodiment is applicable to scenarios in which a plurality of data tables or data files are aggregated, and can be implemented in software and/or hardware. As shown in fig. 4, the apparatus specifically includes: a first hash value determination module 401, a target bitmap array determination module 402, a second hash value determination module 403, a target array element determination module 404, and a data aggregation module 405, wherein:
a first hash value determining module 401, configured to obtain first data of a preset primary key field in a first data table, and determine, based on a preset hash function, a first hash value corresponding to the first data;
a target bitmap array determining module 402, configured to determine an array element corresponding to the first hash value according to the first hash value and a preset bitmap array construction algorithm, so as to obtain a target bitmap array;
a second hash value determining module 403, configured to obtain second data in a preset primary key field in a second data table, and determine, based on a preset hash function, a second hash value corresponding to the second data;
a target array element determining module 404, configured to determine, according to the second hash value, whether a target array element that meets a preset array search condition exists in the target bitmap array;
and a data aggregation module 405, configured to determine, if a target array element meeting a preset array search condition exists in the target bitmap array, target second data of the second hash value, and perform data aggregation on data of the row where the target second data exists in the second data table and the first data table.
According to this embodiment of the application, first data of a preset primary key field in a first data table is obtained, and a first hash value corresponding to the first data is determined based on a preset hash function; array elements corresponding to the first hash value are determined according to the first hash value and a preset bitmap array construction algorithm to obtain a target bitmap array; second data of the preset primary key field in a second data table is obtained, and a second hash value corresponding to the second data is determined based on the preset hash function; whether a target array element meeting a preset array search condition exists in the target bitmap array is determined according to the second hash value; and if so, target second data of the second hash value is determined, and the row data in which the target second data is located in the second data table is aggregated with the first data table. Each array element of the bitmap array in this scheme occupies only 1 bit, so the memory footprint is small, and the data tables to be aggregated do not need to be stored locally. Dependence on an MPP database and a distributed computing framework is reduced, as are the requirements on software and hardware infrastructure. An approximate association between the data to be aggregated can be achieved merely by traversing the bitmap array, and the data to be aggregated is obtained according to the association result, which reduces the complexity of data aggregation and improves its efficiency.
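To illustrate the 1-bit-per-element storage mentioned above, the following sketch packs the bitmap array into a `bytearray`, eight array elements per byte. The class name `BitArray` and its methods are hypothetical; the application does not describe a concrete storage layout.

```python
class BitArray:
    """Bitmap array that stores each element in a single bit."""

    def __init__(self, size: int) -> None:
        self.size = size
        self._bytes = bytearray((size + 7) // 8)  # all bits start at 0

    def _check(self, position: int) -> None:
        if not 0 <= position < self.size:
            raise IndexError("position outside the bitmap array")

    def set(self, position: int) -> None:
        """Set the element at the given position to 1."""
        self._check(position)
        self._bytes[position >> 3] |= 1 << (position & 7)

    def get(self, position: int) -> int:
        """Return the element (0 or 1) at the given position."""
        self._check(position)
        return (self._bytes[position >> 3] >> (position & 7)) & 1


bits = BitArray(100)
bits.set(3)
bits.set(99)
print(bits.get(3), bits.get(4), bits.get(99), len(bits._bytes))  # 1 0 1 13
```

With this layout, a bitmap array for 100 record rows occupies 13 bytes.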
Optionally, the preset hash functions are at least two;
correspondingly, the device further comprises:
the original bitmap array creating module is used for creating at least two original bitmap arrays according to the function type number of a preset hash function before first data of a preset primary key field in a first data table is acquired;
and the array element number determining module is used for determining the array element numbers of the at least two original bitmap arrays according to the record line number of the first data table.
Optionally, the target bitmap array determining module 402 includes:
an array element position determining unit, configured to determine, according to the first hash value, array element positions of first data in the first data table in at least two original bitmap arrays;
the first target bitmap array determining unit is used for determining array element values at array element positions according to preset array element filling rules to obtain at least two target bitmap arrays.
Optionally, the second hash value determining module 403 includes:
and the second hash value determining unit is used for determining at least two candidate second hash values corresponding to the second data according to at least two preset hash functions.
Optionally, the target array element determining module 404 includes:
a target second hash value determination unit configured to determine any candidate second hash value from the second data as a target second hash value;
a second target bitmap array determining unit, configured to determine, from at least two target bitmap arrays, a target bitmap array corresponding to the target second hash value;
a first target position determination unit, configured to determine a target position of the target second hash value in the corresponding target bitmap array;
a first array searching condition determining unit, configured to determine whether an array element at the target position corresponding to the target second hash value meets a preset array searching condition;
and the first target array element determining unit is used for determining that the array element of any candidate second hash value of the second data at the target position is the target array element if the array element at the target position corresponding to the target second hash value meets a preset array searching condition.
Optionally, the target array element determining module 404 includes:
a second target position determining unit, configured to determine, according to the second hash value, a target position of the second hash value in the target bitmap array;
a second array search condition determining unit, configured to determine whether an array element at the target position satisfies a preset array search condition;
and the second target array element determining unit is used for determining that the array element at the target position is the target array element if the array element at the target position meets a preset array searching condition.
Optionally, the second array searching condition determining unit includes:
a target position judging subunit, configured to judge whether the target position exists in the target bitmap array;
and the array searching condition judging subunit is used for acquiring the array element at the target position if the target position exists in the target bitmap array, and judging whether the array element at the target position meets a preset array searching condition.
The data aggregation device can execute the data aggregation method provided by any embodiment of the application, and has the corresponding functional modules and beneficial effects of executing each data aggregation method.
Example five
Fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application. Fig. 5 illustrates a block diagram of an exemplary electronic device 500 suitable for use in implementing embodiments of the present application. The electronic device 500 shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 5, the electronic device 500 is embodied in the form of a general purpose computing device. The components of the electronic device 500 may include, but are not limited to: one or more processors or processing units 501, a system memory 502, and a bus 503 that couples the various system components (including the system memory 502 and the processing unit 501).
Bus 503 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Electronic device 500 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 500 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 502 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)504 and/or cache memory 505. The electronic device 500 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 506 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 503 by one or more data media interfaces. Memory 502 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the application.
A program/utility 508 having a set (at least one) of program modules 507 may be stored, for instance, in memory 502, such program modules 507 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 507 generally perform the functions and/or methodologies of embodiments described herein.
The electronic device 500 may also communicate with one or more external devices 509 (e.g., keyboard, pointing device, display 510, etc.), with one or more devices that enable a user to interact with the electronic device 500, and/or with any devices (e.g., network card, modem, etc.) that enable the electronic device 500 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 511. Also, the electronic device 500 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 512. As shown, the network adapter 512 communicates with the other modules of the electronic device 500 over the bus 503. It should be appreciated that although not shown in FIG. 5, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 501 executes various functional applications and data processing by executing programs stored in the system memory 502, for example, to implement a data aggregation method provided by the embodiment of the present application.
EXAMPLE six
The sixth embodiment of the present application further provides a storage medium containing computer-executable instructions. A computer program is stored on the storage medium, and when the computer program is executed by a processor, the data aggregation method provided in the embodiments of the present application is implemented. The method includes: acquiring first data of a preset primary key field in a first data table, and determining a first hash value corresponding to the first data based on a preset hash function; determining, according to the first hash value and a preset bitmap array construction algorithm, an array element corresponding to the first hash value to obtain a target bitmap array; acquiring second data of the preset primary key field in a second data table, and determining a second hash value corresponding to the second data based on the preset hash function; determining, according to the second hash value, whether a target array element meeting a preset array searching condition exists in the target bitmap array; and if so, determining the target second data corresponding to the second hash value, and performing data aggregation on the row of the second data table in which the target second data is located and the first data table.
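The steps listed above can be illustrated with a minimal Python sketch. It is not the implementation of the application: the SHA-256-based hash, the array size of 1024, the rule "set the element to 1", and the reading of "data aggregation" as joining rows on the primary key are all assumptions made for illustration, and every function name is hypothetical.

```python
import hashlib


def hash_position(key, size):
    """Map a primary-key value to a position in [0, size) (assumed hash function)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def build_bitmap(first_table, key_field, size):
    """Build the target bitmap array: set the element at each key's hash position to 1."""
    bitmap = bytearray(size)
    for row in first_table:
        bitmap[hash_position(row[key_field], size)] = 1
    return bitmap


def aggregate(first_table, second_table, key_field, size=1024):
    """Join second-table rows whose key passes the bitmap check with the first table."""
    bitmap = build_bitmap(first_table, key_field, size)
    # Exact index used only to confirm candidates, because a set bit may be
    # caused by a hash collision rather than by the same key.
    first_by_key = {row[key_field]: row for row in first_table}
    result = []
    for row in second_table:
        # Assumed lookup condition: the element at the hash position equals 1.
        if bitmap[hash_position(row[key_field], size)] != 1:
            continue  # the key is certainly absent from the first table
        match = first_by_key.get(row[key_field])
        if match is not None:
            result.append({**match, **row})
    return result


first = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
second = [{"id": 2, "amount": 10}, {"id": 3, "amount": 5}]
joined = aggregate(first, second, "id")
```

In this sketch the bitmap only discards rows that cannot match. Rows that pass the check are still confirmed against the actual key, since two different keys can hash to the same position.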
The computer storage media of the embodiments of the present application may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for data aggregation, comprising:
acquiring first data of a preset primary key field in a first data table, and determining a first hash value corresponding to the first data based on a preset hash function;
according to the first hash value and a preset bitmap array construction algorithm, determining an array element corresponding to the first hash value to obtain a target bitmap array;
acquiring second data of a preset primary key field in a second data table, and determining a second hash value corresponding to the second data based on a preset hash function;
determining whether a target array element meeting a preset array searching condition exists in the target bitmap array or not according to the second hash value;
and if so, determining the target second data of the second hash value, and performing data aggregation on the data of the row where the target second data is located in the second data table and the first data table.
2. The method according to claim 1, wherein there are at least two preset hash functions;
correspondingly, before acquiring the first data of the preset primary key field in the first data table, the method further comprises:
creating at least two original bitmap arrays according to the number of types of preset hash functions;
and determining the number of array elements of the at least two original bitmap arrays according to the number of record rows of the first data table.
3. The method according to claim 2, wherein determining an array element corresponding to the first hash value according to the first hash value and a preset bitmap array construction algorithm to obtain a target bitmap array comprises:
determining array element positions of first data in the first data table in at least two original bitmap arrays according to the first hash value;
and determining array element values at the positions of the array elements according to a preset array element filling rule to obtain at least two target bitmap arrays.
4. The method of claim 3, wherein determining the second hash value corresponding to the second data based on a preset hash function comprises:
and determining at least two candidate second hash values corresponding to the second data according to at least two preset hash functions.
5. The method of claim 4, wherein determining whether a target array element meeting a preset array lookup condition exists in the target bitmap array according to the second hash value comprises:
determining any candidate second hash value from the second data to serve as a target second hash value;
determining a target bitmap array corresponding to the target second hash value from at least two target bitmap arrays;
determining a target position of the target second hash value in the corresponding target bitmap array;
determining whether the array elements at the target position corresponding to the target second hash value meet a preset array searching condition;
if so, determining that the array element of any candidate second hash value of the second data at the target position is the target array element.
6. The method of claim 1, wherein determining whether a target array element meeting a preset array lookup condition exists in the target bitmap array according to the second hash value comprises:
determining a target position of the second hash value in the target bitmap array according to the second hash value;
determining whether the array elements at the target position meet a preset array searching condition;
if yes, determining that the array element at the target position is a target array element.
7. The method of claim 6, wherein determining whether the array element at the target location satisfies a predetermined array lookup condition comprises:
judging whether the target position exists in the target bitmap array or not;
if so, acquiring the array element at the target position, and judging whether the array element at the target position meets a preset array searching condition.
8. A data aggregation apparatus, comprising:
the first hash value determining module is used for acquiring first data of a preset primary key field in a first data table and determining a first hash value corresponding to the first data based on a preset hash function;
the target bitmap array determining module is used for determining array elements corresponding to the first hash value according to the first hash value and a preset bitmap array construction algorithm to obtain a target bitmap array;
the second hash value determining module is used for acquiring second data of a preset primary key field in a second data table and determining a second hash value corresponding to the second data based on a preset hash function;
the target array element determining module is used for determining whether a target array element meeting a preset array searching condition exists in the target bitmap array according to the second hash value;
and the data aggregation module is used for determining the target second data of the second hash value and performing data aggregation on the data of the row where the target second data in the second data table is located and the first data table if the target array elements meeting the preset array searching condition exist in the target bitmap array.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the data aggregation method as claimed in any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the data aggregation method according to any one of claims 1 to 7.
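
For illustration only, the variant with at least two hash functions described in claims 2 to 7 might be sketched in Python as follows. This is one possible reading, not the claimed implementation: the salted SHA-256 hashes, the `bits_per_row` sizing factor, the fill value 1, and the choice to require a set element in every bitmap array are assumptions, and all names are hypothetical.

```python
import hashlib


def make_hash_functions(count):
    """Return `count` salted hash functions (the salting scheme is an assumption)."""
    def make(salt):
        def h(key, size):
            data = f"{salt}:{key}".encode("utf-8")
            return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % size
        return h
    return [make(i) for i in range(count)]


def build_bitmaps(keys, hash_functions, bits_per_row=8):
    """Create one bitmap array per hash function, sized from the number of rows."""
    keys = list(keys)
    size = max(1, len(keys) * bits_per_row)  # element count derived from row count
    bitmaps = [bytearray(size) for _ in hash_functions]  # the original bitmap arrays
    for key in keys:
        for bitmap, h in zip(bitmaps, hash_functions):
            bitmap[h(key, size)] = 1  # assumed fill rule: mark the position with 1
    return bitmaps


def may_contain(key, bitmaps, hash_functions):
    """Return False if the key is certainly absent, True if it may be present."""
    for bitmap, h in zip(bitmaps, hash_functions):
        pos = h(key, len(bitmap))
        # Position check: always true here because of the modulo, kept to
        # mirror the step that tests whether the target position exists.
        if not 0 <= pos < len(bitmap):
            return False
        if bitmap[pos] != 1:  # assumed lookup condition
            return False
    return True


hash_functions = make_hash_functions(2)
bitmaps = build_bitmaps([1, 2, 3], hash_functions)
```

A `True` result from `may_contain` only means the key may be present, so a caller would still compare actual key values before combining rows. Using several hash functions reduces, but does not eliminate, such false positives.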
CN202111551970.7A 2021-12-17 2021-12-17 Data aggregation method, device, equipment and storage medium Pending CN114253992A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111551970.7A CN114253992A (en) 2021-12-17 2021-12-17 Data aggregation method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN114253992A true CN114253992A (en) 2022-03-29

Family

ID=80792784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111551970.7A Pending CN114253992A (en) 2021-12-17 2021-12-17 Data aggregation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114253992A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115001661A (en) * 2022-06-02 2022-09-02 中国银行股份有限公司 Data encryption method and device, computer equipment and storage medium
CN116860798A (en) * 2023-06-20 2023-10-10 超聚变数字技术有限公司 Data query method, electronic device and computer-readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111506670A (en) * 2019-01-31 2020-08-07 阿里巴巴集团控股有限公司 Data processing method, device and equipment
CN112330398A (en) * 2020-10-30 2021-02-05 京东数字科技控股股份有限公司 Object processing method and device, electronic equipment and storage medium



Similar Documents

Publication Publication Date Title
US12056583B2 (en) Target variable distribution-based acceptance of machine learning test data sets
CN111090628B (en) Data processing method and device, storage medium and electronic equipment
US10592532B2 (en) Database sharding
CN111258966A (en) Data deduplication method, device, equipment and storage medium
CN110502519B (en) Data aggregation method, device, equipment and storage medium
US20150293958A1 (en) Scalable data structures
CN107704202B (en) Method and device for quickly reading and writing data
US10838963B2 (en) Optimized access for hierarchical low cardinality value synopsis in analytical databases
CN107360224A (en) Sequence number generation method, system, equipment and storage medium in distributed system
CN112948396A (en) Data storage method and device, electronic equipment and storage medium
US10296497B2 (en) Storing a key value to a deleted row based on key range density
CN111475105A (en) Monitoring data storage method, device, server and storage medium
CN112632052B (en) Heterogeneous data sharing method and intelligent sharing system
CN114253992A (en) Data aggregation method, device, equipment and storage medium
CN113761185A (en) Main key extraction method, equipment and storage medium
US9213759B2 (en) System, apparatus, and method for executing a query including boolean and conditional expressions
CN108140022B (en) Data query method and database system
CN104077082A (en) Network voting data storage method and device
CN112667721A (en) Data analysis method, device, equipment and storage medium
CN105447032A (en) Method and system for processing message and subscription information
CN111049988A (en) Intimacy prediction method, system, equipment and storage medium for mobile equipment
CN112965943A (en) Data processing method and device, electronic equipment and storage medium
CN118410036A (en) Security audit management method, device, medium and product based on cluster
CN113742332A (en) Data storage method, device, equipment and storage medium
CN114547086B (en) Data processing method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination