CN106294353A - Information processing method and device - Google Patents
- Publication number
- CN106294353A (CN201510245690.1A)
- Authority
- CN
- China
- Prior art keywords
- imei
- content
- query
- search index
- serial number
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2237—Vectors, bitmaps or matrices
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The present invention discloses an information processing method and device. The method includes: storing configuration information to form a configuration matrix, wherein the sequence number of each element in the configuration matrix has a mapping relationship with a query index, and the content stored in each element is the query content to be retrieved based on that query index; obtaining the query index from input data; determining the sequence number of the element to be queried according to the query index; and retrieving the query content according to the sequence number.
Description
Technical field
The present invention relates to the field of information processing, and in particular to an information processing method and device.
Background
In existing information processing, a large amount of information content is usually stored in advance, for example in a matrix; a query index is later matched against the stored content to obtain the content being queried.
When there is a lot of stored content, this query-matching approach has to traverse the content, matching entries one by one, which makes queries slow and consumes substantial resources on the querying equipment.
Summary of the invention
In view of this, embodiments of the present invention aim to provide an information processing method and device that speed up data queries and reduce the resources consumed by information queries.
To achieve the above object, the technical solution of the present invention is implemented as follows:
A first aspect of the embodiments of the present invention provides an information processing method, the method including:
storing configuration information to form a configuration matrix, wherein the sequence number of each element in the configuration matrix has a mapping relationship with a query index, and the content stored in the element is the query content to be retrieved based on the query index;
obtaining the query index from input data;
determining the sequence number of the element to be queried according to the query index;
retrieving the query content according to the sequence number.
Preferably, the configuration matrix includes zero elements and non-zero elements;
the non-zero elements are formed from the information records in the configuration information for which a correspondence between query index and query content has been determined; the query content is the content of the non-zero element;
the zero elements are elements whose sequence numbers have no mapping relationship with any query index in the configuration information; the stored content of a zero element is empty.
Preferably, the query index is the International Mobile Equipment Identity (IMEI) of a terminal;
the sequence number of the element is the last N digits of the IMEI;
N is a positive integer not greater than 15;
the query content includes a communication identifier corresponding to the IMEI.
Preferably, receiving the input data and obtaining the query index includes:
determining whether the communication identifier is missing from the input data;
if the communication identifier is missing, extracting the IMEI from the input data;
determining the sequence number of the element to be queried according to the query index includes:
extracting the last N digits of the IMEI;
retrieving the query content according to the sequence number includes:
obtaining the communication identifier from the storage location corresponding to the last N digits of the IMEI.
Preferably, the method further includes:
supplementing the input data with the communication identifier corresponding to the IMEI to form output data.
A second aspect of the embodiments of the present invention provides an information processing device, the device including:
a storage unit, configured to store configuration information and form a configuration matrix, wherein the sequence number of each element in the configuration matrix has a mapping relationship with a query index, and the content stored in the element is the query content to be retrieved based on the query index;
an acquisition unit, configured to obtain the query index from input data;
a determining unit, configured to determine the sequence number of the element to be queried according to the query index;
a query unit, configured to obtain the query content according to the sequence number.
Preferably, the configuration matrix includes zero elements and non-zero elements;
the non-zero elements are formed from the information records in the configuration information for which a correspondence between query index and query content has been determined; the query content is the content of the non-zero element;
the zero elements are elements whose sequence numbers have no mapping relationship with any query index in the configuration information; the stored content of a zero element is empty.
Preferably, the query index is the International Mobile Equipment Identity (IMEI) of a terminal;
the sequence number of the element is the last N digits of the IMEI;
N is a positive integer not greater than 15;
the query content includes a communication identifier corresponding to the IMEI.
Preferably, the acquisition unit is specifically configured to determine whether the communication identifier is missing from the input data and, if the communication identifier is missing, to extract the IMEI from the input data;
the determining unit is specifically configured to extract the last N digits of the IMEI;
the query unit is specifically configured to obtain the communication identifier from the storage location corresponding to the last N digits of the IMEI.
Preferably, the device further includes:
a forming unit, configured to supplement the input data with the communication identifier corresponding to the IMEI to form output data.
In the information processing method and device of the embodiments of the present invention, the configuration matrix formed from the configuration information has a mapping relationship with the query index. When a query is later performed based on a query index, the query index is converted into an element sequence number according to the mapping relationship, which identifies the element to be queried; the content stored in that element is the query content. This avoids matching the query index one by one against the query indexes in the stored content, as in the prior art, thereby improving query efficiency and reducing the hardware and software resources consumed by queries.
Description of drawings
FIG. 1 is the first schematic flowchart of the information processing method according to an embodiment of the present invention;
FIG. 2 is the second schematic flowchart of the information processing method according to an embodiment of the present invention;
FIG. 3 is the third schematic flowchart of the information processing method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the information processing device according to an embodiment of the present invention.
Detailed description
The technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments.
Method embodiment:
As shown in FIG. 1, this embodiment provides an information processing method, the method including:
Step S110: storing configuration information to form a configuration matrix, wherein the sequence number of each element in the configuration matrix has a mapping relationship with a query index, and the content stored in the element is the query content to be retrieved based on the query index;
Step S120: obtaining the query index from input data;
Step S130: determining the sequence number of the element to be queried according to the query index;
Step S140: retrieving the query content according to the sequence number.
The sequence number may refer to the position of each element in the ordering of the matrix. For example, a 10*10 matrix contains 100 elements. When ordering the elements of the configuration matrix, they may be counted by row, so that the 1st element of row 1 has sequence number 1 and the 2nd element of row 1 has sequence number 2; they may also be counted by column, so that the 1st element of row 1 has sequence number 1 and the 1st element of row 2 has sequence number 2.
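The two counting orders described above can be written as index formulas. The following is a minimal sketch for illustration only; the function names and the 1-based row and column convention are assumptions and do not appear in the original text.

```python
def row_major_seq(row: int, col: int, n_cols: int) -> int:
    """1-based sequence number when elements are counted row by row."""
    return (row - 1) * n_cols + col


def col_major_seq(row: int, col: int, n_rows: int) -> int:
    """1-based sequence number when elements are counted column by column."""
    return (col - 1) * n_rows + row


# For the 10*10 matrix in the text:
first_by_row = row_major_seq(1, 1, n_cols=10)    # row 1, element 1
second_by_row = row_major_seq(1, 2, n_cols=10)   # row 1, element 2
second_by_col = col_major_seq(2, 1, n_rows=10)   # row 2, element 1
```

Either order works as long as the same order is used when the matrix is built and when it is queried.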
Because the query index has a mapping relationship with the sequence number of an element, the sequence number of the element that stores the query content can be determined from the query index, and the query content can then be retrieved quickly by addressing the matrix element directly. This avoids obtaining the query content by matching the query index against the stored information, as in the prior art, which greatly improves query efficiency and reduces the system overhead of information queries. In this application the stored query content does not itself need to include the query index, which also reduces the amount of stored data. In a specific implementation, the query index may nevertheless be stored in the matrix elements so that it can be checked afterwards, giving more accurate query results.
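The difference between traversal and direct addressing can be sketched as follows. This is an illustrative example rather than the patented implementation: the key width `N = 3`, the function names, and the use of a Python list as the matrix are all assumptions. It also shows the optional variant in which the query index is stored in the element for later verification.

```python
from typing import List, Optional, Tuple

N = 3  # hypothetical: the last N digits of the query index give the sequence number

Record = Tuple[str, str]  # (query index, query content)


def lookup_by_traversal(records: List[Record], query_index: str) -> Optional[str]:
    """Prior-art style: compare the query index with every stored record."""
    for index, content in records:
        if index == query_index:
            return content
    return None


def build_matrix(records: List[Record]) -> List[Optional[Record]]:
    """Place each record at the position given by its sequence number."""
    matrix: List[Optional[Record]] = [None] * (10 ** N)  # None marks a zero element
    for index, content in records:
        matrix[int(index[-N:])] = (index, content)
    return matrix


def lookup_by_sequence(matrix: List[Optional[Record]], query_index: str) -> Optional[str]:
    """Jump straight to the element; no record-by-record matching."""
    element = matrix[int(query_index[-N:])]
    if element is None:
        return None
    stored_index, content = element
    # Optional check: the stored query index must equal the requested one.
    return content if stored_index == query_index else None
```

The stored-index check matters when two different query indexes share the same last N digits; without it, the lookup would return the content of whichever record was stored at that position.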
The information processing method of this embodiment can be applied in a stream computing system. Stream computing is mainly used for real-time data processing, statistical learning and similar functions. With the explosion of Internet big data, stream computing has adopted more advanced distributed computing to increase processing speed; such systems are called distributed stream computing systems. The most representative distributed stream computing systems are IBM Streams and Storm. Applications written for a distributed stream computing system can run on large clusters of hundreds or thousands of commodity machines and process terabyte-scale data sets in parallel in a reliable, fault-tolerant manner. The application splits the computation into many small pieces and passes them, stream-like, through different processing nodes to complete real-time data analysis and event processing. The information query method of this embodiment can be applied to information queries in a stream computing system.
The query index of this embodiment is preferably made up of digits, letters or similar characters that can themselves be ordered, and usually uniquely identifies a record, such as the IMEI, a communication identifier, or the unique serial number assigned to a card. For example, when the method is applied to querying student results, the query index may be the student number or the examination admission number, and the query content may include the examination results. With the method of this embodiment, the results of all students are used to generate the configuration matrix, and the student number or admission number serves as the sequence number of each element. After a student enters a student number or admission number, the query device can jump directly to the corresponding element and its stored content.
The configuration matrix includes zero elements and non-zero elements;
the non-zero elements are formed from the information records in the configuration information for which a correspondence between query index and query content has been determined; the query content is the content of the non-zero element;
the zero elements are elements whose sequence numbers have no mapping relationship with any query index in the configuration information; the stored content of a zero element is empty.
When there are many zero elements, a sparse matrix is formed. Zero elements in a sparse matrix occupy no storage space, and when the query index itself is not stored, the amount of stored data is reduced further.
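One way to realize a matrix whose zero elements occupy no storage is to keep only the non-zero elements in a hash map keyed by sequence number. The sketch below assumes this representation; the class name `SparseConfigMatrix` and its methods are hypothetical, and the original text does not specify a particular sparse format.

```python
from typing import Dict, Optional


class SparseConfigMatrix:
    """Configuration matrix that physically stores only non-zero elements."""

    def __init__(self, size: int) -> None:
        self.size = size                     # logical number of elements
        self._nonzero: Dict[int, str] = {}   # sequence number -> query content

    def set(self, seq: int, content: str) -> None:
        if not 0 <= seq < self.size:
            raise IndexError("sequence number outside the matrix")
        self._nonzero[seq] = content

    def get(self, seq: int) -> Optional[str]:
        """Return the query content, or None for a zero (empty) element."""
        if not 0 <= seq < self.size:
            raise IndexError("sequence number outside the matrix")
        return self._nonzero.get(seq)

    def stored_elements(self) -> int:
        """Number of elements that actually take up storage."""
        return len(self._nonzero)
```

With a logical size of 10**10 and a handful of entries, only those entries consume memory, and the element itself holds just the query content rather than the query index.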
The query index is the International Mobile Equipment Identity (IMEI) of a terminal;
the sequence number of the element is the last N digits of the IMEI;
N is a positive integer not greater than 15;
the query content includes a communication identifier corresponding to the IMEI.
The communication identifier may be a mobile phone number, a landline number, or other information used as an identifier during communication. In a specific implementation, the query index may instead be the communication identifier, and the query content may be the IMEI or other content.
The following table is a specific example:
Table 1
If N = 10, the configuration matrix formed with the method of this embodiment is as shown in the following table:
Table 2
As Table 2 shows, the sequence number in the storage structure is the last 10 digits of the IMEI (the first 5 digits of the IMEI currently do not vary). With this storage format, the last 10 digits taken from the IMEI in the input data locate the mobile phone number in the storage structure directly; this is useful, for example, when an externally input record has an empty mobile phone number field that needs to be filled in. In addition, in a sparse-matrix storage structure the number of non-zero elements is far smaller than the total number of elements, and zero elements take up no physical storage, which both improves query efficiency and saves storage space.
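Deriving the sequence number from an IMEI amounts to taking its last N digits. A minimal sketch follows; the function name and the validation rules (a 15-digit numeric string, 1 ≤ N ≤ 15) are assumptions based on the constraints stated above.

```python
def imei_sequence_number(imei: str, n: int = 10) -> int:
    """Return the last n digits of a 15-digit IMEI as an integer sequence number."""
    if not 1 <= n <= 15:
        raise ValueError("N must be a positive integer not greater than 15")
    if len(imei) != 15 or not (imei.isascii() and imei.isdigit()):
        raise ValueError("expected a 15-digit IMEI")
    return int(imei[-n:])


# IMEI taken from Example 1 later in this document.
seq = imei_sequence_number("460005007585583", n=10)
```

A smaller N shrinks the logical matrix but increases the chance that two IMEIs map to the same sequence number, so N has to be chosen with the actual IMEI population in mind.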
Further, step S120 may include: determining whether the communication identifier is missing from the input data; and, if the communication identifier is missing, extracting the IMEI from the input data. Step S130 may include: extracting the last N digits of the IMEI. Step S140 may include: obtaining the communication identifier from the storage location corresponding to the last N digits of the IMEI. When the externally input data already includes the communication identifier, steps S120 to S140 need not be performed; the input data can continue through its predefined processing flow, for example by being passed to the next processing node.
As a further supplement to the above embodiment, the method further includes: supplementing the input data with the communication identifier corresponding to the IMEI to form output data.
Several specific examples based on the method of this embodiment are given below:
Example 1:
Application scenario of this example:
A stream computing cluster needs to collect users' Internet access log data, but the data quality in the source system is flawed: some source records lack a mobile phone number. The number has to be filled in during the ETL (Extract-Transform-Load) process; otherwise records without a mobile phone number cannot be used by business applications that rely on it.
If a source data record provided by the source system lacks the mobile phone number, the method of this embodiment is used to fill it in. Another field in the source data record, the IMEI (International Mobile Equipment Identity), is associated with the user profile table to obtain the mobile phone number, which is then written into the mobile phone number field of the source data record. The source data record is the input data.
As shown in FIG. 2, this example specifically includes:
Step 1: external data is input to the stream computing mobile phone number supplement node; the input data may be as shown in Table 3:
Table 3
Step 2: determine whether the mobile phone number is missing; here the mobile phone number field of the input data is found to be empty;
Step 3: because the mobile phone number is missing, take the last 10 digits of the IMEI from the external input data; for example, if the IMEI is [460005007585583], the extracted value is [5007585583];
Step 4: use the last 10 digits of the IMEI to fetch the mobile phone number stored under that sequence number in the configuration data store: value[5007585583]=13837314528;
Step 5: the configuration data store returns the mobile phone number to the stream computing mobile phone number supplement node;
Step 6: the stream computing mobile phone number supplement node adds the returned data to the input data; at this point the input data is:
Step 7: the supplemented input data is passed to the next processing node for subsequent real-time processing.
This completes the fast-matching function based on the stream computing product.
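Steps 1 to 7 can be sketched as a small stream-processing function. This is an illustration under assumptions: the record layout (dictionaries with hypothetical `imei` and `phone` fields) and the function name are not from the original, and the configuration store holds only the single IMEI and phone number pair given in steps 3 and 4.

```python
from typing import Dict, Iterable, Iterator

N = 10  # number of trailing IMEI digits used as the sequence number

# Configuration data store: sequence number -> mobile phone number.
CONFIG_STORE: Dict[int, str] = {5007585583: "13837314528"}


def supplement_phone_numbers(
    records: Iterable[Dict[str, str]],
    store: Dict[int, str] = CONFIG_STORE,
) -> Iterator[Dict[str, str]]:
    """Fill in missing phone numbers, then hand each record to the next node."""
    for record in records:                        # step 1: record arrives
        if not record.get("phone"):               # step 2: is the number missing?
            seq = int(record["imei"][-N:])        # step 3: last 10 IMEI digits
            phone = store.get(seq)                # steps 4-5: fetch from the store
            if phone is not None:
                record = {**record, "phone": phone}   # step 6: supplement
        yield record                              # step 7: pass downstream


stream = [
    {"imei": "460005007585583", "phone": ""},             # number missing
    {"imei": "460005007585584", "phone": "13800000000"},  # already complete
]
output = list(supplement_phone_numbers(stream))
```

Records that already carry a phone number skip the lookup entirely, matching the remark that steps S120 to S140 need not run in that case.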
As shown in FIG. 3,
Streams uses XSD (XML Schema Definition) to describe the Java operator model, covering syntax expressions, required parameters, input port definitions, output port definitions, and dependent class libraries. When a Java operator is created, the corresponding configuration file is generated; Streams invokes the Java code according to this generated XML configuration, so the stream computing language only needs to refer to the Java operator by name. For example, "HdfsSink" is the name of a custom Java operator that has been created, and it is invoked simply by using that name as the operator name.
Example 2:
As shown in FIG. 3, this example provides an information processing method including:
inputting configuration data into a configuration generation tool;
the configuration generation tool storing the configuration data as a sparse matrix;
receiving a piece of user data; the user data is the input data described above and includes the query index;
locating the element quickly by index matching; the resulting location is used to obtain the query content;
obtaining the query content;
adding the query content to the user data to form updated user data.
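The flow of Example 2 can be sketched end to end. The configuration generation tool is modelled here as a function that reads CSV text; the CSV layout, the column names `imei` and `phone`, and both function names are assumptions made for illustration, since the original does not describe the tool's input format.

```python
import csv
import io
from typing import Dict


def generate_sparse_store(config_text: str, n: int = 10) -> Dict[int, str]:
    """Configuration generation tool: turn configuration rows into a sparse store."""
    store: Dict[int, str] = {}
    for row in csv.DictReader(io.StringIO(config_text)):
        store[int(row["imei"][-n:])] = row["phone"]   # only non-zero elements are kept
    return store


def update_user_data(user: Dict[str, str], store: Dict[int, str], n: int = 10) -> Dict[str, str]:
    """Locate the element by sequence number and add its content to the user data."""
    content = store.get(int(user["imei"][-n:]))       # direct positioning, no traversal
    if content is None:                               # zero element: nothing to add
        return dict(user)
    return {**user, "phone": content}


config_text = "imei,phone\n460005007585583,13837314528\n"
store = generate_sparse_store(config_text)
updated = update_user_data({"imei": "460005007585583"}, store)
```

Building the store once, ahead of time, keeps the per-record work in the streaming path down to a single keyed access.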
The method of the embodiments of this application is applicable to any information processing that requires query matching, and is particularly suited to streaming information processing.
The method can be applied in the field of business analysis systems: a sparse-matrix storage structure holds the configuration information in memory, and a fast-matching algorithm implemented in Java is embedded in the streaming node, so that during real-time input of massive data the position of the configuration data can be located quickly and the required data retrieved. This improves processing efficiency and meets real-time requirements while avoiding the resource consumption (CPU, processing time) of searching for data in an ordinary storage structure.
Device embodiment:
As shown in FIG. 4, this embodiment provides an information processing device, characterized in that the device includes:
a storage unit 110, configured to store configuration information and form a configuration matrix, wherein the sequence number of each element in the configuration matrix has a mapping relationship with a query index, and the content stored in the element is the query content to be retrieved based on the query index;
an acquisition unit 120, configured to obtain the query index from input data;
a determining unit 130, configured to determine the sequence number of the element to be queried according to the query index;
a query unit 140, configured to obtain the query content according to the sequence number.
The storage unit 110 of this embodiment may include a storage medium that stores the configuration information in matrix form. In particular, when the sequence numbers corresponding to the query indexes in the configuration information are sparsely distributed, the configuration matrix is usually a sparse matrix. Elements of the configuration matrix that have a mapping relationship with a query index in the configuration information are usually non-zero elements, while elements that have no such mapping relationship are usually zero elements. The non-zero elements of the sparse matrix are formed from the information records in the configuration information for which a correspondence between query index and query content has been determined.
The storage medium may be any of various storage media such as RAM or flash, and is optionally a non-transitory storage medium.
The acquisition unit 120 may include a communication interface that receives the input data from an external device or receives the query index directly. The acquisition unit 120 may also include a parser, or an information processor with parsing capability, which parses the input data to obtain the query index.
The determining unit 130 and the query unit 140 may each be built on various types of processor, which implement the functions of the determining unit 130 and the query unit 140 by executing executable code. The processor may be an electronic device with information processing capability, such as a central processing unit (CPU), a microcontroller unit (MCU), a digital signal processor (DSP), an application processor (AP) or a programmable logic controller (PLC). The determining unit 130 and the query unit 140 may correspond to different processors or may be integrated on the same processor.
In summary, this embodiment provides an information processing device that can implement the information processing method of the method embodiment above, and therefore offers high efficiency and low system resource consumption when performing query matching. In addition, if the query index is not stored in the elements, the amount of stored information is reduced, which lowers storage usage. The system resources here may include CPU processing resources, thread occupation time, memory usage and the like.
The query index is the International Mobile Equipment Identity (IMEI) of a terminal; the sequence number of the element is the last N digits of the IMEI; N is a positive integer not greater than 15, for example 10. The query content includes a communication identifier corresponding to the IMEI. The communication identifier may be a mobile phone number or the like. The query content may also include information such as the user name, the communication plan and the user's current prepaid balance.
The query index and query content described in this embodiment are only one specific example of this application; specific implementations are not limited to this example.
进一步地,所述获取单元120,具体用于确定所述输入数据中是否确实所述通信标识;及若缺失所述通信标识,则从所述输入数据中提取所述IMEI;Further, the obtaining unit 120 is specifically configured to determine whether the communication identification is indeed in the input data; and if the communication identification is missing, extract the IMEI from the input data;
所述确定单元130,具体用于提取所述IMEI的后N位;The determining unit 130 is specifically configured to extract the last N digits of the IMEI;
所述查询单元140,具体用于依据所述IMEI的后N到对应的存储位置获取所述通信标识。The query unit 140 is specifically configured to obtain the communication identifier from the corresponding storage location according to the last N of the IMEI.
在本实施例中,所述获取单元120仅确定输入数据中确实所述通信标识,即输入数据缺失部分数据,需要从配置信息中提取信息补充时,才会依据查询对查询内容进行查询,以便将输入数据中缺失的信息补全,形成更新后的数据内容输入到后续处理节点中。结合所述通信标识的查询,在本实施例中所述获取单元判断所述输入数据数据中是否包括通信标识,在确实通信标识时,提取IMEI;并方便后续确定单元从IMEI提取后N位作为确定需要查找元素的序列号。In this embodiment, the acquisition unit 120 will query the query content according to the query only when it is determined that the communication identifier is indeed in the input data, that is, the input data lacks part of the data and needs to be supplemented by extracting information from the configuration information, so that Complete the missing information in the input data to form updated data content and input it to subsequent processing nodes. In conjunction with the query of the communication identification, in this embodiment, the acquisition unit judges whether the input data includes the communication identification, and when the communication identification is confirmed, extracts the IMEI; and facilitates the subsequent determination unit to extract the N bits from the IMEI as Determine the sequence number of the element to be looked up.
For the physical structure of the acquisition unit 120, the determining unit 130, and the query unit 140 in this embodiment, refer to the preceding sections; it is not repeated here. In short, these units can be used here to query the communication identifier.
The device further includes a forming unit, configured to add the communication identifier corresponding to the IMEI to the input data to form output data. Supplementing the input data with the query content yields completed data that is convenient for subsequent nodes to use.
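The forming step can be sketched as a function that returns a completed copy of the input record. This is a hedged illustration, not the patent's implementation: the dictionary keys, the `lookup` callback, and the suffix table keyed by the last four digits are hypothetical, and returning a copy rather than modifying the input in place is a design choice of this sketch.

```python
from typing import Callable, Dict, Optional


def form_output(input_data: Dict[str, str],
                lookup: Callable[[str], Optional[str]]) -> Dict[str, str]:
    """Forming unit: copy the input and fill in the communication
    identifier found for its IMEI, if the identifier is missing."""
    output = dict(input_data)
    if output.get("phone_number"):
        return output  # already complete
    imei = output.get("imei")
    if imei is None:
        return output  # nothing to look up with
    phone = lookup(imei)
    if phone is not None:
        output["phone_number"] = phone
    return output


# Hypothetical lookup keyed by the last 4 digits of the IMEI.
table = {"7518": "13800000000"}


def lookup_by_suffix(imei: str) -> Optional[str]:
    return table.get(imei[-4:])


incomplete = {"imei": "490154203237518"}
completed = form_output(incomplete, lookup_by_suffix)
```

If no entry is found for the IMEI, the record is passed on unchanged, so a subsequent node can still detect that the identifier is absent.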
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function; in an actual implementation other divisions are possible: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing module, each unit may serve as a separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in the form of hardware plus software functional units.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510245690.1A CN106294353A (en) | 2015-05-14 | 2015-05-14 | Information processing method and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN106294353A (en) | 2017-01-04 |
Family
ID=57631073
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201510245690.1A Pending CN106294353A (en) | 2015-05-14 | 2015-05-14 | Information processing method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN106294353A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113298586A (en) * | 2020-05-25 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Order information processing method and device |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20020087531A1 (en) * | 1997-02-26 | 2002-07-04 | Hitachi, Ltd. | Database processing method and system |
| CN102662992A (en) * | 2012-03-14 | 2012-09-12 | 北京搜狐新媒体信息技术有限公司 | Method and device for storing and accessing massive small files |
| CN104424258A (en) * | 2013-08-28 | 2015-03-18 | 腾讯科技(深圳)有限公司 | Multidimensional data query method and system, query server and column storage server |
| CN104486777A (en) * | 2014-12-01 | 2015-04-01 | 中国联合网络通信集团有限公司 | Method and device for processing data |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110502546B (en) | Data processing method and device | |
| CN111209319B (en) | Data service method and device | |
| CN113553341B (en) | Multidimensional data analysis method, device, equipment and computer-readable storage medium | |
| CN106649630A (en) | Data query method and device | |
| CN111198898B (en) | Big data query method and big data query device | |
| CN110119292A (en) | System operational parameters querying method, matching process, device and node device | |
| CN104484353A (en) | Data imaging method, data imaging device and database server | |
| US10171606B2 (en) | System and method for providing data as a service (DaaS) in real-time | |
| CN106997394B (en) | A kind of data random ordering arrival processing method and system | |
| CN110704486A (en) | Data processing method, device, system, storage medium and server | |
| CN113760242B (en) | Data processing method, device, server and medium | |
| CN102426612A (en) | Conditional object query method and system | |
| CN111858617A (en) | User searching method and device, computer readable storage medium and electronic equipment | |
| CN114461363A (en) | Task execution method and device, and computer-readable storage medium | |
| CN106294353A (en) | Information processing method and device | |
| CN104239537A (en) | Method for realizing generating and processing flow for large-data pre-processing text data | |
| CN112131016A (en) | Application program internal data processing method, device and equipment | |
| CN117708164A (en) | Data storage method, device and equipment based on parallel processing database | |
| CN107368477B (en) | HBase coprocessor-based SQL-like query method and system | |
| CN116257673A (en) | Data query method, device, equipment and storage medium based on ElasticSearch | |
| CN110019472A (en) | A kind of address date matching process and intelligent terminal | |
| CN115268982A (en) | A system database switching method, system, computer equipment and medium | |
| CN114510501A (en) | A method and device for real-time processing of interface data | |
| CN112445811A (en) | Data service method, device, storage medium and component based on SQL configuration | |
| CN118689880B (en) | Industrial Internet of things data storage optimization method, system and equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170104 |