
WO2018126367A1 - Data cleaning method and device - Google Patents

Info

Publication number
WO2018126367A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
segment
time
abnormal
data segment
Prior art date
Application number
PCT/CN2017/070190
Other languages
French (fr)
Chinese (zh)
Inventor
黄建华
康宏
Original Assignee
上海温尔信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海温尔信息科技有限公司
Priority to PCT/CN2017/070190 priority Critical patent/WO2018126367A1/en
Publication of WO2018126367A1 publication Critical patent/WO2018126367A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present application relates to the field of data processing technologies, and in particular, to a data cleaning method and apparatus.
  • a wearable device is a portable device that can be worn directly on the user or integrated into the user's clothing or accessories.
  • the wearable device can continuously collect relevant data of the user and upload the collected data to the server to implement data interaction.
  • the user's health index, behavioral habits, life preferences, and the like can be analyzed.
  • the data can be denoised to remove noise in the data, improve the reliability of the data, and improve the accuracy of the analysis results.
  • much of the data that a wearable device collects from the human body is affected by the complexity of the wearing scenario, by problems in the device itself, or by its connection to a mobile device. Erroneous data may result, such as data loss caused by the device falling off or by communication failures. Erroneous data needs to be eliminated promptly so that it does not affect subsequent analysis and calculation. The inventors recognized that, before the data collected by a wearable device is used, it needs to be thoroughly cleaned to obtain clean and reliable data, thereby improving the accuracy of analysis based on that data.
  • the embodiment of the present application provides a data cleaning method, including:
  • the step of removing the abnormal data includes: setting a time window based on a time domain characteristic of the physical quantity; dividing the data sequence into at least one data segment by using the time window; and removing the abnormal data in each of the at least one data segment.
  • the step of removing the abnormal data in the first data segment of the at least one data segment includes: removing data exceeding the data range in the first data segment; and/or removing data in the first data segment whose volatility is greater than the fluctuation threshold.
  • the step of removing the data beyond the data range includes: calculating a mean and a variance of the data in the first data segment; setting an upper boundary and a lower boundary of the data range according to the mean and the variance; and removing data in the first data segment that is larger than the upper boundary and data that is smaller than the lower boundary.
  • the step of removing the data whose volatility is greater than the fluctuation threshold includes: calculating a differential of the data in the first data segment; and removing data in the first data segment for which the absolute value of the differential is greater than a differential threshold.
  • the step of uniformly processing the continuous data segment includes: identifying, according to the set time interval threshold, the continuous data segment from the data sequence after the abnormal data is removed; and uniformly processing the continuous data segment in time by means of data interpolation.
  • the step of identifying the continuous data segment includes: determining, according to the time interval threshold, a data interruption point in the data sequence after the abnormal data is removed; and identifying a data segment that is delimited by the data interruption point in the data sequence after the abnormal data is removed as the continuous data segment.
  • the step of uniformly processing the continuous data segment includes: determining at least one uniformization time point corresponding to the continuous data segment; and interpolating, from the data in the continuous data segment, the data corresponding to the at least one uniformization time point.
  • the embodiment of the present application further provides a data cleaning device, including:
  • An acquiring unit configured to acquire a data sequence of a physical quantity time domain sample
  • a removing unit configured to remove abnormal data in the data sequence based on a time domain characteristic of the physical quantity
  • a uniform processing unit for uniformly processing contiguous data segments in the data sequence after the abnormal data is removed in time.
  • the removing unit includes: a setting subunit, configured to set a time window based on a time domain characteristic of the physical quantity; a dividing subunit, configured to divide the data sequence into at least one data segment by using the time window; and a removing subunit, configured to remove the abnormal data in each of the at least one data segment.
  • the removing subunit is specifically configured to: remove data exceeding the data range in the first data segment; and/or remove data in the first data segment whose volatility is greater than a fluctuation threshold.
  • the removing subunit is specifically configured to: calculate a mean and a variance of data in the first data segment; set upper and lower boundaries of the data range according to the mean and the variance; and remove data larger than the upper boundary and data smaller than the lower boundary in the first data segment.
  • the removing subunit is specifically configured to: calculate a differential of data in the first data segment; and remove data in the first data segment for which the absolute value of the differential is greater than a differential threshold.
  • the uniform processing unit includes: an identifying subunit, configured to identify the continuous data segment from the data sequence after the abnormal data is removed according to a set time interval threshold; and an interpolation subunit, configured to uniformly process the continuous data segment in time by using data interpolation.
  • the identifying subunit is specifically configured to: determine, according to the time interval threshold, a data interruption point in the data sequence after the abnormal data is removed; and identify a data segment that is delimited by the data interruption point in the data sequence after the abnormal data is removed as the continuous data segment.
  • the interpolation subunit is specifically configured to: determine at least one homogenization time point corresponding to the continuous data segment; and interpolate, by using data in the continuous data segment, the data corresponding to the at least one homogenization time point.
  • the embodiment of the present application further provides a computer storage medium, which stores the following program instructions:
  • a second program instruction configured to remove abnormal data in the data sequence based on a time domain characteristic of the physical quantity
  • the third program instruction is configured to uniformly process the continuous data segments in the data sequence after the abnormal data is removed in time.
  • the embodiment of the present application further provides an electronic device, including:
  • a memory configured to store a computer program
  • a communication interface configured to implement communication between the electronic device and other devices
  • a processor coupled to the memory and the communication interface is configured to execute the computer program for:
  • when removing the abnormal data, the processor is specifically configured to: set a time window based on a time domain characteristic of the physical quantity; divide the data sequence into at least one data segment by using the time window; and remove the abnormal data in each of the at least one data segment.
  • when removing the abnormal data, the processor is specifically configured to: remove data exceeding the data range in the first data segment; and/or remove data in the first data segment whose volatility is greater than the fluctuation threshold.
  • when removing the data that is out of the data range, the processor is specifically configured to: calculate a mean and a variance of the data in the first data segment;
  • when removing the data whose volatility is greater than the fluctuation threshold, the processor is specifically configured to: calculate a differential of the data in the first data segment; and remove data in the first data segment for which the absolute value of the differential is greater than a differential threshold.
  • when uniformly processing the continuous data segment, the processor is specifically configured to: identify the continuous data segment from the data sequence after the abnormal data is removed according to a set time interval threshold; and uniformly process the continuous data segment in time by using data interpolation.
  • when identifying the continuous data segment, the processor is specifically configured to: determine, according to the time interval threshold, a data interruption point in the data sequence after the abnormal data is removed; and identify a data segment that is delimited by the data interruption point in the data sequence after the abnormal data is removed as the continuous data segment.
  • when uniformly processing the continuous data segment, the processor is specifically configured to: determine at least one homogenization time point corresponding to the continuous data segment; and interpolate, by using data in the continuous data segment, the data corresponding to the at least one homogenization time point.
  • the abnormal data is removed based on the time domain characteristics of the physical quantity, and uniformly processed in time.
  • FIG. 1 is a schematic structural diagram of a data cleaning system according to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a data cleaning method according to another embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a data cleaning method according to another embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a data cleaning method according to another embodiment of the present application.
  • FIG. 5 is a schematic flowchart diagram of a data cleaning method according to another embodiment of the present application.
  • FIG. 6 is a schematic flowchart diagram of a data cleaning method according to another embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a data cleaning apparatus according to another embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an electronic device according to another embodiment of the present disclosure.
  • the data cleaning method provided by the embodiment of the present application can be implemented based on the data cleaning system shown in FIG. 1, but is not limited thereto.
  • the data cleaning system includes: a data collection device 10 and a data cleaning device 20; and the data collection device 10 is communicatively coupled to the data cleaning device 20.
  • the data collection device 10 is configured to perform time domain sampling on a physical quantity to obtain sampling data of the physical quantity.
  • the data collection device 10 can directly report the sampled data of the physical quantity to the data cleaning device 20, so that the data cleaning device 20 obtains the data sequence of time domain samples of the physical quantity. Alternatively,
  • the data collection device 10 may store the sampled data of the physical quantity into a database, from which the data cleaning device 20 acquires the data sequence of time domain samples of the physical quantity.
  • the data cleaning device 20 is mainly used to acquire the data sequence of time domain samples of the physical quantity and to clean the data sequence, so as to obtain reliable and accurate data that serves as basic data for subsequent application or analysis.
  • the data collection device 10 and the data cleaning device 20 may be connected by wireless or wired network.
  • the network standard of the mobile network may be any of 2G (GSM), 2.5G (GPRS), 3G (WCDMA, TD-SCDMA, CDMA2000, UMTS), 4G (LTE), 4G+ (LTE+), WiMax, etc., where GSM is the Global System for Mobile Communications, GPRS is the General Packet Radio Service, WCDMA is Wideband Code Division Multiple Access, TD-SCDMA is Time Division-Synchronous Code Division Multiple Access, CDMA2000 is Code Division Multiple Access 2000, UMTS is the Universal Mobile Telecommunications System, and LTE is Long Term Evolution.
  • the physical quantity in this embodiment may be any physical quantity that supports time domain acquisition, and may be, for example, temperature or humidity.
  • the data collection device 10 in this embodiment may be any device capable of performing time domain acquisition on physical quantities, for example, various sensors. Taking the physical quantity as the temperature, especially the body temperature of the human body as an example, the data acquisition device may be a wearable device with a temperature sensor.
  • the data cleaning device 20 in this embodiment may be any device having data storage and data processing functions such as a server, a computer, a tablet computer, a smart terminal, and the like.
  • FIG. 2 is a schematic flowchart diagram of a data cleaning method according to another embodiment of the present application. As shown in FIG. 2, the method includes:
  • the data cleaning device 20 acquires a data sequence of time domain samples of a physical quantity.
  • the data cleaning device 20 can acquire a data sequence formed by the data collection device 10 performing time domain sampling on the physical quantity.
  • the data sequence includes sampled data of the physical quantity at different points in time and the corresponding timestamps.
  • the data collection device 10 performs time domain sampling on the physical quantity and automatically adds a time stamp to each sampled data item. Optionally, the data collection device 10 reports the time-stamped sample data to the data cleaning device 20, so that the data cleaning device 20 obtains the data sequence of time domain samples of the physical quantity. Alternatively, the data collection device 10 stores the time-stamped sample data in a database, from which the data cleaning device 20 obtains the data sequence of time domain samples of the physical quantity.
  • the data collection device 10 starts at a specified point in time and samples the physical quantity at uniform time intervals, without adding time stamps to the sampled data.
  • the data collection device 10 reports the sample data without time stamps to the data cleaning device 20, and the data cleaning device 20 adds time stamps to the sample data to obtain the data sequence of time domain samples of the physical quantity.
  • the data collection device 10 stores the sample data without time stamps in a database. During storage, the sample data is time stamped, so that the data cleaning device 20 can acquire the data sequence of time domain samples of the physical quantity from the database.
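As a minimal sketch of the case where samples arrive without time stamps, the time stamps can be rebuilt from the specified start time and the uniform sampling interval. The function name `add_timestamps`, the start time, and the 10-second interval are illustrative assumptions, not values taken from the application.

```python
from datetime import datetime, timedelta


def add_timestamps(values, start, interval_s):
    """Pair each sample with start + i * interval (hypothetical helper).

    Assumes the samples were taken at uniform intervals beginning at `start`.
    """
    step = timedelta(seconds=interval_s)
    return [(start + i * step, value) for i, value in enumerate(values)]


# Three body temperature readings, assumed to be sampled every 10 seconds.
sequence = add_timestamps([36.5, 36.6, 36.6], datetime(2017, 1, 4, 12, 0, 0), 10)
for stamp, value in sequence:
    print(stamp.isoformat(), value)
```

The result is a list of (timestamp, value) pairs, which is the shape of data sequence the later cleaning steps operate on.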
  • owing to its physical and mathematical nature, a physical quantity generally has certain time domain characteristics. For example, some physical quantities are continuous over a certain time range and change gently, without sudden jumps or rapid changes.
  • for example, human body temperature is continuous and does not jump suddenly; if the collected body temperature data shows a sudden jump, the jump should be attributed to an abnormality in the collection process rather than to an actual jump in the body temperature of the sampled subject.
  • changes in body temperature are also relatively gentle. Generally, body temperature does not change by more than 0.05 degrees per second. If the collected body temperature data changes faster than this, the change should be attributed to an abnormality in the collection process rather than to the body temperature of the measured subject actually changing at that rate.
  • the data cleaning device 20 removes the abnormal data in the data sequence acquired in step 201 based on the time domain characteristics of the physical quantity.
  • the data sequence after the abnormal data is removed includes data conforming to the time domain characteristics of the physical quantity, and the data is reliable and accurate.
  • in step 203, it is taken into account that, after step 202 removes the abnormal data, the data sequence may no longer be continuous or uniform in time and may be inconvenient to use as a whole, while the continuous data segments in the data sequence still have a certain use value.
  • the data cleaning device 20 uniformly processes the continuous data segments in the data sequence after the removal of the abnormal data in time to provide a reliable, temporally continuous and uniform continuous data segment for subsequent use.
  • the continuous data segment refers to a data segment, in the data sequence after the abnormal data is removed, in which the time interval between every pair of adjacent data is less than a preset time interval threshold.
  • the value of the time interval threshold may be different according to the application scenario and the physical quantity. This embodiment does not limit the value of the time interval threshold, and can be adaptively set.
  • the abnormal data in the data sequence is removed based on the time domain characteristics of the physical quantity, and the continuous data segments in the data sequence after the abnormal data is removed are uniformly processed in time. This cleans the time-domain sampled data sequence and finally yields reliable and accurate sampled data, thereby improving the accuracy of analysis based on the sampled data.
  • removing the abnormal data in the data sequence based on the time domain characteristics of the physical quantity may include the following steps:
  • a time window is set that reflects the time domain characteristics of the physical quantity, that is, the way the physical quantity changes over time.
  • for example, when human body temperature is collected continuously, the body temperature generally does not change by more than 0.5 degrees within 3 minutes; based on this characteristic, the time window can be set to 3 minutes. This means that, within 3 minutes of body temperature data, data that changes by more than 0.5 degrees is abnormal data.
  • similarly, when human heart rate is collected continuously, the heart rate generally does not change by more than 15 times within 10 seconds; based on this characteristic, the time window can be set to 10 seconds. This means that, within 10 seconds of heart rate data, data that changes by more than 15 times is abnormal data.
  • in step 2022, based on the time window set in step 2021, the data sequence can be divided into at least one data segment, the time length of each data segment being the length of the time window.
  • the data sequence is divided into at least one data segment by using the time window, and the data segments do not overlap. Further, if the time length of the last data segment in the data sequence is less than the length of the time window, but the ratio of that time length to the time window is greater than or equal to a specified ratio, for example 1/3, the last data segment is kept as a separate data segment. Conversely, if the time length of the last data segment is less than the length of the time window and the ratio of that time length to the time window is less than the specified ratio, for example less than 1/3, the last data segment is merged into the immediately preceding data segment.
  • for example, with a 30-minute time window and a data sequence covering 12:00:00-13:35:00, the data in 12:00:00-12:30:00 can be divided into one data segment, the data in 12:30:00-13:00:00 into one data segment, and the data in 13:00:00-13:30:00 into one data segment; the data of the last 5 minutes is merged into the 13:00:00-13:30:00 data segment.
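The windowing rule above, including the treatment of a short final segment, can be sketched as follows. This is an illustrative sketch only: the function name `split_into_windows`, the representation of samples as (seconds, value) pairs, and the way the tail length is measured (last timestamp minus the start of the last window) are assumptions.

```python
def split_into_windows(samples, window_s, min_tail_ratio=1 / 3):
    """Split time-sorted (t_seconds, value) samples into non-overlapping windows.

    A final segment shorter than min_tail_ratio * window_s is merged into the
    preceding segment; otherwise it is kept as a separate segment.
    """
    if not samples:
        return []
    t0 = samples[0][0]
    buckets = {}
    for t, v in samples:
        buckets.setdefault(int((t - t0) // window_s), []).append((t, v))
    indices = sorted(buckets)
    segments = [buckets[i] for i in indices]
    # Time covered by the last segment, measured from the start of its window.
    tail_len = samples[-1][0] - (t0 + indices[-1] * window_s)
    if len(segments) > 1 and tail_len / window_s < min_tail_ratio:
        tail = segments.pop()
        segments[-1].extend(tail)
    return segments


# One sample per minute for 95 minutes, 30-minute window: the 5-minute tail
# is shorter than 1/3 of the window, so it joins the third segment.
samples = [(60 * i, 36.5) for i in range(96)]
print([len(seg) for seg in split_into_windows(samples, 30 * 60)])
```

With a tail of 15 minutes instead of 5, the ratio is 1/2 and the tail would be kept as a fourth segment.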
  • in step 2023, the abnormal data is removed from the data segments obtained in step 2022.
  • each time a data segment is divided in step 2022, the process proceeds to step 2023 to remove the abnormal data in that data segment, and then returns to step 2022. Alternatively,
  • after all data segments have been divided in step 2022, the process proceeds to step 2023 to remove the abnormal data in each data segment one by one.
  • the abnormal data may be removed in the following manner:
  • the step of removing the data beyond the data range in the first data segment may be: calculating a mean and a variance of the data in the first data segment, denoted μ and σ respectively; setting the upper and lower boundaries of the data range according to the mean and the variance, denoted μ + kσ and μ - kσ respectively; and removing the data in the first data segment that is larger than the upper boundary μ + kσ and the data that is smaller than the lower boundary μ - kσ, that is, retaining only the data in the first data segment that lies between μ - kσ and μ + kσ.
  • k is a coefficient, which can be determined according to the application scenario and the physical quantity.
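A minimal sketch of the range-based removal for a single data segment is given below. The function name `remove_out_of_range`, the coefficient name `k`, and its default of 2.0 are assumptions; the sketch also assumes that the spread term is the population standard deviation, whereas the text speaks of the variance, so the spread measure should be adjusted if a different convention is intended.

```python
from statistics import fmean, pstdev


def remove_out_of_range(segment, k=2.0):
    """Keep only samples whose value lies within [mean - k*s, mean + k*s].

    `segment` is a list of (timestamp, value) pairs from one time window.
    `s` is the population standard deviation (an assumption of this sketch).
    """
    if not segment:
        return []
    values = [v for _, v in segment]
    mu = fmean(values)
    s = pstdev(values)
    lower, upper = mu - k * s, mu + k * s
    return [(t, v) for t, v in segment if lower <= v <= upper]


# Nine plausible body temperature readings and one dropout-like reading.
readings = [36.5, 36.6, 36.5, 36.6, 36.5, 36.6, 36.5, 36.6, 36.5, 30.0]
segment = list(enumerate(readings))
cleaned = remove_out_of_range(segment, k=2.0)
print(len(segment), "->", len(cleaned))
```

Because the boundaries are computed per segment, a single extreme reading widens the range for its own segment; a smaller `k` or shorter windows make the filter stricter.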
  • the step of removing the data in the first data segment whose volatility is greater than the fluctuation threshold may be: calculating a differential of the data in the first data segment; and removing the data in the first data segment for which the absolute value of the differential is greater than the differential threshold.
  • the volatility of the data is represented by the differential, and correspondingly, the fluctuation threshold is embodied by a differential threshold. The absolute value of the differential of every data item in the data sequence can be compared with the differential threshold; differentials that exceed the differential threshold generally appear in clusters.
  • data for which the absolute value of the differential is greater than the differential threshold is data that changes abnormally. For example, it may come from the initial stage of acquisition, from the end of acquisition, or from a moment when contact with the measured subject is lost for some reason (for example, the temperature measuring device falls off). Such data is generally abnormal data.
  • the differential may be calculated as a backward difference: dT(n) = (T(n) - T(n-1)) / (t(n) - t(n-1)), where n is a positive integer, dT(n) represents the differential at the nth time point, T(n) and T(n-1) represent the data at the nth time point and the (n-1)th time point, respectively, and t(n) and t(n-1) represent the nth time point and the (n-1)th time point, respectively.
  • alternatively, the differential may be calculated as a forward difference: dT(n) = (T(n+1) - T(n)) / (t(n+1) - t(n)), where n is a non-negative integer, dT(n) represents the differential at the nth time point, T(n) and T(n+1) represent the data at the nth time point and the (n+1)th time point, respectively, and t(n) and t(n+1) represent the nth time point and the (n+1)th time point, respectively; dT(end) and dT(end-1) represent the differential at the last time point and the differential at the penultimate time point, respectively.
  • alternatively, the differential may be calculated as a central difference: dT(n) = (T(n+1) - T(n-1)) / (t(n+1) - t(n-1)), where n is a positive integer, dT(n) represents the differential at the nth time point, T(n-1) and T(n+1) represent the data at the (n-1)th time point and the (n+1)th time point, respectively, and t(n-1) and t(n+1) represent the (n-1)th time point and the (n+1)th time point, respectively; dT(end) and dT(end-1) represent the differential at the last time point and the differential at the penultimate time point, respectively.
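The differential-based removal can be sketched with a backward difference, dT(n) = (T(n) - T(n-1)) / (t(n) - t(n-1)). The function names, the choice to reuse the first computed differential for the first sample, and the example threshold of 0.05 degrees per second are assumptions of this sketch; timestamps are assumed to be strictly increasing.

```python
def backward_differentials(segment):
    """Backward difference for each (t, value) sample in a time-sorted segment.

    The first sample has no predecessor, so it reuses the differential of the
    second sample (a boundary convention assumed for this sketch).
    """
    if len(segment) < 2:
        return [0.0] * len(segment)
    d = [
        (segment[n][1] - segment[n - 1][1]) / (segment[n][0] - segment[n - 1][0])
        for n in range(1, len(segment))
    ]
    return [d[0]] + d


def remove_high_volatility(segment, threshold):
    """Drop samples whose differential exceeds the threshold in absolute value."""
    rates = backward_differentials(segment)
    return [s for s, rate in zip(segment, rates) if abs(rate) <= threshold]


# One reading per second with a spike at t=3.
segment = [(0, 36.50), (1, 36.51), (2, 36.52), (3, 38.00), (4, 36.53), (5, 36.54)]
cleaned = remove_high_volatility(segment, threshold=0.05)
print([t for t, _ in cleaned])
```

Note that the sample after the spike is removed as well, because the drop back to the normal level also exceeds the threshold. A forward or central difference can be substituted by changing only `backward_differentials`.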
  • abnormal data in the second and third data segments in the at least one data segment may be removed in the same manner as the first data segment, but is not limited thereto.
  • the step of uniformly processing the continuous data segments in time may be: identifying the continuous data segment from the data sequence after the abnormal data is removed according to the set time interval threshold; and uniformly processing the continuous data segment in time by means of data interpolation.
  • the step of identifying the continuous data segment may be: determining, according to the time interval threshold, the data interruption points in the data sequence after the abnormal data is removed; and identifying each data segment delimited by the data interruption points in the data sequence after the abnormal data is removed as a continuous data segment. Specifically, the difference between the timestamps of adjacent data in the data sequence after the abnormal data is removed is compared with the time interval threshold, and adjacent data whose timestamp difference is greater than the time interval threshold is taken as a data interruption point.
  • the data interruption points are used as segmentation points to divide the data sequence into at least one continuous data segment. In each continuous data segment, the difference between the timestamps of adjacent data is less than or equal to the time interval threshold.
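Identifying continuous data segments amounts to cutting the cleaned sequence wherever the gap between adjacent timestamps exceeds the time interval threshold. The following is a sketch under the assumption that samples are time-sorted (t, value) pairs; the function name `split_continuous` and the example threshold are hypothetical.

```python
def split_continuous(samples, gap_threshold):
    """Cut a time-sorted sequence wherever adjacent timestamps differ by more
    than gap_threshold; within each returned segment every gap is <= threshold.
    """
    segments = []
    for sample in samples:
        if segments and sample[0] - segments[-1][-1][0] <= gap_threshold:
            segments[-1].append(sample)
        else:
            # First sample, or a gap larger than the threshold: start a new segment.
            segments.append([sample])
    return segments


# Timestamps in seconds; gaps of 8 s and 19 s exceed a 2-second threshold.
samples = [(0, 36.5), (1, 36.5), (2, 36.6), (10, 36.6), (11, 36.7), (30, 36.7)]
parts = split_continuous(samples, gap_threshold=2)
print([[t for t, _ in part] for part in parts])
```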
  • the step of uniformly processing the continuous data segment may be: determining at least one uniformization time point corresponding to the continuous data segment; and using data in the continuous data segment, interpolating data corresponding to the at least one uniformization time point.
  • if a uniformization time point coincides with the timestamp of some data in the continuous data segment, that data may be directly used as the data corresponding to the uniformization time point; if the uniformization time point does not coincide with the timestamp of any data in the continuous data segment, the data corresponding to the uniformization time point can be obtained by interpolation.
  • the interpolation method may be linear interpolation, spline interpolation, or the like.
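A sketch of the uniformization step using linear interpolation is shown below. It assumes the uniform time points start at the first timestamp of the continuous segment and advance by a fixed step; the function name `resample_uniform` and the step size are illustrative choices, and spline interpolation could be used instead.

```python
def resample_uniform(segment, step):
    """Linearly interpolate a time-sorted (t, value) segment onto the uniform
    grid t0, t0 + step, t0 + 2*step, ... that fits inside the segment.
    """
    if len(segment) < 2:
        return list(segment)
    t_start, t_end = segment[0][0], segment[-1][0]
    out, j, k = [], 0, 0
    while t_start + k * step <= t_end:
        t = t_start + k * step
        # Advance to the pair of samples that brackets t.
        while segment[j + 1][0] < t:
            j += 1
        (ta, va), (tb, vb) = segment[j], segment[j + 1]
        if t == ta:
            out.append((t, va))  # grid point coincides with a sample: reuse it
        elif t == tb:
            out.append((t, vb))
        else:
            out.append((t, va + (vb - va) * (t - ta) / (tb - ta)))
        k += 1
    return out


# Irregular timestamps (seconds) resampled every 2 seconds.
segment = [(0, 36.0), (3, 36.3), (4, 36.4), (10, 37.0)]
for t, v in resample_uniform(segment, step=2):
    print(t, round(v, 3))
```

Grid points that coincide with an existing timestamp reuse the stored value directly, matching the rule stated above; all other grid points are interpolated from the two neighbouring samples.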
  • in the application scenario of collecting human body temperature, a data cleaning method is shown in FIG. 4, including:
  • the wearable device continuously collects human body temperature and stores the collected body temperature data in a database.
  • the wearable device can add a time stamp to the collected body temperature data. Alternatively, the time stamp is added while the data is being stored in the database.
  • the data cleaning device acquires a data sequence corresponding to the body temperature of the human body from the database, and the data sequence includes a series of body temperature data.
  • the data cleaning device sets a time window based on a time domain characteristic of the physical quantity.
  • the data cleaning device divides the data sequence into at least one data segment by using a time window.
  • the data cleaning device calculates a mean and a variance of body temperature data in each of the at least one data segment.
  • the data cleaning device sets upper and lower boundaries of the data range corresponding to each data segment according to the mean and variance of the body temperature data in each data segment.
  • the data cleaning device separately removes body temperature data larger than the upper boundary and body temperature data smaller than the lower boundary in each data segment.
  • the data cleaning device identifies the continuous data segment from the data sequence after the abnormal data is removed according to the set time interval threshold.
  • the data cleaning device adopts a data interpolation manner to uniformly process the continuous data segment in time.
  • in the application scenario of collecting human body temperature, another data cleaning method is shown in FIG. 5, including:
  • the wearable device continuously collects human body temperature and stores the collected body temperature data in a database.
  • the wearable device can add a time stamp to the collected body temperature data. Alternatively, the time stamp is added while the data is being stored in the database.
  • the data cleaning device acquires a data sequence corresponding to the body temperature of the human body from the database, and the data sequence includes a series of body temperature data.
  • the data cleaning device sets a time window based on a time domain characteristic of the physical quantity.
  • the data cleaning device divides the data sequence into at least one data segment by using a time window.
  • the data cleaning device calculates a differential of body temperature data in each of the at least one data segment.
  • the data cleaning device removes, in each data segment, the body temperature data for which the absolute value of the differential is greater than a differential threshold.
  • the data cleaning device identifies the continuous data segment from the data sequence after the abnormal data is removed according to the set time interval threshold.
  • the data cleaning device adopts a data interpolation manner to uniformly process the continuous data segment in time.
  • in the application scenario of collecting human body temperature, another data cleaning method is shown in FIG. 6, including:
  • the wearable device continuously collects human body temperature and stores the collected body temperature data in a database.
  • the wearable device can add a time stamp to the collected body temperature data. Alternatively, the time stamp is added while the data is being stored in the database.
  • the data cleaning device acquires a data sequence corresponding to the body temperature of the human body from the database, and the data sequence includes a series of body temperature data.
  • the data cleaning device sets a time window based on a time domain characteristic of the physical quantity.
  • the data cleaning device divides the data sequence into at least one data segment by using a time window.
  • the data cleaning device calculates a mean and a variance of body temperature data in each of the at least one data segment.
  • the data cleaning device sets upper and lower boundaries of the data range corresponding to each data segment according to the mean and variance of the body temperature data in each data segment.
  • the data cleaning device separately removes body temperature data larger than the upper boundary and body temperature data smaller than the lower boundary in each data segment.
  • the data cleaning device calculates a differential of body temperature data in each of the at least one data segment.
  • the data cleaning device removes, in each data segment, the body temperature data for which the absolute value of the differential is greater than a differential threshold.
  • the data cleaning device identifies the continuous data segment from the data sequence after the abnormal data is removed according to the set time interval threshold.
  • the data cleaning device adopts a data interpolation manner to uniformly process the continuous data segment in time.
  • steps 605-607 and steps 608-609 are not limited to the order described in this embodiment: the operations of steps 608-609 may be performed first, followed by the operations of steps 605-607. Performing steps 605-607 first and then steps 608-609 is a preferred embodiment.
  • in this embodiment, abnormal data is first removed from the data sequence corresponding to the human body temperature based on the time domain characteristics of the human body temperature; the continuous data segments in the data sequence after the abnormal data is removed are then identified and homogenized in time to obtain reliable and accurate body temperature data, which provides a good basis for subsequent analysis based on the body temperature data and helps improve the accuracy of the subsequent analysis results.
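The time-window division used in the steps above (set a time window, then divide the data sequence into data segments) could be sketched as follows. This is only an illustration under stated assumptions: samples are `(timestamp_seconds, value)` tuples sorted by time, and the name `split_by_time_window` and the fixed-length, non-overlapping windows are choices made here, not details given in the application.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, measured value); assumed layout


def split_by_time_window(samples: List[Sample], window_s: float) -> List[List[Sample]]:
    """Divide a time-ordered data sequence into consecutive, non-overlapping
    data segments, each covering at most window_s seconds (hypothetical helper)."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    segments: List[List[Sample]] = []
    if not samples:
        return segments
    start = samples[0][0]          # windows are aligned to the first sample
    current_index = 0
    current: List[Sample] = []
    for t, v in samples:
        index = int((t - start) // window_s)   # which window this sample falls in
        if index != current_index and current:
            segments.append(current)           # close the previous window
            current = []
        current_index = index
        current.append((t, v))
    if current:
        segments.append(current)
    return segments
```

Windows that contain no samples simply produce no segment. The window length would in practice be chosen from the time domain characteristic of the physical quantity, for example how slowly body temperature is expected to drift.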
  • the execution bodies of the steps of the method provided by the foregoing embodiments may all be the same device, or the method may also be performed by different devices.
  • for example, the execution body of steps 201 to 203 may be device A; as another example, the execution body of steps 201 and 202 may be device A and the execution body of step 203 may be device B, and so on.
  • FIG. 7 is a schematic structural diagram of a data cleaning apparatus according to another embodiment of the present application. As shown in FIG. 7, the apparatus includes an acquisition unit 71, a removal unit 72, and a uniform processing unit 73.
  • the acquisition unit 71 is configured to acquire a time-domain sampled data sequence of a physical quantity.
  • the removing unit 72 is configured to remove the abnormal data in the data sequence based on the time domain characteristic of the physical quantity.
  • the uniform processing unit 73 is configured to uniformly process, in time, the continuous data segments in the data sequence after the abnormal data is removed.
  • an implementation structure of the removing unit 72 includes:
  • a removing subunit, configured to separately remove abnormal data in the at least one data segment.
  • the removing subunit is specifically configured to: remove data exceeding the data range in the first data segment; and/or remove data in the first data segment whose volatility is greater than a fluctuation threshold.
  • when removing data exceeding the data range in the first data segment, the removing subunit is specifically configured to: calculate a mean and a variance of the data in the first data segment; set an upper boundary and a lower boundary of the data range according to the mean and the variance; and remove data in the first data segment that is larger than the upper boundary or smaller than the lower boundary.
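A minimal sketch of this range check is shown below. The application only says the boundaries are set "according to the mean and the variance"; the concrete rule used here, mean ± k standard deviations with a tunable multiplier `k`, is an assumption, as are the function name and the `(timestamp, value)` sample layout.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, measured value); assumed layout


def remove_out_of_range(segment: List[Sample], k: float = 3.0) -> List[Sample]:
    """Drop samples outside [mean - k*std, mean + k*std] of one data segment.

    The mean +/- k*std rule is an assumed way of deriving the boundaries
    from the mean and variance; it is not specified in the source text.
    """
    if len(segment) < 2:
        return list(segment)        # too few samples to estimate a spread
    values = [v for _, v in segment]
    mu = mean(values)
    sigma = pstdev(values)          # square root of the population variance
    lower, upper = mu - k * sigma, mu + k * sigma
    return [(t, v) for t, v in segment if lower <= v <= upper]
```

One design caveat: a large outlier inflates the variance of a short segment, so a strict multiplier such as `k = 3` may fail to reject it. A smaller `k`, or a second pass over the cleaned segment, may be needed in practice.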
  • when removing data whose volatility is greater than the fluctuation threshold in the first data segment, the removing subunit is specifically configured to: calculate a differential of the data in the first data segment; and remove data in the first data segment for which the absolute value of the differential is greater than a differential threshold.
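For sampled data, the differential can be approximated by a finite difference between neighbouring samples. The sketch below makes several assumptions that are not in the application: samples are `(timestamp, value)` tuples in time order, the rate of change is computed against the most recently kept sample (so that the sample following a spike is not rejected as well), and the name `remove_high_fluctuation` is hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, measured value); assumed layout


def remove_high_fluctuation(segment: List[Sample], diff_threshold: float) -> List[Sample]:
    """Drop samples whose finite-difference rate of change, measured against
    the last kept sample, exceeds diff_threshold (value units per second)."""
    kept: List[Sample] = []
    for t, v in segment:
        if kept:
            t_prev, v_prev = kept[-1]
            dt = t - t_prev
            if dt <= 0:
                continue            # duplicated or out-of-order timestamp: skipped
            if abs((v - v_prev) / dt) > diff_threshold:
                continue            # changes too fast to be plausible: removed
        kept.append((t, v))
    return kept
```

Because the first sample of a segment is always kept, this sketch relies on that sample being valid; applying the range check before the differential check, as in the preferred order of the embodiment, reduces that risk.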
  • an implementation structure of the uniform processing unit includes:
  • an identifying subunit, configured to identify the continuous data segment from the data sequence after the abnormal data is removed, according to the set time interval threshold;
  • an interpolation subunit, configured to uniformly process the continuous data segment in time by using data interpolation.
  • the identifying subunit is specifically configured to: determine, according to the time interval threshold, data interruption points in the data sequence after the abnormal data is removed; and identify, as the continuous data segments, the data segments into which that data sequence is broken by the data interruption points.
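The identification of continuous data segments can be illustrated with a short sketch. It assumes time-ordered `(timestamp, value)` samples and treats any gap between neighbouring samples that is larger than the time interval threshold as a data interruption point; the function and parameter names are illustrative only.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, measured value); assumed layout


def find_continuous_segments(samples: List[Sample], gap_threshold_s: float) -> List[List[Sample]]:
    """Split a cleaned, time-ordered sequence wherever the gap between two
    neighbouring samples exceeds gap_threshold_s (a data interruption point)."""
    segments: List[List[Sample]] = []
    current: List[Sample] = []
    for t, v in samples:
        if current and t - current[-1][0] > gap_threshold_s:
            segments.append(current)   # interruption found: close the segment
            current = []
        current.append((t, v))
    if current:
        segments.append(current)
    return segments
```

The threshold would typically be some multiple of the nominal sampling interval, so that the gaps left by removed outliers do not split a segment while a device drop-out does.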
  • the interpolation subunit is specifically configured to: determine at least one homogenization time point corresponding to the continuous data segment; and interpolate data corresponding to the at least one homogenization time point by using data in the continuous data segment.
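As a sketch of this step, the homogenization time points can be taken as an evenly spaced grid starting at the first sample of the segment, with values obtained by linear interpolation between the two neighbouring samples. Both choices are assumptions: the application does not fix the grid or the interpolation method. The sketch also assumes strictly increasing timestamps, and `resample_uniform` is a hypothetical name.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, measured value); assumed layout


def resample_uniform(segment: List[Sample], step_s: float) -> List[Sample]:
    """Linearly interpolate one continuous data segment onto an evenly spaced
    time grid t0, t0 + step_s, ... that does not extend past the last sample."""
    if step_s <= 0:
        raise ValueError("step_s must be positive")
    if len(segment) < 2:
        return list(segment)        # nothing to interpolate between
    t0, t_end = segment[0][0], segment[-1][0]
    count = int((t_end - t0) // step_s) + 1    # number of homogenization time points
    out: List[Sample] = []
    j = 0                                      # index of the left neighbour
    for i in range(count):
        t = t0 + i * step_s
        # advance until segment[j] and segment[j + 1] bracket the time point t
        while j < len(segment) - 2 and segment[j + 1][0] < t:
            j += 1
        (ta, va), (tb, vb) = segment[j], segment[j + 1]
        w = (t - ta) / (tb - ta)               # relative position between neighbours
        out.append((t, va + w * (vb - va)))
    return out
```

Interpolation is applied per continuous data segment rather than across data interruption points, so no values are invented for periods in which the device delivered no data.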
  • the data cleaning device provided in this embodiment may be used to perform the process provided by the foregoing method embodiments, and details are not described herein again.
  • the data cleaning device provided in this embodiment takes the data sampling scenario and the physical and mathematical characteristics of the data itself into account: for a time-domain sampled data sequence of a physical quantity, it removes abnormal data based on the time domain characteristic of the physical quantity and uniformly processes, in time, the continuous data segments in the data sequence after the abnormal data is removed. The time-domain sampled data sequence is thereby cleaned and reliable, accurate sampled data is obtained, which improves the accuracy of related analysis based on the sampled data.
  • the data cleaning device can be implemented as an electronic device, including: a memory 81, a processor 82, and a communication interface 83.
  • the memory 81 is configured to store a computer program.
  • the memory 81 can also be configured to store other various data to support operation on the electronic device. Examples of such data include instructions for any application or method operating on an electronic device, contact data, phone book data, messages, pictures, videos, and the like.
  • Memory 81 can be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable programmable read only memory (EPROM), programmable read only memory (PROM), read only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • the communication interface 83 is configured to implement communication between the electronic device and other devices, such as wired or wireless communication.
  • the electronic device can access a wireless network based on a communication standard such as WiFi, 2G or 3G, or a combination thereof.
  • the communication interface 83 receives broadcast signals or broadcast associated information from an external broadcast management system via a broadcast channel.
  • communication interface 83 also includes a near field communication (NFC) module to facilitate short range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • a processor 82 coupled to the memory 81 and the communication interface 83, is configured to execute a computer program in the memory 81 for:
  • when removing the abnormal data, the processor 82 is specifically configured to: set a time window based on the time domain characteristic of the physical quantity; divide the data sequence into at least one data segment by using the time window; and separately remove abnormal data in the at least one data segment.
  • when removing the abnormal data in a first data segment of the at least one data segment, the processor 82 is specifically configured to: remove data in the first data segment that exceeds the data range; and/or remove data in the first data segment whose volatility is greater than a fluctuation threshold.
  • when removing the data that exceeds the data range, the processor 82 is specifically configured to: calculate a mean and a variance of the data in the first data segment; set an upper boundary and a lower boundary of the data range according to the mean and the variance; and remove data in the first data segment that is larger than the upper boundary or smaller than the lower boundary.
  • when removing the data whose volatility is greater than the fluctuation threshold, the processor 82 is specifically configured to: calculate a differential of the data in the first data segment; and remove data in the first data segment for which the absolute value of the differential is greater than a differential threshold.
  • when uniformly processing the continuous data segments, the processor 82 is specifically configured to: identify the continuous data segment from the data sequence after the abnormal data is removed, according to the set time interval threshold; and uniformly process the continuous data segment in time by using data interpolation.
  • when identifying the continuous data segment, the processor 82 is specifically configured to: determine, according to the time interval threshold, data interruption points in the data sequence after the abnormal data is removed; and identify, as the continuous data segments, the data segments into which that data sequence is broken by the data interruption points.
  • when uniformly processing the continuous data segments, the processor 82 is specifically configured to: determine at least one homogenization time point corresponding to the continuous data segment; and interpolate data corresponding to the at least one homogenization time point by using data in the continuous data segment.
  • the electronic device further includes: a display 84, a power supply component 85, an audio component 86, and the like. Only some of the components are schematically illustrated in FIG. 8, which does not mean that the electronic device includes only the components shown in FIG. 8.
  • Display 84 includes a screen, which may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen can be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may sense not only the boundary of the touch or sliding action, but also the duration and pressure associated with the touch or slide operation.
  • the power supply component 85 provides power to the various components of the electronic device.
  • the power supply component 85 can include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device.
  • the audio component 86 is configured to output and/or input an audio signal.
  • the audio component 86 includes a microphone (MIC) that is configured to receive an external audio signal when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode.
  • the received audio signal may be further stored in the memory 81 or transmitted via the communication interface 83.
  • the audio component 86 also includes a speaker for outputting an audio signal.
  • the embodiment of the present application further provides a computer storage medium suitable for a computer program, where the computer storage medium stores the following program instructions:
  • a second program instruction configured to remove abnormal data in the data sequence based on a time domain characteristic of the physical quantity
  • a third program instruction configured to uniformly process, in time, the continuous data segments in the data sequence after the abnormal data is removed.
  • with the computer storage medium provided in this embodiment, the process provided by the foregoing method embodiments can be implemented: the time-domain sampled data sequence is cleaned and reliable, accurate data is obtained, which provides a good basis for subsequent analysis based on the sampled data and helps improve the accuracy of the subsequent analysis results.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture that includes an instruction device, the instruction device implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory, such as read only memory (ROM) or flash memory (flashRAM), in a computer readable medium.
  • Memory is an example of a computer readable medium.
  • Computer readable media include permanent and non-permanent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic tape cartridges, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • as defined herein, computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

A data cleaning method and device. The data cleaning method comprises: obtaining a time-domain sampled data sequence of a physical quantity (201); removing abnormal data in the data sequence according to the time domain characteristic of the physical quantity (202); and uniformly processing, in time, the continuous data segments in the data sequence from which the abnormal data has been removed (203). By means of the method, reliable and accurate time-domain sampled data can be obtained, and the accuracy of related analysis based on the obtained sampled data is improved.

Description

Data cleaning method and device
Technical field
The present application relates to the field of data processing technologies, and in particular to a data cleaning method and device.
Background
A wearable device is a portable device that can be worn directly on the user or integrated into the user's clothing or accessories. A wearable device can continuously collect data about the user and upload the collected data to a server to implement data interaction.
Based on the large amount of data collected by wearable devices, the user's health index, behavioral habits, life preferences, and the like can be analyzed. Before the data collected by a wearable device is used, the data can be denoised to remove noise, improve the reliability of the data, and thereby improve the accuracy of the analysis results.
Summary of the invention
Much of the data that a wearable device collects from the human body is affected by the complex scenarios of the human body, by problems of the device itself, or by connection errors with the mobile device, so some erroneous data may occur, such as data errors caused by the device falling off or data loss caused by device communication. Erroneous data needs to be removed in time so that it does not affect the calculations in further analysis. The inventors realized that, before the data collected by a wearable device is used, the data needs to be thoroughly cleaned to obtain clean and reliable data and to improve the accuracy of related analysis based on the data.
An embodiment of the present application provides a data cleaning method, including:
acquiring a time-domain sampled data sequence of a physical quantity;
removing abnormal data in the data sequence based on a time domain characteristic of the physical quantity;
uniformly processing, in time, the continuous data segments in the data sequence after the abnormal data is removed.
In an optional implementation, the step of removing the abnormal data includes: setting a time window based on the time domain characteristic of the physical quantity; dividing the data sequence into at least one data segment by using the time window; and separately removing abnormal data in the at least one data segment.
In an optional implementation, for a first data segment of the at least one data segment, the step of removing the abnormal data includes: removing data in the first data segment that exceeds a data range; and/or removing data in the first data segment whose volatility is greater than a fluctuation threshold.
In an optional implementation, the step of removing the data that exceeds the data range includes: calculating a mean and a variance of the data in the first data segment; setting an upper boundary and a lower boundary of the data range according to the mean and the variance; and removing data in the first data segment that is larger than the upper boundary or smaller than the lower boundary.
In an optional implementation, the step of removing the data whose volatility is greater than the fluctuation threshold includes: calculating a differential of the data in the first data segment; and removing data in the first data segment for which the absolute value of the differential is greater than a differential threshold.
In an optional implementation, the step of uniformly processing the continuous data segments includes: identifying the continuous data segment from the data sequence after the abnormal data is removed, according to a set time interval threshold; and uniformly processing the continuous data segment in time by using data interpolation.
In an optional implementation, the step of identifying the continuous data segment includes: determining, according to the time interval threshold, data interruption points in the data sequence after the abnormal data is removed; and identifying, as the continuous data segments, the data segments into which that data sequence is broken by the data interruption points.
In an optional implementation, the step of uniformly processing the continuous data segments includes: determining at least one homogenization time point corresponding to the continuous data segment; and interpolating data corresponding to the at least one homogenization time point by using data in the continuous data segment.
An embodiment of the present application further provides a data cleaning apparatus, including:
an acquisition unit, configured to acquire a time-domain sampled data sequence of a physical quantity;
a removing unit, configured to remove abnormal data in the data sequence based on a time domain characteristic of the physical quantity;
a uniform processing unit, configured to uniformly process, in time, the continuous data segments in the data sequence after the abnormal data is removed.
In an optional implementation, the removing unit includes: a setting subunit, configured to set a time window based on the time domain characteristic of the physical quantity; a dividing subunit, configured to divide the data sequence into at least one data segment by using the time window; and a removing subunit, configured to separately remove abnormal data in the at least one data segment.
In an optional implementation, the removing subunit is specifically configured to: remove data in the first data segment that exceeds a data range; and/or remove data in the first data segment whose volatility is greater than a fluctuation threshold.
In an optional implementation, the removing subunit is specifically configured to: calculate a mean and a variance of the data in the first data segment; set an upper boundary and a lower boundary of the data range according to the mean and the variance; and remove data in the first data segment that is larger than the upper boundary or smaller than the lower boundary.
In an optional implementation, the removing subunit is specifically configured to: calculate a differential of the data in the first data segment; and remove data in the first data segment for which the absolute value of the differential is greater than a differential threshold.
In an optional implementation, the uniform processing unit includes: an identifying subunit, configured to identify the continuous data segment from the data sequence after the abnormal data is removed, according to a set time interval threshold; and an interpolation subunit, configured to uniformly process the continuous data segment in time by using data interpolation.
In an optional implementation, the identifying subunit is specifically configured to: determine, according to the time interval threshold, data interruption points in the data sequence after the abnormal data is removed; and identify, as the continuous data segments, the data segments into which that data sequence is broken by the data interruption points.
In an optional implementation, the interpolation subunit is specifically configured to: determine at least one homogenization time point corresponding to the continuous data segment; and interpolate data corresponding to the at least one homogenization time point by using data in the continuous data segment.
An embodiment of the present application further provides a computer storage medium, which stores the following program instructions:
a first program instruction, configured to acquire a time-domain sampled data sequence of a physical quantity;
a second program instruction, configured to remove abnormal data in the data sequence based on a time domain characteristic of the physical quantity;
a third program instruction, configured to uniformly process, in time, the continuous data segments in the data sequence after the abnormal data is removed.
An embodiment of the present application further provides an electronic device, including:
a memory, configured to store a computer program;
a communication interface, configured to implement communication between the electronic device and other devices;
a processor, coupled to the memory and the communication interface and configured to execute the computer program in order to:
acquire, through the communication interface, a time-domain sampled data sequence of a physical quantity;
remove abnormal data in the data sequence based on a time domain characteristic of the physical quantity;
uniformly process, in time, the continuous data segments in the data sequence after the abnormal data is removed.
In an optional implementation, when removing the abnormal data, the processor is specifically configured to: set a time window based on the time domain characteristic of the physical quantity; divide the data sequence into at least one data segment by using the time window; and separately remove abnormal data in the at least one data segment.
In an optional implementation, when removing the abnormal data, the processor is specifically configured to: remove data in the first data segment that exceeds a data range; and/or remove data in the first data segment whose volatility is greater than a fluctuation threshold.
In an optional implementation, when removing the data that exceeds the data range, the processor is specifically configured to: calculate a mean and a variance of the data in the first data segment;
set an upper boundary and a lower boundary of the data range according to the mean and the variance;
remove data in the first data segment that is larger than the upper boundary or smaller than the lower boundary.
In an optional implementation, when removing the data whose volatility is greater than the fluctuation threshold, the processor is specifically configured to: calculate a differential of the data in the first data segment; and remove data in the first data segment for which the absolute value of the differential is greater than a differential threshold.
In an optional implementation, when uniformly processing the continuous data segments, the processor is specifically configured to: identify the continuous data segment from the data sequence after the abnormal data is removed, according to a set time interval threshold; and uniformly process the continuous data segment in time by using data interpolation.
In an optional implementation, when identifying the continuous data segment, the processor is specifically configured to: determine, according to the time interval threshold, data interruption points in the data sequence after the abnormal data is removed; and identify, as the continuous data segments, the data segments into which that data sequence is broken by the data interruption points.
In an optional implementation, when uniformly processing the continuous data segments, the processor is specifically configured to: determine at least one homogenization time point corresponding to the continuous data segment; and interpolate data corresponding to the at least one homogenization time point by using data in the continuous data segment.
In the embodiments of the present application, the data sampling scenario and the physical and mathematical characteristics of the data itself are taken into account: for a time-domain sampled data sequence of a physical quantity, abnormal data is removed based on the time domain characteristic of the physical quantity, and the continuous data segments in the data sequence after the abnormal data is removed are uniformly processed in time. The time-domain sampled data sequence is thereby cleaned and reliable, accurate sampled data is obtained, which improves the accuracy of related analysis based on the sampled data.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present application and constitute a part of the present application. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not constitute an undue limitation on it. In the drawings:
FIG. 1 is a schematic structural diagram of a data cleaning system according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a data cleaning method according to another embodiment of the present application;
FIG. 3 is a schematic flowchart of a data cleaning method according to yet another embodiment of the present application;
FIG. 4 is a schematic flowchart of a data cleaning method according to yet another embodiment of the present application;
FIG. 5 is a schematic flowchart of a data cleaning method according to yet another embodiment of the present application;
FIG. 6 is a schematic flowchart of a data cleaning method according to yet another embodiment of the present application;
FIG. 7 is a schematic structural diagram of a data cleaning apparatus according to yet another embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device according to yet another embodiment of the present application.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
The data cleaning method provided by the embodiments of the present application may be implemented based on the data cleaning system shown in FIG. 1, but is not limited thereto. As shown in FIG. 1, the data cleaning system includes a data collection device 10 and a data cleaning device 20, and the data collection device 10 is communicatively connected to the data cleaning device 20.
The data collection device 10 is configured to perform time-domain sampling on a physical quantity to obtain sampled data of the physical quantity.
Optionally, the data collection device 10 may report the sampled data of the physical quantity directly to the data cleaning device 20, so that the data cleaning device 20 obtains the time-domain-sampled data sequence of the physical quantity; or,
optionally, the data collection device 10 may store the sampled data of the physical quantity in a database, so that the data cleaning device 20 acquires the time-domain-sampled data sequence of the physical quantity from the database.
The data cleaning device 20 is mainly configured to acquire the time-domain-sampled data sequence of the physical quantity and to clean the data sequence, so as to obtain reliable and accurate data that serves as basic data for subsequent applications or analysis.
The data collection device 10 and the data cleaning device 20 may be connected over a wireless or wired network. In this embodiment, if the data collection device 10 is communicatively connected to the data cleaning device 20 over a mobile network, the network standard of the mobile network may be any one of 2G (GSM), 2.5G (GPRS), 3G (WCDMA, TD-SCDMA, CDMA2000, UMTS), 4G (LTE), 4G+ (LTE+), WiMax, and the like. In addition, the data collection device 10 may also be connected to the data cleaning device 20 by wireless communication such as Bluetooth, Wi-Fi, or infrared.
The physical quantity in this embodiment may be any physical quantity that supports time-domain collection, for example temperature or humidity.
Corresponding to the physical quantity, the data collection device 10 in this embodiment may be any device capable of time-domain collection of the physical quantity, for example any of various sensors. Taking temperature, in particular human body temperature, as the physical quantity, the data collection device may be a wearable device with a temperature sensor.
The data cleaning device 20 in this embodiment may be any device having data storage and data processing functions, such as a server, a computer, a tablet computer, or a smart terminal.
With reference to the data cleaning system shown in FIG. 1, the following embodiments describe in detail, from the perspective of the data cleaning device 20, the flow of the data cleaning method provided by the embodiments of the present application.
FIG. 2 is a schematic flowchart of a data cleaning method according to another embodiment of the present application. As shown in FIG. 2, the method includes:
201. Acquire a time-domain-sampled data sequence of a physical quantity.
202. Remove abnormal data from the data sequence based on the time-domain characteristics of the physical quantity.
203. Uniformly process, in time, the continuous data segments in the data sequence from which the abnormal data has been removed.
In step 201, the data cleaning device 20 acquires a time-domain-sampled data sequence of a physical quantity. For example, the data cleaning device 20 may acquire the data sequence formed by the data collection device 10 performing time-domain sampling on the physical quantity. The data sequence includes the sampled data of the physical quantity at different time points and the corresponding timestamps.
In one application scenario, the data collection device 10 performs time-domain sampling on the physical quantity and automatically adds a timestamp to each sample. Optionally, the data collection device 10 reports the timestamped sampled data to the data cleaning device 20, so that the data cleaning device 20 obtains the time-domain-sampled data sequence of the physical quantity. Alternatively, the data collection device 10 stores the timestamped sampled data in a database, so that the data cleaning device 20 acquires the time-domain-sampled data sequence of the physical quantity from the database.
In another application scenario, the data collection device 10 starts at a specified time point and samples the physical quantity at uniform time intervals, and the sampled data carries no timestamps. Optionally, the data collection device 10 reports the sampled data without timestamps to the data cleaning device 20, and the data cleaning device 20 adds the timestamps to the sampled data to obtain the time-domain-sampled data sequence of the physical quantity. Alternatively, the data collection device 10 stores the sampled data without timestamps in a database, and the timestamps are added during storage, so that the data cleaning device 20 acquires the time-domain-sampled data sequence of the physical quantity from the database.
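As an illustration of the second scenario, the following minimal sketch shows how timestamps might be added to uniformly sampled data when the start time and the sampling interval are known. The function name `add_timestamps`, the 10-second interval, and the sample values are assumptions made for the example; they are not taken from the application.

```python
from datetime import datetime, timedelta


def add_timestamps(samples, start, interval_s):
    """Pair the k-th sample with the timestamp start + k * interval_s."""
    step = timedelta(seconds=interval_s)
    return [(start + k * step, value) for k, value in enumerate(samples)]


# Hypothetical readings taken every 10 s starting at 12:00:00.
sequence = add_timestamps([36.5, 36.6, 36.6], datetime(2017, 1, 4, 12, 0, 0), 10)
for timestamp, value in sequence:
    print(timestamp.isoformat(), value)
```

Computing each timestamp as start plus k intervals, rather than adding the interval repeatedly, avoids accumulating rounding error over long sequences.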
In step 202, considering the physical and mathematical characteristics of physical quantities themselves, the inventors of the present application found that a physical quantity generally has certain time-domain characteristics. For example, some physical quantities are continuous within a certain time range and change gently, without sudden jumps or rapid changes. Take human body temperature as an example: under normal conditions, body temperature is continuous and does not jump suddenly. If a suddenly jumping body temperature value appears in the collected data, it reflects an abnormality in the collection process rather than a real jump in the subject's body temperature. Body temperature also changes gently; in general it does not change by more than 0.05 degrees per second. If the collected body temperature data changes faster than this, that too reflects an abnormality in the collection process rather than a real change at that rate.
In the cases listed above the sampled data itself is implausible, which conventional noise filtering methods cannot detect. Based on this finding, the data cleaning device 20 removes the abnormal data from the data sequence acquired in step 201 based on the time-domain characteristics of the physical quantity. The data sequence from which the abnormal data has been removed contains data that conforms to the time-domain characteristics of the physical quantity, and this data is reliable and accurate.
In step 203, removing the abnormal data in step 202 may leave the data sequence no longer continuous or uniform in time and therefore inconvenient to use as a whole, but the continuous data segments within it still have value. Accordingly, the data cleaning device 20 uniformly processes, in time, the continuous data segments in the data sequence from which the abnormal data has been removed, so as to provide reliable, temporally continuous, and uniform data segments for subsequent use. A continuous data segment is a data segment, in the data sequence from which the abnormal data has been removed, in which the time interval between every pair of adjacent data points is less than a preset time interval threshold.
The value of the time interval threshold varies with the application scenario and the physical quantity. This embodiment does not limit the value of the time interval threshold, which may be set as appropriate.
In this embodiment, in view of the data sampling scenario and the physical and mathematical characteristics of the data itself, abnormal data is removed from a time-domain-sampled data sequence of a physical quantity based on the time-domain characteristics of that physical quantity, and the continuous data segments in the data sequence from which the abnormal data has been removed are uniformly processed in time. The time-domain-sampled data sequence is thereby cleaned, yielding reliable and accurate sampled data and improving the accuracy of analysis based on the sampled data.
In the above embodiment or the following embodiments, as shown in FIG. 3, removing the abnormal data from the data sequence based on the time-domain characteristics of the physical quantity may include the following steps:
2021. Set a time window based on the time-domain characteristics of the physical quantity.
2022. Divide the data sequence into at least one data segment using the time window.
2023. Remove the abnormal data from each of the at least one data segment.
In step 2021, a time window is set that reflects the time-domain characteristics of the physical quantity, that is, how the physical quantity changes over time.
For example, when human body temperature is collected continuously, body temperature generally does not change by more than 0.5 degrees within 3 minutes, so the time window may be set to 3 minutes. This means that, among the body temperature data within 3 minutes, data that changes by more than 0.5 degrees is abnormal data.
As another example, when human heart rate is collected continuously, heart rate generally does not change by more than 15 beats within 10 seconds, so the time window may be set to 10 seconds. This means that, among the heart rate data within 10 seconds, data that changes by more than 15 beats is abnormal data.
In step 2022, based on the time window set in step 2021, the data sequence may be divided into at least one data segment, the duration of each data segment being the length of the time window.
Optionally, the time window is used to divide the data sequence into at least one data segment, and the data segments do not overlap one another. Further, if the duration of the last data segment in the data sequence is shorter than the time window but the ratio of its duration to the time window is greater than or equal to a specified ratio, for example greater than 1/3, the last data segment is kept as a separate data segment. Conversely, if the duration of the last data segment is shorter than the time window and the ratio of its duration to the time window is less than the specified ratio, for example less than 1/3, meaning that the last data segment covers less than 1/3 of the required time window, the last data segment is merged into the nearest preceding data segment. For example, if the data sequence covers 12:00:00-13:35:00 and the time window is 30 minutes, the data within 12:00:00-12:30:00 may be taken as one data segment, the data within 12:30:00-13:00:00 as one data segment, and the data within 13:00:00-13:30:00 as one data segment, with the data of the last 5 minutes merged into the 13:00:00-13:30:00 segment.
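The non-overlapping division and the handling of a short final segment described above can be sketched as follows. This is only one possible reading, under stated assumptions: timestamps are plain numbers in seconds, the data is sorted by time, and the function name `split_into_windows` and parameter `min_tail_ratio` are hypothetical.

```python
def split_into_windows(data, window, min_tail_ratio=1 / 3):
    """Split sorted (t, value) pairs into non-overlapping windows.

    A final partial window is kept as its own segment only if it covers at
    least min_tail_ratio of the window; otherwise it is merged into the
    preceding segment.
    """
    if not data:
        return []
    start, end = data[0][0], data[-1][0]
    n_full = int((end - start) // window)        # number of complete windows
    tail = (end - start) - n_full * window       # duration of the remainder
    keep_tail = n_full == 0 or tail / window >= min_tail_ratio
    n_segments = n_full + 1 if keep_tail else n_full
    segments = [[] for _ in range(n_segments)]
    for t, v in data:
        # Samples past the last kept window are merged into it.
        idx = min(int((t - start) // window), n_segments - 1)
        segments[idx].append((t, v))
    return segments


# One sample per minute over 12:00:00-13:35:00, expressed as seconds 0..5700.
data = [(t, 36.5) for t in range(0, 5701, 60)]
segments = split_into_windows(data, window=1800)
print([len(s) for s in segments])
```

With a 30-minute window the trailing 5 minutes is below one third of the window, so it is folded into the third segment, matching the 12:00:00-13:35:00 example in the text.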
In step 2023, the abnormal data is removed from the data segments obtained in step 2022.
Optionally, each time step 2022 produces a data segment, the method proceeds to step 2023 to remove the abnormal data from that data segment and then returns to step 2022; or,
optionally, after step 2022 has produced all the data segments, the method proceeds to step 2023 to remove the abnormal data from the data segments one by one.
For a first data segment among the at least one data segment, the abnormal data in it may be removed in the following manner:
removing data in the first data segment that falls outside a data range; and/or
removing data in the first data segment whose volatility is greater than a fluctuation threshold.
Optionally, the step of removing data that falls outside the data range from the first data segment may be: calculating the mean and variance of the data in the first data segment, denoted μ and σ respectively; setting, according to the mean and variance, the upper and lower boundaries of the data range, denoted μ+ρσ and μ-ρσ respectively; and removing the data in the first data segment that is greater than the upper boundary μ+ρσ or less than the lower boundary μ-ρσ, that is, retaining only the data in the first data segment that lies between the upper boundary μ+ρσ and the lower boundary μ-ρσ. Here ρ is a coefficient that may be chosen according to the application scenario and the physical quantity.
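The range check can be sketched as below. The text refers to σ as the variance; this sketch assumes σ is used as the standard deviation so that the boundaries μ±ρσ have the same unit as the data, which is an interpretation rather than something the application states. The function name `remove_out_of_range` and the value ρ = 2 are also illustrative assumptions.

```python
from statistics import fmean, pstdev


def remove_out_of_range(segment, rho=2.0):
    """Keep only (t, value) pairs with mu - rho*sigma <= value <= mu + rho*sigma."""
    if not segment:
        return []
    values = [v for _, v in segment]
    mu = fmean(values)
    sigma = pstdev(values)  # assumption: sigma taken as the standard deviation
    lower, upper = mu - rho * sigma, mu + rho * sigma
    return [(t, v) for t, v in segment if lower <= v <= upper]


# Nine plausible readings followed by one implausible spike.
segment = [(t, 36.5) for t in range(9)] + [(9, 42.0)]
cleaned = remove_out_of_range(segment)
print(len(segment), "->", len(cleaned))
```

Because the mean and spread are computed per data segment, the boundaries adapt to each window; a smaller ρ removes more data and a larger ρ removes less.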
Optionally, the step of removing data whose volatility is greater than the fluctuation threshold from the first data segment may be: calculating the differentials of the data in the first data segment; and removing the data in the first data segment whose differential has an absolute value greater than a differential threshold. In this optional implementation, the volatility of the data is expressed by the differential and, correspondingly, the fluctuation threshold is expressed by the differential threshold. The absolute values of the differentials of all data in the data sequence may be compared with the differential threshold; differentials exceeding the threshold generally appear in contiguous runs. Data in such runs changes abnormally; it may come, for example, from the start of collection, from the end of collection, or from a loss of the collection object for some reason (for example, the body temperature measuring device falling off), and is generally abnormal data.
The differential may be calculated in several ways, as illustrated below:
For example, one differential formula is: dT(n)=(T(n)-T(n-1))/(t(n)-t(n-1)), dT(1)=dT(2). In this formula, n is a positive integer; dT(n) denotes the differential at the n-th time point; T(n) and T(n-1) denote the data at the n-th and (n-1)-th time points, respectively; and t(n) and t(n-1) denote the n-th and (n-1)-th time points, respectively.
As another example, another differential formula is: dT(n)=(T(n+1)-T(n))/(t(n+1)-t(n)), dT(end)=dT(end-1). In this formula, n is a non-negative integer; dT(n) denotes the differential at the n-th time point; T(n) and T(n+1) denote the data at the n-th and (n+1)-th time points, respectively; t(n) and t(n+1) denote the n-th and (n+1)-th time points, respectively; and dT(end) and dT(end-1) denote the differentials at the last and second-to-last time points, respectively.
As yet another example, a further differential formula is: dT(n)=(T(n+1)-T(n-1))/(t(n+1)-t(n-1)), dT(1)=dT(2), dT(end)=dT(end-1). This formula is a central differential. In this formula, n is a positive integer; dT(n) denotes the differential at the n-th time point; T(n-1) and T(n+1) denote the data at the (n-1)-th and (n+1)-th time points, respectively; t(n-1) and t(n+1) denote the (n-1)-th and (n+1)-th time points, respectively; and dT(end) and dT(end-1) denote the differentials at the last and second-to-last time points, respectively.
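The three differential formulas above, together with the threshold check, can be sketched as follows. The sketch uses 0-based indexing, so dT(1)=dT(2) becomes `d[0] = d[1]`. The function names, the `scheme` parameter, and the sample values are hypothetical; the 0.05 degrees-per-second threshold is the body temperature figure mentioned earlier in the description. Timestamps are assumed to be strictly increasing.

```python
def differentials(t, x, scheme="backward"):
    """Differential at each time point using one of the three formulas."""
    n = len(x)
    if n < 2:
        return [0.0] * n
    d = [0.0] * n
    if scheme == "backward":      # (T(n) - T(n-1)) / (t(n) - t(n-1))
        for i in range(1, n):
            d[i] = (x[i] - x[i - 1]) / (t[i] - t[i - 1])
        d[0] = d[1]
    elif scheme == "forward":     # (T(n+1) - T(n)) / (t(n+1) - t(n))
        for i in range(n - 1):
            d[i] = (x[i + 1] - x[i]) / (t[i + 1] - t[i])
        d[-1] = d[-2]
    elif scheme == "central":     # (T(n+1) - T(n-1)) / (t(n+1) - t(n-1))
        if n < 3:                 # no interior point: fall back to backward
            return differentials(t, x, "backward")
        for i in range(1, n - 1):
            d[i] = (x[i + 1] - x[i - 1]) / (t[i + 1] - t[i - 1])
        d[0], d[-1] = d[1], d[-2]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return d


def remove_fast_changes(t, x, max_rate, scheme="backward"):
    """Keep only points whose absolute differential is at most max_rate."""
    d = differentials(t, x, scheme)
    return [(ti, xi) for ti, xi, di in zip(t, x, d) if abs(di) <= max_rate]


# Hypothetical readings: the sensor falls off after t = 2 s.
t = [0, 1, 2, 3, 4]
x = [36.5, 36.52, 36.5, 34.0, 31.0]
print(remove_fast_changes(t, x, max_rate=0.05))
```

The choice of formula affects which point next to a jump is flagged: the backward form flags the point after a jump, the forward form flags the point before it, and the central form spreads the jump over both neighbors.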
It is worth noting that the abnormal data in the second, third, and other data segments among the at least one data segment may be removed in the same way as for the first data segment, but this is not a limitation.
In the above embodiment or the following embodiments, the step of uniformly processing the continuous data segments in time may be: identifying the continuous data segments in the data sequence from which the abnormal data has been removed according to a set time interval threshold; and uniformly processing the continuous data segments in time by data interpolation.
Optionally, the step of identifying the continuous data segments may be: determining, according to the time interval threshold, the data interruption points in the data sequence from which the abnormal data has been removed; and identifying the data segments separated by the data interruption points in that data sequence as the continuous data segments. Specifically, the difference between the timestamps of adjacent data points in the data sequence from which the abnormal data has been removed may be compared with the time interval threshold. Wherever the timestamp difference between two adjacent data points is greater than the time interval threshold, the position between them is taken as a data interruption point, and the data interruption points are used as split points to divide the data sequence into at least one continuous data segment. Within each continuous data segment, the difference between the timestamps of adjacent data points is less than or equal to the time interval threshold.
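The identification of continuous data segments can be sketched as a single pass over the timestamps. The function name `split_at_gaps`, the numeric timestamps, and the 30-second threshold are assumptions made for the example.

```python
def split_at_gaps(data, max_gap):
    """Split sorted (t, value) pairs wherever adjacent timestamps differ by more than max_gap."""
    segments = []
    for point in data:
        # Extend the current segment if the gap to its last point is small enough.
        if segments and point[0] - segments[-1][-1][0] <= max_gap:
            segments[-1].append(point)
        else:
            segments.append([point])  # a data interruption point starts a new segment
    return segments


# Hypothetical sequence with an 80 s hole left by removed abnormal data.
data = [(0, 36.5), (10, 36.5), (20, 36.6), (100, 36.6), (110, 36.7)]
for segment in split_at_gaps(data, max_gap=30):
    print(segment)
```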
Optionally, the step of uniformly processing the continuous data segment may be: determining at least one uniformization time point corresponding to the continuous data segment; and interpolating, using the data in the continuous data segment, the data corresponding to the at least one uniformization time point.
In a specific implementation, if a uniformization time point coincides with the timestamp of a data point in the continuous data segment, that data point may be used directly as the data corresponding to the uniformization time point. If the uniformization time point does not coincide with the timestamp of any data point in the continuous data segment, the data located before and after the uniformization time point may be used to interpolate the data corresponding to it. The interpolation may be linear interpolation, spline interpolation, or the like.
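A minimal sketch of the uniform processing with linear interpolation follows. It assumes that the uniformization time points start at the first timestamp of the segment and are spaced by a fixed `step`, which the application does not prescribe; the function name `resample_uniform` is hypothetical, and timestamps are assumed to be strictly increasing numbers.

```python
from bisect import bisect_left


def resample_uniform(segment, step):
    """Linearly interpolate sorted (t, value) pairs onto a uniform time grid."""
    if not segment:
        return []
    times = [t for t, _ in segment]
    values = [v for _, v in segment]
    out = []
    k = 0
    while times[0] + k * step <= times[-1]:
        tq = times[0] + k * step
        i = bisect_left(times, tq)          # first index with times[i] >= tq
        if times[i] == tq:
            out.append((tq, values[i]))     # grid point coincides with a sample
        else:
            t0, t1 = times[i - 1], times[i]
            w = (tq - t0) / (t1 - t0)       # position between the two neighbors
            out.append((tq, values[i - 1] + w * (values[i] - values[i - 1])))
        k += 1
    return out


# Hypothetical irregular samples resampled every 5 s.
segment = [(0, 36.0), (4, 36.4), (10, 37.0)]
print(resample_uniform(segment, step=5))
```

Interpolating only within a continuous data segment means values are never invented across a data interruption point, where the neighboring samples may be too far apart to be meaningful.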
In the application scenario of collecting human body temperature, one data cleaning method, as shown in FIG. 4, includes:
401. The wearable device continuously collects human body temperature and stores the collected body temperature data in a database.
Optionally, the wearable device may add timestamps to the collected body temperature data. Alternatively, timestamps are added to the body temperature data while it is being stored in the database.
402. The data cleaning device acquires, from the database, the data sequence corresponding to the human body temperature, the data sequence including a series of body temperature data.
403. The data cleaning device sets a time window based on the time-domain characteristics of the physical quantity.
404. The data cleaning device divides the data sequence into at least one data segment using the time window.
405. The data cleaning device calculates the mean and variance of the body temperature data in each of the at least one data segment.
406. The data cleaning device sets, according to the mean and variance of the body temperature data in each data segment, the upper and lower boundaries of the data range corresponding to that data segment.
407. The data cleaning device removes, from each data segment, the body temperature data greater than the upper boundary and the body temperature data less than the lower boundary.
408. The data cleaning device identifies, according to a set time interval threshold, the continuous data segments in the data sequence from which the abnormal data has been removed.
409. The data cleaning device uniformly processes the continuous data segments in time by data interpolation.
In the application scenario of collecting human body temperature, another data cleaning method, as shown in FIG. 5, includes:
501. The wearable device continuously collects human body temperature and stores the collected body temperature data in a database.
Optionally, the wearable device may add timestamps to the collected body temperature data. Alternatively, timestamps are added to the body temperature data while it is being stored in the database.
502. The data cleaning device acquires, from the database, the data sequence corresponding to the human body temperature, the data sequence including a series of body temperature data.
503. The data cleaning device sets a time window based on the time-domain characteristics of the physical quantity.
504. The data cleaning device divides the data sequence into at least one data segment using the time window.
505. The data cleaning device calculates the differentials of the body temperature data in each of the at least one data segment.
506. The data cleaning device removes, from each data segment, the body temperature data whose differential has an absolute value greater than the differential threshold.
507. The data cleaning device identifies, according to a set time interval threshold, the continuous data segments in the data sequence from which the abnormal data has been removed.
508. The data cleaning device uniformly processes the continuous data segments in time by data interpolation.
In the application scenario of collecting human body temperature, yet another data cleaning method, as shown in FIG. 6, includes:
601. The wearable device continuously collects human body temperature and stores the collected body temperature data in a database.
Optionally, the wearable device may add timestamps to the collected body temperature data. Alternatively, timestamps are added to the body temperature data while it is being stored in the database.
602. The data cleaning device acquires, from the database, the data sequence corresponding to the human body temperature, the data sequence including a series of body temperature data.
603. The data cleaning device sets a time window based on the time-domain characteristics of the physical quantity.
604. The data cleaning device divides the data sequence into at least one data segment using the time window.
605. The data cleaning device calculates the mean and variance of the body temperature data in each of the at least one data segment.
606. The data cleaning device sets, according to the mean and variance of the body temperature data in each data segment, the upper and lower boundaries of the data range corresponding to that data segment.
607. The data cleaning device removes, from each data segment, the body temperature data greater than the upper boundary and the body temperature data less than the lower boundary.
608. The data cleaning device calculates the differentials of the body temperature data in each of the at least one data segment.
609. The data cleaning device removes, from each data segment, the body temperature data whose differential has an absolute value greater than the differential threshold.
610. The data cleaning device identifies, according to a set time interval threshold, the continuous data segments in the data sequence from which the abnormal data has been removed.
611. The data cleaning device uniformly processes the continuous data segments in time by data interpolation.
It should be noted that the execution order of steps 605-607 and steps 608-609 is not limited to the order described in this embodiment; the operations of steps 608-609 may be performed first, followed by the operations of steps 605-607. Performing steps 605-607 first and then steps 608-609 is a preferred implementation.
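How the two removal stages might be chained within one data segment, with the order selectable, can be sketched as follows. This is only an illustration: the function names, the `range_first` flag, ρ = 2, the use of the standard deviation for σ, and the backward differential are all assumptions, and timestamps are assumed to be strictly increasing numbers.

```python
from statistics import fmean, pstdev


def range_filter(segment, rho):
    """Steps 605-607: keep values within mu +/- rho*sigma (sigma as standard deviation)."""
    if not segment:
        return []
    values = [v for _, v in segment]
    mu, sigma = fmean(values), pstdev(values)
    return [(t, v) for t, v in segment if mu - rho * sigma <= v <= mu + rho * sigma]


def rate_filter(segment, max_rate):
    """Steps 608-609: drop points whose backward differential exceeds max_rate."""
    if len(segment) < 2:
        return list(segment)
    rates = [(segment[i][1] - segment[i - 1][1]) / (segment[i][0] - segment[i - 1][0])
             for i in range(1, len(segment))]
    rates.insert(0, rates[0])  # dT(1) = dT(2)
    return [p for p, r in zip(segment, rates) if abs(r) <= max_rate]


def clean_segment(segment, rho=2.0, max_rate=0.05, range_first=True):
    """Apply both filters; range_first=True follows the preferred order."""
    stages = [lambda s: range_filter(s, rho), lambda s: rate_filter(s, max_rate)]
    if not range_first:
        stages.reverse()
    for stage in stages:
        segment = stage(segment)
    return segment


# Nine steady readings and one spike, one sample per second.
segment = [(t, 36.5) for t in range(9)] + [(9, 42.0)]
cleaned = clean_segment(segment)
print(len(segment), "->", len(cleaned))
```

Running the range check first removes gross outliers before differentials are computed, so a single spike is less likely to distort the differentials of its neighbors.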
在上述实施例中,结合人体体温这一个特定物理量,从体温数据本身具备的物理和数学特性考虑,对人体体温对应的数据序列,首先基于人体体温的时域特性去除异常数据,识别去除异常数据后的数据序列中的连续数据段, 在时间上对连续数据段进行均匀化处理,以获得可靠、准确的体温数据,为后续基于体温数据进行各种分析提供良好的基础条件,利于提高后续分析结果的准确性。In the above embodiment, in combination with the specific physical quantity of the human body temperature, considering the physical and mathematical characteristics of the body temperature data itself, the data sequence corresponding to the human body temperature first removes the abnormal data based on the time domain characteristics of the human body temperature, and identifies and removes the abnormal data. a contiguous segment of data in the subsequent data sequence, The continuous data segment is homogenized in time to obtain reliable and accurate body temperature data, which provides a good basic condition for subsequent analysis based on body temperature data, which is beneficial to improve the accuracy of subsequent analysis results.
It should be noted that the steps of the method provided by the above embodiments may all be executed by the same device, or the method may be executed by different devices. For example, steps 201 to 203 may be executed by device A; as another example, steps 201 and 202 may be executed by device A and step 203 by device B; and so on.
FIG. 7 is a schematic structural diagram of a data cleaning apparatus according to yet another embodiment of the present application. As shown in FIG. 7, the apparatus includes an obtaining unit 71, a removing unit 72, and a uniform processing unit 73.
The obtaining unit 71 is configured to obtain a data sequence of time domain samples of a physical quantity.
The removing unit 72 is configured to remove abnormal data from the data sequence based on the time domain characteristics of the physical quantity.
The uniform processing unit 73 is configured to uniformly process, in time, the continuous data segments in the data sequence from which the abnormal data has been removed.
In an optional implementation, one implementation structure of the removing unit 72 includes:
a setting subunit, configured to set a time window based on the time domain characteristics of the physical quantity;
a dividing subunit, configured to divide the data sequence into at least one data segment by using the time window;
a removing subunit, configured to remove the abnormal data from each of the at least one data segment.
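The division performed by the dividing subunit can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `(timestamp, value)` sample layout, the function name `split_by_time_window`, and the choice of fixed-length windows anchored at the first sample are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, sampled value); assumed layout


def split_by_time_window(samples: List[Sample], window_s: float) -> List[List[Sample]]:
    """Divide a time-ordered data sequence into data segments of window_s seconds.

    Windows are anchored at the first timestamp (an assumption); windows that
    contain no samples produce no segment.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    if not samples:
        return []
    start = samples[0][0]
    buckets: Dict[int, List[Sample]] = {}
    for t, v in samples:
        index = int((t - start) // window_s)  # which window this sample falls in
        buckets.setdefault(index, []).append((t, v))
    return [buckets[i] for i in sorted(buckets)]
```

Each returned segment can then be passed independently to the abnormal-data removal described for the removing subunit.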
Further optionally, for a first data segment of the at least one data segment, the removing subunit is specifically configured to: remove the data in the first data segment that exceeds the data range; and/or remove the data in the first data segment whose volatility is greater than a fluctuation threshold.
Further optionally, when removing the data in the first data segment that exceeds the data range, the removing subunit is specifically configured to: calculate the mean and variance of the data in the first data segment; set the upper boundary and lower boundary of the data range according to the mean and variance; and remove the data in the first data segment that is greater than the upper boundary and the data that is less than the lower boundary.
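The range-based removal can be sketched as below. The text only says that the boundaries are set according to the mean and variance, so the specific rule used here (mean plus or minus `k` standard deviations, with `k` as a tunable parameter) is an assumption, as are the function name and the `(timestamp, value)` sample layout.

```python
import statistics
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, sampled value); assumed layout


def remove_out_of_range(segment: List[Sample], k: float = 3.0) -> List[Sample]:
    """Drop samples outside [mean - k*std, mean + k*std] of one data segment.

    The k-sigma rule is an assumed way to derive the boundaries from the
    mean and variance; the source does not fix a formula here.
    """
    if len(segment) < 2:
        return list(segment)  # too few points to estimate a spread
    values = [v for _, v in segment]
    mean = statistics.fmean(values)
    std = statistics.pvariance(values) ** 0.5
    lower, upper = mean - k * std, mean + k * std
    return [(t, v) for t, v in segment if lower <= v <= upper]
```

Because a single extreme value inflates the variance of a short segment, a smaller `k` may be needed when segments contain few samples.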
Further optionally, when removing the data in the first data segment whose volatility is greater than the fluctuation threshold, the removing subunit is specifically configured to: calculate the differential of the data in the first data segment; and remove the data in the first data segment whose differential has an absolute value greater than the differential threshold.
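A sketch of the differential-based removal follows. It assumes the differential is approximated by a finite difference per unit time, measured against the most recently kept sample so that one spike does not also discard the valid sample after it; it also assumes the first sample of the segment is valid. These choices and the function name are illustrative, not taken from the source.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, sampled value); assumed layout


def remove_high_fluctuation(segment: List[Sample], diff_threshold: float) -> List[Sample]:
    """Drop samples whose rate of change exceeds diff_threshold in absolute value.

    The rate is (v - v_prev) / (t - t_prev), taken against the last kept
    sample (an assumed finite-difference approximation of the differential).
    """
    if not segment:
        return []
    kept = [segment[0]]  # assumption: the first sample is trusted
    for t, v in segment[1:]:
        t_prev, v_prev = kept[-1]
        dt = t - t_prev
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        if abs((v - v_prev) / dt) <= diff_threshold:
            kept.append((t, v))
    return kept
```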
In an optional implementation, one implementation structure of the uniform processing unit 73 includes:
an identifying subunit, configured to identify the continuous data segments in the data sequence from which the abnormal data has been removed, according to a set time interval threshold;
an interpolation subunit, configured to use data interpolation to process the continuous data segments uniformly in time.
Further optionally, the identifying subunit is specifically configured to: determine, according to the time interval threshold, the data interruption points in the data sequence from which the abnormal data has been removed; and identify the data segments separated by the data interruption points in that data sequence as the continuous data segments.
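The identification of continuous data segments can be sketched as follows, assuming that a data interruption point is any place where the gap between two adjacent timestamps exceeds the time interval threshold. The function name and the `(timestamp, value)` sample layout are assumptions for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, sampled value); assumed layout


def split_continuous_segments(samples: List[Sample], gap_threshold_s: float) -> List[List[Sample]]:
    """Split a time-ordered sequence wherever adjacent samples are too far apart."""
    segments: List[List[Sample]] = []
    current: List[Sample] = []
    for sample in samples:
        # A gap larger than the threshold marks a data interruption point.
        if current and sample[0] - current[-1][0] > gap_threshold_s:
            segments.append(current)
            current = []
        current.append(sample)
    if current:
        segments.append(current)
    return segments
```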
Further optionally, the interpolation subunit is specifically configured to: determine at least one homogenization time point corresponding to the continuous data segment; and use the data in the continuous data segment to interpolate the data corresponding to the at least one homogenization time point.
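A sketch of the interpolation step is given below. The text only calls for "data interpolation", so linear interpolation is an assumption here, as is placing the homogenization time points on a regular grid of `step_s` seconds that starts at the first sample of the segment. Timestamps are assumed to be strictly increasing.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, sampled value); assumed layout


def resample_uniform(segment: List[Sample], step_s: float) -> List[Sample]:
    """Linearly interpolate one continuous data segment onto evenly spaced time points."""
    if step_s <= 0:
        raise ValueError("step_s must be positive")
    if len(segment) < 2:
        return list(segment)  # nothing to interpolate between
    t0, t_end = segment[0][0], segment[-1][0]
    count = int((t_end - t0) // step_s)  # number of steps that fit in the segment
    result: List[Sample] = []
    j = 0  # index of the left neighbour of the current time point
    for i in range(count + 1):
        t = t0 + i * step_s
        while j < len(segment) - 2 and segment[j + 1][0] < t:
            j += 1
        (ta, va), (tb, vb) = segment[j], segment[j + 1]
        result.append((t, va + (vb - va) * (t - ta) / (tb - ta)))
    return result
```

Applying this to each continuous data segment separately avoids interpolating across a data interruption point.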
The data cleaning apparatus provided in this embodiment can be used to execute the procedures provided by the above method embodiments; the details are not repeated here.
The data cleaning apparatus provided in this embodiment takes the data sampling scenario into account and considers the physical and mathematical characteristics of the data itself. For a data sequence of time domain samples of a physical quantity, it removes abnormal data based on the time domain characteristics of the physical quantity and uniformly processes, in time, the continuous data segments in the data sequence after removal. This cleans the time-domain-sampled data sequence and yields reliable and accurate sampled data, which in turn improves the accuracy of analyses based on the sampled data.
The internal functions and structure of the data cleaning apparatus are described above. As shown in FIG. 8, in practice the data cleaning apparatus may be implemented as an electronic device that includes a memory 81, a processor 82, and a communication interface 83.
The memory 81 is configured to store a computer program.
In addition, the memory 81 may be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phone book data, messages, pictures, videos, and the like.
The memory 81 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The communication interface 83 is configured to implement communication between the electronic device and other devices, for example in a wired or wireless manner.
The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication interface 83 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication interface 83 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The processor 82 is coupled to the memory 81 and the communication interface 83 and is configured to execute the computer program in the memory 81 in order to:
obtain, through the communication interface 83, a data sequence of time domain samples of a physical quantity;
remove abnormal data from the data sequence based on the time domain characteristics of the physical quantity;
uniformly process, in time, the continuous data segments in the data sequence from which the abnormal data has been removed.
In an optional implementation, when removing the abnormal data, the processor 82 is specifically configured to: set a time window based on the time domain characteristics of the physical quantity; divide the data sequence into at least one data segment by using the time window; and remove the abnormal data from each of the at least one data segment.
In an optional implementation, when removing the abnormal data from a first data segment of the at least one data segment, the processor 82 is specifically configured to: remove the data in the first data segment that exceeds the data range; and/or remove the data in the first data segment whose volatility is greater than a fluctuation threshold.
In an optional implementation, when removing the data that exceeds the data range, the processor 82 is specifically configured to: calculate the mean and variance of the data in the first data segment; set the upper boundary and lower boundary of the data range according to the mean and variance; and remove the data in the first data segment that is greater than the upper boundary and the data that is less than the lower boundary.
In an optional implementation, when removing the data whose volatility is greater than the fluctuation threshold, the processor 82 is specifically configured to: calculate the differential of the data in the first data segment; and remove the data in the first data segment whose differential has an absolute value greater than the differential threshold.
In an optional implementation, when uniformly processing the continuous data segments, the processor 82 is specifically configured to: identify the continuous data segments in the data sequence from which the abnormal data has been removed, according to a set time interval threshold; and use data interpolation to process the continuous data segments uniformly in time.
In an optional implementation, when identifying the continuous data segments, the processor 82 is specifically configured to: determine, according to the time interval threshold, the data interruption points in the data sequence from which the abnormal data has been removed; and identify the data segments separated by the data interruption points in that data sequence as the continuous data segments.
In an optional implementation, when uniformly processing a continuous data segment, the processor 82 is specifically configured to: determine at least one homogenization time point corresponding to the continuous data segment; and use the data in the continuous data segment to interpolate the data corresponding to the at least one homogenization time point.
Further, as shown in FIG. 8, the electronic device also includes other components such as a display 84, a power supply component 85, and an audio component 86. FIG. 8 shows only some of the components schematically, which does not mean that the client device includes only the components shown in FIG. 8.
The display 84 includes a screen, which may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation.
The power supply component 85 provides power to the various components of the electronic device. The power supply component 85 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the client device.
The audio component 86 is configured to output and/or input audio signals. For example, the audio component 86 includes a microphone (MIC) that is configured to receive external audio signals when the electronic device is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 81 or sent via the communication interface 83. In some embodiments, the audio component 86 also includes a speaker for outputting audio signals.
An embodiment of the present application further provides a computer storage medium for a computer program, the computer storage medium storing the following program instructions:
a first program instruction for obtaining a data sequence of time domain samples of a physical quantity;
a second program instruction for removing abnormal data from the data sequence based on the time domain characteristics of the physical quantity;
a third program instruction for uniformly processing, in time, the continuous data segments in the data sequence from which the abnormal data has been removed.
When the above program instructions are executed, the procedures provided by the above method embodiments can be carried out, so that the time-domain-sampled data sequence is cleaned and reliable, accurate data is obtained. This provides a good basis for subsequent analyses based on the sampled data and improves the accuracy of the subsequent analysis results.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include non-permanent memory in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The above are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (20)

  1. A data cleaning method, comprising:
    obtaining a data sequence of time domain samples of a physical quantity;
    removing abnormal data from the data sequence based on the time domain characteristics of the physical quantity;
    uniformly processing, in time, the continuous data segments in the data sequence from which the abnormal data has been removed.
  2. The method according to claim 1, wherein the step of removing the abnormal data comprises:
    setting a time window based on the time domain characteristics of the physical quantity;
    dividing the data sequence into at least one data segment by using the time window;
    removing the abnormal data from each of the at least one data segment.
  3. The method according to claim 2, wherein, for a first data segment of the at least one data segment, the step of removing the abnormal data comprises:
    removing the data in the first data segment that exceeds the data range; and/or
    removing the data in the first data segment whose volatility is greater than a fluctuation threshold.
  4. The method according to claim 3, wherein the step of removing the data that exceeds the data range comprises:
    calculating the mean and variance of the data in the first data segment;
    setting the upper boundary and lower boundary of the data range according to the mean and variance;
    removing the data in the first data segment that is greater than the upper boundary and the data that is less than the lower boundary.
  5. The method according to claim 3, wherein the step of removing the data whose volatility is greater than the fluctuation threshold comprises:
    calculating the differential of the data in the first data segment;
    removing the data in the first data segment whose differential has an absolute value greater than a differential threshold.
  6. The method according to any one of claims 1-5, wherein the step of uniformly processing the continuous data segments comprises:
    identifying the continuous data segments in the data sequence from which the abnormal data has been removed, according to a set time interval threshold;
    using data interpolation to process the continuous data segments uniformly in time.
  7. The method according to claim 6, wherein the step of identifying the continuous data segments comprises:
    determining, according to the time interval threshold, the data interruption points in the data sequence from which the abnormal data has been removed;
    identifying the data segments separated by the data interruption points in the data sequence from which the abnormal data has been removed as the continuous data segments.
  8. The method according to claim 6, wherein the step of uniformly processing the continuous data segments comprises:
    determining at least one homogenization time point corresponding to the continuous data segment;
    using the data in the continuous data segment to interpolate the data corresponding to the at least one homogenization time point.
  9. A data cleaning apparatus, comprising:
    an obtaining unit, configured to obtain a data sequence of time domain samples of a physical quantity;
    a removing unit, configured to remove abnormal data from the data sequence based on the time domain characteristics of the physical quantity;
    a uniform processing unit, configured to uniformly process, in time, the continuous data segments in the data sequence from which the abnormal data has been removed.
  10. The apparatus according to claim 9, wherein the removing unit comprises:
    a setting subunit, configured to set a time window based on the time domain characteristics of the physical quantity;
    a dividing subunit, configured to divide the data sequence into at least one data segment by using the time window;
    a removing subunit, configured to remove the abnormal data from each of the at least one data segment.
  11. The apparatus according to claim 10, wherein, for a first data segment of the at least one data segment, the removing subunit is specifically configured to:
    remove the data in the first data segment that exceeds the data range; and/or
    remove the data in the first data segment whose volatility is greater than a fluctuation threshold.
  12. The apparatus according to any one of claims 9-11, wherein the uniform processing unit comprises:
    an identifying subunit, configured to identify the continuous data segments in the data sequence from which the abnormal data has been removed, according to a set time interval threshold;
    an interpolation subunit, configured to use data interpolation to process the continuous data segments uniformly in time.
  13. The apparatus according to claim 12, wherein the identifying subunit is specifically configured to:
    determine, according to the time interval threshold, the data interruption points in the data sequence from which the abnormal data has been removed;
    identify the data segments separated by the data interruption points in the data sequence from which the abnormal data has been removed as the continuous data segments.
  14. The apparatus according to claim 12, wherein the interpolation subunit is specifically configured to:
    determine at least one homogenization time point corresponding to the continuous data segment;
    use the data in the continuous data segment to interpolate the data corresponding to the at least one homogenization time point.
  15. A computer storage medium, wherein the computer storage medium stores the following program instructions:
    a first program instruction for obtaining a data sequence of time domain samples of a physical quantity;
    a second program instruction for removing abnormal data from the data sequence based on the time domain characteristics of the physical quantity;
    a third program instruction for uniformly processing, in time, the continuous data segments in the data sequence from which the abnormal data has been removed.
  16. An electronic device, comprising:
    a memory, configured to store a computer program;
    a communication interface, configured to implement communication between the electronic device and other devices;
    a processor, coupled to the memory and the communication interface and configured to execute the computer program in order to:
    obtain, through the communication interface, a data sequence of time domain samples of a physical quantity;
    remove abnormal data from the data sequence based on the time domain characteristics of the physical quantity;
    uniformly process, in time, the continuous data segments in the data sequence from which the abnormal data has been removed.
  17. 根据权利要求16所述的电子设备,其特征在于,所述处理器在去除所述异常数据时,具体用于:The electronic device according to claim 16, wherein the processor is specifically configured to: when the abnormal data is removed:
    基于所述物理量的时域特性,设置时间窗口;Setting a time window based on a time domain characteristic of the physical quantity;
    利用所述时间窗口,将所述数据序列划分为至少一个数据片段;Using the time window, dividing the data sequence into at least one data segment;
    分别去除所述至少一个数据片段中的异常数据。The abnormal data in the at least one data segment is separately removed.
  18. 根据权利要求16或17所述的电子设备,其特征在于,所述处理器在均匀处理所述连续数据段时,具体用于:The electronic device according to claim 16 or 17, wherein the processor is specifically configured to: when uniformly processing the continuous data segment:
    根据设定的时间间隔阈值,从所述去除异常数据后的数据序列中,识别所述连续数据段;Identifying the continuous data segment from the data sequence after the abnormal data is removed according to the set time interval threshold;
    采用数据插值方式,在时间上,均匀处理所述连续数据段。The data segmentation method is used to uniformly process the continuous data segments in time.
  19. 根据权利要求18所述的电子设备,其特征在于,所述处理器在识别所述连续数据段时,具体用于:The electronic device according to claim 18, wherein the processor is configured to: when identifying the continuous data segment:
    根据所述时间间隔阈值,确定所述去除异常数据后的数据序列中的数据中断点;Determining, according to the time interval threshold, a data interruption point in the data sequence after the abnormal data is removed;
    识别所述去除异常数据后的数据序列中,被所述数据中断点断开的数据片段,作为所述连续数据段。A data segment that is disconnected by the data interruption point in the data sequence after the abnormal data is removed is identified as the continuous data segment.
  20. The electronic device according to claim 18, wherein, when uniformly processing the continuous data segments, the processor is specifically configured for:
    determining at least one homogenization time point corresponding to the continuous data segments; and
    interpolating, from the data in the continuous data segments, data corresponding to the at least one homogenization time point.
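
The device claims above describe three steps: removing abnormal data window by window, identifying continuous data segments from a time interval threshold, and interpolating data at homogenization time points. The following Python sketch illustrates that flow under stated assumptions. The function names, the `(timestamp, value)` tuple layout, the median-deviation rule used to flag abnormal data, and the use of linear interpolation are all hypothetical choices made for illustration; the claims do not specify them.

```python
from statistics import median

# Assumed layout: (timestamp in seconds, measured value), sorted by strictly
# increasing timestamp.
Sample = tuple[float, float]


def remove_abnormal(samples: list[Sample], window: float, max_dev: float) -> list[Sample]:
    """Divide the sequence into fixed time windows and drop outliers in each.

    Assumption: a sample is "abnormal" if it deviates from the median of its
    window by more than max_dev. This rule is a stand-in, not the patent's.
    """
    if not samples:
        return []
    start = samples[0][0]
    windows: dict[int, list[Sample]] = {}
    for t, v in samples:
        windows.setdefault(int((t - start) // window), []).append((t, v))
    cleaned: list[Sample] = []
    for key in sorted(windows):
        chunk = windows[key]
        mid = median(v for _, v in chunk)
        cleaned.extend((t, v) for t, v in chunk if abs(v - mid) <= max_dev)
    return cleaned


def split_continuous(samples: list[Sample], max_gap: float) -> list[list[Sample]]:
    """Break the sequence wherever two neighbours are more than max_gap apart.

    Each such gap is treated as a data interruption point; the pieces between
    interruption points are the continuous data segments.
    """
    segments: list[list[Sample]] = []
    for sample in samples:
        if segments and sample[0] - segments[-1][-1][0] <= max_gap:
            segments[-1].append(sample)
        else:
            segments.append([sample])
    return segments


def homogenize(segment: list[Sample], step: float) -> list[Sample]:
    """Linearly interpolate one segment onto time points spaced step apart."""
    if len(segment) < 2:
        return list(segment)  # nothing to interpolate between
    t0, t_end = segment[0][0], segment[-1][0]
    out: list[Sample] = []
    i = 0  # index of the left sample of the bracketing pair
    k = 0  # index of the current homogenization time point
    while t0 + k * step <= t_end:
        t = t0 + k * step
        while i < len(segment) - 2 and segment[i + 1][0] < t:
            i += 1
        (ta, va), (tb, vb) = segment[i], segment[i + 1]
        out.append((t, va + (vb - va) * (t - ta) / (tb - ta)))
        k += 1
    return out
```

Interpolating each continuous segment on its own, rather than the whole sequence at once, avoids fabricating values across an interruption such as a period when the device was detached.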
PCT/CN2017/070190 2017-01-04 2017-01-04 Data cleaning method and device WO2018126367A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/070190 WO2018126367A1 (en) 2017-01-04 2017-01-04 Data cleaning method and device

Publications (1)

Publication Number Publication Date
WO2018126367A1 (en) 2018-07-12

Family

ID=62788934

Country Status (1)

Country Link
WO (1) WO2018126367A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102332042A (en) * 2011-09-13 2012-01-25 东南大学 A modeling method for the start-up model of a quartz flexible accelerometer
US20120084323A1 (en) * 2010-10-02 2012-04-05 Microsoft Corporation Geographic text search using image-mined data
CN102609501A (en) * 2012-02-02 2012-07-25 北京华电天仁电力控制技术有限公司 Data cleaning method based on real-time historical database
CN105719019A (en) * 2016-01-21 2016-06-29 华南理工大学 Public bicycle peak time demand prediction method considering user reservation data
CN105740627A (en) * 2016-01-29 2016-07-06 深圳市奋达科技股份有限公司 Heart rate calculating method and device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210091866A1 (en) * 2015-07-17 2021-03-25 Feng Zhang Method, apparatus, and system for accurate wireless monitoring
US11770197B2 (en) * 2015-07-17 2023-09-26 Origin Wireless, Inc. Method, apparatus, and system for accurate wireless monitoring
CN111625413A (en) * 2020-04-23 2020-09-04 平安科技(深圳)有限公司 Index abnormality analysis method, index abnormality analysis device and storage medium
CN111522806A (en) * 2020-04-26 2020-08-11 陈文海 Big data cleaning and processing method, device, server and readable storage medium
CN111522806B (en) * 2020-04-26 2023-07-07 上海聚均科技有限公司 Big data cleaning processing method, device, server and readable storage medium

Legal Events

Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 17890037; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
32PN EP: public notification in the EP bulletin as address of the addressee cannot be established. Free format text: Noting of loss of rights pursuant to Rule 112(1) EPC, EPO Form 1205A dated 28.10.19
122 EP: PCT application non-entry in European phase. Ref document number: 17890037; Country of ref document: EP; Kind code of ref document: A1