Detailed Description
      The large volume of data, the complexity of interactions between health indicators and other factors, and limited clinical guidance may limit the effectiveness of any monitoring system that attempts to detect anomalies in continuous and/or streaming sensor data through specific rules based on conventional medical practice. Embodiments described herein include apparatuses, systems, methods, and platforms that can detect anomalies in a time series of health indicator data, alone or in combination with other factor data (as defined herein), in an unsupervised manner using a predictive machine learning model.
      Atrial fibrillation (AF or AFib) occurs in 1-2% of the general population, and the presence of AF increases the risk of morbidity and adverse outcomes such as stroke and heart failure. Boriani G. and Pastori D., Atrial Fibrillation Burden and Atrial Fibrillation Type: Clinical Significance and Impact on the Risk of Stroke and Decision Making for Long-term Anticoagulation, Vascul Pharmacol., 83:26-35 (Aug. 2016), at page 26. In many people (estimated to be up to 40% of AF patients), AFib may be asymptomatic, and these asymptomatic patients have a risk profile for stroke and heart failure similar to that of symptomatic patients. See id. Symptomatic patients, however, may take active measures (such as taking blood thinners or other medications) to reduce the risk of negative outcomes. Asymptomatic AF (so-called silent AF or SAF), and the length of time a patient is in AF, can be detected using cardiac implantable electrical devices (CIEDs). Id. From this information, the time these patients spend in AF, or AF burden, can be determined. Id. An AF burden greater than 5-6 minutes, and particularly greater than 1 hour, is associated with a significantly increased risk of stroke and other negative health outcomes. Id. Thus, the ability to measure AF burden in asymptomatic patients may enable early intervention and may reduce the risk of negative health outcomes associated with AF. Id. Detection of SAF is challenging and typically requires some form of continuous monitoring. Currently, continuous monitoring for AF requires bulky, sometimes invasive, and expensive devices, and such monitoring requires a high level of supervision and review by medical professionals.
      Many devices continuously obtain data to provide a measurement or calculation of health indicator data. Such devices include, but are not limited to, smartphones, tablet computers, and other devices in the class of wearable and/or mobile devices. Other devices include permanent or semi-permanent devices on or in the user/patient (e.g., Holter monitors), while still others may include larger devices that are moved on a cart within a hospital. However, little is done with the measurement data other than periodically viewing it on a display or setting simple data thresholds. The data, even when observed by trained medical professionals, may often appear normal, the major exception being when the user has easily identifiable acute symptoms. It is difficult, and practically impossible, for medical professionals to continuously monitor health indicators in order to observe anomalies and/or trends in the data that may indicate a more serious condition.
      As used herein, a platform includes one or more custom software applications (or "applications") configured to interact with each other locally or over a distributed network including the cloud and the internet. An application of a platform as described herein is configured to collect and analyze user data and may include one or more software models. In some embodiments of the platform, the platform includes one or more hardware components (e.g., one or more sensing devices or microprocessors). In some embodiments, the platform is configured to operate with one or more devices and/or one or more systems. That is, in some embodiments, the devices as described herein are configured to run applications of the platform using the built-in processor, and in some embodiments, the platform is utilized by a system that includes one or more computing devices that interact with or run one or more applications of the platform.
      Systems, methods, devices, software, and platforms are described for continuously monitoring user data (e.g., without limitation, PPG signals, heart rate, or blood pressure) from a user device in conjunction with temporally corresponding data related to factors that may affect a health indicator (referred to herein as "other factors"), in order to determine whether the user's health is normal as judged by or compared to, for example and without limitation, i) a population of individuals affected by similar other factors, or ii) the user himself or herself when affected by similar other factors. In some embodiments, the measured health indicator data is input into a trained machine learning model, alone or in combination with other factor data, wherein the machine learning model determines a probability that the user's measured health indicator is within a healthy range, and the user is notified if the measured health indicator is considered not to be within the healthy range. A user who is not within the healthy range may be more likely to be experiencing a health event (such as a cardiac arrhythmia, which may be symptomatic or asymptomatic) that requires high fidelity information to confirm a diagnosis. The notification may take the form of, for example, a request that the user obtain an ECG. Other high fidelity measurements (blood pressure, pulse oximetry, etc.) may be requested, ECG being just one example. The high fidelity measurement (an ECG in this embodiment) may be evaluated by algorithms and/or medical professionals to make a notification or diagnosis (collectively referred to herein as a "diagnosis," recognizing that only a physician can make a diagnosis). In the ECG example, the diagnosis may be AFib or any of a number of other well known conditions diagnosed using ECG.
      In further embodiments, the diagnosis is used to label a low fidelity data sequence (e.g., heart rate or PPG), which may include other factor data sequences. The low fidelity data sequence labeled with the high fidelity diagnosis is used to train a high fidelity machine learning model. In these further embodiments, the high fidelity machine learning model may be trained by unsupervised learning, or may be updated from time to time with new training examples. In some embodiments, the user's measured low fidelity health indicator data sequence, and optionally a temporally corresponding data sequence of other factors, is input into the trained high fidelity machine learning model to determine the probability and/or a prediction that the user is experiencing or has experienced the diagnosed condition for which the high fidelity machine learning model was trained. Such probabilities may include probabilities of when an event starts and when it ends. For example, some embodiments may calculate the user's atrial fibrillation (AF) burden, that is, the amount of time the user spends in AF over a period of time. Previously, AF burden could only be determined using cumbersome and expensive Holter monitors or implantable continuous ECG monitoring devices. Thus, some embodiments described herein may continuously monitor the health of a user and notify the user of a change in health by continuously monitoring health indicator data (e.g., without limitation, PPG data, blood pressure data, heart rate data, etc.) obtained from a device worn by the user, alone or in combination with corresponding data for other factors. As used herein, "other factors" include any factor that may affect a health indicator and/or that may affect the data representing a health indicator (e.g., PPG data).
These other factors may include, but are not limited to, air temperature, altitude, exercise level, weight, gender, diet, standing, sitting, falling, lying down, weather, and BMI. In some embodiments, mathematical or empirical models other than machine learning models may be used to determine when to prompt a user to obtain a high fidelity measurement, which may then be analyzed and used to train a high fidelity machine learning model as described herein.
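      As an illustration of the AF burden calculation mentioned above, the following minimal sketch sums the time spent in AF from per-segment AF probabilities such as a trained model might output. The function name, the fixed segment length, and the 0.5 probability cutoff are assumptions made for this example only and are not taken from the embodiments.

```python
from typing import Dict, Sequence


def af_burden(af_probabilities: Sequence[float],
              segment_seconds: float,
              probability_cutoff: float = 0.5) -> Dict[str, float]:
    """Summarize time in AF from per-segment AF probabilities.

    Assumption: each probability covers one fixed-length, non-overlapping
    segment, and a segment counts as AF when its probability reaches the cutoff.
    """
    flagged = [p >= probability_cutoff for p in af_probabilities]
    seconds_in_af = sum(flagged) * segment_seconds
    total_seconds = len(flagged) * segment_seconds

    # Longest uninterrupted run of flagged segments (a single AF episode).
    longest = current = 0
    for is_af in flagged:
        current = current + 1 if is_af else 0
        longest = max(longest, current)

    return {
        "seconds_in_af": seconds_in_af,
        "fraction_in_af": seconds_in_af / total_seconds if total_seconds else 0.0,
        "longest_episode_seconds": longest * segment_seconds,
    }


# Five one-minute segments, three of which are flagged as AF.
summary = af_burden([0.1, 0.9, 0.8, 0.2, 0.7], segment_seconds=60.0)
```

      Reporting the longest episode alongside the total may be useful because the burden figures discussed above are expressed as durations (minutes or hours in AF).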
      Some embodiments described herein may detect anomalies in a user by receiving a primary time series of health indicator data, optionally receiving a secondary time series of one or more other factor data corresponding in time to the primary time series of health indicator data, which may be from a sensor or from an external data source (e.g., via a network connection, computer API, etc.), providing the primary time series and the secondary time series to a pre-processor, which may perform operations on the data such as filtering, caching, averaging, time alignment, buffering, up-sampling, down-sampling, etc., providing the time series of data to a machine learning model trained and/or configured to utilize values of the primary time series and the secondary time series to predict a next value of the primary time series at a future time, comparing a predicted primary time series value generated by the machine learning model at a particular time t to the primary time series at the time t, and in the event that a difference between the predicted future time series and the measured time series exceeds a threshold or criteria, alerting or prompting the user.
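      The predict, compare, and alert flow described above can be sketched as follows. This is a minimal illustration only: `naive_predictor` is a hypothetical stand-in for the trained machine learning model (it simply averages recent values and ignores the secondary series), and the function names, warm-up length, and threshold are assumptions of this example.

```python
from typing import Callable, List, Sequence, Tuple


def naive_predictor(primary_history: Sequence[float],
                    secondary_history: Sequence[float]) -> float:
    """Stand-in for a trained model: mean of the last three primary values."""
    recent = primary_history[-3:]
    return sum(recent) / len(recent)


def detect_anomalies(primary: Sequence[float],
                     secondary: Sequence[float],
                     predict: Callable[[Sequence[float], Sequence[float]], float],
                     threshold: float,
                     warmup: int = 3) -> List[Tuple[int, float, float]]:
    """Return (time index, predicted value, measured value) for each alert."""
    alerts = []
    for t in range(warmup, len(primary)):
        # Predict the value at time t from data strictly before t.
        predicted = predict(primary[:t], secondary[:t])
        # Compare the prediction with the measurement at the same time t.
        if abs(predicted - primary[t]) > threshold:
            alerts.append((t, predicted, primary[t]))
    return alerts


heart_rate = [60.0, 61.0, 60.0, 61.0, 60.0, 120.0, 61.0]   # primary series
step_rate = [0.0] * len(heart_rate)                        # secondary series
alerts = detect_anomalies(heart_rate, step_rate, naive_predictor, threshold=20.0)
```

      In this example only the jump to 120 at index 5 is flagged. A real model would also use the secondary series, so that a similar jump during exercise might not exceed the threshold.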
      Thus, some embodiments described herein detect when the observed behavior of the primary sequence of physiological data, with respect to the passage of time and/or in response to the observed secondary sequence of data, differs from what would be expected given the training examples used to train the model. The system may be used as an anomaly detector when the training examples are collected from normal individuals or from data previously categorized as normal for a particular user. If the data is acquired only from a particular user without any other classification, the system may be used as a change detector to detect changes in the health indicator data measured by the primary sequence relative to the time at which the training data was captured.
      Software platforms, systems, apparatus, and methods are described herein for generating a trained machine learning model and using the model to predict or determine the probability that measured health indicator data (primary sequence) of a user affected by other factors (secondary sequence) is outside the normal limits of a healthy population affected by similar other factors (i.e., a global model), or outside the normal limits of the particular user affected by similar other factors (i.e., a personalized model), wherein a notification of such a determination is provided to the user. In some embodiments, the user may be prompted to obtain additional measured high fidelity data that may be used to label previously acquired low fidelity user health indicator data, in order to generate a different, trained high fidelity machine learning model that can predict or diagnose anomalies or events using only the low fidelity health indicator data, where such anomalies are typically identified or diagnosed only with high fidelity data.
      Some embodiments described herein may include inputting the user's health indicator data, and optionally temporally corresponding data of other factors, into a trained machine learning model, where the trained machine learning model predicts the user's health indicator data, or a probability distribution of the health indicator data, at a future time step. In some embodiments, the prediction is compared to the user's measured health indicator data at the predicted time step, wherein if the absolute value of the difference exceeds a threshold, the user is notified that his or her health indicator data is outside a normal range. In some embodiments, the notification may include a diagnosis or an instruction to take some action, such as, but not limited to, obtaining additional measurements or contacting a health professional. In some embodiments, the machine learning model is trained using health indicator data and temporally corresponding other factor data from a healthy population. It should be appreciated that the other factor data in the training examples used to train the machine learning model need not be population averages; rather, the data for each of the other factors corresponds in time to the health indicator data collected for the individual in the training example.
      Some embodiments are described as receiving discrete data points over time, predicting discrete data points at a future time from the input, and then determining whether a loss between the discrete measurement input at the future time and the predicted value at the future time exceeds a threshold. Those skilled in the art will readily appreciate that the input data and output predictions may take forms other than discrete data points or scalar quantities. For example, but not limited to, a health indicator data sequence (also referred to herein as a primary sequence) and other data sequences (also referred to herein as secondary sequences) may be divided into time segments. Those skilled in the art will recognize that the manner in which data is segmented is a matter of design choice and can take many different forms.
      Some embodiments segment the health indicator data sequence (also referred to herein as a primary sequence) and the other data sequences (also referred to herein as secondary sequences) into two segments: a past segment representing all data before a particular time t, and a future segment representing all data at or after time t. These embodiments input the health indicator data sequence of the past segment and all other data sequences of the past segment into a machine learning model configured to predict the most likely future segment (or a distribution of likely future segments) of the health indicator data. Optionally, these embodiments input the health indicator data sequence of the past segment, all other data sequences of the past segment, and the other data sequences of the future segment into a machine learning model configured to predict the most likely future segment (or a distribution of likely future segments) of the health indicator data. The predicted future segment of health indicator data is compared to the user's measured health indicator data for the future segment to determine a loss and whether the loss exceeds a threshold, in which case some action is taken. The action may include, for example and without limitation, prompting the user to obtain additional data (e.g., ECG or blood pressure), prompting the user to contact a health professional, or automatically triggering the acquisition of additional data. Automatic acquisition of additional data may include, for example and without limitation, ECG acquisition via a sensor operatively coupled (by wire or wirelessly) to a computing device worn by the user, or blood pressure acquisition via a mobile cuff that surrounds the wrist or other suitable body part of the user and is coupled to the computing device worn by the user. A data segment may include a single data point, a number of data points over a period of time, or an average of the data points over the period of time, where the average may be a mean, a median, or a mode.
In some embodiments, the segments may overlap in time.
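      The segmentation just described can be sketched as below. The helper names are hypothetical, and mean squared error is only one possible choice of loss; the embodiments do not prescribe a particular loss function.

```python
import numpy as np


def split_past_future(series, t):
    """Past = all samples before index t; future = samples at or after t."""
    series = np.asarray(series, dtype=float)
    return series[:t], series[t:]


def segment_loss(predicted_future, measured_future):
    """Mean squared error between a predicted and a measured future segment."""
    predicted = np.asarray(predicted_future, dtype=float)
    measured = np.asarray(measured_future, dtype=float)
    return float(np.mean((predicted - measured) ** 2))


def overlapping_segments(series, length, step):
    """Fixed-length segments; they overlap in time whenever step < length."""
    series = np.asarray(series, dtype=float)
    return [series[i:i + length]
            for i in range(0, len(series) - length + 1, step)]


past, future = split_past_future([1, 2, 3, 4, 5], t=3)
loss = segment_loss([4.0, 5.0], [4.0, 7.0])
windows = overlapping_segments(range(6), length=4, step=2)
```

      An action would be taken when `loss` exceeds the chosen threshold; how the threshold is set is left open here.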
      These embodiments detect when the behavior or measurement of the health indicator data sequence observed with respect to the passage of time, as affected by the corresponding other factor data sequence, differs from the behavior or measurement expected from the training examples collected under similar other factors. If the training examples are collected from healthy individuals under similar other factors, or from data previously categorized as healthy for a particular user under similar other factors, these embodiments serve as anomaly detectors relative to a healthy population or to the particular user, respectively. If the training examples are obtained only from a particular user without any other classification, these embodiments act as a change detector that detects changes in the health indicator of the particular user at the time of measurement relative to the time at which the training examples were collected.
      Some embodiments described herein utilize machine learning to continuously monitor a person's health indicator under the influence of one or more other factors, and to evaluate whether the person is healthy relative to a population categorized as healthy under the influence of similar other factors. As will be readily appreciated by those skilled in the art, a variety of different machine learning algorithms or models (including, but not limited to, Bayesian, Markov, Gaussian process, clustering, generative, kernel, and neural network algorithms) may be used without departing from the scope described herein. As will be appreciated by those skilled in the art, a typical neural network employs one or more layers of, for example and without limitation, nonlinear activation functions to predict an output for a received input, and may include one or more hidden layers in addition to the input and output layers. In some of these networks, the output of each hidden layer serves as the input to the next layer in the network. Examples of neural networks include, but are not limited to, generative neural networks, convolutional neural networks, and recurrent neural networks.
      Some embodiments of the health monitoring system monitor an individual's heart rate and activity data as low fidelity data (e.g., heart rate or PPG data) and detect conditions (e.g., AFib) that are typically detected using high fidelity data (e.g., ECG data). For example, the individual's heart rate may be provided by a sensor continuously or at discrete intervals (such as every five seconds). The heart rate may be determined from a PPG sensor, a pulse oximeter, or another sensor. In some embodiments, activity data may be generated as a number of steps taken, a sensed amount of movement, or other data points indicative of activity level. The low fidelity (e.g., heart rate) data and the activity data may then be input into a machine learning system to determine a prediction of the high fidelity result. For example, the machine learning system may use the low fidelity data to predict arrhythmias or other indications of the user's heart health. In some embodiments, the machine learning system may use a segment of the input data to determine the prediction. For example, one hour of activity level data and heart rate data may be input into the machine learning system, which may then use this data to generate predictions of conditions such as atrial fibrillation. Various embodiments of the invention are discussed in more detail below.
      Referring to fig. 1A, a trained convolutional neural network (CNN) 100 (one example of a feed-forward network) takes input data 102 (e.g., a picture of a ship) into convolutional layers (also known as hidden layers) 103, applying a series of trained weights or filters 104 to the input data 102 in each convolutional layer 103. The output of the first convolutional layer is an activation map (not shown), which is the input to the second convolutional layer, to which trained weights or filters (not shown) are applied, and the output of each subsequent convolutional layer is an activation map that represents increasingly complex features of the input data to the first layer. After each convolutional layer, a non-linear layer (not shown) is applied to introduce non-linearity, wherein the non-linear layer may include tanh, sigmoid, or ReLU. In some cases, a pooling layer (not shown) (also referred to as a downsampling layer) may be applied after the non-linear layer, where the pooling layer takes a filter and a stride of the same length, applies them to its input, and outputs the maximum value in each sub-region over which the filter is applied. Other options for pooling are average pooling and L2-norm pooling. The pooling layer reduces the spatial dimension of the input volume, thereby reducing computational cost and controlling overfitting. The last layer of the network is a fully connected layer, which takes the output of the last convolutional layer and outputs an n-dimensional output vector representing the quantity to be predicted (e.g., probabilities of image classification: 20% car, 75% ship, 5% bus, and 0% bicycle), i.e., yields the predicted output 106 (O x), which may indicate, for example, that the input is a picture of a ship. Alternatively, the output may be a scalar data point, such as a stock price, that the network is predicting. As described more fully below, the trained weights 104 may be different for each convolutional layer 103.
To achieve such real world prediction/detection (e.g., that the input is a ship), the neural network must be trained on known data inputs, or training examples, resulting in the trained CNN 100. To train the CNN 100, many different training examples (e.g., many pictures of ships) are input into the model. Those skilled in the art of neural networks will understand that the above description is a somewhat simplified view of CNNs, offered to provide context for the present discussion, and that any CNN, used alone or in combination with other neural networks, is equally applicable and within the scope of some embodiments described herein.
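      To make the layer sequence concrete, the following minimal sketch applies one convolution, a ReLU non-linearity, and max pooling to a one-dimensional signal (such as a short health indicator sequence) rather than to an image. The filter values are hypothetical and stand in for trained weights; the function names are assumptions of this example.

```python
import numpy as np


def conv1d_valid(signal, kernel):
    """Sliding dot product of the filter over the signal (no padding)."""
    n_out = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i + len(kernel)], kernel)
                     for i in range(n_out)])


def relu(x):
    """Non-linear layer: negative activations are set to zero."""
    return np.maximum(x, 0.0)


def max_pool1d(x, size):
    """Pooling with filter length equal to stride: keep the max of each block."""
    usable = len(x) - len(x) % size          # drop any incomplete final block
    return x[:usable].reshape(-1, size).max(axis=1)


signal = np.array([1.0, 2.0, 4.0, 3.0, 1.0, 0.0, 2.0, 5.0])
kernel = np.array([-1.0, 0.0, 1.0])          # hypothetical "trained" filter

activation_map = conv1d_valid(signal, kernel)   # [3, 1, -3, -3, 1, 5]
features = max_pool1d(relu(activation_map), size=2)
```

      Here `features` evaluates to `[3.0, 0.0, 5.0]`: the pooled output is half the length of the activation map, which is the dimensionality reduction described above.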
      Fig. 1B illustrates training a CNN 108. In fig. 1B, the convolutional layers 103 are shown as individual hidden convolutional layers 105, 105' up to convolutional layer 105 n-1, and the final nth layer is a fully connected layer. It should be understood that there may be more than one fully connected layer at the end. A training example 111 is input into the convolutional layers 103, and a nonlinear activation function (not shown) and weights 110, 110' through 110 n are applied successively to the training example 111, where the output of any hidden layer is the input to the next layer, and so on, until the final nth fully connected layer 105 n produces an output 114. The output or prediction 114 is compared to the training example 111 (e.g., a picture of a ship), yielding a difference 116 between the output or prediction 114 and the training example 111. If the difference or loss 116 is less than some preset loss (e.g., the output or prediction 114 predicts that the object is a ship), the CNN has converged and is considered trained. If the CNN has not converged, the weights 110 and 110' through 110 n are updated, using a back propagation technique, according to how close the prediction is to the known input. Those skilled in the art will appreciate that methods other than back propagation may be used to adjust the weights. A second training example (e.g., a picture of a different ship) is then input and the process is repeated with the updated weights, the weights are updated again, and so on until the nth training example (e.g., an nth picture of an nth ship) has been input. This process is repeated iteratively over the same n training examples until the CNN is trained, that is, until it converges to the correct output for the known inputs. Once the CNN 108 is trained, the weights 110, 110' through 110 n are fixed and used in the trained CNN 100 (i.e., as the weights 104 depicted in fig. 1A).
As explained, each convolutional layer 103 and each fully connected layer has different weights. The trained CNN 100, or model, is then fed image data to determine or predict what it was trained to predict/identify (e.g., a ship), as described above. Any trained model (CNN, RNN, etc.) may be further trained, i.e., its weights may be allowed to change, using additional training examples or using prediction data output by the model that is then used as a training example. The machine learning model may be trained "off-line," e.g., on a computing platform separate from the platform that uses/executes the trained model, and then transferred to the platform that uses/executes the trained model. Alternatively, embodiments described herein may update the machine learning model periodically or continuously based on newly acquired training data. Such update training may be performed on a separate computing platform that delivers the updated model over a network connection to the platform that uses/executes the retrained model, or the training/retraining/updating process may be performed on the platform that uses/executes the retrained model itself as new data is acquired. Those skilled in the art will appreciate that CNNs are applicable to data in fixed arrays (e.g., pictures, characters, words, etc.) or to time series of data. For example, a CNN may be used to model serialized health indicator data and other factor data. Some embodiments utilize a feed-forward CNN with skip connections and a Gaussian mixture model output to determine a probability distribution of a predicted health indicator (e.g., heart rate, PPG, or arrhythmia).
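      The train-until-converged loop described above can be illustrated with a deliberately small stand-in. The sketch below trains a single linear layer by gradient descent, not a CNN, and stops when the loss falls below a preset value, after which the weights are treated as fixed. The learning rate, preset loss, and data are assumptions chosen for this example.

```python
import numpy as np


def train_until_converged(inputs, targets, learning_rate=0.1,
                          preset_loss=1e-4, max_epochs=5000):
    """Repeat over the same training examples until loss < preset_loss."""
    rng = np.random.default_rng(0)
    weights = rng.normal(scale=0.1, size=inputs.shape[1])
    loss = float("inf")
    for _ in range(max_epochs):
        predictions = inputs @ weights
        errors = predictions - targets
        loss = float(np.mean(errors ** 2))
        if loss < preset_loss:
            break                      # converged: weights are now fixed
        # Gradient of the mean squared error with respect to the weights
        # (for one linear layer this is all that back propagation computes).
        gradient = 2.0 * inputs.T @ errors / len(targets)
        weights -= learning_rate * gradient
    return weights, loss


inputs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
targets = inputs @ np.array([2.0, -1.0])     # known outputs for known inputs
weights, final_loss = train_until_converged(inputs, targets)
```

      In a deep network the same loop applies, with the gradient propagated backward through every layer instead of computed in one step.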
      Some embodiments may utilize other types and configurations of neural networks. The number of convolutional layers and the number of fully-connected layers may be increased or decreased. In general, the optimal number and ratio of convolutional layers to fully-connected layers can be set experimentally by determining which configuration provides the best performance for a given data set. The number of convolutional layers can be reduced to 0, leaving a fully-connected network. The number of convolution filters and the width of each filter may also be increased or decreased.
      The output of the neural network may be a single scalar value corresponding to an exact prediction of the primary time series. Alternatively, the output of the neural network may be a logistic regression in which each class corresponds to a particular range or category of primary time series values; these are just some of the many alternative outputs that would be readily understood by those skilled in the art.
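      One way to read the class-per-range output is sketched below, using a softmax (multinomial logistic regression) over value ranges. The heart rate ranges and logits are hypothetical, and the clipping of out-of-range values into the end classes is a design choice of this example rather than something stated in the embodiments.

```python
import numpy as np

# Hypothetical heart rate ranges in beats per minute: 40-60, 60-80, 80-100, 100-120.
bin_edges = np.array([40.0, 60.0, 80.0, 100.0, 120.0])


def value_to_class(value):
    """Index of the range containing value; out-of-range values go to the end classes."""
    index = np.digitize(value, bin_edges) - 1
    return int(np.clip(index, 0, len(bin_edges) - 2))


def softmax(logits):
    """Convert raw network outputs into class probabilities."""
    exponentials = np.exp(logits - np.max(logits))   # shift for numerical stability
    return exponentials / exponentials.sum()


def most_likely_range(logits):
    """Return the (low, high) range of the most probable class."""
    k = int(np.argmax(softmax(logits)))
    return float(bin_edges[k]), float(bin_edges[k + 1])


probabilities = softmax(np.array([0.1, 2.0, 0.3, -1.0]))
predicted_range = most_likely_range(np.array([0.1, 2.0, 0.3, -1.0]))
```

      During training, `value_to_class` would turn each measured value into the target class, and the predicted probabilities would be compared against that class.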
      In some embodiments, the Gaussian mixture model output is intended to constrain the network to learn well-formed probability distributions and to improve generalization from limited training data. In some embodiments, multiple components are used in the Gaussian mixture model to allow the model to learn multi-modal probability distributions. A machine learning model that combines or aggregates the results of different neural networks may also be used.
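      A Gaussian mixture output can be evaluated as in the following sketch, where the mixture weights, means, and standard deviations are assumed to be produced by the network for a given input. The parameter values and function names here are hypothetical; the negative log-likelihood is shown because it is a common loss for such outputs, not because the embodiments specify it.

```python
import math


def mixture_density(value, weights, means, stds):
    """Probability density of value under a one-dimensional Gaussian mixture."""
    return sum(
        w * math.exp(-0.5 * ((value - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def negative_log_likelihood(value, weights, means, stds):
    """Loss for a measured value: large when the model finds it improbable."""
    return -math.log(mixture_density(value, weights, means, stds))


# Hypothetical bimodal prediction: resting (about 60 bpm) or exercising (about 120 bpm).
weights, means, stds = [0.5, 0.5], [60.0, 120.0], [5.0, 5.0]
loss_near_mode = negative_log_likelihood(60.0, weights, means, stds)
loss_between_modes = negative_log_likelihood(90.0, weights, means, stds)
```

      With two components the model can assign high probability to both 60 and 120 while treating 90 as unlikely, which a single Gaussian centered between the modes could not do.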
      Machine learning models that carry an updatable memory or state from previous predictions over to subsequent predictions are another approach to modeling serialized data. In particular, some embodiments described herein utilize recurrent neural networks. Referring to the example of fig. 2A, a diagram of a trained recurrent neural network (RNN) 200 is shown. The trained RNN 200 has an updatable state (S) 202 and trained weights (W) 204. Input data 206 is input into the state 202, where the weights (W) 204 are applied, and a prediction 206 (P) is output. In contrast to a feed-forward neural network (e.g., CNN 100), the state 202 is updated based on the input data, serving as memory of the previous state for the next prediction on the next data in the sequence. The updating of the state gives the RNN its recurrent, or looping, character. To illustrate this, fig. 2B shows the trained RNN 200 unrolled and its suitability for serialized data. When unrolled, the RNN resembles a CNN, but in the unrolled RNN each apparently separate layer is in fact the same single layer with an updated state, and the same weights are applied in each iteration of the loop. Those skilled in the art will appreciate that the single layer may itself have sublayers, but for clarity of explanation a single layer is described herein. Input data (I_t) 208 at time t is input to the state (S_t) 210 at time t, and the trained weights 204 are applied within cell (C_t) 212 at time t. The output of C_t 212 is a prediction for time step t+1 and the updated state S_{t+1} 216. Similarly, in C_{t+1} 220, I_{t+1} 218 is input into S_{t+1} 216, the same trained weights 204 are applied, and the output of C_{t+1} 220 is a prediction for the following time step. As described above, S_{t+1} is updated from S_t, so S_{t+1} carries memory of S_t from the previous time step. For example, and without limitation, the memory may include previous health indicator data or previous other factor data from one or more previous time steps.
The process continues for n steps, where I_{t+n} 224 is input into S_{t+n} 226 and the same weights 204 are applied. The output of cell C_{t+n} is a prediction. Notably, the state is updated from the previous time step at each step, giving the RNN the benefit of memory of previous states. For some embodiments, this feature makes RNNs a good choice for making predictions on serialized data. Nevertheless, as described above, other suitable machine learning techniques, including CNNs, exist for making such predictions on serialized data.
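      The unrolled recurrence can be sketched as a loop that reuses one set of weights at every time step. This is a minimal illustration with random, untrained weights and a tanh state update, both of which are assumptions of the example; inputs are assumed to be normalized so that the tanh does not saturate.

```python
import numpy as np


def rnn_step(input_t, state_t, w_in, w_state, w_out):
    """One cell: update the state from the input, then predict the next value."""
    state_next = np.tanh(w_in @ input_t + w_state @ state_t)
    prediction = w_out @ state_next
    return prediction, state_next


def run_unrolled(inputs, state, w_in, w_state, w_out):
    """Apply the same weights at each step, carrying the state forward."""
    predictions = []
    for input_t in inputs:
        prediction, state = rnn_step(input_t, state, w_in, w_state, w_out)
        predictions.append(prediction)
    return np.array(predictions), state


rng = np.random.default_rng(0)
w_in = rng.normal(scale=0.5, size=(4, 2))      # input: [health indicator, other factor]
w_state = rng.normal(scale=0.5, size=(4, 4))   # state-to-state weights
w_out = rng.normal(scale=0.5, size=(1, 4))     # state-to-prediction weights

# Two sequences that end with the same input but have different histories.
sequence_a = np.array([[0.2, 0.0], [0.4, 1.0], [0.1, 0.0]])
sequence_b = np.array([[-0.5, 1.0], [0.9, 0.0], [0.1, 0.0]])
predictions_a, state_a = run_unrolled(sequence_a, np.zeros(4), w_in, w_state, w_out)
predictions_b, state_b = run_unrolled(sequence_b, np.zeros(4), w_in, w_state, w_out)
```

      Although both sequences end with the same input, their final predictions differ because the state carries memory of the earlier inputs.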
      Like CNNs, RNNs can process strings of data as input and output predicted strings of data. A simple way to explain this aspect of RNNs is the example of natural language prediction. Take the phrase "the sky is blue." The string of words (i.e., data) has context. Thus, as the state is updated, the string of data carried forward from one iteration to the next provides the context needed to predict "blue." As just described, an RNN has a memory component that assists in making predictions on serialized data. However, the memory in the updated state of an RNN may be limited in how far back it can reach, similar to short-term memory. When predictions on serialized data require a longer look-back (similar to long-term memory), a refinement of the RNN just described may be used. Consider, again as a simple example, a sentence in which the word to be predicted is not clear from the immediately preceding or surrounding words: "Mary speaks fluent French." From the immediately preceding words it is clear only that the name of some language is the correct prediction, but not which language. A Long Short-Term Memory (LSTM) network is a special kind of RNN that is capable of learning these longer-term dependencies.
      As described above, RNNs have a relatively simple repeating structure; for example, they may include a single layer with a nonlinear activation function (e.g., tanh or sigmoid). An LSTM similarly has a chain-like structure but has, for example, four neural network layers instead of one. These additional neural network layers give the LSTM the ability to remove information from, or add information to, the state (S) by using structures called cell gates. Id. Fig. 3 shows a cell 300 for an LSTM RNN. Line 302 represents the cell state (S) and can be thought of as an information highway along which information can flow relatively easily and unchanged. Id. Cell gates 304, 306, and 308 determine how much information is allowed through to the state, or along the information highway. Cell gate 304, the so-called forget gate layer, first decides how much information to remove from the cell state S_t. Id. Next, cell gates 306 and 306' determine which information is to be added to the cell state, and cell gates 308 and 308' determine what information is to be output from the cell state as a prediction. The information highway, or cell state, is now the updated cell state S_{t+1} for use in the next cell. The LSTM allows an RNN to have more persistent, or longer-term, memory. Compared with simpler RNN structures, the LSTM gives RNN-based machine learning models the additional advantage that output predictions take into account context that is separated from the input data by a greater distance in space or time, depending on how the data is serialized.
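      The gate arithmetic of a standard LSTM cell is sketched below. This follows the commonly published LSTM formulation rather than any implementation specific to the embodiments; the parameter dictionary and its key names are assumptions of this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_cell_step(input_t, hidden_t, cell_state_t, params):
    """One LSTM cell: four layers acting on the previous output and new input."""
    combined = np.concatenate([hidden_t, input_t])
    # Forget gate: how much of the old cell state to keep (0 = drop, 1 = keep).
    forget = sigmoid(params["w_forget"] @ combined + params["b_forget"])
    # Input gate and candidate values: what new information to add.
    write = sigmoid(params["w_input"] @ combined + params["b_input"])
    candidate = np.tanh(params["w_candidate"] @ combined + params["b_candidate"])
    # Output gate: what part of the cell state to expose as this step's output.
    expose = sigmoid(params["w_output"] @ combined + params["b_output"])

    cell_state_next = forget * cell_state_t + write * candidate
    hidden_next = expose * np.tanh(cell_state_next)
    return hidden_next, cell_state_next


# Hypothetical parameters: 2 inputs, 3 state units, all weights zero.
# A large forget bias and a very negative input bias leave the state untouched.
shape = (3, 5)
params = {
    "w_forget": np.zeros(shape), "b_forget": np.full(3, 20.0),
    "w_input": np.zeros(shape), "b_input": np.full(3, -20.0),
    "w_candidate": np.zeros(shape), "b_candidate": np.zeros(3),
    "w_output": np.zeros(shape), "b_output": np.zeros(3),
}
old_state = np.array([0.5, -1.0, 2.0])
hidden, new_state = lstm_cell_step(np.array([0.3, 0.7]), np.zeros(3), old_state, params)
```

      With the forget gate near one and the input gate near zero, `new_state` is essentially equal to `old_state`, which is the "information highway" behavior: information can pass along the cell state unchanged for as many steps as the gates allow.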
      In some embodiments utilizing RNNs, the primary and secondary time series may not be provided as vectors to the RNN at each time step. Instead, only the current values of the primary and secondary time series, and the future values or aggregate functions of the secondary time series within the prediction interval, are provided to the RNN. In this way, the RNN uses the persistent state vector to retain information about previous values for use in making predictions.
      Machine learning is well suited to continuously monitoring one or more criteria to identify anomalies or trends, of varying magnitude, in input data relative to the training examples used to train the model. Thus, some embodiments described herein input the user's health index data, and optionally other factor data, into a trained machine learning model that predicts what the health index data of a healthy person would look like at the next time step, and compare the prediction to the user's measured health index data at that future time step. If the absolute value of the difference (e.g., the loss described below) exceeds a threshold, the user is notified that his or her health indicator data is not within a normal or healthy range. The threshold is a number set by the designer and, in some embodiments, may be altered by the user to adjust the notification sensitivity. The machine learning model of these embodiments may be trained with health index data from a healthy population, alone or in combination with (temporally) corresponding other factor data, or may be trained with other training examples to meet the design requirements of the model.
      The data from the health index (e.g. heart rate data) is serialized data, more particularly time serialized data. The heart rate may be measured in a number of different ways, for example, but not limited to, measuring an electrical signal from the chest strap or derived from the PPG signal. Some embodiments obtain a heart rate derived from the device, wherein the data points (e.g., heart rate) are generated at approximately equal intervals (e.g., 5 seconds). But in some cases and in other embodiments the derived heart rate is not provided in approximately equal time steps, for example because the data required for the derivation is unreliable (e.g. because the device is moving or because the PPG signal is unreliable due to light pollution). The same is true for data sequences obtained from motion sensors or other sensors used to collect other factor data.
      The raw signal/data (electrical signal from the ECG, chest strap or PPG signal) itself is a time series of data that may be used according to some embodiments. For clarity, and not limitation, PPG is used herein to refer to data representing health indicators. Those skilled in the art will readily appreciate that forms of health indicator data, raw data, waveforms, or numbers derived from raw data or waveforms may be used in accordance with some embodiments described herein.
      Machine learning models that may be used with the embodiments described herein include, for example, but are not limited to, Bayesian models, Markov models, Gaussian processes, clustering algorithms, generative models, kernel methods, and neural network algorithms. Some embodiments utilize a machine learning model based on a trained neural network, other embodiments utilize a recurrent neural network, and additional embodiments use an LSTM RNN. For clarity, and not limitation, some embodiments of the present specification will be described using recurrent neural networks.
      Figs. 4A to 4C show hypothetical plots of PPG (Fig. 4A), steps taken (Fig. 4B), and air temperature (Fig. 4C) versus time. PPG is an example of health index data, whereas steps, activity level, and air temperature are examples of other factor data that may affect the health index data. Those skilled in the art will appreciate that other factor data may be obtained from any of a number of known sources including, but not limited to, accelerometer data, GPS data, weight scales, user input, etc., and may include, but is not limited to, air temperature, activity (running, walking, sitting, cycling, falling, climbing stairs, stepping, etc.), BMI, weight, height, age, etc. The first dashed line extending vertically across all three plots represents the time t at which user data is obtained for input into a trained machine learning model (discussed below). The dashed plot line in Fig. 4A represents predicted or likely output data 402, and the solid line 404 in Fig. 4A represents measured data. Fig. 4B is a hypothetical plot of the number of user steps at each time, and Fig. 4C is a hypothetical plot of the air temperature at each time.
      Figs. 5A-5B depict schematic diagrams of a trained recurrent neural network 500 receiving the input data depicted in Figs. 4A-4C, i.e., PPG (P), steps (R), and air temperature (T). Again, these input data (P, R and T) are merely examples of health index data and other factor data. It should also be appreciated that more than one type of health indicator data may be input and predicted, and that more or fewer than two types of other factor data may be used, with the selection depending on what the model is designed for. Those skilled in the art will also appreciate that other factor data is collected so as to correspond in time to the collection or measurement of the health indicator data. In some cases (e.g., body weight), other factor data will remain relatively constant for a certain period of time.
      Fig. 5A depicts the trained neural network 500 as a loop. P, T and R are input into state 502 of the RNN 500, where the weights W are applied, and the RNN 500 outputs the predicted PPG 504 ($P^*$). In step 506, the difference $P - P^*$ ($\Delta P^*$) is calculated, and at step 508, a determination is made as to whether $|\Delta P^*|$ is greater than a threshold. If so, step 510 notifies/alerts the user that his/her health indicator is outside the limit/threshold predicted to be normal for a healthy person. The alert/notification/detection may be, for example, but not limited to, a suggestion to see or consult a doctor, a simple notification such as haptic feedback, a request to take additional measurements such as an ECG, a simple note without any suggestion, or any combination thereof. If $|\Delta P^*|$ is less than or equal to the threshold, then step 512 does nothing. After either step 510 or step 512, the process is repeated at the next time step with new user data. In the present embodiment, the state is updated after the predicted data is output, and the predicted data may be used in updating the state.
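      By way of illustration only, the predict, compare and notify loop of Fig. 5A may be sketched as follows. The function `monitor`, the callable `predict`, and the toy `persistence_model` standing in for the trained RNN 500 are hypothetical names introduced for this example; the threshold value is likewise assumed.

```python
def monitor(samples, predict, threshold):
    """Return the time-step indices at which |P - P*| exceeds the threshold.

    samples: iterable of (P, R, T) tuples (health index, steps, air temperature).
    predict: hypothetical stand-in for the trained model; takes the current
             sample and the previous state and returns (P* for the next step,
             updated state).
    """
    alerts = []
    state = None
    prediction = None
    for step, (p, r, t) in enumerate(samples):
        if prediction is not None:
            delta = p - prediction              # step 506: difference P - P*
            if abs(delta) > threshold:          # step 508: compare |dP*| to threshold
                alerts.append(step)             # step 510: notify/alert the user
            # otherwise do nothing (step 512)
        prediction, state = predict((p, r, t), state)
    return alerts

def persistence_model(sample, state):
    """Toy predictor (assumption): the next value equals the current value."""
    return sample[0], state

samples = [(70, 0, 20), (72, 0, 20), (95, 0, 20), (96, 0, 20)]
alerts = monitor(samples, persistence_model, threshold=10)
```

      In this toy run only the jump from 72 to 95 differs from its prediction by more than the assumed threshold of 10, so a single alert is recorded at index 2.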
      In another embodiment (not shown), the main sequence of heart rate data (e.g., derived from the PPG signal) and the ordered sequence of other factor data are provided to a trained machine learning model, which may be an RNN, CNN, other machine learning model, or a combination of these models. In the present embodiment, the machine learning model is configured to receive as input the following at the reference time t:
       A. A vector ($V_H$) of length 300 containing the most recent 300 health index samples (e.g., heart rate in beats per minute) up to and including time t;
       B. At least one vector ($V_O$) of length 300 containing the most recent other factor data (e.g., step count) at the approximate time of each sample in $V_H$;
       C. A vector ($V_{TD}$) of length 300, wherein the entry $V_{TD}(i)$ at index i contains the time difference between the timestamp of the health index sample $V_H(i)$ and the timestamp of $V_H(i-1)$; and
       D. A scalar prediction-interval other factor rate $O_{rate}$ (e.g., without limitation, a step rate) representing the average other factor rate (e.g., step rate) measured over the period from t to t+τ, where τ is the future prediction interval and may be, for example, without limitation, 2.5 minutes.
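      By way of illustration only, inputs A through D may be assembled as sketched below. The function `build_inputs`, its argument names, the use of seconds for timestamps, and the expression of the rate in steps per minute are assumptions made for the example; the other factor data is assumed to be already aligned to the heart rate samples.

```python
import numpy as np

def build_inputs(hr_times, hr_values, other_at_hr, step_times, t, tau=150.0, n=300):
    """Assemble V_H, V_O, V_TD and O_rate for reference time t (times in seconds)."""
    hr_times = np.asarray(hr_times, dtype=float)
    idx = np.flatnonzero(hr_times <= t)
    if idx.size < n + 1:
        raise ValueError("need at least n + 1 samples at or before t")
    idx = idx[-(n + 1):]                       # one extra sample for the first difference
    v_td = np.diff(hr_times[idx])              # C: V_TD(i) = time(V_H(i)) - time(V_H(i-1))
    v_h = np.asarray(hr_values, dtype=float)[idx[1:]]    # A: last n health index samples
    v_o = np.asarray(other_at_hr, dtype=float)[idx[1:]]  # B: other factor at the same samples
    step_times = np.asarray(step_times, dtype=float)
    n_steps = np.count_nonzero((step_times > t) & (step_times <= t + tau))
    o_rate = n_steps / (tau / 60.0)            # D: average steps per minute over (t, t + tau]
    return v_h, v_o, v_td, o_rate

# Hypothetical data: one heart rate sample every 5 s, two steps per second after t.
hr_times = np.arange(0.0, 2000.0, 5.0)
hr_values = 60.0 + (np.arange(hr_times.size) % 10)
other_at_hr = np.zeros(hr_times.size)
step_times = 1800.0 + 0.5 * np.arange(1, 301)
v_h, v_o, v_td, o_rate = build_inputs(hr_times, hr_values, other_at_hr, step_times, t=1800.0)
```

      With τ = 150 seconds (2.5 minutes), the 300 hypothetical steps in the prediction interval give a rate of 120 steps per minute.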
      The output of this embodiment may be, for example, a probability distribution characterizing the predicted heart rate measured over the period from t to t+τ. In some embodiments, the machine learning model is trained with training examples that include a continuous time series of health indicator data and other factor data. In an alternative embodiment, the notification system assigns a timestamp of t+τ/2 to each predicted health indicator (e.g., heart rate) distribution, thereby centering the predicted distribution within the prediction interval (τ). In this embodiment, the notification logic then considers all samples within a sliding window (W) of length $W_L = 2\tau$, or 5 minutes in this example, and calculates three parameters:
       1. The average of all measured health index data within the time window;
       2. The average of all model predictions of the health indicator whose prediction timestamps fall within the time window; and
       3. The median of the root-mean-square widths of the predicted health index distributions within the time window.
      In one embodiment, a notification is generated if the average of item 1 is above or below the average of item 2 by more than ψ times the median of item 3, where ψ is a threshold value.
      In the present embodiment, in the case where the measured health index within the specific window W is more than a certain multiple of the standard deviation from the average value of the predicted health index value, a warning is generated. Window W may be applied in a sliding fashion in a sequence of measuring and predicting health index values, where each window overlaps in time with the previous window by a designer-specified fraction (e.g., 0.5 minutes).
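      By way of illustration only, the window-based notification logic may be sketched as follows. The function `window_alert`, the value ψ = 3, and the sample data are assumptions made for the example; the test applied is that the window average of the measured data differs from the window average of the predictions by more than ψ times the median predicted standard deviation.

```python
import numpy as np

def window_alert(meas_t, meas_v, pred_t, pred_mean, pred_sigma,
                 w_start, w_len=300.0, psi=3.0):
    """Return True if the measured data in the window [w_start, w_start + w_len)
    deviates from the predictions by more than psi times the median sigma."""
    meas_t = np.asarray(meas_t, dtype=float)
    meas_v = np.asarray(meas_v, dtype=float)
    pred_t = np.asarray(pred_t, dtype=float)
    pred_mean = np.asarray(pred_mean, dtype=float)
    pred_sigma = np.asarray(pred_sigma, dtype=float)
    in_m = (meas_t >= w_start) & (meas_t < w_start + w_len)
    in_p = (pred_t >= w_start) & (pred_t < w_start + w_len)
    if not in_m.any() or not in_p.any():
        return False                               # nothing to compare in this window
    measured_avg = meas_v[in_m].mean()             # parameter 1
    predicted_avg = pred_mean[in_p].mean()         # parameter 2
    sigma_median = np.median(pred_sigma[in_p])     # parameter 3
    return bool(abs(measured_avg - predicted_avg) > psi * sigma_median)

# Hypothetical data: heart rate jumps from 70 to 110 bpm at 300 s,
# while the model keeps predicting 72 bpm with a standard deviation of 4 bpm.
meas_t = np.arange(0.0, 600.0, 5.0)
meas_v = np.where(meas_t < 300.0, 70.0, 110.0)
pred_t = np.arange(75.0, 600.0, 30.0)              # prediction timestamps (t + tau/2)
pred_mean = np.full(pred_t.shape, 72.0)
pred_sigma = np.full(pred_t.shape, 4.0)
alerts = [window_alert(meas_t, meas_v, pred_t, pred_mean, pred_sigma, w_start=s)
          for s in (0.0, 300.0)]
```

      A sliding application would simply call `window_alert` for successive values of `w_start`, with the spacing between them chosen by the designer.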
      The notification may take any number of different forms. For example, and without limitation, the user may be prompted to take an ECG and/or blood pressure measurement, a computing system (e.g., a wearable computing system, etc.) may be directed to automatically obtain an ECG or blood pressure measurement, the user may be notified to see a doctor, or the user may simply be notified that the health indicator data is abnormal.
      In this embodiment, $V_{TD}$ is selected as an input to the model to allow the model to take advantage of the information contained in the variable spacing between the health index samples in $V_H$, where the variable spacing may result from an algorithm that derives health index data from raw data of inconsistent quality. For example, heart rate samples are generated by the APPLE WATCH algorithm only if there is enough reliable raw PPG data to output a reliable heart rate value, which results in irregular time gaps between heart rate samples. In a similar manner, the present embodiment uses a vector of other factor data ($V_O$) having the same length as the other vectors to handle different and irregular sampling rates between the primary sequence (health index) and the secondary sequences (other factors). In this embodiment, the secondary sequence is remapped or interpolated to the same points in time as the primary time sequence.
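      By way of illustration only, one way to remap a secondary sequence onto the timestamps of the primary sequence is a previous-value hold, sketched below with `numpy.searchsorted`. The function `remap_previous` and the sample data are hypothetical; linear interpolation (e.g., `numpy.interp`) would be an equally plausible choice, and the text does not specify which is used.

```python
import numpy as np

def remap_previous(primary_t, secondary_t, secondary_v):
    """For each primary timestamp, take the most recent secondary value at or
    before it (secondary_t must be sorted in increasing order)."""
    primary_t = np.asarray(primary_t, dtype=float)
    secondary_t = np.asarray(secondary_t, dtype=float)
    secondary_v = np.asarray(secondary_v, dtype=float)
    idx = np.searchsorted(secondary_t, primary_t, side="right") - 1
    idx = np.clip(idx, 0, secondary_t.size - 1)   # before the first sample: use the first value
    return secondary_v[idx]

# Hypothetical data: cumulative step counts reported every 10 s,
# heart rate samples arriving at irregular times.
v_o = remap_previous(primary_t=[3.0, 10.0, 19.5, 31.0],
                     secondary_t=[0.0, 10.0, 20.0, 30.0],
                     secondary_v=[0.0, 12.0, 25.0, 40.0])
```

      The result has one other factor value per health index sample, so $V_O$ and $V_H$ have the same length regardless of the original sampling rates.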
      Furthermore, in some embodiments, the configuration of the secondary time series data provided as input to the machine learning model for the future prediction interval (e.g., after t) may be modified. In some embodiments, the single scalar value containing the average other factor rate over the prediction interval may be replaced by multiple scalar values (e.g., one scalar value per secondary time series). Alternatively, a vector of values within the prediction interval may be used. In addition, the prediction interval itself may be adjusted. For example, a shorter prediction interval may provide a faster response to changes and improved detection of events with shorter characteristic time scales, but may also be more sensitive to interference from noise sources (e.g., motion artifacts).
      Similarly, the output prediction of the machine learning model itself need not be a scalar. For example, some embodiments may generate a time series of predictions for multiple times t within a time interval between t and t+τ, and the alert logic may compare each of these predictions to a measured value within the same time interval.
      In the foregoing embodiment, the machine learning model itself may comprise, for example, a 7-layer feed-forward neural network. The first 3 layers may be convolutional layers containing 32 kernels, each kernel having a kernel width of 24 and a stride of 2. The first layer may take the arrays $V_H$, $V_O$ and $V_{TD}$ as inputs in three channels. The last 4 layers may be fully connected layers, with all fully connected layers except the last utilizing a hyperbolic tangent activation function. The output of the third layer may be flattened into an array for input into the first fully connected layer. The last layer outputs 30 values, parameterizing a Gaussian mixture model with 10 mixture components (three parameters for each component: mean, variance and weight). The network uses a skip connection between the first fully connected layer and the third fully connected layer, such that the output of layer 6 is summed with the output of layer 4 to produce the input of layer 7. Standard batch normalization can be used on all but the last layer, with a decay of 0.97. The use of skip connections and batch normalization may improve the ability to propagate gradients through the network.
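      By way of illustration only, the sketch below works through two aspects of this architecture: the sequence lengths after each convolutional layer, and the conversion of the 30 output values into Gaussian mixture parameters. The absence of padding, the ordering of the 30 values (means, then log-variances, then weight logits), and the use of an exponential and a softmax to keep variances positive and weights normalized are assumptions not stated in the text.

```python
import numpy as np

def conv_out_len(n, kernel=24, stride=2):
    """Output length of a 1-D convolution, assuming no padding."""
    return (n - kernel) // stride + 1

lengths = [300]                        # input vectors V_H, V_O, V_TD have length 300
for _ in range(3):                     # three convolutional layers
    lengths.append(conv_out_len(lengths[-1]))
flat_size = lengths[-1] * 32           # 32 kernels; flattened input to the first dense layer

def gmm_params(out30):
    """Split 30 raw outputs into 10 means, 10 variances and 10 weights
    (assumed layout and assumed positivity/normalization transforms)."""
    out = np.asarray(out30, dtype=float).reshape(3, 10)
    means = out[0]
    variances = np.exp(out[1])                     # exp keeps variances positive
    logits = out[2] - out[2].max()                 # shift for numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()  # softmax: weights sum to 1
    return means, variances, weights

def gmm_pdf(x, means, variances, weights):
    """Probability density of the mixture at x."""
    comp = np.exp(-0.5 * (x - means) ** 2 / variances) / np.sqrt(2.0 * np.pi * variances)
    return float(np.sum(weights * comp))

means, variances, weights = gmm_params(np.zeros(30))
density_at_zero = gmm_pdf(0.0, means, variances, weights)
```

      Under the no-padding assumption the sequence length shrinks from 300 to 139, 58 and 18, so the flattened array passed to the first fully connected layer has 18 × 32 = 576 elements.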
      The selection of the machine learning model may affect the performance of the system. Machine learning model configuration involves two types of considerations. The first is the internal architecture of the model, i.e., the choice of the model type (convolutional neural network, recurrent neural network, random forest, generalized nonlinear regression, etc.) and the parameters characterizing the implementation of the model (typically the number of parameters and/or the number of layers, the number of decision trees, etc.). The second is the external architecture of the model, i.e., the arrangement of the data being fed into the model and the specific parameters of the problem that the model is required to solve. The external architecture may be characterized in part by the dimensions and types of data provided as input to the model, the time span covered by the data, and the pre- or post-processing of the data.
      In general, the choice of external architecture is a balance between, on the one hand, increasing the number of parameters and the amount of information provided as input, which can increase the predictive ability of the machine learning model, and, on the other hand, the storage and computing power available to train and evaluate larger models and the availability of a sufficient amount of data to prevent overfitting.
      Many variations of the external architecture of the model discussed in some embodiments are possible. The number of input vectors can be modified, as can their absolute length (number of elements) and the time span they cover. The input vectors need not be the same length or cover the same time span. The data need not be sampled at equal time intervals. For example, and without limitation, a 6-hour heart rate data history may be provided in which data less than 1 hour before t is sampled at a rate of 1 Hz, data between 1 and 2 hours before t is sampled at a rate of 0.5 Hz, and data more than 2 hours before t is sampled at a rate of 0.1 Hz, where t is the reference time.
      Fig. 5B shows the unrolled trained RNN 500. Input data 513 ($P_t$, $R_t$ and $T_t$) is input to the state at time t ($S_t$) 514 and trained weights 516 are applied. The output of neuron ($C_t$) 518 is the prediction at time t+1, $P^*_{t+1}$, and the updated state $S_{t+1}$ 522. Similarly, in $C_{t+1}$ 524, the input data ($P_{t+1}$, $R_{t+1}$ and $T_{t+1}$) 513' is input into $S_{t+1}$ 522, the trained weights 516 are applied, and the output of $C_{t+1}$ 524 is $P^*_{t+2}$. As described above, $S_{t+1}$ is obtained by updating $S_t$, so $S_{t+1}$ has memory from $S_t$ at the previous time step resulting from the operation in neuron ($C_t$) 518. The process continues for n steps, where input data ($P_n$, $R_n$ and $T_n$) 513'' is input into $S_n$ 530 and trained weights 516 are applied. The output of neuron $C_n$ is the prediction $P^*_{n+1}$. Notably, the trained RNN always applies the same weights but, more importantly, updates the state according to the previous time step, giving the RNN the benefit of memory from previous time steps. Those skilled in the art will appreciate that the timing of the health indicator data relative to the other inputs may vary and still produce the desired result. For example, measured health index data from the previous time step (e.g., $P_{t-1}$) and other factor data from the current time step (e.g., $R_t$ and $T_t$) may be input into the state at the current time step ($S_t$), where the model predicts the health index at the current time step, $P^*_t$. As described above, the predicted health index $P^*_t$ is then compared with the measured health index data at the current time step to determine whether the health index of the user is normal or within a healthy range.
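      By way of illustration only, the unrolled recurrence may be sketched with a simple tanh RNN in NumPy. The function `rnn_unroll`, the split of the weights into `W_x`, `W_s` and `w_out`, the random (untrained) weight values, the normalized inputs, and the order of operations inside each step are assumptions made for the example, not the trained RNN 500 itself.

```python
import numpy as np

def rnn_unroll(inputs, W_x, W_s, w_out, s0):
    """Apply the same weights at every time step, carrying the state forward.

    inputs: array of shape (n_steps, 3) holding (P_t, R_t, T_t) per step.
    Returns the predictions P*_{t+1} for each step and the final state.
    """
    s = np.array(s0, dtype=float)
    preds = []
    for x in np.asarray(inputs, dtype=float):
        s = np.tanh(W_x @ x + W_s @ s)      # updated state S_{t+1} depends on S_t and the inputs
        preds.append(float(w_out @ s))      # prediction P*_{t+1} read out from the state
    return np.array(preds), s

# Hypothetical untrained weights and normalized inputs, for demonstration only.
rng = np.random.default_rng(0)
H = 4
W_x = rng.normal(scale=0.5, size=(H, 3))
W_s = rng.normal(scale=0.5, size=(H, H))
w_out = rng.normal(size=H)

seq_a = np.array([[0.2, 0.0, 0.5], [0.3, 0.1, 0.5], [0.4, 0.1, 0.5]])
seq_b = seq_a.copy()
seq_b[0, 0] = 0.9                           # change only the first time step
preds_a, _ = rnn_unroll(seq_a, W_x, W_s, w_out, np.zeros(H))
preds_b, _ = rnn_unroll(seq_b, W_x, W_s, w_out, np.zeros(H))
```

      The two sequences differ only at the first time step, yet their final predictions differ as well, which is the memory effect the state provides.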
      Fig. 5C shows an alternative embodiment of a trained RNN for determining whether the user's serialized health indicator data (PPG in our example) is within a band or threshold for a healthy person. The input data in this embodiment is a linear combination $I_t = \alpha_t P^*_t + (1 - \alpha_t) P_t$, where $P^*_t$ is the predicted health index value at time t and $P_t$ is the measured health index at time t. In this embodiment, α ranges from 0 to 1 and is a nonlinear function of the loss (L), where loss and α are discussed in more detail below. For now, note that when α is close to 0, the measured data $P_t$ is input into the network, and when α is close to 1, the predicted data ($P^*_t$) is input into the network to make the prediction at the next time step. Other factor data ($O_t$) at time t may also optionally be input.
      $I_t$ and $O_t$ are inputs to state $S_t$, where in some embodiments state $S_t$ outputs a probability distribution $\beta(P^*)_{t+1}$ of the predicted health index data $P^*_{t+1}$ at time step t+1, where $\beta(P^*)$ is the probability distribution function of the predicted health index ($P^*$). In some embodiments, the probability distribution function is sampled to select a predicted health index value $P^*_{t+1}$ at t+1. As will be appreciated by those skilled in the art, $\beta(P^*)$ may be sampled using different methods depending on the goals of the network designer, where the methods may include taking the mean, the maximum, or a random sample of the probability distribution. Evaluating $\beta_{t+1}$ using the measured data at time t+1 provides the probability that state $S_{t+1}$ predicts for the measured data.
      To illustrate this concept, Fig. 5D shows a hypothetical probability distribution over a hypothetical range of health index data at time t+1. The function is sampled, for example at its maximum probability of 0.95, to determine the predicted health indicator at time t+1, $P^*_{t+1}$. The measured or actual health index data $P_{t+1}$ is also used to evaluate the probability distribution ($\beta_{t+1}$) and to determine the probability that the model would have predicted had the actual data been input into the model. In this example, $\beta(P_{t+1})$ is 0.85.
      A loss may be defined to help determine whether to notify the user that his or her health condition is not within the normal range predicted by the trained machine learning model. The loss is selected to model how close the predicted data is to the actual or measured data. Those skilled in the art will appreciate many ways to define a loss. In other embodiments described herein, for example, the absolute value of the difference between the predicted data and the actual data ($|\Delta P^*|$) is the loss. In some embodiments, the loss (L) may be $L = -\ln[\beta(P)]$, where L is a measure of how close the predicted data is to the measured or actual data. $\beta(P)$ is in the range of 0 to 1, where 1 means that the predicted and measured values are the same. Thus, a low loss means that the predicted value is likely to be the same as or close to the measured value; in this context, a low loss means that the measured data appears to come from a healthy/normal person. In some embodiments, a threshold for L is set, e.g., L > 5, above which the user is notified that the health indicator data is outside the range considered healthy. Other embodiments may take an average of the losses over a period of time and compare the average to a threshold. In some embodiments, the threshold itself may be a function of a statistical calculation of the predicted value or an average of the predicted values. In some embodiments, the following formula may be used to notify the user that the health indicator is not within a healthy range:
      $$\left| \langle P_{range} \rangle - \langle P^*_{range} \rangle \right| > f(\bar{\sigma})$$
      where:
       $\langle P_{range} \rangle$ is determined by a method of averaging measured health index data over a time range;
       $\langle P^*_{range} \rangle$ is determined by a method of averaging predicted health index data within the same time range;
       $\bar{\sigma}$ is the median of the sequence of standard deviations obtained from the network over the same time range; and
       $f(\bar{\sigma})$ is a function of the standard deviation evaluated at $\bar{\sigma}$ and can be used as a threshold.
      Methods of averaging that may be used include, for example, but are not limited to, average, arithmetic average, median, and mode. In some embodiments, outliers are deleted so as not to deviate the calculated numbers.
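      By way of illustration only, the negative log-probability loss and the example threshold L > 5 may be sketched as follows. The function names and the small floor used to avoid taking the logarithm of zero are assumptions made for the example.

```python
import math

def loss(beta_p):
    """L = -ln(beta(P)), where beta(P) is the probability the model assigns
    to the measured value (floored to avoid log(0); the floor is an assumption)."""
    return -math.log(max(beta_p, 1e-300))

def should_notify(beta_p, threshold=5.0):
    """Notify when the loss for a single time step exceeds the threshold."""
    return loss(beta_p) > threshold

def should_notify_on_average(betas, threshold=5.0):
    """Variant: compare the average loss over a period of time to the threshold."""
    losses = [loss(b) for b in betas]
    return sum(losses) / len(losses) > threshold

example_loss = loss(0.85)   # probability from the Fig. 5D example
```

      For the probability of 0.85 in the Fig. 5D example, the loss is about 0.16, far below the example threshold of 5; a loss above 5 corresponds to a probability below roughly 0.0067.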
      Referring back to the input data of the embodiment depicted in Fig. 5C, $I_t = \alpha_t P^*_t + (1 - \alpha_t) P_t$, $\alpha_t$ is defined as a function of L and is in the range of 0 to 1. For example, α(L) may be a linear function or a nonlinear function, or may be linear within a certain range of L but nonlinear within a separate range of L. In one example, as shown in Fig. 5E, the function α(L) is linear for L between 0 and 3, quadratic for L between 3 and 13, and 1 for L greater than 13. For the present embodiment, when L is between 0 and 3 (i.e., when the predicted health index data and the measured health index data approximately match), α(L) is close to zero and the input data $I_{t+1}$ approximates the measured data $P_{t+1}$. When L is large (e.g., greater than 13), α(L) is 1, which causes the input data $I_{t+1}$ to be $P^*_{t+1}$ (the predicted health indicator at time t+1). When L is between 3 and 13, α(L) changes quadratically, and the relative contributions of the predicted health index data and the measured health index data to the input data change accordingly. In this embodiment, the linear combination of the predicted health index data and the measured health index data weighted with α(L) allows the input data at any particular time step to be weighted between the predicted data and the measured data. In all of these examples, the input data may also include other factor data ($O_t$). This is just one example of self-sampling, where some combination of predicted and measured data is used as input to a trained network. Those skilled in the art will appreciate that other examples may be used.
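      By way of illustration only, a piecewise function α(L) of the kind described (linear from 0 to 3, quadratic from 3 to 13, and 1 above 13) and the resulting input mixing may be sketched as follows. The text does not give the coefficients, so the value `alpha_knee` reached at L = 3 and the particular quadratic used are assumptions chosen only to make the function continuous.

```python
def alpha(L, alpha_knee=0.1):
    """Piecewise alpha(L) in [0, 1]; alpha_knee (value at L = 3) is an assumed constant."""
    if L <= 0.0:
        return 0.0
    if L <= 3.0:
        return alpha_knee * L / 3.0                                    # linear segment
    if L < 13.0:
        return alpha_knee + (1.0 - alpha_knee) * ((L - 3.0) / 10.0) ** 2  # quadratic segment
    return 1.0                                                         # saturated

def next_input(measured, predicted, L):
    """I = alpha * P* + (1 - alpha) * P, the linear combination fed to the network."""
    a = alpha(L)
    return a * predicted + (1.0 - a) * measured

low_loss_input = next_input(measured=80.0, predicted=70.0, L=0.0)    # measured data dominates
high_loss_input = next_input(measured=80.0, predicted=70.0, L=20.0)  # predicted data is used
```

      With a low loss the network is fed the measured value, whereas with a high loss it is fed its own prediction, so that anomalous measurements do not pull the state toward the anomaly.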
      Embodiments use a trained machine learning model. In embodiments in which the machine learning model is a recurrent neural network, a trained RNN is required. By way of example and not limitation, Fig. 6 depicts an unrolled RNN to illustrate training an RNN in accordance with some embodiments. The neuron 602 has an initial state $S_0$ 604 and a weight matrix W 606. Step rate data $R_0$, air temperature data $T_0$, and initial PPG data $P_0$ at time step 0 are input into state $S_0$, the weights W are applied, the prediction for time step 1 ($P^*_1$) is output from neuron 602, and $\Delta P^*_1$ is calculated using the PPG ($P_1$) obtained at time step 1. The neuron 602 also outputs the updated state 608 at time step 1 ($S_1$), which enters the neuron 610. Step rate data $R_1$, air temperature data $T_1$, and PPG data $P_1$ at time step 1 are input into $S_1$, the weights W 606 are applied, the prediction for time step 2 ($P^*_2$) is output from neuron 610, and $\Delta P^*_2$ is calculated using the PPG ($P_2$) obtained at time step 2. The neuron 610 also outputs the updated state 612 at time step 2 ($S_2$), which enters the neuron 614. Step rate data $R_2$, air temperature data $T_2$, and PPG data ($P_2$) at time step 2 are input into $S_2$, the weights W 606 are applied, the prediction for time step 3 ($P^*_3$) is output from neuron 614, and $\Delta P^*_3$ is calculated using the PPG ($P_3$) obtained at time step 3. The process continues until the state 616 at time step n is output and $\Delta P^*_n$ is calculated. Similar to the training of convolutional neural networks, the $\Delta P^*$ values are used in backpropagation to adjust the weight matrix. However, unlike in convolutional networks, the same weight matrix is applied at each iteration of a recurrent neural network; during training, the weight matrix is modified only in backpropagation. Many training examples with health indicator data and corresponding other factor data are repeatedly input into the RNN 600 until it converges.
As previously discussed, LSTM RNNs may be used in some embodiments, in which the state of such networks provides longer-term context analysis of incoming data, which may provide better predictions where the network learns longer-term dependencies. As mentioned, and as will be readily appreciated by those skilled in the art, other machine learning models fall within the scope of the embodiments described herein, and may include, for example, but are not limited to, CNNs or other feed-forward networks.
      Fig. 7A depicts a system 700 for predicting whether a user's measured health indicator is within or outside of a threshold that is normal for a healthy person under similar other factors. The system 700 has a machine learning model 702 and a health detector 704. For example (but not limited to), embodiments of the machine learning model 702 include a trained machine learning model, a trained RNN, CNN, or other feed forward network. The trained RNN, other network or combination of networks may be trained by training examples from healthy people from which the health index data and (in time) corresponding other factor data are collected. Alternatively, the trained RNN, other networks, or combinations of networks may be trained through training examples from a particular user, making it a personalized trained machine learning model. Those skilled in the art will appreciate that training examples from different populations may generally be selected based on the use or design of the trained network and system. Those skilled in the art will also readily appreciate that the health indicator data in this and other embodiments may be one or more health indicators. The model may be trained and the health of the user predicted using, for example, but not limited to, one or more of PPG data, heart rate data, blood pressure data, body temperature data, blood oxygen concentration data, and the like. The health detector 704 uses predictions 708 from the machine learning model 702 and input data 710 to determine whether the loss or other metric determined by analyzing the predicted output with the measured data exceeds a threshold that is considered normal and is therefore unhealthy. The system 700 then outputs a notification or a health condition of the user. The notification may take many forms as discussed herein. 
The input generator 706 utilizes a sensor (not shown) to continuously obtain data from a user wearing or in contact with the sensor, where the data is representative of one or more health indicators of the user. The corresponding other factor data (in time) may be collected by another sensor or obtained by other means described herein or apparent to those skilled in the art.
      The input generator 706 may also collect data to determine/calculate other factor data. For example, but not limited to, the input generator may include a smart watch, a wearable or mobile device (e.g., an Apple device or another smartphone, tablet computer, or laptop computer), a combination of a smart watch and a mobile device, a surgically implanted device with the ability to send data to a mobile device or other portable computing device, or a device on a cart in a medical care facility. Preferably, the user input generator 706 has a sensor (e.g., PPG sensor, electrode sensor, etc.) to measure data related to one or more health indicators. The smart watch, tablet, mobile phone, or laptop of some embodiments may carry the sensor, or the sensor may be placed remotely (surgically implanted, in contact with the body away from the mobile device, or in some separate device), where in all of these cases the mobile device communicates with the sensor to collect health indicator data. In some embodiments, system 700 may be provided on a mobile device alone, in combination with other mobile devices, or in combination with other computing systems via a network through which the devices may communicate. For example, but not limited to, the system 700 may be a smart watch or a wearable device having a machine learning model 702 and a health detector 704, where the machine learning model 702 and the health detector 704 are located on the device (e.g., in memory of the watch or firmware on the watch). Alternatively, the watch may have a user input generator 706 and communicate with other computing devices (e.g., mobile phone, tablet, laptop, or desktop computer, etc.) via direct communication, wireless communication (e.g., Wi-Fi, voice, Bluetooth, etc.), or through a network (e.g., Internet, intranet, extranet, etc.), or a combination thereof, where the trained machine learning model 702 and health detector 704 may be located on the other computing devices.
Those skilled in the art will appreciate that any number of configurations of the system 700 may be utilized without exceeding the scope of the embodiments described herein.
      Referring to fig. 7B, a smart watch 712 is depicted in accordance with an embodiment. Smart watch 712 includes a watch 714 that contains all the circuitry and microprocessors (not shown) known to those skilled in the art. The watch 714 also includes a display 716, wherein on the display 716, user health indicator data 718 (heart rate data in this example) may be displayed. A predictive health indicator band 720 for a normal or healthy population may also be displayed on the display 716. In fig. 7B, the user's measured heart rate data does not exceed the predicted health band, so in this particular example, no notification will be made. The watch 714 may also include a wristband 722 and a high fidelity sensor 724 (e.g., an ECG sensor). Alternatively, the wristband 722 may be an expandable cuff to measure blood pressure. A low fidelity sensor 726 (shown in phantom) is provided on the back of the watch 714 to collect user health indicator data such as PPG data, which may be used to derive heart rate data or other data such as blood pressure, for example. Alternatively, as will be appreciated by those skilled in the art, in some embodiments, fitness bracelets (such as FitBit or Polar, etc.) may be used, where the fitness bracelets have similar processing capabilities and other factor measuring devices (e.g., ppg and accelerometers).
      Fig. 8 depicts an embodiment of a method 800 for continuously monitoring a health condition of a user. Step 802 receives user input data, which may include data for one or more health indicators (also referred to as a primary data sequence) and (temporally) corresponding data for other factors (also referred to as a secondary data sequence). Step 804 inputs the user data into a trained machine learning model, which may include a trained RNN, CNN, other feed-forward network as described herein, or other neural network known to those of skill in the art. In some embodiments, the health indicator input data may be one of, or a combination (such as a linear combination) of, predicted health indicator data and measured health indicator data, as described in some embodiments herein. Step 806 outputs data for one or more predicted health indicators at a time step, where the output may include, for example, but not limited to, a single predicted value or a probability distribution as a function of the predicted value. Step 808 determines a loss based on the predicted health indicator, where the loss may be, for example, but not limited to, a simple difference between the predicted health indicator and the measured health indicator, or some other appropriately selected loss function (e.g., the negative logarithm of the probability distribution evaluated at the value of the measured health indicator). Step 810 determines whether the loss exceeds a threshold beyond which the data is no longer considered normal or healthy, where the threshold may be, for example, but not limited to, a simple number selected by the designer, or a more complex function of some parameter associated with the prediction. If the loss is greater than the threshold, step 812 notifies the user that his or her health indicator is outside the range considered normal or healthy. As described herein, the notification may take many forms. In some embodiments, this information may be visible to the user.
For example, but not limited to, information may be displayed on a user interface such as a graph showing (i) a distribution of measured health index data (e.g., heart rate) and other factor data (e.g., number of steps) as a function of time, (ii) predicted health index data (e.g., predicted heart rate values) generated by a machine learning model. In this way, a user can visually compare the measured data points with the predicted data points and determine, by visual observation, whether their heart rate falls within the range expected by the machine learning model, for example.
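As a minimal sketch of steps 806 through 812, the following Python example uses a simple absolute difference as the loss. The names absolute_loss, monitor, and mean_predictor are hypothetical and do not appear in the embodiments; mean_predictor merely stands in for the trained machine learning model, and the threshold and history length are arbitrary illustrative values.

```python
from typing import Callable, List, Sequence


def absolute_loss(predicted: float, measured: float) -> float:
    """Simple difference loss between predicted and measured values (step 808)."""
    return abs(predicted - measured)


def mean_predictor(history: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model: predicts the mean of the history."""
    return sum(history) / len(history)


def monitor(
    measured: Sequence[float],
    predict_next: Callable[[Sequence[float]], float],
    threshold: float,
    history_len: int = 3,
) -> List[int]:
    """Return the indices of time steps whose loss exceeds the threshold.

    Each flagged index corresponds to a point where a notification
    (step 812) could be issued.
    """
    flagged = []
    for t in range(history_len, len(measured)):
        history = measured[t - history_len:t]
        predicted = predict_next(history)               # step 806
        loss = absolute_loss(predicted, measured[t])    # step 808
        if loss > threshold:                            # step 810
            flagged.append(t)
    return flagged


if __name__ == "__main__":
    heart_rate = [70, 72, 71, 73, 120, 74]  # illustrative values, beats per minute
    print(monitor(heart_rate, mean_predictor, threshold=20.0))
```

In this sketch only the sudden jump to 120 is flagged, since the following sample is compared against a prediction that already includes the outlier. A real embodiment would replace mean_predictor with the trained RNN, CNN, or other network and could also condition on other factor data.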
      Some embodiments described herein have referred to using a threshold to determine whether to notify a user. In one or more of these embodiments, the user may change the threshold to adjust or fine tune the system or method to more closely match the user's personal health knowledge. For example, if the physiological index used is blood pressure and the user has a high blood pressure, embodiments may frequently alert/notify the user that their health index is outside of normal or healthy range according to a model trained by healthy people. Thus, certain embodiments allow the user to increase the threshold such that the user is not notified of his/her health indicator data exceeding what is considered normal or healthy so frequently.
      Some embodiments prefer to use raw health indicator data. If the raw data has been processed to derive a particular measurement (e.g., heart rate), the derived data may be used according to an embodiment. In some cases, the provider of the health monitoring device has no control over the raw data; rather, the data received has already been processed into a calculated health indicator (e.g., heart rate or blood pressure). As will be appreciated by those skilled in the art, the form of data used to train the machine learning model should match the form of data collected from the user and input into the trained model; otherwise the predictions may prove erroneous. For example, APPLE WATCH provides heart rate measurement data at unequal time steps, but does not provide raw PPG data. In this example, the user wears an APPLE WATCH, which outputs heart rate data at unequal time steps according to Apple's PPG processing algorithm, and the model is trained on this data. If Apple decides to change the algorithm by which it provides heart rate data, the model trained with data from the previous algorithm may become outdated for data input using the new algorithm. To address this potential problem, some embodiments resample irregularly spaced data (heart rate data, blood pressure data, ECG data, etc.) onto a regularly spaced grid, and train the model with data sampled from that regularly spaced grid. If Apple or another data provider changes its algorithm, the model need only be retrained with newly collected training examples, without having to reconstruct the model to account for the algorithm changes.
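The resampling of irregularly spaced data onto a regularly spaced grid might be sketched as follows. This is only an illustration under stated assumptions: linear interpolation is assumed here (the embodiments do not specify an interpolation method), and the function name resample_regular and its parameters are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def resample_regular(
    times: Sequence[float], values: Sequence[float], step: float
) -> List[float]:
    """Linearly interpolate irregular samples onto a grid with spacing `step`.

    The grid starts at times[0] and ends at or before times[-1].
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) samples")
    if step <= 0:
        raise ValueError("step must be positive")
    if any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")

    n_points = int((times[-1] - times[0]) // step) + 1
    resampled = []
    for k in range(n_points):
        t = times[0] + k * step
        j = bisect_right(times, t)  # times[j - 1] <= t < times[j]
        if j == len(times):         # t coincides with the last timestamp
            resampled.append(float(values[-1]))
            continue
        t0, t1 = times[j - 1], times[j]
        weight = (t - t0) / (t1 - t0)
        resampled.append(values[j - 1] + weight * (values[j] - values[j - 1]))
    return resampled


if __name__ == "__main__":
    # Heart rate reported at 0 s, 4 s, and 10 s, resampled every 5 s.
    print(resample_regular([0, 4, 10], [60, 68, 80], step=5))
```

Because both training data and live user data pass through the same grid, a change in the provider's reporting cadence affects only the input to the resampler, not the shape of the model input.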
      In further embodiments, the trained machine learning model may be trained via user data, resulting in a personalized trained machine learning model. Such a trained personalized machine learning model may be used in place of or in combination with the machine learning model described herein that is trained by a healthy population. If a personalized trained machine learning model is used itself, the user's data is entered into the machine learning model, which will output predictions of individual health indicators in the next time step that are normal to the user, which predictions are then compared to actual/measured data from the next time step in a manner consistent with the embodiments described herein to determine if the user's health indicators differ from health indicators predicted to be normal for the user by some threshold. In addition, such personalized machine learning models may be used in combination with machine learning models trained by training examples from healthy people to generate predictions and related notifications regarding both health metrics predicted to be normal for the individual user and health metrics predicted to be normal for the healthy people.
      Fig. 9A depicts a method 900 according to another embodiment, and fig. 9B shows, for purposes of explanation and without limitation, a hypothetical plot 902 of heart rate as a function of time. Step 904 (fig. 9A) receives user heart rate data (or other health indicator data) and optionally (temporally) corresponding other factor data, and inputs the data into a personalized trained machine learning model. In some embodiments, the personalized trained model is trained on the user's individual health indicator data and optionally (temporally) corresponding other factor data, as described herein. Thus, in step 906, the personalized trained machine learning model predicts normal heart rate data for the individual user given the other factors, and step 908 compares the user's measured health indicator data to the health indicator data predicted to be normal for that particular user to identify anomalies in the user's health indicator data. As discussed in the present specification, some embodiments receive the health indicator data for a user from a device wearable by the user (e.g., APPLE WATCH, a smart watch, etc.), or from another mobile device (e.g., a tablet computer, etc.) in communication with sensors on the user (e.g., a band, a PPG sensor, etc.).
      A loss may be defined to help determine, in step 908, whether the user is to be notified that the user's measured data is abnormal relative to the data predicted to be normal for that particular user. The loss is selected to model how close the prediction is to the actual or measured data. Those skilled in the art will appreciate many ways to define a loss. For example, as in other embodiments described herein and equally applicable here, the absolute value of the difference between the predicted value and the measured value, |ΔP*|, is one form of loss. In some embodiments, the loss (L) may be L = -ln[β(P)], where L is generally a measure of how close the predicted data is to the measured data. β(P) (a probability distribution in this example) is in the range of 0 to 1, where 1 means that the predicted data and the measured data are the same. Thus, in some embodiments, a low loss means that the predicted data is likely to be the same as or close to the measured data. In some embodiments, a threshold value of L is set, e.g., L > 5, above which the particular user is notified of the existence of an abnormal condition based on the predictions for that user. Such notification may take a variety of forms, as described elsewhere herein. Other embodiments may take an average of the losses over a period of time and compare the average to a threshold, as also described elsewhere herein. In some embodiments, the threshold itself may be a function of statistical calculations on the predicted data or of an average of the predicted data, as described in more detail elsewhere herein. Losses have been described in detail elsewhere herein and will not be further discussed here for the sake of brevity. Those skilled in the art will appreciate that the input and prediction data may be scalar values or segments of data over a period of time. For example, and without limitation, a system designer may be interested in 5 minute data segments and will input all data prior to time t together with the other factor data through t+5 minutes, predict the health indicator data for the segment ending at t+5 minutes, and determine the loss between the measured health indicator data and the predicted health indicator data for that segment.
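The negative-logarithm loss and the averaged-loss variant described above can be illustrated with a short Python sketch. The function names are hypothetical, the probabilities are made-up inputs standing in for β(P) evaluated at the measured value, and the threshold of 5 is taken from the example above. Note that L > 5 corresponds to a probability below exp(-5), roughly 0.0067.

```python
import math
from typing import List, Sequence


def negative_log_loss(probability: float) -> float:
    """L = -ln(beta), where beta is the model's probability for the measured value."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(probability)


def windowed_mean_losses(probabilities: Sequence[float], window: int) -> List[float]:
    """Average the per-step losses over a sliding window of `window` steps."""
    if window < 1:
        raise ValueError("window must be at least 1")
    losses = [negative_log_loss(p) for p in probabilities]
    return [
        sum(losses[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(losses))
    ]


def is_anomalous(loss: float, threshold: float = 5.0) -> bool:
    """True when the loss exceeds the notification threshold (e.g., L > 5)."""
    return loss > threshold


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.001, 0.002, 0.85]  # illustrative values only
    for mean_loss in windowed_mean_losses(probs, window=2):
        print(round(mean_loss, 3), is_anomalous(mean_loss))
```

Averaging over a window trades responsiveness for robustness: a single improbable sample is less likely to trigger a notification than a sustained run of them.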
      Step 908 determines if an anomaly exists. As discussed, it may be determined whether the loss exceeds a threshold. As previously mentioned, the threshold is set by the designer's choice and based on the purpose of the system being designed. In some embodiments, the threshold may be modified by the user, but preferably not in this embodiment. If there is no anomaly, the process repeats at step 904. If an anomaly is present, step 910 notifies or alerts the user to obtain a high fidelity measurement, such as, but not limited to, an ECG or blood pressure measurement. In step 912, the high fidelity data is analyzed by an algorithm, a health professional, or both, and characterized as normal or abnormal, and if abnormal, a diagnosis, such as atrial fibrillation (AFib), tachycardia, bradycardia, or high/low blood pressure, may be assigned according to the obtained high fidelity measurements. For clarity, it should be noted that notifications to record high fidelity data are equally applicable and possible in other embodiments, including the embodiments described above that use the general model. In some embodiments, the high fidelity measurement may be obtained directly by the user using a mobile monitoring system (such as an ECG or blood pressure system, etc.), which in some embodiments may be associated with a wearable device. Optionally, the notifying step 910 causes automatic acquisition of the high fidelity measurement. For example, the wearable device may communicate with the sensor (by hard wire or via wireless communication) and obtain ECG data, or it may communicate with a blood pressure cuff system (e.g., a wristband or armband cuff of the wearable device) to automatically obtain blood pressure measurements, or it may communicate with an implanted device such as a pacemaker or ECG electrode. For example, AliveCor, Inc. provides a system for remotely obtaining an ECG; such a system includes (but is not limited to) one or more sensors that contact the user in two or more locations, wherein the sensors collect electrocardiographic data that is sent, by wire or wirelessly, to a mobile computing device, where an app generates an ECG strip from the data, which can be analyzed by an algorithm, a medical professional, or both. Alternatively, the sensor may be a blood pressure monitor, wherein the blood pressure data is sent to the mobile computing device by wire or wirelessly. The wearable device itself may be a blood pressure system with a cuff capable of measuring health indicator data, optionally with an ECG sensor similar to the one described above. The mobile computing device may be, for example and without limitation, a tablet computer (e.g., iPad), a smartphone, a wearable device (e.g., APPLE WATCH), or a device in a medical care facility (which may be mounted on a cart). In some embodiments, the mobile computing device may be a laptop computer or a computer in communication with some other mobile device. Those skilled in the art will appreciate that a wearable device or smart watch will also be considered a mobile computing device in terms of the capabilities provided in the context of the embodiments described herein. In the case of a wearable device, the sensor may be placed on the band of the wearable device, where the sensor may send data to the computing device/wearable device wirelessly or through wires, or the band may also be a blood pressure monitoring cuff, or both, as previously described. In the case of a mobile phone, the sensor may be a pad attached to or remote from the phone, where the pad senses the electrocardiographic signal and communicates data to the wearable device or other mobile computing device wirelessly or through a hard wire. A more detailed description of some of these systems is provided in one or more of U.S. patent nos. 
9,420,956, 9,572,499, 9,351,654, 9,247,911, 9,254,095, and 8,509,882, and one or more of U.S. patent application publication nos. 2015/0018660, 2015/0297134, and 2015/030322, all of which are incorporated herein for all purposes. Step 912 analyzes the high fidelity data and provides a description or diagnosis as previously described.
      In step 914, a diagnosis or classification of the high fidelity measurement is received by a computing system, which in some embodiments may be the mobile or wearable computing system used to collect the user's heart rate data (or other health indicator data), and in step 916, the low fidelity health indicator data sequence (heart rate data in this example) is labeled with the diagnosis. In step 918, the labeled user low fidelity data sequence is used to train a high fidelity machine learning model, and optionally other factor data sequences are also provided to train the model. In some embodiments, the trained high-fidelity machine learning model is capable of receiving a sequence of measured low-fidelity health indicator data (e.g., heart rate data or PPG data) and optionally other factor data, and either giving a probability that the user is experiencing an event that is typically diagnosed or detected using high-fidelity data, or predicting, diagnosing, or detecting when the user is experiencing such an event. The trained high-fidelity machine learning model is able to do this because it has been trained on user health indicator data (and optionally other factor data) labeled with diagnoses made from the high-fidelity data. Thus, the trained model is able to predict when a user has an event (e.g., AFib, hypertension, etc.) associated with one or more of the labels based solely on a measured low fidelity health indicator input data sequence (e.g., heart rate or PPG data) (and optionally other factor data). Those skilled in the art will appreciate that the training of the high fidelity model may be performed on the user's mobile device, remotely from the user's mobile device, both, or in a distributed network. 
For example, and without limitation, the user's health indicator data may be stored in a cloud system and the data may be labeled in the cloud using the diagnosis from step 914. Those of skill in the art will readily understand any number of methods and manners to store, label, and access this information. Alternatively, a globally trained high-fidelity model may be used, trained with labeled training examples from a population experiencing those conditions that are typically diagnosed or detected with high-fidelity measurements. These global training examples provide low fidelity data sequences (e.g., heart rate) labeled with conditions diagnosed using high fidelity measurements (e.g., AFib determined from an ECG by a medical professional or an algorithm).
      Referring now to fig. 9B, plot 902 shows a schematic of heart rate plotted as a function of time. Abnormalities 920 occur at times t1, t2, t3, t4, t5, t6, t7, and t8 relative to the user's normal heart rate data. As mentioned above, normal means that the measured data is within the threshold of the data predicted for that particular user, whereas an abnormality falls outside the threshold. Upon an abnormality, some embodiments prompt the user to obtain a more definitive or high fidelity reading, such as, but not limited to, an ECG reading, identified here as ECG1, ECG2, ECG3, ECG4, ECG5, ECG6, ECG7, and ECG8. As described above, a high fidelity reading may be automatically obtained, the user may obtain the high fidelity reading, or the high fidelity reading may be something other than an ECG, such as blood pressure. The high fidelity readings are analyzed by an algorithm, a health professional, or both to identify the high fidelity data as normal or abnormal and to further identify or diagnose the abnormality (e.g., without limitation, AFib). This information is used to label the health indicator data (e.g., heart rate or PPG data) at the abnormalities 920 in the user's sequential data.
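The labeling of low fidelity data at the abnormalities of fig. 9B might be sketched as follows. The data structure, the function name label_segments, and the choice of a symmetric window around each abnormality time are assumptions made for illustration; the embodiments do not prescribe how much surrounding data is labeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class LabeledSegment:
    """A slice of low fidelity data tagged with a high fidelity diagnosis."""
    start: float
    end: float
    heart_rates: List[float]
    label: str


def label_segments(
    times: Sequence[float],
    heart_rates: Sequence[float],
    diagnoses: Dict[float, str],
    half_width: float,
) -> List[LabeledSegment]:
    """Tag heart rate samples near each abnormality time with its diagnosis.

    `diagnoses` maps an abnormality time (e.g., t1) to the label assigned
    from the corresponding high fidelity reading (e.g., ECG1).
    """
    segments = []
    for anomaly_time, label in sorted(diagnoses.items()):
        start, end = anomaly_time - half_width, anomaly_time + half_width
        window = [hr for t, hr in zip(times, heart_rates) if start <= t <= end]
        segments.append(LabeledSegment(start, end, window, label))
    return segments


if __name__ == "__main__":
    t = [0, 10, 20, 30, 40, 50]          # seconds, illustrative
    hr = [70, 72, 140, 138, 75, 71]      # beats per minute, illustrative
    for seg in label_segments(t, hr, {25.0: "AFib"}, half_width=10):
        print(seg)
```

Segments produced this way could serve as the labeled training examples for the high fidelity model of step 918.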
      The distinction between high-fidelity and low-fidelity data is that high-fidelity data or measurements are typically used to make a judgment, test, or diagnosis, while low-fidelity data may not be readily usable for such a judgment, test, or diagnosis. For example, an ECG scan may be used to identify, detect, or diagnose arrhythmias, while heart rate or PPG data generally do not provide this capability. As will be appreciated by those skilled in the art, the description herein of machine learning algorithms (e.g., Bayesian, Markov, Gaussian process, clustering, generative model, kernel, and neural network algorithms) applies equally to all embodiments described herein.
      In some cases, the user remains asymptomatic despite the possible presence of these problems, and even if symptoms are present, it may be impractical to obtain the high fidelity measurements necessary to make a diagnosis or test. For example, and without limitation, an arrhythmia, especially AF, may present no symptoms, and even if symptoms do exist, it is very difficult to record an ECG at that moment, and it is very difficult to continuously monitor the user without expensive, bulky, and sometimes invasive monitoring devices. As discussed elsewhere herein, it is important to know when a user experiences AF, as AF may be a cause of stroke, among other serious conditions. Similarly, and as discussed elsewhere, the AF load may be of similar importance. Some embodiments allow for continuous monitoring of cardiac arrhythmias (e.g., AF) or other serious conditions using only continuously monitored low fidelity health indicator data (such as heart rate or PPG) and optionally other factor data.
      Fig. 10 depicts a method 1000 in accordance with some embodiments of health monitoring systems and methods. Step 1002 receives measured or actual user low fidelity health indicator data (e.g., heart rate or PPG data from sensors on the wearable device), and optionally receives (temporally) corresponding other factor data that may affect the health indicator data, as described herein. As discussed elsewhere herein, the low fidelity health indicator data may be measured by a mobile computing device such as a smart watch, other wearable device, or tablet computer. In step 1004, the user's low-fidelity health indicator data (and optionally other factor data) is input into a trained high-fidelity machine learning model, which in step 1006 outputs a predicted identification or diagnosis for the user based on the measured low-fidelity health indicator data (and optionally (temporally) corresponding other factor data). Step 1008 asks whether the identification or diagnosis is normal, and if so, the process restarts. If the identification or diagnosis is abnormal, step 1010 notifies the user of the problem or detection. Alternatively, a system, method, or platform may be provided to notify the user, family, friends, a medical care professional, emergency services (e.g., 911), or any combination thereof. Which of these persons is notified may depend on the identification, detection, or diagnosis. In the case of identifying, detecting, or diagnosing a life threatening situation, certain persons may be contacted or notified who would not be notified in the case of a diagnosis that is not life threatening. Additionally, in some embodiments, the sequence of measured health indicator data is input into a trained high fidelity machine learning model and the amount of time the user is experiencing an abnormal event is calculated (e.g., as the difference between the predicted onset and the predicted cessation of the abnormal event), allowing for a better understanding of the user's abnormal load. 
In particular, understanding the AF load can be very important in preventing stroke and other serious conditions. Thus, some embodiments allow for continuous monitoring of abnormal events with a mobile computing device, a wearable computing device, or other portable device capable of acquiring only low fidelity health indicator data, and optionally other factor data.
      Fig. 11 depicts example data 1100 in which low fidelity data is analyzed to generate high fidelity output predictions or detections, in accordance with some embodiments described herein. Although described with reference to the detection of atrial fibrillation, similar data may be generated for other predictions of high fidelity diagnoses based on low fidelity measurements. The first graph 1110 shows the calculated heart rate of the user over time. The heart rate may be determined based on PPG data or other heart rate sensors. The second graph 1120 shows activity data of the user during the same time period. For example, activity data may be determined based on a number of steps or another measure of user movement. The third graph 1130 shows the classifier output from the machine learning model and the threshold level at which a notification is generated. The machine learning model may generate predictions based on inputs of the low fidelity measurements. For example, the data in the first graph 1110 and the second graph 1120 may be analyzed by a machine learning system as further described above. The results of the machine learning system analysis may be provided as the atrial fibrillation probabilities shown in graph 1130. When the probability exceeds a threshold (shown in this case as a confidence above 0.6), the health monitoring system may trigger a notification or other alert for the user, a physician, or another person associated with the user.
      In some embodiments, the data in graphs 1110 and 1120 may be provided to the machine learning system as continuous measurements. For example, heart rate and activity level measurements may be generated every 5 seconds to provide accurate measurements. A time slice containing a plurality of measurements may then be input into the machine learning model. For example, the previous hour of data may be used as input to the machine learning model. In some embodiments, a shorter or longer period of time may be provided instead of an hour. As shown in fig. 11, the output graph 1130 provides an indication of the period of time during which the user is experiencing an abnormal health event. For example, the health monitoring system may use the period of time during which the prediction is above a certain confidence level to determine when atrial fibrillation is occurring. This duration may then be used to determine the atrial fibrillation load of the user during the measurement period.
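One way to turn a classifier output such as graph 1130 into an atrial fibrillation load is sketched below. The 0.6 threshold and 5 second sampling period come from the examples above; the function names and the treatment of each sample as covering one full sampling period are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def episodes(
    probabilities: Sequence[float],
    threshold: float = 0.6,
    sample_period_s: float = 5.0,
) -> List[Tuple[float, float]]:
    """Return (onset_s, cessation_s) for each run of samples above threshold."""
    result = []
    onset = None
    for i, p in enumerate(probabilities):
        if p > threshold and onset is None:
            onset = i                                   # episode begins
        elif p <= threshold and onset is not None:
            result.append((onset * sample_period_s, i * sample_period_s))
            onset = None                                # episode ends
    if onset is not None:                               # still in an episode at the end
        result.append((onset * sample_period_s, len(probabilities) * sample_period_s))
    return result


def af_load_seconds(
    probabilities: Sequence[float],
    threshold: float = 0.6,
    sample_period_s: float = 5.0,
) -> float:
    """Total time spent above threshold, summed over all episodes."""
    return sum(end - start for start, end in episodes(probabilities, threshold, sample_period_s))


if __name__ == "__main__":
    output = [0.1, 0.7, 0.8, 0.2, 0.9, 0.95]  # illustrative classifier output
    print(episodes(output))
    print(af_load_seconds(output))
```

Dividing the result by the length of the measurement period would express the load as a fraction of monitored time.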
      In some embodiments, the machine learning model that generates the predicted output in graph 1130 may be trained based on labeled user data. For example, the labeled user data may be provided based on high fidelity data (such as ECG readings, etc.) acquired over a period of time in which low fidelity data (e.g., PPG, heart rate, etc.) and other data (e.g., activity level or steps, etc.) are also available. In some embodiments, the machine learning model is designed to determine whether atrial fibrillation is likely to have existed during a previous time period. For example, the machine learning model may take one hour of low fidelity data as input and provide a likelihood that an event occurred. Thus, the training data may include many hours of recorded data for a population of individuals. Where a condition is diagnosed based on high fidelity data, the data may be labeled with the time of the health event. Thus, if a health event has been labeled at a given time based on high fidelity data, any one-hour window of low fidelity data that contains the event, when input into the untrained machine learning model, should yield a prediction of the health event. The untrained machine learning model may then be updated based on comparing its predictions to the labels. After repeating this over multiple iterations and determining that the machine learning model has converged, the health monitoring system may use the machine learning model to monitor the user for atrial fibrillation based on the low fidelity data. In various embodiments, low fidelity data may be used to detect conditions other than atrial fibrillation.
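The construction of labeled training windows described above can be sketched in Python. This is an illustration only: the function name build_examples, the use of a fixed sample count per window, and the stride parameter are assumptions, and the tiny window length in the usage example is chosen for readability rather than to represent one hour.

```python
from typing import List, Sequence, Tuple


def build_examples(
    times: Sequence[float],
    values: Sequence[float],
    event_times: Sequence[float],
    window_len: int,
    stride: int = 1,
) -> List[Tuple[List[float], int]]:
    """Slice low fidelity data into windows labeled 1 if a health event falls inside.

    `event_times` holds the times at which a condition was diagnosed from
    high fidelity data; any window spanning such a time is a positive example.
    """
    if window_len < 1 or stride < 1:
        raise ValueError("window_len and stride must be positive")
    examples = []
    for start in range(0, len(values) - window_len + 1, stride):
        end = start + window_len
        t_start, t_end = times[start], times[end - 1]
        label = int(any(t_start <= e <= t_end for e in event_times))
        examples.append((list(values[start:end]), label))
    return examples


if __name__ == "__main__":
    t = [0, 1, 2, 3, 4, 5]
    hr = [70, 71, 130, 135, 72, 70]
    for window, label in build_examples(t, hr, event_times=[2.5], window_len=3):
        print(window, label)
```

With 5 second sampling, a one-hour window would correspond to window_len=720. The resulting (window, label) pairs are what an untrained model's predictions would be compared against during each training iteration.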
      Fig. 12 shows a schematic representation of a machine in the example form of a computer system 1200 in which a set of instructions for causing the machine to perform any one or more of the methods discussed herein may be executed within the computer system 1200. In alternative embodiments, the machine may be connected (e.g., networked) to other machines in a Local Area Network (LAN), an intranet, an extranet, or the internet. The machine may operate in the capacity of a server or a client machine in a client-server network environment or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a Personal Computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a server, a network router, switch or bridge, hub, access point, network access control device, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Furthermore, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein. In one embodiment, computer system 1200 may represent a server, mobile computing device, or wearable device, etc. configured to perform the health monitoring described herein.
      The exemplary computer system 1200 includes a processing device 1202, a main memory 1204 (e.g., Read Only Memory (ROM), flash memory, Dynamic Random Access Memory (DRAM)), a static memory 1206 (e.g., flash memory, Static Random Access Memory (SRAM), etc.), and a data storage device 1218, which communicate with each other via a bus 1230. Any of the signals provided over the various buses described herein may be time multiplexed with other signals and provided over one or more common buses. In addition, the interconnections between circuit components or blocks may be shown as buses or as single signal lines. Each bus may optionally be one or more single signal lines, and each single signal line may optionally be a bus.
      The processing device 1202 represents one or more general-purpose processing devices, such as a microprocessor or central processing unit, or the like. More specifically, the processing device may be a Complex Instruction Set Computing (CISC) microprocessor, a Reduced Instruction Set Computer (RISC) microprocessor, a Very Long Instruction Word (VLIW) microprocessor, or a processor implementing other instruction sets, or a processor implementing a combination of instruction sets. The processing device 1202 may also be one or more special purpose processing devices, such as an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), a network processor, or the like. The processing device 1202 is configured to execute the processing logic 1226, which processing logic 1226 may be one example of a health monitor 1250 and related systems for performing the operations and steps discussed herein.
      The data storage 1218 may include a machine-readable storage medium 1228 on which is stored one or more sets of instructions 1222 (e.g., software) embodying any one or more of the methodologies of the functions described herein, including instructions to cause the processing device 1202 to execute the health monitor 1250 and related processes described herein. The instructions 1222 may also reside, completely or at least partially, within the main memory 1204 or within the processing device 1202 during execution thereof by the computer system 1200, the main memory 1204 and the processing device 1202 also constituting machine-readable storage media. The instructions 1222 may also be transmitted or received over the network 1220 via the network interface device 1208.
      The machine-readable storage medium 1228 may also be used to store instructions to perform a method for monitoring the health of a user, as described herein. While the machine-readable storage medium 1228 is shown in an exemplary embodiment to be a single medium, the term "machine-readable storage medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) that store the one or more sets of instructions. A machine-readable medium includes any mechanism for storing information in a form (e.g., software, processing application) readable by a machine (e.g., a computer). The machine-readable medium may include, but is not limited to, magnetic storage media (e.g., floppy disks), optical storage media (e.g., CD-ROMs), magneto-optical storage media, Read Only Memory (ROM), Random Access Memory (RAM), erasable programmable memory (e.g., EPROMs and EEPROMs), flash memory, or other types of media suitable for storing electronic instructions.
      The foregoing description sets forth numerous specific details, such as examples of specific systems, components, methods, etc., in order to provide a thorough understanding of various embodiments of the present invention. It will be apparent, however, to one skilled in the art that at least some embodiments of the invention may be practiced without these specific details. In other instances, well-known components or methods have not been described in detail or presented in simple block diagram format in order to avoid unnecessarily obscuring the present invention. Therefore, the specific details set forth are merely exemplary. Specific embodiments may vary from these exemplary details and still be contemplated to be within the scope of the present invention.
      Additionally, some embodiments may be implemented in a distributed computing environment where a machine-readable medium is stored on or executed by more than one computer system. In addition, information transferred between computer systems may be pulled or pushed across a communications medium connecting the computer systems.
      Embodiments of the claimed subject matter include, but are not limited to, the various operations described herein. These operations may be performed by hardware components, software, firmware, or a combination thereof.
      Although the operations of the methods herein are shown and described in a particular order, the order of the operations of the methods may be altered so that certain operations may be performed in an inverse order or so that certain operations may be performed at least partially concurrently with other operations. In another embodiment, instructions or sub-operations of different operations may be in an intermittent or alternating manner.
      The above description of illustrated implementations of the application, including what is described in the abstract, is not intended to be exhaustive or to limit the application to the precise forms disclosed. As will be recognized by those skilled in the art, while specific implementations and examples of the application are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the application. The terms "example" or "exemplary" are used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as "example" or "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the terms "example" or "exemplary" is intended to present concepts in a concrete fashion. As used in this disclosure, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless specified otherwise or clear from context, "X includes A or B" is intended to mean any of the natural inclusive permutations. That is, if X includes A, X includes B, or X includes both A and B, then "X includes A or B" is satisfied in any of the above cases. In addition, the articles "a" and "an" as used in this disclosure and the appended claims should generally be construed to mean "one or more" unless specified otherwise or clear from context to be directed to a singular form. Furthermore, the use of the terms "an embodiment" or "one embodiment" or "an implementation" or "one implementation" throughout this specification is not intended to mean the same embodiment or implementation, unless described as such. Furthermore, the terms "first," "second," "third," "fourth," and the like as used herein are labels used to distinguish between different elements and do not necessarily have an ordinal meaning according to their numerical designation.
      It will be appreciated that variations of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims. The claims may cover embodiments in hardware, software, or a combination thereof.
      In addition to the embodiments described above, the present invention includes, but is not limited to, the following example implementations.
      Some example implementations provide a method of monitoring cardiac health of a user. The method may include receiving measured health indicator data and other factor data of a user at a first time, inputting, by a processing device, the health indicator data and other factor data into a machine learning model, wherein the machine learning model generates predicted health indicator data at a next time step, receiving the data of the user at the next time step, determining, by the processing device, a loss at the next time step, wherein the loss is a measure between the predicted health indicator data at the next time step and the measured health indicator data of the user at the next time step, determining that the loss exceeds a threshold, and outputting a notification to the user in response to determining that the loss exceeds the threshold.
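      As a minimal, non-limiting sketch, the method recited above can be illustrated as a loop that predicts the next health indicator value, compares it with the value actually measured, and flags time steps whose loss exceeds a threshold. The names `monitor` and `persistence_model`, the use of an absolute difference as the loss, and the threshold value are assumptions made for illustration only; the stand-in model simply predicts no change and is not a trained machine learning model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: one sample is (health indicator value, other factor data).
Sample = Tuple[float, Sequence[float]]
Predictor = Callable[[float, Sequence[float]], float]


def monitor(samples: List[Sample], predict: Predictor, threshold: float) -> List[int]:
    """Return the indices of time steps whose loss exceeds the threshold."""
    flagged = []
    for t in range(len(samples) - 1):
        indicator, factors = samples[t]
        predicted_next = predict(indicator, factors)   # prediction for step t + 1
        measured_next, _ = samples[t + 1]              # measurement at step t + 1
        loss = abs(predicted_next - measured_next)     # assumed loss: absolute difference
        if loss > threshold:
            flagged.append(t + 1)                      # a notification would be output here
    return flagged


def persistence_model(indicator: float, factors: Sequence[float]) -> float:
    """Stand-in for a trained model (assumption): predicts no change."""
    return indicator


# Heart rate in beats per minute with a single placeholder factor value.
readings = [(62.0, [0.0]), (63.0, [0.0]), (64.0, [0.0]), (110.0, [0.0]), (109.0, [0.0])]
alerts = monitor(readings, persistence_model, threshold=20.0)
print(alerts)
```

      In this sketch only the jump from 64 to 110 produces a loss above the threshold, so only that time step is flagged.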
      In some example implementations of the method of any example implementation, the trained machine learning model is a trained generative neural network. In some example implementations of the method of any example implementation, the trained machine learning model is a feed forward network. In some example implementations of the method of any example implementation, the trained machine learning model is an RNN. In some example implementations of the method of any example implementation, the trained machine learning model is a CNN.
      In some example implementations of the method of any example implementation, the trained machine learning model is trained with training examples from one or more of a healthy population, a population with heart disease, and the user.
      In some example implementations of the method of any example implementation, the loss at the next time step is an absolute value of a difference between the predicted health indicator data at the next time step and the measured health indicator of the user at the next time step.
      In some example implementations of the method of any example implementation, the predicted health indicator data is a probability distribution, and wherein the predicted health indicator data at the next time step is sampled according to the probability distribution.
      In some example implementations of the method of any example implementation, the predicted health indicator data at the next time step is sampled according to a sampling technique selected from the group consisting of selecting the predicted health indicator data having the maximum probability and randomly sampling the predicted health indicator data according to the probability distribution.
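      The two sampling techniques named above may be sketched as follows, assuming for illustration that the probability distribution is discrete over a small set of candidate heart rate values. The function names and the example values are hypothetical.

```python
import random
from typing import List


def sample_max_probability(values: List[float], probs: List[float]) -> float:
    """Select the candidate value that has the maximum probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return values[best]


def sample_random(values: List[float], probs: List[float], rng: random.Random) -> float:
    """Draw one candidate value at random according to the distribution."""
    return rng.choices(values, weights=probs, k=1)[0]


# Assumed example: candidate heart rates (bpm) and their predicted probabilities.
candidates = [55.0, 60.0, 65.0, 70.0]
probabilities = [0.1, 0.6, 0.2, 0.1]

point_prediction = sample_max_probability(candidates, probabilities)
random_draw = sample_random(candidates, probabilities, random.Random(0))
print(point_prediction, random_draw)
```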
      In some example implementations of the method of any example implementation, the predicted health indicator data is a probability distribution (β), and wherein the loss is determined based on a negative logarithm of the probability distribution at a next time step evaluated with the measured health indicator of the user at the next time step. In some example implementations of the method of any example implementation, the method further includes self-sampling of the probability distribution.
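      As an illustrative sketch of a loss based on the negative logarithm of the probability distribution, the following assumes that β is a discrete distribution over binned heart rate values. The dictionary representation, the function name, and the small floor used to avoid taking the logarithm of zero are assumptions.

```python
import math
from typing import Dict


def negative_log_likelihood(beta: Dict[int, float], measured: int) -> float:
    """Loss = -log(beta(measured)); a small floor avoids log(0) for unseen bins."""
    probability = beta.get(measured, 0.0)
    return -math.log(max(probability, 1e-12))


# Assumed predicted distribution over heart rate bins (bpm) at the next time step.
beta = {60: 0.7, 65: 0.2, 70: 0.1}

loss_expected = negative_log_likelihood(beta, 60)    # measurement the model found likely
loss_surprising = negative_log_likelihood(beta, 70)  # measurement the model found unlikely
print(loss_expected, loss_surprising)
```

      A measurement to which the model assigned low probability yields a larger loss, which is what makes this quantity usable as an anomaly score against a threshold.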
      In some example implementations of the method of any example implementation, the method further includes averaging the predicted health indicator data over a time period of the time step, averaging the measured health indicator data of the user over a time period of the time step, and determining the loss based on an absolute value of a difference between the predicted health indicator data and the measured health indicator data.
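      The averaged loss described above may be sketched as follows, assuming the arithmetic mean as the averaging method and a time period containing several predictions and measurements. The function name and sample values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def windowed_loss(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Absolute difference between the averaged prediction and averaged measurement."""
    if not predicted or not measured:
        raise ValueError("both windows must contain at least one value")
    return abs(mean(predicted) - mean(measured))


# Assumed heart rate values (bpm) over one time period.
predicted_window = [60.0, 62.0, 61.0, 61.0]   # mean 61.0
measured_window = [64.0, 66.0, 65.0, 65.0]    # mean 65.0
loss = windowed_loss(predicted_window, measured_window)
print(loss)
```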
      In some example implementations of the method of any example implementation, the measured health indicator data includes PPG data. In some example implementations of the method of any example implementation, the measured health indicator data includes heart rate data.
      In some example implementations of the method of any example implementation, the method further includes resampling the irregularly spaced heart rate data onto a regularly spaced grid, wherein the heart rate data is sampled according to the regularly spaced grid.
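      One way the resampling step might be carried out is linear interpolation, sketched below. The interpolation method is an assumption (the description does not specify one), as are the function name and the requirement that the timestamps be strictly increasing.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def resample_to_grid(
    times: Sequence[float], values: Sequence[float], step: float
) -> Tuple[List[float], List[float]]:
    """Linearly interpolate irregularly spaced samples onto a regular grid."""
    if len(times) != len(values) or len(times) < 2 or step <= 0:
        raise ValueError("need matching inputs, at least two samples, and step > 0")
    grid, resampled = [], []
    n = 0
    while times[0] + n * step <= times[-1]:
        t = times[0] + n * step
        j = bisect_right(times, t)          # index of the first timestamp after t
        if j >= len(times):                 # t coincides with the last timestamp
            value = values[-1]
        else:
            t0, t1 = times[j - 1], times[j]
            weight = (t - t0) / (t1 - t0)
            value = values[j - 1] + weight * (values[j] - values[j - 1])
        grid.append(t)
        resampled.append(value)
        n += 1
    return grid, resampled


# Assumed irregular heart rate samples: timestamps in seconds, rates in bpm.
timestamps = [0.0, 0.8, 2.1, 3.0]
rates = [60.0, 64.0, 70.0, 66.0]
grid, resampled = resample_to_grid(timestamps, rates, step=1.0)
print(grid, resampled)
```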
      In some example implementations of the method of any example implementation, the measured health indicator data is one or more health indicator data selected from the group consisting of PPG data, heart rate data, pulse oximeter data, ECG data, and blood pressure data.
      Some example implementations provide an apparatus comprising a mobile computing device including a processing device, a display, a health indicator data sensor, and a memory having stored thereon instructions that, when executed by the processing device, cause the processing device to receive measured health indicator data from the health indicator data sensor at a first time and other factor data at the first time, input the health indicator data and other factor data into a trained machine learning model, wherein the trained machine learning model generates predicted health indicator data at a next time step, receive the measured health indicator data and other factor data at the next time step, determine a loss at the next time step, wherein the loss is a measure between the predicted health indicator data at the next time step and the measured health indicator data at the next time step, and output a notification if the loss at the next time step exceeds a threshold.
      In some example implementations of any example apparatus, the trained machine learning model includes a trained generative neural network. In some example implementations of any example apparatus, the trained machine learning model includes a feed forward network. In some example implementations of any example apparatus, the trained machine learning model is an RNN. In some example implementations of any example apparatus, the trained machine learning model is a CNN.
      In some example implementations of any example apparatus, the trained machine learning model is trained with training examples from one of the group consisting of a healthy population, a population with heart disease, and the user.
      In some example implementations of any example apparatus, the predicted health indicator data is a point prediction of the user health indicator at a next time step, and wherein the loss is an absolute value of a difference between the predicted health indicator data at the next time step and the measured health indicator data at the next time step.
      In some example implementations of any example apparatus, the predicted health indicator data is sampled according to a probability distribution generated by a machine learning model.
      In some example implementations of any example apparatus, the predicted health indicator data is sampled according to a sampling technique selected from the group consisting of selecting the maximum probability and randomly sampling according to a probability distribution.
      In some example implementations of any example apparatus, the predicted health indicator data is a probability distribution (β), and wherein the loss is determined based on a negative logarithm of β evaluated with the measured health indicator of the user at the next time step.
      In some example implementations of any example apparatus, the processing device is further to define a function α ranging from 0 to 1, wherein I_t includes a linear combination of the measured health indicator data and the predicted health indicator data of the user as a function of α.
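      A linear combination of this kind may be sketched as follows. The description does not state which term α weights, so the sketch assumes that α weights the measured data and (1 - α) weights the predicted data; the function name and that orientation are assumptions.

```python
def blended_input(measured: float, predicted: float, alpha: float) -> float:
    """Return alpha * measured + (1 - alpha) * predicted (assumed orientation)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in the range 0 to 1")
    return alpha * measured + (1.0 - alpha) * predicted


# alpha = 1 uses only the measured value; alpha = 0 uses only the prediction.
only_measured = blended_input(measured=70.0, predicted=60.0, alpha=1.0)
only_predicted = blended_input(measured=70.0, predicted=60.0, alpha=0.0)
mixed = blended_input(measured=70.0, predicted=60.0, alpha=0.25)
print(only_measured, only_predicted, mixed)
```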
      In some example implementations of any example apparatus, the processing device is further to perform self-sampling of the probability distribution.
      In some example implementations of any example apparatus, the processing device is further to average, using an averaging method, the predicted health indicator data sampled according to the probability distribution over a time period of the time step, average, using the averaging method, the measured health indicator data of the user over the time period of the time step, and define the loss as an absolute value of a difference between the averaged predicted health indicator data and the averaged measured health indicator data.
      In some example implementations of any example apparatus, the averaging method includes one or more methods selected from the group consisting of calculating an average, calculating an arithmetic average, calculating a median, and calculating a mode.
      In some example implementations of any example apparatus, the measured health indicator data includes PPG data from a PPG signal. In some example implementations of any example apparatus, the measured health indicator data is heart rate data. In some example implementations of any example apparatus, the heart rate data is collected by resampling irregularly spaced heart rate data onto a regularly spaced grid and sampling the heart rate data according to the regularly spaced grid. In some example implementations of any example apparatus, the measured health indicator data is one or more health indicator data selected from the group consisting of PPG data, heart rate data, pulse oximeter data, ECG data, and blood pressure data.
      In some example implementations of any example apparatus, the mobile computing device is selected from the group consisting of a smart watch, a fitness bracelet, a tablet computer, and a laptop computer.
      In some example implementations of any example apparatus, the mobile computing device further comprises a user high-fidelity sensor, wherein the notification requests the user to obtain high-fidelity measurement data, and wherein the processing device is further configured to receive an analysis of the high-fidelity measurement data, tag the user's measured health indicator data with the analysis to generate tagged user health indicator data, and use the tagged user health indicator data as a training example to train a personalized high-fidelity machine learning model.
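      The tagging step may be pictured with the following sketch, in which a segment of measured health indicator data is paired with the analysis of a corresponding high-fidelity measurement to form a training example. The `TaggedExample` structure, the function name, and the label strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TaggedExample:
    """One training example: a low-fidelity segment plus its high-fidelity label."""
    indicator_segment: List[float]
    label: str


def tag_segment(segment: Sequence[float], analysis: str) -> TaggedExample:
    """Attach the analysis of the high-fidelity measurement to the segment."""
    return TaggedExample(indicator_segment=list(segment), label=analysis)


# Assumed flow: an alert prompts a high-fidelity reading, whose analysis becomes the tag.
training_set: List[TaggedExample] = []
training_set.append(tag_segment([88.0, 121.0, 97.0], "atrial_fibrillation"))
training_set.append(tag_segment([61.0, 62.0, 61.0], "normal"))
print(len(training_set), training_set[0].label)
```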
      In some example implementations of any example apparatus, the trained machine learning model is stored in the memory. In some example implementations of any example apparatus, the trained machine learning model is stored in a remote memory, wherein the remote memory is separate from the mobile computing device, and wherein the mobile computing device is a wearable computing device. In some example implementations of any example apparatus, the trained personalized high-fidelity machine learning model is stored in the memory. In some example implementations of any example apparatus, the trained personalized high-fidelity machine learning model is stored in a remote memory, wherein the remote memory is separate from the mobile computing device, and wherein the mobile computing device is a wearable computing device.
      In some example implementations of any example apparatus, the processing device is further to predict that the user is experiencing atrial fibrillation and determine an atrial fibrillation burden of the user.
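      Because the atrial fibrillation burden reflects the time spent in atrial fibrillation, one simple way to estimate it is to sum the durations of the monitoring windows in which atrial fibrillation is predicted. The sketch below assumes fixed-length windows and a per-window boolean prediction; both, along with the function name, are assumptions.

```python
from typing import Sequence, Tuple


def af_burden(af_flags: Sequence[bool], window_seconds: float) -> Tuple[float, float]:
    """Return (seconds predicted to be in AF, fraction of monitored time in AF)."""
    af_seconds = window_seconds * sum(1 for flag in af_flags if flag)
    total_seconds = window_seconds * len(af_flags)
    fraction = af_seconds / total_seconds if total_seconds else 0.0
    return af_seconds, fraction


# Assumed per-window predictions for eight consecutive 30-second windows.
flags = [False, True, True, False, True, False, False, False]
seconds_in_af, fraction_in_af = af_burden(flags, window_seconds=30.0)
print(seconds_in_af, fraction_in_af)
```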
      Some example implementations provide a method of monitoring cardiac health of a user. The method may include receiving measured low fidelity user health indicator data and other factor data at a first time, inputting data including the user health indicator data and other factor data at the first time into a personalized trained high fidelity machine learning model, wherein the personalized trained high fidelity machine learning model predicts whether the user's health indicator data is abnormal, and sending a notification of the user's health abnormality if the prediction is abnormal.
      In some example implementations of the method of any example implementation, the personalized trained high fidelity machine learning model is trained with measured low fidelity user health indicator data that is labeled with an analysis of high fidelity measurement data.
      In some example implementations of the method of any example implementation, the analysis of the high-fidelity measurement data is based on user-specific high-fidelity measurement data.
      In some example implementations of the method of any example implementation, the personalized high fidelity machine learning model outputs a probability distribution, wherein the predictions are sampled according to the probability distribution.
      In some example implementations of the method of any example implementation, the prediction is sampled according to a sampling technique selected from the group consisting of selecting the maximum probability prediction and randomly sampling the prediction according to the probability distribution.
      In some example implementations of the method of any example implementation, the average prediction is determined by averaging predictions over a period of time of the time step using an averaging method, and wherein the average prediction is used to determine whether the user's health indicator data is normal or abnormal.
      In some example implementations of the method of any example implementation, the averaging method includes one or more methods selected from the group consisting of calculating an average, calculating an arithmetic average, calculating a median, and calculating a mode.
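      The use of an averaged prediction to decide between normal and abnormal may be sketched as follows, assuming each prediction is a probability that the data is abnormal and that the averaged value is compared with a cutoff. The cutoff, the function name, and the dictionary of averaging methods are assumptions.

```python
from statistics import mean, median, mode
from typing import Sequence

# Averaging methods named in the description, keyed by hypothetical identifiers.
AVERAGERS = {"mean": mean, "median": median, "mode": mode}


def classify_window(predictions: Sequence[float], method: str, cutoff: float) -> str:
    """Average the per-step abnormality predictions, then compare with the cutoff."""
    averaged = AVERAGERS[method](predictions)
    return "abnormal" if averaged >= cutoff else "normal"


# Assumed abnormality probabilities over one time period.
window = [0.2, 0.9, 0.8, 0.9, 0.7]
by_mean = classify_window(window, "mean", cutoff=0.75)      # mean is about 0.7
by_median = classify_window(window, "median", cutoff=0.75)  # median is 0.8
by_mode = classify_window(window, "mode", cutoff=0.75)      # mode is 0.9
print(by_mean, by_median, by_mode)
```

      The example shows that the choice of averaging method can change the outcome for the same time period, which is one reason a particular method may be selected.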
      In some example implementations of the method of any example implementation, the personalized trained high fidelity machine learning model is stored in a memory of the user wearable device. In some example implementations of the method of any example implementation, the measured health indicator data and other factor data are time segments of data over a period of time.
      In some example implementations of the method of any example implementation, the personalized trained high fidelity machine learning model is stored in a remote memory, wherein the remote memory is located at a location remote from the user wearable computing device.
      In some example implementations, a health monitoring device may include a mobile computing apparatus including a microprocessor, a display, a user health indicator data sensor, and a memory having stored thereon instructions that, when executed by the microprocessor, cause the microprocessor to receive measured low-fidelity health indicator data and other factor data at a first time, wherein the measured health indicator data is obtained by the user health indicator data sensor, input data including the health indicator data and other factor data at the first time into a trained high-fidelity machine learning model, wherein the trained high-fidelity machine learning model predicts whether the health indicator data of a user is normal or abnormal, and send a notification of the health abnormality of the user to at least the user in response to the prediction being abnormal.
      In some example implementations of the health monitoring device of any example implementation, the trained high-fidelity machine learning model is a trained high-fidelity generative neural network. In some example implementations of the health monitoring device of any example implementation, the trained high-fidelity machine learning model is a trained Recurrent Neural Network (RNN). In some example implementations of the health monitoring device of any example implementation, the trained high-fidelity machine learning model is a trained feedforward neural network. In some example implementations of the health monitoring device of any example implementation, the trained high-fidelity machine learning model is a CNN.
      In some example implementations of the health monitoring device of any example implementation, the trained high-fidelity machine learning model is trained with measured user health indicator data labeled based on user-specific high-fidelity measurement data.
      In some example implementations of the health monitoring device of any example implementation, the trained high-fidelity machine learning model is trained with low-fidelity health indicator data labeled based on high-fidelity measurement data, wherein the low-fidelity health indicator data and the high-fidelity measurement data are from a population of subjects.
      In some example implementations of the health monitoring device of any example implementation, the high fidelity machine learning model outputs a probability distribution, wherein the predictions are sampled according to the probability distribution.
      In some example implementations of the health monitoring device of any example implementation, the prediction is sampled according to a sampling technique selected from the group consisting of selecting the maximum probability prediction and randomly sampling the prediction according to the probability distribution.
      In some example implementations of the health monitoring device of any example implementation, the average prediction is determined by averaging predictions over a period of time of the time step using an averaging method, and wherein the average prediction is used to determine whether the health indicator data of the user is normal or abnormal.
      In some example implementations of the health monitoring device of any example implementation, the measured health indicator data and other factor data are time segments of data over a period of time.
      In some example implementations of the health monitoring device of any of the example implementations, the averaging method includes one or more methods selected from the group consisting of calculating an average, calculating an arithmetic average, calculating a median, and calculating a mode.
      In some example implementations of the health monitoring device of any example implementation, the personalized trained high fidelity machine learning model is stored in the memory. In some example implementations of the health monitoring device of any example implementation, the personalized trained high fidelity machine learning model is stored in a remote memory, wherein the remote memory is located at a location remote from the wearable computing device. In some example implementations of the health monitoring device of any example implementation, the mobile computing apparatus is selected from the group consisting of a smart watch, a fitness bracelet, a tablet computer, and a laptop computer.