Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a daily behavior transfer recognition method based on a DSN (Domain Separation Networks) deep adversarial transfer network. Compared with existing behavior recognition methods for heterogeneous smart homes, the method uses the DSN deep generative adversarial network to extract expressive features automatically, and additionally applies ensemble learning and domain adaptation, so that daily behaviors unique to the target domain can be recognized by transfer learning from multiple similar source domains and the accuracy of daily behavior recognition is improved.
The technical means adopted by the invention are as follows:
A daily behavior transfer recognition method based on a DSN deep adversarial transfer network comprises the following steps:
acquiring a plurality of candidate source domains and a target domain, wherein the daily behavior labels in the candidate source domains are known, and the daily behavior labels in the target domain are partially known or completely unknown;
mapping the daily behavior labels and the sensors of the candidate source domains and the target domain, respectively, to the same space;
acquiring the daily behavior feature vectors corresponding to the daily behavior labels in the same space, and processing the daily behavior feature vectors with a distance measurement method, so as to screen out, from the candidate source domains, the source domains similar to the target domain;
applying a domain adaptation method to draw the feature vector distribution of each similar source domain closer to that of the target domain;
combining the feature vectors of each similar source domain with those of the target domain as the input of the DSN, thereby training as many base classifiers as there are similar source domains;
and performing ensemble learning on the classification results that the base classifiers produce for the feature vectors in the target domain, so as to obtain the daily behavior recognition result of the target domain.
Further, mapping the daily behavior labels of the candidate source domains and the target domain to the same space includes:
extracting all known daily behavior labels of the candidate source domains and the target domain;
inputting the original daily behavior labels into a Word2vec model for training, to obtain, for each daily behavior label, a numerical feature vector that carries the semantics of the label;
and, using the cosine similarity and the distance between the numerical feature vectors of the daily behavior labels, assigning the daily behavior labels corresponding to any two numerical feature vectors whose similarity exceeds a given threshold to the same daily behavior, thereby obtaining a daily behavior template that merges similar daily behaviors and completing the mapping of the daily behaviors.
Further, mapping the sensors of the candidate source domains and the target domain to the same space includes:
acquiring configuration vectors of all sensors of the candidate source domains and the target domain, wherein each configuration vector comprises the position of the sensor, its type, and its frequency of occurrence in each daily behavior;
inputting the sensor configuration vectors into a Word2vec model for training, to obtain numerical sensor data vectors corresponding to the sensor types;
and clustering the sensor data vectors and, based on the clustering result, treating the sensors whose sensor data vectors fall in the same cluster as the same sensor, thereby completing the sensor mapping.
Further, processing the daily behavior feature vectors with a distance measurement method, so as to screen out the source domains similar to the target domain from the candidate source domains, includes:
acquiring the daily behavior feature vectors of each candidate source domain and of the target domain in the mapping space;
and calculating, for each candidate source domain, the distance between its daily behavior feature vectors and the daily behavior feature vectors of the target domain, and screening out the candidate source domains whose distance meets a preset requirement as the similar source domains.
Compared with the prior art, the invention has the following advantages:
1. The invention adopts the DSN deep adversarial transfer network, which extracts more expressive features automatically and performs better than manual feature extraction. The DSN uses the generative adversarial idea to increase the similarity of the shared part of the features, while its network structure also retains the part of the features that is unique to each domain, so that negative transfer is effectively avoided. Using the extracted features together with the ensemble learning method, multiple source domains can be used to recognize daily behaviors unique to the target domain, or daily behaviors can be recognized mutually across multiple domains.
2. The method introduces a new mapping method for the sensors and the daily behaviors. The mapping is performed with a Word2vec semantic model and a clustering method, and mapping with a semantic model allows daily behaviors to be recognized better.
3. The method uses a distance measurement to select similar data sets and uses a domain adaptation method to increase the similarity between sample features, which can improve the effect of transfer learning compared with existing methods.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1 to 3, the present invention provides a heterogeneous smart home behavior recognition method based on a DSN deep adversarial transfer network, including the following steps:
S1, acquiring a plurality of candidate source domains and a target domain, wherein the daily behavior labels in the candidate source domains are known, and the daily behavior labels in the target domain are partially known or completely unknown.
S2, mapping the daily behavior labels and the sensors of the candidate source domains and the target domain, respectively, to the same space.
Specifically, mapping the daily behavior labels of the candidate source domains and the target domain to the same space, as shown in fig. 2, includes:
S211, extracting all known daily behavior labels of the candidate source domains and the target domain;
S212, inputting the original daily behavior labels into a Word2vec model for training, to obtain, for each daily behavior label, a numerical feature vector that carries the semantics of the label;
S213, using the cosine similarity and the distance between the numerical feature vectors of the daily behavior labels, assigning the daily behavior labels corresponding to any two numerical feature vectors whose similarity exceeds a given threshold to the same daily behavior, thereby obtaining a daily behavior template that merges similar daily behaviors and completing the daily behavior mapping. The threshold is a value selected in advance, according to experimental results, because it gives good results, and it serves as the measurement standard. The cosine similarity formula is as follows:
$$\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
wherein x and y are the numerical feature vectors of two different daily behavior labels, and n is their dimension.
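The grouping in step S213 can be sketched as follows. This is only a minimal illustration: the three-dimensional label vectors are made-up stand-ins for Word2vec outputs, and the threshold of 0.9, the greedy grouping rule, and the function names `cosine_similarity` and `group_labels` are assumptions rather than values or procedures specified by the invention.

```python
import math

def cosine_similarity(x, y):
    """cos(x, y) = (x . y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def group_labels(label_vectors, threshold):
    """Greedily merge each label into the first group whose first member
    it resembles more closely than the threshold (assumed grouping rule)."""
    groups = []  # each group is a list of label names
    for label, vec in label_vectors.items():
        for group in groups:
            if cosine_similarity(vec, label_vectors[group[0]]) > threshold:
                group.append(label)
                break
        else:
            groups.append([label])  # no similar group found: start a new one
    return groups

# Hypothetical label embeddings (stand-ins for Word2vec vectors).
label_vectors = {
    "sleep": [0.9, 0.1, 0.0],
    "sleeping": [0.88, 0.12, 0.02],
    "cook": [0.1, 0.9, 0.1],
    "cooking": [0.12, 0.85, 0.15],
    "work": [0.0, 0.1, 0.95],
}
templates = group_labels(label_vectors, threshold=0.9)
print(templates)
```

Each resulting group plays the role of one entry in the daily behavior template; in practice the threshold would be chosen from experiments, as stated above.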
Further, mapping the sensors of the candidate source domains and the target domain to the same space, as shown in fig. 3, includes:
S221, acquiring configuration vectors of all sensors of the candidate source domains and the target domain, wherein each configuration vector comprises the position of the sensor, its type, and its frequency of occurrence in each daily behavior;
S222, inputting the sensor configuration vectors into a Word2vec model for training, to obtain numerical sensor data vectors corresponding to the sensor types;
S223, clustering the sensor data vectors and, based on the clustering result, treating the sensors whose sensor data vectors fall in the same cluster as the same sensor, thereby completing the sensor mapping. Preferably, both the K-Means method and the DBSCAN method are tried for clustering, and the method with the better clustering effect is selected. In a preferred embodiment of the invention, the K-Means algorithm is adopted, where K denotes the number of clusters and "means" indicates that the mean of the data in each cluster is taken as the center of that cluster. The idea of the algorithm is roughly as follows: first, K vectors are randomly selected from the sample set as cluster centers; the distance between every vector and each of the K cluster centers is calculated, and each vector is assigned to the cluster whose center is closest to it; a new center is then calculated for each cluster, and the assignment and update steps are repeated until the cluster centers no longer move.
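The K-Means procedure described in step S223 can be sketched in a few lines. The two-dimensional sensor data vectors below are invented for illustration, and the optional `init` argument is an assumption added only so that the example is reproducible; by default the initial centers are drawn at random, as in the description.

```python
import random

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, init=None, seed=0, max_iter=100):
    """Plain K-Means: assign each vector to its nearest center, recompute
    each center as the mean of its cluster, stop when centers do not move."""
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(vectors, k)
    centers = [list(c) for c in start]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: index of the closest center for every vector.
        assignment = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        # Update step: mean of the members (keep old center if cluster is empty).
        new_centers = []
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])
        if new_centers == centers:  # centers no longer move
            break
        centers = new_centers
    return assignment, centers

# Hypothetical sensor data vectors from two homes (illustrative values only).
sensor_vectors = [
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],   # e.g. door-like sensors
    [5.0, 5.1], [5.1, 4.9], [4.9, 5.0],     # e.g. motion-like sensors
]
assignment, centers = kmeans(
    sensor_vectors, k=2, init=[sensor_vectors[0], sensor_vectors[3]]
)
print(assignment)
```

Sensors that receive the same cluster index would be treated as the same mapped sensor. With random initialization, K-Means can settle in a poor local optimum, which is one reason to compare it against DBSCAN as the description suggests.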
S3, acquiring the daily behavior feature vectors corresponding to the daily behavior labels in the same space, and processing the daily behavior feature vectors with a distance measurement method, so as to screen out the source domains similar to the target domain from the candidate source domains.
Specifically, the methods shown in fig. 2 and 3 are first used to map the sensors and daily behaviors of the randomly selected source domains and of the target domain to the same space. The extracted sensor configuration vectors include: location, sensor type, and trigger frequency in each daily behavior. Sensor types: magnetic door sensor, light sensor, infrared motion sensor, wide-area infrared motion sensor, temperature sensor. The daily behaviors of each data set: rest, clean, go out, go home, sleep, cook, eat, wash dishes, go to the toilet, work. Then the feature vectors of the daily behaviors are obtained, and the source domains similar to the target domain are found with a distance measurement method (for example the Wasserstein distance, or the Rank of Domain metric used with the GFK (Geodesic Flow Kernel) method). The feature vector of a daily behavior comprises: the start time of the daily behavior, the end time of the daily behavior, the duration of the daily behavior, the proportion of each mapped sensor in the sensor stream of the daily behavior, and the daily behavior label. For each candidate source domain, the distance between its daily behavior feature vectors and the daily behavior feature vectors of the target domain is calculated, and the candidate source domains whose distance meets a preset requirement are screened out as the similar source domains.
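The screening in step S3 can be illustrated with a small sketch based on the one-dimensional Wasserstein distance. Several simplifications are assumed here and are not taken from the invention: the two domains have the same number of samples, the per-feature distances are simply averaged, the preset requirement is a maximum distance, and the domain names and feature values are invented.

```python
def wasserstein_1d(u, v):
    """W1 distance between two equal-size 1-D samples:
    the mean absolute difference of the sorted values."""
    if len(u) != len(v):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)

def domain_distance(source, target):
    """Average of the per-feature W1 distances (an assumed simplification)."""
    n_features = len(source[0])
    per_feature = [
        wasserstein_1d([row[j] for row in source], [row[j] for row in target])
        for j in range(n_features)
    ]
    return sum(per_feature) / n_features

def select_similar_sources(candidates, target, max_distance):
    """Keep the candidate source domains whose distance to the target
    does not exceed max_distance (the assumed 'preset requirement')."""
    distances = {name: domain_distance(rows, target) for name, rows in candidates.items()}
    selected = sorted(name for name, d in distances.items() if d <= max_distance)
    return selected, distances

# Hypothetical numeric features per activity: (duration in hours, sensor proportion).
target = [[8.0, 0.3], [1.0, 0.6], [0.5, 0.1]]
candidates = {
    "source_a": [[7.5, 0.3], [1.0, 0.5], [0.5, 0.2]],
    "source_b": [[2.0, 0.9], [6.0, 0.8], [4.0, 0.9]],
}
selected, distances = select_similar_sources(candidates, target, max_distance=0.5)
print(selected, distances)
```

In this toy data `source_a` is close to the target in both features and is retained, while `source_b` is rejected.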
S4, applying a domain adaptation method to draw the feature vector distribution of each similar source domain closer to that of the target domain.
Specifically, by mapping the sensors and daily behaviors of the similar source domains and the target domain to the same space, daily behavior feature vectors of the same dimension are obtained. The domain distributions are then examined, and a suitable domain adaptation method is used to draw the feature vector distributions of the similar source domains and the target domain closer together, so that training in the DSN is more effective. In a preferred embodiment of the invention, the domain adaptation method is, for example, TCA (Transfer Component Analysis): it is assumed that there exists a feature mapping such that the marginal distributions of the mapped source domain and target domain are close, wherein the feature mapping is obtained using the MMD (maximum mean discrepancy) distance metric and kernel functions. As can be seen from the probability distribution map, the probability distributions of the source domain and the target domain overlap more after adaptation.
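The MMD criterion mentioned for step S4 can be made concrete with a short sketch. The code below only evaluates a (biased) squared MMD estimate with an RBF kernel before and after a deliberately simple alignment; the mean shift used here is a stand-in chosen for illustration and is not TCA, and the data, the `gamma` value, and the function names are assumptions.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of squared MMD:
    mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)."""
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2 * mean_kernel(source, target))

# Hypothetical feature vectors; the target is a shifted copy of the source.
source = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]]
target = [[1.0, 1.0], [1.2, 1.1], [1.1, 1.3]]

# Stand-in alignment (NOT TCA): move the source mean onto the target mean.
dims = range(len(source[0]))
shift = [
    sum(t[j] for t in target) / len(target) - sum(s[j] for s in source) / len(source)
    for j in dims
]
aligned = [[s[j] + shift[j] for j in dims] for s in source]

before = mmd_squared(source, target)
after = mmd_squared(aligned, target)
print(before, after)
```

A smaller MMD after alignment corresponds to the more overlapping distributions described above; TCA seeks a kernel-based mapping that reduces this same quantity rather than a plain shift.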
S5, combining the adapted feature vectors of each similar source domain with the feature vectors of the target domain as the input of the DSN, thereby training as many base classifiers as there are similar source domains. The base classifiers output preliminary classification results, that is, the data of the target domain are labeled with labels learned from the respective source domains.
S6, performing ensemble learning on the classification results that the base classifiers produce for the feature vectors in the target domain, thereby obtaining the daily behavior recognition result of the target domain.
Specifically, the results of the plurality of base classifiers are fed into ensemble learning; the Stacking ensemble learning method is preferably used in this embodiment. The output of each base classifier is weighted, the predictions of the base classifiers for the same feature vector are used as the input of the final classifier, and the target domain data are labeled again. In this way, the daily behaviors of the target domain (including the daily behaviors unique to it) are recognized from multiple source domains.
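How weighted base-classifier outputs can recover a behavior that one source domain lacks is shown in the sketch below. It is a simplification under stated assumptions: a fixed weighted sum stands in for the trained final classifier of Stacking, and the probabilities, weights, and the function name `combine` are invented for illustration.

```python
def combine(base_outputs, weights):
    """Weighted sum of the class scores from all base classifiers.
    A label missing from one classifier's output counts as score 0."""
    labels = sorted({label for out in base_outputs for label in out})
    scores = {
        label: sum(w * out.get(label, 0.0) for out, w in zip(base_outputs, weights))
        for label in labels
    }
    best = max(scores, key=scores.get)
    return best, scores

# One target-domain sample that is actually "bathe" (hypothetical scores).
# Base classifier 1 was trained with a source domain that contains bathing.
base_1 = {"bathe": 0.8, "sleep": 0.2}
# Base classifier 2 was trained with a source domain that has no bathing.
base_2 = {"sleep": 0.4, "rest": 0.35, "cook": 0.25}

label, scores = combine([base_1, base_2], weights=[0.5, 0.5])
print(label, scores)
```

In full Stacking, the base-classifier outputs would instead be fed as features to a final classifier that learns how to weigh them.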
The following further describes the aspects and effects of the present invention based on specific application examples.
Example 1: the daily behaviors of the target domain are recognized using daily behavior sensor event streams collected in two different smart home apartments (the data of the two apartments have passed the similarity screening, that is, they are similar source domains selected for transfer). After daily behavior mapping and sensor mapping, the feature spaces of all daily behaviors are unified. The daily behaviors of the three apartments are: eating, sleeping, cooking, resting, cleaning, going out, going home, washing dishes, working, bathing and going to the toilet. The first nine are common to the three data sets; bathing occurs only in the first source domain and the target domain, and going to the toilet occurs only in the second source domain and the target domain. Through ensemble learning, the daily behaviors of the target domain can be recognized from a plurality of source domains, which solves the problem that a single source domain cannot recognize a daily behavior of the target domain that it does not contain. That is, all daily behaviors of the target domain can be recognized using the two source domains, as long as at least one of the source domains contains the daily behavior in question. The data sets of the two apartments can thus recognize all categories of daily behavior in the third apartment, and the recognition accuracy is improved by using the DSN as the feature-extracting base classifier.
Data set selection: the data sets published by the CASAS (Center for Advanced Studies in Adaptive Systems) group at Washington State University are used. CASAS is, to date, the largest and most widely used daily behavior recognition data set. From the single-resident daily behavior data sets, hh102, hh103, hh104 and hh105 (as preselected source domains) and hh106 (as the target domain) were randomly selected. The five data sets consist of sensor information collected from different volunteers in different smart home layouts. Sensor information: trigger time, location, sensor type, sensor name, daily behavior label. Sensor types: magnetic door sensor, light sensor, infrared motion sensor, wide-area infrared motion sensor, temperature sensor. The daily behaviors of each data set: rest, clean, go out, go home, sleep, cook, eat, wash dishes, go to the toilet, work. For each data set, redundant sensor sequences without daily behavior labels are deleted, leaving only labeled sensor sequences. The environment layout of each data set and the installation positions of the sensors are shown in fig. 4, 5, 6, 7, and 8, in this order.
The sensors and the daily behavior labels of the five domains are mapped to the same feature space by the methods of fig. 2 and fig. 3, respectively, and the feature vector of each domain is obtained: the start time of the daily behavior, the end time of the daily behavior, the duration of the daily behavior, the proportion of each mapped sensor in the sensor stream of the daily behavior, and the daily behavior label. For example, the feature vector of the daily behavior sleeping can be represented by the following numerical features: (22:00, 7:00, 9, 1/3, 1/3, 0, 0, 0, 0, 0, 0, 1/3, 0, 0.2, 0.2, 0.6).
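The time and sensor-proportion part of such a feature vector can be computed as in the sketch below. It reproduces the first thirteen entries of the sleeping example (start, end, duration, and the proportions of mapped sensors 1 to 10); the label-related entries are omitted, and the function name `activity_features`, the event list `[1, 2, 9]`, and the handling of activities that cross midnight are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

def activity_features(start, end, sensor_events, n_sensors=10):
    """Return [start, end, duration_hours, p_1, ..., p_n], where p_i is the
    share of mapped sensor i among the events of this activity."""
    fmt = "%H:%M"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 <= t0:
        t1 += timedelta(days=1)  # assume the activity runs past midnight
    duration_hours = (t1 - t0).total_seconds() / 3600

    counts = Counter(sensor_events)
    total = len(sensor_events)
    proportions = [
        counts.get(i, 0) / total if total else 0.0
        for i in range(1, n_sensors + 1)
    ]
    return [start, end, duration_hours] + proportions

# Hypothetical sleeping episode in which mapped sensors 1, 2 and 9 each fire once.
features = activity_features("22:00", "07:00", sensor_events=[1, 2, 9])
print(features)
```

The duration comes out as 9 hours and sensors 1, 2 and 9 each receive a proportion of 1/3, matching the corresponding entries of the example vector above.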
Through distance measurement, the source domains similar to the target domain are found: hh102 and hh103. Step S2 is then repeated, and the sensors and daily behaviors of the similar source domains and the target domain are mapped to obtain new feature vectors for each domain. A domain adaptation method is then used to draw the feature vector distributions of hh102 and hh103, respectively, closer to that of the target domain hh106. The mapped daily behavior templates (with tense variants and near-synonyms removed) are: eating, sleeping, cooking, resting, cleaning, going out, going home, washing dishes, working, bathing and going to the toilet. The mapped sensors are numbered 1 to 10. To demonstrate the functionality of the method, the data sets are processed so that the first nine daily behaviors are common to the three data sets, bathing occurs only in hh102 and hh106, and going to the toilet occurs only in hh103 and hh106.
Since there are two source domains similar to the target domain, the DSN is trained with the feature vectors of combination one (hh102 and hh106) and of combination two (hh103 and hh106) to obtain two base classifiers, and the feature vectors of the target domain are labeled with the labels of the respective source domain. The labeling results show that combination one recognizes the common daily behaviors and bathing in the target domain well, but its recognition of going to the toilet in hh106 is not satisfactory. The same applies, correspondingly, to the recognition results of combination two.
The results of the base classifiers are fed into the ensemble learning model to be labeled again. The results show that, after ensemble learning, the daily behaviors of the target domain are recognized well, including bathing and going to the toilet (daily behaviors of the target domain that are absent from one of the source domains), and the recognition accuracy for the common daily behaviors is also improved.
In the above embodiment of the present invention, the descriptions of the daily behaviors in the respective apartments each have their own emphasis, and the recognition of daily behaviors that are absent from a single source domain but present in the target domain is highlighted.
In the embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other ways. The clustering method, the distance measurement method, and the method for increasing the similarity between the source domain and the target domain described above are merely exemplary, and may be replaced by other methods that perform better.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.