CN102201048A - Method and system for performing topic-level privacy protection on document set - Google Patents
- Publication number: CN102201048A (also listed as CN2010101325939A, CN201010132593A)
- Authority: CN (China)
- Prior art keywords: document, keyword, sensitive, collection, privacy
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Storage Device Security (AREA)
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Abstract
The invention provides a method and system for performing topic-level privacy protection on a document set. The method comprises: inputting the document set and a topic-level privacy policy, the privacy policy comprising one or more topic keywords that need privacy protection; expanding the topic keywords to generate one or more sensitive keywords; and determining private documents from the document set based on the generated sensitive keywords. According to different embodiments of the invention, the sensitive keywords are generated based on one or both of the internal features of the document set and external knowledge (an ontology). The method requires no training documents, so the system is more efficient, flexible, and practical; a large number of privacy policies can be processed simultaneously, and dynamic changes of the privacy policy are conveniently supported.
Description
Technical field
The present invention relates generally to privacy protection for document sets and, more specifically, to a method and system for performing topic-level privacy protection on a document set.
Background art
With the rapid development of computer and network technologies, the information available to people has become digital and massive in volume. However, digitization and networking also make privacy protection and management of information more challenging. This problem is especially pressing where highly private information, such as health information and account information, is involved. For example, to facilitate the sharing of medical information, the use of electronic medical records and electronic health records has become a trend. Medical stakeholders such as medical staff, medical researchers, health authorities, and insurance companies can conveniently carry out their work based on electronic medical data. From the patient's perspective, however, patients should have privacy control over their own medical records or health records. The most typical case is that a patient does not want people without the patient's authorization to learn, from the electronic medical data they obtain, that the patient suffers from a certain sensitive disease.
Today, search has become a basic tool for handling massive digital information. How to let searchers obtain the information they need quickly, conveniently, and accurately while protecting the privacy of the information owner or of the people the information concerns, that is, how to balance search quality and privacy protection, has become a difficult problem that search systems must face.
Generally, a document owner or a privacy-related person formulates a privacy policy to define, in a personalized way, the scope of privacy and the protection strategy for privacy-related information. The problem that privacy-preserving search needs to solve is how to satisfy the privacy policy in a search system while keeping search quality as high as possible.
In practical search systems, access control is the most commonly used and most effective privacy control method: the privacy owner sets an access control policy on documents that contain private information, specifying which searchers are or are not permitted to obtain the private documents. Privacy protection based on access control usually comprises three steps: (1) definition of the privacy policy: the privacy owner determines the meaning and scope of privacy; (2) determination of private documents: determining whether a document contains private content; (3) setting of access control: setting an access policy for each private document. With a large number of documents, the privacy owner cannot manually perform privacy determination and access control setting on each document one by one. Moreover, when the privacy owner changes the privacy policy, it is equally impossible to manually re-determine and re-set each document. How to automatically determine private documents and set access control based on the privacy policy, while ensuring the accuracy of access control, is therefore a problem that must be solved when dealing with a large number of documents.
The following techniques exist in the prior art for addressing the above problems:
The system proposed in European patent EP1638032A3 (filed September 6, 2005), entitled "Method, System and Apparatus for Maintaining User Privacy in a Knowledge Interchange System", allows the user to define a keyword-level privacy policy, that is, to specify certain sensitive keywords. User document information containing these sensitive keywords is not sent to the server for sharing, thereby achieving access control and privacy protection.
In addition, US patent US7409406B2 (filed September 8, 2003), entitled "Uniform Search System and Method for Selectively Sharing Distributed Access-Controlled Documents", hands the enforcement of access control to the document owners (privacy-related persons) themselves rather than to the search server. Each document owner stores its own documents (both private and non-private). The server stores only a privacy-protected document index. When the server receives a search query, it forwards the query to the relevant document owners according to the index, and each document owner then responds to the query on its own according to its privacy policy and access control policy.
Furthermore, US patent application US2009/0144255A1 (filed November 29, 2007), entitled "Augmenting Privacy Policies with Inference Detection", supports user-defined topic-level privacy policies: the user can define specific sensitive topics, and every document that relates to a sensitive topic should be judged a private document. A sensitive topic is represented by one topic keyword or a group of topic keywords. For each privacy policy (sensitive topic), this patent prepares a group of corresponding sensitive documents that have been manually judged, to serve as training documents. It then uses statistical natural language analysis to learn from the training documents new keywords that can represent the sensitive topic. These newly generated keywords, together with the keywords used to define the topic, form the keyword set used to determine private documents.
However, the prior art has several drawbacks. Among the existing work introduced above, patent EP1638032A3 uses keyword-level privacy policies. The shortcoming of this approach is that it is difficult for a user to exhaustively enumerate the keywords relevant to privacy, which makes usable privacy protection hard to achieve. Patent US7409406B2 moves access control from the search server to the document owners in order to avoid the risk of privacy disclosure at the server. This approach cannot be used in many practical situations, because document owners or privacy-related persons are often offline (for example, patients in an electronic medical information system), and it is unreasonable to expect them to process access requests in real time. Patent US2009/0144255A1 supports topic-level privacy policies and thus overcomes the shortcoming of patent EP1638032A3, but it needs training documents to be prepared for each privacy policy in order to expand the topic keywords. Preparing training documents requires manual labeling, which is very time-consuming. In particular, when a large number of document owners have established a large number of privacy policies, and users may change their privacy policies during use, a method based on training documents lacks flexibility and is not applicable in practice.
Summary of the invention
The present invention has been made in view of the above problems.
The present invention proposes a new, fully automatic method and system for performing topic-level privacy protection on a document set. The method automatically obtains keywords related to a sensitive topic by performing statistical analysis on the document set itself, by using an ontology (an external knowledge source), or by combining the document set and the ontology. These keywords are used to determine the private documents in the document set. In addition, the document set in which the private documents have been determined can be used to realize document search that takes topic-level privacy protection into account.
According to a first aspect of the present invention, there is provided a method for performing topic-level privacy protection on a document set, comprising: inputting a document set and a topic-level privacy policy, the privacy policy comprising one or more topic keywords that need privacy protection; expanding the topic keywords based on internal features of the document set itself, to generate one or more sensitive keywords; and determining private documents from the document set based on the generated sensitive keywords.
According to a second aspect of the present invention, there is provided a method for performing topic-level privacy protection on a document set, comprising: inputting a document set and a topic-level privacy policy, the privacy policy comprising one or more topic keywords that need privacy protection; expanding the topic keywords according to external knowledge, to generate one or more sensitive keywords; and determining private documents from the document set based on the generated sensitive keywords.
According to a third aspect of the present invention, there is provided a method for performing topic-level privacy protection on a document set, comprising: inputting a document set and a topic-level privacy policy, the privacy policy comprising one or more topic keywords that need privacy protection; expanding the topic keywords based on internal features of the document set itself, to generate a first set of sensitive keywords; expanding the topic keywords according to external knowledge, to generate a second set of sensitive keywords; correcting the first set of sensitive keywords according to the second set of sensitive keywords; supplementing the second set of sensitive keywords according to the first set of sensitive keywords; merging the corrected first set and the supplemented second set to obtain a final set of sensitive keywords; and determining private documents from the document set based on the sensitive keywords in the final set.
According to a fourth aspect of the present invention, there is provided a system for performing topic-level privacy protection on a document set, comprising: an input device for inputting a document set and a topic-level privacy policy, the privacy policy comprising one or more topic keywords that need privacy protection; a sensitive keyword generation device for expanding the topic keywords based on internal features of the document set itself, to generate one or more sensitive keywords; and a private document determination device for determining private documents from the document set based on the generated sensitive keywords.
According to a fifth aspect of the present invention, there is provided a system for performing topic-level privacy protection on a document set, comprising: an input device for inputting a document set and a topic-level privacy policy, the privacy policy comprising one or more topic keywords that need privacy protection; an external knowledge store for storing external knowledge; a sensitive keyword generation device for expanding the topic keywords according to the external knowledge, to generate one or more sensitive keywords; and a private document determination device for determining private documents from the document set based on the generated sensitive keywords.
According to a sixth aspect of the present invention, there is provided a system for performing topic-level privacy protection on a document set, comprising: an input device for inputting a document set and a topic-level privacy policy, the privacy policy comprising one or more topic keywords that need privacy protection; an external knowledge store for storing external knowledge; a first sensitive keyword generation device for expanding the topic keywords based on internal features of the document set itself, to generate a first set of sensitive keywords; a second sensitive keyword generation device for expanding the topic keywords according to the external knowledge, to generate a second set of sensitive keywords; a correction device for correcting the first set of sensitive keywords according to the second set of sensitive keywords; a supplement device for supplementing the second set of sensitive keywords according to the first set of sensitive keywords; a merging device for merging the corrected first set and the supplemented second set to obtain a final set of sensitive keywords; and a private document determination device for determining private documents from the document set based on the sensitive keywords in the final set.
Compared with methods oriented to keyword-level privacy policies, the present invention supports topic-level privacy policies and thereby achieves more intelligent and more comprehensive privacy protection. In addition, compared with existing topic-level privacy protection methods, the present invention requires no training documents, which makes the system more efficient, flexible, and practical; it can handle a large number of privacy policies simultaneously and conveniently supports dynamic changes of the privacy policy.
Brief description of the drawings
The present invention will be better understood from the following detailed description of embodiments of the invention taken in conjunction with the accompanying drawings, in which like reference numerals indicate like parts, and in which:
Fig. 1 is a block diagram showing the internal structure of a document search system that implements topic-level privacy protection according to the present invention;
Fig. 2 is a block diagram further showing the internal structure of the sensitive keyword generation device according to the present invention;
Fig. 3A is a flowchart showing a method 300 for performing topic-level privacy protection on a document set according to a first embodiment of the present invention;
Fig. 3B is a schematic diagram of an example illustrating the working process of the method shown in Fig. 3A;
Fig. 4A is a flowchart showing a method 400 for performing topic-level privacy protection on a document set according to a second embodiment of the present invention;
Fig. 4B is a schematic diagram of an example illustrating the working process of the method shown in Fig. 4A;
Fig. 5A is a flowchart showing a method 500 for performing topic-level privacy protection on a document set according to a third embodiment of the present invention; and
Fig. 5B is a schematic diagram of an example illustrating the working process of the method shown in Fig. 5A.
Detailed description of the embodiments
Fig. 1 is a block diagram showing the internal structure of a document search system that implements topic-level privacy protection according to the present invention. The system shown in Fig. 1 comprises a topic-level privacy protection apparatus 101, a storage device 102, and a privacy-aware document search apparatus 103. The topic-level privacy protection apparatus 101 comprises an input device 1011, a sensitive keyword generation device 1012, and a private document determination device 1013. The storage device 102 comprises a privacy policy storage unit 1021, a document storage unit 1022, an ontology storage unit 1023, a sensitive keyword storage unit 1024, and a private document storage unit 1025.
The units in the storage device 102 function as follows. The privacy policy storage unit 1021 stores the topic-level privacy policies defined by document owners or privacy-related persons, that is, the specific sensitive topics a user may define. A sensitive topic can be represented by one topic keyword or a group of topic keywords. The document storage unit 1022 stores the document set; each document contains a user ID that identifies the document owner. The ontology storage unit 1023 stores an ontology library, which defines concepts and the relations between concepts. The ontology storage unit is needed only when ontology-based sensitive keyword generation (the second embodiment, described later) or sensitive keyword generation based on the hybrid method (the third embodiment, described later) is used. The sensitive keyword storage unit 1024 stores the sensitive keywords generated by the sensitive keyword generation device 1012; each sensitive keyword corresponds to its related privacy topic, that is, to a privacy policy. The private document storage unit 1025 stores the documents that have been determined, according to the privacy policies and the sensitive keywords, to contain private information, that is, the private documents.
The processing devices in the topic-level privacy protection apparatus 101 function as follows. The input device 1011 inputs the document set and a user-defined topic-level privacy policy, which may comprise one or more topic keywords that need privacy protection. The sensitive keyword generation device 1012 uses the method of the present invention to generate sensitive keywords by expanding the topic keywords contained in the privacy policy. As the core of the present invention, the sensitive keyword generation device 1012 is described in detail later. The present invention proposes several embodiments that generate sensitive keywords by performing statistical analysis on the document set itself, by using an ontology (an external knowledge source), or by combining the document set and the ontology. The private document determination device 1013 determines, according to the generated sensitive keywords, which documents in the document library contain private information. For example, the determination can be made as follows: for a given document, if its owner has defined a privacy policy and a sensitive keyword related to that privacy policy appears in the document, the document is judged to be a private document; otherwise it is a non-private document.
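The example determination rule can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patent's implementation: the `Document` class, the `sensitive_keywords_by_owner` mapping, and the whitespace tokenization are all hypothetical names and simplifications introduced here.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    owner: str  # user ID identifying the document owner
    text: str


def is_private(doc, sensitive_keywords_by_owner):
    """Return True if the document contains a sensitive keyword of its owner's policy.

    sensitive_keywords_by_owner is a hypothetical mapping from owner ID to the
    set of lowercase sensitive keywords generated from that owner's policy.
    """
    keywords = sensitive_keywords_by_owner.get(doc.owner, set())
    # Naive whitespace tokenization; a real system would use a proper tokenizer.
    tokens = set(doc.text.lower().split())
    return not keywords.isdisjoint(tokens)


docs = [
    Document("d1", "alice", "Patient diagnosed with hepatitis last year"),
    Document("d2", "alice", "Routine dental checkup with no findings"),
    Document("d3", "bob", "Notes on hepatitis vaccination schedule"),
]
policies = {"alice": {"hepatitis", "liver"}}  # bob has defined no policy

private_ids = [d.doc_id for d in docs if is_private(d, policies)]
print(private_ids)
```

In this sketch a document whose owner has defined no policy is never judged private, which matches the rule that the determination depends on the owner's own privacy policy.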
Various privacy-aware operations can be performed on a document set in which the private documents have been identified. For example, the privacy-aware document search apparatus 103 can perform document search while satisfying the user-defined privacy policies. The most basic implementation is as follows: if a user has specified a privacy policy that prevents a certain searcher (or a certain class of searchers) from accessing documents on a certain topic, then when that searcher (or class of searchers) performs a search, the private documents associated with the privacy policy do not appear in the search results.
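A minimal sketch of this basic search behavior is shown below. The function name `search`, the substring matching, and the single set `blocked_searchers` are assumptions made for illustration; a real system would associate each private document with the specific policy and searchers it concerns.

```python
def search(query, documents, private_doc_ids, blocked_searchers, searcher):
    """Return IDs of documents matching the query, hiding private documents
    from searchers that the privacy policy restricts.

    documents: dict mapping doc_id -> text
    private_doc_ids: set of IDs already judged to be private documents
    blocked_searchers: set of searcher IDs the policy denies access to
    """
    term = query.lower()
    hits = [doc_id for doc_id, text in documents.items() if term in text.lower()]
    if searcher in blocked_searchers:
        hits = [doc_id for doc_id in hits if doc_id not in private_doc_ids]
    return hits


documents = {
    "d1": "Patient diagnosed with hepatitis last year",
    "d2": "Hepatitis vaccination leaflet for the public",
}
private_doc_ids = {"d1"}
blocked = {"insurer"}

restricted_hits = search("hepatitis", documents, private_doc_ids, blocked, "insurer")
full_hits = search("hepatitis", documents, private_doc_ids, blocked, "physician")
print(restricted_hits, full_hits)
```

Filtering happens after matching, so unrestricted searchers see the same results they would see without any privacy policy.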
The core processing unit of the invention is the sensitive keyword generation device 1012, which generates sensitive keywords for topic-level privacy policies automatically (without requiring a training document set). Fig. 2 is a block diagram showing the internal structure of the sensitive keyword generation device according to the present invention. Note that Fig. 2 omits the components of the system shown in Fig. 1 that are not directly related to sensitive keyword generation.
The present invention provides three different specific implementations of automatic sensitive keyword generation, namely a document-set-based generation unit 201 (first embodiment), an ontology-based generation unit 202 (second embodiment), and a hybrid-method-based generation unit 203 (third embodiment).
The document-set-based generation unit 201 analyzes the internal features of the document set itself and, using the topic keywords defined in the privacy policy as seed words, expands them to generate new sensitive keywords. The ontology-based generation unit 202 uses external knowledge, namely the concepts defined in an ontology and the relations between them, and likewise uses the topics defined in the privacy policy as seeds to obtain expanded sensitive keywords. The hybrid-method-based generation unit 203 combines the document-set-based and ontology-based methods to obtain more accurate and effective sensitive keywords. Fig. 2 also shows a correction unit 204, which corrects, according to external knowledge (for example, an ontology), the set of sensitive keywords generated from the internal features of the document set (that is, the expansion result of the document-set-based generation unit 201). Fig. 2 further shows a supplement unit 205, which uses the sensitive keywords generated from the internal features of the document set (the expansion result of the document-set-based generation unit 201) to supplement the set of sensitive keywords generated from external knowledge (the expansion result of the ontology-based generation unit 202).
Next, the different methods of automatically generating sensitive keywords are each described in detail with reference to the accompanying drawings.
<based on the method for the internal feature of collection of document 〉
Fig. 3 A illustrates the process flow diagram that is used for collection of document is carried out the method 300 (based on the method for the internal feature of collection of document) of theme rank secret protection according to first embodiment of the invention; Fig. 3 B is the synoptic diagram of an example that is used for the course of work of method shown in the key diagram 3A.
Responsive keyword generation based on collection of document uses the method for text-processing that collection of document is analyzed, thereby excavates the keyword relevant with responsive theme.
In step 301, input media 1011 is at first imported collection of document and other privacy policy of subject matter level, and this privacy policy can be one or more theme rank keywords that need secret protection.
In step 302, expand subject key words based on the generation unit 201 of document sets based on the internal feature of collection of document, to generate responsive keyword, this can realize by collection of document is carried out text analyzing.For example, (Latent Semantic Analysis LSA) is a kind of implementation method of text analyzing to latent semantic analysis.LSA carries out Singular Value Decomposition Using by document-keyword matrix that document sets is formed, obtain keyword between topic similarity tolerance.Similar more between the keyword, represent that theirs is thematic relevant more.In other privacy policy of user-defined subject matter level, we are referred to as keyword seed to the name of theme, according to the keyword similarity result who obtains at LSA, find out the most similar keyword (can according to default similar value threshold value), then keyword seed is united these the most similar keywords as the responsive keyword corresponding to this sensitivity theme, be used for judgement the privacy document.Fig. 3 B shows an example that utilizes LSA to generate responsive keyword.
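The LSA-based expansion in step 302 can be written out as follows. The notation is an assumption introduced here for illustration: the matrix X, the rank k, the threshold θ, and the use of cosine similarity are not specified by the patent, which states only that a singular value decomposition of the document-keyword matrix yields a topical similarity measure and that a preset threshold may be used.

```latex
% Document-keyword matrix (m documents, n keywords) and its rank-k truncated SVD
X \in \mathbb{R}^{m \times n}, \qquad X \approx X_k = U_k \Sigma_k V_k^{\top}

% Representation of keyword j in the k-dimensional latent space
% (e_j is the j-th standard basis vector of R^n)
\mathbf{t}_j = \Sigma_k V_k^{\top} \mathbf{e}_j \in \mathbb{R}^{k}

% Topical similarity between keywords i and j (cosine similarity, assumed)
\operatorname{sim}(i, j) = \frac{\mathbf{t}_i^{\top} \mathbf{t}_j}{\lVert \mathbf{t}_i \rVert \, \lVert \mathbf{t}_j \rVert}

% Sensitive keyword set for seed keyword s with preset threshold \theta
S(s) = \{ s \} \cup \{\, j : \operatorname{sim}(s, j) \ge \theta \,\}
```

A lower threshold θ admits more keywords into S(s), which strengthens protection but risks the over-protection that the hybrid method is later meant to reduce.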
In step 303, optionally, the correction unit 204 may correct the set of generated sensitive keywords according to external knowledge (for example, an ontology). The specific correction method is described later.
In step 304, the private document determination device 1013 determines the private documents in the document set based on the generated sensitive keywords (or the corrected sensitive keywords). Because the private document determination device 1013 can operate using known methods, the details are not repeated here.
The process 300 then ends.
<based on the method for outer body 〉
Fig. 4 A illustrates the process flow diagram that is used for collection of document is carried out the method 400 (based on the method for outer body) of theme rank secret protection according to second embodiment of the invention; Fig. 4 B is the synoptic diagram of an example that is used for the course of work of method shown in the key diagram 4A.
Responsive keyword based on body generates, and is to utilize external knowledge (body) to obtain understanding to responsive theme.Body is a kind of formal knowledge representation, has defined the relation between field concept and the notion in the body, and the relation between the wherein the most basic notion promptly is a hierarchical relationship, represents that one of them notion is the father's notion or the sub-notion of another one notion.
In step 401, be similar to first embodiment, input media 1011 is at first imported collection of document and other privacy policy of subject matter level, and this privacy policy can be one or more theme rank keywords that need secret protection.
In step 402, expand subject key words based on the generation unit 202 of body according to external knowledge (for example body), to generate responsive keyword set A1.For example, when the responsive keyword of carrying out basic body generates, can in body, find the notion that is complementary with other privacy policy of user-defined subject matter level as a seed notion, obtain all sub-notions (all subordinate concepts that comprise sub-notion) of this seed notion then.The notion set that forms can constitute the complete description to this sensitivity theme.The representative keyword of all these notions (is the keyword of the title of forming these notions in body, sometimes also define a plurality of titles that the expression same concept is arranged in the body, keyword in these titles all is chosen as the representative keyword of notion) then formed responsive keyword corresponding to this sensitivity theme, be used for judgement to the privacy document.Fig. 4 B shows an example that utilizes body to generate responsive keyword.
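The ontology-based expansion can be sketched as a traversal of the concept hierarchy. The dictionaries `children` and `names` below are a hypothetical toy ontology, and treating each concept name as a single keyword phrase is a simplification made here; the patent does not prescribe a data structure.

```python
def expand_with_ontology(seed, children, names):
    """Collect the names of a seed concept and of all its descendant concepts.

    children: dict mapping a concept to its direct sub-concepts
    names: dict mapping a concept to the names (synonyms) defined for it
    Returns an empty set if the seed concept is not found in the ontology.
    """
    if seed not in names:
        return set()
    keywords = set()
    visited = set()
    stack = [seed]
    while stack:
        concept = stack.pop()
        if concept in visited:
            continue  # guard against concepts reachable by more than one path
        visited.add(concept)
        keywords.update(names.get(concept, []))
        stack.extend(children.get(concept, []))
    return keywords


# Hypothetical toy ontology, for illustration only.
children = {
    "hepatitis": ["hepatitis a", "hepatitis b"],
    "hepatitis b": ["chronic hepatitis b"],
}
names = {
    "hepatitis": ["hepatitis"],
    "hepatitis a": ["hepatitis a", "hav infection"],
    "hepatitis b": ["hepatitis b", "hbv infection"],
    "chronic hepatitis b": ["chronic hepatitis b"],
    "influenza": ["influenza", "flu"],
}

expanded = expand_with_ontology("hepatitis", children, names)
print(sorted(expanded))
```

An empty result signals that the topic has no counterpart in the ontology, which is the coverage gap that the hybrid method addresses.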
In step 405, similarly to the first embodiment, the private document determination device 1013 determines the private documents in the document set based on the generated sensitive keywords (or the supplemented sensitive keywords).
The process 400 then ends.
<based on the internal feature of collection of document with based on the mixed method of body 〉
Fig. 5 A illustrates the process flow diagram that is used for collection of document is carried out the method 500 (mixed method) of theme rank secret protection according to third embodiment of the invention; And Fig. 5 B is the synoptic diagram of an example that is used for the course of work of method shown in the key diagram 5A.
Based on collection of document with based on the generation method of body its intrinsic shortcoming is arranged all separately: the method (such as the LSA method) based on collection of document can be introduced too much noise usually, and make responsive keyword generate too much, can form the overprotection of privacy, thereby influence search quality; And depend on body from the external knowledge source based on the method for body; body often can be very not comprehensive to the covering in field; therefore may cause some privacy theme in body, to can not find correspondence, and not realize that the expansion of subject key words generates, thereby influence the quality of secret protection.Given this, the present invention also proposes a kind of mixed method, and above-mentioned two kinds of methods are used in combination, and can overcome the other side's shortcoming mutually, thereby obtains better secret protection degree and search quality.
As mentioned above, the correction method and the supplementation method proposed in this embodiment can also be applied to the first and second embodiments, respectively, to improve search quality.
Referring to Fig. 5A, in step 501, as in the first and second embodiments, the input device 1011 first inputs a document set and a topic-level privacy policy; the privacy policy may be one or more topic-level keywords requiring privacy protection.
In step 502, the topic keywords are expanded based on the internal features of the document set (for example, using the text analysis method LSA) to generate a first sensitive keyword set A1.
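The document-set-based expansion can be illustrated with a deliberately simplified stand-in. The sketch below does not implement LSA: it compares raw term-by-document count vectors with cosine similarity and skips the truncated singular value decomposition that LSA applies first. The function names and the default threshold are assumptions made for illustration only.

```python
import math
from collections import Counter


def term_vectors(documents):
    """Map each term to its vector of per-document occurrence counts."""
    vectors = {}
    for index, text in enumerate(documents):
        for term, count in Counter(text.lower().split()).items():
            vectors.setdefault(term, [0] * len(documents))[index] = count
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_with_documents(topic_keyword, documents, threshold=0.6):
    """Return the topic keyword merged with all terms more similar than threshold."""
    vectors = term_vectors(documents)
    seed = vectors.get(topic_keyword)
    if seed is None:
        return {topic_keyword}  # the seed word never occurs in the document set
    similar = {term for term, vec in vectors.items() if cosine(seed, vec) > threshold}
    return similar | {topic_keyword}
```

Lowering the threshold admits more loosely co-occurring terms, which is one way the noise attributed to document-set-based methods can arise.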
In step 503, the topic keywords are expanded according to external knowledge (for example, an ontology) to generate a second sensitive keyword set A2.
In step 504, A1 is corrected using the sensitive keywords in set A2. The correction rule may be, for example: if keyword A' is an expansion result obtained for topic keyword A of the privacy policy through analysis of the document set, and the concept represented by A' has no association in the ontology with the concept represented by A, then A' is deleted from the sensitive keyword set associated with this privacy topic.
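One possible reading of this correction rule is sketched below. The association table `ONTOLOGY_LINKS` and the function `correct_a1` are hypothetical. The sketch also assumes that a keyword absent from the ontology is kept rather than deleted, since the text only speaks of concepts that exist in the ontology but are unrelated; a stricter reading would delete such keywords as well.

```python
# Hypothetical ontology relation: concept -> concepts it is associated with.
ONTOLOGY_LINKS = {
    "cancer": {"chemotherapy", "tumor"},
    "chemotherapy": {"cancer"},
    "tumor": {"cancer"},
    "football": {"match"},
    "match": {"football"},
}


def correct_a1(a1, topic_keyword, links=ONTOLOGY_LINKS):
    """Return A1 without keywords the ontology knows but does not link to the topic."""
    if topic_keyword not in links:
        return set(a1)  # the ontology cannot judge this topic; leave A1 unchanged
    kept = set()
    for keyword in a1:
        known = keyword in links
        related = keyword == topic_keyword or keyword in links[topic_keyword]
        # Assumption: keywords missing from the ontology are kept, not deleted.
        if related or not known:
            kept.add(keyword)
    return kept
```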
In step 505, A2 is supplemented using the sensitive keywords in set A1. The supplementation rule may be, for example: if keyword A' is an expansion result obtained for topic keyword A of the privacy policy through analysis of the document set, and no corresponding concept can be found for A in the ontology, then keyword A' can replace A as the topic word in the privacy policy and a corresponding concept is sought for it in the ontology, thereby triggering the ontology-based sensitive keyword generation process.
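The supplementation rule can be sketched in the same spirit. The toy ontology, `ontology_expand`, and `supplement_a2` are hypothetical names, and trying every keyword of A1 as a substitute seed is an assumption; the text does not say how a substitute A' is chosen when several are available.

```python
# Hypothetical ontology: concept -> direct sub-concepts.
SUBCONCEPTS = {
    "oncology": ["chemotherapy", "radiotherapy"],
    "chemotherapy": [],
    "radiotherapy": [],
}


def ontology_expand(keyword, subconcepts=SUBCONCEPTS):
    """Return the keyword's concept plus all descendants, or an empty set if absent."""
    if keyword not in subconcepts:
        return set()
    found, stack = set(), [keyword]
    while stack:
        concept = stack.pop()
        if concept not in found:
            found.add(concept)
            stack.extend(subconcepts.get(concept, []))
    return found


def supplement_a2(a2, topic_keyword, a1):
    """Return A2, extended via A1 when the topic keyword has no ontology concept."""
    if ontology_expand(topic_keyword):
        return set(a2)  # the topic keyword already has a concept; nothing to add
    supplemented = set(a2)
    for substitute in a1:
        # Each document-derived keyword A' is tried as a substitute seed concept.
        supplemented |= ontology_expand(substitute)
    return supplemented
```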
Fig. 5B gives an example of the above correction and supplementation processes. Obviously, the correction method and the supplementation method proposed by the present invention are only examples and do not limit the present invention. Those skilled in the art can conceive of other ways to mutually correct and supplement the two kinds of sensitive keywords (those generated from the document set and those generated from the ontology).
Then, in step 506, the corrected A1' and the supplemented A2' are merged (united), so that the union of the two sets serves as the final sensitive keyword set used for judging privacy documents. See the example in Fig. 5B.
Subsequently, in step 507, as in the first and second embodiments, the privacy document determination device 1013 identifies privacy documents in the document set based on the generated sensitive keywords. Then, process 500 ends.
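Steps 506 and 507 can be sketched together. The union in step 506 follows the description directly; the judging criterion in step 507 is an assumption, since this passage does not state it: here a document is treated as a privacy document if it contains at least one sensitive keyword. Both function names are hypothetical.

```python
def final_sensitive_keywords(a1_corrected, a2_supplemented):
    """Step 506: the final set is the union of corrected A1' and supplemented A2'."""
    return set(a1_corrected) | set(a2_supplemented)


def judge_privacy_documents(documents, sensitive_keywords):
    """Step 507, assuming a single keyword hit marks a document as private.

    Returns the indices of the documents judged to be privacy documents.
    """
    lowered = {keyword.lower() for keyword in sensitive_keywords}
    return [
        index
        for index, text in enumerate(documents)
        if lowered & set(text.lower().split())
    ]
```

A real system would likely need tokenization suited to the document language and possibly a hit-count or weighting threshold instead of a single match.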
The method and system for performing topic-level privacy protection on a document set according to the present invention have been described in detail above with reference to the accompanying drawings. As stated previously, the method of the present invention can achieve more intelligent and more comprehensive privacy protection. Compared with existing topic-level privacy protection methods, the present invention does not need to use training documents, which makes the system efficient, flexible, and practical; it can handle a large number of privacy policies at the same time and easily supports dynamic changes to the privacy policies.
However, it should be made clear that the present invention is not limited to the specific configurations and processing described above and shown in the drawings. Also, for brevity, detailed descriptions of known methods and techniques are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of the present invention is not limited to the specific steps described and shown; those skilled in the art, after understanding the spirit of the present invention, can make various changes, modifications, and additions, or change the order of the steps.
Elements of the present invention may be implemented as hardware, software, firmware, or a combination thereof, and may be used in systems, subsystems, components, or sub-components thereof. When implemented in software, the elements of the present invention are programs or code segments used to perform the required tasks. The programs or code segments can be stored in a machine-readable medium, or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transmit information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber-optic media, radio frequency (RF) links, and so on. Code segments may be downloaded via computer networks such as the Internet or an intranet.
The present invention may be embodied in other specific forms without departing from its spirit and essential characteristics. For example, the algorithms described in the specific embodiments may be modified while the system architecture does not depart from the essential spirit of the present invention. Therefore, the present embodiments are to be considered in all respects as illustrative and not restrictive, the scope of the present invention being defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are therefore embraced within the scope of the present invention.
Claims (20)
1. A method for performing topic-level privacy protection on a document set, comprising:
inputting a document set and a topic-level privacy policy, the privacy policy comprising one or more topic keywords requiring privacy protection;
expanding the topic keywords based on the internal features of the document set itself to generate one or more sensitive keywords; and
identifying privacy documents in the document set based on the generated sensitive keywords.
2. The method of claim 1, wherein the expanding step comprises:
using the topic keywords contained in the privacy policy as seed words, and finding, by performing text analysis on the document set, topic-similar keywords whose topic similarity to a seed word is greater than a predetermined threshold; and
merging the topic keywords with their topic-similar keywords to form the sensitive keywords.
3. The method of claim 2, wherein the text analysis uses latent semantic analysis (LSA).
4. The method of claim 1, further comprising:
correcting, according to external knowledge, the set of sensitive keywords generated based on the internal features of the document set.
5. The method of claim 4, wherein the external knowledge is an ontology.
6. The method of claim 5, wherein the correcting step comprises:
if a topic keyword A and a sensitive keyword A' generated for it based on the internal features of the document set are determined to have no association in the ontology, deleting the sensitive keyword A' from the set of sensitive keywords.
7. A method for performing topic-level privacy protection on a document set, comprising:
inputting a document set and a topic-level privacy policy, the privacy policy comprising one or more topic keywords requiring privacy protection;
expanding the topic keywords according to external knowledge to generate one or more sensitive keywords; and
identifying privacy documents in the document set based on the generated sensitive keywords.
8. The method of claim 7, wherein the external knowledge is an ontology.
9. The method of claim 8, wherein the expanding step comprises:
taking the privacy policy as a seed concept and finding all sub-concepts of the seed concept in the ontology; and
merging the representative keywords of the sub-concepts with the topic keywords to form the sensitive keywords.
10. The method of claim 9, wherein the sub-concepts found further include grandchild concepts and all lower-level concepts.
11. The method of claim 9, wherein the representative keywords are the keywords that make up one or more names denoting the sub-concept.
12. The method of claim 8, further comprising:
expanding the topic keywords based on the internal features of the document set itself to generate sensitive keywords; and
supplementing the set of sensitive keywords generated according to the external ontology with the sensitive keywords generated according to the internal features of the document set.
13. The method of claim 12, wherein the supplementing step comprises:
if a topic keyword A has no corresponding concept in the external ontology, using a sensitive keyword A' generated for it based on the internal features of the document set as a seed concept to search for sensitive keywords in the external ontology.
14. A method for performing topic-level privacy protection on a document set, comprising:
inputting a document set and a topic-level privacy policy, the privacy policy comprising one or more topic keywords requiring privacy protection;
expanding the topic keywords based on the internal features of the document set itself to generate a first set of sensitive keywords;
expanding the topic keywords according to external knowledge to generate a second set of sensitive keywords;
correcting the first set of sensitive keywords according to the second set of sensitive keywords;
supplementing the second set of sensitive keywords according to the first set of sensitive keywords;
merging the corrected first set of sensitive keywords with the supplemented second set of sensitive keywords to obtain a final set of sensitive keywords; and
identifying privacy documents in the document set based on the sensitive keywords in the final set of sensitive keywords.
15. A system for performing topic-level privacy protection on a document set, comprising:
an input device, configured to input a document set and a topic-level privacy policy, the privacy policy comprising one or more topic keywords requiring privacy protection;
a sensitive keyword generation device, configured to expand the topic keywords based on the internal features of the document set itself to generate one or more sensitive keywords; and
a privacy document determination device, configured to identify privacy documents in the document set based on the generated sensitive keywords.
16. The system of claim 15, further comprising:
a privacy-aware document search device, configured to perform privacy-aware document search on the document set in which the privacy documents have been marked.
17. The system of claim 15, further comprising:
an external knowledge store, configured to store external knowledge; and
a correction device, configured to correct, according to the external knowledge, the set of sensitive keywords that the sensitive keyword generation device generated based on the internal features of the document set.
18. A system for performing topic-level privacy protection on a document set, comprising:
an input device, configured to input a document set and a topic-level privacy policy, the privacy policy comprising one or more topic keywords requiring privacy protection;
an external knowledge store, configured to store external knowledge;
a first sensitive keyword generation device, configured to expand the topic keywords according to the external knowledge to generate one or more sensitive keywords; and
a privacy document determination device, configured to identify privacy documents in the document set based on the generated sensitive keywords.
19. The system of claim 18, further comprising:
a second sensitive keyword generation device, configured to expand the topic keywords based on the internal features of the document set itself to generate sensitive keywords; and
a supplementing device, configured to supplement the set of sensitive keywords that the first sensitive keyword generation device generated according to the external knowledge with the sensitive keywords that the second sensitive keyword generation device generated according to the internal features of the document set.
20. A system for performing topic-level privacy protection on a document set, comprising:
an input device, configured to input a document set and a topic-level privacy policy, the privacy policy comprising one or more topic keywords requiring privacy protection;
an external knowledge store, configured to store external knowledge;
a first sensitive keyword generation device, configured to expand the topic keywords based on the internal features of the document set itself to generate a first set of sensitive keywords;
a second sensitive keyword generation device, configured to expand the topic keywords according to the external knowledge to generate a second set of sensitive keywords;
a correction device, configured to correct the first set of sensitive keywords according to the second set of sensitive keywords;
a supplementing device, configured to supplement the second set of sensitive keywords according to the first set of sensitive keywords;
a merging device, configured to merge the corrected first set of sensitive keywords with the supplemented second set of sensitive keywords to obtain a final set of sensitive keywords; and
a privacy document determination device, configured to identify privacy documents in the document set based on the sensitive keywords in the final set of sensitive keywords.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2010101325939A CN102201048A (en) | 2010-03-24 | 2010-03-24 | Method and system for performing topic-level privacy protection on document set |
| JP2011012560A JP2011204224A (en) | 2010-03-24 | 2011-01-25 | Method and system for implement privacy protection of topic level on document collection |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2010101325939A CN102201048A (en) | 2010-03-24 | 2010-03-24 | Method and system for performing topic-level privacy protection on document set |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN102201048A true CN102201048A (en) | 2011-09-28 |
Family
ID=44661715
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2010101325939A Pending CN102201048A (en) | 2010-03-24 | 2010-03-24 | Method and system for performing topic-level privacy protection on document set |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP2011204224A (en) |
| CN (1) | CN102201048A (en) |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104462056A (en) * | 2013-09-17 | 2015-03-25 | 国际商业机器公司 | Active knowledge guidance based on deep document analysis |
| CN106548083A (en) * | 2016-11-25 | 2017-03-29 | 维沃移动通信有限公司 | A kind of note encryption method and terminal |
| CN106815200A (en) * | 2015-11-30 | 2017-06-09 | 任子行网络技术股份有限公司 | Objectionable text detection method and device based on keyword |
| CN106845265A (en) * | 2016-12-01 | 2017-06-13 | 北京计算机技术及应用研究所 | A kind of document security level automatic identifying method |
| CN109766715A (en) * | 2018-12-24 | 2019-05-17 | 贵州航天计量测试技术研究所 | One kind is towards the leakage-preventing automatic identifying method of big data environment privacy information and system |
| CN110414241A (en) * | 2019-08-05 | 2019-11-05 | 深圳市网安计算机安全检测技术有限公司 | Privacy policy detection method, device, computer equipment and storage medium |
| CN111563276A (en) * | 2019-01-25 | 2020-08-21 | 深信服科技股份有限公司 | Webpage tampering detection method, detection system and related equipment |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2013114383A (en) * | 2011-11-28 | 2013-06-10 | Denso Corp | Privacy protection method, device for vehicle, communication system for vehicle and portable terminal |
| CN109829043B (en) * | 2018-12-28 | 2021-07-20 | 广州华多网络科技有限公司 | Part-of-speech confirmation method, part-of-speech confirmation device, electronic device, and storage medium |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1752973A (en) * | 2004-09-20 | 2006-03-29 | 微软公司 | Method, system and apparatus for maintaining user privacy in knowledge exchange system |
| US20090144255A1 (en) * | 2007-11-29 | 2009-06-04 | Palo Alto Research Center Incorporated | Augmenting privacy policies with inference detection |
| CN101566988A (en) * | 2008-04-24 | 2009-10-28 | 华为技术有限公司 | Method, system and device for searching fuzzy semantics |
Non-Patent Citations (1)
| Title |
|---|
| 闭剑婷等: "基于潜在语义分析的跨语言查询扩展方法", 《计算机工程》 * |
Cited By (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104462056B (en) * | 2013-09-17 | 2018-02-09 | 国际商业机器公司 | For the method and information handling systems of knouledge-based information to be presented |
| US10698956B2 (en) | 2013-09-17 | 2020-06-30 | International Business Machines Corporation | Active knowledge guidance based on deep document analysis |
| CN104462056A (en) * | 2013-09-17 | 2015-03-25 | 国际商业机器公司 | Active knowledge guidance based on deep document analysis |
| CN106815200A (en) * | 2015-11-30 | 2017-06-09 | 任子行网络技术股份有限公司 | Objectionable text detection method and device based on keyword |
| CN106548083B (en) * | 2016-11-25 | 2019-10-15 | 维沃移动通信有限公司 | A note encryption method and terminal |
| CN106548083A (en) * | 2016-11-25 | 2017-03-29 | 维沃移动通信有限公司 | A kind of note encryption method and terminal |
| CN106845265A (en) * | 2016-12-01 | 2017-06-13 | 北京计算机技术及应用研究所 | A kind of document security level automatic identifying method |
| CN106845265B (en) * | 2016-12-01 | 2020-06-12 | 北京计算机技术及应用研究所 | Document security level automatic identification method |
| CN109766715A (en) * | 2018-12-24 | 2019-05-17 | 贵州航天计量测试技术研究所 | One kind is towards the leakage-preventing automatic identifying method of big data environment privacy information and system |
| CN109766715B (en) * | 2018-12-24 | 2023-07-25 | 贵州航天计量测试技术研究所 | Big data environment-oriented privacy information anti-leakage automatic identification method and system |
| CN111563276A (en) * | 2019-01-25 | 2020-08-21 | 深信服科技股份有限公司 | Webpage tampering detection method, detection system and related equipment |
| CN111563276B (en) * | 2019-01-25 | 2024-04-09 | 深信服科技股份有限公司 | Webpage tampering detection method, detection system and related equipment |
| CN110414241A (en) * | 2019-08-05 | 2019-11-05 | 深圳市网安计算机安全检测技术有限公司 | Privacy policy detection method, device, computer equipment and storage medium |
| CN110414241B (en) * | 2019-08-05 | 2021-08-27 | 深圳市网安计算机安全检测技术有限公司 | Privacy policy detection method and device, computer equipment and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2011204224A (en) | 2011-10-13 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20110928 |