
CN109376270A - Data retrieval method and device - Google Patents

Data retrieval method and device

Info

Publication number
CN109376270A
Authority
CN
China
Prior art keywords
audio
label
video file
word segmentation
video
Prior art date
Legal status
Pending
Application number
CN201811126932.5A
Other languages
Chinese (zh)
Inventor
赵明
徐钊
于松
袁丽
王永选
杨梅
Current Assignee
Qingdao Poly Cloud Technology Co Ltd
Original Assignee
Qingdao Poly Cloud Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Qingdao Poly Cloud Technology Co Ltd
Priority to CN201811126932.5A
Publication of CN109376270A
Legal status: Pending (current)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present invention provide a data retrieval method and device in the technical field of home appliances. They address a problem in the prior art: when a user searches for audio/video files by label, files unrelated to the label are offered for selection, so the user cannot tell whether a selected file is the one they need. The method includes: obtaining a search instruction; determining, according to the label, the audio/video files that contain the label; generating a ranking order of the audio/video files according to their audio/video information; and displaying the audio/video files in that ranking order. The embodiments of the present invention provide a retrieval method for audio/video files.

Description

Data retrieval method and device
Technical field
The present invention relates to the technical field of home appliances, and in particular to a data retrieval method and device.
Background
In the prior art, when a user searches for audio/video files by label, the labels are mostly assigned manually and suffer from problems such as missing, inaccurate, and inconsistently typed labels. As a result, many audio/video files unrelated to the label appear in the search results and are offered to the user, who cannot determine whether a selected file is the one needed, which degrades the user experience.
As can be seen from the above, when a user searches for audio/video files by label in the prior art, files unrelated to the label are offered for selection, so the user cannot determine whether a selected file is the one needed.
Summary of the invention
The embodiments of the present invention provide a data retrieval method and device, which solve the prior-art problem that, when a user searches for audio/video files by label, files unrelated to the label are offered for selection and the user cannot determine whether a selected file is the one needed.
To achieve the above objectives, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, an embodiment of the present invention provides a retrieval method, including: obtaining a search instruction, where the search instruction includes at least a label; determining, according to the label, the audio/video files that contain the label; generating a ranking order of the audio/video files according to their audio/video information, where the audio/video information includes at least one of a label weight and a comprehensive score, the label weight indicates how important the label is to the audio/video file, and the comprehensive score indicates the rating result of the audio/video file; and displaying the audio/video files in the ranking order.
It can be seen from the above solution that, with the retrieval method provided by the embodiments of the present invention, when the user searches by a label, all audio/video files containing the label are found first; a ranking order of these files is then generated according to at least one of their label weight and comprehensive score, and the files are displayed in that order. The user can therefore identify the wanted file from its label weight and/or comprehensive score, which makes the search results more accurate. This solves the prior-art problem that, when a user searches for audio/video files by label, files unrelated to the label are offered for selection, resulting in a poor user experience.
Optionally, before the search instruction is obtained, the method further includes: obtaining the audio/video information of the audio/video files, where the audio/video information further includes at least one of a title, a summary, and labels; determining, according to the LDA topic model and the audio/video information, the label topic-word ranking subset and the topic probability distribution vector of the label; and determining the label weight of the label according to the label topic-word ranking subset and the topic probability distribution vector.
Optionally, before the label topic-word ranking subset and the topic probability distribution vector of the label are determined according to the LDA topic model and the audio/video information, the method further includes: obtaining a training corpus for the LDA topic model, where the training corpus includes at least one token, and a token is at least one of a label, a title token obtained by segmenting the title, and a summary token obtained by segmenting the summary; determining, according to the training corpus, the term frequency and inverse document frequency of the at least one token; and determining the feature vector of the at least one token according to its term frequency and inverse document frequency. After the label weight of the label is determined according to the label topic-word ranking subset and the topic probability distribution vector, the method further includes: determining the comprehensive score of the audio/video file according to the feature vector of the at least one token and the label weight.
Optionally, the audio/video information further includes at least one of play popularity, number of clicks, release time, play duration, and payment rate. Determining the comprehensive score of the audio/video file according to the feature vector of the at least one token and the label weight includes: determining other scores according to the play popularity, number of clicks, release time, play duration, and payment rate; and determining the comprehensive score of the audio/video file according to the feature vector of the at least one token, the label weight, and the other scores.
In a second aspect, an embodiment of the present invention provides a retrieval device, including: an acquisition module, configured to obtain a search instruction, where the search instruction includes at least a label; a processing module, configured to determine, according to the label obtained by the acquisition module, the audio/video files that contain the label, and further configured to generate a ranking order of the audio/video files according to their audio/video information, where the audio/video information includes at least one of a label weight and a comprehensive score, the label weight indicates how important the label is to the audio/video file, and the comprehensive score indicates the rating result of the audio/video file; and a display module, configured to display the audio/video files in the ranking order generated by the processing module.
Optionally, the acquisition module is further configured to obtain the audio/video information of the audio/video files, where the audio/video information includes at least one of a title, a summary, and labels; the processing module is further configured to determine, according to the LDA topic model and the audio/video information obtained by the acquisition module, the label topic-word ranking subset and the topic probability distribution vector of the label; and the processing module is further configured to determine the label weight of the label according to the label topic-word ranking subset and the topic probability distribution vector.
Optionally, the acquisition module is further configured to obtain a training corpus for the LDA topic model, where the training corpus includes at least one token, and a token is at least one of a label, a title token, and a summary token; the processing module is further configured to determine, according to the training corpus obtained by the acquisition module, the term frequency and inverse document frequency of the at least one token, and to determine the feature vector of the at least one token from its term frequency and inverse document frequency; and the processing module is specifically configured to determine the comprehensive score of the audio/video file according to the feature vector of the at least one token and the label weight.
Optionally, the audio/video information further includes at least one of play popularity, number of clicks, release time, play duration, and payment rate; the processing module is specifically configured to determine other scores according to the play popularity, number of clicks, release time, play duration, and payment rate obtained by the acquisition module, and to determine the comprehensive score of the audio/video file according to the feature vector of the at least one token, the label weight, and the other scores.
In a third aspect, an embodiment of the present invention provides a computer storage medium including instructions that, when run on a computer, cause the computer to perform any retrieval method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a retrieval device, including a communication interface, a processor, a memory, and a bus. The memory stores computer-executable instructions and is connected to the processor through the bus; when the retrieval device runs, the processor executes the computer-executable instructions stored in the memory so that the retrieval device performs any retrieval method provided in the first aspect.
In a fifth aspect, an embodiment of the present invention provides a retrieval system, including a household appliance and any retrieval device provided in the second aspect.
It can be understood that any retrieval device provided above is used to perform the corresponding method of the first aspect. For the beneficial effects it can achieve, refer to the beneficial effects of the method of the first aspect and of the corresponding solutions in the detailed description below, which are not repeated here.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Clearly, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a first flowchart of a retrieval method according to an embodiment of the present invention;
Fig. 2 is a second flowchart of a retrieval method according to an embodiment of the present invention;
Fig. 3 is a third flowchart of a retrieval method according to an embodiment of the present invention;
Fig. 4 is a fourth flowchart of a retrieval method according to an embodiment of the present invention;
Fig. 5 is a fifth flowchart of a retrieval method according to an embodiment of the present invention;
Fig. 6 shows the ElasticSearch label-query DSL statement of a retrieval method according to an embodiment of the present invention;
Fig. 7 is a first schematic structural diagram of a retrieval device according to an embodiment of the present invention;
Fig. 8 is a second schematic structural diagram of a retrieval device according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a retrieval system according to an embodiment of the present invention.
Reference numerals:
Retrieval device: 10;
Acquisition module: 101; processing module: 102; display module: 103.
Detailed description of embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Clearly, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
To describe the technical solutions of the embodiments clearly, words such as "first" and "second" are used to distinguish identical or similar items whose functions and effects are essentially the same. A person skilled in the art will understand that such words do not limit quantity or execution order.
In the embodiments of the present invention, words such as "exemplary" or "for example" are used to give an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" should not be construed as preferred or more advantageous than other embodiments or designs; rather, such words are intended to present a related concept in a concrete way.
In the description of the embodiments of the present invention, unless otherwise stated, "a plurality of" means two or more. For example, a plurality of networks means two or more networks.
The term "and/or" herein describes only an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. The character "/" herein indicates an "or" relationship between the associated objects; for example, A/B means A or B.
In the prior art, labels describing the content attributes of audio/video files cover ever more dimensions and grow ever more numerous. If they are not managed effectively, searching by label returns audio/video files unrelated to the label, and the search results become disordered. To solve this problem, the embodiments of the present invention provide a retrieval method, implemented as follows:
Embodiment one
An embodiment of the present invention provides a retrieval method, which, as shown in Fig. 1, includes:
S101: Obtain a search instruction, where the search instruction includes at least a label.
Optionally, as shown in Fig. 2 and Fig. 5, before the search instruction is obtained, the method further includes:
S105: Obtain the audio/video information of the audio/video files, where the audio/video information further includes at least one of a title, a summary, and labels.
It should be noted that in practice, the labels that the existing media asset library assigns to audio/video files are mostly manual annotations and suffer from problems such as missing, inaccurate, and inconsistently typed labels. In addition, the number of labels varies widely between files, which makes it difficult to fully reflect each file's actual attributes and content.
Therefore, when the audio/video information of the files is obtained, attribute-type labels and inaccurately descriptive labels must be removed, for example labels unrelated to content such as awards won, rankings (lists), the series a file belongs to, and target age bracket. Rejecting these erroneous labels ensures that the finally computed label weights are accurate.
S106: Determine the label topic-word ranking subset and the topic probability distribution vector of the label according to the Latent Dirichlet Allocation (LDA) topic model, a generative document-topic model, and the audio/video information.
It should be noted that in practice, an unsupervised LDA topic model must first be trained and used for clustering on a Spark cluster (a distributed computing engine) to obtain keyword probability models for multiple topics. The specific steps are as follows:
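As a rough illustration of this cleaning step, the sketch below drops attribute-type labels before any weights are computed. The blocklist contents and the function name are hypothetical; the text above names only the kinds of labels to reject (awards, rankings, series membership, age brackets), not a concrete list or matching rule.

```python
# Hypothetical blocklist of attribute-type labels (awards, rankings,
# series membership, age brackets); a real system would maintain its own.
NON_CONTENT_LABELS = {"award-winning", "top-100 list", "series", "ages 6-12"}


def clean_labels(labels, blocklist=NON_CONTENT_LABELS):
    """Return labels with blank, attribute-type and duplicate entries removed.

    The original order of the remaining labels is preserved.
    """
    seen = set()
    cleaned = []
    for label in labels:
        label = label.strip()
        if not label or label in blocklist or label in seen:
            continue
        seen.add(label)
        cleaned.append(label)
    return cleaned


print(clean_labels(["idol", "award-winning", "romance", "idol", " "]))  # ['idol', 'romance']
```

Exact-match filtering is the simplest choice; a label-type taxonomy would scale better than a flat list, but that is a design assumption, not part of the described method.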
One, composition of the LDA topic model training set:
The feature vector formed from the labels, title tokens, and summary tokens of an audio/video file can represent that file. Each film constitutes one training sample, and the training samples of all audio/video files in the media library form the training set of the LDA topic model. For computational efficiency and better topics, the audio/video files in the media library can also first be classified by type, such as film, TV series, variety show, and music, so that the files within each category are comparable.
Two, LDA topic model parameter setting: based on practical experience, the number of iterations of the LDA topic model is set to 200-300 and the number of topics to 30-100; the model is built with the Expectation Maximization (EM) algorithm, and perplexity is used as the model evaluation criterion.
Three, training the LDA topic model: iterate over the training set to obtain the LDA topic model with the best (lowest) perplexity, use this model to cluster all current audio/video files, and obtain the topic-word ranking subset and the topic probability distribution vector for each topic.
Four, save the trained LDA topic model to the Hadoop Distributed File System (HDFS), so that newly added audio/video files can be assigned topics by the existing LDA topic model.
Five, update the LDA topic model periodically by rebuilding the training set and retraining, so that the model stays current.
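Training itself runs on Spark as described above (for example, Spark MLlib's LDA supports an EM optimizer). The sketch below shows only the evaluation criterion from steps Two and Three: perplexity, using its standard definition as the exponential of the negative per-token log-likelihood, and selection of the candidate topic count with the lowest value. How the document-topic matrix theta and topic-word matrix phi are obtained is outside the sketch, and the function names are hypothetical.

```python
import math


def perplexity(docs, theta, phi):
    """Perplexity of tokenized documents under a topic model; lower is better.

    docs  : list of documents, each a list of integer word ids
    theta : theta[d][t] = P(topic t | document d)
    phi   : phi[t][w]   = P(word w | topic t)
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # P(w | d) marginalizes over topics.
            p_w = sum(theta[d][t] * phi[t][w] for t in range(len(phi)))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)


def pick_best_model(candidates, docs):
    """Return (num_topics, perplexity) for the candidate with the lowest perplexity.

    candidates maps a topic count (e.g. a value in the 30-100 range) to the
    (theta, phi) pair of a model trained with that count.
    """
    scored = {k: perplexity(docs, th, ph) for k, (th, ph) in candidates.items()}
    best_k = min(scored, key=scored.get)
    return best_k, scored[best_k]


docs = [[0, 1, 0, 1]]
uniform = ([[1.0]], [[0.25, 0.25, 0.25, 0.25]])       # 1 topic, flat over 4 words
focused = ([[0.5, 0.5]], [[0.5, 0.5, 0.0, 0.0]] * 2)  # 2 topics, mass on words 0 and 1
print(pick_best_model({1: uniform, 2: focused}, docs))  # the 2-topic model wins
```

A flat model over a 4-word vocabulary has perplexity 4, and the focused model halves that, which is why it is selected. In practice perplexity would be measured on held-out files rather than the training files.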
S107: Determine the label weight of the label according to the label topic-word ranking subset and the topic probability distribution vector.
It should be noted that in practice an LDA topic model has two levels: topics and bags of words. The order of the tokens in a bag of words reflects how the content of one topic differs from that of the others (each topic corresponds to one bag of words, each bag of words contains at least one token, and the bags of different topics need not have the same content). The classification of an audio/video file is determined jointly by the token rankings of all topics and by the file's probability distribution over the topics, and the labels Tag of an audio/video file are a subset of the topic bags of words. The weight relationship of the labels can therefore be measured at both the bag-of-words level and the topic level, calculated as follows:
One, each topic Topic describes its content through a ranking of tokens by weight, W_rank. Extracting the positions of an audio/video file's labels in a Topic's token ranking maps the importance of each current label in describing the file onto that topic; using rank positions also avoids the problem that token weights can differ greatly between topics. Suppose audio/video file k has m labels and there are s topics (m, n, and s are integers greater than 0). Then the label topic-word ranking subset under a topic T_i is:
$$Tag_{rank}(T_i) = Tag_k \cap W_{rank}(T_i) = \{Tag_1(W_1), Tag_2(W_2), \ldots, Tag_m(W_n)\}, \quad i \in \{1, \ldots, s\}$$
where Tag_m(W_n) indicates that the sorting position of label m in the bag of words of topic T_i is W_n.
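For all topics, the label ranking matrix of audio/video file k is then:
$$film_k = \begin{pmatrix} Tag_1W(T_1) & Tag_1W(T_2) & \cdots & Tag_1W(T_s) \\ \vdots & \vdots & \ddots & \vdots \\ Tag_mW(T_1) & Tag_mW(T_2) & \cdots & Tag_mW(T_s) \end{pmatrix}$$
where film_k denotes the label ranking matrix of audio/video file k, and the entry Tag_jW(T_i) in row j and column i is the sorting position W_n of label j in the bag of words of topic T_i.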
Two, each audio/video file has a topic probability distribution vector V_k over all s topics. V_k expresses the degree to which the file belongs to each topic, and from it the importance of the file's labels in the different topics can be inferred:
$$V_k = \left(P(T_1 \mid k), P(T_2 \mid k), \ldots, P(T_s \mid k)\right), \quad \sum_{i=1}^{s} P(T_i \mid k) = 1$$
Three, the product of the topic probability distribution vector V_k of audio/video file k and its label ranking matrix over the topics describes the label weight Weight_tag(k) in both the label-set space and the topic-model space of the file:
$$Weight_{tag}(k) = film_k \, V_k^{T}$$
which gives one weight for each of the m labels of file k.
Within the label set of audio/video file k, a larger label weight means the label is closer to the file's content. When different audio/video files are compared, the label set of each file must first be normalized; then, comparing the weights of the same label across files, a larger weight means the file's content is closer to that label. This score therefore effectively distinguishes the labels within each file and captures the weight relationship among files that share the same label.
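The matrix-vector product in step Three can be sketched directly. Two points are assumptions: the text describes each matrix entry as a sorting position W_n but does not say how, if at all, positions are converted before multiplication, so the sketch takes the entries as given; and "normalized" is interpreted here as scaling a file's weights to sum to 1. The function names are hypothetical.

```python
def label_weights(rank_matrix, topic_dist):
    """Weight_tag(k) = film_k · V_k^T for a single audio/video file k.

    rank_matrix : m x s nested list (film_k); rank_matrix[j][i] is the value
                  recorded for label j in the token ranking of topic T_i
    topic_dist  : length-s list (V_k); the file's probability for each topic
    Returns a list with one weight per label.
    """
    return [sum(r * p for r, p in zip(row, topic_dist)) for row in rank_matrix]


def normalize(weights):
    """Scale one file's label weights to sum to 1 (one possible normalization)."""
    total = sum(weights)
    return [w / total for w in weights] if total else list(weights)


film_k = [[3, 1],   # label 0 in topics T_1, T_2
          [1, 2]]   # label 1 in topics T_1, T_2
v_k = [0.75, 0.25]
w = label_weights(film_k, v_k)   # [2.5, 1.25]
print(normalize(w))              # label 0 gets twice the weight of label 1
```

Because each weight is a probability-weighted combination across topics, a label that ranks well only in topics the file barely belongs to contributes little.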
Optionally, as shown in Fig. 3 and Fig. 5, before the label topic-word ranking subset and the topic probability distribution vector of the label are determined according to the LDA topic model and the audio/video information, the method further includes:
S108: Obtain a training corpus for the LDA topic model, where the training corpus includes at least one token, and a token is at least one of a label, a title token obtained by segmenting the title, and a summary token obtained by segmenting the summary.
S109: Determine the term frequency and inverse document frequency of the at least one token according to the training corpus.
S110: Determine the feature vector of the at least one token according to its term frequency and inverse document frequency.
After the label weight of the label is determined according to the label topic-word ranking subset and the topic probability distribution vector, the method further includes:
S111: Determine the comprehensive score of the audio/video file according to the feature vector of the at least one token and the label weight.
It should be noted that in practice, to obtain the training corpus of the LDA topic model, Chinese word segmentation is performed separately on the title Title and the summary Summary of each audio/video file (Chinese word segmentation splits a sequence of Chinese characters into individual words), stop words are removed (in information retrieval, to save storage space and improve search efficiency, certain characters or words are automatically filtered out before or after natural-language text is processed; these are called stop words), and the result is merged with the tag set Tag (all labels of the audio/video file) and deduplicated to form the training corpus S of the LDA topic model.
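A minimal sketch of this corpus construction follows. It assumes the title and summary can be tokenized by whitespace, so `segment` is only a stand-in for a real Chinese word segmenter, and the stop-word list is hypothetical. The text does not say whether deduplication applies within each file or across the whole corpus; the sketch deduplicates within each file.

```python
# Hypothetical stop-word list; production systems load a curated list.
STOP_WORDS = {"the", "a", "of", "and"}


def segment(text):
    """Stand-in for a Chinese word segmenter: splits on whitespace only."""
    return text.split()


def build_document(title, summary, tags, stop_words=STOP_WORDS):
    """Training tokens for one audio/video file.

    Title and summary are segmented, stop words are dropped, the file's
    labels are appended, and duplicates are removed (first occurrence kept).
    """
    tokens = [t for t in segment(title) + segment(summary) if t not in stop_words]
    return list(dict.fromkeys(tokens + list(tags)))


def build_corpus(files):
    """files: iterable of dicts with 'title', 'summary' and 'tags' keys."""
    return [build_document(f["title"], f["summary"], f["tags"]) for f in files]


print(build_document("the idol story", "a story of the idol", ["idol", "romance"]))
# ['idol', 'story', 'romance']
```

`dict.fromkeys` keeps insertion order, so the first occurrence of each token survives deduplication.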
Let D be the total number of audio/video files in the media library and DF the number of files in the library that contain a given token. The term frequency tf of the token in an audio/video file is the number of times the token occurs in the file divided by the total number of tokens in the file, and the inverse document frequency idf of the token is the logarithm of the ratio of D to DF. The term frequency-inverse document frequency (TF-IDF) value of a label Tag is computed from these as follows:
$$Tag_{tf \cdot idf} = tf \times idf = tf \times \log\frac{D}{DF}$$
A sparse vector is built from the TF-IDF values of all tokens in the audio/video file, and the file is represented by this sparse vector:
Film = Vectors.sparse(length, positions(1, …, n), values(s_1, …, s_n)), where length is the vocabulary size, positions(1, …, n) are the vocabulary indices of the file's n tokens, and s_1, …, s_n are their TF-IDF values.
The Tag_tf·idf values of the audio/video file are saved for computing the file's comprehensive score.
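The TF-IDF definition and the sparse representation above can be sketched in plain Python. The sketch uses the unsmoothed idf = log(D / DF) given above (other variants add smoothing), and returns each vector as a (size, indices, values) triple, the same three arguments that Spark's `Vectors.sparse` accepts; the function name is hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Sparse TF-IDF vectors for a list of tokenized audio/video files.

    tf  = occurrences of the token in the file / number of tokens in the file
    idf = log(D / DF): D files in the library, DF of them contain the token
    Each vector is a (size, indices, values) triple, the same arguments that
    Spark's Vectors.sparse(size, indices, values) accepts.
    """
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        entries = sorted(
            (index[t], c / len(doc) * math.log(n_docs / df[t]))
            for t, c in counts.items()
        )
        vectors.append((len(vocab), [i for i, _ in entries], [v for _, v in entries]))
    return vocab, vectors


vocab, vecs = tf_idf_vectors([["idol", "romance"], ["idol", "father"]])
print(vocab)    # ['father', 'idol', 'romance']
print(vecs[0])  # "idol" occurs in every file, so its idf (and weight) is 0
```

Indices are sorted because Spark expects strictly increasing indices in a sparse vector.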
Optionally, the audio/video information further includes at least one of play popularity, number of clicks, release time, play duration, and payment rate. As shown in Fig. 4 and Fig. 5, determining the comprehensive score of the audio/video file according to the feature vector of the at least one token and the label weight includes:
S1110: Determine other scores according to the play popularity, number of clicks, release time, play duration, and payment rate.
S1111: Determine the comprehensive score of the audio/video file according to the feature vector of the at least one token, the label weight, and the other scores.
It should be noted that in practice, when label weights are applied to label-based retrieval, attributes of the audio/video file such as play popularity, number of clicks, release time, play duration, and payment rate must also be taken into account to compute the file's comprehensive score. The comprehensive score is then used to rank the results recalled by label, and the ranked label query results are returned directly to the user. The specific steps are as follows:
One, the comprehensive score of the audio/video file is calculated by combining Tag_tf·idf with the other score attributes feature_i, i ∈ {1, …, f}. In this calculation, Score_k denotes the comprehensive score of audio/video file k, normalize denotes normalization, and the terms derived from feature_i represent the other scores.
Two, at retrieval time, to let the user compare how closely each audio/video file in the search results matches the label, the files containing the searched label must be ranked by their comprehensive scores to obtain a ranking order; the earlier a file appears in the order, the closer its content is to the label. Because different audio/video files contain different numbers of labels, their comprehensive scores are on different scales, so the scores must be normalized before those of different files are compared. As an example, suppose the user searches for the label "idol", and the audio/video files containing this label are Petard and Shanshan Comes:
Suppose that, after normalization, the label weights of Petard are ["idol" 186.02, "romance" 46.28, "father" 82.3, "inspirational" 41.19] and those of Shanshan Comes are ["idol" 253.42, "romance" 58.28, "hot-blooded" 43.1, "inspirational" 57.86]. Since Shanshan Comes has the higher weight for "idol" (253.42 versus 186.02), its content is closer to the label idol.
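The exact form of Score_k is not specified here, so the sketch below assumes one simple combination for illustration: min-max normalize the tag-based score and each attribute across the result set, then add the normalized tag score to the mean of the normalized attribute scores. The equal weighting and all function names are hypothetical. The usage line reuses the "idol" weights from the example above.

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def comprehensive_scores(tag_scores, features):
    """Hypothetical Score_k for every file k in a result set.

    tag_scores : one label-based score per file (e.g. label weight x TF-IDF)
    features   : one list per attribute (play popularity, clicks, ...), each
                 holding one raw value per file
    Score_k = normalized tag score + mean of the normalized attribute scores.
    """
    tag_part = min_max(tag_scores)
    cols = [min_max(col) for col in features]
    other = [sum(col[k] for col in cols) / len(cols) if cols else 0.0
             for k in range(len(tag_scores))]
    return [t + o for t, o in zip(tag_part, other)]


def rank(titles, scores):
    """Titles ordered from highest to lowest score."""
    return [t for t, _ in sorted(zip(titles, scores), key=lambda p: p[1], reverse=True)]


# Normalized "idol" weights from the example above.
print(rank(["Petard", "Shanshan Comes"], [186.02, 253.42]))  # ['Shanshan Comes', 'Petard']
```

Min-max scaling puts attributes with very different units (click counts, payment rates) on a common 0-1 range before they are combined; a weighted sum would be a natural refinement if some attributes matter more.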
Illustratively, (Tag, Score) pairs are stored as a Nested (nested structure) field in an index of the search engine ElasticSearch, and label-based retrieval of audio/video files works as follows:
One, set the ElasticSearch configuration parameters in the Mapping when building the index, and set the query DSL statement as shown in Fig. 6.
Two, bring the index with label weights and/or comprehensive scores online to enable label-based retrieval of audio/video files.
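Since the DSL of Fig. 6 is not reproduced in this text, the sketch below builds a comparable mapping and search body as Python dictionaries, assuming the typeless mapping format of ElasticSearch 7.x. The field names `tags`, `tag` and `score`, the `size` default, and the choice of a nested sort (rather than, for example, a `function_score` query) are assumptions, not the patent's DSL.

```python
import json

# Hypothetical field names: each file document holds a nested list of
# {"tag": ..., "score": ...} pairs (the (Tag, Score) pairs from the text).
MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "tags": {
                "type": "nested",
                "properties": {
                    "tag": {"type": "keyword"},
                    "score": {"type": "float"},
                },
            },
        }
    }
}


def tag_query(tag, size=20):
    """Search body that keeps files having `tag` and sorts them by that tag's score."""
    tag_filter = {"term": {"tags.tag": tag}}
    return {
        "size": size,
        "query": {"nested": {"path": "tags", "query": tag_filter}},
        "sort": [{
            "tags.score": {
                "order": "desc",
                # Only the nested object matching the searched tag is used for sorting.
                "nested": {"path": "tags", "filter": tag_filter},
            }
        }],
    }


print(json.dumps(tag_query("idol"), indent=2))
```

An indexed document would then look like `{"title": "Shanshan Comes", "tags": [{"tag": "idol", "score": 253.42}]}`. The nested type matters here: with a plain object array, ElasticSearch flattens the pairs and can no longer tell which score belongs to which tag.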
S102: Determine, according to the label, the audio/video files that contain the label.
S103: Generate a ranking order of the audio/video files according to their audio/video information, where the audio/video information includes at least one of a label weight and a comprehensive score; the label weight indicates how important the label is to the audio/video file, and the comprehensive score indicates the rating result of the audio/video file.
It should be noted that in practice the ranking order of the audio/video files can be based on the label weight, on the comprehensive score, or on both; the earlier a file appears in the ranking order, the closer its content is to the label.
S104: Display the audio/video files in the ranking order.
In summary, with the retrieval method provided by this embodiment, when the user searches by a label, all audio/video files containing the label are found first, ranked by at least one of their label weight and comprehensive score, and displayed in that order. The user can therefore identify the wanted file from its label weight and/or comprehensive score, which makes the search results more accurate and avoids offering the user files unrelated to the label.
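Steps S102 to S104 can be sketched end to end. The `MediaFile` type and `search` function are hypothetical, and so is the "both" mode: the text allows ranking by label weight and comprehensive score together without saying how they are combined, so here the comprehensive score only breaks ties. The label weights come from the example above; the comprehensive scores are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MediaFile:
    title: str
    label_weights: dict = field(default_factory=dict)  # label -> label weight
    score: float = 0.0                                  # comprehensive score


def search(label, library, order_by="weight"):
    """Sketch of S102-S104 for a search instruction carrying `label`.

    order_by: "weight" (label weight), "score" (comprehensive score) or
    "both" (label weight first, comprehensive score breaks ties).
    """
    hits = [f for f in library if label in f.label_weights]  # S102: files containing the label
    sort_keys = {
        "weight": lambda f: f.label_weights[label],
        "score": lambda f: f.score,
        "both": lambda f: (f.label_weights[label], f.score),
    }
    hits.sort(key=sort_keys[order_by], reverse=True)  # S103: ranking order
    return [f.title for f in hits]  # S104: titles in display order


# Comprehensive scores below are made up for illustration.
library = [
    MediaFile("Petard", {"idol": 186.02, "father": 82.3}, score=0.9),
    MediaFile("Shanshan Comes", {"idol": 253.42, "romance": 58.28}, score=0.4),
    MediaFile("Unrelated", {"music": 10.0}, score=1.0),
]
print(search("idol", library))           # ['Shanshan Comes', 'Petard']
print(search("idol", library, "score"))  # ['Petard', 'Shanshan Comes']
```

Filtering before sorting is what keeps files without the label (here "Unrelated") out of the results, regardless of how high their comprehensive score is.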
Embodiment two
An embodiment of the present invention provides a retrieval device 10, which, as shown in Fig. 7, includes:
An acquisition module 101, configured to obtain a search instruction, where the search instruction includes at least a label.
A processing module 102, configured to determine, according to the label obtained by the acquisition module 101, the audio/video files that contain the label.
The processing module 102 is further configured to generate a ranking order of the audio/video files according to their audio/video information, where the audio/video information includes at least one of a label weight and a comprehensive score; the label weight indicates how important the label is to the audio/video file, and the comprehensive score indicates the rating result of the audio/video file.
A display module 103, configured to display the audio/video files in the ranking order generated by the processing module 102.
Optionally, module 101 is obtained, is also used to obtain the audio and video information of audio/video file;Wherein, audio and video information includes mark Any one of topic, brief introduction and label;Processing module 102 is also used to according to LDA topic model and obtains what module 101 obtained Audio and video information determines the label descriptor sort subset and theme ProbabilityDistribution Vector of label;Processing module 102, is also used to root According to label descriptor sort subset and theme ProbabilityDistribution Vector, the label weight of label is determined.
Optionally, the acquisition module 101 is further configured to acquire the training corpus of the LDA topic model, where the training corpus includes at least one token, and a token is any one of a label, a title token obtained by segmenting the title, and a synopsis token obtained by segmenting the synopsis. The processing module 102 is further configured to determine the term frequency and inverse document frequency of the at least one token from the training corpus acquired by the acquisition module 101, and to determine the feature vector of the at least one token from its term frequency and inverse document frequency. The processing module 102 is specifically configured to determine the comprehensive score of the audio/video file from the feature vector of the at least one token and the label weight.
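The term-frequency and inverse-document-frequency step can be illustrated with a plain TF-IDF computation. This sketch assumes that each audio/video file contributes one token list (its labels plus the segmented title and synopsis, with word segmentation already done) and uses the unsmoothed form idf = log(N / df); the source does not say which TF-IDF variant it uses, so treat the formula as an assumption.

```python
import math
from collections import Counter


def tfidf_vectors(corpus: list[list[str]]) -> list[dict[str, float]]:
    """corpus holds one token list per audio/video file (labels plus segmented
    title and synopsis). Returns one {token: tf * idf} mapping per file."""
    n_docs = len(corpus)
    # Document frequency: in how many files each token occurs at least once.
    df = Counter(token for doc in corpus for token in set(doc))
    vectors = []
    for doc in corpus:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            # tf = count / file length; idf = log(N / df), unsmoothed (an assumption).
            token: (count / total) * math.log(n_docs / df[token])
            for token, count in counts.items()
        })
    return vectors


corpus = [
    ["space", "rocket", "space"],  # file 0
    ["food", "recipe"],            # file 1
    ["space", "food"],             # file 2
]
for vec in tfidf_vectors(corpus):
    print(vec)
```

A token that occurs in every file gets idf = log(1) = 0 and therefore contributes nothing to the feature vector; smoothed variants, such as the default in scikit-learn's `TfidfVectorizer`, avoid this.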
Optionally, the audio/video information further includes at least one of playback popularity, number of clicks, release time, playback time, and payment rate. The processing module 102 is specifically configured to determine other scores from the playback popularity, number of clicks, release time, playback time, and payment rate acquired by the acquisition module 101, and to determine the comprehensive score of the audio/video file from the feature vector of the at least one token, the label weight, and the other scores.
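The source lists the inputs to the other scores and the comprehensive score but not how they are combined. The sketch below assumes a simple weighted sum: click count and playback time are scaled by catalog maxima, recency decays with a 30-day half-life, and the comprehensive score blends a relevance term (the label token's TF-IDF value multiplied by its label weight) with the other scores. Every weight, the half-life, and the `Stats` field names and units are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    # Hypothetical per-file statistics; field names and units are assumptions.
    popularity: float    # playback popularity, already normalized to [0, 1]
    clicks: int          # number of clicks
    age_days: float      # days since release
    play_hours: float    # accumulated playback time, in hours
    payment_rate: float  # share of viewers who paid, in [0, 1]


# Hypothetical weights for the other scores (they sum to 1).
WEIGHTS = {"popularity": 0.3, "clicks": 0.2, "recency": 0.2, "play": 0.1, "payment": 0.2}


def other_score(s: Stats, max_clicks: int, max_play_hours: float,
                half_life_days: float = 30.0) -> float:
    """Weighted sum of normalized statistics."""
    clicks = s.clicks / max_clicks if max_clicks else 0.0
    play = s.play_hours / max_play_hours if max_play_hours else 0.0
    recency = 0.5 ** (s.age_days / half_life_days)  # halves every half_life_days
    return (WEIGHTS["popularity"] * s.popularity
            + WEIGHTS["clicks"] * clicks
            + WEIGHTS["recency"] * recency
            + WEIGHTS["play"] * play
            + WEIGHTS["payment"] * s.payment_rate)


def comprehensive_score(label_tfidf: float, label_weight: float, other: float,
                        alpha: float = 0.6) -> float:
    """Blend relevance (label token's TF-IDF value * label weight) with the other scores."""
    return alpha * label_tfidf * label_weight + (1 - alpha) * other


stats = Stats(popularity=0.7, clicks=1200, age_days=15, play_hours=340.0, payment_rate=0.1)
other = other_score(stats, max_clicks=5000, max_play_hours=1000.0)
print(round(comprehensive_score(label_tfidf=0.27, label_weight=0.4, other=other), 4))
```

Keeping every term in [0, 1] lets a single `alpha` trade relevance against popularity without one raw count, such as clicks, dominating the score.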
All relevant content of each step in the above method embodiment can be found in the functional descriptions of the corresponding functional modules, and the effects are not repeated here.
When integrated modules are used, the retrieval device includes an acquisition module, a processing module, a display module, and a storage module. The processing module controls and manages the actions of the retrieval device; for example, it supports the retrieval device in performing processes S101, S102, S103, and S104 in Fig. 1. The acquisition module and the display module support information exchange between the retrieval device and other devices. The storage module stores the program code and data of the retrieval device.
Taking as an example the case in which the processing module is a processor, the storage module is a memory, and the acquisition module and the display module are communication interfaces, the retrieval device shown in Fig. 8 includes a communication interface 501, a processor 502, a memory 503, and a bus 504; the communication interface 501 and the processor 502 are connected to the memory 503 via the bus 504.
The processor 502 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the programs of the solution of this application.
The memory 503 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions; a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions; an electrically erasable programmable read-only memory (EEPROM); a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.); a magnetic disk storage medium or other magnetic storage device; or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, without being limited thereto. The memory may exist independently and be connected to the processor via the bus, or it may be integrated with the processor.
The memory 503 stores the application code for executing the solution of this application, and its execution is controlled by the processor 502. The communication interface 501 exchanges information with other devices, for example with a remote control. The processor 502 executes the application code stored in the memory 503 to implement the method described in the embodiments of this application.
In addition, a computer storage medium (or media) is provided, including instructions that, when executed, perform the method operations performed by the retrieval device in the above embodiments. A computer program product including the above computer storage medium (or media) is also provided.
It should be understood that in various embodiments of the present invention, magnitude of the sequence numbers of the above procedures are not meant to execute suitable Sequence it is successive, the execution of each process sequence should be determined by its function and internal logic, the implementation without coping with the embodiment of the present invention Process constitutes any restriction.
Those of ordinary skill in the art may be aware that list described in conjunction with the examples disclosed in the embodiments of the present disclosure Member and algorithm steps can be realized with the combination of electronic hardware or computer software and electronic hardware.These functions are actually It is implemented in hardware or software, the specific application and design constraint depending on technical solution.Professional technician Each specific application can be used different methods to achieve the described function, but this realization is it is not considered that exceed The scope of the present invention.
It is apparent to those skilled in the art that for convenience and simplicity of description, the system of foregoing description, The specific work process of device and unit, can refer to corresponding processes in the foregoing method embodiment, and details are not described herein.
In several embodiments provided herein, it should be understood that disclosed system, apparatus and method, it can be with It realizes by another way.For example, apparatus embodiments described above are merely indicative, for example, the unit It divides, only a kind of logical function partition, there may be another division manner in actual implementation, such as multiple units or components It can be combined or can be integrated into another system, or some features can be ignored or not executed.Another point, it is shown or The mutual coupling, direct-coupling or communication connection discussed can be through some interfaces, the indirect coupling of equipment or unit It closes or communicates to connect, can be electrical property, mechanical or other forms.
The unit as illustrated by the separation member may or may not be physically separated, aobvious as unit The component shown may or may not be physical unit, it can and it is in one place, or may be distributed over multiple In network unit.It can select some or all of unit therein according to the actual needs to realize the mesh of this embodiment scheme 's.
It, can also be in addition, the functional units in various embodiments of the present invention may be integrated into one processing unit It is that each unit physically exists alone, can also be integrated in one unit with two or more units.
It, can be with if the function is realized in the form of SFU software functional unit and when sold or used as an independent product It is stored in a computer readable storage medium.Based on this understanding, technical solution of the present invention is substantially in other words The part of the part that contributes to existing technology or the technical solution can be embodied in the form of software products, the meter Calculation machine software product is stored in a storage medium, including some instructions are used so that a computer equipment (can be a People's computer retrieves device or the network equipment etc.) execute all or part of step of each embodiment the method for the present invention Suddenly.And storage medium above-mentioned includes: USB flash disk, mobile hard disk, read-only memory (full name in English: read-only memory, English It is literary referred to as: ROM), random access memory (full name in English: random access memory, English abbreviation: RAM), magnetic disk or The various media that can store program code such as person's CD.
It will be appreciated that any retrieval device provided above is used to perform the corresponding method of Embodiment One; therefore, for the beneficial effects it can achieve, refer to the beneficial effects of the corresponding solution in the method of Embodiment One and in the detailed description above, which are not repeated here.
Embodiment three
An embodiment of the present invention provides a retrieval system, including a household appliance and any retrieval device provided in Embodiment Two.
It should be noted that in practical applications, household electrical appliance as shown in Figure 9 receive the search instruction of user's input When, the search instruction is sent to retrieval device (can be server), it is above-mentioned according to search instruction execution to retrieve device Search method generates corresponding collating sequence, while sending the control instruction for carrying the collating sequence to household electrical appliance, so as to family Electrical appliance shows audio/video file according to the collating sequence.
The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person familiar with the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (10)

1. A retrieval method, characterized by comprising:
acquiring a search instruction, wherein the search instruction includes at least a label;
determining, according to the label, the audio/video files containing the label;
generating a sort order of the audio/video files according to the audio/video information of the audio/video files, wherein the audio/video information includes at least one of a label weight and a comprehensive score, the label weight indicates the importance of the label to the audio/video file, and the comprehensive score indicates the rating result of the audio/video file; and
displaying the audio/video files in the sort order.
2. The retrieval method according to claim 1, characterized in that, before the search instruction is acquired, the method further comprises:
acquiring the audio/video information of the audio/video files, wherein the audio/video information further includes any one of a title, a synopsis, and labels;
determining, according to an LDA topic model and the audio/video information, the topic-word classification subset and the topic probability distribution vector of the label; and
determining the label weight of the label according to the topic-word classification subset and the topic probability distribution vector.
3. The retrieval method according to claim 2, characterized in that, before the topic-word classification subset and the topic probability distribution vector of the label are determined according to the LDA topic model and the audio/video information, the method further comprises:
acquiring a training corpus of the LDA topic model, wherein the training corpus includes at least one token, and the token is any one of a label, a title token of the title, and a synopsis token of the synopsis;
determining the term frequency and inverse document frequency of the at least one token according to the training corpus; and
determining the feature vector of the at least one token according to the term frequency and inverse document frequency of the at least one token;
and after the label weight of the label is determined according to the topic-word classification subset and the topic probability distribution vector, the method further comprises:
determining the comprehensive score of the audio/video file according to the feature vector of the at least one token and the label weight.
4. The retrieval method according to claim 3, characterized in that the audio/video information further includes any one of playback popularity, number of clicks, release time, playback time, and payment rate;
and determining the comprehensive score of the audio/video file according to the feature vector of the at least one token and the label weight comprises:
determining other scores according to the playback popularity, the number of clicks, the release time, the playback time, and the payment rate; and
determining the comprehensive score of the audio/video file according to the feature vector of the at least one token, the label weight, and the other scores.
5. A retrieval device, characterized by comprising:
an acquisition module, configured to acquire a search instruction, wherein the search instruction includes at least a label;
a processing module, configured to determine, according to the label acquired by the acquisition module, the audio/video files containing the label;
the processing module being further configured to generate a sort order of the audio/video files according to the audio/video information of the audio/video files, wherein the audio/video information includes at least one of a label weight and a comprehensive score, the label weight indicates the importance of the label to the audio/video file, and the comprehensive score indicates the rating result of the audio/video file; and
a display module, configured to display the audio/video files in the sort order generated by the processing module.
6. The retrieval device according to claim 5, characterized in that the acquisition module is further configured to acquire the audio/video information of the audio/video files, wherein the audio/video information includes any one of a title, a synopsis, and labels;
the processing module is further configured to determine, according to an LDA topic model and the audio/video information acquired by the acquisition module, the topic-word classification subset and the topic probability distribution vector of the label; and
the processing module is further configured to determine the label weight of the label according to the topic-word classification subset and the topic probability distribution vector.
7. The retrieval device according to claim 6, characterized in that the acquisition module is further configured to acquire a training corpus of the LDA topic model, wherein the training corpus includes at least one token, and the token is any one of a label, a title token of the title, and a synopsis token of the synopsis;
the processing module is further configured to determine the term frequency and inverse document frequency of the at least one token according to the training corpus acquired by the acquisition module;
the processing module is further configured to determine the feature vector of the at least one token according to the term frequency and inverse document frequency of the at least one token; and
the processing module is specifically configured to determine the comprehensive score of the audio/video file according to the feature vector of the at least one token and the label weight.
8. The retrieval device according to claim 7, characterized in that the audio/video information further includes any one of playback popularity, number of clicks, release time, playback time, and payment rate;
the processing module is specifically configured to determine other scores according to the playback popularity, the number of clicks, the release time, the playback time, and the payment rate acquired by the acquisition module; and
the processing module is specifically configured to determine the comprehensive score of the audio/video file according to the feature vector of the at least one token, the label weight, and the other scores.
9. A computer storage medium, comprising instructions that, when run on a computer, cause the computer to perform the retrieval method according to any one of claims 1 to 4.
10. A retrieval device, comprising a communication interface, a processor, a memory, and a bus, wherein the memory is configured to store computer-executable instructions, the processor is connected to the memory via the bus, and, when the retrieval device runs, the processor executes the computer-executable instructions stored in the memory so that the retrieval device performs the retrieval method according to any one of claims 1 to 4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811126932.5A CN109376270A (en) 2018-09-26 2018-09-26 A kind of data retrieval method and device

Publications (1)

Publication Number Publication Date
CN109376270A true CN109376270A (en) 2019-02-22

Family

ID=65402690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811126932.5A Pending CN109376270A (en) 2018-09-26 2018-09-26 A kind of data retrieval method and device

Country Status (1)

Country Link
CN (1) CN109376270A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982153A (en) * 2012-11-29 2013-03-20 北京亿赞普网络技术有限公司 Information retrieval method and device
CN105389325A (en) * 2014-09-02 2016-03-09 三星电子株式会社 Content search method and electronic device implementing same
CN106055538A (en) * 2016-05-26 2016-10-26 达而观信息科技(上海)有限公司 Automatic extraction method for text labels in combination with theme model and semantic analyses
CN106294314A (en) * 2016-07-19 2017-01-04 北京奇艺世纪科技有限公司 Topic Mining Method and Device
CN106446135A (en) * 2016-09-19 2017-02-22 北京搜狐新动力信息技术有限公司 Method and device for generating multi-media data label
US20170109786A1 (en) * 2015-10-20 2017-04-20 Korea Electronics Technology Institute System for producing promotional media content and method thereof

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069573A (en) * 2019-03-19 2019-07-30 深圳壹账通智能科技有限公司 Product data integration method, apparatus, computer equipment and storage medium
CN110222709A (en) * 2019-04-29 2019-09-10 上海暖哇科技有限公司 A kind of multi-tag intelligence marking method and system
CN110222709B (en) * 2019-04-29 2022-01-25 上海暖哇科技有限公司 Multi-label intelligent marking method and system
CN110489525A (en) * 2019-08-09 2019-11-22 腾讯科技(深圳)有限公司 Acquisition methods and device, the storage medium and electronic device of search result
CN111625716A (en) * 2020-05-12 2020-09-04 聚好看科技股份有限公司 Media asset recommendation method, server and display device
CN111625716B (en) * 2020-05-12 2023-10-31 聚好看科技股份有限公司 Media asset recommendation method, server and display device
CN113488144A (en) * 2021-07-14 2021-10-08 深圳市东亿健康服务有限公司 Slice image processing method
CN113488144B (en) * 2021-07-14 2023-11-07 内蒙古匠艺科技有限责任公司 Slice image processing method
CN114328798A (en) * 2021-11-09 2022-04-12 腾讯科技(深圳)有限公司 Processing method, device, equipment, storage medium and program product for searching text
CN114328798B (en) * 2021-11-09 2024-02-23 腾讯科技(深圳)有限公司 Processing method, device, equipment, storage medium and program product for searching text
CN114253976A (en) * 2021-12-21 2022-03-29 北京达佳互联信息技术有限公司 Searching method and device based on bitmap scoring

Similar Documents

Publication Publication Date Title
CN109376270A (en) A kind of data retrieval method and device
US11645317B2 (en) Recommending topic clusters for unstructured text documents
Nigam Using unlabeled data to improve text classification
Huang Similarity measures for text document clustering
US8356044B2 (en) System and method for providing default hierarchical training for social indexing
US20170161375A1 (en) Clustering documents based on textual content
CN108334528B (en) Method and device for recommending information
KR102046692B1 (en) Method and System for Entity summarization based on multilingual projected entity space
US20180285448A1 (en) Producing personalized selection of applications for presentation on web-based interface
Zhai et al. Effective heterogeneous similarity measure with nearest neighbors for cross-media retrieval
Tran et al. Balancing novelty and salience: Adaptive learning to rank entities for timeline summarization of high-impact events
WO2020003109A1 (en) Facet-based query refinement based on multiple query interpretations
CN116882414B (en) Automatic comment generation method and related device based on large-scale language model
KR100452086B1 (en) Search System For Providing Information of Keyword Input Frequency By Category And Method Thereof
CN114330335A (en) Keyword extraction method, device, equipment and storage medium
CN113076481B (en) Document recommendation system and method based on maturity technology
CN114461783A (en) Keyword generating method, apparatus, computer equipment, storage medium and product
CN110472016A (en) Article recommended method, device, electronic equipment and storage medium
US11823785B2 (en) Methods and systems for calculating nutritional requirements in a display interface
El-Assady et al. LTMA: Layered topic matching for the comparative exploration, evaluation, and refinement of topic modeling results
Latha Experiment and Evaluation in Information Retrieval Models
Cobos et al. Fitness function obtained from a genetic programming approach for web document clustering using evolutionary algorithms
CN111667023A (en) Method and device for acquiring articles in target category
Lu et al. A novel approach towards large scale cross-media retrieval
Zhu et al. Customized organization of social media contents using focused topic hierarchy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190222