CN109376270A - Data retrieval method and device - Google Patents
- Publication number
- CN109376270A; CN201811126932.5A
- Authority
- CN
- China
- Prior art keywords
- audio
- label
- video file
- participle
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present invention provide a data retrieval method and device, relating to the field of household appliance technology. They address a problem in the prior art: when a user retrieves audio/video files by label, files unrelated to the label are also offered for selection, so the user cannot determine whether a selected audio/video file is the one actually needed. The method includes: obtaining a search instruction; determining, according to a label, the audio/video files that contain the label; generating a ranking sequence of the audio/video files according to their audio/video information; and displaying the audio/video files according to the ranking sequence. The embodiments of the present invention are used to provide a retrieval method for audio/video files.
Description
Technical field
The present invention relates to the field of household appliance technology, and in particular to a data retrieval method and device.
Background
In the prior art, when a user retrieves audio/video files by label, the labels are mostly annotated manually and suffer from problems such as missing values, inaccuracy, and inconsistent types. As a result, the search results contain many audio/video files unrelated to the label, and these are also shown to the user for selection. The user cannot determine whether a selected audio/video file is the one actually needed, which degrades the user experience.
As can be seen from the above, when a user retrieves audio/video files by label in the prior art, audio/video files unrelated to the label are offered for selection, so the user cannot determine whether a selected audio/video file is the one actually needed.
Summary of the invention
Embodiments of the present invention provide a data retrieval method and device that solve the following problem in the prior art: when a user retrieves audio/video files by label, audio/video files unrelated to the label are offered for selection, so the user cannot determine whether a selected audio/video file is the one actually needed.
To achieve the above objective, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, an embodiment of the present invention provides a retrieval method, including: obtaining a search instruction, where the search instruction includes at least a label; determining, according to the label, the audio/video files that contain the label; generating a ranking sequence of the audio/video files according to the audio/video information of the audio/video files, where the audio/video information includes at least one of a label weight and a comprehensive score, the label weight indicates how important the label is to the audio/video file, and the comprehensive score indicates the scoring result of the audio/video file; and displaying the audio/video files according to the ranking sequence.
It follows from the above solution that, with the retrieval method provided by the embodiments of the present invention, when a user searches for a label, all audio/video files containing the label are found first; a ranking sequence of those files is then generated according to either the label weight or the comprehensive score of each file, and the files are displayed in that order. The user can therefore use the label weight or the comprehensive score to identify the audio/video file being sought, which makes the search results more accurate. This solves the prior-art problem that retrieving audio/video files by label offers the user files unrelated to the label and results in a poor user experience.
Optionally, before the search instruction is obtained, the method further includes: obtaining the audio/video information of the audio/video file, where the audio/video information further includes any one of a title, a summary, and a label; determining a label topic-word ranking subset and a topic probability distribution vector of the label according to an LDA topic model and the audio/video information; and determining the label weight of the label according to the label topic-word ranking subset and the topic probability distribution vector.
Optionally, before the label topic-word ranking subset and the topic probability distribution vector of the label are determined according to the LDA topic model and the audio/video information, the method further includes: obtaining a training corpus of the LDA topic model, where the training corpus includes at least one word segment, and a word segment includes any one of a label, a title word segment of the title, and a summary word segment of the summary; determining the term frequency and the inverse document frequency of the at least one word segment according to the training corpus; and determining the feature vector of the at least one word segment according to its term frequency and inverse document frequency. After the label weight of the label is determined according to the label topic-word ranking subset and the topic probability distribution vector, the method further includes: determining the comprehensive score of the audio/video file according to the feature vector of the at least one word segment and the label weight.
Optionally, the audio/video information further includes any one of playback popularity, number of clicks, release time, playback time, and payment rate. Determining the comprehensive score of the audio/video file according to the feature vector of the at least one word segment and the label weight includes: determining other scores according to the playback popularity, number of clicks, release time, playback time, and payment rate; and determining the comprehensive score of the audio/video file according to the feature vector of the at least one word segment, the label weight, and the other scores.
In a second aspect, an embodiment of the present invention provides a retrieval device, including: an obtaining module, configured to obtain a search instruction, where the search instruction includes at least a label; a processing module, configured to determine, according to the label obtained by the obtaining module, the audio/video files that contain the label, and further configured to generate a ranking sequence of the audio/video files according to the audio/video information of the audio/video files, where the audio/video information includes at least one of a label weight and a comprehensive score, the label weight indicates how important the label is to the audio/video file, and the comprehensive score indicates the scoring result of the audio/video file; and a display module, configured to display the audio/video files according to the ranking sequence generated by the processing module.
Optionally, the obtaining module is further configured to obtain the audio/video information of the audio/video file, where the audio/video information includes any one of a title, a summary, and a label. The processing module is further configured to determine the label topic-word ranking subset and the topic probability distribution vector of the label according to the LDA topic model and the audio/video information obtained by the obtaining module, and to determine the label weight of the label according to the label topic-word ranking subset and the topic probability distribution vector.
Optionally, the obtaining module is further configured to obtain the training corpus of the LDA topic model, where the training corpus includes at least one word segment, and a word segment includes any one of a label, a title word segment of the title, and a summary word segment of the summary. The processing module is further configured to determine the term frequency and the inverse document frequency of the at least one word segment according to the training corpus obtained by the obtaining module, and to determine the feature vector of the at least one word segment according to its term frequency and inverse document frequency. The processing module is specifically configured to determine the comprehensive score of the audio/video file according to the feature vector of the at least one word segment and the label weight.
Optionally, the audio/video information further includes any one of playback popularity, number of clicks, release time, playback time, and payment rate. The processing module is specifically configured to determine other scores according to the playback popularity, number of clicks, release time, playback time, and payment rate obtained by the obtaining module, and to determine the comprehensive score of the audio/video file according to the feature vector of the at least one word segment, the label weight, and the other scores.
In a third aspect, an embodiment of the present invention provides a computer storage medium including instructions which, when run on a computer, cause the computer to perform any one of the retrieval methods provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a retrieval device, including a communication interface, a processor, a memory, and a bus. The memory is configured to store computer-executable instructions, and the processor is connected to the memory via the bus. When the retrieval device runs, the processor executes the computer-executable instructions stored in the memory, so that the retrieval device performs any one of the retrieval methods provided in the first aspect.
In a fifth aspect, an embodiment of the present invention provides a retrieval system, including a household appliance and any one of the retrieval devices provided in the second aspect.
It can be understood that any of the retrieval devices provided above is used to perform the corresponding method of the first aspect. For the beneficial effects they can achieve, refer to the beneficial effects of the method of the first aspect and of the corresponding solutions in the detailed description below; details are not repeated here.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a first flow diagram of a retrieval method provided by an embodiment of the present invention;
Fig. 2 is a second flow diagram of a retrieval method provided by an embodiment of the present invention;
Fig. 3 is a third flow diagram of a retrieval method provided by an embodiment of the present invention;
Fig. 4 is a fourth flow diagram of a retrieval method provided by an embodiment of the present invention;
Fig. 5 is a fifth flow diagram of a retrieval method provided by an embodiment of the present invention;
Fig. 6 shows the ElasticSearch label query DSL statement of a retrieval method provided by an embodiment of the present invention;
Fig. 7 is a first structural diagram of a retrieval device provided by an embodiment of the present invention;
Fig. 8 is a second structural diagram of a retrieval device provided by an embodiment of the present invention;
Fig. 9 is a structural diagram of a retrieval system provided by an embodiment of the present invention.
Reference numerals:
Retrieval device - 10;
Obtaining module - 101; processing module - 102; display module - 103.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To describe the technical solutions of the embodiments clearly, words such as "first" and "second" are used in the embodiments of the present invention to distinguish identical or similar items whose functions and effects are essentially the same. Those skilled in the art will understand that words such as "first" and "second" do not limit quantity or execution order.
In the embodiments of the present invention, words such as "exemplary" or "for example" are used to present an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" should not be construed as preferred over, or more advantageous than, other embodiments or designs. Rather, such words are intended to present a related concept in a concrete manner.
In the description of the embodiments of the present invention, unless otherwise specified, "plurality" means two or more. For example, multiple networks means two or more networks.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, A and/or B can mean: A alone, both A and B, or B alone. The symbol "/" herein indicates that the associated objects are in an "or" relationship; for example, A/B means A or B.
In the prior art, the label dimensions based on the content attributes of audio/video files are becoming ever broader and the labels ever more numerous. If labels are not managed effectively, label-based retrieval returns audio/video files unrelated to the label, and the search results become chaotic. To solve this problem, the embodiments of the present invention provide a retrieval method, implemented as follows:
Embodiment one
An embodiment of the present invention provides a retrieval method which, as shown in Fig. 1, includes:
S101: Obtain a search instruction, where the search instruction includes at least a label.
Optionally, as shown in Fig. 2 and Fig. 5, before the search instruction is obtained, the method further includes:
S105: Obtain the audio/video information of the audio/video file, where the audio/video information further includes any one of a title, a summary, and a label.
It should be noted that, in practical applications, the labels of audio/video files in existing media asset libraries are mostly annotated manually and suffer from problems such as missing values, inaccuracy, and inconsistent types. The number of labels also varies widely from file to file, which makes it difficult for the labels to fully reflect the actual attributes and content of an audio/video file.
Therefore, when the audio/video information of an audio/video file is obtained, attribute-type labels and inaccurately described labels need to be removed, for example labels unrelated to content such as awards received, charts, series membership, and age ratings. Rejecting such erroneous labels ensures the accuracy of the label weight that is finally calculated.
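The label-cleaning step described above can be sketched as a simple filter. This is a minimal illustration, not the patent's implementation: the `Label` record, its `category` field, and the category names in `NON_CONTENT_CATEGORIES` are all assumptions, since the text does not define how a label is recognized as non-content.

```python
from dataclasses import dataclass

# Hypothetical categories treated as non-content attributes. The text names
# awards, charts, series membership and age ratings only as examples.
NON_CONTENT_CATEGORIES = {"award", "chart", "series", "age_rating"}


@dataclass(frozen=True)
class Label:
    text: str
    category: str  # assumed metadata; the source does not define a schema


def clean_labels(labels):
    """Drop non-content labels and duplicates, keeping first-seen order."""
    seen = set()
    kept = []
    for label in labels:
        if label.category in NON_CONTENT_CATEGORIES:
            continue
        if label.text in seen:
            continue
        seen.add(label.text)
        kept.append(label.text)
    return kept
```

Calling `clean_labels` on a list that mixes content labels with an award label returns only the content labels, each once.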
S106: Determine the label topic-word ranking subset and the topic probability distribution vector of the label according to a Latent Dirichlet Allocation (LDA) topic model and the audio/video information.
It should be noted that, in practical applications, the unsupervised LDA topic model is first trained and clustered on a Spark distributed computing cluster to obtain keyword probability models for multiple topics. The specific steps are as follows:
1. Composition of the training set of the LDA topic model: the labels, title word segments, and summary word segments of an audio/video file form a feature vector that represents the file. Each film is one training sample, and the training samples of all audio/video files in the media library make up the training set of the LDA topic model. For computational efficiency and topic quality, the audio/video files in the media library can first be classified by type, such as film, TV series, variety show, and music, so that the files within each category are comparable.
2. Parameter settings of the LDA topic model: based on practical experience, the number of iterations is set to 200-300 and the number of topics to 30-100. The model is built with the Expectation Maximization (EM) algorithm, and perplexity is used as the model evaluation criterion.
3. Training the LDA topic model: the training set is iterated over to obtain the LDA topic model with the best perplexity. This model is then used to cluster all current audio/video files, yielding the label topic-word ranking subset and the topic probability distribution vector for each topic.
4. The trained LDA topic model is saved to the Hadoop Distributed File System (HDFS), so that newly added audio/video files can be classified by topic using the already generated LDA topic model.
5. The LDA topic model is updated periodically: the training set is reassembled and the model is retrained to keep it current.
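The parameter search in the steps above (topic count, iteration count, lowest perplexity wins) can be sketched independently of Spark. In this sketch `train_fn` is a hypothetical callable standing in for the actual LDA training job, and `fake_train` is a made-up stand-in so that the example runs on its own; neither comes from the source.

```python
from itertools import product


def select_lda_model(corpus, train_fn, topic_counts, iteration_counts):
    """Return (perplexity, k, iterations, model) with the lowest perplexity.

    train_fn(corpus, k, iterations) is a hypothetical callable that trains
    one LDA model and returns (model, perplexity).
    """
    best = None
    for k, iterations in product(topic_counts, iteration_counts):
        model, perplexity = train_fn(corpus, k, iterations)
        if best is None or perplexity < best[0]:
            best = (perplexity, k, iterations, model)
    if best is None:
        raise ValueError("no candidate settings were supplied")
    return best


def fake_train(corpus, k, iterations):
    # Stand-in so the sketch runs without Spark; the perplexity is invented.
    return {"k": k, "iterations": iterations}, abs(k - 60) + 1000.0 / iterations


perplexity, k, iterations, model = select_lda_model(
    corpus=[["idol", "romance"]],
    train_fn=fake_train,
    topic_counts=range(30, 101, 10),   # 30-100 topics, as in the text
    iteration_counts=(200, 250, 300),  # 200-300 iterations, as in the text
)
print(k, iterations)
```

In a real deployment `train_fn` would wrap the cluster training job and return the perplexity measured on the training set; the selected model would then be persisted for classifying newly added files.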
S107: Determine the label weight of the label according to the label topic-word ranking subset and the topic probability distribution vector.
It should be noted that, in practical applications, the LDA topic model consists of two levels: topics and bags of words. The ordering of the word segments in a bag of words reflects how the content of a topic differs from that of other topics (each topic corresponds to one bag of words, each bag of words contains at least one word segment, and the bags of words of different topics are not necessarily the same). The classification of an audio/video file is jointly determined by the word-segment ordering of all topics and by the probability distribution over the topics it belongs to, and the labels (Tag) of an audio/video file are a subset of the topic bags of words. The weight of a label can therefore be measured at both the bag-of-words level and the topic level, calculated as follows:
1. Each topic (Topic) describes its content through a weight-ordered ranking of word segments, $W_{rank}$. Extracting the position of an audio/video file's label in the Topic's word-segment ranking maps the current label onto this topic as a measure of how important it is in describing the file. Using the rank position also avoids the problem that word-segment weights can differ greatly between topics. Suppose audio/video file k has m labels and there are s topics (m and n are integers greater than 0). Then, for a given Topic, the label topic-word ranking subset is:
$$Tag_{rank}(T_i) = Tag_k \cap W_{rank}(T_i) = \{Tag_1(W_1), Tag_2(W_2), \ldots, Tag_m(W_n)\}$$
where $Tag_m(W_n)$ indicates that the ranking position of label m in the bag of words of topic $T_i$ is $W_n$, and $i$ ranges over the $s$ topics.
Over all topics, this gives the label ranking matrix of audio/video file k, denoted $film_k$, whose entry $Tag_mW_n(T_s)$ indicates that the ranking position of label m in the bag of words of the corresponding topic is $W_n$.
2. Each audio/video file yields a topic probability distribution vector $V_k$ over all topics $T_s$. $V_k$ indicates the degree of membership of the audio/video file in each topic, and from this vector the importance of the file's labels in the different topics can be derived.
3. The product of the topic probability distribution vector $V_k$ of audio/video file k and the ranking matrix $Tag_k$ of its labels on each topic describes the label weight $Weight_{tag}(k)$ in the label-set space and the topic-model space of the audio/video file.
Within the label set of audio/video file k, the larger the label weight of a label, the closer that label is to the content of the file. To compare different audio/video files with one another, the label set of each file must be normalized. The weights of the same label can then be compared across files, and a larger weight indicates that the file's content is closer to the label. This score therefore distinguishes both among the labels of a single audio/video file and among audio/video files that share the same label.
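The idea of combining topic membership with a label's rank in each topic can be sketched as follows. The exact weight formula is not reproduced in this text, so the sketch makes an explicit assumption: a rank position is converted to a score by taking its reciprocal (rank 1 scores highest), and the label weight is the membership-weighted sum of those scores. The function and parameter names are illustrative only.

```python
def label_weights(topic_probs, label_ranks):
    """Weight each label by topic membership times a rank-derived score.

    topic_probs: V_k, one probability per topic.
    label_ranks: label -> list of 1-based positions of the label in each
                 topic's bag of words (None when the label is absent).
    The reciprocal-rank scoring is an assumption of this sketch.
    """
    weights = {}
    for label, ranks in label_ranks.items():
        if len(ranks) != len(topic_probs):
            raise ValueError("one rank per topic is required")
        weights[label] = sum(
            p / r for p, r in zip(topic_probs, ranks) if r is not None
        )
    return weights


def normalize(weights):
    """Scale one file's label weights so that they sum to 1."""
    total = sum(weights.values())
    return {k: (v / total if total else 0.0) for k, v in weights.items()}


weights = label_weights(
    topic_probs=[0.7, 0.3],
    label_ranks={"idol": [1, 4], "romance": [5, None]},
)
print(normalize(weights))
```

Here "idol" ranks first in the dominant topic and therefore receives a much larger weight than "romance"; the normalization step mirrors the per-file normalization that the text requires before weights are compared across files.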
Optionally, as shown in Fig. 3 and Fig. 5, before the label topic-word ranking subset and the topic probability distribution vector of the label are determined according to the LDA topic model and the audio/video information, the method further includes:
S108: Obtain the training corpus of the LDA topic model, where the training corpus includes at least one word segment, and a word segment includes any one of a label, a title word segment of the title, and a summary word segment of the summary.
S109: Determine the term frequency and the inverse document frequency of the at least one word segment according to the training corpus.
S110: Determine the feature vector of the at least one word segment according to its term frequency and inverse document frequency.
After the label weight of the label is determined according to the label topic-word ranking subset and the topic probability distribution vector, the method further includes:
S111: Determine the comprehensive score of the audio/video file according to the feature vector of the at least one word segment and the label weight.
It should be noted that, in practical applications, to obtain the training corpus of the LDA topic model, Chinese word segmentation (the cutting of a sequence of Chinese characters into individual words) is performed separately on the title (Title) and the summary (Summary) of the audio/video file, and stop words are removed. (In information retrieval, to save storage space and improve search efficiency, certain characters or words are automatically filtered out before or after natural-language data or text is processed; these are called stop words.) The result is then merged with the label set Tag (all labels that the audio/video file contains) and de-duplicated to form the training corpus S of the LDA topic model.
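The corpus construction just described (segment, drop stop words, merge with labels, de-duplicate) can be sketched for a single file as below. A real system would use a Chinese word segmenter; here `segment` is a deliberately simple whitespace split used as a stand-in, and the stop-word list is illustrative, so both are assumptions of the sketch.

```python
STOP_WORDS = {"the", "a", "of", "and"}  # illustrative stop-word list


def segment(text):
    """Stand-in for a Chinese word segmenter: lowercase whitespace split."""
    return text.lower().split()


def build_corpus_entry(title, summary, labels, stop_words=STOP_WORDS):
    """Merge title tokens, summary tokens and labels, then de-duplicate."""
    tokens = [
        t for t in segment(title) + segment(summary) if t not in stop_words
    ]
    merged = []
    seen = set()
    for token in tokens + list(labels):
        if token not in seen:
            seen.add(token)
            merged.append(token)
    return merged


print(build_corpus_entry("The Idol", "a story of love and idol", ["idol", "romance"]))
```

The entries produced for all files in the media library together would form the training corpus.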
Let D be the total number of audio/video files in the media library and DF the number of audio/video files in the media library that contain a given word segment. Let tf be the term frequency of the word segment in an audio/video file (the number of times the word segment occurs in the file divided by the total number of word segments in the file), and idf its inverse document frequency (IDF, derived from the inverse of the document frequency). The term frequency-inverse document frequency (TF-IDF) value of Tag is calculated from the term frequency and the inverse document frequency.
A sparse vector is built from the TF-IDF values of all word segments in the audio/video file, and this sparse vector represents the file:
Film = Vector.sparse(length, position(1, …, n), value(s_1, …, s_n));
The $Tag_{tf \cdot idf}$ value of the audio/video file is saved for use in calculating the file's comprehensive score.
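As an illustration of the quantities D, DF, tf, and idf defined above, the following sketch builds sparse TF-IDF vectors in plain Python. The patent's own TF-IDF formula is not reproduced in this text, so the sketch assumes the textbook form tf × log(D / DF); a production system might apply smoothing or use a library implementation instead.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return (vocabulary, sparse vectors) for tokenized documents.

    Each sparse vector is a dict {vocabulary index: tf-idf value}.
    idf = log(D / DF) is the textbook form, assumed here.
    """
    total_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    vocabulary = {term: i for i, term in enumerate(sorted(doc_freq))}

    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        vector = {}
        for term, count in counts.items():
            tf = count / len(tokens)  # occurrences / total tokens in the file
            idf = math.log(total_docs / doc_freq[term])
            vector[vocabulary[term]] = tf * idf
        vectors.append(vector)
    return vocabulary, vectors


vocabulary, vectors = tf_idf_vectors([["idol", "romance"], ["idol", "father"]])
print(vocabulary, vectors)
```

A word segment that appears in every file ("idol" in this toy corpus) gets a TF-IDF value of zero under this form, while segments unique to one file get positive values.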
Optionally, the audio/video information further includes any one of playback popularity, number of clicks, release time, playback time, and payment rate. As shown in Fig. 4 and Fig. 5, determining the comprehensive score of the audio/video file according to the feature vector of the at least one word segment and the label weight includes:
S1110: Determine other scores according to the playback popularity, number of clicks, release time, playback time, and payment rate.
S1111: Determine the comprehensive score of the audio/video file according to the feature vector of the at least one word segment, the label weight, and the other scores.
It should be noted that, in practical applications, when label weights are applied to label-based retrieval, attributes of the audio/video file such as playback popularity, number of clicks, release time, playback time, and payment rate also need to be taken into account when calculating the file's comprehensive score. The results recalled for a label are then sorted by comprehensive score and returned to the user directly as the label query result. The specific steps are as follows:
1. Combine $Tag_{tf \cdot idf}$ with the other score attributes $feature_i$, $i \in \{1, \ldots, f\}$, to calculate the comprehensive score of the audio/video file, where $Score_k$ denotes the comprehensive score of audio/video file k, normalize denotes normalization, and the $feature_i$ attributes provide the other scores.
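One way to picture the combination of a tag score with normalized attribute scores is sketched below. The scoring formula itself is not reproduced in this text, so everything about the combination is an assumption: min-max normalization is chosen for `normalize`, and the components are simply summed with equal weight. The attribute names are illustrative.

```python
def min_max(values):
    """Min-max normalize a list to [0, 1]; constant lists map to 0.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def comprehensive_scores(tag_scores, feature_columns):
    """Combine a per-file tag score with other normalized attribute scores.

    tag_scores: one value per file (for example a TF-IDF or label-weight score).
    feature_columns: attribute name -> list with one value per file.
    Summing normalized components with equal weight is an assumption.
    """
    n = len(tag_scores)
    totals = min_max(tag_scores)
    for name, column in feature_columns.items():
        if len(column) != n:
            raise ValueError(f"feature {name!r} must have {n} values")
        totals = [t + c for t, c in zip(totals, min_max(column))]
    return totals


scores = comprehensive_scores(
    tag_scores=[1.0, 3.0],
    feature_columns={"clicks": [10, 30], "popularity": [5, 1]},
)
print(scores)
```

Normalizing each component before combining keeps an attribute with a large numeric range, such as click counts, from dominating the result.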
2. At retrieval time, so that the user can compare how close each audio/video file in the search results is to the label, the audio/video files containing the searched label are sorted by comprehensive score to obtain the ranking sequence; the nearer the front of the sequence a file is, the closer its content is to the label. Because different audio/video files contain different numbers of labels, their comprehensive scores are not directly comparable, so the comprehensive scores must be normalized before they are compared across files. As an example, suppose the user searches for the label "idol" and the audio/video files containing that label include "Petard" and "Shan Shan Has Come":
Assume that after normalization the label weights of "Petard" are ["idol" 186.02, "romance" 46.28, "father" 82.3, "inspirational" 41.19], and the label weights of "Shan Shan Has Come" are ["idol" 253.42, "romance" 58.28, "hot-blooded" 43.1, "inspirational" 57.86]. The content of "Shan Shan Has Come" is therefore closer to "idol".
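The comparison in this example can be reproduced with a few lines of code. The sketch below uses the two "idol" weights from the example (186.02 and 253.42) under placeholder file names, plus a hypothetical third file that lacks the label in order to show the filtering step; the data layout is an assumption.

```python
def rank_by_label(files, label):
    """Return the names of files carrying `label`, highest weight first."""
    matches = [
        (weights[label], name)
        for name, weights in files.items()
        if label in weights
    ]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in matches]


files = {
    "first_file": {"idol": 186.02, "father": 82.3},
    "second_file": {"idol": 253.42},
    "third_file": {"documentary": 120.0},  # hypothetical file without the label
}
print(rank_by_label(files, "idol"))  # ['second_file', 'first_file']
```

The file with the larger normalized "idol" weight is listed first, matching the conclusion of the example.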
As an example, (Tag, Score) pairs are stored in the index of the search engine ElasticSearch as a Nested (nested structure) field, and audio/video files are retrieved by label as follows:
1. The ElasticSearch configuration parameters are set in the Mapping used to build the index, and the query DSL statement is set as shown in Fig. 6.
2. The index carrying the label weight and/or comprehensive score is brought online, enabling label-based retrieval of audio/video files.
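Fig. 6 is not reproduced in this text, so the patent's actual DSL statement is unknown. As a rough illustration only, the sketch below builds one plausible request body: a nested query that matches a tag, sorted in descending order by the score stored alongside that tag. The field names `tags`, `tag`, and `score` are assumptions, and the `nested` sort option with `path` and `filter` follows the syntax of Elasticsearch 6.1 and later.

```python
import json


def build_label_query(label, path="tags", tag_field="tag", score_field="score"):
    """Build an Elasticsearch nested query sorted by the matched tag's score.

    The field names `tags`, `tag` and `score` are assumptions of this sketch.
    """
    term_filter = {"term": {f"{path}.{tag_field}": label}}
    return {
        "query": {"nested": {"path": path, "query": term_filter}},
        "sort": [
            {
                f"{path}.{score_field}": {
                    "order": "desc",
                    # Sort only on the score of the nested entry for this tag.
                    "nested": {"path": path, "filter": term_filter},
                }
            }
        ],
    }


print(json.dumps(build_label_query("idol"), indent=2))
```

The nested filter inside the sort clause matters: without it, a file would be ordered by the score of any of its tags rather than by the score attached to the searched tag.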
S102: Determine, according to the label, the audio/video files that contain the label.
S103: Generate a ranking sequence of the audio/video files according to the audio/video information of the audio/video files, where the audio/video information includes at least one of a label weight and a comprehensive score, the label weight indicates how important the label is to the audio/video file, and the comprehensive score indicates the scoring result of the audio/video file.
It should be noted that, in practical applications, the ranking sequence of the audio/video files can be sorted by label weight, by comprehensive score, or by both; the content of audio/video files nearer the front of the ranking sequence is closer to the label.
S104: Display the audio/video files according to the ranking sequence.
It follows from the above solution that, with the retrieval method provided by the embodiments of the present invention, when a user searches for a label, all audio/video files containing the label are found first; a ranking sequence of those files is then generated according to either the label weight or the comprehensive score of each file, and the files are displayed in that order. The user can therefore use the label weight or the comprehensive score to identify the audio/video file being sought, which makes the search results more accurate. This solves the prior-art problem that retrieving audio/video files by label offers the user files unrelated to the label and results in a poor user experience.
Embodiment two
An embodiment of the present invention provides a retrieval device 10 which, as shown in Fig. 7, includes:
An obtaining module 101, configured to obtain a search instruction, where the search instruction includes at least a label.
A processing module 102, configured to determine, according to the label obtained by the obtaining module 101, the audio/video files that contain the label.
The processing module 102 is further configured to generate a ranking sequence of the audio/video files according to the audio/video information of the audio/video files, where the audio/video information includes at least one of a label weight and a comprehensive score, the label weight indicates how important the label is to the audio/video file, and the comprehensive score indicates the scoring result of the audio/video file.
A display module 103, configured to display the audio/video files according to the ranking sequence generated by the processing module 102.
Optionally, the obtaining module 101 is further configured to obtain the audio/video information of the audio/video file, where the audio/video information includes any one of a title, a summary, and a label. The processing module 102 is further configured to determine the label topic-word ranking subset and the topic probability distribution vector of the label according to the LDA topic model and the audio/video information obtained by the obtaining module 101, and to determine the label weight of the label according to the label topic-word ranking subset and the topic probability distribution vector.
Optionally, the acquisition module 101 is further configured to obtain the training corpus of the LDA topic model, wherein the training corpus includes at least one word segment, and the word segment includes any one of the label, a title word segment of the title and a brief-introduction word segment of the brief introduction. The processing module 102 is further configured to determine, according to the training corpus obtained by the acquisition module 101, the term frequency and the inverse document frequency of the at least one word segment. The processing module 102 is further configured to determine the feature vector of the at least one word segment according to its term frequency and inverse document frequency. The processing module 102 is specifically configured to determine the comprehensive score of the audio/video file according to the feature vector of the at least one word segment and the label weight.
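The term frequency and inverse document frequency step can be sketched with the textbook TF-IDF definition, as below. This passage does not state which TF-IDF variant the embodiment uses, so the unsmoothed form `tf * log(N / df)` is an assumption; `tfidf_vectors` and the sample corpus are hypothetical, and every document is assumed to be non-empty.

```python
import math
from collections import Counter


def tfidf_vectors(corpus):
    """corpus: list of documents, each a non-empty list of word segments."""
    n_docs = len(corpus)
    # Document frequency: number of documents containing each segment.
    doc_freq = Counter(seg for doc in corpus for seg in set(doc))
    vocab = sorted(doc_freq)
    vectors = []
    for doc in corpus:
        counts = Counter(doc)
        vec = []
        for seg in vocab:
            tf = counts[seg] / len(doc)             # term frequency
            idf = math.log(n_docs / doc_freq[seg])  # inverse document frequency
            vec.append(tf * idf)
        vectors.append(vec)
    return vocab, vectors


corpus = [
    ["comedy", "family", "comedy"],  # e.g. label + title segments of file 1
    ["comedy", "romance"],
    ["war", "history"],
]
vocab, vectors = tfidf_vectors(corpus)
print(vocab)
print(vectors[0])
```

Each file is thus represented by one vector whose entries grow with how often a word segment occurs in that file and shrink with how many files share it.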
Optionally, the audio/video information further includes any one of playback popularity, number of clicks, release time, play time and payment rate. The processing module 102 is specifically configured to determine an other score according to the playback popularity, number of clicks, release time, play time and payment rate obtained by the acquisition module 101. The processing module 102 is specifically configured to determine the comprehensive score of the audio/video file according to the feature vector of the at least one word segment, the label weight and the other score.
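One possible way to combine these quantities is a weighted sum, sketched below. The passage does not give the combination formula, so the normalisation of every signal to [0, 1], the weights, and the use of the mean of the feature vector as a text score are all assumptions; `other_score` and `comprehensive_score` are hypothetical names.

```python
def other_score(popularity, clicks, recency, play_time, payment_rate,
                weights=(0.3, 0.2, 0.2, 0.2, 0.1)):
    """Blend five signals, each assumed to be normalised to [0, 1]."""
    signals = (popularity, clicks, recency, play_time, payment_rate)
    return sum(w * s for w, s in zip(weights, signals))


def comprehensive_score(feature_vector, label_weight, other,
                        alpha=0.4, beta=0.4, gamma=0.2):
    """Assumed linear blend of text features, label weight and other score."""
    # Collapse the feature vector to one number; taking the mean is an
    # assumption of this sketch, not a rule taken from the embodiment.
    if feature_vector:
        text_score = sum(feature_vector) / len(feature_vector)
    else:
        text_score = 0.0
    return alpha * text_score + beta * label_weight + gamma * other


other = other_score(1.0, 0.5, 0.5, 0.0, 1.0)
score = comprehensive_score([0.2, 0.4], 0.5, other)
print(round(other, 6), round(score, 6))
```

Keeping the weights as parameters makes it straightforward to tune how strongly popularity-type signals influence the final ranking relative to the label weight.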
All relevant content of the steps in the above method embodiment applies to the functional description of the corresponding functional modules, and their effects are not repeated here.
Where integrated modules are used, the retrieval device includes an acquisition module, a processing module, a display module and a storage module. The processing module controls and manages the actions of the retrieval device; for example, the processing module supports the retrieval device in executing processes S101, S102, S103 and S104 in Fig. 1. The acquisition module and the display module support information exchange between the retrieval device and other devices. The storage module stores the program code and data of the retrieval device.
The following takes as an example the case in which the processing module is a processor, the storage module is a memory, and the acquisition module and the display module are a communication interface. Referring to Fig. 8, the retrieval device includes a communication interface 501, a processor 502, a memory 503 and a bus 504; the communication interface 501 and the processor 502 are connected to the memory 503 by the bus 504.
The processor 502 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solution of this application.
The memory 503 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other compact disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor by the bus. The memory may also be integrated with the processor.
The memory 503 stores the application program code for executing the solution of this application, and its execution is controlled by the processor 502. The communication interface 501 is used for information exchange with other devices, for example with a remote control. The processor 502 executes the application program code stored in the memory 503 to implement the method described in the embodiments of this application.
In addition, a computing storage medium (or media) is provided, including instructions which, when executed, perform the method operations performed by the retrieval device in the above embodiments. A computer program product including the above computing storage medium (or media) is also provided.
It should be understood that, in the various embodiments of the present invention, the magnitude of the sequence numbers of the above processes does not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such an implementation should not be considered to go beyond the scope of the present invention.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a retrieval device, a network device or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
It can be understood that any retrieval device provided above is used to execute the corresponding method of Embodiment one provided above. Therefore, for the beneficial effects it can achieve, refer to the beneficial effects of the method of Embodiment one above and of the corresponding solutions in the detailed description below, which are not repeated here.
Embodiment three
An embodiment of the present invention provides a retrieval system, including a household appliance and any retrieval device provided in Embodiment two.
It should be noted that, in practical applications, when the household appliance shown in Fig. 9 receives a search instruction entered by a user, it sends the search instruction to the retrieval device (which may be a server). The retrieval device executes the above retrieval method according to the search instruction, generates the corresponding collating sequence, and sends a control instruction carrying the collating sequence to the household appliance, so that the household appliance displays the audio/video files according to the collating sequence.
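The exchange between the household appliance and the retrieval device can be sketched as a simple request and response, as below. The JSON message layout, the field names `label`, `type` and `sequence`, and the functions `handle_search_instruction` and `appliance_display` are all hypothetical; the embodiment does not specify a message format or transport.

```python
import json


def handle_search_instruction(request_json, index):
    """Retrieval device side: turn a search instruction into a control instruction."""
    label = json.loads(request_json)["label"]
    # `index` maps label -> list of (title, score). It is a hypothetical
    # stand-in for the retrieval method run by the retrieval device.
    hits = sorted(index.get(label, []), key=lambda item: item[1], reverse=True)
    return json.dumps({"type": "display", "sequence": [title for title, _ in hits]})


def appliance_display(response_json):
    """Household appliance side: list titles in the order received."""
    return json.loads(response_json)["sequence"]


index = {"comedy": [("Film A", 0.2), ("Film B", 0.9)]}
request = json.dumps({"label": "comedy"})
shown = appliance_display(handle_search_instruction(request, index))
print(shown)  # ['Film B', 'Film A']
```

In this arrangement the appliance performs no ranking of its own; it only forwards the label and renders the sequence carried by the control instruction.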
The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person familiar with the technical field can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Claims (10)
1. A retrieval method, characterized by comprising:
obtaining a search instruction, wherein the search instruction includes at least a label;
determining, according to the label, the audio/video files containing the label;
generating a collating sequence of the audio/video files according to the audio/video information of the audio/video files, wherein the audio/video information includes at least one of a label weight and a comprehensive score, the label weight indicates the importance of the label to the audio/video file, and the comprehensive score indicates the scoring result of the audio/video file;
displaying the audio/video files according to the collating sequence.
2. The retrieval method according to claim 1, characterized in that, before obtaining the search instruction, the method further comprises:
obtaining the audio/video information of the audio/video files, wherein the audio/video information further includes any one of a title, a brief introduction and a label;
determining, according to an LDA topic model and the audio/video information, the label topic-word ranked subset and the topic probability distribution vector of the label;
determining the label weight of the label according to the label topic-word ranked subset and the topic probability distribution vector.
3. The retrieval method according to claim 2, characterized in that, before determining the label topic-word ranked subset and the topic probability distribution vector of the label according to the LDA topic model and the audio/video information, the method further comprises:
obtaining the training corpus of the LDA topic model, wherein the training corpus includes at least one word segment, and the word segment includes any one of the label, a title word segment of the title and a brief-introduction word segment of the brief introduction;
determining, according to the training corpus, the term frequency and the inverse document frequency of the at least one word segment;
determining the feature vector of the at least one word segment according to the term frequency and the inverse document frequency of the at least one word segment;
and, after determining the label weight of the label according to the label topic-word ranked subset and the topic probability distribution vector, the method further comprises:
determining the comprehensive score of the audio/video file according to the feature vector of the at least one word segment and the label weight.
4. The retrieval method according to claim 3, characterized in that the audio/video information further includes any one of playback popularity, number of clicks, release time, play time and payment rate;
determining the comprehensive score of the audio/video file according to the feature vector of the at least one word segment and the label weight comprises:
determining an other score according to the playback popularity, the number of clicks, the release time, the play time and the payment rate;
determining the comprehensive score of the audio/video file according to the feature vector of the at least one word segment, the label weight and the other score.
5. A retrieval device, characterized by comprising:
an acquisition module, configured to obtain a search instruction, wherein the search instruction includes at least a label;
a processing module, configured to determine, according to the label obtained by the acquisition module, the audio/video files containing the label;
the processing module being further configured to generate a collating sequence of the audio/video files according to the audio/video information of the audio/video files, wherein the audio/video information includes at least one of a label weight and a comprehensive score, the label weight indicates the importance of the label to the audio/video file, and the comprehensive score indicates the scoring result of the audio/video file;
a display module, configured to display the audio/video files according to the collating sequence generated by the processing module.
6. The retrieval device according to claim 5, characterized in that the acquisition module is further configured to obtain the audio/video information of the audio/video files, wherein the audio/video information includes any one of a title, a brief introduction and a label;
the processing module is further configured to determine, according to an LDA topic model and the audio/video information obtained by the acquisition module, the label topic-word ranked subset and the topic probability distribution vector of the label;
the processing module is further configured to determine the label weight of the label according to the label topic-word ranked subset and the topic probability distribution vector.
7. The retrieval device according to claim 6, characterized in that the acquisition module is further configured to obtain the training corpus of the LDA topic model, wherein the training corpus includes at least one word segment, and the word segment includes any one of the label, a title word segment of the title and a brief-introduction word segment of the brief introduction;
the processing module is further configured to determine, according to the training corpus obtained by the acquisition module, the term frequency and the inverse document frequency of the at least one word segment;
the processing module is further configured to determine the feature vector of the at least one word segment according to the term frequency and the inverse document frequency of the at least one word segment;
the processing module is specifically configured to determine the comprehensive score of the audio/video file according to the feature vector of the at least one word segment and the label weight.
8. The retrieval device according to claim 7, characterized in that the audio/video information further includes any one of playback popularity, number of clicks, release time, play time and payment rate;
the processing module is specifically configured to determine an other score according to the playback popularity, the number of clicks, the release time, the play time and the payment rate obtained by the acquisition module;
the processing module is specifically configured to determine the comprehensive score of the audio/video file according to the feature vector of the at least one word segment, the label weight and the other score.
9. A computer storage medium, including instructions which, when run on a computer, cause the computer to execute the retrieval method according to any one of claims 1-4.
10. A retrieval device, comprising a communication interface, a processor, a memory and a bus; the memory is configured to store computer-executable instructions, and the processor is connected to the memory by the bus; when the retrieval device runs, the processor executes the computer-executable instructions stored in the memory, so that the retrieval device executes the retrieval method according to any one of claims 1-4.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811126932.5A CN109376270A (en) | 2018-09-26 | 2018-09-26 | A kind of data retrieval method and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN109376270A true CN109376270A (en) | 2019-02-22 |
Family
ID=65402690
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201811126932.5A Pending CN109376270A (en) | 2018-09-26 | 2018-09-26 | A kind of data retrieval method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109376270A (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102982153A (en) * | 2012-11-29 | 2013-03-20 | 北京亿赞普网络技术有限公司 | Information retrieval method and device |
| CN105389325A (en) * | 2014-09-02 | 2016-03-09 | 三星电子株式会社 | Content search method and electronic device implementing same |
| CN106055538A (en) * | 2016-05-26 | 2016-10-26 | 达而观信息科技(上海)有限公司 | Automatic extraction method for text labels in combination with theme model and semantic analyses |
| CN106294314A (en) * | 2016-07-19 | 2017-01-04 | 北京奇艺世纪科技有限公司 | Topic Mining Method and Device |
| CN106446135A (en) * | 2016-09-19 | 2017-02-22 | 北京搜狐新动力信息技术有限公司 | Method and device for generating multi-media data label |
| US20170109786A1 (en) * | 2015-10-20 | 2017-04-20 | Korea Electronics Technology Institute | System for producing promotional media content and method thereof |
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110069573A (en) * | 2019-03-19 | 2019-07-30 | 深圳壹账通智能科技有限公司 | Product data integration method, apparatus, computer equipment and storage medium |
| CN110222709A (en) * | 2019-04-29 | 2019-09-10 | 上海暖哇科技有限公司 | A kind of multi-tag intelligence marking method and system |
| CN110222709B (en) * | 2019-04-29 | 2022-01-25 | 上海暖哇科技有限公司 | Multi-label intelligent marking method and system |
| CN110489525A (en) * | 2019-08-09 | 2019-11-22 | 腾讯科技(深圳)有限公司 | Acquisition methods and device, the storage medium and electronic device of search result |
| CN111625716A (en) * | 2020-05-12 | 2020-09-04 | 聚好看科技股份有限公司 | Media asset recommendation method, server and display device |
| CN111625716B (en) * | 2020-05-12 | 2023-10-31 | 聚好看科技股份有限公司 | Media asset recommendation method, server and display device |
| CN113488144A (en) * | 2021-07-14 | 2021-10-08 | 深圳市东亿健康服务有限公司 | Slice image processing method |
| CN113488144B (en) * | 2021-07-14 | 2023-11-07 | 内蒙古匠艺科技有限责任公司 | Slice image processing method |
| CN114328798A (en) * | 2021-11-09 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Processing method, device, equipment, storage medium and program product for searching text |
| CN114328798B (en) * | 2021-11-09 | 2024-02-23 | 腾讯科技(深圳)有限公司 | Processing method, device, equipment, storage medium and program product for searching text |
| CN114253976A (en) * | 2021-12-21 | 2022-03-29 | 北京达佳互联信息技术有限公司 | Searching method and device based on bitmap scoring |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190222 |