CN108763578B

CN108763578B - Index file updating method and server

Info

Publication number: CN108763578B
Application number: CN201810582788.XA
Authority: CN
Inventors: 李祖嘉
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2018-06-07
Filing date: 2018-06-07
Publication date: 2023-03-10
Anticipated expiration: 2038-06-07
Also published as: CN108763578A

Abstract

The embodiment of the invention discloses a method for updating an index file, which comprises the following steps: when a first server detects that target index information is updated, the first server acquires a first transaction log, wherein the first transaction log is used for recording the target index information; the first server updates an index file according to the first transaction log; and the first server sends the first transaction log to a second server so that the second server updates the index file according to the first transaction log. The embodiment of the invention also discloses a server. The embodiment of the invention can update the index file in time and improve the reliability of the retrieval result on one hand, and can reduce the computing resources and prevent repeated computing of the index information on the other hand, thereby improving the resource utilization rate.

Description

Index file updating method and server

Technical Field

The present invention relates to the field of data processing, and in particular, to a method and a server for updating an index file.

Background

With the development of science and technology, and especially with the growth in the volume of information, information retrieval technology is receiving more and more attention. Good information retrieval technology can determine retrieval results from a large amount of information quickly and accurately, so that users can make use of those results conveniently.

Currently, an index file is constructed by acquiring an index task at regular time, then the index file is synchronized to each server in a system, and each server locally loads the index file after acquiring the index file, and finally provides an online retrieval service.

However, if the interval time for obtaining the index task is too long, the index file cannot be updated in time, which may cause incomplete information retrieved by the user, and reduce the reliability of the retrieval result. If the time interval for obtaining the index task is too short, it may cause the index of most information in a short time to be repeated, thereby causing a waste of a large amount of computing resources.

Disclosure of Invention

Embodiments of the present invention provide a method, a server, and a system for updating an index file, which can update the index file in time, improve reliability of a search result, and reduce computing resources, thereby preventing repeated computing of index information, and improving resource utilization.

In view of the above, a first aspect of the present invention provides a method for updating an index file, including:

when a first server detects that target index information is updated, the first server acquires a first transaction log, wherein the first transaction log is used for recording the target index information;

the first server updates an index file according to the first transaction log;

and the first server sends the first transaction log to a second server so that the second server updates the index file according to the first transaction log.

A second aspect of the present invention provides a method for updating an index file, including:

when a first server detects that target index information is updated, a first transaction log sent by the first server is received by a second server, wherein the first transaction log is used for recording the target index information and triggering the first server to update an index file;

and the second server updates the index file according to the first transaction log.

A third aspect of the present invention provides a server comprising:

an acquiring module, configured to acquire a first transaction log when the first server detects that target index information is updated, wherein the first transaction log is used for recording the target index information;

an updating module, configured to update the index file according to the first transaction log acquired by the acquiring module;

and a sending module, configured to send the first transaction log acquired by the acquiring module to a second server, so that the second server updates the index file according to the first transaction log.

A fourth aspect of the present invention provides a server comprising:

a receiving module, configured to receive a first transaction log sent by a first server when the first server detects that target index information is updated, wherein the first transaction log is used for recording the target index information and for triggering the first server to update an index file;

and the updating module is used for updating the index file according to the first transaction log received by the receiving module.

A fifth aspect of the present invention provides a server comprising: a memory, a transceiver, a processor, and a bus system;

wherein the memory is used for storing programs;

the processor is used for executing the program in the memory, which comprises the following steps:

when the update of target index information is detected, acquiring a first transaction log, wherein the first transaction log is used for recording the target index information;

updating an index file according to the first transaction log;

sending the first transaction log to a second server so that the second server updates the index file according to the first transaction log;

the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.

A sixth aspect of the present invention provides a server comprising: a memory, a transceiver, a processor, and a bus system;

wherein the memory is used for storing programs;

when the update of target index information is detected, receiving a first transaction log sent by a first server, wherein the first transaction log is used for recording the target index information and triggering the first server to update an index file;

updating the index file according to the first transaction log;

A seventh aspect of the present invention provides an index file updating system, including a first server, a second server, and a client;

the first server updates an index file according to the first transaction log;

the first server sends the first transaction log to a second server;

the second server updates the index file according to the first transaction log;

the client sends a retrieval instruction to the second server;

the second server acquires a retrieval result from the index file according to the retrieval instruction;

and the second server sends the retrieval result to the client.

An eighth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the above-described aspects.

According to the technical scheme, the embodiment of the invention has the following advantages:

An embodiment of the present invention provides a method for updating an index file. When a first server detects that target index information is updated, the first server obtains a first transaction log, where the first transaction log is used to record the target index information, and then updates the index file according to the first transaction log. In addition, the first server sends the first transaction log to a second server, and the second server updates the index file according to the first transaction log. In this way, the updating process of the index file is triggered when an index task is detected. The master server, serving as the first server, generates a log that relates only to the updated information and then issues the log to the slave servers. Both the master server and the slave servers update only the changed index information and add the incremental content to the original index file.

Drawings

FIG. 1 is a block diagram of an index file update system according to an embodiment of the present invention;

FIG. 2 is a schematic diagram illustrating an interaction flow of a method for updating an index file according to an embodiment of the present invention;

FIG. 3 is a diagram of an embodiment of a method for updating an index file according to an embodiment of the present invention;

FIG. 4 is a diagram of an embodiment of the interaction between the master server and the slave server in the embodiment of the present invention;

FIG. 5 is a diagram illustrating a hash table index structure according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating an index layer architecture according to an embodiment of the present invention;

FIG. 7 is a diagram illustrating synchronization of index files between a master server and a slave server according to an embodiment of the present invention;

FIG. 8 is a diagram of another embodiment of a method for updating an index file according to an embodiment of the present invention;

FIG. 9 is a diagram illustrating a master server and a slave server switching according to an embodiment of the present invention;

FIG. 10 is a diagram of one embodiment of a server in an embodiment of the invention;

FIG. 11 is a diagram of another embodiment of a server in an embodiment of the invention;

FIG. 12 is a diagram of another embodiment of a server in an embodiment of the invention;

FIG. 13 is a diagram of another embodiment of a server in an embodiment of the invention;

FIG. 14 is a schematic diagram of another embodiment of a server in an embodiment of the invention;

FIG. 15 is a diagram of one embodiment of a server in an embodiment of the invention;

FIG. 16 is a schematic diagram of another embodiment of a server in an embodiment of the invention;

FIG. 17 is a diagram of another embodiment of a server in an embodiment of the invention;

FIG. 18 is a block diagram of a server according to an embodiment of the present invention;

FIG. 19 is a diagram of an embodiment of an index file updating system according to the embodiment of the present invention.

Detailed Description

Embodiments of the present invention provide a method, a server, and a system for updating an index file, which can update the index file in time to improve reliability of a retrieval result, and can reduce computing resources to prevent repeated computing of repeated index information, thereby improving resource utilization.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that the embodiments of the invention can be applied to a recommendation system based on an index service, and are particularly suitable for a browser, a public number, or an applet. A browser is software that can display the content of HyperText Markup Language (HTML) files from a web server or a file system and let a user interact with those files. It is used to display text, images, and other information from the network.

The index service, as the bottom-layer data service of the recommendation system, faces billions of read requests and millions of write requests per day, so a high-performance index service that can be scaled out automatically at any time is key to the stability of the recommendation system. In a relational database, an index is a separate physical storage structure that orders the values of one or more columns of a database table. It consists of the values of one or more columns in the table together with a corresponding list of logical pointers to the data pages that physically hold those values. An index is comparable to the table of contents of a book: the required content can be found quickly by following the page number given in the table of contents.

By establishing the index service, the index can be formed into a file, namely an index file, and the index file is then synchronized across the whole system, which is called an index file updating system. An index file is a sequential file that carries an index. The index itself is very small, occupying only two fields. Accessing a record through the index file requires the following steps:

First, the whole index file is loaded into the memory of a server (the file is very small and occupies very little memory space). Then the target key is searched for among the index entries using an efficient algorithm (such as binary search), the address of the record is obtained from the entry for the target key, and finally the retrieved record is fed back to the client according to that address.
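The lookup steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the two-field entry layout `(key, address)`, the sample keys, and the `RECORDS` dictionary standing in for the data file are all assumptions made for the example.

```python
from bisect import bisect_left

# Hypothetical in-memory index: two-field entries (key, address), sorted by key.
INDEX = [("a01", 0), ("b17", 128), ("c42", 256), ("d99", 384)]

# Hypothetical record store keyed by address (stands in for the data file).
RECORDS = {0: "record A", 128: "record B", 256: "record C", 384: "record D"}


def lookup(index, records, target_key):
    """Return the record stored under target_key, or None if the key is absent."""
    keys = [key for key, _ in index]
    pos = bisect_left(keys, target_key)  # binary search over the sorted keys
    if pos == len(index) or index[pos][0] != target_key:
        return None  # the target key is not in the index
    address = index[pos][1]  # second field of the entry: the record address
    return records[address]  # fetch the record to return to the client
```

Because the index is sorted and held in memory, each lookup costs a logarithmic number of key comparisons plus one access to the record store.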

For convenience of introduction, please refer to FIG. 1, which is a schematic structural diagram of an index file updating system according to an embodiment of the present invention. As shown in the figure, the first server is a master server and the second server is a slave server. The first server can be deployed in one region, such as Beijing, and second servers can be deployed in other regions, such as Shanghai, Shenzhen, Tianjin, and Shenyang. When a user requests data retrieval from a second server through a client, the second server issues the retrieved data to the client. The client may be deployed in a terminal device, including but not limited to a personal computer, a mobile phone, a tablet computer, and a palmtop computer.

Data synchronization needs to be kept between the first server and the second server, and when the first server is updated, the second server also needs to be updated, so that data synchronization is achieved. In addition, master-slave switching can be performed between the first server and the second server, and when the first server fails, a proper server can be selected from the second server to take over the work of the first server, so that the master-slave switching is realized.

Referring to fig. 2, fig. 2 is an interaction flow diagram of a method for updating an index file according to an embodiment of the present invention, as shown in the figure, specifically:

in step S1, first, when the first server detects that the index information is updated (for example, five new articles are added), the first server may obtain a transaction log, where the transaction log is used to record updated index information (titles, fields, and other contents of the five articles);

in step S2, the first server updates the locally stored index file according to the transaction log;

in step S3, while step S2 is performed, or after step S2 is completed, the first server sends the transaction log to the second server;

in step S4, the second server updates its locally stored index file according to the transaction log;

in step S5, when the user needs to perform retrieval, the user sends a retrieval instruction to the second server through the client, where the retrieval instruction may carry an information identifier that the user needs to retrieve, and the information identifier is used to indicate a content that the user wants to retrieve, such as "royal glory";

in step S6, the second server looks up the corresponding article list in the database according to the retrieval instruction. To give new articles an exposure opportunity, it may place the latest documents at the head of the article list, and after the ordering is completed, it applies elimination logic, deduplication logic, low-quality filtering, and the like to the article list;

in step S7, the article list is sent to the client, and the client displays the corresponding article list to the user.
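The post-processing in step S6 might look like the following sketch. The field names (`id`, `title`, `timestamp`, `quality`), the quality threshold, and the choice to deduplicate by title are assumptions made for illustration; the text does not specify how duplicates or low-quality articles are identified.

```python
def build_article_list(articles, min_quality=0.5):
    """Order articles newest first, then drop low-quality and duplicate entries.

    Each article is a dict with hypothetical fields:
    id, title, timestamp (larger means newer) and quality (0.0 to 1.0).
    """
    # Newest documents go to the head of the list so new articles get exposure.
    ordered = sorted(articles, key=lambda a: a["timestamp"], reverse=True)

    seen_titles = set()
    result = []
    for article in ordered:
        if article["quality"] < min_quality:
            continue  # low-quality filtering
        if article["title"] in seen_titles:
            continue  # deduplication: keep only the newest copy of a title
        seen_titles.add(article["title"])
        result.append(article["id"])
    return result
```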

Referring to fig. 3, a method for updating an index file according to the present invention will be described below from the perspective of a first server, where an embodiment of the method for updating an index file according to the present invention includes:

101. when the first server detects that the target index information is updated, the first server acquires a first transaction log, wherein the first transaction log is used for recording the target index information;

in this embodiment, when the first server serving as the main server detects that there is updated target index information, the first server may obtain a first transaction log, which is a file for recording the target index information.

Specifically, the first transaction log is a binlog, which is a file in binary format that records the Structured Query Language (SQL) statements with which the user updates the database. For example, SQL statements that change database tables and their contents are all recorded in the binlog, whereas queries on database tables and similar content are not recorded. When data is written into the database, the updating SQL statement is written into the corresponding binlog file at the same time. A backup made with the mysqldump tool is a full backup of the data as of a certain point in time; if the database server is found to have failed after the backup was made, the binlog is needed to restore the changes made since then. The main functions of the binlog are therefore master-slave replication of the database and incremental data recovery.

102. The first server updates the index file according to the first transaction log;

in this embodiment, the first server may update the index file originally stored in the database according to the SQL statement in the first transaction log.

103. The first server sends the first transaction log to the second server, so that the second server updates the index file according to the first transaction log.

In this embodiment, while step 102 is executed or after step 102 is executed, the first server may further send a first transaction log to the second server, and similarly, the second server also updates the index file originally stored in the database according to the SQL statement in the first transaction log, thereby achieving the purpose of data synchronization between the master server and the slave server.
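Steps 101 to 103 can be sketched as follows. This is a simplified in-process model under stated assumptions: the transaction log is represented as a list of `(operation, doc_id, fields)` tuples rather than a real binlog, the index file is a dictionary, and the class and method names are hypothetical.

```python
class Server:
    """A server holding a local copy of the index (a dict stands in for the index file)."""

    def __init__(self, name):
        self.name = name
        self.index = {}  # doc_id -> indexed fields

    def apply(self, transaction_log):
        """Apply only the changes recorded in the log (incremental update)."""
        for operation, doc_id, fields in transaction_log:
            if operation == "upsert":
                self.index[doc_id] = fields
            elif operation == "delete":
                self.index.pop(doc_id, None)


class Master(Server):
    """The first server: applies the log locally and forwards it to the slaves."""

    def __init__(self, name, slaves):
        super().__init__(name)
        self.slaves = slaves

    def on_index_update(self, transaction_log):
        self.apply(transaction_log)       # step 102: update the local index file
        for slave in self.slaves:         # step 103: send the same log to each slave
            slave.apply(transaction_log)
```

Because both sides replay the same log, only the changed entries are touched and the slaves converge to the master's index without recomputing unchanged index information.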

Specifically, referring to FIG. 4, FIG. 4 is a schematic diagram of an embodiment of the interaction between a master server and slave servers in the embodiment of the present invention, in which "master" is the first server and "slave" is the second server. When an article is entered into the index library in real time, the data of the entire index service under the same business is the same. Update operations are written to the master server in the master computer room (which may be deployed in the sea). Slave server 0 in a backup computer room (backup computer rooms can be deployed in several different regions, such as Shenzhen, Shanghai, and Beijing) synchronizes the binlog from the master server in the master computer room, and slave server X (X > 0) in a backup computer room synchronizes the binlog from slave server 0 in its local computer room. The master server is responsible for writing data, and both the master server and the slave servers can provide the online query service.

The master server and the slave servers interact by sending commands to each other, following the classic command pattern. A command is an abstract concept: it can be used to distribute tasks or to transmit data, and how it is processed is determined entirely by the business. The framework itself defines only one concrete command, namely the heartbeat detection command (heartbeat command). A service of any form can be implemented by defining its own commands and providing a command handler that responds to them. The core functions provided by the framework include underlying network communication, maintenance of the master/slave relationship, command distribution, and the like.
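A minimal sketch of this command pattern is shown below, assuming a dispatcher that ships with a single built-in heartbeat command and lets the business register further handlers. The class name, the command names, and the response dictionaries are hypothetical and are not taken from the framework described here.

```python
class CommandDispatcher:
    """Routes named commands to handlers; only the heartbeat command is built in."""

    def __init__(self):
        # The framework defines just one concrete command: heartbeat detection.
        self._handlers = {"heartbeat": self._handle_heartbeat}

    def register(self, name, handler):
        """Let the business layer define its own command and handler."""
        self._handlers[name] = handler

    def dispatch(self, name, payload=None):
        """Invoke the handler for a command and return its response."""
        handler = self._handlers.get(name)
        if handler is None:
            return {"status": "unknown_command", "command": name}
        return handler(payload)

    @staticmethod
    def _handle_heartbeat(payload):
        # Replying at all tells the peer that this server is alive.
        return {"status": "alive"}
```

A business-level command, for example one that carries a transaction log, would be added with `register` without changing the dispatcher itself.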

The master-slave principle is as follows. The master server writes update information into the binlog and maintains an index of the binlog files to keep track of log rotation. When a slave server connects to the master server, it informs the master server of the position in the binlog up to which it last successfully read updates. The slave server then receives any updates that have occurred since that position, and afterwards blocks and waits for the master server to notify it of new updates. Replication in MySQL, a relational database management system, is based on the master server keeping track of all changes to the database (updates, deletions, and so on) in the binlog. Therefore, to use replication, the binlog must be enabled on the master server.

Each slave server receives from the master server the updates that the master server has recorded in its binlog, so that the slave server can perform the same updates on its own copy of the data. After copying the data of the master server, the slave server can be configured to connect to the master server and wait for updates. If the slave server fails to connect to the master server, or loses its connection to the master server, it keeps retrying the connection periodically until it can resume listening for updates.
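The position-based catch-up described above can be illustrated with a small in-process model. It is a sketch under assumptions: the binlog is a plain list, a position is a list offset, and networking, blocking waits, and reconnection are left out; the class names are hypothetical.

```python
class MasterLog:
    """Append-only log of updates kept by the master (stands in for the binlog)."""

    def __init__(self):
        self.entries = []

    def append(self, update):
        self.entries.append(update)

    def read_from(self, position):
        """Return every update recorded at or after the given position."""
        return self.entries[position:]


class Slave:
    """Keeps a data copy plus the position of the last update it applied."""

    def __init__(self):
        self.position = 0  # number of log entries applied so far
        self.data = {}

    def sync(self, master_log):
        """Report the last position, apply what came after it, return the count."""
        updates = master_log.read_from(self.position)
        for key, value in updates:
            self.data[key] = value
        self.position += len(updates)
        return len(updates)
```

Because the slave remembers its position, a reconnect after a lost connection only transfers the updates it has not yet applied.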

Optionally, on the basis of the embodiment corresponding to fig. 3, in a first optional embodiment of the method for updating an index file according to the embodiment of the present invention, after the first server obtains the first transaction log, the method may further include:

the first server acquires a first sequence number corresponding to the first transaction log;

if the first sequence number is not consistent with the second sequence number, the first server generates a memory file according to the first transaction log, wherein the second sequence number is a sequence number corresponding to the second transaction log, and the second transaction log is used for recording index information updated in the last period;

the first server stores the memory file.

In this embodiment, since each update operation is recorded in a binlog, there are a plurality of binlog files on the disk of the server, and the numerical suffix of a binlog file name is its sequence number. At intervals (for example, every 15 minutes), the in-memory state of the server is dumped, that is, converted into a memory file for storage, and the memory file can be stored on disk.

Specifically, after the first server acquires the first transaction log, a first sequence number corresponding to the first transaction log may be determined from the first transaction log. Assuming that the first sequence number is 0005, then, it is determined whether the first sequence number is consistent with a second sequence number, where the second sequence number is a sequence number corresponding to a second transaction log, and the second transaction log is a transaction log recorded by the first server in a previous period. If the first serial number is consistent with the second serial number, the memory data (index information) is not updated, and therefore repeated backup is not needed. If the first sequence number is not consistent with the second sequence number, it indicates that the memory data (index information) is updated, and therefore the first server needs to generate a memory file according to the first transaction log and store the memory file on a disk of the first server.

A dump records the memory data to a storage device at a specific time. Such a dump is usually made to guard against errors: data is copied in a readable format from primary or secondary storage to an external medium, such as a tape, a disk, or a printer, and all or part of the virtual storage can be copied in this way to collect error information.

Secondly, in the embodiment of the present invention, after the first server obtains the first serial number corresponding to the first transaction log, if the first serial number is inconsistent with the second serial number, the memory file is generated according to the first transaction log, and finally the memory file is stored locally in the server. By the method, a new transaction log can be rolled every time data is backed up, so that the same transaction log is prevented from being backed up repeatedly, and the utilization rate of storage resources is improved. In addition, after the transaction log is backed up and the process is restarted, the transaction log in the disk can be loaded, and the operation in the transaction log is applied to be recovered to the state before downtime, so that the reliability of the scheme is greatly improved.
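The sequence-number check before a dump can be sketched as follows. This is an illustration only: the JSON file format, the `dump.NNNN.json` naming, and the `Snapshotter` class are assumptions, and the periodic timer (for example every 15 minutes) is left to the caller.

```python
import json
import os


class Snapshotter:
    """Dumps the in-memory state only when the transaction log sequence number changed."""

    def __init__(self, directory):
        self.directory = directory
        self.last_dumped_seq = None  # sequence number seen at the previous dump

    def maybe_dump(self, current_seq, memory_state):
        """Write a memory file and return its path, or return None if nothing changed."""
        if current_seq == self.last_dumped_seq:
            return None  # same sequence number: no update since the last period
        path = os.path.join(self.directory, f"dump.{current_seq:04d}.json")
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(memory_state, handle)
        self.last_dumped_seq = current_seq
        return path
```

Calling `maybe_dump` once per period with the current sequence number (for example 5 for a log numbered 0005) skips the backup whenever the number matches the one recorded in the previous period.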

Optionally, on the basis of the embodiment corresponding to fig. 3, in a second optional embodiment of the method for updating an index file according to the embodiment of the present invention, the updating, by the first server, the index file according to the first transaction log may include:

the first server generates N index threads according to the first transaction log, wherein N is an integer greater than 0, and each index thread corresponds to one index type;

and the first server updates the index parameters in the hash table according to the target index information and the N index threads to obtain an index file, wherein the index parameters comprise at least one of text information identification, text information timestamp and text information score.

In this embodiment, a high-performance hash retrieval method is provided, where an index structure in a memory of a server (including a first server and a second server) is stored by using a hash table, and the first server is taken as an example for description herein, which, however, should not be construed as limiting the present invention.

For ease of understanding, refer to FIG. 5, which is a schematic diagram of the hash table index structure according to an embodiment of the present invention. As shown in the figure, the hash table includes X buckets, and the number of buckets may be 1.5 to 2 times the estimated number of indexes. A bucket is a storage unit capable of storing one or more records; generally, one bucket may be a disk block. If a bucket has insufficient space when a record is inserted, bucket overflow occurs and the system provides an overflow bucket; if that overflow bucket is also full, the system provides a further overflow bucket. All overflow buckets of a given bucket are linked together in a linked list.
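A minimal sketch of a bucketed hash table with chained overflow buckets is given below. The bucket capacity, the default sizing factor of 1.5, and the class names are assumptions chosen for illustration; in-memory lists stand in for disk blocks.

```python
class Bucket:
    """Fixed-capacity storage unit; overflow points to the next bucket in the chain."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []     # list of (key, value) pairs
        self.overflow = None  # next overflow bucket, if any


class BucketHashTable:
    def __init__(self, estimated_keys, capacity=2, sizing_factor=1.5):
        # Bucket count is a multiple (here 1.5x) of the estimated number of indexes.
        self.capacity = capacity
        count = max(1, int(estimated_keys * sizing_factor))
        self.buckets = [Bucket(capacity) for _ in range(count)]

    def _home(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._home(key)
        free = None
        while True:
            for i, (existing, _) in enumerate(bucket.records):
                if existing == key:
                    bucket.records[i] = (key, value)  # overwrite an existing key
                    return
            if free is None and len(bucket.records) < bucket.capacity:
                free = bucket  # remember the first bucket in the chain with room
            if bucket.overflow is None:
                break
            bucket = bucket.overflow
        if free is None:
            # Every bucket in the chain is full: link a new overflow bucket.
            free = Bucket(self.capacity)
            bucket.overflow = free
        free.records.append((key, value))

    def get(self, key):
        bucket = self._home(key)
        while bucket is not None:  # walk the home bucket and its overflow chain
            for existing, value in bucket.records:
                if existing == key:
                    return value
            bucket = bucket.overflow
        return None
```

Sizing the table above the expected number of keys keeps most chains at a single bucket, so overflow buckets are the exception rather than the rule.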

It is assumed that a plurality of pieces of updated index information are recorded in the first transaction log, and at least one index thread can be generated according to the index information, please refer to table 1, where table 1 is the index information and the corresponding index thread in the first transaction log, where the index information takes a title of an article as an example, and in practical applications, the index information may further include data such as a summary, an author, and a keyword of the article.

TABLE 1

Index information	Index type	Index thread
"Liu Bei with Mingshi hidden and fast Shang Wang people"	Topic index	Thread 1
"what girls had the ability to get C bit creation 101"	Topic index	Thread 1
"Happy big book Yingliu star garden spot"	Account index	Thread 2
"Vegetable oil with carcinogenic property and healthier coconut oil"	Account index	Thread 2
"when the logistics industry meets artificial intelligence"	Account index	Thread 2

Here, "thread 1" may classify articles according to different topics, and "thread 2" may classify articles according to different accounts. According to the target index information (that is, the index information that needs to be recorded this time), the server adds the first transaction log corresponding to the target index information to the N index threads. The same index thread is placed in the same bucket, and at least one index tag may hang on one bucket, because different index tags may yield the same identifier when hashed. For example, if the "sports" tag hashes to 1 and the "entertainment" tag also hashes to 1, then both index tags correspond to the same bucket. Each index tag also corresponds to a specific vector in which the index parameters are stored, where the index parameters include at least one of a text information identifier, a text information timestamp, and a text information score. Specifically, the index parameters may include the following:

and updating the index parameters of each article into the vector, thereby obtaining an updated index file.

Secondly, in the embodiment of the present invention, the server may update the index file according to the transaction log as follows: the server first generates N index threads according to the transaction log, and then updates the index parameters in the hash table through the N index threads according to the target index information, thereby obtaining the index file. In this way, the index structure in the memory of the whole server is stored in a hash table, and the article list under each index is stored in an array in contiguous memory, so that the computing resources of the server processor can be better utilized and query performance is improved.
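The per-type index threads can be sketched as follows, with one thread per index type writing index parameters into its own hash table. This is an assumption-laden illustration: a Python dictionary stands in for the bucketed hash table, a list of tuples stands in for the contiguous vector, and the entry field names are hypothetical.

```python
import threading
from collections import defaultdict


def update_index(log_entries):
    """Build per-type hash tables from log entries, using one thread per index type.

    Each entry is a dict with hypothetical fields:
    index_type, index_key, doc_id, timestamp and score.
    """
    # Group the log entries by index type; N is the number of distinct types.
    by_type = defaultdict(list)
    for entry in log_entries:
        by_type[entry["index_type"]].append(entry)

    # One hash table per index type: index key -> vector of index parameters.
    index = {index_type: defaultdict(list) for index_type in by_type}

    def worker(index_type, entries):
        table = index[index_type]  # each thread writes only to its own table
        for entry in entries:
            table[entry["index_key"]].append(
                (entry["doc_id"], entry["timestamp"], entry["score"])
            )

    threads = [
        threading.Thread(target=worker, args=(index_type, entries))
        for index_type, entries in by_type.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return index
```

Since every thread owns a separate table, no locking is needed in this sketch; a design in which threads share one table would need synchronization.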

Optionally, on the basis of the second embodiment corresponding to fig. 3, in a third optional embodiment of the method for updating an index file according to the embodiment of the present invention, the N index threads include a thread of a tag index, where the tag index represents an index related to a keyword;

before the first server updates the index file according to the first transaction log, the method may further include:

the first server processes the M pieces of text information through a label scoring model to obtain a text information score corresponding to each piece of text information, wherein M is an integer greater than 0, the label scoring model is used for representing a functional relation between the text information and the text information scores, and the M pieces of text information belong to target index information;

if M is larger than 1, the first server sorts the M text messages according to the sequence of the text message scores corresponding to the text messages from high to low to obtain a tag index sequence, wherein the tag index sequence is used for indicating the client to display the M text messages;

the first server updates the index parameters in the hash table according to the target index information and according to the N index threads, and the updating may include:

the first server acquires text information identifiers, text information timestamps and text information scores of the M pieces of text information according to the target index information;

the first server updates the index parameters in the hash table in the thread of the tag index according to the text information identifiers, the text information timestamps, and the text information scores of the M pieces of text information.

In this embodiment, different index threads correspond to different index contents. Referring to fig. 6, fig. 6 is a schematic diagram of an index layer architecture in the embodiment of the present invention. As shown in the figure, the index threads may respectively correspond to four index types, namely a tag index, a topic index, an account index, and a similar text index. Index data corresponding to each index type is stored on second servers in different regions (such as Shenzhen, Shanghai, and Beijing), so that synchronization of the index data is guaranteed. An index agent is deployed in the first server, and the index agent writes the index data into the second servers according to the index type through an index writer.

In the following, a tag index is taken as an example. The tag index represents an index related to a keyword; a tag may be "royal glory", "artificial intelligence", or the like, and a plurality of pieces of text information, i.e., articles, usually exist under the same tag index. The server processes the M pieces of text information by using a tag scoring model to obtain a text information score corresponding to each piece of text information, and then sorts the M pieces of text information in descending order of text information score to obtain a tag index sequence. Referring to table 2, table 2 is an illustration of the tag index sequence.

TABLE 2

The client may sequentially display M pieces of text information according to the tag index sequence indicated in table 2. When the server updates the index parameters in the hash table, the text information identifier (e.g., article number), the text information timestamp (e.g., publication time of the article), and the text information score (result after the text information is scored by using the label scoring model) of each piece of text information can be updated respectively.

In this embodiment of the present invention, the first server may process the M pieces of text information by using a tag scoring model to obtain a text information score corresponding to each piece of text information, and then sort the M pieces of text information in order of the text information scores from high to low to obtain a tag index sequence. Through the method, the text information can be sequenced according to the scores, and the text information closer to the related labels is exposed in the index, so that the reliability and the practicability of the scheme are improved. The tags can often collectively show the content related to certain keywords, so that the user can browse the text information related to the keywords more quickly.
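
The scoring and sorting step can be sketched as follows. This is a hedged illustration only: the source does not disclose the scoring model, so `keyword_density` is a hypothetical toy model, and `Article` and `build_index_sequence` are names introduced here. The same function would apply to the topic, account, and similar text indexes by passing a different scoring function.

```python
from typing import Callable, NamedTuple


class Article(NamedTuple):
    text_id: int
    timestamp: int
    text: str


def build_index_sequence(articles, score_fn: Callable[[Article], float]):
    """Return (text_id, timestamp, score) rows sorted by score, high to low."""
    scored = [(a.text_id, a.timestamp, score_fn(a)) for a in articles]
    # sorted() is stable, so articles with equal scores keep their input order.
    return sorted(scored, key=lambda row: row[2], reverse=True)


def keyword_density(keyword: str) -> Callable[[Article], float]:
    """Toy scoring model (assumption): share of words equal to the keyword."""
    def score(article: Article) -> float:
        words = article.text.lower().split()
        return words.count(keyword) / len(words) if words else 0.0
    return score
```

Each returned row carries the three index parameters named in the text (identifier, timestamp, score), so the rows can be written into the hash table as they are.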

Optionally, on the basis of the second embodiment corresponding to fig. 3, in a fourth optional embodiment of the method for updating an index file according to the embodiment of the present invention, the N index threads include a thread of a topic index, where the topic index represents an index related to a domain;

the first server processes the M pieces of text information through a topic scoring model to obtain a text information score corresponding to each piece of text information, wherein M is an integer greater than 0, the topic scoring model is used for representing a functional relation between the text information and the text information scores, and the M pieces of text information belong to target index information;

if M is larger than 1, the first server sorts the M pieces of text information in descending order of the corresponding text information scores to obtain a topic index sequence, wherein the topic index sequence is used for indicating the client to display the M pieces of text information;

the first server acquires text information identifications, text information timestamps and text information scores of the M pieces of text information according to the target index information;

the first server updates the index parameters in the hash table in the thread of the topic index according to the text information identifiers, the text information timestamps, and the text information scores of the M pieces of text information.

In this embodiment, a topic index is taken as an example for description, where the topic index represents an index related to a field, a topic may be "sports", "entertainment", or "science and technology", and a plurality of text messages, i.e., articles, are usually found under the same topic index. The server processes the M text messages by using the topic scoring model to obtain a text message score corresponding to each text message, and then sorts the M text messages according to the order of the text message scores corresponding to each text message from high to low to obtain a topic index sequence, please refer to table 3, where table 3 is an illustration of the topic index sequence.

TABLE 3

The client may sequentially display M pieces of text information according to the topic index sequence indicated in table 3. When the server updates the index parameters in the hash table, the text information identifier (e.g., article number), the text information timestamp (e.g., publication time of the article), and the text information score (result after the text information is scored using the topic scoring model) of each text information can be updated respectively.

In this embodiment of the present invention, the first server may process the M pieces of text information by using a topic scoring model to obtain a text information score corresponding to each piece of text information, and then sort the M pieces of text information in descending order of text information score to obtain a topic index sequence. In this manner, the text information can be ordered by score, and the text information closest to the related topic is exposed in the index, which improves the reliability and practicability of the scheme. The user can browse the text information under a given topic, and different topics correspond to different fields.

Optionally, on the basis of the second embodiment corresponding to fig. 3, in a fifth optional embodiment of the method for updating an index file according to the embodiment of the present invention, the N index threads include a thread of an account index, where the account index represents an index related to a text creator;

the first server processes the M pieces of text information through an account scoring model to obtain a text information score corresponding to each piece of text information, wherein M is an integer greater than 0, the account scoring model is used for representing a functional relation between the text information and the text information scores, and the M pieces of text information belong to target index information;

if M is larger than 1, the first server sorts the M text messages according to the sequence of the text message scores corresponding to the text messages from high to low to obtain an account index sequence, wherein the account index sequence is used for indicating a client to display the M text messages;

the first server updates the index parameters in the hash table according to the target index information and the N index threads, and the method comprises the following steps:

the first server updates the index parameters in the hash table in the thread of the account index according to the text information identifiers, the text information timestamps, and the text information scores of the M pieces of text information.

In this embodiment, an account index is taken as an example. The account index represents an index related to a text creator; an account may be "microblog laugh chart", "Tencent entertainment", "people network", or the like, and a plurality of pieces of text information, i.e., articles, usually exist under the same account index. The server processes the M pieces of text information by using an account scoring model to obtain a text information score corresponding to each piece of text information, and then sorts the M pieces of text information in descending order of text information score to obtain an account index sequence. Referring to table 4, table 4 is an illustration of the account index sequence.

TABLE 4

The client may sequentially display the M pieces of text information according to the account index sequence indicated in table 4. When the server updates the index parameters in the hash table, the text information identifier (e.g., article number), the text information timestamp (e.g., publication time of the article), and the text information score (the result of scoring the text information with the account scoring model) of each piece of text information can be updated respectively.

In this embodiment of the present invention, the first server may process the M pieces of text information by using an account scoring model to obtain a text information score corresponding to each piece of text information, and then sequence the M pieces of text information in order of the text information scores from high to low to obtain an account index sequence. Through the method, the text information can be sequenced according to the scores, and the text information closer to the related account is exposed in the index, so that the reliability and the practicability of the scheme are improved. If the user is particularly interested in articles written by a certain author, the article with the strongest relevance can be selected from the articles under the author account.

Optionally, on the basis of the second embodiment corresponding to fig. 3, in a sixth optional embodiment of the method for updating an index file according to the embodiment of the present invention, the N index threads include threads of similar text indexes, where the similar text indexes represent indexes with association between texts;

the first server processes the M pieces of text information through a similarity scoring model to obtain a text information score corresponding to each piece of text information, wherein M is an integer larger than 0, the similarity scoring model is used for representing a functional relation between the text information and the text information scores, and the M pieces of text information belong to target index information;

if M is larger than 1, the first server sorts the M text messages according to the sequence of the text message scores corresponding to the text messages from high to low to obtain a similarity index sequence, wherein the similarity index sequence is used for indicating a client to display the M text messages;

the first server updates the index parameters in the hash table in the thread of the similar text index according to the text information identifiers, the text information timestamps, and the text information scores of the M pieces of text information.

In this embodiment, a similar text index is taken as an example. The similar text index represents an index in which texts are associated with one another. A list of similar text information is first calculated through collaborative filtering (CF), and the M pieces of text information in that list are then scored by a similarity scoring model to obtain a text information score corresponding to each piece of text information. The client can display the M pieces of text information in sequence according to the similarity index sequence. When the server updates the index parameters in the hash table, the text information identifier (e.g., article number), the text information timestamp (e.g., publication time of the article), and the text information score (the result of scoring the text information with the similarity scoring model) of each piece of text information can be updated respectively.

In brief, CF recommends information of interest to a user by using the preferences of groups with shared experiences and interests. Through a cooperation mechanism, individuals give responses of varying degree (such as scores) to the information, and these responses are recorded to filter the information and thereby help others filter it.

CF can filter information whose content is difficult for machines to analyze automatically, draws on the experience of other users to avoid incomplete or inaccurate content analysis, and can filter based on complex concepts that are difficult to express, such as information quality and personal taste. CF can also recommend new information: it may find information whose content is completely dissimilar, which the user could not have anticipated, and it may discover interest preferences that the user has but has not yet noticed. CF is highly automated, makes effective use of feedback from other similar users, and speeds up personalized learning.
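
To make the idea of collaborative filtering concrete, the sketch below computes a similar text information list from user interaction records using item-to-item cosine similarity over co-occurrence counts. The source does not say which CF variant is used, so this particular formula, the function name `similar_articles`, and the input layout are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def similar_articles(reads, target, top_k=3):
    """Rank articles by similarity to `target`.

    reads: mapping of user -> set of article ids the user interacted with.
    Returns up to top_k (article_id, similarity) pairs, most similar first.
    """
    count = defaultdict(int)     # article -> number of users who read it
    together = defaultdict(int)  # (a, b) with a < b -> users who read both
    for articles in reads.values():
        for a in articles:
            count[a] += 1
        for a, b in combinations(sorted(articles), 2):
            together[(a, b)] += 1

    scores = {}
    for (a, b), n in together.items():
        if target in (a, b):
            other = b if a == target else a
            # Cosine similarity between the two articles' reader sets.
            scores[other] = n / sqrt(count[a] * count[b])

    # Highest similarity first; ties are broken by article id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

The resulting list plays the role of the similar text information list, whose members would then be scored by the similarity scoring model.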

In this embodiment of the present invention, the first server may process the M pieces of text information by using a similarity scoring model to obtain a text information score corresponding to each piece of text information, and then sequence the M pieces of text information in order of the text information scores from high to low to obtain a similarity index sequence. Through the method, the text information can be sequenced according to the scores, and the text information with higher similarity is exposed in the index, so that the reliability and the practicability of the scheme are improved. The user can browse the text information with strong relevance.

Optionally, on the basis of the second embodiment corresponding to fig. 3, in a seventh optional embodiment of the method for updating an index file according to the embodiment of the present invention, when the first server updates the index parameters in the hash table according to the target index information and according to the N index threads, the method may further include:

the first server receives an index query instruction;

and the first server acquires the index parameters of the index information to be inquired and/or the index information to be inquired from the hash table according to the index inquiry instruction.

In this embodiment, while the first server updates the index parameters in the hash table according to the target index information, it may receive an index query instruction triggered by a user, and then, according to the index query instruction, read from the hash table either the index parameters of the index information to be queried, or both the index information to be queried and its index parameters. The buckets of the hash table are divided into several sections, and the update operations of each section are handled by one index thread. Lock-free resource management is realized through a hazard pointer mechanism.

In the embodiment of the present invention, the first server updates the index parameters in the hash table according to the N index threads, and simultaneously receives the index query instruction, and then obtains the index parameters of the index information to be queried and/or the index information to be queried from the hash table according to the index query instruction. By the mode, the hash table and the vector adopt the lock-free programming technology, so that the index query can not be blocked by the write operation, and the stability of the online performance can be ensured.
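
The following sketch illustrates the single-writer-per-section idea together with queries that never wait on a writer. It is not a hazard pointer implementation: it substitutes copy-on-write snapshots and relies on Python's garbage collector to keep an old snapshot alive while a reader still holds it, which is the job hazard pointers do in languages with manual memory management. The constants and function names are assumptions.

```python
import threading

NUM_BUCKETS = 8   # assumed
NUM_THREADS = 4   # assumed

# Each bucket holds an immutable tuple; a writer replaces it wholesale.
buckets = [() for _ in range(NUM_BUCKETS)]


def owner_of(bucket: int) -> int:
    """Map a bucket to the one index thread allowed to update it."""
    return bucket * NUM_THREADS // NUM_BUCKETS


def write(thread_id: int, bucket: int, entry) -> None:
    if owner_of(bucket) != thread_id:
        raise ValueError("bucket belongs to another index thread")
    # Copy-on-write: build the new snapshot, then publish it in one step.
    buckets[bucket] = buckets[bucket] + (entry,)


def read(bucket: int) -> tuple:
    # Readers take whichever snapshot is current and acquire no lock.
    return buckets[bucket]


def _writer(thread_id: int) -> None:
    for bucket in range(NUM_BUCKETS):
        if owner_of(bucket) == thread_id:
            for n in range(100):
                write(thread_id, bucket, (thread_id, n))


def run_demo() -> None:
    """Run one writer thread per section and wait for all of them."""
    threads = [
        threading.Thread(target=_writer, args=(t,))
        for t in range(NUM_THREADS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because every bucket has exactly one writing thread, the read-modify-write in `write` cannot race with another writer, and no lock is needed between writers either.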

Optionally, on the basis of any one of the first to seventh embodiments corresponding to fig. 3 and fig. 3, in an eighth optional embodiment of the method for updating an index file according to the embodiment of the present invention, after the first server sends the first transaction log to the second server, the method may further include:

and if the first server does not detect the first sequence number, the first server sends an index updating instruction to the second server, so that the second server obtains a third transaction log corresponding to a third sequence number from the first server according to the index updating instruction, and updates the index file according to the third transaction log, wherein the third sequence number is an adjacent sequence number after the first sequence number.

In this embodiment, the second server sends its current first sequence number to the first server, so that the first server knows the synchronization state of the second server, that is, which transaction log the second server has synchronized to. If the first transaction log indicated by the first sequence number does not exist, the second server synchronizes the dump state of the first server. To control the egress bandwidth, no more than 3 second servers may synchronize the dump state from the same first server at the same time, and the TokenBucket algorithm is used for flow control, so that the egress bandwidth is not suddenly saturated by a large number of transaction logs.
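
A minimal sketch of the two controls mentioned above is given below: a token bucket that limits how fast data leaves the first server, and a gate that admits at most 3 second servers into dump synchronization at once. The class names, parameters, and the injectable clock are assumptions introduced for illustration; the source only names the algorithm and the limit of 3.

```python
class TokenBucket:
    """Token-bucket rate limiter; the caller supplies the current time."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate          # tokens (e.g. bytes) added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = now

    def try_consume(self, amount: float, now: float) -> bool:
        # Refill for the elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


class DumpGate:
    """Admit at most `limit` second servers into dump synchronization."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.active = set()

    def enter(self, server_id: str) -> bool:
        if server_id in self.active:
            return True
        if len(self.active) >= self.limit:
            return False
        self.active.add(server_id)
        return True

    def leave(self, server_id: str) -> None:
        self.active.discard(server_id)
```

A sender would call `try_consume` with the size of the next chunk and delay the send when it returns `False`, which smooths bursts instead of dropping data.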

For convenience of introduction, please refer to fig. 7, which is a schematic diagram of synchronizing an index file between a master server and a slave server according to an embodiment of the present invention. As shown in the figure, under normal conditions the slave server and the master server store transaction logs synchronously, and the master server and the slave server synchronize transaction logs 0, 1, and 2. After the slave server finishes synchronizing transaction log 2, it starts to synchronize transaction log 56, but no transaction log 56 exists in the master server. Therefore, at the next synchronization the slave server needs to find the transaction log in the master server that is adjacent to transaction log 2, namely transaction log 3, and then continues synchronizing from transaction log 3 until synchronization is completed.
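
The fallback in the fig. 7 example can be sketched as a small decision function. The function name and arguments are hypothetical; the sketch only encodes the rule stated in the text, namely that when the requested sequence number is unknown to the master server, synchronization resumes from the log adjacent to the last one the slave server applied.

```python
def next_log_to_sync(master_logs, last_applied, requested):
    """Pick the sequence number the slave server should synchronize next.

    master_logs: set of sequence numbers the master server holds.
    last_applied: last sequence number the slave server applied.
    requested: sequence number the slave server asked for.
    Returns a sequence number, or None when nothing suitable is available.
    """
    if requested in master_logs:
        return requested
    # The requested log does not exist on the master server: fall back to
    # the log adjacent to the last one that was applied successfully.
    fallback = last_applied + 1
    return fallback if fallback in master_logs else None
```

With master logs 0 to 4, a last applied log of 2, and a request for log 56, the function returns 3, matching the example in the text.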

Typically, a transaction log uses a file name of the form "host_name-relay-bin.nnnnnn", where "host_name" is the host name of the slave server and "nnnnnn" is the sequence number. Successive transaction logs are created with successive sequence numbers, beginning at 000001, and the slave server tracks the transaction logs currently in use in an index file. The dump file is saved on the hard disk and is not lost when the slave server is shut down. The next time the slave server starts, it reads these dump files to determine how many transaction logs it has read from the master server and how far it has processed its own transaction logs.
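
As a small illustration of this naming scheme, the sketch below parses a log file name and derives the name of the next log. It assumes the form host_name-relay-bin.nnnnnn with a dot before a six-digit sequence number; the function names are introduced here and are not part of the source.

```python
import re

# Host name, the literal "-relay-bin.", then a six-digit sequence number.
LOG_NAME = re.compile(r"^(?P<host>.+)-relay-bin\.(?P<seq>\d{6})$")


def parse_log_name(name: str):
    """Split a log file name into (host_name, sequence_number)."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name!r}")
    return match.group("host"), int(match.group("seq"))


def next_log_name(name: str) -> str:
    """Return the file name that follows `name` in the sequence."""
    host, seq = parse_log_name(name)
    return f"{host}-relay-bin.{seq + 1:06d}"
```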

Further, in the embodiment of the present invention, after the first server sends the first transaction log to the second server, the update state of the second server is determined according to the first sequence number of the first transaction log, if the first sequence number does not exist, the second server synchronizes the memory file of the first server, and then continues to synchronize the next transaction log of the first transaction log. Through the mode, the main server and the slave server can achieve synchronization of the retrieval data, and consistency of the index data is guaranteed.

Referring to fig. 8, a method for updating an index file according to the present invention will be described below from the perspective of a second server, where an embodiment of the method for updating an index file according to the present invention includes:

201. When the first server detects that the target index information is updated, the second server receives a first transaction log sent by the first server, wherein the first transaction log is used for recording the target index information and triggering the first server to update the index file;

In this embodiment, when the first server, which is the master server, detects that updated target index information exists, the first server may obtain a first transaction log, which is a file for recording the target index information. Specifically, the first transaction log is a binlog, a file in binary format that records the SQL statements with which a user updates the database; its main functions are master-slave replication of the database and incremental data recovery.

202. The second server updates the index file according to the first transaction log.

In this embodiment, the first server may update the index file originally stored in the database according to the SQL statements in the first transaction log, and the first server further sends the first transaction log to the second server, and the second server also updates the index file originally stored in the database according to the SQL statements in the first transaction log, thereby achieving the purpose of data synchronization between the master server and the slave server.

In the embodiment of the invention, when the first server detects that the target index information is updated, the second server receives the first transaction log sent by the first server, and then updates the index file according to the first transaction log. In this manner, detection of an index task triggers the update of the index file: the master server, serving as the first server, generates a log that relates only to the updated information and issues it to the slave servers, and both the master server and the slave servers update only the changed index information, adding the incremental content to the original index file.
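
The incremental update flow can be sketched as follows, with an in-memory dictionary standing in for the index file. The classes `TransactionLog` and `IndexServer` and their fields are hypothetical, and the sketch replaces SQL statements with a plain mapping of changed entries.

```python
from dataclasses import dataclass, field


@dataclass
class TransactionLog:
    seq: int
    changes: dict  # index key -> new value; only the updated entries


@dataclass
class IndexServer:
    name: str
    index: dict = field(default_factory=dict)
    last_seq: int = -1
    replicas: list = field(default_factory=list)

    def apply(self, log: TransactionLog) -> None:
        # Incremental update: entries not named in the log are untouched.
        self.index.update(log.changes)
        self.last_seq = log.seq

    def on_index_update(self, changes: dict) -> TransactionLog:
        """Master path: record the change, apply it, then replicate it."""
        log = TransactionLog(self.last_seq + 1, changes)
        self.apply(log)
        for replica in self.replicas:
            replica.apply(log)
        return log
```

Only the changed entries travel between servers, so neither side rebuilds the whole index when a single tag changes.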

Optionally, on the basis of the embodiment corresponding to fig. 8, in a first optional embodiment of the method for updating an index file according to the embodiment of the present invention, after the second server receives the first transaction log sent by the first server, the method may further include:

if the first server does not detect the first sequence number, the second server receives an index updating instruction, and acquires a third transaction log corresponding to a third sequence number from the first server according to the index updating instruction, wherein the third sequence number is the adjacent sequence number after the first sequence number;

and the second server updates the index file according to the third transaction log.

In this embodiment, the second server sends its current first sequence number to the first server, so that the first server knows the status of the synchronization update of the second server, that is, knows which transaction log the second server synchronizes to. If the first transaction log indicated by the first sequence number does not exist, the second server synchronizes the dump state of the first server. In order to control the egress bandwidth, the number of second servers synchronizing dump states from the same first server is not more than 3, and a TokenBucket algorithm is used for flow control to avoid that the egress bandwidth is suddenly filled by a large number of binlog synchronizations.

Secondly, in the embodiment of the present invention, after the first server sends the first transaction log to the second server, the update state of the second server is determined according to the first sequence number of the first transaction log, if the first sequence number does not exist, the second server synchronizes the memory file of the first server, and then continues to synchronize the next transaction log of the first transaction log. Through the mode, the main server and the slave server can achieve synchronization of the retrieval data, and consistency of the index data is guaranteed.

Optionally, on the basis of the first embodiment corresponding to fig. 8 or fig. 8, in a second optional embodiment of the method for updating an index file according to the embodiment of the present invention, the method may further include:

if the first server fails, the second server selects a target server, wherein the target server is the server, among the second servers, that holds the largest sequence number;

when the target server detects that the index information is updated, the target server in the second server sends a target transaction log to other servers in the second server so that the other servers update the index file according to the target transaction log, wherein the target transaction log is used for recording the index information;

and the target server in the second server updates the index file according to the target transaction log.

In this embodiment, the working state of the first server may be monitored. If it is determined that the first server has failed and cannot work normally, a suitable target server may be selected from the second servers to take over the work of the first server, where the target server is the second server that holds the largest sequence number. The reason is that, in practical applications, there may be a certain delay when the first server and the second servers synchronize index data. For example, when the first server has updated to the transaction log with sequence number 0010, some second servers have also updated to 0010 while others have only updated to 0009; to obtain more up-to-date index data, a server that has updated to the latest transaction log is preferred. Finally, the selected target server acquires the target transaction log, and when the target server updates the index file, it distributes the target transaction log to the other second servers, so that they update the index file in the same way as the target server.
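
The selection rule can be sketched in a few lines. The function name and the input mapping are hypothetical, and the tie-breaking rule (lowest server name among equals) is an assumption, since the source does not say how ties are resolved.

```python
def select_target(slaves):
    """Choose the new master among the second servers.

    slaves: mapping of server name -> largest sequence number applied.
    """
    if not slaves:
        raise ValueError("no second server available")
    # Largest sequence number wins; ties go to the lowest name (assumption).
    return min(slaves, key=lambda name: (-slaves[name], name))
```

For the example in the text, a server at sequence number 0010 is chosen over one that has only reached 0009.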

For ease of understanding, please refer to fig. 9, which is a schematic diagram of switching between a master server and a slave server according to an embodiment of the present invention. As shown in the figure, the Internet Protocol (IP) addresses and ports of all networks of the index service may be obtained through an IndexChecker tool, and the heartbeats of all services are detected every 3 seconds. If, as of 9:05:26 a.m. on May 29, 2018, the heartbeat of the master server in Shanghai has not been received 20 consecutive times, the master server is considered to be down. Therefore, one server needs to be selected as the new master server from the slave servers that are operating normally. For example, "slave server 0" can be selected from the machine room in Shenzhen as the new master server, and "slave server 0" can then communicate with the other servers in Shanghai and Beijing.
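
The down-detection rule (a probe every 3 seconds, 20 consecutive misses) can be sketched as a counter per server. The class below is a hypothetical illustration and does not reproduce the IndexChecker tool; actual probing and timing are left to the caller.

```python
HEARTBEAT_INTERVAL_S = 3  # probe period stated in the text
MAX_MISSED = 20           # consecutive misses before a server is down


class HeartbeatMonitor:
    """Track consecutive missed heartbeats for each server."""

    def __init__(self, max_missed: int = MAX_MISSED):
        self.max_missed = max_missed
        self.missed = {}

    def record(self, server: str, received: bool) -> bool:
        """Record one probe result; return True if the server is down."""
        if received:
            self.missed[server] = 0  # any heartbeat resets the streak
        else:
            self.missed[server] = self.missed.get(server, 0) + 1
        return self.missed[server] >= self.max_missed
```

With these values, a server is declared down about 60 seconds after its last heartbeat, since 20 probes at 3-second intervals must all fail.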

Situations in which the master server and the slave server are switched include, but are not limited to, the master server going down, a slave server failing to synchronize, and a slave server failing to add an index offline. A slave server may obtain the new master server from the IndexChecker tool. The IndexChecker tool is used to identify potentially redundant table indexes: it reads the indexes of one or more tables and identifies duplicate and potentially redundant indexes.

In the embodiment of the present invention, when the first server fails, the second server selects a target server, where the target server is the server, among the second servers, that holds the largest sequence number, and the target server may take over the work of the first server and synchronize data with the other second servers. In this manner, on the one hand, the stability of the system is ensured, because a slave server is selected to continue working when the master server fails, which improves the feasibility and operability of the scheme. On the other hand, the server with the largest sequence number is preferentially selected as the master server, so that more up-to-date index data can be obtained and the timeliness of index data updates is improved.

Referring to fig. 10, fig. 10 is a schematic diagram of an embodiment of a server in an embodiment of the present invention, where the server 30 includes:

an obtaining module 301, configured to obtain a first transaction log when a first server detects that target index information is updated, where the first transaction log is used to record the target index information;

an updating module 302, configured to update an index file according to the first transaction log obtained by the obtaining module 301;

a sending module 303, configured to send the first transaction log obtained by the obtaining module 301 to a second server, so that the second server updates the index file according to the first transaction log.

In this embodiment, when a first server detects that target index information is updated, an obtaining module 301 obtains a first transaction log, where the first transaction log is used to record the target index information, an updating module 302 updates an index file according to the first transaction log obtained by the obtaining module 301, and a sending module 303 sends the first transaction log obtained by the obtaining module 301 to a second server, so that the second server updates the index file according to the first transaction log.

In an embodiment of the present invention, a server is provided. When the first server detects that target index information is updated, the first server obtains a first transaction log, where the first transaction log is used to record the target index information, and then updates an index file according to the first transaction log. In addition, the first server sends the first transaction log to a second server, and the second server updates the index file according to the first transaction log. In this manner, detection of an index task triggers the update of the index file: the master server, serving as the first server, generates a log that relates only to the updated information and issues it to the slave servers, and both the master server and the slave servers update only the changed index information, adding the incremental content to the original index file.

Optionally, on the basis of the embodiment corresponding to fig. 10, please refer to fig. 11, in another embodiment of the server 30 provided in the embodiment of the present invention, the server 30 further includes a generating module 304 and a storing module 305;

the obtaining module 301 is further configured to obtain a first sequence number corresponding to the first transaction log after obtaining the transaction log;

the generating module 304 is configured to generate a memory file according to the first transaction log if the first sequence number is inconsistent with a second sequence number, where the second sequence number is a sequence number corresponding to a second transaction log, and the second transaction log is used to record index information updated in a previous period;

the storage module 305 is configured to store the memory file generated by the generation module 304.

Alternatively, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the server 30 provided in the embodiment of the present invention,

the updating module 302 is specifically configured to generate N index threads according to the first transaction log, where N is an integer greater than 0, and each index thread corresponds to an index type;

and to update index parameters in a hash table according to the target index information and the N index threads to obtain the index file, where the index parameters include at least one of a text information identifier, a text information timestamp, and a text information score.

Secondly, in this embodiment of the present invention, the server may update the index file according to the transaction log as follows: the server first generates N index threads according to the transaction log, and then updates the index parameters in the hash table according to the target index information and the N index threads to obtain the index file. In this way, the index structure in the memory of the server is stored in a hash table, and the article list under each index is stored in an array in contiguous memory, so that the computing resources of the server processor are better utilized and query performance is improved.
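The per-type index threads and the hash table with contiguous article lists can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `update_index`, the shape of `log_entries`, and the use of Python's `array` module as the contiguous array are hypothetical choices made for illustration.

```python
import threading
from array import array


def update_index(hash_table, log_entries):
    """Update the hash table with one thread per index type.

    hash_table:  index type -> {index key -> array of document ids}
    log_entries: index type -> {index key -> list of new document ids}
    Returns N, the number of index threads that were started.
    """
    # Create the per-type tables up front so that each thread only
    # touches its own table and no locking is needed between threads.
    for index_type in log_entries:
        hash_table.setdefault(index_type, {})

    def worker(index_type):
        table = hash_table[index_type]
        for key, doc_ids in log_entries[index_type].items():
            # array("q") stores 64-bit integers in contiguous memory.
            table.setdefault(key, array("q")).extend(doc_ids)

    threads = [threading.Thread(target=worker, args=(t,)) for t in log_entries]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return len(threads)
```

Giving each index type its own table is one way to let the N threads run without contending for the same entries.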

Optionally, on the basis of the embodiment corresponding to fig. 10, please refer to fig. 12, in another embodiment of the server 30 provided in the embodiment of the present invention, the N index threads include a thread of a tag index, where the tag index represents an index related to a keyword;

the server 30 further comprises a processing module 306 and a sorting module 307;

the processing module 306 is configured to, before the updating module 302 updates the index file according to the first transaction log, process M pieces of text information through a tag scoring model to obtain a text information score corresponding to each piece of text information, where M is an integer greater than 0, the tag scoring model is used to represent a functional relationship between the text information and the text information score, and the M pieces of text information belong to the target index information;

the sorting module 307 is configured to, if M is greater than 1, sort the M pieces of text information in an order from high to low according to the text information score corresponding to each piece of text information to obtain a tag index sequence, where the tag index sequence is used to indicate a client to display the M pieces of text information;

the updating module 302 is specifically configured to obtain the text information identifiers, the text information timestamps, and the text information scores of the M pieces of text information according to the target index information;

and to update the index parameters in the hash table in the thread of the tag index according to the text information identifiers, the text information timestamps, and the text information scores of the M pieces of text information.

In this embodiment of the present invention, the first server may process the M pieces of text information by using a tag scoring model to obtain a text information score corresponding to each piece of text information, and then sort the M pieces of text information in descending order of text information score to obtain a tag index sequence. In this way, the text information can be sorted by score, and the text information most relevant to the tag is exposed first in the index, which improves the reliability and practicability of the solution. Tags often gather content related to certain keywords in one place, which helps the user browse the text information related to those keywords more quickly.
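The scoring and sorting step can be sketched as follows. The embodiment does not specify the scoring model, so `toy_tag_score` is a purely hypothetical stand-in (the fraction of words equal to the tag); `IndexEntry` and `build_index_sequence` are likewise illustrative names.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """The three index parameters named in the embodiment."""
    doc_id: int       # text information identifier
    timestamp: int    # text information timestamp
    score: float      # text information score


def toy_tag_score(tag):
    """Hypothetical scoring model: share of words that equal the tag."""
    def score(text):
        words = text.lower().split()
        return words.count(tag) / len(words) if words else 0.0
    return score


def build_index_sequence(docs, score_fn):
    """Score M pieces of text and, if M > 1, sort from high to low score."""
    entries = [IndexEntry(d["id"], d["ts"], score_fn(d["text"])) for d in docs]
    if len(entries) > 1:
        entries.sort(key=lambda e: e.score, reverse=True)
    return entries
```

Passing the scoring function as a parameter means the same routine could serve any index type whose sequence is ordered by score; only the scoring model would differ.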

Optionally, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the server 30 provided in the embodiment of the present invention, the N index threads include a thread of a topic index, where the topic index represents an index related to a domain;

the processing module 306 is further configured to, before the updating module 302 updates the index file according to the first transaction log, process M pieces of text information through a topic scoring model to obtain a text information score corresponding to each piece of text information, where M is an integer greater than 0, the topic scoring model is used to represent a functional relationship between the text information and the text information score, and the M pieces of text information belong to the target index information;

the sorting module 307 is further configured to, if M is greater than 1, sort the M pieces of text information in an order from high to low according to the text information score corresponding to each piece of text information to obtain a topic index sequence, where the topic index sequence is used to indicate a client to display the M pieces of text information;

the updating module 302 is specifically configured to obtain text information identifiers, text information timestamps, and text information scores of the M pieces of text information according to the target index information;

and to update the index parameters in the hash table in the thread of the topic index according to the text information identifiers, the text information timestamps, and the text information scores of the M pieces of text information.

In this embodiment of the present invention, the first server may process the M pieces of text information by using a topic scoring model to obtain a text information score corresponding to each piece of text information, and then sort the M pieces of text information in descending order of text information score to obtain a topic index sequence. In this way, the text information can be sorted by score, and the text information most relevant to the topic is exposed first in the index, which improves the reliability and practicability of the solution.

Optionally, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the server 30 provided in the embodiment of the present invention, the N index threads include a thread of an account index, where the account index represents an index related to a text creator;

the processing module 306 is further configured to, before the updating module 302 updates the index file according to the first transaction log, process M pieces of text information through an account scoring model to obtain a text information score corresponding to each piece of text information, where M is an integer greater than 0, the account scoring model is used to represent a functional relationship between the text information and the text information score, and the M pieces of text information belong to the target index information;

the sorting module 307 is further configured to, if M is greater than 1, sort the M pieces of text information in an order from high to low according to the text information score corresponding to each piece of text information to obtain an account index sequence, where the account index sequence is used to instruct a client to display the M pieces of text information;

the updating module 302 is specifically configured to update the index parameters in the hash table in the thread of the account index according to the text information identifiers, the text information timestamps, and the text information scores of the M pieces of text information.

Optionally, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the server 30 provided in the embodiment of the present invention, the N index threads include threads of similar text indexes, where the similar text indexes represent indexes having association between texts;

the processing module 306 is further configured to, before the updating module 302 updates the index file according to the first transaction log, process M pieces of text information through a similarity scoring model to obtain a text information score corresponding to each piece of text information, where M is an integer greater than 0, the similarity scoring model is used to represent a functional relationship between the text information and the text information score, and the M pieces of text information belong to the target index information;

the sorting module 307 is further configured to, if M is greater than 1, sort the M pieces of text information in an order from high to low according to the text information score corresponding to each piece of text information to obtain a similarity index sequence, where the similarity index sequence is used to indicate a client to display the M pieces of text information;

the updating module 302 is specifically configured to update the index parameters in the hash table in the thread of the similar text index according to the text information identifiers, the text information timestamps, and the text information scores of the M pieces of text information.

Optionally, on the basis of the embodiment corresponding to fig. 10, please refer to fig. 13, in another embodiment of the server 30 provided in the embodiment of the present invention, the server 30 further includes a receiving module 308;

the receiving module 308 is configured to receive an index query instruction when the updating module 302 updates the index parameters in the hash table according to the target index information and the N index threads;

the obtaining module 301 is further configured to obtain, according to the index query instruction received by the receiving module 308, an index parameter of the index information to be queried from the hash table, and/or the index information to be queried.
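Serving a query while the hash table is being updated can be sketched as follows. The embodiment does not state how concurrent access is coordinated, so the mutual-exclusion lock used here is an assumption, and `LiveIndex` with its `update` and `query` methods is a hypothetical name.

```python
import threading


class LiveIndex:
    """Hypothetical hash table that can be queried during an update."""

    def __init__(self):
        self._table = {}                 # index key -> list of index parameters
        self._lock = threading.Lock()    # assumed coordination mechanism

    def update(self, key, doc_id, timestamp, score):
        # Called by an index thread while applying a transaction log.
        with self._lock:
            self._table.setdefault(key, []).append((doc_id, timestamp, score))

    def query(self, key):
        # Called when an index query instruction arrives; returns a copy
        # so the caller never observes a list that is still being modified.
        with self._lock:
            return list(self._table.get(key, ()))
```

A query issued mid-update simply sees the entries applied so far, which is consistent with the service remaining online while the index is updated.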

Optionally, on the basis of the embodiment corresponding to any one of fig. 10 to fig. 13, referring to fig. 14, in another embodiment of the server 30 provided in the embodiment of the present invention, the server 30 further includes a determining module 309;

the receiving module 308 is further configured to receive a first sequence number corresponding to the first transaction log sent by a second server after the sending module 303 sends the first transaction log to the second server;

the determining module 309, configured to determine an update status of the second server according to the first sequence number received by the receiving module 308;

the sending module 303 is further configured to send an index update instruction to the second server if the first server does not detect the first sequence number, so that the second server obtains a third transaction log corresponding to a third sequence number from the first server according to the index update instruction, and updates the index file according to the third transaction log, where the third sequence number is an adjacent sequence number after the first sequence number.
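One possible reading of the sequence-number acknowledgement and catch-up mechanism is sketched below. This is an interpretation, not the implementation of the embodiment: the classes `Primary` and `Replica`, the method names, and the choice to keep all logs in a dictionary keyed by sequence number are hypothetical.

```python
class Primary:
    """Hypothetical first server: keeps logs and tracks acknowledgements."""

    def __init__(self):
        self.logs = {}   # sequence number -> log payload
        self.acks = {}   # second-server name -> last acknowledged sequence number

    def publish(self, seq, payload):
        self.logs[seq] = payload

    def receive_ack(self, name, seq):
        self.acks[name] = seq

    def lagging(self, names, seq):
        # Servers from which sequence number `seq` has not been detected.
        return [n for n in names if self.acks.get(n, 0) < seq]

    def fetch(self, seq):
        return self.logs.get(seq)


class Replica:
    """Hypothetical second server."""

    def __init__(self, name):
        self.name = name
        self.last_seq = 0
        self.applied = []

    def apply(self, seq, payload):
        self.applied.append(payload)
        self.last_seq = seq

    def catch_up(self, primary):
        # On an index update instruction, pull the log with the adjacent
        # next sequence number, repeating until no newer log exists.
        while True:
            payload = primary.fetch(self.last_seq + 1)
            if payload is None:
                break
            self.apply(self.last_seq + 1, payload)
            primary.receive_ack(self.name, self.last_seq)
```

In this reading, the first server calls `lagging` to determine the update status of each second server and instructs any server in the result to run `catch_up`.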

Referring to fig. 15, fig. 15 is a schematic diagram of an embodiment of a server according to the present invention, in which the server 40 includes:

a receiving module 401, configured to receive a first transaction log sent by a first server when the first server detects that target index information is updated, where the first transaction log is used to record the target index information, and is used to trigger the first server to update an index file;

an updating module 402, configured to update the index file according to the first transaction log received by the receiving module 401.

In this embodiment, when a first server detects that target index information is updated, a receiving module 401 receives a first transaction log sent by the first server, where the first transaction log is used to record the target index information and is used to trigger the first server to update an index file, and an updating module 402 updates the index file according to the first transaction log received by the receiving module 401.

Optionally, on the basis of the embodiment corresponding to fig. 15, referring to fig. 16, in another embodiment of the server 40 provided in the embodiment of the present invention, the server 40 further includes a sending module 403;

the sending module 403 is configured to send, after the receiving module 401 receives the first transaction log sent by the first server, a first sequence number corresponding to the first transaction log to the first server, so that the first server determines an update state of the second server;

the receiving module 401 is further configured to receive an index update instruction if the first server does not detect the first sequence number, and obtain a third transaction log corresponding to a third sequence number from the first server according to the index update instruction, where the third sequence number is an adjacent sequence number after the first sequence number;

the updating module 402 is further configured to update the index file according to the third transaction log received by the receiving module 401.

Optionally, on the basis of the embodiment corresponding to fig. 15 or fig. 16, please refer to fig. 17, in another embodiment of the server 40 provided in the embodiment of the present invention, the server further includes a selecting module 404;

the selecting module 404 is configured to select a target server if the first server fails, where the target server is the server, among the second servers, that holds the largest sequence number;

the sending module 403 is further configured to send a target transaction log to the other servers among the second servers when the target server selected by the selecting module 404 detects that index information is updated, so that the other servers update the index file according to the target transaction log, where the target transaction log is used to record the index information;

the updating module 402 is further configured to update the index file according to the target transaction log.
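The selection of a target server after the first server fails can be sketched as follows. The function name `select_target_server` is hypothetical, and the alphabetical tie-break between servers holding the same sequence number is an assumption, since the embodiment does not say how ties are resolved.

```python
def select_target_server(last_seq_by_server):
    """Pick the second server that holds the largest sequence number.

    last_seq_by_server: server name -> largest sequence number it has applied.
    Ties are broken by server name (an assumption made for determinism).
    """
    if not last_seq_by_server:
        raise ValueError("no second server is available")
    # max() keeps the first maximal item, so iterating the names in sorted
    # order makes the alphabetically first name win a tie.
    return max(sorted(last_seq_by_server), key=lambda n: last_seq_by_server[n])
```

Choosing the server with the largest sequence number means the new master is the one whose index file is closest to the state of the failed first server.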

Fig. 18 is a schematic diagram of a server 500 according to an embodiment of the present invention. The server 500 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 522 (e.g., one or more processors), a memory 532, and one or more storage media 530 (e.g., one or more mass storage devices) storing applications 542 or data 544. The memory 532 and the storage media 530 may be transient storage or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 522 may be configured to communicate with the storage medium 530 and to execute, on the server 500, the series of instruction operations in the storage medium 530.

The server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input-output interfaces 558, and/or one or more operating systems 541, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

The steps performed by the server in the above embodiment may be based on the server structure shown in fig. 18.

The CPU 522 in the embodiment of the present invention is configured to execute the following steps:

updating an index file according to the first transaction log;

and sending the first transaction log to a second server so that the second server updates the index file according to the first transaction log.

Optionally, the CPU 522 is further configured to perform the following steps:

acquiring a first sequence number corresponding to the first transaction log;

if the first sequence number is inconsistent with a second sequence number, generating a memory file according to the first transaction log, wherein the second sequence number is a sequence number corresponding to a second transaction log, and the second transaction log is used for recording index information updated in a previous period;

and storing the memory file.

Optionally, the CPU 522 is specifically configured to perform the following steps:

generating N index threads according to the first transaction log, wherein N is an integer greater than 0, and each index thread corresponds to one index type;

and updating index parameters in the hash table according to the target index information and the N index threads to obtain the index file, wherein the index parameters comprise at least one of a text information identifier, a text information timestamp and a text information score.

Optionally, the CPU 522 is further configured to perform the following steps:

processing M pieces of text information through a tag scoring model to obtain a text information score corresponding to each piece of text information, wherein M is an integer greater than 0, the tag scoring model is used for representing a functional relationship between the text information and the text information score, and the M pieces of text information belong to the target index information;

if M is greater than 1, sorting the M pieces of text information in an order from high to low according to the text information score corresponding to each piece of text information to obtain a tag index sequence, wherein the tag index sequence is used for indicating a client to display the M pieces of text information;

the CPU 522 is specifically configured to perform the following steps:

acquiring text information identifications, text information timestamps and text information scores of the M pieces of text information according to the target index information;

processing M pieces of text information through a topic scoring model to obtain a text information score corresponding to each piece of text information, wherein M is an integer greater than 0, the topic scoring model is used for representing a functional relationship between the text information and the text information score, and the M pieces of text information belong to the target index information;

if M is greater than 1, sorting the M pieces of text information in an order from high to low according to the text information score corresponding to each piece of text information to obtain a topic index sequence, wherein the topic index sequence is used for indicating a client to display the M pieces of text information;

the CPU 522 is specifically configured to perform the following steps:

Optionally, the CPU 522 is further configured to perform the following steps:

processing M pieces of text information through an account scoring model to obtain a text information score corresponding to each piece of text information, wherein M is an integer greater than 0, the account scoring model is used for representing a functional relationship between the text information and the text information score, and the M pieces of text information belong to the target index information;

if M is greater than 1, sorting the M pieces of text information in an order from high to low according to the text information score corresponding to each piece of text information to obtain an account index sequence, wherein the account index sequence is used for indicating a client to display the M pieces of text information;

the CPU 522 is specifically configured to perform the following steps:

Optionally, the CPU 522 is further configured to perform the following steps:

processing the M pieces of text information through a similarity scoring model to obtain a text information score corresponding to each piece of text information, wherein M is an integer greater than 0, the similarity scoring model is used for representing a functional relationship between the text information and the text information scores, and the M pieces of text information belong to the target index information;

if M is greater than 1, sorting the M pieces of text information in an order from high to low according to the text information score corresponding to each piece of text information to obtain a similarity index sequence, wherein the similarity index sequence is used for indicating a client to display the M pieces of text information;

the CPU 522 is specifically configured to perform the following steps:

Optionally, the CPU 522 is further configured to perform the following steps:

receiving an index query instruction when the index parameters in the hash table are updated according to the target index information and the N index threads;

and acquiring the index parameters of the index information to be inquired from the hash table according to the index inquiry instruction, and/or acquiring the index information to be inquired.

Optionally, the CPU 522 is further configured to perform the following steps:

receiving a first sequence number corresponding to the first transaction log sent by the second server;

determining the update status of the second server according to the first sequence number;

if the first sequence number is not detected, sending an index updating instruction to the second server, so that the second server obtains a third transaction log corresponding to a third sequence number from the first server according to the index updating instruction, and updating the index file according to the third transaction log, wherein the third sequence number is an adjacent sequence number after the first sequence number.

Further, the CPU 522 in the embodiment of the present invention is configured to execute the steps of:

when the first server detects that target index information is updated, receiving a first transaction log sent by the first server, wherein the first transaction log is used for recording the target index information and triggering the first server to update an index file;

and updating the index file according to the first transaction log.

Optionally, the CPU 522 is further configured to perform the following steps:

sending a first sequence number corresponding to the first transaction log to the first server so that the first server determines the update state of the second server;

if the first sequence number is not detected, receiving an index updating instruction, and acquiring a third transaction log corresponding to a third sequence number from the first server according to the index updating instruction, wherein the third sequence number is an adjacent sequence number after the first sequence number;

and updating the index file according to the third transaction log.

Optionally, the CPU 522 is further configured to perform the following steps:

if the first server fails, selecting a target server, wherein the target server is the server, among the second servers, that holds the largest sequence number;

when the target server detects that the index information is updated, sending a target transaction log to the other servers among the second servers so that the other servers update the index file according to the target transaction log, wherein the target transaction log is used for recording the index information;

and updating the index file according to the target transaction log.

Referring to fig. 19, fig. 19 is a schematic view of an embodiment of an index file updating system according to an embodiment of the present invention. As shown in the figure, the index file updating system includes a first server 601, a second server 602, and a client 603.

In this embodiment, when the first server 601 detects that target index information is updated, the first server 601 obtains a first transaction log, where the first transaction log is used to record the target index information. The first server 601 updates an index file according to the first transaction log and sends the first transaction log to the second server 602, and the second server 602 updates the index file according to the first transaction log. The client 603 sends a retrieval instruction to the second server 602, the second server 602 obtains a retrieval result from the index file according to the retrieval instruction, and the second server 602 sends the retrieval result to the client 603.

In this embodiment of the present invention, an index file updating system is provided. When an index task is detected, the update process of the index file is triggered. The master server, serving as the first server, generates a log that relates only to the updated information and issues the log to the slave servers. Both the master server and the slave servers update only the changed index information, adding the incremental content to the original index file. Therefore, when the client sends a retrieval instruction to the second server, the reliability and timeliness of retrieval are improved because the index file is updated frequently.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only one logical division, and other divisions may be used in practice. For instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for updating an index file, comprising:

the updating, by the first server, the index file according to the first transaction log specifically includes: the first server generates N index threads according to the first transaction log, wherein N is an integer greater than 0, and each index thread corresponds to one index type; the first server updates index parameters in a hash table according to the target index information and the N index threads to obtain the index file, wherein the index parameters comprise at least one of text information identification, text information timestamp and text information score;

2. The method of claim 1, wherein after the first server obtains the first transaction log, the method further comprises:

if the first sequence number is not consistent with a second sequence number, the first server generates a memory file according to the first transaction log, wherein the second sequence number is a sequence number corresponding to a second transaction log, and the second transaction log is used for recording index information updated in a last period;

the first server stores the memory file.

3. The method of claim 1, wherein the N index threads comprise a thread of a tag index, wherein the tag index represents an index related to a keyword;

before the first server updates an index file according to the first transaction log, the method further comprises:

the first server processes M text messages through a tag scoring model to obtain a text message score corresponding to each text message, wherein M is an integer greater than 0, the tag scoring model is used for representing a functional relation between the text messages and the text message scores, and the M text messages belong to the target index information;

if the M is larger than 1, the first server sorts the M text messages according to the order of the text message scores corresponding to each text message from high to low to obtain a tag index sequence, wherein the tag index sequence is used for indicating a client to display the M text messages;

and the first server updates the index parameters in the hash table in the thread of the tag index according to the text information identifications, the text information timestamps and the text information scores of the M pieces of text information.

4. The method of claim 1, wherein the N index threads comprise a thread of a topic index, wherein the topic index represents a domain-related index;

the first server processes M text messages through a topic scoring model to obtain a text message score corresponding to each text message, wherein M is an integer greater than 0, the topic scoring model is used for representing a functional relationship between the text messages and the text message scores, and the M text messages belong to the target index information;

if the M is larger than 1, the first server sorts the M text messages according to the order of the text message scores corresponding to each text message from high to low to obtain a topic index sequence, wherein the topic index sequence is used for indicating a client to display the M text messages;

the first server updates the index parameters in the hash table according to the target index information and the N index threads, and the method comprises the following steps:

and the first server updates the index parameters in the hash table in the thread of the topic index according to the text information identifications, the text information timestamps and the text information scores of the M pieces of text information.

5. The method of claim 1, wherein the N index threads comprise a thread of an account index, wherein the account index represents an index associated with a text creator;

the first server processes M text messages through an account scoring model to obtain a text message score corresponding to each text message, wherein M is an integer greater than 0, the account scoring model is used for representing a functional relationship between the text messages and the text message scores, and the M text messages belong to the target index information;

if M is greater than 1, the first server sorts the M pieces of text information in descending order of the text information score corresponding to each piece of text information to obtain an account index sequence, wherein the account index sequence is used for instructing a client to display the M pieces of text information;

and the first server updates the index parameters in the hash table in the thread of the account index according to the text information identifications, the text information timestamps and the text information scores of the M pieces of text information.

6. The method of claim 1, wherein the N index threads comprise a thread of a similar text index, wherein the similar text index represents an index of associations between texts;

the first server processes M pieces of text information through a similarity scoring model to obtain a text information score corresponding to each piece of text information, wherein M is an integer larger than 0, the similarity scoring model is used for representing a functional relation between the text information and the text information score, and the M pieces of text information belong to the target index information;

if M is greater than 1, the first server sorts the M pieces of text information in descending order of the text information score corresponding to each piece of text information to obtain a similarity index sequence, wherein the similarity index sequence is used for instructing a client to display the M pieces of text information;

and the first server updates the index parameters in the hash table in the thread of the similar text index according to the text information identifications, the text information timestamps and the text information scores of the M pieces of text information.
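The score, sort and update steps shared by the per-index-type claims above can be illustrated with a minimal sketch. It is not the claimed implementation: the names `IndexParams`, `update_index`, `run_index_threads` and the length-based scoring function are hypothetical, and a plain Python dict guarded by a lock stands in for the hash table.

```python
import threading
import time
from dataclasses import dataclass


@dataclass
class IndexParams:
    """Index parameters named in the claims: identification, timestamp, score."""
    text_id: str
    timestamp: float
    score: float


def length_score(text: str) -> float:
    """Hypothetical scoring model: maps a text to a score (here, its length)."""
    return float(len(text))


def update_index(index_type, texts, score_fn, table, lock):
    """Body of one index thread: score M texts, sort high to low, update the table."""
    scored = [IndexParams(tid, time.time(), score_fn(body))
              for tid, body in texts.items()]
    scored.sort(key=lambda p: p.score, reverse=True)  # descending score order
    with lock:  # the hash table is shared by all index threads
        table[index_type] = scored


def run_index_threads(texts, score_fns):
    """Start one thread per index type (N threads) and wait for all of them."""
    table, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=update_index, args=(name, texts, fn, table, lock))
        for name, fn in score_fns.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table
```

Each index type supplies its own scoring function, so the topic, account and similar text indexes differ only in the model passed to their thread.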

7. The method according to claim 1, wherein when the first server updates the index parameters in the hash table according to the target index information and the N index threads, the method further comprises:

the first server receives an index query instruction;

and the first server acquires the index parameters of the index information to be queried from the hash table according to the index query instruction and/or the index information to be queried.
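Serving a query while the hash table is being updated can be sketched as follows. This is an illustration under assumptions, not the claimed design: the class `IndexTable`, its methods and the key format are hypothetical, and a single lock is only one possible way to keep reads and writes consistent.

```python
import threading


class IndexTable:
    """Hypothetical hash table wrapper that supports queries during updates."""

    def __init__(self):
        self._lock = threading.Lock()
        self._table = {}

    def update(self, key, params):
        """Called by an index thread to replace the parameters stored for a key."""
        with self._lock:
            self._table[key] = list(params)

    def query(self, key):
        """Handle an index query instruction: return a copy of the stored parameters."""
        with self._lock:
            return list(self._table.get(key, []))
```

Returning a copy means a caller cannot observe or disturb an update that happens after the query returns.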

8. A method for updating an index file, comprising:

when a first server detects that target index information is updated, a second server receives a first transaction log sent by the first server, wherein the first transaction log is used for recording the target index information and triggering the first server to update an index file;

the updating the index file by the first server comprises: the first server generates N index threads according to the first transaction log, wherein N is an integer greater than 0, and each index thread corresponds to one index type; and the first server updates index parameters in a hash table according to the target index information and the N index threads to obtain the index file, wherein the index parameters comprise at least one of text information identification, text information timestamp and text information score.

9. The method of claim 8, wherein after the second server receives the first transaction log sent by the first server, the method further comprises:

if the first server does not detect the first sequence number, the second server receives an index updating instruction, and acquires a third transaction log corresponding to a third sequence number from the first server according to the index updating instruction, wherein the third sequence number is an adjacent sequence number after the first sequence number;

10. The method according to claim 8 or 9, characterized in that the method further comprises:

if the first server fails, the second server selects a target server, wherein the target server is the server in the second server that holds the largest sequence number;

when the target server detects that index information is updated, the target server in the second server sends a target transaction log to other servers in the second server so that the other servers update the index file according to the target transaction log, wherein the target transaction log is used for recording the index information;
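The selection of a target server after a failure of the first server can be sketched as below. The `Replica` record and `select_target_server` function are hypothetical names, and the sketch assumes each server tracks the sequence number of the last transaction log it applied.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical record of one server and the last sequence number it applied."""
    name: str
    last_sequence: int


def select_target_server(replicas):
    """Pick the server holding the largest sequence number as the new log source."""
    if not replicas:
        raise ValueError("no servers available for selection")
    # max() returns the first maximal element, so ties go to the earliest entry.
    return max(replicas, key=lambda r: r.last_sequence)
```

Choosing the server with the largest sequence number means the new source of transaction logs has applied at least as many logs as any other remaining server.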

11. A method for updating an index file, applied to an index file updating system, wherein the index file updating system comprises a client, a first server and a second server, and the method comprises:

when the first server detects that target index information is updated, the first server acquires a first transaction log, wherein the first transaction log is used for recording the target index information;

the first server sending the first transaction log to the second server;

the client sends a retrieval instruction to the second server;

and the second server sends the retrieval result to the client.
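The flow across the first server, the second server and the client can be sketched end to end. This is a simplified in-process model under assumptions: `TransactionLog`, `Server`, `FirstServer` and their methods are hypothetical, direct method calls stand in for the network, and the first server also applies the log locally, as the abstract describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionLog:
    """Hypothetical transaction log entry recording one index update."""
    sequence: int
    text_id: str
    score: float


class Server:
    """A server that applies transaction logs to its local index and answers retrievals."""

    def __init__(self):
        self.index = {}

    def apply(self, log):
        self.index[log.text_id] = log.score

    def retrieve(self, limit):
        """Answer a client's retrieval instruction with the top-scoring text ids."""
        return sorted(self.index, key=self.index.get, reverse=True)[:limit]


class FirstServer(Server):
    """The first server: creates logs, applies them, and forwards them to second servers."""

    def __init__(self, second_servers):
        super().__init__()
        self.second_servers = second_servers
        self.next_sequence = 1

    def on_index_update(self, text_id, score):
        log = TransactionLog(self.next_sequence, text_id, score)
        self.next_sequence += 1
        self.apply(log)                    # update the local index
        for server in self.second_servers:
            server.apply(log)              # stand-in for sending the log
        return log
```

A short usage example: create `second = Server()` and `first = FirstServer([second])`, call `first.on_index_update("t1", 0.4)`, and then `second.retrieve(10)` returns the ids that the second server rebuilt from the forwarded log, without recomputing any scores.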

12. A server, characterized in that the server comprises: a memory, a transceiver, a processor, and a bus system;

wherein the memory is used for storing programs;

updating an index file according to the first transaction log specifically comprises: generating N index threads according to the first transaction log, wherein N is an integer greater than 0, and each index thread corresponds to one index type; updating index parameters in a hash table according to the target index information and the N index threads to obtain the index file, wherein the index parameters comprise at least one of text information identification, text information timestamp and text information score;

13. A server, characterized in that the server comprises: a memory, a transceiver, a processor, and a bus system;

wherein the memory is used for storing programs;

the processor is configured to execute the program in the memory to perform the following steps:

when the update of the target index information is detected, receiving a first transaction log sent by a first server, wherein the first transaction log is used for recording the target index information and triggering the first server to update an index file;

updating the index file according to the first transaction log;

the updating the index file by the first server comprises: the first server generates N index threads according to the first transaction log, wherein N is an integer greater than 0, and each index thread corresponds to one index type; the first server updates index parameters in a hash table according to the target index information and the N index threads to obtain the index file, wherein the index parameters comprise at least one of text information identification, text information timestamp and text information score;

14. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 7, or to perform the method of any of claims 8 to 10.