CN111931939B

CN111931939B - Single-amplitude quantum computing simulation method

Info

Publication number: CN111931939B
Application number: CN201910394102.9A
Authority: CN
Inventors: 王晶; 窦猛汉
Original assignee: Benyuan Quantum Computing Technology Hefei Co ltd
Current assignee: Benyuan Quantum Computing Technology Hefei Co ltd
Priority date: 2019-05-13
Filing date: 2019-05-13
Publication date: 2024-02-09
Anticipated expiration: 2039-05-13
Also published as: CN111931939A

Abstract

The invention discloses a single-amplitude quantum computing simulation method, which comprises the following steps: obtaining a target quantum program for each computing node of the distributed cluster; constructing an undirected graph corresponding to the target quantum program; wherein, the vertex of the undirected graph represents the quantum state of the operated quantum bit before or after the quantum logic gate is operated, and one edge of the undirected graph corresponds to one tensor; obtaining a quantum state corresponding to a target single amplitude to be measured, and calculating the sub-amplitude of the quantum state based on the quantum state and the undirected graph and by matching with the GPU corresponding to the calculation node; wherein the sub-amplitude is the amplitude corresponding to the undirected graph; and returning the sub-amplitudes to the main nodes of the distributed clusters, so that the main nodes reduce the sub-amplitudes to obtain the amplitudes of the quantum states, and the amplitudes are used as target single amplitudes. With the embodiments of the present invention, quantum computing simulations involving 50 or even more qubits can be achieved.

Description

Single-amplitude quantum computing simulation method

Technical Field

The invention belongs to the technical field of quantum computing, and particularly relates to a single-amplitude quantum computing simulation method.

Background

The quantum computer is a kind of physical device which performs high-speed mathematical and logical operation, stores and processes quantum information according to the law of quantum mechanics. When a device processes and calculates quantum information and operates on a quantum algorithm, the device is a quantum computer.

Quantum computers can perform a variety of tasks not possible with classical computers, such as quantum simulation and decomposition of large prime factors. On the way quantum computation advances, in order to realize "quantum override", a quantum computer having a quantum bit number of 50 or more and high fidelity is required to be realized first. However, before the real implementation, the quantum computing simulation can be performed through the related theory of quantum computing, so that the software and hardware decoupling of the quantum computer is realized, and a foundation is laid for the development of quantum programs and quantum applications.

The quantum computing simulation is a simulation computation which simulates and follows the law of quantum mechanics by means of numerical computation and computer science, and is taken as a simulation program, and the high-speed computing capability of a computer is utilized to characterize the space-time evolution of the quantum state according to the basic law of quantum bits of the quantum mechanics.

At present, full-amplitude simulation, namely, one-time simulation of all amplitudes of quantum bit end states is generally adopted for quantum computation simulation, but the full-amplitude simulation is calculated based on unitary transformation, and the memory overhead of the full-amplitude simulation increases exponentially with the number of quantum bits. When simulating quantum computation involving 30 qubits, the memory overhead is 16GByte (Ji Bytes); at 40 qubits, the memory overhead would need 16 TBytes (terabytes), i.e. 2 ¹⁰ * (16 GByte); at 50 qubits, the memory overhead would be 16 PBytes (beat bytes), i.e. 2 ¹⁰ * (16 TByte). This is not affordable for common cloud platforms and even super computing platforms that provide quantum computing simulation services, and full-amplitude simulators currently only simulate 49 quantum bits at maximum in academia, which is a simulation result based on the world-largest super computer, but does not provide cloud services to the outside, which is very unfavorable for research and development of quantum programs and quantum applications. In this case, a single amplitude simulation, i.e. a scheme that simulates only one amplitude at a time, is proposed, and this mode would require much less memory. Thus, it can be seen that the related research and implementation of quantum computation simulation of single quantum state component amplitude are particularly important for the development of quantum computation under the condition that the memory resources of the current platform are limited.

Disclosure of Invention

The invention aims to provide a single-amplitude quantum computing simulation method, which aims to solve the defects in the prior art and realize the computing simulation involving 50 or more quantum bits.

The technical scheme adopted by the invention is as follows:

a single amplitude quantum computing simulation method, the method comprising:

obtaining a target quantum program for each computing node of the distributed cluster;

constructing an undirected graph corresponding to the target quantum program; wherein, the vertex of the undirected graph represents the quantum state of the operated quantum bit before or after the quantum logic gate is operated, and one edge of the undirected graph corresponds to one tensor;

obtaining a quantum state corresponding to a target single amplitude to be measured, and calculating the sub-amplitude of the quantum state based on the quantum state and the undirected graph and by matching with the GPU corresponding to the calculation node; wherein the sub-amplitude is the amplitude corresponding to the undirected graph;

and returning the sub-amplitudes to the main nodes of the distributed clusters, so that the main nodes reduce the sub-amplitudes to obtain the amplitudes of the quantum states, and the amplitudes are used as target single amplitudes.

Optionally, the constructing an undirected graph corresponding to the target quantum program includes:

analyzing the target quantum program to obtain a linked list of recorded quantum program information;

traversing the linked list, and creating an edge with tensor order of 1 when the type of the quantum logic gate in the linked list is a first single quantum gate; the side is connected with the last vertex of the vertex chain corresponding to the quantum bit operated by the first single quantum gate, and the unitary matrix of the first single quantum gate is a diagonal matrix;

When the quantum logic gate type in the linked list is a second single quantum gate, creating an edge with tensor order of 2 and a vertex connected with the edge; the side is connected with the last vertex of the vertex chain corresponding to the quantum bit operated by the second single quantum gate, and the unitary matrix of the second single quantum gate is a non-diagonal matrix;

when the type of the quantum logic gate in the linked list is a first double quantum gate, creating an edge with tensor order of 2; the side is connected with the last vertex in the vertex chain corresponding to the two quantum bits operated by the first double-quantum gate, and the unitary matrix of the first double-quantum gate is a diagonal matrix;

when the type of the quantum logic gate in the linked list is a second double-quantum gate, creating an edge with the tensor order of 4 and two vertexes connected with the edge; the side is connected with the last vertex in the vertex chain corresponding to the two quantum bits operated by the second double-quantum gate, and the unitary matrix of the second double-quantum gate is a non-diagonal matrix;

and obtaining an undirected graph corresponding to the target quantum program.

Optionally, the calculating, based on the quantum state and the undirected graph and in cooperation with the GPU corresponding to the calculating node, the sub-amplitude of the quantum state includes:

Invoking GPU corresponding to the computing node, and respectively performing definite value reduction on tensors of edges connected with specific vertexes of the undirected graph; wherein the specific vertex is the first vertex and the last vertex of the vertex chain corresponding to each quantum bit;

deleting the specific vertex;

receiving the value of a target vertex distributed by the main node, splitting a current undirected graph based on the value of the target vertex, and calling the GPU to respectively determine value reduction for tensors of connecting edges of the target vertex aiming at each sub undirected graph obtained by splitting;

aiming at each vertex in the sub undirected graph, combining the GPU to fuse all the connecting edges of the vertex into a new edge, reducing the tensor of the new edge, and deleting the vertex;

integrating tensor values of all new edges after the order reduction to obtain a first sub-amplitude of the quantum state, which corresponds to the sub-undirected graph;

and summing the first sub-amplitudes of all sub-undirected graphs in the quantum state to obtain the sub-amplitudes of the quantum state.

Optionally, the performing a definite value reduction on tensors of the edges connected by the specific vertices of the undirected graph includes:

the GPU corresponding to the computing node sets the thread block number according to the tensor order after the order reduction and the thread number in each thread block in the GPU aiming at the edge connected with each specific vertex;

According to the thread block serial numbers, the thread number in each thread block and the thread serial numbers, calculating first element numbers of the reduced tensors, and calculating two second element numbers of the reduced tensors corresponding to the first element numbers; the number of bits of the element number corresponds to the number of vertex bits connected with the current edge one by one, and the value of each bit of the element number is the vertex value of the corresponding vertex position;

determining a second element number with a value on a number bit corresponding to a specific top point position as a preset determined value from the two second element numbers;

and acquiring a second element value corresponding to the determined second element number, and determining the second element value as a first element value corresponding to the first element number.

Optionally, the receiving the value of the target vertex allocated by the master node, splitting the current undirected graph based on the value of the target vertex, and calling the GPU to respectively determine value reduction for tensors of connecting edges of the target vertex for each sub undirected graph obtained by splitting, including:

receiving one or more values of target vertexes equally divided by a main node; wherein the target vertexes are the first m vertexes with the largest number of connected edges in the current undirected graph, and the m vertexes comprise 2 ^m Seed value, wherein the value of each vertex is 0 or 1, m is a positive integer, and the number of the calculation nodes is 2 ⁿ N is a positive integer, and n is greater than 0 and less than or equal to m;

splitting an undirected graph of the computing node into one or more sub undirected graphs aiming at each vertex value which is evenly divided;

and traversing the edges connected with the target vertexes aiming at each sub undirected graph, and calling the GPU to respectively perform definite value reduction on tensors of the edges connected with the target vertexes.

Optionally, the fusing all connection edges of the vertex with the GPU into a new edge includes:

determining a first edge and a second edge to be fused for all the connecting edges of the vertex;

according to the vertex which is not connected with the first edge in the second edge, calling the GPU to upgrade the first tensor of the first edge, and updating the first tensor by the tensor after upgrade;

deleting the second edge, and connecting the vertex which is not connected with the first edge in the second edge to the first edge to obtain a fused middle edge;

invoking the GPU to calculate tensor elements of the middle edge according to the corresponding relation between the recorded vertex numbers of the first edge and the second edge;

and returning to the execution step of determining the first edge and the second edge to be fused until the calculated tensor element is the tensor element of the last edge, and determining the last edge as a new fused edge.

Optionally, the step-up the first tensor of the first edge includes:

the GPU calculates the tensor order after the step-up according to the order of the first tensor and the step-up;

setting the number of thread blocks according to the tensor order after the ascending and the number of threads in each thread block in the GPU;

according to the thread block serial numbers, the number of threads in each thread block and the thread serial numbers, calculating the first element numbers of tensors after the step-up;

and calculating the element of the tensor after the step-up according to the first element number, the step-up number and the element of the first tensor.

Optionally, the calculating the tensor element of the middle edge includes:

the GPU sets the number of thread blocks according to the updated order of the first tensor and the number of threads in each thread block in the GPU;

according to the thread block serial numbers, the number of threads in each thread block and the thread serial numbers, calculating the first element numbers of tensors of the middle edges;

determining corresponding elements of each element in the first tensor in a second tensor of the second edge according to the corresponding relation;

each element in the first tensor is traversed, and the element is updated with the product of the element and its corresponding element in the second tensor.

Optionally, the reducing the tensor of the new edge includes:

the GPU sets the number of thread blocks according to the tensor order of the new edge after the order is reduced;

and obtaining two second element values corresponding to the two second element numbers one by one, summing the two second element values, and determining the sum as a first element value corresponding to the first element number.

Optionally, the calculation formula of the first element number is:

Idx=block_id*num+thread_id

wherein Idx is the first element number, block_id is the thread block number, num is the number of threads in each thread block, and thread_id is the thread number.

Compared with the prior art, the method can only calculate one target single amplitude of the concerned quantum bit at a time, specifically map the target quantum program onto the undirected graph, split the undirected graph onto a plurality of computing nodes by combining a path integration method, and calculate the corresponding sub undirected graph by each computing node and the subordinate GPU in a matching way. The whole calculation process is mainly based on simple operation of elements in tensors, compared with full-amplitude simulation based on unitary matrix in the prior art, the requirement on memory is greatly reduced, and the calculated amount does not rise along with the quantum bit index, so that quantum calculation simulation involving 50 or more quantum bits can be realized; the GPU has strong performance of executing a large amount of parallel computation, so that the quantum computation simulation efficiency is high. At present, the technical scheme provided by the embodiment of the invention can realize quantum computing simulation involving 196 quantum bits at maximum.

In addition, in practical application, only one or more of the full amplitudes of the qubits are sometimes needed, in which case, if the full amplitude mode in the prior art is adopted, that is, all the amplitudes are simulated at one time, the waste of resources such as a memory and time is undoubtedly caused; by applying the method provided by the embodiment of the invention, one or more times of simulation can be performed in a targeted manner, and one or more single amplitudes are needed to be simulated, so that resources and time are saved greatly.

Drawings

FIG. 1 is a specific example of splitting a quantum wire into different paths corresponding to a quantum program of the present invention;

FIG. 2 is a flow chart of a single-amplitude quantum computing simulation method provided by an embodiment of the invention;

fig. 3 is an undirected schematic illustration of different types of quantum logic gate construction in a single-amplitude quantum computing simulation method provided by an embodiment of the present invention.

Detailed Description

The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.

In order to realize computational simulation involving 50 or even more qubits, the embodiment of the invention provides a single-amplitude quantum computational simulation method and device.

The following first describes a single-amplitude quantum computing simulation method provided by the embodiment of the invention.

It will be appreciated by those skilled in the art that each qubit may be at |0 at the same time>And |1>Is a quantum state of one qubitCan be expressed as a|0>+b|1>Wherein a and b are respectively |0>、|1>The amplitudes of (a) are complex. After measurement, the quantum state collapses to a fixed quantum state, where it collapses to |0>The probability of (a) is a ² Collapse to |1>The probability of (b) is b ² ，a ² +b ² =1. And the quantum state of n quantum bits is 2 ⁿ And a superposition of the individual quantum states. For example, a quantum state consisting of 3 qubits +.>Is 2 ³ (i.e., 8) superimposed states of quantum states, wherein the 8 quantum states are each |000>、|001>、|010>、|011>、|100>、|101>、|110>Sum |111>At this time, a quantum state composed of 3 qubits +.>Can be expressed as

。

Wherein each quantum state of the 8 quantum states is called a quantum state component, and each quantum state component has a corresponding amplitude, i.e. c ₀ To c ₇ These complex numbers may be referred to as a single amplitude. Full amplitude simulation, i.e. 2 in which n qubits are simulated at one time ⁿ Amplitude of the individual quantum state components; whereas single amplitude simulation refers to one-time simulation 2 ⁿ Amplitude of any one quantum state component of the individual quantum states.

Currently, with respect to quantum computing simulation, full-amplitude mode is mostly adopted in the industry. However, for the full-amplitude mode, the memory occupied by the full-amplitude mode generally increases exponentially with the number of the simulated qubits, for example, only 16KB of memory is needed for simulating 10 qubits, 16MB of memory is needed for simulating 20 qubits, 16GB is needed for simulating 30 qubits, and up to 16PB of memory is needed for simulating 50 qubits, so that full-amplitude simulation of 50 or more qubits cannot be realized in all computer memories in the world.

In view of this, the embodiment of the present invention provides a single-amplitude quantum computing simulation method, which is applied to computing nodes of a distributed cluster, where all computing nodes correspond to a master node, each computing node is subordinate to (i.e. controls) one or more GPUs, and the number of GPUs controlled by one computing node may be positive integer power of 2, so that it is convenient to process quantum information, because the quantum state number of the quantum bits is an index of 2. The following description will take a GPU controlled by a computing node as an example. Preferably, the distributed clusters may be supercomputer clusters (e.g., optical supercomputer platforms of the Shenwei Taihu lake).

It should be noted that, the quantum program is a series of instruction sequences written by a quantum language and capable of running on a quantum computer, so as to realize the support of quantum logic gate operation and finally realize the simulation of quantum computation. Specifically, the quantum program is a series of instruction sequences for operating the quantum logic gate according to a certain time sequence.

Among these, quantum logic gates are a fundamental, quantum circuit that operates on a small number of qubits (i.e., qubits). It is the basis of quantum circuits, just like the relationship between a conventional logic gate and a typical digital circuit. Quantum logic gates include single quantum logic gates, double quantum logic gates, and multiple quantum logic gates.

While a quantum program may contain hundreds of quantum logic gate operations, or thousands of quantum logic gate operations. The execution process of the quantum program is a process of executing all quantum logic gates according to a certain time sequence. The timing is a time sequence in which each quantum logic gate is executed.

It should be noted that the quantum logic gate is generally represented by a unitary matrix, and the unitary matrix is not only a matrix, but also an operation and transformation. The effect of a general quantum logic gate on a quantum state is calculated by multiplying the unitary matrix by the right vector corresponding to the quantum state.

For example: quantum state |0>The corresponding right vector isAnd quantum state |1>The corresponding right vector is +.>。

The quantum gates can be further divided into diagonal quantum gates and off-diagonal quantum gates according to unitary matrix types. The diagonal quantum logic gate refers to a quantum logic gate with unitary matrix being a diagonal matrix. As is well known, a diagonal matrix refers to a matrix in which all elements other than the main diagonal are 0, and the elements on the diagonal may be 0 or other values.

Such as an identity matrixIs typically a diagonal matrix.

In contrast, matrices having non-zero elements outside the main diagonal, i.e. non-diagonal matrices, e.g.The method comprises the steps of carrying out a first treatment on the surface of the And the unitary matrix is a quantum logic gate of an off-diagonal matrix, i.e., an off-diagonal quantum logic gate.

It will be appreciated by those skilled in the art that it is assumed that the quantum bit is divided in the target quantum programInitial quantum state |0 … 0>And the last state ex-co-involve M ₁ Each quantum state, then, since the state of each qubit can be at |0>And |1>So for M ₁ One of the quantum states is split into two quantum state components: i0>And |1>The initial quantum state to final state component X= |x can be obtained ₀ …x _n-1 >2 of (2) ^M1 The possible transformation paths are calculated, the amplitude of each path is calculated, and the sum is carried out to obtain the final state component, namely the amplitude of the target quantum state component. Wherein M is ₁ Is a positive integer.

For example, the target quantum program involves 2 qubits, respectively: q ₀ 、q ₁ Initial state s ₀ =|00>The target quantum state component is |11>The quantum program contained 2H gates (Hadamard Gate, ada Ma Men): h ₁ 、H ₂ 1 CNOT Gate (Control-not Gate, control NOT Gate).

As shown in FIG. 1a, a simple illustration of the quantum program is given for a quantum wire, it can be seen that the quantum wire divides the initial quantum state |00 >And target quantum state component |11>In addition, 4 quantum states are involved: s is(s) ₀ ¹ 、s ₀ ² 、s ₁ ¹ 、s ₁ ² Wherein each quantum state can be represented as |0>And |1>Is a superposition of (a) to (b). If s is to ₁ ¹ Splitting into |0>And |1>Two parts, then, from the initial quantum state |00>To |11>Can be split into two paths as shown in figures (1 b) and (1 c) to obtain initial quantum state |00>Transformed into |11 via two paths>And then summing the sub-amplitudes to obtain an amplitude value corresponding to the target quantum program, thereby completing the simulation.

It will be appreciated that if M in the target quantum program is to be ₁ Each quantum state is split into |0>And |1>In two parts, then, the initial quantum state to final state component is obtainedThe possible transformation paths are calculated, the amplitude of each path is calculated, and the sum is performedA target single amplitude of the quantum state can be obtained.

In a quantum program with only single quantum logic gates and diagonal double quantum logic gates, the initial quantum state of a given qubit is |0 … 0>When the last state component, namely the target quantum state component, is taken as x= |x ₀ …x _n-1 >When the amplitude is calculated, the formula can be expressed as:

(1)

equation (1) is a basic equation of the quantum mechanical path integration method.

In the formula (1) The functions are all complex functions related to Boolean variables, represent the contribution of quantum logic gates to quantum states, and for better explanation, only three types of +.>One of the functions is +.>The function is omitted; />The value is {0,1}, the corresponding qubit is +.>Is->The components of the quantum states after the action of the quantum logic gates. />The value of the function is mainly related to two factors, namely, the quantum state of the quantum bit operated by the quantum logic gate before and after the quantum logic gate is executed and the unitary matrix of the quantum logic gate.

In particular, the method comprises the steps of,is a variable +.>And->The value of which is determined by the values of two variables and unitary matrix corresponding to the diagonal two-quantum logic gate,/->And->Respectively corresponding to the quantum bit of +.>And->Is not +.>Components of the quantum state before the diagonal double-quantum logic gate acts; />Is a variable +.>The value of which is determined by the value of the variable and the unitary matrix of the corresponding diagonal single quantum logic gate,/v>The corresponding qubit is +.>Is not +.>Components of the quantum state before the diagonal single quantum logic gate acts; />Is a variable +.>And->The value of which is determined by the values of two variables and unitary matrix corresponding to the off-diagonal single quantum logic gate,/- >And->Respectively corresponding to the quantum bit of +.>Is at%>The components of the front and rear quantum states of the off-diagonal single quantum logic gate act. It will be appreciated that->、/>、/>、/>、/>、/>、/>、/>、/>、/>Are all non-negative integers.

The single-amplitude quantum computing simulation method provided by the embodiment of the invention is based on the formula (1), expands to a non-diagonal double-quantum logic gate and maps the non-diagonal double-quantum logic gate into an undirected graph. In particular, the method comprises the steps of,points corresponding to undirected graph->The function corresponds to the edge of the undirected graph, converting the solving equation (1) into a process for the undirected graph, it being understood that, according to +.>The undirected graph can be split; more specifically, firstly, constructing an undirected graph corresponding to a target quantum program on a plurality of computing nodes, then splitting the undirected graph according to different vertex values to obtain different sub undirected graphs, then obtaining the amplitudes of corresponding paths through computing the sub undirected graphs, and finally merging the amplitudes of all paths to obtain the corresponding amplitudes of the target quantum state components.

As shown in fig. 2, the single-amplitude quantum computing simulation method provided by the embodiment of the invention may include the following steps:

s201, aiming at each computing node of the distributed cluster, obtaining a target quantum program;

At the angle of the computing nodes, each computing node can obtain the same target quantum program sent by the main node, the program is preferably written by the existing qrenes language, and can also be written by other feasible quantum languages, the computing node is usually a CPU in a computer, the main node can also be the CPU, the first computing node can be used as the main node, and the user input quantum program can be taken, and the following description is given by taking the computing node as the CPU.

S202, constructing an undirected graph corresponding to the target quantum program; wherein, the vertex of the undirected graph represents the quantum state of the operated quantum bit before or after the quantum logic gate is operated, and one edge of the undirected graph corresponds to one tensor;

first, the target quantum program can be parsed to obtain a linked list of recorded quantum program information. And traversing the linked list, sequentially reading the types and unitary matrix forms of all quantum logic gates in the linked list, adding vertexes and edges, and constructing an undirected graph of the quantum program. One specific implementation is as follows:

when the type of the quantum logic gate in the linked list is a first single quantum gate, creating an edge with tensor order of 1; the side is connected with the last vertex of the vertex chain corresponding to the quantum bit operated by the first single quantum gate, and the unitary matrix of the first single quantum gate is a diagonal matrix;

and finishing traversing, and finishing adding the vertexes and the edges to obtain the undirected graph corresponding to the target quantum program.

In the process of constructing the undirected graph, when a vertex is created, the vertex is recorded as to what vertex the current quantum logic gate operates on.

In the constructed undirected graph, each quantum bit corresponds to vertex chain information, and the vertex chain information comprises the values of the vertexes from the first vertex to the last vertex, the edge information of vertex connection and vertex identification. The vertex identification uniquely determines a vertex, and according to the identification, the quantum bit to which the corresponding vertex belongs and the value, the connecting side information and the like can be determined; after the value of the vertex is determined, the vertex value is 0 or 1; however, when the value of the vertex is uncertain, the vertex can be either null or any agreed numerical value or character conforming to the value type of the vertex, such as-1, for judging the value condition of the vertex in the implementation process. The value of the vertex can be represented by tensors, variables or other reasonable data types.

The undirected graph also includes tensor information of the edge, and may include a tensor array and an identification of a vertex to which the corresponding edge of the tensor is connected.

Wherein, the vertex of the undirected graph corresponds to the quantum state component of the operated quantum bit before or after the operation of the quantum logic gate, and the values are {0,1}, which corresponds to the variable in the formula (1)The method comprises the steps of carrying out a first treatment on the surface of the The operated quantum bit is the quantum bit corresponding to the quantum logic gate operation. The edges of the undirected graph correspond to quantum logic gates in the target quantum program, specifically, each edge corresponding to the quantum logic gate corresponds to a tensor, the elements in the tensor are jointly determined by unitary matrixes corresponding to the quantum logic gates and vertex values connected with the corresponding edges, and it can be understood that ++ - >A function.

Tensor (Tensor) is a quantity defined in several linear spaces at the same time, which is a generalization of the vector concept and the matrix concept. Each tensor may employ a subscript notation, e.g. tensor T ₁₂ The number of subscripts isThe order of the quantity (rank) represents the dimension of the tensor. For example, the scalar is a 0 th order tensor, the vector is a 1 st order tensor, and the matrix is a 2 nd order tensor. The shape of the tensor refers to the number of elements in each dimension; the number of elements of the tensor is determined by its shape.

For example, tensor B is known ₁₂₃₄ Subscript "1234", tensor of order 4; wherein if 3 elements are in the dimension indicated by the subscript 1, 4 elements are in the dimension indicated by the subscript 2, 5 elements are in the dimension indicated by the subscript 3, and 4 elements are in the dimension indicated by the subscript 4, then tensor B ₁₂₃₄ Shape= [3,4,5,6 ]]The number of elements is as follows:。

in the embodiment of the invention, the order of the tensor is equal to the number of the vertexes connected with the corresponding side of the tensor, and the subscript of the tensor is the number of the vertexes in the undirected graph. Since each vertex can only take a value of 0 or 1, there are only two possibilities, so for an n-order tensor, its shape shape= [2,2 … 2]The number of elements is 2 ⁿ 。

And the number of each tensor element can be expressed as a binary number, and each bit of the binary number can be expressed as the value of the corresponding vertex.

For example, edge E _m The 4 vertices are connected, so that the corresponding tensor is a 4-order tensor, the elements of which share 2 ⁴ =16, then the element number can be expressed as a binary number: (0000) ₂ ～（1111） ₂ . When edge E _m The 4 connected vertexes are arranged according to a certain sequence to obtain a vertex sequence, when the values of all the four vertexes are O, the combination value of the vertex sequence is 0000, and the (0000) of the corresponding tensor is obtained ₂ A bit element; when edge E _m When the first vertex is 1, the second vertex is 0, the third vertex is 0, and the fourth vertex is 1, the combined value of the vertex sequence is 1001, and the corresponding tensor (1001) ₂ Bit elements.

In practical application, the equivalent logic gate is single quantum logicWhen the gate or the double-quantum logic gate and the corresponding unitary matrix are diagonal matrixes, namely diagonal single-quantum logic gate or diagonal double-quantum logic gate, the quantum logic gate acts on the quantum bit, usually only changes in amplitude are brought, and the quantum state component corresponding to the quantum bit corresponds to the quantum state component in the formula (1)There is typically no change.

Single quantum logic gates of this type, typically Pauli-Z gates (Pauli-Z gates), the unitary matrix of which is

When a qubit is operated on with a bubble-Z gate, the basic state of the qubit, i.e., 0> is left unchanged, and i.1 > is converted to- |1>.

This type of double quantum logic gate, typically a CZ gate, has a unitary matrix of:

when two qubits Q ₀ 、Q ₁ （Q ₀ To control bits, Q ₁ For the target bit) performs a CZ gate operation, when Q ₀ Is |0 in quantum state>When Q is ₁ Is unchanged; when Q is ₀ Is of the quantum state |1>When Q is ₁ Quantum state retention of |0>Unchanged, will be |1>Change to- |1>。

It can be seen that both the bubble-Z gate and the CZ gate bring about only a change in the amplitude of the quantum state component, which is unchanged. Therefore, when the undirected graph is constructed, one quantum logic gate corresponding edge is added to the undirected graph for the quantum logic gates, namely the single quantum logic gate or the double quantum logic gate with unitary matrix being diagonal matrix.

As shown in FIG. 3a, for a diagonal single quantum logic gate, only one edge E is added when constructing an undirected graph ₁ The edge has one end and topV ₀ Are connected. Wherein V is ₀ The current last vertex of the corresponding quantum bit, namely the vertex corresponding to the quantum state directly acted by the quantum logic gate.

It will be appreciated that edge E ₁ Only one vertex is connected, so its tensor Is tensor of 1 order, 2 ¹ =2 elements, the tensor corresponds to +.>And a function representing the contribution of the diagonal single quantum logic gate to the quantum state. Specifically, suppose that unitary matrix of the diagonal single-quantum logic gate +.>Then->Wherein the vertex V ₀ When the value is 0, the corresponding +.>(0) ₂ Bit element, i.e.)>，V ₀ When the value is "1", the tensor (1) is corresponding to the value ₂ Bit element, i.e.)>。

For example, when edge E ₁ Is the corresponding edge of the diagonal single-quantum logic gate Brillouin-Z gate, and the tensor is known according to the unitary matrix of the Brillouin-Z gate。

As shown in fig. 3b, for the diagonal double-quantum logic gate, only one edge E is needed to be added when constructing an undirected graph ₁₂ One end of the edge is connected with the vertex V ₁ One end is connected with the vertex V ₂ Are connected. Wherein V is ₁ And V ₂ The current last vertex corresponding to the two quantum bits of the diagonal two-quantum logic gate operation is respectively, namely the vertex corresponding to the quantum state directly acted by the diagonal two-quantum logic gate.

It will be appreciated that as shown in FIG. 3b, edge E ₁₂ And two vertexes V ₁ And V ₂ Connected so that it tensorFor the tensor of order 2, 2 is taken as a total ² =4 elements, the tensor corresponds to +.>And a function representing the contribution of the diagonal two-quantum logic gate to the quantum state. Specifically, suppose that unitary matrix of the diagonal two-quantum logic gate +. >Then. Wherein when V ₁ V ₂ When the value is '00', corresponding ++>(00) ₂ A bit element; v (V) ₁ V ₂ When the value is "01", the corresponding ∈Reinforcement is given>(01) ₂ A bit element; when V is ₁ V ₂ When the value is 10, the corresponding +.>(10) ₂ A bit element; v (V) ₁ V ₂ When the value is 11, the corresponding +.>(11) ₂ Bit elements. Of course, it is also possible to follow the vertex sequence "V ₂ V ₁ "takes on values corresponding to the elements in the tensor, if in this order, it is understood that +.>. It should be noted that, a value of the vertex sequence uniquely determines an element in the tensor, but a positional relationship between the value of the vertex sequence and a corresponding element in the tensor is not unique, and may be determined according to actual requirements.

For example, when edge E ₁₂ When the corresponding edge of the diagonal double-quantum logic gate CZ gate is known according to unitary matrix。

When the quantum logic gate is a single quantum logic gate and its corresponding unitary matrix is an off-diagonal matrix, i.e., an off-diagonal single quantum logic gate, such as an H gate, its unitary matrix isThrough the operation of the H gate, |0>Will become +.>(|0>+|1>)，|1>Will become +.>(|0>-|1>) Both the amplitude and the quantum state components change. When the undirected graph is constructed, the vertex corresponding to the new quantum state and the corresponding edge of the quantum logic gate after the single quantum logic gate is operated are added to the undirected graph.

As shown in fig. 3c, when constructing an undirected graph for a non-diagonal single quantum logic gate, a vertex V is added ₄ One edge E ₃₄ One end of the edge is connected with the vertex V ₃ One end is connected with the vertex V ₄ Are connected. Wherein V is ₃ The current last vertex corresponding to the quantum bit operated by the off-diagonal single quantum logic gate, namely the vertex corresponding to the quantum state directly acted by the off-diagonal single quantum logic gate; v (V) ₄ And the vertex is a new vertex, and corresponds to a new quantum state after the off-diagonal single quantum logic gate is operated.

It will be appreciated that edge E ₃₄ And two vertexes V ₃ And V ₄ Connected so that it tensorFor the tensor of order 2, 2 is taken as a total ² =4 elements, the tensor corresponds to +.>And a function representing the contribution of the off-diagonal single quantum logic gate to the quantum state. Specifically, suppose that unitary matrix of the diagonal single-quantum logic gate +.>Then->. Wherein when V ₃ V ₄ When the value is '00', the formula is +.>(00) ₂ A bit element; v (V) ₃ V ₄ When the value is "01", the corresponding ∈Reinforcement is given>(01) ₂ A bit element; when V is ₃ V ₄ When the value is 10, the corresponding +.>(10) ₂ A bit element; v (V) ₃ V ₄ When the value is 11, the corresponding +.>(11) ₂ Bit elements.

For example, when edge E ₃₄ When the corresponding edge of the non-diagonal single-quantum logic gate H gate is the corresponding edge, tensor can be known according to unitary matrix 。

When the quantum logic gate is a double-quantum logic gate and the corresponding unitary matrix is an off-diagonal matrix, i.e. an off-diagonal double-quantum logic gate, such as a CNOT gate, the unitary matrix isThrough the operation of the CNOT gate, the quantum state of the control bit is |0>When the quantum state of the controlled bit is unchanged, i.e. the quantum states of the controlled bit and the controlled bit are |00>And |01>After CNOT gate operation, it is still |00>And |01>The method comprises the steps of carrying out a first treatment on the surface of the Control the quantum state of bit to be |1>The quantum state of the controlled bit is then not operated, i.e., 0>Becomes |1>Will be |1>Becomes |0>I.e. the quantum states of the control bit and the controlled bit are |10>Sum |11>After CNOT gate operation, it becomes 11>And |10>. Since two qubits are usually at |00>、|01>、|10>Sum |11>After CNOT gating, it can be seen that both the amplitude and the quantum state components of the two qubits involved will change. For such quantum logic gates, when an undirected graph is constructed, for each quantum bit, one vertex corresponding to a new quantum state of the undirected graph after the operation of the double-quantum logic gate, namely two new vertices, needs to be added to the undirected graph, and a corresponding edge of the double-quantum logic gate needs to be added to the undirected graph.

As shown in the figure (3 d), for the non-diagonal double-quantum logic gate, an edge E is added when constructing an undirected graph ₅₆₇₈ For the sake of clarity, two sides are shown here, but it should be noted that the two sides in the diagram (3 d) are actually the same side E ₅₆₇₈ . Newly added edge one end and vertex V ₅ 、V ₇ One end is connected with the vertex V ₆ 、V ₈ Are connected. Wherein V is ₅ And V ₆ The current last vertex corresponding to the two quantum bits of the off-diagonal double-quantum logic gate operation is V ₇ And V ₈ And the peaks corresponding to the new quantum states of the two quantum bits after the off-diagonal double-quantum logic gate operation are respectively obtained. In particular, when the off-diagonal double quantum logic gate is a control logic gate, V ₅ 、V ₇ Corresponding to control bits, V ₆ 、V ₈ Corresponding to the controlled bits.

It will be appreciated that edge E, as shown in FIG. 3d ₅₆₇₈ And four vertexes V ₅ 、V ₆ 、V ₇ And V ₈ Connected so that it tensorFor 4 th order tensors, 2 is taken as a total ⁴ Let vertex sequence "V =16 elements ₅ V ₆ V ₇ V ₈ "values (" 0000 "to" 1111 ") correspond to +.>Middle (0000) ₂ ～（1111） ₂ Bit elements. Specifically, suppose the unitary matrix of the off-diagonal double-quantum logic gateWherein, when->When (I)>Not all 0, vertex sequence "V ₅ V ₆ V ₇ V ₈ Value and->The correspondence of each element in (a) is shown in table 1:

TABLE 1V ₅ V ₆ V ₇ V ₈ Value and U ₄ Correspondence of elements in (a)

It is known that the number of the components,。

for example, when edge E ₃₄ When the corresponding edge of the non-diagonal single-quantum logic gate CNOT gate is, it is known that the unitary matrix is Thus, tensor +.>。

It will be appreciated by those skilled in the art that any multiple quantum bit gate may be constructed with a single quantum logic gate plus any double quantum logic gate, in most cases the double quantum logic gate selecting the CNOT gate more. CNOT gates and single quantum logic gates are in a sense prototypes of all other gates. Therefore, the method provided by the application is also suitable for quantum programs with multiple quantum bit gates, and in practical application, the multiple quantum bit gates in the quantum programs can be converted into the combination of single quantum logic gates and double quantum logic gates, and then the single-amplitude quantum computing simulation method provided by the invention is applied.

S203, obtaining a quantum state corresponding to a target single amplitude to be measured, and calculating the sub-amplitude of the quantum state based on the quantum state and the undirected graph and by matching with the GPU corresponding to the calculation node; wherein the sub-amplitude is the amplitude corresponding to the undirected graph;

in practical application, when the simulation of the equivalent calculation involves a plurality of quantum bits, the Dirac sign is directly used>It would be very inconvenient to represent each quantum state in combination with a binary representation method. Thus, the decimal numbers corresponding to the binary representation method are commonly used for representing, such as |000 >I.e. zero state, |0100>I.e. 4 state. It will be appreciated that if the target quantum state component is a decimal number, the host node will convert it to a binary string and then send it to each compute node. Each bit in the binary string corresponds to a value of a quantum bit, and the low-order to the high-order quantum bits correspond to the low-order to the high-order quantum bits in sequence, and it is to be noted that, for a quantum computer, the low-order and the high-order arrangement are all arranged from the low-order to the high-order in the right-to-left order, similar to a classical computer. For example, assuming that the target quantum state component is 5-state, it is converted into binary form, i.e. "101", corresponding to 3 qubits (q in order from low order to high order ₀ 、q ₁ 、q ₂ ) Then "101" corresponds to "q", respectively ₂ q ₁ q ₀ ”。

Specifically, based on the quantum state and the undirected graph, and in cooperation with the GPU corresponding to the computation node, computing the sub-amplitude of the quantum state may include:

invoking GPU corresponding to the computing node, and respectively performing definite value reduction on tensors of edges connected with specific vertexes of the undirected graph; wherein the specific vertex is the first vertex and the last vertex of the vertex chain corresponding to each quantum bit; deleting the specific vertex; receiving the value of a target vertex distributed by a main node, splitting a current undirected graph based on the value of the target vertex, and calling the GPU to respectively determine value reduction for tensors of connecting edges of the target vertex aiming at each sub undirected graph obtained by splitting; aiming at each vertex in the sub undirected graph, combining the GPU to fuse all the connecting edges of the vertex into a new edge, reducing the tensor of the new edge, and deleting the vertex; integrating tensor values of all new edges after the order reduction to obtain a first sub-amplitude of the quantum state, which corresponds to the sub-undirected graph; and summing the first sub-amplitudes of all sub-undirected graphs in the computing node to obtain the sub-amplitudes of the quantum state.

Finally, all vertices in each sub undirected graph are deleted, leaving only edges with tensor order 0, with the 0-order tensor being a scalar. The tensor values of the edges are multiplied to obtain a first sub-amplitude of the quantum state in the compute node.

For example, for a sub undirected graph, all vertices and their connecting edges are S ₁₂ 、S ₁₆ 、S ₂₅ 、S ₃ 、S ₄ ：

For vertex 1, S ₁₂ 、S ₁₆ Fused into a new edge S ₁₂₆ For tensor A thereof ₁₂₆ Reducing the order to A ₂₆ The corresponding edge becomes S ₂₆ Deleting the vertex 1;

for vertex 2, the current connecting edge S ₂₅ 、S ₂₆ Fused into a new edge S ₂₅₆ For tensor A thereof ₂₅₆ Reduced to A ₅₆ Corresponding edge S ₅₆ Deleting the vertex 2;

for vertex 3, only edge S is connected ₃ For tensor A thereof ₃ Reduced to a=x ₃ (scalar) the corresponding edge becomes tensorEdge s with order of 0 ₃ Deleting the vertex 3;

for vertex 4, only edge S is connected ₄ For tensor A thereof ₄ Reduced to A ^′ =x ₄ (scalar) edge s for which the corresponding edge becomes tensor order 0 ₄ Deleting the vertex 4;

for vertex 5, only edge S is currently connected ₅₆ Unchanged after fusion, for tensor A ₅₆ Reducing the order to A ₆ Corresponding edge S ₆ Deleting the vertex 5;

for vertex 6, only edge S is currently connected ₆ For tensor A thereof ₆ Reduced to A ^′′ =x ₆ (scalar) edge s for which the corresponding edge becomes tensor order 0 ₆ Vertex 6 is deleted.

Calculating a first sub-amplitude x corresponding to the sub-undirected graph ₃ *x ₄ *x ₆ 。

For another example, for a sub undirected graph, all the current vertices and their connecting edges are S ₁₂₃ 、S ₁₂₄ 、S ₁₅ 、S ₄₆ ：

For vertex 1, S ₁₂₃ 、S ₁₂₄ 、S ₁₅ Fused into a new edge S ₁₂₃₄₅ For tensor A thereof ₁₂₃₄₅ Reducing the order to A ₂₃₄₅ The corresponding edge becomes S ₂₃₄₅ Deleting the vertex 1;

for vertex 2, only edge S is currently connected ₂₃₄₅ For tensor A thereof ₂₃₄₅ Reduced to A ₃₄₅ Corresponding edge S ₃₄₅ Deleting the vertex 2;

for vertex 3, only edge S is connected ₃₄₅ For tensor A thereof ₃₄₅ Reduced to A ₄₅ Corresponding edge S ₄₅ Deleting the vertex 3;

for vertex 4, the current connecting edge S ₄₅ 、S ₄₆ Fused into a new edge S ₄₅₆ For tensor A thereof ₄₅₆ Reduced to A ₅₆ Corresponding edge S ₅₆ Deleting the vertex 4;

for vertex 5, only edge S is currently connected ₅₆ For tensor A thereof ₅₆ Reducing the order to A ₆ Corresponding edge S ₆ Deleting the vertex 5;

for vertex 6, only edge S is currently connected ₆ For tensor A thereof ₆ Reduced to b=y ₆ (scalar) edge s for which the corresponding edge becomes tensor order 0 ₆ Vertex 6 is deleted.

Calculating a first sub-amplitude corresponding to the sub-undirected graph as y ₆ 。

Specifically, in order to reduce the calculation amount of the subsequent undirected graph, the tensor of the edge connected with the specific vertex of the undirected graph is respectively subjected to definite value reduction, which may be:

calculating GPU corresponding to the node, and setting the number of thread blocks according to the tensor order after the order reduction and the number of threads in each thread block in the GPU aiming at the edge connected with each specific vertex; according to the thread block serial numbers, the thread number in each thread block and the thread serial numbers, calculating first element numbers of the reduced tensors, and calculating two second element numbers of the reduced tensors corresponding to the first element numbers; the number of bits of the element number corresponds to the number of vertex bits connected with the current edge one by one, and the value of each bit of the element number is the vertex value of the corresponding vertex position; determining a second element number with a value on a number bit corresponding to a specific top point position as a preset determined value from the two second element numbers; and acquiring a second element value corresponding to the determined second element number, and determining the second element value as a first element value corresponding to the first element number.

Wherein, the calculation formula (the following is the same) of the first element number is:

Idx=block_id*num+thread_id（2）

idx is the first element number, block_id is the thread block number, num is the number of threads in each thread block, and thread_id is the thread number.

The specific implementation mode of determining the value reduction is as follows: firstly, the tensor elements may be complex numbers, and since the GPU does not (the CPU has) have a representation form of complex number x+yi, a new real part tensor space and an imaginary part tensor space need to be applied, wherein the real part tensor space is specified to store x, and the imaginary part tensor space stores y, so as to characterize the tensor.

Obtain the obtainedThe reduced order n sent by the computing node (CPU) is judged to be 2 ⁿ If not, setting 1 thread block if not, otherwise, setting the number of GPU thread blocks to 2 ⁿ And/num, the num value satisfies the power of a non-negative integer of 2. Due to the nature of the computer itself, the thread block sequence number block_id and thread sequence number thread_id tend to start numbering from 0.

Taking GPU as GTX1080ti as an example, num=1024 of GTX1080ti, thread_id=0, 1, 2 … … 1023. For example, connecting edge E to vertex number 2 (vertex 2 for short) ₁₃₂₅₄ Tensor a of 5 th order of (2) ₁₃₂₅₄ Performing definite value reduction, namely, for A ₁₃₂₅₄ The 3 rd order (or 3 rd dimension, hereinafter the same) of the determined value is reduced, wherein the dimensions are numbered from right to left, e.g., vertex 1 is 5 th dimension and vertex 4 is 1 st dimension.

Assuming that the determined value is 0 (or 1, which may be set by itself as needed), it means that the vertex 2 takes a value of 0. A first order reduction operation is typically referred to as a first order reduction, so n=4, 2 ⁿ Less than 1024, the thread block is 1 block, blcok_id=0. Reduced 4 th order tensor A ₁₃₅₄ Only 16 elements are provided, one thread calculates an element number from the thread 0, and only the threads 0 to 15 are called, wherein the first element number calculation can be as follows: idx=0, 1, 2 … ….

For the first element number 10 (binary 1010), two second element numbers before their corresponding reduction are calculated as follows:

will 2 ^3-1 -1=3 (binary 11, reduced to the mth order, then calculate 2 ^m-1 The value of-1, with the aim of splitting the element number) and 10 by bit and, resulting in 2 (binary 10). The bit-wise inversion of 3 (binary 1111 … … 1100 after inversion, the number of binary bits being dependent on the data type, e.g. 64 bits, etc.), and the bit-wise AND operation of 10 results in 8 (1000), and the binary split of 10 is achieved, resulting in the first two and the second two bits of 10. The two second element numbers with the vertex determination values of 0 and 1 corresponding to the subscript 2 before order reduction are respectively: 8 (1000) is shifted left by one bit (10000) and 2 (10) to perform or operation, and 18 (10010) is obtained; the reduced order is 3 rd order, pairs 18 (10010) and 4 (equivalent At 1<<(3-1), i.e., the mth order is reduced, 1 is performed<<The left shift operation of (m-1), m is more than or equal to 1 and less than or equal to n+1) is performed or operated, and 22 (10110) is obtained.

Because the set determination value is 0, that is, the value on the 3 rd bit of the element number corresponding to the 3 rd specific vertex is 0, the bit of the binary element number and the value on the bit are uniformly and correspondingly same with the vertex bit and the vertex value on the vertex point, thereby determining 18 to be the second element number which is required by the first element number 10 to obtain A ₁₃₂₅₄ Element value p of element number 18 of (2) as reduced A ₁₃₅₄ Is the first element value of element number 10. Similarly, A can be obtained by calculation ₁₃₅₄ The first element values corresponding to the rest first element numbers finally obtain the tensor A after the reduction ₁₃₅₄ ={p ₀ ,p ₁ ,p ₂ …p…p ₁₄ ,p ₁₅ }。

Then, the original reduced order pre-tensor A is released ₁₃₂₅₄ The occupied video memory.

Specifically, to reduce the computational complexity, one or more values of the target vertices equally divided by the master node may be received; wherein the target vertexes are the first m vertexes with the largest number of connected edges in the current undirected graph, and the m vertexes comprise 2 ^m Seed value, wherein the value of each vertex is 0 or 1, m is a positive integer, and the number of the calculation nodes is 2 ⁿ N is a positive integer, and n is greater than 0 and less than or equal to m; splitting an undirected graph of the computing node into one or more sub undirected graphs aiming at each vertex value which is evenly divided; and traversing the edges connected with the target vertexes aiming at each sub undirected graph, and calling the GPU to respectively perform definite value reduction on tensors of the edges connected with the target vertexes.

The number of the computing nodes is a preset value, namely n is a preset value, the number of the computing nodes is related to written quantum programs and available computing resources correspondingly configured, and the value m is preset according to requirements. If n=1 and m=1, for the 1 vertices with the largest number of connected edges, the vertex value is originally uncertain, and there are 2 types: 0 or 1. And distributing the 2 value conditions to 2 computing nodes for computing respectively, wherein the vertex value of the undirected graph in one computing node is determined to be 0, so as to obtain one sub undirected graph, the vertex in the undirected graph in the other computing node is determined to be 1, so as to obtain the other sub undirected graph, and splitting the undirected graph is realized.

For another example, n=1, m=2, and there are 4 cases: 00. 01, 10, 11, then 2 computing nodes equally divide 4 values of the top 2 vertices with the largest connecting edges: 00. 01, 10, 11, one computing node is divided into 00, 01, and the other computing node is divided into 10, 11. Equivalently, the undirected graph in one computing node is split into 2 sub undirected graphs, wherein the values of the 2 vertexes in one sub undirected graph are respectively 0 and 0, and the values in the other sub undirected graph are respectively 0 and 1. Similarly, the values of the two vertices in the undirected graph of the other computing node are respectively 1 and 0, and the values of the two vertices in the undirected graph of the other computing node are respectively 1 and 1.

The method is characterized in that the method is used for standing at the angles of all computing nodes, each computing node originally comprises the same undirected graph, so that the undirected graph of each computing node is split into one or more different undirected sub-graphs, each undirected sub-graph can be regarded as a path, first sub-amplitudes corresponding to each path in one computing node are calculated, and all first sub-amplitudes in the interior are summed to obtain sub-amplitudes of a quantum state corresponding to the current node, so that the idea of a path integration method (FeymannPath Integral) is reflected, and the method is not based on a unitary matrix transformation method, because the memory occupation of the sub-undirected graph is exponentially increased.

Taking n=1 and m=2 as an example, the most connected edges in the current undirected graph are vertex 3 and vertex 5. All edges connected to vertex 3 or 5 are assumed to be E ₃₂₄₁ 、E ₃₆₁ 、E ₃₂ 、E ₃₅₄ 、E ₁₂₅ 、E ₄₅ 、E ₅₆ Traversing the edges, calling the GPU subordinate to the computing node with 00 values, respectively carrying out definite value reduction on tensors of the edges in the undirected graph, wherein the principle is the same as that of respectively carrying out definite value reduction on tensors of the edges connected with specific vertexes of the undirected graph, and the method is exemplified as follows again:

for edge E ₃₂₄₁ Corresponding to 4 th order tensor T ₃₂₄₁ In the case of taking the value of 00 for the vertices 3 and 5, since the vertex 5 is not connected to the edge, only T is needed ₃₂₄₁ The 4 th order of the vertex 3 of (2) is subjected to definite value reduction. The GPU still applies for new real part tensor and imaginary part tensor to obtain reduced order n sent by the affiliated computing node (CPU), and judges 2 ⁿ If not, setting 1 thread block if not, otherwise, setting the number of GPU thread blocks to 2 ⁿ /num。

Taking GTX1080ti as an example, num=1024, thread_id=0, 1, 2 … … 1023 of GTX1080 ti. The vertex 3 takes a value of 0, i.e. the determined value is 0. A first order reduction operation is typically referred to as a first order reduction, so n=3, 2 ⁿ Less than 1024, the thread block is 1 block, blcok_id=0. Reduced 3-order tensor T ₂₄₁ Only 8 elements are provided, one thread calculates an element number from the thread 0, and only threads 0 to 7 are called, wherein the first element number is calculated by using the formula (2): idx=0, 1, 2 … … 7.

For the first element number 5 (binary 101), two second element numbers before their corresponding reduction are calculated as follows:

for the 4 th order, 2 ^4-1 -1=7 (binary 111) and 5, yielding 5 (binary 101). And performing bit inversion on 7 (binary 1111 … … after inversion), performing AND operation on the sum 5 to obtain 0, and realizing binary splitting on the sum 5. The numbers of the two second elements with the determined values of 0 and 1 of the vertex 3 corresponding to the subscript 3 before order reduction are respectively as follows: 0 shift left by one bit and 5 to perform or operation to obtain 5 (0101); for 5 (0101) and 8 (1000) (corresponding to 1) <<(4-1), i.e., the mth order is reduced, 1 is performed<<The left shift operation of (m-1), where 1.ltoreq.m.ltoreq.n+1), gives 13 (1101).

Since each binary bit value of the element number represents the value of the corresponding vertex, 0101 represents T ₃₂₄₁ Edge E of (2) ₃₂₄₁ The values of the vertexes 3, 2, 4 and 1 are sequentially 0, 1, 0 and 1, so that the finally required second element number with the first element number 5 as 5 is determined, and T is obtained ₃₂₄₁ The second element value with element number of 5 is used as T after the reduction ₂₄₁ First element value w of element number 5 in ₅ . The same can be calculated to obtain the first element values corresponding to the rest of the first element numbers, and finally the first element values are obtainedReduced tensor T ₂₄₃ ={w ₀ ,w ₁ ,w ₂ ,w ₃ ,w ₄ ,w ₅ ,w ₆ ,w ₇ }。

Then, the original reduced order pre-tensor T is released ₃₂₄₁ The occupied video memory.

In addition, for edge E ₃₅₄ In the case of the vertex 3, 5 of (3) taking a value of 00, since each reduction is only one, the edge E can be opposite first ₃₅₄ Tensor T of (1) ₃₅₄ The 3 rd order of the falling vertex 3 is T ₅₄ At T ₅₄ The 2 nd order of the falling peak 5 is T ₄ 。

The reduced order calculation of the edges in the other undirected graph and the value condition is the same as the above, and the description is omitted.

Specifically, the fusing all the connection edges of the vertex with the GPU into a new edge includes:

determining a first edge and a second edge to be fused for all the connecting edges of the vertex; according to the vertex which is not connected with the first edge in the second edge, calling the GPU to upgrade the first tensor of the first edge, and updating the first tensor by the tensor after upgrade; deleting the second edge, and connecting the vertex which is not connected with the first edge in the second edge to the first edge to obtain a fused middle edge; invoking the GPU to calculate tensor elements of the middle edge according to the corresponding relation between the recorded vertex numbers of the first edge and the second edge; and returning to the execution step of determining the first edge and the second edge to be fused until the calculated tensor element is the tensor element of the last edge, and determining the last edge as a new fused edge.

In practical applications, the tensor of one edge is generally raised for the fusion of any two edges, so that the edge with the largest order in all the connected edges of the vertex is generally selected as the first edge.

It is understood that the second edge is an unfused edge of the remaining edges except the first edge.

Specifically, one side can be directly selected from the remaining sides as the second side; the connection edges of the vertices may be ordered according to the order of the vertices, but not limited to the order from small to large or from large to small, and then the edge with the largest order is determined as the first edge, and the edge with the largest order in the remaining unfused edges is determined as the second edge.

For example, vertex V _n The total of 4 connected edges are respectively: e (E) _n1 、E _n2 、E _n3 、E _n4 The corresponding orders are respectively: 3. 2, 4 and 2. Firstly, ordering edges according to the order from big to small to obtain: e (E) _n3 、E _n1 、E _n2 、E _n4 (or E) _n3 、E _n1 、E _n4 、E _n2 ) It can be seen that E _n3 The maximum order is determined as the first edge, and E is the three edges which are not fused _n1 The order is the largest and is determined to be the second edge.

And comparing the vertexes of the first edge and the second edge, determining the vertexes which are connected with the second edge but are not connected with the first edge, then executing the ascending operation on the first tensor according to the determined vertexes, and updating the first tensor with the new tensor after ascending.

For example, assume that the vertex number is 2 and the corresponding first edge is E ₁₂ Connecting vertices 1 and 2, the corresponding first tensor is A ₁₂ = {1,2,3,4}; the second side is E ₂₃ Connecting vertices 2 and 3, corresponding to a second tensor B ₂₃ = {5,6,7,8}. Comparing the vertices connected by the two sides, it can be seen that vertex 3 is a vertex connected to the second side and disconnected from the first side. Accordingly, will A ₁₂ Ascending order A ₁₂₃ A is known from the following ₁₂₃ = {1,2,3,4}, then the first tensor corresponding to the first edge is updated to a ₁₂₃ 。

Wherein the ascending computation of the first tensor for the first edge is performed by the GPU, the principle may be as follows:

the GPU performs ascending order on the first tensor of the first edge, and the tensor order t=r+s after ascending order can be calculated according to the order r and the ascending order s of the first tensor, wherein r and s are positive integers; according to the tensor order t after the rising orderAnd the number num of threads in each thread block in the GPU, setting the number of thread blocks, wherein if 2 ^t Setting 1 thread block smaller than num, otherwise setting the number of GPU thread blocks to 2 ⁿ /num; according to the thread block sequence number block_id, the thread number num and the thread sequence number thread_id in each thread block, calculating a first element number idx of the tensor after the step-up, wherein a calculation formula is the same as the formula (2); the element of the tensor after the ascending is calculated from the first element number idx, the ascending order s and the element of the first tensor.

Continuing taking GTX1080ti as an example, receiving from a computing node ascending order instruction information, a first tensor A before ascending order ₁₂ = {1,2,3,4}, which corresponds to element numbers 0, 1,2,3 (corresponding to binary 00, 01, 10, 11), rising first order to a ₁₂₃ R=2, s=1, t=3. Judgment 2 ^t Less than num (1024), configure 1 thread block, block_id=0. As in the previous case, thread_id=0, 1 … … 1023, due to a ₁₂₃ Only 8 elements are needed, and only threads 0-7 are required to be called, and each thread is calculated by a formula (2) to obtain A ₁₂₃ Idx of (a) is 0, 1,2 … … 7 in this order. Assigning a value to the element whose first element number is idx: dst [ idx ]]=src[idx/2 ^s ]A representing the first element number idx ₁₂₃ Element value dst [ idx ]]Equal to element number idx/2 ^s Is a whole-up part of ₁₂ Element value, rounding off refers to removing the integer portion remaining after the decimal point.

From A ₁₂₃ Idx/2 is calculated by idx of (2) ^s =idx/2, 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, in order, obtainable:

dst[0]=src[0]=1，dst[1]=src[0.5]=src[0]=1；

dst[2]=src[1]=2，dst[3]=src[1.5]=src[1]=2；

dst[4]=src[2]=3，dst[5]=src[2.5]=src[2]=3；

dst[6]=src[3]=4，dst[7]=src[3.5]=src[3]=4。

thereby obtaining A ₁₂₃ = {1,2,3, 4}. Similarly, if pair A ₁₂ Raise 2 nd order to A ₁₂₃₄ Can obtain A ₁₂₃₄ ={1，1，1，1，2，2，2，2，3，3，3，3，4，4，4，4}。

Then the first tensor before the rising step is taken as a tensor A of 10 steps _12345678910 For example, = {1,2,3 … … 1024} corresponds to element numbers 0, 1,2 … … 1023 (corresponding to binary 0000000000, 0000000001 … … 1111111111), rising first order to a _{1234567891011} R=10, s=1, t=11. Judgment 2 ^t Greater than num (1024), configuration 2 ^t The number of per num, namely 2 thread blocks, the block_id=0 and 1, the thread number thread_id in each thread block is 0 and 1 … … 1023, and the number of tensor elements after the step-up is 2 ¹¹ The idx obtained by calculating the number 0 thread block is sequentially 0, the idx obtained by calculating the number 1 thread block and the number 2 … … 1023,1 thread block is sequentially 1024 and 1025 … … 2047, namely A _{1234567891011} Idx of (2) is 0 to 2047.

Calculation of idx/2 ^s =idx/2, 0, 0.5, 1, 1.5 … … 1023, 1023.5, in order, yields:

dst[0]=src[0]=1，dst[1]=src[0.5]=src[0]=1；

dst[2]=src[1]=2，dst[3]=src[1.5]=src[1]=2；

……

dst[2046]=src[1023]=1024，dst[2047]=src[1023.5]=src[1023]=1024。

thereby obtaining A _{1234567891011} ={1，1，2，2……1024，1024}。

And after the tensor after the rising step is obtained, releasing the video memory occupied by the original first tensor.

Still take the first edge as E ₁₂ The second side is E ₂₃ For example, the first tensor is updated to A ₁₂₃ After that, delete the second edge E ₂₃ Connecting the vertex 3 which is not connected with the first edge in the second edge to the first edge to obtain a fused intermediate edge E ₁₂₃ . Then according to the corresponding relation between the recorded vertex numbers of the first edge and the second edge, invoking the GPU to calculate the tensor C of the middle edge ₁₂₃ Is an element of (a).

The corresponding relation can be obtained by a first tensor A ₁₂₃ And a second tensor B ₂₃ And obtaining the product. Wherein A is ₁₂₃ The corresponding vertex sequence is "123", B ₂₃ The corresponding vertex sequence is "23". It can be seen that vertex 2 is at A ₁₂₃ Corresponding vertexIn the sequence is the 2 nd bit in B ₂₃ The corresponding vertex sequence is 1 st bit, and vertex number 3 is A ₁₂₃ The corresponding vertex sequence is the 3 rd bit, and the vertex sequence is B ₂₃ The corresponding vertex sequence is the 2 nd bit, and the position corresponding relation is recorded. Specifically, the element structure of the mask array may be stored in an array, for example, in the mask array: struct { vertex at A ₁₂₃ Positions in the corresponding vertex sequence, vertex at B ₂₃ Position in the corresponding vertex sequence }.

Specifically, the steps of calculating tensor elements of the middle edge may be as follows:

the GPU firstly applies for a plurality of groups of arrays, copies the Maskarray in the CPU of the computing node to the GPU and stores the Maskarray in the array;

setting the number of thread blocks according to the updated order q of the first tensor and the number num of threads in each thread block in the GPU, if 2 ^q Setting 1 thread block less than num, otherwise setting 2 ^q A num thread block; according to the thread block sequence number block_id, the thread number num and the thread sequence number thread_id in each thread block, calculating a first element number idx of tensor of the middle edge, wherein a calculation formula is the same as the formula (2);

determining corresponding elements of each element in the first tensor in a second tensor of the second edge according to the corresponding relation stored in the array; each element in the first tensor is traversed, and the element is updated with the product of the element and its corresponding element in the second tensor.

With a first tensor A ₁₂₃ = {1, 2,3, 4} and second tensor B ₂₃ For example = {5,6,7,8}, the corresponding relationship indicates that the vertices 2 are respectively at a ₁₂₃ And B ₂₃ The 2 nd bit and the 1 st bit in the corresponding vertex sequence, the vertex numbers 3 are respectively A ₁₂₃ And B ₂₃ Bits 3 and 2 in the corresponding vertex sequence.

First, the position number of the first tensor element is expressed as a binary number (value corresponding to the vertex "123"): (000) ₂ 、(001) ₂ 、(010) ₂ 、(011) ₂ 、(100) ₂ 、(101) ₂ 、(110) ₂ 、(111) ₂ The position numbers of the elements in the second tensor are expressed as binary numbers (values corresponding to the vertices "23"): (00) ₂ 、(01) ₂ 、(10) ₂ 、(11) ₂ At tensor A according to vertices 2 and 3, respectively ₁₂₃ And B ₂₃ The corresponding position correspondence in the corresponding vertex sequence can determine A ₁₂₃ Each element in B ₂₃ The corresponding elements in (a) are shown in table 1:

table 1A ₁₂₃ Each element in B ₂₃ Corresponding element in (a)

Wherein, in the second row of Table 1, the binary numbers are underlined to clarify B ₂₃ The middle vertex is at A ₁₂₃ The values in the position numbers of each element are more clear for the convenience of explanation and are not meant in any limiting sense.

Each element in the first tensor is traversed, multiplied by the corresponding element in the second tensor, and the product is then used to update the element.

In A way ₁₂₃ And B ₂₃ For example, A is first ₁₂₃ Each element of (a) and B ₂₃ The multiplication of the corresponding elements in (a) can be obtained as the following products in turn: 5. 6, 14, 16, 15, 18, 28, 32, and then updating A with these products ₁₂₃ Obtaining the middle edge E ₁₂₃ Tensor C of (2) ₁₂₃ ={5,6,14,16,15,18,28,32}。

The reduced order computation of tensors for the new edge is performed by the GPU, and the flow may be as follows:

firstly, the GPU sets the number of thread blocks according to the tensor order of the new edge after the order is reduced;

secondly, according to the serial numbers of the thread blocks, the number of threads in each thread block and the serial numbers of the threads, calculating the first element numbers of the reduced tensors, and calculating two second element numbers of the first element numbers corresponding to the reduced tensors; the number of bits of the element number corresponds to the number of vertex bits connected with the current edge one by one, and the value of each bit of the element number is the vertex value of the corresponding vertex position.

It should be noted that, the two steps are consistent with the calculation principle of the step corresponding to the step of determining the value reduction, and are not described herein.

And then, obtaining two second element values corresponding to the two second element numbers one by one, summing the two second element values, and determining the sum as a first element value corresponding to the first element number.

It can be seen that, compared with the calculation principle of determining value reduction, the difference is that determining value reduction requires finding out the second element number L1 with the value of the preset determined value on the number bit corresponding to a certain top point from the two second element numbers L1 and L2, so as to obtain the second element value corresponding to the second element number L1 as the first element value corresponding to the first element number, and the reduction calculation requires adding the two second element values corresponding to the two second element numbers L1 and L2 respectively, so as to obtain the first element value corresponding to the first element number.

Referring to the specific implementation manner and example of the above definite value reduction, assume a 5-order tensor A ₁₃₂₅₄ The rest conditions are consistent for the tensor of a new edge after the final fusion of the vertex 2. For the first element number 10 (binary 1010), two second element numbers 18 (10010), 22 (10110) before their corresponding reduction are calculated. Because the vertex 2 at the 3 rd order has an undetermined value (which can be 0 or 1), the two element numbers 18 and 22 are the corresponding element numbers of 10, and the element values p and p corresponding to the element numbers 18 and 22 are obtained ^′ Adding the obtained sums p ^′′ I.e. the first element value of the first element number 10, and so on, to finally obtain A ₁₃₂₅₄ Reduced tensor A ₁₃₅₄ 。

For another example, assuming that the vertex is vertex 2, a new edge E is obtained by merging all the connected edges ₁₂₃₄ The tensor is A ₁₂₃₄ = {5,5,6,6,14,14,16,16,15,15,18,18,28,28,32,32}. The corresponding vertex sequence of the new edge tensor is '1234', and a new vertex sequence '134' is obtained except the vertex 2, and the value of the new vertex sequence is equal to A ₁₂₃₄ Correspondence of elements in (3)The relationship is shown in Table 2:

table 2 values and A of the new vertex sequence "134 ₁₂₃₄ Correspondence of elements in (3)

Thus, to the target edge E ₁₂₃₄ After the order is reduced based on the vertex 2, deleting the vertex 2 to obtain a new edge E after the order is reduced ₁₃₄ Corresponding tensor A ₁₃₄ ={19,19,22,22,43,43,50,50}。

And S204, returning the sub-amplitudes to the main nodes of the distributed clusters, so that the main nodes reduce the sub-amplitudes to obtain the amplitudes of the quantum states as target single amplitudes.

The reduction is data reduction, namely, the data quantity is reduced to the greatest extent on the premise of keeping the original appearance of the data as much as possible. The main node sums all the sub-amplitudes by reducing the sub-amplitudes calculated by each calculation node to obtain the target single amplitude of the measured quantum state.

The existing CPU microarchitecture is designed for high efficiency of instruction execution, and has strong logic processing (instruction execution) performance and high efficiency, but the number of GPU threads is more (hundreds or thousands), so that the CPU is focused on large-scale concurrent computation, and the numerical computation efficiency is generally about 5 to 10 times higher than that of the CPU. Based on this, in the embodiment of the present invention, the related main calculation tasks include determining the tensor element calculation of the intermediate edge of the value reduction, rising and fusion processes, etc., which are all distributed to the GPUs subordinate to each calculation node CPU for execution, and the CPU mainly executes the logic processing task, and the two tasks cooperate with each other, so that the single-amplitude quantum calculation simulation efficiency is at a higher level.

Therefore, the method can only calculate one target single amplitude of the concerned quantum bit at a time, specifically map the target quantum program onto the undirected graph, split the undirected graph onto a plurality of calculation nodes by combining a path integration method, and calculate the corresponding sub undirected graph by each calculation node and the subordinate GPU in a matching way. The whole calculation process is mainly based on simple operation of elements in tensors, compared with full-amplitude simulation based on unitary matrix in the prior art, the requirement on memory is greatly reduced, and the calculated amount does not rise along with the quantum bit index, so that quantum calculation simulation involving 50 or more quantum bits can be realized; the GPU has stronger performance of executing a large amount of parallel computation, so that the overall quantum computation simulation efficiency is higher. At present, the technical scheme provided by the embodiment of the invention can realize quantum computing simulation involving 196 quantum bits at maximum.

While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims

1. A single amplitude quantum computing simulation method, the method comprising:

constructing an undirected graph corresponding to the target quantum program; the vertex of the undirected graph represents the quantum state of the operated quantum bit before or after the quantum logic gate in the target quantum program is operated, and one edge of the undirected graph correspondingly represents one tensor of the quantum logic gate in the target quantum program;

Obtaining a quantum state corresponding to a target single amplitude to be measured, carrying out definite value reduction on tensors of edges of the undirected graph based on the quantum state and the undirected graph and matching with a GPU corresponding to the computing node until all new edges with the tensor order of 0 after reduction are obtained, and calculating the sub-amplitude of the quantum state based on the tensors of the new edges; wherein the sub-amplitude is the amplitude corresponding to the undirected graph;

2. The single amplitude quantum computing simulation method of claim 1, wherein: the constructing the undirected graph corresponding to the target quantum program comprises the following steps:

and obtaining an undirected graph corresponding to the target quantum program.

3. The single amplitude quantum computing simulation method of claim 2, wherein: determining value reduction is carried out on tensors of edges of the undirected graph on the basis of the quantum state and the undirected graph and in cooperation with the GPU corresponding to the computing node until all new edges with the tensor order of 0 after the reduction are obtained, and sub-amplitudes of the quantum state are calculated on the basis of tensors of the new edges, wherein the method comprises the following steps:

deleting the specific vertex;

4. A single amplitude quantum computing simulation method according to claim 3, wherein: and respectively carrying out definite value reduction on tensors of edges connected with specific vertexes of the undirected graph, wherein the method comprises the following steps:

5. The single amplitude quantum computing simulation method of claim 4, wherein: the method includes the steps that the value of a target vertex distributed by a main node is received, a current undirected graph is split based on the value of the target vertex, and the GPU is called for each sub undirected graph obtained through splitting to respectively determine value reduction of tensors of connecting edges of the target vertex, wherein the steps include:

6. A single amplitude quantum computing simulation method according to claim 3, wherein: the step of fusing all the connection edges of the vertex into a new edge by matching with the GPU comprises the following steps:

7. The single amplitude quantum computing simulation method of claim 6, wherein: the step-up the first tensor of the first edge includes:

8. The single amplitude quantum computing simulation method of claim 6, wherein: the computing tensor elements of the intermediate edge includes:

9. The single amplitude quantum computing simulation method of claim 6, wherein: the reducing the tensor of the new edge includes:

10. The single amplitude quantum computing simulation method according to any one of claims 4-5 and 7-9, wherein: the calculation formula of the first element number is as follows:

Idx=block_id*num+thread_id