Low-cost multi-chip high-speed high-bandwidth interconnection structure
Technical Field
The invention discloses a low-cost multi-chip high-speed high-bandwidth interconnection structure, and relates to the technical field of system-level integration of artificial intelligence modules.
Background
Artificial intelligence (AI) is the science and technology of studying and developing theories, methods, techniques and applications for simulating, extending and expanding human intelligence. As artificial intelligence continues to develop in fields such as machine learning, high-performance computing, image recognition and genetic engineering, the amount of data it must process is growing explosively. In certain AI training applications, the bandwidth requirement of a single SoC may exceed several TB/s. Conventional chip and system integration methods can no longer meet the needs of artificial intelligence technology. At present, silicon interposer technology is the conventional approach to artificial intelligence system integration; well-known products using it include NVIDIA's Tesla A100, the Radeon VII and Google's TPU 3.0.
TW201826403A, "Methods of forming CoWoS structures", discloses a method for producing TSMC CoWoS (Chip on Wafer on Substrate) structures. To improve the integration level, the SoC and HBM (High Bandwidth Memory) are integrated on a silicon interposer to form a high-bandwidth computing module. However, silicon interposer technology is difficult and costly to manufacture. In addition, this integration approach has a theoretical ceiling, which is set by the data processing capability of a single SoC.
Conventional artificial intelligence system integration depends too heavily on the progress of Moore's law: a super chip with strong computing power is designed by continuously raising the transistor integration level of a single chip, a high-cost silicon interposer is selected, and the chip is integrated with HBM before conventional packaging. However, as Moore's law approaches its end, transistor dimensions are nearing their limit and the corresponding fabrication costs keep rising. In addition, because of the material properties of silicon, a silicon interposer is inherently unsuited to large-area integration.
Disclosure of Invention
To address the deficiencies of the prior art, the invention provides a low-cost multi-chip high-speed high-bandwidth interconnection structure.
The invention adopts the following technical scheme to solve the technical problem:
The low-cost multi-chip high-speed high-bandwidth interconnection structure comprises a plurality of active chips, an interconnection substrate and flip-chip interconnection bumps, wherein the interconnection substrate provides metal interconnection and mechanical support among the active chips, the active chips exchange data through an ultra-high-speed interface and jointly carry the artificial intelligence computing load, and the active chips are connected to the interconnection substrate by flip-chip bonding through the flip-chip interconnection bumps.
The plurality of active chips carry the computing load of the artificial intelligence system. If the computing requirement of the system exceeds the data throughput and computing power of a single active chip, all the active chips can acquire externally transmitted data simultaneously, exchange data with one another through the ultra-high-speed interface, and jointly carry the artificial intelligence computing load. The main advantage of the invention is that it proposes parallel computing with a plurality of identical active chips, which lowers the computing power required of any single chip and greatly reduces chip design and manufacturing cost.
As a preferred scheme, the active chips are identical in structure and function. Each active chip is a separate small system that can either be interconnected with external data independently or be integrated into an array. The number of integrated chips can be chosen flexibly according to the computing power required by the system. Unlike heterogeneous integration of different chips into different systems, this array-type integration greatly reduces the integration difficulty.
Preferably, the ultra-high-speed interface interconnecting the active chips includes NRZ, PAM4, PAM8 and other high-speed interconnection interfaces. High-speed interfaces in commercial use at present can reach a single-channel rate of 112 Gbps, which can meet the high-speed high-bandwidth interconnection requirement.
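As an illustrative aside, the relationship between line rate, modulation format and the Nyquist frequency of the channel can be sketched as follows. This is a minimal sketch, not part of the disclosed structure: it assumes ideal signalling with no coding or FEC overhead, and the function names and the `BITS_PER_SYMBOL` table are hypothetical.

```python
# Bits carried per symbol for each modulation format named in the text.
# NRZ has 2 levels (1 bit), PAM4 has 4 levels (2 bits), PAM8 has 8 levels (3 bits).
BITS_PER_SYMBOL = {"NRZ": 1, "PAM4": 2, "PAM8": 3}


def symbol_rate_gbaud(bit_rate_gbps: float, scheme: str) -> float:
    """Symbol rate in GBd for a given raw bit rate (assumes no coding overhead)."""
    return bit_rate_gbps / BITS_PER_SYMBOL[scheme]


def nyquist_ghz(bit_rate_gbps: float, scheme: str) -> float:
    """Nyquist frequency in GHz, i.e. half the symbol rate."""
    return symbol_rate_gbaud(bit_rate_gbps, scheme) / 2


if __name__ == "__main__":
    for scheme in BITS_PER_SYMBOL:
        print(scheme, symbol_rate_gbaud(112, scheme), nyquist_ghz(112, scheme))
```

Under these assumptions, a 112 Gbps PAM4 lane runs at 56 GBd with a 28 GHz Nyquist frequency, whereas the same rate in NRZ would require a 56 GHz Nyquist frequency, which is one reason multi-level formats are attractive on low-cost substrates.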
The metal interconnection comprises a horizontal interconnection and a vertical interconnection. The horizontal interconnection is provided mainly by metal traces, which can be fabricated by exposure and etching; the vertical interconnection is formed by laser-drilled blind or buried vias or by mechanically drilled through holes.
In particular embodiments, the choices available for the flip-chip interconnect bumps include C4 bumps, copper bumps, or other metal bumps.
Preferably, the flip-chip interconnection bumps follow a specific arrangement in which pairs of differential signal bumps are not adjacent to one another: neighboring pairs are separated by a pair of ground bumps. This specific bump array can greatly reduce crosstalk between different channels without affecting the interconnection performance of a single channel.
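The arrangement rule can be illustrated with a short sketch. It is a simplified one-dimensional model under stated assumptions: the actual array in the drawings is two-dimensional and is not reproduced here, and the bump labels (`S0+`, `S0-`, `G`) and function names are hypothetical.

```python
def bump_row(num_pairs: int) -> list[str]:
    """Build one row of bumps: differential pairs with a ground pair between neighbours."""
    row: list[str] = []
    for i in range(num_pairs):
        if i > 0:
            # A pair of ground bumps separates consecutive differential pairs.
            row += ["G", "G"]
        row += [f"S{i}+", f"S{i}-"]
    return row


def pairs_are_isolated(row: list[str]) -> bool:
    """Return True if no signal bump sits next to a signal bump of a different pair."""
    for a, b in zip(row, row[1:]):
        both_signal = a.startswith("S") and b.startswith("S")
        # a[:-1] strips the polarity sign, leaving the pair identifier (e.g. "S0").
        if both_signal and a[:-1] != b[:-1]:
            return False
    return True


if __name__ == "__main__":
    row = bump_row(3)
    print(" ".join(row))
    print("isolated:", pairs_are_isolated(row))
```

A row built without the ground pairs, for example `["S0+", "S0-", "S1+", "S1-"]`, fails the check because `S0-` and `S1+` are direct neighbours.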
Compared with the prior art, the technical scheme has the following technical effects: multiple chips are integrated using low-cost substrate technology, which enables multi-chip parallel computing and greatly increases the computing power of the artificial intelligence system, so that improving system performance no longer depends solely on improving the computing power of a single SoC. For the same computing power, the multi-chip parallel architecture can greatly reduce design difficulty and production cost compared with a single-SoC architecture.
Drawings
FIG. 1 is a cross-sectional view of a package structure in one embodiment of the invention;
FIG. 2 is a top view of a package structure including a plurality of active chips and their interconnections in accordance with one embodiment of the present invention;
FIG. 3 illustrates a single active die and bump arrangement in accordance with an embodiment of the present invention;
FIG. 4 is a graph of simulation results using a three-dimensional full wave electromagnetic field simulation tool in accordance with the present invention;
FIG. 5 is a diagram showing the results of an eye simulation at the receiving end in accordance with the present invention;
In the drawings: 101 - active chip; 102 - flip-chip interconnection bump of the chip; 103 - metal interconnection between the active chips; 104 - substrate vertical electrical interconnection; 105 - substrate horizontal electrical interconnection; 106 - solder ball; 201 - signal bump; 202 - power supply bump.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
The technical scheme of the invention is further described in detail below with reference to the accompanying drawings:
Referring to FIGS. 1 and 2, the invention provides a low-cost multi-chip high-speed high-bandwidth interconnection structure suitable for artificial intelligence system integration, comprising a plurality of active chips, a low-cost interconnection substrate and flip-chip interconnection bumps. The structure is shown schematically in FIG. 1 and in top view in FIG. 2. The active chips 101 are all mounted on the low-cost interconnection substrate, which provides metal interconnection and mechanical support between the individual active chips 101. The low-cost interconnection substrate may be an organic dielectric substrate, a low-temperature co-fired ceramic substrate or a high-temperature co-fired ceramic substrate. The metal interconnections 103 between the active chips are implemented on the interconnection substrate and include substrate horizontal electrical interconnections 105 in the horizontal direction and substrate vertical electrical interconnections 104 in the vertical direction. The active chips exchange data through an ultra-high-speed interface, which includes NRZ, PAM4, PAM8 and other high-speed interconnection interfaces. The flip-chip interconnection bumps 102 of the chip, through which each active chip is connected to the interconnection substrate, include C4 bumps, copper bumps or other metal bumps. The flip-chip interconnection bumps should follow a specific arrangement, as shown in FIG. 2, in which pairs of differential signal bumps are not adjacent to one another but are separated by a pair of ground bumps. This specific bump array can greatly reduce crosstalk between different channels without affecting the interconnection performance of a single channel. A single active chip and its bump arrangement are shown in FIG. 3, where the signal bumps 201 are spaced apart from the power supply bumps 202.
A three-dimensional full-wave electromagnetic field simulation tool is used to model the passive interconnection structure, so that the transmission characteristics of structures based on the conventional bump arrangement and on the specific bump arrangement can be compared accurately. According to the simulation result shown in FIG. 4, in the passive structure with the specific bump array the crosstalk between different channels is below -40 dB at twice the Nyquist frequency, considerably better than the -30 dB isolation obtained with the conventional bump array. The simulation results show that the specific bump arrangement benefits the interconnection between active chips.
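To put the two isolation figures in perspective, the decibel values can be converted to linear ratios. The sketch below assumes the crosstalk is expressed as an S-parameter magnitude, for which the usual convention is 20·log10 of the amplitude ratio; the function name is hypothetical and the calculation is arithmetic only, not a reproduction of the simulation.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a magnitude in dB to a linear amplitude ratio (20*log10 convention)."""
    return 10 ** (db / 20)


if __name__ == "__main__":
    specific = db_to_amplitude_ratio(-40)      # specific bump arrangement
    conventional = db_to_amplitude_ratio(-30)  # conventional bump arrangement
    print(f"-40 dB couples {specific:.2%} of the aggressor amplitude")
    print(f"-30 dB couples {conventional:.2%} of the aggressor amplitude")
    print(f"amplitude reduction factor: {conventional / specific:.2f}")
```

Under this convention, -40 dB corresponds to about 1% of the aggressor amplitude coupling into the victim channel and -30 dB to about 3.2%, so the 10 dB improvement is roughly a threefold reduction in coupled amplitude.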
The time-domain eye diagram simulation tool ADS is used to cascade the active chip model (an IBIS-AMI model) with the S-parameter models of all passive channels and to simulate the eye diagram at the receiving end. According to the simulation result shown in FIG. 5, when the single-channel transmission rate is 112 Gbps, the eye diagram at the receiving end is still clearly open, which demonstrates the feasibility of high-speed high-bandwidth transmission between active chips. This rate refers to a single differential pair; when a channel contains multiple differential pairs, its data throughput is multiplied accordingly.
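The scaling with the number of differential pairs can be worked through numerically. The following is a minimal sketch under stated assumptions: raw line rate with no protocol or coding overhead, one direction only, decimal units (1 TB/s = 8000 Gbps), and hypothetical function names; the 112 Gbps default is the per-pair rate quoted in the text.

```python
import math


def aggregate_gbps(num_pairs: int, per_pair_gbps: float = 112) -> float:
    """Raw aggregate throughput of several differential pairs, in Gbps."""
    return num_pairs * per_pair_gbps


def pairs_needed(target_tbytes_per_s: float, per_pair_gbps: float = 112) -> int:
    """Smallest number of differential pairs whose raw rate meets a target in TB/s."""
    target_gbps = target_tbytes_per_s * 8 * 1000  # TB/s -> Gbps (decimal units)
    return math.ceil(target_gbps / per_pair_gbps)


if __name__ == "__main__":
    print(aggregate_gbps(8))   # raw rate of an 8-pair channel, in Gbps
    print(pairs_needed(1))     # pairs required for 1 TB/s
    print(pairs_needed(2))     # pairs required for 2 TB/s
```

Under these assumptions, an 8-pair channel carries 896 Gbps, and reaching 1 TB/s of raw bandwidth in one direction takes 72 differential pairs at 112 Gbps each.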
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention. The above are merely preferred embodiments and do not limit the invention in any form. While the invention has been described with respect to specific examples, including presently preferred modes of carrying out the invention, those skilled in the art will appreciate that there are numerous variations and permutations of the above-described embodiments that fall within the spirit and scope of the invention as set forth in the appended claims.