The central processing unit is commonly abbreviated as CPU.
Most CPUs, and indeed most sequential logic devices, are synchronous in nature. That is, they are designed around, and operate on, a synchronization signal. This signal, known as a clock signal, usually takes the form of a periodic square wave. By calculating the maximum time that electrical signals take to move through the various branches of a CPU’s many circuits, designers can select an appropriate period for the clock signal.
This period must be longer than the time it takes for a signal to move, or propagate, in the worst case. By setting the clock period comfortably above the worst-case propagation delay, designers can build the entire CPU, and the way it moves data, around the “edges” of the clock signal, that is, its rising and falling transitions. This has the advantage of simplifying the CPU significantly, both from a design perspective and in terms of component count. However, it also has the disadvantage that the entire CPU must wait for its slowest elements, even though some of its units are much faster. This limitation has been largely offset by various methods of increasing CPU parallelism (see below).
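As a rough illustration of the arithmetic, the Python sketch below picks a clock period from a set of worst-case path delays. The path names, the delay values, and the 25% safety margin are all hypothetical assumptions chosen only to show the calculation; the sketch also prints the slack of each path, which is the time a faster unit spends waiting for the slowest one in every cycle.

```python
# Hypothetical worst-case propagation delays (in nanoseconds) for a few
# signal paths; names and numbers are illustrative assumptions only.
PATH_DELAYS_NS = {
    "register_file_to_alu": 1.8,
    "alu_to_writeback": 1.2,
    "decoder_to_control": 0.9,
}

def choose_clock_period(path_delays_ns, margin=0.25):
    """Return a clock period (ns) longer than the slowest path.

    `margin` is an assumed safety factor: 0.25 makes the period 25%
    longer than the worst-case propagation delay.
    """
    worst_case = max(path_delays_ns.values())
    return worst_case * (1.0 + margin)

def max_frequency_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns

period = choose_clock_period(PATH_DELAYS_NS)
print(f"clock period: {period:.2f} ns")
print(f"frequency:    {max_frequency_mhz(period):.1f} MHz")

# Every path gets the same period, so faster paths sit idle for the rest
# of each cycle.
for name, delay in PATH_DELAYS_NS.items():
    print(f"{name}: {period - delay:.2f} ns of slack per cycle")
```

With these assumed numbers the period comes out at 2.25 ns (about 444.4 MHz), and the 0.9 ns decoder path spends 1.35 ns of every cycle waiting.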
However, architectural improvements alone do not solve all the drawbacks of globally synchronous CPUs. For example, a clock signal is subject to the same delays as any other electrical signal. Higher clock rates in increasingly complex CPUs make it more difficult to keep the clock signal in phase (synchronized) throughout the entire unit. This has led many modern CPUs to require multiple identical clock signals, so that no single signal is delayed significantly enough to cause the CPU to malfunction. Another major problem as clock rates increase dramatically is the amount of heat dissipated by the CPU. The clock signal changes constantly, causing many components to switch (change state) regardless of whether they are in use at the time. In general, a component that is switching uses more energy than one in a static state. Therefore, as the clock rate increases, so does heat dissipation, and the CPU requires more effective cooling solutions.
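The link between clock rate and heat is often approximated with the standard first-order model for dynamic (switching) power, P ≈ αCV²f, where α is the fraction of the capacitance that switches each cycle, C is the switched capacitance, V is the supply voltage, and f is the clock frequency. The Python sketch below evaluates this model with hypothetical chip-level values; it is an approximation that ignores static (leakage) power.

```python
def dynamic_power_watts(activity, capacitance_farads, voltage_volts, frequency_hz):
    """First-order dynamic (switching) power: P = activity * C * V^2 * f."""
    return activity * capacitance_farads * voltage_volts ** 2 * frequency_hz

# Hypothetical values chosen for illustration only.
ACTIVITY = 0.2        # fraction of the capacitance switched each cycle
CAPACITANCE = 50e-9   # 50 nF of total switched capacitance
VOLTAGE = 1.0         # supply voltage in volts

for freq_ghz in (1.0, 2.0, 4.0):
    p = dynamic_power_watts(ACTIVITY, CAPACITANCE, VOLTAGE, freq_ghz * 1e9)
    print(f"{freq_ghz:.1f} GHz -> {p:.1f} W")
```

Holding the other factors fixed, doubling the frequency doubles the switching power (10 W, 20 W, and 40 W with these assumed values). The activity factor α is also the term that clock gating tries to reduce.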
One method of dealing with the switching of unneeded components is called clock gating, which involves turning off the clock signal to unneeded components, effectively disabling them. However, this is often regarded as difficult to implement and therefore does not see common use outside of very low-power designs. Another method of addressing some of the problems with a global clock signal is to remove it entirely. While removing the global clock signal makes the design process considerably more complex in many ways compared with similar synchronous designs, asynchronous (or clockless) designs have marked advantages in power consumption and heat dissipation. Although somewhat rare, entire CPUs have been built without a global clock signal. Two notable examples are the AMULET, which implements the ARM architecture, and the MiniMIPS, which is compatible with the MIPS R3000.
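The effect of clock gating can be illustrated with a toy count of clock edges. The Python sketch below assumes a hypothetical set of execution units and a hypothetical per-cycle schedule of which units have work; it counts how many unit-cycles receive a clock edge with and without gating. It does not model the gating logic itself or its overhead, which in real hardware is implemented with enable signals on the clock network.

```python
# Toy model of clock gating. Each entry in WORKLOAD lists the units that
# actually have work in that cycle; unit names and schedule are hypothetical.
UNITS = ("alu", "fpu", "load_store", "branch")
WORKLOAD = [
    {"alu"},
    {"alu", "load_store"},
    {"branch"},
    set(),            # an idle cycle
    {"alu", "fpu"},
]

def clocked_unit_cycles(units, workload, gated):
    """Count how many (unit, cycle) pairs receive a clock edge.

    Without gating, every unit is clocked every cycle whether or not it is
    in use. With gating, only units that have work in a cycle are clocked.
    """
    total = 0
    for active in workload:
        if gated:
            total += sum(1 for unit in units if unit in active)
        else:
            total += len(units)
    return total

ungated = clocked_unit_cycles(UNITS, WORKLOAD, gated=False)
gated = clocked_unit_cycles(UNITS, WORKLOAD, gated=True)
print(f"ungated: {ungated} unit-cycles clocked")
print(f"gated:   {gated} unit-cycles clocked")
print(f"saved:   {1 - gated / ungated:.0%}")
```

In this assumed schedule, gating cuts the clocked unit-cycles from 20 to 6. The saving depends entirely on how often units sit idle, which is why the technique pays off most in designs that spend much of their time lightly loaded.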
Rather than removing the clock signal entirely, some CPU designs make certain units of the device asynchronous, for example by using asynchronous ALUs in conjunction with superscalar pipelining to achieve some gains in arithmetic performance. While it is not entirely clear whether fully asynchronous designs can perform at a level comparable to or better than their synchronous counterparts, it is evident that they at least excel at simpler mathematical operations. This, combined with their excellent power consumption and heat dissipation characteristics, makes them well suited to embedded computers.
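Why completion-driven timing favours simple operations can be shown with a toy timing model. In the Python sketch below, which assumes hypothetical per-operation delays and a fixed per-operation handshake overhead, a synchronous ALU spends one full clock period, sized for its slowest operation, on every operation, while an asynchronous ALU finishes each operation after its own delay.

```python
# Hypothetical per-operation ALU delays in nanoseconds (illustrative only).
OP_DELAY_NS = {"and": 0.3, "add": 0.6, "shift": 0.4, "mul": 2.0}

def synchronous_time_ns(ops, delays):
    """Each operation occupies one clock period sized for the slowest op."""
    period = max(delays.values())
    return len(ops) * period

def asynchronous_time_ns(ops, delays, handshake_ns=0.1):
    """Each operation takes its own delay plus an assumed handshake overhead."""
    return sum(delays[op] + handshake_ns for op in ops)

program = ["and", "add", "add", "shift", "and", "add"]
print(f"synchronous:  {synchronous_time_ns(program, OP_DELAY_NS):.1f} ns")
print(f"asynchronous: {asynchronous_time_ns(program, OP_DELAY_NS):.1f} ns")
```

With these assumptions, a run of simple operations takes 12.0 ns synchronously and 3.4 ns asynchronously. Real synchronous designs usually split slow operations such as multiplication across several cycles rather than stretching the period, so this model overstates the gap; it only shows why short operations benefit when timing follows actual completion.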
The description of the basic operation of a CPU offered in the previous section describes the simplest form that a CPU can take. This type of CPU, usually referred to as subscalar, operates on and executes one instruction on one or two pieces of data at a time.
This process gives rise to an inherent inefficiency in subscalar CPUs. Since only one instruction is executed at a time, the entire CPU must wait for that instruction to complete before proceeding to the next one. As a result, a subscalar CPU gets “stuck” on instructions that take more than one clock cycle to complete. Even adding a second execution unit (see below) does not improve performance much: instead of one pathway being stalled, now two pathways are stalled and the number of unused transistors increases. This design, in which the CPU’s execution resources can operate on only one instruction at a time, can at best achieve scalar performance (one instruction per clock cycle). In practice, however, its performance is nearly always subscalar (less than one instruction per clock cycle).
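The subscalar ceiling follows directly from adding up cycle counts. The Python sketch below assumes hypothetical cycle counts per instruction type and computes instructions per cycle (IPC) for a CPU that must finish each instruction before starting the next.

```python
# Hypothetical cycle counts for a CPU that executes one instruction at a time.
CYCLES_PER_INSTRUCTION = {"add": 1, "load": 3, "mul": 4, "branch": 2}

def subscalar_ipc(program, cycles):
    """Instructions per cycle when each instruction must finish before the next starts."""
    total_cycles = sum(cycles[instr] for instr in program)
    return len(program) / total_cycles

program = ["load", "add", "mul", "add", "branch"]
ipc = subscalar_ipc(program, CYCLES_PER_INSTRUCTION)
print(f"IPC = {ipc:.2f}")  # 5 instructions / 11 cycles, below 1.0
```

IPC reaches 1.0 only if every instruction completes in a single cycle; any multi-cycle instruction pulls the average below that, which is what makes such a design subscalar in practice.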
Attempts to achieve scalar and better performance have resulted in a variety of design methodologies that make the CPU behave less linearly and more in parallel. When it comes to parallelism in CPUs, two terms are generally used to classify these design techniques.
- Instruction-level parallelism (ILP) seeks to increase the rate at which instructions are executed within a CPU, that is, to increase the utilization of on-chip execution resources.
- Thread-level parallelism (TLP) aims to increase the number of threads (effectively individual programs) that a CPU can execute simultaneously.
The two methodologies differ both in how they are implemented and in how effectively they increase a CPU’s performance for a given application.
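Instruction-level parallelism can be illustrated with an idealized in-order issue model. The Python sketch below is a simplified superscalar model, not the scheduler of any particular CPU: it assumes a hypothetical six-instruction program, a one-cycle latency for every instruction, and that each register is written only once. It counts how many cycles the program takes when up to `issue_width` independent instructions may issue per cycle. Thread-level parallelism is not modelled here.

```python
# Each instruction: (name, destination register, source registers).
# The program and register names are hypothetical.
PROGRAM = [
    ("i1", "r1", ()),            # r1 = load from memory
    ("i2", "r2", ()),            # r2 = load from memory
    ("i3", "r3", ("r1", "r2")),  # r3 = r1 + r2
    ("i4", "r4", ()),            # r4 = load from memory
    ("i5", "r5", ("r3", "r4")),  # r5 = r3 * r4
    ("i6", "r6", ("r4",)),       # r6 = r4 - 1
]

def in_order_cycles(program, issue_width):
    """Cycles to run `program` on an idealized in-order machine.

    Assumptions: every instruction takes exactly one cycle, up to
    `issue_width` instructions issue per cycle, each register is written
    only once, and an instruction issues only after its source values were
    produced in an earlier cycle. Issue is in order: once an instruction
    is not ready, nothing after it issues in that cycle.
    """
    if issue_width < 1:
        raise ValueError("issue_width must be at least 1")
    ready_at = {}  # register -> first cycle in which its value is usable
    cycle = 0
    i = 0
    while i < len(program):
        issued = 0
        while i < len(program) and issued < issue_width:
            _, dest, srcs = program[i]
            if any(ready_at.get(src, 0) > cycle for src in srcs):
                break  # an operand is not ready; stall the rest of this cycle
            ready_at[dest] = cycle + 1
            issued += 1
            i += 1
        cycle += 1
    return cycle

for width in (1, 2, 4):
    cycles = in_order_cycles(PROGRAM, width)
    print(f"issue width {width}: {cycles} cycles, IPC = {len(PROGRAM) / cycles:.2f}")
```

Under these assumptions, widening issue from one to two instructions per cycle halves the run time (6 cycles down to 3, IPC 1.00 to 2.00), but widening to four gains nothing: i3 and i5 must wait for their operands, so the dependency chains in the program, not the issue width, become the limit.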