# Central Processing Unit (CPU) Part II

### Range of integers

The way a CPU represents numbers is a design choice that affects the most basic ways the device works. Some early digital computers used an electrical model of the common decimal (base ten) numbering system to represent numbers internally. A few other computers have used more exotic numbering systems, such as ternary (base three). Almost all modern CPUs represent numbers in binary form, where each digit is represented by a physical quantity with two values, such as a “high” or “low” voltage.

Related to numerical representation are the size and precision of the numbers that a CPU can represent. In the case of a binary CPU, a bit refers to one significant position in the numbers the CPU works with. The number of bits (or numerical positions, or digits) that a CPU uses to represent numbers is often called the “word size”, “bit width”, “data path width”, or “integer precision” when dealing strictly with integers (as opposed to floating-point numbers).

This number differs between architectures, and often between different units within the same CPU. For example, an 8-bit CPU handles a range of numbers that can be represented by eight binary digits: each digit has two possible values, so in combination the 8 bits can encode 2⁸, or 256, discrete numbers. In effect, the integer size sets a hardware limit on the range of integers that software running on the CPU can use directly.
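As a quick sketch, the arithmetic above can be checked in a few lines of Python (the wrap-around example assumes modular, two's-complement-style hardware arithmetic):

```python
BITS = 8

# An 8-bit word can encode 2^8 distinct values.
distinct_values = 2 ** BITS                                # 256
unsigned_range = (0, 2 ** BITS - 1)                        # 0..255
signed_range = (-(2 ** (BITS - 1)), 2 ** (BITS - 1) - 1)   # -128..127 (two's complement)

# Exceeding the hardware range wraps around modulo 2^8:
# 200 + 100 = 300 does not fit in 8 bits.
wrapped = (200 + 100) % 2 ** BITS                          # 44

print(distinct_values, unsigned_range, signed_range, wrapped)
```

This is exactly the "hardware limit" the text describes: any result outside the representable range must be wrapped, saturated, or handled in software with multi-word arithmetic.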

The integer range can also affect the number of memory locations that the CPU can address (locate). For example, if a binary CPU uses 32 bits to represent a memory address, and each memory address identifies one octet (8 bits), the maximum amount of memory the CPU can address is 2³² octets, or 4 GB. This is a very simplified view of the CPU address space, and many modern designs use much more complex addressing methods, such as paging, to locate more memory than their integer range would allow with a flat address space.
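The 4 GB figure follows directly from the address width. A quick check, assuming byte-addressable memory as in the text:

```python
def max_addressable_octets(address_bits: int) -> int:
    """Upper bound on addressable memory when each address names one octet."""
    return 2 ** address_bits

GIB = 1024 ** 3  # binary gigabytes (GiB)

print(max_addressable_octets(32) // GIB)  # 4   (the 4 GB limit of flat 32-bit addressing)
print(max_addressable_octets(16))         # 65536 octets, i.e. the 64 KiB of many 8/16-bit era CPUs
```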

Larger integer ranges require more structures to handle the extra digits, and therefore more complexity, size, energy use, and generally cost. It is therefore not uncommon to see 4-bit and 8-bit microcontrollers used in modern applications, even though CPUs with a much larger range (16, 32, 64, and even 128 bits) are available. Simpler microcontrollers are generally cheaper, use less power, and therefore dissipate less heat; all of these can be important design considerations for electronic devices. However, in high-end applications the benefits produced by the additional range (most often the additional headroom) are more significant and often affect design choices.

To gain some of the advantages provided by both lower and higher bit widths, many CPUs are designed with different bit widths for different units of the device. For example, the IBM System/370 used a CPU that was mostly 32-bit, but used 128-bit precision within its floating-point units to facilitate greater accuracy and range of floating-point numbers. Many later CPU designs use a similar mix of bit widths, especially when the processor is intended for general-purpose use, where a reasonable balance between integer and floating-point capability is required.

#### ILP: Instruction Pipelining and Superscalar Architecture

One of the simplest methods used to achieve increased parallelism is to begin the first steps of fetching and decoding an instruction before the previous instruction has finished executing. This is the simplest form of a technique known as instruction pipelining, and it is used in almost all modern general-purpose CPUs. By dividing the execution path into discrete stages, the pipeline allows more than one instruction to be in flight at any one time. This separation can be compared to an assembly line, in which an instruction becomes more complete at each stage until it exits the execution pipeline and is retired.
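The assembly-line effect can be quantified with a toy timing model. This is a sketch that ignores stalls and assumes one instruction can enter the pipeline per cycle:

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # Without pipelining, each instruction occupies the whole
    # datapath for all of its stages before the next one starts.
    return n_instructions * n_stages

def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # The first instruction takes n_stages cycles to fill the pipeline;
    # each subsequent instruction then completes one cycle later.
    return n_stages + (n_instructions - 1)

# 100 instructions through a 5-stage pipeline:
print(unpipelined_cycles(100, 5))  # 500 cycles
print(pipelined_cycles(100, 5))    # 104 cycles
```

As the instruction count grows, the pipelined throughput approaches one instruction per cycle, which is exactly the "scalar" limit discussed below.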

However, the pipeline introduces the possibility of a situation in which the result of the previous operation is needed to complete the next operation, a condition often called a data dependency conflict. To deal with this, extra care must be taken to check for these kinds of conditions, and when one occurs, a portion of the instruction pipeline must be delayed (stalled). Naturally, achieving this requires additional circuitry, so pipelined processors are more complex than subscalar ones, though not by much. A pipelined processor can become very nearly scalar, inhibited only by pipeline stalls (an instruction spending more than one clock cycle in one stage).
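A minimal sketch of that dependency check follows. The instruction encoding and the latency value are hypothetical, and real CPUs detect these hazards with hardware interlocks rather than software, but the bookkeeping is analogous:

```python
# Toy model: count the stall cycles forced by read-after-write (RAW)
# data dependencies. Each instruction is (dest_register, source_registers).
def count_stall_cycles(program, result_latency=2):
    stalls = 0
    in_flight = {}  # register -> cycles until its value is ready
    for dest, sources in program:
        # If a source is still being computed, we must wait for it.
        wait = max((in_flight.get(r, 0) for r in sources), default=0)
        stalls += wait
        # Advance time by the issue cycle plus any stall cycles.
        in_flight = {r: max(0, c - 1 - wait) for r, c in in_flight.items()}
        in_flight[dest] = result_latency
    return stalls

program = [
    ("r1", ("r2", "r3")),  # r1 = r2 op r3
    ("r4", ("r1", "r5")),  # needs r1 -> must stall until r1 is ready
    ("r6", ("r7", "r8")),  # independent -> no stall
]
print(count_stall_cycles(program))  # 2
```

The second instruction is exactly the "finish the previous result first" case from the text; the independent third instruction issues without delay.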

A further improvement on the idea of instruction pipelining led to the development of a method that further decreases the idle time of CPU components. Designs said to be superscalar include a long instruction pipeline and multiple identical execution units. In a superscalar pipeline, multiple instructions are fetched and passed to a dispatcher, which decides whether or not the instructions can be executed in parallel (simultaneously).

If so, they are dispatched to the available execution units, allowing several instructions to be executed simultaneously. In general, the more instructions a superscalar CPU is capable of dispatching simultaneously to its waiting execution units, the more instructions will be completed in a given cycle.
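As an illustration, a greedy dispatcher can be sketched in a few lines. The instruction format is hypothetical, and a real dispatcher must also track structural, write-after-write, and write-after-read hazards, which this sketch ignores:

```python
# Toy superscalar dispatcher: each cycle, issue up to `units` instructions
# from the front of the queue, as long as none of them reads a result
# produced earlier in the same cycle.
def issue_groups(program, units):
    groups, current, written = [], [], set()
    for dest, sources in program:
        depends = any(s in written for s in sources)
        if depends or len(current) == units:
            groups.append(current)          # close the current issue group
            current, written = [], set()
        current.append((dest, sources))
        written.add(dest)
    if current:
        groups.append(current)
    return groups

program = [
    ("r1", ("r2", "r3")),
    ("r4", ("r5", "r6")),   # independent of r1 -> issues in the same cycle
    ("r7", ("r1", "r4")),   # needs r1 and r4  -> must wait for the next cycle
]
print(len(issue_groups(program, units=2)))  # 2 issue groups
```

Here the first two instructions are dispatched together and the dependent third one alone, so three instructions complete in two cycles rather than three.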

Most of the difficulty in designing a superscalar CPU architecture lies in creating an efficient dispatcher. The dispatcher must be able to quickly and correctly determine whether instructions can be executed in parallel, and dispatch them in a way that keeps as many execution units as busy as possible. This requires that the instruction pipeline be filled as often as possible, and it increases the need for significant amounts of CPU cache in superscalar architectures. It also makes hazard-avoidance techniques such as branch prediction, speculative execution, and out-of-order execution crucial to maintaining high levels of performance.

• Branch prediction attempts to predict which branch (or path) a conditional instruction will take; by guessing early, the CPU can minimize the number of times the entire pipeline must wait for a conditional instruction to complete.
• Speculative execution often provides modest performance gains by executing the portions of code that may or may not be necessary after a conditional operation completes.
• Out-of-order execution changes to some degree the order in which instructions are executed to reduce delays due to data dependencies.
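To make the first bullet concrete, here is a toy two-bit saturating-counter predictor, a classic textbook scheme rather than any particular CPU's implementation. On a steady loop branch it mispredicts only while warming up and at the final exit:

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0,1 predict not-taken; 2,3 predict taken."""

    def __init__(self):
        self.state = 0

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

# A loop branch: taken eight times, then falls through once.
outcomes = [True] * 8 + [False]
p = TwoBitPredictor()
correct = 0
for taken in outcomes:
    if p.predict() == taken:
        correct += 1
    p.update(taken)
print(correct, "of", len(outcomes))  # 6 of 9
```

The two-bit hysteresis is the point of the design: a single surprise outcome (like the loop exit) flips the counter by only one step, so the predictor does not immediately forget a strongly established pattern.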

In the case where only a portion of the CPU is superscalar, the portion that is not suffers a performance penalty due to scheduling stalls. The original Intel Pentium (P5) had two superscalar ALUs that could each accept one instruction per clock cycle, but its FPU could not. Thus the P5 was superscalar for integer operations but not for floating-point operations. The successor to Intel's Pentium architecture, the P6, added superscalar capabilities to its floating-point functions, producing a significant increase in the performance of those types of instructions.

Simple pipelining and superscalar design increase a CPU's ILP by allowing a single processor to complete instructions at rates exceeding one instruction per cycle (IPC). Most modern CPU designs are at least somewhat superscalar, and in the last decade almost all general-purpose CPU designs have been superscalar. In recent years, some of the emphasis in high-ILP computer design has shifted from the CPU's hardware to its software interface, or ISA. The very long instruction word (VLIW) strategy makes some of the ILP explicit in the software itself, reducing the work the CPU must do to extract ILP and thus reducing the complexity of the design.