1.
Introduction
Transistor miniaturization and integration density in today’s VLSI technology are increasing at the rate predicted by Moore’s law, and in some cases faster. Today’s multi-processor system-on-chip (MPSoC), network-on-chip (NoC) and graphics processing unit (GPU) technologies, with their high-level integration of processing elements/cores, offer massively parallel processing in either server-based or cloud-based form. These processors play an important role in accelerating data-intensive applications such as artificial intelligence in automobiles, drones and video surveillance. SoC technology allows the integration of one or more processing cores (processors), embedded memory IPs and input/output (I/O) peripherals. RISC based processors are the backbone of application specific embedded systems. RISC provides a platform wherein a small set of instructions is made available for specific tasks so that execution takes place at much higher speed, i.e. millions of instructions per second or more.
SoC architectures also include analog and mixed signal interfaces (analog-to-digital and digital-to-analog converters) that provide the interface between analog data acquisition units and digital processors. According to a survey[1], around 80% of the circuitry in SoCs is digital, but 80% of faults occur in the analog circuitry. Although built-in self-test (BIST) techniques were originally developed for voltage-level based fault detection in digital circuits, they have been extended to incorporate parametric testing so that analog circuits can be self-tested as well.
Due to sub-micron miniaturization and the high integration density of transistors, today’s ICs are becoming more and more susceptible not only to manufacturing defects but also to environmental disturbances such as single event upsets (SEUs). An SEU is a natural phenomenon wherein high energy particles, like alpha and beta particles, strike the IC and cause the system to malfunction by inducing temporary faults[2–6]. These faults are hard to detect during testing because they may not occur while the test is running. On-line self-test methodologies reported in the literature are capable of detecting such temporary faults without system downtime.
This paper is organized as follows: Section 2 outlines the architecture and instruction format of the DLX RISC processor. The embedded processor testing methodologies are presented in Section 3. The proposed concurrent online self-test methodology is presented in Section 4. Section 5 discusses the experimental work presented in this paper. Finally, the concluding remarks are presented in Section 6.
2.
DLX RISC processor architecture
The DLX is a 32-bit reduced instruction set computer (RISC) with a load-store architecture, derived from the microprocessor without interlocked pipeline stages (MIPS) architecture[7]. The DLX RISC processor has a simple architecture (as shown in Fig. 1) that is widely used for academic purposes and serves as the basic architecture underlying commercially available RISC processors. The DLX architecture includes a register set of 32 registers, each 32 bits wide, and a 32-bit program counter (PC). The processor is based on a five-stage pipeline. The pipeline stages are instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM) and write back (WB).
Figure 1. Architecture of DLX RISC processor.
During the IF stage, a 32-bit instruction is fetched from the memory. The PC holds the address of the next instruction to be fetched, which is either (i) the current PC incremented by 4 in the case of sequential execution or (ii) the branch target address predicted by the branch prediction logic. During the ID stage, the instruction decoder decodes the 32-bit instruction into the fields given in Table 1 and determines the required operands and the branch address. During the EX stage, the arithmetic logic unit (ALU) performs the arithmetic and logical operations on the operands provided by the instruction decoder. During the MEM stage, the computed results are written to the data memory. During the WB stage, the result is written back into the register. The MEM and WB stages can be performed in a single clock cycle, and hence the execution of an instruction can be completed in 4 clock cycles.
The ALU performs 32-bit integer and floating point (single and double precision) arithmetic operations and logical operations. All the instructions in DLX processors are 32 bits long and can be divided into three classes according to the type of operation: R (register)-type, I (immediate)-type and J (jump)-type. In R-type instructions, three registers (two source registers and one destination register) are specified in the instruction. In I-type instructions, one source register and a 16-bit immediate operand (sign extended to 32 bits) are used. The J-type instructions consist of a 6-bit opcode and a 26-bit operand. The destination address is calculated using the 26-bit operand value. Table 1 summarizes the instruction formats of the DLX processor.
Figure 2. Five stage pipelining of DLX processor.
Instruction type | Bits 31–26 | Bits 25–21 | Bits 20–16 | Bits 15–11 | Bits 10–0
R-type | Opcode | R1 | R2 | Rd | Unused
I-type | Opcode | R1 | R2 | Immediate (bits 15–0)
J-type | Opcode | Operand value (bits 25–0)
Table 1. Instruction formats of DLX RISC.
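The field boundaries of Table 1 can be illustrated with a short Python sketch. The function names (`encode_r_type`, `decode_fields`, `sign_extend_16`) and the sample field values are assumptions made for illustration; they are not taken from the DLX specification or from the RTL used in this work.

```python
# Illustrative sketch of the instruction layouts in Table 1 (hypothetical names).

def encode_r_type(opcode: int, r1: int, r2: int, rd: int) -> int:
    """Pack an R-type word: opcode[31:26], R1[25:21], R2[20:16], Rd[15:11]."""
    return (((opcode & 0x3F) << 26) | ((r1 & 0x1F) << 21)
            | ((r2 & 0x1F) << 16) | ((rd & 0x1F) << 11))

def decode_fields(word: int) -> dict:
    """Split a 32-bit word into every field named in Table 1."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "r1": (word >> 21) & 0x1F,
        "r2": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,       # meaningful for R-type only
        "immediate": word & 0xFFFF,      # I-type only, before sign extension
        "operand": word & 0x3FFFFFF,     # J-type only
    }

def sign_extend_16(value: int) -> int:
    """Sign-extend a 16-bit immediate, as done for I-type operands."""
    return value - 0x10000 if value & 0x8000 else value

word = encode_r_type(opcode=1, r1=4, r2=5, rd=6)  # arbitrary sample values
fields = decode_fields(word)
print(hex(word), fields["opcode"], fields["r1"], fields["r2"], fields["rd"])
```

Which of the decoded fields is meaningful depends on the instruction class selected by the opcode, so a real decoder would inspect the opcode first.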
3.
Overview of embedded processor testing
Digital circuit testing techniques can be broadly classified into external testing and self-testing. Conventionally, manufacturing defects in digital circuits are tested using automatic test equipment (ATE) hardware. High-quality test patterns are generated using the algorithmic test pattern generation strategies reported in the literature and are stored, along with their expected responses, in the ATE memory.
Hardware based self-testing (also known as built-in self-test, BIST) of a processor comprises the generation of test patterns using a linear feedback shift register (LFSR), the application of the test patterns to the processor under test and the analysis of the test responses for functional correctness, all without the use of any external circuitry. The processor uses its internal resources such as the processor itself, the register file, the instruction set, memory and other test support hardware. The major parameters to be considered while selecting a BIST strategy are hardware overhead, test data generation and application time, performance degradation (especially in the critical paths of high performance devices) and power consumption during self-testing. BIST capability has been incorporated in a MIPS processor using an LFSR, a built-in logic block observer (BILBO) and a concurrent BILBO (CBILBO)[8]. Various power saving techniques, like the weighted LFSR and the dual speed LFSR, have been presented to reduce the power consumption of the self-testable MIPS processor. The dynamic partial reconfiguration feature of FPGAs can be utilized for self-testing of processor cores by dynamically loading the partial bit-files of the functional-mode processor and the BIST-oriented processor onto the dynamic region of the FPGA[9].
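As a rough illustration of LFSR-based test pattern generation, the following Python sketch models a small Fibonacci LFSR. The register width, seed and tap positions are assumptions chosen for the example (they make the 4-bit register cycle through all 15 non-zero states); they do not describe the LFSR of Ref. [8] or any hardware used in this work.

```python
from itertools import islice

def lfsr_patterns(seed: int = 0b0001, width: int = 4, taps=(0, 1)):
    """Yield successive states of a right-shifting Fibonacci LFSR.

    `taps` are the bit positions XORed together to form the feedback bit,
    which is shifted in at the most significant position.
    """
    mask = (1 << width) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("an all-zero seed locks up an XOR-feedback LFSR")
    while True:
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (width - 1))

# Collect one full period of pseudo-random test patterns.
patterns = list(islice(lfsr_patterns(), 15))
print([format(p, "04b") for p in patterns])
```

In a hardware BIST scheme the same structure is a shift register with a few XOR gates, which is why its area overhead is small compared with storing the patterns.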
The software-based self-testing (SBST)[10–15] provides an alternative solution for the above mentioned limitations of hardware based self-testing methodology. In this methodology, generation and application of test patterns for the processor under test and response analysis are carried out by specially written software routines executed on the processor itself. The self-test routine (software program) and test patterns are stored in the instruction memory and data memory of the processor respectively. A simplified processor model for software based self-testing is shown in Fig. 3.
Figure 3. (Color online) Processor model for software-based self-testing.
SBST is based on the instruction set architecture (ISA) and the register transfer level (RTL) description of the processor, so the test engineer need not have the complete details of the gate-level netlist and the structural fault model. The processor executes the test programs at its actual speed, and hence SBST is capable of providing at-speed test solutions, unlike hardware-based BIST. Although SBST provides at-speed self-test solutions for functional verification of the processor at both the manufacturing and field levels, it cannot substitute for structural test approaches like BIST; it can instead supplement them to provide a higher-quality test.
Most of the BIST approaches (either hardware or software based) found in the literature are off-line or non-concurrent test approaches. In these approaches, either the functionality of the processor has to be suspended or the processor has to be switched into idle mode during the test, i.e., the test is not carried out concurrently with functional operation. In the concurrent on-line self-test approach, which is the main contribution of this paper, the functional and test operations are carried out simultaneously. Although Refs. [16, 17] have presented the implementation of a self-checking register file and ALU using the Berger code, to the best knowledge of the authors no work has been reported on the design of a complete self-checking processor. Since the Berger code is the least redundant on-line unidirectional error detecting code, this paper proposes a methodology to design a self-testable processor by incorporating a Berger code based self-checking capability into the DLX RISC processor.
4.
Proposed online self-testable methodology
Malfunctioning of digital circuits/systems in the field (during operation) is mainly caused by temporary (dynamic) faults. These faults may be caused by radiation and other harsh environmental conditions. SEUs are radiation-induced errors in microelectronic circuits that may change the behavior of dynamic circuits as well as memory devices. Since these faults are non-recurrent and hard to detect with off-line BIST, on-line self-test methodologies, which are capable of detecting temporary faults, are used to improve the reliability of the system. This paper presents the design of Berger code based totally self-checking (TSC) checkers to detect both permanent stuck-at faults and temporary faults in the DLX RISC processor. Fig. 4 shows the generalized architecture of the proposed self-testable processor.
Figure 4. Simplified architecture of proposed self-testable DLX processor.
4.1
Totally self-checking checkers using Berger code
Among all-unidirectional-error-detecting (AUED) codes, the Berger code is the least redundant separable code[18, 19]. The Berger code is available with two encoding schemes: B0 and B1. In the B0 encoding scheme, which is used in this work, the check bits represent the binary equivalent of the number of 0’s in the information bit sequence, I. In the B1 encoding scheme, the check bits represent the 1’s complement of the number of 1’s in I. The number of check bits (k) for an information sequence of length n bits is given by
$k = \left\lceil \log_2 (n + 1) \right\rceil$
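The two encoding schemes can be sketched in a few lines of Python. The function names are hypothetical, and the information words are arbitrary examples; the sketch only restates the definitions above.

```python
def check_bit_count(n: int) -> int:
    """k = ceil(log2(n + 1)); for n >= 1 this equals n.bit_length()."""
    return n.bit_length()

def berger_b0(info: int, n: int) -> int:
    """B0 check symbol: number of 0's among the n information bits."""
    return n - bin(info & ((1 << n) - 1)).count("1")

def berger_b1(info: int, n: int) -> int:
    """B1 check symbol: 1's complement (on k bits) of the number of 1's."""
    k = check_bit_count(n)
    ones = bin(info & ((1 << n) - 1)).count("1")
    return ~ones & ((1 << k) - 1)

n = 8
info = 0b10110010  # four 1's and four 0's
print(check_bit_count(n), berger_b0(info, n), berger_b1(info, n))

# For a length of the form 2**k - 1 (here n = 7) both schemes coincide.
print(all(berger_b0(x, 7) == berger_b1(x, 7) for x in range(128)))
```

For n = 8 the two schemes give different check symbols for the same word, whereas for n = 7 they agree for every information word.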
Combinational circuit, C1
A Berger code is said to be a maximal length Berger code if $n = 2^k - 1$; otherwise it is a non-maximal length Berger code. The combinational circuit C1 shown in Fig. 4 produces an output which is the binary equivalent of the number of 1’s in the information sequence, I. In order to compute the check bits for the non-maximal length Berger code, we define a number $m = I_0 \bmod (k + 1)$, where $I_0$ is the number of 0’s in the sequence I. The Berger code check bits are the binary equivalent of m, and their length is equal to $\lceil \log_2 (k + 1) \rceil$.
Two-rail checker
The two-rail checker shown in Fig. 5(a) is a checker for the 1-out-of-2 code. It receives two groups of inputs X = (x1, x2, …, xn) and Y = (y1, y2, …, yn) from the functional circuit and produces two outputs f and g that are complementary to each other. As long as $y_i = \overline{x_i}$ is satisfied for every i, the outputs of the two-rail checker will be f = 0 and g = 1. The totally self-checking two-rail checker can be extended to any number of input pairs (xi, yi), (xi+1, yi+1), …, as given in the structure of Fig. 5(b).
Figure 5. (a) Self-checking 2-rail checker. (b) Self-checking 2-rail checker with 6 inputs.
4.2
Berger code predictions for ALU operations
The ALU is the heart of any processor and performs various arithmetic and logical operations. This section presents the Berger code predictions for various ALU operations. Consider two n-bit operands A = (an, an−1, …, a2, a1) and B = (bn, bn−1, …, b2, b1). Let Ac and Bc be the Berger codes of A and B, respectively.
Addition (Y = A + B + cin)
The Berger code of the sum (Yc) is computed as Yc = Ac + Bc − cin − Cc + cout, where cin, cout and Cc are the input carry, the output carry and the Berger code of the intermediate carries C = (cn, cn−1, …, c2, c1), respectively.
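The addition prediction can be checked numerically with a small Python model of a ripple-carry adder. The sketch assumes the B0 scheme (check value = number of 0's) and takes the carry vector C to include the output carry as its most significant element; the function names are hypothetical.

```python
def zeros(x: int, n: int) -> int:
    """Berger (B0) check value: number of 0's in the n-bit word x."""
    return n - bin(x & ((1 << n) - 1)).count("1")

def ripple_add(a: int, b: int, cin: int, n: int):
    """Return (sum, carry_vector, cout); bit i of carry_vector is the carry out of bit i."""
    s, carries, c = 0, 0, cin
    for i in range(n):
        total = ((a >> i) & 1) + ((b >> i) & 1) + c
        s |= (total & 1) << i
        c = total >> 1
        carries |= c << i
    return s, carries, c

def predict_sum_check(a: int, b: int, cin: int, n: int) -> int:
    """Yc = Ac + Bc - cin - Cc + cout, evaluated from the operands and carries."""
    _, carries, cout = ripple_add(a, b, cin, n)
    return zeros(a, n) + zeros(b, n) - cin - zeros(carries, n) + cout

# Exhaustive comparison against the check value of the actual sum for n = 4.
ok = all(
    predict_sum_check(a, b, cin, 4) == zeros(ripple_add(a, b, cin, 4)[0], 4)
    for a in range(16) for b in range(16) for cin in (0, 1)
)
print(ok)
```

The identity follows from summing a_i + b_i + c_(i−1) = s_i + 2c_i over all bit positions, so the predicted and actual check values agree for every fault-free addition; a mismatch indicates an error.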
Subtraction
The Berger code of the difference (Yc) is computed as Yc = Ac − Bc + bin + BIc − bout, where bin, bout and BIc are the input borrow, the output borrow and the Berger code of the intermediate borrows BI = (bin, bin−1, …, bi2, bi1), respectively.
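The borrow-based prediction can be checked in the same way with a ripple-borrow subtractor model. As before, this is a sketch under the B0 scheme, with the borrow vector BI assumed to include the output borrow as its most significant element; all names are hypothetical.

```python
def zeros(x: int, n: int) -> int:
    """Berger (B0) check value: number of 0's in the n-bit word x."""
    return n - bin(x & ((1 << n) - 1)).count("1")

def ripple_sub(a: int, b: int, borrow_in: int, n: int):
    """Return (difference, borrow_vector, borrow_out) of a bit-serial subtractor."""
    d, borrows, bw = 0, 0, borrow_in
    for i in range(n):
        diff = ((a >> i) & 1) - ((b >> i) & 1) - bw
        d |= (diff & 1) << i          # difference bit
        bw = 1 if diff < 0 else 0     # borrow out of bit i
        borrows |= bw << i
    return d, borrows, bw

def predict_diff_check(a: int, b: int, borrow_in: int, n: int) -> int:
    """Yc = Ac - Bc + bin + BIc - bout."""
    _, borrows, bout = ripple_sub(a, b, borrow_in, n)
    return zeros(a, n) - zeros(b, n) + borrow_in + zeros(borrows, n) - bout

ok = all(
    predict_diff_check(a, b, bi, 4) == zeros(ripple_sub(a, b, bi, 4)[0], 4)
    for a in range(16) for b in range(16) for bi in (0, 1)
)
print(ok)
```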
2’s complement subtraction ($Y = A - B = A + \overline{B} + 1$)
The Berger code of the difference (Yc) is computed as Yc = Ac − Bc − $\overline{c_{in}}$ − N(C) + cout, where cin, cout and N(C) are the input carry, the output carry and the number of 1’s in the intermediate carries C = (cn, cn−1, …, c2, c1), respectively.
Array Multiplier (Y = AB)
The Berger code of the multiplier output (Yc) is computed as Yc = 4Ac − 4Bc − AcBc − N(C) + 12, where $N(C) = \sum\nolimits_{i = 1}^{m} \sum\nolimits_{j = 1}^{n - 1} C_{i,j}$ is the number of 1’s in the internal carries $C_{i,j}$ of the array multiplier.
Figure 6. 4 × 4 binary array multiplier.
Logical-AND (Y = A.B)
The Berger code of the logical-AND output (Yc) is computed as Yc = Ac + Bc − Zc, where Zc represents the Berger code of (A|B).
Logical-OR (Y = A|B)
The Berger code of the logical-OR output (Yc) is computed as Yc = Ac + Bc − Zc, where Zc represents the Berger code of (A.B).
Logical-Inverter ($Y = \overline{A}$)
The Berger code of the logical-Inverter output (Yc) is computed as Yc = n − Ac.
Logical-XOR (Y = A^B)
The Berger code of the logical-XOR output (Yc) is computed as Yc = Ac + Bc − 2Zc + n, where Zc represents the Berger code of (A.B).
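The predictions for the logical operations can be verified exhaustively for a small word length. The Python sketch below assumes the B0 scheme (check value = number of 0's) and an 8-bit word; the names `berger` and `predict_*` are hypothetical.

```python
N = 8                     # word length assumed for the example
MASK = (1 << N) - 1

def berger(x: int) -> int:
    """B0 check value: number of 0's in the N-bit word x."""
    return N - bin(x & MASK).count("1")

def predict_and(a: int, b: int) -> int:
    return berger(a) + berger(b) - berger(a | b)       # Zc taken from A|B

def predict_or(a: int, b: int) -> int:
    return berger(a) + berger(b) - berger(a & b)       # Zc taken from A.B

def predict_not(a: int) -> int:
    return N - berger(a)

def predict_xor(a: int, b: int) -> int:
    return berger(a) + berger(b) - 2 * berger(a & b) + N

ok = all(
    predict_and(a, b) == berger(a & b)
    and predict_or(a, b) == berger(a | b)
    and predict_xor(a, b) == berger(a ^ b)
    and predict_not(a) == berger(~a)
    for a in range(256) for b in range(256)
)
print(ok)
```

In each case the predicted check value is obtained without looking at the ALU output itself, which is what allows the checker to run concurrently with the functional operation.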
5.
Experimental results and discussions
The Berger code based totally self-checking checker (TSC) logic is incorporated for various arithmetic and logical operations within the RTL description of the DLX RISC processor and simulated using Xilinx Vivado 2017.2. Fig. 7 demonstrates the simulation waveform of the DLX processor. The 32-bit instruction Data_in = 0x04031040 is decoded as opcode = 01 (ADD operation), [RA] = 0x01, [RB] = 0x02. The result of the ADD operation is [RD] = 0x03. Similarly, Data_in = 0x0C062900 is decoded as opcode = 03 (Logical-OR operation), [RA] = 0x04, [RB] = 0x05. The result of the logical-OR operation is [RD] = 0x05.
To demonstrate the self-checking capability of the implemented processor, faults are injected into the design during simulation. A stuck-at-1 (SA1) fault is injected on the cin line by forcing cin = 1, which produces the result [RD] = 0x04, as demonstrated in Fig. 8, instead of [RD] = 0x03 for the same instruction Data_in = 0x04031040 shown in Fig. 7. Similarly, with bit-flips in the 27th and 28th bits, Data_in = 0x0C062900 is decoded as opcode = 05 (logical-XNOR operation), [RA] = 0x04, [RB] = 0x05, which produces the result [RD] = 0xFFFFFFFA. The processor produces an erroneous output in both cases, which is indicated by fault_online = 1. Such faults may be caused by permanent defects, by SEU induced temporary faults or by any other kind of soft error.
Figure 7. (Color online) Simulation results of self-testable DLX RISC processor (fault-free case).
Figure 8. (Color online) Simulation results of self-testable DLX RISC processor (faulty case).
Two versions of the processor, (i) the standard DLX RISC architecture and (ii) the TSC based self-testable DLX RISC processor, are synthesized and implemented on a 7-series Zynq FPGA (xc7z020clg484-1). The device utilization reports and the overall power consumption of the two designs are summarized in Table 2. The last column of the table shows the hardware/power overhead required for Design 2, which is the price paid for its ability to perform on-line concurrent self-testing.
Logic resources | Design 1 utilized | Design 1 available | Design 1 utilization (%) | Design 2 utilized | Design 2 available | Design 2 utilization (%) | Overhead (%)
1. Slice logic (LUTs) | 274 | 53 200 | 0.42 | 552 | 53 200 | 1.04 | 148
2. LUT as logic | 226 | 53 200 | 0.42 | 504 | 53 200 | 0.95 | 126
3. LUT as memory | 48 | 17 400 | 0.28 | 48 | 17 400 | 0.28 | 0
4. Slice registers as FFs | 155 | 106 400 | 0.15 | 157 | 106 400 | 0.15 | 0
5. DSPs | 3 | 220 | 1.36 | 3 | 220 | 1.36 | 0
6. No. of bonded IOBs | 58 | 200 | 29 | 59 | 200 | 29.50 | 2
7. Total power consumption | 34.527 W | | | 38.115 W | | | 10
Table 2. Comparison of device utilization summary (Design 1: standard DLX RISC processor; Design 2: TSC based self-testable DLX RISC processor).
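One way to read the last column of Table 2 is as the relative increase of Design 2 over Design 1. The Python sketch below assumes overhead = (Design 2 − Design 1) / Design 1 and applies it to the "utilized" counts and the power figures; the paper does not state its formula, and the tabulated percentages may instead be derived from the rounded utilization figures, so the values computed here can differ from the table.

```python
def overhead_percent(design1: float, design2: float) -> float:
    """Relative increase of Design 2 over Design 1, in percent (assumed definition)."""
    return (design2 - design1) / design1 * 100.0

# (Design 1, Design 2) values copied from Table 2.
table2 = {
    "Slice logic (LUTs)": (274, 552),
    "LUT as logic": (226, 504),
    "LUT as memory": (48, 48),
    "Slice registers as FFs": (155, 157),
    "DSPs": (3, 3),
    "Bonded IOBs": (58, 59),
    "Total power (W)": (34.527, 38.115),
}

for name, (d1, d2) in table2.items():
    print(f"{name}: {overhead_percent(d1, d2):.1f}%")
```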
6.
Conclusion
The Berger code detects all unidirectional errors, whether single-bit or multi-bit, in a given information sequence. Berger code prediction for the various arithmetic and logical operations carried out by a processor ALU has been summarized in this paper. The Berger code based totally self-checking checker (TSC) combined with a two-rail checker provides a solution for the on-line detection of SEU induced temporary faults and soft errors. The work presented in this paper has demonstrated the concurrent self-testing capability of the DLX RISC processor. The implementation results show that concurrent built-in self-testing capability can be incorporated in the processor design at the cost of additional LUTs (552 instead of 274, about 1% of the target device) and roughly 10% higher power consumption.