# Read Disturb Fault Detection in STT-MRAM

Rajendra Bishnoi, Mojtaba Ebrahimi, Fabian Oboril, Mehdi B. Tahoori

Karlsruhe Institute of Technology, Karlsruhe, Germany

E-mails: {rajendra.bishnoi, mojtaba.ebrahimi, fabian.oboril, mehdi.tahoori}@kit.edu

*Abstract*—Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM) has potential to become a universal memory technology because of its various advantageous features such as high density, non-volatility, scalability, high endurance and CMOS compatibility. However, read disturb is a major reliability issue in which a read operation can lead to a bitflip, because read and write current share the same path. This major reliability challenge is growing with technology scaling as read to write current ratio decreases. In this paper, we propose a circuit-level technique to detect read disturb by sensing the current during the read operation. Experimental results show that the proposed technique can effectively detect read disturb at the cost of negligible power and area overhead.

#### I. INTRODUCTION

As technology scales down, current memory technologies such as SRAM and DRAM are facing challenges in terms of scalability and high leakage [1, 2]. Therefore, industry is actively searching for alternatives, especially in the area of non-volatile memories, as these promise close-to-zero leakage. However, many of these non-volatile memory technologies, such as NAND-Flash, suffer from endurance issues [3]. *Spin Transfer Torque Magnetic Random Access Memory* (STT-MRAM) is an emerging memory technology which is not only non-volatile but also scalable and has a high endurance [4–6]. In addition, it has a high density, is CMOS-compatible and is soft-error immune [2, 7].

Despite its advantageous features, STT-MRAM is facing various reliability challenges including *write failure*, *decision failure*, *retention failure* and failures due to *read disturb* [8, 9]. Retention failures are due to the inherent thermal instability of STT-MRAM, which can lead to a flip of the bit-cell content although no memory access is performed. In contrast, a write failure occurs when the bit-cell does not flip to its required value during the given write period. This can happen since the write process in STT-MRAM is stochastic in nature. If it is not possible to distinguish between the two states of a bit-cell during a read operation, it is a decision failure, while read disturb means that the bit-cell value is accidentally flipped during a read operation.

Write and decision failures can be resolved either by increasing the write and the read time period, respectively, or by increasing their corresponding current values [10]. On the other hand, the retention failure can be handled at the device-level by exploiting the thermal stability factor. Read disturb can also be resolved by widening the margin between read and write current, which can be achieved either by increasing the write current or by reducing the read current. However, reducing the read current increases not only the read latency but also the chances of decision failures [11]. Moreover, STT-MRAM already has a high write energy as it requires a high current to flip the bit-cell [12, 13], and a further increase in write current is hence not a feasible solution. Furthermore, it has recently been shown that the read disturb rate is growing with technology scaling [9] and is going to be a major reliability issue in future technology nodes [8]. Therefore, employing efficient read disturb mitigation techniques is of decisive importance.

The *pulsed read* [14] and *disruptive reading and restoring* [15] schemes are two techniques to alleviate read disturb in STT-MRAM. In addition, some bit-cell architectures are proposed to mitigate read disturb [7, 16–18]. Moreover, there is a device-level scheme proposed in which the read disturb rate can be reduced by increasing the thermal stability factor [12]. However, while all of these techniques can reduce the read disturb rate, they impose excessive access time, area and/or power costs. As a consequence, these mitigation approaches are not the best way to achieve efficient designs. Instead, low-cost detection techniques paired with efficient error correction approaches should be considered for future STT-MRAM technologies.

In this paper, we propose a low-cost circuit-level technique to detect read disturb in STT-MRAM. This technique exploits the fact that during a read operation the read current is either always larger or always smaller than the reference current. However, if a read disturb occurs, the ratio of the read current to the reference current flips. This observation is used by our technique to create an acknowledgement signal, which indicates the occurrence of a read disturb. Since the read current is unidirectional, read disturb can only affect one logic value. Therefore, our read disturb detection circuit is only activated for that particular logic value, which results in a very low power penalty. Moreover, there is no timing penalty, as our detection circuit is isolated from the actual read process using a current mirror. Our experimental results show that the proposed read disturb detection technique can detect up to 95% of the total read disturb faults and imposes only 0.8% and 0.2% area and power overhead, respectively.

The rest of the paper is organized as follows. Section II consists of the basics of the STT-MRAM technology and the related work. In Section III, the proposed technique is explained. Section IV presents the experimental results and finally, Section V concludes the paper.

#### II. BACKGROUND

## A. Spin Transfer Torque Memory

STT-MRAM consists of magnetic tunnel junction (MTJ) cells to store the data. An MTJ cell consists of a barrier oxide layer sandwiched between two ferromagnetic layers. The ferromagnetic layer whose magnetic orientation is always fixed is known as the *Reference Layer* (RL), while the other one, whose magnetic orientation can be freely rotated, is named the *Free Layer* (FL) (see Figure 1). In an MTJ cell, values are stored in terms of resistance states. When the magnetizations of the two ferromagnetic layers are parallel ('P') to each other, the cell exhibits a low resistance value. Otherwise, when the magnetizations of the two layers are anti-parallel ('AP') to each other, the MTJ resistance is high. To change the magnetic orientation of the free layer, a bidirectional write current is required. If the write current flows from FL to RL for a sufficient duration, it switches the magnetization into the 'P' configuration, and if the current flows in the opposite direction, it changes the state into the 'AP' configuration. The switching behaviors of these two magnetization states are asymmetrical in nature due to the inherent properties of an MTJ cell [19]. This is the reason why switching from 'P' $\rightarrow$ 'AP' takes considerably more time than switching from 'AP' $\rightarrow$ 'P' [20].

Fig. 1. Spin transfer torque storing device

Paper 23.3 INTERNATIONAL TEST CONFERENCE 978-1-4799-4722-5/14/\$31.00 2014 IEEE

In this work, we use a 1T1MTJ (1T1MTJ = 1 access transistor + 1 MTJ cell) bit-cell consisting of three terminals namely, *source line* (SL), *bit line* (BL) and *word line* (WL), as depicted in Figure 3.

## B. Read behavior

In order to read a bit-cell content in STT-MRAM, a unidirectional current is required to flow through the bit-cell. This read current is high and low when the magnetization of the MTJ cell is in the 'P' and 'AP' state, respectively (the difference is typically a few µA), due to the difference in the resistance corresponding to the magnetization of the MTJ cell. A sense amplifier exploits this current difference to determine the logic state of the bit-cell. For STT-MRAM, a pre-charge based sense amplifier (PCSA) is commonly used, since it is fast and energy efficient [16, 21]. However, as the output nodes of the PCSA are unstable at the beginning of the read operation, a short circuit increases the read current (up to 17 µA considering load effects). As soon as the output nodes are stable, a static current flows through the MTJ cell (5-7 µA considering load effects). This behavior is illustrated in Figure 2. The reference current for the PCSA is in the middle of the 'P' and 'AP' currents, due to the PCSA design (see Section III-B).

Please note that the duration of the short circuit period (unstable period) strongly depends on the location of the bit-cell within the memory array. The closer a bit-cell is placed to the sense amplifier and the less routing delay it has, the shorter the unstable period is. Using NVSim [22], we obtained that the overall read period (including unstable and stable periods) for a 512 KByte STT-MRAM memory is 1.2 ns. This means that in the worst case (i.e. a bit-cell that is very far away and has a high routing delay) it requires 1.1 ns until the PCSA output nodes are stable (the remaining 100 ps are required for latching the value at the output). However, in the best case the unstable period is just 116 ps long.



Fig. 2. Conceptual diagram of read disturb while reading 'AP' state (solid line indicates actual current flow)

## C. Read disturb

In STT-MRAM, the content of a bit-cell can accidentally change during a read operation, which is known as read disturb. This is due to the fact that the read current shares one of the write current paths in STT-MRAM. The read current is around 5-10 times lower than the critical write current (the minimum current required to flip the bit-cell at a certain write period and write error rate). Nevertheless, this low read current induces a magnetic disturbance in the MTJ cell which may lead to a flip of the magnetic orientation. Since the read current is unidirectional, the flip can only happen in one direction, i.e. either from 'AP' $\rightarrow$ 'P' or the other way around. As a result, the resistance also changes, which in turn affects the read current. This sudden change in the read current can be used to detect read disturb, as the ratio of the actual read current to the reference current flips (before read disturb occurrence, the read current is smaller than the reference current; after read disturb it is larger). This fact is exploited by our proposed low-cost circuit-level technique.

The switching probability due to read disturb is given by the following equation [23]:

$$F_{rd} = 1 - \exp\left(-\frac{t_{read}}{\tau_1 \, e^{\Delta\left(1 - I_{read}/I_{C0}\right)}}\right) \tag{1}$$

where  $\Delta$  is the thermal stability factor,  $I_{read}$  is the read current,  $I_{C0}$  is the write critical current, and  $t_{read}$  is the read period. For STT-MRAM, the typical read disturb probability for a single read event is in the range of  $10^{-23}$  to  $10^{-21}$  [23].
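As an illustration, Equation (1) can be evaluated numerically with the following minimal Python sketch. The text does not define $\tau_1$; the sketch assumes it is a characteristic time of 1 ns. The values for $\Delta$, $I_{read}$ and $I_{C0}$ are likewise assumptions chosen only to show the order of magnitude, not parameters reported in this paper.

```python
import math


def read_disturb_probability(t_read, delta, i_read, i_c0, tau_1=1e-9):
    """Evaluate Eq. (1) for one read event.

    tau_1 = 1 ns is an assumption; the text does not give its value.
    """
    tau = tau_1 * math.exp(delta * (1.0 - i_read / i_c0))
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-t_read / tau)


# Hypothetical operating point (not taken from the paper):
# delta = 60, read current 6 uA, critical current 40 uA, read period 1.2 ns.
p_static = read_disturb_probability(1.2e-9, 60.0, 6e-6, 40e-6)

# Same point, but with the 17 uA peak current of the unstable period.
p_peak = read_disturb_probability(1.2e-9, 60.0, 17e-6, 40e-6)

print(f"static current: {p_static:.2e}")
print(f"peak current:   {p_peak:.2e}")
```

Under these assumed values the static-current result falls inside the $10^{-23}$ to $10^{-21}$ range quoted above, and the exponential dependence on $I_{read}/I_{C0}$ makes the probability rise steeply with the read current.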

## D. Related work

Read disturb has become an important design parameter for STT-MRAM, as it depends on the retention, write current, read current and read period values. Hence, several attempts have been made to reduce the read disturb rate. A read methodology has been proposed in which a pulsed read technique is used to read the content of the bit-cell [14]. In this technique, the word line (WL) is switched ON and OFF for a certain period of time in the form of a pulse, so that the read current cannot flow continuously through the bit-cell. This reduces the read disturb rate at the cost of read access time and also increases the complexity of the sensing methodology. A disruptive reading and restoring scheme is proposed in [15]. While this significantly improves the read disturb rate, it also increases the overall cycle time, power and area considerably. At device-level, it was proposed to reduce the read disturb rate by increasing the thermal stability factor [12]. However, this also increases the critical current value, which in turn increases the write power.



Fig. 3. STT-MRAM 1T1MTJ bit-cell architecture

Another technique employs a 2T1MTJ (2 access transistors + 1 MTJ cell) bit-cell architecture. In this method, two access transistors are used, one for read and another for write [16]. The read access transistor has a low width to make sure that less read current flows through the MTJ cell. Nevertheless, it imposes a significant area overhead. An alternative option is to employ *Spin Orbit Torque* MRAM (SOT-MRAM) instead of STT-MRAM, as proposed in [7, 18]. The advantage of SOT-MRAM is that it separates the read and write paths by adding an additional terminal. However, this third terminal increases the memory area significantly.

Another technique to mitigate read disturb exploits the fact that the direction of the read current has a considerable impact on the read disturb probability [17]. If, instead of a conventional bit-cell, a reverse connected bit-cell is used (as shown in Figure 3), the read current is aligned with the 'P' $\rightarrow$ 'AP' write current rather than with the 'AP' $\rightarrow$ 'P' write current as in the conventional case. This increases the read disturb probability significantly as shown in [17]. Therefore, we use the conventional bit-cell in this work.

In summary, all of the aforementioned approaches to mitigate read disturb impose significant costs in terms of performance (access time), area and/or power. Therefore, we propose a low cost read disturb detection technique that can be combined with efficient error correction schemes to achieve a solution with lower performance, area and power overhead.

# III. PROPOSED TECHNIQUE

As read disturb depends on various important design parameters like write current, read current, retention and readability, a reduction of the read disturb rate always leads to design compromises. Moreover, with a reduction technique, it is not possible to eliminate read disturb entirely. Hence, it must be detected to attain a reliable memory. Therefore, we propose a dynamic circuit-level approach which tracks the read current and thus is able to detect read disturb. This is possible because a read disturb fault changes the resistance of the affected bit-cell, which in turn affects the read current. As a consequence, the ratio of the actual read current to the reference current of the sense amplifier will flip (see Figure 2). This observation is exploited by a detection circuitry to create an error signal.

In the following, we present the *Read Disturb Detection* (RDD) methodology in detail followed by its circuit-level implementation.

## A. Read disturb detection (RDD) methodology

The read operation using a PCSA is performed in two phases, namely the pre-charge phase and the evaluation phase [16, 21]. During the pre-charge phase, the two output nodes of the PCSA are kept at equi-potential, while the actual read is performed during the evaluation phase. The detection phase of read disturb is a part of the evaluation phase that begins as soon as the output nodes of the PCSA are stable, i.e. either '0' or '1'. As shown in Figure 4, the pre-charge phase is the period before the activation of the WL, and the duration of the evaluation phase is the complete ON period of the WL. During the evaluation phase, the read current continuously flows through the bit-cell, and thus read disturb is possible during this phase. Therefore, the detection phase has to be part of the evaluation phase. However, since the output nodes of the PCSA are unstable at the beginning of the evaluation phase (unstable period in Figure 2), it is not possible to detect read disturb during this period (see Section III-C). Consequently, the detection phase starts as soon as the output nodes reach a stable state (stable period in Figure 2). During this phase, a specially designed detection circuit (explained in the next subsection) traces the read current behavior and by this means can detect read disturb faults during the stable period of the evaluation phase.

Please note that the output nodes of the PCSA remain unchanged after read disturb, as the potential developed due to the sudden change in current after read disturb is not sufficient to cross the threshold of the inverters in the latch-type structure of the PCSA (see Figure 5). To obtain the new value at the output node of the PCSA, it has to be pre-charged again. Hence, if a read disturb occurs after the output nodes are stable, it will not affect the value which is currently read, and therefore the PCSA outputs cannot be used directly to create an error signal. Instead, a more sophisticated circuit is required, which is explained next.

## B. RDD circuit

Fig. 4. Waveform to demonstrate the three phases for a read operation

In the proposed RDD technique, once the read value is stable at the output nodes of the sense amplifier, the current through the bit-cell is traced until the end of the read operation to detect read disturb. This is done by employing an additional sense amplifier. Once an unwanted bit-flip is detected, it is acknowledged by an error signal. As mentioned in Section II-B, we used a PCSA to read the content of the bit-cell [16, 21]. On top of this sense amplifier we have built the proposed RDD circuit, which is shown in Figure 5. In addition, we also developed and included a self-test mechanism to test the functionality of the RDD circuit. Thus, the RDD circuit consists of five parts:

1) Equalizer circuit: The two output nodes (q1 and q2) need to be at the same potential before the sensing operation begins. This is achieved using an equalizer circuit which is controlled using a *pre-charge* (PC) signal. The equalizer circuit is active when the PC signal is '0' (during the pre-charge phase) and becomes inactive when PC is '1' (during evaluation phase).

2) Sense amplifier: The sense amplifier is used to read the bit-cell content. To this end, the current through the reference cell is compared with that through the bit-cell. The bit-cell is accessed based on the WL value, and the access transistor of the reference cell is driven through an Address Transition Detection (ATD) circuit [24]. Here, the reference cell consists of four MTJ cells connected in such a way that the effective resistance value is in the middle of the two resistance states that a single MTJ can take, i.e. $(R_P + R_{AP})/2$, where $R_P$ and $R_{AP}$ are the resistance values during the 'P' and 'AP' states, respectively. The sense amplifier is operational when the PC signal is '1'. Afterwards, the potentials of the nodes q1 and q2 settle based on the resistance values of the two branches. To speed up this process, two additional back-to-back inverters are employed. The final output signal can be obtained either at q1 or q2 (= $\overline{q1}$), and it corresponds to the logic bit-cell state.

3) Detection circuit: The purpose of this circuit is to detect a read disturb. Therefore, this circuit is activated by the control circuit through the rd\_enable signal only for read operations which can be affected by read disturb (here: only read 'AP'). In this case, it employs a current mirror to copy the current values used by the previously described sense amplifier. Then, the reference current is again compared with the current through the bit-cell. As long as the latter is smaller than the reference current, no read disturb occurred and the acknowledgement signal rd\_ack remains at '0'. However, if a read disturb fault occurs after the detection circuit is activated, the current through the bit-cell will increase (due to a lower resistance) and as a result will be larger than the reference current. Consequently, rd\_ack makes a transition to '1', indicating that a read disturb happened.

4) Control circuit: As the bit-flip due to read disturb is only possible in one direction (here: 'AP' to 'P'), we just need to trace the read current for that particular read operation (here: read 'AP'). Hence, the RDD circuit is enabled just for these vulnerable read accesses using the conditional circuit shown in Figure 6. For all other memory operations the RDD circuit remains inactive. Please note that activating the RDD circuit requires knowing which value is stored in the bit-cell, since it is only turned on for particular read operations. However, as read disturb detection is not possible anyway before the output nodes of the sense amplifier become stable, the type of read operation that is currently performed (i.e. read 'AP' or read 'P') is already known at that point. Therefore, one of the output nodes of the sense amplifier is used as input for the control circuit.

Fig. 5. Circuit diagram for proposed read disturb detection

5) Self-test circuit: It is required to provide a self-test capability for the RDD circuit. This can be done using a dummy bit-cell structure connected in parallel to the existing bit-cell. The MTJ cell of this dummy cell is always configured to be in the 'P' state. This circuit is activated through a terminal named *test\_rd* and connected to the WL through an inverter (WL deactivated by disabling the row decoder using the test\_rd pin). Hence, either the dummy bit-cell or the real bit-cell is used at a time. To test the proper functionality of the detection circuit, first the real bit-cell needs to be put into the state in which read disturb is possible (i.e. 'AP' configuration). Then, the test\_rd is activated and the WL is deactivated simultaneously which results in an instant change in the current value i.e. from '0' to '1'. This needs to be detected with the RDD circuit.
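The comparison performed by the detection circuit can be summarized with a small behavioral sketch in Python. This is only an idealized model under stated assumptions: the bit-cell and reference branches are treated as plain resistors driven by the same read voltage, transistor and current-mirror effects are ignored, and the resistance and voltage values are hypothetical. The function names are illustrative and do not appear in the original design.

```python
def reference_resistance(r_p, r_ap):
    """Midpoint reference (R_P + R_AP) / 2, as stated for the reference cell."""
    return (r_p + r_ap) / 2.0


def rdd_ack(v_read, r_cell, r_p, r_ap, rd_enable):
    """Return 1 if the bit-cell current exceeds the reference current.

    Idealized model: both branches are ohmic and see the same voltage.
    The output stays at 0 whenever the detection circuit is not enabled.
    """
    if not rd_enable:
        return 0
    i_cell = v_read / r_cell
    i_ref = v_read / reference_resistance(r_p, r_ap)
    return 1 if i_cell > i_ref else 0


# Hypothetical values: R_P = 3 kOhm, R_AP = 6 kOhm, 0.1 V across the branch.
R_P, R_AP, V_READ = 3e3, 6e3, 0.1

before_flip = rdd_ack(V_READ, R_AP, R_P, R_AP, rd_enable=True)  # cell still 'AP'
after_flip = rdd_ack(V_READ, R_P, R_P, R_AP, rd_enable=True)    # cell flipped to 'P'
print(before_flip, after_flip)
```

In this model the acknowledgement stays at '0' while the cell remains in the 'AP' state and switches to '1' once the resistance drops to $R_P$, which mirrors the behavior described for rd\_ack. The same transition is what the self-test dummy cell (always in the 'P' state) is meant to provoke.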



Fig. 6. Control circuit to trigger the read disturb detection



Fig. 7. Waveform for read disturb detection circuit

The waveform behavior of the complete read methodology including RDD is shown in Figure 7. Here, PC is activated and deactivated during the pre-charge phase. Then, WL is activated, which marks the beginning of the evaluation phase, at which point both output nodes start discharging. Later, one of the output nodes will become '1' and the other one '0'. If a read operation is performed that is susceptible to read disturb, q1 is '0', which activates the rd\_enable signal, and the RDD circuit turns ON. If the test feature is used, test\_rd is activated (i.e. '1'), which generates the rd\_ack signal. This sequence of operations is also summarized in the truth table shown in Table I.

## C. Discussion

Although we have employed a pre-charge based sense amplifier for the read operation, our proposed circuit can be used with any other type of sense amplifier. It is only required to make sure that the detection circuit is properly biased with the bit-cell and the reference currents. However, no matter which sense amplifier is used, there are two phases during a read operation where the RDD circuit cannot detect read disturb:

- If a read disturb happens at a very early stage, before the output node of the sense amplifier stabilizes (unstable period in Figure 2).
- If a read disturb happens at the end of the read period during the deactivation of WL signal.

In the first case, the sense amplifier itself will read a wrong output value, and as long as the output nodes are not stable it is impossible to detect read disturb. This is because one cannot distinguish between a current change due to read disturb and a current change due to the stabilization of the internal nodes. Consequently, the ratio of the "unstable" time to the overall read period has a huge influence on the overall read disturb detection rate. As explained in Section II-B, for a 512 KByte memory the overall read period is 1.2 ns according to NVSim [22], while the output nodes are unstable for at least 280 ps (the actual time period depends on the routing delay of the bit-cell). In the second case, the detection signal depends on the slew of the deactivation of WL. Therefore, we use a sharp slew such that the probability of read disturb during that period is negligible. In summary, the detection rate reaches up to 95 %, depending on the clock period, which determines how long WL is turned ON.

TABLE I. TRUTH TABLE TO ENABLE THE READ DISTURB DETECTION CIRCUIT

| read enable | q1 | WL/test_rd | rd_enable |
|-------------|----|------------|-----------|
| 0           | X  | X          | 0         |
| 1           | 1  | X          | 0         |
| 1           | 0  | 0          | 0         |
| 1           | 0  | 1          | 1         |
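The enable condition of Table I reduces to a single Boolean expression, sketched below in Python. The function is a behavioral restatement of the truth table only; the actual gate-level structure of the control circuit in Figure 6 is not reproduced here, and the argument names are illustrative.

```python
from itertools import product


def rd_enable(read_enable, q1, wl_or_test_rd):
    """rd_enable per Table I: '1' only for read_enable=1, q1=0, WL/test_rd=1."""
    return int(read_enable == 1 and q1 == 0 and wl_or_test_rd == 1)


# Enumerate all input combinations; the 'X' rows of Table I expand to
# several concrete rows here, all of which yield rd_enable = 0.
for read_enable, q1, wl in product((0, 1), repeat=3):
    print(read_enable, q1, wl, "->", rd_enable(read_enable, q1, wl))
```

Exactly one of the eight input combinations enables the detection circuit, which is consistent with the RDD circuit being active only during vulnerable read accesses or during self-test.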

If the currents used in the RDD circuit are too low, one can add a high width transistor to amplify the currents copied by the current mirrors. However, there is a trade-off between the width of this transistor and the potential difference of the reference and bit-cell branch inside the sense amplifier. As a result, if the width is too large, the required distinguishable potential may not be developed at the output nodes of the sense amplifier which can lead to a decision failure.

## IV. EXPERIMENTAL SETUP AND RESULTS

In order to demonstrate the effectiveness of the proposed RDD technique, we implemented it at circuit-level and evaluated it using SPICE simulations. In this regard, we employed the TSMC 65 nm general purpose model for the CMOS components and a model from [25] for the MTJ cells. Cadence Spectre was used for the SPICE simulations, considering a supply voltage of 1.2 V and a temperature of $27^{\circ}\text{C}$. Using this experimental setup, the proposed technique is evaluated for a single bit-cell as well as for an entire memory block.

#### A. Read Disturb Detection for a Single Bit-cell

For evaluation of a single bit-cell, different scenarios due to process variations in the CMOS circuitry as well as the MTJ cell were analyzed. Since these two parts have different fabrication technologies, we separately considered the effect of process variation on these two technologies [26]. For the CMOS circuitry, we performed simulations for the slow, typical, and fast corners. For the MTJ cell, a normal distribution with  $\pm 3\sigma$  variations for *Tunneling Magneto-Resistance* (TMR) value and RA<sup>1</sup> were considered in the experiments. Based on these variations, we extracted the best, typical and worst case scenario for the time period that a single bit-cell requires from the start of a read process until the PCSA output nodes become stable by tracing the read current waveforms extracted from SPICE simulations. By combining the obtained results with Equation (1) the read disturb probability for each read phase (stable and unstable period according to Figure 2) was computed. Based on the fact that the proposed RDD technique can only detect read disturb in the second part of the read operation (i.e. when the PCSA output nodes are stable), the probability for read disturb detection with respect to the entire read period was calculated.
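The last step of this flow can be sketched in Python as follows. The sketch applies Equation (1) to each phase separately and takes the share of the stable phase as the detection probability. It is a simplified illustration, not the evaluation code used for the results: $\tau_1 = 1$ ns, $\Delta$, $I_{C0}$ and the phase currents are assumed values, and the example deliberately uses the same current in both phases, which ignores the higher current during the unstable period.

```python
import math


def phase_probability(t_phase, i_phase, delta, i_c0, tau_1=1e-9):
    """Read disturb probability of one phase according to Eq. (1)."""
    tau = tau_1 * math.exp(delta * (1.0 - i_phase / i_c0))
    return -math.expm1(-t_phase / tau)


def detection_probability(t_unstable, t_stable, i_unstable, i_stable,
                          delta, i_c0):
    """Share of read disturb events that fall into the detectable stable phase."""
    p_unstable = phase_probability(t_unstable, i_unstable, delta, i_c0)
    p_stable = phase_probability(t_stable, i_stable, delta, i_c0)
    return p_stable / (p_unstable + p_stable)


# Assumed values: 280 ps unstable period, equal 6 uA current in both phases,
# delta = 60 and I_C0 = 40 uA (all hypothetical).
for t_read in (1e-9, 4e-9):
    t_unstable = 280e-12
    d = detection_probability(t_unstable, t_read - t_unstable,
                              6e-6, 6e-6, 60.0, 40e-6)
    print(f"read period {t_read * 1e9:.0f} ns: detection probability {d:.2f}")
```

With equal currents the result reduces to the ratio of the stable time to the read period, so a longer read period directly raises the detectable share. Passing a larger current for the unstable phase lowers the result, because Equation (1) weights that phase more heavily.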

The read disturb detection probability for the three process corners for various read periods is shown in Figure 8. For the typical corner, the read disturb detection probability is more than 70% for a read period of 1 ns. By increasing the read period, the detection capability of the RDD technique increases and reaches more than 90% for a read period of 4 ns. A similar trend can be seen for the worst case and best case as well.

<sup>1</sup>Product of resistance and area

Fig. 8. Probability of read disturb detection vs read period for a single bit-cell considering process variation

#### B. Read Disturb Detection for a Memory Array

For the evaluation of the efficiency of our proposed RDD technique in an entire memory block, we employed a 512 KByte memory array which is partitioned into several blocks, each of which consists of 512 rows of bit-cells as illustrated in Figure 9. In this regard, the bit-cells closer to the periphery circuits (including PCSAs) require less time until the PCSA output nodes become stable compared to those which are farther away. As the efficiency of our proposed technique is dependent on the ratio of the unstable to the stable time period (shown in Figure 2), the read disturb detection probability for bit-cells close to the periphery circuits is higher. Consequently, the locations of the best and the worst case bit-cells are as highlighted in Figure 9.

Fig. 9. Memory architecture to demonstrate the best and the worst cases

Fig. 10. Read Disturb Detection Probability for different clock periods

The overall read period for the given memory configuration in the typical process corner was extracted to be 1.2 ns using NVSim [22]. The last 100 ps of this period are required for latching the output value. Consequently, even for the worst-case bit-cell, the stable period is at least 100 ps. Assuming that all bit-cells in a memory partition are equally accessed, the read disturb detection probability for each cell is computed according to the flow explained in the previous subsection. Based on the obtained results, the average and highest read disturb detection probabilities were computed. These are depicted in Figure 10 for various clock periods. As shown in the figure, for smaller clock periods, the read disturb detection probability is lower, as a multiple of the clock period can be close to the given read access period, and hence the stable period is always very short. However, for larger clock periods, the read disturb detection probability increases significantly, as the stable period becomes more and more dominant. By these means, our proposed technique can achieve a read disturb detection probability of 65 % on average for a clock frequency of 1 GHz, while for the bit-cells which are close to the periphery circuits more than 80 % is possible. Moreover, STT-MRAM is often used as a replacement for the main memory with clock frequencies in the range of a few hundred MHz. For these application scenarios, average detection probabilities of more than 80 % are achievable. Furthermore, it is worth mentioning that for large memory sizes the worst case read latency increases. As a result, the stable period for the best case bit-cell becomes longer, which in turn improves the best and the average read disturb detection capabilities.
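The influence of the clock period can be illustrated with a short Python sketch. It assumes that WL stays ON for the smallest whole number of clock cycles that covers the 1.2 ns read access period, and it approximates the detectable share by the fraction of the ON time during which the output nodes are stable. Both are simplifying assumptions made for illustration; the function names are hypothetical, and the clock periods other than 1 ns are example values.

```python
def wl_on_time_ps(t_access_ps, t_clk_ps):
    """Smallest multiple of the clock period that covers the access time."""
    cycles = -(-t_access_ps // t_clk_ps)  # integer ceiling division
    return cycles * t_clk_ps


def stable_fraction(t_access_ps, t_clk_ps, t_unstable_ps):
    """Fraction of the WL ON time in which the PCSA outputs are stable."""
    t_on = wl_on_time_ps(t_access_ps, t_clk_ps)
    return (t_on - t_unstable_ps) / t_on


T_ACCESS_PS = 1200          # read period from the text
NEAR_PS, FAR_PS = 280, 1100  # unstable periods quoted in the text

for t_clk_ps in (400, 1000, 5000):  # 2.5 GHz, 1 GHz, 200 MHz
    t_on = wl_on_time_ps(T_ACCESS_PS, t_clk_ps)
    near = stable_fraction(T_ACCESS_PS, t_clk_ps, NEAR_PS)
    far = stable_fraction(T_ACCESS_PS, t_clk_ps, FAR_PS)
    print(f"clk {t_clk_ps} ps: WL on {t_on} ps, near {near:.2f}, far {far:.2f}")
```

When the clock period divides the access period almost exactly, as with the 400 ps example, WL is ON for only 1200 ps and a far-away bit-cell is stable for a small fraction of that time. A 1 ns clock stretches the ON time to 2 ns, and slower clocks stretch it further, which increases the stable fraction for every bit-cell.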

#### C. Area, Delay and Energy Overhead

The area of the mentioned 512 KByte memory array was estimated using NVSim [22]. The area of the control circuit was obtained using the TSMC standard cell library with minimum gate sizes, and the area of the detection circuit was calculated based on the transistor sizes. According to the obtained area numbers, the area overhead of our proposed technique is around 0.8 %.

Since the RDD circuitry increases the load of the PCSA used for the read operation, it can impair the read latency. This is because the read current can be lower due to the additional load. To avoid any delay penalty, we adjusted the transistor sizes in the read path to keep the read current with the RDD circuitry on the same level as in the standard memory without detection capability. Consequently, there is no delay penalty using our implementation. Please note that the larger transistor sizes were also considered in the area overhead estimation.

According to SPICE simulations, the energy consumption of the read circuitry increases by 4.0% when our proposed RDD circuit is included. This value was given to NVSim to obtain the total read energy for the entire memory configuration. The NVSim results show that the average read access energy increases by just 0.24%. The main reason for this low energy overhead is that the RDD circuitry is only activated during read operations for 'AP' configurations.

#### D. Comparison with Parity protection

We have compared the proposed technique with the parity error detecting code, which is commonly used in memories. For parity, a bit is added to each word to indicate whether the number of ones in the word is even or odd. Although parity can detect all read disturb faults, it requires an additional column to store the parity bits as well as parity encoding and decoding circuitry. These contribute significantly to the area (4.0%) as well as the energy consumption (1.2%), and cause a considerable timing overhead.
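For reference, the parity scheme used in this comparison can be sketched in a few lines of Python. The 8-bit word and the flipped bit position are arbitrary example values, and the function names are illustrative; the sketch shows the encoding and checking principle only and says nothing about the hardware cost figures above.

```python
def parity_bit(word):
    """Even-parity bit: 1 if the word contains an odd number of ones, else 0."""
    return bin(word).count("1") & 1


def parity_error(word, stored_parity):
    """True if the word read back no longer matches its stored parity bit."""
    return parity_bit(word) != stored_parity


stored_word = 0b10110010                # example data word (four ones)
stored_parity = parity_bit(stored_word)

disturbed_word = stored_word ^ (1 << 3)  # one bit flipped, e.g. by read disturb

print(parity_error(stored_word, stored_parity))     # intact word
print(parity_error(disturbed_word, stored_parity))  # word with one flipped bit
```

A single parity bit flags any odd number of flipped bits in a word, so a single read disturb flip is caught at the next read of that word, at the price of storing and checking one extra bit per word.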

#### V. CONCLUSIONS

STT-MRAM is a promising memory technology because of its various advantages such as non-volatility, high endurance, scalability and high density. However, read disturb is a serious reliability challenge for the development of STT-MRAM. We proposed a circuit that detects read disturb faults, together with a self-test mechanism to validate its behavior. We also provided a conditional circuit that activates the detection circuit only for the 'AP' configuration, as read disturb always flips the bit-cell from 'AP' to 'P' in our implementation. Our results show that the proposed technique can effectively detect read disturb with negligible area and power overhead.

#### REFERENCES

- [1] Chris Wilkerson, Alaa R Alameldeen, Zeshan Chishti, Wei Wu, Dinesh Somasekhar, and Shih-lien Lu. Reducing cache power with low-cost, multi-bit error-correcting codes. ACM SIGARCH Computer Architecture News, 38(3):83–93, 2010.
- [2] Mu-Tien Chang, Paul Rosenfeld, Shih-Lien Lu, and Bruce Jacob. Technology comparison for large last-level caches (L3Cs): Low-leakage SRAM, low write-energy STT-RAM, and refresh-optimized eDRAM. In *High Performance Computer Architecture*, pages 143–154, 2013.
- [3] Simona Boboila and Peter Desnoyers. Write Endurance in Flash Drives: Measurements and Analysis. In FAST, volume 10, pages 9–9, 2010.
- [4] International Technology Roadmap for Semiconductors. http://www.itrs.net, 2012.
- [5] T. Kawahara. Scalable spin-transfer torque RAM technology for normally-off computing. *Design & Test of Computers, IEEE*, 28(1):52–63, Jan 2011.
- [6] Hai Li and Yiran Chen. An overview of non-volatile memory technology and the implication for tools and architectures. In *Design, Automation & Test in Europe Conference & Exhibition*, pages 731–736, April 2009.
- [7] R. Bishnoi, M. Ebrahimi, F. Oboril, and M.B. Tahoori. Architectural Aspects in Design and Analysis of SOT-based Memories. In *Asia and South Pacific Design Automation Conference*, pages 700–707, Jan 2014.
- [8] Xuanyao Fong, Yusung Kim, S.H. Choday, and K. Roy. Failure Mitigation Techniques for 1T-1MTJ Spin-Transfer Torque MRAM Bit-cells. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 22(2):384–395, Feb 2014.
- [9] H. Naemi, C. Augustine, A. Raychowdhury, S. Lu, J. Tschanz. STTRAM Scaling And Retention Failure. *Intel Technology Journal*, 17, 2013.
- [10] Anurag Nigam, Clinton W Smullen IV, Vidyabhushan Mohan, Eugene Chen, Sudhanva Gurumurthi, and Mircea R Stan. Delivering on the promise of universal memory for spin-transfer torque RAM (STT-RAM). In *International Symposium on Low-Power Electronics and Design*, pages 121–126, 2011.
- [11] Yaojun Zhang, Wujie Wen, and Yiran Chen. The prospect of STT-RAM scaling from readability perspective. *IEEE Transactions on Magnetics*, 48(11):3035–3038, 2012.
- [12] Yiming Huai, Mahendra Pakala, Zhitao Diao, and Yunfei Ding. Spin-transfer switching current distribution and reduction in magnetic tunneling junction-based structures. *IEEE Transactions on Magnetics*, 41(10):2621–2626, 2005.
- [13] Nikolaos Strikos, Vasileios Kontorinis, Xiangyu Dong, Houman Homayoun, and Dean Tullsen. Low-current probabilistic writes for power-efficient STT-RAM caches. In *International Conference on Computer Design (ICCD)*, pages 511–514, 2013.
- [14] A. Raychowdhury. Pulsed READ in spin transfer torque (STT) memory bitcell for lower READ disturb. In *International Symposium on Nanoscale Architectures* (NANOARCH), pages 34–35, 2013.
- [15] R. Takemura, T. Kawahara, K. Ono, K. Miura, H. Matsuoka, and H. Ohno. Highly-scalable disruptive reading scheme for Gb-scale SPRAM and beyond. In *International Memory Workshop (IMW)*, pages 1–2, 2010.
- [16] WS Zhao, T. Devolder, Y. Lakys, J.-O. Klein, C. Chappert, and P. Mazoyer. Design considerations and strategies for high-reliable STT-MRAM. *Microelectronics Reliability*, 51(9):1454–1458, 2011.
- [17] T. Kawahara, R. Takemura, and others. 2Mb spin-transfer torque RAM (SPRAM) with bit-by-bit bidirectional current write and parallelizing-direction current read. In *Solid-State Circuits Conference*, pages 480–617, 2007.

- [18] K Jabeur, LD Buda-Prejbeanu, G Prenat, and GD Pendina. Study of two writing schemes for a magnetic tunnel junction based on spin orbit torque. *International Journal of Electronics Science and Engineering*, pages 501–507, 2013.
- [19] R. Bishnoi, M. Ebrahimi, F. Oboril, and M.B. Tahoori. Asynchronous asymmetrical write termination (AAWT) for a low power STT-MRAM. In *Design, Automation and Test in Europe Conference and Exhibition (DATE)*, pages 1–6, March 2014.
- [20] D. Lee, S.K. Gupta, and K. Roy. High-performance Low-energy STT MRAM Based on Balanced Write Scheme. In *International Symposium on Low Power Electronics and Design*, pages 9–14, 2012.
- [21] R. Bishnoi, F. Oboril, M. Ebrahimi, and M.B. Tahoori. Avoiding Unnecessary Write Operations in STT-MRAM for Low Power Implementation. In *International Symposium on Quality Electronic Design*, pages 548–553, March 2014.
- [22] Xiangyu Dong, Cong Xu, Yuan Xie, and Norman P Jouppi. NVSim: A circuit-level performance, energy, and area model for emerging nonvolatile memory. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 31(7):994–1007, 2012.
- [23] Dmytro Apalkov, Alexey Khvalkovskiy, Steven Watts, Vladimir Nikitin, Xueti Tang, Daniel Lottis, Kiseok Moon, Xiao Luo, Eugene Chen, Adrian Ong, Alexander Driskill-Smith, and Mohamad Krounbi. Spin-transfer Torque Magnetic Random Access Memory (STT-MRAM). J. Emerg. Technol. Comput. Syst., 9(2):13:1–13:35, May 2013.
- [24] Martin Margala. Low-power SRAM circuit design. In International Workshop on Memory Technology, Design and Testing, pages 115–122, 1999.
- [25] W Guo, G Prenat, V Javerliac, M El Baraji, N de Mestier, C Baraduc, and B Dieny. SPICE modelling of magnetic tunnel junctions written by spin-transfer torque. *Journal of Physics D: Applied Physics*, 43(21):215001, 2010.
- [26] Yaojun Zhang, Xiaobin Wang, and Yiran Chen. STT-RAM cell design optimization for persistent and non-persistent error rate reduction: A statistical design view. In *International Conference on Computer-Aided Design (ICCAD)*, pages 471–477, Nov 2011.