RMG and Associates

Insightful, timely, and accurate

Semiconductor Technology Consulting

Semiconductor & Patent Expert Consulting

Ron@Maltiel-consulting.com

(408) 446-3040


NAND 201: the continued evolution of NAND Flash

EETimes / Jim Cooke, Micron Technology 2/13/2011 6:23 PM EST
A lot has changed with NAND Flash memory since my original NAND 101 article was published in 2006. From the evolutionary changes of a continually shrinking NAND cell, to the performance-enhancing innovations that support increasingly advanced designs, this follow-on article will chronicle the developments in NAND technology from 2006 through early 2011.

Market Changes
In 2006, single-level cell (SLC) NAND Flash devices were mainstream products that accounted for more than 80% of the devices on the market. At that time, many NAND Flash vendors were struggling with two bits per cell, known as multilevel cell (MLC), and SLC device densities were in the range of just a few gigabits. Today, for example, Micron offers a range of NAND products with densities up to 512Gb in a single device.

Figure 1 shows the past, present, and projected output mix for the major NAND cell technologies. The high runners are all MLC devices, which have replaced SLC devices in holding approximately 80% of the total market. While 16-level-cell technology grew to a few percent in 2010, it is expected to drop to 0% in 2011 due to the continued difficulty of reliably placing 16 discrete thresholds in a single cell. 8LC devices, which pack three bits per cell, are expected to grow from less than 10% in early 2009 to almost 30% by the end of 2011. These devices are used primarily in value-minded consumer products that can operate with lower NAND performance and fewer PROGRAM/ERASE cycles (also known as endurance). Traditional MLC devices, which group two bits (four levels) per cell, are ideal for applications that demand higher performance and endurance; thus, MLC drives the majority of NAND output. Lastly, SLC NAND is the technology of choice for high-performance, high-endurance, and high-reliability applications. Later, we will discuss some separate, specialized NAND devices that have been born out of necessity.
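The naming scheme above follows from simple arithmetic. A cell that stores n bits must hold 2^n distinguishable threshold-voltage states, and telling k states apart takes k - 1 read reference levels. The short Python sketch below is our own illustration of that relationship, not something from the article. It ignores the sensing margins and noise that make the 16-level case so hard in practice.

```python
def cell_levels(bits_per_cell: int) -> int:
    """Distinct threshold-voltage states needed to store `bits_per_cell` bits."""
    return 2 ** bits_per_cell


def read_references(bits_per_cell: int) -> int:
    """Boundaries between adjacent states: one fewer than the number of states."""
    return cell_levels(bits_per_cell) - 1


# Cell types named in the text, with their bits per cell.
for name, bits in [("SLC", 1), ("MLC", 2), ("8LC", 3), ("16LC", 4)]:
    print(f"{name}: {bits} bit(s)/cell -> {cell_levels(bits)} levels, "
          f"{read_references(bits)} read references")
```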


Figure 1:  NAND Production by Type

Figure 2 shows the expected application adoption of 3-bit-per-cell (8LC) NAND devices. In addition to early drivers such as Flash cards and USB thumb drives, several other consumer application designs are expected to be based on this technology.



Figure 2:  8LC Density Adoptions by Application


NAND Basic Operations
While the NAND cell itself has remained essentially the same over the last several years (albeit much smaller), almost everything else about NAND has changed (see Table 1).

Array Enhancements
As the NAND cell has continued to shrink, the array has become significantly more compact, allowing more cells to be integrated in a comparable die area.

The mainstream device in 2006 (Figure 3) was a 2Gb SLC device consisting of 2048 blocks; each block had 64 pages, and each page had 2112 bytes. In contrast, today’s high-capacity devices (Figure 4) can include multiple die or logical unit numbers (LUNs). Each 32Gb monolithic SLC die has 4096 blocks, each block contains 128 pages, and each page has up to 8640 bytes—and it requires only a slightly larger silicon area than the 2Gb device of 2006. When the spare area needed to support the additional error correction code (ECC) is factored in, the result is more than 16 times the density.
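You can check the density comparison directly from the quoted geometries. This Python sketch uses a hypothetical `NandGeometry` helper of our own. It multiplies blocks × pages × bytes, and the page sizes include the spare area used for ECC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NandGeometry:
    blocks: int
    pages_per_block: int
    page_bytes: int  # data + spare bytes per page

    def total_bytes(self) -> int:
        return self.blocks * self.pages_per_block * self.page_bytes


# Geometries quoted in the text (page size includes the spare area).
slc_2gb_2006 = NandGeometry(blocks=2048, pages_per_block=64, page_bytes=2112)
slc_32gb_2010 = NandGeometry(blocks=4096, pages_per_block=128, page_bytes=8640)

ratio = slc_32gb_2010.total_bytes() / slc_2gb_2006.total_bytes()
print(f"2006: {slc_2gb_2006.total_bytes():,} bytes")
print(f"2010: {slc_32gb_2010.total_bytes():,} bytes")
print(f"ratio including spare area: {ratio:.2f}x")
```

Suppose you count only the data portion of each page, assuming the usual 2048- and 8192-byte data areas. That gives exactly 2^28 bytes (2Gb) versus 2^32 bytes (32Gb), a 16x ratio. The larger spare area of the newer part is what pushes the total past 16x.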


Figure 3:  2Gb Array Organization of 90nm Micron SLC NAND Flash (2006)


Figure 4:  32Gb Array Organization of 25nm Micron SLC NAND Flash (2010)

Table 1 shows the evolution of SLC and MLC devices. Focusing first on SLC NAND, the programming time (tPROG) has remained relatively constant at approximately 300µs. However, the amount of data programmed per operation (the page size) has increased from 2112 bytes in 2006 to 8640 bytes for the latest 25nm production device. This larger page size yields roughly a four-fold improvement in overall array programming performance.
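To see where the four-fold figure comes from, divide the bytes programmed per operation by tPROG. The sketch below assumes the same 300µs tPROG for both generations. It also ignores the time needed to load data into the page register, so it describes array throughput only.

```python
T_PROG_US = 300.0  # approximate SLC tPROG quoted in the text, in microseconds


def array_program_mb_per_s(page_bytes: int, t_prog_us: float = T_PROG_US) -> float:
    """Bytes programmed per microsecond of array time, i.e. decimal MB/s."""
    return page_bytes / t_prog_us


old = array_program_mb_per_s(2112)  # 2006 page size
new = array_program_mb_per_s(8640)  # 25nm page size
print(f"{old:.1f} MB/s -> {new:.1f} MB/s ({new / old:.2f}x)")
```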

To improve performance even further, most NAND manufacturers have implemented what is referred to as multiplane technology, shown in Figure 4. Essentially, multiplane operations enable twice the number of array data operations in approximately the same amount of time. The details of these commands will be discussed later, but examples include multiplane ERASE or PROGRAM operations, where the array times are significant and can be amortized over double the amount of data, improving performance by almost two times. The SLC devices of 2006, by comparison, were single-plane, as shown in Figure 3.
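A rough model shows why the multiplane gain is "almost" rather than exactly two times. Both pages must still be loaded over the shared interface before the single, concurrent tPROG starts. This sketch makes three assumptions:

- a 200 MB/s interface (the ONFI 2.0 rate discussed in the interface section)
- an 8640-byte page
- a 300µs tPROG

The function name and the simple load-then-program sequence are our own simplifications, not a description of any specific command set.

```python
def program_throughput_mb_s(planes: int, page_bytes: int,
                            bus_mb_s: float, t_prog_us: float) -> float:
    """Sequential PROGRAM throughput when `planes` pages are loaded back to back
    over the bus and then programmed concurrently in a single tPROG."""
    load_us = planes * page_bytes / bus_mb_s  # bytes / (bytes per us) = us
    return planes * page_bytes / (load_us + t_prog_us)


single = program_throughput_mb_s(1, 8640, bus_mb_s=200.0, t_prog_us=300.0)
dual = program_throughput_mb_s(2, 8640, bus_mb_s=200.0, t_prog_us=300.0)
print(f"1 plane: {single:.1f} MB/s, 2 planes: {dual:.1f} MB/s, gain {dual / single:.2f}x")
```

With these assumptions the two-plane gain comes out near 1.8x. The longer tPROG is relative to the data load, the closer the gain gets to 2x.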


Table 1:  Evolution of High-Density Micron SLC/MLC Devices




Doubling the number of pages per SLC block from 64 to 128 in 2009 made the array more efficient. The negative side effect was a larger block size, which can present management challenges for the Flash file system, especially when dealing with small data files.
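NAND is erased one whole block at a time. Any still-valid data in a block must therefore be copied elsewhere before the block can be reclaimed, which is why block size matters to the file system. The sketch below compares the user-data size of an erase block before and after the change. It assumes the usual 2048- and 8192-byte data areas within the 2112- and 8640-byte pages.

```python
KIB = 1024


def block_data_bytes(pages_per_block: int, data_bytes_per_page: int) -> int:
    """User-data capacity of one erase block (spare area excluded)."""
    return pages_per_block * data_bytes_per_page


old_block = block_data_bytes(64, 2048)    # 2006 SLC: 64 pages x 2KB of data
new_block = block_data_bytes(128, 8192)   # 25nm SLC: 128 pages x 8KB of data
print(old_block // KIB, "KiB ->", new_block // KIB, "KiB",
      f"({new_block // old_block}x)")
```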

The array read time (tR) has remained very fast and relatively consistent, resulting in high performance for most READ operations.

The required ECC, NOP (number of partial-page programs), and endurance specifications referenced in Table 1 are discussed as a group because they are all related to the robustness of the NAND cell. As NAND continues to scale, it becomes more challenging for cells to store data reliably. To compensate, the amount of ECC required continues to increase. Early SLC NAND devices achieved an endurance of 100,000 PROGRAM/ERASE cycles with only 1-bit ECC. These same devices could also support a NOP of eight. That was reduced to four to match the requirement of the original devices with 2112-byte pages and 64 pages per block, whose pages typically consisted of four 528-byte sectors. To remain competitive, NAND must shrink to smaller geometries while maintaining an acceptable system error rate. That requires some combination of stronger ECC, fewer NOPs, and lower endurance.

NAND endurance has received some negative attention over the past few years. Based solely on the endurance specifications, SLC endurance has dropped 40%, from 100,000 cycles in 2008 to approximately 60,000 in 2010. MLC endurance has dropped 70%, from 10,000 cycles in 2008 to 3,000 in 2010.

However, endurance doesn’t tell the whole story. Consider the higher densities of today’s devices and effective wear leveling. SLC and MLC device densities have increased by 8 and 16 times, respectively, meaning that there are 8 or 16 times as many memory cells available to store data. Compare the 2008 100,000/10,000 device specifications with the current 25nm NAND technology. Density has increased by 400%, while MLC endurance has dropped to 30% of the original 10,000 cycles, which yields a net improvement in system endurance. Again, this assumes effective wear leveling, which wears all cells at essentially the same rate.
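The system-endurance argument can be written as a ratio. With ideal wear leveling, the total data a device can absorb over its life scales with capacity × rated P/E cycles. The sketch below takes the text's 8x (SLC) and 16x (MLC) density gains at face value and pairs them with the 2008 and 2010 endurance specifications. Real lifetimes also depend on write amplification, which this model ignores.

```python
def relative_lifetime_writes(density_gain: float, old_cycles: int, new_cycles: int) -> float:
    """Total data writable over the device life, new part vs. old part,
    assuming ideal wear leveling (every block wears at the same rate)."""
    return density_gain * new_cycles / old_cycles


slc = relative_lifetime_writes(8, old_cycles=100_000, new_cycles=60_000)
mlc = relative_lifetime_writes(16, old_cycles=10_000, new_cycles=3_000)
print(f"SLC: {slc:.1f}x, MLC: {mlc:.1f}x")
```

Both cases come out at roughly 4.8x. Even a more conservative fourfold density gain gives 4 × 3,000 / 10,000 = 1.2 for MLC, which is still a net improvement.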

Interface Enhancements
Another significant area of NAND Flash evolution is the interface. The Open NAND Flash Interface (ONFI) workgroup was founded in February 2006, and in December 2006 the group published its first specification, ONFI 1.0. Before ONFI, controllers or host firmware could issue a READ ID command to the NAND device. The device would return manufacturer and design ID information, plus a couple of vendor-specific bytes. While better than nothing, the lack of device parameter information forced controller vendors to build internal firmware tables by manually entering data sheet parameters. The firmware then used these tables to interpret device capabilities.

One of the more significant outcomes of the ONFI 1.0 specification was that controller firmware could now ascertain the parameters and characteristics of the attached NAND Flash. ONFI 1.0 also simplified support for the various timing specifications by grouping more than 30 individual timing parameters into six predefined timing modes. The NAND device reports which timing modes it supports, and the controller selects a mode whose minimum specifications the device is guaranteed to meet.
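In practice, timing modes reduce the controller's job to picking the fastest mode both sides support. The sketch below assumes each side advertises its modes as a bitmask in which bit n means mode n is supported. It treats mode 0 as the always-available fallback. The function name and encoding are illustrative assumptions, so consult the ONFI specification for the exact field format.

```python
def best_common_timing_mode(device_modes: int, controller_modes: int) -> int:
    """Return the highest timing mode supported by both device and controller.

    Each argument is a bitmask where bit n set means timing mode n is
    supported (an assumed encoding for this sketch). Mode 0, the slowest,
    is used as the fallback when nothing else matches.
    """
    common = device_modes & controller_modes
    if common == 0:
        return 0  # fall back to the slowest mode
    return common.bit_length() - 1  # index of the highest set bit


# Device supports modes 0-5; controller tops out at mode 4.
print(best_common_timing_mode(0b111111, 0b011111))
```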

In addition, ONFI 1.0 provides a parameter page data structure that controller firmware can easily read to obtain further details about the Flash, such as page size, number of blocks, and required ECC. (All released ONFI specifications are publicly available at onfi.org.)
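The Python sketch below reads a few memory-organization fields from the parameter page. It uses these byte offsets, all little-endian:

- signature at 0
- data and spare bytes per page at 80 and 84
- pages per block at 92
- blocks per LUN at 96
- LUN count at 100
- ECC bits at 112

These offsets reflect our reading of the ONFI parameter page layout and should be verified against the specification. The example page is synthetic, and a real parser should also check the page's CRC.

```python
import struct


def parse_onfi_geometry(page: bytes) -> dict:
    """Extract a few memory-organization fields from a 256-byte ONFI
    parameter page. Offsets follow our reading of the ONFI layout; verify
    them against the specification. CRC checking is omitted here."""
    if len(page) < 256 or page[0:4] != b"ONFI":
        raise ValueError("not an ONFI parameter page")
    data_bytes, spare_bytes = struct.unpack_from("<IH", page, 80)
    pages_per_block, blocks_per_lun, lun_count = struct.unpack_from("<IIB", page, 92)
    return {
        "page_bytes": data_bytes + spare_bytes,
        "data_bytes": data_bytes,
        "pages_per_block": pages_per_block,
        "blocks_per_lun": blocks_per_lun,
        "luns": lun_count,
        "ecc_bits": page[112],
    }


# Synthetic parameter page shaped like the 25nm SLC geometry in the text.
buf = bytearray(256)
buf[0:4] = b"ONFI"
struct.pack_into("<IH", buf, 80, 8192, 448)        # data + spare = 8640 bytes
struct.pack_into("<IIB", buf, 92, 128, 4096, 1)    # pages/block, blocks/LUN, LUNs
buf[112] = 8                                       # arbitrary synthetic ECC value
print(parse_onfi_geometry(bytes(buf)))
```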

As the page size grows, the need for speed becomes more apparent. Take the original 2112-byte page device as an example. The fastest asynchronous cycle time of 20ns allows the entire 2112-byte page to be shifted out in less than 43µs. As the page size grows to 4320 or 8640 bytes, this time increases to 86µs and 172µs, respectively. With the array read time (tR) generally fixed at 25–35µs (depending on the geometry), the I/O bottleneck quickly becomes apparent. On a 25nm SLC device, it takes only 35µs to transfer all 8640 bytes from the array to the I/O register, but it takes an additional 172µs to get all 8640 bytes of data out of the register. This results in a total time of 207µs, or approximately 42 MB/s, not including the minimal command overhead.

Utilizing the source-synchronous ONFI 2.0 interface running a 10ns clock (DDR) would result in an I/O access rate of 200 MB/s. (ONFI 3.0 would result in 400 MB/s.) For this same example, the time to get 8640 bytes of data out of the register would be reduced from 172µs to 43µs. The array read time would remain 35µs. Adding the array read time of 35µs and the faster I/O time of 43µs results in a total time of 78µs, or approximately 110 MB/s.
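A few lines of Python reproduce the arithmetic in the asynchronous and ONFI 2.0 examples. The model is the same simplification the text uses: one array read (tR) followed by shifting the whole page out of the register. It ignores command overhead, ECC, and controller inefficiencies. One byte per 20ns is 50 bytes/µs. DDR on a 10ns clock moves two bytes per cycle, or 200 bytes/µs.

```python
def page_read_time_us(page_bytes: int, t_r_us: float, bus_mb_s: float) -> float:
    """Array read (tR) followed by shifting the whole page out over the bus."""
    return t_r_us + page_bytes / bus_mb_s  # bytes / (bytes per us) = us


PAGE = 8640   # 25nm SLC page, data + spare
T_R = 35.0    # array read time in microseconds, as quoted in the text

interfaces = [
    ("async, 20ns cycle", 1000 / 20),             # 1 byte per 20ns = 50 bytes/us
    ("ONFI 2.0 DDR, 10ns clock", 2 * 1000 / 10),  # 2 bytes per 10ns = 200 bytes/us
]
for label, bus in interfaces:
    t = page_read_time_us(PAGE, T_R, bus)
    print(f"{label}: {t:.1f} us per page, {PAGE / t:.0f} MB/s")
```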

Note that in order to keep these examples simple, we are considering only the single-plane read bandwidth from the NAND to the controller and are not using any cache or multiplane operations, which could provide a significant performance improvement. Also keep in mind that the controller typically has some inefficiencies, as well as ECC correction times that must be taken into account.
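The sketch below gives a sense of how much cache operations can help by modeling an idealized cache read. The array fetches the next page while the previous one is shifted out of the cache register, so in steady state only the slower of tR and the transfer is visible. This is a simplified model of our own. It ignores command overhead, cache busy times, and the controller and ECC costs mentioned above.

```python
def sequential_read_mb_s(page_bytes: int, t_r_us: float, bus_mb_s: float,
                         cache_read: bool) -> float:
    """Idealized single-plane sequential read throughput (overheads ignored)."""
    transfer_us = page_bytes / bus_mb_s
    if cache_read:
        # Next page is read from the array while the current one is shifted
        # out, so in steady state only the slower step limits throughput.
        per_page_us = max(t_r_us, transfer_us)
    else:
        per_page_us = t_r_us + transfer_us
    return page_bytes / per_page_us


for bus in (50.0, 200.0):  # asynchronous 20ns vs. 10ns DDR, in bytes/us
    plain = sequential_read_mb_s(8640, 35.0, bus, cache_read=False)
    cached = sequential_read_mb_s(8640, 35.0, bus, cache_read=True)
    print(f"bus {bus:.0f} MB/s: {plain:.0f} MB/s plain, {cached:.0f} MB/s with cache read")
```

Under these assumptions both interfaces become transfer-limited: about 50 MB/s asynchronous versus 200 MB/s for the 10ns DDR interface.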

Command Enhancements
Table 2 shows the commands supported by Micron’s 25nm NAND Flash. (Please note that this table is an example provided for general reference; check the device data sheet for specific implementation details. As this article is intended to discuss high-level evolutionary changes, commands will not be covered in detail.)

For easy comparison, the 13 basic commands supported by the 2006 2Gb device are shaded. These 13 commands have remained compatible with newer Flash devices. The second column specifies whether the command is optional or mandatory (O/M) for NAND Flash, as required by ONFI.


Figure 5A (top): 25nm Micron MLC Performance; Figure 5B (bottom): 25nm Micron SLC Performance

LUNs (individual die) and targets (groups of die sharing a single chip enable, or CE) were new concepts. When they were introduced, it was generally understood that the system would need additional information to take full advantage of them. The second-to-last column of Table 2 indicates whether the command on each row can be issued while other LUNs are busy. This information is crucial to fully leveraging the NAND's ability to interleave or pipeline operations among several die in a single package. A typical example is pipelining the data-loading and programming operations among all die in a package, because effective pipelining enables maximum performance. Figures 5A and 5B show the maximum sequential performance that can be achieved using the ONFI 2.2 source-synchronous interface, assuming the timing parameters of Table 1 for the 2010 25nm device.
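A simple model estimates the benefit of pipelining across die: while one LUN is busy programming, the shared bus loads data into the next. The sketch below assumes an 8640-byte page, a 200 MB/s bus, and the roughly 300µs SLC tPROG quoted earlier. It is an idealization of our own and does not reproduce Figures 5A and 5B, which use the full Table 1 timings.

```python
def interleaved_program_mb_s(luns: int, page_bytes: int, bus_mb_s: float,
                             t_prog_us: float) -> float:
    """Idealized sequential PROGRAM throughput with data loads pipelined
    across `luns` die that share one bus: while one die is programming,
    the bus loads the next die."""
    load_us = page_bytes / bus_mb_s
    per_die = page_bytes / (load_us + t_prog_us)  # one die working alone
    return min(luns * per_die, bus_mb_s)          # shared bus caps the total


for n in (1, 2, 4, 8):
    print(n, f"{interleaved_program_mb_s(n, 8640, 200.0, 300.0):.1f} MB/s")
```

Throughput grows almost linearly with the number of die until the bus saturates. With these numbers, eight die are enough to keep a 200 MB/s bus busy.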

Multiplane Commands
Many of the new commands in Table 2 are multiplane commands, which make better use of the physical NAND array by providing concurrent ERASE, PROGRAM, or READ operations across multiple planes. These concurrent operations must be of the same type; for example, you cannot mix an ERASE operation on one plane with a PROGRAM operation to another plane.
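The same-operation rule can be expressed as a check a controller might run before issuing a multiplane command. This sketch assumes a two-plane device in which the block address LSB selects the plane, as the addressing discussion describes for BA7. `PlaneOp` and `valid_multiplane` are hypothetical names, and real devices may impose further restrictions listed in their data sheets.

```python
from typing import NamedTuple, Sequence


class PlaneOp(NamedTuple):
    kind: str   # "ERASE", "PROGRAM", or "READ"
    block: int  # block address


def plane_of(block: int, planes: int = 2) -> int:
    """Plane index, assuming the low bit(s) of the block address select the plane."""
    return block % planes


def valid_multiplane(ops: Sequence[PlaneOp], planes: int = 2) -> bool:
    """True if every op has the same type and each targets a different plane."""
    if not 2 <= len(ops) <= planes:
        return False
    if len({op.kind for op in ops}) != 1:
        return False  # e.g. ERASE on one plane mixed with PROGRAM on another
    return len({plane_of(op.block, planes) for op in ops}) == len(ops)


print(valid_multiplane([PlaneOp("PROGRAM", 10), PlaneOp("PROGRAM", 11)]))  # different planes
print(valid_multiplane([PlaneOp("ERASE", 10), PlaneOp("PROGRAM", 11)]))    # mixed types
```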


Table 2:  Historical Comparison of NAND Commands

Some new commands in Table 2 include:

Addressing Enhancements
Like the NAND array and interface, addressing has also followed an evolutionary migration, as shown in Tables 3 and 4. The column address, carried in the first two bytes, has grown from 12 bits to 14 bits to accommodate the larger 8640-byte page size. Another obvious difference between the two tables is the bit naming in the third address byte. In Table 3, the six least significant bits (LSBs) specify which of the 64 pages is being addressed; these bits were renamed for clarity in the newer devices. Address bytes 3–5 specify the page and block addresses, which have grown to 7 and 12 bits, respectively. That is enough to address the 128 pages and 4096 blocks of the 25nm SLC NAND device. As mentioned previously, the least significant block address bit (BA7) determines which plane is selected.
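The address widths follow from the geometry: addressing N items needs enough bits to count from 0 to N - 1. Here is a quick Python check using the 2006 and 2010 SLC geometries from the text.

```python
def address_bits(count: int) -> int:
    """Bits needed to address `count` distinct items (0 .. count - 1)."""
    return (count - 1).bit_length()


for label, page_bytes, pages, blocks in [("2Gb SLC (2006)", 2112, 64, 2048),
                                         ("32Gb SLC (2010)", 8640, 128, 4096)]:
    print(f"{label}: column {address_bits(page_bytes)} bits, "
          f"page {address_bits(pages)} bits, block {address_bits(blocks)} bits")
```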

Differences Between SLC and MLC
Essentially, an MLC device has the same page layout as its SLC counterpart, except with twice as many pages per block. If Table 4 were describing an MLC device storing two bits per cell, it would have an additional page address bit (PA7) as the most significant bit (MSB) of the third address byte. The LSB of the block address would move to the fourth address byte, and all remaining block address bits would be shifted left accordingly.

The remaining address bit (LA0) in the fifth address byte selects which die (LUN) is enabled when two die share a single CE. Future NAND devices may have more than two die on a single CE, in which case additional LUN address (LA) bits would be required.
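The bit layout described above can be sketched as a function that builds the five address cycles. It assumes the row bits are packed from least significant upward as page bits, then block bits, then LA0. This matches the descriptions of Table 4 and of the MLC shift, but you should confirm it against the device data sheet. `address_cycles` is a hypothetical helper.

```python
def address_cycles(column: int, page: int, block: int, lun: int = 0,
                   page_bits: int = 7, block_bits: int = 12) -> list[int]:
    """Five address bytes: two column cycles (14-bit column), then three row
    cycles holding page, block, and LUN bits from least to most significant.
    Use page_bits=7 for the SLC layout and page_bits=8 for the MLC variant."""
    if not (0 <= column < 1 << 14 and 0 <= page < 1 << page_bits
            and 0 <= block < 1 << block_bits and lun in (0, 1)):
        raise ValueError("address field out of range")
    row = page | (block << page_bits) | (lun << (page_bits + block_bits))
    return [column & 0xFF, column >> 8,
            row & 0xFF, (row >> 8) & 0xFF, (row >> 16) & 0xFF]


slc = address_cycles(column=0, page=5, block=3)                # 7 page bits
mlc = address_cycles(column=0, page=5, block=3, page_bits=8)   # PA7 added
print([hex(b) for b in slc])
print([hex(b) for b in mlc])
```

In the SLC result, the block LSB (BA7) lands in the top bit of the third byte (0x85), selecting plane 1. With the extra MLC page bit, it moves into the fourth byte (0x03).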

The bold specifications in Table 1 show the major timing differences for MLC, which result in the lower performance of these higher-density devices.


Table 3:  Addressing for 90nm Micron 2Gb SLC NAND Flash (2006)


Table 4:  Addressing for 25nm Micron 32Gb SLC NAND Flash (2010)

Package Enhancements
The predominant package in 2006 was the 48-lead TSOP (12 x 20mm), but demand for smaller packages has since led to the adoption of the land grid array (LGA) package. The LGA package is similar to a traditional ball grid array (BGA) package but without the balls. The 14 x 18mm, 52-land LGA package has become commonplace in many consumer products. For improved package and solder joint reliability, many designers prefer the 100-ball BGA package.

All three packages are offered with up to eight die, in a variety of configurations. Both the LGA and BGA multi-die packages can be offered with two completely independent interfaces. Having separate interfaces allows users to concatenate the outputs, yielding a 16-bit-wide interface that is attractive for applications requiring higher performance. If performance is not critical, users typically tie the two 8-bit buses together to save I/O pins on the controller.

The Future of NAND Devices
Evolution of Managed Devices
In order to maintain system data reliability levels, higher levels of ECC and advanced signal-processing algorithms will be needed in the near future. These requirements will push a larger and larger share of the market toward the use of partially and fully managed devices.

Fully Managed Devices
Near the end of 2006, the MultiMedia Card Association (MMCA) agreed to work with JEDEC to create an embedded version of its card specification. The result was the initial publication of the embedded MMC (e•MMC™) specification version 4.1 in June 2007; the specification has undergone several revisions since then.

Basically, e•MMC devices are similar to many other fully managed devices in that they provide a simple, block-oriented interface to the user. While this simplicity may be a good solution for many implementations that are trying to get products to market quickly, e•MMC’s slower-performing, single-threaded interface isn’t necessarily the best choice for high-performance applications. In addition, e•MMC devices are not as deterministic as raw NAND devices because the controller may be busy with garbage collection or other time-consuming operations. If the controller is busy with these operations when the request for data is issued, it can have a big impact on performance.

Partially Managed Devices
The issue most customers object to when interfacing directly to NAND Flash is the ever-increasing amount of ECC required. Supporting the additional ECC generally requires a new version of the customer’s controller chip, a migration that is both expensive and time-consuming. Many customers prefer to handle the required block management, as it helps them differentiate their products and allows them full control over the NAND, providing higher levels of performance through the use of multithreaded techniques.

(In response to the needs of these customers, Micron is offering a new family of products called ClearNAND™ devices. ClearNAND devices are offered in similar LGA and BGA packages, with signals that are compatible with the footprint of traditional NAND. ClearNAND devices use the same ONFI asynchronous and synchronous interfaces that users are accustomed to. They also include a thin controller to manage the ECC and data integrity challenges of NAND. More details on these ClearNAND devices can be found at micron.com/clearnand.)

For several years, pundits have been projecting that NAND will be unable to scale reliably, but so far, NAND vendors have proven them wrong. However, it is clear that even more ECC and other advanced algorithms will be required to continue the migration to smaller and smaller cell geometries.

In the near future, interfacing to raw NAND Flash will be an option for only a minority of the market, the very large customers who are almost as familiar with NAND Flash as the manufacturers are. Technologies such as ClearNAND will become the solution for the majority of the market, for those customers who cannot or choose not to deal with the ever-increasing challenges of NAND, yet still want to interface with NAND directly to achieve the higher performance that the architecture can provide. ClearNAND Flash devices will enable these mainstream customers to take advantage of the high densities and attractive costs of finer geometries, while leaving the complexities (such as ECC) to the NAND manufacturers.

Growth of Tailored and Specialty Devices
In 2011, the worldwide NAND market is expected to grow to more than $20 billion and to continue driving the development of NAND Flash tailored for specific markets. Perhaps the best examples of tailored NAND are Enterprise NAND™ devices, unique products that offer up to six times the endurance of standard MLC NAND. Many enterprise customers welcome the trade-off of higher endurance in exchange for slower ERASE and PROGRAM times.

With the continued growth of the NAND market and the remaining potential for cell shrinks, it is clear that NAND Flash will persist in its evolution by offering ways to reduce controller signals, increase speed, and provide better management of the NAND cell. The best way to stay informed about these technical proposals is to join onfi.org.
