Hard disk drives (HDD)

By Martin McBride, 2017-03-05
Tags: hard drive fragmentation
Categories: memory and storage storage

A hard disk drive (HDD) is a magnetic storage device.

[Figure: hard disk]

The hard drive has a platter, a flat disk coated with a magnetic material, which spins round at high speed. It has a read/write head which can write data by magnetising or demagnetising tiny areas of the disk surface. It can read data back by sensing whether different areas are magnetised or not. The read/write head is mounted on an arm which can move to access different parts of the disk.

The data on the disk is arranged in circular tracks on the platter. The arm is moved so that the read/write head is positioned over the required track, and then the data is accessed as the platter rotates under the head.

A 1 terabyte disk will typically have hundreds of thousands of tracks, and each track will store many millions of bits of data, so there would be trillions of bits of data on the disk. Disk drives often have several platters, each with its own read/write arm. Platters are often double sided.
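
As a rough check of these figures, the sketch below multiplies an assumed track count by an assumed number of bits per track. The values 400,000 tracks and 20 million bits per track are illustrative assumptions chosen to total 1 terabyte; they are not measurements of a real drive, and on a real disk the tracks do not all hold the same amount of data.

```python
# Illustrative figures only (assumptions, not taken from a real drive).
TRACKS = 400_000              # "hundreds of thousands of tracks"
BITS_PER_TRACK = 20_000_000   # "many millions of bits" per track

total_bits = TRACKS * BITS_PER_TRACK
total_bytes = total_bits // 8   # 8 bits per byte

print(f"{total_bits:,} bits")
print(f"{total_bytes / 10**12} TB")
```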

Characteristics

Hard disks are:

  • much slower than memory: transfer rates are often lower by a factor of around 100, and reading data that is scattered around the disk is slower still.
  • much cheaper than memory, again often by a factor of around 100.
  • physically quite large and heavy.
  • fairly power hungry.
  • fairly fragile in terms of shock (such as being dropped), extremes of temperature, or moisture.

Hard disks are a good choice for secondary storage in desktop or laptop computers. They are not a good choice for mobile devices, because size, weight and power consumption work against them. Although some small, light disk drives have been developed, flash memory is usually the better option.

Interface

Secondary storage is not attached directly to the CPU bus. Instead, it is attached to an I/O (input/output) port. The CPU accesses the drive via the operating system.

External hard drives often use a USB connector. Internal drives in desktop computers use other standard connectors such as SATA.

File based access

Hard drives (and most secondary storage devices) organise data as files and folders. This is quite different to the way memory storage works.

To store data in memory, the program has to know the exact memory location of the data. To store data on a disk, the program only needs to know the file path of the file. The program doesn't need to know where the file is actually stored on the disk (the operating system takes care of that).
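
A minimal Python sketch of file-based access follows. The file name example.txt and the use of a temporary folder are assumptions made for the example. The point is that the program refers to the data only by its path, and never by its location on the disk.

```python
import tempfile
from pathlib import Path

# Use a temporary folder so the example does not touch real files.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "example.txt"   # hypothetical file name

    # Write and read using only the file path. The operating system
    # decides which part of the disk actually holds the data.
    path.write_text("Hello, disk!")
    contents = path.read_text()

print(contents)
```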

The file path is the full folder name plus the file name. For example, in Windows it might be C:\images\pets\cat.jpg. It tells the operating system exactly how to find the file on disk.
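
The two parts of a path can be separated in Python with the standard pathlib module. This sketch uses PureWindowsPath, which only manipulates the text of the path, so it runs on any operating system and the file does not need to exist.

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r"C:\images\pets\cat.jpg")

folder = path.parent     # the full folder name
file_name = path.name    # the file name

print(folder)      # C:\images\pets
print(file_name)   # cat.jpg
```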

Low level interface

Behind the scenes, a disk drive is really just a way of storing lots of bytes. Each track on the drive is made up of sectors, and each sector can store a fixed number of bytes. The sector size is often 512 bytes, but on modern disk drives it is sometimes 4 KB.

You cannot read or write individual bytes on a disk. You can only read or write a whole sector.
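
The sketch below simulates this with a bytearray standing in for the disk. The function names read_sector, write_sector and change_byte are hypothetical, and the 8 sector disk is an assumption to keep the example small. It shows that changing a single byte means reading the whole sector, modifying it in memory, and writing the whole sector back.

```python
SECTOR_SIZE = 512  # bytes per sector (a common size)

# A tiny simulated disk: 8 sectors, all bytes initially zero.
disk = bytearray(8 * SECTOR_SIZE)

def read_sector(n):
    """Return a copy of the whole of sector n."""
    start = n * SECTOR_SIZE
    return bytearray(disk[start:start + SECTOR_SIZE])

def write_sector(n, data):
    """Overwrite the whole of sector n with data."""
    if len(data) != SECTOR_SIZE:
        raise ValueError("a write must be exactly one sector long")
    start = n * SECTOR_SIZE
    disk[start:start + SECTOR_SIZE] = data

def change_byte(position, value):
    """Change one byte by reading, modifying and rewriting its sector."""
    sector, offset = divmod(position, SECTOR_SIZE)
    data = read_sector(sector)
    data[offset] = value
    write_sector(sector, data)

# Byte 1000 lives in sector 1, at offset 488 within that sector.
print(divmod(1000, SECTOR_SIZE))
change_byte(1000, 65)
```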

Most of the sectors on the disk are used to store the actual file data. Some of the disk is used to store information about the files and folders, such as the names, and which sectors the data is located on.
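
As a simplified illustration, the sketch below keeps the file information in a Python dictionary. The names sectors, file_table and read_file, the file notes.txt and the sector numbers are all made up for the example, and real file systems use more elaborate structures. Reading a file means looking up its list of sectors, then reading those sectors in order.

```python
SECTOR_SIZE = 512

# Simulated disk contents: sector number -> the bytes stored there.
# The last sector of the file is only partly used, so it is padded
# with zero bytes up to the full sector size.
sectors = {
    10: b"A" * SECTOR_SIZE,
    11: b"B" * SECTOR_SIZE,
    12: (b"C" * 100).ljust(SECTOR_SIZE, b"\0"),
}

# Simulated file information: name -> size and list of sectors used.
file_table = {
    "notes.txt": {"size": 2 * SECTOR_SIZE + 100, "sectors": [10, 11, 12]},
}

def read_file(name):
    """Look up the file's sectors, read them in order, trim the padding."""
    entry = file_table[name]
    data = b"".join(sectors[n] for n in entry["sectors"])
    return data[:entry["size"]]

data = read_file("notes.txt")
print(len(data))   # 1124
```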

Speed

When you first open a file, the disk has to move the read/write arm to the track where the file data is stored. This introduces a small delay, because the arm has to be physically moved. This delay is called the seek time, and is usually several milliseconds. The total delay between the CPU asking for the data and the data starting to be read from disk is sometimes called the latency. It also includes the time spent waiting for the right part of the track to rotate under the head.

Once the read/write head is in the correct place, the data can be read as the platter rotates under the head. Modern disks can transfer data at around 100MB per second.

Generally, if you are reading data from one large file, transfer rates can be quite fast because the read/write head doesn't need to move much. However, in cases where the head does need to move more frequently, the seek time makes disk access much slower. These cases include:

  • If you are reading or writing lots of small files, the disk has to seek the start of each new file.
  • If you have several programs running, and they are all accessing different files, the disk has to seek backwards and forwards between the files.

Another case is when your computer runs out of memory and starts using virtual memory. Again, it is the disk having to switch between different files which causes a slowdown.
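
The effect of seeking can be estimated with a very simple model: one seek per file, plus the time to transfer the data. In the sketch below, the 10 millisecond seek time is an assumed figure, the transfer rate is the 100 MB per second mentioned above, and read_time is a hypothetical helper. Real drives are more complicated, so treat the results as a rough guide only.

```python
SEEK_TIME = 0.01          # seconds per seek (assumed: 10 ms)
TRANSFER_RATE = 100e6     # bytes per second (about 100 MB per second)

def read_time(file_count, file_size):
    """Estimated time in seconds to read file_count files of
    file_size bytes each, assuming one seek per file."""
    seeking = file_count * SEEK_TIME
    transferring = file_count * file_size / TRANSFER_RATE
    return seeking + transferring

one_large = read_time(1, 100_000_000)     # one 100 MB file
many_small = read_time(10_000, 10_000)    # 10,000 files of 10 KB each

print(f"One large file:   {one_large:.2f} s")
print(f"Many small files: {many_small:.2f} s")
```

Both cases read the same total amount of data, but under these assumptions the single large file takes about 1 second and the small files take about 100 seconds, almost all of it spent seeking.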

Fragmentation

Imagine you have a brand new, empty disk. When you write some files onto the disk, as you might expect, the files will be written into consecutive sectors on the disk:

[Figure: files stored in consecutive sectors]

At some point, you will probably delete some files, leaving gaps between the files:

[Figure: gaps left between the remaining files]

Remember that the files on your disk are not just your documents. There are also applications, temporary files and cache files. Some of these files will be created and deleted without you even knowing. Eventually, the files on the disk will be scattered around, and the only free space will be the gaps between the files.

If you need to save a large file, there might not be a gap which is large enough to fit the file. So the file might have to be split and different parts of the file stored in different places on the disk:

[Figure: a new file split into parts stored in different gaps]

The new file has been split, and stored in 3 parts on different parts of the disk. That is OK, but it makes the file a bit slower to read. The disk has to seek to the start, read part of the file, then seek again to read the next part, then seek again to read the final part. Since seeking is a slow operation, this slows the file access down.
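
The sketch below simulates how a file can end up split like this. The disk is a Python list of 12 sectors, and save_file and count_fragments are hypothetical helpers. Filling the first free sectors found is a deliberate simplification for the example; it is not how any particular file system chooses where to put data.

```python
# A simulated disk of 12 sectors. None means the sector is free;
# a letter is the name of the file using that sector.
disk = ["A", "A", None, None, "B", "B", None, "C", None, None, None, "D"]

def save_file(disk, name, size):
    """Store a file in the first free sectors found, splitting it
    across gaps if no single gap is large enough."""
    free = [i for i, owner in enumerate(disk) if owner is None]
    if len(free) < size:
        raise OSError("disk full")
    used = free[:size]
    for i in used:
        disk[i] = name
    return used

def count_fragments(sector_list):
    """Count the runs of consecutive sector numbers."""
    fragments = 0
    previous = None
    for s in sector_list:
        if previous is None or s != previous + 1:
            fragments += 1
        previous = s
    return fragments

used = save_file(disk, "E", 5)
print(used)                    # [2, 3, 6, 8, 9]
print(count_fragments(used))   # 3
```

File E needs 5 sectors, but the largest gap holds only 3, so it ends up in 3 separate parts. Each part costs an extra seek when the file is read.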

As you continue to add and remove files from the disk, you might eventually reach the stage where a lot of your files are split into many different sections. This is called fragmentation, and can have a serious effect on disk performance.

The solution is to use a disk defragmenter program. This analyses the disk, and moves data around so that all the files are contiguous (no gaps in the data), and all the free space is in one place.
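
The idea can be sketched on a simulated disk, represented as a Python list where each item is the name of the file using that sector, or None if the sector is free. The defragment function here is a hypothetical illustration, not the algorithm used by any real defragmenter. It only tracks which file owns each sector, whereas a real program must also keep the data within each file in the right order and move it safely.

```python
def defragment(disk):
    """Return a new layout in which each file's sectors are contiguous
    and all the free space is at the end."""
    # Keep files in the order in which they first appear on the disk.
    names = []
    for owner in disk:
        if owner is not None and owner not in names:
            names.append(owner)

    # Place each file's sectors together, one file after another.
    new_disk = []
    for name in names:
        new_disk.extend([name] * disk.count(name))

    # Everything left over is one block of free space.
    new_disk.extend([None] * (len(disk) - len(new_disk)))
    return new_disk

fragmented = ["A", "A", "E", "E", "B", "B", "E", "C", "E", "E", None, "D"]
tidy = defragment(fragmented)
print(tidy)
```

After the call, file E occupies 5 consecutive sectors and the single free sector is at the end of the disk.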
