As more and more organizations migrate their applications to the cloud, they also need to think about data storage. Many companies are in the middle of phasing out their expensive on-premises solutions. Other companies never considered storing their data in a private data center; the cloud was always their preferred place for it. With this shift in mind, it’s crucial to understand the requirements for your data. Cloud storage has proven itself as a powerful solution that supports a wide range of applications and deployment models. This article explores the various options, their suitability for different use cases, their pros and cons, and their scalability and security options. In short: here’s how to choose the right storage solution in the cloud.
Context – what is cloud storage
Cloud storage providers such as Amazon Web Services (AWS) S3, Microsoft Azure Blob Storage, Google Cloud Storage, Dropbox, OneDrive, and Google Drive offer a flexible and easy way to store and manage your data in their data centers. Often they offer benefits in terms of accessibility, scalability, data redundancy, cost-effectiveness, and security. Besides these advantages, cloud storage providers also offer collaboration tools to work with data solutions as a team. End users and application developers can upload, access, and manage their data through a web interface or using APIs. In this article, we focus on the perspective of application developers and not end users.
Different types
Before you “quick pick” a cloud storage solution and start integrating it into your cloud workload, it’s good to consider which types of cloud storage exist and what their main use cases are.
Type one: object storage
Object storage is pretty common for a lot of users. It is ideal for storing unstructured data like documents, images, and videos that do not fit into a (relational) database. This type is commonly used in content delivery networks (CDNs), backup and archiving, and data lakes. Some examples that benefit from object storage are:
- Media and Entertainment: the media and entertainment industry relies on object storage to store and manage large media files, such as high-definition videos and audio recordings.
- Data Analytics: object storage is used as a storage layer for data analytics platforms, enabling organizations to store and analyze large datasets that they use to drive business decisions.
- IoT Data: with the increased popularity of IoT devices, organizations collect and store large amounts of sensor data. Object storage is suitable for efficiently handling the massive influx of unstructured IoT data.
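To make the object model concrete, here is a minimal in-memory sketch. It is not any provider’s SDK: the class and method names are hypothetical, chosen to mirror the put, get, and list operations that object storage APIs such as Amazon S3 expose over HTTP(S). Each object is stored under a unique key together with its metadata, in a flat namespace.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoredObject:
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class InMemoryObjectStore:
    """Toy model of object storage: a flat namespace mapping keys to objects."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, metadata: Optional[Dict[str, str]] = None) -> None:
        # Writing to an existing key replaces the whole object; there is no partial update.
        self._objects[key] = StoredObject(data, dict(metadata or {}))

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> List[str]:
        # "Folders" are only a naming convention: listing simply filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = InMemoryObjectStore()
store.put("videos/2024/intro.mp4", b"<video bytes>", {"content-type": "video/mp4"})
store.put("videos/2024/outro.mp4", b"<video bytes>")
store.put("sensors/device-42/reading.json", b'{"temp": 21.5}')
print(store.list_keys("videos/"))
```

The slash in a key only looks like a directory separator; the store never creates directories, which is why object storage scales so easily to huge numbers of media files or IoT readings.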
On to the next type.
Type two: file storage
File storage is a type of data storage method that organizes and manages data as individual files within a hierarchical structure, such as the way files and folders are organized on a personal computer. This type of cloud storage is suited for applications that rely on shared file systems, such as document management and file-sharing platforms. The following examples highlight what file storage can do for you:
- Configuration Files: configuration files for software applications and systems are often stored in file storage systems for easy access and management.
- User Home Directories: in enterprise environments, user home directories are often hosted on file storage systems, ensuring that users can access their files and personal data from any authorized device.
- Software Development: developers use file storage to store source code, libraries, and project files. Version control systems like Git also rely on file storage to manage code repositories.
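File storage, in contrast, keeps the hierarchy itself. Once a share is mounted (over NFS or SMB, for example), applications use ordinary file system calls. The sketch below assumes a local temporary directory as a stand-in for a mounted share; the directory and file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def build_share(root: Path) -> List[str]:
    """Create a small hypothetical directory tree and return all paths, relative to root."""
    # Directories are real entries in the hierarchy, not just prefixes of a key.
    (root / "home" / "alice").mkdir(parents=True)
    (root / "home" / "bob").mkdir(parents=True)
    (root / "config").mkdir()
    (root / "config" / "app.ini").write_text("[server]\nport = 8080\n")
    (root / "home" / "alice" / "notes.txt").write_text("meeting notes")
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


# A local temporary directory stands in for a mounted NFS or SMB share.
with tempfile.TemporaryDirectory() as tmp:
    tree = build_share(Path(tmp))
print(tree)
```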
Although these two storage types differ, they share support for unstructured data, the ability to handle massive amounts of data, and easy access to that data.
Type three: block storage
Block storage is a type of data storage system that manages data in fixed-size, individually addressable blocks. Unlike file storage, which organizes data into files and directories, block storage treats data as discrete blocks of a fixed size (typically from 512 bytes up to several megabytes). Typical examples of block storage are:
- Storage Area Networks (SANs): SANs use block storage to provide centralized and high-performance storage for servers and applications. This solution is commonly used in enterprise environments which need to support mission-critical applications.
- Databases: relational databases like Oracle, MySQL, and Microsoft SQL Server often use block storage for their data files and log files. Database systems demand a structured, low-latency way to read and write data, so block storage fits well here.
- Big Data and Analytics: block storage is used in big data and analytics environments to store large datasets that require high-speed access and processing.
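The following sketch models a block volume as a byte array divided into fixed-size blocks. The 4096-byte block size and the class name are assumptions for illustration. The point is that a block’s byte offset follows directly from its address, and that the device itself has no notion of files: a file system or database engine on top decides which blocks belong together.

```python
BLOCK_SIZE = 4096  # assumed block size; 512 bytes and 4 KiB are common in practice


class BlockDevice:
    """Toy volume: a fixed number of fixed-size blocks, addressed by block number."""

    def __init__(self, num_blocks: int, block_size: int = BLOCK_SIZE) -> None:
        self.num_blocks = num_blocks
        self.block_size = block_size
        self._buf = bytearray(num_blocks * block_size)

    def _offset(self, block_no: int) -> int:
        if not 0 <= block_no < self.num_blocks:
            raise IndexError(f"block {block_no} is out of range")
        # The byte offset follows directly from the block address.
        return block_no * self.block_size

    def write_block(self, block_no: int, data: bytes) -> None:
        if len(data) > self.block_size:
            raise ValueError("data does not fit in one block")
        start = self._offset(block_no)
        # Pad short writes with zero bytes so each block stays exactly block_size long.
        self._buf[start:start + self.block_size] = data.ljust(self.block_size, b"\x00")

    def read_block(self, block_no: int) -> bytes:
        start = self._offset(block_no)
        return bytes(self._buf[start:start + self.block_size])


dev = BlockDevice(num_blocks=8)
dev.write_block(3, b"page of a database file")
print(dev.read_block(3).rstrip(b"\x00"))
```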
As the examples above show, the three storage types have different characteristics and use cases. These use cases act as a starting point for mapping your application’s data requirements to a storage type.
Technical considerations
In addition to the functional requirements and a high-level selection of applications and use cases that determine which type of cloud storage fits your case, there are other considerations to take into account. Assess the following technical aspects carefully, because they influence your choice.
Data structure
First of all, you need to determine which data structure you need (or want). This choice is crucial, since it affects how the data is organized and how you access it. Object storage organizes data as objects, each with a unique identifier (such as a key) and associated metadata, while file storage organizes files and directories within a hierarchical structure. In block storage, the data is divided into fixed-size blocks, each with a unique block address.
Storage unit
The “storage unit” concept determines how the data is actually stored. In object storage, data objects can vary in size, from a few bytes to several gigabytes or more (Amazon S3, for example, accepts objects of up to 5 TB). Block storage stores data in fixed-size blocks ranging from 512 bytes to several megabytes. In file storage, you handle actual files, from a few bytes to terabytes or even more.
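Because block storage works in fixed-size units, a piece of data always occupies a whole number of blocks. A quick calculation for a hypothetical 10,000-byte file shows the effect of the block size: with 512-byte blocks it needs 20 blocks (240 bytes unused), with 4096-byte blocks it needs 3 blocks (2,288 bytes unused). This is a simplification; real file systems add their own metadata and allocation strategies on top.

```python
def blocks_needed(size_bytes: int, block_size: int) -> int:
    # Data occupies whole blocks, so round up (integer ceiling division).
    return (size_bytes + block_size - 1) // block_size


def slack_bytes(size_bytes: int, block_size: int) -> int:
    # Unused space left in the last, partially filled block.
    return blocks_needed(size_bytes, block_size) * block_size - size_bytes


for block_size in (512, 4096):
    print(block_size, blocks_needed(10_000, block_size), slack_bytes(10_000, block_size))
```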
Durability
One of the key requirements of a data storage solution is durability, and the three types of cloud storage differ considerably here. Object storage solutions are often designed for very high durability, frequently 99.999999999% (eleven nines) per year. In contrast, the durability of file storage solutions depends largely on the backup and redundancy strategies you implement yourself. This requires expert-level knowledge of the storage solution and its capabilities, which are often cloud provider-specific. The same is true for block storage.
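To get a feel for what a number of nines means, the sketch converts a durability figure into expected object losses. It assumes, as a simplification, that each object is lost independently with probability 10^-n per year for n nines. Under that assumption, 10 million objects at eleven nines give an expected 0.0001 lost objects per year, roughly one object every 10,000 years. Note that durability is not the same as availability: it describes the risk of losing data, not of temporarily being unable to reach it.

```python
def expected_annual_losses(num_objects: int, nines: int) -> float:
    """Expected objects lost per year at a durability of `nines` nines.

    Simplifying assumption: every object is lost independently with
    probability 10**-nines per year.
    """
    annual_loss_probability = 10.0 ** -nines
    return num_objects * annual_loss_probability


losses = expected_annual_losses(10_000_000, 11)
print(f"{losses:.4f} objects/year, about one object every {1 / losses:,.0f} years")
```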
Access (control)
File storage solutions offer “traditional” ways to access files and directories using network file-sharing protocols such as SMB (Server Message Block) for Windows environments and NFS (Network File System) for Unix-based environments. Administrators can set quotas and limits, such as preventing scripts from being executed or only allowing files smaller than 100 MB. Block storage works completely differently under the hood: applications interact with block storage devices at the block level using protocols like iSCSI or Fibre Channel. Object storage, in turn, is accessed over standard HTTP(S), typically through RESTful APIs. To keep that data secure, an administrator should implement access policies and permissions.
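Object storage access policies are typically evaluated with a default-deny rule in which an explicit deny overrides any allow; AWS IAM evaluates policies this way, for example. The sketch below models that logic with a hypothetical, heavily simplified statement format. It is not the policy schema of any real provider.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

Statement = Dict[str, str]

# Hypothetical policy statements; real providers use richer schemas with conditions and ARNs.
POLICY: List[Statement] = [
    {"effect": "Allow", "principal": "app-backend", "action": "read", "resource": "media/*"},
    {"effect": "Allow", "principal": "app-backend", "action": "write", "resource": "uploads/*"},
    {"effect": "Deny", "principal": "*", "action": "*", "resource": "media/private/*"},
]


def is_allowed(policy: List[Statement], principal: str, action: str, resource: str) -> bool:
    """Default deny; an explicit Deny in any matching statement overrides every Allow."""
    matching = [
        s for s in policy
        if fnmatchcase(principal, s["principal"])
        and fnmatchcase(action, s["action"])
        and fnmatchcase(resource, s["resource"])
    ]
    if any(s["effect"] == "Deny" for s in matching):
        return False
    return any(s["effect"] == "Allow" for s in matching)


print(is_allowed(POLICY, "app-backend", "read", "media/logo.png"))
print(is_allowed(POLICY, "app-backend", "read", "media/private/contract.pdf"))
```

Making deny win and denying by default means a forgotten rule fails closed rather than exposing data.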
Data redundancy
When it comes to object storage, providers often offer built-in redundancy by replicating data across multiple locations or nodes. File storage solutions require redundancy to be implemented at the file system level or through additional backup solutions; there are no “3-click” solutions to help you here. In block storage, you often need to rely on RAID (Redundant Array of Independent Disks) technology to keep your data safe when problems occur.
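RAID 5, one common RAID level, protects data with parity: the parity block is the XOR of the data blocks, so any single lost block can be rebuilt by XOR-ing the surviving blocks. The sketch below shows only that parity idea with three tiny hypothetical “disks”. Real RAID 5 also rotates parity across the disks, and it survives only one disk failure at a time.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_disks = [b"AAAA", b"BBBB", b"CCCC"]   # hypothetical data blocks, one per disk
parity = xor_blocks(data_disks)             # stored on an additional disk

# Simulate losing disk 1, then rebuild it from the surviving disks plus parity.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_disks[1])
```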
Scalability
Whatever type of application you have, it needs to withstand peak loads, whether they come from heavy queries or from a large number of end users such as visitors to your website. That means your data solutions need to scale, and here, too, each storage type behaves differently.
Block storage can be scaled vertically (by adding more capacity to existing devices) or horizontally (by adding more storage devices). Within the cloud ecosystem, this is a rather simple task that can almost always be done without downtime. File storage scales by adding storage devices or capacity to file servers, which can also be done without downtime. However, keep in mind the time needed to expand your data volumes and redistribute your data evenly across all devices; block and file storage often require manual intervention for this. Object storage solutions, by contrast, scale up and down automatically on demand.
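To estimate how much data has to move when you add capacity, a back-of-the-envelope calculation helps. The sketch assumes equally sized devices and a perfectly even redistribution, which real systems only approximate. Adding a fourth device to three devices holding 900 GB each means each device should end up with 675 GB, so 675 GB in total has to be copied.

```python
from typing import List, Tuple


def rebalance_plan(used_gb: List[float], new_devices: int) -> Tuple[float, float]:
    """Return (target GB per device, total GB to move) after adding empty devices.

    Assumes equally sized devices and a perfectly even redistribution.
    """
    total = sum(used_gb)
    target = total / (len(used_gb) + new_devices)
    # Only the excess above the target on each existing device has to move.
    to_move = sum(max(0.0, used - target) for used in used_gb)
    return target, to_move


target, to_move = rebalance_plan([900.0, 900.0, 900.0], new_devices=1)
print(f"target {target:.0f} GB per device, {to_move:.0f} GB to move")
```

Dividing the amount to move by your sustained copy throughput gives a rough idea of how long the redistribution will take.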
Security topics
Security always remains a key priority when it comes to fine-tuning the requirements. The following considerations do not always apply to all data storage types but they give you a good idea of the security-related topics to keep in mind.
Generic topics
- Data classification: classify data based on its sensitivity and regulatory requirements, then apply appropriate security controls, such as encryption and access controls, based on those classifications.
- Train employees: teach them data security best practices and raise their awareness of potential threats such as phishing attacks and social engineering.
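Classification pays off when it is tied to concrete controls. A minimal sketch, assuming a hypothetical three-level scheme and hypothetical control names, checks which controls a storage resource still lacks for its classification.

```python
from typing import Dict, Set

# Hypothetical classification levels and the minimum controls each one requires.
REQUIRED_CONTROLS: Dict[str, Set[str]] = {
    "public": set(),
    "internal": {"encryption_at_rest"},
    "confidential": {"encryption_at_rest", "encryption_in_transit", "access_logging"},
}


def missing_controls(classification: str, enabled: Set[str]) -> Set[str]:
    """Return the controls a storage resource still lacks for its classification."""
    return REQUIRED_CONTROLS[classification] - enabled


print(sorted(missing_controls("confidential", {"encryption_at_rest"})))
```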
Prevent data losses
- Think of data loss prevention (DLP) methods to prevent data from leaving the organization. DLP solutions should detect and block unauthorized data transfers.
- Data backup and disaster recovery: establish solid backup and disaster recovery plans to ensure data availability in case of data loss, hardware failures, or cyberattacks. Create and manage an incident response plan. These plans should outline the steps to be taken in case of a security incident or data breach.
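DLP tools combine many detectors. One classic example is spotting payment card numbers by pairing a pattern match with the Luhn checksum, which filters out most random digit sequences. The sketch below is a single, simplified detector of that kind, not a complete DLP solution; the pattern and function names are assumptions.

```python
import re
from typing import List

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or dashes.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


print(find_card_numbers("Customer note: card 4111 1111 1111 1111, exp 12/29"))
```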
Rules and regulations
- Understand and act on (external) compliance requirements and regulations. Every data storage solution should state which (data security) standards it supports. Think of frameworks and standards such as NIST, CIS, SOC 2, or PCI DSS, which help you meet data protection regulations such as GDPR or HIPAA.
- Network security is also an important factor to take into account. Make sure you implement network security measures like Virtual Private Clouds (VPCs) or Virtual Networks to segment and isolate data storage resources from the public internet and other parts of the network. Use firewall rules and security groups to control traffic to and from data storage resources. This also helps you to cover topics such as proper access control.
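Firewall rules and security groups boil down to allowlists of source networks and ports. The sketch below models that with Python’s standard `ipaddress` module; the rules, the 10.0.0.0/16 range, and the function name are hypothetical. Port 2049 is the standard NFS port.

```python
import ipaddress

# Hypothetical inbound rules for a storage endpoint: (allowed source network, port).
RULES = [
    (ipaddress.ip_network("10.0.0.0/16"), 2049),  # NFS, only from inside the private network
    (ipaddress.ip_network("10.0.0.0/16"), 443),   # HTTPS API, only from inside the private network
]


def is_permitted(source_ip: str, port: int) -> bool:
    """Default deny: traffic passes only if some rule matches both source and port."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network and port == rule_port for network, rule_port in RULES)


print(is_permitted("10.0.3.7", 2049))
print(is_permitted("203.0.113.5", 443))
```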
There are many more security topics to keep in mind, such as vulnerability scanning and patching of (critical) systems, data retention and deletion policies, and third-party assessments.
Wrap up
Cloud storage solutions are roughly split into three types: file storage, object storage, and block storage. Each has different characteristics and use cases, including whether it stores structured or unstructured data, how it uses storage units, how access control works, and which options exist for data redundancy and backup. Every organization should define its own requirements and apply them to every application that needs a storage solution in the cloud. Security, compliance, and regulations remain key factors that should not be left out.