Key points
Ensuring data protection and compliance through a comprehensive review of backup systems
- The importance of backup systems in safeguarding data and meeting compliance
- Key considerations for system design and implementation
- The role of RTO and RPO in defining system performance
- A structured approach using the NIST Cybersecurity Framework (CSF)
- Insights from recent client experiences in various sectors
Backup systems are essential for safeguarding sensitive data and ensuring compliance with regulatory requirements. Their design and implementation are critical because they interact with assets of varying classifications. Key decisions, such as clear policy definition, system classification, and backup strategy, can significantly affect the system’s overall security, performance, and reliability. Drawing on our extensive experience in various IT security roles, we analyze your infrastructure against best practices from both technical and governance perspectives. With the help of your experts, policies, and configurations, we can quickly understand your solution, identify vulnerabilities, and suggest practical and effective remedies.
Backup system
A backup system is a comprehensive framework designed to create, store, and manage copies of critical data and systems to ensure their availability and integrity in the event of data loss, corruption, or system failure. This system typically involves processes for regularly duplicating data from primary storage to a secure, secondary location—whether on-premises, in the cloud, or both. Effective backup systems are essential for maintaining business continuity, as they allow organizations to restore operations quickly and with minimal data loss. Key components of a backup system include scheduling, data encryption, secure storage solutions, and regular testing to verify the reliability and completeness of backups.
It’s also crucial to emphasize the importance of proper, restorable backups, particularly in the context of ransomware attacks, where they can act as a company’s “life insurance policy”. Ransomware resilience via backups may require special provisions, such as ensuring that the backups themselves are not encrypted by malware and that any malware present is not restored from the backups. These additional measures help guarantee that backups remain an effective safeguard against ransomware and other threats.
Design a backup system
When designing an effective backup system, alignment with Business Continuity Management (BCM) strategies is crucial. Two parameters, Recovery Time Objective (RTO) and Recovery Point Objective (RPO), are key to ensuring that backup solutions meet organizational resilience requirements. RTO defines the maximum acceptable downtime after a disruption and answers the question “How quickly must we restore operations?” A shorter RTO necessitates faster recovery solutions, such as high-speed network links and rapid failover mechanisms. RPO defines the maximum allowable data loss, measured in time, and answers the question “How much data can we afford to lose?” A stricter RPO requires more frequent backups or real-time data replication to minimize potential data loss.
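As a minimal sketch of how these two parameters can be checked against observed figures, the following Python example computes the worst-case data-loss window from a list of backup timestamps and compares it with an RPO, then compares a measured downtime with an RTO. The function names, the sample timestamps, and the choice to count detection delay as part of the downtime are illustrative assumptions, not part of any particular backup product.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_times: list[datetime], now: datetime) -> timedelta:
    """Return the longest window that was not covered by a backup.

    This is the largest gap between consecutive backups, also counting
    the time elapsed since the most recent backup.
    """
    if not backup_times:
        raise ValueError("no backups recorded")
    points = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def meets_rpo(backup_times: list[datetime], now: datetime, rpo: timedelta) -> bool:
    """True if no gap in the backup history exceeds the RPO."""
    return worst_case_data_loss(backup_times, now) <= rpo


def meets_rto(restore_time: timedelta, detection_delay: timedelta, rto: timedelta) -> bool:
    """True if detection delay plus restore time fits within the RTO.

    Assumption: downtime is counted from the disruption, not from the
    moment the restore is started.
    """
    return detection_delay + restore_time <= rto


# Hypothetical backup history for one day.
now = datetime(2024, 1, 1, 12, 0)
backups = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 4, 0),
    datetime(2024, 1, 1, 10, 0),
]

print(worst_case_data_loss(backups, now))           # 6:00:00
print(meets_rpo(backups, now, timedelta(hours=4)))  # False
```

In this sample the schedule has one six-hour gap, so it would miss a four-hour RPO even though the most recent backup is only two hours old.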
In designing systems for effective RTO and RPO, several considerations come into play:
- Data Encryption: Encrypting backups is crucial for security but can slow down the restoration process, affecting both RTO and RPO. Organizations should balance security needs with recovery speed, possibly using optimized encryption algorithms.
- Monitoring and Alerting: Quick detection of disruptions is vital. Effective monitoring and immediate alerting mechanisms help minimize downtime since the RTO clock starts as soon as an issue occurs, rather than when the restore process begins.
- Physical Access to Backups: For offline backups, physical access can delay recovery. The location of backups, access protocols, and transport logistics should be planned to minimize impact on RTO.
- Transactional Systems: High-frequency transaction environments make achieving a low RPO challenging. Different technologies, such as more frequent backups or real-time data replication, can be used, each with its own trade-offs to consider.
- Restore Sequence: The order of system restores affects overall recovery time. Proper planning ensures that critical systems and the underlying infrastructure are prioritized, optimizing both RTO and RPO.
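The restore-sequence consideration can be illustrated with a dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to group systems into restore waves so that each system is restored only after everything it depends on. The system names, the dependency map, and the restore durations are hypothetical. The time estimate assumes that systems within a wave are restored in parallel and that a wave starts only once the previous one has finished.

```python
from graphlib import TopologicalSorter

# Hypothetical systems: each key maps to the systems it depends on.
dependencies = {
    "network": set(),
    "iam": {"network"},
    "database": {"network", "iam"},
    "message-queue": {"network", "iam"},
    "core-banking-app": {"database", "message-queue"},
}

# Hypothetical restore duration per system, in minutes.
restore_minutes = {
    "network": 10,
    "iam": 15,
    "database": 60,
    "message-queue": 20,
    "core-banking-app": 30,
}


def restore_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group systems into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def estimated_recovery_minutes(waves: list[list[str]], minutes: dict[str, int]) -> int:
    """Rough estimate: each wave takes as long as its slowest system."""
    return sum(max(minutes[system] for system in wave) for wave in waves)


waves = restore_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
print("estimated recovery:", estimated_recovery_minutes(waves, restore_minutes), "minutes")
```

With these sample values the estimate is 115 minutes, dominated by the database restore. Comparing such an estimate with the RTO shows early whether the planned sequence can meet it.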
RTO and RPO must be aligned to ensure an effective recovery strategy. For example, an organization that requires a short RTO, such as 15 minutes, should have a similarly short RPO so that the data it restores is recent. If the RTO is 15 minutes but the RPO is 24 hours, the organization may restore operations quickly, but with data that is up to 24 hours old, which could result in data loss and operational inconsistencies. This misalignment leads to an ineffective recovery strategy in which resources are misallocated: the organization invests in rapid system recovery without ensuring that the restored data is current.
Accurate data classification is essential for defining Recovery Time Objective (RTO) and Recovery Point Objective (RPO), allowing organizations to segment their infrastructure based on varying recovery needs. This segmentation directly affects infrastructure components such as network links, storage capacity, and access speed. For instance, a shorter RTO may necessitate higher-speed network links and faster storage solutions, while a stricter RPO can increase storage demands due to the need for more frequent backups.
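The effect of RTO and RPO on link speed and storage can be estimated with simple arithmetic. The following sketch is a back-of-the-envelope calculation under stated assumptions: decimal units (1 GB = 8,000 megabits), the whole restore window available for data transfer, and one restore point kept per RPO interval. The function names and sample figures are illustrative only.

```python
import math


def required_throughput_mbit_s(data_gb: float, restore_window_minutes: float) -> float:
    """Sustained throughput needed to move data_gb within the restore window."""
    megabits = data_gb * 8_000  # 1 GB = 8,000 megabits (decimal units)
    return megabits / (restore_window_minutes * 60)


def restore_points_kept(rpo_hours: float, retention_days: float) -> int:
    """Number of restore points if one backup is taken per RPO interval."""
    return math.ceil(retention_days * 24 / rpo_hours)


# Restoring 500 GB within a 60-minute window:
print(round(required_throughput_mbit_s(500, 60)), "Mbit/s")

# Restore points held over 30 days of retention:
print(restore_points_kept(rpo_hours=24, retention_days=30), "points with a 24-hour RPO")
print(restore_points_kept(rpo_hours=1, retention_days=30), "points with a 1-hour RPO")
```

Under these assumptions, restoring 500 GB in one hour needs roughly 1.1 Gbit/s of sustained throughput, and tightening the RPO from 24 hours to 1 hour raises the number of restore points over 30 days from 30 to 720. The actual storage demand depends on how much data changes between backups.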
In complex systems such as interconnected transactional environments, for example those using message queuing (MQ) for data exchange, a precise definition of RTO and RPO is even more critical. These systems rely on continuous, real-time data processing, where any downtime or data loss can cascade across multiple platforms. Precise RTO and RPO values ensure that each component can recover within required timeframes, maintaining the integrity and continuity of the entire transactional workflow.
“Business continuity is not just about maintaining operations, it is about ensuring that data integrity and security are preserved during and after a disruption.”
Technical consequences of the regulations
Backup systems are not just a technical necessity but also a critical component of regulatory compliance. Regulations like GDPR, PCI-DSS, and NIST SP 800-53 impose specific requirements that organizations must adhere to when managing backup data. Understanding and implementing these requirements is essential for maintaining compliance and protecting sensitive data. Depending on the types of data they handle and the jurisdictions in which they operate, organizations face multiple compliance obligations. The most important are listed below:
- Data Protection and Encryption:
  - Ensure personal data and cardholder data in backups are encrypted both in transit and at rest. (GDPR, PCI-DSS, NIST SP 800-53)
  - Ensure the integrity and confidentiality of data by implementing cryptographic protections and continuous monitoring for unauthorized access or alterations. (NIST SP 800-53)
  - Apply controls for cross-border transfers of backup data, such as Standard Contractual Clauses (SCCs) or approved mechanisms. (GDPR)
- Access Control:
  - Enforce strict access controls, including multi-factor authentication, allowing only authorized personnel to access backups, following the “need-to-know” principle. (GDPR, PCI-DSS, NIST SP 800-53)
- Data Retention and Erasure:
  - Implement data retention policies, ensuring data is not retained longer than necessary, and unnecessary or outdated data is deleted from backups. (GDPR, PCI-DSS)
  - Include the ability to delete personal data from backups upon request, supporting the right to erasure without compromising data integrity. (GDPR)
- Incident Response and Auditing:
  - Develop robust incident response procedures, including rapid data restoration from backups to minimize downtime and data loss. (PCI-DSS, NIST SP 800-53)
  - Maintain detailed audit trails for all actions related to backup data, including access, modifications, and deletions, with secure storage of logs for review. (GDPR, PCI-DSS, NIST SP 800-53)
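The retention requirement above can be sketched as a periodic check over a backup catalog. In the following example the `BackupRecord` type, the data classes, and the retention periods are hypothetical placeholders. Actual periods have to come from the organization's own policy and legal assessment, not from this sketch.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class BackupRecord:
    backup_id: str
    data_class: str
    created: date


# Hypothetical retention periods per data class (placeholders, not legal values).
RETENTION = {
    "cardholder": timedelta(days=365),
    "personal": timedelta(days=90),
}


def expired_backups(records: list[BackupRecord], today: date) -> list[BackupRecord]:
    """Return the backups that are older than the retention period of their class."""
    expired = []
    for record in records:
        limit = RETENTION.get(record.data_class)
        if limit is None:
            # An unclassified backup is a finding in itself, so fail loudly.
            raise KeyError(f"no retention period defined for {record.data_class!r}")
        if today - record.created > limit:
            expired.append(record)
    return expired


catalog = [
    BackupRecord("bk-001", "personal", date(2024, 1, 1)),
    BackupRecord("bk-002", "cardholder", date(2024, 1, 1)),
    BackupRecord("bk-003", "personal", date(2024, 5, 1)),
]

for record in expired_backups(catalog, today=date(2024, 6, 1)):
    print("due for deletion:", record.backup_id)
```

Raising an error for an unknown data class is a deliberate choice here: a backup without a classification cannot be given a retention period at all.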
These regulations have technical implications for managing data backup systems, for example when dealing with PCI-DSS zones, WORM systems, and offline tape backups:
- PCI-DSS Zone Compliance:
Data within PCI-DSS zones must remain encrypted at all times, even during backup. To meet this requirement, organizations normally install a dedicated backup node within the PCI-DSS zone. This node encrypts the data before it is transferred to storage servers or tapes, ensuring that sensitive information never leaves the zone unencrypted. Restoring the data to the corresponding zone requires that the underlying infrastructure (switches, routers, firewalls, IAM, etc.) is functioning correctly before the restore can begin. Managing such systems within the PCI-DSS zone, including the related cryptographic material, may require different procedures, including stronger authentication methods and the four-eyes principle. Additionally, updating these systems can be challenging, as direct connections to the internet are typically not allowed.
- WORM Systems and GDPR Compliance:
WORM systems, which store data in an immutable format, pose significant challenges for GDPR compliance, especially regarding the right to erasure (Article 17) and the right to rectification (Article 16). Since WORM systems prevent data from being modified or deleted, organizations must carefully plan and implement specific mechanisms to address these GDPR requirements, such as using additional layers of encryption or implementing a process to logically delete or mask data.
- Offline Tape Backup Management:
Offline tape backups add another layer of complexity, as they can delay data access and complicate deletion processes. This makes it challenging to comply promptly with GDPR requests for data erasure or modification. To overcome this, organizations should establish detailed procedures for managing offline tapes, ensuring that any data stored on these media can be accessed and deleted in a manner that supports GDPR compliance.
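One way to picture the logical deletion mentioned for WORM systems is an erasure ledger that is kept outside the immutable store and applied whenever data is restored. The sketch below is a simplified illustration of that mechanism only. The record layout, the `subject_id` field, and the ledger are assumptions made for the example and do not describe a specific product or a complete compliance process.

```python
def restore_records(backup_records: list[dict], erased_subject_ids: set[str]) -> list[dict]:
    """Return only the records whose data subject has not requested erasure.

    The immutable backup itself is left untouched. The erasure ledger is
    stored separately and must be consulted on every restore.
    """
    return [
        record
        for record in backup_records
        if record["subject_id"] not in erased_subject_ids
    ]


# Hypothetical content of an immutable backup.
worm_backup = [
    {"subject_id": "s-1", "email": "a@example.com"},
    {"subject_id": "s-2", "email": "b@example.com"},
    {"subject_id": "s-3", "email": "c@example.com"},
]

# Hypothetical erasure ledger maintained outside the WORM store.
erasure_ledger = {"s-2"}

restored = restore_records(worm_backup, erasure_ledger)
print([record["subject_id"] for record in restored])  # ['s-1', 's-3']
```

The same ledger can be applied to offline tapes when they are eventually read, which is why it has to be retained at least as long as the oldest backup that may still contain the affected data.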
Our Approach
At scip AG, we conduct a comprehensive review of governance, operational processes, and technical infrastructure of backup solutions. Our expertise in reverse engineering entire systems, focusing on security components and their interactions, enables us to identify deviations from design specifications or established security standards. This includes adherence to frameworks like CIS and industry-specific mandates such as PCI-DSS. Based on our findings, we provide actionable recommendations to realign systems with their intended design or applicable standards.
Review Methodology: NIST CSF Oriented
Our review process is structured around the NIST Cybersecurity Framework (CSF), covering the following areas:
Govern
- Backup Policy Review: Ensure a comprehensive backup policy aligns with governance and risk management frameworks.
- Risk Management Integration: Confirm backup risks are documented and mitigated as part of the organization’s overall risk assessment.
- Regulatory Compliance: Verify backups meet legal and regulatory requirements for data protection and compliance.
Identify
- Backup Coverage: Ensure all critical data and systems are included in the backup scope.
- Compliance Adherence: Confirm that backup processes comply with relevant legal and industry standards.
- Documentation: Verify that comprehensive documentation of backup procedures and policies is available and up to date.
Protect
- Authentication and Authorization: Ensure secure access control mechanisms are in place for backup systems.
- Encryption: Confirm that data is encrypted during storage (at rest) and during transfer (in transit).
- OS and Configuration Security: Review operating system settings and backup configurations to ensure adherence to security best practices.
- Immutable Storage: Verify that backups are stored in an immutable or pseudo-immutable state to prevent tampering.
Detect
- Monitoring: Implement continuous monitoring of backup operations to track system performance and detect anomalies.
Respond
- Alerting: Establish alert mechanisms for backup failures or security incidents to enable quick and effective responses.
Recover
- Backup Frequency: Ensure that backups are performed at intervals that adequately prevent data loss.
- Recovery Testing: Regularly test recovery processes to ensure quick and accurate data restoration.
- Management Ease: Assess the ease of managing backup operations, including restoration processes and administrative tasks.
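As an illustration of the recovery testing item, a restore test can be checked automatically by comparing checksums recorded at backup time with the restored files. The following sketch uses only the Python standard library. The function names and the manifest format (relative path mapped to SHA-256 digest) are assumptions for this example, not a feature of a particular backup tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file below root (relative path) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_restore(manifest: dict[str, str], restored_root: Path) -> list[str]:
    """Return a list of problems found; an empty list means the restore matches."""
    problems = []
    for relative_path, expected in manifest.items():
        target = restored_root / relative_path
        if not target.is_file():
            problems.append(f"missing: {relative_path}")
        elif sha256_of(target) != expected:
            problems.append(f"checksum mismatch: {relative_path}")
    return problems
```

In a test run, `build_manifest` would be called on the source data when the backup is taken, and `verify_restore` on the directory produced by the restore. Recording how long the restore took alongside the result also yields a measured value to compare with the RTO.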
Recent Experiences
In the past three years, we have conducted detailed assessments of major backup solutions for clients in banking, insurance, and industrial sectors. Some of the solutions reviewed include:
Conclusion
In summary, designing and implementing a backup system that aligns with both security standards and business continuity requirements is essential for safeguarding an organization’s critical data. By focusing on key governance parameters like RTO and RPO, and by following a structured review process such as the NIST CSF, organizations can ensure their backup systems are both resilient and efficient. As demonstrated in our recent client experiences, a well-architected backup strategy not only protects against data loss but also ensures quick recovery and continuity of operations, even in complex transactional environments. For more insights on best practices and the latest trends in IT security, read our publications at scip.ch.
About the Authors
Andrea Covello has been working in information security since the 1990s. His strengths are in engineering, specializing in Windows security, firewalling and advanced virtualization.
Rocco Gagliardi has been working in IT since the 1980s and specialized in IT security in the 1990s. His main focus lies in security frameworks, network routing, firewalling and log management.