Transition to OpenSearch - The Rationale Behind Our Stack Evolution

by Rocco Gagliardi
on February 01, 2024

Keypoints

Our stack evolution regarding OpenSearch

  • ELK Stack for comprehensive analysis
  • Graylog focussed on log management
  • OpenSearch for scalable visualization and multi-tenancy
  • OpenSearch excels in handling large, complex data environments
  • OpenSearch offers advanced data analysis and robust, flexible security features

We started by parsing the logs of the Raptor Firewall on Solaris using grep and regular expressions. As data volumes grew with evolving systems, we shifted from developing custom solutions to customising open-source systems. We used the ELK stack for a couple of years, then transitioned to Graylog for its free authentication and authorization features, despite its less advanced visualization capabilities. About two years ago, prompted by the licensing dispute between Elastic and AWS, which also affected Graylog, we started to look at OpenSearch. This article explains why we switched to OpenSearch internally.

The stacks

As said, we started with ELK. The ELK Stack, consisting of Elasticsearch, Logstash, and Kibana, is a versatile open-source suite designed for data analysis, with a particular focus on comprehensive log management and analysis. ELK facilitates both local and major cloud provider deployments, offering a complete end-to-end solution that spans from data ingestion to visualization.

Graylog, primarily focused on real-time log management, has evolved from a phase of unclear objectives to offering specialized, licensed features for targeted problem-solving in log processing and analysis; see Illuminate, Graylog Security, Graylog Operations, and Graylog API Security. The ingestion component we used at the beginning was rsyslog; after a couple of years we traded its performance for the easy configuration and the large number of plugins of Logstash.

OpenSearch provides extensive search and analytics capabilities for various data types. Notable for its high scalability and advanced data visualization through OpenSearch Dashboards, it supports multi-tenancy but is currently limited to AWS for its cloud-based version. For ingestion we still use Logstash, but we are also evaluating Vector.

Differences between ELK, Graylog and OpenSearch

The table is designed to succinctly outline the key differences between various log management products. According to vendor specifications, each product has the capability to address a range of log management areas. However, our table specifically highlights what we perceive as the core focus of each product. For instance, while Graylog also encompasses aspects of visualization, it is not its strongest feature, so we limit the primary function to Log management and analysis.

| Feature/Aspect | ELK Stack | Graylog | OpenSearch | CIS-CSC Context |
|---|---|---|---|---|
| Primary Function | Integrated log analysis, search, and visualization | Log management and analysis | Advanced search and analytics capabilities, fork of Elasticsearch | Useful for CIS Control 6 (Maintenance, Monitoring, and Analysis of Audit Logs) |
| Data Processing | High-performance data indexing and searching | Efficient log aggregation and processing | High-performance data indexing, searching, and analytics | Supports CIS Control 6 for efficient log analysis and anomaly detection |
| Scalability | Highly scalable, suitable for large datasets | Scales well for log management | Highly scalable, similar to Elasticsearch | Aligns with CIS Control 3 (Continuous Vulnerability Management) by managing large data environments |
| Flexibility | Highly flexible, customizable with various plugins | Focused on log management, some integration capabilities | Highly flexible with broad integration options | Supports CIS Control 1 (Inventory and Control of Hardware Assets) and Control 2 (Software Assets) |
| Visualization | Advanced visualization with Kibana | Integrated dashboards | Advanced visualization with OpenSearch Dashboards | Facilitates monitoring and analysis for various CIS Controls |
| Community & Support | Large community, extensive documentation | Strong community, good documentation | Growing community, inherited Elasticsearch’s robust documentation. OpenSearch documentation should be enhanced | Important for ongoing security updates and best practices adherence |
| Licensing | Open source, with paid features in Elastic Stack | Open source, with paid features for advanced analysis and visualisation | Fully open source, Apache 2.0 licensed | Ensures compatibility with organizational policies and compliance requirements |

You might question the differences in the Data Processing and Scalability rows, considering that all three products fundamentally build on Elasticsearch. The reality, however, is more nuanced. Performance differences arise from the way each product organizes its data internally. Over time, we expect these differences to widen, because OpenSearch and Elasticsearch are evolving independently, adding new features and functionalities that lead to an increasing divergence in their capabilities and performance.

OpenSearch features

Why switch to OpenSearch? First, OpenSearch offers superior scalability, crucial for handling the increasing volume of data in complex IT environments. Second, OpenSearch provides enhanced data analysis capabilities, facilitating more effective identification and mitigation of security threats. Lastly, OpenSearch, being a fork of Elasticsearch, inherits robust and flexible authentication and authorization features. It supports a wider range of authentication mechanisms, including LDAP/Active Directory, SAML, Kerberos, and OpenID Connect, alongside RBAC for fine-grained access control on both Dashboards and Indexes. Therefore, while all platforms offer authentication and authorization features, OpenSearch provides a more comprehensive and flexible solution.

Here are some of the technical advantages:

| Feature | Description |
|---|---|
| Tenancy Support | OpenSearch includes built-in support for multi-tenancy, allowing for the segregation of data and dashboards based on different users or user groups. |
| Role-Based Access Control (RBAC) | RBAC is crucial for managing access rights in a multi-tenant environment. It allows administrators to define roles and assign these roles to users or groups, controlling access to data and resources based on these roles. |
| Index Permissions | These permissions enable administrators to control access at the index level. In a multi-tenant setup, this means certain tenants can be restricted from accessing indexes that don’t pertain to them. |
| Document Level Security (DLS) | DLS allows for finer-grained access control by restricting access to specific documents within an index. This is particularly useful in multi-tenant environments where different tenants may need access to different subsets of data within the same index. |
| Field Level Security (FLS) | Similar to DLS, FLS restricts access to specific fields within a document. This means that even if multiple tenants have access to the same document, they can be restricted to view only certain fields within that document. |
| Tenant-Specific Resources | OpenSearch supports the creation of tenant-specific resources like dashboards and visualizations, ensuring that one tenant’s data and resources are not visible or accessible to another tenant. |
| Authentication and Authorization Mechanisms | Implementing strong authentication and authorization is critical in a multi-tenant environment to ensure that only authenticated and authorized users can access their respective data and resources. |
| Audit Logging | Keeping detailed logs of user activities, especially in terms of access and modifications, is vital for security and compliance in a multi-tenant setup. |
| Resource and Query Management | Ability to manage and allocate resources such as memory and CPU effectively, and to manage and optimize queries to ensure that no single tenant can monopolize system resources. |
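
To make the interplay of index permissions, DLS, FLS and tenants more concrete, the following Python sketch builds a role definition in the shape that, to our understanding, the OpenSearch Security plugin accepts on its roles endpoint (`PUT _plugins/_security/api/roles/<role-name>`). The tenant name `customer_a`, the index pattern `fw-logs-*` and the field names are hypothetical and only serve as illustration; the sketch builds the JSON body and does not contact a cluster.

```python
import json


def build_tenant_role(tenant, index_pattern, dls_query, visible_fields):
    """Return a role body combining index permissions, DLS, FLS and a tenant.

    All argument values are illustrative assumptions, not taken from a real setup.
    """
    return {
        "cluster_permissions": [],
        "index_permissions": [
            {
                # Index permissions: which indexes the role may touch at all.
                "index_patterns": [index_pattern],
                # DLS: a query DSL fragment, stored as a JSON-encoded string.
                "dls": json.dumps(dls_query),
                # FLS: the only fields that users with this role may see.
                "fls": list(visible_fields),
                "allowed_actions": ["read"],
            }
        ],
        "tenant_permissions": [
            {
                # Tenant: read-only access to this tenant's dashboards.
                "tenant_patterns": [tenant],
                "allowed_actions": ["kibana_all_read"],
            }
        ],
    }


role = build_tenant_role(
    tenant="customer_a",
    index_pattern="fw-logs-*",
    dls_query={"term": {"customer": "customer_a"}},
    visible_fields=["@timestamp", "src_ip", "dst_ip", "action"],
)
print(json.dumps(role, indent=2))
```

A role like this would still have to be mapped to users or backend roles before it takes effect; that step is omitted here.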

Advantages of OpenSearch versus ELK and Graylog

In the following table we try to summarize the advantages of using OpenSearch versus ELK or Graylog.

| Feature/Aspect | OpenSearch Advantages over ELK | OpenSearch Advantages over Graylog |
|---|---|---|
| Licensing and Cost | Fully open-source under Apache 2.0 license, no paid tiers | Fully open-source, avoiding potential licensing costs |
| Data Processing Capabilities | Handles broader data types beyond logs, advanced analytics | More advanced search and analytics capabilities |
| Scalability and Performance | Highly scalable, similar to Elasticsearch, better for large datasets | More scalable, especially in distributed environments |
| Flexibility and Integration | High flexibility, wide range of integrations possible | Broader integration capabilities and data source support |
| Visualization | Advanced visualization with OpenSearch Dashboards | More sophisticated visualization tools than Graylog |
| Community and Support | Growing community, benefits from Elasticsearch’s legacy | Larger and possibly more active community than Graylog |

Risks of using OpenSearch

OpenSearch also presents some risks.

Complex Configuration and Management

OpenSearch’s advanced features and capabilities require careful configuration and ongoing management. Misconfigurations can lead to security vulnerabilities, such as unauthorized access or data leakage. Regular audits and reviews of the configuration settings aligned with best practice security standards are essential to address this risk.
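
Part of such a review can be automated. The following Python sketch is one possible approach, not a complete audit: it scans role definitions, assumed to be in the JSON shape returned by the Security plugin's roles endpoint, and flags index permissions that look too broad. The two rules and the sample roles are hypothetical and would need to be adapted to the local policy.

```python
def find_risky_roles(roles):
    """Return (role_name, reason) pairs for index permissions that look too broad."""
    findings = []
    for name, role in roles.items():
        for perm in role.get("index_permissions", []):
            patterns = perm.get("index_patterns", [])
            actions = perm.get("allowed_actions", [])
            if "*" in patterns and "*" in actions:
                findings.append((name, "all actions on all indexes"))
            elif "*" in patterns and not perm.get("dls"):
                findings.append((name, "all indexes without document level security"))
    return findings


# Hypothetical role definitions, for illustration only.
sample_roles = {
    "tenant_a_reader": {
        "index_permissions": [
            {"index_patterns": ["tenant-a-*"], "allowed_actions": ["read"], "dls": ""}
        ]
    },
    "legacy_admin": {
        "index_permissions": [
            {"index_patterns": ["*"], "allowed_actions": ["*"]}
        ]
    },
    "global_reader": {
        "index_permissions": [
            {"index_patterns": ["*"], "allowed_actions": ["read"]}
        ]
    },
}

for role_name, reason in find_risky_roles(sample_roles):
    print(f"{role_name}: {reason}")
```

Running such a check regularly against an export of the roles gives a simple baseline; it does not replace a manual review of role mappings, tenants and authentication settings.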

Dependency and Vendor Lock-in

As OpenSearch is closely integrated with AWS, there is a potential risk of vendor lock-in, especially for its cloud-based services. This can limit flexibility in terms of migration to other platforms and may pose challenges in adapting to different security requirements or standards outside of the AWS ecosystem.

The ingestion components

A few additional words about ingestion components. ELK and OpenSearch primarily use Logstash, while Graylog has its own mechanisms with various listeners and pipelines. However, there is a myriad of log shippers that can be used to generate, aggregate, and securely send messages to a server, for example Logstash, Filebeat, Fluentd, Fluent Bit, Vector, Rsyslog, and syslog-ng. Choosing a ‘log shipper’ depends on multiple factors, such as where it needs to be installed, to which platform the logs need to be sent, and how easily messages can be modified. In our use cases, we use Logstash, but we are currently evaluating Vector.

| Feature/Aspect | Logstash | Vector |
|---|---|---|
| Primary Use | Part of the ELK stack, used for log processing and ingestion into Elasticsearch | High-performance, highly reliable observability data pipeline |
| Performance | Good performance, but can be resource-intensive for complex processing | Designed for high throughput and low latency, more efficient resource usage |
| Configuration | Configured with a custom language specific to Logstash | Uses a TOML-based configuration which is generally considered to be simpler and more readable |
| Data Enrichment | Extensive filter plugins for data enrichment and transformation | Supports a variety of transformations and is extendable |
| Deployment | Typically deployed as a standalone service | Can be deployed as an agent or service; lightweight and suitable for edge computing |
| Integration | Integrates primarily with Elasticsearch but also supports other outputs | Broad output integrations including cloud services, observability platforms, and databases |
| Reliability | Offers robust features like persistent queues for reliability | Built-in reliability features including backpressure management and observability |
| Community and Support | Part of the popular ELK stack with a large community and extensive documentation | Growing community, often praised for its modern approach and efficiency |
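
Whichever shipper is chosen, the core job is the same: parse a raw line into fields and hand the result to the search backend. The following Python sketch illustrates that step under stated assumptions. The log line format, the field names and the index name are invented for this example and do not correspond to any real firewall format; the output follows the newline-delimited JSON layout expected by the `_bulk` endpoint (one action line followed by one document line). Logstash and Vector do the same with far more robustness, buffering and retry logic.

```python
import json
import re

# Hypothetical firewall log format, for illustration only:
# "<timestamp> <host> action=<verb> src=<ip> dst=<ip>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+) (?P<host>\S+) action=(?P<action>\w+) "
    r"src=(?P<src_ip>[\d.]+) dst=(?P<dst_ip>[\d.]+)$"
)


def parse_line(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def to_bulk(lines, index):
    """Build a _bulk request body: an action line plus a document line per event."""
    out = []
    for line in lines:
        doc = parse_line(line)
        if doc is None:
            continue  # a real shipper would route unparsed lines to a dead-letter queue
        out.append(json.dumps({"index": {"_index": index}}))
        out.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(out) + "\n" if out else ""


sample = [
    "2024-02-01T10:00:00Z fw01 action=deny src=10.0.0.5 dst=192.0.2.10",
    "this line does not match the expected format",
]
payload = to_bulk(sample, index="fw-logs-2024.02.01")
print(payload, end="")
```

The resulting payload could then be sent to the cluster by any HTTP client with the content type `application/x-ndjson`.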

Summary

For years we have been using log management solutions. After developing in-house solutions, we first customized ELK and then Graylog. Each of these products has its use case and limitations for our use. Recently, we have switched to OpenSearch, which, even though it carries the risk of vendor lock-in, currently offers the best performance in terms of data management, visualization, and security.

About the Author

Rocco Gagliardi

Rocco Gagliardi has been working in IT since the 1980s and specialized in IT security in the 1990s. His main focus lies in security frameworks, network routing, firewalling and log management.
