Enhancing Data Understanding
Rocco Gagliardi
Our stack evolution regarding OpenSearch
We started by parsing Raptor Firewall logs on Solaris using grep and regular expressions. As data volumes grew with evolving systems, we shifted from developing custom solutions to customising open-source systems. We used the ELK stack for a couple of years, then transitioned to Graylog for its free authentication/authorization features, despite its less advanced visualization capabilities. About two years ago, prompted by the Elastic vs. AWS licensing dispute, which also affected Graylog, we started to look at OpenSearch. This article explains why we switched to OpenSearch internally.
As mentioned, we started with ELK. The ELK Stack, consisting of Elasticsearch, Logstash, and Kibana, is a versatile open-source suite designed for data analysis, with a particular focus on comprehensive log management and analysis. ELK can be deployed locally or on the major cloud providers, offering a complete end-to-end solution that spans from data ingestion to visualization.
Graylog, primarily focused on real-time log management, has evolved from a phase of unclear objectives to offering specialized, licensed features for targeted problem-solving in log processing and analysis; see Illuminate, Graylog Security, Graylog Operations, and Graylog API Security. The ingestion component we used at the beginning was rsyslog; after a couple of years we traded its performance for the easy configuration and the huge number of plugins of Logstash.
OpenSearch provides extensive search and analytics capabilities for various data types. Notable for its high scalability and advanced data visualization through OpenSearch Dashboards, it supports multi-tenancy, but its cloud-based version is currently limited to AWS. The ingestion component we use is still Logstash, but we are also evaluating Vector.
The following table outlines the key differences between the log management products. According to vendor specifications, each product can address a range of log management areas. However, our table specifically highlights what we perceive as the core focus of each product. For instance, while Graylog also covers visualization, it is not its strongest feature, so we limit its primary function to log management and analysis.
Feature/Aspect | ELK Stack | Graylog | OpenSearch | CIS-CSC Context |
---|---|---|---|---|
Primary Function | Integrated log analysis, search, and visualization | Log management and analysis | Advanced search and analytics capabilities, fork of Elasticsearch | Useful for CIS Control 6 (Maintenance, Monitoring, and Analysis of Audit Logs) |
Data Processing | High-performance data indexing and searching | Efficient log aggregation and processing | High-performance data indexing, searching, and analytics | Supports CIS Control 6 for efficient log analysis and anomaly detection |
Scalability | Highly scalable, suitable for large datasets | Scales well for log management | Highly scalable, similar to Elasticsearch | Aligns with CIS Control 3 (Continuous Vulnerability Management) by managing large data environments |
Flexibility | Highly flexible, customizable with various plugins | Focused on log management, some integration capabilities | Highly flexible with broad integration options | Supports CIS Control 1 (Inventory and Control of Hardware Assets) and Control 2 (Software Assets) |
Visualization | Advanced visualization with Kibana | Integrated dashboards | Advanced visualization with OpenSearch Dashboards | Facilitates monitoring and analysis for various CIS Controls |
Community & Support | Large community, extensive documentation | Strong community, good documentation | Growing community; inherits Elasticsearch’s robust documentation, though OpenSearch’s own documentation needs improvement | Important for ongoing security updates and best practices adherence |
Licensing | Open source, with paid features in Elastic Stack | Open source, with paid features for advanced analysis and visualisation | Fully open source, Apache 2.0 licensed | Ensures compatibility with organizational policies and compliance requirements |
You might question the differences in the Data Processing and Scalability rows, considering that all three products are fundamentally built on Elasticsearch. The reality, however, is more nuanced. Performance differences arise from the way each product organizes data internally. Over time, we expect these differences to widen, because OpenSearch and Elasticsearch are continually evolving and adding new features, leading to an increasing divergence in their capabilities and performance.
Why switch to OpenSearch? First, OpenSearch offers superior scalability, crucial for handling the increasing volume of data in complex IT environments. Second, OpenSearch provides enhanced data analysis capabilities, facilitating more effective identification and mitigation of security threats. Lastly, OpenSearch, being a fork of Elasticsearch, inherits robust and flexible authentication and authorization features. It supports a wider range of authentication mechanisms, including LDAP/Active Directory, SAML, Kerberos, and OpenID Connect, alongside RBAC for fine-grained access control on both Dashboards and Indexes. Therefore, while all platforms offer authentication and authorization features, OpenSearch provides a more comprehensive and flexible solution.
Here are some of the technical advantages:
Feature | Description |
---|---|
Tenancy Support | OpenSearch includes built-in support for multi-tenancy, allowing for the segregation of data and dashboards based on different users or user groups. |
Role-Based Access Control (RBAC) | RBAC is crucial for managing access rights in a multi-tenant environment. It allows administrators to define roles and assign these roles to users or groups, controlling access to data and resources based on these roles. |
Index Permissions | These permissions enable administrators to control access at the index level. In a multi-tenant setup, this means certain tenants can be restricted from accessing indexes that don’t pertain to them. |
Document Level Security (DLS) | DLS allows for finer-grained access control by restricting access to specific documents within an index. This is particularly useful in multi-tenant environments where different tenants may need access to different subsets of data within the same index. |
Field Level Security (FLS) | Similar to DLS, FLS restricts access to specific fields within a document. This means that even if multiple tenants have access to the same document, they can be restricted to view only certain fields within that document. |
Tenant-Specific Resources | OpenSearch supports the creation of tenant-specific resources like dashboards and visualizations, ensuring that one tenant’s data and resources are not visible or accessible to another tenant. |
Authentication and Authorization Mechanisms | Implementing strong authentication and authorization is critical in a multi-tenant environment to ensure that only authenticated and authorized users can access their respective data and resources. |
Audit Logging | Keeping detailed logs of user activities, especially in terms of access and modifications, is vital for security and compliance in a multi-tenant setup. |
Resource and Query Management | Ability to manage and allocate resources such as memory and CPU effectively, and to manage and optimize queries to ensure that no single tenant can monopolize system resources. |
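To make the interplay of index permissions, DLS, FLS, and tenant permissions more concrete, here is a minimal Python sketch that builds a role definition in the shape used by the OpenSearch Security plugin's roles endpoint. It is not our production configuration: the role name, index pattern, field names, and tenant name are hypothetical, and the snippet only builds and prints JSON without contacting a cluster.

```python
import json

# Hypothetical names, used only for illustration.
ROLE_NAME = "tenant_a_analyst"
TENANT = "tenant_a"


def build_tenant_role(tenant: str) -> dict:
    """Build a read-only role limited to one tenant's documents and fields."""
    # DLS is a query DSL document serialized to a string; only documents
    # matching the query are visible to users holding this role.
    dls_query = json.dumps({"term": {"tenant.keyword": tenant}})
    return {
        "cluster_permissions": [],
        "index_permissions": [
            {
                # Index permissions: restrict the role to matching indexes.
                "index_patterns": ["logs-*"],
                "dls": dls_query,
                # FLS: a leading "~" excludes the listed fields from results.
                "fls": ["~user.password", "~client.ip"],
                "allowed_actions": ["read"],
            }
        ],
        "tenant_permissions": [
            {
                # Read-only access to the tenant's dashboards and visualizations.
                "tenant_patterns": [tenant],
                "allowed_actions": ["kibana_all_read"],
            }
        ],
    }


role = build_tenant_role(TENANT)
role_json = json.dumps(role, indent=2)

# The body would be sent with a request such as the one printed below.
print(f"PUT _plugins/_security/api/roles/{ROLE_NAME}")
print(role_json)
```

Keeping the tenant filter in DLS rather than in separate indexes is only one possible design; separate per-tenant index patterns combined with index permissions achieve a similar separation with simpler queries.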
In the following table we try to summarize the advantages of using OpenSearch versus ELK or Graylog.
Feature/Aspect | OpenSearch Advantages Over ELK | OpenSearch Advantages over Graylog |
---|---|---|
Licensing and Cost | Fully open-source under Apache 2.0 license, no paid tiers | Fully open-source, avoiding potential licensing costs |
Data Processing Capabilities | Handles broader data types beyond logs, advanced analytics | More advanced search and analytics capabilities |
Scalability and Performance | Highly scalable, similar to Elasticsearch, better for large datasets | More scalable, especially in distributed environments |
Flexibility and Integration | High flexibility, wide range of integrations possible | Broader integration capabilities and data source support |
Visualization | Advanced visualization with OpenSearch Dashboards | More sophisticated visualization tools than Graylog |
Community and Support | Growing community, benefits from Elasticsearch’s legacy | Larger and possibly more active community than Graylog |
OpenSearch may also present some risks.
OpenSearch’s advanced features and capabilities require careful configuration and ongoing management. Misconfigurations can lead to security vulnerabilities, such as unauthorized access or data leakage. Regular audits and reviews of the configuration settings aligned with best practice security standards are essential to address this risk.
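A configuration review of this kind can be partly automated. The following Python sketch checks role definitions for a few patterns that often indicate overly broad access. The rules and the sample roles are assumptions made for illustration, not a complete or authoritative audit policy; in practice the role definitions would be exported from the cluster rather than written inline.

```python
def audit_role(name: str, role: dict) -> list[str]:
    """Return human-readable findings for one role definition."""
    findings = []
    for perm in role.get("index_permissions", []):
        patterns = perm.get("index_patterns", [])
        if "*" in patterns:
            findings.append(f"{name}: index pattern '*' matches every index")
        if "*" in perm.get("allowed_actions", []):
            findings.append(f"{name}: wildcard action grants every index operation")
        if not perm.get("dls") and not perm.get("fls"):
            findings.append(
                f"{name}: no document- or field-level restriction on {patterns}"
            )
    return findings


# Hypothetical role definitions used as sample input.
roles = {
    "tenant_a_analyst": {
        "index_permissions": [
            {
                "index_patterns": ["logs-tenant-a-*"],
                "dls": '{"term": {"tenant": "a"}}',
                "allowed_actions": ["read"],
            }
        ]
    },
    "legacy_admin": {
        "index_permissions": [
            {"index_patterns": ["*"], "allowed_actions": ["*"]}
        ]
    },
}

all_findings = [
    finding
    for name, role in roles.items()
    for finding in audit_role(name, role)
]
for finding in all_findings:
    print(finding)
```

With the sample input, only the `legacy_admin` role is flagged. A finding is a prompt for review, not proof of a vulnerability: an administrative role may legitimately need broad access.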
As OpenSearch is closely integrated with AWS, there is a potential risk of vendor lock-in, especially for its cloud-based services. This can limit flexibility in terms of migration to other platforms and may pose challenges in adapting to different security requirements or standards outside of the AWS ecosystem.
A few additional words about ingestion components. ELK and OpenSearch primarily use Logstash, while Graylog has its own mechanisms with various listeners and pipelines. However, there is a myriad of log shippers that can be used to generate, aggregate, and securely send messages to a server, for example Logstash, Filebeat, Fluentd, Fluent Bit, Vector, Rsyslog, and syslog-ng. Choosing a log shipper depends on multiple factors, such as where it needs to be installed, which platform the logs need to be sent to, and how easily messages can be modified. In our use cases we use Logstash, but we are currently evaluating Vector.
Feature/Aspect | Logstash | Vector |
---|---|---|
Primary Use | Part of the ELK stack, used for log processing and ingestion into Elasticsearch | High-performance, highly reliable observability data pipeline |
Performance | Good performance, but can be resource-intensive for complex processing | Designed for high throughput and low latency, more efficient resource usage |
Configuration | Configured with a custom language specific to Logstash | Uses a TOML-based configuration which is generally considered to be simpler and more readable |
Data Enrichment | Extensive filter plugins for data enrichment and transformation | Supports a variety of transformations and is extendable |
Deployment | Typically deployed as a standalone service | Can be deployed as an agent or service; lightweight and suitable for edge computing |
Integration | Integrates primarily with Elasticsearch but also supports other outputs | Broad output integrations including cloud services, observability platforms, and databases |
Reliability | Offers robust features like persistent queues for reliability | Built-in reliability features including backpressure management and observability |
Community and Support | Part of the popular ELK stack with a large community and extensive documentation | Growing community, often praised for its modern approach and efficiency |
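Whichever shipper is chosen, its core job is the same: parse raw lines, transform them into structured events, and forward them to the search backend. The Python sketch below illustrates those three steps. It is neither Logstash nor Vector configuration; the firewall-style log format and the index name are hypothetical, and the output is the newline-delimited JSON body expected by the `_bulk` API rather than an actual network request.

```python
import json
import re
from typing import Optional

# Hypothetical firewall-style log format, for illustration only.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) (?P<action>ALLOW|DENY) "
    r"src=(?P<src>\S+) dst=(?P<dst>\S+) dport=(?P<dport>\d+)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Parse one raw line into a structured event, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    # Transformation step: store the port as a number, not a string.
    event["dport"] = int(event["dport"])
    return event


def to_bulk(events: list, index: str) -> str:
    """Serialize events as a _bulk body: one action line, then one document line."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


raw_lines = [
    "2024-01-15T10:00:00Z fw01 DENY src=10.0.0.5 dst=192.0.2.10 dport=22",
    "not a firewall line",
    "2024-01-15T10:00:01Z fw01 ALLOW src=10.0.0.7 dst=192.0.2.20 dport=443",
]

events = [e for e in map(parse_line, raw_lines) if e is not None]
payload = to_bulk(events, "logs-firewall")
print(payload)
```

Real shippers add what this sketch leaves out, such as buffering, retries, backpressure handling, and TLS, which is where the differences in the table above become relevant.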
For years we have been using log management solutions. After developing in-house solutions, we first customized ELK and then Graylog. Each of these products has its use case and limitations for our use. Recently, we have switched to OpenSearch, which, even though it carries the risk of vendor lock-in, currently offers the best performance in terms of data management, visualization, and security.