To fully appreciate Apache Cassandra and what it can do, it’s helpful to first understand NoSQL databases and to then look more specifically at Cassandra’s architecture and capabilities. Doing so provides a good introduction to the system, so you can determine if it’s right for your business.
What is Cassandra?
Apache Cassandra is a distributed database management system that is built to handle large amounts of data across multiple data centers and the cloud. Key features include:
- Highly scalable
- Offers high availability
- Has no single point of failure
Written in Java, it’s a NoSQL database offering many things that other NoSQL and relational databases cannot.
Cassandra was originally developed at Facebook for their inbox search feature. Facebook open-sourced it in 2008, and Cassandra became part of the Apache Incubator in 2009. Since early 2010, it has been a top-level Apache project. It’s currently a key part of the Apache Software Foundation and can be used by anyone wanting to benefit from it.
Cassandra stands out among database systems and offers some advantages over other systems. Its ability to handle high volumes makes it particularly beneficial for major corporations. As a result, it’s currently being used by many large businesses including Apple, Facebook, Instagram, Uber, Spotify, Twitter, Cisco, Rackspace, eBay, and Netflix.
What is a NoSQL Database?
A NoSQL database (the name is often read as “not only SQL”) stores and retrieves data without requiring it to be held in a tabular format. Unlike relational databases, which require tables with a fixed structure, NoSQL databases can accommodate unstructured data. This type of database offers:
- A simple design
- Horizontal scaling
- Extensive control over availability
NoSQL databases do not require a fixed schema, which makes replication easier. With its simple API, I like Cassandra for its overall consistency and its ability to handle large amounts of data.
That said, there are pros and cons of using this type of database. While NoSQL databases offer many benefits, they also have drawbacks. Generally, NoSQL databases:
- Only support a simple query language rather than full SQL
- Are just “eventually consistent”
- Don’t support transactions
Nevertheless, they are effective with huge amounts of data and offer easy, horizontal scaling, making this type of system a good fit for many large businesses. Some of the most popular and effective NoSQL databases include:
- Apache Cassandra
- Apache HBase
What makes Apache Cassandra unique?
Cassandra is one of the most efficient and widely used NoSQL databases. One of the key benefits of this system is that it offers a highly available service with no single point of failure. This is key for businesses that cannot afford to have their system go down or to lose data. With no single point of failure, it offers truly consistent access and availability.
Another key benefit of Cassandra is the massive volume of data that the system can handle. It can effectively and efficiently handle huge amounts of data across multiple servers. Plus, it can write huge amounts of data quickly without affecting read efficiency. Cassandra offers users “blazingly fast writes,” and neither speed nor accuracy is affected by large volumes of data. It is just as fast and as accurate for large volumes of data as it is for smaller volumes.
Another reason that so many enterprises utilize Cassandra is its horizontal scalability. Its structure allows users to meet sudden increases in demand, as it allows users to simply add more hardware to accommodate additional customers and data. This makes it easy to scale without shutdowns or major adjustments needed. Additionally, its linear scalability is one of the things that helps to maintain the system’s quick response time.
Some other benefits of Cassandra include:
- Flexible data storage. Cassandra can handle structured, semi-structured, and unstructured data, giving users flexibility with data storage.
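- Flexible data distribution. Cassandra can replicate data across multiple data centers, which allows for easy data distribution wherever or whenever needed.
- Partial ACID support. Of the ACID properties (atomicity, consistency, isolation, and durability), Cassandra provides atomic, isolated, and durable writes at the row and partition level, with tunable rather than strict consistency. It does not offer the full multi-table transactions of a relational database.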
Clearly, Apache Cassandra offers some distinct benefits that other NoSQL and relational databases cannot. With continuous availability, operational simplicity, easy data distribution across multiple data centers, and an ability to handle massive volumes of data, it is the database of choice for many enterprises.
How does Cassandra work?
Apache Cassandra is a peer-to-peer system. Its distribution design is modeled on Amazon’s Dynamo, and its data model is based on Google’s Bigtable.
The basic architecture consists of a cluster of nodes, any and all of which can accept a read or write request. This is a key aspect of its architecture, as there are no master nodes. Instead, all nodes communicate equally.
While nodes are the specific location where data lives on a cluster, the cluster is the complete set of data centers where all data is stored for processing. Related nodes are grouped together in data centers. This type of structure is built for scalability and when additional space is needed, nodes can simply be added. The result is that the system is easy to expand, built for volume, and made to handle concurrent users across an entire system.
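The masterless layout described above can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the `Ring` class, the `token_for` helper, the node names, and the use of MD5 are all assumptions made for illustration, and replicas are simply placed on the next nodes around the ring, ignoring racks and data centers.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a key to a position on the ring (MD5 is used purely for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """A toy masterless ring: each node owns the token range that ends at its token."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self._entries = sorted((token_for(node), node) for node in nodes)

    def add_node(self, node):
        # Scaling out: the new node takes over part of one existing range.
        bisect.insort(self._entries, (token_for(node), node))

    def replicas_for(self, row_key):
        """Return the owning node plus the next nodes clockwise, one per replica."""
        tokens = [token for token, _ in self._entries]
        size = len(self._entries)
        start = bisect.bisect_left(tokens, token_for(row_key)) % size
        count = min(self.replication_factor, size)
        return [self._entries[(start + i) % size][1] for i in range(count)]


# Any node could run this lookup, because every node knows the same ring.
ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
replicas = ring.replicas_for("user:42")
print(replicas)
```

Because placement depends only on the hash of the row key and the shared ring, no coordinator or master is needed to decide where data lives, and adding a node changes ownership for only the slice of keys that falls into its new range.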
Its structure also allows for data protection. To help ensure data integrity, Cassandra has a commit log. Every write is appended to the commit log on disk so that it can be replayed if a node crashes, which ensures data is not lost. The data is then indexed and written to a memtable. The memtable is simply an in-memory data structure where Cassandra writes. There is one active memtable per table.
When memtables reach their threshold, they are flushed to disk and become immutable SSTables. A flush is also triggered when the commit log reaches its size limit, and once the contents of the memtables are safely written to SSTables, the corresponding commit log entries are no longer needed. The commit log is an important aspect of Cassandra’s architecture because it offers a failsafe method to protect data and to provide data integrity.
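The write path described above can be sketched as a small Python model. It is a simplified illustration under stated assumptions: `ToyTable`, `memtable_limit`, and the in-memory lists standing in for on-disk files are hypothetical, and a real node flushes on memory and commit log size thresholds rather than on a row count.

```python
class ToyTable:
    """Toy model of one table's write path: commit log, memtable, SSTables."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (row count)
        self.commit_log = []                  # stands in for the append-only log on disk
        self.memtable = {}                    # in-memory structure that receives writes
        self.sstables = []                    # immutable sorted runs, newest last

    def write(self, key, value):
        self.commit_log.append((key, value))  # log first, so the write survives a crash
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        # The memtable becomes an immutable, sorted SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()               # flushed entries no longer need replaying

    def recover(self):
        """Rebuild the memtable after a simulated crash by replaying the commit log."""
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            for stored_key, value in sstable:
                if stored_key == key:
                    return value
        return None


table = ToyTable(memtable_limit=2)
table.write("user:1", "Ada")
table.write("user:2", "Grace")   # second write reaches the limit and triggers a flush
table.write("user:1", "Ada L.")  # the newer value lives in the fresh memtable
print(table.read("user:1"))      # Ada L.
print(len(table.sstables))       # 1
```

Note that a write never reads existing data first: it only appends to the log and updates memory, which is one reason writes stay fast as the data set grows.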
Who should use Cassandra?
If you need to store and manage large amounts of data across many servers, Cassandra could be a good solution for your business. It’s ideal for businesses that:
- Can’t afford for data to be lost
- Can’t have their database down due to the outage of a single server
Further, it’s also easy to use and easy to scale, making it ideal for businesses that are consistently growing.
At its core, Apache Cassandra’s structure is “built-for-scale” and can handle large amounts of data and concurrent users across a system. It lets major corporations store massive amounts of data in a decentralized system. Yet, despite the decentralization, it still allows users to have control and access to data.
And, data is always accessible. With no single point of failure, the system offers true continuous availability, avoiding downtime and data loss. Additionally, because it can be scaled by simply adding new nodes, there is constant uptime and no need to shut the system down to accommodate more customers or more data. Given these benefits, it’s not surprising that so many major companies utilize Apache Cassandra.
These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.
What is Apache Cassandra?
Apache Cassandra® is a distributed NoSQL database used by the vast majority of Fortune 100 companies. By helping companies like Apple, Facebook, and Netflix process large volumes of fast-moving data in a reliable, scalable way, Cassandra has become essential for the mission-critical features we rely on today.
What is Apache Cassandra used for?
Apache Cassandra, a distributed database management system, is built to manage large amounts of data across several cloud data centers.
What are the basic concepts of Cassandra?
The key feature of Cassandra is the ability to scale incrementally. This includes the ability to dynamically partition the data over a set of nodes in the cluster. Cassandra partitions data across the cluster using consistent hashing and randomly distributes the rows over the network using the hash of the row key.
Which statement regarding Apache Cassandra is correct?
Apache Cassandra is a free and open-source, distributed, wide-column store.
Why is Cassandra called Cassandra?
Its name is inspired by the priestess Cassandra of Greek mythology, who had the gift of prophecy and predicted the Trojan Horse deception.
What type of database is Cassandra?
Cassandra is a NoSQL distributed database. By design, NoSQL databases are lightweight, open-source, non-relational, and largely distributed. Counted among their strengths are horizontal scalability, distributed architectures, and a flexible approach to schema definition.
Why is Cassandra needed?
Cassandra persists data to disk for two very different purposes. The first is to the commit log when a new write is made so that it can be replayed after a crash or system shutdown. The second is to the data directory when thresholds are exceeded and memtables are flushed to disk as SSTables.
What are the features of Cassandra?
Apache Cassandra is an open-source, distributed NoSQL DBMS designed to handle large amounts of data across many servers. It has no single point of failure. Cassandra offers strong support for clusters spanning multiple data centers.
Which programming language is used in Cassandra?
The Cassandra Query Language (CQL) is the primary language for communicating with the Apache Cassandra™ database. The most basic way to interact with Apache Cassandra is using the CQL shell, cqlsh. Using cqlsh, you can create keyspaces and tables, insert and query tables, plus much more.
What data structure does Cassandra use?
Cassandra uses a storage structure similar to a Log-Structured Merge Tree, unlike a typical relational database that uses a B-Tree. Cassandra avoids reading before writing. Read-before-write, especially in a large distributed system, can result in large latencies in read performance and other problems.
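To make the log-structured merge idea more concrete, here is a minimal Python sketch of the merge step, in which several sorted, immutable runs are combined and the newest version of each key wins. The `compact` function and the `(key, timestamp, value)` row layout are assumptions for illustration only. Real compaction also handles deletions and many other details that this sketch ignores.

```python
import heapq


def compact(*sstables):
    """Merge sorted runs of (key, timestamp, value) rows, keeping the newest row per key."""
    merged = []
    # Order rows by key, then by timestamp, so the newest version of a key comes last.
    ordered = heapq.merge(*sstables, key=lambda row: (row[0], row[1]))
    for key, timestamp, value in ordered:
        if merged and merged[-1][0] == key:
            merged[-1] = (key, timestamp, value)  # a later timestamp replaces the older row
        else:
            merged.append((key, timestamp, value))
    return merged


older = [("a", 1, "apple"), ("b", 1, "banana")]
newer = [("b", 2, "blueberry"), ("c", 2, "cherry")]
print(compact(older, newer))
# [('a', 1, 'apple'), ('b', 2, 'blueberry'), ('c', 2, 'cherry')]
```

Because each run is already sorted, the merge is a sequential pass over the inputs, which is why this kind of structure favors append-style writes over the in-place updates of a B-Tree.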
Why use Cassandra over SQL?
Cassandra has several advantages. Along with scalability, the data storage is flexible. Because it is a NoSQL database, it can deal with structured, unstructured, or semi-structured data. In the same way, the data distribution is flexible. Several different data centers can be used, which makes it easier to distribute the data.
Cassandra’s data model provides a mechanism for data storage. The components of the Cassandra data model are keyspaces, tables, and columns.
Is Cassandra a document database?
No. Cassandra is one of the early NoSQL databases, a distributed database based on a hybrid between a tabular store and a key-value store. MongoDB’s distributed document data model is an alternative to Cassandra that can be adapted to serve many different use cases.
What is the full meaning of Cassandra?
As a common noun, a Cassandra is one who predicts misfortune or disaster.
What does Cassandra stand for?
Cassandra is a feminine name of Greek origin meaning “shining upon man.” It is a Latinized version of the Greek Kassandra, a name shared with a Trojan princess who was given the gift of prophecy by Apollo but cursed so that no one would believe her.
What is the meaning of Cassandra?
The name originates in Greek mythology and means “the one who shines and excels over men.”
Why is Cassandra faster than SQL?
The major reason behind Cassandra’s very fast writes is its storage engine. Cassandra uses log-structured merge trees, whereas a traditional RDBMS uses B+ trees as its underlying data structure.
How does Cassandra work internally?
Internally, each Cassandra node moves data between memory and disk using mechanisms that keep disk access to a minimum. To do that, it uses a set of in-memory caches and indexes that make it faster to find data in the right location.
How many nodes are there in Cassandra?
By default, each Cassandra instance has traditionally been assigned 256 virtual nodes, although the default depends on the version. The Cassandra server runs core processes, such as spreading replicas around nodes and routing requests.
Does Cassandra support JSON?
Cassandra provides support for JSON. You can, of course, store JSON text in Cassandra text columns.
Is Cassandra SQL based?
No. Cassandra is queried with CQL rather than SQL. A Cassandra keyspace plays a role similar to a database in an SQL system.
Who uses the Cassandra database?
516 companies reportedly use Cassandra in their tech stacks, including Uber, Facebook, and Netflix.
How is data stored in Cassandra?
When a write occurs, Cassandra stores the data in a memory structure called a memtable, and to provide configurable durability, it also appends writes to the commit log on disk. The commit log receives every write made to a Cassandra node, and these durable writes survive even if power fails on a node.
What type of architecture is Cassandra?
Cassandra has a ring-type architecture. Cassandra has no master nodes and no single point of failure. Cassandra supports network topology with multiple data centers, multiple racks, and nodes. Cassandra read and write processes ensure fast read and write of data.
Why does Facebook use Cassandra?
Cassandra was designed to fulfill the storage needs of the Inbox Search problem. Inbox Search is a feature that enables users to search through their Facebook Inbox.
Why does Netflix use Cassandra?
Netflix uses Cassandra for its scalability, its lack of single points of failure, and its support for cross-regional deployments. “In effect, a single global Cassandra cluster can simultaneously service applications and asynchronously replicate data across multiple geographic locations.”
Is Cassandra a relational database?
No. Cassandra is a high-performance, highly scalable distributed NoSQL database management system, whereas an RDBMS is a database management system designed for relational databases.
How does Cassandra replicate data?
Cassandra stores data replicas on multiple nodes to ensure reliability and fault tolerance. The replication strategy for each keyspace determines the nodes where replicas are placed. The total number of replicas for a keyspace across a Cassandra cluster is referred to as the keyspace’s replication factor.
What is a cluster in Cassandra?
A Cassandra cluster is a collection of nodes, or Cassandra instances, visualized as a ring. Cassandra clusters can be defined as “rack aware” or “datacenter aware” so that data replicas can be distributed in a way that survives physical outages of the underlying infrastructure.
Is Cassandra structured or unstructured data?
Apache Cassandra™ is a massively scalable open source NoSQL database. Cassandra is perfect for managing large amounts of structured, semi-structured, and unstructured data across multiple datacenters and the cloud.
How is Cassandra different from MySQL?
MySQL, as you know, is an RDBMS (Relational Database Management System). Cassandra, however, is a NoSQL database. This means that MySQL follows more of a master/worker architecture, while Cassandra follows a peer-to-peer architecture.
What language is Apache Cassandra written in?
Cassandra is written in Java.
What makes Cassandra unique?
Apache Cassandra® is a distributed NoSQL database that delivers the always-on availability, fast read-write performance, and linear scalability needed to meet the demands of successful modern applications.
Why does Cassandra use a NoSQL design?
Cassandra is a NoSQL database, which means it does not use the traditional relational table structure found in SQL databases. This can make Cassandra more flexible and easier to use for certain types of data. Cassandra is designed to be highly available, meaning it can continue to function even if some of its nodes fail.