Some Real-time Issues faced during Cassandra Node Extension

Adding a node to a Cassandra cluster isn't as simple as it sounds. Several factors should be evaluated before proceeding. At Telenet, we encountered some interesting issues while adding 15 nodes to CASSANDRA_CLUST1 and 14 nodes to CASSANDRA_CLUST2.

Let's get started and try to understand these issues one by one.

Issue - The first node could not be added to Cassandra_clust1; system.log showed the following error -

java.nio.file.FileSystemException: <PATH>/bb-3116-bti-Data.db:Too many open files.


Root Cause-
Hard and soft limits for Cassandra_clust1 nodes were not set as per the Datastax recommendations.
Solution-
We first raised the open-file limit on the new CASSANDRA_CLUST1 nodes. The change was tested on one node, after which re-running the Ansible job
added the new CASSANDRA_CLUST1 nodes successfully.

The values in the file /etc/security/limits.conf were -

<cassandra-user> hard nofile 100000
<cassandra-user> soft nofile 100000 

They were changed to -

<cassandra-user> hard nofile 1048576 
<cassandra-user> soft nproc 32768
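
An edit to /etc/security/limits.conf only applies to sessions started afterwards, so it can help to confirm what a running process actually received. The following is a minimal sketch, assuming a Linux host with /proc mounted. The helper names show_nofile_limits and count_open_fds are hypothetical, and the current shell's PID stands in for the Cassandra JVM's PID.

```shell
# show_nofile_limits PID: print the "Max open files" row (soft and hard) for PID.
show_nofile_limits() {
    grep 'Max open files' "/proc/$1/limits"
}

# count_open_fds PID: count the file descriptors PID currently holds open.
count_open_fds() {
    ls "/proc/$1/fd" | wc -l
}

# Demonstrated on the current shell ($$); on a real node, substitute
# the PID of the Cassandra JVM (an assumption, not shown in this post).
show_nofile_limits "$$"
echo "open descriptors: $(count_open_fds "$$")"
```

Comparing the descriptor count with the soft limit gives a rough idea of how close a node is to the "Too many open files" error.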

Before the next attempt to add the new node, we moved the old directories aside on cassandra_clust1 -
mv metadata metadata_bkp
mv saved_caches saved_caches_bkp
mv hints hints_bkp
mkdir data metadata saved_caches hints
cd /dbr0001/cas/
mv commitlog commitlog_bkp
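
The move-aside steps above can be wrapped in a small helper that refuses to overwrite an existing backup. This is only a sketch: move_aside is a hypothetical name, and the demonstration runs in a throwaway temporary directory rather than a real Cassandra data path.

```shell
# move_aside DIR: rename DIR to DIR_bkp and recreate an empty DIR in its place.
# Fails without touching anything if DIR_bkp already exists.
move_aside() {
    if [ -e "${1}_bkp" ]; then
        echo "refusing to overwrite ${1}_bkp" >&2
        return 1
    fi
    mv "$1" "${1}_bkp" && mkdir "$1"
}

# Demonstration in a scratch directory (not a real node).
demo="$(mktemp -d)"
mkdir "$demo/hints" && touch "$demo/hints/old.hints"
move_aside "$demo/hints"
ls "$demo"
```

On a real node the recreated directories would also need to be owned by the Cassandra OS user, which this sketch does not handle.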

References from Datastax -

https://docs.datastax.com/en/dse/6.8/dse-admin/datastax_enterprise/config/configRecommendedSettings.html?hl=recommended%2Cproduction%2Csettings

NOTE: The number of open files for any OS user cannot be raised above the kernel parameter fs.nr_open, which is set in the sysctl.conf file.
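
The relationship between the nofile limit and fs.nr_open can be checked before editing limits.conf. A minimal sketch, assuming a Linux host; nofile_fits is a hypothetical helper, and the fallback of 1048576 is the usual kernel default, used here only when /proc cannot be read.

```shell
# nofile_fits REQUESTED CEILING: succeed if REQUESTED does not exceed CEILING.
nofile_fits() {
    [ "$1" -le "$2" ]
}

# Read the kernel ceiling; fall back to the usual default if /proc is unavailable.
nr_open="$(cat /proc/sys/fs/nr_open 2>/dev/null || echo 1048576)"

if nofile_fits 1048576 "$nr_open"; then
    echo "nofile 1048576 fits under fs.nr_open=$nr_open"
else
    echo "raise fs.nr_open (currently $nr_open) before setting nofile 1048576"
fi

# Limits of the current session, for comparison after logging in again.
echo "soft nofile: $(ulimit -Sn), hard nofile: $(ulimit -Hn)"
```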

Issue - The first node could not be added to Cassandra_clust2; system.log showed the following error -

java.nio.file.FileSystemException: <PATH>/bb-3116-bti-Data.db:Too many open files. 

On the server side, the new cassandra_clust2 node became unresponsive under high load and excessive memory consumption. A ticket was raised with the DataStax team.


Root Cause-
a. Hard and soft limits for Cassandra_clust2 nodes were not set as per the Datastax recommendations.

b. Memory arguments were not optimally set on cassandra_clust2 nodes.

c. The zerocopy_streaming_enabled parameter was not set; DataStax recommended setting it to false.


Solution-

The values in the file /etc/security/limits.conf were -

<cassandra-user> hard nofile 100000
<cassandra-user> soft nofile 100000 

They were changed to -

<cassandra-user> hard nofile 1048576 
<cassandra-user> soft nproc 32768


At node level, the file resources/cassandra/conf/jvm8-server.options holds the -Xmx and -Xms heap settings, which are read at node startup. We increased the maximum heap from -Xmx19G to -Xmx31G on all nodes (old and new). A rolling restart of the older nodes was carried out so that they picked up the new memory arguments.
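
Before and after a change like this, it is useful to list the heap flags a node will actually read. The sketch below assumes the options file holds one flag per line; heap_flags is a hypothetical helper, and the sample file content is illustrative, not copied from a real node.

```shell
# heap_flags FILE: print the active -Xms / -Xmx lines from a JVM options file.
# Commented-out lines (starting with #) are not matched.
heap_flags() {
    grep -E '^-Xm[sx]' "$1"
}

# Illustrative sample file standing in for jvm8-server.options.
sample="$(mktemp)"
printf '%s\n' '# heap settings' '#-Xmx19G' '-Xmx31G' > "$sample"
heap_flags "$sample"
```

On a node, the same function would be pointed at resources/cassandra/conf/jvm8-server.options.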

zerocopy_streaming_enabled is set in the cassandra.yaml file under resources/cassandra/conf/ at node level; we set it to false as recommended by DataStax support.
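
A quick way to confirm the setting on each node is to read it back from cassandra.yaml. This is a rough sketch that only handles a simple top-level "key: value" line with no trailing comment; yaml_value is a hypothetical helper and the sample file is illustrative.

```shell
# yaml_value FILE KEY: print the value of an uncommented top-level "KEY: value" line.
yaml_value() {
    sed -n "s/^$2:[[:space:]]*//p" "$1"
}

# Illustrative sample standing in for resources/cassandra/conf/cassandra.yaml.
sample="$(mktemp)"
printf '%s\n' '# zerocopy_streaming_enabled: true' 'zerocopy_streaming_enabled: false' > "$sample"
yaml_value "$sample" zerocopy_streaming_enabled
```

If the command prints nothing, the key is absent or commented out in that file.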


Conclusion- 

The two issues above were striking enough to document, although we faced many others while adding these nodes. The nodes were added using Ansible to reduce the number of manual steps. The complete extension activity took almost 3 weeks, as we added 29 nodes in total.

Some concluding observations to add -

  1. To verify that Cassandra has started successfully on a node, we can run - grep "DSE startup complete." logs/cassandra/system.log
  2. nodetool cleanup must be executed on every node ONE-AT-A-TIME after all the nodes have been added.
  3. DataStax recommends a maximum of 200 tables per cluster, regardless of the number of keyspaces, so with Cassandra it is important to keep a check on the total number of tables. We tend to create backup tables on the assumption that they cost nothing unless accessed by a query, but with Cassandra that is not the case.
  4. A key URL that I recommend for Cassandra enthusiasts -
    https://community.datastax.com/questions/12579/limit-on-number-of-cassandra-tables.html#:~:text=We%20recommend%20a%20maximum%20of,of%20the%20number%20of%20keyspaces).
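
Observations 1 and 2 can be sketched as two small shell helpers. The function names and the host names node1 and node2 are hypothetical, and running nodetool over ssh is an assumption about how the nodes are reached. By default the cleanup loop only prints the commands it would run.

```shell
# startup_complete LOG: succeed once the DSE startup marker appears in LOG.
startup_complete() {
    grep -qF "DSE startup complete." "$1"
}

# cleanup_one_at_a_time HOST...: run nodetool cleanup on each host in turn.
# The loop waits for one host to finish before starting the next and stops
# on the first failure. RUN defaults to "echo", so the commands are only
# printed; set RUN="" to execute them for real.
cleanup_one_at_a_time() {
    for host in "$@"; do
        ${RUN-echo} ssh "$host" nodetool cleanup || return 1
    done
}

# Dry run with placeholder host names.
cleanup_one_at_a_time node1 node2
```

Keeping the loop strictly sequential is the point of the sketch: cleanup is never running on two nodes at once.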
