Some Real-time Issues faced during Cassandra Node Extension

Adding a node to a Cassandra cluster isn't as simple as it sounds; there are many factors to evaluate before proceeding. At Telenet, we encountered some interesting issues while adding 15 nodes to CASSANDRA_CLUST1 and 14 nodes to CASSANDRA_CLUST2.

Let's get started and try to understand these issues one by one.

Issue - The first CASSANDRA_CLUST1 node could not be added; system.log showed the following error:

java.nio.file.FileSystemException: <PATH>/bb-3116-bti-Data.db:Too many open files.


Root Cause -
Hard and soft limits for the CASSANDRA_CLUST1 nodes were not set as per the DataStax recommendations.

Solution -
We first raised the open-files limits on the new CASSANDRA_CLUST1 nodes. The change was tested on one node, and the Ansible job that adds new nodes was then re-run successfully for the remaining CASSANDRA_CLUST1 nodes.

Values from the file /etc/security/limits.conf were -

<cassandra-user> hard nofile 100000
<cassandra-user> soft nofile 100000 

were modified as below -

<cassandra-user> hard nofile 1048576 
<cassandra-user> soft nproc 32768

Before the next attempt to add the new node, we moved the older data directories aside on cassandra_clust1 -
mv metadata metadata_bkp

mv saved_caches saved_caches_bkp
mv hints hints_bkp
mkdir data metadata saved_caches hints
cd /dbr0001/cas/
mv commitlog commitlog_bkp

References from Datastax -

https://docs.datastax.com/en/dse/6.8/dse-admin/datastax_enterprise/config/configRecommendedSettings.html?hl=recommended%2Cproduction%2Csettings

NOTE: The open-files limit that can be granted to any OS user is capped by the kernel parameter fs.nr_open, configured via /etc/sysctl.conf.
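A quick way to check and, if needed, raise this ceiling (the values shown are typical defaults and purely illustrative; adjust for your kernel and workload):

# sysctl fs.nr_open
fs.nr_open = 1048576
# echo "fs.nr_open = 2097152" >> /etc/sysctl.conf
# sysctl -p

Then verify the effective limit as the Cassandra OS user:

$ ulimit -n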

Issue - The first CASSANDRA_CLUST2 node could not be added; system.log showed the following error -

java.nio.file.FileSystemException: <PATH>/bb-3116-bti-Data.db:Too many open files. 

On the server side, we could see the new cassandra_clust2 node had stopped responding, with high load and excessive memory consumption. A ticket was raised with the DataStax team.


Root Cause -
a. Hard and soft limits for the CASSANDRA_CLUST2 nodes were not set as per the DataStax recommendations.

b. Memory arguments were not optimally set on the cassandra_clust2 nodes.

c. The zerocopy_streaming_enabled parameter had been left unset; DataStax recommended setting it explicitly to false.


Solution-

The /etc/security/limits.conf values on the CASSANDRA_CLUST2 nodes were raised exactly as for CASSANDRA_CLUST1 above (nofile to 1048576 and nproc to 32768).


At node level, the resources/cassandra/conf/jvm8-server.options file stores the Xmx/Xms settings, which are read during node startup. We increased the heap from -Xmx19G to -Xmx31G on all the nodes (old and new), and a rolling restart of the older nodes was initiated so they picked up the new memory arguments.
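For reference, a minimal sketch of the relevant heap lines in resources/cassandra/conf/jvm8-server.options after the change (setting -Xms equal to -Xmx is standard Cassandra guidance; confirm the exact values used in your deployment):

-Xms31G
-Xmx31G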

zerocopy_streaming_enabled can be set in the cassandra.yaml file under resources/cassandra/conf/ at node level, and this was set to false as per the recommendation from DataStax support.
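The change itself is a one-line entry in cassandra.yaml (a sketch; confirm the parameter name against your DSE version's documentation):

zerocopy_streaming_enabled: false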


Conclusion -

The two issues above were striking enough to document, although we faced several other issues too while adding these nodes. The nodes were added using Ansible to reduce the number of manual steps. The complete extension activity took almost 3 weeks, as we added 29 nodes in total.

Some concluding observations to add -

  1. To verify that Cassandra started successfully on a node, run - grep "DSE startup complete." logs/cassandra/system.log
  2. nodetool cleanup must be executed on every node ONE-AT-A-TIME after all the nodes have been added (see the sketch after this list).
  3. DataStax recommends a maximum of 200 tables per cluster, irrespective of the number of keyspaces, so with Cassandra it is important to keep a check on the total number of tables. We tend to create backup tables assuming they cost nothing unless accessed by a query, but with Cassandra that is not the case - every table consumes cluster resources whether or not it is queried.
  4. Tagging some key URLs that are my recommendations for cassandra enthusiasts - 
    https://community.datastax.com/questions/12579/limit-on-number-of-cassandra-tables.html#:~:text=We%20recommend%20a%20maximum%20of,of%20the%20number%20of%20keyspaces).
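A minimal sketch combining observations 1 and 2, assuming passwordless ssh and hypothetical host names:

$ grep "DSE startup complete." logs/cassandra/system.log   # verify startup on the new node
$ for h in cass-node-01 cass-node-02 cass-node-03; do
>   ssh "$h" 'nodetool cleanup'   # deliberately sequential: one node at a time
> done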

Key Takeaways from the recent hrglobal.drv patch installation for EBS r12.2

1. The MOS note does not provide an explicit step to source the patch file system before running adop. Please source it before running adop apply so that the right hrglobal.drv is picked up from PATCH_BASE and not RUN_BASE.
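For example, on a standard R12.2 layout (the base path here is illustrative):

$ source /u01/install/APPS/EBSapps.env patch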

2. Please run the patch with parallel workers by appending workers=8; again, this is not mentioned in the MOS note and will speed up the patch apply.
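An illustrative shape of the command (verify the exact adop syntax for your release against the MOS notes referenced below; size the worker count to your available CPUs):

$ adop phase=apply patchtype=hrglobal workers=8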

3. There are some prerequisite patches for hrglobal.drv; make sure they are applied. They are listed towards the bottom of the MOS document but must be applied before running the hrglobal.drv patch.

4. I tried running DataInstall with the hostname of the db server, but it failed to execute, while the same command went through when the IP was used -

java oracle.apps.per.DataInstall apps appspwd thin 10.3.x.xxx:<PORT DB>:<DB_SID>

5. Tagging a few important MOS notes that you must review before applying hrglobal.drv; install the legislations and run hrglobal as per your business need (including the India legislation):

Datainstall and Hrglobal Application: 12.2 Specifics (Doc ID 1469456.1)
R12.2: How to Install HRMS Legislative Data Using Data Installer and hrglobal.drv (Doc ID 2006776.1)
HRGLOBAL: DataInstall Fails with Java.Lang.NullPointerException (Doc ID 730910.1)


EBS Cloud Manager -- A DBA Sailing around Linux Administration, OCI Cloud Shell, OS Firewalls...

 

After deploying EBS using Cloud Manager, we were not able to log in to the apps and db nodes on OCI as the root/opc users.
The EBS Cloud Manager guide mentions only one way to log in to the apps and db nodes after deployment on OCI, i.e. -
- Login to Cloud Manager as opc
- sudo su - oracle
- ssh apps node ip
- ssh db node ip

We may need to connect as the root OS user for some superuser tasks. In my case, we had to check a db node port, as developers were not able to connect using SQL Developer after connecting to the VPN (interesting things coming up for this issue later in this blog).

So we were in this scenario -
1. A port is blocked somewhere.
2. The db node IP is pingable.
3. We can only log in to the db node as the OS user oracle.
4. We can't check firewall rules without root access.
It all started with setting a root password for this db node, for which we followed the note below -

Ref -How to Reset Root Password in Oracle Linux 7 (Doc ID 1954652.1)

1. Launch Cloud Shell on OCI for the specific instance.



Now keep the Cloud Shell connection open and reboot the db node (of course, after shutting down the database and listener first) -


2. Reboot the server.

3. Press the Up arrow key while GRUB is loading to stop the boot countdown.

4. On the GRUB 2 menu, press E to edit the highlighted entry.

5. Select the line starting with linux16**** (or linuxefi**** for a UEFI BIOS) and append "rd.break" at the end of the line.
Example:
linux16 **** rd.break

6. Press Ctrl+X to boot.

7. First remount the sysroot file system in read-write mode, then use chroot to get into a chroot jail:
# mount -o remount,rw /sysroot
# chroot /sysroot

8. Run the passwd command and enter the same new password twice to reset the root password:
# passwd

9. Make sure all unlabeled files (including shadow) get relabeled during boot:
# touch /.autorelabel

10. Flush pending writes to disk:
# sync

11. Type exit twice to leave the chroot jail and log out.

12. The system will apply the SELinux contexts and reboot.
All the commands in one place, consolidated from the steps above, for your reference -
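# mount -o remount,rw /sysroot
# chroot /sysroot
# passwd
# touch /.autorelabel
# sync
# exit
# exit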



Now the interesting part. I was initially unsure whether the firewall was blocking the port at all. The reason was simple: the apps node was connecting to the database node successfully.

So I checked a couple of things at the OCI level -
1. Security lists.
2. Route tables (we had 2 VCNs: one where the VPN was connected and a second for the EBS on Cloud deployment).
We are getting into OCI networking now :). The only thing left to check was the firewall settings on the db node, and then came the gotcha moment -

# firewall-cmd --get-default-zone
public
# firewall-cmd --get-active-zones
# firewall-cmd --list-all
public
  target: default
  icmp-block-inversion: no
  interfaces:
  sources:
  services: dhcpv6-client ssh
  ports:
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
        rule family="ipv4" source address="10.3.x.xxx" port port="1521" protocol="tcp" accept


A rich rule is defined by Terraform when deploying EBS on Cloud through Cloud Manager, and it opens the db listener port (say 1521) only for the apps node -
# firewall-cmd --permanent --zone=public --list-rich-rules
rule family="ipv4" source address="10.3.x.xxx" port port="1521" protocol="tcp" accept
So we added another rich rule to open the port for the CIDR range of the VCN that needed to connect to the db.
# firewall-cmd --permanent --zone=public --add-rich-rule='rule family=ipv4 source address=11.x.x.x/20 port port=1521 protocol=tcp accept'
success
# firewall-cmd --reload
success
# firewall-cmd --permanent --zone=public --list-rich-rules
rule family="ipv4" source address="10.3.x.xxx" port port="1521" protocol="tcp" accept
rule family="ipv4" source address="11.x.x.x/20" port port="1521" protocol="tcp" accept


So: we started with an OCI Cloud Shell console connection, reset the root password on an OCI db node, and added a rich rule to open the listener port for a specific CIDR.

EBS Cloud Manager Troubleshooting - Creating Backups

EBS Cloud Manager is well automated for setups on OCI, but there are scenarios where DBA intervention is still required. Let's discuss a classic example of such a scenario today (17 Feb 2022).

Task - Create a backup of an EBS Cloud Manager environment, EBS r12.2.9, db version 12.1.0.2.
When you log in to EBS Cloud Manager, simply check the top-right section -


Once you click on Create Backup, it will ask you for an encryption password and the apps credentials. The backup is then submitted as a JOB, and you can follow the running backup under the Jobs tab.

Please note that EBS Cloud Manager creates an OSS-level backup in Object Storage. In my case, the job failed at - Validate -> EBS cloud backup Application tier validations

Error Details - 

ERROR : WLS domain size is higher than EBS default threshold: 5120 MB ). Please check and cleanup some of the server log files or any unnecessary file under /u01/install/APPS/fs1/FMW_Home/user_projects/domains/EBS_domain.

Failed with code: 1

[2022/02/17 03:45:46] [APPSTIEREBSVALIDATION] ERROR: Source application tier post-validation failed.

[2022/02/17 03:45:46] [APPSTIEREBSVALIDATION] Updating taskid appsTierEBSValidation status to Failed


The error was self-explanatory, but this step needs DBA intervention: clearing older logs under your domain. In my case, I freed up space in the folders below -

Check for EBS_domain size - 

$ du -sh EBS_domain/
5.7G    EBS_domain/

Remove files under - 
/u01/install/APPS/fs1/FMW_Home/user_projects/domains/EBS_domain/servers/oacore_server1/logs
oacore_server1.log00*
/u01/install/APPS/fs1/FMW_Home/user_projects/domains/EBS_domain/servers/AdminServer/logs
EBS_domain.log00*
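If you are not sure which files dominate the domain size, a quick way to spot the largest ones (illustrative):

$ du -ah /u01/install/APPS/fs1/FMW_Home/user_projects/domains/EBS_domain | sort -rh | head -20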

Check the current size again -
$ du -sh EBS_domain/
2.1G    EBS_domain/


You may then use the 'Retry' option for the backup to resume.


Steps to compile fmb in EBS - r12.1.x, r12.2.x

 

Take a backup of the fmx -

$ pwd

/u01/install/APPS/fs1/EBSapps/appl/inv/12.0.0/forms/US

$ cp INVSDOIO.fmx INVSDOIO.fmx17feb2022


Go to AU_TOP -

cd $AU_TOP/forms/US

Template -

frmcmp_batch userid=apps/<apps_pwd> module=<form_name>.fmb output_file=<form_name>.fmx module_type=form batch=no compile_all=special


Example -

frmcmp_batch userid=apps/apps module=INVSDOIO.fmb output_file=INVSDOIO.fmx module_type=form batch=no compile_all=special
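Note that with output_file=INVSDOIO.fmx the compiled form lands in the current directory; if preferred, you can point output_file straight at the product top (an illustrative variant):

frmcmp_batch userid=apps/<apps_pwd> module=INVSDOIO.fmb output_file=$INV_TOP/forms/US/INVSDOIO.fmx module_type=form batch=no compile_all=special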

Check the on-screen output for any errors.

Ref -R12: How to Compile a Form in Release 12 and 12.2 (Doc ID 1085928.1)


Special Scenario - Loss of Datafile (non-critical) without any backup and in Noarchivelog mode

We faced the interesting issue below in one of our test environments; I'll call it SPICE19 here. One of the datafiles was deleted with the intent to recreate it. Unfortunately, the database was up and running at the moment the datafile was deleted at the OS level.

Observations -

1. Database SPICE19 was in noarchivelog mode.

2. No backups were available for restoration.

3. Datafile '/u02/oracle/oradata/SPICE19/users06' had been created accidentally with a wrong name (it should have been /u02/oracle/oradata/SPICE19/users06.dbf).

4. It was then removed using (rm) at OS level.

5. A new datafile was added with the name - /u02/oracle/oradata/SPICE19/users06.dbf

6. The database crashed looking for the datafile /u02/oracle/oradata/SPICE19/users06.

7. The database kept complaining about the missing datafile /u02/oracle/oradata/SPICE19/users06, as its entry was still in the controlfile.

 

Troubleshooting Steps -

1. The missing datafile id was found as below -

SQL> select * from v$recover_file;

     FILE# ONLINE  ONLINE_ ERROR             CHANGE# TIME      CON_ID
---------- ------- ------- -------------- ---------- --------- ------
        18 ONLINE  ONLINE  FILE NOT FOUND          0                0

 

2. The second step was to recreate the missing datafile with a different name -

SQL> alter database create datafile '/u02/oracle/oradata/SPICE19/users06' as '/u02/oracle/oradata/SPICE19/users06_new.dbf' reuse;

Database altered.

 

3. As the database was in noarchivelog mode, recovery is not possible UNLESS the complete information is present in the redo logs and they have not been overwritten.

 

4. We had 3 redo logs here:

/u02/oracle/oradata/SPICE19/redo03.log

/u02/oracle/oradata/SPICE19/redo02.log

/u02/oracle/oradata/SPICE19/redo01.log

 

5. We applied them in the following sequence using incomplete recovery -

SQL> recover database using backup controlfile until cancel;

/u02/oracle/oradata/SPICE19/redo01.log

/u02/oracle/oradata/SPICE19/redo02.log

Log applied.

Media recovery complete.

 

6. Opened the database with the resetlogs option.
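The command used (standard syntax):

SQL> alter database open resetlogs;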

 

Correct Approach -

We should have renamed the datafile using the following steps instead.

1. Take the tablespace offline (no users can access its data during this period) -

alter tablespace USERS offline immediate;

 

2. Rename the datafile at the OS level using the mv command (Linux) -

mv /u02/oracle/oradata/SPICE19/users06 /u02/oracle/oradata/SPICE19/users06.dbf

 

3. Rename the datafile at the database level to update the controlfile -

ALTER TABLESPACE users rename datafile '/u02/oracle/oradata/SPICE19/users06'

to '/u02/oracle/oradata/SPICE19/users06.dbf';
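4. Finally (a step implied but not shown above), bring the tablespace back online so users can access the data again -

alter tablespace USERS online;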

 

Conclusion-

We were lucky that the required redo information was still in the online redo logs and had not been overwritten. This was a rare case where recovery of a datafile was possible with the database in noarchivelog mode and no backups available.


Exploring different use-cases for OCI Object Storage Gateway deployments


This post will cover different approaches to deploying Object Storage Gateway (OSG). You can think of the gateway as a bridge that connects your on-premises environment with Object Storage, enabling file-to-object transparency.

Object Storage buckets are mounted as NFS mount points in your on-premises environment. Substantial information is available on Storage Gateway, and links are shared in this post.
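Once the gateway exports a filesystem, mounting it from a client looks like an ordinary NFS mount. A sketch (the hostname and export name are hypothetical; check the Storage Gateway documentation for the exact mount options it requires):

$ sudo mount -t nfs storage-gateway-host:/MyOSGFileSystem /mnt/osg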


Let's jump into understanding the different approaches to deploying Object Storage Gateway.

My observations from implementing the POCs below -

1. The Object Storage Gateway can be deployed either on-prem or on OCI, and can be downloaded for free (see the references below).

2. SSD drives with an XFS filesystem are recommended for storing the Storage Gateway metadata, cache, and logs.

3. OSG does not support Windows operating environments.

4. If installing OSG on-prem, make sure you have proper access control on the storage gateway server and secure it with MFA.

5. If installing OSG on-cloud, you can add OpenVPN/IPSec as another layer of security and place the OSG server in your private subnet.

6. Creating a filesystem in the OSG management console automatically creates a bucket, placed according to the inputs you provide for compartment, username, and API signing key (private key and its fingerprint).

7. Finally, an interesting discussion between me and Anil on securing Oracle Object Storage, on Oracle Cloud Customer Connect -

https://cloudcustomerconnect.oracle.com/posts/cd615cf2eb


References -

https://www.oracle.com/cloud/storage/storage-gateway-faq.html

https://docs.oracle.com/en-us/iaas/Content/Object/Concepts/objectstorageoverview.htm#:~:text=The%20Object%20Storage%20service%20can,from%20within%20the%20cloud%20platform.

http://dineshbandelkar.com/how-to-setup-oci-storage-gateway/

https://docs.oracle.com/en-us/iaas/api/#/en/objectstorage/20160918/

https://docs.oracle.com/en-us/iaas/Content/StorageGateway/Reference/bestpracticesusingstoragegateway.htm





Approach 1 - OSG deployed on OCI, accessed over OpenVPN

Sr. No. | Summary | Description
1 | Stored OS username and password | 1st authentication factor on-prem
2 | 2-FA with Google Authenticator | Verification code sent to the sysadmin's mobile device for authentication on-prem
3 | On-prem NFS share | On-prem NAS device, protected by exportfs rules, storing backups
4 | Securing data in transit | openssl encryption for application files; RMAN-based encryption for db backups
5 | OCI datacenter | Oracle Cloud Jeddah Region as the secondary backup location
6 | VCN - Virtual Cloud Network | VCN consisting of a public and a private subnet
7 | Security list for public subnet | Open port for OpenVPN
8 | OpenVPN server | Public-facing VPN server for accessing OCI resources
9 | Security list for private subnet | Open ports for the OSG management console and NFS
10 | Object Storage Gateway server | OSG compute instance in the private subnet; creates the filesystem mapped to an auto-created bucket
11 | Object Storage | Bucket automatically created when the filesystem is created on the OSG server
Approach 2 - OSG deployed on OCI, accessed over IPSec VPN

Sr. No. | Summary | Description
1 | Stored OS username and password | 1st authentication factor on-prem
2 | 2-FA with Google Authenticator | Verification code sent to the sysadmin's mobile device for authentication on-prem
3 | On-prem NFS share | On-prem NAS device, protected by exportfs rules, storing backups
4 | Fortigate firewall (CPE) public IP | Customer-premises equipment forming one end of the IPSec VPN connectivity
5 | IPSec VPN connection | Pre-shared key authentication with the DRG on OCI
6 | Static routing method/BGP | Manual/automatic routing for the IPSec VPN connectivity
7 | OCI datacenter | Oracle Cloud Jeddah Region as the secondary backup location
8 | DRG | Dynamic Routing Gateway configured on OCI
9 | Object Storage Gateway server | OSG compute instance in the private subnet; creates the filesystem mapped to an auto-created bucket
10 | Object Storage | Bucket automatically created when the filesystem is created on the OSG server
Approach 3 - OSG deployed on-premises

Sr. No. | Summary | Description
1 | Stored OS username and password | 1st authentication factor on-prem
2 | 2-FA with Google Authenticator | Verification code sent to the sysadmin's mobile device for authentication on-prem
3 | On-prem NFS share | On-prem NAS device, protected by exportfs rules, storing backups
4 | OpenVPN client saved profile | Profile saved on the staging server for OpenVPN
5 | Object Storage Gateway setup | OSG installed on-prem on a staging server
6 | Securing data in transit | a. openssl encryption for application files; b. RMAN-based encryption for db backups
7 | OCI datacenter | Oracle Cloud Jeddah Region as the secondary backup location
8 | VCN - Virtual Cloud Network | VCN consisting of a public and a private subnet
9 | Security list for public subnet | Open port for OpenVPN
10 | OpenVPN server | Public-facing VPN server for accessing OCI resources
11 | Object Storage | Bucket automatically created when the filesystem is created on the OSG server

Back to Basics - Some OCI User-group deletion issues


Recently, I created a group and added a user to that group for some testing.

Username - Useradmin001

Groupname - sysadmin01



Once testing was done, I decided to delete both the user and the group. When trying to delete the Useradmin001 user while it still belonged to a group, I received the error notification below on OCI -

Cannot delete Useradmin001. A user in a group can’t be deleted.


So I tried deleting the group instead, as both the group and the user were only used for testing and were no longer needed -

Cannot delete sysadmin01. A group with members can’t be deleted.


Finally I figured out that we need to perform the steps below -

1. Remove the member from the group first (i.e. remove the group membership).



2. Delete the group (this time it went through :)).


  

3. Delete user.
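The same cleanup can also be scripted with the OCI CLI (a sketch; the OCIDs are placeholders):

$ oci iam group remove-user --group-id <group_ocid> --user-id <user_ocid>
$ oci iam group delete --group-id <group_ocid>
$ oci iam user delete --user-id <user_ocid>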




Understanding -msimode for managed servers and how it can be used with Oracle EBS r12.2

Recently, I was asked about the implications of admin server availability when connecting to the Oracle EBS login page.
Pondering further, a few scenarios that arose were -
1. What if a running admin server goes down for some reason - will this impact managed servers that are already running?
2. Can we start a managed server (say oacore_server1) if the admin server is down? How?

Let's start by understanding the relationship between the admin server and the managed servers.

Admin server

is the central entity for configuring the entire domain. You manage and configure all resources in the domain through the admin server. config.xml is loaded by the admin server during its startup; this file is stored under -
$FMW_Home/user_projects/domains/EBS_domain/config

Managed server 

is responsible for hosting the components and associated resources that comprise your application (e.g. JSPs, EJBs, datasources, JMS modules, etc.).

During startup, a managed server reaches out to the admin server to get its configuration and deployment settings. This pretty much answers the second question -

Can we start a managed server (say oacore_server1) if the admin server is down? How?

There is a catch here: managed servers have an MSI mode to skip this dependency. MSI stands for Managed Server Independence.

During Startup -

Managed Server ---> configuration files ---> Admin server 

if Admin server is down ---> Managed server fails to start

If the Managed Server has MSI enabled (enabled by default)

                                --> use -msimode option

                                    --> managed server starts by directly accessing configuration/security files


By default, MSI is enabled, and the navigation below can be used to check it -

Environment > Servers > select the managed server (say oacore_server1) > Tuning > (scroll down) Advanced > checkbox for MSI





Using MSI mode with oacore_server1 managed server -

admanagedsrvctl.sh start oacore_server1 -msimode                        


Coming to the 1st question,

What if a running admin server goes down for some reason - will this impact managed servers that are already running?

1. If the servers (admin and managed) are on the same host machine, typically the case in a single-node EBS environment, then obviously a host failure affects all of them alike.

2. If only the admin server is down while the managed servers are up and running, failure of the Administration Server by itself does not interrupt the operation of the Managed Servers in the domain.

Ref- https://docs.oracle.com/cd/E13222_01/wls/docs90/domain_config/understand_domains.html


Note: when the admin server is down, each managed server periodically attempts to reconnect to the Administration Server to synchronize its configuration state with it.


Concluding Thoughts - 

It is important to understand the dependency between the admin server and the managed servers when starting application services, as there may be scenarios where managed servers sit waiting for the EBS_domain admin server to start, eventually hurting the overall availability of the system. -msimode may be used to start a managed server independently while the admin server is unavailable, and the admin server's root cause can be fixed later.





Terraform Diary - Desired State and Current State scenarios with OCI

 


The desired state of a resource can be simply defined as the state described in the resource block of your .tf file.

The current state of a resource is its actual configuration. This may differ from the desired state in the .tf file.

Terraform will only reconcile current state (changes done manually in the console) to desired state (what is in your .tf file) for attributes that are explicitly mentioned in your .tf file.

Case 1 –

Consider the example below; note that the object storage bucket 'test_bucket' is already present on OCI –

resource "oci_objectstorage_bucket" "test_bucket" {

    #Required

    compartment_id = "${var.compartment_ocid}"

    name = "test_bucket"

    namespace = "${var.namespace}"

    versioning = "Enabled"

        }

  

We added the versioning attribute in our .tf file, and the current state for this bucket is as follows –



Now we know the desired state and the current state (console) are different. Let us run terraform plan and observe –


      ~ versioning            = "Suspended" -> "Enabled"

    }

 

Plan: 0 to add, 1 to change, 0 to destroy.

 

 

Case 2 –

Simply remove versioning from the .tf file as follows (here it is commented out) –

resource "oci_objectstorage_bucket" "test_bucket" {

    #Required

    compartment_id = "${var.compartment_ocid}"

    name = "test_bucket"

    namespace = "${var.namespace}"

    #versioning = "Enabled"

        }


This proves that Terraform will only check, and try to reconcile, current state -> desired state for attributes that are explicitly mentioned in your .tf file.


Now, what happens if an attribute that is not explicitly mentioned in your .tf file is modified manually in the OCI console?

In this scenario, terraform will not perform any action, reporting the following -
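No changes. Infrastructure is up-to-date.

(The exact wording varies by Terraform version; the point is that plan detects no drift for attributes that are not managed in the .tf file.)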


Hence, it is important to mention attributes in your .tf file to make sure Terraform identifies discrepancies between the desired and current state.



"Connection timed out" when mounting File Storage Service on OCI instances

 

Issue - 

I recently faced the 'Connection Timed Out' error below when trying to mount a freshly created OCI file system. I would like to share my experience and document this for future reference.

I am using public and private subnets with security lists defined for each. The file system was created in the private subnet, and I faced the issue below when trying to mount it from an instance in the same private subnet -


sudo mount -v 10.3.2.10:/fssfortestcomp /mnt/fssfortestcomp

mount.nfs: timeout set for Wed Sep  9 06:27:20 2020

mount.nfs: trying text-based options 'vers=4.1,addr=10.3.2.10,clientaddr=10.3.2.2'

mount.nfs: mount(2): Connection timed out


Below I document the list of ports that need to be open for mounting a File Storage Service file system on OCI instances in a private/public subnet.


Solution - 

Update Security List Rules -

Note that the destination ports must be opened in the security list of the subnet where you created the file system. These ports are opened to make sure our source OCI instances can reach the NFS services (nfsd, rpcbind, etc.) running on the file storage service.


Ingress Rules -

Rule Type | Protocol | Source Port Range | Destination Port Range | Stateful/Stateless
Ingress   | TCP      | All               | 111                    | Stateful
Ingress   | TCP      | All               | 2048                   | Stateful
Ingress   | TCP      | All               | 2049                   | Stateful
Ingress   | TCP      | All               | 2050                   | Stateful
Ingress   | UDP      | All               | 111                    | Stateful
Ingress   | UDP      | All               | 2048                   | Stateful


Egress rules -

Rule Type | Protocol | Source Port Range | Destination Port Range | Stateful/Stateless
Egress    | TCP      | All               | 111                    | Stateful
Egress    | TCP      | All               | 2048                   | Stateful
Egress    | TCP      | All               | 2049                   | Stateful
Egress    | TCP      | All               | 2050                   | Stateful
Egress    | UDP      | All               | 111                    | Stateful
Egress    | UDP      | All               | 2048                   | Stateful
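Before retrying the mount, you can sanity-check reachability of the key ports from the client instance (a quick sketch using nc; the IP is the mount target from above):

$ nc -zv 10.3.2.10 111
$ nc -zv 10.3.2.10 2048
$ nc -zv 10.3.2.10 2049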


The next time we tried to mount -

$ sudo mount -v 10.3.2.10:/fssfortestcomp /mnt/fssfortestcomp
mount.nfs: timeout set for Wed Sep  9 06:42:49 2020
mount.nfs: trying text-based options 'vers=4.1,addr=10.3.2.10,clientaddr=10.3.2.2'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'vers=4.0,addr=10.3.2.10,clientaddr=10.3.2.2'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'addr=10.3.2.10'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 10.3.2.10 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 10.3.2.10 prog 100005 vers 3 prot UDP port 2048
mount.nfs: portmap query retrying: RPC: Timed out
mount.nfs: prog 100005, trying vers=3, prot=6
mount.nfs: trying 10.3.2.10 prog 100005 vers 3 prot TCP port 2048


Check newly added mount -
showmount -e 10.3.2.10
Export list for 10.3.2.10:
/fssfortestcomp (everyone)

A question may arise here: why are we explicitly creating egress rules when we already have stateful ingress rules in place?

You should look at it as if there were a firewall attached to every (virtual) network card.
Traffic flows like so:

Request
Instance1(request)===>VNIC_instance1===>network===>VNIC_nfsfilesystem===>nfsfilesystem

Response
nfsfilesystem(answer)===>VNIC_nfsfilesystem===>network===>VNIC_instance1===>Instance1


First, the request needs to exit from OCI instance 1 to the network, so the egress rules of instance 1 are evaluated first. At that point in time no ingress rule has been evaluated yet (there has been no ingress traffic anywhere), so the connection state of the stateful ingress rule does not exist yet.

Happy OCI Learning :)

References -

https://docs.cloud.oracle.com/en-us/iaas/Content/File/Tasks/securitylistsfilestorage.htm

https://cloudcustomerconnect.oracle.com/posts/7eb3f888e6


6 Protection Rules for securing your Bastion Host on OCI

 

Today's post fundamentally covers 6 aspects of securing bastion hosts in any cloud environment. Please note that the use-case performed in the later part of this post is focused on Oracle Cloud.


1. Ingress Rule

OS level - configure the firewall to accept connections only from the on-prem CPE public IP. This will straight away reject connections to your public subnet from outside networks and help you channel incoming connections.
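A sketch of such a restriction with firewalld (the CPE IP 203.0.113.10 is a placeholder):

# firewall-cmd --permanent --zone=public --add-rich-rule='rule family=ipv4 source address=203.0.113.10/32 port port=22 protocol=tcp accept'
# firewall-cmd --permanent --zone=public --remove-service=ssh      # stop accepting ssh from everywhere else
# firewall-cmd --reload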



2. Protocol and Ports


1. TCP/22 -- ssh connectivity
2. ICMP type 8 -- ping



3. Disable irrelevant user IDs at the OS level


You can get the list of users from the /etc/passwd file, and a user's shell can be set to /sbin/nologin as follows -

demouser:x:1000:1000:demouser:/home/demouser:/sbin/nologin
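To switch an existing user's shell without editing /etc/passwd by hand (illustrative):

# usermod -s /sbin/nologin demouser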



4. Enabling 2-factor authentication for bastion server

This has been explained in my earlier post -

Implementing 2-factor authentication for Bastion server on OCI with Google Authenticator



5. Packages installed -

Remove irrelevant packages. Keep the bastion host as 'lite' as possible by avoiding unnecessary packages: every extra package means extra services running, which gives attackers more surface to try to hack into the system.



6. Disclaimer Banner for ssh logins. 


Sample -
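A minimal sketch of how such a banner is wired up, assuming the default sshd layout (the banner text is a placeholder):

# /etc/ssh/sshd_config
Banner /etc/issue.net

Put the disclaimer text in /etc/issue.net, for example:

WARNING: Authorized access only. All activity on this system is monitored and logged.

Then reload sshd:

# systemctl reload sshd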




Use-case -


The use case below covers 3 layers of security for logging on to a bastion host set up with a public IP in a public subnet on OCI.


1. Private-public key pair

2. Security List on OCI.

3. 2-FA using Google Authenticator




We can also add OS-level firewall rules to the list. Hope this helps!!