Nov 5, 2011

TNS-12518: A new bug in Oracle 11gR2

TNS-12518 is an old error, but has new cause in 11GR2.

Environment: Oracle 11.2..0.2 3-node RAC
RHEL 5

Problem: On one of the 3-nodes, Oracle listener failed to establish the connection.
As we used the shared server and dedicated server, the shared server connection is working fine but the dedicated server doesn't work.

Here is the error message,

[root@waleucp02 trace]# tail -f listener.log
28-OCT-2011 17:46:59 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=PRD)(CID=(PROGRAM=perl@REMOTE)(HOST=REMOTE)(USER=nagios))) * (ADDRESS=(PROTOCOL=tcp)(HOST=*******.8.17)(PORT=41612)) * establish * PRD * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
 TNS-12560: TNS:protocol adapter error
  TNS-00517: Lost contact
   Linux Error: 32: Broken pipe

Troubleshooting procedure:

1)Follow the Metlink article:


TROUBLESHOOTING GUIDE TNS-12518 TNS listener could not hand off client connection [ID 550859.1]

2)As this is a DEDICATED server, we increased the Oracle parameter "processes" from 150 to 500. After I bounce the database, but it doesn't work

3)Initially, we thinks it was caused by Firewall setting, but the networking team didn't find anything suspicious.

4Finally, we figured out it is caused by the file permission (setuid)of oracle executable. This is the second time we encountered the similar issues since installing 11GR2 on two different servers.

go to $ORACLE_HOME/bin, the oracle's permission should be 6751, but for some reason, it was changed to 4751.


Note: I encountered a case, if tar backup by root, and recovered by oracle user, it may cause this issue. However, our cluster is brand new installation.

Incorrect permission.
# cd /u01/app/oracle/product/11.2.0/bin
# ll oracle
-rwxr-s--x 1 oracle asmadmin 228886426 Aug 17 14:25 oracle
Correct one:
# chmod 6751 oracle    
# ll oracle
-rwsr-s--x 1 oracle asmadmin 228886426 Aug 17 14:25 oracle

After setting the correct setuid, listener started working again.
Similarly, one months ago, we encountered a similar issue for ASM. Please reference Metalink ID:
"DBCA Does Not Display ASM Disk Groups In 11.2 [ID 1177483.1]". 
That error is also caused by the setuid bit of oracle executable under the $GRID_HOME/bin
We've submitted this error to Oracle and Oracle support has created a bug for this error.



3 comments:

  1. Jay, thank you for this blog. I was stuck with this problem on a clients box. Everyone else had been saying processes and sessions and network settings. Unfortunately, that wasn't the problem here. The permissions wouldn't have been something I would probably had gotten to before blowing it up.

    ReplyDelete
  2. Thank you very much it worked for me. Seems tha setwrap from grid made this bug happens, 'cause only root can set 6751, else the permissions are wrong.

    Regards

    Hector

    ReplyDelete
  3. Thanks a metric ton Jay ! You saved my day !

    Regards: ZenRoe

    ReplyDelete