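Oracle 11.2.0.4 RAC: Recovering the OCR and Voting Disk
1. Recovery Background
In this RAC environment the ASM instance would not start, the disk groups could not be mounted, and the CRS resources failed to start.
1. Check the cluster resource status; the resources are not started.
2. Check CRS (Cluster Ready Services); it reports the following errors: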
[grid@rac01 ~]$ ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
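3. Try to start CRS manually; this also fails with an error.
4. Run asmcmd to check whether the disk groups are healthy; it reports that the ASM instance is not running: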
[grid@rac01 ~]$ asmcmd
Connected to an idle instance.
ASMCMD> lsdg
ASMCMD-08102: no connection to ASM; command requires ASM to run
5. Start the ASM instance manually:
[grid@rac01 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.1.0 Production on Mon Mar 12 14:22:13 2012
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-01078: failure in processing system parameters
ORA-29701: unable to connect to Cluster Synchronization Service
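The status check in step 2 can be condensed into a small filter. The following is a minimal sketch, not part of Oracle's tooling: the function name crs_problem_codes is hypothetical, and it only assumes that healthy services are reported on lines ending in "is online", as in the output above.

```shell
# Hypothetical helper: read `crsctl check crs` output on stdin and print the
# CRS-nnnn code of every line that does not report a service as online.
crs_problem_codes() {
  grep -E '^CRS-[0-9]+:' |
    grep -v 'is online[[:space:]]*$' |
    sed -E 's/^(CRS-[0-9]+):.*/\1/'
}

# Usage on a cluster node (assumes crsctl is in PATH):
#   crsctl check crs | crs_problem_codes
```

An empty result means every reported service is online; any printed code points at the daemon that needs attention.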
2. Identifying the Cause
Check the alert logs for usable error messages.
1. CRS alert log (nothing useful found):
[grid@rac01 crsd]$ more /u01/grid/log/rac01/crsd/crsd.log
2. CSS alert log (nothing useful found):
[grid@rac01 cssd]$ more /u01/grid/log/rac01/cssd/ocssd.log
3. HAS alert log (nothing useful found):
[grid@rac01 ohasd]$ more /u01/grid/log/rac01/ohasd/ohasd.log
4. ASM instance alert log (this log contains several error messages):
[grid@rac01 trace]$ more /u01/grid/log/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
...
ORA-15032: not all alterations performed
ORA-15017: diskgroup "CRS" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "CRS"
ERROR: ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:0:97} */
...
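The alert log shows that the +CRS disk group cannot be mounted. This suggests that the disk group is corrupted: it cannot be mounted, so the ASM instance cannot start.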
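Scanning a long alert log by eye is error-prone, so the search for unmountable disk groups can be sketched as a filter. This is an illustrative helper only: the name failed_diskgroups is hypothetical, and it assumes the ORA-15017 message has the wording shown in the log excerpt above.

```shell
# Hypothetical helper: read an ASM alert log on stdin and print, once each,
# every disk group named in an ORA-15017 "cannot be mounted" line.
failed_diskgroups() {
  sed -n -E 's/^ORA-15017: diskgroup "([^"]+)" cannot be mounted.*/\1/p' | sort -u
}

# Usage (log path taken from the step above):
#   failed_diskgroups < /u01/grid/log/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
```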
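3. Resolving the Problem
Because the disk group is corrupted, the only option is to recreate it and then restore the OCR from an OCR backup.
3.1 Stop the HAS resources on both nodes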
[root@rac01 ~]# /u01/grid/bin/crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac01'
CRS-2673: Attempting to stop 'ora.crf' on 'rac01'
CRS-2677: Stop of 'ora.mdnsd' on 'rac01' succeeded
CRS-2677: Stop of 'ora.crf' on 'rac01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac01'
CRS-2677: Stop of 'ora.gipcd' on 'rac01' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac01'
CRS-2677: Stop of 'ora.gpnpd' on 'rac01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
3.2 Restart CRS in exclusive mode on one node, which allows the ASM instance to start:
[root@rac01 ~]# /u01/grid/bin/crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac01'
CRS-2676: Start of 'ora.mdnsd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac01'
CRS-2676: Start of 'ora.gpnpd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac01'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac01'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac01'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac01'
CRS-2676: Start of 'ora.diskmon' on 'rac01' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac01' succeeded
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac01'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2676: Start of 'ora.ctssd' on 'rac01' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac01'
CRS-2676: Start of 'ora.asm' on 'rac01' succeeded
3.3 Wipe the corrupted disk group and recreate it
Wipe the disk header:
[root@rac01 ~]# dd if=/dev/zero of=/dev/raw/raw1 bs=1024 count=1000
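Zeroing a disk header cannot be undone, so it can be worth saving the region first. The sketch below is one cautious way to do that, not something the original procedure performs: the function name wipe_header_with_backup and the backup path in the example are assumptions.

```shell
# Hypothetical helper (not an Oracle tool): save the first COUNT KiB of a
# device to a backup file, then overwrite that region with zeros.
# WARNING: destructive. Double-check the device path before running.
wipe_header_with_backup() {
  local dev backup count
  dev=$1
  backup=$2
  count=${3:-1000}
  # Keep a copy of the region that is about to be destroyed.
  dd if="$dev" of="$backup" bs=1024 count="$count" 2>/dev/null || return 1
  # conv=notrunc leaves the rest of the target intact if it is a regular file.
  dd if=/dev/zero of="$dev" bs=1024 count="$count" conv=notrunc 2>/dev/null
}

# Example (assumed backup path):
#   wipe_header_with_backup /dev/raw/raw1 /root/raw1_header.bak
```

If the wrong device was chosen, the saved file can be written back with dd using the same bs and count.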
Recreate the disk group:
[grid@rac01 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Wed Feb 7 10:12:26 2018
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> create diskgroup crs external redundancy disk '/dev/raw/raw1' attribute 'compatible.asm'='11.2.0.4.0';
Check the disk group information:
SQL> select name,state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
CRS                            MOUNTED
3.4 Restore the OCR
Check where the backups are kept:
[root@rac01 rac-cluster]# /u01/grid/bin/ocrconfig -showbackup
rac01     2018/02/01 13:48:27     /u01/grid/cdata/rac-cluster/backup00.ocr
rac01     2018/02/01 13:48:27     /u01/grid/cdata/rac-cluster/day.ocr
rac01     2018/02/01 13:48:27     /u01/grid/cdata/rac-cluster/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available
Restore the OCR from the most recent backup:
[root@rac01 ~]# /u01/grid/bin/ocrconfig -restore /u01/grid/cdata/rac-cluster/backup00.ocr
After the restore succeeds, check the OCR:
[grid@rac01 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3088
         Available space (kbytes) :     259032
         ID                       :  256919468
         Device/File Name         :       +CRS
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user
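Picking the most recent backup from the `ocrconfig -showbackup` listing can also be scripted. This is a minimal sketch under the assumption that each backup line has the form "node date time path" with the date written as YYYY/MM/DD; the function name latest_ocr_backup is hypothetical.

```shell
# Hypothetical helper: read `ocrconfig -showbackup` output on stdin and print
# the path of the newest backup. Lines that do not carry a date in the second
# field (for example the PROT-25 message) are ignored.
latest_ocr_backup() {
  awk 'NF >= 4 && $2 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ { print $2, $3, $4 }' |
    LC_ALL=C sort -r |
    head -n 1 |
    awk '{ print $3 }'
}

# Usage (as root, path taken from the step above):
#   /u01/grid/bin/ocrconfig -showbackup | latest_ocr_backup
```

The printed path can then be passed to `ocrconfig -restore` after checking that the file exists on the local node.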
3.5 Restore the voting disk
[grid@rac01 ~]$ crsctl replace votedisk +CRS
Successful addition of voting disk 93ccd2ae41b74f07bf1e3b5c842d79ac.
Successfully replaced voting disk group with +CRS.
CRS-4266: Voting file(s) successfully replaced
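3.6 Check CRS to confirm the repair succeeded (this may take a while)
The stack started in 3.2 runs in exclusive mode without CRSD, so it normally has to be stopped (crsctl stop crs -f) and started again in normal mode (crsctl start crs) on both nodes before all four services report online.
[root@rac01 rac-cluster]# /u01/grid/bin/crsctl check crs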
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
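Because the services can take a while to come up, the check can be repeated in a loop instead of by hand. The following is a sketch only: wait_for_crs is a hypothetical name, and it assumes that a healthy stack prints exactly four lines ending in "is online", as in the output above.

```shell
# Hypothetical helper: re-run a status command until it reports four
# "is online" lines (HAS, CRS, CSS, EVM) or the attempts run out.
# Usage: wait_for_crs TRIES DELAY_SECONDS COMMAND [ARGS...]
wait_for_crs() {
  local tries delay i online
  tries=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$tries" ]; do
    online=$("$@" 2>/dev/null | grep -c 'is online[[:space:]]*$')
    if [ "$online" -eq 4 ]; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example: up to 30 checks, 10 seconds apart.
#   wait_for_crs 30 10 /u01/grid/bin/crsctl check crs
```

The function returns 0 as soon as all four services are online and 1 if they never are, so it can gate the next step of a script.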
3.7 If all CRS and CSS resources started successfully, check the resource status
[root@rac01 rac-cluster]# /u01/grid/bin/crsctl stat res -t
[grid@rac01 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.CRS.dg     ora....up.type ONLINE    ONLINE    rac01
ora.DATA.dg    ora....up.type ONLINE    OFFLINE
ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac01
ora....N1.lsnr ora....er.type ONLINE    ONLINE    rac01
ora.asm        ora.asm.type   ONLINE    ONLINE    rac01
ora.cvu        ora.cvu.type   ONLINE    ONLINE    rac01
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....network ora....rk.type ONLINE    ONLINE    rac01
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    rac01
ora.ons        ora.ons.type   ONLINE    ONLINE    rac01
ora.orcl.db    ora....se.type ONLINE    OFFLINE
ora....SM1.asm application    ONLINE    ONLINE    rac01
ora....01.lsnr application    ONLINE    ONLINE    rac01
ora.rac01.gsd  application    OFFLINE   OFFLINE
ora.rac01.ons  application    ONLINE    ONLINE    rac01
ora.rac01.vip  ora....t1.type ONLINE    ONLINE    rac01
ora....SM2.asm application    ONLINE    ONLINE    rac02
ora....02.lsnr application    ONLINE    ONLINE    rac02
ora.rac02.gsd  application    OFFLINE   OFFLINE
ora.rac02.ons  application    ONLINE    ONLINE    rac02
ora.rac02.vip  ora....t1.type ONLINE    ONLINE    rac02
ora....ry.acfs ora....fs.type ONLINE    ONLINE    rac02
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    rac01
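The resources that still need attention are those whose target is ONLINE while their state is OFFLINE. As a rough sketch, assuming `crs_stat -t` prints whitespace-separated columns in the order Name, Type, Target, State, Host, they can be listed with a one-line filter (stalled_resources is a hypothetical name):

```shell
# Hypothetical helper: read `crs_stat -t` output on stdin and print the name
# of every resource that should be running (Target ONLINE) but is not
# (State OFFLINE). Resources that are intentionally OFFLINE are skipped.
stalled_resources() {
  awk '$3 == "ONLINE" && $4 == "OFFLINE" { print $1 }'
}

# Usage (as the grid user):
#   crs_stat -t | stalled_resources
```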
3.8 Manually mount the DATA disk group
[grid@rac01 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Wed Feb 7 10:54:23 2018
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup data mount;
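3.9 Check the DATA resource
If the DATA disk group mounts, the database can start normally. If it does not, remove the DATA resource and recreate the database.
Delete the leftover resource: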
[grid@rac01 ~]$ crsctl delete resource ora.DATA.dg
Use dbca to delete the database, then create it again:
[oracle@rac01 ~]$ export DISPLAY=:0
[oracle@rac01 ~]$ xhost +
No protocol specified
xhost: unable to open display ":0"
[oracle@rac01 ~]$ dbca