
Oracle RAC ASM Disk Group Failure Resolution.docx

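1. Recovery background
2. Determining the cause
3. Resolving the problem
3.1 Stop the HAS resources on both nodes
3.2 On one node, restart CRS in exclusive mode, which allows the ASM instance to start
3.3 Wipe the damaged disk group and recreate it
3.4 Restore the OCR
3.5 Restore the voting disk
3.6 Check CRS to see whether the repair succeeded (this may take a while)
3.7 If all CRS and CSS resources started successfully, check the resource status
3.8 Mount the DATA disk group manually
3.9 Check the DATA resource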
Oracle 11.2.0.4 RAC: Restoring the OCR and Voting Disk

1. Recovery background

In this RAC environment the ASM instance could not start, the disk groups could not be mounted, and the CRS resources failed to start.

1. Check the cluster resource status: the resources are not started.

2. Check CRS (Cluster Ready Services). It reports the following errors:

[grid@rac01 ~]$ ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager

3. Try to start CRS manually: this fails with an error.

4. Open asmcmd to check whether the disk groups are healthy. It reports that the ASM instance is not running:

[grid@rac01 ~]$ asmcmd
Connected to an idle instance.
ASMCMD> lsdg
ASMCMD-08102: no connection to ASM; command requires ASM to run

5. Start the ASM instance manually:

[grid@rac01 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.1.0 Production on Mon Mar 12 14:22:13 2012
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-01078: failure in processing system parameters
ORA-29701: unable to connect to Cluster Synchronization Service
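
The status check in step 2 above can be scripted. The following is a minimal sketch and not part of the original procedure: a hypothetical helper, crs_failed_components, reads the output of "crsctl check crs" on standard input and prints only the CRS-nnnn lines that do not report a component as online. It assumes that healthy components are reported with the phrase "is online", as in the CRS-4638 line above.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool): filter "crsctl check crs" output
# down to the message lines that do NOT say "is online".
crs_failed_components() {
    # Keep only CRS-nnnn message lines, then drop the healthy ones.
    # "|| true" keeps the exit status 0 when nothing is left to print.
    grep '^CRS-[0-9]*:' | grep -v 'is online' || true
}

# Example use on a live node (path as used in this document):
#   /u01/grid/bin/crsctl check crs | crs_failed_components
```

An empty result would suggest that every component reported itself online; any printed line names a component that still needs attention.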
2. Determining the cause

Check the logs for useful error messages.

1. CRS log (nothing useful found):

[grid@rac01 crsd]$ more /u01/grid/log/rac01/crsd/crsd.log

2. CSS log (nothing useful found):

[grid@rac01 cssd]$ more /u01/grid/log/rac01/cssd/ocssd.log

3. HAS log (nothing useful found):

[grid@rac01 ohasd]$ more /u01/grid/log/rac01/ohasd/ohasd.log

4. ASM instance alert log (this log contains error messages):

[grid@rac01 trace]$ more /u01/grid/log/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
...
ORA-15032: not all alterations performed
ORA-15017: diskgroup "CRS" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "CRS"
ERROR: ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:0:97} */
...

The alert log shows that the +CRS disk group cannot be mounted. This suggests that the disk group is damaged, which prevents it from being mounted and in turn prevents the ASM instance from starting.

3. Resolving the problem

Because the disk group is damaged, the only option is to recreate it and then restore the OCR from an OCR backup.

3.1 Stop the HAS resources on both nodes

[root@rac01 ~]# /u01/grid/bin/crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac01'
CRS-2673: Attempting to stop 'ora.crf' on 'rac01'
CRS-2677: Stop of 'ora.mdnsd' on 'rac01' succeeded
CRS-2677: Stop of 'ora.crf' on 'rac01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac01'
CRS-2677: Stop of 'ora.gipcd' on 'rac01' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac01'
CRS-2677: Stop of 'ora.gpnpd' on 'rac01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac01' has completed
CRS-4133: Oracle High Availability Services has been stopped.

3.2 On one node, restart CRS in exclusive mode, which allows the ASM instance to start:

[root@rac01 ~]# /u01/grid/bin/crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac01'
CRS-2676: Start of 'ora.mdnsd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac01'
CRS-2676: Start of 'ora.gpnpd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac01'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac01'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac01'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac01'
CRS-2676: Start of 'ora.diskmon' on 'rac01' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac01' succeeded
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac01'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2676: Start of 'ora.ctssd' on 'rac01' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac01'
CRS-2676: Start of 'ora.asm' on 'rac01' succeeded

3.3 Wipe the damaged disk group and recreate it

Wipe the disk (this overwrites the first 1000 KiB of /dev/raw/raw1 with zeros):

[root@rac01 ~]# dd if=/dev/zero of=/dev/raw/raw1 bs=1024 count=1000

Recreate the disk group:

[grid@rac01 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Wed Feb 7 10:12:26 2018
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> create diskgroup crs external redundancy disk '/dev/raw/raw1' attribute 'compatible.asm'='11.2.0.4.0';

Check the disk group information:

SQL> select name,state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
CRS                            MOUNTED

3.4 Restore the OCR

Find where the backups are kept:

[root@rac01 rac-cluster]# /u01/grid/bin/ocrconfig -showbackup
rac01     2018/02/01 13:48:27     /u01/grid/cdata/rac-cluster/backup00.ocr
rac01     2018/02/01 13:48:27     /u01/grid/cdata/rac-cluster/day.ocr
rac01     2018/02/01 13:48:27     /u01/grid/cdata/rac-cluster/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available

Restore the OCR from the most recent backup:

[root@rac01 ~]# /u01/grid/bin/ocrconfig -restore /u01/grid/cdata/rac-cluster/backup00.ocr

After the restore succeeds, check the OCR:

[grid@rac01 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3088
         Available space (kbytes) :     259032
         ID                       :  256919468
         Device/File Name         :       +CRS
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user

3.5 Restore the voting disk

[grid@rac01 ~]$ crsctl replace votedisk +CRS
Successful addition of voting disk 93ccd2ae41b74f07bf1e3b5c842d79ac.
Successfully replaced voting disk group with +CRS.
CRS-4266: Voting file(s) successfully replaced

3.6 Check CRS to see whether the repair succeeded (this may take a while)

[root@rac01 rac-cluster]# /u01/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
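
Because step 3.6 may need to be repeated until the services come up, the check can be wrapped in a retry loop. The following is a rough sketch under stated assumptions, not part of the original procedure: crs_all_online and wait_for_crs are hypothetical helper names, and the four message codes are simply the ones shown in the successful output above. The retry count and interval are illustrative values.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if all four "online" messages shown in
# step 3.6 are present in the text passed as $1.
crs_all_online() {
    for code in CRS-4638 CRS-4537 CRS-4529 CRS-4533; do
        printf '%s\n' "$1" | grep -q "^$code:.*is online" || return 1
    done
    return 0
}

# Hypothetical helper: rerun a check command until every service is online.
# $1: command to run, $2: maximum attempts, $3: seconds between attempts
wait_for_crs() {
    attempt=1
    while [ "$attempt" -le "$2" ]; do
        # $1 is left unquoted on purpose so "cmd arg arg" splits into words.
        if crs_all_online "$($1 2>&1)"; then
            echo "all services online after $attempt attempt(s)"
            return 0
        fi
        # Sleep only when another attempt will follow.
        [ "$attempt" -lt "$2" ] && sleep "$3"
        attempt=$((attempt + 1))
    done
    echo "services still not online after $2 attempt(s)" >&2
    return 1
}

# Example (illustrative values: 30 attempts, 10 seconds apart):
#   wait_for_crs "/u01/grid/bin/crsctl check crs" 30 10
```

Passing the check command as an argument keeps the loop independent of the Grid home path and lets it be exercised with a stand-in command before it is pointed at a real cluster.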
3.7 If all CRS and CSS resources started successfully, check the resource status

[root@rac01 rac-cluster]# /u01/grid/bin/crsctl stat res -t
[grid@rac01 ~]$ crs_stat -t
Name           Type
------------------------------
ora.CRS.dg     ora....up.type
ora.DATA.dg    ora....up.type
ora....ER.lsnr ora....er.type
ora....N1.lsnr ora....er.type
ora.asm        ora.asm.type
ora.cvu        ora.cvu.type
ora.gsd        ora.gsd.type
ora....network ora....rk.type
ora.oc4j       ora.oc4j.type
ora.ons        ora.ons.type
ora.orcl.db    ora....se.type
ora....SM1.asm application
ora....01.lsnr application
ora.rac01.gsd  application
ora.rac01.ons  application
ora.rac01.vip  ora....t1.type
ora....SM2.asm application
ora....02.lsnr application
ora.rac02.gsd  application
ora.rac02.ons  application
ora.rac02.vip  ora....t1.type
ora....ry.acfs ora....fs.type
ora.scan1.vip  ora....ip.type

(The Target, State, and Host columns of this listing are scrambled in this copy and cannot be matched to individual rows. Most resources show ONLINE on rac01 or rac02; a few show OFFLINE.)

3.8 Mount the DATA disk group manually

[grid@rac01 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Wed Feb 7 10:54:23 2018
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup data mount;

3.9 Check the DATA resource

If the DATA disk group can be mounted, the database can start normally. If it cannot, remove the DATA resource and recreate the database.

Delete the leftover resource:

[grid@rac01 ~]$ crsctl delete resource ora.DATA.dg

Use dbca to delete the database, then recreate it:

[oracle@rac01 ~]$ export DISPLAY=:0
[oracle@rac01 ~]$ xhost +
No protocol specified
xhost: unable to open display ":0"
[oracle@rac01 ~]$ dbca
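
The decision in step 3.9 depends on the state of the DATA disk group after the mount attempt in step 3.8. The following is a minimal sketch and not part of the original procedure: diskgroup_state and next_step_for_data are hypothetical helpers that read a two-column NAME/STATE listing, like the one returned by the v$asm_diskgroup query in step 3.3, from standard input. NOT_FOUND is a marker invented for this sketch, not an ASM state.

```shell
#!/bin/sh
# Hypothetical helper: print the STATE column for disk group $1.
# Reads a two-column NAME/STATE listing from standard input and matches
# the name case-insensitively.
diskgroup_state() {
    awk -v dg="$1" 'toupper($1) == toupper(dg) { print $2; found = 1 }
                    END { if (!found) print "NOT_FOUND" }'
}

# Hypothetical helper: report which branch of step 3.9 applies to DATA.
next_step_for_data() {
    state=$(diskgroup_state DATA)
    if [ "$state" = "MOUNTED" ]; then
        echo "DATA is mounted: start the database normally"
    else
        echo "DATA state is $state: consider removing ora.DATA.dg and recreating the database"
    fi
}

# Example: save the output of
#   select name,state from v$asm_diskgroup;
# to a file (the file name is illustrative) and run:
#   next_step_for_data < diskgroup_listing.txt
```

The helpers only report on the listing they are given. Deleting the ora.DATA.dg resource and recreating the database remain manual, deliberate steps.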