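This is an account of an HBase cluster incident from a while back. Because of a careless operation, the cluster could not start. With the help of an expert from Tencent Cloud, it took a whole weekend to fix. A truly unforgettable experience.
Version information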
cdh-6.0.1 hadoop-3.0 hbase-2.0.0
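Problem
I wanted to restart HBase during an idle period to free up some memory and to tweak a few YARN settings along the way. After I stopped it, HBase would not come back up. The error was that the hbase:namespace table "is not online", and the Master stayed stuck in initialization. The detailed error log: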
15:41:59.313 [ProcExecTimeout] WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager - STUCK Region-In-Transition rit=OPENING, location=node4,16020,1589648302672, table=real_time_data, region=74cac15d22e99800ad0ace14c9ed74d6
15:41:59.313 [ProcExecTimeout] WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager - STUCK Region-In-Transition rit=OPENING, location=node3,16020,1596598630022, table=real_time_data, region=8e68891d5826c09974d81ad5d705c3b6
15:41:59.313 [ProcExecTimeout] WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager - STUCK Region-In-Transition rit=OPENING, location=node3,16020,1596598630022, table=real_time_data, region=75c42d75e2556bf70ff527f2425e8509
15:41:59.313 [ProcExecTimeout] WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager - STUCK Region-In-Transition rit=OPENING, location=node3,16020,1596598630022, table=real_time_data, region=2eee04869ac2c35984d4d22e6e9f2f31
15:42:08.264 [master/node3:16000] INFO org.apache.hadoop.hbase.client.RpcRetryingCallerImpl - Call exception, tries=15, retries=15, started=128887 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.NotServingRegionException: hbase:namespace,,1558205786137.40562c48c9210c06813adce48773cb6a. is not online on node1,16020,1596957741742
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3273)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3250)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2446)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
, details=row 'default' on table 'hbase:namespace' at region=hbase:namespace,,1558205786137.40562c48c9210c06813adce48773cb6a., hostname=node1,16020,1589648239142, seqNum=55
… …
15:44:58.229 [qtp1792826268-435] WARN org.eclipse.jetty.servlet.ServletHandler - /master-status
org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
    at org.apache.hadoop.hbase.master.HMaster.isInMaintenanceMode(HMaster.java:2827) ~[hbase-server-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:271) ~[hbase-server-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:389) ~[hbase-server-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:380) ~[hbase-server-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:81) ~[hbase-server-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) ~[javax.servlet-api-3.1.0.jar:3.1.0]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0]
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:112) ~[hbase-http-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.apache.hadoop.hbase.http.ClickjackingPreventionFilter.doFilter(ClickjackingPreventionFilter.java:48) ~[hbase-http-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1374) ~[hbase-http-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49) ~[hbase-http-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49) ~[hbase-http-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) [jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) [jetty-security-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) [jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.Server.handle(Server.java:534) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) [jetty-io-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) [jetty-io-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) [jetty-io-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
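With a log this noisy, it can help to pull out just the regions reported as stuck in transition. Below is a minimal Python sketch that does so, assuming the warning format seen in the log above (`STUCK Region-In-Transition rit=..., location=host,port,startcode, table=..., region=...`). The function name `stuck_regions` and the grouping by table and host are my own choices for illustration, not part of any HBase tooling.

```python
import re
from collections import defaultdict

# Pattern for the AssignmentManager warning seen in the log above.
# Matching is case-insensitive so that copies of the log whose letter
# case was altered along the way are still recognised.
STUCK_RIT = re.compile(
    r"STUCK Region-In-Transition\s+rit=(?P<state>\w+),\s*"
    r"location=(?P<host>[^,\s]+),(?P<port>\d+),(?P<startcode>\d+),\s*"
    r"table=(?P<table>[^,\s]+),\s*region=(?P<region>[0-9a-f]{32})",
    re.IGNORECASE,
)


def stuck_regions(log_text):
    """Map (table, host) to the sorted encoded names of regions stuck there."""
    grouped = defaultdict(set)
    # finditer scans the whole text, so it also works when several log
    # entries were pasted onto a single line.
    for match in STUCK_RIT.finditer(log_text):
        key = (match["table"], match["host"])
        grouped[key].add(match["region"].lower())
    return {key: sorted(regions) for key, regions in grouped.items()}


if __name__ == "__main__":
    sample = (
        "15:41:59.313 [ProcExecTimeout] WARN "
        "org.apache.hadoop.hbase.master.assignment.AssignmentManager - "
        "STUCK Region-In-Transition rit=OPENING, "
        "location=node4,16020,1589648302672, table=real_time_data, "
        "region=74cac15d22e99800ad0ace14c9ed74d6"
    )
    for (table, host), regions in stuck_regions(sample).items():
        print(f"{table} on {host}: {len(regions)} stuck region(s)")
```

Grouping by host makes it easier to see whether the stuck regions cluster on one RegionServer. It only summarises what the Master logged and repairs nothing.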
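The usual approach
At this point I tried the hbck command to inspect the details and repair things, only to find that in HBase 2.0.0 the repair options of hbck have been deprecated.
Then, after some searching, I came across hbck2 (official repository: https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2). I thought I had found a lifeline, but as it turned out:
WTF, I give up. hbck2 does not support HBase 2.0.0 to 2.0.2 or HBase 2.1.0. On those versions you can use neither hbck nor hbck2, so there is a gap in the tooling.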
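The version gap described above can be written down as a small check. This is only a sketch encoding the ranges stated in this post (2.0.0 to 2.0.2, and 2.1.0). The helper names `parse_version` and `in_tooling_gap` are hypothetical, and the handling of vendor suffixes is an assumption about how distribution version strings are laid out.

```python
def parse_version(version):
    """Reduce a version string to its (major, minor, patch) integers.

    Vendor suffixes are dropped: both '2.0.0-cdh6.0.1' and
    '2.0.0.3.0.0.0-1634' yield (2, 0, 0). This layout is an assumption.
    """
    core = version.split("-")[0]
    parts = core.split(".")[:3]
    return tuple(int(part) for part in parts)


def in_tooling_gap(version):
    """True if, per the ranges in this post, neither hbck repair nor hbck2 applies."""
    major, minor, patch = parse_version(version)
    if (major, minor) == (2, 0):
        return patch <= 2          # 2.0.0 to 2.0.2
    return (major, minor, patch) == (2, 1, 0)


if __name__ == "__main__":
    for v in ("2.0.0", "2.0.3", "2.1.0", "2.1.1"):
        print(v, "gap" if in_tooling_gap(v) else "ok")
```

For the cluster in this post (hbase-2.0.0) the check returns `True`, which is why a manual repair was needed.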
Solution
1. Fix the Master so that the cluster can start normally
Since the Master currently cannot finish initializing