Kubernetes GarbageCollector source code analysis (part 3)
Kubernetes version: 1.13.2
This continues the previous two parts:
Kubernetes GarbageCollector Controller source code analysis (part 1) and Kubernetes GarbageCollector Controller source code analysis (part 2)
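Main steps
The GarbageCollector Controller source code breaks down into the following parts:
- monitors act as producers and put changed resources into the graphChanges queue; meanwhile, restMapper periodically checks the resource types in the cluster and refreshes the monitors.
- runProcessGraphChanges takes changed items from the graphChanges queue and, depending on the situation, puts them into the attemptToDelete queue.
- runProcessGraphChanges takes changed items from the graphChanges queue and, depending on the situation, puts them into the attemptToOrphan queue.
- runAttemptToDeleteWorker takes items from the attemptToDelete queue and attempts to delete garbage resources.
- runAttemptToOrphanWorker takes items from the attemptToOrphan queue and handles the resources to be orphaned.
The previous part analyzed the second and third items; this part analyzes the fourth and fifth.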
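Finalizers
Before reading the code below, it helps to understand finalizers.
An object's finalizers are logic that must run before the object is deleted. Every object's finalizers field must be empty before the object can be removed. Finalizers provide a general API: besides blocking cascading deletion, they can also be used to add hooks that run before an object is deleted: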
    type ObjectMeta struct {
        // ...
        Finalizers []string
    }
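Finalizers run before the object is deleted. Each time a finalizer completes successfully, it removes itself from the Finalizers array; once the last finalizer has been removed, the API server deletes the object.
By default, deleting an object deletes all of its dependents. In some cases, though, we only want to delete the object itself without triggering a complex cascading deletion. For this the garbage collection mechanism introduces the OrphanFinalizer, which is added to or removed from the Finalizers array before the object is deleted.
This finalizer watches the object's update events and removes the object from the OwnerReferences array of all its dependents. At the same time it removes OwnerReferences that are no longer valid from all dependents, and finally removes OrphanFinalizer from the Finalizers array.
With OrphanFinalizer we can keep all of an object's dependents when deleting a Kubernetes object, which gives users a more flexible way to keep and delete objects.
It is also worth reading the official Kubernetes documentation on garbage collection, linked in the references at the end.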
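The rule that an object only disappears once its Finalizers list is empty can be modeled in a few lines. This is a minimal sketch, not Kubernetes code: the `object` type, `removeFinalizer`, and `canBeRemoved` are hypothetical names for illustration, and only the two finalizer strings `orphan` and `foregroundDeletion` match the real constants.

```go
package main

import "fmt"

// object is a hypothetical, simplified stand-in for a Kubernetes object.
type object struct {
	Name       string
	Finalizers []string
	Deleting   bool // models a non-nil deletionTimestamp
}

// removeFinalizer returns the finalizer list without the given name.
func removeFinalizer(finalizers []string, name string) []string {
	var kept []string
	for _, f := range finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	return kept
}

// canBeRemoved models the API server rule: an object marked for deletion
// is only dropped once no finalizer is left on it.
func canBeRemoved(o object) bool {
	return o.Deleting && len(o.Finalizers) == 0
}

func main() {
	o := object{Name: "demo", Finalizers: []string{"orphan", "foregroundDeletion"}, Deleting: true}
	fmt.Println(canBeRemoved(o)) // false: two finalizers still pending

	o.Finalizers = removeFinalizer(o.Finalizers, "orphan")
	o.Finalizers = removeFinalizer(o.Finalizers, "foregroundDeletion")
	fmt.Println(canBeRemoved(o)) // true: the list is empty
}
```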
The attemptToDelete queue
The code is in $GOPATH/src/k8s.io/kubernetes/pkg/controller/garbagecollector/garbagecollector.go:
    func (gc *GarbageCollector) runAttemptToDeleteWorker() {
        for gc.attemptToDeleteWorker() {
        }
    }
The worker takes a resource from the attemptToDelete queue and calls gc.attemptToDeleteItem(n) to process it. If an error occurs, the item is added back to the attemptToDelete queue via AddRateLimited.
    func (gc *GarbageCollector) attemptToDeleteWorker() bool {
        // Take the next resource to attempt to delete from the queue.
        item, quit := gc.attemptToDelete.Get()
        gc.workerLock.RLock()
        defer gc.workerLock.RUnlock()
        if quit {
            return false
        }
        defer gc.attemptToDelete.Done(item)
        n, ok := item.(*node)
        if !ok {
            utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
            return true
        }
        err := gc.attemptToDeleteItem(n)
        if err != nil {
            if _, ok := err.(*restMappingError); ok {
                // There are at least two ways this can happen:
                // 1. The reference is to an object of a custom type that has not yet been
                //    recognized by gc.restMapper (this is a transient error).
                // 2. The reference is to an invalid group/version. We don't currently
                //    have a way to distinguish this from a valid type we will recognize
                //    after the next discovery sync.
                // For now, record the error and retry.
                klog.V(5).Infof("error syncing item %s: %v", n, err)
            } else {
                utilruntime.HandleError(fmt.Errorf("error syncing item %s: %v", n, err))
            }
            // retry if garbage collection of an object failed.
            gc.attemptToDelete.AddRateLimited(item)
        } else if !n.isObserved() {
            // requeue if item hasn't been observed via an informer event yet.
            // otherwise a virtual node for an item added AND removed during watch
            // reestablishment can get stuck in the graph and never removed.
            // see https://issue.k8s.io/56121
            klog.V(5).Infof("item %s hasn't been observed via informer yet", n.identity)
            gc.attemptToDelete.AddRateLimited(item)
        }
        return true
    }
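The take, process, requeue-on-error pattern of this worker can be illustrated without client-go. The sketch below is an assumption-laden toy: `processAll` and `errTransient` are hypothetical names, the queue is a plain slice, and a retry bound replaces the rate-limited back-off of the real work queue so that the example always terminates.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient is a hypothetical error used to simulate a failed attempt.
var errTransient = errors.New("transient error")

// processAll drains a simple FIFO queue and requeues items whose handler fails.
// maxRetries bounds the retries; the real controller instead relies on a
// rate-limited work queue and keeps retrying with back-off.
func processAll(items []string, maxRetries int, handle func(string) error) (done, dropped []string) {
	retries := map[string]int{}
	queue := append([]string(nil), items...)
	for len(queue) > 0 {
		item := queue[0]
		queue = queue[1:]
		if err := handle(item); err != nil {
			if retries[item] < maxRetries {
				retries[item]++
				queue = append(queue, item) // stands in for AddRateLimited
			} else {
				dropped = append(dropped, item)
			}
			continue
		}
		done = append(done, item)
	}
	return done, dropped
}

func main() {
	attempts := map[string]int{}
	handle := func(item string) error {
		attempts[item]++
		if item == "flaky" && attempts[item] < 3 {
			return errTransient // succeeds on the third attempt
		}
		if item == "broken" {
			return errTransient // never succeeds
		}
		return nil
	}
	done, dropped := processAll([]string{"ok", "flaky", "broken"}, 3, handle)
	fmt.Println(done, dropped) // [ok flaky] [broken]
}
```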
The key method is attemptToDeleteItem:
    func (gc *GarbageCollector) attemptToDeleteItem(item *node) error {
        klog.V(2).Infof("processing item %s", item.identity)
        // "being deleted" is an one-way trip to the final deletion. We'll just wait
        // for the final deletion, and then process the object's dependents.
        // The item is marked as being deleted (deletionTimestamp is non-nil) and is
        // not deleting its dependents. As the previous part showed, deletingDependents
        // is only set to true when the item is deleted in foreground mode, so an item
        // being deleted in Orphan or Background mode returns here immediately.
        if item.isBeingDeleted() && !item.isDeletingDependents() {
            klog.V(5).Infof("processing item %s returned at once, because its DeletionTimestamp is non-nil", item.identity)
            return nil
        }
        // TODO: It's only necessary to talk to the API server if this is a
        // "virtual" node. The local graph could lag behind the real status, but in
        // practice, the difference is small.
        // Fetch the object itself based on the identity stored in the item.
        latest, err := gc.getObject(item.identity)
        switch {
        case errors.IsNotFound(err):
            // the GraphBuilder can add "virtual" node for an owner that doesn't
            // exist yet, so we need to enqueue a virtual Delete event to remove
            // the virtual node from GraphBuilder.uidToNode.
            klog.V(5).Infof("item %v not found, generating a virtual delete event", item.identity)
            gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
            // since we're manually inserting a delete event to remove this node,
            // we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
            item.markObserved()
            return nil
        case err != nil:
            return err
        }

        // The UID does not match.
        if latest.GetUID() != item.identity.UID {
            klog.V(5).Infof("UID doesn't match, item %v not found, generating a virtual delete event", item.identity)
            gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
            // since we're manually inserting a delete event to remove this node,
            // we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
            item.markObserved()
            return nil
        }

        // TODO: attemptToOrphanWorker() routine is similar. Consider merging
        // attemptToOrphanWorker() into attemptToDeleteItem() as well.
        // The item is deleting its dependents (foreground deletion): process them.
        if item.isDeletingDependents() {
            return gc.processDeletingDependentsItem(item)
        }

        // compute if we should delete the item
        // Read metadata.ownerReferences from the object.
        ownerReferences := latest.GetOwnerReferences()
        if len(ownerReferences) == 0 {
            // Nothing to do for objects without an owner.
            klog.V(2).Infof("object %s's doesn't have an owner, continue on next item", item.identity)
            return nil
        }

        // solid: the owner exists and is not waiting for its dependents to be deleted.
        // dangling: the owner does not exist.
        // waitingForDependentsDeletion: the owner exists, its deletionTimestamp is
        // non-nil, and it has the foregroundDeletion finalizer.
        solid, dangling, waitingForDependentsDeletion, err := gc.classifyReferences(item, ownerReferences)
        if err != nil {
            return err
        }
        klog.V(5).Infof("classify references of %s.\nsolid: %#v\ndangling: %#v\nwaitingForDependentsDeletion: %#v\n", item.identity, solid, dangling, waitingForDependentsDeletion)

        switch {
        // At least one owner of the item exists and is not being deleted.
        case len(solid) != 0:
            klog.V(2).Infof("object %#v has at least one existing owner: %#v, will not garbage collect", solid, item.identity)
            if len(dangling) == 0 && len(waitingForDependentsDeletion) == 0 {
                return nil
            }
            klog.V(2).Infof("remove dangling references %#v and waiting references %#v for object %s", dangling, waitingForDependentsDeletion, item.identity)
            // waitingForDependentsDeletion needs to be deleted from the
            // ownerReferences, otherwise the referenced objects will be stuck with
            // the FinalizerDeletingDependents and never get deleted.
            // Collect the owner UIDs that have to be removed.
            ownerUIDs := append(ownerRefsToUIDs(dangling), ownerRefsToUIDs(waitingForDependentsDeletion)...)
            // Build the patch request body.
            patch := deleteOwnerRefStrategicMergePatch(item.identity.UID, ownerUIDs...)
            // Send the patch request.
            _, err = gc.patch(item, patch, func(n *node) ([]byte, error) {
                return gc.deleteOwnerRefJSONMergePatch(n, ownerUIDs...)
            })
            return err
        // An owner of the item is being deleted in foreground, and the item has dependents.
        case len(waitingForDependentsDeletion) != 0 && item.dependentsLength() != 0:
            deps := item.getDependents()
            // Iterate over the item's dependents.
            for _, dep := range deps {
                if dep.isDeletingDependents() {
                    // this circle detection has false positives, we need to
                    // apply a more rigorous detection if this turns out to be a
                    // problem.
                    // there are multiple workers run attemptToDeleteItem in
                    // parallel, the circle detection can fail in a race condition.
                    klog.V(2).Infof("processing object %s, some of its owners and its dependent [%s] have FinalizerDeletingDependents, to prevent potential cycle, its ownerReferences are going to be modified to be non-blocking, then the object is going to be deleted with Foreground", item.identity, dep.identity)
                    // Build a patch that unsets BlockOwnerDeletion on all of the item's
                    // ownerReferences, so the item no longer blocks deletion of its owners.
                    patch, err := item.unblockOwnerReferencesStrategicMergePatch()
                    if err != nil {
                        return err
                    }
                    // Apply the patch.
                    if _, err := gc.patch(item, patch, gc.unblockOwnerReferencesJSONMergePatch); err != nil {
                        return err
                    }
                    break
                }
            }
            klog.V(2).Infof("at least one owner of object %s has FinalizerDeletingDependents, and the object itself has dependents, so it is going to be deleted in Foreground", item.identity)
            // the deletion event will be observed by the graphBuilder, so the item
            // will be processed again in processDeletingDependentsItem. If it
            // doesn't have dependents, the function will remove the
            // FinalizerDeletingDependents from the item, resulting in the final
            // deletion of the item.
            policy := metav1.DeletePropagationForeground
            return gc.deleteObject(item.identity, &policy)
        default:
            // item doesn't have any solid owner, so it needs to be garbage
            // collected. Also, none of item's owners is waiting for the deletion of
            // the dependents, so set propagationPolicy based on existing finalizers.
            var policy metav1.DeletionPropagation
            switch {
            case hasOrphanFinalizer(latest):
                // if an existing orphan finalizer is already on the object, honor it.
                policy = metav1.DeletePropagationOrphan
            case hasDeleteDependentsFinalizer(latest):
                // if an existing foreground finalizer is already on the object, honor it.
                policy = metav1.DeletePropagationForeground
            default:
                // otherwise, default to background.
                policy = metav1.DeletePropagationBackground
            }
            klog.V(2).Infof("delete object %s with propagation policy %s", item.identity, policy)
            // Delete the ownerless object.
            return gc.deleteObject(item.identity, &policy)
        }
    }
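It mainly does the following:
1. If the item is being deleted with the Orphan or Background policy, return immediately.
2. If the item is being deleted with the Foreground policy, call processDeletingDependentsItem to handle the dependents that block its deletion by adding them to the attemptToDelete queue.
3. Get the item's owners and call classifyReferences to split them into three groups: solid (owners that exist and are not being deleted with the foregroundDeletion finalizer), dangling (owners that no longer exist), and waitingForDependentsDeletion (owners whose deletionTimestamp is non-nil and that carry the foregroundDeletion finalizer).
4. First case of the switch: solid is not empty, that is, the item has at least one owner that is not being deleted. If dangling and waitingForDependentsDeletion are both empty, return. Otherwise merge the UIDs of the two groups and send a patch request that removes the ownerReferences with those UIDs from the item.
5. Second case of the switch: waitingForDependentsDeletion is not empty and the item has dependents. Here none of the item's owners is solid, and at least one is being deleted in foreground mode. If one of the item's dependents is itself deleting its dependents, the item is patched so that it no longer blocks the deletion of its owners. Finally the item is deleted with the Foreground policy.
6. Third case of the switch: when none of the above applies, the item is deleted according to the finalizers on it, defaulting to the Background policy.
In more detail, processDeletingDependentsItem collects the item's dependents whose ownerReference to the item has BlockOwnerDeletion set to true. If there are none, it removes the item's foregroundDeletion finalizer. Otherwise it iterates over them and adds every dependent dep that has not started deleting its own dependents to the attemptToDelete queue.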
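The three-way switch summarized above can be reduced to a pure function over the sizes of the three owner groups. This is only a sketch under simplifying assumptions: `decide`, `has`, and the returned labels are hypothetical, and the real method works on API objects and sends requests rather than returning strings.

```go
package main

import "fmt"

// has reports whether list contains s.
func has(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// decide mirrors the switch in attemptToDeleteItem using only counts.
// The returned strings are labels chosen for this sketch.
func decide(solid, dangling, waiting, dependents int, finalizers []string) string {
	if solid+dangling+waiting == 0 {
		return "skip: no owners"
	}
	switch {
	case solid != 0:
		if dangling == 0 && waiting == 0 {
			return "keep"
		}
		return "patch: remove stale ownerReferences"
	case waiting != 0 && dependents != 0:
		return "delete: Foreground"
	default:
		// No solid owner is left: honor finalizers already on the object.
		switch {
		case has(finalizers, "orphan"):
			return "delete: Orphan"
		case has(finalizers, "foregroundDeletion"):
			return "delete: Foreground"
		default:
			return "delete: Background"
		}
	}
}

func main() {
	fmt.Println(decide(1, 0, 0, 0, nil))                // keep
	fmt.Println(decide(1, 1, 0, 0, nil))                // patch: remove stale ownerReferences
	fmt.Println(decide(0, 0, 1, 2, nil))                // delete: Foreground
	fmt.Println(decide(0, 1, 0, 0, []string{"orphan"})) // delete: Orphan
	fmt.Println(decide(0, 1, 0, 0, nil))                // delete: Background
}
```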
    // Process an item that is waiting for its dependents to be deleted.
    func (gc *GarbageCollector) processDeletingDependentsItem(item *node) error {
        // Dependents that block the deletion of the item.
        blockingDependents := item.blockingDependents()
        // No dependent blocks the deletion: remove the item's foregroundDeletion finalizer.
        if len(blockingDependents) == 0 {
            klog.V(2).Infof("remove DeleteDependents finalizer for item %s", item.identity)
            return gc.removeFinalizer(item, metav1.FinalizerDeleteDependents)
        }
        // Iterate over the dependents that block the deletion of the item.
        for _, dep := range blockingDependents {
            // If dep has not started deleting its own dependents, add it to the attemptToDelete queue.
            if !dep.isDeletingDependents() {
                klog.V(2).Infof("adding %s to attemptToDelete, because its owner %s is deletingDependents", dep.identity, item.identity)
                // Enqueue the dependent for deletion.
                gc.attemptToDelete.Add(dep)
            }
        }
        return nil
    }
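As a rough illustration of this step, the sketch below models a node with its blocking dependents and returns what the function would do instead of doing it. The `node` type here and `processDeletingDependents` are hypothetical simplifications, not the types from the garbage collector package.

```go
package main

import "fmt"

// node is a hypothetical, simplified graph node.
type node struct {
	name               string
	deletingDependents bool    // the node is itself in foreground deletion
	blocking           []*node // dependents whose ownerReference has blockOwnerDeletion=true
}

// processDeletingDependents reports whether the foregroundDeletion finalizer
// can be removed from item, and otherwise which dependents should be queued.
func processDeletingDependents(item *node) (removeFinalizer bool, enqueue []string) {
	if len(item.blocking) == 0 {
		return true, nil
	}
	for _, dep := range item.blocking {
		if !dep.deletingDependents {
			enqueue = append(enqueue, dep.name)
		}
	}
	return false, enqueue
}

func main() {
	pod := &node{name: "pod"}
	rs := &node{name: "replicaset", deletingDependents: true, blocking: []*node{pod}}
	deploy := &node{name: "deployment", deletingDependents: true, blocking: []*node{rs}}

	fmt.Println(processDeletingDependents(deploy)) // false []: rs is already deleting its dependents
	fmt.Println(processDeletingDependents(rs))     // false [pod]
	fmt.Println(processDeletingDependents(pod))    // true []
}
```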
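gc.classifyReferences(item, ownerReferences) iterates over the item's owner list and calls isDangling for each owner. Owners that no longer exist go into the dangling list; owners that are being deleted and carry the foregroundDeletion finalizer go into the waitingForDependentsDeletion list; owners that are not being deleted, or that do not carry the foregroundDeletion finalizer, go into the solid list.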
    // Classify latestReferences into three categories:
    // solid: the owner exists and is not waitingForDependentsDeletion
    // dangling: the owner does not exist
    // waitingForDependentsDeletion: the owner exists, its deletionTimestamp is non-nil, and it has FinalizerDeletingDependents
    func (gc *GarbageCollector) classifyReferences(item *node, latestReferences []metav1.OwnerReference) (
        solid, dangling, waitingForDependentsDeletion []metav1.OwnerReference, err error) {
        // Iterate over the owners of this node.
        for _, reference := range latestReferences {
            // Check whether the owner exists; isDangling == true means it does not.
            // On error the item ends up back in the attemptToDelete queue via AddRateLimited.
            isDangling, owner, err := gc.isDangling(reference, item)
            if err != nil {
                return nil, nil, nil, err
            }
            // Owners that do not exist go into the dangling slice.
            if isDangling {
                dangling = append(dangling, reference)
                continue
            }

            // The owner exists: get its accessor.
            ownerAccessor, err := meta.Accessor(owner)
            if err != nil {
                return nil, nil, nil, err
            }
            // The owner is being deleted and has the foregroundDeletion finalizer.
            if ownerAccessor.GetDeletionTimestamp() != nil && hasDeleteDependentsFinalizer(ownerAccessor) {
                // The owner is waiting for its dependents to be deleted.
                waitingForDependentsDeletion = append(waitingForDependentsDeletion, reference)
            } else {
                // The owner is not being deleted, or does not have the foregroundDeletion finalizer.
                solid = append(solid, reference)
            }
        }
        return solid, dangling, waitingForDependentsDeletion, nil
    }
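The classification rule itself is small enough to show on plain data. In the following sketch, which is an illustration rather than the real implementation, owners are looked up in an in-memory map instead of the API server; `owner`, `classify`, and the UID strings are hypothetical.

```go
package main

import "fmt"

// owner is a hypothetical snapshot of an owner object's deletion state.
type owner struct {
	deleting            bool // deletionTimestamp is non-nil
	foregroundFinalizer bool // carries the foregroundDeletion finalizer
}

// classify splits owner UIDs into the three groups used by the garbage
// collector. live maps the UIDs of existing owners to their state.
func classify(refs []string, live map[string]owner) (solid, dangling, waiting []string) {
	for _, uid := range refs {
		o, exists := live[uid]
		switch {
		case !exists:
			dangling = append(dangling, uid)
		case o.deleting && o.foregroundFinalizer:
			waiting = append(waiting, uid)
		default:
			solid = append(solid, uid)
		}
	}
	return solid, dangling, waiting
}

func main() {
	live := map[string]owner{
		"uid-a": {},
		"uid-b": {deleting: true, foregroundFinalizer: true},
		"uid-c": {deleting: true}, // background deletion still counts as solid
	}
	solid, dangling, waiting := classify([]string{"uid-a", "uid-b", "uid-c", "uid-d"}, live)
	fmt.Println(solid, dangling, waiting) // [uid-a uid-c] [uid-d] [uid-b]
}
```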
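gc.isDangling(reference, item) first checks the absentOwnerCache by owner UID to see whether the owner is already known to be missing. On a cache miss it builds the request parameters from the ownerReference and asks the apiserver whether the owner object can be found. If the owner is not found, or is found with a different UID, the UID is added to absentOwnerCache and the method returns true (dangling).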
    // isDangling checks whether a reference points to an object that doesn't exist.
    // If isDangling looks up the referenced object at the API server, it also
    // returns its latest state.
    func (gc *GarbageCollector) isDangling(reference metav1.OwnerReference, item *node) (
        dangling bool, owner *unstructured.Unstructured, err error) {
        if gc.absentOwnerCache.Has(reference.UID) {
            klog.V(5).Infof("according to the absentOwnerCache, object %s's owner %s/%s, %s does not exist", item.identity.UID, reference.APIVersion, reference.Kind, reference.Name)
            return true, nil, nil
        }
        // TODO: we need to verify the reference resource is supported by the
        // system. If it's not a valid resource, the garbage collector should i)
        // ignore the reference when decide if the object should be deleted, and
        // ii) should update the object to remove such references. This is to
        // prevent objects having references to an old resource from being
        // deleted during a cluster upgrade.
        resource, namespaced, err := gc.apiResource(reference.APIVersion, reference.Kind)
        if err != nil {
            return false, nil, err
        }

        // TODO: It's only necessary to talk to the API server if the owner node
        // is a "virtual" node. The local graph could lag behind the real
        // status, but in practice, the difference is small.
        owner, err = gc.dynamicClient.Resource(resource).Namespace(resourceDefaultNamespace(namespaced, item.identity.Namespace)).Get(reference.Name, metav1.GetOptions{})
        switch {
        case errors.IsNotFound(err):
            gc.absentOwnerCache.Add(reference.UID)
            klog.V(5).Infof("object %s's owner %s/%s, %s is not found", item.identity.UID, reference.APIVersion, reference.Kind, reference.Name)
            return true, nil, nil
        case err != nil:
            return false, nil, err
        }

        if owner.GetUID() != reference.UID {
            klog.V(5).Infof("object %s's owner %s/%s, %s is not found, UID mismatch", item.identity.UID, reference.APIVersion, reference.Kind, reference.Name)
            gc.absentOwnerCache.Add(reference.UID)
            return true, nil, nil
        }
        return false, owner, nil
    }
The attemptToOrphan queue
Now to the code:
    func (gc *GarbageCollector) runAttemptToOrphanWorker() {
        for gc.attemptToOrphanWorker() {
        }
    }
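The infinite loop keeps taking items from the attemptToOrphan queue and calls gc.orphanDependents(owner.identity, dependents), which removes the item's ownerReference from each of the item's dependents. If an error occurs along the way, the item is added back to the attemptToOrphan queue via AddRateLimited. Finally the orphan finalizer is removed from the item.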
    // attemptToOrphanWorker takes a node off attemptToOrphan, finds its dependents
    // based on the graph maintained by the GC, removes the node from the
    // OwnerReferences of its dependents, and finally updates the item to remove the
    // orphan finalizer. If any of these steps fails, the node is added back to
    // attemptToOrphan.
    func (gc *GarbageCollector) attemptToOrphanWorker() bool {
        item, quit := gc.attemptToOrphan.Get()
        gc.workerLock.RLock()
        defer gc.workerLock.RUnlock()
        if quit {
            return false
        }
        defer gc.attemptToOrphan.Done(item)
        owner, ok := item.(*node)
        if !ok {
            utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
            return true
        }
        // we don't need to lock each element, because they never get updated
        owner.dependentsLock.RLock()
        dependents := make([]*node, 0, len(owner.dependents))
        for dependent := range owner.dependents {
            dependents = append(dependents, dependent)
        }
        owner.dependentsLock.RUnlock()
        // Orphan the dependents.
        err := gc.orphanDependents(owner.identity, dependents)
        if err != nil {
            utilruntime.HandleError(fmt.Errorf("orphanDependents for %s failed with %v", owner.identity, err))
            gc.attemptToOrphan.AddRateLimited(item)
            return true
        }
        // update the owner, remove "orphaningFinalizer" from its finalizers list
        err = gc.removeFinalizer(owner, metav1.FinalizerOrphanDependents)
        if err != nil {
            utilruntime.HandleError(fmt.Errorf("removeOrphanFinalizer for %s failed with %v", owner.identity, err))
            gc.attemptToOrphan.AddRateLimited(item)
        }
        return true
    }
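gc.orphanDependents(owner.identity, dependents) iterates over the item's dependents and concurrently sends patch requests that remove, from each dependent, the ownerReference whose UID matches the item. Errors are sent to the errCh channel, and the caller finally receives them as one aggregated error: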
    // dependents are copies of pointers to the owner's dependents, they don't need to be locked.
    func (gc *GarbageCollector) orphanDependents(owner objectReference, dependents []*node) error {
        errCh := make(chan error, len(dependents))
        wg := sync.WaitGroup{}
        wg.Add(len(dependents))
        for i := range dependents {
            go func(dependent *node) {
                defer wg.Done()
                // the dependent.identity.UID is used as precondition
                patch := deleteOwnerRefStrategicMergePatch(dependent.identity.UID, owner.UID)
                _, err := gc.patch(dependent, patch, func(n *node) ([]byte, error) {
                    return gc.deleteOwnerRefJSONMergePatch(n, owner.UID)
                })
                // note that if the target ownerReference doesn't exist in the
                // dependent, strategic merge patch will NOT return an error.
                if err != nil && !errors.IsNotFound(err) {
                    errCh <- fmt.Errorf("orphaning %s failed, %v", dependent.identity, err)
                }
            }(dependents[i])
        }
        wg.Wait()
        close(errCh)

        var errorsSlice []error
        for e := range errCh {
            errorsSlice = append(errorsSlice, e)
        }

        if len(errorsSlice) != 0 {
            return fmt.Errorf("failed to orphan dependents of owner %s, got errors: %s", owner, utilerrors.NewAggregate(errorsSlice).Error())
        }
        klog.V(5).Infof("successfully updated all dependents of owner %s", owner)
        return nil
    }
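The concurrency pattern used here (one goroutine per dependent, a buffered error channel sized to the number of goroutines, and a WaitGroup) works with the standard library alone. The following is a minimal sketch of that pattern; `forEachConcurrently` and `errPatch` are hypothetical names, and the work function stands in for the patch request.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errPatch is a hypothetical error used to simulate a failed patch request.
var errPatch = errors.New("patch failed")

// forEachConcurrently runs work once per name in its own goroutine and
// collects all errors. The channel is buffered to len(names), so no
// goroutine can block on send even if every call fails.
func forEachConcurrently(names []string, work func(string) error) []error {
	errCh := make(chan error, len(names))
	var wg sync.WaitGroup
	wg.Add(len(names))
	for i := range names {
		go func(name string) {
			defer wg.Done()
			if err := work(name); err != nil {
				errCh <- fmt.Errorf("orphaning %s failed: %w", name, err)
			}
		}(names[i])
	}
	wg.Wait()
	close(errCh) // safe: all senders have finished

	var errs []error
	for e := range errCh {
		errs = append(errs, e)
	}
	return errs
}

func main() {
	errs := forEachConcurrently([]string{"pod-a", "pod-b", "pod-c"}, func(name string) error {
		if name == "pod-b" {
			return errPatch
		}
		return nil
	})
	fmt.Println(len(errs), errs) // 1 [orphaning pod-b failed: patch failed]
}
```

Closing the channel only after wg.Wait() returns is what makes the final range loop terminate.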
deleteOwnerRefStrategicMergePatch builds the patch request body. The same function is also called in the first switch case of the attemptToDelete loop.
    func deleteOwnerRefStrategicMergePatch(dependentUID types.UID, ownerUIDs ...types.UID) []byte {
        var pieces []string
        // Build one delete directive per owner UID to remove.
        for _, ownerUID := range ownerUIDs {
            pieces = append(pieces, fmt.Sprintf(`{"$patch":"delete","uid":"%s"}`, ownerUID))
        }
        // Assemble the patch request body.
        patch := fmt.Sprintf(`{"metadata":{"ownerReferences":[%s],"uid":"%s"}}`, strings.Join(pieces, ","), dependentUID)
        return []byte(patch)
    }
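To see what the generated patch looks like, the function can be run with sample UIDs. The sketch below copies its logic but uses plain strings instead of types.UID so that it needs only the standard library; the name `deleteOwnerRefPatch` and the UID values are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// deleteOwnerRefPatch follows deleteOwnerRefStrategicMergePatch, with string
// UIDs instead of types.UID so the example has no Kubernetes dependency.
func deleteOwnerRefPatch(dependentUID string, ownerUIDs ...string) []byte {
	var pieces []string
	for _, ownerUID := range ownerUIDs {
		pieces = append(pieces, fmt.Sprintf(`{"$patch":"delete","uid":"%s"}`, ownerUID))
	}
	patch := fmt.Sprintf(`{"metadata":{"ownerReferences":[%s],"uid":"%s"}}`,
		strings.Join(pieces, ","), dependentUID)
	return []byte(patch)
}

func main() {
	// Prints:
	// {"metadata":{"ownerReferences":[{"$patch":"delete","uid":"owner-1"},{"$patch":"delete","uid":"owner-2"}],"uid":"dep-1"}}
	fmt.Println(string(deleteOwnerRefPatch("dep-1", "owner-1", "owner-2")))
}
```

The top-level uid is the dependent's own UID. As the comment in orphanDependents notes, it acts as a precondition, so the patch only applies to the exact object the collector looked at.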
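Back to the original motivation
After our Redis middleware was containerized, a Redis cluster deployed in the test environment was unexpectedly deleted (including the redis exporter statefulset and the redis statefulset) after the Kubernetes apiserver restarted.
Locating the cause
We reproduced the problem several times in the development environment. After the apiserver restarted, the redis operator log showed no sign that the operator had actively deleted the Redis cluster (redis statefulset) or the monitoring instance (redis exporter). We then looked at the kube-controller-manager log, set its log level to --v=5, and reproduced the problem again, until the kube-controller-manager log finally showed what had happened.
According to the log, when the garbage collector processed the redis exporter statefulset, it found that the statefulset had ownerReferences and looked up its owner, the redisCluster object redis-0826, in the exporter's own namespace (monitoring). The redisCluster object redis-0826 actually lives in the kube-system namespace, so the lookup in the monitoring namespace returned 404 Not Found, and the garbage collector stored the fact that this owner does not exist (its UID) in the absentOwnerCache. Because the owner of the redis exporter statefulset appeared not to exist, the GC considered the statefulset garbage and deleted it. In the same way, when the redis statefulset was processed, the cache said that its owner did not exist, so it was collected and deleted as well.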
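The interaction between the namespaced lookup and the absentOwnerCache can be replayed in a small model. This is a sketch under stated assumptions, not the collector's code: `checker`, `key`, and `ownerRef` are hypothetical types, the API server is replaced by a map, the UID `uid-1` and the kind name `RedisCluster` are placeholders, and only namespaced owners are modeled.

```go
package main

import "fmt"

// ownerRef is a hypothetical, trimmed-down owner reference. Like the real
// one it carries no namespace.
type ownerRef struct{ Kind, Name, UID string }

// key identifies a stored object in this sketch.
type key struct{ Namespace, Kind, Name string }

// checker models isDangling: a cache of absent owner UIDs in front of a lookup.
type checker struct {
	store   map[key]string  // existing objects: location -> UID
	absent  map[string]bool // UIDs already recorded as missing
	lookups int             // number of simulated API calls
}

// isDangling looks the owner up in the dependent's namespace, because the
// reference itself has none, and caches negative results by UID.
func (c *checker) isDangling(ref ownerRef, dependentNamespace string) bool {
	if c.absent[ref.UID] {
		return true
	}
	c.lookups++
	uid, found := c.store[key{Namespace: dependentNamespace, Kind: ref.Kind, Name: ref.Name}]
	if !found || uid != ref.UID {
		c.absent[ref.UID] = true
		return true
	}
	return false
}

func main() {
	c := &checker{
		store: map[key]string{
			{Namespace: "kube-system", Kind: "RedisCluster", Name: "redis-0826"}: "uid-1",
		},
		absent: map[string]bool{},
	}
	ref := ownerRef{Kind: "RedisCluster", Name: "redis-0826", UID: "uid-1"}

	// The exporter in "monitoring" is processed first: the lookup misses.
	fmt.Println(c.isDangling(ref, "monitoring")) // true
	// The statefulset in "kube-system" is answered from the cache, although
	// its owner exists in that namespace.
	fmt.Println(c.isDangling(ref, "kube-system")) // true
	fmt.Println(c.lookups)                        // 1
}
```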
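After reproducing the fault many times, we found that it occurs with some probability whenever kube-controller-manager restarts. (When the apiserver restarts, kube-controller-manager also restarts itself after repeatedly failing to connect to the apiserver.) The fault is probabilistic because it depends on the order in which the garbage collector adds resource objects to the attemptToDelete queue:
If the exporter statefulset in the monitoring namespace is synced first and the redis statefulset in the kube-system namespace afterwards, the fault occurs; in the opposite order it does not. The order depends on how the garbage collector lists all resources in the cluster (listwatch) at startup. The fault does not occur while the apiserver and kube-controller-manager are running normally, and the garbage collector source explains why. The garbage collector maintains a graph of owner/dependent relationships. When controller-manager starts, the nodes do not yet exist in the graph, so the first case of the switch is taken; once the graph has been built, the second case is taken. In the second case a resource object is only added to the attemptToDelete queue when its owners change, so the fault does not appear while all components are running normally.
The graph can be fetched from the endpoint below; the IP and port are those of the controller-manager. The output can be redirected to a tmp.dot file for dot.exe.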
    curl 'http://127.0.0.1:10252/debug/controllers/garbagecollector/graph'
    curl 'http://127.0.0.1:10252/debug/controllers/garbagecollector/graph?uid=11211212edsaddkqedmk12'
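Then, with the Graphviz visualization tool, go to its bin directory and run the following command to generate an SVG file that can be opened in a browser. How to use Graphviz and dot is easy to look up online.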
    dot -Tsvg -o graph2.svg tmp.dot
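Solution
When the redis operator creates a Redis cluster, place the exporter in the same namespace as Redis.
Reflections
1. The fault was mainly caused by an owner reference across namespaces. When relying on garbage collection, follow the official Kubernetes documentation as closely as possible. As it states, owner references are by design not allowed across namespaces, which means:
1) Namespace-scoped dependents can only specify owners in the same namespace, or cluster-scoped owners.
2) Cluster-scoped dependents can only specify cluster-scoped owners, not namespace-scoped owners.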
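An operator could check these two rules before it sets an ownerReference. The helper below is a hypothetical sketch, not part of any Kubernetes library; it assumes that an empty namespace string stands for a cluster-scoped object.

```go
package main

import "fmt"

// validOwner reports whether a dependent may reference an owner under the
// two rules above. An empty namespace models a cluster-scoped object.
func validOwner(dependentNamespace, ownerNamespace string) bool {
	if ownerNamespace == "" {
		// Cluster-scoped owners may be referenced by any dependent.
		return true
	}
	// A namespaced owner must live in the dependent's namespace. A
	// cluster-scoped dependent ("") can therefore never reference it.
	return dependentNamespace == ownerNamespace
}

func main() {
	fmt.Println(validOwner("kube-system", "kube-system")) // true
	fmt.Println(validOwner("monitoring", "kube-system"))  // false: the case from this article
	fmt.Println(validOwner("monitoring", ""))             // true
	fmt.Println(validOwner("", "kube-system"))            // false
}
```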
References
Official garbage collection documentation:
https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/
A detailed look at how the Kubernetes garbage collector is implemented:
https://draveness.me/kubernetes-garbage-collector#