Customization
Overview
Eraser uses a configmap to configure its behavior. The configmap is part of the deployment and it is not necessary to deploy it manually. Once deployed, the configmap can be edited at any time:
kubectl edit configmap --namespace eraser-system eraser-manager-config
If an eraser job is already running, the changes will not take effect until the job completes. The configuration is in yaml.
Key Concepts
Basic architecture
The manager runs as a pod in your cluster and manages ImageJobs. Think of an ImageJob as a unit of work, performed on every node in your cluster. Each node runs a sub-job. The goal of the ImageJob is to assess the images on your cluster's nodes, and to remove the images you don't want. There are two stages:
- Assessment
- Removal.
Scheduling
An ImageJob can either be created on-demand (see Manual Removal), or they can be spawned on a timer like a cron job. On-demand jobs skip the assessment stage and get right down to the business of removing the images you specified. The behavior of an on-demand job is quite different from that of timed jobs.
Fault Tolerance
Because an ImageJob runs on every node in your cluster, and the conditions on
each node may vary widely, some of the sub-jobs may fail. If you cannot
tolerate any failure, set the manager.imageJob.successRatio
property to
1.0
. If 75% success sounds good to you, set it to 0.75
. In that case, if
fewer than 75% of the pods spawned by the ImageJob report success, the job as
a whole will be marked as a failure.
This is mainly to help diagnose error conditions. As such, you can set
manager.imageJob.cleanup.delayOnFailure
to a long value so that logs can be
captured before the spawned pods are cleaned up.
Excluding Nodes
For various reasons, you may want to prevent Eraser from scheduling pods on
certain nodes. To do so, the nodes can be given a special label. By default,
this label is eraser.sh/cleanup.filter
, but you can configure the behavior with
the options under manager.nodeFilter
. The table provides more detail.
Configuring Components
An ImageJob is made up of various sub-jobs, with one sub-job for each node. These sub-jobs can be broken down further into three stages.
- Collection (What is on the node?)
- Scanning (What images conform to the policy I've provided?)
- Removal (Remove images based on the results of the above)
Of the above stages, only Removal is mandatory. The others can be disabled. Furthermore, manually triggered ImageJobs will skip right to removal, even if Eraser is configured to collect and scan. Collection and Scanning will only take place when:
- The collector and/or scanner
components
are enabled, AND - The job was not triggered manually by creating an ImageList.
Disabling scanner will remove all non-running images by default.
Swapping out components
The collector, scanner, and remover components can all be swapped out. This
enables you to build and host the images yourself. In addition, the scanner's
behavior can be completely tailored to your needs by swapping out the default
image with one of your own. To specify the images, use the
components.<component>.image.repo
and components.<component>.image.tag
,
where <component>
is one of collector
, scanner
, or remover
.
Universal Options
The following portions of the configmap apply no matter how you spawn your ImageJob. The values provided below are the defaults. For more detail on these options, see the table.
manager:
runtime:
name: containerd
address: unix:///run/containerd/containerd.sock
otlpEndpoint: "" # empty string disables OpenTelemetry
logLevel: info
profile:
enabled: false
port: 6060
imageJob:
successRatio: 1.0
cleanup:
delayOnSuccess: 0s
delayOnFailure: 24h
pullSecrets: [] # image pull secrets for collector/scanner/remover
priorityClassName: "" # priority class name for collector/scanner/remover
additionalPodLabels: {}
extraScannerVolumes: {}
extraScannerVolumeMounts: {}
nodeFilter:
type: exclude # must be either exclude|include
selectors:
- eraser.sh/cleanup.filter
- kubernetes.io/os=windows
components:
remover:
image:
repo: ghcr.io/eraser-dev/remover
tag: v1.0.0
request:
mem: 25Mi
cpu: 0
limit:
mem: 30Mi
cpu: 1000m
Component Options
components:
collector:
enabled: true
image:
repo: ghcr.io/eraser-dev/collector
tag: v1.0.0
request:
mem: 25Mi
cpu: 7m
limit:
mem: 500Mi
cpu: 0
scanner:
enabled: true
image:
repo: ghcr.io/eraser-dev/eraser-trivy-scanner
tag: v1.0.0
request:
mem: 500Mi
cpu: 1000m
limit:
mem: 2Gi
cpu: 0
config: |
# this is the schema for the provided 'trivy-scanner'. custom scanners
# will define their own configuration. see the below
remover:
image:
repo: ghcr.io/eraser-dev/remover
tag: v1.0.0
request:
mem: 25Mi
cpu: 0
limit:
mem: 30Mi
cpu: 1000m
Scanner Options
These options can be provided to components.scanner.config
. They will be
passed through as a string to the scanner container and parsed there. If you
want to configure your own scanner, you must provide some way to parse this.
Below are the values recognized by the provided eraser-trivy-scanner
image.
Values provided below are the defaults.
cacheDir: /var/lib/trivy # The file path inside the container to store the cache
dbRepo: ghcr.io/aquasecurity/trivy-db # The container registry from which to fetch the trivy database
deleteFailedImages: true # if true, remove images for which scanning fails, regardless of why it failed
deleteEOLImages: true # if true, remove images that have reached their end-of-life date
vulnerabilities:
ignoreUnfixed: true # consider the image compliant if there are no known fixes for the vulnerabilities found.
types: # a list of vulnerability types. for more info, see trivy's documentation.
- os
- library
securityChecks: # see trivy's documentation for more information
- vuln
severities: # in this case, only flag images with CRITICAL vulnerability for removal
- CRITICAL
ignoredStatuses: # a list of trivy statuses to ignore. See https://aquasecurity.github.io/trivy/v0.44/docs/configuration/filtering/#by-status.
timeout:
total: 23h # if scanning isn't completed before this much time elapses, abort the whole scan
perImage: 1h # if scanning a single image exceeds this time, scanning will be aborted
Detailed Options
Option | Description | Default |
---|---|---|
manager.runtime.name | The runtime to use for the manager's containers. Must be one of containerd, crio, or dockershim. It is assumed that your nodes are all using the same runtime, and there is currently no way to configure multiple runtimes. | containerd |
manager.runtime.address | The runtime socket address to use for the containers. Can provide a custom address for containerd and dockershim runtimes, but not for crio due to Trivy restrictions. | unix:///run/containerd/containerd.sock |
manager.otlpEndpoint | The endpoint to send OpenTelemetry data to. If empty, data will not be sent. | "" |
manager.logLevel | The log level for the manager's containers. Must be one of debug, info, warn, error, dpanic, panic, or fatal. | info |
manager.scheduling.repeatInterval | Use only when collector ando/or scanner are enabled. This is like a cron job, and will spawn an ImageJob at the interval provided. | 24h |
manager.scheduling.beginImmediately | If set to true, the fist ImageJob will run immediately. If false, the job will not be spawned until after the interval (above) has elapsed. | true |
manager.profile.enabled | Whether to enable profiling for the manager's containers. This is for debugging with go tool pprof . | false |
manager.profile.port | The port on which to expose the profiling endpoint. | 6060 |
manager.imageJob.successRatio | The ratio of successful image jobs required before a cleanup is performed. | 1.0 |
manager.imageJob.cleanup.delayOnSuccess | The amount of time to wait after a successful image job before performing cleanup. | 0s |
manager.imageJob.cleanup.delayOnFailure | The amount of time to wait after a failed image job before performing cleanup. | 24h |
manager.pullSecrets | The image pull secrets to use for collector, scanner, and remover containers. | [] |
manager.priorityClassName | The priority class to use for collector, scanner, and remover containers. | "" |
manager.additionalPodLabels | Additional labels for all pods that the controller creates at runtime. | {} |
manager.nodeFilter.type | The type of node filter to use. Must be either "exclude" or "include". | exclude |
manager.nodeFilter.selectors | A list of selectors used to filter nodes. | [] |
components.collector.enabled | Whether to enable the collector component. | true |
components.collector.image.repo | The repository containing the collector image. | ghcr.io/eraser-dev/collector |
components.collector.image.tag | The tag of the collector image. | v1.0.0 |
components.collector.request.mem | The amount of memory to request for the collector container. | 25Mi |
components.collector.request.cpu | The amount of CPU to request for the collector container. | 7m |
components.collector.limit.mem | The maximum amount of memory the collector container is allowed to use. | 500Mi |
components.collector.limit.cpu | The maximum amount of CPU the collector container is allowed to use. | 0 |
components.scanner.enabled | Whether to enable the scanner component. | true |
components.scanner.image.repo | The repository containing the scanner image. | ghcr.io/eraser-dev/eraser-trivy-scanner |
components.scanner.image.tag | The tag of the scanner image. | v1.0.0 |
components.scanner.request.mem | The amount of memory to request for the scanner container. | 500Mi |
components.scanner.request.cpu | The amount of CPU to request for the scanner container. | 1000m |
components.scanner.limit.mem | The maximum amount of memory the scanner container is allowed to use. | 2Gi |
components.scanner.limit.cpu | The maximum amount of CPU the scanner container is allowed to use. | 0 |
components.scanner.config | The configuration to pass to the scanner container, as a YAML string. | See YAML below |
components.scanner.volumes | Extra volumes for scanner. | {} |
components.remover.image.repo | The repository containing the remover image. | ghcr.io/eraser-dev/remover |
components.remover.image.tag | The tag of the remover image. | v1.0.0 |
components.remover.request.mem | The amount of memory to request for the remover container. | 25Mi |
components.remover.request.cpu | The amount of CPU to request for the remover container. | 0 |