Health Checks¶
Health checks are the heart of ExaCheck; a list of health checks can be defined in the checks
key.
There are two categories of health checks:
- Local: Local checks are for scripts executed on the local machine or testing if a file exists/doesn't exist.
- Remote: Remote checks are anything that uses the network. This may be a HTTP request, DNS request etc. The remote checks have additional options not available to local checks.
Configuration Keys¶
The following top level configuration keys apply to health checks.
Key | Type | Default |
---|---|---|
name |
String | undef |
args |
Dict | undef |
prefixes |
List[IPNetwork] | undef |
nexthop |
String | self |
interval |
Integer/Float | 15 |
rise |
Integer | 3 |
fall |
Integer | 3 |
description |
Optional String | undef |
path_id |
Optional Integer/IPv4 Address | undef |
metric |
Optional Integer | undef |
metric_down |
Optional Integer | undef |
local_preference |
Optional Integer | undef |
disable |
Optional String | undef |
neighbors |
Optional List[IPAddress] | undef |
communities |
Optional List[String] | undef |
as_path |
Optional String | undef |
Name¶
The name of this health check. The name is used for logging purposes as well as optional filtering of logs/notifications.
Args¶
The args
key contains the actual arguments for each health check. These are things such as the health check type and parameters for the service to be marked as up.
Warning
The args
key must contain a method
value. The method is used to configure which health check needs to run. The current list of available health check methods are:
Method Name | Description |
---|---|
dns |
Query a DNS server and optionally validate the response. |
file |
Test if a file path exists or doesn't exist. |
http |
Send HTTP or HTTPS requests and validate the response and/or SSL certificates. |
icmp |
Send ICMP ping requests and validate the latency is within an acceptable range. |
ntp |
NTP health check for time servers. |
shell |
Run the supplied script and validate the exit code returned is successful. |
tcp |
Validate a TCP connection can be opened to the supplied host/port. |
Both local and remote health checks support the timeout
key which defines the time limit for how long a check execution can take. This timeout prevents a health check being in a stuck condition; the check will be terminated if the timeout is reached. By default the timeout is set to 10 seconds.
Remote Check Args¶
Remote health check methods (such as http
, icmp
, dns
) support the following common args:
Key | Type | Default |
---|---|---|
host |
String | undef |
address_family |
Optional String | undef |
all_valid |
Bool | False |
Host¶
The host
field contains the IP address or hostname that the health check should be executed against. If a hostname is supplied, each health check execution will perform a DNS resolution to ensure the current IP is used.
As the hostname may resolve to multiple addresses and/or address families (eg. IPv4 and IPv6) the address_family
and all_valid
options can be used to control what happens in those situations.
Warning
If specifying a host name instead of an IP address, temporary DNS resolution errors will cause the health check to fail. Specify an IP address to avoid this behaviour.
Address Family¶
The address_family
key is used when host
is set to a hostname. Should the hostname resolve to an IPv4 and IPv6 address you may want the check to only be sent to a single address family rather than both.
The values ipv4
or ipv6
are supported. If not defined there is no filtering for IPv4 or IPv6 addresses applied.
All Valid¶
The all_valid
key is used when host
is set to a hostname. If the hostname resolves to multiple IP addresses and all_valid
is set to True
, the health check will be executed against all IP addresses available. Should the health check to any IP address fail the service will be marked as down.
If set to the default False
value a successful health check from any IP address is considered valid and the service will be marked up.
Prefixes¶
The prefixes
array contains the list of IP addresses or networks to advertise when the check is considered up. IPv4 or IPv6 addresses/networks may be defined. The addresses/networks must all be of the same address family; you cannot mix IPv4 and IPv6 in the same health check (instead you would define an IPv4 and IPv6 health check).
Warning
If the nexthop
value is set to an IP address, the addresses/networks must be of the same address family.
Nexthop¶
The nexthop
attribute defines the next hop to advertise for the addresses/prefixes for the service. May be set to an IPv4 or IPv6 address or the value self
.
Interval¶
The interval
value defines how often the health check is executed in seconds.
Rise¶
The rise
value defines how many health checks must be successful in a row before the service is considered as up.
Fall¶
The fall
value defines how many health checks must fail in a row before the service is considered as down.
Description¶
The description
is an optional value which is not used by ExaCheck; you may enter a description to make the configuration easier to read.
Path ID¶
The path_id
attribute can be used for handling of ECMP routes. If there is a single server running ExaCheck and there are two different health checks which advertise the same prefix, the path ID must be set to identify the route. If it is not defined it will result in unpredictable behaviour for announcements/withdrawals and ECMP may fail to work depending on the router vendor.
Metric¶
If the metric
value is set, routes will be advertised with this MED value.
Metric Down¶
If the metric_down
value is set, rather than withdrawing routes with failed health checks the route will be announced with the supplied metric.
Danger
This feature is currently not working.
Local Preference¶
The local_preference
value may be defined to advertise routes with a specific local preference.
Disable¶
The disable
key may be set to a path on the local host filesystem. If the file path exists it will result in the service being marked as down and the routes withdrawn.
This can be used to take down a service manually by touching a file (eg. for maintenance).
Neighbors¶
By default, ExaBGP will advertise the supplied route to all available neighbors. If you have multiple neighbors configured and you want to filter an advertisement to one or more specific neighbors only, set the neighbors
attribute to a list of IP addresses.
Communities¶
To add BGP communities to advertised routes set the communities
attribute to a list of BGP community values.
Regular, large and extended BGP communities are supported.
AS Path¶
To override the AS path of advertised routes you may set the as_path value
.
Examples¶
These are some configuration examples of some health checks.
---
# Various example health checks
checks:
- name: Example DNS Service
description: Perform a basic SOA query for example.com to 192.0.2.255. If the query returns a response, 192.0.2.255 would be advertised with BGP.
args:
method: dns
host: 192.0.2.255 # Note DNS queries are being sent to 192.0.2.255 which should be bound to loopback
query: example.com
prefixes:
- 192.0.2.255
nexthop: self
- name: Example Basic ICMP Check
description: Ensure that the IP 8.8.8.8 responds to ICMP ping requests
args:
method: icmp
host: 8.8.8.8
prefixes:
- 192.0.2.255
nexthop: self
- name: Example file exists check
description: Verify that the file path "/var/run/exists" exists
args:
method: file
path: /var/run/exists
prefixes:
- 192.0.2.255
nexthop: self
- name: Example Basic Shell Check
description: Run the command "true" (will always return exit code 0)
args:
method: shell
command: true
prefixes:
- 192.0.2.255
nexthop: self