Commit Graph

99 Commits

Author SHA1 Message Date
Lauri Võsandi f3f5549c09 Enable multiarch
continuous-integration/drone Build is passing
2023-02-19 13:11:44 +02:00
Lauri Võsandi 9c4cda7c0a Deprecate disk exhausted errors
continuous-integration/drone Build was killed
2023-01-22 12:19:20 +02:00
Lauri Võsandi 1a005c89e2 Add Drone config
continuous-integration/drone Build is failing
2023-01-21 12:56:43 +02:00
Lauri Võsandi 17f653b15f Disable free disk space checks 2023-01-21 12:51:31 +02:00
Mehran Kholdi 3db48a0fd2 Release 0.8.0 2023-01-21 12:51:31 +02:00
Mehran Kholdi dc60350292 Support creating snapshots from btrfs volumes 2023-01-21 12:51:31 +02:00
Mehran Kholdi b12bbde73a Update base python version 2023-01-21 12:51:31 +02:00
Mehran Kholdi d389ee270d Neat: code cleanup 2023-01-21 12:51:31 +02:00
Mehran Kholdi 14fb741bdc Delete task pods even upon failure
To prevent cluttering the namespace with lots of failing task pods.
2023-01-21 12:51:31 +02:00
Lauri Võsandi 6ab8470221 Fix xfs_grow arguments
Signed-off-by: Lauri Võsandi <lauri@k-space.ee>
2022-11-29 07:09:18 +02:00
Mehran Kholdi ac45d74b7c Do not log `GetCapacity` requests
These are run periodically and not particularly interesting.
2021-11-19 19:25:25 +03:30
Mehran Kholdi 63c8eb44ba Fix race condition that was causing dangling loop devices
Apparently it is wrong to assume that `DeleteVolume` gets called
only after `UnstageVolume` returns success. This was causing the
disk image file to be deleted while the volume was still mounted.
This would prevent the loop device from getting detached and in
turn disk space from getting reclaimed.
2021-11-19 18:59:49 +03:30
Mehran Kholdi 9d5ed19d7b Fix bug with negative capacity in overprovisioned disks 2021-10-07 18:09:59 +03:30
Mehran Kholdi 520864be1a Release 0.7.0 2021-10-07 17:13:24 +03:30
Mehran Kholdi 110dee7d3d Enable "Storage Capacity Tracking" 2021-10-02 21:22:25 +03:30
Mehran Kholdi 45d1ab1aa3 Refuse to create/resize volumes in case of insufficient disk space 2021-10-02 14:43:26 +03:30
Mehran Kholdi 50437acf16 Increase controller's timeout to prevent retry loops
Since remote tasks might take a bit longer to get scheduled, it's
reasonable to increase this timeout. Specifically, we faced an
issue with a `DeleteVolume` action timing out over and over since
it was running a bit over the default timeout.
2021-08-03 01:02:50 +04:30
Mehran Kholdi 877e90e034 Expose volume stats as prometheus metrics
This should help in:

- Keeping track of deleted PVs with `Retain` policy
- Detecting disk overprovisioning
2021-07-05 00:00:10 +04:30
Mehran Kholdi 2b6a0a33b8 Refactor: Extract utility functions out of metrics module 2021-07-04 23:15:50 +04:30
Mehran Kholdi c651f69e9c Specify fs type in mount commands 2021-07-04 23:15:50 +04:30
Mehran Kholdi 2fb84efb6d Neat: reformat code using black 2021-07-02 20:31:34 +04:30
Mehran Kholdi 7717264801 Update CSI proto to 1.5.0 2021-07-02 20:31:34 +04:30
Mehran Kholdi e585684502 Release 0.5.0 2021-07-01 23:48:23 +04:30
Mehran Kholdi 6d8c7738f3 Do not create volumes smaller than 16MiB
XFS fails in formatting the volume with the following error:

```
agsize (2560 blocks) too small, need at least 4096 blocks
```
2021-07-01 23:48:23 +04:30
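The 16MiB floor from the XFS commit above can be illustrated with a small size check. This is a minimal sketch, not the driver's actual code: `MIN_VOLUME_SIZE` and `validate_size` are hypothetical names, and the commit message does not say whether smaller requests are rejected or rounded up. This sketch assumes rejection.

```python
# Hypothetical constant: 16 MiB, the floor named in the commit title.
MIN_VOLUME_SIZE = 16 * 1024 * 1024


def validate_size(requested_bytes: int) -> int:
    """Return the requested size, or raise if it is below the assumed minimum."""
    if requested_bytes < MIN_VOLUME_SIZE:
        raise ValueError(
            f"requested {requested_bytes} bytes, minimum is {MIN_VOLUME_SIZE}"
        )
    return requested_bytes
```

Rejecting early keeps the failure at request time instead of surfacing later as an `mkfs.xfs` error.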
Mehran Kholdi eff26e8c3e Drop support for k8s <1.19
So that we can:
* Rely on existence of newer features
* Update external components' images
2021-07-01 23:48:23 +04:30
Mehran Kholdi c454a51ccd Nit: cleanup e2e test scripts 2021-07-01 23:48:23 +04:30
Mehran Kholdi 4d6d83c24a Support xfs filesystem 2021-07-01 22:34:20 +04:30
Mehran Kholdi 7c7e8eb4ce btrfs: Change default subvol upon creation
The default root subvol comes with its own limitations and it might be
better off changing the default subvol upon creation. This should also
let us create hidden subvols that may be used for storing snapshots,
without exposing them to the end-user.
2021-07-01 22:34:20 +04:30
Mehran Kholdi c11646e08c btrfs: Mount with `flushoncommit` flag
Not an expert on this, but my understanding is that without this flag,
outages will result in a state that despite being consistent, most
applications are not mature enough to handle. Namely, we ran benchmarks
that reproduced appearance of zero-length files upon sudden poweroffs.

Databases should be fine since they know well about the guarantees the
filesystem must provide, but not all applications are databases. So let's
play safe about this.

See:
- https://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/
- https://github.com/Zygo/bees/issues/68#issuecomment-403262059
2021-06-26 03:51:40 +04:30
Mehran Kholdi d52f8ffbe0 ext4: Do not reserve free space for root user upon creation
PVCs are data volumes most of the time, and reserving space for system
tasks is probably unnecessary.

The user can still modify a specific PVC's reserved blocks through the
`tune2fs` command.
2021-06-26 03:51:40 +04:30
Mehran Kholdi 87e78705b1 Report "available" space rather than "free" space in volume stats
These two numbers may differ, and having the wrong number may result in
a volume having no useable space, while the metrics suggest it does.
2021-06-26 03:51:40 +04:30
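The difference between "free" and "available" space in the commit above can be seen with `os.statvfs`, which reports both. This is a sketch under assumptions: `volume_space` is a hypothetical helper, and the commit does not say which call the driver actually uses.

```python
import os
from typing import Tuple


def volume_space(path: str) -> Tuple[int, int]:
    """Return (free_bytes, available_bytes) for the filesystem holding path."""
    st = os.statvfs(path)
    # f_bfree counts all free blocks, including those reserved for root.
    free = st.f_bfree * st.f_frsize
    # f_bavail counts only blocks usable by unprivileged processes.
    available = st.f_bavail * st.f_frsize
    return free, available
```

On an ext4 volume with reserved blocks, `free` can stay positive after `available` has reached zero, which is the mismatch the commit message describes.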
Mehran Kholdi 1cd4ca3d1f Refactor: code cleanup 2021-06-26 02:51:02 +04:30
Mehran Kholdi 89de295293 Fix race condition by making the scrub function idempotent
Under certain situations, a race condition could lead to pvc deletion
tasks getting stuck in a failing state.
2021-06-26 02:50:39 +04:30
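One common way to make a cleanup step idempotent, as the scrub commit above describes, is to treat "already gone" as success. The sketch below is an assumption about the general technique only: this `scrub` is a hypothetical stand-in that removes a volume directory, and the real function's behavior is not shown in the commit message.

```python
import shutil
from pathlib import Path


def scrub(volume_dir: Path) -> None:
    """Remove a volume directory; succeed if it has already been removed."""
    try:
        shutil.rmtree(volume_dir)
    except FileNotFoundError:
        # An earlier or concurrent run already cleaned up, so there is
        # nothing left to do and the task should not be marked as failed.
        pass
```

With this shape, a retried deletion task finishes cleanly instead of failing repeatedly on a path that no longer exists.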
Mehran Kholdi 8db829ed6e Update dependencies 2021-06-26 01:14:00 +04:30
Mehran Kholdi fd2e59929b Fix bug with online resizing btrfs filesystems having non-default subvol
```
Command 'losetup -c /dev/loop0[/default]' returned non-zero exit status 1.
```
2021-03-01 13:45:25 +03:30
Mehran Kholdi 5dc8afc0a6 Fix bug that was preventing btrfs filesystems from being resized 2021-03-01 08:38:31 +03:30
Mehran Kholdi d203eba5a9 Release 0.4.4 2021-02-26 17:56:47 +03:30
Mehran Kholdi 5edcdff216 Fix #5: Actually delete PVC image files 2021-02-26 16:10:10 +03:30
Mehran Kholdi 8bbb30a2e1 Release 0.4.3 2021-02-13 02:40:42 +03:30
Hanieh Marvi bd68bd6e64 Fix typo 2021-02-13 02:03:04 +03:30
Hanieh Marvi ba7f4c1b7f Remove requests from tasks
So pods do not stay in pending state because of lack of resources.
2021-02-13 02:03:04 +03:30
Hanieh Marvi 8424536588 Set resources for sidecar container 2021-02-13 02:03:04 +03:30
Mehran Kholdi ab50217ea5 Release 0.4.2 2021-01-16 04:01:22 +03:30
Mehran Kholdi b4faf9d7cb Expose volume metrics through gRPC calls rather than metrics endpoint 2021-01-16 03:58:08 +03:30
Mehran Kholdi c58dd14bf7 Extract blockdevice-to-filesystem logic from rawfile servicer
Summary: So that it's possible to use it with any other blockdevice provider.

Test Plan: N/A

Reviewers: sina_rad, h.marvi, mhyousefi, s.afshari

Differential Revision: https://phab.hamravesh.ir/D870
2021-01-16 03:58:08 +03:30
Mehran Kholdi 01a35354b6 Fix a bug where broken symlinks were not being cleaned up
See: https://docs.python.org/3/library/pathlib.html#pathlib.Path.exists
"Note If the path points to a symlink, exists() returns whether the symlink points to an existing file or directory."
2021-01-16 03:45:09 +03:30
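The `Path.exists()` caveat quoted in the symlink commit above means a broken symlink reports `exists() == False`, so a cleanup guarded only by `exists()` skips it. A minimal sketch of a check that catches this case follows; `is_broken_symlink` and `remove_broken_symlinks` are hypothetical helpers, not the driver's code.

```python
from pathlib import Path
from typing import List


def is_broken_symlink(path: Path) -> bool:
    """True if path is a symlink whose target does not exist."""
    # is_symlink() inspects the link itself; exists() follows it to the target.
    return path.is_symlink() and not path.exists()


def remove_broken_symlinks(directory: Path) -> List[str]:
    """Unlink every broken symlink directly inside directory; return their names."""
    removed = []
    for entry in directory.iterdir():
        if is_broken_symlink(entry):
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

Checking `is_symlink()` first is what distinguishes a dangling link from a path that is simply absent.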
Mehran Kholdi c2110108cb Change conditions upon which e2e tests are run 2020-11-28 04:50:30 +03:30
Mehran Kholdi 9bafb101ac Remove liveness probes 2020-11-28 04:50:11 +03:30
Mehran Kholdi 05c661165f Fix ci setup script
So that it does not explicitly depend on travis
2020-11-08 01:46:08 +03:30
Mehran Kholdi b88fd0cfdf Release 0.4.1 2020-09-11 20:45:17 +04:30