Commit Graph

96 Commits

Author SHA1 Message Date
6648d0070c PV grow workaround
2022-11-28 22:40:20 +02:00
439954e3ed Add Drone config 2022-11-28 19:07:45 +02:00
Mehran Kholdi
4e0a4fe698 Release 0.8.0 2022-03-30 13:20:57 +04:30
Mehran Kholdi
0a130f42ff Support creating snapshots from btrfs volumes 2022-03-30 13:15:46 +04:30
Mehran Kholdi
c978b3290b Update base python version 2022-03-30 12:47:15 +04:30
Mehran Kholdi
22f2fb1628 Neat: code cleanup 2022-03-30 12:47:14 +04:30
Mehran Kholdi
2d1fa49b2a Delete task pods even upon failure
To prevent cluttering the namespace with lots of failing task pods.
2022-01-22 00:19:00 +03:30
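The change amounts to wrapping the task in a try/finally. Below is a minimal sketch. It assumes hypothetical `run` and `delete_pod` callables that stand in for the driver's real Kubernetes calls, which this log does not show:

```python
from typing import Callable


def run_task(pod_name: str,
             run: Callable[[str], None],
             delete_pod: Callable[[str], None]) -> None:
    """Run a task pod, then delete it whether or not the task succeeded.

    `run` and `delete_pod` are hypothetical hooks standing in for the
    driver's actual pod management code.
    """
    try:
        run(pod_name)  # may raise if the task fails
    finally:
        # Runs on success and on failure, so failed pods do not pile up
        # in the namespace; any exception from `run` still propagates.
        delete_pod(pod_name)
```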
Mehran Kholdi
ac45d74b7c Do not log GetCapacity requests
These are run periodically and not particularly interesting.
2021-11-19 19:25:25 +03:30
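One way to keep periodic calls out of the logs is a standard `logging.Filter`. This sketch assumes the log message contains the RPC method name. The log does not show how the driver actually formats its request logs, and the logger name `csi` is illustrative:

```python
import logging
from typing import Iterable


class SkipMethodsFilter(logging.Filter):
    """Drop log records that mention any of the given gRPC method names."""

    def __init__(self, methods: Iterable[str]) -> None:
        super().__init__()
        self.methods = tuple(methods)

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        # Returning False tells the logging framework to discard the record.
        return not any(method in message for method in self.methods)


# Usage: attach to whichever logger emits the request logs.
logging.getLogger("csi").addFilter(SkipMethodsFilter(["GetCapacity"]))
```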
Mehran Kholdi
63c8eb44ba Fix race condition that was causing dangling loop devices
Apparently it is wrong to assume that `DeleteVolume` gets called
only after `UnstageVolume` returns success. This was causing the
disk image file to be deleted while the volume was still mounted.
This would prevent the loop device from getting detached and in
turn disk space from getting reclaimed.
2021-11-19 18:59:49 +03:30
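`DeleteVolume` can arrive while the volume is still staged. A defensive delete can therefore check whether the image is still in use and fail, so the caller retries later, rather than remove the image immediately. The sketch below works under that assumption and is not necessarily the fix made in this commit. `is_attached` is a hypothetical hook; a real implementation might, for example, parse the output of `losetup -j <image>`:

```python
from pathlib import Path
from typing import Callable


class VolumeInUseError(RuntimeError):
    """Signals that deletion should be retried once the volume is unstaged."""


def delete_volume(image: Path, is_attached: Callable[[Path], bool]) -> None:
    # Deleting the image while a loop device still references it would
    # prevent the loop device from being detached and its space reclaimed.
    if is_attached(image):
        raise VolumeInUseError(f"{image} is still attached to a loop device")
    # missing_ok=True (Python 3.8+) makes a repeated DeleteVolume a no-op.
    image.unlink(missing_ok=True)
```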
Mehran Kholdi
9d5ed19d7b Fix bug with negative capacity in overprovisioned disks 2021-10-07 18:09:59 +03:30
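With sparse disk images, the space promised to existing volumes can exceed what is physically free, so a naive subtraction goes negative. The sketch below shows the clamp. It assumes capacity is computed as free space minus the space reserved for existing images; the driver's exact accounting is not shown in this log:

```python
def available_capacity(free_bytes: int, reserved_bytes: int) -> int:
    """Capacity to report for new volumes.

    free_bytes: free space on the backing disk.
    reserved_bytes: space promised to existing (sparse) images but not yet
    written. On an overprovisioned disk this exceeds free_bytes.
    """
    # Never report a negative capacity; "no room left" is simply zero.
    return max(0, free_bytes - reserved_bytes)
```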
Mehran Kholdi
520864be1a Release 0.7.0 2021-10-07 17:13:24 +03:30
Mehran Kholdi
110dee7d3d Enable "Storage Capacity Tracking" 2021-10-02 21:22:25 +03:30
Mehran Kholdi
45d1ab1aa3 Refuse to create/resize volumes in case of insufficient disk space 2021-10-02 14:43:26 +03:30
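The check compares the extra space a request needs against what the disk has left. Below is a minimal sketch. `InsufficientSpaceError` and the parameter names are illustrative, and the log does not show how the driver maps this condition to a gRPC status:

```python
class InsufficientSpaceError(RuntimeError):
    pass


def ensure_space(requested_bytes: int, current_bytes: int, available_bytes: int) -> None:
    """Refuse a create (current_bytes == 0) or resize that would not fit.

    Only the growth beyond the volume's current size needs new space.
    """
    needed = max(0, requested_bytes - current_bytes)
    if needed > available_bytes:
        raise InsufficientSpaceError(
            f"need {needed} more bytes, only {available_bytes} available"
        )
```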
Mehran Kholdi
50437acf16 Increase controller's timeout to prevent retry loops
Since remote tasks might take a bit longer to get scheduled, it's
reasonable to increase this timeout. Specifically, we faced an
issue with a `DeleteVolume` action timing out over and over since
it was running a bit over the default timeout.
2021-08-03 01:02:50 +04:30
Mehran Kholdi
877e90e034 Expose volume stats as prometheus metrics
This should help with:

- Keeping track of deleted PVs with `Retain` policy
- Detecting disk overprovisioning
2021-07-05 00:00:10 +04:30
Mehran Kholdi
2b6a0a33b8 Refactor: Extract utility functions out of metrics module 2021-07-04 23:15:50 +04:30
Mehran Kholdi
c651f69e9c Specify fs type in mount commands 2021-07-04 23:15:50 +04:30
Mehran Kholdi
2fb84efb6d Neat: reformat code using black 2021-07-02 20:31:34 +04:30
Mehran Kholdi
7717264801 Update CSI proto to 1.5.0 2021-07-02 20:31:34 +04:30
Mehran Kholdi
e585684502 Release 0.5.0 2021-07-01 23:48:23 +04:30
Mehran Kholdi
6d8c7738f3 Do not create volumes smaller than 16MiB
XFS fails to format the volume with the following error:

```
agsize (2560 blocks) too small, need at least 4096 blocks
```
2021-07-01 23:48:23 +04:30
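The 16MiB floor matches the error above if XFS uses its default 4 KiB block size. Under that assumption, 4096 blocks * 4 KiB = 16 MiB, while the reported agsize of 2560 blocks comes to only 10 MiB. The sketch below applies the floor to a CSI `CapacityRange` (`required_bytes` and `limit_bytes`, where 0 means unset). It assumes small requests are rounded up rather than rejected, which the commit message does not say:

```python
MiB = 1024 * 1024
MIN_VOLUME_BYTES = 16 * MiB

# Assuming 4 KiB XFS blocks, the 4096-block minimum from mkfs.xfs is 16 MiB.
assert 4096 * 4096 == MIN_VOLUME_BYTES


def volume_size(required_bytes: int, limit_bytes: int = 0) -> int:
    """Choose a volume size that honours the request and the 16 MiB floor."""
    size = max(required_bytes, MIN_VOLUME_BYTES)
    # A limit of 0 means "no upper bound" in the CSI CapacityRange.
    if limit_bytes and size > limit_bytes:
        raise ValueError(
            f"cannot satisfy CapacityRange: size {size} exceeds limit {limit_bytes}"
        )
    return size
```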
Mehran Kholdi
eff26e8c3e Drop support for k8s <1.19
So that we can:
* Rely on existence of newer features
* Update external components' images
2021-07-01 23:48:23 +04:30
Mehran Kholdi
c454a51ccd Nit: cleanup e2e test scripts 2021-07-01 23:48:23 +04:30
Mehran Kholdi
4d6d83c24a Support xfs filesystem 2021-07-01 22:34:20 +04:30
Mehran Kholdi
7c7e8eb4ce btrfs: Change default subvol upon creation
The default root subvol comes with its own limitations and it might be
better to change the default subvol upon creation. This should also
let us create hidden subvols that may be used for storing snapshots,
without exposing them to the end-user.
2021-07-01 22:34:20 +04:30
Mehran Kholdi
c11646e08c btrfs: Mount with flushoncommit flag
Not an expert on this, but my understanding is that without this flag,
outages will result in a state that despite being consistent, most
applications are not mature enough to handle. Namely, we ran benchmarks
that reproduced the appearance of zero-length files upon sudden poweroffs.

Databases should be fine since they know well about the guarantees the
filesystem must provide, but not all applications are databases. So let's
play it safe here.

See:
- https://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/
- https://github.com/Zygo/bees/issues/68#issuecomment-403262059
2021-06-26 03:51:40 +04:30
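For the mount invocation, this means passing `-o flushoncommit` for btrfs. The sketch below uses a hypothetical `mount_command` helper that builds the argument list and also passes `-t` explicitly. The log does not show how the driver actually assembles its mount options:

```python
from typing import List


def mount_command(device: str, target: str, fs_type: str) -> List[str]:
    """Build a mount(8) argument list, adding flushoncommit for btrfs."""
    cmd = ["mount", "-t", fs_type]
    if fs_type == "btrfs":
        # Make each transaction commit also flush data dirtied before it,
        # trading some throughput for fewer zero-length files after a crash.
        cmd += ["-o", "flushoncommit"]
    return cmd + [device, target]
```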
Mehran Kholdi
d52f8ffbe0 ext4: Do not reserve free space for root user upon creation
PVCs are data volumes most of the time, and reserving space for system
tasks is probably unnecessary.

The user can still modify a specific PVC's reserved blocks through the
`tune2fs` command.
2021-06-26 03:51:40 +04:30
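`mkfs.ext4` sets the reserved-block percentage with `-m` (5% by default), and `tune2fs -m` changes it on an existing filesystem. The sketch below only builds the argument lists, and both helper names are hypothetical:

```python
from typing import List


def mkfs_command(fs_type: str, device: str) -> List[str]:
    """Build the mkfs argument list for a new volume."""
    if fs_type == "ext4":
        # -m 0: reserve no blocks for the root user.
        return ["mkfs.ext4", "-m", "0", device]
    return [f"mkfs.{fs_type}", device]


def reserve_blocks_command(device: str, percent: float) -> List[str]:
    """Re-enable a root reservation later on one specific volume."""
    return ["tune2fs", "-m", str(percent), device]
```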
Mehran Kholdi
87e78705b1 Report "available" space rather than "free" space in volume stats
These two numbers may differ, and having the wrong number may result in
a volume having no usable space, while the metrics suggest it does.
2021-06-26 03:51:40 +04:30
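On Linux, the two numbers map directly onto `statvfs(3)`. `f_bfree` counts all free blocks, including those reserved for root, while `f_bavail` counts only the blocks unprivileged processes can use. The sketch below uses Python's `os.statvfs`, which is Unix only:

```python
import os


def volume_stats(path: str) -> dict:
    """Return total/free/available bytes for the filesystem holding `path`."""
    st = os.statvfs(path)
    return {
        "total": st.f_blocks * st.f_frsize,
        # Includes blocks reserved for root (e.g. ext4's default 5%).
        "free": st.f_bfree * st.f_frsize,
        # What an unprivileged workload can actually write; report this one.
        "available": st.f_bavail * st.f_frsize,
    }
```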
Mehran Kholdi
1cd4ca3d1f Refactor: code cleanup 2021-06-26 02:51:02 +04:30
Mehran Kholdi
89de295293 Fix race condition by making the scrub function idempotent
In certain situations, a race condition could lead to PVC deletion
tasks getting stuck in a failing state.
2021-06-26 02:50:39 +04:30
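A cleanup step is usually made idempotent by treating "already gone" as success, so that a retried task converges instead of failing. Below is a minimal sketch. It assumes the scrub removes a per-volume directory, which is a hypothetical layout; the log does not show the driver's actual scrub logic:

```python
import shutil
from pathlib import Path


def scrub(volume_dir: Path) -> None:
    """Remove a volume's directory; calling it again is a no-op."""
    try:
        shutil.rmtree(volume_dir)
    except FileNotFoundError:
        # An earlier run already removed it; treat that as success.
        pass
```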
Mehran Kholdi
8db829ed6e Update dependencies 2021-06-26 01:14:00 +04:30
Mehran Kholdi
fd2e59929b Fix bug with online resizing btrfs filesystems having non-default subvol
```
Command 'losetup -c /dev/loop0[/default]' returned non-zero exit status 1.
```
2021-03-01 13:45:25 +03:30
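The `[/default]` suffix appears to be how `findmnt` reports the filesystem root (here, the btrfs subvolume) behind a mount. It is not part of the device path that `losetup -c` (`--set-capacity`) expects. The sketch below strips the suffix, assuming the device string comes from `findmnt`-style output. Both helper names are hypothetical:

```python
from typing import List


def strip_fsroot(source: str) -> str:
    """'/dev/loop0[/default]' -> '/dev/loop0'; plain paths pass through."""
    return source.split("[", 1)[0]


def set_capacity_command(source: str) -> List[str]:
    # Ask the loop driver to re-read the size of its (grown) backing file.
    return ["losetup", "-c", strip_fsroot(source)]
```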
Mehran Kholdi
5dc8afc0a6 Fix bug that was preventing btrfs filesystems from being resized 2021-03-01 08:38:31 +03:30
Mehran Kholdi
d203eba5a9 Release 0.4.4 2021-02-26 17:56:47 +03:30
Mehran Kholdi
5edcdff216 Fix #5: Actually delete PVC image files 2021-02-26 16:10:10 +03:30
Mehran Kholdi
8bbb30a2e1 Release 0.4.3 2021-02-13 02:40:42 +03:30
Hanieh Marvi
bd68bd6e64 Fix typo 2021-02-13 02:03:04 +03:30
Hanieh Marvi
ba7f4c1b7f Remove requests from tasks
So pods do not stay in pending state because of lack of resources.
2021-02-13 02:03:04 +03:30
Hanieh Marvi
8424536588 Set resources for sidecar container 2021-02-13 02:03:04 +03:30
Mehran Kholdi
ab50217ea5 Release 0.4.2 2021-01-16 04:01:22 +03:30
Mehran Kholdi
b4faf9d7cb Expose volume metrics through gRPC calls rather than metrics endpoint 2021-01-16 03:58:08 +03:30
Mehran Kholdi
c58dd14bf7 Extract blockdevice-to-filesystem logic from rawfile servicer
Summary: So that it's possible to use it with any other blockdevice provider.

Test Plan: N/A

Reviewers: sina_rad, h.marvi, mhyousefi, s.afshari

Differential Revision: https://phab.hamravesh.ir/D870
2021-01-16 03:58:08 +03:30
Mehran Kholdi
01a35354b6 Fix a bug where broken symlinks were not being cleaned up
See: https://docs.python.org/3/library/pathlib.html#pathlib.Path.exists
"Note: If the path points to a symlink, exists() returns whether the symlink points to an existing file or directory."
2021-01-16 03:45:09 +03:30
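`Path.exists()` follows the link, so a dangling symlink looks like a missing file and gets skipped. `is_symlink()` inspects the link itself and catches this case. Below is a minimal sketch with a hypothetical function name:

```python
from pathlib import Path


def cleanup_symlink(link: Path) -> bool:
    """Remove `link` if it is a symlink, even one whose target is gone."""
    # link.exists() would be False for a dangling link; is_symlink() is not.
    if link.is_symlink():
        link.unlink()
        return True
    return False
```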
Mehran Kholdi
c2110108cb Change conditions upon which e2e tests are run 2020-11-28 04:50:30 +03:30
Mehran Kholdi
9bafb101ac Remove liveness probes 2020-11-28 04:50:11 +03:30
Mehran Kholdi
05c661165f Fix ci setup script
So that it does not explicitly depend on Travis
2020-11-08 01:46:08 +03:30
Mehran Kholdi
b88fd0cfdf Release 0.4.1 2020-09-11 20:45:17 +04:30
Mehran Kholdi
23c7912977 Update chart's default values 2020-09-11 20:44:56 +04:30
Mehran Kholdi
a2cf384d4f Make logs less noisy 2020-09-11 20:44:40 +04:30
Mehran Kholdi
6fde8e0271 Update external csi sidecar containers 2020-09-11 20:44:29 +04:30