# Parsing Apache2 logs

## Apache2 log format

Apache2 access logs in Apache's `combined` log format look like this:

```
IP-address - username [day/month/year:hour:minute:second +timezone] "VERB /path/to/page HTTP/1.1" status size "referrer" "user-agent"
```

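To see how the fields of this format line up, here is a minimal bash sketch that splits one line with a regular expression. The function name `parse_log_line` and the regex itself are illustrative assumptions, not part of Apache; the regex does not handle every edge case, such as escaped quotes inside the User-Agent.

```bash
# Hypothetical helper: split one combined-format line into named fields (bash only, uses [[ =~ ]])
parse_log_line() {
    local re='^([^ ]+) [^ ]+ ([^ ]+) \[([^]]+)\] "([^"]*)" ([0-9]{3}) ([0-9]+|-) "([^"]*)" "([^"]*)"$'
    if [[ $1 =~ $re ]]; then
        echo "ip=${BASH_REMATCH[1]}"
        echo "user=${BASH_REMATCH[2]}"
        echo "time=${BASH_REMATCH[3]}"
        echo "request=${BASH_REMATCH[4]}"
        echo "status=${BASH_REMATCH[5]}"
        echo "size=${BASH_REMATCH[6]}"
        echo "referrer=${BASH_REMATCH[7]}"
        echo "user_agent=${BASH_REMATCH[8]}"
    else
        echo "no match" >&2
        return 1
    fi
}

# The first example line from this guide
sample='87.250.224.52 - - [11/Jun/2018:18:40:16 +0300] "GET /~lvosandi/ HTTP/1.1" 200 640 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"'
parse_log_line "$sample"
```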
Examples:

```
87.250.224.52 - - [11/Jun/2018:18:40:16 +0300] "GET /~lvosandi/ HTTP/1.1" 200 640 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
54.36.148.150 - - [04/Apr/2018:09:13:30 +0300] "GET /~lvosandi/pics/lauri.jpeg HTTP/1.1" 404 465 "-" "Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)"
66.249.66.213 - - [13/Mar/2018:18:18:06 +0200] "GET /~lvosandi/check.html HTTP/1.1" 200 831 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
```

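The last example is a GET request for http://enos.itcollege.ee/~lvosandi/check.html, made from 66.249.66.213 on 13 March 2018 at 18:18:06 (UTC+02:00). Its User-Agent identifies a Chrome browser on a Nexus 5X phone, although the `Googlebot/2.1` token shows that the client is actually Google's crawler. The response size (excluding headers) was 831 bytes and the status code was 200 OK.
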
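The same reading can be done mechanically with awk's default whitespace splitting, where the IP address is field 1, the timestamp field 4, the verb and path fields 6 and 7, and the status and size fields 9 and 10. A minimal sketch, assuming the line has exactly the layout of the third example above:

```bash
# The third example line from above
sample='66.249.66.213 - - [13/Mar/2018:18:18:06 +0200] "GET /~lvosandi/check.html HTTP/1.1" 200 831 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'

# Default awk splitting on blanks: $1 = IP, $4 = "[" + date:time, $6 = quote + verb,
# $7 = path, $9 = status, $10 = size
summary=$(printf '%s\n' "$sample" | awk '{
    sub(/^\[/, "", $4)   # drop the opening bracket of the timestamp
    sub(/^"/, "", $6)    # drop the opening quote before the verb
    printf "%s %s from %s at %s, status %s, %s bytes\n", $6, $7, $1, $4, $9, $10
}')
echo "$summary"
```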
## Sample logs

Download some sample logs and extract them:

```bash
wget https://media.k-space.ee/apache2.tar
ionice -c 3 nice tar xvf apache2.tar # Extract with low CPU priority (nice) and idle I/O priority (ionice -c 3)
cd apache2
```

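If the archive is not at hand, a small stand-in can be built to try the commands below. This is a sketch, not the contents of `apache2.tar`: the file names only mimic the ones used in this guide, and the lines are the three examples from above.

```bash
# Sketch: create stand-in log files in a temporary directory.
# Assumption: these are NOT the real sample logs, only the three example lines above.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Uncompressed current log with two requests
cat > access.log <<'EOF'
87.250.224.52 - - [11/Jun/2018:18:40:16 +0300] "GET /~lvosandi/ HTTP/1.1" 200 640 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
54.36.148.150 - - [04/Apr/2018:09:13:30 +0300] "GET /~lvosandi/pics/lauri.jpeg HTTP/1.1" 404 465 "-" "Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)"
EOF

# Rotated logs are typically gzip-compressed; create two small ones
echo '66.249.66.213 - - [13/Mar/2018:18:18:06 +0200] "GET /~lvosandi/check.html HTTP/1.1" 200 831 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"' | gzip > access.log.1.gz
head -n 1 access.log | gzip > access.log.2.gz

echo "Stand-in logs created in $workdir"
ls -l
```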
## Bash command examples

Find the first ten GET requests in the Apache log file:

```bash
cat access.log | grep GET | head -n 10
```

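`grep GET` keeps every line that contains the string GET anywhere, including inside a URL or User-Agent. Matching GET directly after the opening double quote of the request line restricts the match to the request method. A small comparison on two hypothetical lines:

```bash
# Two hypothetical log lines (documentation address range 192.0.2.0/24):
# only the first is a GET request, but the second has "GET" inside its URL
log=$(printf '%s\n' \
  '192.0.2.1 - - [11/Jun/2018:18:40:16 +0300] "GET /index.html HTTP/1.1" 200 640 "-" "curl/7.58.0"' \
  '192.0.2.2 - - [11/Jun/2018:18:41:02 +0300] "POST /api/GETdata HTTP/1.1" 201 12 "-" "curl/7.58.0"')

# Loose match: "GET" anywhere in the line, so both lines count
loose=$(printf '%s\n' "$log" | grep -c GET)
# Stricter match: "GET " right after the opening quote of the request line
strict=$(printf '%s\n' "$log" | grep -c '"GET ')

echo "loose=$loose strict=$strict"
```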
Print the first column, which usually holds the IP address, treating columns as space-separated:

```bash
cat access.log | grep GET | cut -d " " -f 1 | head
cat access.log | grep GET | awk '{ print $1 }' | head
```

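Both commands give the same result on these logs because Apache separates these columns with single spaces. They differ when columns are separated by several blanks, which is worth knowing before reusing them elsewhere. A small demonstration on a made-up string:

```bash
# A made-up line with two spaces between "a" and "b"
line='a  b c'

# cut splits on every single space, so field 2 is the empty string between the two spaces
cut_field=$(printf '%s\n' "$line" | cut -d ' ' -f 2)
# awk (default field separator) treats any run of blanks as one separator, so $2 is "b"
awk_field=$(printf '%s\n' "$line" | awk '{ print $2 }')

printf 'cut: [%s]  awk: [%s]\n' "$cut_field" "$awk_field"
```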
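A slightly more complex awk example, where GET is searched for only in the column delimited by double quotes (the request line). With `-F\"` the request line is field 2 and everything before it is field 1, so a second awk prints just the IP address from field 1:
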
```bash
cat access.log | awk -F\" '{ if ($2 ~ "^GET ") print $1 }' | awk '{ print $1 }' | head
```

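The two awk calls can also be folded into one by applying awk's `split()` to the first field. A sketch that should be equivalent for well-formed lines, shown on one example line and one hypothetical POST line:

```bash
# The first example line from above plus a hypothetical POST line
log=$(printf '%s\n' \
  '87.250.224.52 - - [11/Jun/2018:18:40:16 +0300] "GET /~lvosandi/ HTTP/1.1" 200 640 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"' \
  '192.0.2.7 - - [11/Jun/2018:18:41:02 +0300] "POST /form HTTP/1.1" 200 12 "-" "curl/7.58.0"')

# Split on double quotes; when the request ($2) starts with "GET ",
# split the part before it ($1) on blanks and print its first word (the IP)
ips=$(printf '%s\n' "$log" | awk -F'"' '$2 ~ /^GET / { split($1, pre, " "); print pre[1] }')
echo "$ips"
```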
Everything from the second column to the end of the line:

```bash
cat access.log | grep GET | cut -d " " -f 2- | head
```

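`-f 2-` is one of several field selections `cut` accepts. A sketch of three forms (open-ended range, range from the start, and a list) on the third example line, with its User-Agent shortened for readability:

```bash
# The third example line from above, with the User-Agent shortened
line='66.249.66.213 - - [13/Mar/2018:18:18:06 +0200] "GET /~lvosandi/check.html HTTP/1.1" 200 831 "-" "Mozilla/5.0"'

# -f 2-    : field 2 to the end of the line (everything except the IP address)
printf '%s\n' "$line" | cut -d ' ' -f 2-
# -f -3    : from the start of the line up to field 3
printf '%s\n' "$line" | cut -d ' ' -f -3
# -f 1,4,5 : a list of fields, here the IP address and both halves of the timestamp
printf '%s\n' "$line" | cut -d ' ' -f 1,4,5
```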
Sort the IP addresses that appear in the log file:

```bash
cat access.log | grep GET | cut -d " " -f 1 | sort
```

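`sort` compares the addresses as text. That is enough for the counting steps below, but it does not put them in numeric order. A sketch of an octet-by-octet numeric sort using POSIX `sort` keys, on hypothetical addresses:

```bash
# Hypothetical addresses from the documentation ranges
ips=$(printf '%s\n' 192.0.2.10 192.0.2.9 198.51.100.1 192.0.2.100)

# Plain text sort: 192.0.2.10 and 192.0.2.100 come before 192.0.2.9
text_sorted=$(printf '%s\n' "$ips" | LC_ALL=C sort)
# Numeric sort per octet: split on "." and compare each of the four keys as a number
ip_sorted=$(printf '%s\n' "$ips" | sort -t . -k1,1n -k2,2n -k3,3n -k4,4n)

echo "$text_sorted"
echo "---"
echo "$ip_sorted"
```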
Count how many lines there are for each IP address:

```bash
cat access.log | grep GET | cut -d " " -f 1 | sort | uniq -c
```

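The `sort` before `uniq -c` is required because `uniq` only merges duplicate lines that are adjacent. A small demonstration on hypothetical addresses:

```bash
# Hypothetical addresses; 192.0.2.1 occurs twice but not on adjacent lines
ips=$(printf '%s\n' 192.0.2.1 192.0.2.2 192.0.2.1)

# Without sort, 192.0.2.1 is reported twice, each time with count 1
printf '%s\n' "$ips" | uniq -c
# After sort the duplicates are adjacent and counted together
printf '%s\n' "$ips" | sort | uniq -c
```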
Top 10 IP addresses that sent GET requests:

```bash
cat access.log | grep GET | cut -d " " -f 1 | sort | uniq -c | sort -n -r | head
```

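As an alternative to `sort | uniq -c`, awk can do the counting itself in an associative array, so only the final ranking needs `sort`. This is a different technique from the pipeline above; a sketch on hypothetical lines:

```bash
# Hypothetical log lines (documentation address range)
log=$(printf '%s\n' \
  '192.0.2.1 - - [11/Jun/2018:18:40:16 +0300] "GET / HTTP/1.1" 200 640 "-" "curl/7.58.0"' \
  '192.0.2.2 - - [11/Jun/2018:18:40:17 +0300] "GET /a HTTP/1.1" 200 100 "-" "curl/7.58.0"' \
  '192.0.2.1 - - [11/Jun/2018:18:40:18 +0300] "GET /b HTTP/1.1" 404 200 "-" "curl/7.58.0"')

# count[$1]++ tallies GET requests per IP, so no sort is needed before counting;
# the END block prints "count ip" and sort ranks the result by count
top=$(printf '%s\n' "$log" \
  | awk '/"GET / { count[$1]++ } END { for (ip in count) print count[ip], ip }' \
  | sort -n -r \
  | head)
echo "$top"
```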
Top 10 IP addresses from two compressed log files:

```bash
zcat access.log.1.gz access.log.2.gz \
| grep GET \
| cut -d " " -f 1 \
| sort \
| uniq -c \
| sort -n -r \
| head
```

Read several log files (some of them compressed), keep the relevant lines, cut out the first column, sort so that identical addresses end up next to each other, count them, sort by count and finally show the top 10:

```bash
(cat access.log; zcat access.log.1.gz access.log.2.gz) \
| grep GET \
| cut -d " " -f 1 \
| sort \
| uniq -c \
| sort -n -r \
| head
```

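Instead of a subshell with separate `cat` and `zcat`, GNU `zcat -f` passes files that are not gzip-compressed through unchanged, so plain and compressed logs can be read in one command. A sketch, assuming GNU gzip and hypothetical stand-in files:

```bash
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in files: one plain log and one gzip-compressed rotated log
printf '%s\n' '192.0.2.1 - - [11/Jun/2018:18:40:16 +0300] "GET / HTTP/1.1" 200 640 "-" "curl/7.58.0"' > access.log
printf '%s\n' \
  '192.0.2.1 - - [10/Jun/2018:12:00:00 +0300] "GET /a HTTP/1.1" 200 100 "-" "curl/7.58.0"' \
  '192.0.2.2 - - [10/Jun/2018:12:00:01 +0300] "GET /b HTTP/1.1" 200 100 "-" "curl/7.58.0"' \
  | gzip > access.log.1.gz

# zcat -f: decompress .gz files, copy the plain access.log as-is
top=$(zcat -f access.log access.log.*.gz \
  | grep GET \
  | cut -d " " -f 1 \
  | sort \
  | uniq -c \
  | sort -n -r \
  | head)
echo "$top"
```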
Most frequently requested URLs on the web server:

```bash
# $2 is the request line "VERB /path HTTP/x.y"; its second word is the URL
cat access.log \
| awk -F '"' '{ print $2 }' \
| cut -d ' ' -f 2 \
| sort \
| uniq -c \
| sort -n -r \
| head
```

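URLs with query strings are counted separately for every distinct query. If only the path matters, the query string can be cut off before counting. A sketch on hypothetical lines:

```bash
# Hypothetical log lines (documentation address range)
log=$(printf '%s\n' \
  '192.0.2.1 - - [11/Jun/2018:18:40:16 +0300] "GET /search?q=apache HTTP/1.1" 200 640 "-" "curl/7.58.0"' \
  '192.0.2.2 - - [11/Jun/2018:18:40:17 +0300] "GET /search?q=awk HTTP/1.1" 200 610 "-" "curl/7.58.0"' \
  '192.0.2.3 - - [11/Jun/2018:18:40:18 +0300] "GET /about HTTP/1.1" 200 300 "-" "curl/7.58.0"')

# Take the URL (second word of the request line) and drop everything from "?" on,
# so /search?q=apache and /search?q=awk are counted as the same path
top=$(printf '%s\n' "$log" \
  | awk -F '"' '{ split($2, req, " "); sub(/[?].*/, "", req[2]); print req[2] }' \
  | sort \
  | uniq -c \
  | sort -n -r \
  | head)
echo "$top"
```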
User agents that sent the most POST requests:

```bash
cat access.log \
| awk -F '"' '{ if ($2 ~ "^POST ") print $6}' \
| sort \
| uniq -c \
| sort -n -r \
| head
```

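Before filtering on a single method, it can help to see which methods occur at all. A sketch that counts requests per HTTP method using the same double-quote split, on hypothetical lines:

```bash
# Hypothetical log lines (documentation address range)
log=$(printf '%s\n' \
  '192.0.2.1 - - [11/Jun/2018:18:40:16 +0300] "GET / HTTP/1.1" 200 640 "-" "curl/7.58.0"' \
  '192.0.2.2 - - [11/Jun/2018:18:40:17 +0300] "POST /form HTTP/1.1" 200 12 "-" "Mozilla/5.0"' \
  '192.0.2.3 - - [11/Jun/2018:18:40:18 +0300] "GET /a HTTP/1.1" 200 100 "-" "Mozilla/5.0"')

# The method is the first word of the request line ($2)
methods=$(printf '%s\n' "$log" \
  | awk -F '"' '{ split($2, req, " "); print req[1] }' \
  | sort \
  | uniq -c \
  | sort -n -r)
echo "$methods"
```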
Find the requests that resulted in a 4xx (client error) status:

```bash
(cat access.log; zcat access.log*.gz) \
| awk -F \" '{ if ($3 ~ "^ 4[0-9][0-9] ") print $0}'
```

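To see which URLs cause the errors rather than the raw lines, the status code and path can be extracted and counted. A sketch on hypothetical lines:

```bash
# Hypothetical log lines (documentation address range)
log=$(printf '%s\n' \
  '192.0.2.1 - - [11/Jun/2018:18:40:16 +0300] "GET /missing.png HTTP/1.1" 404 209 "-" "curl/7.58.0"' \
  '192.0.2.2 - - [11/Jun/2018:18:40:17 +0300] "GET /missing.png HTTP/1.1" 404 209 "-" "curl/7.58.0"' \
  '192.0.2.3 - - [11/Jun/2018:18:40:18 +0300] "GET /admin HTTP/1.1" 403 199 "-" "curl/7.58.0"' \
  '192.0.2.4 - - [11/Jun/2018:18:40:19 +0300] "GET / HTTP/1.1" 200 640 "-" "curl/7.58.0"')

# $3 is " status size ": keep 4xx lines, print the status and the URL
# (second word of the request line in $2), then count identical pairs
errors=$(printf '%s\n' "$log" \
  | awk -F '"' '$3 ~ /^ 4[0-9][0-9] / { split($3, st, " "); split($2, req, " "); print st[1], req[2] }' \
  | sort \
  | uniq -c \
  | sort -n -r)
echo "$errors"
```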
Determine the uncompressed size of the compressed log files with pv and wc:

```bash
# pv shows the progress and the number of bytes that passed through
zcat *.gz | pv > /dev/null
# wc -c prints the total number of bytes
zcat *.gz | wc -c
```

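`gzip -l` can report the uncompressed size without decompressing, because it reads the size stored at the end of each gzip file. That stored value is the size modulo 2^32 (RFC 1952), so for logs of 4 GiB or more the `zcat | wc -c` approach above is the reliable one. A sketch comparing both on a small generated file, assuming GNU gzip's `-l` output layout:

```bash
set -eu
tmp=$(mktemp -d)

# Hypothetical test data: 1000 copies of one log-like line, gzip-compressed
yes '192.0.2.1 - - [11/Jun/2018:18:40:16 +0300] "GET / HTTP/1.1" 200 640 "-" "curl/7.58.0"' \
  | head -n 1000 \
  | gzip > "$tmp/access.log.1.gz"

# Exact size: decompress and count the bytes
exact=$(zcat "$tmp/access.log.1.gz" | wc -c)
# gzip -l prints a header line, then "compressed uncompressed ratio name";
# the uncompressed value comes from the size field in the gzip trailer
listed=$(gzip -l "$tmp/access.log.1.gz" | awk 'NR == 2 { print $2 }')

echo "exact=$exact listed=$listed"
```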
Formatting numbers:

```bash
SUURUS=$(zcat *.gz | wc -c)   # total uncompressed size in bytes

echo $SUURUS | numfmt --to=iec       # binary (IEC) units: K, M, G, ...
echo $SUURUS / 1024 / 1024 | bc      # whole MiB with bc (integer division)
expr $SUURUS / 1024 / 1024           # the same with expr

echo "Original size of the log files: $(echo $SUURUS | numfmt --to=iec-i)B"
echo "Original size of the log files: $(expr $SUURUS / 1048576) MiB"
```