This notebook presents a systematic movement data exploration workflow. The proposed workflow consists of five main steps:
The workflow is demonstrated using horse collar tracking data provided by Prof. Lene Fischer (University of Copenhagen) and the Center for Technology & Environment of Guldborgsund Municipality in Denmark, but it should be generic enough to be applied to other tracking datasets.
The workflow is implemented in Python using Pandas, GeoPandas, and MovingPandas (http://movingpandas.org).
For an interactive version of this notebook visit https://mybinder.org/v2/gh/anitagraser/movingpandas/master.
%matplotlib inline
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
import urllib
import os
import numpy as np
import pandas as pd
import geopandas as gpd
from geopandas import GeoDataFrame, read_file
from datetime import datetime, timedelta
from pyproj import CRS
import sys
sys.path.append("..")
import movingpandas as mpd
import warnings
warnings.simplefilter("ignore")
import hvplot.pandas # seems to be necessary for the following import to work
from holoviews import opts
opts.defaults(opts.Overlay(active_tools=['wheel_zoom']))
df = read_file('data/demodata_horse_collar.gpkg')
df['t'] = pd.to_datetime(df['timestamp'])
df = df.set_index('t').tz_localize(None)
print("This dataset contains {} records.\nThe first lines are:".format(len(df)))
df.head()
df.columns
# Drop columns that are not needed for the exploration
df = df.drop(columns=[
    'LMT_Date', 'LMT_Time',
    'Origin', 'SCTS_Date', 'SCTS_Time', 'Latitude [?]', 'Longitude [?]',
    'FixType', 'Main [V]', 'Beacon [V]', 'Sats', 'Sat',
    'C/N', 'Sat_1', 'C/N_1', 'Sat_2', 'C/N_2', 'Sat_3', 'C/N_3', 'Sat_4',
    'C/N_4', 'Sat_5', 'C/N_5', 'Sat_6', 'C/N_6', 'Sat_7', 'C/N_7', 'Sat_8',
    'C/N_8', 'Sat_9', 'C/N_9', 'Sat_10', 'C/N_10', 'Sat_11', 'C/N_11',
    'Easting', 'Northing',
])
df.head()
collar_id = df['CollarID'].unique()[0]
print("There is only one collar with ID {}.".format(collar_id))
df['Activity'].unique()
original_crs = df.crs
original_crs
The first step in our proposed EDA workflow can be performed directly on raw input data since it does not require temporally ordered data. It is therefore suitable as a first exploratory step when dealing with new data.
df.to_crs('EPSG:4326').hvplot(title='Geographic extent of the dataset', geo=True, tiles='OSM', width=500, height=500)
The main area (the horse's pasture?) is located south of Nykobing Strandhuse.
However, we also find two records on the road northwest of the main area. Both points were recorded on 2018-11-14, which is the first day of the dataset.
pd.DataFrame(df).sort_values('lat').tail(2)
A potential hypothesis for the origin of these two records is that the horse (or the collar) was transported on 2018-11-14, taking the road from Nykobing Falster south to the pasture.
If we remove these first two records from the dataset, the remainder of the records are located in a small area:
df = df.iloc[2:].to_crs('EPSG:4326')  # skip the first two records by position
(
    df.hvplot(title='OSM showing paths and fences', size=2, geo=True, tiles='OSM', width=500, height=500)
    + df.hvplot(title='Imagery showing land cover details', size=2, color='red', geo=True, tiles='EsriImagery', width=500, height=500)
)
It looks like the horse generally avoids areas without green vegetation, since point patterns in these areas appear sparser than elsewhere.
temp = df.to_crs(CRS(25832))
temp['geometry'] = temp['geometry'].buffer(5)
total_area = temp.dissolve(by='CollarID').area
total_area = total_area[collar_id]/10000
print('The total area covered by the data is: {:,.2f} ha'.format(total_area))
print("The dataset covers the time between {} and {}.".format(df.index.min(), df.index.max()))
print("That's {}".format(df.index.max() - df.index.min()))
df['No'].resample('1d').count().hvplot(title='Number of records per day')
On most days there are 48 (+/- 1) records. However, some days have more records (in November 2018 and later between May and August 2019).
There is one gap: 2019-10-18 has no records at all, the previous day contains only 37 records, and the following day only 27.
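The visual check of the daily counts can be complemented by a small tabular check. The following is a minimal sketch on synthetic timestamps, not on the collar data; the names `records`, `expected`, and `tolerance` are assumptions introduced for illustration, and the expected count of 48 simply follows from a nominal 30-minute sampling interval.

```python
import pandas as pd

# Synthetic stand-in for the collar data: one fix every 30 minutes over
# four days, with the whole third day removed to mimic a gap.
idx = pd.date_range("2019-10-16", "2019-10-19 23:30", freq="30min")
idx = idx[idx.normalize() != pd.Timestamp("2019-10-18")]
records = pd.DataFrame({"No": range(len(idx))}, index=idx)

# Count records per calendar day; days without any record show up as 0.
daily = records["No"].resample("1D").count()

expected = 48   # assumed nominal count at a 30-minute sampling interval
tolerance = 1   # assumed acceptable deviation

# Days with fewer or more records than the nominal count allows
low_days = daily[daily < expected - tolerance]
high_days = daily[daily > expected + tolerance]
print(low_days)
print(high_days)
```

Listing the low and high days separately keeps gaps apart from days on which the sampling rate was apparently increased.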
Considering that the dataset covers a whole year, it may be worthwhile to look at the individual months using small multiples map plots, for example:
df['Y-M'] = df.index.to_period('M')
a = None
for i in df['Y-M'].unique():
    plot = df[df['Y-M']==i].hvplot(title=str(i), size=2, geo=True, tiles='OSM', width=300, height=300)
    # Chain the monthly maps into one layout
    a = plot if a is None else a + plot
a
The largest change between months seems to be that the southernmost part of the pasture wasn't used in August and September 2019.
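An impression like this one can be cross-checked numerically by comparing the latitude range of the records per month. The sketch below uses made-up coordinates and assumes a `lat` column like the one used for sorting earlier in the notebook; the values are illustrative only.

```python
import pandas as pd

# Synthetic records: 'lat' stands in for the latitude of each fix
records = pd.DataFrame(
    {"lat": [54.740, 54.742, 54.745, 54.743, 54.746, 54.747]},
    index=pd.to_datetime([
        "2019-07-03", "2019-07-20", "2019-07-28",
        "2019-08-02", "2019-08-15", "2019-08-30",
    ]),
)

# Group by calendar month and summarize the north-south extent
monthly = records.groupby(records.index.to_period("M"))["lat"].agg(["min", "max", "count"])
print(monthly)
```

A month whose minimum latitude is clearly higher than that of the other months would support the observation that the southern part of the area was not visited.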
The second exploration step puts movement records in their temporal and geographic context. The exploration includes information based on consecutive movement data records, such as time between records (sampling intervals), speed, and direction. Therefore, this step requires temporally ordered data.
For example, tracking data of migratory animals is expected to exhibit seasonal changes. In vehicle tracking systems, however, such changes may indicate issues with data collection.
t = df.reset_index().t
df = df.assign(delta_t=t.diff().values)
df['delta_t'] = df['delta_t'].dt.total_seconds()/60
pd.DataFrame(df).hvplot.hist('delta_t', title='Histogram of intervals between consecutive records (in minutes)', bins=60, bin_range=(0, 60))
The time delta between consecutive records is usually around 30 minutes.
However, it seems that the interval has sometimes been decreased to around 15 minutes. This would explain why some days have more than the usual 48 records.
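To quantify how often each sampling interval occurs, the time deltas can be rounded and counted. This is a minimal sketch on synthetic timestamps; rounding to the nearest 5 minutes is an assumption made here to absorb small timing jitter, not something taken from the dataset.

```python
import pandas as pd

# Synthetic timestamps: 30-minute sampling that switches to 15 minutes
t = pd.DatetimeIndex(
    list(pd.date_range("2019-05-01 00:00", periods=5, freq="30min"))
    + list(pd.date_range("2019-05-01 02:15", periods=4, freq="15min"))
)

# Minutes between consecutive records (the first value is NaN)
delta_min = t.to_series().diff().dt.total_seconds() / 60

# Round each interval to the nearest 5 minutes and count occurrences
rounded = (delta_min.dropna() / 5).round().mul(5).astype(int)
interval_counts = rounded.value_counts().sort_index()
print(interval_counts)
```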
For example: Does the data contain unattainable speeds?
tc = mpd.TrajectoryCollection(df, 'CollarID')
traj = tc.trajectories[0]
traj.add_speed()
max_speed = traj.df.speed.max()
print("The highest computed speed is {:,.2f} m/s ({:,.2f} km/h)".format(max_speed, max_speed*3600/1000))
pd.DataFrame(traj.df).hvplot.hist('speed', title='Histogram of speeds (in meters per second)', bins=90)
The speed distribution shows no surprising patterns.
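For readers who want to see what a speed check involves, the following sketch computes speeds between consecutive fixes with the haversine formula and flags values above a threshold. It is an independent illustration, not the computation MovingPandas performs internally; the coordinates, the helper `haversine_m`, and the threshold `max_plausible` are assumptions chosen for the example.

```python
import math
from datetime import datetime

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters on a sphere with radius 6,371 km."""
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical fixes (time, lon, lat); the last one is deliberately
# several kilometers away only one minute after the previous fix
fixes = [
    (datetime(2018, 11, 15, 0, 0), 11.9000, 54.7400),
    (datetime(2018, 11, 15, 0, 30), 11.9005, 54.7402),
    (datetime(2018, 11, 15, 0, 31), 11.9500, 54.7800),
]
max_plausible = 15.0  # m/s, assumed upper bound for this example

# Speed of each segment between consecutive fixes, in meters per second
speeds = []
for (t0, x0, y0), (t1, x1, y1) in zip(fixes, fixes[1:]):
    speeds.append(haversine_m(x0, y0, x1, y1) / (t1 - t0).total_seconds())

suspicious = [s for s in speeds if s > max_plausible]
print(speeds, suspicious)
```

Which threshold counts as unattainable depends on the moving object, so the value used here should be replaced by a domain-specific limit.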
traj.add_direction(overwrite=True)
pd.DataFrame(traj.df).hvplot.hist('direction', title='Histogram of directions', bins=90)
There is some variation in movement directions but no directions stand out in the histogram.
Let's look at spatial patterns of direction and speed!
For example: Do nocturnal animal tracks show movement at night?
pd.DataFrame(traj.df).hvplot.heatmap(title='Mean speed by hour of day and month of year',
x='t.hour', y='t.month', C='speed', reduce_function=np.mean)
The movement speed by hour of day shows a clear pattern throughout the year, with earlier and longer periods of fast movement during the summer months and later, slower movement during the winter months.
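The numbers behind such a heatmap can also be inspected as a table. The sketch below builds a month-by-hour table of mean speeds with a pandas pivot table; the speed values are synthetic and the helper columns `month` and `hour` are introduced only for this example.

```python
import pandas as pd

# Synthetic speeds (m/s) with timestamps in January and July
records = pd.DataFrame(
    {"speed": [0.02, 0.04, 0.10, 0.20, 0.01, 0.03]},
    index=pd.to_datetime([
        "2019-01-10 08:00", "2019-01-11 08:30", "2019-01-10 14:00",
        "2019-07-10 08:00", "2019-07-10 23:00", "2019-07-11 23:30",
    ]),
)

# Derive the two grouping keys from the timestamp index
records = records.assign(month=records.index.month, hour=records.index.hour)

# Rows are months, columns are hours, cells hold the mean speed;
# combinations without any record remain NaN
table = records.pivot_table(index="month", columns="hour", values="speed", aggfunc="mean")
print(table)
```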
In addition to time, the dataset also contains temperature information for each record:
traj.df['n'] = 1
pd.DataFrame(traj.df).hvplot.heatmap(title='Record count by temperature and month of year',
x='Temp [?C]', y='t.month', C='n', reduce_function=np.sum)
pd.DataFrame(traj.df).hvplot.heatmap(title='Mean speed by temperature and month of year',
x='Temp [?C]', y='t.month', C='speed', reduce_function=np.mean)
For example: Do vessels follow traffic separation schemes defined in maritime maps? Are there any ship trajectories crossing land?
traj.df['dir_class'] = ((traj.df['direction']-22.5)/45).round(0)
a = None
temp = traj.df
for i in sorted(temp['dir_class'].unique()):
    plot = temp[temp['dir_class']==i].hvplot(geo=True, tiles='OSM', size=2, width=300, height=300, title=str(int(i*45))+"°")
    # Chain the per-direction maps into one layout
    a = plot if a is None else a + plot
a
There are no obvious spatial movement direction patterns.
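The direction classes used for these maps come from a short arithmetic expression that is easy to misread. The sketch below applies the same expression to a few sample bearings to show which 45° sector each one ends up in; the sample values are made up, and the assumption is that directions are given in degrees between 0 and 360.

```python
import pandas as pd

# Sample bearings in degrees (assumed range: 0 to 360)
direction = pd.Series([10.0, 50.0, 200.0, 359.0])

# Same expression as in the notebook: shifting by half a sector width
# before rounding makes class i cover roughly [45*i, 45*i + 45)
dir_class = ((direction - 22.5) / 45).round(0).astype(int)

# The map titles show the lower bound of each sector
sector_start = dir_class * 45
print(pd.DataFrame({"direction": direction, "class": dir_class, "sector_start": sector_start}))
```

Values that fall exactly on a sector boundary depend on the rounding rule, so the sample bearings deliberately avoid multiples of 45.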
traj.df['speed_class'] = (traj.df['speed']*2).round(1)
a = None
temp = traj.df
for i in sorted(temp['speed_class'].unique()):
    filtered = temp[temp['speed_class']==i]
    # Skip speed classes with too few records to show a pattern
    if len(filtered) < 10:
        continue
    plot = filtered.hvplot(geo=True, tiles='EsriImagery', color='red', size=2, width=300, height=300, title=str(i/2)) # alpha=max(0.05, 50/len(filtered)),
    a = plot if a is None else a + plot
a
Low speed records (classes 0.0 and 0.05 m/s) are distributed over the whole area with many points on the outline (fence?) of the area.
Medium speed records (classes 0.1 and 0.15 m/s) seem to be more common along paths and channels.
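The class labels 0.0, 0.05, 0.1, and 0.15 m/s result from doubling the speed, rounding to one decimal, and halving again for the title. The following sketch reproduces that arithmetic on made-up speeds; `min_count` is a hypothetical stand-in for the notebook's cutoff of 10 records, lowered here so that the tiny sample still produces output.

```python
import pandas as pd

# Made-up speeds in meters per second
speed = pd.Series([0.02, 0.07, 0.11, 0.13, 0.07, 0.07])

# Doubling before rounding to one decimal yields bins 0.05 m/s wide
speed_class = (speed * 2).round(1)
labels = sorted(c / 2 for c in speed_class.unique())

# Keep only classes with enough records (hypothetical cutoff)
min_count = 2
counts = speed_class.value_counts()
kept = sorted(c / 2 for c in counts[counts >= min_count].index)
print(labels, kept)
```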
The third exploration step looks at individual trajectories. It therefore requires that the continuous tracks are split into individual trajectories. Analysis results depend on how the continuous streams are divided into trajectories, locations, and events.
tc.hvplot()
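One common way to divide a continuous track is to start a new segment wherever the time between consecutive records exceeds a threshold. The sketch below shows this idea with plain pandas on synthetic timestamps; it is not the splitting method used by MovingPandas, and the threshold `max_gap` is an assumption chosen for the example.

```python
import pandas as pd

# Synthetic timestamps with one long gap between the third and fourth record
t = pd.to_datetime([
    "2019-10-17 22:00", "2019-10-17 22:30", "2019-10-17 23:00",
    "2019-10-19 09:00", "2019-10-19 09:30",
])
records = pd.DataFrame({"No": range(len(t))}, index=t)

max_gap = pd.Timedelta(hours=2)  # assumed threshold

# True wherever the time since the previous record exceeds the threshold
gap = records.index.to_series().diff() > max_gap

# The running sum of gap flags yields one segment id per record
records["segment"] = gap.cumsum().to_numpy()
sizes = records.groupby("segment").size()
print(sizes)
```

Because the resulting segments depend directly on the chosen threshold, it is worth repeating the split with a few different values before drawing conclusions.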