Fractal dimensions or how to generalize geometries in Python and how it affects data

For todays 30dayMapChallenge: Process I was processing a coastline dataset for Norway, trying to find out about the fractal dimension.

osmdata.openstreetmap.de has a nice extract of this here: https://osmdata.openstreetmap.de/data/coastlines.html, you just need to clip it for youre desired region. I was doing so by applying a buffer to the GADM dataset and then clipping by location in QGIS to get a Norway coastline line item:

First step in calculating the fractal dimensions: we need to reduce the resolution. I was doing this using a step-approach to be able to visualize differences in a GIF afterwards:

import geopandas as gpd
from shapely.geometry import LineString, MultiLineString
from shapely.ops import linemerge
import os

# -----------------------------
# CONFIGURATION
# -----------------------------
INPUT_PATH = "/Users/riccardoklinger/Documents/work/norwa_coastline_diss_25832.gpkg"  # your detailed coastline line dataset
OUTPUT_FOLDER = "simplified_steps"
os.makedirs(OUTPUT_FOLDER, exist_ok=True)

# 10 simplification levels
tolerance_values = [
    10,
    50,
    100,
    200,
    400,
    800,
    1500,
    2500,
    4000,
    7000,
    12000,
]  # in meters

min_length_threshold = 500

# -----------------------------
# LOAD DATA
# -----------------------------
print("Loading...")
gdf = gpd.read_file(INPUT_PATH)

# Ensure projected CRS (coastline fractal analysis requires metres!)
if gdf.crs is None or gdf.crs.to_epsg() in [4326, 4258]:
    raise ValueError(
        "Dataset must be in a projected CRS (meters). Reproject before running!"
    )

# dissolve all lines into a single geometry
merged = gdf.union_all()
# linemerge gets a cleaner MultiLineString
merged = linemerge(merged)
# ensure it's a MultiLineString
if isinstance(merged, LineString):
    merged = MultiLineString([merged])

# -----------------------------
# FUNCTION: simplify + remove small components
# -----------------------------
def process_step(lines: MultiLineString, tolerance: float, min_length: float):
    simplified_parts = []
    for geom in lines.geoms:
        simp = geom.simplify(tolerance, preserve_topology=False)
        if simp.length >= min_length:
            simplified_parts.append(simp)
    if len(simplified_parts) == 0:
        print("Warning: after simplifying, all components removed!")
        return MultiLineString([])

    return MultiLineString(simplified_parts)


# -----------------------------
# ITERATE THROUGH STEPS
# -----------------------------
current = merged

for i, tol in enumerate(tolerance_values, start=1):
    print(f"Step {i}: tolerance = {tol} m")
    current = process_step(current, tol, min_length_threshold)
    out_gdf = gpd.GeoDataFrame(geometry=[current], crs=gdf.crs)
    out_path = os.path.join(OUTPUT_FOLDER, f"coast_simplified_stepa_{i:02d}.gpkg")
    # Each step stored in its own GPKG; layer name required
    layer_name = f"step_{i:02d}"
    out_gdf.to_file(out_path, layer=layer_name, driver="GPKG")
    print("Saved:", out_path)

print("Done.")

After this I got 10 nice files each with a different simplicity:

Here is a small insight to the Vesterålen, Lofot and Ofot islands with different levels of simplification:

After this I was curious about the fractal dimensions of each coast line. The fractal dimension is somewhere between 1 and 2. a curved line is not a real linear feature (dimension is 1) and not a real area (dimension 2). But we need to agree: a curved line takes some area… and that’s the fractal dimension. You can read about it on good old wikipedia.

The mathematical concept is quite good explained here. A basic implementation in python might look like this:

# ---------------------------------------------
# Box-counting fractal dimension
# ---------------------------------------------
def box_counting_dimension(geom, epsilons):
    minx, miny, maxx, maxy = geom.bounds
    Ns = []

    for eps in epsilons:
        nx = int(math.ceil((maxx - minx) / eps))
        ny = int(math.ceil((maxy - miny) / eps))

        occupied = set()

        # For each linestring: sample points along the line
        for line in geom.geoms:
            length = line.length
            n_samples = max(int(length / (eps / 2)), 2)
            for d in np.linspace(0, length, n_samples):
                x, y = line.interpolate(d).coords[0]
                ix = int((x - minx) // eps)
                iy = int((y - miny) // eps)
                occupied.add((ix, iy))

        Ns.append(len(occupied))

    # Prepare regression
    X = np.log(1 / np.array(epsilons)).reshape(-1, 1)
    y = np.log(np.array(Ns))

    model = LinearRegression().fit(X, y)
    D = model.coef_[0]

    return D, Ns

In this process the ruler/box size is the interesting factor and I was using it with ranges from 1km till 100km. The srcipt looks like this in the end:

  • loads the gpkg
  • calculate dimensions
  • adds results to a dataframe
  • print the dataframe

Feel free to download the file here.
The final result is made in QGIS and EZGIF:

0 0 votes
Article Rating
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Inline Feedbacks
View all comments