HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation

Hari Krishna Gadi

ICLR 2026 · International Conference on Learning Representations

Hari Krishna Gadi¹,², Daniel Matos¹, Hongyi Luo¹,², Lu Liu¹, Yongliang Wang¹, Yanfeng Zhang¹, Liqiu Meng²

¹ Huawei Riemann Lab · ² Technical University of Munich

Abstract

A New Geometry for the Globe

Visual geolocalization, the task of predicting where an image was taken, remains challenging due to global scale, visual ambiguity, and the inherently hierarchical structure of geography. Existing paradigms rely on large-scale retrieval, which requires storing a large number of image embeddings; on grid-based classifiers, which ignore geographic continuity; or on generative models, which struggle with fine detail.

We introduce an entity-centric formulation of geolocation that replaces image-to-image retrieval with a compact hierarchy of geographic entities embedded in hyperbolic space. Images are aligned to country, region, subregion, and city entities through Geo-Weighted Hyperbolic contrastive learning, which incorporates haversine distance directly into the contrastive objective.
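For readers unfamiliar with it, the haversine distance is the great-circle distance between two latitude/longitude points on a sphere. A minimal sketch follows; the function name `haversine_km` and the mean Earth radius constant are assumptions chosen for illustration, not values taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

A quarter turn along the equator, `haversine_km(0, 0, 0, 90)`, equals one quarter of the sphere's circumference under this radius.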

This hierarchical design enables interpretable predictions and efficient inference: on the OSV5M benchmark it searches 240k entity embeddings instead of more than 5 million image embeddings. It reduces mean geodesic error by 19.5% and improves fine-grained subregion accuracy by 43%, both in relative terms.

Figures & Results

Key Findings

HierLoc Architecture: Images are encoded and mapped with exp₀ into the Lorentz model of hyperbolic space, while entities (countries, regions, subregions, cities) combine image, text, and location features. In the tangent space at the origin, cross-modal attention aligns each image with the entities at each hierarchy level. Training uses the Geo-Weighted Hyperbolic InfoNCE (GWH-InfoNCE) loss.
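The exact form of GWH-InfoNCE is not given on this page. The sketch below shows one hypothetical way to fold geographic distance into an InfoNCE-style loss: negatives that lie close to the true location are down-weighted, so they are penalized less than distant ones. The function name, the weighting `1 - exp(-g / sigma_km)`, and the defaults for `tau` and `sigma_km` are all assumptions for illustration and should not be read as the authors' formulation.

```python
import math


def geo_weighted_info_nce(hyp_dists, geo_dists_km, pos, tau=0.1, sigma_km=500.0):
    """Hypothetical geo-weighted InfoNCE for one image.

    hyp_dists:    hyperbolic distance from the image to each candidate entity
    geo_dists_km: haversine distance from each entity to the true location
    pos:          index of the positive (ground-truth) entity
    """
    # Closer in hyperbolic space means a larger logit.
    logits = [-d / tau for d in hyp_dists]
    # Assumed weighting: the positive keeps weight 1, and negatives near
    # the true location get a weight close to 0.
    weights = [1.0 if j == pos else 1.0 - math.exp(-g / sigma_km)
               for j, g in enumerate(geo_dists_km)]
    m = max(logits)  # subtract the max for numerical stability
    denom = sum(w * math.exp(l - m) for w, l in zip(weights, logits))
    return -(logits[pos] - m) + math.log(denom)
```

Under this weighting, a negative 10 km from the true location contributes far less to the loss than one 5000 km away at the same hyperbolic distance.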

Hyperbolic Projections: The exp₀ and log₀ maps provide a bidirectional mapping between the tangent space at the origin O and hyperbolic space, preserving the tree-like hierarchical relationships that are critical for representing geographic entities in the Lorentz model (curvature K = −1).
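As a concrete reference, the standard origin maps of the Lorentz (hyperboloid) model with curvature −1 can be written in a few lines. This is a textbook sketch using plain Python lists, not the paper's code; the function names are chosen here for illustration. A tangent vector v at the origin maps to (cosh‖v‖, sinh‖v‖ · v/‖v‖), and log₀ inverts that map.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def exp0(v):
    """Map a tangent vector at the origin onto the hyperboloid (K = -1)."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        return [1.0] + [0.0] * len(v)  # the origin itself
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * c for c in v]


def log0(x):
    """Inverse of exp0: map a hyperboloid point back to the tangent space."""
    space = x[1:]
    norm = math.sqrt(sum(c * c for c in space))
    if norm == 0.0:
        return [0.0] * len(space)
    dist = math.acosh(max(x[0], 1.0))  # geodesic distance from the origin
    return [dist * c / norm for c in space]


def lorentz_distance(x, y):
    """Geodesic distance between two hyperboloid points."""
    return math.acosh(max(-lorentz_inner(x, y), 1.0))
```

Every point returned by `exp0` satisfies `lorentz_inner(x, x) == -1` up to rounding, and its distance from the origin equals the Euclidean norm of the input tangent vector.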

Geographic Error Distribution: Visualization of mean error across global regions on the OSV5M dataset. HierLoc achieves a median localization error of 25.3 km and outperforms previous state-of-the-art methods, with a 19.5% relative improvement in mean geodesic error.

Table 1 OSV5M Results

HierLoc achieves state-of-the-art results on OSV5M, with a median error of 25.3 km.

Method	GeoScore ↑	Mean dist. (km) ↓	Country (%) ↑	Region (%) ↑	Subregion (%) ↑	City (%) ↑
SC 0-shot	2273	2854	38.4	20.8	9.9	14.8
Regression	3028	1481	56.5	16.3	1.5	0.7
ISNs	3331	2308	66.8	39.4	—	4.2
Hybrid	3361	1814	68.0	39.4	10.3	5.9
SC Retrieval	3597	1386	73.4	45.8	28.4	19.9
RFM S₂	3767	1069	76.2	44.2	—	5.4
LocDiff	—	—	77.0	46.3	—	11.0
HierLoc (ViT-L/14)	3850	1067	80.1	52.9	39.0	22.2
HierLoc (DINOv3)	3963	861	82.9	55.0	40.7	23.3
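The headline percentages can be checked against Table 1. The short calculation below assumes the comparison baselines are the best prior entries in each column (RFM S₂ for mean distance, SC Retrieval for subregion accuracy); that pairing is inferred from the table rather than stated on the page.

```python
def relative_change(baseline, new):
    """Signed relative change from baseline to new."""
    return (new - baseline) / baseline


# Mean distance: RFM S2 (1069 km) -> HierLoc DINOv3 (861 km)
mean_error_drop = -relative_change(1069, 861)
# Subregion accuracy: SC Retrieval (28.4) -> HierLoc DINOv3 (40.7)
subregion_gain = relative_change(28.4, 40.7)

print(f"mean error reduction: {mean_error_drop:.1%}")  # 19.5%
print(f"subregion gain:       {subregion_gain:.1%}")   # 43.3%
```

Both figures match the 19.5% and 43% quoted in the abstract when read as relative changes.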

Table 2 Benchmark Performance

Cross-dataset evaluation on IM2GPS, IM2GPS3K, and YFCC4K.

Method	Median (km) ↓	1 km (%) ↑	25 km (%) ↑	200 km (%) ↑	750 km (%) ↑	2500 km (%) ↑
IM2GPS
PlaNet	>200	8.4	24.5	37.6	53.6	71.3
PIGEON	70.5	14.8	40.9	63.3	82.3	91.1
HierLoc	21.4	10.5	51.9	67.5	83.1	92.4
IM2GPS3K
PIGEON	147.3	11.3	36.7	53.8	72.4	85.3
Img2Loc	—	17.1	45.1	57.8	72.9	84.6
HierLoc	73.4	11.3	43.8	58.4	74.1	85.1
YFCC4K
PIGEON	383.0	10.4	23.7	40.6	62.2	77.7
Img2Loc	—	14.1	29.5	41.4	59.2	76.8
HierLoc	341.9	8.4	30.2	43.3	61.7	75.8

Inference Strategies: A comparison of beam search against full search shows that HierLoc reaches 97% of full-search accuracy in less than 10% of the search time. At a beam width of 5, HierLoc achieves a 135 km error versus 433 km for SC retrieval with 240k image searches per query.

Beam Width Optimization: Accuracy and search time for different beam widths. A narrow beam (width = 10) achieves near-optimal accuracy (98.7% of the best) with a 95% reduction in search time, demonstrating efficient hierarchical search in hyperbolic space.
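The idea of descending the hierarchy with a bounded beam can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `beam_search` function, the three-level toy hierarchy, and the hand-picked scores are all hypothetical. In a real system the score would presumably come from image-entity similarity in hyperbolic space, and the hierarchy would have four levels.

```python
import heapq


def beam_search(children, score, roots, beam_width, depth):
    """Keep the top `beam_width` entities per level; count scored entities."""
    beam = heapq.nlargest(beam_width, ((score(r), r) for r in roots))
    evaluated = len(roots)
    for _ in range(depth - 1):
        # Only children of surviving entities are scored at the next level.
        candidates = [(score(c), c)
                      for _, parent in beam
                      for c in children.get(parent, [])]
        if not candidates:
            break
        evaluated += len(candidates)
        beam = heapq.nlargest(beam_width, candidates)
    return beam, evaluated


# Toy hierarchy: country -> region -> city (hypothetical scores).
children = {"FR": ["IDF", "PACA"], "DE": ["BY", "BE"],
            "IDF": ["Paris"], "PACA": ["Nice"],
            "BY": ["Munich"], "BE": ["Berlin"]}
scores = {"FR": 0.9, "DE": 0.6, "IDF": 0.8, "PACA": 0.5, "BY": 0.7,
          "BE": 0.2, "Paris": 0.95, "Nice": 0.4, "Munich": 0.85, "Berlin": 0.1}

best, n_scored = beam_search(children, scores.get, ["FR", "DE"], 1, 3)
print(best[0][1], n_scored)  # Paris 5
```

With a beam width of 1, only 5 of the 10 entities in the toy hierarchy are scored. Widening the beam trades more comparisons for a lower risk of pruning the correct branch early.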

Computational Efficiency: Inference latency versus accuracy on the same GPU. HierLoc provides a 6.56× speedup over RFM S₂ while using roughly 10× fewer training images (4.7M vs 48M). At 0.64 ms per image on an A100 GPU, it scales efficiently to billions of query images.
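To put the latency figure in perspective, a back-of-the-envelope calculation helps. It assumes a single GPU processing images sequentially at the quoted 0.64 ms each and ignores data loading and batching effects, so it is only a rough estimate.

```python
MS_PER_IMAGE = 0.64  # latency quoted above for an A100 GPU

images_per_second = 1000.0 / MS_PER_IMAGE
seconds_for_billion = 1e9 * MS_PER_IMAGE / 1000.0
gpu_days_for_billion = seconds_for_billion / 86400.0
training_ratio = 48e6 / 4.7e6  # 48M vs 4.7M training images

print(f"{images_per_second:.1f} images/s")               # 1562.5 images/s
print(f"{gpu_days_for_billion:.1f} GPU-days per 1e9")    # 7.4 GPU-days per 1e9
print(f"{training_ratio:.1f}x fewer training images")    # 10.2x fewer training images
```

Under these assumptions, one billion queries take roughly a week on a single GPU.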

Citation

BibTeX

@misc{gadi2026hierlochyperbolicentityembeddings,
      title={HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation},
      author={Hari Krishna Gadi and Daniel Matos and Hongyi Luo and Lu Liu
              and Yongliang Wang and Yanfeng Zhang and Liqiu Meng},
      year={2026},
      eprint={2601.23064},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.23064},
}
