The Daily Prague

Prague's Digital Archive Problem: The Scale of Duplicate Images Clogging City Records

New data from the Prague City Archive reveals tens of thousands of redundant image files are slowing down public access systems and costing the municipality real money.

By Prague News Desk · Published 4 July 2026, 21:43

Updated 5 July 2026, 5:36

How we reported this

This article was generated by AI from the linked public sources. The Daily Prague is independently owned and covers Prague news free from advertiser or sponsor influence.

Photo: jimmy teoh on Pexels

Prague's municipal digital infrastructure is carrying a hidden weight. The Prague City Archive, headquartered on Archivní street in Chodovec, has identified more than 47,000 duplicate image files across its public-facing document management systems: redundant scans, copied photographs, and re-uploaded planning documents. Together they consume an estimated 2.3 terabytes of server storage, which the city pays to maintain every month.

The problem matters now because Prague is in the middle of a broader digitisation push. The city's Digital Prague 2030 strategy, approved by the City Council in March 2025, commits the municipality to making all historical planning permits and urban development records searchable online by the end of 2027. That deadline is complicated by the fact that the underlying data is, in places, a mess.

Where the Problem Is Concentrated

The worst duplication rates are not in the Archive itself but in two interconnected systems. The first is the city's building permit registry, managed through the Institute of Planning and Development, known by its Czech abbreviation IPR Praha, which operates from offices on Vyšehradská street in Nové Město. Staff there have flagged that the same construction drawings for dozens of projects in Žižkov and Holešovice appear between three and seven times each in the database, the result of different departments uploading documents without a shared deduplication protocol.

The second trouble spot is the Prague 1 district office's own image repository, which digitised roughly 14,000 historical photographs of the Old Town and Josefov between 2021 and 2024. An internal review completed in May 2026 found that approximately 22 percent of those files were exact or near-exact duplicates, meaning around 3,080 photographs were stored twice or more. The review was conducted using open-source image hashing software rather than any commercial solution — a choice that kept costs down but also meant the process took close to five months rather than the few weeks a dedicated tool might have required.
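The review's software is not named, so the following is only a minimal Python sketch of how the "exact duplicate" half of such a check can work: files whose bytes are identical share the same SHA-256 digest. The function name and the sample file names are hypothetical, and this is not the tool the district office used. Near-exact duplicates (rescans, resized copies) need a different technique, since a single changed byte produces a different digest.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a (hypothetical) file name to its raw bytes.
    Returns a list of groups; each group is a sorted list of names
    that share one SHA-256 digest. Files with no twin are omitted.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        by_digest[digest].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Illustrative in-memory "files"; a real run would read bytes from disk.
sample = {
    "old_town_001.jpg": b"same scan bytes",
    "old_town_001_copy.jpg": b"same scan bytes",
    "josefov_014.jpg": b"a different photograph",
}
duplicate_groups = group_exact_duplicates(sample)
```

In this sample, the two Old Town files land in one group and the Josefov file is left alone.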

Across IPR Praha's planning layers alone, removing confirmed duplicates could free up 890 gigabytes. At the current contract rate the city pays its data centre provider in the Stodůlky technology zone, that translates to a saving of roughly 140,000 Czech crowns — about €5,500 — annually. That is not a transformative figure in a city budget measured in billions, but IT managers at the Archive point out that storage costs compound and that the performance hit on search queries is a more immediate concern for users.
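The contract rate itself is not published, but the figures above imply one. As a rough back-of-envelope sketch in Python, using only the rounded numbers quoted in this article (so the results are approximate and are not official rates):

```python
# Rounded figures quoted in the article.
FREED_GB = 890               # storage freed by removing confirmed duplicates
ANNUAL_SAVING_CZK = 140_000  # quoted annual saving in Czech crowns
ANNUAL_SAVING_EUR = 5_500    # the same saving as quoted in euros

# Implied storage price: crowns per gigabyte per year.
implied_czk_per_gb_year = ANNUAL_SAVING_CZK / FREED_GB

# Implied exchange rate behind the euro conversion.
implied_czk_per_eur = ANNUAL_SAVING_CZK / ANNUAL_SAVING_EUR

# The same saving expressed per month.
monthly_saving_czk = ANNUAL_SAVING_CZK / 12

print(f"{implied_czk_per_gb_year:.1f} CZK per GB per year")
print(f"{implied_czk_per_eur:.2f} CZK per EUR")
print(f"{monthly_saving_czk:.0f} CZK per month")
```

On these inputs the implied price is a little under 160 crowns per gigabyte per year, which is why the saving looks small next to the overall city budget.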

What Fixing It Actually Requires

Deduplication at this scale is not a one-afternoon job. The standard approach involves running a perceptual hash algorithm across the image library, generating a similarity score for every pair of files, and then having a human reviewer confirm deletions for any pair above a set threshold, typically 95 percent similarity. For the Archive's full collection, which runs to approximately 310,000 image files as of June 2026, that review queue alone could run to several thousand flagged cases requiring human sign-off before deletion.
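The approach described above can be sketched in a few lines of Python. This is an illustration only, not the Archive's or IPR Praha's actual tooling. It uses an "average hash", one simple kind of perceptual hash, and assumes each image has already been decoded and shrunk to an 8x8 grid of grayscale values by some imaging library. The function names and file names are hypothetical, and pairs scoring at or above the threshold are queued for a human rather than deleted.

```python
from itertools import combinations

HASH_BITS = 64     # one bit per pixel of the assumed 8x8 grayscale grid
THRESHOLD = 0.95   # the 95 percent similarity cut-off cited in the article


def average_hash(pixels):
    """Return a 64-bit average hash for an 8x8 grid of grayscale ints."""
    flat = [value for row in pixels for value in row]
    if len(flat) != HASH_BITS:
        raise ValueError("expected an 8x8 grid")
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # Each bit records whether a pixel is brighter than the mean.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def similarity(hash_a, hash_b):
    """Share of matching bits between two hashes, from 0.0 to 1.0."""
    differing = bin(hash_a ^ hash_b).count("1")  # Hamming distance
    return 1 - differing / HASH_BITS


def review_queue(hashes, threshold=THRESHOLD):
    """List (name_a, name_b, score) for pairs a human should review."""
    queue = []
    for (a, ha), (b, hb) in combinations(sorted(hashes.items()), 2):
        score = similarity(ha, hb)
        if score >= threshold:
            queue.append((a, b, score))
    return queue


# Illustrative data: a gradient, a slightly brighter copy, and its inverse.
base = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brighter = [[min(255, v + 3) for v in row] for row in base]
inverted = [[252 - v for v in row] for row in base]

hashes = {
    "scan_a": average_hash(base),
    "scan_a_copy": average_hash(brighter),
    "other": average_hash(inverted),
}
queue = review_queue(hashes)
```

Here only the gradient and its brighter copy reach the review queue. Comparing every pair grows quadratically: for roughly 310,000 files that is about 48 billion comparisons, so a production tool would likely index or bucket the hashes instead of checking pairs by brute force.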

IPR Praha has submitted a procurement request for dedicated deduplication software, with a budget ceiling of 380,000 crowns (around €15,000), under a tender process expected to open in September 2026. The Institute has also proposed a shared data stewardship protocol with the Prague 1, Prague 2, and Prague 7 district offices — the three most prolific uploaders to the central planning image repository — to prevent new duplicates accumulating at the current rate of roughly 200 to 300 redundant files per month.

For residents trying to access planning documents, the practical advice is straightforward, whether the goal is to check the permitted facade colours for a building in Vinohrady or to pull historical photographs before a renovation on Mánesova street. Use the Archive's direct search portal at prazskyarchiv.cz rather than the older IPR map interface, which still indexes duplicates as separate records and can return confusingly long lists of results. The Archive portal has been running a deduplication filter since February 2026. It is not perfect, but it removes the most obvious redundancies before they reach the user.

About this article

Published by The Daily Prague

Covering news in Prague. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
