Weeknote — 2021-01-25

Digital land sprint notes

A longer sprint which included the Christmas break. Our main goal was to work with our pilot local authorities to publish their development plans and policies as data, linked to other data collected from other sources.

Development plan documents

We continued work started in the last sprint to collect data about development plans.

We’re indebted to a number of busy people in local authorities, particularly Emily in Buckinghamshire who has not only curated and categorised almost 1,000 development policies from the four local authorities which became the Unitary Authority last year. Representing the policy areas as data, rather than documents helps services such as RIPA to ask fewer, and more relevant questions of their users when making a planning application.

The work we’ve recently put into our page templates meant Jon was able to quickly take Buckinghamshire’s data to create a web page for each development policy. Emily also provided geospatial data for development policy areas which aren’t the local authority district or parish boundary. Whilst these pages are currently terse, they’re an important tool for exploring and reviewing the data, and will become more interesting as we link them to other data, and cite their text found in local-plan and other development-plan documents.

Aston Clinton neighbourhood plan

Service design

Loïc interviewed Emily and others about their experience making data to our specifications, and used their insights to create a user experience map. Mapping highlights parts of the process which users need the most help with and helps us to prioritise our work. The current map highlights the importance of feedback, on the shape and contents of the data, helping a publisher know they’ve met a data standard. This map would normally become a living artifact on the team’s wall, a constant reminder of where our users need more help. Now we’re working remotely we’re looking at ways of keeping the map visible, relevant and upto date.

Brownfield land

The end of the year is when many local planning authorities update their brownfield registers. Kishan and Colm quickly updated our brownfield land validator. Rather than simply reporting a potentially long list of errors, we process and present the data back to the user as a table, and on a map.

Once the data has been checked by the user, a valid CSV file can be downloaded to be published on their website.

Whilst brownfield land data isn’t our current top-priority, it continues to be a great place from which to learn the tools and practices needed to support local planning authorities publishing new datasets which we are now developing.

Check your brownfield-land register

Colm continued work to improve our national map of brownfield land data, linking each site to a page-per brownfield land site.

A page per-brownfield site

Alerting and operational dashboards

We heard from users that our new brownfield land validator was often unavailable. We discovered this was because we had exceeded the quotas from our hosting service. We’ve given the service more resources, and improved how the application wakes up after not being used which we hope has resolved the issue.

To reduce the risk of this happening again, Kishan has created a “smoke test” which runs every 10 minutes and alerts the team on Slack if the service is unavailable, along with a simple dashboard to help us monitor the growing number of GitHub actions we run each night to update the site.

Article 4 directions

We continued to explore how we should represent planning policies as data. Article 4 directions are simple in concept: an area to which permitted development is restricted. However often the designated area has more complicated geography than a simple boundary. In some cases buildings are referred to in the designation, in other cases it’s the class of building to which the direction applies: “all pubs in the town centre”. Both of these are harder than one would hope to turn into trustworthy data. Old text addresses are often hard to match to a UPRN, and we don’t yet have the use-classes for buildings which are found in planning decisions.

Whilst these issues make it much harder to process existing and historical information, they’re proving to be very useful examples for our work improving how the department develops new policy and data standards.

Unique identifiers

We’re collecting data and information about a large number of different things: organisations, conservation areas, development policies, local plans, etc.

If we have a unique identifier for each thing, we can use it to make a URL for a page for each thing.

A page per-thing helps us link the thing to other data and information about the thing found elsewhere. It’s also a place for us to show the history of changes to a thing, when a local authority has become a part of a unitary authority, or where a conservation area has changed its designated area for example.

One of our biggest challenges is not all of the data we’re collecting contains identifiers. Where they do exist, they’re not used consistently and they’re not unique. It’s hard for us to match data where a different identifier is used for a thing in different places, or distinguish data about things when the same name or reference number is used for different things in different places.

It’s hard for us to resolve such issues automatically, so we have been developing a number of different methods for constructing, matching and patching the data we collect in our pipeline process. This is essential to ensure consistency, and so we don’t lose any changes we manually apply to data.

This sprint we begun to review and rationalise the different methods we’ve used with to create unique identifiers:

  • Some data sources have a key field containing an identifier we can rely upon to be unique. For example the GOV.UK local authority register has an identifier such as “DAC” for Dacorum Borough Council.
  • When we combine data from different registers, we can add a prefix to make a CURIE identifier which is world unique. For example a list of organisations may include “local-authority-eng:DAC” for a local authority and “government-organisation:D4” for a government department.
  • Should a data source contain a human-friendly reference field, we can use it as the basis of the key field. Sometimes a reference may be written in different ways, such as “BA/234432 (1996)” and “BA/234432/1996”, in such cases we can convert them into a consistent shape: “BA-234432-1996”.
  • A number of different local authorities use “BF001” as the reference for a brownfield site, so we need a way of adding the publisher as a namespace. We could use the local authority code as a prefix as we did for the CURIE, but unfortunately a CURIE can only contain one prefix. In such cases we’re experimenting with using the path part of the URL (the slug) as our unique identifier, for example: /brownfield-land/local-authority-eng/MDE/MDDC-BF001/. This approach is also problematic where the data is published jointly by a number of different organisations.
  • Finally, if no identifier exists, we can create one. This is our least favourite option as it’s hard to reliably assign a thing the same identifier each time we process the dataset, so we’ve been experimenting with generating a hash, or fingerprint from the data. Unfortunately these numbers are quite long, and can alienate people not used to working with data, ie “8224e195-a24e-4e10-a1f0-e40721f2e250” isn’t as friendly as “CA001”.

This is a hard-problem, and is a topic we’ll continue to explore.

As well as continuing to develop these techniques, and test them with our users and data, we’ve also been rationalising how we engage these approaches in our pipeline process.

Data architecture

We spent some time this sprint reflecting on how we create a dataset from individual resources collected from local authorities. Currently this is simply a unique sort of rows of data, and is the source of duplicate, and potentially lost data issues.

Paul made a sketch of how we can use an Entity-Attribute-Value (EAV) model, or “triples” as a data store. This approach should improve the reliability of our processing, but also means we can trace the origin of a fact back to where we collected it from. A generic data store means we can more easily build different data packages for users of the data.

Different models

Local Digital projects

We continued working closely with BOPS , RIPA, and Camden’s planning notice project. The RIPA team are helping us fine tune our data model for Article 4 Directions and Conservation Areas, and the BOPS team are informing our thinking on a potential standard for planning decisions.

PropTech event

Jess convened a PropTech showcase event intended to help policymakers gain new insights and further understand the art of the possible from digital planning. More than 100 attendees from MHCLG and public sector organisations including Homes England, PINS and planners working in Scotland, Wales, and Northern Ireland saw a series of rapid demonstrations and presentations of digital planning pilot projects. We hope everyone enjoyed it as much as we did.

Screenshot of the PropTech event video call

Planning policy sprints

As always it’s hard to talk about the policy work of the team. Rishi, Euan and Jess continue to work closely with May-N, Lawrence and colleagues from the planning directorate setting out options for delivering digital planning, and help inform the decisions needed following the planning consultation.

Recruitment

Quite a bit of the team’s time and energy has been spent this sprint on recruitment. We advertised a number of roles in December, and have been interviewing for a Policy lead, Product Manager, Interaction Designer and Content Designer. We’re grateful for people taking their time to apply and be interviewed, and looking forward to several new people joining us soon!