Frequently Asked Questions

WHAT IS THE REGULATORY GENOME PROJECT?

The Project is led by the Cambridge Centre for Alternative Finance (CCAF), in collaboration with the Natural Language and Information Processing (NLIP) Research Group and the Cambridge Centre for Data-Driven Discovery (C2D3) at the University of Cambridge. It aims to map the structure, combination and evolution of regulatory obligations across industries. Once advanced, it will pave the way to regulation that is truly machine- and human-readable: a needed capability for the transfer of value in a digital economy and a core objective of the emergent RegTech industry.

Compliance can be hugely complex, but regulators’ toolkits are not. Regulators match a finite set of process and control components to a finite set of desired outcomes to create obligations. They benchmark their regulatory frameworks against others’, copying what they find useful. Obligations that work, and fit local conditions, tend to survive and even propagate internationally.

Starting with financial services, the Regulatory Genome Project (RGP) will draw on the insights of industry experts and regulators to build a complete matrix of obligations; map a global feed of regulatory content against that matrix; and offer the resulting enriched content feed back to the world as open data.

HOW DOES IT WORK?

At the heart of the Project is Regulatory Sequencing: a method for making regulatory text reliably machine-readable and quantifiable at low cost. It relies on experts to build a hierarchy of obligations applying to a regulated activity (e.g. payments in the case of financial services) or to a cross-cutting regulatory theme such as data protection. It then uses natural language processing (NLP) and machine learning to train classification models that can identify references to those obligations in previously unseen text. The models can then be used to enrich incoming regulations at a granular level, with tags representing the obligations detected in the text. Applications that recognise the classification models can then execute queries against any text, e.g. via an API.
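
The steps described above can be illustrated with a minimal sketch. It is not the Project's actual implementation: the obligation hierarchy, the labelled passages and every name in it (HIERARCHY, TRAINING, NaiveBayesTagger, enrich) are hypothetical, and a small hand-written naive Bayes classifier stands in for whatever NLP models the Project uses. It shows experts' labels being used to train a model, which then tags each sentence of a previously unseen text with an obligation from the hierarchy.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical obligation hierarchy: regulated activity or theme -> obligations.
HIERARCHY = {
    "payments": ["safeguarding_client_funds", "customer_due_diligence"],
    "data_protection": ["breach_notification"],
}

# Hypothetical expert-labelled passages, invented purely for illustration.
TRAINING = [
    ("Firms must hold client funds in a segregated account.",
     "payments/safeguarding_client_funds"),
    ("Client money shall be safeguarded and kept separate from firm funds.",
     "payments/safeguarding_client_funds"),
    ("The provider must verify the identity of each customer before onboarding.",
     "payments/customer_due_diligence"),
    ("Customer identity documents shall be checked and verified.",
     "payments/customer_due_diligence"),
    ("A personal data breach must be notified to the authority within 72 hours.",
     "data_protection/breach_notification"),
    ("Controllers shall report any data breach to the supervisory authority.",
     "data_protection/breach_notification"),
]


def tokenize(text):
    """Lower-case the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Multinomial naive Bayes with add-one smoothing (a stand-in model)."""

    def fit(self, examples):
        valid = {f"{a}/{o}" for a, obs in HIERARCHY.items() for o in obs}
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()
        for text, label in examples:
            if label not in valid:
                raise ValueError(f"label not in hierarchy: {label}")
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.label_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def scores(self, text):
        """Return the log-probability score of each obligation label."""
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        result = {}
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            log_p = math.log(n_docs / total_docs)
            for t in tokens:
                log_p += math.log((counts[t] + 1) / denom)
            result[label] = log_p
        return result

    def tag(self, text):
        """Return the highest-scoring obligation label for the text."""
        s = self.scores(text)
        return max(s, key=s.get)


def enrich(document, tagger):
    """Split a document into sentences and attach an obligation tag to each."""
    parts = re.split(r"(?<=[.!?])\s+", document)
    return [(s.strip(), tagger.tag(s)) for s in parts if s.strip()]


tagger = NaiveBayesTagger().fit(TRAINING)

if __name__ == "__main__":
    incoming = ("Payment institutions must keep client funds in a separate "
                "segregated account. Any data breach must be reported to the "
                "authority.")
    for sentence, obligation in enrich(incoming, tagger):
        print(obligation, "<-", sentence)
```

In this sketch the tags are plain path-like strings such as "payments/customer_due_diligence", so an application could filter or count enriched sentences by activity or by obligation. A production system would need far more training data, a stronger model and a way to report that no obligation was detected.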

WHAT IS THE IMPACT OF THE PROJECT?

We have initially applied Regulatory Sequencing to regulatory benchmarking. By powering truly like-for-like comparisons of regulatory regimes, it allows regulators to quickly identify good practices and make an informed case for regulatory convergence (or divergence). Research recently undertaken by the CCAF in collaboration with the World Bank suggests that benchmarking is used in over 90% of regulatory reviews for alternative finance, and has helped trigger over half of all changes in alternative finance regulation. Yet this crucial process is still costly, slow, and usually limited in scope to a few well-studied jurisdictions.

In the long run, the Project will unlock downstream industry and regulatory innovation by providing a realistic route to machine-readable regulation that can be open to all organisations. By giving all parties a common language and common, near-free data relating to regulatory obligations, it will spur the development of downstream third-party applications that might otherwise take years to develop. These can in turn be used to reveal trends and patterns in the evolution of global regulation, to power prompt regulatory impact assessments, or to create interactive rulebooks.

IS THIS A PAPER EXERCISE, OR HAS ANY OF IT ACTUALLY BEEN BUILT?

In 2018, the CCAF worked with Omidyar Network Fund (now Flourish Ventures) to fund and develop RegSimple, a prototype Regulatory Intelligence platform utilising NLP and machine learning. A pilot led to the deployment of a limited alpha version in September 2018. That alpha platform is live and available to test today.

Building on the success of this pilot and on the CCAF’s research into industry and regulatory use cases, the CCAF was awarded funding by the UK Foreign, Commonwealth & Development Office in order to develop a complete sequencing of the core alternative finance and fintech activities, and to build a complete application that will demonstrate the potential of the Regulatory Genome Project by mid-2021.

WHY IS THE UNIVERSITY WELL-PLACED TO LEAD THIS?

Adopting shared standards of any kind comes with non-trivial compliance risks, and these can only be managed if the standards can be reconciled with those employed by regulators. CCAF research suggests that some regulators are willing to commit human and reputational capital to the development of public goods; however, engaging regulators is practically impossible for strictly commercial applications. It therefore makes an important difference that the data set is to be accessible at near-zero cost and developed as a public good by a non-profit academic institution operating in the public interest. Three things together form a unique set of building blocks for attracting the critically important stakeholder groups contributing to the Project and building confidence among them: extensive collaboration with many of the world’s regulators through our research and capacity building programmes, thought-leading data science in the University’s academically renowned Computer Lab, and Cambridge’s global convening power.

IS THERE A PLAN TO ENGAGE INDUSTRY?

Yes. Feedback from subject matter experts, including compliance professionals in industry and in law or advisory firms, is crucial to maintaining a high-quality taxonomy of regulatory obligations. The CCAF has, and will continue to develop, a global network of regulation experts via its ongoing research activity. The current roadmap for the CCAF’s regulatory intelligence platform provides for mass, near-free access as a means of evaluating and recruiting promising experts. However, the CCAF also seeks much more structured input from industry into the Project, in the form of an international Industry Consortium focusing on financial services regulation, whose members are selected in the first instance for their commitment to machine-readable regulation.
