Understanding Aadhaar: Data Security
Data Access
The core Aadhaar infrastructure is called the Central Identities Data Repository (CIDR). There are two main purposes for which data is accessed from the CIDR: authentication and e-KYC.
For an explanation on the different entities and their relationships within the Aadhaar ecosystem, please see the section on Aadhaar entities.
Authentication
A vendor can use Aadhaar authentication to verify you before offering services. This process uses your Aadhaar number and either an OTP or your biometrics as a second factor to authenticate you. Vendors who use Aadhaar authentication services are authorized by the UIDAI as Authentication User Agencies (AUAs). Each AUA must connect through an Authentication Service Agency (ASA); ASAs are the only entities allowed to connect to the CIDR. The authentication process is explained in more detail here.
In response, the CIDR returns a digitally signed ‘Yes/No’ depending on the success of the authentication request.
Electronic Know Your Customer (e-KYC)
A vendor can use Aadhaar for the purpose of instant electronic KYC. Just like authentication entities, e-KYC entities must be authorized by the UIDAI, and utilize KSAs (ASAs with KYC permissions) — the only entities allowed to use the e-KYC connection endpoints.
In response, the CIDR returns a digitally signed and encrypted demographic record (name, gender, address, etc) and photograph block.[1] In January 2018, a ‘limited KYC’ was introduced where only need-based details would be shared with the authorised agencies.
APIs
Application Programming Interfaces (APIs) are the industry-standard method to connect to systems. They allow the issuing entity (in this case, the UIDAI) to maintain control over what information is shared with the requestor, and in what manner. APIs add a layer of encapsulation by limiting what is available to the ‘outside’ via targeted requests, so the requestor never needs direct access to a data store or database (DB).
For example, let’s say you wish to know how many steps I’ve walked on a day, which I log in a DB. Rather than give you direct access to my DB—which also contains other data which you don’t require, nor do I want to share—I give you an API request endpoint, validated by a key:
https://me.example/getSteps?authKey=45aedlfkja359874425
which returns:
{ "steps": 372 }
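To make the idea concrete, here is a minimal Python sketch of what could sit behind such an endpoint. Everything in it is an assumption made for illustration: the `get_steps` function, the in-memory `_DB` dictionary and its extra fields are hypothetical, and the key is simply the one from the example URL.

```python
import hmac

# Hypothetical private store. The extra fields are invented to show
# that the API never exposes them.
_DB = {"steps": 372, "heart_rate": 64, "location": "home"}
_VALID_KEY = "45aedlfkja359874425"  # the example key from the URL above


def get_steps(auth_key: str) -> dict:
    """Return only the step count, and only to callers holding a valid key."""
    # compare_digest avoids leaking key contents through timing differences
    if not hmac.compare_digest(auth_key.encode(), _VALID_KEY.encode()):
        return {"error": "unauthorized"}
    # Only the requested field leaves the store; the rest stays private.
    return {"steps": _DB["steps"]}
```

The caller never touches `_DB` directly: it gets the one value the API chooses to expose, or an error.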
Similarly, the Aadhaar authentication service is done over a URL endpoint:[2]
https://<host>/<version>/<AUA code>/<uid digit 0>/<uid digit 1>/<ASA license key>
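As a rough illustration, the template above could be filled in as follows. This is only a sketch: the function name and all values are dummies, the production host is not public, and reading "uid digit 0" and "uid digit 1" as the first two digits of the 12-digit Aadhaar number is my assumption.

```python
def build_auth_url(host: str, version: str, aua_code: str,
                   uid: str, asa_license_key: str) -> str:
    """Fill the URL template with the caller's values."""
    if len(uid) != 12 or not (uid.isascii() and uid.isdigit()):
        raise ValueError("expected a 12-digit Aadhaar number")
    # Assumption: uid[0] and uid[1] are the first two digits of the number.
    return (f"https://{host}/{version}/{aua_code}/"
            f"{uid[0]}/{uid[1]}/{asa_license_key}")


# Dummy values throughout; none of these are real identifiers.
print(build_auth_url("auth.example.invalid", "2.0", "AUA001",
                     "123456789012", "ASA-LICENSE-KEY"))
# prints https://auth.example.invalid/2.0/AUA001/1/2/ASA-LICENSE-KEY
```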
A separate API is used for eKYC or enrolment. The production server hosts are only shared with the ASAs and are not available online. As such, the API endpoints are not accessible over the internet.
The request format for an authentication request is as follows[3]:
<Auth uid="" rc="" tid="" ac="" sa="" ver="" txn="" lk="">
<Uses pi="" pa="" pfa="" bio="" bt="" pin="" otp=""/>
<Meta udc="" rdsId="" rdsVer="" dpId="" dc="" mi="" mc="" />
<Skey ci="">encrypted and encoded session key</Skey>
<Hmac>SHA-256 Hash of Pid block, encrypted and then encoded</Hmac>
<Data type="X|P">encrypted PID block</Data>
<Signature>Digital signature of AUA</Signature>
</Auth>
The request only shares meta-data comprising the device information, AUA signature, etc. and the encrypted PID block. Information such as details of the service being provided, or any details about the holder do not form part of the request. Even if a service provider were to pass such information — and it would be illegal to do so — the only blocks and variables recognized by the CIDR are the ones in the above request.
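For readers who prefer code, the layout of the request can be sketched with Python's standard XML library. This is an illustration of the structure only: the function name, the `y`/`n` flag values, the `2.0` version string and the placeholder payloads are my assumptions, and nothing here is actually encrypted or signed.

```python
import xml.etree.ElementTree as ET


def build_auth_skeleton(uid: str, aua_code: str, txn: str) -> str:
    """Assemble the element layout of an Auth request with placeholder payloads."""
    # Attribute values below are illustrative assumptions, not real data.
    auth = ET.Element("Auth", {"uid": uid, "ac": aua_code,
                               "txn": txn, "ver": "2.0"})
    ET.SubElement(auth, "Uses", {"pi": "n", "pa": "n", "pfa": "n",
                                 "bio": "y", "otp": "n"})
    ET.SubElement(auth, "Meta", {"udc": "", "dpId": "", "mi": "", "mc": ""})
    # In a real request these four blocks carry encrypted or signed content.
    ET.SubElement(auth, "Skey", {"ci": ""}).text = "ENCRYPTED_SESSION_KEY"
    ET.SubElement(auth, "Hmac").text = "ENCRYPTED_PID_HASH"
    ET.SubElement(auth, "Data", {"type": "X"}).text = "ENCRYPTED_PID_BLOCK"
    ET.SubElement(auth, "Signature").text = "AUA_SIGNATURE"
    return ET.tostring(auth, encoding="unicode")
```

Note that there is simply no element in which a service description or a holder's personal details could travel.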
PID Block
When a registered device is called by the application, it captures, processes and encodes the digitally signed biometric record.[4]
Bh=SHA-256(bio_record)
Be=DSA(Bh+timestamp+unique_device_code,device_private_key)
This is used to finally create the signed & encoded biometric record:
Bs=base64(Be)
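The three steps can be sketched in Python as below. One loud caveat: the standard library has no DSA implementation, so this sketch swaps in HMAC-SHA256 as a stand-in for the DSA signature. The function name, the argument types and the way the fields are concatenated are likewise assumptions; a real registered device signs with its DSA private key.

```python
import base64
import hashlib
import hmac


def sign_biometric_record(bio_record: bytes, timestamp: str,
                          unique_device_code: str, device_key: bytes) -> str:
    """Follow the Bh -> Be -> Bs steps, with HMAC standing in for DSA."""
    # Bh: SHA-256 hash of the raw biometric record
    bh = hashlib.sha256(bio_record).digest()
    # Assumption: the three inputs are joined by plain concatenation
    message = bh + timestamp.encode() + unique_device_code.encode()
    # Be: STAND-IN ONLY. A real device uses a DSA signature here.
    be = hmac.new(device_key, message, hashlib.sha256).digest()
    # Bs: base64 encoding of the signed value
    return base64.b64encode(be).decode("ascii")
```

Because the timestamp and device code are folded into the signed value, the same fingerprint captured at a different moment or on a different device produces a different `Bs`.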
A Device Info Hash (dih) is also included as part of the PID block:
dih = SHA-256(device_provider_id+device_service_id+device_version+unique_device_code+model_id+idHash)
The idHash is an encrypted device ID which must match the hash shared with the CIDR when the device was registered. A separate wadh (wrapper API data hash) is also used for eKYC transactions.
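The dih formula maps directly onto a few lines of Python. Treat this as a sketch: joining the fields by plain string concatenation, encoding them as UTF-8 and returning the digest as hex are all assumptions on my part, and the values in the example are dummies.

```python
import hashlib


def device_info_hash(device_provider_id: str, device_service_id: str,
                     device_version: str, unique_device_code: str,
                     model_id: str, id_hash: str) -> str:
    """SHA-256 over the device fields, in the order given in the formula."""
    fields = (device_provider_id, device_service_id, device_version,
              unique_device_code, model_id, id_hash)
    # Assumption: fields are concatenated with no separator
    joined = "".join(fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Dummy values; changing any one field changes the whole hash.
print(device_info_hash("PROVIDER", "SERVICE", "1.0", "SERIAL-001",
                       "MODEL-X", "IDHASH"))
```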
In this way, the PID block is created and encrypted on the registered device itself, independently of the AUA application. The biometric record is encrypted with a one-time 256-bit symmetric session key (AES/GCM/NoPadding). The session key is then encrypted asymmetrically (RSA/ECB/PKCS1Padding) with a 2048-bit UIDAI-issued public key, making the CIDR the sole entity that can decrypt the record.
Process
Once created, the PID block is signed with the AUA’s digital key and passed to the ASA. The ASA then makes the API call, adding its own unique digital identifiers, and a success or failure response is returned.
Each device must be registered with the UIDAI along with the device provider’s ID, a unique identifier (serial number), a digitally signed certificate, and the provider’s key issued by the UIDAI.[5] All of these details form part of the meta-data of the authentication or eKYC request.
Currently, the UIDAI is in the process of increasing the CIDR’s authentication capacity to 10 crore (100 million) transactions per day.[6]
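To put that figure in perspective, here is a quick back-of-the-envelope conversion. It assumes load is spread evenly across the day, which real traffic will not be, so peak rates would be higher.

```python
CRORE = 10_000_000  # 1 crore = 10^7

daily_capacity = 10 * CRORE         # 100,000,000 transactions per day
seconds_per_day = 24 * 60 * 60      # 86,400
average_tps = daily_capacity / seconds_per_day

print(f"{daily_capacity:,} per day is about "
      f"{average_tps:,.0f} per second on average")
# prints 100,000,000 per day is about 1,157 per second on average
```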
Data Security
To recap, your fingerprint and/or eye scan data are classified as core biometric information and have a one-way ticket into the CIDR. By law, core biometric information collected shall not be shared “with anyone for any reason whatsoever”.[7]
As highlighted in the previous post, the CIDR is classified as a Protected System under the IT Act, 2000 and as a Critical Information Infrastructure (CII) by the NCIIPC, and the UIDAI itself is ISO 27001:2013 certified.
The following are some of the security enforcements across the Aadhaar ecosystem:
- There is no connection endpoint that returns a holder’s biometrics, as required by law[8].
- Demographic information and biometric information are partitioned into separate databases secured between firewalls; a single database does not have all the user’s data in one physical location[9]. Raw biometric data is stored in encrypted form within the CIDR.
- Connection to the CIDR is only available to ASAs (27 entities at this time) over 1-to-1 leased line or MPLS[10] links, i.e., not over the internet.
- Each ASA is allotted a separate DMZ.[11]
- The PID block is encrypted with AES-256 using a one-time session key. The session key is then encrypted with a 2048-bit UIDAI issued key. This makes it extremely expensive to break.[12]
- The PID block includes a timestamp and a one-time session key to prevent reuse.
- The digital keys of the registered device, the AUA and the ASA are logged and validated for every transaction.
- The session key used for the PID block must not be stored and should not be reused across transactions.[13]
- GCM encryption[14] was added in API v2.0.
- Network connectivity between the ASA and AUA must be secured via a leased line or at least a VPN/SSL. It is the ASA’s responsibility to ensure this connectivity requirement.
- Multi-factor authentication (using Aadhaar number + biometrics + OTP) is also available over the API. This is not mandatory as a phone number is not required for Aadhaar enrolment.
- In cases where self-service is not possible and operator intervention is required to operate a device, the operator must be authenticated and their Aadhaar number logged before they can perform any functions.
- ASAs and registered devices do not store PID blocks beyond a few seconds in cache for buffering. Device applications and ASAs are verified by the UIDAI directly.
- Enrolment client software is entirely written, maintained and provided directly by the UIDAI.
- All meta-data of a request — timestamp, the entities involved — are made available online for 6 months, and archived for 7 years by law.[15]
- The CIDR in its entirety is located inside India and all authentication requests are routed within the country.
- The Aadhaar number itself is random and is not based on any identifying personal factors of the holder.[16]
- In January 2018, the UIDAI introduced the option to generate a further randomized 16-digit Virtual ID for authentication requests so that users do not need to share their Aadhaar number with providers. There is no limit on generating new Virtual IDs for the same Aadhaar number; the old one is automatically discontinued once a new ID is generated.
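Two of the points in the list above, the timestamp inside the PID block and the one-time session key, work together to stop a captured request from being replayed. The sketch below shows the general idea only. The `ReplayGuard` class is hypothetical, the 20-minute window is an arbitrary value I picked rather than a UIDAI figure, and the synchronised key scheme exception is ignored.

```python
import hashlib
from datetime import datetime, timedelta, timezone


class ReplayGuard:
    """Reject requests that are stale or that reuse a session key."""

    def __init__(self, max_age: timedelta = timedelta(minutes=20)):
        self.max_age = max_age  # hypothetical window, not a UIDAI value
        self._seen = set()      # fingerprints of session keys already used

    def accept(self, pid_timestamp: datetime, session_key: bytes,
               now: datetime) -> bool:
        age = now - pid_timestamp
        if age < timedelta(0) or age > self.max_age:
            return False        # timestamp is in the future or too old
        # Store a hash of the key, never the key itself
        fingerprint = hashlib.sha256(session_key).hexdigest()
        if fingerprint in self._seen:
            return False        # same session key presented twice
        self._seen.add(fingerprint)
        return True


guard = ReplayGuard()
now = datetime.now(timezone.utc)
print(guard.accept(now - timedelta(seconds=5), b"one-time-key", now))  # True
print(guard.accept(now - timedelta(seconds=5), b"one-time-key", now))  # False
```

The second call fails even though the request is byte-for-byte identical to the first, which is exactly the point.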
Aadhaar does not independently verify addresses submitted as part of enrolment. However, it should be noted that Aadhaar only accepts otherwise verifiable proof of address (such as a passport, an electricity or water bill no more than 3 months old, a bank statement, etc).
Lok Sabha unstarred question № 5064, answered 4 April 2017
The only exception under law is in the interest of ‘national security’, which requires approval by an Oversight Committee and a court order
Aadhaar Act, 2016 Section 29 (a)
Multiprotocol Label Switching (MPLS) is a unidirectional tunnel between a pair of routers, making it a 1:1 network between a source and a destination, not exposed to anyone else.
An RSA study titled A cost-based security analysis of symmetric and asymmetric key lengths, published in 2000, concludes that it would take about 3 million years to break a 1020-bit key with US$10 million available for hardware. Scaling further, it concludes that a 128-bit symmetric key (equivalent to a 1620-bit asymmetric RSA key) would require 10¹⁰ years even with a budget of US$10 trillion (for context, the age of the universe is estimated to be 1.38 × 10¹⁰ years). A 2011–12 yearly report by the European Network of Excellence in Cryptology II (ECRYPT II) rates a 256-bit key as adequate protection for the “foreseeable future”, affording good protection even against quantum computers. Additionally, the Algorithms, key size and parameters report – 2014 by the European Union Agency for Network & Information Security (ENISA) classifies AES as “the block cipher of choice for future applications”, with a recommended minimum key size of 128 bits. An attack on AES-256 is rated as requiring time 2⁹⁹ and data complexity 2⁹⁹, provided that related keys are used (and therefore requiring more time if keys are randomly selected). Breaking AES-256 is expected to require 2²⁵⁴ encryption operations and 2⁸⁰ plaintexts. SHA-256, used for building the Device Info Hash, is classified as a “future use” hash function.
The only re-use of the session key is allowed when used as a ‘seed key’ during a synchronised key scheme session. In this case, an AES session key is generated and can be reused until the session is completed (or 4 hours, whichever is earlier).
Galois/Counter Mode encryption was added to Aadhaar authentication requests in 2016. See the API documentation for how Aadhaar uses GCM encryption.
Aadhaar (Authentication) Regulations, 2016 Chapters III, IV
Aadhaar Act, 2016 Section 4(2)