An opinion piece with an example... about the security of numbers, specifically ID numbers, and how people outside your organization might use your exposed information.
What is an ID number?
You've heard the expression "What's in a name?" But have you ever wondered what's in a number? What about an ID number?
Numbers surround us, and they are part of our everyday communications. But have you ever wondered what you can learn or infer from them? Many people don't think about it.
Phone numbers are simple - they map to a phone... and sometimes they are sequential (see below).
Driver's licenses are a form of ID - so are social insurance numbers - they are not sequentially assigned.
In computers / tech and data, an ID is normally an identifier: Customer #n (Number N) or Location #n. There might be many different types of records that have a number 4, but in many cases this number will be unique within its type and used to identify and connect the information. Different types of records can be connected by their IDs - two different addresses or contacts could both be related to Company #2. The key question here is: does it matter if it's Company #2, or Company #c0b656b1-7351-4dc2-84c8-62a2afb41e66 (this is a UUID)?
I think it does matter.
Why do people do it?
Simple. It is SO easy. Most database systems include a feature to automatically increment a number. So you start with 1, and go from there.
Not all database systems include support for things like UUIDs, and using them is more work! In many cases you will have both, and will have to jump back and forth between them.
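To make the "you will have both" situation concrete, here is a minimal sketch using Python's built-in sqlite3 and uuid modules. The table name `customers`, the column `public_id`, and the helpers `add_customer` and `find_customer` are made up for illustration; your own schema and database will differ.

```python
import sqlite3
import uuid

# Hypothetical schema for illustration: "customers" keeps the convenient
# auto-incremented integer for internal use and adds a random UUID that is
# the only identifier shown to the outside world.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal only
        public_id TEXT NOT NULL UNIQUE,               -- the one you expose
        name      TEXT NOT NULL
    )
    """
)


def add_customer(name: str) -> str:
    """Insert a customer and return the public (UUID) identifier."""
    public_id = str(uuid.uuid4())  # version 4 UUIDs are randomly generated
    conn.execute(
        "INSERT INTO customers (public_id, name) VALUES (?, ?)",
        (public_id, name),
    )
    return public_id


def find_customer(public_id: str):
    """Translate a public ID back to the internal row, or None if unknown."""
    return conn.execute(
        "SELECT id, name FROM customers WHERE public_id = ?",
        (public_id,),
    ).fetchone()


first = add_customer("Company A")
second = add_customer("Company B")

# Internally this is still row 2, but the customer only ever sees the UUID.
print(second, find_customer(second))
```

The extra lookup is the "more work" part: every link, email, or API call uses `public_id`, and the sequential `id` never leaves the database.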
Why does it matter?
Traditionally, things requiring security are not assigned sequentially, so why do people do it?
A simple Google search will tell you... A LOT! (search "sequential numeric id security" without the quotes).
Articles will include things like:
Exposing database IDs - security risk? (https://stackoverflow.com/questions/396164/exposing-database-ids-security-risk)
Why Auto Increment Is A Terrible Idea (https://www.clever-cloud.com/blog/engineering/2015/05/20/why-auto-increment-is-a-terrible-idea/)
Primary Keys: IDs versus GUIDs (https://blog.codinghorror.com/primary-keys-ids-versus-guids/)
If you read into it, you can see what some people have done. A lot of these articles will give you people's general opinions on why things like UUIDs are technically better than a simple ID, but it might be hard to understand how that applies to the real world.
The Real World
Presume you have a list of clients. You have client #1 through client #9 - would you want client #10 to know their number? What if client #10 refers a friend to you years after becoming a client, and they happen to find out that your new client is #12? Will that change their confidence in your growing business?
What if you have a "phone tree", auto attendant or IVR? If you list the partners (1 through 7), anyone who calls can know how large your business is. Some companies like to have contiguous ranges of numbers for their staff. That can work against them. A recruiter can just call 604-555-1000 through 604-555-1100 and contact every one of their staff directly!
Let's look at a real example: Zendesk (https://www.Zendesk.com/). Zendesk is a ticket system. Each company that works with Zendesk starts to number their tickets at #1 (although this can be changed to another number).
What does it matter?
Consider some tickets we submitted to Zendesk and some calculations we performed:
|Fake ID||Created||Interval||Minutes||New Tickets||T/M|
For this explanation, we started with the Fake ID and the Created time stamp of our tickets. The Fake ID starts at 1 but preserves the real differences between the actual ID numbers, so it stands in for those IDs. We calculated the Interval as the portion of a whole day that elapsed between the current ticket's Created time and the one that follows. Minutes is an expansion of the Interval into the number of minutes in that same period. New Tickets is simply the difference between the Fake ID values - the number of tickets created in the preceding period of time.
Finally, that allows us to calculate T/M (or Tickets per Minute) which is the number of new tickets created per minute on average in the preceding period.
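The same arithmetic can be sketched in a few lines of Python. The sample observations below are invented for illustration and are not our actual ticket data; the function name `tickets_per_minute` is also just a placeholder.

```python
from datetime import datetime

# Hypothetical observations: (fake ticket ID, created time). These values
# are made up to show the arithmetic, not taken from real tickets.
observations = [
    (1, datetime(2020, 1, 6, 9, 0)),
    (121, datetime(2020, 1, 6, 10, 0)),
    (391, datetime(2020, 1, 6, 11, 30)),
]


def tickets_per_minute(samples):
    """Return (minutes, new_tickets, rate) for each consecutive pair of samples."""
    rows = []
    for (prev_id, prev_time), (cur_id, cur_time) in zip(samples, samples[1:]):
        # Interval: the elapsed time as a fraction of a whole day.
        interval_days = (cur_time - prev_time).total_seconds() / 86400
        # Minutes: the same interval expanded into minutes.
        minutes = interval_days * 24 * 60
        # New Tickets: how far the ID counter moved in that period.
        new_tickets = cur_id - prev_id
        # T/M: average tickets created per minute over the period.
        rows.append((minutes, new_tickets, new_tickets / minutes))
    return rows


for minutes, new_tickets, rate in tickets_per_minute(observations):
    print(f"{minutes:.0f} min, {new_tickets} new tickets, {rate:.2f} T/M")
```

All it needs is two things an outsider already receives: the ticket number and the time it was created.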
Note that this example is VERY rudimentary and only an attempt to illustrate what something as simple as an ID number can tell us. To have useful data we would need to collect more samples - maybe by starting fake tickets, maybe from a variety of addresses, possibly some that might be harder to block, like those belonging to an ISP or Gmail.
Does this matter? What could that tell us?
At the end of that data collection, we could extrapolate a lot of things from this sort of information. We could infer information about:
- the growth of the subscriber base (more clients have more questions)
- the number of staff assigned to processing tickets and observe the growth of the employee pool
- times of day with higher or lower ticket volumes (might lead us to infer which time zones most of Zendesk's clients are located in)
Unfortunately for Zendesk, these aren't only things we can infer about their company; the same approach could be extended to their clients.
Your Tools, Your Future.
Think about your own tools. Do you want customer #14 to know they are your 14th? Do you want employee #102 to know that 101 have gone before them?
Even before AI, but more so with it, simple numeric IDs need to be limited to as narrow a scope as possible to prevent external actors from using the patterns of change to find out more about things that we might prefer be kept private.
Imagine a service that monitors users of Zendesk and warns you when your service providers are having an issue, based on an abnormal number of tickets being generated.
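As a rough idea of how little such a service would need, here is a sketch that flags a tickets-per-minute reading as abnormal when it sits well above a baseline. The function name `is_abnormal`, the threshold of 3 standard deviations, and the sample rates are all assumptions chosen for illustration, not a description of any real product.

```python
from statistics import mean, stdev


def is_abnormal(baseline_rates, latest_rate, threshold=3.0):
    """Flag latest_rate if it is more than `threshold` standard deviations
    above the mean of the baseline rates."""
    if len(baseline_rates) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline_rates)
    sigma = stdev(baseline_rates)
    if sigma == 0:
        # A perfectly flat baseline: treat any increase as abnormal.
        return latest_rate > mu
    return (latest_rate - mu) / sigma > threshold


# Hypothetical tickets-per-minute readings inferred from ID gaps.
baseline = [2.0, 2.2, 1.9, 2.1, 2.0]
print(is_abnormal(baseline, 2.1))  # close to the usual rate
print(is_abnormal(baseline, 6.0))  # a sudden spike
```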
For what it's worth, Zendesk does have a feature (an encoded ID, which unfortunately is not a UUID), but at the time of this writing, it cannot be used for operations in their Web UI or API. There is a statement from one of their team that says using the "encoded_ID" this way would be a security risk (https://develop.zendesk.com/hc/en-us/community/posts/360001640268-Getting-the-Encoded-ID) - and while I believe the opposite is true (that it is safer - after all, it's already exposed to the world in email, and email headers!), they gave no reason for the position. I suspect the reason might have to do with the ability to add to a ticket through email when you are not a paying agent - which could facilitate people who want to circumvent the license fees.
Hopefully that can be changed in the future.