
The same can be said for plain typos: most humans will not notice the typo in Humans tend to ignore details in longer identifiers: the variable nameĪccessibi1ity_options can still look indistinguishable fromĪccessibility_options, while they are distinct for the compiler. However, what is “noticeably” different always depends on the context. Indistinguishable from the single letter “m”.Īgain, programmers’ fonts make these pairs of confusables Similarly, in fonts designed for human languages, the uppercase “I” and In programming languages, however, distinction between digits and letters isĬritical – and most fonts designed for programmers make it easy to tell them Human readers could tell them apart by context only. The digits 0 and 1: users typed O (capital o) and l Confusables and Typosīefore the age of computers, many mechanical typewriters lacked the keys for The’re presented here to help better understanding of the non-ASCII cases. While issues with the ASCII character set are generally well understood, This section lists some Unicode-related features that can be surprisingĪSCII is a subset of Unicode, consisting of the most common symbols, numbers, Which focuses on Bidirectional override characters and homoglyphs in a variety Trojan Source Attacks, reported by Nicholas Boucher and Ross Anderson, Investigation for this document was prompted by CVE-2021-42574, Or recommendations: it is rather a list of things to keep in mind.įor general security considerations in Unicode text, see and. This document purposefully does not give any solutions (such as diff displays), by enforcing project-specific policies,Īnd by raising awareness of individual programmers. They should be solved in code editors and review tools The possible issues generally can’t be solved in Python itself without

It is possible to misuse Python’s Unicode-related features to write code thatĪppears to do something else than what it does.Įvildoers could take advantage of this to trick code reviewers into It also allows writing code that is potentially confusing to readers. While this allows programmers from all around the world to express themselves,

Python code may consist of almost all valid Unicode characters. It aims to allow any character from any human language to be Unicode is a system for handling all kinds of written language.

This document does not give any recommendations and solutions. Programs that appear to do something else than they actually do. This document explains possible ways to misuse Unicode to write Python
