The Digits of CLIP

(March 9, 2021)

https://creativecommons.org/licenses/by/4.0/

Sifting through a single layer of OpenAI’s CLIP, examining numerical and textual neurons

What is CLIP?

If you are new to CLIP, please first visit the fantastic introductions to OpenAI’s very successful 2021 contrastive learning models: the blog announcement, the analysis of its multimodal neurons, and the official paper.

This Post

This post is not a systematic study, just a simple table of contents compiled while browsing OpenAI’s Microscope pages. Concretely, I spent a few hours going through all 2,560 entries in Layer 4/4/Add_6 of the mid-sized RN50-x4 model. My labels are personal impressions, so take them with a large grain of salt.

My main interest was understanding how CLIP sees numbers and, where applicable, mathematical syntax, since that is my own corner of the woods. The list also includes various other curious categories I ran into, but it is by no means exhaustive. Overall, about 270 neurons are listed here, roughly 10% of that single layer of the mid-sized CLIP model, so there is a lot more for the curious to explore. Thanks to OpenAI for making all of this possible!

Previews: hover over each link to fetch a preview image directly from the Microscope. This can be slow at times, and apologies in advance if you are reading this with the previews completely broken: they are not part of the post and will stop working as soon as OpenAI changes its URL scheme. More robustly, clicking a link opens the official Microscope page in a new tab.
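For readers who want to go beyond eyeballing: the dataset examples on a Microscope page are, roughly, the images that most strongly activate a given unit. Below is a minimal NumPy sketch of that ranking step under loud assumptions: that you already have a hypothetical `activations` array of shape (images, channels, height, width) dumped from a layer such as Layer 4/4/Add_6 (2,560 channels in RN50-x4), and that each image is scored by its maximum over spatial positions. It is an illustration, not Microscope's actual pipeline, whose details may differ.

```python
import numpy as np


def top_examples(activations: np.ndarray, channel: int, k: int = 5) -> np.ndarray:
    """Indices of the k images that most strongly activate one channel.

    `activations` is a hypothetical dump of shape
    (num_images, num_channels, height, width). Scoring by the spatial
    maximum is an assumption, not necessarily what Microscope does.
    """
    per_image = activations[:, channel].reshape(activations.shape[0], -1)
    scores = per_image.max(axis=1)        # one score per image
    return np.argsort(scores)[::-1][:k]   # highest scores first


# Toy usage: random activations with one planted strong response.
rng = np.random.default_rng(0)
acts = rng.normal(size=(100, 8, 4, 4))
acts[42, 3, 1, 2] = 50.0                   # image 42 lights up channel 3
print(top_examples(acts, channel=3, k=1))  # → [42]
```

Taking the spatial maximum favors units that fire strongly on a small region (a single digit in a corner, say); a spatial mean would instead favor units that respond across the whole image.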

Numbers, numbers, numbers

• Digit
• 0, small eye
• 0, target
• 00000
• 1,2
• x00
• 1xx, 1xx, 1xxx +vegetation
• 2,20,22
• 3 – no preview, my only example from block 3.
• 4
• 5, goatee
• 6,7
• 7,9
• 8,9
• Celsius, Fahrenheit
• #
• %
• full width four-six digits
• full width two/three digit numbers
• 3-5 digit grid-like numbers
• simple math, arithmetic

Numbers in context

• 2k years [1]
• 2k years [2]
• calendar dates + animal tails
• calendar dates + anime faces
• calendar dates
• countdown/tires
• fig. num captions
• full clock face
• handwritten numbers on boards
• kb, gb – information size
• latent ears, but matches ID-like number/text
• latent anatomy, but units+quantities on ImageNet
• lb, kg – weight quantities
• measuring tape, rulers
• micro, nano, tiny – length quantities
• numbers + speed
• numbers in digital time
• numbers in digital time + puppy faces
• numbers on plaques
• numbers on race vehicles
• numbers on white square backgrounds
• numbers with adjacent marks – slashes, plus/minus, scripts
• numerical prices
• railway numerics
• relatively small 2-digit numbers and belly buttons
• small flames, digital display numbers – if the clock light is too bright, matches number
• vertical numbers
• numbers on speedometers mph
• numbers on speedometers km/h
• numbers on speedometers mph+fast
• numbers on vehicle plates [1]
• numbers on vehicle plates [2]
• numbers on vehicle plates [3]

Brief Discussion

CLIP appears to have a variety of neurons concerned with digit detection. I seem to have found at least one neuron for each individual digit, as well as at least one neuron that responds to the general category of “a digit” and shows all ten.
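One way to firm up an impression like “one neuron per digit” would be a simple selectivity score. Here is a minimal sketch, assuming a hypothetical set of labeled images (label d for an image showing digit d, -1 for images without any digit) and one activation score per image for a single channel. For each digit it compares the mean activation on that digit against the mean on everything else. A digit-specific unit should peak at one digit. A generic “a digit” unit should score positively for all ten, because “everything else” includes the digit-free images.

```python
import numpy as np


def digit_selectivity(scores: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-digit selectivity of one channel.

    scores: one activation per image for a single channel.
    labels: digit shown in each image (0-9), or -1 if none (hypothetical labels).
    Returns a length-10 array: mean activation on digit d minus the mean on all
    other images; NaN where either group is empty.
    """
    out = np.full(10, np.nan)
    for d in range(10):
        mask = labels == d
        if mask.any() and (~mask).any():
            out[d] = scores[mask].mean() - scores[~mask].mean()
    return out


# Toy data: 5 images per digit plus 20 digit-free images.
labels = np.concatenate([np.repeat(np.arange(10), 5), np.full(20, -1)])
seven_unit = np.where(labels == 7, 3.0, 1.0)      # fires mostly on 7s
any_digit_unit = np.where(labels >= 0, 2.0, 0.0)  # fires on every digit

print(np.nanargmax(digit_selectivity(seven_unit, labels)))    # → 7
print((digit_selectivity(any_digit_unit, labels) > 0).all())  # → True
```

A difference of means is the simplest choice. A normalized index or a rank-based statistic would be less sensitive to a few extreme activations.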

There is also a set of detectors for handwritten digits, e.g. on chalkboards, notebooks and parchment. Several neurons appear to pick up a particular number length (e.g. between 4 and 6 digits) or a particular placement: centered in the image, or spanning its entire width. Another neuron looks for vertically stacked digits.

Units, quantities and measurements are detected separately, but are curiously grouped by unit kind, such as weight or length. Clock faces, both digital and analog, also get their own neurons.

A couple of neurons even reach for basic algebraic syntax. This capacity does not go far, and it appears to be shared between visually similar syntax with completely different meanings: e.g. subtraction is matched together with compound identifiers such as ISBNs, phone numbers and simple chemical formulas.

Some numbers seem to act as secondary markers for contextual items, such as jerseys, calendar dates and rulers. Lastly, it is fascinating to see some neurons shared with seemingly unrelated categories (goatees, puppies, tails, small flames, belly buttons), perhaps covering rarer, specialized cases.

Possibly broader STEM

• arrow
• book cover
• book page
• chalk board
• cube box
• diagram
• library
• magazine
• maps
• postal letter, envelope, stamp
• pyramid, diamond shape
• red peppers and chemical formulas
• science – mostly chemical molecule
• topology

Characters / Text

• !
• ?
• @
• A
• B, brain
• B
• C
• D
• FR
• F
• K
• P
• R
• TA, TAB, TAXI
• W
• alphabet + snakes
• calligraphic/inverted letters
• casual shoes, cups, cans, alphabet letters
• dotted letter abbr.
• gothic letters
• grading/writing
• graffiti letters
• indiscernible letters, airplane windows
• letters in grids
• letters on vehicle plates
• lists / menus
• margin notes
• musical note/glass
• musical notes and hieroglyphs
• newspaper clip
• old printing or handwritten parchment
• pens, pencils
• pound sterling
• rules, contracts, hall
• sticky notes/letters
• studying/reading
• tiny footnote labels
• type
• wall of text

Sampling everything else

• air
• aluminum food container
• antipasti
• anxiety, stress
• arabic
• arch
• art
• bacon
• ball
• banana
• bar
• barcode
• bees
• bicycle
• birthday
• black and white
• body anatomy
• bones
• bowtie
• brick chimney
• bridge [1]
• bridge [2]
• business suit
• business tie
• cage pattern – strawberry, shopping cart, waffles
• calzone, jazz
• canyon, wood art
• cat
• chair
• change
• charging cable
• chicken
• cigarettes
• citrus slice
• clean – broom, trash bag
• clothing leather
• clover
• clown
• coffee
• comic speech bubble
• construction cranes
• corn
• cross
• cucumber/zucchini slice
• dark beverage
• digital
• doctor who
• easter egg
• eyelashes
• feather
• fog
• fork
• fractals in nature
• free/roam
• ginger, red hair
• greek columns
• guns
• happiness
• hardhat
• hello kitty
• high heels
• ice cream cone
• ice cream scoop
• jar lid
• keyboard
• kid doodles [1]
• kid doodles [2]
• kilt
• knobs
• lama
• latex
• leather boots, leather sandals
• lego, puzzle
• lemon/lime wedge
• light beverage, milk
• lighthouse
• lit candles
• lit flame
• luck
• makeup, palette
• meal sides
• meme
• memorial
• monkey
• moon+Ford
• movember
• mushrooms
• neatly organized appliances
• neatly organized elongated items
• neatly organized produce
• nest
• not all that glitters is gold
• office
• old sign, spokes
• olympics, prices, xbox controller
• opera/Shakespeare
• overlooking/lookout
• owl-like eyes
• padlock
• pillow
• pizza
• plastic water bottle
• pope
• popsicle
• pumpkin
• railroad tracks
• rain
• recycle
• road cone
• road
• rollercoaster
• rose
• running [1]
• running [2]
• sailboat, pirate
• sandals
• sauce
• seal
• sealed
• seatbelt
• seeds
• selfie [1]
• selfie [2]
• selfie [3]
• setting
• signatures
• skeletons
• skulls
• sky lift
• snoopy
• socks
• solar panels
• soup
• spider web
• squirrel
• stained glass
• statue
• stitches
• stormtrooper
• street lights at night
• styrofoam food container
• sunset
• table, pier
• teeth
• telephone
• temperature
• tile eye
• tomato
• trophy
• tropical
• turtle
• tuxedo
• twins
• umbrella
• venue seating
• volcano
• waffle +unrelated
• walking lane
• walking
• wallpaper
• water drop
• waves
• wrinkled face with glasses
• wrinkled face
• xmas