CPU microarchitectures

The primary goal of archspec is to be able to detect and label CPU microarchitectures at a granularity that allows reasoning about binary compatibility. Using this library a client can:

  1. Detect the microarchitecture of the current host, and compare it to a label on a binary to determine whether they are compatible.
  2. Check whether a particular microarchitecture supports a given feature.
  3. Retrieve the flags that a particular compiler needs to build a binary specifically for a microarchitecture.

JSON database

All the static knowledge of microarchitecture names, features, compiler support etc. is stored in a JSON file. The most important information there is the dictionary of known microarchitectures. An example record in this dictionary looks like:

"sandybridge": {
   "from": ["westmere"],
   "vendor": "GenuineIntel",
   "features": [
     "mmx",
     "sse",
     "sse2",
     "ssse3",
     "sse4_1",
     "sse4_2",
     "popcnt",
     "aes",
     "pclmulqdq",
     "avx"
   ],
   "compilers": {
     "gcc": [
       {
         "versions": "4.9:",
         "flags": "-march={name} -mtune={name}"
       },
       {
         "versions": "4.6:4.8.5",
         "name": "corei7-avx",
         "flags": "-march={name} -mtune={name}"
       }
     ]
   }
 },

Each entry maps a unique, human-readable label to the corresponding information on:

  • The closest compatible microarchitecture
  • The vendor of the microarchitecture
  • The features that are available
  • The optimization support provided by compilers

The granularity of the labels follows that used by compilers to emit processor-specific instructions, but the actual labels may differ slightly to improve readability (e.g. archspec refers to the steamroller microarchitecture rather than bdver3). On top of this static information, archspec provides language bindings with logic to detect, query and compare different microarchitectures.
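As an illustration of how a client might consume such a record, the sketch below parses a trimmed copy of the sandybridge entry with Python's json module. The enclosing top-level object and the helper name describe are assumptions made for this example; this is not archspec's own loader.

```python
import json

# Trimmed copy of the "sandybridge" record shown above, wrapped in an
# enclosing object so it parses as a standalone JSON document
# (the wrapper is an assumption made for this example).
RECORD = """
{
  "sandybridge": {
    "from": ["westmere"],
    "vendor": "GenuineIntel",
    "features": ["mmx", "sse", "sse2", "ssse3", "sse4_1", "sse4_2",
                 "popcnt", "aes", "pclmulqdq", "avx"],
    "compilers": {
      "gcc": [
        {"versions": "4.9:", "flags": "-march={name} -mtune={name}"},
        {"versions": "4.6:4.8.5", "name": "corei7-avx",
         "flags": "-march={name} -mtune={name}"}
      ]
    }
  }
}
"""


def describe(label, database):
    """Summarize the four kinds of information stored for a label.

    Hypothetical helper, not part of archspec.
    """
    entry = database[label]
    return {
        "closest_compatible": entry["from"],
        "vendor": entry["vendor"],
        "features": set(entry["features"]),
        "compilers": sorted(entry["compilers"]),
    }


database = json.loads(RECORD)
summary = describe("sandybridge", database)
print(summary["closest_compatible"], summary["vendor"], summary["compilers"])
# ['westmere'] GenuineIntel ['gcc']
```

The four keys of the summary correspond one to one to the items in the list above.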

Host detection

Detection of the host where archspec is being run can be performed with a simple function call:

>>> import archspec.cpu
>>> host = archspec.cpu.host()

where the return value is an archspec.cpu.Microarchitecture object. To obtain the label of the host, simply convert this object to a string:

>>> str(host)
'cannonlake'

If more information is needed, the object can also be converted to a built-in dictionary:

>>> import pprint
>>> pprint.pprint(host.to_dict())
{'features': ['adx',
              'aes',
              'avx',
              'avx2',
              'avx512bw',
              'avx512cd',
              'avx512dq',
              'avx512f',
              'avx512ifma',
              'avx512vbmi',
              'avx512vl',
              'bmi1',
              'bmi2',
              'clflushopt',
              'f16c',
              'fma',
              'mmx',
              'movbe',
              'pclmulqdq',
              'popcnt',
              'rdrand',
              'rdseed',
              'sha',
              'sse',
              'sse2',
              'sse4_1',
              'sse4_2',
              'ssse3',
              'umip',
              'xsavec',
              'xsaveopt'],
 'generation': 0,
 'name': 'cannonlake',
 'parents': ['skylake'],
 'vendor': 'GenuineIntel'}

Queries and comparison

The list of all microarchitectures known by archspec is accessible through a global dictionary that maps the microarchitecture labels to a corresponding Microarchitecture object in memory:

>>> import archspec.cpu
>>> archspec.cpu.TARGETS
<archspec.cpu.schema.LazyDictionary object at 0x7fc7eae49650>

>>> archspec.cpu.TARGETS['broadwell']
Microarchitecture('broadwell', ...)

>>> len(archspec.cpu.TARGETS)
43

This dictionary is constructed lazily from the data stored in the JSON database, upon the first operation performed on it (e.g. the host detection shown in the previous section). A Microarchitecture object can be queried for its name and vendor:

>>> uarch = archspec.cpu.TARGETS['broadwell']
>>> uarch.name
'broadwell'

>>> uarch.vendor
'GenuineIntel'

All the names used for microarchitectures are intended to be human-understandable and to capture an entire class of chips that have the same capabilities. A microarchitecture can also be queried for features:

>>> 'avx' in archspec.cpu.TARGETS['broadwell']
True
>>> 'avx' in archspec.cpu.TARGETS['thunderx2']
False
>>> 'neon' in archspec.cpu.TARGETS['thunderx2']
True

since Microarchitecture objects implement container semantics that indicate which CPU features they support. The verbatim list of features for each object is stored in the features attribute:

>>> archspec.cpu.TARGETS['nehalem'].features
{'sse2', 'sse', 'ssse3', 'sse4_1', 'mmx', 'sse4_2', 'popcnt'}

>>> archspec.cpu.TARGETS['thunderx2'].features
{'fp', 'cpuid', 'aes', 'sha2', 'crc32', 'pmull', 'sha1', 'atomics', 'evtstrm', 'asimd', 'asimdrdm'}

>>> archspec.cpu.TARGETS['power9le'].features
set()
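The container behavior described above can be sketched with Python's `__contains__` hook. The class below is a toy stand-in written for this example (the name FeatureSet is hypothetical), not archspec's Microarchitecture class.

```python
class FeatureSet:
    """Toy stand-in for a microarchitecture that answers `feature in obj`."""

    def __init__(self, name, features):
        self.name = name
        self.features = set(features)

    def __contains__(self, feature):
        # Only string feature names make sense for this check.
        if not isinstance(feature, str):
            raise TypeError("feature names must be strings")
        return feature in self.features


# Feature names copied from the nehalem output shown above.
nehalem = FeatureSet(
    "nehalem",
    ["mmx", "sse", "sse2", "ssse3", "sse4_1", "sse4_2", "popcnt"],
)

print("sse4_2" in nehalem)  # True
print("avx" in nehalem)     # False
```

This sketch looks only at the raw feature set. The sessions above report 'neon' as supported by thunderx2 although it is not in the printed set, so the library evidently applies additional logic that is not modeled here.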

The semantics of the features field vary with the CPU being modeled. For instance, Intel entries tend to list all the features of a chip in that field, while ARM entries list only the flags that have been added on top of the base model. Given a microarchitecture, we can query its direct parents or the entire list of ancestors:

>>> archspec.cpu.TARGETS['nehalem'].parents
[Microarchitecture('core2', ...)]

>>> archspec.cpu.TARGETS['nehalem'].ancestors
[Microarchitecture('core2', ...), Microarchitecture('nocona', ...), Microarchitecture('x86_64', ...)]

Parenthood in this context is determined by CPU features, not by release chronology. Each microarchitecture is therefore compatible with its parents, i.e. binaries that run on a parent also run on the current microarchitecture. Following the list of ancestors, we arrive at the root of the DAG that models a given microarchitecture:

>>> archspec.cpu.TARGETS['nehalem'].ancestors[-1]
Microarchitecture('x86_64', ...)
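One way to picture how an ancestor list is derived from parent links is the sketch below. The parent table is inferred from the nehalem output above and covers only that chain; both the table and the function are assumptions for illustration, not archspec's implementation.

```python
# Hypothetical parent table, inferred from the nehalem example above.
PARENTS = {
    "x86_64": [],
    "nocona": ["x86_64"],
    "core2": ["nocona"],
    "nehalem": ["core2"],
}


def ancestors(label, parents=PARENTS):
    """Return direct parents first, then their ancestors, without duplicates."""
    result = list(parents[label])
    for parent in parents[label]:
        for ancestor in ancestors(parent, parents):
            if ancestor not in result:
                result.append(ancestor)
    return result


print(ancestors("nehalem"))      # ['core2', 'nocona', 'x86_64']
print(ancestors("nehalem")[-1])  # x86_64
```

Skipping duplicates matters once a node has more than one parent, because the same ancestor can then be reached along several paths of the DAG.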

The root is also available directly through the family attribute:

>>> archspec.cpu.TARGETS['nehalem'].family
Microarchitecture('x86_64', ...)

since the returned object represents the “family architecture”, i.e. the lowest common denominator of all the microarchitectures in the DAG. Finally, modeling microarchitectures as DAGs makes it possible to implement set comparison among them:

>>> archspec.cpu.TARGETS['nehalem'] < archspec.cpu.TARGETS['broadwell']
True

>>> archspec.cpu.TARGETS['nehalem'] == archspec.cpu.TARGETS['broadwell']
False

>>> archspec.cpu.TARGETS['nehalem'] > archspec.cpu.TARGETS['broadwell']
False

>>> archspec.cpu.TARGETS['nehalem'] > archspec.cpu.TARGETS['a64fx']
False
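The comparisons above behave like subset tests on sets of ancestors. The sketch below reproduces that behavior under stated assumptions: the parent table is hypothetical and heavily simplified (intermediate generations are omitted and the aarch64 chain is a guess), and the helper names closure, strictly_less and can_run are invented for this example.

```python
# Hypothetical, simplified parent table; not the content of the real database.
PARENTS = {
    "x86_64": [],
    "nocona": ["x86_64"],
    "core2": ["nocona"],
    "nehalem": ["core2"],
    "broadwell": ["nehalem"],  # intermediate generations omitted
    "aarch64": [],
    "a64fx": ["aarch64"],      # assumed chain, for illustration only
}


def closure(label):
    """Return the label together with all of its ancestors, as a frozenset."""
    seen = {label}
    stack = list(PARENTS[label])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return frozenset(seen)


def strictly_less(a, b):
    """True if `a` is a strict ancestor of `b` (the `a < b` case)."""
    return closure(a) < closure(b)


def can_run(host, target):
    """True if a binary built for `target` is expected to run on `host`."""
    return closure(target) <= closure(host)


print(strictly_less("nehalem", "broadwell"))  # True
print(strictly_less("broadwell", "nehalem"))  # False
print(strictly_less("nehalem", "a64fx"))      # False
print(strictly_less("a64fx", "nehalem"))      # False
print(can_run("broadwell", "nehalem"))        # True
```

Because subset inclusion is only a partial order, two microarchitectures from different families are neither less than nor greater than each other, which matches the nehalem and a64fx result above.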

Compiler optimization flags

Each microarchitecture object also knows which compiler flags need to be used to emit code optimized for it:

>>> archspec.cpu.TARGETS['broadwell'].optimization_flags('intel', '19.0.1')
'-march=broadwell -mtune=broadwell'

Sometimes compiler flags change across versions of the same compiler:

>>> archspec.cpu.TARGETS['thunderx2'].optimization_flags('gcc', '9.1.0')
'-mcpu=thunderx2t99'

>>> archspec.cpu.TARGETS['thunderx2'].optimization_flags('gcc', '5.1.0')
'-march=armv8-a+crc+crypto'

If a compiler is unknown to archspec, an empty string is returned:

>>> archspec.cpu.TARGETS['broadwell'].optimization_flags('unknown', '5.1')
''

while if a compiler is known to be unable to optimize for a given architecture, an exception is raised:

>>> archspec.cpu.TARGETS['icelake'].optimization_flags('gcc', '4.8.3')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/user/PycharmProjects/archspec/archspec/cpu/microarchitecture.py", line 282, in optimization_flags
    raise UnsupportedMicroarchitecture(msg)
archspec.cpu.microarchitecture.UnsupportedMicroarchitecture: cannot produce optimized binary for micro-architecture 'icelake' with gcc@4.8.3 [supported compiler versions are 8.0:]
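The three outcomes described in this section (flags found, unknown compiler, unsupported version) can be sketched against the "compilers" entry of the sandybridge record from the JSON database section. The function names, the inclusive treatment of both range bounds, the restriction to purely numeric dotted versions, and the use of ValueError are all assumptions made for this example; archspec's own method and exception type are the ones shown in the sessions above.

```python
# "compilers" entry copied from the sandybridge record in the JSON database section.
COMPILERS = {
    "gcc": [
        {"versions": "4.9:", "flags": "-march={name} -mtune={name}"},
        {"versions": "4.6:4.8.5", "name": "corei7-avx",
         "flags": "-march={name} -mtune={name}"},
    ],
}


def as_tuple(version):
    """Turn a purely numeric dotted version such as '4.8.5' into (4, 8, 5)."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, constraint):
    """Check `version` against a 'min:max' range; either side may be empty.

    Both bounds are treated as inclusive (an assumption of this sketch).
    """
    lower, upper = constraint.split(":")
    if lower and as_tuple(version) < as_tuple(lower):
        return False
    if upper and as_tuple(version) > as_tuple(upper):
        return False
    return True


def lookup_flags(label, compiler, version, compilers=COMPILERS):
    """Return flags for `label`, '' for an unknown compiler, or raise ValueError."""
    if compiler not in compilers:
        return ""
    for entry in compilers[compiler]:
        if in_range(version, entry["versions"]):
            # Entries without a "name" key fall back to the label itself.
            return entry["flags"].format(name=entry.get("name", label))
    raise ValueError(f"{compiler}@{version} cannot optimize for {label!r}")


print(lookup_flags("sandybridge", "gcc", "9.1.0"))
# -march=sandybridge -mtune=sandybridge
print(lookup_flags("sandybridge", "gcc", "4.7.2"))
# -march=corei7-avx -mtune=corei7-avx
print(repr(lookup_flags("sandybridge", "unknown", "5.1")))
# ''
```

Returning an empty string for an unknown compiler keeps a build usable without tuning flags, while raising for a known but too-old compiler surfaces a configuration that cannot produce the requested optimized binary.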