Intel® Stratix® 10 DX Device Errata

ID 683249
Date 11/01/2022
Public
Document Table of Contents

3.9. 836870: Non-Allocating Reads May Prevent a Store Exclusive From Passing

Description

A Cortex-A53 MPCore* processor executing a load and store exclusive instruction in a loop is allowed to repeatedly fail under certain conditions. An example of an allowed condition is when another processor repeatedly writes to the same cache line.

Because of this erratum, a non-allocating load from another processor may cause the store-exclusive instruction to repeatedly fail.

The failure occurs under these conditions:
  • One CPU executes a loop containing a load-exclusive and a store-exclusive instruction. The loop continues until the store-exclusive instruction succeeds.
  • Another CPU or master in the system repeatedly executes a load to the same L1 data cache index as the store-exclusive instruction.
  • The cache line containing the address of the load is present in the L1 cache of the CPU, and is not present in the L2 cache.
  • The load does not cause an allocation into the cache of the CPU or the master executing the load. This response results in a snoop arriving at the CPU holding the cache line every time the load is performed.
  • The load is repeated at exactly the same frequency as the load- and store-exclusive loop, such that every time around the loop, the snoop arrives at the same time as the store exclusive instruction is executing.
A load can be non-allocating in Cortex-A53 MPCore* processor in the following conditions:
  • A load instruction executes on a CPU, while the CPUACTLR.DTAH bit is not set, with one of the following conditions:
    • The memory address is marked as writeback cacheable, with no read allocate, in the page tables.
    • The memory address is marked as writeback cacheable, transient, in the page tables.
    • The memory address is marked as writeback cacheable, and an LDNP non-temporal load instruction is used.
  • A read transaction is made on the ACP interface.

Impact

If a CPU or other master is polling a location to determine when the value changes, then it can prevent another CPU from updating that location, causing a software livelock. Most polling routines, however, use memory that can be allocated into their cache. For improved efficiency it is recommended to use a load-exclusive and WFE instruction to avoid repeated polling. Because the frequency of the polling loop must match the frequency of the load store-exclusive loop, the conditions that cause this erratum are unlikely. Any other disturbance in the system, such as an interrupt or other bus traffic can easily alter the frequency of either loop sufficiently to break the livelock.

Workaround

If the repeating load is being executed on a Cortex-A53 CPU, then this erratum can be worked around by setting the CPUACTLR.DTAH bit. Note this version of the Cortex-A53 MPCore* processor sets this bit by default.

If the repeating load is being executed by another master in the system connected to the ACP interface, then the erratum can be worked around by changing the frequency of the polling so that it no longer aligns with the frequency of the load and store exclusive loop.

If the repeating load is being executed by another master in the system connected through the coherent interconnect, then the erratum can be worked around either by ensuring that the other master allocates the line into its cache, or by changing the frequency of the polling so that it no longer aligns with the frequency of the load and store exclusive loop. Note that the Cortex-A53 MPCore* processor always allocates the line into either the cache or a temporary buffer, and so does not require any workaround for typical polling loops.

Category

Category 3