Sunday, August 14, 2016

Surprising properties of C null pointer constants

A previous blog post mentioned two properties of NULL that many developers find surprising:
  • NULL does not need to be 0 when stored in memory — the compiler is allowed to transform the value when casting between integer and pointer types, so
    union {
        uintptr_t u;
        void *p;
    } u;
    u.p = NULL;
    printf("0x%" PRIxPTR "\n", u.u);
    
    does not need to print 0, even though u.p==(void*)0 evaluates to true.
  • NULL does not need to be a pointer — it is valid for an implementation to define NULL as
    #define NULL 0
    
    so it is not portable to pass NULL as an argument to a variadic function, as the call will fail on architectures where sizeof(int) != sizeof(void*) (see the sketch after this list).
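
As an illustration (my example, not from the original post), the classic case is the list terminator of POSIX execl(), which the callee reads with va_arg:
#include <unistd.h>

int main(void)
{
    /* Not portable if NULL is defined as plain 0: execl() reads the
       terminating argument with va_arg(ap, char *), so an int-sized 0
       is wrong on architectures where sizeof(int) != sizeof(void *). */
    execl("/bin/echo", "echo", "hello", NULL);

    /* Portable: the terminator is explicitly a null pointer.
       (Only reached if the first call fails.) */
    execl("/bin/echo", "echo", "hello", (char *)NULL);
    return 1;   /* execl only returns on failure */
}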

Today I found a third case in the paper "Beyond the PDP-11: Architectural support for a memory-safe C abstract machine" — casting an integer variable having the value 0 to a pointer type is not guaranteed to produce a null pointer. For example, it is implementation-defined whether the comparison
int zero = 0;
bool b = (void*)0 == (void*)zero;
evaluates to true or false. Both (void*)0 and (void*)zero cast the value 0 to void*, so I had naively thought that both would produce a null pointer, but the standard only guarantees this for (void*)0, as it is a constant expression:
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
while the result is implementation-defined when casting a variable
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
So (void*)0 is a null pointer, while (void*)zero is converted in an implementation-defined manner, which may or may not produce a null pointer...
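
A small self-contained test (my own sketch, not from the paper): on the common implementations I am aware of both lines print 1, but only the first is guaranteed by the standard.
#include <stdio.h>

int main(void)
{
    int zero = 0;
    void *a = (void *)0;     /* null pointer constant: guaranteed null pointer */
    void *b = (void *)zero;  /* implementation-defined integer-to-pointer conversion */
    printf("%d\n", a == (void *)0);   /* always 1 */
    printf("%d\n", b == (void *)0);   /* implementation-defined */
    return 0;
}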

Sunday, July 24, 2016

Code behaving differently in C90, C99, C11, C++98, and C++11

There are some subtle differences between the revisions of the C standard that make it possible to create programs that behave differently depending on whether they are compiled as C90, C99, or C11. Similarly, C++ is mostly a superset of C, but there are constructs that produce different results in C and C++.

This is used in Don Yang's contribution to the 2015 International Obfuscated C Code Contest to create a program that produces different output depending on whether it is compiled as C90, C99, C11, C++98, or C++11. For C90 it prints stars of the form
*********************************************              ***               **
***********************     *****************             ******             **
*********************        ****************           **********           **
********************         ****************         **************         **
******        ******         *****************     ****************************
******           **          **************************************************
******                      *************************(O************************
*******                     *     *********************************************
*********                              ****************************************
***********                             ***************************************
************                            ***************************************
*********                             *****************************************
*******                     ***************************************************
******                       ****************)d));o((d=************************
******           **          **************************************************
******         *****         *********************      ***********************
********************(         O******************        **********************
**********************       *******************          *********************
************************     *******************          **************)p)-p);
o(d-(p=*****************************************          *********************
For C99 the stars have eyes, C++11 prints circles, etc. (There is more to this. The program reads text from standard input, and the output is obfuscated C90 source code that prints that text — all the * characters are pointer dereferences!)

The source code for the program is a little bit hard to read
                                           #define r(R) R"()"
                          /*[*/#include  /**/<stdio.h>
                      #include<math.h>/*!![crc=0f527cd2]*/
                   float I,bu,k,i,F,u,U,K,O;char o[5200];int
              #define R(U) (sizeof('U')==1||sizeof(U"1"[0])==1)
            h=0,t=-1,m=80,n=26,d,g,p=0,q=0,v=0,y=112,x=40;  float
           N(float/*x*/_){g=1<<30;d=-~d*1103515245&--g;return  d*_
          /g;}void/**/w(int/**/_){if(t<0){for(g=0;g<5200;o[g++   ]=
          0);for(;g;o[g+79]=10)g-=80;for(t=37;g<62;o[80+g++]=32)   ;
         }if(m&&o[h*80+m-1]==10){for(g=0;g<79;o[t*80+g++]=0){}o[t
         ++*80+g]=10;t%=64;n+=2;I=N(70)+5;if(n>30&&(I-x)*(I-x)+n*
        n>1600&&R()){O=0;F=(x=0x1!=sizeof(' '))?k=1+N(2),i=12-k+N(
        8),N(4):(k=17+N(5),i=0,r()[0]?O=.1:  0);for(u=U=-.05;u<32;
        U=k+i+i*.5*sin((u+=.05)+F))for( K=0   ;K< U;K+=.1)if((bu=K*
       sin(u/5),g=I+cos( u/5) *K)>=0&&g  <     79  )o[g+(int)(t+44+
       bu*(.5-(bu>0?3*O:  O)   ) )%64*  80      ]  =32;x*=02//* */2
      -1;n=O+x?n=I+(x?0   :N     (k)-   k           /2),g=(t+42  )%
      64,m=-~g%64,x?g=m          =-~        m%64:0  ,n>5?o[g*80   +
     n-3]=o[m*80+n-3]=       0:   0              ,n <75?o[g*80+n
     +2]=o[m*80+n+2]=0   :0:0;                      x=I;}h=-~h%64
    ;m=0;}putchar((g=o [h*                          80+m++])?g:_);
   if(g){w(_);}}void W                               (const char*_
  ){for(;*_;w(*_++));}                               int main(int a
  ,char**_){while(a--)d              +=_[a          ]-(char*)0;W( \
 "#include<stdio.h>typed"             "e"         "f\40int\40O;v"
 "oid o(O _){putchar(_);}O"                    "\40main(){O"  ""
"*_[512],**p=_,**d,b,q;for(b=0;b"        "++<512;p=_+q)_[q"    \
"=(p-_+1)*9%512]=(O*)p;") ;      for(;(g= getchar())-EOF;p=
q){q=p;for(v=512;p-q-g&&q-p-              g;  v--)q=-~q*9%512
;W("o(");if(p>q)w(y),w(45);w(                      40);w(y^=20
);w(075);for(a=0;a<v;a++)w(42);                      for(W("(O**"
 );a--;w(42)){}w(41);w(y^024);w(                      41);if(p<=q)w(
   45),w(y^20);W(");");}for(a=7;a-6                      ;W(a<6?"{;}":""
      ))for(a  =0;a  <6 &&   !o[h*80+m                       +a];a++){}W("r"
         "etu"  /*J   */       "rn+0;}\n"                             );return
             /*                      "#*/0                                   ;}
but it is, as far as I can tell, using three tricks (combined in a small sketch after the list) in order to detect which C or C++ dialect is used:

  • // comments
    C90 does not have // comments, so constructs of the form
    int i = 2 //**/2
        ;
    
    can be used to differentiate it from the other C and C++ revisions, as C90 compiles this as
    int i = 2 / 2
        ;
    
    while C++ and the more recent revisions of C compiles this as
    int i = 2
        ;
    
  • Type of character constants
    Character constants, such as 'a', have type int in C, while C++ uses the type char. This means that sizeof('a') evaluates to a different value in C and C++.
  • Wide string literals
    C11 and C++11 add new kinds of string literals; for example, U"hello!" is a string whose characters have type char32_t. This can be used with a macro
    #define R(U) sizeof(U"a"[0])
    
    that is used as R(""). For C11 and C++11 this expands to
    sizeof(U"a"[0])
    
    which evaluates to 4, while the older revisions of the standards treat U and "a" as two tokens, and the macro expands to
    sizeof("""a"[0])
    
    which evaluates to 1.
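
A small combined sketch (my own, not part of the IOCCC entry) that uses all three tricks to report which dialect the file is compiled as:
#include <stdio.h>

/* In C11/C++11, U"a" is a single char32_t string-literal token, so the macro
   parameter U is never substituted; in older dialects U and "a" are separate
   tokens and U is replaced by the empty string "". */
#define R(U) (sizeof(U"a"[0]) == 4)

int main(void)
{
    int q = 2 //**/2
        ;                               /* 1 in C90 (2/2), 2 when // is a comment */
    int char_sized = sizeof('a') == 1;  /* character constants are char in C++ */
    int unicode = R("");                /* char32_t literals exist in C11/C++11 */

    printf("// comments:   %s\n", q == 1 ? "no (C90)" : "yes");
    printf("'a' is char:   %s\n", char_sized ? "yes (C++)" : "no (C)");
    printf("U\"\" literals: %s\n", unicode ? "yes (C11/C++11)" : "no");
    return 0;
}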

Wednesday, July 20, 2016

Using MathJax in Blogger

I have added MathJax support to this blog so that I can typeset mathematics using a large subset of \(\LaTeX\) commands. This functionality works in comments too, which has the nice side effect that it can be used to work around Blogger's limited formatting. In particular, formatting source code in comments can be done as
\(
  \verb|int foo(void)| \\
  \verb|{| \\
  \verb|    return 0;| \\
  \verb|}| \\
\)
which is rendered with most of the formatting intact:
\(
  \verb|int foo(void)| \\
  \verb|{| \\
  \verb|    return 0;| \\
  \verb|}|
\)

One annoying thing is that the \(\LaTeX\) commands are not processed when writing or previewing the comment, but they are rendered correctly when it is published...

How to enable MathJax in Blogger

MathJax needs to be loaded and configured when the page is loaded. This is done by adding the following
<script type="text/javascript" async="async"
  src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_CHTML,Safe">
</script>
in the <head> block by editing the blog's template (choose "Template", and click the "Edit HTML" button). This only adds the support to the desktop template, so you need to enable it separately for mobile by pressing the gear button, and choosing the "custom" mobile template (that generates a mobile template from your desktop template).

There are several configurations to choose between, with support for \(\LaTeX\), MathML, and AsciiMath notation, and control over how much functionality is loaded up front and how much is loaded on demand. I have chosen TeX-AMS_CHTML, which enables all the \(\LaTeX\) support but avoids MathML and AsciiMath.

Note the ,Safe modifier added after the configuration name. This disables unsafe constructs such as running javascript from \(\LaTeX\) commands
\[E \href{javascript:alert("Einstein says so!")}{=} mc^2\]
This is needed to prevent commenters from messing up the blog by adding evil constructs in comments.

Testing the functionality

Below are some random formulas, just to verify that the rendering works as intended:
\[
  \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_{i}-\mu)^{2}}
\]
\[
  \left\{
  \begin{aligned}
    a_1x+b_1y+c_1z &=d_1+e_1 \\
    a_2x+b_2y&=d_2 \\
    a_3x+b_3y+c_3z &=d_3
  \end{aligned}
  \right.
\]
\[
  \require{AMScd}
  \begin{CD}
    \pi(X, x_{0}) @>\phi_{*}>> \pi(Y, \phi(x_{0}))\\
    @VVuV @VVvV\\
    \pi(X, x_{1}) @>\phi_{*}>> \pi(Y, \phi(x_{1}))
  \end{CD}
\]

Thursday, July 7, 2016

Checked C

Microsoft Research recently announced a work-in-progress project, Checked C, that adds bounds checking to pointers and arrays:
Checked C adds new pointer types and array types that are bounds-checked, yet layout-compatible with existing pointer and array types. In keeping with the low-level nature of C, programmers control the placement of bounds information in data structures and the flow of bounds information through programs. Static checking enforces the integrity of the bounds information and allows the eliding of some dynamic checking. Dynamic checking enforces the integrity of memory accesses at runtime when static checking cannot. Checked C is backwards-compatible: existing C programs work “as is”. Programmers incrementally opt-in to bounds checking, while maintaining binary compatibility.
Below is my summary of, and comments on, the Checked C version 0.5 specification.

Overview

Checked C extends C with checked arrays and pointers where memory accesses are checked at runtime, and a runtime error is produced if the memory access is out of range. The compiler is allowed to (and expected to) eliminate checks that it knows are always true.

Checked arrays are created using the checked keyword, so
int x checked[5];
creates a checked array x having 5 elements. There are three new checked pointer types that are declared using syntax borrowed from C++:
  • array_ptr<T> — A pointer to an element of an array of type T values. This pointer type works as a normal C pointer (but with bounds checking).
  • ptr<T> — A pointer to a value of type T. Pointer arithmetic is not allowed on this pointer type.
  • span<T> — The span pointer works in the same way as the array_ptr, but it is represented differently in the generated code (see below).
As an example
ptr<int> p;
declares a checked pointer to an int.

The checked pointer types can have const and volatile modifiers, so a pointer to a constant integer is written as
ptr<const int> p;
while
int x;
const ptr<int> p = &x;
defines a pointer that cannot be modified.

The checked arrays and pointers are used in the same way as normal C arrays and pointers, so a checked pointer p can be dereferenced as *p, and an array_ptr<T> or span<T> pointer p can be dereferenced using expressions such as *(p+4) or p[4].

The array_ptr<T> and span<T> pointers need bounds to be defined before they may be dereferenced. Defining the bounds for a pointer p is done using the count and bounds keywords
  • p : count(len) — the number of elements that are accessible beginning at p
  • p : bounds(low, high) — the range of memory that can be accessed through p
The bounds are placed on the declaration, such as
array_ptr<int> p : count(n) = malloc(sizeof(int) * n);
Using this pointer as p[i] will conceptually add a check
dynamic_check(0 <= i && i < n);
right before the access.1 Pointer assignment transfers the bounds in the natural way, so q will get the bounds from p in
array_ptr<int> q = p;
The array_ptr<T> and ptr<T> pointers have the same size as normal C pointers (such that sizeof(array_ptr<int>) is equal to sizeof(int*)), so checked pointers can be used without changing the layout of structures. This means that the bounds are maintained locally by the compiler. The span<T> pointers do however keep the bounds within the type, so sizeof(span<int>) is larger than sizeof(int*).

It is possible to add additional constraints to handle things like this aligned memcpy
int aligned_memcpy(array_ptr<char> dest : count(len) where aligned(dest, 4),
                   array_ptr<char> src : count(len) where aligned(src, 4),
                   int len where len % 4 == 0);
although the constraint specifications seem to be a bit under-specified in the document, and I am not completely sure how they work in detail...

Undefined behavior

Doing all of this is a bit meaningless if code such as a[i+j] can make anything happen because of undefined behavior from overflow of i+j, so Checked C defines the behavior for some things that invoke undefined behavior in standard C.

Checked C requires that signed integer overflow produces a value or a runtime error:
  • To be able to maintain pointer bounds safety, it is important that signed integer overflow produce a defined value. When a signed integer expression produces an out-of-range value, either (1) the operation must convert that value to an in-range integer value or (2) the expression shall produce a runtime error. The conversion must be a function of only the input values of the expression.
  • Integer division by 0 shall also produce a runtime error or produce a defined value.
Checked C also defines pointers to work more like hardware addresses than is the case in standard C. The checked pointers are treated in the same way as unsigned integers (all values are valid, even if they do not point at an object), but pointer wrap and pointer arithmetic involving NULL produce runtime errors. The rules for undefined behavior of unchecked pointers are modified in a similar way:
Unchecked pointers shall be treated as addresses of locations in memory, just as checked pointers are treated as addresses. The addresses shall be unsigned integers with a defined range of 0 to UINTPTR_MAX:
  • Comparison of pointers for all different kinds of pointers shall be defined as the corresponding integer comparison.
  • Subtraction p - r of two pointers p and r of type T where one pointer is a checked pointer and the other is an unchecked pointer shall be done following the rules for subtraction of checked pointers, treating the unchecked pointer as a checked pointer in those rules.

Bounds evaluation

The bounds are evaluated each time a pointer is checked, so the program needs to be careful when updating variables used in a bounds declaration. The compiler must report an error when the bounds are extended
int sum(array_ptr<int> start : bounds(start, end), array_ptr<int> end)
{
    end = end + 1; // bounds(start, end) does not hold after this,
                   // so program is rejected
    start[5] = 0;
    ...
}
but Checked C does allow modifications that narrow the bounds, so for example
array_ptr<int> x : bounds(x, high) = ...
int sum = 0;
while (x < high) {
    sum += *x;
    x++;
}
is fine as the bound is reduced when x is incremented.

And there are more problems... For example, let e be an array_ptr<T> with bounds(x+1, x+5). This will not work when assigning
x = e;
as the range depends on x. Or consider this example
w = ...
where w : bounds(x, x + y);
int t = *w + (y = tmp);
The bounds for w depend on y, but y is modified in the same expression that dereferences w, and it is unclear whether y is updated before or after w is checked. The compiler must reject the code for both of these examples.

A big part of the specification deals with this, and there are rules for which expressions are valid in bounds declarations, and how to do data flow analysis to verify that variables are allowed to be changed. But data flow analysis is expensive, so there are restrictions that limit how much the compiler needs to check, with the result that small changes to the code may push the program over the limit and thus fail to compile.

This would be so much simpler if the bounds were evaluated where declared. The compiler could place the bounds in hidden temporary variables, but this is rejected in the rationale:
We considered eager evaluation, but rejected it because it would turn array_ptr types into span types. When bounds expressions are always eagerly evaluated, the results need to be stored somewhere so that they can be used when v is used. For local variables, hidden temporary variables could be introduced. This breaks the design principle of not introducing hidden costs, though.
I do not understand what they mean by this... I would say that the current specification adds hidden costs as the bounds may be evaluated each time the pointer is used, while keeping the bounds in hidden variables will only evaluate them once. Hidden variables may increase the register pressure, but the current specification will most likely increase the live ranges for the variables used in the bounds, which also increases register pressure, so I do not expect a difference in reality.

Doing eager evaluation would however cause problems for array_ptr<T> pointers in structures. They are currently handled as
struct S {
    array_ptr<int> arr : count(len);
    int len;
};
where the variables used in the bounds calculations live in the structure. I have not thought this through in detail, but I think it would make sense to forbid dereferencing such pointers, and require the program to copy them to a local variable in order to use them (see the sketch below). I do not think this is a big problem, as I would guess that most pointers in structures are of the ptr<T> type, which can be used directly as they do not have bounds.
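
A sketch of what I have in mind (my own, using the syntax of the specification's examples; this is not something the specification proposes):
struct S {
    array_ptr<int> arr : count(len);
    int len;
};

int sum_s(ptr<struct S> s)
{
    /* Copy the member and its count into local variables before indexing. */
    int len = s->len;
    array_ptr<int> a : count(len) = s->arr;
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += a[i];
    return sum;
}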


1. The real checking is somewhat more complex, and it also checks that the count(n) is valid (i.e. that n is less than (INTPTR_MAX/sizeof(int)), etc.)

Updated 2016-07-07: Added clarification in note 1.

Sunday, June 19, 2016

Evolution of C programming practices – Unix 1973–2015

I found a recent paper that also looks at how the BSD code base has evolved, but from a very different perspective compared to my code-size investigation.

The paper "The Evolution of C Programming Practices: A Study of the Unix Operating System 1973–2015" investigates coding style, and tests seven hypotheses by looking at metrics (line length, number of volatile in the source code, etc.) in 66 releases of Unix from 1973 to 2014. The hypotheses are
  1. Programming practices reflect technology affordances (e.g. developers may be more liberal with screen space when using high resolution displays)
  2. Modularity increases with code size
  3. New language features are increasingly used to saturation point
  4. Programmers trust the compiler for register allocation
  5. Code formatting practices converge to a common standard
  6. Software complexity evolution follows self correction feedback mechanisms
  7. Code readability increases
and the result is that they seem to be true, as interpreted through the metrics.

There are several things that annoy me about this paper...

Source code change over time

The earliest source code releases in the study are the "Research systems" from Bell Laboratories, which are followed by Bell-32V, different BSD versions from University of California, BSD386, and FreeBSD. Of these, it is only FreeBSD that continues after 1995, so the early range 1973–1995 consists of multiple variants of the OS developed by different groups in parallel, while the data 1996–2014 consists of FreeBSD only. This may affect the interpretation of the data...

Consider for example this plot from the paper, showing the mean comment size
[Plot from the paper: CMCHAR — mean comment size]
The first part of the plot, which consists of multiple variants of the OS, has points all over the place, while the second part is FreeBSD only and is rather constant. Several of the metrics have this kind of distribution, and I am not convinced that the data can be interpreted as evolution (rather than as different projects having different coding styles), or that fitting a cubic spline to the data has much value for this data set.

Language change over time

Some of the metrics are for new functionality in the C language, and are used for testing hypothesis 3 — "new language features are increasingly used to saturation point". But that hypothesis is in some sense obviously true for most of the metrics. For example, K&R C did not have volatile, but C90 requires volatile in order to prevent the compiler from optimizing away memory accesses to memory mapped hardware registers. Furthermore, there are only a finite number of places where it makes sense to use it, so it obviously saturates.

So the metrics mostly measure that the projects switched from K&R C to standardized C, and most of the graphs look like

[Plot from the paper: DVOL — volatile keyword density]
That is, first no usage, then the feature is introduced in all relevant places when the project updates the compiler.

One other issue here is that the header files are excluded from the calculated metrics. I do not think it affects the conclusion (as it is a rather weak statement anyway), but I have often seen things like
#define HW_REG volatile unsigned int
being defined in header files, and such usages are not included in the data (and this skews the data in the discussion of restrict, where the code uses it as __restrict in order to be able to define it in a suitable way for old compilers that do not handle restrict).
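
An illustration (my own, hypothetical code, not from the paper) of the kind of header idiom that the keyword counts miss when headers are excluded:
/* In a header (headers are excluded from the metrics): */
#define HW_REG volatile unsigned int

/* Pre-C99 compilers do not know "restrict", so code often spells it __restrict
   and maps it away when it is unavailable: */
#ifndef __GNUC__
#define __restrict
#endif

/* In a .c file the keywords "volatile" and "restrict" then never appear
   literally, so keyword counters see nothing: */
void copy_regs(unsigned *__restrict dst, const HW_REG *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i];
}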

Release dates

The data points for the releases have somewhat random dates. One issue is that the paper uses each release's mean file date (the average of the files' last modification times) instead of the release date (that is why the graphs stop at November 2010, even though FreeBSD 10 was released in 2014). The idea is that this better reflects the age of the code base, but this has the effect of compressing some of the data points (especially the clustering around 1993-1994), and it makes the spline fitting even more suspect.

One other problem is that the original data used by the researchers seems to have incorrect timestamps. For example, 4.3BSD Net/1 was released in 1989, but is listed as 1993-12-25 in the paper. The same is true for at least the Net/2 release, which was released in 1991 but is listed as 1993-07-02.

Sunday, May 29, 2016

20 years of NetBSD code bloat

I started with NetBSD in the mid-nineties, on a Sun SPARC ELC with 32Mbyte of memory, where I used GCC and Emacs on X11 with FVWM as the window manager. I'm still using GCC, Emacs, and FVWM with the same configuration files (updated for pointless changes in Emacs and FVWM), but I now need much more memory and CPU performance... I thought it would be interesting to investigate why.

Much of this is likely due to GCC — the GCC 2.7.2 cc1 from NetBSD 1.2 has a text segment size of 1073152 bytes, while GCC 4.8.5 used for today's NetBSD has a size of 11012557, and I believe the dynamic memory usage and instruction count during compilation have increased much more than that. But the operating system has also increased in size (for example, NetBSD 1.2 libc.so is 331776 bytes, compared to 1390980 today), and I plan to start my investigation by looking at the OS.

Building NetBSD

One thing I like about NetBSD is that it avoids pointless changes, so the build process has been the same for a long time. In addition, it is easy to cross-compile it for different architectures — in fact, NetBSD is always cross-compiled, even when building e.g. an x86 distribution on an x86 host.

There is a script build.sh that sets up everything, so building a release for e.g. Sun SPARC can be done as
./build.sh -u -U -m sparc release
This first compiles the tools needed for the build, then builds the full release of NetBSD, and packages the result. This works on "any" operating system — you can build NetBSD on e.g. Linux and Mac OS X.

Cross-compiling individual components is also trivial. The tools are by default placed in a directory named after the host platform, in my case obj/tooldir.NetBSD-6.1.5-i386/bin. The most useful tool here is a wrapper for make that sets up the environment for cross-compiling. This can be used to compile any component, e.g.
cd lib/libc
../../obj/tooldir.NetBSD-6.1.5-i386/bin/nbmake-sparc
will build libc.so.

Some results

As a first step, I have built a NetBSD release from trunk as of 1 January each year 2002–2016, mostly just in order to see that it was as easy to build as I thought, and to try to get a feel for what might be interesting to investigate more closely. I stopped at 2002, as build.sh was not available earlier...

I built i386 (which is the NetBSD name for x86, even though it does not support i386 CPUs any longer) and sun3. The reason I chose them is that they represent two extremes of architectures — sun3 is a hardware platform that was obsolete already in the mid-nineties, so software size "should not" increase for it unless new functionality is added, while the i386 hardware is still used, and it has gained more features, bigger caches, etc., during this time, which may increase code size.

And the code size has increased during this time. For example, this is how libc.so, libm.so, and the kernel have grown:

These plots do not say much, and code increase is not necessarily "bad", even on constrained platforms, if unused functionality never gets paged in. My plan is to look into the details of the reasons for these increases (such as new functionality, support for more hardware, compiler changes, careless developers, etc.), to get a feel for how much each reason contributes. Please let me know if there are some specific questions you want me to investigate!

Thursday, May 19, 2016

Type-based aliasing in C

There was some concern that type-based alias analysis could cause problems when it was implemented in GCC, but the conclusion (as expressed on the GCC development list) was that all other compilers already did this optimization,1 so most code had already been fixed:
There's less and less of this code IMHO — modern compilers have been doing this kind of alias analysis for some time and as a result folks have been forced to fix their code. Of course this doesn't apply to Linux and some other free software projects since they only use gcc.
I guess that mail was a bit too optimistic — it was written in 1998, and I still see lots of broken code and complaints about -fstrict-aliasing...

What type-based alias analysis means

An informal description of the type-based alias analysis rules is that every memory location has a type, and you are only allowed to access the memory using the "correct" type (the compiler can therefore assume that two accesses with incompatible types do not alias). The C11 standard describes this in 6.5 "Expressions":
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.
As a concrete example, a variable of type int can only be accessed through an int* (including unsigned int*, const int*, etc.) or through a char* (including unsigned char*, const char*, etc.).

These restrictions allow the compiler to reorder operations when the types ensure that they access different objects. Consider the function
int i;

void foo(float *f)
{
    i = 23;
    *f = 0.0;
    i = i + 19;
}
*f cannot modify i as it has a different type, so the compiler is allowed to move the store to i over the store to *f, and the function is optimized to
void foo(float *f)
{
    *f = 0.0;
    i = 42;
}

Note that the type-based aliasing rules only talk about how to access objects, and not about pointer casting — it is allowed to cast pointers between incompatible types (as long as you follow the rules for pointer casts), such as
int i;
float *f = (float *)&i;
But accessing this as *f will now invoke undefined behavior, as the object pointed to is an int, and it is not allowed to access it by an expression having a float type.

type punning — union

There are cases where you must access data using an incorrect type, and there are a few ways to do this. The usual case is where you need to get at the bitwise representation of some value; as an example, we will adapt the example from the Wikipedia article on type punning (which negates a floating-point number by changing the most significant bit) to instead test the sign bit. The examples assume 32-bit int and float.

The naive version that just casts a pointer to the float to unsigned int* does not work
bool is_negative(float x)
{
    unsigned int *ui = (unsigned int *)&x;
    return (*ui & 0x80000000u) != 0; // Undef behavior reading float as int
}
as it breaks the type-based aliasing rules. The best way to solve this is in most cases to use a union2
bool is_negative(float x)
{
    union
    {
        unsigned int ui;
        float f;
    } u;
    u.f = x;  
    return (u.ui & 0x80000000u) != 0;
}
Both GCC and LLVM are smart enough to generate as efficient code as you would have expected from the invalid version above.

The union trick requires that all accesses are done through the union — the result is not defined when accessing through a pointer, even if the pointer has the "correct" type
bool is_negative(float x)
{
    union
    {
        unsigned int ui;
        float f;
    } u;
    u.f = x;
    unsigned int *ui = &u.ui;
    return (*ui & 0x80000000u) != 0;  // Undefined behavior
}

type punning — character pointer

Character pointers can be used to access any type, so the is_negative function can be implemented as
bool is_negative(float x)
{
    unsigned char *p = (unsigned char *)&x;
    return (p[3] & 0x80) != 0;
}
assuming a little-endian architecture.

Note that int8_t is not guaranteed to be a character type. That is, the following function may be invalid
bool is_negative(float x)
{
    uint8_t *p = (uint8_t *)&x;
    return (p[3] & 0x80) != 0;  // Possible undefined behavior
}
Treating int8_t as a character type is the reasonable thing to do, and I would assume all compilers do this. But there are developers who think this is a bug — see the discussion in GCC bug 66110...

type punning — memcpy

A third way to do the type punning is using memcpy
bool is_negative(float x)
{
    unsigned int ui;
    memcpy(&ui, &x, 4);
    return (ui & 0x80000000u) != 0;
}
Both GCC and LLVM are smart enough to optimize away the memcpy, and generate similar code as the version using a union.

This type punning only works if the destination is a variable — you cannot use malloc:ed memory for this. The reason is that memcpy copies the effective type from its source when writing to allocated memory
bool is_negative(float x)
{
    unsigned int *p = malloc(4);
    if (p == NULL)
        abort();
    memcpy(p, &x, 4);                // Effective type of *p is now float
    return (*p & 0x80000000u) != 0;  // Undef behavior reading float as int
}

allocated memory

Memory returned from malloc does not have a type, so each memory location gets an effective type when it is written. Subsequent reads must then be done according to the type-based aliasing rules as usual.

The type of the allocated memory can be updated by writing with a new type
void *p = malloc(4);
if (p == NULL)
    abort();
*(float *)p = 1.0;      // Effective type is float
do_something(p);
*(int *)p = 0;          // Effective type is now int
which allows the buffer to be used for different things over its lifetime.

This may have some surprising effects, such as the examples in GCC bugs 69776 and 70484. Consider the function
int f(int *pi, long *pl)
{
    *pi = 1;
    *pl = 0;
    return *(char *)pi;
}
The type-based aliasing rules say that int and long cannot alias, and this function can be optimized to
int f(int *pi, long *pl)
{
    *pi = 1;
    *pl = 0;
    return 1;
}
But it is possible for the pointers to alias if both point to the same malloc:ed memory, so the following will print a different value depending on whether the optimization is done or not
int main(void)
{
    void *p = malloc(sizeof(long));
    if (p == NULL)
        abort();
    printf("%d\n", f(p, p));
    return 0;
}
It is a bit unclear exactly what the C standard requires in these cases, and you can argue that the optimization is invalid in this case. Recent versions of GCC are conservative and do not optimize this, but older versions are more aggressive, so it is prudent to avoid playing tricks with type changes in allocated memory regardless of what the standard says...


1 Many developers seem to think that type-based aliasing was introduced by C99, but that is not true; C90 has essentially the same aliasing rules (although C99 contains some minor clarifications/improvements). I guess the reason for the belief is that GCC added -fstrict-aliasing at roughly the same time as it implemented C99.
2 Many discussions of type punning (such as the wikipedia article) says that type punning through union is a GCC extension. The background of this is that C90 said
[...] if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined.
I believe the committee intended it to work, but made it "implementation defined" as the concrete result depends on the implementation (byte order, trap representations, etc.). But "implementation defined" lets the compiler do whatever it wants, as long as the behavior is documented (and the original implementation in GCC did in fact have additional restrictions, although that was fixed before the release). GCC documents that this works for C90 too, so it is in some sense a GCC extension...