Joe Wolf has worked with compilers for high-performance computing for the past twenty years. He developed optimizing/parallelizing Ada and Fortran compilers at Lawrence Livermore National Laboratory and Cray Research, Inc. before joining Intel in 1996. Since then, he has focused on helping customers in all industries adopt and use the Intel® Compilers for all of Intel’s processors and platforms, from hand-helds to large clusters.
Describe your current position inside Intel and some of the projects you are directly involved with.
At Intel, I manage the Intel® Compilers Technical Support and Consulting team, responsible for all technical support and training for our compiler products. I’ve been with the Intel compiler team since 1996, working as a compiler developer, a support engineer, and a manager. We are currently working to ensure our Intel® Software Development Products for Mac OS X help developers in the marketplace get the best performance.
Can you describe some of the challenges in moving the compilers over to a new platform? Obviously the target CPU is the same, but what are some of the technical considerations when moving to a new OS?
Our goal is to be command-line, source, object, and debug compatible with the default development environment on the new platform. This means achieving full compatibility with GCC and Xcode* on Mac OS* X in each of these areas. Fortunately, we had a good base from which to start since Linux was already an operating system we supported. Still, there are object code differences, and initially the debug format was different as well. The runtime libraries, particularly our Fortran runtime libraries, had to comprehend new OS calls and such. One exciting opportunity was that all the new Apple machines had at least two cores, so our support for multi-core through our compilers and libraries applies to all of them.
Another issue we faced was optimizing for position independent code (PIC). On Linux, the best performance was with static linking. Since that wasn’t an option on Mac OS X, we put some effort and creativity into optimizing the PIC code for shared objects. Now, there is no overhead.
Finally, we had a goal to ensure that our first product on Mac OS X was not de-featured. We didn’t want it to be a subset of our Linux products. So we went through and made sure each feature was supported on Mac OS X, including OpenMP, feedback-directed optimizations, IPO and whole program analysis, vectorization and parallelization.
While we are all dependent on compilers, many of us don’t really take the time to appreciate the fact that they are in fact very complicated pieces of software, developed by a wide variety of engineers. Can you give us an idea of what the general workflow is in developing and managing such a product?
They are indeed complicated pieces of software, broken roughly into language processing (front-end), optimization (middle), and code generation (back-end) components. In addition, there are runtime libraries, IDE integration, the command-line driver, the Intel Debugger, installation scripts, etc. that make up the complete product. Our team is geographically dispersed around the world, so we have the complexity of managing a world-wide team as well. We follow a fairly rigid process: we start with customer and strategic requirements for the product version, break those into a set of engineering work items, and then schedule those items into the appropriate product releases.
The announcement of a switch was made at WWDC 2005. How, in your opinion, had the mood changed about the transition to Intel between WWDC 2005 and WWDC 2006?
At WWDC 2006, it seemed that there was nothing but excitement about the transition. Everyone I met was genuinely enthusiastic about the Intel® processor-based Macs, especially their performance versus previous generations. People just seemed to want to get their hands on whatever information and software could help them get their applications transitioned quickly.
Apple’s transition from PPC to Intel happened quite fast (and ahead of the announced schedule). How did the time constraints for Apple’s product delivery affect the schedule for the tools development cycle?
Needless to say, it had a dramatic impact on all of us in the Software group at Intel. It spanned the gamut from elation at the prospect, to exasperation at what it would do to our product plans. We had to manage our efforts to update our other products while scheduling to get the Mac OS products together quickly. Fortunately, the continued development work for our product lines was generally complementary to adding Mac OS X versions. Where it was not, we did make trade-offs to ensure our Apple support was strong and timely. Now, all work going forward includes benefits for Apple developers.
With a switch to a new OS, there exists a need for expertise with that OS to develop products for it. Does Intel now maintain a group of Mac OS X experts in house? How closely does the compiler group work with various groups at Apple?
We had to develop experts on Mac OS X fairly quickly. We do have a number of people in the tools development teams, and in our application performance engineering teams that are focused on Mac OS X and Xcode. They help us ensure we comprehend changes to the environment so that we can plan corresponding changes to our tools. We have established a good working relationship with Apple’s applications and tools teams to ensure there is a good transfer of knowledge of the microprocessor architecture and tuning, and for Mac OS X knowledge back to Intel. Our teams worked well together to teach each other what we needed to know, and answer questions we had either way.
Obviously the compiler group has to work closely with hardware engineers designing the CPU’s. At what stage in the development of a new CPU do the groups begin to interact to lay out a strategy for integrating support for new features into the compiler tools and libraries?
The tools and application performance teams get involved with the processor architects very early – during the conceptual design of the processors and new features, and through simulation/emulation prior to tapeout. We want to ensure that new processors support features that are important to a wide variety of software, and that the features are such that the tools can take advantage of them.
Commercial versions of the Intel C/C++ and Fortran compilers, as well as the Intel Performance Primitives and the Intel Threading Building Blocks, are already available for purchase. Approximately how long did it take to get the product shipping? What were some of the steps that had to be taken to get the project going initially?
It took us about a year to get the tools products ready. We had a small number of people who began investigation and prototyping work in complete secrecy from the rest of Intel beginning a few months before the WWDC 2005 announcement. After the announcement we were able to do full product planning without the veil of secrecy. It took about nine months, from July 2005 to April 2006, when the Compilers and Libraries were released as products. We were in great shape in January 2006 and able to get a lot of developers using our preview (beta) once the MacWorld announcements were public.
During the beta period for the Intel compilers, I was extremely impressed with the integration of the C/C++ and Fortran compilers. In fact for a number of applications I was porting at the time, it was often as simple as using the icc/ifort Makefiles for Linux. What has the response been to the compilers from the Mac OS X community?
We have been very pleased with the reaction to our tools. At WWDC 2006, in the Performance Lab, we were able to witness a number of developers trying the tools or showing us the results of using them. Many cited big performance gains from them, and are pleased with the way they are integrated and are compatible with GCC and Xcode.
By almost every performance metric, the new Intel Core 2 microarchitecture is a success. What are some of the new features in these CPUs that make them so powerful? What type of additional functionality has been added to the compilers to help programmers take advantage of newer features present on these CPUs?
The key features in the Intel® Core™ 2 microarchitecture for developers are its multiple cores, which enable thread-level parallelism, and its enhanced SIMD (Streaming SIMD Extensions) capabilities.
The compilers allow users to take advantage of multi-threading through the support of the OpenMP* standard and auto-parallelization. OpenMP (details can be found at www.openmp.org) is a directive-based application programming interface (API) that allows the user to specify which parts of the code to parallelize, such as loop nests, and how to share or protect data. The compiler converts the directives into calls to the underlying OpenMP runtime library for implementation and scheduling of the threads. Auto-parallelization is where the compiler attempts to parallelize loop nests automatically, with little direction from the user.
To take advantage of the SIMD capabilities, the compiler will automatically vectorize code by generating the appropriate SSE/SSE2/SSE3 or SSSE3 instructions, utilizing the SIMD registers for parallel execution. We also pay close attention to how we optimize loop nests for memory and cache utilization.
Scientists tend to have very computationally demanding applications. What has Intel’s approach been in the past toward scientific applications of their compiler technology? What features of the Intel developer tools do you think might be of particular use for scientific code?
Scientific applications are typically very computationally intensive, and thus, performance sensitive applications. Therefore, we pay close attention to their application performance as it showcases the capabilities of our processors.
We have spent many years perfecting the capability of the compiler to vectorize the key loops of many different types of scientific applications to take advantage of the SIMD features of the Intel processors. Many such applications can be very sensitive to floating-point precision. Higher levels of optimization can impact that as we may do some optimizations such as replacing a floating-point divide or square root with a reciprocal approximation, for example. To alleviate this, we have implemented a floating-point precision vs. performance tradeoff model to allow the user to specify how much precision can be traded for additional performance. In addition, we have highly tuned our math library (libm) that implements the common transcendental functions (sin, cos, tan, exp, etc.) and other math primitives to ensure it provides the best possible precision and performance for our processors, and thus, the user’s applications.
The Intel® Math Kernel Library implements the commonly used linear algebra domains (BLAS, LAPACK), sparse solvers, fast Fourier transforms, and random number generators. All of these are commonly used in the scientific community. The library is able to dynamically detect the processor type and threading capability (number of cores and processors) of the system on which it is running and invoke the best implementation of each function for that processor, with the right number of threads. This library can save a lot of time in performance tuning and threading.
The new Core 2 CPUs are 64-bit. A lot of scientific applications can benefit from the ability to address more than 4GB of memory (especially image processing applications). Are the Intel compilers currently capable of producing 64-bit binaries on the newer Mac Pro and iMac based systems? If not, when can we expect this functionality to be added?
The current shipping version of the Intel compilers is version 9.1. These compilers generate code only for 32-bit applications, so addressable memory is ~2GB. Our next compiler version will be in preview (beta testing) by early 2007 and will support 64-bit addressing. If you would like to join the beta program, please submit your email address to us at www.intel.com/software/products/apple and someone from my team will contact you when the beta program is available.
Developers may want to move to 64-bit binaries even if they are not addressing over 2GB of memory. This is because the number of processor registers available with 64-bit instructions is higher in the Core 2 micro-architecture. More registers will often result in higher performance for applications since the compiler can keep more data in them versus having them in memory.
One aspect of scientific code that many of us deal with is the massive code base that is in Fortran. Intel provides a high performance Fortran compiler. Who are some of the biggest adopters (on all platforms) for the various compiler technologies that Intel provides?
We are pleased to have arguably the best Fortran compiler for Mac OS X. We have worked with many major software vendors and end users in the scientific computing arena. Many of the national laboratories and government agencies use our products, as do many leading research universities. Also, we’ve seen great interest from leading researchers in the bio-sciences.
With the addition of Mac OS X to the Intel repertoire of supported platforms come additional technologies and programming languages that are unique to this particular environment. One of those languages is Objective-C. Does Intel plan to offer support for Objective-C in the future as an add-on to the C/C++ compilers?
We are looking at adding Objective-C support, but do not have plans to support it at this point. The Xcode environment is smart enough to use our compiler for C and C++ code and direct Objective-C code to the Apple compiler. This means our lack of support does not stop developers from getting some benefit in programs that mix these languages.
Looking ahead, what are some of the new features that we can look forward to being added to the Intel developer tools and how will they be useful for those of us on Mac OS X, and in particular for scientists?
We are focused on adding 64-bit support for Mac OS X this year. Additionally, even though we believe we are the best compiler anywhere for multi-threading, we are really just scratching the surface of multi-threading capabilities. We want to expand the utility of auto-parallelization to make it able to thread more loops. The Intel® Threading Building Blocks is an example of how we want to find new and more efficient ways for developers to express parallelism. It was also the first new product since we started supporting Mac OS X, and so it became the first product to debut on Mac OS X the same day as its debut on Windows and Linux.
We know that abstraction of thread management is important, we know making it easy to debug and tune is important – and eliminating the need to debug or tune is even better. I think we have some great ideas which will lead to more and more support for developers.
We want to continually improve the capability of debugging threaded and optimized code so that developers don’t have to build both debug and optimized versions.
To many people the switch to Intel came as a big surprise, and for the majority of us, we found out during the WWDC 2005 keynote, including many people within Apple. At what level within Intel was the transition known about prior to the announcement? Can you describe some of your initial thoughts when you first found out about the switch?
As I said above, there were just a few people in the company in senior management and key software and hardware technologists that knew about it early on. Personally, I was ecstatic. Apple is such an innovative and creative company; we knew we could learn some interesting things in working with them. It meant that I could lift my self-imposed embargo on buying a Mac for my family because it did not have an Intel processor in it.
Of course, now that Macs come with Intel CPUs, do you expect to start seeing more Macs around the offices at Intel?
I hope so. There is a lot of interest in that. Right now, our group is developing software for Apple machines so we definitely see a lot more Apple machines around our offices.