Embedded Windows CE: February 2008

Tuesday, February 26, 2008

BSQUARE Creates Embedded CE BSP for ARM Cortex-A8, TI OMAP35x EVM

BSQUARE Corp. (Nasdaq:BSQR) announced that it is producing a production-quality board support package (BSP) for TI's new extensible OMAP35x(tm) Evaluation Module (EVM). The OMAP35x application processors are based on the new ARM® Cortex(tm)-A8 processor, which offers an unprecedented combination of laptop-like performance and superior power management for a wide range of applications, including digital set-top boxes, mobile Internet appliances, portable navigation devices, media players, and personal medical equipment. Together, the EVM and BSQUARE's BSP—the first Windows Embedded CE BSP to support the ARM Cortex-A8 processor—will give device designers everything they need to create next-generation Windows Embedded CE-powered devices using TI's new OMAP35x application processors.

BSQUARE, a leading systems integrator and solution provider for the global mobile and embedded community, will also provide engineering services, quality assurance, and support for device designers to help them get their Windows Embedded CE-powered OMAP35x products to market in record time.

Until now, TI's OMAP(tm) 3 processors were available only to mobile handset customers. With the introduction of the OMAP35x platform, TI is making its OMAP processors available to the broad market, allowing a wide range of original equipment manufacturers (OEMs) to create the most advanced embedded devices.

"As a leading systems integration company for Windows Embedded CE and Windows Mobile, BSQUARE has successfully completed many OMAP-based designs, allowing customers to further foster the development of smarter, multimedia enhanced products," said Gerard Andrews, OMAP marketing manager, TI. "Due to their extensive experience with both OMAP processors and Windows Embedded CE, BSQUARE has the expertise to help our customers get their OMAP35x-based products to market quickly."

"We're pleased and excited to collaborate with TI in helping the general embedded market create next-generation Windows Embedded CE- and Windows Mobile-powered devices using the OMAP35x processor family," said Raj Khera, vice president of products, BSQUARE. "With our production-quality BSP and world-class engineering services to go along with it, TI customers will not only be able to reduce their development risk and improve the quality of their devices, but also get them to market in record time."

"The availability of board support packages is vital in enabling the wide adoption of the Cortex-A8 processor-based OMAP processors into the general embedded market," said Eric Schorn, vice president of marketing, Processor Division, ARM. "The BSQUARE Windows Embedded CE BSP is an important step which will enable developers to rapidly create leading-edge OMAP35-based-devices."

ARM Cortex-A8: Marrying Super-High Performance with Power Efficiency
TI's scalable OMAP35x application processors, which offer the best general-purpose, multimedia and graphics processing in any combination, are based on the ARM Cortex-A8 processor—ARM's first superscalar processor. Featuring enhanced code density and performance technology, plus NEON(tm) technology for multimedia and signal processing, the ARM Cortex-A8 processor offers more than four times the processing power of today's 300-MHz ARM9(tm) family devices. With the ability to scale in speed from 600 MHz to more than 1 GHz, the Cortex-A8 processor can meet the requirements both for power-optimized mobile devices operating with less than 300 MW and for performance-optimized consumer applications requiring 2,000 Drhystone MIPS.

Availability
BSQUARE will be delivering its production-quality BSP and other technology and services for the OMAP35x platform beginning in Q2 2008. The BSP will be available from TI to developers who purchase the EVM. To find out more about BSQUARE's engineering services, quality assurance, and support, contact sales@bsquare.com or call 1-888-820-4500.

About BSQUARE
BSQUARE is a software solution provider to the global embedded device community. Committed to delivering quality and lowering project risk and time to market, our teams collaborate with smart device makers at any stage in their device development.

About the Texas Instruments Developer Network
BSQUARE is a member of the TI Developer Network, a community of respected, well-established companies offering products and services based on TI analog and digital technology. The Network provides a broad range of end-equipment solutions, embedded software, engineering services and development tools that help customers accelerate innovation to make the world smarter, healthier, safer, greener and more fun.

BSQUARE is a registered trademark of BSQUARE Corporation. OMAP3 and OMAP35x are trademarks of Texas Instruments.

Thursday, February 14, 2008

Application Compatibility in Windows CE 6.0

One of the goals for Windows CE 6.0 design was full backward compatibility at the binary level for ISV applications. We have gone to great lengths to maintain binary level compatibility by:
a) Maintaining the same exports from the standard core libraries (for ex: in coredll.dll)
b) Maintaining the same API signatures for all the exported functions.
c) Maintaining the same API functionality unless a change is warranted because of the new memory architecture or stricter security guidelines.
d) Leaving the function exported from coredll even if the API is deprecated. This lets an application to succeed on load time; it might fail at runtime but that is a very small percentage if it happens to use any of the deprecated APIs which we will discuss later.

Given this, we expect minimal impact to developers when they port their applications to CE 6.0. In most cases the application should just work on CE 6.0 without any porting. The level of impact to an application depends on how *well behaved* the application is. A well-behaved application is typically written using only SDK functions and doesn’t use any of the undocumented features or any OAL functions or myriad of other things mentioned in this topic (like passing handle values, passing memory pointers, assumptions about internal workings of a component such as memory mapped files etc.). By the way just to be clear when I say SDK functions, I mean those functions which are available in an installed SDK (for ex: Windows Mobile PPC SDK) or in an exported SDK (for ex: SDK exported from an OS design in Platform Builder). In both cases if the application uses only those functions which are documented as part of the SDK, then we expect minimal or no changes to the application to be able to run on CE 6.0 OS.

Now let us look at some of the changes which we had to make in CE 6.0 to support the new memory layout and what impact that might have on ISV applications.

There are three main categories of API changes:
a) Deprecated APIs: This list of APIs is completely deprecated in CE 6.0. The APIs are still exported from coredll but they mostly turn into no-ops or return failure when called. Some of the examples of the deprecated APIs are:

Memory pointer based APIs like MapCallerPtr: Since each process now gets its own mapping of 2GB address space (not including 2GB which is kernel space), mapping a memory pointer from one process to another involves reading process memory or making a virtual copy of the process memory. In both cases there are standard SDK functions to achieve this instead of the slot based APIs like MapCallerPtr.
Process index based APIs like GetProcessIndexFromId: These APIs no longer scale as CE 6.0 supports upto 32K process (pre-6.0 supported only 32 processes).
Permission based APIs like SetProcPermissions: Since we don’t have any process slots anymore this doesn’t apply anymore. You would need to use OpenProcess to get a valid process handle and then use standard SDK functions to read/write process memory.

All the deprecated APIs will now result in a debug check if a particular debug zone in coredll is enabled (DBGDEPRE == 0x00000040).

b) Kernel mode only APIs: This list of APIs is callable only by code loaded above 0x80000000 (i.e. code loaded in kernel space). This should not have much impact for applications as most of these APIs are hardware related like interrupt APIs, physical page mapping APIs, cross process memory allocation / de-allocation APIs to name a few. Some of these will impact drivers which are covered in a separate blog entry.

c) Usage discouraged APIs: This list of APIs has an alternate API one can call to get the same result as the original API or better. Some of the APIs in this list include checking for a PSL server (WaitForAPIReady is now preferred over IsAPIReady since WaitForAPIReady is a non-polling API and supports full 128 API sets), file mapping (CreateFile/CreateFileMapping is now preferred over CreateFileForMapping) etc.

Platform Builder ships a desktop side tool which you can run on your applications (exes and dlls) to see what APIs a particular module is using and to list out any API hits it registers from the above three buckets. The tool details are as follows:
Name: ceappcompat.exe
Location: public\common\oak\bin\i386
Usage: Run the tool from an OS build window to scan a particular dll/exe or all files within a folder.
Results: At the end of the run, the tool will generate a HTML file which will have all the API hits from the given module(s) and any recommendations for API usage.

This tool works at the binary level scanning the import table of a given module for list of APIs the module is calling. As a result it cannot detect API usage if the API is called via a function pointer obtained from GetProcAddress.

Now let us look at some of the core changes which have gone in CE 6.0 which might influence how applications behave or interact with other modules on the system.

Handles
Pre-6.0 behavior: Handles are global which means handles are accessible from any application.

6.0 behavior: Handles are per process which means a handle value in one application means nothing (zippo, zero, shunya etc.) in another process space unless that handle is duplicated to the second process using DuplicateHandle API call.

Impact to your application: If you have an application which is receiving a handle value from another process, you would need to call DuplicateHandle() on that handle value to create a copy of that object in your process space before you can access that handle object. You have to use DuplicateHandle to get access to that handle object in your own process space.

How come no one told me this before? DuplicateHandle() API was available in pre-6.0 OS also for duplicating the following handles types: mutex, semaphore, event. In CE 6.0 this API has been extended to duplicate any handle (for ex: API handle, message queue handles, process/thread handles to name a few) as long as the source handle value points to a valid handle object in the source process.

Memory Pointers
Pre-6.0 behavior: Virtual Memory (VM) was based on a single memory model where all applications VM is carved out of single 4GB address space. Kernel had 2GB VM reserved and in the lower 2GB we had space for shared memory region and 32MB slots for all the applications. Given this model, it was easy to map a memory pointer from one process slot to another process slot as any virtual address maps to a unique physical page.

6.0 behavior: Virtual Memory (VM) is based on multiple memory model where all applications still are bound by 4GB address space of which 2GB is still reserved for kernel. But the lower 2GB is now mapped separately for each process. There is a small shared heap area and dll code pages which we will ignore for now (as they have the VM mapping for all processes). Other than that rest of the 2GB in user VM is mapped differently for each process. In other words virtual address in one process space is not identical to another process. As a result you cannot pass memory pointers from one process to another and expect it to work!

Impact to your application: Well if you are passing memory pointers from one process to another and are using MapCalletPtr related functions to translate the virtual address in different process slots, you guessed it right. It won’t work in CE 6.0 anymore. Proper fix for this is to directly read/write process memory using ReadProcessMemory or WriteProcessMemory SDK functions. These APIs give you access to VM of another process (as long as you can open a handle to another process using OpenProcess SDK API call).

How come no one told me this before? This is primarily due to the memory re-architecture where 32MB VM limitation for a process was blown out of the window. You could have used SDK functions to read/write process memory in pre-6.0 also but hey who wouldn’t like shortcuts!
One small note: Don’t use SDK APIs to read/write kernel process memory. For obvious reasons user applications cannot read/write to kernel memory space (upper 2GB starting at 0x80000000). Actually applications cannot write to VM starting from 0x70000000 where the shared heap region starts. Shared heap VM region is from 0x70000000 – 0x80000000 and is user read-only and kernel read-write.

API Handling
Pre-6.0 behavior: If an application passes invalid arguments to a PSL server, kernel would simply forward the call to the PSL server and the behavior of the thread at that point is at the discretion of the PSL server. Some of the possible scenarios could be that the PSL server throws an exception or it returns failure code to the caller.

6.0 behavior: If an application passes invalid arguments to an API call, the call will be rejected at the kernel layer and what happens next depends on whether the PSL server (the server component which handles the particular API your application is trying to call) registered an API error handler or not. If the PSL server did not register an API error handler (this is new for CE 6.0), an exception will be raised back to the caller. If the caller doesn’t have any exception handler installed in that code path, then the calling thread will be terminated. On the other hand, if the PSL server registered an error handler API with the kernel, then kernel would forward the call to the error handler and in this case the behavior of the API call might be similar to what would happen in pre-6.0 depending on what PSL error handler API does.

Impact to your application: Well if you start seeing strange crashes on API calls, please check all the parameters you are passing to the API calls. One thing to note is that most Win32 APIs (Kernel is the PSL server for these APIs) will gracefully return an error on invalid arguments instead of throwing exceptions. But for some of the Win32 APIs we have explicitly made the decision of faulting the caller rather than continuing on an invalid API call so that any incorrect behavior can be caught by the application at the time of the call rather than some malicious effect downstream.

Trust check APIs
This is not a backward compatibility issue but I thought I will mention it as all the trust check APIs (CeGetCallerTrust and CeGetCurrentTrust) always return full trust in Windows CE 6.0. The only way an OEM can *lock-down* a device is by enabling certification module in the image (using SYSGEN_CERTMOD); this is a topic by itself which we will explore in depth in future articles. So as far as applications are concerned, they can still call the trust APIs but for Windows CE 6.0 understand that these APIs will always return full trust.

Interrupt APIs
Pre-6.0 behavior: Any application can call the interrupt APIs; in Windows Mobile world this is limited to trusted applications.

6.0 behavior: The interrupt APIs are limited to either kernel mode components (dlls loaded in nk.exe process space) or user mode drivers. Calls to these APIs from any other user mode component will simply return FALSE.

Impact to your application: Impact should be minimal since most components which need access to these APIs are drivers and not ISV applications. If you have to call these functions outside of kernel mode or user mode drivers, then the only option is to write a driver (user mode driver is preferred) which can call these APIs on behalf of an application.

Mapping Physical Memory
Pre-6.0 behavior: Any application (mainly drivers) could map physical memory to virtual memory using VirtualCopy or MmMapIoSpace function calls. Again in Windows Mobile, this is limited to trusted components.

6.0 behavior: These calls are limited to either code running in kernel mode or code running in user mode drivers. This might be a limitation for some of the ISV applications. The design decision behind this was to offer stability in the OS and at the same time some flexibility for those who really need this infrastructure.

Impact to your application: It is possible to expose these APIs to user applications via a kernel mode driver or extending a public component which we ship and is included in all images: oalioctl.dll. This is explained next.

OAL Ioctl Codes
Pre-6.0 behavior: User mode code could pass any valid OAL ioctl code when calling KernelIoControl or KernelLibIoControl function calls.

6.0 behavior: To provide better security, the list of OAL ioctl codes callable from user mode code is pre-defined and is limited to a small subset of all ioctl codes supported by OAL. This list of ioctl codes callable by user mode code can be extended by updating a dll which gets shipped as public code with CE 6.0. The code for this lives in public\common\oak\oalioctl. Its main purpose is to intercept all OEMIoctl calls coming from user mode before they are routed to OAL code.

Impact to your application: Your application might return failure when calling certain OAL ioctl codes (even if your application is trusted; remember there is no concept of application trust in CE 6.0). OEM of the particular device would have to explicitly allow certain ioctl codes to be callable from user applications. For example if your application was calling IOCTL_HAL_REBOOT, in CE 6.0 that call would fail unless OEM has explicitly added this ioctl to the user application callable ioctl list. By default in CE 6.0, this ioctl is not callable from user applications. The default OAL ioctls callable from user mode are listed below. As you can see this is not much as we limited the list to the same list that would have been callable by un-trusted applications in Windows Mobile.
IOCTL_HAL_GET_CACHE_INFO
IOCTL_HAL_GET_DEVICE_INFO
IOCTL_HAL_GET_DEVICEID
IOCTL_HAL_GET_UUID
IOCTL_PROCESSOR_INFORMATION

Memory Mapped Files
Pre-6.0 behavior: VM associated with memory mapped files in pre-6.0 was always allocated outside of the application memory slot. As a result the memory mapped file objects were uniquely identifiable by a virtual address accessible and visible to all applications. Another distinct feature is that when user creates a mapping object, the access permissions on the view returned could be different from what is requested if there is an existing mapping object by the same name.

6.0 behavior: VM associated with memory mapped files is always allocated in the process VM range (lower 2GB address space). As a result memory mapped file objects in one process are not accessible in another process unless the same object is opened using memory mapped file SDK functions. Also in 6.0, access permissions on the view are purely governed by the caller irrespective of whether the mapping object exists before the call or not.

Impact to your application: If your application was passing memory mapped file handles to other processes, other processes won’t be able to access the memory mapped file object using those handles. You would need to pass offsets to the memory mapped files so that the other process can open the same memory mapped file object (identified by a name) and use the given offset to read/write to the same memory mapped file object. Regarding the view permissions, it is probably easier to explain with an example. If an application calls to open a memory-backed map file with R/O (read-only) access and suppose there is already an existing map file with the same name opened with R/W (read-write) access. In this case in pre-6.0 OS, the new call to open the memory mapped file will get an R/W access to the map file whereas in CE 6.0 the application will get an R/O access to the map file. So if your application was opening a memory mapped file as a read-only but writing to it, this might have worked in pre-6.0 but in CE 6.0, this will result in a fault in your application. So check your CreateFileMapping and MapViewOfFile API calls if you are running into issues with memory mapped files.

Paging and the Windows CE Paging Pool

I’d like to explain a little more about memory management in Windows CE. I already explained a bit about paging in Windows CE when I discussed virtual memory. In short, the OS will delay committing memory as long as possible by only allocating pages on first access (known as demand paging). And when memory is running low, the OS will page data out of memory if it corresponds to a file – a DLL or EXE, or a file-backed memory-mapped file – because the OS can always page the data from the file back into memory later. (Win32 allows you to create “memory-mapped files” which do or do not correspond to files on disk – I call these file-backed and RAM-backed memory-mapped files, respectively.) Windows CE does not use a page file, which means that non file-backed data such as heap and thread stacks is never paged out to disk. So for the discussion of paging in this blog post I’m really talking only about memory that is used by executables and by file-backed memory-mapped files.

It’s relatively easy to guess how the OS decides when to page data in to memory – it doesn’t page it in until it absolutely has to, when you actually access it. But how does the OS decide when to remove pageable data from memory? Ahh, that’s the question!

The Paging Pool and How It Works

Back in the old days of CE 3.0 or so (I’m not sure) – Windows CE did not have a paging pool. What that means is that the OS had no limit on the number of pages it could use for holding executables and memory-mapped files. If you ran a lot of programs or accessed large memory-mapped files, you’d see memory usage climb correspondingly. Usage would continue to go up until the system ran out of memory. Other allocations could fail; memory would appear to be nearly gone when really there was actually a lot of potential to free up space by paging data out again. Until finally when the system hit a low memory limit, the kernel would walk through all of the pageable data, paging everything (yes, everything) out again. Then suddenly there would be a lot of free memory, and you’d take page faults to page in any data you’re still actually using.

The algorithm is simple, but it has a few bad effects. First, a bad effect of the simple paging algorithm was, obviously, that the system could encounter preventable RAM shortages. Also, it was really tough for applications or tools to measure free memory – where “free” includes currently-unused pages plus “temporary” pages that could be decommitted when necessary. Conversely, it was difficult for users to determine how much of an application’s memory usage is fixed in RAM vs. “temporary” pageable pages. Even today it is tough to answer the question “how much memory is my process using?” in simple terms without diving into explanations of paging, cross-process shared memory, etc. Another possible problem you can encounter when there’s no paging pool is that the rest of the system can take up all of the free memory, and leave you thrashing over just a few pages.

So we introduced the paging pool. The purpose of the paging pool is to serve as a limit on the amount of memory that could be consumed by pageable data. It also includes the algorithm for choosing the order in which to remove pageable data from memory. Pool behavior is under the OEM’s control – Microsoft sets default parameters for the paging pool, but OEMs have the ability to change those settings. Applications do not have the ability to set the behavior for their own executables or memory-mapped files.

Up to and including CE 5.x, the paging pool behavior was fairly simple.

· The pool only managed read-only pageable data. Executable code is read-only so it used the pool, and so did read-only file-backed memory-mapped files. Read-write memory-mapped files did not use the pool, however. The reason is that paging out read-write data can involve writing back to a file. This is more complicated to implement and requires more care to avoid file system deadlocks and other undesirable situations. So read-write memory-mapped files had no memory usage limitations and could still consume all of the available system RAM.

· The pool had one parameter, the size. OEMs could turn the pool off by setting the size to 0. Turning off the paging pool meant that the OS did not limit pageable data – behavior would follow the pattern described above from before we had a paging pool. Turning on the pool meant that the OS would reserve a fixed amount of RAM for paging. Setting the pool size too low meant that pages could be paged out too early, while they’re still in use. Setting the pool size too high meant that the OS would reserve too much RAM for paging. Pool memory would NOT be available for applications to use if the pool was underutilized. A 4MB pool took 4MB of physical RAM, no matter whether there was only 2MB of pageable data in use or 100MB. Setting the size of the pool was a tricky job, because you had to decide whether to optimize a typical steady-state situation with several applications running (and judge how much pool those applications would need), or optimize “spike” situations such as system boot where many more pages were needed for a short period of time.

· The kernel kept a round-robin FIFO ring of pool pages: the oldest page in memory – the earliest one to be paged in – was the first one paged out when something else needed to be paged in, regardless of whether the oldest page was still in use or not.

So the short roll-up of how the paging pool worked up through CE 5.x is that the paging pool allowed OEMs to set aside a fixed amount of memory to hold read-only pageable data, and it was freed in simple round-robin fashion.

In CE 6.0, the virtual memory architecture changes involved major rewriting of the Windows CE memory system, including the paging pool. The CE 6.0 paging pool behavior is still fairly simplistic, but is a little bit more flexible.

· CE 6.0 has two paging pools – the “loader” pool for executable code, and the “file” pool which is used by all file-backed memory-mapped files as well as the new CE 6.0 file cache filter, or “cache manager.” This way, OEMs can put limitations on memory usage for read-write data in addition to read-only data. And they can set separate limitations for memory usage by code vs. data.

· The two pools have several parameters. Primary of these are target and maximum sizes. The idea is that the OS always guarantees the pool will have at least its target amount of memory to use. If memory is available, the kernel allows the pool to consume memory above its target. But when that happens, it also wakes up a low-priority thread which starts paging data out again, back down to slightly below the target. That way, during busy “spikes” of memory usage, such as during system boot, the system can consume more memory for pageable data. But in the steady-state, the system will hover near its target pool memory usage. The maximum size puts a hard limit on the memory consumption – or OEMs could set the maximum to be very large to avoid placing a limit on the pool. OEMs can also get the old pre-CE6 behavior by setting the pool target and maximum to the same size.

· Due to the details of the new CE6 memory implementation, the FIFO ring of pages by age was not possible. The CE6 kernel pages out memory by walking the lists of modules and files, paging out one module/file at a time. This is no better than the FIFO ring, but still leaves us potential for implementing better use-based algorithms in the future.

There are some more details in our documentation under “Paging Pool” and “Paging Pool: Windows CE 5.0 vs. Windows Embedded CE 6.0.”

Overall, enabling the paging pool means that there is always some RAM reserved for code paging and we will be less likely to reach low-memory conditions. In general it's better to turn on the paging pool because it gives you more predictable performance, rather than occasional long delays you’d hit when cleaning up memory when you run out. But it does need to be sized based on the applications in use, which leads to my next point...

Choosing a Pool Size

In Windows CE (embedded) 5.0, the pool is turned off by default. In Windows Mobile, the pool is turned on and set to a default size chosen by Microsoft. I believe it varies between versions, but is somewhere in the neighborhood of 4-6 MB. In CE6, the loader pool has a target size of 3MB and the file pool has a target size of 1MB. Only the OEM of a device can set the pool size; applications cannot change it.

So how do you decide on the right pool size for your platform? I’m afraid it’s still a bit of a black art. :-( There aren’t many tools to help. You can turn on CeLog during boot and see how many page faults it records. You can see the page faults in Remote Kernel Tracker, but in truth that kind of view isn’t much help here. The best tool I know is that readlog.exe will print you a page fault report if you turn on the “verbose” and “summary” options. If you get multiple faults on the same pages, your pool may be too small (you may also be unloading and re-loading the same module, ejecting its pages from memory, so look for module load events in the log too). If you don’t get many repeats, your pool may be bigger than you need. In CE6 you can use IOCTL_KLIB_GET_POOL_STATE to get additional information about how many pages are currently in your pool and how many times the kernel has had to free up pool pages to get down to the target size. There aren’t any tools like “mi” that query the pool state, so you’ll have to call the IOCTL yourself. On debug builds of the OS, there is also a debug zone in the kernel you can turn on to see a lot of detail about paging and when the pool trim thread is running. But CeLog is probably a better choice to collect all of that data.

As I already mentioned, as of CE6 you can set separate “target” and “max” values for the paging pools. I don’t really like the semantics of having a “max” – it isn’t dependent on the other usage or availability in the system. If some application takes most of the available memory in the system, you’d want the pool to let go of more pages. If you have a lot of free memory, and some application is reading a lot of file data, you’d want the pool to grow to use most of the available memory. We supported the “max” as an option to limit the pool size, but I’m starting to think the best idea is to set your max to infinity, to let the pool grow up to the size of available memory. We’ll still page out down to the target in the background. I’d have liked to add more sophisticated settings like “leave at least X amount of free memory” but that’s quite difficult to implement.

You’ll want to examine your pool behavior during important “user scenarios” like boot or running a predefined set of applications. If the user runs a lot of applications at once, or a really big application, or one that reads a lot of file data, they could go through pool pages pretty quickly. There isn’t really a lot you can do about that. We don’t even have a set of recommended scenarios for you to examine. I wish we had more information and more tools for this, but I’ve described about all we have.

The approach I think most OEMs take is that they leave the pool at the default size until they discover a perf problem with too much paging (by profiling or otherwise observing) in a scenario that's important to users. Then they bump it up until the problem goes away. Not very scientific but it works, and it's not like we have any answer that's more scientific anyway.

What goes into the paging pool

This is repeating some of the information above, but in more detail – how do you know exactly what pages will use the pool and what pages won’t? Keep in mind that paging is actually independent of the paging pool. Paging can happen with or without the paging pool. If you turn off the paging pool then you turn off the limit that we set on the amount of RAM that can be taken up for paging. But pages can still be paged. If you turn ON the paging pool then we enforce some limits, that’s all. So this isn’t really a question of what pages can use the pool, it’s a question of what pages are “pageable.”

Executables from the FILES section of ROM will use the paging pool for their code and R/O data. R/W data from executables can’t be paged out, so it will not be part of the pool. Compressed executables from the MODULES section of ROM will use the pool for their code and R/O data. If the image is running from NOR or from RAM, uncompressed executables from MODULES will run directly out of the image without using any pool memory. Executables from MODULES in images on NAND will be paged using the pool. (And by the way, I’m not terribly familiar with how we manage data on NAND/IMGFS so I might be missing some details here.)

Executables that would otherwise page but are marked as “non-pageable” will be paged fully into RAM as soon as they’re loaded, and not paged out again until they’re unloaded. These pages don’t use the pool. You can also create “partially pageable” executables by telling the linker to make individual sections of the executable non-pageable. Generally code and data can’t be pageable if it’s part of an interrupt service routine (ISR) or if it’s called during suspend/resume or other power management, because paging could cause crashes and deadlocks. And code/data shouldn’t be pageable if it’s accessed by an interrupt service thread (IST) because paging would negatively impact real-time performance.

Memory-mapped files which don’t have a file underneath them (a.k.a. RAM-backed mapfiles) will not use the pool. In CE5 and earlier, R/O file-backed mapfiles will use the pool while R/W mapfiles will not. In CE6, all file-backed memory-mapped files use the file pool. And the new file cache filter (cache manager) essentially memory-maps all open files, so the cached file data uses the file pool.

To look at that information from the opposite angle, if you are running all executables directly out of your image – all are uncompressed in the MODULES section of ROM, and the image is executing out of NOR or RAM, then the loader paging pool is probably a waste. You might still want to use the file pool to limit RAM use for file caching and memory-mapped files, but in that case you might want to turn off the loader pool.

Other Paging Pool Details

Someone once asked me whether the pool size affects demand paging. It doesn’t change demand paging behavior or timing. Demand paging is about delaying committing pages as long as possible, and it applies to pages regardless of the paging pool. Pages can be demand paged without being part of the pool; they won’t be paged in until absolutely necessary, and then they’ll stay in RAM without being paged out. Pool pages will be demand paged in, and may eventually be paged out again.

Another question was whether the paging pool uses up virtual address space. Actually, no, it doesn’t. The pool pages that are currently in use are assigned to virtual addresses that are already reserved. For example, when you load a DLL, you reserve virtual address space for the DLL; and when you touch a page in the DLL, a physical page from the pool is assigned to the already-reserved virtual address in your DLL. The pool pages that are NOT in use are not assigned virtual addresses. The kernel tracks them using their physical addresses only. The pool *does* use up physical RAM. In CE5 it uses the whole size of the paging pool; on CE6 it consumes physical memory equal to the “target” size of the pool. This guarantees that you have at least a minimum number of pages to page with, to avoid heavy thrashing over just a few pages when the rest of the memory in the system is taken.

Other Paging-Related Details

A related detail that occasionally confuses people is the “Paging” flag on file system drivers. This flag doesn’t control whether the driver code itself is pageable. Rather, it controls whether the file system allows files to be loaded into memory a page at a time or all at once. On typical file systems like FATFS the “Paging” flag is turned on, allowing executables and memory-mapped files to be accessed a page at a time. On other file system drivers, such as our release directory file system (RELFSD) and our network redirector, it’s turned off by default, causing executables and memory-mapped files to be read into memory all at once. I believe the reasoning is to improve performance and minimize problems when the network connection is lost.

This flag actually derives from the original Windows CE implementation of memory-mapped files. If the file system supported a couple of APIs, ReadFileWithSeek and WriteFileWithSeek, memory-mapped files on that file system would be pageable. If the file system did not support those APIs, the memory-mapped files would be non-pageable, in which case they’d be read entirely into RAM at load time and never paged out until the memory-mapped file is unloaded. The OS required pageability for special memory-mapped files like registry hives and CEDB database volumes, so file systems that did not support the required APIs could not hold these files. (If you ask me, there is no real need to require the seek + read/write to occur in one atomic API call, so the requirement on the “WithSeek” APIs was unnecessary, but perhaps there was a good reason back in the old days.)

As I already mentioned, the new CE6 file cache also uses the paging pool. The file cache is basically just memory-mapping files to hold the file data in RAM for a while. The file cache is enabled by default on top of FATFS volumes.

CE6 OAL: What you need to know

This material is drawn from a talk that Travis Hobrla gave at MEDC 2006 (thanks Travis!) and contributed to by the whole Windows CE BSP team.

The driver changes that I have already written about the biggest CE6 differences that OEMs would care about. Much less significant are the CE6 OAL changes. The OAL, or OEM Adaptation Layer, is somewhat analogous to a HAL (Hardware Adaptation Layer). The OAL plus a set of drivers comprise the BSP (Board Support Package), the software that makes the Windows CE OS run on the OEM’s hardware.

One big OAL detail that did not change: the production quality OAL initiative, or PQOAL. PQOAL is a directory organization, a highly componentized set of libraries you can pick and choose from to compose your OAL. The following illustration shows how all of the components come together to build a complete OAL which interfaces between the kernel and the hardware.

PQOAL was introduced in CE5, and is optional. You can create CE5 and CE6 BSPs without using the PQOAL organization. But it is easier to port a PQOAL BSP from CE5 to CE6. If you have a non-PQOAL BSP that you want to port to CE6, you may choose whether to port to PQOAL while you are porting to CE6. Our BSP team recommends adopting PQOAL; their expectation is that you will find the componentized organization easier to maintain and easier to port to new OS versions over time. If you already have a BSP that uses the CE5 PQOAL organization, you’ll find the directory structure and available libraries to be quite similar in CE6.

So what did change? In CE6 we split up three components that previously linked together to make the kernel executable, nk.exe. CE5 the kernel, OAL and the Kernel Independent Transport Layer (KITL) all linked into nk.exe. In CE6 these are broken into kernel.dll, oal.exe and kitl.dll.

The primary reason for this change was updateability. In the past, if Microsoft released a kernel update, the OEM would have to take the updated kernel and link it again with their OAL to produce a new nk.exe. Now the OEM only has to distribute Microsoft’s new kernel.dll.

Another benefit of this change is that it formalizes the interface between the kernel, OAL and KITL. These components exchange tables of function pointers and variable values to communicate with each other, and cannot invoke functions other than those in the tables. In CE5 and earlier OS versions, some OEMs found that since the OAL and kernel linked together into the same executable, they could call undocumented kernel APIs. The problem with this is that Microsoft did not support the APIs being called this way. Some of them had special cases or calling rules that OEMs would not know about. Security holes and stability problems were possible. Supportability was also a risk; kernel hotfixes between releases could potentially break OEM code.

An additional benefit of the kernel / OAL / KITL separation is that each module now has its own debug zones and can be debugged independently. It is also a step on the way toward a dynamically loadable KITL – that’s not yet possible in CE6, but hopefully will be possible in the future.

So let’s dig further into the details of the separation of these binaries.

In Windows CE 5.0, the BSP directories built three different versions of the kernel exe.

OAL + Kernel = kern.exe
OAL + Kernel + KITL = kernkitl.exe
OAL + Kernel + KITL + Profiler = kernkitlprof.exe

Maintaining these three different directories could be bothersome, and some OEMs chose only to maintain kern.exe. This choice meant they could not use KITL for debugging or use the kernel profiler to measure performance bottlenecks. I cannot emphasize enough how strongly we at Microsoft believe it is worth your time to set up KITL. It is an up-front investment that will save you much debugging time as you work.

In Windows CE 6, we don’t build multiple versions of the OAL. The separation of KITL into kitl.dll gets rid of the distinction between kern.exe and kernkitl.exe. And we now recommend that OEMs always build profiling support into their OAL, getting rid of the need for separate kernkitl.exe and kernkitlprof.exe. As a result, the platform directory only needs to build one executable: oal.exe.

A side note about profiling from an observer who is definitely not impartial. :-) Profiling support is not actually required, unless you want to use the kernel profiler. But the required code to implement profiling is small, and won’t impact OS performance as long as the profiler is not actually running. I believe the benefits of the profiler justify the small amount of work needed to set it up. You could look at our sample BSPs to see how they implement profiling.

We tried to simplify the CE6 OAL modularization as much as possible, to ease porting of CE5 OALs to CE6. Each module builds a table of function pointers for use by the other modules. The kernel exports NKGLOBAL (See the CE6 %_WINCEROOT%\public\common\oak\inc\nkglobal.h). This table is too large to reproduce meaningfully here, but there are function pointers for things like debug output, interrupt hooking, synchronization objects, virtual memory operations, registry access, string operations, and other kernel interfaces. There are also shared global variables passed via the NKGLOBAL table.

But OEMs don’t need to completely revise their OAL code to call kernel functions using these pointers. We hid the existence of the kernel and KITL function pointer tables inside a library of wrapper functions that the OAL can use (nkstub.lib), so that exactly the same set of functions are available to the OAL as in the past. For example, to use a critical section, the OAL doesn’t need to call the function table entry pNKGlobal->pfnEnterCS(). Instead, nkstub.lib has a wrapper function EnterCriticalSection() for the OAL to call as it did in the past:

%_WINCEROOT%\private\winceos\coreos\nk\nkstub\nkstub.c:

void WINAPI EnterCriticalSection (LPCRITICAL_SECTION lpcs)

{

g_pNKGlobal->pfnEnterCS (lpcs);

}

This library of wrapper functions should make it easier to port OAL code to CE6. All you have to do is link nkstub.lib into the OAL in order to call the kernel APIs.

Similarly, the OAL exports an OEMGLOBAL table (%_WINCEROOT%\public\common\oak\inc\oemglobal.h) with OAL functions and global variables. Many of the functions in this table are required; the OS won’t work unless the OEM assigns the function pointers. Others are optional; the OAL can pass NULL pointers and the OS would continue to work. Required exports will be assigned to OEMGLOBAL by oemmain.lib (explained below), while you will need to assign optional exports inside your OEMInit routine. For example, the OAL exports used to implement kernel profiling are optional. So the Aspen7750R sample BSP assigns the profiling exports as follows.

%_WINCEROOT%\platform\Aspen7750R\src\oal\oallib\init.c:

void OEMInit()

{

// other code removed

g_pOemGlobal->pfnProfileTimerEnable = OEMProfileTimerEnable;

g_pOemGlobal->pfnProfileTimerDisable = OEMProfileTimerDisable;

}

To find out which OAL exports are optional and which are required, see the CE6 Platform Builder help for OEMGLOBAL.

As with NKGLOBAL, there is a wrapper library oemstub.lib that wraps the OAL exports. Kitl.dll can use oemstub.lib to call OAL functions.

%_WINCEROOT%\private\winceos\coreos\nk\oemstub\oemstub.c:

BOOL OEMGetRealTime(LPSYSTEMTIME pst)

{

return g_pOEMGlobal->pfnGetRealTime (pst);

}

Besides nkstub.lib and oemstub.lib, there are a few additional libraries in CE6 for OEM use:

Kitlcore.lib: This is a replacement for kitl.lib, implementing the KITL protocol and the initialization of the kitl.dll interface with the NKGLOBAL and OEMGLOBAL structures. You should link this library into kitl.dll instead of kitl.lib.
Nkldr.lib: This library implements KernelInitialize / KernelStart, to link into oal.exe.
Oemmain.lib: This library implements the OEMInitGlobals function which exchanges function pointers with the kernel. Link this into oal.exe.

The resulting work to port a CE5 BSP to CE6 would be to revise the platform directory structure slightly, and link with different libraries. The old CE5 directory structure:

CE5 Directory	Builds
%_WINCEROOT%\platform\BSP_NAME\src\kernel\oal	oal.lib
%_WINCEROOT%\platform\BSP_NAME\src\kernel\kern	kern.exe
%_WINCEROOT%\platform\BSP_NAME\src\kernel\kernkitl	kernkitl.exe
%_WINCEROOT%\platform\BSP_NAME\src\kernel\kernkitlprof	kernkitlprof.exe

Now becomes:

CE6 Directory	Builds
%_WINCEROOT%\platform\BSP_NAME\src\oal\oallib	oal.lib
%_WINCEROOT%\platform\BSP_NAME\src\oal\oalexe	oal.exe
%_WINCEROOT%\platform\BSP_NAME\src\kitl	kitl.dll

Oal.lib is exactly the same as before, so the remaining porting steps between CE5 and CE6 boil down to building oal.exe and kitl.dll.

The oalexe directory is the same as the old \kernel\kern\*, with nkldr.lib, oemmain.lib and nkstub.lib added to the sources file.
The kitl directory comes from taking the KITL specific elements out of the old \kernel\kernkitl. Convert it from an EXE to a DLL, remove the non KITL related binaries, and add nkstub.lib and oemstub.lib. You will also need to add some global variables that used to be common with the OAL.

There are other small changes you’ll run into, related to differences in the sets of PQOAL libraries available between CE5 and CE6, but they’re relatively minor and not related to the OAL / kernel / KITL separation.

As a side note, the separation of OAL and KITL is actually optional, so your other choice could be:

CE6 Directory	Builds
%_WINCEROOT%\platform\BSP_NAME\src\kernel\kern	oal.exe
%_WINCEROOT%\platform\BSP_NAME\src\kernel\kernkitl	oalkitl.exe

As I mentioned previously: Anecdotal experience from our beta partners said that BSP porting to CE6 took them mostly between one day and one month. Travis Hobrla, a member of our BSP team, developed an awesome demo for MEDC 2006 (Mobile & Embedded DevCon) where he ported the CE5 CEPC OAL to CE6 in about 15 minutes. If you are an OEM, your experiences may vary, but we don’t anticipate it being too painful. It was our goal to make it an easy port.

Note! Travis’ demo is also being developed into an online “eHowTo” you could follow as a porting exercise. I don’t see it yet on http://msdn.microsoft.com/embedded/getstart/basics/tutorialsce/default.aspx but that’s where I’d look for it.

The main area where you may have to make big changes is if your CE5 OAL called kernel APIs that we did not intend to expose to the OAL. In CE6 you would have to move that functionality out of the OAL, into a kernel-mode driver. You might need to design the OAL to expose IOCTLs to work together with the kernel-mode driver to implement the old functionality.

One final detail you should be aware of is that for security reasons, the OS only exposes a few OAL IOCTLs to user-mode code. By default, applications and user-mode drivers can’t call OAL IOCTLs that Microsoft didn’t specifically expose to user mode. For more detail, see the “OAL IOCTL Codes” .

CE6 Drivers: What you need to know

One of the biggest concerns people have about the new CE6 release is backward compatibility. Every release we try very hard to make existing applications, drivers and OALs as compatible as possible. With CE6 we expect very high compatibility for applications and even OAL code, but unfortunately I can’t say the same about drivers. Many, in fact most, drivers will need modifications in order to run on CE6. While binary compatibility (being able to run the exact same driver without a rebuild) is not likely, we do expect it to be easy to port almost all drivers. That was our goal once we realized many drivers would have to change.

The primary reasons that drivers will need change are:

Deprecated APIs
Memory passing
Asynchronous buffer access
User interface handling

The biggest difference in CE6 is how drivers access embedded pointers and other data, as I described in detail in my memory marshalling post. There are two main things you need to do to fix memory accesses. First, look through your existing code for calls to mapping APIs like MapCallerPtr or MapPtrToProcess, and convert them to calls to marshalling APIs like CeOpenCallerBuffer / CeCloseCallerBuffer. Second, look for calls to SetKMode and SetProcPermissions. They most likely correspond to asynchronous memory access, for which you’ll now need CeAllocAsynchronousBuffer / CeFreeAsynchronousBuffer.

That will take care of most of the porting work. The other thing to look for is UI functionality. If your driver has any UI, you won’t be able to run it in the kernel. And most CE6 drivers will run in the kernel. Even if your driver will run in user mode, we recommend using the kernel UI handling to maximize portability between user and kernel mode. In CE6, drivers that require UI should break that UI functionality out into a companion user-mode DLL. Move all the resources, shell calls, etc. into the new DLL. Then use the new CeCallUserProc API to call into the user-mode helper.

BOOL CeCallUserProc(
LPCWSTR pszDllName,
LPCWSTR pszFuncName,
LPVOID lpInBuffer, DWORD nInBufferSize,
LPVOID lpOutBuffer, DWORD nOutBufferSize,
LPDWORD lpBytesReturned);

This is something like a combination of LoadLibrary / GetProcAddress with an IOCTL call. When a kernel-mode driver calls this API, we’ll load the DLL inside an instance of udevice.exe. When a user-mode driver calls this API, the DLL will load in-proc inside the same instance of udevice.exe that the user-mode driver is running in. So drivers that use this API can run in kernel or user mode without change.

The one big difference between CeCallUserProc and an IOCTL is that CeCallUserProc does NOT allow embedded pointers. All arguments must be stored inside the single “in” buffer passed to CeCallUserProc, and return data must be stored in the single “out” buffer. The problem is, if kernel code calls user code, user code cannot use CeOpenCallerBuffer or any other method to get the contents of kernel memory. We never allow user-mode code to access kernel-mode memory.

And, while you are modifying your drivers to use the new marshalling helpers and CeCallUserProc, you might as well check to see if it needs to do any secure-copy or exception handling it never did before… As I outlined in the marshalling post. Remember, now that drivers run in the kernel, you must be more careful than ever to preserve the security and stability of the system.

User-Mode Drivers

As we’ve already mentioned, CE6 now supports running drivers inside a user-mode driver host, udevice.exe. User-mode drivers work pretty much the same as kernel-mode drivers: an application calls ActivateDevice(Ex) and DeactivateDevice on the driver. The device manager will check registry settings to see if the driver is supposed to be loaded in user mode. You can also use registry settings to specify an instance “ID” of udevice.exe to use, if you want multiple user-mode drivers to load into the same process.

For example, there is one user-mode driver group with ID 3. Multiple drivers load into this group. If you look inside the CE6 %_WINCEROOT%\public\common\oak\files\common.reg (an unprocessed version of what you get in your release directory), you’ll see how this group is created and a few drivers that belong to it.

[HKEY_LOCAL_MACHINE\Drivers\ProcGroup_0003]

    "ProcName"="udevice.exe"

    "ProcVolPrefix"="$udevice"

; Flags==0x10 is DEVFLAGS_LOAD_AS_USERPROC

[HKEY_LOCAL_MACHINE\Drivers\BuiltIn\Ethman]

   "Flags"=dword:12

   "UserProcGroup"=dword:3

[HKEY_LOCAL_MACHINE\Drivers\Console]

    "Flags"=dword:10

    "UserProcGroup"=dword:3

[HKEY_LOCAL_MACHINE\Drivers\BuiltIn\SIP]

    "Flags"=dword:10

    "UserProcGroup"=dword:3

If you don’t specify a process group, your driver will be launched inside a unique instance of udevice.exe.

The device manager creates a reflector service object to help the user-mode driver do its job. The reflector service launches udevice.exe, mounts the specified volume and registers the file system volume APIs for communicating with the driver. Communication between applications and the user mode driver pass through the reflector, which helps with buffer marshalling. The reflector also assists the user-mode driver with operations that user-mode code is not normally allowed to make, like mapping physical memory; more on this later.

It is our goal that drivers should be as close to 100% portable between kernel and user mode as possible. However, kernel code will always be more privileged than user code will be. Taking advantage of the increased kernel capabilities will make your kernel-mode driver impossible to port to user mode.

What are some of the incompatibilities you need to know about?

As I explained in the marshalling post, user-mode drivers cannot write back pointer parameters asynchronously. I take it a step further and say that user-mode drivers cannot operate on caller memory asynchronously. That you’re better off keeping such drivers in kernel mode for now, or restructuring their communication with the caller so that nothing is asynchronous.

Another detail you should know about is that user-mode drivers cannot receive embedded pointers from the kernel. This is exactly the same as saying that CeCallUserProc cannot support embedded pointers. If you’re writing a driver that talks to kernel-mode drivers, and those kernel-mode drivers pass you embedded pointers, then your driver may have no choice but to run in kernel mode. If you can reorganize the communication between drivers, you may be able to “flatten” the structure so that, like CeCallUserProc, all the data is stored directly in the IN and OUT buffers instead of referenced via embedded pointers.

There are some APIs which used to require trust that now are (mostly) blocked against use in user mode. One notable example is VirtualCopy, and its wrapper function MmMapIoSpace. Most user-mode code cannot call VirtualCopy. User-mode drivers can, with a little help from the reflector. The reflector can call VirtualCopy on behalf of a user-mode driver, but it will not do so unless it knows the driver is allowed to use the addresses it’s copying. Under each driver setup entry in the registry, there are IOBase and IOLen keys that we use to mark physical memory. When your driver calls VirtualCopy, the reflector will check these values to make sure your driver is allowed to access the physical address. For example, the serial driver might specify a physical address like this:

[HKEY_LOCAL_MACHINE\Drivers\BuiltIn\Serial]

"IoBase"=dword:02F8

"IoLen"=dword:8

If you have just one buffer to copy, use DWORD values. Use multi-strings to specify multiple base addresses and sizes.

[HKEY_LOCAL_MACHINE\Drivers\BuiltIn\Serial]

"IoBase"=multi_sz:"2f8","3f6"

"IoLen"=multi_sz:"8","2"

Since only privileged applications can write to this part of the registry, the registry keys should protect against unprivileged code trying to gain access to these addresses.

Notable APIs that user-mode code cannot call:

VM APIs: VirtualCopy[Ex], LockPages[Ex], CreateStaticMapping
Interrupt APIs: InterruptInitialize, InterruptDone, LoadIntChainHandler
You cannot install IISR directly, though you can install GIISR via the reflector. (GIISR exposes well known interfaces and the reflector can do the required checks on these calls.)
OAL IOCTLs that are not explicitly permitted by the kernel

Call-backs from a user-mode driver to any process are also prohibited. The most important repercussion of this is, if you move a bus driver to user mode, you’d have to move the client drivers to user mode too. You can’t have the client driver in the kernel since you cannot call back to the bus driver. You may want to put the bus driver and all of its client drivers in the same udevice.exe instance, so that the callbacks are all within a single process.

OEMs can choose to expose additional OAL IOCTLs and APIs to user mode by building a kernel-mode driver that provides these services – by essentially writing their own version of a reflector. There is a kernel-mode driver, the oalioctl driver, that OEMs can extend to this end. Anyone who’s not an OEM would have to write their own kernel-mode driver to do it. But be warned! Using oalioctl or writing new kernel-mode drivers to expose this functionality is essentially opening up a security gap that we (Microsoft) sought to close. Personally I advise against it.

Writing CE5 drivers to be compatible with CE6

I would like to mention that Steve Maillet, one of our eMVPs, had a good suggestion: you can set up abstractions which combine the CE5 and CE6 driver needs, so that all you have to do is reimplement the abstraction layer in order to port from CE5 to CE6. He even presented his abstraction layer at this year’s MEDC (Mobile & Embedded DevCon, 2006). I don’t know if he’s interested in giving it out widely, but you could contact him at EmbeddedFusion (http://www.embeddedfusion.com/), or steal his idea and implement your own layer.

Juggs Ravalia did a Channel 9 interview on the topic of drivers in CE6 – if you don’t like my explanation, maybe you’ll like his better. He knows much more about our user mode driver framework than I do. http://channel9.msdn.com/Showpost.aspx?postid=233119

Tuesday, February 26, 2008

BSQUARE Creates Embedded CE BSP for ARM Cortex-A8, TI OMAP35x EVM

Thursday, February 14, 2008

Application Compatibility in Windows CE 6.0

Paging and the Windows CE Paging Pool

CE6 OAL: What you need to know

CE6 Drivers: What you need to know

Search Here

Blog Archive

Important Sites

Followers

Labels

Live Traffic Feed

Traffic Seed of Nov08