0. Contents

Summary
Licensing
Versions
Package folders
Building the portable edition
Building the x86 editions
NASM and SSE data alignment
Getting MASM 6.14, 6.15, 7.0, 7.10, 8.0, 9.0, 10–12
Application Programming Interface (API)
Download Μεταχρον!

Μεταχρον
Audio Time-Scaling Library

1. Summary

Μεταχρον is a small and fast audio DSP library for time-scale manipulation of 16-bit integer or 32-bit floating point stereo audio data streams. It employs a rigid phase-locked vocoder with dedicated transient detection and processing, and can work in real-time or non-real-time. Four editions are included – a portable edition and three x86 editions. The portable edition can be built with any ANSI C compiler and is OS- and architecture-independent. The three x86 editions are written in Assembly using the FPU, 3DNow!, and SSE instruction sets, respectively, with automatic selection between them depending on the CPU capabilities. They can be compiled with MASM, JWASM or NASM, producing libraries of object files in 8 formats.

2. Licensing

The Μεταχρον library and this page are licensed under the OSI-approved Simple Public Licence v2 – a plain language implementation of the GNU General Public Licence v2 written by Professor of Law Robert W. Gomułkiewicz. It was chosen because it’s much shorter and easier to understand for non-lawyers. The demonstration player is licensed under the No Problem Bugroff licence... ☺

3. Versions

Big thanks to Jean Laroche and Mark B. Dolson for their 1999 IEEE article describing the rigid phase-locked vocoder, Svillenn St. Stoyanoff for writing its C++ prototype implementation in 2005, and Stanisław B. Cyriloff for translating the latter into x86 assembly language and writing the demo player for version 1.0! Development past 1.0 was taken over by Lutshayzar Il. Gueorguieff.

Version 1.0 (2005), the first release, supported only 16-bit integer stereo audio streams as input and output.
In version 1.00f (2006), variants for 32-bit floating point data streams and NASM support were added.
In version 1.01 (2007), the input and output buffer sizes were made variable, and 16-byte stack alignment is now maintained at CALL instructions (required by some operating systems like Mac OS X).
In version 1.02 (2012, June), a portable edition written in C was added, supporting all 32/64-bit platforms with an FPU and an ANSI C compiler.
In version 1.02j (2012, July), JWASM support was added.

4. Package folders

“METAXPON”: the portable edition, a command-line .WAV-file stretch utility, and the Μεταχρον C[++] header.
“METAXPON/x86”: the x86 (FPU / 3DNow! / SSE) editions, headers, and build scripts.

5. Building the portable edition

Any ANSI C compiler can be used. Depending on the desired variant, the constants PRO (sharper transients but more spectral noise and slower) and / or FLOAT_DATA may have to be defined on the compiler’s command line. This edition was successfully tested on the following platforms:

Native C compilers (26): ACK C, Borland, CC386, Clang/LLVM, CodeWarrior, DEC/Compaq/HP C, Digital Mars, EKOPath, GCC, High C, Intel C, LCC-Win32, MIPSpro, NDP C, Open Watcom, Open64, Pelles C, Portable C, Portland Group C, Salford C, Sun C, Tiny C, USL C, VectorC, Visual Age, Visual C.
Operating systems (26): AIX, AROS, BSD/OS, DOS, DragonFly BSD, FreeBSD, Haiku, HP-UX, Hurd, IRIX, Linux, Mac OS X, MINIX, MirOS BSD, NetBSD, OpenBSD, OpenIndiana, OpenVMS, OS/2, QNX, Solaris, Syllable Desktop, Tru64 UNIX, ULTRIX, UnixWare, Windows.
Architectures (13; real hardware, no emulators used!): Alpha, ARM, IA-64, MIPS-64, PA-RISC, PA-RISC/64, PowerPC, PowerPC-64, SPARC-64, VAX, x86, x86-64, z/Architecture.

The built-in Fast Fourier Transform (FFT) routine isn’t so fast, taking as much as 60% of the total CPU time. You can dramatically improve speed by replacing it with a faster one. There are several options, listed below:

For x86, x86-64, IA-64, and XScale (ARMv5TE) CPUs, Intel’s Integrated Performance Primitives (IPP, development led by Vladimir Vl. Dudnik) have the fastest FFT (the FFT of Intel’s MKL isn’t faster). When building Μεταχρον, define IPP on the compiler’s command line, and link against a DSP library (“ipps*.lib” on Windows, “−lipps” or “−lipps_l −lippcore_l” on Mac OS X or Linux).
For SIMD-capable CPUs, FFTW by Matteo Frigo and Steven G. Johnson is the fastest free FFT library. Before building it, add “−−enable-float”, and enable the SSE / SSE2 / AVX / Altivec / NEON SIMD optimization(s) supported by the target CPU on the “configure” command line. When building Μεταχρον, define FFTW on the compiler’s command line, and link against the FFTW library. Note that there is a noticeable delay in the MXinit function (see the API description below) when FFTW is used.
For x86 and x86-64 CPUs on Linux and Windows, the AMD Core Math Library (ACML) is almost as fast as FFTW. When building Μεταχρον, define ACML on the compiler’s command line, and link against an ACML library (“libacml_dll.lib” on Windows, “−lacml −lifcoremt_pic −lpthread −lrt” on Linux).
For Mac OS 9.1 and Mac OS X, Apple’s vDSP (a part of the Accelerate Framework since Mac OS X 10.3) is almost as fast as FFTW. When building Μεταχρον, define VDSP on the compiler’s command line, adding “−framework Accelerate”.
For x86 CPUs with SSE and/or AVX as well as ARM CPUs with NEON, SFFT by Anthony M. Blake is almost as fast as FFTW. Before building it, edit “src/target_*.conf”: next to each line with an “f”, add a new line with a “b” instead to enable backward transforms. Then add “−−enable-single” and enable the SIMD version supported by the target CPU on the “configure” command line. When building Μεταχρον, define SFFT on the compiler’s command line, and link against the SFFT library.
For Solaris, Sun mediaLib (development done in Boris Art. Babayan’s MCST) is almost as fast as FFTW. When building Μεταχρον, define MLIB on the compiler’s command line, and link against a mediaLib of your choice (“−lmlib” or “−lmlib_mt −lmlib”).
For Sun Studio, the Sun Performance Library (SPL, development led by Paul J. Hinker) is almost as fast as FFTW (but slower than mediaLib on Solaris / SPARC albeit faster on x86). When building Μεταχρον, define SPL on the Sun C compiler’s command line, adding “−dalign” and “−xlic_lib=sunperf”.
For most non-SIMD-enabled architectures (e.g., as tested, Alpha, ARM11 (without NEON), MIPS, PA-RISC, PA-RISC/64, PowerPC 600, PowerPC (POWER4+), x86 (P5), z/Architecture, but not IA-64, SPARC, or VAX!), the djbfft library by Daniel J. Bernstein (1999) is still faster than FFTW. You may change the “−O1” switch to “−O3” in the “conf-cc” file before building djbfft. When building Μεταχρον, define DJBFFT on the compiler’s command line, and link against the djbfft library.

With options 1–5 above and Intel’s C compiler, which is freely available for non-commercial development (includes IPP; used to build FFTW or SFFT too), the portable Μεταχρον is significantly faster than its own x86 assembly language counterpart on SSE2-enabled x86 CPUs! The .WAV-file stretch utility and demo player compiled this way (with IPP) are available for download at the bottom of this page.

6. Building the x86 editions

The following assemblers can be used:

MASM (OMF and COFF formats only): Run “makelibs.bat” in Windows or DOS with HXRT. WLIB from Open Watcom is required to make the libraries. A MASM version with SSE support (i.e., 6.14 or later) is required. All such versions (6.14, 6.15, 7.0, 7.10, 8.0, 9.0, 10, and 11) were successfully tested.
JWASM (OMF, COFF, and ELF formats only): With a copy of JWASM named “ML.EXE” in your PATH, run “makelibs.bat” in Windows or DOS, or “makelibs.cmd” in OS/2 (as with MASM, WLIB is required). In UNIX-like environments, run “makelibs.jwa”. JWASM version 2.07 was successfully tested.
NASM (all formats but OMF): Run “makelibs.sh” in UNIX-like environments or “makelibs.cmd” in OS/2 with Open Watcom. NASM version 0.98.40 build 11 or later is required. Versions 0.98.40 build 11, 2.00, 2.07, and 2.10 were successfully tested.

After you’ve built the x86 libraries, you’ll find the following directories:

normal: “Fast” integer variant
pro: “Pro” integer variant
float: “Fast” floating point variant
floatpro: “Pro” floating point variant
both: Combined “Fast” and “Pro” integer variant
floatbot: Combined “Fast” and “Pro” floating point variant

In each directory, you’ll find from 2 to 7 (depending on the assembler used) libraries with different formats according to their file name, as follows:

Format	Operating system or compiler
a.out	Linux (older)/EMX (OS/2)
a.out-b	NetBSD/FreeBSD/OpenBSD (older)
COFF	DJGPP/UNIX System V (older)
ELF	Linux/UNIX System V/Solaris/BSD/NetBSD/FreeBSD/OpenBSD/QNX
Mach-O	NeXTstep/OpenStep/Rhapsody/Darwin/Mac OS X
OMF	MS-DOS/PC-DOS/DR-DOS/FreeDOS (32-bit)/OS/2
PE COFF	Microsoft Windows (32-bit)
RDF	Relocatable Dynamic Object File Format v2

Most compilers, linkers, and operating systems use one of the above formats. Library functions can be called from procedural programming languages via the “C” (“cdecl”) calling convention. You can translate the supplied C header file into the language used in your projects. Please refer to the corresponding language specification.

7. NASM and SSE data alignment

Some linkers don’t align each new input section of the NASM-produced object files on a 16-byte boundary, which results in misaligned SSE data and an “Invalid Instruction” exception on any attempt to run Μεταχρον. If such exceptions occur, you have to get the SSE data aligned on 16-byte boundaries. Read the NASM manual and your linker’s manual to find a way of doing this.

The following operating systems were tested, and the NASM data was ensured to have proper alignment for each of them: DOS, Windows, Linux, Mac OS X, FreeBSD, Solaris, SCO OpenServer, QNX, and OS/2 (with Open Watcom).

8. Getting MASM

All MASM versions supporting SSE could freely be downloaded from Microsoft. Most still can, and some of the rest can be retrieved from the Internet Archive. The following subsections show how to get each of these versions.

MASM 6.14

Download Windows Millennium Driver Development Kit (27 MB), and install it with the default component group selection. Add the “bin\win_me\bin” subdirectory of the DDK install directory (normally “C:\NTDDK”) to your PATH, or copy the files “ml.exe” and “ml.err” from it to a directory that is already in your PATH.

MASM 6.15

Download Visual C++ 6.0 Processor Pack (1.1 MB).
Install the package. It will complain saying: “This version of the Processor Pack will only install on Visual C++ 6.0 Service Pack 4”. Now, don’t press “OK” yet!
Go to the “IXP000.TMP” subdirectory in your %TMP% or %TEMP% directory, and copy or move the files “ml.err” and “ml.exe” to their permanent directory which must be in your PATH.
Now you can press “OK” on the above dialogue box, and the installer will delete the temporary directories and files.

MASM 7.0

Download Windows XP Driver Development Kit (140 MB). You can perform the standard installation procedure. Alternatively, you can extract just the two files “X86dBINS_FILE_15” and “X86dBINS_FILE_16” from the cabinet file “X86DBINS.CAB” in the “COMMON” subdirectory, and rename them to “ml.err” and “ml.exe”, respectively. Then move them to their permanent directory which must be in your PATH.

MASM 7.10

Download Windows Server 2003 SP1 DDK (231 MB). You can perform the standard installation procedure of the whole package. Alternatively, you can extract just the file “X86dBINS_FILE_19” from the cabinet file “X86dBINS.cab” in the “common” subdirectory. Rename it to “ml.exe”, and move it to its permanent directory which must be in your PATH.

MASM 8.0

Download Microsoft Macro Assembler 8.0 (MASM) Package (x86, 311 KB).
Download Microsoft Visual C++ 2005 Redistributable Package (x86, 2.6 MB).
Install the second package. It will install “MSVCR80.DLL”, which the assembler needs in the Windows SYSTEM (SYSTEM32) directory.
Install the first package. It will complain saying: “Microsoft Visual C++ Express Edition 2005 required”. Now, don’t press “OK” yet!
Go to the “IXP000.TMP\IXP000.TMP” subdirectory in your %TMP% or %TEMP% directory. Find a cabinet (.CAB) file there. Extract its contents with EXTRACT or the Windows GUI.
Copy or move the so extracted file “FL_ML_EX.364” to its permanent directory (must be in your PATH), and rename it to “ML.EXE”. This is the assembler executable.
Now you can press “OK” on the above dialogue box, and the installer will delete the temporary directories and files.

Alternatively, you can download and install Visual C++ 2005 Express Edition (463 MB), and then perform the standard installation procedure of MASM 8.0, downloaded as shown in step 1 above.

MASM 9.0

If you prefer MASM version 9.0 (note that it requires Windows NT 5.0 (2000) or later!), it’s contained in the Windows Driver Kit version 7.1.0 (620 MB). You can perform the standard installation procedure of the whole package. Alternatively, you can extract just the two files “_ml.exe_00081” and “_msvcr90.dll_00086” from the cabinet file “buildtools_x86fre_cab001.cab” in the “WDK” subdirectory, and rename them to “ml.exe” and “msvcr90.dll”, respectively. Then, move the first one (the assembler executable) to its permanent directory (must be in your PATH) and the second one to the Windows SYSTEM32 directory.

Another free Microsoft product containing MASM 9.0 (albeit an older build than the above one) is Visual C++ 2008 Express Edition (749 MB, licence). You can extract it from there in the same way as described below for MASM 10. The only difference is that you will need the Visual C++ 2008 Redistributable Package (x86, 4 MB) for the file “msvcr90.dll” instead.

MASM 10

If you prefer MASM version 10 (note that it requires Windows NT 5.1 (XP) or later!), it’s contained in the Visual C++ 2010 Express Edition (694 MB). You can perform the standard installation procedure of the whole package. Alternatively, you can follow these steps:

Run “Ixpvc.exe” in the “VCExpress” subdirectory. It will complain saying: “To install this product, please run Setup.exe.” Now, don’t press “OK” yet!
Find a directory whose name consists of 24 random hexadecimal digits in the root directory of your “C:” drive, and a cabinet file “vs_setup.cab” in it.
Extract the file “FL_ml_exe_19621_x86_ln.3643236F_FC70_11D3_A536_0090278A1BB8” from the cabinet file.
Rename the so extracted file to “ml.exe”, and move it to its permanent directory which must be in your PATH. This is the assembler executable.
Now you can press “OK” on the above dialogue box, and the installer will delete the temporary directories and files.
Download and install the Visual C++ 2010 Redistributable Package (x86, 4.8 MB). It contains “msvcr100.dll”, necessary for the assembler to run.

MASM 11

If you prefer MASM version 11 (note that it requires Windows NT 6.0 (Vista / Server 2008) or later!), it’s contained in the Visual C++ 2012 Express Edition (403 MB). You can perform the standard installation procedure of the whole package, if you have Windows NT 6.2 (8) or later. Alternatively, you can extract just two files:

“WinC_compiler_x86_ml.exe_F” from the cabinet file “vc_CompilerCore.cab” in the “packages/vc_compilercore” subdirectory.
“F_CENTRAL_msvcr110_x86” from the cabinet file “cab1.cab” in the “packages/vcRuntimeMinimum_x86” subdirectory.

Rename them to “ml.exe” and “msvcr110.dll”, respectively. Then, move the first one (the assembler executable) to its permanent directory (must be in your PATH) and the second one to the Windows SYSTEM32 directory.

MASM 12

If you prefer MASM version 12 (note that it requires Windows NT 6.0 (Vista / Server 2008) or later!), it’s contained in the Visual C++ 2013 Express Edition (790 MB). You can perform the standard installation procedure of the whole package, if you have Windows NT 6.1.7601 (7 SP1) or later. Alternatively, you can extract just two files:

“WinC_compiler_x86_ml.exe_F” from the cabinet file “vc_CompilerCore86.cab” in the “packages/vc_compilerCore86” subdirectory.
“F_CENTRAL_msvcr120_x86” from the cabinet file “cab1.cab” in the “packages/vcRuntimeMinimum_x86” subdirectory.

Rename them to “ml.exe” and “msvcr120.dll”, respectively. Then, move the first one (the assembler executable) to its permanent directory (must be in your PATH) and the second one to the Windows SYSTEM32 directory.

9. Application Programming Interface (API)

The Μεταχρον API is quite simple and almost solely built around a relatively small data structure – MXdata. Note that the current Μεταχρον version works only with stereo audio data (16-bit integer or 32-bit floating point), so if you need to process mono data, you have to copy it into both stereo channels.
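If your source is mono, the duplication mentioned above can be sketched as follows. This is only an illustration: the function name mono_to_stereo_s16 is hypothetical (it is not part of Μεταχρον), and it assumes that the 16-bit stereo data is interleaved (left, right, left, right, and so on), as in .WAV-files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `frames` mono samples into an interleaved stereo buffer.
   Assumption: the stereo layout is L, R, L, R, ...
   `stereo` must have room for 2 * frames samples. */
void mono_to_stereo_s16(const int16_t *mono, int16_t *stereo, size_t frames)
{
    size_t i;

    assert(mono != NULL && stereo != NULL);

    for (i = 0; i < frames; i++) {
        stereo[2 * i]     = mono[i];  /* left channel  */
        stereo[2 * i + 1] = mono[i];  /* right channel */
    }
}
```

For the 32-bit floating point variants, the same loop should apply with float in place of int16_t.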

9.1. Μεταχρον library public functions

MXinit – initialises the MXdata structure and other internal data; has to be called only once after your programme has been loaded, and before any call to MXprocess (MXstart must be called before MXprocess too). You don’t need to call MXinit in subsequent calls to MXstart to begin processing another data stream using the same MXdata structure, after a previous one has been finished or interrupted.

C prototype: int __cdecl MXinit(struct MXdata *);

Return value: 0 (or, for the x86 editions, the return value of the GetSIMD function) if the function succeeds, or −1 if it fails due to a library version mismatch.
MXstart – initialises internal buffers and some MXdata fields; must be called each time before you begin processing a new data stream.

C prototype: int __cdecl MXstart(struct MXdata *);

Return value: 0 if the function succeeds or −1 if it fails due to an incorrect buffer size.
MXprocess – fills the corresponding output buffer with data processed from the input buffers.

C prototype: int __cdecl MXprocess(struct MXdata *);

Return value: −1 while the end of the data stream has not been reached; otherwise, the non-negative number of samples written to the output buffer.
GetSIMD – a helper function that returns information about the SIMD capabilities of the CPU (available only for the x86 editions and not declared in the supplied header file).

C prototype: int __cdecl GetSIMD(void);

Return value: Nonzero if the CPU has floating point SIMD capabilities or 0 otherwise. Bit 0 is set if the CPU supports the 3DNow! instruction set. Bit 1 is set if the CPU supports the SSE instruction set. Bit 2 is set if the CPU family is higher than 6 (e.g. Pentium 4, Pentium D, AMD K8+, etc.). This bit is useful to distinguish between Palomino and later AMD K7 CPUs where 3DNow! is faster than SSE and K8+ where it isn’t.
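The bit layout above can be decoded in a few lines of C. The sketch below is a hypothetical helper, not library code: the names SIMD_3DNOW, SIMD_SSE, SIMD_FAMILY_GT6, and pick_math_unit are invented for illustration. The selection policy (prefer 3DNow! over SSE only when both are present and the CPU family is 6 or lower) is an assumption based on the remark about AMD K7 versus K8+; MXinit may well choose differently. The returned numbers follow the MathUnit values described in section 9.2 (0 – x87 FPU, 1 – 3DNow!, 2 – SSE).

```c
#include <assert.h>

/* Bits of the value returned by GetSIMD, as documented above. */
enum {
    SIMD_3DNOW      = 1 << 0,  /* 3DNow! instruction set supported */
    SIMD_SSE        = 1 << 1,  /* SSE instruction set supported    */
    SIMD_FAMILY_GT6 = 1 << 2   /* CPU family higher than 6         */
};

/* Hypothetical policy: map GetSIMD flags to a MathUnit value
   (0 = x87 FPU, 1 = 3DNow!, 2 = SSE). Not the library's own logic. */
int pick_math_unit(int simd_flags)
{
    int has_3dnow = (simd_flags & SIMD_3DNOW) != 0;
    int has_sse   = (simd_flags & SIMD_SSE) != 0;
    int newer_cpu = (simd_flags & SIMD_FAMILY_GT6) != 0;

    assert(simd_flags >= 0);  /* a negative value is an MXinit error, not flags */

    if (has_sse && (!has_3dnow || newer_cpu))
        return 2;  /* SSE only, or a family > 6 CPU where 3DNow! isn't faster */
    if (has_3dnow)
        return 1;  /* 3DNow! only, or a K7-class CPU that has both */
    return 0;      /* no floating point SIMD: plain x87 FPU */
}
```

Under this assumed policy, a CPU reporting both 3DNow! and SSE with family 6 (flags 3) maps to 3DNow!, while the same CPU features with bit 2 set (flags 7) map to SSE.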

The combined libraries have another 3 functions with the same names but with a “Pro” suffix added – MXinitPro, MXstartPro, and MXprocessPro. (To avoid confusion, they’re not declared in the header file; to use them, you have to add them.) In these libraries, standard function names are for the “Fast” variant, and the “Pro” functions are for the “Pro” variant. All non-combined libraries, either “Fast” or “Pro”, use the same standard function names, because they are separated in different library files and not combined.

Note that because the “Fast” and “Pro” variants have different input/output buffer requirements checked and parameters set by the MXstart[Pro] function, these two variants in the combined libraries must be used with two independent MXdata structures. The buffers could be shared between them though, as long as their length satisfies the requirements of both variants.

9.2. Steps to follow to time-scale an audio data stream

Make sure you link your application programme against one of the .lib (.a) libraries or load a .DLL (if you build one) at run-time.
Choose the output buffer size in kilobytes (OutBufSizeK). It can be 8, 16 or 32 KB for all the libraries, 4 KB for all “Pro” and integer variants, and 2 KB for the integer “Pro” variant only. Larger values make real-time playback in some operating systems smoother, but smaller values reduce the latency (i.e. the input-to-output delay). The size of an input buffer half will be 4 times larger than the output buffer size.
Allocate the memory:

2 * 4 * 1024 * OutBufSizeK + 32 bytes for the two input buffer halves.

n * 1024 * OutBufSizeK + 32 bytes for n output buffers (n >= 1). It’s easier to use only one output buffer when the output data is not to be played back in real-time. When you’re playing the output data in real-time, you have to use two or more output buffers.

INT_BUF_SIZE bytes needed for internal data processing.
Prepare the MXdata structure and fill the following fields:

Version – you must put the value of the METAXPON_VERSION constant here.

OutBufSizeK – put the above OutBufSizeK value that you’ve used to allocate input and output buffers here.

InBuf – put the address of the input buffer here.

OutBuf – put the address of the first output buffer here. The second, if present, will follow immediately after the first one at address OutBuf + 1024 * OutBufSizeK. The third, if present – immediately after the second at address OutBuf + 2 * 1024 * OutBufSizeK, and so on.

Memory – put the address of the memory needed for internal processing here.
Call MXinit with a pointer to the MXdata structure. Examine the returned value – if it’s a negative number, MXinit has failed. This may happen if you didn’t initialise the Version field of the MXdata structure or there is a library version mismatch. MXinit modifies many fields of the MXdata structure, including InBuf, OutBuf, and Memory. It aligns all addresses to 16-byte boundaries (32-byte boundaries for the portable edition). You should never use the addresses obtained when you allocated input and output buffers, except when you’re freeing the allocated memory. Use the addresses at fields InBuf and OutBuf instead!
(Optional, x86 only) You can override the value written in the MathUnit field of the MXdata structure at any time after calling MXinit to force subsequent calls to MXprocess to use a different math unit (x87 FPU, 3DNow! or SSE). But be careful! Putting a wrong number here (for example, forcing a Pentium to use 3DNow!, which it doesn’t support) will make the CPU attempt to execute an invalid instruction, raising an exception or causing a deadlock. Three possible values are defined: 0 – use x87 FPU, 1 – use 3DNow!, and 2 – use SSE. In the current x86 version, all other values make MXprocess fall back to the x87 FPU instead of SIMD. Note that this value isn’t the same as the one returned by MXinit and GetSIMD. Normally you don’t need to care about this field, because MXinit automatically initialises it with the optimal value.
(Optional) You can override the value written in the Threshold field of the MXdata structure at any time after calling MXinit. MXinit initialises this field to 2.0 (0x20000 – fixed point format 16.16). This single parameter controls the sensitivity of the attack detection algorithm. Lowering this value will result in finding more attacks (transients), but lowering it too much will result in false transient detections and may distort the sound. Using higher values may eliminate false transient detections, but the sound may lose its sharp attacks. This value must be greater than 1.0 (0x10000). Values close to or less than 1.0 will cause the detection of many false attacks and will distort the sound! Values much higher than 2.0 (for example 4.0 or higher) may cause no transients to be found at all!
Fill the whole input buffer with audio data. Clear the E0F field. If the data is less than the size of a half or a whole input buffer, set bit 0 of E0F to indicate that the end of input data is in the first input buffer half, or set bit 1 of E0F to indicate that the end of input data is in the second input buffer half. Also, set the value of Last to the number of bytes (number_of_samples * 4 or 8) written to the input buffer half in which the input data stream ends. Last can also be understood as the offset, from the beginning of that input buffer half, of the first byte after the end of the input data. You should clear all the other bits of E0F, and you should not modify them after you call MXprocess until the end of processing of the current data stream, because some of them are used as internal flags.
Put the desired tempo of the output data stream to the Tempo field. A value of 1.0 (0x10000 – fixed point format 16.16) indicates 100% tempo or no tempo change (or a stretching factor of 1.0). A value of 0.5 (0x08000) indicates 50% tempo (or a stretching factor of 2.0). A value of 2.0 (0x20000) indicates 200% tempo (or a stretching factor of 0.5).
Example: If you want to stretch the data stream to 125%, you have to calculate the desired tempo (Tempo = 1 / Stretching_factor): 1 / 125% = 80%. Then convert it to fixed point 16.16 format: 0.8 * 0x10000 = 0x0CCCD. Write this value to the Tempo field.

You may change the tempo dynamically before any call to MXprocess.
Call MXstart with a pointer to the MXdata structure. Examine its return value. If it’s nonzero, your chosen OutBufSizeK value was incorrect for this variant of the library, so you need to free the allocated memory, change OutBufSizeK, and execute all the previous steps again, until MXstart returns a zero value, indicating that OutBufSizeK is correct.
MXstart will clear the value of the Half field, so you don’t need to clear it.
Clear the Empty field. This field is a flag that will be set by MXprocess when all of the data of an input buffer half has been processed, and the half needs refilling. You should clear Empty (MXprocess only sets this flag but doesn’t clear it, so it must be cleared by you!) and refill the input buffer half with data before you next call MXprocess.
(Optional) Each of the steps 5, 6, and 8 could be moved here, if you want to change some of the parameters dynamically before each call to MXprocess. This is useful for real-time implementations.
Put the number of the desired output buffer to be filled by MXprocess to the OutBufNum field. MXprocess won’t modify this field, so you have to fill it only once (usually with a value of 0) if you’re going to use a single output buffer.
Call MXprocess with a pointer to the MXdata structure. MXprocess will fill the corresponding output buffer with processed data. Examine the returned value. A value of −1 indicates that the output buffer is full and the end of data stream is not reached yet. A positive value or a zero indicates the number of samples (number_of_bytes / 4 or 8) written to the output buffer. This also means that the end of data stream is reached, and therefore you should not call MXprocess any more before you reinitialise some of the fields of the MXdata structure by putting in their appropriate values, filling the input buffer, and calling MXstart to begin the processing of a new data stream.
MXprocess will never return a value other than −1 unless you set bit 0 or bit 1 of E0F to indicate the end of the input stream (you should also set Last to the correct value).

After MXprocess returns, you will have an output buffer containing processed data. You can use this data as you like.

Important note: MXprocess must be permitted to modify the data in the input buffer, and sometimes it will do so. So, don’t modify the input data unless the Empty flag is set. Only then can you, and must you, fill the corresponding input buffer half.
Examine the value of the Empty field. MXprocess will set this flag when a half of the input buffer has been processed and needs refilling. Half contains the number (0 or 1) of the input buffer half being processed. Only in that case (Empty flag set) you should refill the other half (1−Half) and reset the Empty flag. If you have set bit 0 or bit 1 of E0F, MXprocess will never set the Empty flag.
Repeat steps 10–14 until the end of the output data stream is reached (indicated by a non-negative value returned by MXprocess).
If you want to process another data stream, no matter whether the processing of the previous one has finished or not, repeat steps 5–15 to use the same MXdata structure and interrupt the processing of the current data stream if it hasn’t finished yet, or use another MXdata structure and follow steps 2–15 to begin processing another data stream while processing of the current one is still in progress. You could have as many MXdata structures as needed and process the same number of data streams as the number of MXdata structures simultaneously.
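The size, address, and fixed point arithmetic used in the steps above can be collected into a few helpers. This is a minimal sketch, not part of the library: every function name below (mx_in_half_bytes and so on) is hypothetical, and only the formulas themselves come from the text. Rounding to nearest in the 16.16 conversion is an assumption, chosen because it reproduces the 0x0CCCD value from the tempo example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes in one input buffer half: 4 times the output buffer size. */
size_t mx_in_half_bytes(unsigned out_buf_size_k)
{
    return (size_t)4 * 1024 * out_buf_size_k;
}

/* Bytes to allocate for the two input buffer halves (32 spare bytes included). */
size_t mx_in_alloc_bytes(unsigned out_buf_size_k)
{
    return 2 * mx_in_half_bytes(out_buf_size_k) + 32;
}

/* Bytes to allocate for n output buffers, n >= 1 (32 spare bytes included). */
size_t mx_out_alloc_bytes(unsigned out_buf_size_k, unsigned n)
{
    assert(n >= 1);
    return (size_t)n * 1024 * out_buf_size_k + 32;
}

/* Byte offset of output buffer number `num` from the address in OutBuf. */
size_t mx_out_buf_offset(unsigned out_buf_size_k, unsigned num)
{
    return (size_t)num * 1024 * out_buf_size_k;
}

/* Bytes per stereo sample: 4 for 16-bit integer data, 8 for 32-bit float data. */
size_t mx_bytes_per_sample(int float_data)
{
    return float_data ? 8 : 4;
}

/* Convert a non-negative real value to 16.16 fixed point.
   Assumption: round to nearest. */
uint32_t mx_to_fixed16_16(double value)
{
    assert(value >= 0.0 && value < 65535.0);
    return (uint32_t)(value * 65536.0 + 0.5);
}

/* Tempo field value for a stretching factor: Tempo = 1 / Stretching_factor. */
uint32_t mx_tempo_from_stretch(double stretch_factor)
{
    assert(stretch_factor > 0.0);
    return mx_to_fixed16_16(1.0 / stretch_factor);
}
```

For instance, mx_tempo_from_stretch(1.25) yields 0xCCCD, matching the 125% stretching example, and mx_to_fixed16_16(2.0) yields 0x20000, the default Threshold. With OutBufSizeK = 8, mx_in_alloc_bytes(8) gives 65568 bytes for the two input buffer halves.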

9.3. Multithreading and multitasking

MXinit, MXstart, and MXprocess can be used in multithreaded applications to process several data streams simultaneously by passing them pointers to different MXdata structures. Each MXdata structure contains complete information about the data stream for which it was initialised.

For the x86 code, the CPU registers used (GP and FPU registers for the FPU code, GP and FPU/MMX registers for the 3DNow! code, or GP, FPU/MMX, and XMM registers for the SSE code) must be saved and restored by the operating system when it switches tasks, so that multitasking or multithreading works when Μεταχρον is used. Older operating systems don’t save and restore the XMM registers, for example:

Windows 95 prior to OEM Service Release 2
Windows NT 4 prior to Service Pack (SP) 4
Windows NT 4 Service Pack 4 without an SSE driver
Linux prior to kernel version 2.2.10.

If the SSE instructions are supported and enabled by the operating system, bit 9 of CR4 of the CPU will be set. But it can be read only in Ring 0 by the kernel and not in user space. Therefore, such detection cannot be included in Μεταχρον. As a rule, operating systems released after 1999 do support SSE.

10. Download Μεταχρον!

Note: All x86 executables are 32-bit and packed with UPX. Unfortunately, some virus scanners give false positives for UPX-packed Windows executables. This is a problem with the virus scanners. Even more of them give false positives for executables packed with some of the better compressing packers like FSG, MEW 11, and UPACK. Please don’t worry about this! The executables are not really infected. To convince yourself of this, unpack the executables using the “−d” switch of UPX. Voilà! No false positives now.

Source code for the Μεταχρον library:
1. As a .RAR archive (32 KB) – use UnRAR to extract.
2. As a .ZIP archive (63 KB) – use UnZip to extract.
3. As a .tar.xz archive (31 KB) – use Tar to extract.
4. As a .tar.bz2 archive (35 KB) " " "
5. As a .tar.gz archive (45 KB) " " "
6. As a .tar.Z archive (86 KB) " " "
7. As a .tar.z archive (168 KB) " " "
8. As a .tar.C archive (168 KB) " " "
Precompiled command-line .WAV-file stretching utility (for 16-bit integer stereo audio data):
- Windows (version 4.0 or later):
  - “Fast” variant:
    1. Built from the x86 editions of Μεταχρον (9.5 KB) with auto-selection of FPU, 3DNow! or SSE code based on the CPU capabilities.
    2. Built from the portable edition of Μεταχρον (131 KB) with Intel IPP (see above) for top speed – significantly faster than (1) but requires an SSE2-enabled CPU.
  - “Pro” variant:
    - As (1) above (9.5 KB).
- Mac OS X universal binaries built from the portable edition of Μεταχρον – for PowerPC (Mac OS X 10.3+) with Apple vDSP, for x86 (Mac OS X 10.6+) with Intel IPP (see above):
  - “Fast” variant (90 KB).
  - “Pro” variant (90 KB).
- Linux (kernel 2.6.24 or later):
  - “Fast” variant:
    1. Built from the x86 editions of Μεταχρον (9.6 KB) with auto-selection of FPU, 3DNow! or SSE code based on the CPU capabilities.
    2. Built from the portable edition of Μεταχρον (327 KB) with Intel IPP (see above) for top speed – significantly faster than (1) but requires an SSE2-enabled CPU.
  - “Pro” variant:
    - As (1) above (9.6 KB).
- FreeBSD 9.x binaries built from the x86 editions of Μεταχρον with auto-selection of FPU, 3DNow! or SSE code based on the CPU capabilities:
  - “Fast” variant (9.9 KB).
  - “Pro” variant (10 KB).
- Solaris binaries built from the portable edition of Μεταχρον with Sun mediaLib (see above) for UltraSPARC:
  - “Fast” variant (55 KB).
  - “Pro” variant (56 KB).

Demonstration GUI player for Windows (version 4.0 or later, and for SSE also later than the versions listed in 9.3), or for WINE on Mac OS X or another UNIX-like OS (note that CPU load is doubled in 64-bit Ubuntu!), with an on-the-fly “Fast” / “Pro” variant toggle and a CPU usage meter; plays only 16-bit uncompressed stereo .WAV-files:

Source code for MASM32 (Assembly) or Intel IPP (see above, C). Extract to the same directory as the Μεταχρον library source code. For the Assembly code, first build the library by running “MAKELIBS” from the “METAXPON\x86” subdirectory, and then run “MAKE” from its “PLAYER” subdirectory to build the player. For the C code, run “MAKE_IPP”.
- As a .7z archive (8.9 KB) – use 7-Zip to extract.
- As a .RAR archive (9.1 KB).
- As a .ZIP archive (13 KB).
Precompiled executables. A .WAV-file can be loaded via a dialog box or from the command line. Usable control keys: Enter, Space, Tab, Home, End, PgUp, PgDn, arrows. To display the entire title on the title bar properly, set the latter’s font to a Unicode one, such as Lucida Sans Unicode. (Note: If the path to the .WAV-file or its name contains non-ASCII characters, it may fail to open under Windows NT 6.0 (Vista / 2008) or later!)
1. Built from the Assembly code (10 KB); with auto-selection of FPU, 3DNow! or SSE code based on the CPU capabilities.
2. Built from the C code (126 KB) with Intel IPP (see above) for top speed – significantly faster than (1) but requires an SSE2-enabled CPU. The screenshot on the right shows it scaling in real-time and playing a file with a sampling rate of 44.1 kHz on an AMD Sempron (Palermo) 3200+ (1.8 GHz). The table below shows the CPU load for such a file in “fast” mode for some Intel processors. In the (*) case, “/usepmtimer” had to be added in the “boot.ini” file to make the CPU load indicator work.

Microarchitecture	Codename	Model	GHz	Load [%]
NetBurst	Willamette	Pentium 4 1.7	1.7	3.2
	Northwood-128	Celeron 2.0	2.0	4.0
	Northwood	Pentium 4 2.66	2.66	1.9
	Prescott-2M	Pentium 4 HT 630	3.0	1.9 *
Enhanced Pentium M	Yonah	Core Duo T2300	1.67	2.3
Intel Core	Allendale	Pentium Dual Core E2160	1.8	1.5
	Penryn	Pentium Dual Core T4300	2.1	1.3
	Merom	Core 2 Duo T7600	2.33	1.2
Sandy Bridge	Sandy Bridge	Pentium G630	2.7	1.3

If you happen to lack a 16-bit uncompressed stereo .WAV-file for testing, you can use this 2.4 MB (14 s) excerpt from the song “Sledgehammer” from the album “So” by Peter Gabriel (1986).