1. Summary
Μεταχρον is a small and fast audio DSP library for time-scale manipulation
of 16-bit integer or 32-bit floating point stereo audio data streams. It
employs a rigid phase-locked vocoder with dedicated transient detection and
processing, and can work in real-time or non-real-time. Four editions are
included – a portable edition and three x86 editions. The portable edition can
be built with any ANSI C compiler and is OS- and architecture-independent. The
three x86 editions are written in Assembly using the FPU, 3DNow!, and SSE
instruction sets, respectively, with automatic selection between them
depending on the CPU capabilities. They can be compiled with MASM, JWASM or
NASM, producing libraries of object files in 8 formats.
2. Licensing
The Μεταχρον library and this page are licensed under the OSI-approved Simple Public Licence v2 –
a plain language implementation of the GNU General Public Licence
v2 written by Professor of Law Robert W.
Gomułkiewicz. It was chosen because it’s much shorter and easier to
understand for non-lawyers. The demonstration player is licensed under the No Problem Bugroff licence... ☺
3. Versions
Big thanks to Jean Laroche and Mark B. Dolson for their 1999 IEEE article
describing the rigid phase-locked vocoder, Svillenn St. Stoyanoff for writing
its C++ prototype implementation in 2005, and Stanisław B. Cyriloff for
translating the latter into x86 assembly language and writing the demo player
for version 1.0! Development past 1.0 was taken over by Lutshayzar Il.
Gueorguieff.
- The first version, 1.0 (2005), supported only 16-bit integer stereo audio
streams as input and output.
- In version 1.00f (2006), variants for 32-bit floating point data streams
and NASM support were added.
- In version 1.01 (2007), the input and output buffer sizes were made
variable, and the stack was kept 16-byte aligned at CALL instructions
(required by some operating systems like Mac OS X).
- In version 1.02 (2012, June), a portable edition written in C was added,
supporting all 32/64-bit platforms with an FPU and an ANSI C compiler.
- In version 1.02j (2012, July), JWASM support was added.
4. Package folders
- “METAXPON”: the portable edition, a command-line .WAV-file stretch utility,
and the Μεταχρον C[++] header.
- “METAXPON/x86”: the x86 (FPU / 3DNow! / SSE) editions, headers, and build
scripts.
5. Building the portable edition
Any ANSI C compiler can be used. Depending on the desired variant, the
constants PRO (sharper transients but more spectral noise and slower)
and / or FLOAT_DATA may have to be defined on the compiler’s command
line. This edition was successfully tested on the following platforms:
- Native C compilers (26): ACK C, Borland, CC386, Clang/LLVM,
CodeWarrior, DEC/Compaq/HP C, Digital Mars, EKOPath, GCC, High C, Intel C,
LCC-Win32, MIPSpro, NDP C, Open Watcom, Open64, Pelles C, Portable C, Portland
Group C, Salford C, Sun C, Tiny C, USL C, VectorC, Visual Age, Visual
C.
- Operating systems (26): AIX, AROS, BSD/OS, DOS, DragonFly BSD,
FreeBSD, Haiku, HP-UX, Hurd, IRIX, Linux, Mac OS X, MINIX, MirOS BSD, NetBSD,
OpenBSD, OpenIndiana, OpenVMS, OS/2, QNX, Solaris, Syllable Desktop, Tru64
UNIX, ULTRIX, UnixWare, Windows.
- Architectures (13; real hardware, no emulators used!): Alpha,
ARM, IA-64, MIPS-64, PA-RISC, PA-RISC/64, PowerPC, PowerPC-64, SPARC-64, VAX,
x86, x86-64, z/Architecture.
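As a rough illustration of the FLOAT_DATA switch described above, application code can key its own sample type off the same constant so that its buffers match the library variant being built. The names mx_sample_t, MX_FRAME_BYTES, and mx_frames_to_bytes are hypothetical and not part of the supplied header; the 4- and 8-byte frame sizes follow from 16-bit integer and 32-bit floating point stereo data.

```c
/* Hypothetical application-side helpers; not part of the library header.
   Define FLOAT_DATA on the compiler's command line to match the
   floating point variant. */
#include <stddef.h>

#ifdef FLOAT_DATA
typedef float mx_sample_t;   /* 32-bit floating point samples */
#else
typedef short mx_sample_t;   /* 16-bit integer samples (assumes a 16-bit short) */
#endif

/* One stereo frame holds a left and a right sample. */
#define MX_FRAME_BYTES (2 * sizeof(mx_sample_t))

/* Number of bytes occupied by the given number of stereo frames. */
static size_t mx_frames_to_bytes(size_t frames)
{
    return frames * MX_FRAME_BYTES;
}
```

Without FLOAT_DATA a stereo frame takes 4 bytes; with it, 8 bytes on platforms with a 32-bit float.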
The built-in Fast Fourier Transform (FFT) routine isn’t so fast, taking
as much as 60% of the total CPU time. You can dramatically improve speed by
replacing it with a faster one. There are several options, listed below:
- For x86, x86-64, IA-64, and XScale (ARMv5TE) CPUs, Intel’s Integrated Performance
Primitives (IPP, development led by Vladimir Vl.
Dudnik) have the fastest FFT (the FFT of Intel’s MKL isn’t faster). When
building Μεταχρον, define IPP on the compiler’s command line, and link
against a DSP library (“ipps*.lib” on Windows, “−lipps”
or “−lipps_l −lippcore_l” on Mac OS X or Linux).
- For SIMD-capable CPUs, FFTW by
Matteo Frigo and Steven G. Johnson is the fastest free FFT library. Before
building it, add “−−enable-float”, and enable the SSE /
SSE2 / AVX / Altivec / NEON SIMD optimization(s) supported by the target CPU on
the “configure” command line. When building Μεταχρον, define FFTW on the
compiler’s command line, and link against the FFTW library. Note that there is
a noticeable delay in the MXinit function (see the API description
below) when FFTW is used.
- For x86 and x86-64 CPUs on Linux and Windows, the AMD Core Math Library (ACML) is almost as fast as FFTW. When building
Μεταχρον, define ACML on the compiler’s command line, and link against
an ACML library (“libacml_dll.lib” on Windows, “−lacml
−lifcoremt_pic −lpthread −lrt” on Linux).
- For Mac OS 9.1 and Mac OS X, Apple’s vDSP (a part of the Accelerate
Framework since Mac OS X 10.3) is almost as fast as FFTW. When building
Μεταχρον, define VDSP on the compiler’s command line, adding “−framework Accelerate”.
- For x86 CPUs with SSE and/or AVX as well as ARM CPUs with NEON,
SFFT by Anthony M. Blake is
almost as fast as FFTW. Before building it, edit “src/target_*.conf”, adding a
new line with a “b” instead next to each line with an “f” to enable backward
transforms, and then add “−−enable-single” and enable
the SIMD version supported by the target CPU on the “configure” command line.
When building Μεταχρον, define SFFT on the compiler’s command line, and
link against the SFFT library.
- For Solaris, Sun mediaLib (development done in
Boris Art. Babayan’s MCST) is almost as fast as FFTW. When building
Μεταχρον, define MLIB on the compiler’s command line, and link against a
mediaLib of your choice (“−lmlib” or “−lmlib_mt −lmlib”).
- For Sun Studio, the Sun Performance Library
(SPL, development led by Paul J.
Hinker) is almost as fast as FFTW (but slower than mediaLib on Solaris /
SPARC albeit faster on x86). When building Μεταχρον, define SPL on the
Sun C compiler’s command line, adding “−dalign” and
“−xlic_lib=sunperf”.
- For most non-SIMD-enabled architectures (e.g., as tested, Alpha,
ARM11 (without NEON), MIPS, PA-RISC, PA-RISC/64, PowerPC 600, PowerPC
(POWER4+), x86 (P5), z/Architecture, but not IA-64, SPARC, or VAX!), the djbfft library by Daniel J. Bernstein
(1999) is still faster than FFTW. You may change the “−O1” switch to “−O3” in the “conf-cc”
file before building djbfft. When building Μεταχρον, define DJBFFT on
the compiler’s command line, and link against the djbfft library.
With options 1–5 above and Intel’s C compiler
which is freely available for
non-commercial development (includes IPP; used to build FFTW or SFFT too),
the portable Μεταχρον is significantly faster than its own x86 assembly
language counterpart on SSE2-enabled x86 CPUs! The .WAV-file stretch utility
and demo player compiled this way (with IPP) are available for download at the
bottom of this page.
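Each FFT option above is selected by defining one preprocessor constant when building the library. The sketch below is a hypothetical diagnostic helper (mx_fft_backend is not a library function) that reports which of those constants was defined, which can be handy when comparing builds. The order of the checks is an arbitrary choice, and whether the library accepts more than one constant at a time is not something this sketch assumes.

```c
/* Hypothetical diagnostic helper, not part of the library: reports which
   FFT back-end constant was defined on the compiler's command line.
   The order of the checks is an arbitrary choice for this sketch. */
static const char *mx_fft_backend(void)
{
#if defined(IPP)
    return "Intel IPP";
#elif defined(FFTW)
    return "FFTW";
#elif defined(ACML)
    return "ACML";
#elif defined(VDSP)
    return "Apple vDSP";
#elif defined(SFFT)
    return "SFFT";
#elif defined(MLIB)
    return "Sun mediaLib";
#elif defined(SPL)
    return "Sun Performance Library";
#elif defined(DJBFFT)
    return "djbfft";
#else
    return "built-in FFT";
#endif
}
```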
6. Building the x86 editions
The following assemblers can be used:
- MASM (OMF and COFF formats only): Run “makelibs.bat” in Windows
or DOS with HXRT. WLIB from Open Watcom is required to
make the libraries. A MASM version with SSE support (i.e., 6.14 or later) is
required. All such versions (6.14, 6.15, 7.0, 7.10, 8.0, 9.0, 10, and 11) were
successfully tested.
- JWASM (OMF, COFF, and
ELF formats only): With a copy of JWASM named “ML.EXE” in your PATH, run
“makelibs.bat” in Windows or DOS, or “makelibs.cmd” in OS/2 (as with MASM, WLIB
is required). In UNIX-like environments, run “makelibs.jwa”. JWASM version 2.07
was successfully tested.
- NASM (all formats but OMF): Run
“makelibs.sh” in UNIX-like environments or “makelibs.cmd” in OS/2 with Open
Watcom. NASM version 0.98.40 build 11 or later is required. Versions 0.98.40
build 11, 2.00, 2.07, and 2.10 were successfully tested.
After you’ve built the x86 libraries, you’ll find the following directories:
- normal: “Fast” integer variant
- pro: “Pro” integer variant
- float: “Fast” floating point variant
- floatpro: “Pro” floating point variant
- both: Combined “Fast” and “Pro” integer variant
- floatbot: Combined “Fast” and “Pro” floating point variants
In each directory, you’ll find from 2 to 7 (depending on the assembler used)
libraries with different formats according to their file name, as follows:
Format | Operating system or compiler |
a.out | Linux (older)/EMX (OS/2) |
a.out-b | NetBSD/FreeBSD/OpenBSD (older) |
COFF | DJGPP/UNIX System V (older) |
ELF | Linux/UNIX System V/Solaris/BSD/NetBSD/FreeBSD/OpenBSD/QNX |
Mach-O | NeXTstep/OpenStep/Rhapsody/Darwin/Mac OS X |
OMF | MS-DOS/PC-DOS/DR-DOS/FreeDOS (32-bit)/OS/2 |
PE COFF | Microsoft Windows (32-bit) |
RDF | Relocatable Dynamic Object File Format v2 |
Most compilers, linkers, and operating systems use one of the above formats.
Library functions can be called from procedural programming languages via the
“C” (“cdecl”) calling convention. You can translate the supplied C header file
into the language used in your projects. Please refer to the corresponding
language specification.
7. NASM and SSE data alignment
Some linkers don’t align each new input section of the NASM-produced object
files on a 16-byte boundary, which results in misaligned SSE data and an
“Invalid Instruction” exception on attempt to run Μεταχρον. If such exceptions
occur, you have to get SSE data aligned at 16-byte boundaries. Read the NASM
manual and your linker manual to find a way of doing it.
The following operating systems were tested, and the NASM data was ensured
to have proper alignment for each of them: DOS, Windows, Linux, Mac OS X,
FreeBSD, Solaris, SCO OpenServer, QNX, and OS/2 (with Open Watcom).
8. Getting MASM
All MASM versions supporting SSE could freely be downloaded from Microsoft.
Most still can, and some of the rest can be retrieved from the Internet
Archive. The following subsections show how to get each of these versions.
MASM 6.14
Download Windows
Millennium Driver Development Kit (27 MB), and install it with the
default component group selection. Add the “bin\win_me\bin” subdirectory of the
DDK install directory (normally “C:\NTDDK”) to your PATH, or copy the files
“ml.exe” and “ml.err” from it to a directory that is already in your PATH.
MASM 6.15
- Download Visual C++ 6.0 Processor Pack (1.1 MB).
- Install the package. It will complain saying: “This version of the
Processor Pack will only install on Visual C++ 6.0 Service Pack 4”. Now, don’t
press “OK” yet!
- Go to the “IXP000.TMP” subdirectory in your %TMP% or %TEMP% directory, and
copy or move the files “ml.err” and “ml.exe” to their permanent directory
which must be in your PATH.
- Now you can press “OK” on the above dialogue box, and the installer will
delete the temporary directories and files.
MASM 7.0
Download Windows XP Driver Development Kit (140 MB). You can perform the
standard installation procedure. Alternatively, you can extract just the two
files “X86dBINS_FILE_15” and “X86dBINS_FILE_16” from the cabinet file
“X86DBINS.CAB” in the “COMMON” subdirectory, and rename them to “ml.err” and
“ml.exe”, respectively. Then move them to their permanent directory which must
be in your PATH.
MASM 7.10
Download Windows Server 2003 SP1 DDK (231 MB). You can perform the standard
installation procedure of the whole package. Alternatively, you can extract
just the file “X86dBINS_FILE_19” from the cabinet file “X86dBINS.cab” in the
“common” subdirectory. Rename it to “ml.exe”, and move it to its permanent
directory which must be in your PATH.
MASM 8.0
- Download Microsoft
Macro Assembler 8.0 (MASM) Package (x86, 311 KB).
- Download Microsoft
Visual C++ 2005 Redistributable Package (x86, 2.6 MB).
- Install the second package. It will install “MSVCR80.DLL”, which the
assembler needs in the Windows SYSTEM (SYSTEM32) directory.
- Install the first package. It will complain saying: “Microsoft Visual C++
Express Edition 2005 required”. Now, don’t press “OK” yet!
- Go to the “IXP000.TMP\IXP000.TMP” subdirectory in your %TMP% or %TEMP%
directory. Find a cabinet (.CAB) file there. Extract its contents with EXTRACT
or the Windows GUI.
- Copy or move the so extracted file “FL_ML_EX.364” to its permanent
directory (must be in your PATH), and rename it to “ML.EXE”. This is the
assembler executable.
- Now you can press “OK” on the above dialogue box, and the installer will
delete the temporary directories and files.
Alternatively, you can download and install Visual C++ 2005 Express
Edition (463 MB), and then perform the standard installation procedure
of MASM 8.0, downloaded as shown in step 1 above.
MASM 9.0
If you prefer MASM version 9.0 (note that it requires Windows NT 5.0
(2000) or later!), it’s contained in the Windows
Driver Kit version 7.1.0 (620 MB). You can perform the standard
installation procedure of the whole package. Alternatively, you can extract
just the two files “_ml.exe_00081” and “_msvcr90.dll_00086” from the cabinet
file “buildtools_x86fre_cab001.cab” in the “WDK” subdirectory, and rename them
to “ml.exe” and “msvcr90.dll”, respectively. Then, move the first one (the
assembler executable) to its permanent directory (must be in your PATH) and the
second one to the Windows SYSTEM32 directory.
Another free Microsoft product containing MASM 9.0 (albeit an older build
than the above one) is Visual C++ 2008 Express
Edition (749 MB, licence). You can extract it from there in the same way as described below
for MASM 10. The only difference is that you will need the Visual C++
2008 Redistributable Package (x86, 4 MB) for the file “msvcr90.dll”
instead.
MASM 10
If you prefer MASM version 10 (note that it requires Windows NT 5.1
(XP) or later!), it’s contained in the Visual C++ 2010 Express
Edition (694 MB). You can perform the standard installation procedure
of the whole package. Alternatively, you can follow these steps:
- Run “Ixpvc.exe” in the “VCExpress” subdirectory. It will complain saying:
“To install this product, please run Setup.exe.” Now, don’t press “OK” yet!
- Find a directory whose name consists of 24 random hexadecimal digits in the
root directory of your “C:” drive, and a cabinet file “vs_setup.cab” in
it.
- Extract the file
“FL_ml_exe_19621_x86_ln.3643236F_FC70_11D3_A536_0090278A1BB8” from the cabinet
file.
- Rename the so extracted file to “ml.exe”, and move it to its permanent
directory which must be in your PATH. This is the assembler executable.
- Now you can press “OK” on the above dialogue box, and the installer will
delete the temporary directories and files.
- Download and install the Visual C++
2010 Redistributable Package (x86, 4.8 MB). It contains “msvcr100.dll”,
necessary for the assembler to run.
MASM 11
If you prefer MASM version 11 (note that it requires Windows NT 6.0
(Vista / Server 2008) or later!), it’s contained in the Visual C++ 2012 Express
Edition (403 MB). You can perform the standard installation procedure
of the whole package, if you have Windows NT 6.2 (8) or later.
Alternatively, you can extract just two files:
- “WinC_compiler_x86_ml.exe_F” from the cabinet file “vc_CompilerCore.cab” in
the “packages/vc_compilercore” subdirectory.
- “F_CENTRAL_msvcr110_x86” from the cabinet file “cab1.cab” in the
“packages/vcRuntimeMinimum_x86” subdirectory.
Rename them to “ml.exe” and “msvcr110.dll”, respectively. Then, move the
first one (the assembler executable) to its permanent directory (must be in
your PATH) and the second one to the Windows SYSTEM32 directory.
MASM 12
If you prefer MASM version 12 (note that it requires Windows NT 6.0
(Vista / Server 2008) or later!), it’s contained in the Visual C++ 2013 Express
Edition (790 MB). You can perform the standard installation procedure
of the whole package, if you have Windows NT 6.1.7601 (7 SP1) or later.
Alternatively, you can extract just two files:
- “WinC_compiler_x86_ml.exe_F” from the cabinet file “vc_CompilerCore86.cab” in
the “packages/vc_compilerCore86” subdirectory.
- “F_CENTRAL_msvcr120_x86” from the cabinet file “cab1.cab” in the
“packages/vcRuntimeMinimum_x86” subdirectory.
Rename them to “ml.exe” and “msvcr120.dll”, respectively. Then, move the
first one (the assembler executable) to its permanent directory (must be in
your PATH) and the second one to the Windows SYSTEM32 directory.
9. Application Programming Interface (API)
The Μεταχρον API is quite simple and almost solely built around a relatively
small data structure – MXdata. Note that the current Μεταχρον version
works only with stereo audio data (16-bit integer or 32-bit floating point), so
if you need to process mono data, you have to copy it into both stereo
channels.
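The mono workaround mentioned above can be sketched as follows for 16-bit integer data. The function name is hypothetical, and the interleaved left/right layout (as used in .WAV files) is an assumption of this sketch rather than something stated by the library.

```c
#include <stddef.h>

/* Duplicates each mono sample into the left and right channels.
   Assumes interleaved stereo output (L, R, L, R, ...); 'stereo' must
   have room for 2 * frames samples. */
static void mono_to_stereo_i16(const short *mono, short *stereo, size_t frames)
{
    size_t i;
    for (i = 0; i < frames; i++) {
        stereo[2 * i]     = mono[i];  /* left channel  */
        stereo[2 * i + 1] = mono[i];  /* right channel */
    }
}
```

After processing, either channel of the output can be kept as the mono result.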
9.1. Μεταχρον library public functions
- MXinit – initialises the MXdata structure and other internal data;
has to be called only once after your programme has been loaded, and before any
call to MXprocess (MXstart must be called before MXprocess too). You don’t need
to call MXinit again before subsequent calls to MXstart that begin processing
another data stream with the same MXdata structure, after a previous one has
been finished or interrupted.
C prototype: int __cdecl MXinit(struct MXdata
*);
Return value: 0 for the portable edition, or the return value of the
GetSIMD function for the x86 editions; −1 if the function fails due to a
library version mismatch.
- MXstart – initialises internal buffers and some MXdata fields; must
be called each time before you begin processing a new data stream.
C prototype: int __cdecl MXstart(struct
MXdata *);
Return value: 0 if the function succeeds or −1 if it fails due to an
incorrect buffer size.
- MXprocess – fills the corresponding output buffer with data
processed from the input buffers.
C prototype: int __cdecl MXprocess(struct
MXdata *);
Return value: −1 when the end of the data stream has not been reached
or the non-negative number of samples written to the output buffer
otherwise.
- GetSIMD – a helper function that returns information about the SIMD
capabilities of the CPU (available only for the x86 editions and not declared
in the supplied header file).
C prototype: int __cdecl
GetSIMD(void);
Return value: Nonzero if the CPU has floating point SIMD capabilities
or 0 otherwise. Bit 0 is set if the CPU supports the 3DNow! instruction set.
Bit 1 is set if the CPU supports the SSE instruction set. Bit 2 is set if the
CPU family is higher than 6 (e.g. Pentium 4, Pentium D, AMD K8+, etc.). This
bit is useful to distinguish between Palomino and later AMD K7 CPUs where
3DNow! is faster than SSE and K8+ where it isn’t.
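The bit layout of the GetSIMD return value described above can be decoded with a few masks. This is a minimal sketch; the macro and function names are hypothetical and not provided by the library.

```c
/* Bit meanings of the GetSIMD return value, as described above.
   All names here are hypothetical. */
#define MX_SIMD_3DNOW    0x1  /* bit 0: 3DNow! supported */
#define MX_SIMD_SSE      0x2  /* bit 1: SSE supported */
#define MX_SIMD_FAMILY_7 0x4  /* bit 2: CPU family higher than 6 */

static int mx_has_3dnow(int simd)      { return (simd & MX_SIMD_3DNOW) != 0; }
static int mx_has_sse(int simd)        { return (simd & MX_SIMD_SSE) != 0; }
static int mx_family_above_6(int simd) { return (simd & MX_SIMD_FAMILY_7) != 0; }

/* Returns a short human-readable summary of the SIMD flags. */
static const char *mx_describe_simd(int simd)
{
    if (mx_has_3dnow(simd) && mx_has_sse(simd)) return "3DNow! and SSE";
    if (mx_has_sse(simd))                       return "SSE";
    if (mx_has_3dnow(simd))                     return "3DNow!";
    return "x87 FPU only";
}
```

In an x86 build, the argument would come from GetSIMD() (after declaring its prototype yourself, since the header omits it) or from the MXinit return value. These flags are not the same numbering as the MathUnit field values.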
The combined libraries have another 3 functions with the same names but with
a “Pro” suffix added – MXinitPro, MXstartPro, and MXprocessPro. (To avoid
confusion, they’re not declared in the header file; to use them, you have to
add them.) In these libraries, standard function names are for the “Fast”
variant, and the “Pro” functions are for the “Pro” variant. All non-combined
libraries, either “Fast” or “Pro”, use the same standard function names,
because they are separated in different library files and not combined.
Note that because the “Fast” and “Pro” variants have different input/output
buffer requirements checked and parameters set by the MXstart[Pro] function,
these two variants in the combined libraries must be used with two independent
MXdata structures. The buffers could be shared between them though, as long as
their length satisfies the requirements of both variants.
9.2. Steps to follow to time-scale an audio data stream
- Make sure you link your application programme against one of the
.lib (.a) libraries or load a .DLL (if you build one) at run-time.
- Choose the output buffer size in kilobytes (OutBufSizeK). It can
be 8, 16 or 32 KB for all the libraries, 4 KB for all “Pro” and integer
variants, and 2 KB for the integer “Pro” variant only. Larger values make
real-time playback in some operating systems smoother, but smaller values
reduce the latency (i.e. the input-to-output delay). Each input buffer half
is 4 times the size of an output buffer.
Allocate the memory:
2 * 4 * 1024 * OutBufSizeK + 32 bytes for the two input buffer
halves.
n * 1024 * OutBufSizeK + 32 bytes for n output buffers (n >= 1).
It’s easier to use only one output buffer when the output data is not to be
played back in real-time. When you’re playing the output data in real-time,
you have to use two or more output buffers.
INT_BUF_SIZE bytes needed for internal data processing.
- Prepare the MXdata structure and fill the following fields:
Version – you must put the value of the METAXPON_VERSION constant
here.
OutBufSizeK – put the above OutBufSizeK value that you’ve used to
allocate input and output buffers here.
InBuf – put the address of the input buffer here.
OutBuf – put the address of the first output buffer here. The second,
if present, will follow immediately after the first one at address OutBuf +
1024 * OutBufSizeK. The third, if present – immediately after the second at
address OutBuf + 2 * 1024 * OutBufSizeK, and so on.
Memory – put the address of the memory needed for internal processing
here.
- Call MXinit with a pointer to the MXdata structure.
Examine the returned value – if it’s a negative number, MXinit has failed. This
may happen if you didn’t initialise the Version field of the MXdata
structure or there is a library version mismatch. MXinit modifies many fields
of the MXdata structure, including InBuf, OutBuf, and
Memory. It aligns all addresses to 16-byte boundaries (32-byte
boundaries for the portable edition). You should never use the addresses
obtained when you allocated input and output buffers, except when you’re
freeing the allocated memory. Use the addresses at fields InBuf and
OutBuf instead!
- (Optional, x86 only) You can override the value written in the
MathUnit field of MXdata structure at any time after calling MXinit and
force subsequent calls to MXprocess to use a different math unit (x87 FPU,
3DNow! or SSE). But be careful! Putting a wrong number here (for example,
forcing a Pentium to use 3DNow!, which it doesn’t support) will cause an
attempt to execute an invalid instruction, which raises an exception or
causes a deadlock. Three possible values are defined: 0 – use x87 FPU, 1
– use 3DNow!, and 2 – use SSE. In the current x86 version, all other
values make MXprocess fall back to the x87 FPU. Note that this
value isn’t the same as the one returned by MXinit and GetSIMD. Normally you
don’t need to care about this field, because MXinit automatically initialises
it with the optimal value.
- (Optional) You can override the value written in the
Threshold field of the MXdata structure at any time after calling
MXinit. MXinit initialises this field to 2.0 (0x20000 – fixed point format
16.16). This single parameter controls the sensitivity of the attack
detection algorithm. Lowering this value will result in finding more
attacks (transients), but lowering it too much will result in false
transient detections and may distort the sound. Using higher values may
eliminate false transient detections, but the sound may lose its sharp attacks.
This value must be greater than 1.0 (0x10000). Values closer to or less than
1.0 will cause the detection of many false attacks and will distort the sound!
Values much higher than 2.0 (for example 4.0 or higher) may cause no
transients to be found at all!
- Fill the whole input buffer with audio data. Clear the
E0F field. If the data is less than the size of a half or a whole input
buffer, set bit 0 of E0F to indicate that the end of input data is in
the first input buffer half, or set bit 1 of E0F to indicate that the
end of input data is in the second input buffer half. Also, set the value of
Last to the number of bytes (number_of_samples * 4 or 8) written to the
input buffer half in which the input data stream ends. In other words, Last
is the offset, from the beginning of that input buffer half, of the first byte
after the end of the input data. You should clear all the other bits of
E0F, and you should not modify them after you call MXprocess until the
end of processing of the current data stream, because some of them are used as
internal flags.
- Put the desired tempo of the output data stream into the
Tempo field. A value of 1.0 (0x10000 – fixed point format 16.16)
indicates 100% tempo or no tempo change (or a stretching factor of 1.0). A
value of 0.5 (0x08000) indicates 50% tempo (or a stretching factor of 2.0). A
value of 2.0 (0x20000) indicates 200% tempo (or a stretching factor of 0.5).
Example: If you want to stretch the data stream to 125%, you have to
calculate the desired tempo (Tempo = 1 / Stretching_factor): 1 / 125% = 80%.
Then convert it to fixed point 16.16 format: 0.8 * 0x10000 = 0x0CCCD. Write
this value to the Tempo field.
You may change the tempo dynamically before any call to MXprocess.
- Call MXstart with a pointer to the MXdata structure.
Examine its return value. If it’s nonzero, your chosen OutBufSizeK value was
incorrect for this variant of the library, so you need to free the allocated
memory, change OutBufSizeK, and execute all the previous steps again, until
MXstart returns a zero value, indicating that OutBufSizeK is correct.
MXstart will clear the value of the Half field, so you don’t need to
clear it.
- Clear the Empty field. This field is a flag that will be
set by MXprocess when all of the data of an input buffer half has been
processed, and the half needs refilling. You should clear Empty
(MXprocess only sets this flag but doesn’t clear it, so it must be cleared by
you!) and refill the input buffer half with data before you next call
MXprocess.
- (Optional) Each of the steps 5, 6, and 8 could be moved here,
if you want to change some of the parameters dynamically before each call
to MXprocess. This is useful for real-time implementations.
- Put the number of the desired output buffer to be filled by
MXprocess into the OutBufNum field. MXprocess won’t modify this field, so
you have to fill it only once (usually with a value of 0) if you’re going to
use a single output buffer.
- Call MXprocess with a pointer to the MXdata structure.
MXprocess will fill the corresponding output buffer with processed data.
Examine the returned value. A value of −1 indicates that the output buffer is
full and the end of data stream is not reached yet. A positive value or a zero
indicates the number of samples (number_of_bytes / 4 or 8) written to the
output buffer. This also means that the end of data stream is reached, and
therefore you should not call MXprocess any more before you reinitialise some
of the fields of the MXdata structure by putting in their appropriate values,
filling the input buffer, and calling MXstart to begin the processing of a new
data stream.
MXprocess will never return a value other than −1 unless you set bit 0 or
bit 1 of E0F to indicate the end of the input stream (you should also
set Last to the correct value).
After MXprocess returns, you will have an output buffer containing processed
data. You can use this data as you like.
Important note: MXprocess must be permitted to modify the data in
the input buffer, and sometimes it will do so. So, don’t modify the input data
unless the Empty flag is set. Only then can you, and must you, refill the
corresponding input buffer half.
- Examine the value of the Empty field. MXprocess will set
this flag when a half of the input buffer has been processed and needs
refilling. Half contains the number (0 or 1) of the input
buffer half being processed. Only in that case (Empty flag set) you
should refill the other half (1−Half) and reset the Empty flag.
If you have set bit 0 or bit 1 of E0F, MXprocess will never set the
Empty flag.
- Repeat steps 10–14 until the end of the output data stream is
reached (indicated by a non-negative value returned by MXprocess).
- If you want to process another data stream, no matter whether
the processing of the previous one has finished or not, repeat steps 5–15 to
use the same MXdata structure and interrupt the processing of the current data
stream if it hasn’t finished yet, or use another MXdata structure and follow
steps 2–15 to begin processing another data stream while processing of the
current one is still in progress. You could have as many MXdata structures as
needed and process the same number of data streams as the number of MXdata
structures simultaneously.
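The arithmetic in the steps above (allocation sizes, output buffer offsets, 16.16 fixed point values for Tempo and Threshold, and the end-of-input bookkeeping for E0F and Last) can be sketched with a few helpers. All function names are hypothetical. Only the formulas come from the steps; the C types of the MXdata fields are defined by the supplied header and are not assumed here, so the results would simply be copied into the corresponding fields. The extra 32 bytes in each allocation presumably leave room for the address alignment performed by MXinit.

```c
#include <stddef.h>

/* Bytes to allocate for the two input buffer halves. */
static size_t mx_input_alloc_bytes(unsigned out_buf_size_k)
{
    return (size_t)2 * 4 * 1024 * out_buf_size_k + 32;
}

/* Bytes to allocate for n output buffers (n >= 1). */
static size_t mx_output_alloc_bytes(unsigned n, unsigned out_buf_size_k)
{
    return (size_t)n * 1024 * out_buf_size_k + 32;
}

/* Byte offset of output buffer number 'index' from the OutBuf address. */
static size_t mx_output_offset(unsigned index, unsigned out_buf_size_k)
{
    return (size_t)index * 1024 * out_buf_size_k;
}

/* Converts a positive value (e.g. a tempo of 0.8 or a threshold of 2.0)
   to 16.16 fixed point, rounding to the nearest integer. */
static unsigned long mx_to_fixed16(double value)
{
    return (unsigned long)(value * 65536.0 + 0.5);
}

/* Tempo is the reciprocal of the stretching factor. */
static unsigned long mx_tempo_from_stretch(double stretch_factor)
{
    return mx_to_fixed16(1.0 / stretch_factor);
}

/* End-of-input bookkeeping after filling the whole input buffer from its
   start. 'bytes_filled' is the number of bytes actually written.
   On return, *e0f holds the bit to set (0 if both halves are full and no
   end is marked yet) and *last the byte count within the final half. */
static void mx_end_of_input(size_t bytes_filled, unsigned out_buf_size_k,
                            unsigned *e0f, size_t *last)
{
    size_t half = (size_t)4 * 1024 * out_buf_size_k;

    if (bytes_filled < half) {
        *e0f = 1u;                    /* bit 0: ends in the first half */
        *last = bytes_filled;
    } else if (bytes_filled < 2 * half) {
        *e0f = 2u;                    /* bit 1: ends in the second half */
        *last = bytes_filled - half;
    } else {
        *e0f = 0u;                    /* both halves full */
        *last = 0;
    }
}
```

For example, stretching to 125% gives mx_tempo_from_stretch(1.25), which matches the 0x0CCCD value worked out in the tempo step.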
9.3. Multithreading and multitasking
MXinit, MXstart, and MXprocess can be used in
multithreaded applications to process several data streams simultaneously by
passing them pointers to different MXdata structures. Each MXdata structure
contains complete information about its corresponding data streams, for which
it’s initialised.
For the x86 code, the used CPU registers (GP and FPU registers for the FPU
code; GP and FPU/MMX registers for the 3DNow! code; or GP, FPU/MMX, and XMM
registers for the SSE code) must be saved and restored by the operating system
when it switches tasks to support multitasking or multithreading when Μεταχρον
is used. Older operating systems don’t save and restore the XMM registers, for
example:
- Windows 95 prior to OEM Service Release 2
- Windows NT 4 prior to Service Pack (SP) 4
- Windows NT 4 Service Pack 4 without an SSE driver
- Linux prior to kernel version 2.2.10.
If the SSE instructions are supported and enabled by the operating system,
bit 9 of CR4 of the CPU will be set. But it can be read only in Ring 0 by the
kernel and not in user space. Therefore, such detection cannot be included in
Μεταχρον. As a rule, operating systems released after 1999 do support SSE.
10. Download Μεταχρον!
Note: All x86 executables are 32-bit and packed with UPX. Unfortunately, some virus
scanners give false positives for UPX-packed Windows executables. This is a
problem of the virus scanners. Even more of them give false positives for
executables packed with some of the better compressing packers like FSG, MEW
11, and UPACK. Please don’t worry about this! The executables are not
really infected. To convince yourself of this, unpack the executables using the
“−d” switch of UPX. Voilà! No false positives now.
- Source code for the Μεταχρον library:
- As a .RAR archive (32 KB) – use UnRAR to extract.
- As a .ZIP archive (63 KB) – use UnZip to extract.
- As a .tar.xz archive (31 KB) – use Tar to extract.
- As a .tar.bz2 archive (35 KB) – use Tar to extract.
- As a .tar.gz archive (45 KB) – use Tar to extract.
- As a .tar.Z archive (86 KB) – use Tar to extract.
- As a .tar.z archive (168 KB) – use Tar to extract.
- As a .tar.C archive (168 KB) – use Tar to extract.
- Precompiled command-line .WAV-file stretching
utility (for 16-bit integer stereo audio data):
- Windows (version 4.0 or later):
- Mac OS X universal binaries built from the portable edition of Μεταχρον
– for PowerPC (Mac OS X 10.3+) with Apple vDSP, for x86 (Mac OS X 10.6+) with
Intel IPP (see above):
- Linux (kernel 2.6.24 or later):
- FreeBSD 9.x binaries built from the x86 editions of Μεταχρον with
auto-selection of FPU, 3DNow! or SSE code based on the CPU capabilities:
- Solaris binaries built from the portable edition of Μεταχρον with Sun
mediaLib (see above) for UltraSPARC:
- Demonstration GUI player for Windows (version
4.0+ (and later than those listed in 9.3 for SSE) or WINE for Mac
OS X or other UNIX-like OS
(note that CPU load is doubled in 64-bit Ubuntu!)) with on-the-fly “Fast” /
“Pro” variant toggle and a CPU usage meter; plays only 16-bit uncompressed
stereo .WAV-files:
- Source code for MASM32 (Assembly)
or Intel IPP (see above, C). Extract to the same directory as the Μεταχρον
library source code. For the Assembly code, first build the library by running
“MAKELIBS” from the “METAXPON\x86” subdirectory, and then run “MAKE” from its
“PLAYER” subdirectory to build the player. For the C code, run “MAKE_IPP”.
- Precompiled executables. A .WAV-file can be loaded via a dialog box or
from the command line. Usable control keys: Enter, Space, Tab, Home, End,
PgUp, PgDn, arrows. To display the entire title on the title bar properly,
set the latter’s font to a Unicode one, such as Lucida Sans Unicode.
(Note: If the path to the .WAV-file or its name contains non-ASCII
characters, it may fail to open under Windows NT 6.0 (Vista / 2008) or
later!)
- Built from the Assembly code (10 KB); with
auto-selection of FPU, 3DNow! or SSE code based on the CPU capabilities.
- Built from the C code (126 KB) with Intel IPP
(see above) for top speed – significantly faster than (1) but requires an
SSE2-enabled CPU. The screenshot on the right shows it scaling in real-time and
playing a file with a sampling rate of 44.1 kHz on an AMD Sempron (Palermo)
3200+ (1.8 GHz). The table below shows the CPU load for such a file in “fast”
mode for some Intel processors. In the (*) case, “/usepmtimer” had to be added
in the “boot.ini” file to make the CPU load indicator work.
Microarchitecture | Codename | Model | GHz | Load [%] |
NetBurst | Willamette | Pentium 4 1.7 | 1.7 | 3.2 |
NetBurst | Northwood-128 | Celeron 2.0 | 2.0 | 4.0 |
NetBurst | Northwood | Pentium 4 2.66 | 2.66 | 1.9 |
NetBurst | Prescott-2M | Pentium 4 HT 630 | 3.0 | 1.9 * |
Enhanced Pentium M | Yonah | Core Duo T2300 | 1.67 | 2.3 |
Intel Core | Allendale | Pentium Dual Core E2160 | 1.8 | 1.5 |
Intel Core | Penryn | Pentium Dual Core T4300 | 2.1 | 1.3 |
Intel Core | Merom | Core 2 Duo T7600 | 2.33 | 1.2 |
Sandy Bridge | Sandy Bridge | Pentium G630 | 2.7 | 1.3 |
If you happen to lack a 16-bit uncompressed stereo .WAV-file for testing,
you can use this 2.4 MB (14 s) excerpt from the song “Sledgehammer” from the album
“So” by Peter Gabriel (1986).