Previous section (1)Next section (3)


Q2: Algorithms and standards


Q2.1: Where can I get public domain algorithms for general-purpose DSP?

Updated 12/31/96
 
 
The following archives contain things such as matrix operations, FFT's and generally useful things like that, as opposed to complete applications.

Netlib

Netlib serves some of this software via email. Try mail to netlib@ORNL.GOV with "send help" in the subject field.
To Obtain:
 For Europe:


Internet: netlib@nac.no
EARN/BITNET: netlib%nac.no@norunix.bitnet
X.400: s=netlib; o=nac; c=no;
EUNET/uucp: nac!netlib

For more information:
See Jack J. Dongarra and Eric Grosse, "Distribution of Mathematical Software Via Electronic Mail," Comm. ACM (1987) 30,403--407.


A similar collection of statistical software is available from statlib@temper.stat.cmu.edu.

The symbolic algebra system REDUCE is supported by reduce-netlib@rand.org.

NSWC Library

The Naval Surface Warfare Center has a library of mathematical Fortran subroutines that may be of use. The NSWC library is a library of general-purpose Fortran subroutines that provide a basic computational capability in a variety of mathematical activities. Emphasis has been placed on the transportability of the codes. Subroutines are available in the following areas: Elementary Operations, Geometry, Special Functions, Polynomials, Vectors, Matrices, Large Dense Systems of Linear Equations, Banded Matrices, Sparse Matrices, Eigenvalues and Eigenvectors, l1 Solution of Linear Equations, Least-Squares Solution of Linear Equations, Optimization, Transforms, Approximation of Functions, Curve Fitting, Surface Fitting, Manifold Fitting, Numerical Integration, Integral Equations, Ordinary Differential Equations, Partial Differential Equations
For more information:
 NSWC Library of Mathematical Subroutines

Report No.: NSWC TR 90-21, January 1990
by Alfred H. Morris, Jr.

Naval Surface Warfare Center (E43)
Dahlgren, VA 22448-5000
U.S.A.

[Witold Waldman, Witold.Waldman@dsto.defence.gov.au]

IEEE Press book "Programs For Digital Signal Processing"

You can get the Fortran source code from the IEEE Press book "Programs For Digital Signal Processing." See question 1.3.6.


 

Q2.2: What are CELP and LPC? Where can I get the source for CELP and LPC?

Updated 9/24/98
 
 
CELP stands for "code excited linear prediction". LPC stands for "linear predictive coding". They are compression algorithms used for low bit rate (2400 and 4800 bps) speech coding.


The U.S. DoD's Federal-Standard-1016 based 4800 bps code excited linear prediction voice coder version 3.2 (CELP 3.2) Fortran and C simulation source codes are available for worldwide distribution (on DOS diskettes, but configured to compile on Sun SPARC stations) from NTIS and DTIC. Example input and processed speech files are included. A Technical Information Bulletin (TIB), "Details to Assist in Implementation of Federal Standard 1016 CELP," and the official standard, "Federal Standard 1016, Telecommunications: Analog to Digital Conversion of Radio Voice by 4,800 bit/second Code Excited Linear Prediction (CELP)," are also available.

To obtain CELP:
 Available through the National Technical Information Service:
NTIS

U.S. Department of Commerce
5285 Port Royal Road
Springfield, VA 22161
USA
(800) 553-6847
FS-1016 CELP 3.2 may also be obtained from file://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/celp_3.2a.tar.Z or file://ftp.super.org/pub/speech/celp_3.2a.tar.Z.

LPC is available from ftp://ftp.super.org/pub/speech/lpc10-1.0.tar.gz or file://svr-ftp.eng.cam.ac.uk/pub/speech/coding/lpc10-1.0.tar.gz.

MATLAB software for LPC-10 is available from http://www.eas.asu.edu/~spanias/srtcrs.html. Also, postscript copies of tutorials of speech coding can be found at http://www.eas.asu.edu/~spanias/papers.html. [Andreas Spanias, spanias@asu.edu]

For more information:
[Most of the above from Joe Campbell, jpcampb@afterlife.ncsc.mil, with additions from Dan Frankowski, drankow@winternet.com, and Ed Hall, edhall@rand.org]


Q2.3: What is ADPCM? Where can I get source for it?

Updated: 1/7/97
 
 
ADPCM stands for Adaptive Differential Pulse Code Modulation. It is a family of speech compression and decompression algorithms. A common implementation takes 16-bit linear PCM samples samples and converts them to 4-bit samples, yielding a compression rate of 4:1.
To obtain:
 There is public domain C code available via anonymous ftp at file://ftp.cwi.nl/pub/audio/adpcm.shar written by Jack Jansen (email Jack.Jansen@cwi.nl). It is very programmer-friendly. The ADPCM code used is the Intel/DVI ADPCM code which is being recommended by the IMA Digital Audio Technical Working Group. It allows the following calls:
adpcm_coder(short inbuf[], char outbuf[], int nsample,
        struct adpcm_state *state);
adpcm_decoder(char inbuf[], short outbuf[], int nsample,
        struct adpcm_state *state);
Note that this is NOT a G.722 coder. The ADPCM standard is much more complicated, probably resulting in better quality sound but also in much more computational overhead.
Platforms:
 The routines have been tested on numerous platforms, and will easily compress and decompress millions of samples per second on current hardware.
For more information:
 The G.721/722/723 packages are available from ITU at http://www.itu.ch/.


This is also available as: file://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/G711_G722_G723.tar.gz

[From Dan Frankowski, dfrankow@winternet.com; Jack Jansen, Jack.Jansen@cwi.nl]


Q2.4: What is GSM? Where can I get source for it?

Updated 1/7/96
 
 
GSM is a standard for digital cellular telephony used in Europe. In this context, GSM also refers to the speech coder used in GSM telephones.


The Communications and Operating Systems Research Group (KBS) at the Technische Universitaet Berlin is currently working on a set of UNIX-based tools for computer-mediated telecooperation that will be made freely available.

As part of this effort we are publishing an implementation of the European GSM 06.10 provisional standard for full-rate speech transcoding, prI-ETS 300 036, which uses RPE/LTP (residual pulse excitation/long term prediction) coding at 13 kbit/s.

GSM 06.10 compresses frames of 160 13-bit samples (8 kHz sampling rate, i.e. a frame rate of 50 Hz) into 260 bits; for compatibility with typical UNIX applications, our implementation turns frames of 160 16-bit linear samples into 33-byte frames (1650 Bytes/s). The quality of the algorithm is good enough for reliable speaker recognition; even music often survives transcoding in recognizable form (given the bandwidth limitations of 8 kHz sampling rate).

The interfaces offered are a front end modeled after compress(1), and a library API. Compression and decompression run faster than realtime on most SPARCstations. The implementation has been verified against the ETSI standard test patterns.

Jutta Degener jutta@cs.tu-berlin.de, Carsten Bormann cabo@cs.tu-berlin.de)

Communications and Operating Systems Research Group, TU Berlin
Fax: +49.30.31425156, Phone: +49.30.31424315

To obtain:
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/gsm-1.0.6.tar.gz. An alternative site is ftp://ftp.cwi.nl/pub/audio/gsm-1.0.7.tar.gz. Try also: http://kbs.cs.tu-berlin.de/~jutta/toast.html.
[From Dan Frankowski, dfrankow@winternet.com; Jutta Degener, jutta@cs.tu-berlin.de]


Q2.5: How does pitch perception work, and how do I implement it on my DSP chip?

Updated 6/3/98
 
 
Pitch is officially defined as "That attribute of auditory sensation in terms of which sounds may be ordered on a musical scale." Several good examples illustrating the subtleties of pitch perception are included in the "Auditory Demonstrations CD" which is available from the Acoustical Society of America, Woodbury, NY 10797 for $20.
Books/papers:
 A good general reference about the psychology of pitch perception is the book:
B.C.J. Moore, An Introduction to the Psychology of Hearing, Academic Press, London, 1997.
This book is available in paperback and makes a good desk reference.

An algorithm implementation that matches a large body of psychoacoustical work, but which is computationally very intensive, is presented in the paper:

Malcolm Slaney and Richard Lyon, "A Perceptual Pitch Detector," Proceedings of the International Conference of Acoustics, Speech, and Signal Processing, 1990, Albuquerque, New Mexico. Available for ftp at ftp://worldserver.com/pub/malcolm/ICASSP90.psc.Z
The definitive papers describing the use of such a perceptual pitch detector as applied to the classical pitch literature is in:
Ray Meddis and M. J. Hewitt. "Virtual pitch and phase sensitivity of a computer model of the auditory periphery. "Journal of the Acoustical Society of America 89 (6 1991): 2866-2682. and 2883-2894.
The current work that argues for a pure spectral method starts with the work of Goldstein:
J. Goldstein, "An optimum processor theory for the central formation of the pitch of complex tones," Journal of the Acoustical Society of America 54, 1496-1516, 1973.
Two approaches are worth considering if something approximating pitch is appropriate. The people at IRCAM have proposed a harmonic analysis approach that can be implemented on a DSP:
Boris Doval and Xavier Rodet, "Estimation of Fundamental Frequency of Musical Sound Signals," Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing, Toronto, Volume 5, pp. 3657-3660.
The classic paper for time domain (peak picking) pitch algorithms is:
B. Gold and L. Rabiner, "Parallel processing techniques for estimating pitch periods of speech in the time domain," Journal of the Acoustical Society of America, 46, pp 441-448, 1969.
Finally, a word of caution:
Pitch is not single-valued. We can hear a sound and match it to several different pitches. Imagine the number of instruments in an orchestra, each with its own pitch. Even a single sound can have more than one pitch. See for example Demonstration 27 from the ASA Auditory Demonstrations CD.


[The above from Malcolm Slaney, Interval Research, and John Lazzaro, U.C. Berkeley.]

Another interesting piece of information on pitch perception can be found at http://www.prosoniq.com/time_pitch_faq.html. [Stephan M. Sprenger, sms@prosoniq.com]

Q2.6: What standards exist for digital audio? What is AES/EBU? What is S/PDIF?

Updates 1/8/97
 
 

Q2.6.1: Where can I get copies of ITU (formerly CCITT) standards?

Try the ITU (International Telecommunication Union) homepage at http://www.itu.ch/.

 

 


Q2.6.2: What standards are there for digital audio?

AES/EBU

The "AES/EBU" (Audio Engineering Society / European Broadcast Union) digital audio standard is probably the most popular digital audio standard today. Most consumer and professional digital audio devices (CD players, DAT decks, etc.) that feature digital audio I/O support AES/EBU.


AES/EBU is a bit-serial communications protocol for transmitting digital audio data through a single transmission line. It provides two channels of audio data (up to 24 bits per sample), a method for communication control and status information ("channel status bits"), and some error detection capabilities. Clocking information (i.e., sample rate) is derived from the AES/EBU bit stream, and is thus controlled by the transmitter. The standard mandates use of 32 kHz, 44.1 kHz, or 48 kHz sample rates, but some interfaces can be made to work at other sample rates.

AES/EBU provides both "professional" and "consumer" modes. The big difference is in the format of the channel status bits mentioned above. The professional mode bits include alphanumeric channel origin and destination data, time of day codes, sample number codes, word length, and other goodies. The consumer mode bits have much less information, but do include information on copy protection (naturally). Additionally, the standard provides for "user data", which is a bit stream containing user-defined (i.e., manufacturer-defined) data. According to Tim Channon, "CD user data is almost raq CD subcode; DAT is StartID and SkipID. In professional mode, there is an SDLC protocol or, if DAT, it may be the same as consumer mode."

The physical connection media are commonly used with AES/EBU: balanced (differential), using two wires and shield in three-wire microphone cable with XLR connectors; unbalanced (single-ended), using audio coax cable with RCA jacks; and optical (via fiber optics).

S/P-DIF

"S/P-DIF" (Sony/Philips Digital Interface Format) typically refers to AES/EBU operated in consumer mode over unbalanced RCA cable. Note that S/P-DIF and AES/EBU mean different things depending on how much of a purist you are in the digital audio world; see the Finger article below.
References:
 Finger, Robert, AES3-199X: The Revised Two Channel Digital Audio Interface (DRAFT), presented at the 91st Convention of the Audio Engineering Society, October 4-8, 1991. Reprints: AES, 60 East 42nd St., New York, NY, 10165.


[The above from Phil Lapsley and Tim Channon, tchannon@black.demon.co.uk]

Painter, E. M., and Spanias, A. S. (1997 and revised 1999). A Review of Algorithms for Perceptual Coding of Digital Audio Signals. (PostScript, 3MB) http://www.eas.asu.edu/~spanias/papers.html

[Andreas Spanias, spanias@asu.edu]


Q2.7: What is mu-law encoding? Where can I get source for it?

Updated 9/13/99
 
 
Mu-law (also "u-law") encoding is a form of logarithmic quantization or companding. It's based on the observation that many signals are statistically more likely to be near a low signal level than a high signal level. Therefore, it makes more sense to have more quantization points near a low level than a high level. In a typical mu-law system, linear samples of 14 to 16 bits are companded to 8 bits. Most telephone quality codecs (including the Sparcstation's audio codec) use mu-law encoded samples.


Desktop Sparc machines come with routines to convert between linear and mu-law samples. On a desktop Sparc, see the man page for audio_ulaw2linear in /usr/demo/SOUND/man.

To obtain:
 Craig Reese posted the source of similar routines to comp.dsp in August '92. These are archived on ftp://mirriwinni.cse.rmit.edu.au/pub/dsp/misc/ulaw_reese.
References:
 ITU-T (formerly CCITT) Recommendation G.711 (very difficult to follow).


Michael Villeret, et. al, A New Digital Technique for Implementation of Any Continuous PCM Companding Law, IEEE Int. Conf. on Communications, 1973, vol. 1, pp. 11.12-11.17.

MIL-STD-188-113, Interoperability and Performance Standards for Analog-to-Digital Conversion Techniques, 17 February 1987.

TI Digital Signal Processing Applications with the TMS320 Family (TI literature number SPRA012A), pp. 169-198.

[From Joe Campbell; Craig Reese, cfreese@super.org; Sepehr Mehrabanzad, sepehr@falstaff.dev.cdx.mot.com; Keith Kendall, KLK3%mimi@magic.itg.ti.com]


Q2.8: How can I do CD <-> DAT sample rate conversion?

Updated 9/13/99
CD players use a 44.1 kHz sample rate, whereas DAT uses a 48 kHz sample rate. This means that you must do sample rate conversion before you can get data from a CD player directly into a DAT deck.

[From Ed Hall, edhall@rand.org:]

For a start, look at Multirate Digital Signal Processing by Crochiere and Rabiner (see FAQ section 1.1).

Almost any technique for producing good digital low-pass filters will be adaptable to sample-rate conversion. 44.1:48 and vice-versa is pretty hairy, though, because the lowest whole-number ratio is 147:160. To do all that in one go would require a FIR with thousands of coefficients, of which only 1/147th or 1/160th are used for each sample--the real problem is memory, not CPU for most DSP chips. You could chain several interpolators and decimators, as suggested by factoring the ratio into 3*7*7:2*2*2*2*2*5. This adds complexity, but reduces the number of coefficients required by a considerable amount.

[From Lou Scheffer:]

Theory of operation: 44.1 and 48 are in the ratio 147/160. To convert from 44.1 to 48, for example, we (conceptually):
 
 

  1. interpolate 159 zeros between every input sample. This raises that data rate to 7.056 MHz. Since it is equivalent to reconstructing with delta functions, it also creates images of frequency f at 44.1-f, 44.1+f, 88.2-f, 88.2+f, ...
  2. We remove these with an FIR digital filter, leaving a signal containing only 0-20 KHz information, but still sampled at a rate of 7.056 MHz.
  3. We discard 146 of every 147 output samples. It does not hurt to do so since we have no content above 24 KHz. In practice, of course, we never compute the values of the samples we will throw out.
So we need to design an FIR filter that is flat to 20 KHz, and down at least X db at 24 KHz. How big does X need to be? You might think about 100 db, since the max signal size is roughly +-32767, and the input quantization +- 1/2, so we know the input had a signal to broadband noise ratio of 98 db at most. However, the noise in the stopband (20KHz-3.5MHz) is all folded into the passband by the decimation in step 3, so we need another 22 db (that's 160 in db) to account for the noise folding. Thus 120 db rejection yields a broadband noise equal to the original quantizing noise. If you are a fanatic, you can shoot for 130 db to make the original quantizing errors dominate, and a 22.05 KHz cutoff to eliminate even ultrasonic aliasing. You will pay for your fanaticism with a penance of more taps, however.
To obtain:
 There's a free implementation of Julius O. Smith III and someone else's "bandwidth-limited interpolation" rate conversion algorithm.


A paper available as file://ccrma-ftp.stanford.edu/pub/DSP/Tutorials/BandlimitedInterpolation.eps.Z explains the algorithm. Free source code, as well as an HTML discussion of the algorithm, is available at http://ccrma-www.stanford.edu/~jos/resample/. It all works quite well.

[From Kevin Bradley, kb+@andrew.cmu.edu:]

There is an implementation of polyphase resampling for various rates as a part of the Sox audio toolkit at http://home.sprynet.com/~cbagwell/sox.html. See file polyphas.c for details.

Sox also contains an implementation of bandlimited interpolation and linear interpolation, and serves as a ready vehicle for module experimentation.

[From Fritz M. Rothacher, f.rothacher@ieee.org:]

You can add my Ph.D. thesis on sample-rate conversion to the FAQ:

Fritz M. Rothacher, Sample-Rate Conversion: Algorithms and VLSI Implementation, Ph.D. thesis, Integrated Systems Lab, Swiss Federal Institute of Technology, ETH Zuerich, 1995, ISBN 3-89191-873-9

It can also be downloaded from my homepage at http://www.guest.iis.ee.ethz.ch/~rota.


Q2.9: Wavelets

Updated 6/3/98

Q2.9.1 What are wavelets? Where can I get more information?

In short, wavelets are a way to analyze a signal using base functions which are localized both in time (as diracs, but unlike sine waves), and in frequency (as sine waves, but unlike diracs). They can be used for efficient numerical algorithms and many DSP or compression applications.


The mathematical theory behind wavelets (and other related transforms) is given in the appendix of the XWPL reference manual. The XWPL manual can be found at http://venus.javeriana.edu.co/WAVELETS/.

Other sources of information on wavelets are:
 
 


Q2.9.2 What are some good books and papers on wavelets

The best introduction to wavelet transforms is in:
Wavelets and Signal Processing- Oliver Rioul and Martin Vetterli, IEEE Signal Processing magazine, Oct. 91, pp 14-38
A good introductory book on wavelets:
Randy K. Young, Wavelet Theory and Its Applications, Kluwer Academic Publishers, ISBN 0-7923-9271-X, 1993.
A more thorough book:
Ali N. Akansu and Richard A. Haddad, Multiresolution Signal Decomposition Transforms, Subbands, Wavelets Academic Press, Inc., ISBN 0-12-047140-X
A couple more interesting papers:
Wavelets and Filter banks: Theory and Design, IEEE Transactions on Signal Processing, Vol. 40, No.9, Sept. 1992, pp 2207-2232

Mac Cody's articles in Dr. Dobb's Journal, April 1992 and April 1993

Paper by Ingrid Daubechies in IEEE Trans. on Info. theory , vol 36. No.5 , Sept 1990 and a book titled " Ten lectures on Wavelets" deal with the mathematical aspects of the WT.


Q2.9.3: Where can I get some software for wavelets?

ftp://pascal.math.yale.edu/pub/wavelets/software/xwpl

Binaries are available for the following platforms: Sun Sparcstations running SunOS 4.1 or Solaris 2.3, NeXT machines running NeXTstep 3.0 or higher, with an X server, Silicon Graphics machines (IRIS), DEC Alpha AXP running OSF/1 1.2 or higher, i386/i486 PC compatible with Linux 0.99.

There is also a sample data directory containing interesting signals.

More information:
http://www.math.yale.edu/users/majid/


[From Fazal Majid majid@math.yale.edu]:

Rice Wavelet Tools

Description:
Announcing Release 2.0 of rice-wlet-tools. This is a collection of MATLAB of "mfiles" and "mex" files for twoband and M-band filter bank/wavelet analysis from the DSP group and Computational Mathematics Laboratory (CML) at Rice University, Houston, TX. This release includes application code for Synthetic Aperture Radar despeckling and for deblocking of JPEG decompressed Images.


The programs have been tested on Sparcstations running SunOS 4.1.n with MATLAB 4.1. However, the "mex" code is generic and should run on other platforms (you may have to tinker the Makefiles a little bit to make this work). There are several utility routines all of them callable from MATLAB. All the C files (leading to the mex files) can also be directly accessed from other C or Fortran code.

A collection of of papers and tech. reports from the DSP group is also available. You could obtain this distribution of software and papers by anonymous ftp from cml.rice.edu.

To obtain:
 Anonymous FTP: cml.rice.edu (128.42.62.23) in directories /pub/reports and /pub/software.


Report problems/bugs and installation info on non-SUN/non-unix platforms send mail to wlet-tools@rice.edu (or ramesh@dsp.rice.edu)


Q2.10: How do I calculate the coefficients for a Hilbert transformer?

Updated 6/3/98
For all the gorey details, I suggest the paper: Andrew Reilly and Gordon Frazer and Boualem Boashash: Analytic signal generation---tips and traps, IEEE Transactions on Signal Processing, no. 11, vol. 42, Nov. 1994, pp. 3241-3245.

For comp.dsp, the gist is:

  1. Design a half-bandwidth real low-pass FIR filter using whatever optimal method you choose, with the principle design criterion being minimisation of the maximum attenuation in the band f_s/4 to f_s/2.

  2.  

     

  3. Modulate this by exp(2 pi f_s/4 t), so that now your stop-band is the negative frequencies, the pass-band is the positive frequencies, and the roll-off at each end does not extend into the negative frequency band.

  4.  

     

  5. either use it as a complex FIR filter, or a pair of I/Q real filters in whatever FIR implementation you have available.
If your original filter design produced an impulse response with an even number of taps, then the filtering in 3 will introduce a spurious half-sample delay (resampling the real signal component), but that does not matter for many applications, and such filters have other features to recommend them.

Andrew Reilly [Reilly@zeta.org.au]

Previous section (1)Next section (3)