A brief introduction to digital signal processors

DSP processors use several arithmetic formats. Most adopt fixed-point arithmetic, in which numbers are represented as integers or as fractions between -1.0 and +1.0. Some processors use floating-point arithmetic, in which data are represented as a mantissa and an exponent: mantissa × 2^exponent.
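As an illustration of the fixed-point representation described above (a minimal sketch, not from the original article), the widely used Q15 format stores a fraction in [-1.0, +1.0) as a signed 16-bit integer scaled by 2^15:

```c
/* Minimal Q15 fixed-point sketch: a fraction in [-1.0, +1.0) is stored
 * as a signed 16-bit integer scaled by 2^15. */
#include <stdint.h>
#include <stdio.h>

static int16_t to_q15(double x)   { return (int16_t)(x * 32768.0); }
static double  from_q15(int16_t q){ return (double)q / 32768.0;    }

int main(void)
{
    int16_t a = to_q15(0.6);
    int16_t b = to_q15(-0.25);
    /* Q15 x Q15 gives a 32-bit product with 30 fractional bits,
     * so shift right by 15 to return to Q15. */
    int16_t prod = (int16_t)(((int32_t)a * b) >> 15);
    printf("0.6 * -0.25 ~= %f\n", from_q15(prod));  /* about -0.15 */
    return 0;
}
```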

Floating-point arithmetic is more complex, but floating-point data provide a large dynamic range (dynamic range can be expressed as the ratio of the largest to the smallest representable value). With a floating-point DSP, design engineers largely do not have to worry about dynamic range and precision. Floating-point DSPs are easier to program than fixed-point DSPs, but they cost more and consume more power.

Because of cost and power consumption, fixed-point DSPs are generally used in high-volume products. Programmers and algorithm designers determine the required dynamic range and precision through analysis or simulation. If ease of development, wide dynamic range, and high precision are the priorities, a floating-point DSP can be considered.

Floating-point arithmetic can also be implemented in software on a fixed-point DSP, but such routines consume a great deal of processor time and are therefore rarely used. A more efficient approach is "block floating point", in which a group of values sharing one exponent but having different mantissas is treated as a block of data. Block floating point is usually implemented in software.

All floating-point DSPs have a 32-bit word width, while fixed-point DSPs generally have a 16-bit word width; there are also 24-bit and 20-bit DSPs, such as Motorola's DSP563XX series and Zoran's ZR3800X series. Because the word width strongly influences the package size, the pin count, and the amount of memory required, it directly affects device cost: the wider the word, the larger the package, the more pins, the greater the memory demand, and the higher the cost. Provided the design requirements are met, choose the DSP with the smallest adequate word width to reduce cost.
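A minimal sketch of the block-floating-point idea (my own illustration, assuming 16-bit mantissas and one shared exponent per block; not code from the article):

```c
/* Block floating point: every value in a block shares one exponent,
 * so only a 16-bit mantissa is stored per sample. */
#include <stdint.h>
#include <stddef.h>
#include <math.h>

typedef struct {
    int16_t *mantissa;  /* caller supplies room for n entries */
    int      exponent;  /* shared exponent for the whole block */
    size_t   n;
} bfp_block;

/* Encode an array of doubles as one block-floating-point block. */
static void bfp_encode(const double *x, size_t n, bfp_block *blk)
{
    double max_abs = 0.0;
    for (size_t i = 0; i < n; i++)
        if (fabs(x[i]) > max_abs)
            max_abs = fabs(x[i]);

    int e = 0;
    if (max_abs > 0.0)
        (void)frexp(max_abs, &e);   /* max_abs < 2^e, so mantissas fit */

    blk->exponent = e;
    blk->n = n;
    for (size_t i = 0; i < n; i++)
        blk->mantissa[i] = (int16_t)(x[i] * ldexp(1.0, 15 - e)); /* Q15 mantissa */
}

/* Decode one sample back to double. */
static double bfp_decode(const bfp_block *blk, size_t i)
{
    return blk->mantissa[i] * ldexp(1.0, blk->exponent - 15);
}
```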

When choosing between fixed point and floating point, the trade-off between word width and development complexity should be weighed. For example, a 16-bit DSP can implement 32-bit double-precision arithmetic by combining instructions (although double-precision arithmetic is much slower than single-precision). If single precision meets most of the computational requirements and only a small amount of code needs double precision, this approach is workable; but if most of the computation requires high precision, a processor with a wider word width is needed.
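As a hedged illustration of how double-precision arithmetic can be composed from single-precision operations (not the article's code), here is a 32 × 32 → 64-bit unsigned multiply built from 16 × 16 → 32-bit partial products, the kind of instruction sequence a 16-bit DSP would execute:

```c
#include <stdint.h>

/* 32x32 -> 64-bit multiply assembled from four 16x16 partial products. */
static uint64_t mul32_from_16(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)a, a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)b, b_hi = (uint16_t)(b >> 16);

    /* Each partial product fits in 32 bits. */
    uint32_t p_ll = (uint32_t)a_lo * b_lo;
    uint32_t p_lh = (uint32_t)a_lo * b_hi;
    uint32_t p_hl = (uint32_t)a_hi * b_lo;
    uint32_t p_hh = (uint32_t)a_hi * b_hi;

    /* The 64-bit accumulation stands in for the multi-word
     * add-with-carry sequence a real 16-bit DSP would perform. */
    return (uint64_t)p_ll
         + ((uint64_t)p_lh << 16)
         + ((uint64_t)p_hl << 16)
         + ((uint64_t)p_hh << 32);
}
```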

Note that the instruction-word and data-word widths of most DSP devices are the same, but there are exceptions; for example, ADI's ADSP-21xx series has a 16-bit data word and a 24-bit instruction word. Whether a processor meets the design requirements also depends on whether it meets the speed requirement. There are many ways to measure processor speed. The most basic is the instruction cycle, i.e. the time the processor needs to execute its fastest instruction. The reciprocal of the instruction cycle time, multiplied by the number of instructions executed per cycle and divided by one million, gives the processor's peak speed in MIPS (millions of instructions per second).
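A tiny worked example of that formula (the numbers are illustrative assumptions, not figures from the article):

```c
#include <stdio.h>

int main(void)
{
    double cycle_time_s     = 10e-9; /* assumed 10 ns instruction cycle   */
    double instrs_per_cycle = 1.0;   /* assumed one instruction per cycle */
    double mips = (1.0 / cycle_time_s) * instrs_per_cycle / 1e6;
    printf("peak speed = %.0f MIPS\n", mips);  /* prints 100 */
    return 0;
}
```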

However, instruction execution time alone does not indicate a processor's real performance. Different processors accomplish different amounts of work in a single instruction, so simply comparing instruction execution times cannot fairly distinguish performance. Some newer DSPs adopt a VLIW architecture, in which multiple instructions are issued in a single cycle but each instruction does less work than a traditional DSP instruction. Comparing MIPS ratings between VLIW devices and conventional DSP devices is therefore misleading.

Even between traditional DSPs, MIPS comparisons are one-sided. For example, some processors can shift several bits in a single instruction, while others can shift only one bit per instruction; some DSPs can process data unrelated to the executing ALU instruction in parallel (loading operands while the instruction executes), while others support parallel handling only of data related to the executing ALU instruction; and some newer DSPs allow two MACs to be specified in one instruction. MIPS comparisons alone therefore cannot accurately capture processor performance.

One way around this problem is to compare processors using a basic operation rather than an instruction. The MAC operation is commonly used, but MAC time alone does not provide enough information to compare DSP performance: in most DSPs a MAC executes in a single instruction cycle, so the MAC time simply equals the instruction cycle time, and, as noted above, some DSPs do more work per MAC cycle than others. MAC time also does not reflect performance on other operations used in real applications, such as loop operations.

The most common method is to define a set of standard routines and compare their execution speed on different DSPs. Such a routine may be the "core" function of an algorithm, such as an FIR or IIR filter, or all or part of an application (such as a speech coder). Figure 1 shows the performance of several DSP devices measured with the BDTI benchmark tools.
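A minimal FIR-filter kernel of the kind used as a benchmark "core" routine (a generic sketch, not the BDTI benchmark code); the inner loop performs one multiply-accumulate per tap, which is why MAC throughput and memory bandwidth dominate such benchmarks:

```c
/* Compute one FIR output sample: one MAC per filter tap. */
static long fir_sample(const short *coeff, const short *delay, int ntaps)
{
    long acc = 0;                          /* wide accumulator, as in DSP hardware */
    for (int k = 0; k < ntaps; k++)
        acc += (long)coeff[k] * delay[k];  /* multiply-accumulate */
    return acc;
}
```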

When comparing DSP processor speeds, be careful with advertised MOPS (millions of operations per second) and MFLOPS (millions of floating-point operations per second) figures, because manufacturers interpret "operation" differently and the figures therefore mean different things. For example, some processors can perform a floating-point multiply and a floating-point add at the same time and so advertise an MFLOPS rating that is twice their MIPS rating.

Secondly, when comparing processor clock rates, note that the input clock of a DSP may equal its instruction rate or may be 2 to 4 times the instruction rate, and this differs between processors. In addition, many DSPs have clock multipliers or phase-locked loops that generate the high-frequency on-chip clock from an external low-frequency clock.

Typical DSP application areas include the following.

Speech processing: speech coding, speech synthesis, speech recognition, speech enhancement, voice mail, speech storage, etc.

Image/graphics: 2D and 3D graphics processing, image compression and transmission, image recognition, animation, robot vision, multimedia, electronic map, image enhancement, etc.

Military: secure communication, radar processing, sonar processing, navigation, global positioning, frequency-hopping radio, search and anti-search, etc.

Instruments: spectrum analysis, function generation, data acquisition, seismic processing, etc.

Automatic control: control, deep space operation, automatic driving, robot control, disk control, etc.

Medical: hearing aids, ultrasound equipment, diagnostic tools, patient monitoring, electrocardiogram analysis, etc.

Household appliances: digital audio, digital TV, videophone, music synthesis, tone control, toys and games, etc.

Examples of biomedical signal processing:

CT: computerized X-ray tomography. (Godfrey Hounsfield of Britain's EMI, who invented the head CT scanner, won the Nobel Prize for it.)

Computerized X-ray spatial reconstruction devices followed, enabling whole-body scanning, three-dimensional imaging of cardiac activity, localization of brain tumors and foreign bodies, and reconstruction of images of the human torso.

Electrocardiogram analysis.

The performance of a DSP is also affected by how it manages its memory subsystem. As mentioned above, MAC and other signal-processing functions are the basic capabilities of a DSP device. Fast MAC execution requires reading one instruction word and two data words from memory in every instruction cycle. There are several ways to achieve this, including multi-ported memory (allowing multiple memory accesses per instruction cycle), separate instruction and data memories (the "Harvard" architecture and its derivatives), and instruction caches (allowing instructions to be fetched from the cache instead of memory, freeing the memory for data accesses). Figures 2 and 3 show the differences between the Harvard memory architecture and the "von Neumann" architecture used by many microcontrollers.

Also pay attention to the supported memory space. The main target market for many fixed-point DSPs is embedded systems, where memory is generally small, so these devices have small to medium on-chip memory (roughly 4K to 64K words) and a narrow external data bus. In addition, the address bus of most fixed-point DSPs is 16 bits or less, which limits the external memory space.

Some floating-point DSPs have little or no on-chip memory but a very wide external data bus. For example, TI's TMS320C30 has only 6K words of on-chip memory, while externally it provides a 24-bit address bus and a 13-bit address bus. The ADSP-21060 from Analog Devices has 4 Mbit of on-chip memory that can be flexibly divided between program and data memory.

When choosing a DSP, the on-chip memory size and external bus requirements of the specific application must therefore be taken into account.

DSP processors are quite different from the general-purpose processors (GPPs) used in PCs, such as the Intel Pentium or the PowerPC. These differences stem from the fact that the architecture and instruction set of a DSP are designed specifically for signal processing, which gives DSPs the following characteristics.

Hardware multiply-accumulate operation

To perform multiply-accumulate operations such as signal filtering efficiently, the processor must multiply efficiently. GPPs were not originally designed for intensive multiplication; the first major technical step that distinguished DSPs from early GPPs was the addition of dedicated hardware and an explicit MAC instruction that completes a multiplication in a single cycle.

Harvard structure

Traditional GPPs use a von Neumann memory architecture, in which a single memory space is connected to the processor core through two buses (an address bus and a data bus). This structure cannot satisfy the requirement that a MAC access memory four times in one instruction cycle. DSPs generally adopt a Harvard architecture, which has two memory spaces: program memory and data memory. The processor core is connected to these spaces through two sets of buses, allowing two simultaneous memory accesses and doubling the processor's memory bandwidth. In some Harvard implementations a second data memory space and bus are added for even greater bandwidth. Modern high-performance GPPs usually have two on-chip caches, one for data and one for instructions; in principle this dual-cache, dual-bus arrangement is equivalent to a Harvard architecture. However, GPPs use control logic to decide which data and instruction words reside in the on-chip cache, and this is normally invisible to the programmer, whereas on a DSP the programmer can explicitly control which data and instructions are placed in on-chip memory or cache.

Zero-overhead loop control

Most DSP algorithms share a common trait: most of the processing time is spent executing a small number of instructions inside a relatively small loop. Most DSP processors therefore provide dedicated zero-overhead loop hardware. Zero overhead means the processor can repeat a block of instructions without spending time testing the loop counter; the hardware performs the loop branch and decrements the counter. Some DSPs also provide fast single-instruction loops through an instruction cache.

Special addressing modes

DSPs usually contain dedicated address generators that produce the special addressing patterns required by signal-processing algorithms, such as circular addressing and bit-reversed addressing. Circular addressing supports the delay lines used in FIR filtering, and bit-reversed addressing supports the FFT.
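A hedged sketch of the two addressing patterns named above (on a DSP they are produced by dedicated address-generation hardware; in portable C they must be emulated with explicit index arithmetic):

```c
/* Circular addressing: advance an index around a delay line of length len,
 * as used when feeding samples through an FIR filter. */
static unsigned circular_next(unsigned idx, unsigned len)
{
    return (idx + 1) % len;
}

/* Bit-reversed addressing for an FFT of length 2^bits: reverse the low
 * `bits` bits of idx (e.g. for bits = 3, index 6 = 110b maps to 3 = 011b). */
static unsigned bit_reverse(unsigned idx, unsigned bits)
{
    unsigned r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (r << 1) | (idx & 1u);
        idx >>= 1;
    }
    return r;
}
```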

Predictability of execution time

Most DSP applications have strict real-time requirements: all processing must be completed within a specified time. This real-time constraint requires programmers to determine exactly how long processing each sample takes, or at least to bound the worst case. The way a DSP executes its program is visible to the programmer, so the execution time of each piece of work is easy to predict. With high-performance GPPs, by contrast, the heavy use of large, fast data and program caches and their dynamic allocation make execution-time prediction complicated and difficult.

A rich set of peripherals

DSPs commonly integrate peripherals such as DMA controllers, serial ports, link ports, and timers.