Audio streaming introduction and details

Introduction to research background

For a long time, audio data has been treated as an opaque binary stream: a sequence of symbols with no semantics attached. It lacks any description of its structure or of high-level semantics such as sound type, which makes in-depth processing and analysis of audio signals difficult and greatly limits applications such as Automatic Speech Recognition (ASR) and Content-Based Audio Retrieval (CBAR). Continuous audio stream classification addresses this problem: it extracts structured information about audio types from a continuous audio stream, divides the stream into single-category clips according to acoustic type (such as speech, music, and environmental sound), and marks the boundary positions and type of each clip. It is a key technology and a basic prerequisite for in-depth processing, analysis, and retrieval of audio information. It is also a powerful aid for the automatic segmentation and classification of video signals, and has broad application prospects.

Structured information

The structured audio type information produced by continuous audio stream classification is the basis for building an audio database index and for linking low-level structural units to high-level semantic units. CBAR technology usually analyzes the structure and semantics of audio streams and extracts high-level semantic information from the different types of audio signal, for example by extracting keywords from the speech parts. It then organizes and indexes this information so that "disordered" audio streams become "organized" and are easy for users to search and browse. It follows that if the sound categories and their positions in the audio stream are not known, high-level semantics cannot be extracted.

Related introduction

Continuous audio stream classification can serve as an effective auxiliary tool for the automatic segmentation and classification of video footage. Because of the limits of current technology, computers cannot "understand" video content by relying solely on existing image and video processing techniques: the accuracy of video shot segmentation is low, and story units cannot be classified by content. These problems can be handled better by combining three steps: segmenting the audio track of the video with continuous audio stream classification, applying the more practical speech recognition technology to the speech parts, and processing the recognition results with mature full-text retrieval technology to extract text summaries. Knowledge from audio and video editing, together with some basic rules, tells us that if the audio type corresponding to a stretch of video does not change, that stretch of video does not need to be segmented. Properly segmenting the audio stream can therefore greatly improve the efficiency and accuracy of video segmentation.

In addition, continuous audio stream classification can be used in fields such as audio content understanding, audio surveillance and audio scene analysis. In short, the characteristics of audio data itself and the constraints of existing technology limit further processing of the audio stream. Continuous audio stream classification addresses this problem well and provides a solid foundation for structuring audio streams and for in-depth analysis and use of audio information.

Classification technology

Continuous audio stream classification can supply pure speech segments to ASR, which is one of the prerequisites for the practical use of existing speech recognition systems. Existing continuous speech recognition systems place very strict requirements on their input: the signal generally must not contain other types of sound, such as music or environmental sound, and it must have a high signal-to-noise ratio. ASR is expected to transcribe real-life audio streams such as broadcast news, film and television, conferences, and speeches automatically, and to generate a "transcription" that contains semantic content, position and other information. Such audio streams usually consist of speech mixed with other types of sound. If they are not segmented in advance and the non-speech parts removed, the performance of the speech recognition system will be seriously degraded.
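The segmentation described in this section can be illustrated with a minimal sketch. It assumes that some classifier (not shown) has already assigned a type label to each short frame of the stream. The names `AudioType`, `Segment`, `segment_stream` and `keep_speech` are hypothetical and chosen only for this illustration. The code merges runs of identical labels into single-category clips with boundary positions, then keeps only the speech clips, as an ASR front end might.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical audio classes; the article names speech, music and
   environmental sound as typical categories. */
typedef enum { AUDIO_SPEECH, AUDIO_MUSIC, AUDIO_ENVIRONMENT } AudioType;

/* One single-category clip: frames [start, end) plus the clip type. */
typedef struct {
    size_t start;
    size_t end;
    AudioType type;
} Segment;

/* Merge consecutive frames that share a label into segments.
   Writes at most max_segments entries and returns the number written. */
size_t segment_stream(const AudioType *labels, size_t n_frames,
                      Segment *out, size_t max_segments)
{
    size_t count = 0;
    assert(labels != NULL && out != NULL);
    for (size_t i = 0; i < n_frames; i++) {
        if (count > 0 && out[count - 1].type == labels[i]) {
            out[count - 1].end = i + 1;   /* extend the current clip */
        } else {
            if (count == max_segments)
                break;                    /* output array is full */
            out[count].start = i;         /* boundary: a new clip starts */
            out[count].end = i + 1;
            out[count].type = labels[i];
            count++;
        }
    }
    return count;
}

/* Copy only the speech clips, e.g. as input for a speech recognizer. */
size_t keep_speech(const Segment *in, size_t n, Segment *out)
{
    size_t kept = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        if (in[i].type == AUDIO_SPEECH)
            out[kept++] = in[i];
    return kept;
}
```

A real system would also need to smooth isolated misclassified frames before merging; that step is left out here.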

Streaming media definition

Streaming media refers to continuous, time-based media delivered over the Internet using streaming technology, that is, audio, video or multimedia files transmitted over the network. The key is the streaming technology itself. "Streaming" is a general term for techniques that transmit media such as video and audio over the Internet. Specifically, it means delivering audio and video programs over the Internet to an individual PC.

There are two methods to implement streaming: real-time streaming and progressive streaming.

Commonly used formats

Common streaming audio formats on the Internet include the *.RA format from the American company RealNetworks and the *.WMA format from Microsoft. There is also Apple's *.MOV format, which is mostly used in professional fields. Of these three formats, MOV has the best sound quality, especially for MIDI: it supports both GS and GM sounds, and its playback is noticeably better than that of Windows Media Player. The characteristics of these formats are described in detail below.

RealAudio format

This is a veteran product of the American company RealNetworks and is currently the most popular streaming media technology on the Internet. Many Internet music stations and video-on-demand sites use it. RealMedia includes three types of files: RealAudio (sound files), RealVideo (video files) and RealFlash (vector animation).

QuickTime format

QuickTime, like RealMedia, is fully compatible with Macs and PCs. Under the same network speed and file size, its audio and video quality is the best. It consists of three different parts: QuickTime Movie (movie) file format, QuickTime media abstraction layer, and QuickTime built-in media service system.

Windows Media Audio format

WMA (Windows Media Audio) is a heavyweight offering from Microsoft. Its predecessor is Microsoft's NetShow. It has strong backing and is the format most familiar to the many Windows users. Its core technology is ASF (Advanced Streaming Format). The ASF format supports any compression/decompression codec and can use any underlying network transport protocol, which gives it great flexibility. Compared with compression standards such as MPEG, it adds control command scripts. It is a way of distributing streaming multimedia content that keeps the data volume small while preserving quality.

Although Flash, which is popular on the Internet, is a vector animation technology, it can also contain sound and supports streaming. High-quality Flash SWF files offer good sound at a small file size, and some music websites use this technology. Listening requires the Flash plug-in. Given Internet speeds in 2013, a plug-in of a few hundred KB can be installed in a short time, after which you can enjoy music in SWF format.

Streaming Media Transfer Protocol

In browsers, the addresses we usually see start with http: or ftp:. Web servers can also deliver streaming media files over HTTP, but a Web server is not designed to deliver streaming media efficiently.

Streaming media needs an uninterrupted flow of packets and stays connected to the server for a long time. If too many visitors watch at the same time, performance drops sharply. To solve this problem, streaming media has its own set of protocols.

1. Real Time Streaming Protocol (RTSP): an open standard for transmitting streaming media, established with the help of RealNetworks. Although it requires a special server called RealServer, RTSP can improve the quality of streaming video, improve transmission efficiency, and handle high traffic better. If your ISP offers a RealServer service, it is recommended that you use RealServer instead of a Web server to deliver streaming media files.

2. Microsoft Media Server protocol (MMS): a streaming transmission protocol defined by Microsoft.

3. Real-time Transport Protocol (RTP): a transport protocol for multimedia data streams on the Internet. RTP is defined to work in one-to-one or one-to-many transmission scenarios. Its purpose is to provide timing information and to achieve stream synchronization. It typically runs over UDP.

4. Resource Reservation Protocol (RSVP): audio and video data streams are more sensitive to network delay than traditional data, so transmitting high-quality audio and video over the network requires more than bandwidth alone. RSVP is a resource reservation protocol developed for the Internet. It is used to reserve a portion of network resources (that is, bandwidth).
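As an illustration of how RTP carries the timing information mentioned in the list above, the following sketch parses the 12-byte fixed RTP header, using the field layout of RFC 3550. The type `RtpHeader` and the function `rtp_parse` are hypothetical names for this example. Header extensions and CSRC lists are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Selected fields of the 12-byte fixed RTP header (layout per RFC 3550). */
typedef struct {
    uint8_t  version;       /* 2 for current RTP */
    uint8_t  payload_type;  /* identifies the media encoding */
    uint16_t sequence;      /* increments per packet; reveals loss/reordering */
    uint32_t timestamp;     /* sampling instant; used for synchronization */
    uint32_t ssrc;          /* identifies the stream source */
} RtpHeader;

/* Parse the fixed header from a raw packet (network byte order).
   Returns 0 on success, -1 if the packet is too short or not version 2. */
int rtp_parse(const uint8_t *pkt, size_t len, RtpHeader *h)
{
    assert(pkt != NULL && h != NULL);
    if (len < 12)
        return -1;
    h->version      = (uint8_t)(pkt[0] >> 6);
    h->payload_type = (uint8_t)(pkt[1] & 0x7F);
    h->sequence     = (uint16_t)((pkt[2] << 8) | pkt[3]);
    h->timestamp    = ((uint32_t)pkt[4] << 24) | ((uint32_t)pkt[5] << 16) |
                      ((uint32_t)pkt[6] << 8)  |  (uint32_t)pkt[7];
    h->ssrc         = ((uint32_t)pkt[8] << 24) | ((uint32_t)pkt[9] << 16) |
                      ((uint32_t)pkt[10] << 8) |  (uint32_t)pkt[11];
    return h->version == 2 ? 0 : -1;
}
```

A receiver can compare the `timestamp` values of successive packets to schedule playback, and use `sequence` to detect lost or reordered packets.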

Addresses that use these protocols do not start with http: or ftp:. They start with MMS or RTSP instead, for example mms:61.139.25.41/quake.

Player

Each of the above three formats has its own player, which are RealPlayer, QuickTime Player and Windows Media Player.

1. RealPlayer

The Real format has a high compression ratio and good compression and transmission capabilities, which makes it especially suitable for online playback and live broadcast. Among video streaming formats, the RM format has the lowest quality, but its files are also the smallest, so users on low-speed connections (those without ADSL or broadband) can still watch video programs online. RealPlayer is also convenient to use, and its system resource usage falls between that of the other two players, making it the best choice for users with low-end machines. With RealNetworks' technology, it has captured more than half of the online streaming video and audio on-demand market.

2. QuickTime Player

QuickTime Player can provide real-time digital information flow, workflow and file playback over the Internet. The quality of QuickTime files is extremely high. The disadvantage is that the files are relatively large, since high-definition, high-quality images usually mean larger files and longer transmission times. For this reason, QuickTime is used on the Internet mainly for video programs that require high-definition presentation, such as multimedia advertisements, product demonstrations, and high-definition videos. Viewing is difficult where the connection is slow, and QuickTime Player takes up a lot of system resources, so it is best run on a high-performance computer with a fast CPU and plenty of memory.

The latest version of QuickTime Player is now 5.0. Note that QuickTime Player is not free. You can download it from Apple's homepage.

3. Windows Media Player

To play WMA files, just use the Windows Media Player that comes with Windows. A major feature of WMA music is that it does not require an additional player. You can find the player under "Start > Programs > Accessories > Entertainment". Its production, publishing and playback software is also integrated with Windows NT/2000/9x. Even more significantly, Windows Media adds copyright protection functions, which can limit playback time, the number of plays and even the operating system used. This is a boon for audiovisual publishers beleaguered by piracy. Windows Media files are larger than RealMedia files, and play back faster and more smoothly online than QuickTime.

Streaming function

The audio stream functions are used to play digital sound that is too large to fit in a regular SAMPLE structure, either because the files are so large that you want to load the required data a piece at a time, or because you want to do something clever, such as generating the waveform on the fly.

AUDIOSTREAM *play_audio_stream(int len, int bits, int stereo, int freq, int vol, int pan);

This function creates a new audio stream and starts playing it. The len parameter is the size of each transfer buffer, in samples, and should be at least 2K. Larger buffers are more efficient and require fewer updates, but they increase the delay between when you provide the data and when it is actually played. The bits parameter must be 8 or 16, freq is the sampling frequency of the data, and the vol and pan values use the same 0-255 range as the regular sample playback routines. If you want to adjust the pitch, volume, or pan of a stream once playback has started, you can use the regular voice_*() functions with stream->voice as the argument. Sample data is always in unsigned format, and a stereo waveform consists of alternating left/right samples.
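The parameters above can be made concrete with a small sketch. It assumes that len counts samples per channel, so a stereo buffer holds twice as many values; that reading is an assumption, as is every helper name below (`stream_buffer_bytes`, `stream_buffer_ms`, `fill_square_16_stereo`). The code computes the size and playback time of one transfer buffer and fills a 16-bit stereo buffer in the unsigned, interleaved layout described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes in one transfer buffer of `len` samples (assumed per channel). */
size_t stream_buffer_bytes(int len, int bits, int stereo)
{
    assert(len > 0 && (bits == 8 || bits == 16));
    return (size_t)len * (size_t)(bits / 8) * (stereo ? 2u : 1u);
}

/* Playback time covered by one buffer, in milliseconds. This is roughly
   the extra delay that a larger buffer adds. */
double stream_buffer_ms(int len, int freq)
{
    assert(len > 0 && freq > 0);
    return 1000.0 * len / freq;
}

/* Fill a 16-bit stereo buffer with a square wave on the left channel and
   silence on the right. Samples are unsigned, so silence is the midpoint
   0x8000, and left/right samples alternate. */
void fill_square_16_stereo(uint16_t *buf, int len, int half_period)
{
    assert(buf != NULL && len > 0 && half_period > 0);
    for (int i = 0; i < len; i++) {
        int high = ((i / half_period) % 2) == 0;
        buf[2 * i]     = high ? 0xC000 : 0x4000;  /* left channel */
        buf[2 * i + 1] = 0x8000;                  /* right channel: silence */
    }
}
```

With these assumptions, a 4096-sample, 16-bit stereo buffer occupies 16384 bytes, and a 4410-sample buffer at 44100 Hz covers 100 ms of playback.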

void stop_audio_stream(AUDIOSTREAM *stream);

Destroy an audio stream when it is no longer needed.

void *get_audio_stream_buffer(AUDIOSTREAM *stream);

While the audio stream is playing, you must call this function at regular intervals to provide the next buffer of sample data (the smaller the buffer, the more frequently it must be called). If it returns NULL, the stream still has enough data to play and you do not need to do anything. If it returns a value, that is the address of the next buffer to be played, and you should load the appropriate number of samples (the amount you specified when creating the stream) to that address, for example with an fread() from a disk file. After filling the buffer, call free_audio_stream_buffer() to make the new data available. Note that this function must not be called from a timer handler.

void free_audio_stream_buffer(AUDIOSTREAM *stream);

Call this function after get_audio_stream_buffer() has returned a non-NULL address, to indicate that you have loaded a new block of samples to that address and the data is ready to be played.
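The refill step described above could look roughly like the following sketch. The helper `refill_from_file` is a hypothetical name, and the raw-file input is an assumption. It reads raw unsigned sample data with fread() and pads the rest of the buffer with silence when the file runs out, so that the whole buffer is always valid. The closing comment shows where the two buffer functions would be called; that part is not compiled here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read up to `bytes` bytes of raw unsigned sample data from `f` into `buf`.
   If the file runs out, pad the remainder with silence (the unsigned
   midpoint). Returns the number of bytes actually read from the file. */
size_t refill_from_file(FILE *f, void *buf, size_t bytes, int bits)
{
    unsigned char *p = (unsigned char *)buf;
    size_t got;

    assert(f != NULL && buf != NULL && (bits == 8 || bits == 16));
    assert(bits == 8 || bytes % 2 == 0);

    got = fread(p, 1, bytes, f);
    if (bits == 8) {
        memset(p + got, 0x80, bytes - got);       /* 8-bit silence */
    } else {
        const uint16_t silence = 0x8000;          /* 16-bit silence */
        size_t i = got - (got % 2);               /* whole-sample boundary */
        for (; i + 1 < bytes; i += 2)
            memcpy(p + i, &silence, sizeof silence);
    }
    return got;
}

/* Intended use inside the regular update loop (illustrative only):
 *
 *     void *buf = get_audio_stream_buffer(stream);
 *     if (buf != NULL) {
 *         refill_from_file(f, buf, bytes, 16);
 *         free_audio_stream_buffer(stream);
 *     }
 */
```

A return value smaller than `bytes` signals that the file has ended, which the caller can use to decide when to stop the stream.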

Audio streaming

During the 2007 International Consumer Electronics Show (CES) in Las Vegas, STMicroelectronics (ST) demonstrated a practical portable application using a Bluetooth interface, an infrared interface and Sound Terminal technology. Future Sound Terminal products will include ASSPs (Application Specific Standard Products) that integrate these and other interfaces.

"Sound Terminal" is a digital audio streaming concept proposed by ST. Its purpose is to bring high sound quality, low power consumption and low manufacturing cost to popular, fast-growing application fields such as flat-panel TVs, wireless products and personal audio systems. The high integration of a single-package solution, combined with purely digital stream processing from sound source to speaker, makes it possible to design low-cost, high-efficiency, compact sound systems.

The initial products of the Sound Terminal family include a series of high-quality single-chip audio systems covering high-power (20-80 W), medium-power (10-20 W) and low-power (less than 1.5 W) applications, such as the STA326 and STA323, which are already on the market. These products monolithically integrate a digital audio processor, a digital amplifier controller and a DDX digital power output stage. The STA326 can drive two 30 W channels or one 60 W channel. Digital control makes it easy to configure the product into several different output modes. The product offers a variety of processing and equalization options, including up to four programmable 28-bit second-order filters per channel and bass/treble controls. Preset modes for various listening conditions can shorten software development time and simplify the product design process.
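For readers unfamiliar with second-order filters, the following sketch shows a generic second-order (biquad) section and a chain of such sections. It is not ST's implementation: it uses double-precision floating point instead of 28-bit fixed-point arithmetic, and the names `Biquad`, `biquad_step` and `biquad_chain_step` are hypothetical. The coefficient values would come from a filter design step that is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* One second-order (biquad) section in direct form I:
   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] */
typedef struct {
    double b0, b1, b2, a1, a2;  /* coefficients, with a0 normalized to 1 */
    double x1, x2, y1, y2;      /* previous two inputs and outputs */
} Biquad;

/* Process one input sample and update the filter state. */
double biquad_step(Biquad *f, double x)
{
    double y;
    assert(f != NULL);
    y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
      - f->a1 * f->y1 - f->a2 * f->y2;
    f->x2 = f->x1;  f->x1 = x;
    f->y2 = f->y1;  f->y1 = y;
    return y;
}

/* Run one sample through a chain of sections, for example an equalizer
   built from up to four sections per channel. */
double biquad_chain_step(Biquad *chain, size_t n, double x)
{
    assert(chain != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        x = biquad_step(&chain[i], x);
    return x;
}
```

Each section can realize one equalizer band (for example a peaking or shelving filter), which is why a small number of programmable sections per channel is enough for typical tone correction.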

Because the stream is fully digital, signal processing in the amplification chain does not require an analog-to-digital converter. This makes it a low-cost solution that preserves overall audio quality, with a signal-to-noise ratio (SNR) of up to 100 dB and a wide dynamic range. A Sound Terminal chip prototype has been developed successfully: an amplifier that uses ST's proprietary digital modulation technology (FFX) and is an example of a product aimed at portable use. The chip's amplification efficiency is as high as 94%, the highest level on the current market. It can provide "heat-free audio power" for portable systems, which helps to extend battery life significantly and greatly reduces the size of the heat sink, making advanced product designs possible.

Built-in digital processing is particularly useful for improving sound quality and tailoring features to specific audio applications. For example, as flat-panel TV designs become thinner and speakers become smaller, the acoustic properties of the chassis become less ideal, and correcting the audio signal becomes more important.

Additionally, digital streaming technology is well suited to integration with audio interfaces for wireless speakers and wireless headphones that use diffuse infrared, Bluetooth 2.0 EDR (Enhanced Data Rate), Wi-Fi and UWB (Ultra Wideband) technologies.