What's the use of TTS

TTS is the abbreviation of Text To Speech, that is, "from text to speech". It draws on both linguistics and psychology. With the support of a built-in chip, it uses a neural network design to convert text intelligently into a natural speech stream. TTS technology converts text files in real time, with conversion times measured in seconds. Under the control of its intelligent voice controller, the rhythm of the spoken output is smooth, so listeners find the information natural to hear, without the coldness and stiffness of machine voice output. TTS speech synthesis technology will soon cover the level-1 and level-2 Chinese character sets of the national standard, with an English interface, automatic identification of Chinese and English, and support for mixed Chinese and English reading. All voices use real Mandarin pronunciation. Synthesis runs at 120-150 Chinese characters per second and the reading speed reaches 3-4 Chinese characters per second, so users hear clear, pleasant sound quality and coherent, smooth intonation. A few MP3 players now have a TTS function.

TTS is a speech synthesis application that converts files stored in a computer (such as help files or web pages) into natural speech output. TTS can help people with visual impairments read information on a computer, or it can simply be used to make text documents easier to take in. Today's TTS applications include voice-enabled e-mail and spoken prompts in voice response systems. TTS is often used together with speech recognition programs. There are many TTS products now, including ReadPlease 2000, Proverbe Speech Unit, and TextAloud from Next Up Technology. Lucent, Elan, and AT&T each have their own speech synthesis products.

In addition to TTS software, many manufacturers also provide hardware products. These include the Quick Link Pen from WizCom Technologies in Israel, a pen-shaped device that can scan and read text; the Road Runner from Ostrich Software, a handheld device that reads ASCII text; and DecTalk TTS from DEC in the United States, an external hardware device that can substitute for a sound card and that contains an internal software device able to work with the personal computer's own sound card. TTS text-to-speech conversion is widely used, including for e-mail reading and for the voice prompts of IVR systems. At present, IVR systems are widely used in various industries, such as telecommunications and transportation.

The key technology of TTS is speech synthesis. Early TTS was generally implemented with dedicated chips, such as the TMS50C10/TMS50C57 from Texas Instruments and the PH84H36 from Philips, but these were mainly used in household appliances or children's toys.

Microcomputer-based TTS is generally implemented purely in software, which mainly includes the following parts:

● Text analysis-Linguistic analysis of the input text: lexical, grammatical and semantic analysis sentence by sentence, to determine the underlying structure of each sentence and the phoneme composition of each word. This includes sentence breaking, word segmentation, polyphonic character processing, number processing, abbreviation processing, etc.

● Speech synthesis-words or phrases corresponding to the processed text are extracted from the speech synthesis database, and the language description is converted into speech waveforms.

● Prosodic processing-Prosodic features are planned for the synthesized speech so that it sounds natural.

The quality of synthesized speech means the quality of the speech output by the speech synthesis system. It is generally evaluated subjectively in terms of clarity (or intelligibility), naturalness and coherence. Clarity is the percentage of meaningful words that listeners hear correctly. Naturalness measures whether the synthesized speech is close to a human voice and whether the intonation of the synthesized words is natural. Coherence measures the fluency of synthesized sentences.
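The text analysis part can be illustrated with a minimal sketch. It only shows sentence breaking, abbreviation processing and number processing; the abbreviation table, the digit-by-digit reading policy and all function names are assumptions made for illustration, not part of any particular TTS product.

```python
import re

# Hypothetical abbreviation table; a real system would use a much larger lexicon.
ABBREVIATIONS = {"TTS": "text to speech", "IVR": "interactive voice response"}

DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def split_sentences(text):
    """Sentence breaking: cut after Chinese or Western end-of-sentence marks."""
    parts = re.split(r"(?<=[。！？.!?])\s*", text)
    return [p for p in parts if p]


def expand_token(token):
    """Abbreviation and number processing for a single token."""
    if token in ABBREVIATIONS:
        return ABBREVIATIONS[token]
    if re.fullmatch(r"[0-9]+", token):
        # Simplest possible policy (an assumption): read numbers digit by digit.
        return " ".join(DIGIT_NAMES[int(d)] for d in token)
    return token


def normalize(text):
    """Return one list of speakable tokens per sentence."""
    result = []
    for sentence in split_sentences(text):
        # Latin words, digit runs, and every other non-space character.
        tokens = re.findall(r"[A-Za-z]+|[0-9]+|[^\sA-Za-z0-9]", sentence)
        result.append([expand_token(t) for t in tokens])
    return result
```

Calling `normalize("Call IVR 12. Done!")` yields two token lists, one per sentence. The naive splitter also cuts at a decimal point such as "3.5", which is one reason real text analysis needs the lexical and grammatical steps described above.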

Synthesizing high-quality speech takes extremely complex algorithms, which places heavy demands on the machine. At present, this algorithmic complexity limits how many concurrent TTS channels a microcomputer can support.

A typical CTI application system includes an IVR (Interactive Voice Response) system, which is an important part of a call center. Through the IVR system, users enter information by pressing touch-tone keys and receive pre-recorded or synthesized voice information from the system. An IVR with TTS speeds up service and lowers service cost, so that the IVR can serve callers 24 hours a day, 7 days a week.

At present, most common IVR systems consist of voice cards plugged into a general-purpose industrial computer platform, and they support Chinese speech synthesis (TTS) and other technologies.

A typical telephone service process that includes TTS can be divided into the following steps:

When the user dials in, the IVR answers and collects the user's key presses and other information.

The IVR requests the relevant data from the database server according to the user's key presses.

The database server returns text data to the IVR.

The IVR sends the text to the TTS server through its TCP communication interface.

The TTS server sends the voice data segments synthesized from the text back to the IVR server through the TCP communication interface.

The IVR server assembles the voice data segments into a single voice file.

The IVR plays the voice file to the telephone user.
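The flow above can be sketched in a few lines. This is only an in-memory illustration under stated assumptions: the database is a dictionary, the TTS server is replaced by a stub that returns silent PCM segments instead of talking over TCP, and all names (`DATABASE`, `tts_synthesize`, `handle_call`) are hypothetical.

```python
import io
import wave

# Hypothetical stand-in for the database server: key press -> text.
DATABASE = {"1": "Your balance is 212 yuan.", "2": "Your order has shipped."}

SAMPLE_RATE = 8000  # 8 kHz mono 16-bit, an assumed telephony format


def query_database(key):
    """The IVR asks the database for the text that matches the key press."""
    return DATABASE.get(key, "Invalid selection.")


def tts_synthesize(text, chunk_chars=10):
    """Stand-in for the TTS server.

    A real server would return synthesized speech over TCP. Here every
    character simply yields 0.1 s of 16-bit silence, so the segments have
    a predictable size.
    """
    for start in range(0, len(text), chunk_chars):
        chunk = text[start:start + chunk_chars]
        samples = len(chunk) * SAMPLE_RATE // 10
        yield b"\x00\x00" * samples


def assemble_voice_file(segments):
    """Join the PCM segments into one WAV file held in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        for segment in segments:
            wav.writeframes(segment)
    return buffer.getvalue()


def handle_call(key):
    """One caller, end to end; returns the WAV bytes the IVR would play."""
    text = query_database(key)
    return assemble_voice_file(tts_synthesize(text))
```

In a real deployment the stub would be replaced by a socket client for the TTS server, and the assembled file would be handed to the voice card for playback.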

Public-network IVR access mostly uses an industrial computer plus voice cards, with the synthesized voice data sent to the IVR over a local area network. This structure is only suitable for simple applications.

Chinese TTS consists of Chinese language processing and speech synthesis. Language processing uses knowledge of Chinese prosody and related fields to perform word segmentation, part-of-speech tagging, phonetic annotation and conversion of digits and symbols on Chinese sentences; speech synthesis then produces speech by querying a Chinese speech database. Well-known TTS systems in China currently include those from IBM, Microsoft, Fujitsu, Iflytek and Jietong Huasheng. Many problems remain in Chinese prosody processing, symbols and numbers, polyphonic characters and word formation, and they need continued research to make Chinese speech synthesis more natural.

CTI technology combines telecommunications and computing, overcoming the shortcomings that traditional telecommunications and computer services have when used separately. It applies widely: any system that needs both voice and data communication can use CTI, especially systems that combine a computer network and a communication network to exchange voice and data.

TTS (Text To Speech) involves acoustics, linguistics, digital signal processing, multimedia technology and other disciplines. It is a cutting-edge technology in the field of Chinese information processing, and it converts any text that appears in a computer into natural, smooth voice output.

TTS can be applied to the IVR (Interactive Voice Response) server in a CTI system. The IVR provides a voice interaction platform: it gives voice prompts to callers, guides them to select services and enter the data needed for telephone transactions, accepts the information they enter on the telephone keypad, and gives them interactive access to computer databases and other information.

In an IVR, TTS can convert text into voice files automatically, and it can also synthesize text into voice in real time and deliver it by telephone. This automatic conversion between text and voice lets people interact with the system automatically, so customers can be served anytime and anywhere. Maintenance staff no longer need to make recordings by hand. They only need to import an electronic document into the system, which automatically converts it into voice and plays it to customers. Large amounts of data stored in a database can be queried at any time according to the query conditions and played as synthetic voice without recording in advance, which greatly reduces the workload of call-center agents.

So how can TTS functions be added to CTI? Some advanced switching platforms implement TTS inside the switch and expose it as part of the standard interface. Business developers can use it in their services simply by calling that interface.

For a PBX without a TTS function, business developers need to choose a suitable TTS platform and carry out secondary development on it, that is, call the standard interface provided by the chosen platform to implement speech synthesis.

At present, CTI has become one of the fastest growing industries in the world, with an annual growth rate as high as 50%. Like the computer industry, CTI forms a pyramid-shaped industrial chain in which value multiplies by at least 20 times from top to bottom. As an attractive new technology, TTS will have better application prospects if it can be embedded well in value-added services.

Hangzhou Yintong Software Co., Ltd. is a high-tech company backed by Zhejiang University and approved by the Ministry of Education and the People's Government of Zhejiang Province. The company focuses on research and development of computer voice technology and is gradually extending its research to other voice fields such as speech recognition and voice streaming media transmission. Its core technology, Intone_TTS, is a Chinese speech synthesis technology with independent intellectual property rights. In an appraisal organized by the Zhejiang Science and Technology Department, experts unanimously judged it to be at a leading level in China, and the company has applied for a number of national patents for it.

Intone_TTS is a development toolkit for converting text into voice. It provides a complete set of interface functions and programming examples for system integrators and software developers, so that users can call it flexibly and integrate it into other application systems. The interface needs the support of the speech synthesis runtime and suits various development environments, so developers can choose according to their specific application. Its main features are:

It can synthesize all Chinese characters, English and Arabic numerals;

Support the editing of traditional Chinese characters and polyphonic characters;

Synthetic effect: natural and smooth;

Standard function call interface that supports calls through Microsoft SAPI; supports synchronous and asynchronous calls;

Support PCM Wave, uLaw/aLaw Wave, ADPCM, Dialogic VOX and other voice formats;

Support GB2312 code (Simplified Chinese), BIG5 code (Traditional Chinese) and UNICODE code;

Support multi-channel synchronous synthesis;

Support mainstream voice boards such as Dialogic, Jin Dong and Sanhui.

TTS stands for text to speech; "text-to-speech conversion" and "text reading" mean much the same thing. It is often used in the development of voice systems.

There are many TTS products on the market today, implemented in various ways. Some are very expensive, such as Iflytek, which is said to be funded by the 863 Program and to be technically advanced. Some are relatively cheap, such as Jietong Huasheng and InfoTalk. Some are free, such as Microsoft's TTS products.

Compared with ASR (Automatic Speech Recognition), a TTS product is not technically difficult to build. To me it looks mostly like manual labor.

If we want to build a TTS that can read Chinese sentences aloud, how would we go about it?

The simplest TTS records a pronunciation for every character. You may ask: does that mean recording more than six thousand Chinese characters? Fortunately, Chinese has few syllables and many homophones. At most we need to record (number of initials) × (number of finals) × 4 tones, and in fact not every syllable occurs in all four tones. Mandarin has only a few hundred base syllables, so on the order of a thousand recordings is enough once tones are counted.

For synthesis you need a table that maps each Chinese character to its pinyin. Pinyin input methods rely on the same kind of table, so one can be found online, but it usually lacks the four tones. You have to add them yourself, which, hehe, is more manual labor.
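Here is a minimal sketch of that lookup. The six-entry table, the tone-number notation and the `.wav` clip naming are assumptions for illustration; a real table covers several thousand characters.

```python
# Tiny illustrative character-to-pinyin table with tone numbers (1-4).
PINYIN_TABLE = {
    "你": "ni3", "好": "hao3", "中": "zhong1", "国": "guo2",
    "银": "yin2", "行": "xing2",
}


def to_syllables(text):
    """Look up each character; characters missing from the table map to None."""
    return [PINYIN_TABLE.get(ch) for ch in text]


def to_playlist(text, extension=".wav"):
    """Name the pre-recorded clip for each known syllable, in reading order."""
    return [s + extension for s in to_syllables(text) if s is not None]
```

Playing the clips returned by `to_playlist("你好中国")` back to back is the whole synthesizer. Note that a one-reading-per-character table gives 行 the same sound everywhere, which is wrong in some words.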

A TTS built this way can work well, especially for Chinese text with no particular sentence meaning, such as names, home addresses and stock codes, where it sounds clear enough. This is because our great mother tongue is essentially monosyllabic: since ancient times, each Chinese character has been one syllable expressing one meaning. Chinese also differs from English, which is read continuously and with large changes in tone and rhythm, so Chinese is much simpler to handle.

Of course, you still have to deal with some details. One is polyphonic characters: it is wrong to read 银行 (yínháng, "bank") as "yínxíng". Others are punctuation, numbers and letters. None of these should be difficult for someone who has written many programs.

The TTS bundled with some voice cards in China, whether sold or given away, generally works this way, and this is how it sounds.

If you want TTS to sound better, put in more effort and record whole words as basic units, such as common two-character words and four-character idioms. Then build a table that maps the word lexicon to the pronunciation database, and look words up in the lexicon each time you synthesize. Taking words as the unit naturally sounds much better than taking single characters. This needs one more technique, word segmentation: breaking a complex sentence into a reasonable sequence of words takes a bit of skill. We owe this to the pioneers of the New Culture Movement, who advocated vernacular Chinese and introduced horizontal layout and punctuation from Western languages, but did not introduce their spaces between words. Still, even if the word segmentation algorithm is not very efficient or accurate, it is not a big problem. As I said before, Chinese characters are monosyllabic, so pronunciation mistakes are generally rare.
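A minimal sketch of lexicon lookup with word segmentation follows. It uses forward maximum matching, which is one simple segmentation technique and not necessarily what any commercial product uses; the tiny lexicons and function names are made up for illustration.

```python
# Illustrative word lexicon: word -> pinyin syllables. Whole-word entries let a
# polyphonic character such as 行 get the right reading in context.
WORD_LEXICON = {
    "银行": ["yin2", "hang2"],
    "行人": ["xing2", "ren2"],
    "中国": ["zhong1", "guo2"],
}

# Fallback single-character readings (default reading only).
CHAR_LEXICON = {"银": "yin2", "行": "xing2", "人": "ren2",
                "中": "zhong1", "国": "guo2", "去": "qu4"}

MAX_WORD_LEN = max(len(word) for word in WORD_LEXICON)


def segment(text):
    """Forward maximum matching: at each position take the longest lexicon word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted, so the loop terminates.
            if size == 1 or candidate in WORD_LEXICON:
                words.append(candidate)
                i += size
                break
    return words


def pronounce(text):
    """Use word-level readings when available, else per-character readings."""
    syllables = []
    for word in segment(text):
        if word in WORD_LEXICON:
            syllables.extend(WORD_LEXICON[word])
        else:
            syllables.append(CHAR_LEXICON.get(word, "?"))
    return syllables
```

With these tables, `pronounce("去中国银行")` reads 行 as "hang2" because the word 银行 is found first, while `pronounce("行人")` keeps "xing2". In a recording-based system each lexicon word would map to its own clip instead of a syllable list.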

Of course, Iflytek has done a lot of this hard work. It is said to have progressed to recording common sentences. Clearly, the more effort you put in, the better the result.

As for smoothing the junctions between units and adding some embellishment, I think it hardly matters, and it does not improve the overall effect much.

Commercial TTS on the market generally supports Cantonese. To add it, ask a Cantonese speaker to record the material and go through the same process again.

On the other hand, many people think it is best to find a radio or TV announcer to make the recordings. In fact, you can just ask a female colleague nearby to record them, as long as her speech is clear. Sometimes an ordinary voice is more pleasant than the precise pronunciation and mellow tone of a news broadcast.

Let's talk about text recognition first. In complex text, a program cannot work out by itself how to read some content, so that content has to be marked up. For example, should the number "128" be read as "one hundred and twenty-eight" or as "one, two, eight"? The usual solution is to add XML tags to the text, as Microsoft's TTS does.

Next, TTS application programming. Microsoft's TTS programming interface is called SAPI, and it is a COM interface. Developing against it is still a bit troublesome, but fortunately the information on the MSDN website is very comprehensive. Although Microsoft's TTS is free, at present its Chinese voice is male and sounds a little muffled and unpleasant.

Generally, domestic manufacturers provide API calling interfaces, which are relatively simple and easy to embed in applications.

Commercial TTS also carries a concurrent-license restriction, which limits the number of threads synthesizing at the same time. I don't think this restriction matters much. Whatever the TTS, you can convert a text file into a voice file for the voice card to play. Most sentences in applications are short, generally no more than 100 Chinese characters, so synthesis takes very little time. One thread can be responsible for synthesis, and the other parts of the application send their requests to it. If a sentence is long, break it into several short ones; playback is always slower than synthesis anyway.

Many applications synthesize offline and have no real-time requirement, so there is even less reason to buy multiple licenses.
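The single-thread idea could look roughly like this. It is a sketch only: `fake_synthesize` stands in for whatever call the licensed engine actually exposes, and the class name and queue-based design are my own assumptions.

```python
import queue
import threading


def fake_synthesize(text):
    """Hypothetical stand-in for a single-channel TTS engine call."""
    return ("<audio for: " + text + ">").encode("utf-8")


class SynthesisWorker:
    """Serializes all synthesis requests through one worker thread."""

    def __init__(self, synthesize):
        self._synthesize = synthesize
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:  # shutdown signal
                break
            text, reply = item
            try:
                reply.put((True, self._synthesize(text)))
            except Exception as exc:  # hand the error back to the caller
                reply.put((False, exc))

    def request(self, text):
        """Callable from any thread; blocks until the audio is ready."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((text, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def close(self):
        """Stop the worker after the requests already queued are served."""
        self._requests.put(None)
        self._thread.join()
```

Each telephone channel calls `worker.request(text)` and then plays the result, so only one synthesis runs at a time however many channels are active.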

In many cases we don't even need to buy a TTS. Take the common fee reminder in voice development. After dialing in, the caller hears: "Dear customer, your fee this month is 212 yuan". The first part is the same for all customers, so just record one voice file. Synthesizing the number is very simple: you only need recordings of the 10 digits, plus a few unit words such as "hundred" and "yuan".
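As a rough sketch of that fee reminder, the function below turns an amount into a list of pre-recorded clips, following the usual Chinese reading of numbers below 10,000. The clip names (`prefix.wav`, `shi.wav`, `bai.wav`, `qian.wav`, `yuan.wav`) are hypothetical, and amounts with decimals are not handled.

```python
UNIT_CLIPS = ["", "shi", "bai", "qian"]  # ones, tens, hundreds, thousands


def amount_clips(n):
    """Clip names for an integer 0..9999, read the Chinese way.

    Examples: 212 -> 2 bai 1 shi 2, 105 -> 1 bai 0 5, 12 -> shi 2.
    """
    if not 0 <= n <= 9999:
        raise ValueError("only 0..9999 is supported in this sketch")
    if n == 0:
        return ["0"]
    digits = [int(d) for d in str(n)]
    clips = []
    pending_zero = False
    for pos, d in enumerate(digits):
        power = len(digits) - 1 - pos
        if d == 0:
            pending_zero = True  # spoken once, and only if a digit follows
            continue
        if pending_zero:
            clips.append("0")
            pending_zero = False
        # 10-19 are read "shi ..." without a leading "1".
        if not (d == 1 and power == 1 and not clips):
            clips.append(str(d))
        if power:
            clips.append(UNIT_CLIPS[power])
    return clips


def fee_playlist(amount):
    """Fixed greeting clip, then the number clips, then the currency clip."""
    return (["prefix.wav"]
            + [clip + ".wav" for clip in amount_clips(amount)]
            + ["yuan.wav"])
```

`fee_playlist(212)` returns the greeting clip, the five number clips for "2 bai 1 shi 2", and the "yuan" clip, which the voice card can play in sequence without any TTS engine.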

TTS (Training + Tools + Scheme) "Surpass" plan

This plan targets the human resources problems that growing enterprises currently face. It is an intellectual project that tackles the human resources bottleneck from several angles, drawing on talent and experts. It trains senior human resource managers for enterprises, provides advanced human resource management tools, and helps enterprises establish modern strategic human resource planning. Through the "training + tools + scheme" method, it solves an enterprise's human resource problems systematically and then builds a scientific and complete human resource management system.

TTS Tianjin wharf surcharge

A surcharge at Tianjin port, collected via shipping lines on Japan and South Korea routes in 2009.
