
How many bits and bytes does a character equal?

The number of bytes a character occupies depends on the encoding in use, so the answer differs from one encoding to another. One byte is 8 bits.
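As a small illustration of the byte-to-bit relationship, the following Python sketch encodes a character and reports its size in both units. The helper names `size_of` and `as_bits` are made up for this example and do not come from any library.

```python
def size_of(text: str, encoding: str) -> tuple:
    """Return (bytes, bits) that `text` occupies in `encoding`."""
    n_bytes = len(text.encode(encoding))
    return n_bytes, n_bytes * 8  # 1 byte = 8 bits


def as_bits(text: str, encoding: str) -> str:
    """Show each encoded byte as an 8-digit binary string."""
    return " ".join(f"{b:08b}" for b in text.encode(encoding))


print(size_of("A", "ascii"))  # (1, 8)
print(as_bits("A", "ascii"))  # 01000001
```

The same call with a different encoding name gives a different size, which is why the question has no single answer.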

The character-to-byte relationships in common encodings are as follows:

1. ASCII: an English letter (upper or lower case) takes one byte. ASCII itself cannot represent Chinese characters; in the legacy double-byte Chinese encodings that extend it (such as GB2312), a Chinese character takes two bytes. A byte, the basic unit of storage in a computer, is usually an 8-bit binary number; read as an unsigned decimal number, its minimum value is 0 and its maximum value is 255.

2. UTF-8: an English character takes one byte, and a Chinese character (simplified or traditional) usually takes three bytes. A few rare characters outside the Basic Multilingual Plane take four.

3. "Unicode" encoding, which here means the two-byte UTF-16 (UCS-2) form: an English character takes two bytes, and a Chinese character (simplified or traditional) also takes two bytes.

Symbols: in double-byte encodings such as GBK, English (half-width) punctuation takes one byte and Chinese (full-width) punctuation takes two bytes. For example, the English period "." is 1 byte and the Chinese full stop "。" is 2 bytes (in UTF-8 the Chinese full stop takes 3 bytes).

4. GBK: a Chinese character takes two bytes and an English character takes one byte.
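The figures in the list above can be checked with a short Python sketch. The helper `byte_lengths` and the `ENCODINGS` list are illustrative names, and `utf-16-le` is chosen here as an assumption so that Python does not add a byte-order mark to the count.

```python
# Encodings discussed in the list; "utf-16-le" avoids counting a
# byte-order mark as part of the character.
ENCODINGS = ["ascii", "gbk", "utf-8", "utf-16-le"]


def byte_lengths(char: str) -> dict:
    """Map each encoding name to the number of bytes `char` needs in it.

    Encodings that cannot represent the character map to None.
    """
    result = {}
    for enc in ENCODINGS:
        try:
            result[enc] = len(char.encode(enc))
        except UnicodeEncodeError:
            result[enc] = None
    return result


# English letter, English period, Chinese character, Chinese full stop
for ch in ["A", ".", "\u4e2d", "\u3002"]:
    print(repr(ch), byte_lengths(ch))
```

For the Chinese character and the Chinese full stop, the `ascii` entry comes back as `None`, since plain ASCII has no code for them.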

Further information:

UTF-8 is a very common encoding. Because the number of bytes per character in UTF-8 is not fixed, you cannot determine the byte length of UTF-8 text from its number of Unicode characters alone.
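To make this concrete, the sketch below compares three strings that all contain four characters but occupy different numbers of bytes in UTF-8. The function name `utf8_size` is hypothetical.

```python
def utf8_size(text: str) -> tuple:
    """Return (number of characters, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


print(utf8_size("abcd"))                      # (4, 4)
print(utf8_size("\u7f16\u7801\u65b9\u5f0f"))  # (4, 12), four Chinese characters
print(utf8_size("a\u7f16b\u7801"))            # (4, 8), mixed text
```

The character count is the same in every case, so only encoding the text reveals its size in bytes.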

UTF-8 is a variable-length encoding. Characters in the upper half (128 to 255) of an extended ASCII character set such as ISO Latin-1 need only 1 byte there, but need 2 bytes in UTF-8.
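A quick way to see this cost is to encode the same character both ways. The sketch below assumes ISO Latin-1 as the extended ASCII set, and `latin1_vs_utf8` is an illustrative name only.

```python
def latin1_vs_utf8(char: str) -> tuple:
    """Return (bytes in ISO Latin-1, bytes in UTF-8) for one character."""
    return len(char.encode("latin-1")), len(char.encode("utf-8"))


print(latin1_vs_utf8("A"))        # (1, 1): plain ASCII is unchanged
print(latin1_vs_utf8("\u00e9"))   # (1, 2): e with acute accent
print("\u00e9".encode("latin-1")) # b'\xe9'
print("\u00e9".encode("utf-8"))   # b'\xc3\xa9'
```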

ISO Latin-1 is a subset of Unicode, but its byte encoding is not a subset of UTF-8. 8-bit UTF-8 data may be filtered by e-mail gateways, because Internet mail was originally designed for 7-bit ASCII. The UTF-7 encoding was created for this reason.
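The difference between the two encodings can be observed with Python's built-in `utf-8` and `utf-7` codecs. This is only a sketch: `is_7bit` is a helper invented for the example, and the sample text is arbitrary.

```python
def is_7bit(data: bytes) -> bool:
    """True if every byte fits in 7 bits (value below 0x80)."""
    return all(b < 0x80 for b in data)


text = "caf\u00e9"           # contains one non-ASCII character
utf8 = text.encode("utf-8")  # uses bytes with the high bit set
utf7 = text.encode("utf-7")  # restricted to 7-bit ASCII bytes

print(utf8, is_7bit(utf8))
print(utf7, is_7bit(utf7))
print(utf7.decode("utf-7") == text)
```

Only the UTF-7 form passes the 7-bit check, and it still decodes back to the original text.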

Bytes of the form 100xxxxx (0x80 to 0x9F) appear in more than 50% of UTF-8 representations, but existing systems based on ISO 2022, ISO 4873, ISO 6429 and ISO 8859 mistake them for C1 control codes. The UTF-7.5 encoding was proposed for this reason.
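To see which bytes are affected, the following sketch lists the UTF-8 bytes of a string that fall in the 100xxxxx range. The function `c1_range_bytes` is a hypothetical helper; it only inspects byte values and does not model how any of the ISO systems actually behave.

```python
def c1_range_bytes(text: str) -> list:
    """Return the UTF-8 bytes of `text` that lie in 0x80..0x9F (100xxxxx)."""
    return [b for b in text.encode("utf-8") if 0x80 <= b <= 0x9F]


print(c1_range_bytes("A"))       # []     : ASCII stays below 0x80
print(c1_range_bytes("\u00c0"))  # [128]  : U+00C0 encodes as C3 80
print(c1_range_bytes("\u4e00"))  # [128]  : U+4E00 encodes as E4 B8 80
```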
