Complete Video Compression Guide
This guide starts with a basic discussion of analog and digital video, continues with the principles of video compression, and concludes with a description of three compression methods designed specifically for video, namely MPEG-1, MPEG-4, and H.261.
Analog Video
An analog video camera converts the image it “sees” through its lens to an electric voltage (a signal) that varies with time according to the intensity and color of the light emitted from the different image parts. Such a signal is called analog, since it is analogous (proportional) to the light intensity. The best way to understand this signal is to see how a television receiver responds to it.
The CRT
A television receiver (a CRT, or cathode ray tube) is a glass tube with a familiar shape. In the back it has an electron gun (the cathode) that emits a stream of electrons. Its front surface is positively charged, so it attracts the electrons (which have a negative electric charge). The front is coated with a phosphor compound that converts the kinetic energy of the electrons hitting it to light. The flash of light lasts only a fraction of a second, so in order to get a constant display, the picture has to be refreshed several times a second. The actual refresh rate depends on the persistence of the compound. For certain types of work, such as architectural drawing, long persistence is acceptable. For animation, short persistence is a must.
The early pioneers of motion pictures found, after much experimentation, that the minimum refresh rate required for smooth animation is 15 pictures (or frames) per second (fps), so they adopted 16 fps as the refresh rate for their cameras and projectors. However, when movies began to show fast action (such as in westerns), the motion picture industry decided to increase the refresh rate to 24 fps, a rate that is used to this day. At a certain point it was discovered that this rate can artificially be doubled, to 48 fps (which produces smoother animation), by projecting each frame twice. This is done by employing a double-blade rotating shutter in the movie projector. The shutter exposes a picture, covers it, and exposes it again, all in 1/24 of a second, thereby achieving an effective refresh rate of 48 fps. Modern movie projectors have very bright lamps and can even use a triple-blade shutter, for an effective refresh rate of 72 fps.
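The shutter arithmetic above can be checked with a short sketch. The function name and parameters are hypothetical, chosen only for illustration; the sketch assumes that each blade of the shutter produces exactly one exposure of every film frame.

```python
def effective_flash_rate(film_fps: float, shutter_blades: int) -> float:
    """Exposures per second when each film frame is shown once per shutter blade."""
    if shutter_blades < 1:
        raise ValueError("a shutter needs at least one blade")
    return film_fps * shutter_blades


# Single-, double-, and triple-blade shutters at the standard 24 fps film rate.
for blades in (1, 2, 3):
    print(blades, "blade(s):", effective_flash_rate(24, blades), "exposures per second")
```

The film itself still advances at 24 frames per second in every case; only the number of times each frame is flashed on the screen changes.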
The frequency of electric current in Europe is 50 Hz, so television standards used there, such as PAL and SECAM, employ a refresh rate of 25 fps. This is convenient for transmitting a movie on television. The movie, which was filmed at 24 fps, is shown at 25 fps, an undetectable difference. The frequency of electric current in the United States is 60 Hz, so when television arrived, in the 1930s, it used a refresh rate of 30 fps. When color was added, in 1953, that rate was decreased by 0.1%, to 29.97 fps, because of the need for precise separation of the video and audio signal carriers. Because of interlacing, a complete television picture (a frame) is made of two fields, so a refresh rate of 29.97 pictures per second requires a rate of 59.94 fields per second.
It turns out that the refresh rate for television should be higher than the rate for movies. A movie is normally watched in darkness, whereas television is watched in a lighted room, and human vision is more sensitive to flicker under conditions of bright illumination. This is why 30 (or 29.97) fps is better than 25.
The electron beam can be turned off and on very rapidly. It can also be deflected horizontally and vertically by two pairs (X and Y) of electrodes. Displaying a single point on the screen is done by turning the beam off, moving it to the part of the screen where the point should appear, and turning it on. This is done by special hardware in response to the analog signal received by the television set. The signal instructs the hardware to turn the beam off, move it to the top-left corner of the screen, turn it on, and sweep a horizontal line on the screen. While the beam is swept horizontally along the top scan line, the analog signal is used to adjust the beam’s intensity according to the image parts being displayed.
At the end of the first scan line, the signal instructs the television hardware to turn the beam off, move it back and slightly down, to the start of the third (not the second) scan line, turn it on, and sweep that line. Moving the beam to the start of the next scan line is known as a retrace. The time it takes to retrace is the horizontal blanking time.
This way, one field of the picture is created on the screen line by line, using just the odd-numbered scan lines. At the end of the last line, the signal contains instructions for a vertical (frame) retrace. This turns the beam off and moves it to the start of the next field (the second scan line) to scan the field of even-numbered scan lines. The time it takes to do the vertical retrace is the vertical blanking time. The picture is therefore created in two fields that together make a frame. The picture is said to be interlaced. This process is repeated several times each second, to refresh the picture. This order of scanning (left to right, top to bottom, with or without interlacing) is called raster scan. The word raster is derived from the Latin rastrum, meaning rake, since this scan is done in a pattern similar to that left by a rake on a field.
A consumer television set uses one of three international standards. The standard used in the United States is called NTSC (National Television System Committee), although the new digital standard is fast becoming popular. NTSC specifies a television transmission of 525 lines (today, this would be 2^9 = 512 lines, but since television was developed before the advent of computers with their preference for binary numbers, the NTSC standard has nothing to do with powers of two). Because of vertical blanking, however, only 483 lines are visible on the screen. Since the aspect ratio (width/height) of a television screen is 4:3, each line has a size of 4/3 of 483 = 644 pixels.
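The line-width arithmetic can be reproduced with a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes square picture elements, so that the horizontal count follows directly from the number of visible lines and the aspect ratio.

```python
from fractions import Fraction


def pixels_per_line(visible_lines: int, aspect_ratio: Fraction) -> Fraction:
    """Equivalent number of square pixels per scan line (width = height * aspect)."""
    return visible_lines * aspect_ratio


# 483 visible NTSC lines on a 4:3 screen.
width = pixels_per_line(483, Fraction(4, 3))
print("483 visible lines x", width, "pixels per line")
```

Exact fractions are used instead of floating-point numbers so that the ratio 4/3 introduces no rounding error into the result.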
The resolution of a standard television set is thus 483×644. This may be considered at best medium resolution. (This is the reason why text is so hard to read on a standard television.)
The aspect ratio of 4:3 was selected by Thomas Edison when he built the first movie cameras and projectors, and was adopted by early television in the 1930s. In the 1950s, after many tests on viewers, the movie industry decided that people prefer larger aspect ratios and started making wide-screen movies, with aspect ratios of 1.85 or higher. Influenced by that, the developers of digital video opted for the large aspect ratio of 16:9.
Image format                  Aspect ratio
NTSC, PAL, and SECAM TV       1.33
16 mm and 35 mm film          1.33
HDTV                          1.78
Widescreen film               1.85
70 mm film                    2.10
Cinemascope film              2.35
The concept of pel aspect ratio is also useful and should be mentioned. We usually think of a pel (or a pixel) as a mathematical dot, with no dimensions and no shape. In practice, however, pels are printed or displayed, so they have shape and dimensions. The use of a shadow mask (see below) creates circular pels, but computer monitors normally display square or rectangular pixels, thereby creating a crisp, sharp image (because square or rectangular pixels completely fill up space).
It should be emphasized that analog television does not display pixels. When a line is scanned, the beam’s intensity is varied continuously. The picture is displayed line by line, but each line is continuous. The image displayed by analog television is, consequently, sampled only in the vertical dimension.
NTSC also specifies a refresh rate of 59.94 (or 60/1.001) fields per second and can be summarized by the notation 525/59.94/2:1, where the 2:1 indicates interlacing. The notation 1:1 indicates progressive scanning (not the same as progressive image compression). The PAL television standard (phase alternating line), used in Europe and Asia, is summarized by 625/50/2:1. The quantity 262.5×59.94 = 15,734.25 Hz (about 15.734 kHz) is called the line rate of the 525/59.94/2:1 standard. This is the product of the number of lines per field (262.5, half of the 525 lines in a frame) and the field rate (the number of fields per second).
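The lines/rate/interlace notation lends itself to a small worked example. The class below is a hypothetical container written for this illustration, not part of any standard or library. It takes the rate in the notation to be the number of vertical scans (fields) per second, derives the frame rate and the line rate in Hz (scan lines per second), and lists which lines belong to each field. As a simplification it assigns whole lines to fields, whereas a real 525-line field contains 262.5 lines.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ScanStandard:
    """Hypothetical holder for a lines/rate/interlace triple such as 525/59.94/2:1."""
    total_lines: int      # scan lines in one complete frame, e.g. 525
    field_rate: Fraction  # vertical scans (fields) per second, e.g. 59.94
    interlace: int        # 2 for 2:1 interlacing, 1 for progressive (1:1)

    def frame_rate(self) -> Fraction:
        """Complete pictures per second."""
        return self.field_rate / self.interlace

    def line_rate(self) -> Fraction:
        """Scan lines drawn per second, in Hz."""
        return self.total_lines * self.frame_rate()

    def field_lines(self, field_index: int) -> range:
        """1-based line numbers scanned in one field (0 is the first field)."""
        if not 0 <= field_index < self.interlace:
            raise ValueError("field_index must be smaller than the interlace factor")
        return range(1 + field_index, self.total_lines + 1, self.interlace)


ntsc = ScanStandard(525, Fraction("59.94"), 2)
pal = ScanStandard(625, Fraction(50), 2)

print("525/59.94/2:1 line rate:", float(ntsc.line_rate()), "Hz")
print("625/50/2:1 line rate:", float(pal.line_rate()), "Hz")
print("first field starts with lines", list(ntsc.field_lines(0))[:3])
print("second field starts with lines", list(ntsc.field_lines(1))[:3])
```

With the rounded rate 59.94 the result matches the 15,734.25 figure quoted in the text. Substituting the exact rate `Fraction(60000, 1001)` gives a slightly different value of about 15,734.27 Hz.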
It should be mentioned that NTSC and PAL are standards for color encoding. They specify how to encode the color into the analog black-and-white video signal. However, for historical reasons, television systems using 525/59.94 scanning normally employ NTSC color coding, whereas television systems using 625/50 scanning normally employ PAL color coding. This is why 525/59.94 and 625/50 are loosely called NTSC and PAL, respectively.
A word on color: Most color CRTs today use the shadow mask technique. They have three guns emitting three separate electron beams. Each beam is associated with one color, but the beams themselves, of course, consist of electrons and do not have any color. The beams are adjusted such that they always converge a short distance behind the screen. By the time they reach the screen they have diverged a bit, and they strike a group of three different (but very close) points called a triad.
The screen is coated with dots made of three types of phosphor compounds that emit red, green, and blue light, respectively, when excited. At the plane of convergence there is a thin, perforated metal screen: the shadow mask. When the three beams converge at a hole in the mask, they pass through, diverge, and hit a triad of points coated with different phosphor compounds. The points glow at the three colors, and the observer sees a mixture of red, green, and blue whose precise color depends on the intensities of the three beams. When the beams are deflected a little, they hit the mask and are absorbed. After some more deflection, they converge at another hole and hit the screen at another triad.
At a screen resolution of 72 dpi (dots per inch) we expect 72 ideal, square pixels per inch of screen. Each pixel should be a square of side 25.4/72 ≈ 0.353 mm. However, as Figure 6.4a shows, each triad produces a wide circular spot, with a diameter of 0.63 mm, on the screen. These spots overlap heavily, and each affects the perceived colors of its neighbors.
When watching television, we tend to position ourselves at a distance from which it is comfortable to watch. When watching from a greater distance we miss some details, and when watching closer, the individual scan lines are visible. Experiments show that the comfortable viewing distance is determined by the rule: The smallest detail that we want to see should subtend an angle of about one minute of arc (1/60 of a degree). We denote by P the height of the image and by L the number of scan lines. The relation between