The development and characteristics of MPEG video compression technology

1 The development of MPEG and its characteristics

1.1 MPEG-1

Prior to the advent of MPEG, two standards existed for image compression: JPEG for still-image data compression and H.261 for videotelephony and videoconferencing image compression. Both, however, were developed independently of computer data standards. A unified standard was therefore needed that would let computer systems and broadcast television share images, sound, storage, and transmission, facilitating the broad exchange of media of all kinds. MPEG came into being to fill this need.

The basic task of the MPEG-1 standard is to make moving images of appropriate quality (including the accompanying sound) a kind of computer data: compatible with existing data types (such as text and graphics) inside the computer, and compatible with existing computer networks and broadcast networks for transmission. The MPEG-1 standard has three components: MPEG video, MPEG audio, and MPEG systems. The problems MPEG addresses are therefore video compression, audio compression, and the multiplexing and synchronization of multiple compressed data streams. MPEG-1 is a coding standard for moving pictures and associated audio on digital storage media at data rates of about 1.5 Mbps. It can process various types of moving images; its basic algorithm works well on moving images with a spatial resolution of 352 pixels horizontally by 288 pixels vertically at 24 to 30 frames per second. Like JPEG, MPEG-1 does not define the detailed algorithms needed to generate a legal data stream, which leaves considerable flexibility in encoder design. A series of parameters defining the coded bitstream and the decoder are carried in the bitstream itself. Among other things, these features allow the algorithm to be used for images of different sizes and aspect ratios, and on channels and devices operating over a wide range of speeds.
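
As a concrete illustration of how the decoder parameters travel in the bitstream itself, the sketch below reads the fixed-length fields at the start of an MPEG-1 video sequence header (the 0x000001B3 start code followed by picture size, aspect ratio, picture rate, and bit rate). It is a minimal sketch, not a full parser; the field widths follow the published standard, but the helper names (BitReader, parse_sequence_header) are our own.

```python
# Minimal sketch: reading the fixed-length fields of an MPEG-1
# video sequence header. Helper names are illustrative only.

class BitReader:
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0  # position counted in bits

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value

def parse_sequence_header(data: bytes) -> dict:
    r = BitReader(data)
    assert r.read(32) == 0x000001B3       # sequence_header_code
    return {
        "width":        r.read(12),       # horizontal size in pixels
        "height":       r.read(12),       # vertical size in pixels
        "aspect_ratio": r.read(4),        # pel_aspect_ratio code
        "picture_rate": r.read(4),        # e.g. code 5 = 30 frames/s
        "bit_rate":     r.read(18) * 400, # stored in units of 400 bit/s
    }

hdr = bytes([0x00, 0x00, 0x01, 0xB3,      # start code
             0x16, 0x01, 0x20,            # 352 x 288
             0x15,                        # aspect code 1, rate code 5
             0x03, 0xA9, 0x80])           # 3750 * 400 = 1.5 Mbit/s
print(parse_sequence_header(hdr))
```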

MPEG-1 compression first subsamples the color-difference signals to reduce the amount of data; it then uses motion-compensation techniques to reduce inter-frame redundancy, applies the two-dimensional DCT to remove spatial correlation, and quantizes the DCT coefficients to discard perceptually unimportant information. The quantized DCT coefficients are reordered by frequency (zigzag scanning) and then variable-length coded; finally, the DC coefficient of each block is coded by predictive differential coding. A block diagram of MPEG video encoding and decoding is shown in Figure 1.
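
A minimal sketch of the intra-block part of this pipeline follows: 2D DCT of an 8x8 block, uniform quantization, and zigzag reordering. It assumes scipy is available; the single quantization step and the scan construction are deliberate simplifications, since real MPEG-1 uses an 8x8 weighting matrix and a standardized scan table.

```python
import numpy as np
from scipy.fft import dctn

def zigzag_order(n: int = 8) -> list:
    # Enumerate (row, col) indices of an n x n block in zigzag order:
    # diagonals of increasing frequency, alternating direction.
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def encode_block(block: np.ndarray, qstep: int = 16) -> list:
    # 2D DCT of an 8x8 pixel block, then uniform quantization.
    # Real MPEG-1 weights each coefficient with a quantizer matrix;
    # a single step size is used here to keep the sketch short.
    coeffs = dctn(block.astype(float), norm="ortho")
    quantized = np.round(coeffs / qstep).astype(int)
    # Reorder by frequency so runs of zeros group together, which is
    # what makes the variable-length coding stage effective.
    return [quantized[r, c] for r, c in zigzag_order()]

block = np.random.randint(0, 256, (8, 8))
print(encode_block(block)[:10])  # DC coefficient first, then low frequencies
```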


1.2 MPEG-2

The MPEG-2 standard is formally titled "generic coding of moving pictures and associated audio". It is mainly used for the video and audio signals required by high-definition television (HDTV), at transmission rates of around 10 Mbps.

The MPEG-2 standard is divided into nine parts, collectively referred to as the ISO/IEC 13818 international standard:

Part 1, Systems: describes how multiple video, audio, and data elementary streams are combined into transport streams and program streams.
Part 2, Video: describes the video coding method.
Part 3, Audio: describes an audio coding method that is backward compatible with the MPEG-1 audio standard.
Part 4, Conformance testing: describes how to test whether a coded bitstream conforms to Parts 1, 2, and 3 of MPEG-2.
Part 5, Software simulation: a software implementation of the standard.
Part 6, Digital Storage Media Command and Control (DSM-CC): describes the session signaling between server and user in interactive multimedia networks.
Part 7, Non-backward-compatible audio: specifies multichannel audio coding that is not backward compatible with MPEG-1 audio.
Part 8, 10-bit video: work on this part has been discontinued.
Part 9, Real-time interface: specifies a real-time interface for delivering the transport stream.

The MPEG-2 video coding standard is organized as a graded series of subsets. By the resolution of the coded image it defines four "levels": Low Level (LL), whose input signal has one quarter of the pixels of the ITU-R 601 format; Main Level (ML), whose input signal is ITU-R 601; High-1440 Level (H14L), the high-definition format for 4:3 television; and High Level (HL), the high-definition format for 16:9 television. By the set of coding tools used it defines five "profiles": Simple Profile (SP), which uses only intra-coded frames I and predicted frames P; Main Profile (MP), which adds bidirectionally predicted frames B to SP; SNR Scalable Profile (SNRP); Spatially Scalable Profile (SSP); and High Profile (HP). Particular combinations of "level" and "profile" constitute subsets of the MPEG-2 video coding standard for particular applications: for a given input image format, a specific set of compression coding tools is used to generate a coded bitstream within a specified rate range. The MPEG-2 coded stream is organized in six layers, from top to bottom: the video sequence layer (Sequence), the group-of-pictures layer (GOP), the picture layer (Picture), the slice layer (Slice), the macroblock layer (MacroBlock), and the block layer (Block).
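
The level constraints can be summarized compactly, as in the sketch below, which records commonly quoted Main Profile upper bounds for each level (maximum frame size, frame rate, and bit rate). Treat the exact figures as illustrative reference values rather than a restatement of the standard's normative conformance tables.

```python
# Commonly quoted Main Profile upper bounds per MPEG-2 level.
# Figures are illustrative; consult ISO/IEC 13818-2 for normative limits.
MPEG2_LEVELS = {
    #  level  (max width, max height, max fps, max Mbps)
    "LL":   (352,   288, 30,  4),   # quarter of ITU-R 601
    "ML":   (720,   576, 30, 15),   # ITU-R 601
    "H14L": (1440, 1152, 60, 60),   # 4:3 high definition
    "HL":   (1920, 1152, 60, 80),   # 16:9 high definition
}

def fits(level: str, w: int, h: int, fps: float, mbps: float) -> bool:
    mw, mh, mf, mb = MPEG2_LEVELS[level]
    return w <= mw and h <= mh and fps <= mf and mbps <= mb

print(fits("ML", 720, 576, 25, 6))   # True: a typical MP@ML broadcast signal
```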

The encoding process of MPEG-2 works as follows. In intra-frame coding, the coded picture passes only through the DCT, the quantizer, and the bitstream coder to produce the coded bitstream, without going through the prediction loop; the DCT is applied directly to the original image data. In inter-frame coding, the original image is first compared with the predicted image held in the frame memory to calculate a motion vector; the motion vector and the reference frame then generate a prediction of the original image. The difference image formed by subtracting the prediction from the original image is DCT-transformed, and the quantizer and bitstream coder then produce the output coded bitstream.
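
The inter-frame step can be illustrated with a brute-force block-matching search: for one block, find the displacement in the reference frame that minimizes the prediction error, then code only the residual. Real encoders use fast search strategies and sub-pixel refinement; this exhaustive version, with invented names (motion_search), is a sketch of the principle only.

```python
import numpy as np

def motion_search(ref: np.ndarray, cur: np.ndarray,
                  y: int, x: int, size: int = 16, radius: int = 7):
    """Exhaustive block matching: return the motion vector (dy, dx)
    minimizing the sum of absolute differences (SAD) for one block."""
    block = cur[y:y + size, x:x + size].astype(int)
    best, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + size > ref.shape[0] or rx + size > ref.shape[1]:
                continue
            sad = np.abs(block - ref[ry:ry + size, rx:rx + size].astype(int)).sum()
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv

# The residual (original minus motion-compensated prediction) is what
# goes on to the DCT and quantizer in inter-frame coding.
ref = np.random.randint(0, 256, (64, 64))
cur = np.roll(ref, (2, 3), axis=(0, 1))   # simulate camera/object motion
dy, dx = motion_search(ref, cur, 16, 16)
residual = cur[16:32, 16:32].astype(int) - ref[16 + dy:32 + dy, 16 + dx:32 + dx]
print((dy, dx), np.abs(residual).sum())   # near-zero residual energy
```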

1.3 MPEG-4

The goal of the MPEG-4 standard is to support a wide range of multimedia applications (with emphasis on access to multimedia content), allowing the decoder to be configured in the field according to the differing requirements of each application. MPEG-4 is designed to provide a flexible framework and an open set of coding tools for the communication, access, and management of video and audio data.

In the MPEG-4 image and video standard, the goal of the video representation tools is to provide standardized core technology for efficiently storing, transmitting, and managing texture, image, and video data in multimedia environments. Particular emphasis is placed on the ability of these tools to encode and decode atomic units of image and video content, called video objects (VOs). Efficiently representing video objects of arbitrary shape supports a so-called content-based feature set, in which the content of a scene (the physical objects, i.e. the VOs) can be encoded and decoded separately. This provides a powerful underlying mechanism for interactivity and creates favorable conditions for flexibly representing and managing VO content of images or video in the compressed domain. The MPEG-4 image and video standard unifies the encoding and decoding of traditional rectangular images and video with that of arbitrarily shaped objects. For content-based applications, the input image sequence may have any shape and position. The shape can be represented by an 8-bit transparency (alpha) component, used when one VO is composed of several other objects, or by a binary mask. In addition, the compression ratio of certain video sequences can be greatly improved by employing appropriate, carefully designed object-based motion prediction tools for each physical object in the scene. The content-based coding of the MPEG-4 extensions can be seen as a logical extension of the traditional VLBV (very-low-bit-rate video) core or HBV (higher-bit-rate video) tools from rectangular inputs to inputs of arbitrary shape; in this sense, content-based coding is a superset of the VLBV and HBV cores.
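
A minimal sketch of the arbitrary-shape idea: a video object plane is a texture array plus a shape mask, and only pixels inside the mask belong to the object. The names here (VideoObjectPlane, composite) are illustrative, not MPEG-4 API terms, and a binary mask stands in for the 8-bit alpha case.

```python
import numpy as np

class VideoObjectPlane:
    """One frame of a video object: texture plus shape.
    A binary mask is used here; MPEG-4 also allows an 8-bit
    grey-level alpha for partially transparent composition."""
    def __init__(self, texture: np.ndarray, shape: np.ndarray):
        assert texture.shape == shape.shape
        self.texture, self.shape = texture, shape.astype(bool)

def composite(background: np.ndarray, vop: VideoObjectPlane) -> np.ndarray:
    # Scene composition: object pixels overwrite the background
    # only where the shape mask says the object exists.
    out = background.copy()
    out[vop.shape] = vop.texture[vop.shape]
    return out

bg = np.zeros((8, 8), dtype=np.uint8)
tex = np.full((8, 8), 200, dtype=np.uint8)
mask = np.zeros((8, 8)); mask[2:6, 2:6] = 1   # a 4x4 square object
print(composite(bg, VideoObjectPlane(tex, mask)))
```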

The MPEG-4 standard adds seven new features beyond the earlier standards. The added features are as follows:

(1) Content-based manipulation and bitstream editing. Content-based manipulation and editing of the bitstream can be performed without transcoding.

(2) Mixed coding of natural and synthetic data. MPEG-4 provides a way to combine natural video images efficiently with synthetic data (text, graphics) while supporting interactive operation.

(3) Enhanced temporal random access. MPEG-4 provides an efficient random-access method: within a limited time interval, an audio-visual sequence can be accessed randomly at the granularity of frames or of arbitrarily shaped objects.

(4) Improved coding efficiency. At comparable bit rates, the MPEG-4 standard provides better subjective visual quality than existing standards and standards then being formed.

(5) Coding of multiple concurrent data streams. MPEG-4 provides efficient coding of multiple views of a scene, together with multichannel audio coding and effective audio-visual synchronization. In stereoscopic video applications, MPEG-4 exploits the information redundancy among multiple views of the same scene so that, given sufficient observation viewpoints, three-dimensional natural scenes can be described efficiently.

(6) Error resilience in error-prone environments. "Flexible and diverse access" covers a variety of wired and wireless networks and storage media. MPEG-4 improves the ability to resist errors, especially in environments prone to severe errors such as low-bit-rate mobile communication links. MPEG-4 is the first standard to consider channel characteristics in its audio-visual representation specification; the purpose is not to replace the error-control techniques already provided by communication networks, but to provide robustness against residual errors.

(7) Content-based scalability. Content-based scalability means assigning priorities to the individual objects in an image. It is at the core of MPEG-4 because, once the catalogue of objects contained in an image and their corresponding priorities are determined, the other content-based functions are easier to implement. For very-low-bit-rate applications, scalability can become a critical factor because it provides the ability to adapt to the available resources; a sketch of this idea follows the list.
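
As promised under feature (7), here is a sketch of content-based scalability: objects carry priorities, and under a bit budget the encoder keeps high-priority objects at full quality while degrading or dropping the rest. All names and numbers below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    priority: int   # higher = more important to the viewer
    bits: int       # cost of coding this object at full quality

def allocate(objects: list, budget: int) -> list:
    """Greedy allocation: code objects in priority order until the
    bit budget runs out; the remainder are dropped or coarsened."""
    kept, spent = [], 0
    for obj in sorted(objects, key=lambda o: -o.priority):
        if spent + obj.bits <= budget:
            kept.append(obj.name)
            spent += obj.bits
    return kept

scene = [SceneObject("speaker", 3, 6000),
         SceneObject("logo", 2, 1000),
         SceneObject("background", 1, 9000)]
print(allocate(scene, 8000))   # ['speaker', 'logo']: background dropped
```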

The above seven new features fall into three categories: content-based interactivity, high compression ratio, and flexible, diverse access. The first three functions concern content-based interactivity, the fourth and fifth concern high compression, and the last two concern flexible and diverse access.

1.4 MPEG-7

The MPEG-7 standard, known as the "Multimedia Content Description Interface", extends the limited capabilities of existing solutions for identifying content, in particular by covering more data types. In other words, MPEG-7 specifies a standard set of descriptors for describing various types of multimedia information.

MPEG-7 also standardizes methods for defining other descriptors, as well as structures of descriptors and their relationships (called description schemes). Such descriptions (combinations of descriptors and description schemes) are associated with the content itself so that material of interest to the user can be searched for quickly and efficiently. MPEG-7 further standardizes a language for defining description schemes, the Description Definition Language (DDL). Audio-visual material associated with MPEG-7 data can then be indexed and retrieved.
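
To make the descriptor/description-scheme distinction concrete, here is a small sketch modeling a description scheme as structured metadata attached to a clip. The class and field names are invented for illustration; real MPEG-7 descriptions are defined in the DDL (an XML Schema based language), not in Python.

```python
from dataclasses import dataclass, field

# Illustrative only: these classes just mirror the structure of a
# descriptor (a single feature) versus a description scheme (a
# structured combination of descriptors and relationships).

@dataclass
class ColorDescriptor:          # a low-level descriptor
    dominant_rgb: tuple

@dataclass
class SegmentDescription:       # a description scheme: descriptors
    start_s: float              # plus temporal structure
    end_s: float
    color: ColorDescriptor
    keywords: list = field(default_factory=list)

@dataclass
class ClipDescription:
    title: str
    segments: list = field(default_factory=list)

clip = ClipDescription("evening-news", [
    SegmentDescription(0.0, 12.5, ColorDescriptor((30, 30, 120)),
                       ["studio", "anchor"]),
])

# A retrieval system can now match queries against the description
# instead of decoding the audio-visual material itself.
matches = [s for s in clip.segments if "anchor" in s.keywords]
print(len(matches))
```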

MPEG-7, like the other members of the MPEG family, is a standardized representation of audio-visual information satisfying particular needs. MPEG-7 descriptions do not depend on how the described material is encoded or stored: an MPEG-7 description could be attached to an analogue film or even printed on paper. However, although the MPEG-7 description does not depend on the (coded) representation of the material being processed, the standard is to some extent built on MPEG-4, which encodes audio-visual data as objects with temporal and spatial relationships; with MPEG-4 encoding it is therefore possible to attach descriptions to individual elements (objects) within a scene. Accordingly, MPEG-7 provides different degrees of granularity in its descriptions to enable different levels of identification.

Because description features must be meaningful in the application environment, they will vary with the scope of the user and the application domain. This means that the same material may be described using different types of features, matched to the application. All of these descriptions, of course, are coded in an efficient manner that favors efficient search. Between the lowest and highest levels of description there may be intermediate levels of abstraction; the level of abstraction is related to the way features are extracted. Many low-level features can be extracted fully automatically, whereas high-level features require far more human interaction.
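
As an example of a low-level feature that can be extracted fully automatically, the sketch below computes a coarse color histogram for a frame; a high-level feature such as "who appears in the shot" would still need human annotation. The binning choice here is arbitrary.

```python
import numpy as np

def color_histogram(frame: np.ndarray, bins: int = 4) -> np.ndarray:
    """Coarse RGB histogram: an automatically extractable low-level
    feature. frame has shape (H, W, 3) with 8-bit channels."""
    quantized = (frame // (256 // bins)).astype(int)   # bin index per channel
    flat = (quantized[..., 0] * bins + quantized[..., 1]) * bins + quantized[..., 2]
    hist = np.bincount(flat.ravel(), minlength=bins ** 3)
    return hist / hist.sum()                           # normalize for comparison

frame = np.random.randint(0, 256, (48, 64, 3), dtype=np.uint8)
print(color_histogram(frame).shape)                    # (64,) feature vector
```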

2 MPEG's future prospects

The MPEG video compression system is complex and highly integrated; only a few powerful companies in the world can bring commercial products to market. Because the technology is complicated and the equipment expensive, adoption has so far been limited. However, with advances in technology, the maturing of manufacturing processes, and falling prices, its applications are expanding. What once cost millions of dollars can now be realized for tens of thousands. With such an MPEG video compression system, one can easily compress videos, photographs, pictures, films, and other programs into various video productions, from live video to file management.

MPEG has developed a series of standards, but in many cases no specific implementation is given; the final implementation is left to the various vendors and developers. Research on MPEG therefore focuses on two aspects: (1) research on implementing MPEG; (2) further study of image compression methods that achieve larger compression ratios and realize human-machine interaction.

Looking at the current MPEG standards, the author believes the main focus will be on object-based processing, with different methods selected according to the situation, the content, and the requirements. First, this serves the requirement of human-machine interaction and the people-oriented purpose: each user can request different processing according to their own needs. Second, it serves the requirement of achieving still larger image-data compression ratios. Earlier compression methods, based on the data itself and its statistical properties, can hardly keep up with the data rates of the information highway, whereas object-based processing, such as model-based compression, can apply different compression methods to different objects (contents), thereby achieving very large compression ratios while meeting human visual requirements. This issue has already been noted in the MPEG-4 and MPEG-7 standards, which introduce the notion of objects, or what is called content. The author therefore believes that object-based image processing will be the future direction of MPEG.

MPEG video compression technology and VCD production have opened a new path of development for us. The promotion and application of MPEG video compression technology may give rise to a new industry, namely multimedia production. The market in this area has only just started, and it is almost blank in education and training; it is an industry with great development potential that awaits exploitation. The future is an information society, and the transmission and storage of multimedia data of all kinds is the basic problem of information processing. This article has elaborated only on the MPEG standards; many technologies remain to be researched and developed, and the author hopes that interested researchers will explore them together.
