142nd meeting of MPEG

The 142nd meeting of MPEG took place in Antalya from 2023-04-24 until 2023-04-28. Find more information here.

MPEG issues Call for Proposals for Feature Coding for Video Coding for Machines (FCVCM)

At the 142nd MPEG meeting, MPEG Technical Requirements (WG 2) issued a Call for Proposals (CfP) for technologies and solutions enabling efficient feature compression for video coding for machine vision tasks.

This work on “Feature Coding for Video Coding for Machines (FCVCM)” aims at compressing intermediate features within neural networks for machine tasks. As applications for neural networks become more prevalent and the neural networks increase in complexity, use cases such as computational offload become more relevant to facilitate widespread deployment of applications utilizing such networks.

Initially as part of the “Video Coding for Machines” activity, over the last four years, MPEG has investigated potential technologies for efficient compression of feature data encountered within neural networks. This activity has resulted in establishing a set of ‘feature anchors’ that demonstrate the achievable performance for compressing feature data using state-of-the-art standardized technology. These feature anchors include tasks performed on four datasets.

This CfP welcomes submissions of proposals from companies and other organizations. Registration is required by the 3rd of July 2023; the submission of bitstream files, results, and decoder packages is required by the 13th of September 2023; and the submission of proponent documentation is due by the 9th of October 2023. Evaluation of the submissions in response to the CfP will be performed at the 144th MPEG meeting in October 2023.

Companies and organizations that have developed FCVCM technologies are kindly invited to bring such information in response to this CfP by contacting Dr. Igor Curcio, MPEG Technical Requirements Convenor at igor.curcio@nokia.com. The CfP is available at https://www.mpeg.org/.

MPEG finalizes the 9th Edition of MPEG-2 Systems

At the 142nd MPEG meeting, MPEG Systems (WG 3) ratified the 9th edition of its Emmy® award-winning standard ISO/IEC 13818-1 MPEG-2 Systems. The new edition adds support for Low Complexity Enhancement Video Coding (LCEVC), the youngest in the MPEG family of video coding standards, on top of more than 50 media stream types, including, but not limited to, 3D Audio and Versatile Video Coding (VVC). The new edition also supports new options for signaling different kinds of media, which can aid the selection of the best audio or other media tracks for specific purposes or user preferences. As an example, it can indicate that a media track provides information about a current emergency.

MPEG reaches the First Milestone for Storage and Delivery of Haptics Data

At the 142nd MPEG meeting, MPEG Systems (WG 3) reached the first milestone for ISO/IEC 23090-32 entitled “Carriage of haptics data” by promoting the text to Committee Draft (CD) status. This specification enables the storage and delivery of haptics data (defined by ISO/IEC 23090-31) in the ISO Base Media File Format (ISOBMFF; ISO/IEC 14496-12). Because haptics data is composed of spatial and temporal components, a data unit containing various spatial or temporal data packets is used as the basic entity, analogous to an access unit of audio-visual media. Additionally, an explicit indication of a silent period, which reflects the sparse nature of haptics data, has been introduced in this draft. The standard is planned to be completed, i.e., to reach the status of Final Draft International Standard (FDIS), by the end of 2024.

MPEG completes 2nd Edition of Neural Network Coding (NNC)

At the 142nd MPEG meeting, MPEG Video Coding (WG 4) completed the development of the second edition of Neural Network Coding (NNC; ISO/IEC 15938-17), promoting it to the Final Draft International Standard (FDIS) stage.

Many applications of artificial neural networks for multimedia analysis and processing (e.g., visual and acoustic classification, extraction of multimedia descriptors, or image and video coding) utilize edge-based content processing or federated training. The trained neural networks for these applications contain many parameters (weights), resulting in a considerable size. Therefore, the MPEG standard for the compressed representation of neural networks for multimedia content description and analysis (NNC, ISO/IEC 15938-17, published in 2022) was developed, which provides a broad set of technologies for parameter reduction and quantization to compress entire neural networks efficiently.

Recently, an increasing number of artificial intelligence applications, such as edge-based content processing, content-adaptive video post-processing filters, or federated training, need to exchange updates of neural networks (e.g., after training on additional data or fine-tuning to specific content). Such updates include changes of the neural network parameters but may also involve structural changes in the neural network (e.g., when extending a classification method with a new class). In scenarios like federated training, these updates must be exchanged frequently, such that much more bandwidth over time is required, e.g., in contrast to the initial deployment of trained neural networks.
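To make the notion of an update concrete, the following is a minimal sketch, not the NNC coding scheme itself. It assumes a model is a plain mapping from hypothetical parameter names to lists of values, and it describes an updated model as per-parameter differences plus structural changes (added, resized, or removed tensors).

```python
from typing import Dict, List

# Hypothetical model representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def make_update(base: Params, new: Params) -> dict:
    """Describe `new` relative to `base`: value differences plus structural changes."""
    changed, added = {}, {}
    for name, values in new.items():
        if name not in base or len(base[name]) != len(values):
            # New or resized tensor (structural change): carry it in full.
            added[name] = list(values)
        else:
            diff = [n - b for n, b in zip(values, base[name])]
            if any(d != 0.0 for d in diff):
                changed[name] = diff
    removed = [name for name in base if name not in new]
    return {"changed": changed, "added": added, "removed": removed}


def apply_update(base: Params, update: dict) -> Params:
    """Reconstruct the updated model from the base model and an update."""
    result = {k: list(v) for k, v in base.items() if k not in update["removed"]}
    for name, diff in update["changed"].items():
        result[name] = [b + d for b, d in zip(result[name], diff)]
    for name, values in update["added"].items():
        result[name] = list(values)
    return result


# Illustrative values only: fine-tuning changes a bias, and a new class
# extends the final layer by one weight.
base_model = {"conv.weight": [0.5, -0.25], "conv.bias": [1.0], "fc.weight": [1.0, 2.0]}
new_model = {"conv.weight": [0.5, -0.25], "conv.bias": [1.5], "fc.weight": [1.0, 2.0, 3.0]}
update = make_update(base_model, new_model)
```

Unchanged tensors such as `conv.weight` do not appear in the update at all, which is one reason updates can be far smaller than the full model.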

The second edition of NNC addresses these applications through efficient representation and coding of incremental updates and by extending the set of compression tools that can be applied to both entire neural networks and updates. Trained models can be compressed to 10-20% of their original size and, for several architectures, even to below 3%, without performance loss. Higher compression rates are possible at moderate performance degradation. In a distributed training scenario, a model update after a training iteration can be represented at 1% or less of the base model size on average without sacrificing the classification performance of the neural network. NNC also provides synchronization mechanisms, particularly for distributed artificial intelligence scenarios, e.g., if clients in a federated learning environment drop out and later rejoin.
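As a rough illustration of what these percentages mean for bandwidth, the sketch below works through the arithmetic for a hypothetical 100 MB model and 50 training rounds. The model size, the round count, and the function names are assumptions for the example; only the percentages come from the figures quoted above.

```python
def compressed_size(original_bytes: int, percent: int) -> int:
    """Size in bytes after compressing to `percent` % of the original size."""
    return original_bytes * percent // 100


def traffic_with_updates(original_bytes: int, rounds: int,
                         model_percent: int, update_percent: int) -> int:
    """One compressed base model, then one compressed update per round."""
    base = compressed_size(original_bytes, model_percent)
    per_update = compressed_size(original_bytes, update_percent)
    return base + rounds * per_update


def traffic_with_full_models(original_bytes: int, rounds: int,
                             model_percent: int) -> int:
    """Initial compressed model, then a full compressed model every round."""
    return (rounds + 1) * compressed_size(original_bytes, model_percent)


MODEL_BYTES = 100_000_000  # hypothetical 100 MB uncompressed model
ROUNDS = 50                # hypothetical number of training iterations

with_updates = traffic_with_updates(MODEL_BYTES, ROUNDS, 10, 1)   # 60_000_000
full_models = traffic_with_full_models(MODEL_BYTES, ROUNDS, 10)   # 510_000_000
```

Under these assumptions, exchanging updates at 1% of the base model size costs 60 MB in total, against 510 MB when a model compressed to 10% is resent every round.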

A second edition of the corresponding conformance guidelines and reference software (ISO/IEC 15938-18) is under preparation.

MPEG completes Verification Test Report and Conformance and Reference Software for MPEG Immersive Video

At the 142nd MPEG meeting, MPEG Video Coding (WG 4) issued the verification test report of ISO/IEC 23090-12 MPEG immersive video (MIV) and completed the development of the conformance and reference software for MIV (ISO/IEC 23090-23), promoting it to the Final Draft International Standard (FDIS) stage.

MIV was developed to support the compression of immersive video content, in which multiple real or virtual cameras capture a real or virtual 3D scene. The standard enables the storage and distribution of immersive video content over existing and future networks for playback with 6 degrees of freedom (6DoF) of view position and orientation. MIV is a flexible standard for multi-view video plus depth (MVD) and multi-planar video (MPI) that leverages strong hardware support for commonly used video formats to compress volumetric video. The standard includes the MIV Main profile for MVD, the MIV Extended profile, which enables MPI, and the MIV Geometry Absent profile, which is suitable for use with cloud-based and decoder-side depth estimation.

A formal subjective quality evaluation with naïve test subjects watching pre-defined pose traces in an immersive scene was performed for the verification test report. On average, MIV demonstrates a clear benefit over the previous state-of-the-art MPEG video standard for coding multiple views (i.e., the multi-view extension of HEVC (MV-HEVC)).

ISO/IEC 23090-23 specifies how to conduct conformance tests and provides reference encoder and decoder software for MIV. This draft includes 23 verified and validated conformance bitstreams spanning all profiles and encoding and decoding reference software based on version 15.1.1 of the test model for MPEG immersive video (TMIV). The test model, objective metrics, and other tools are publicly available at https://gitlab.com/mpeg-i-visual. Finally, a real-time decoding and rendering demo of MIV content on a smartphone was shown at the meeting.

MPEG finalizes work on metadata-based MPEG-D DRC Loudness Leveling

At the 142nd MPEG meeting, MPEG Audio Coding (WG 6) completed the development of ISO/IEC 23003-4:2020/Amd 2, Loudness leveling, promoting it to the Final Draft Amendment (FDAM) stage. This new amendment includes the specification of metadata-based loudness leveling for live workflows. The technology offers producers of live content, such as sports broadcasts and concerts, an alternative way to integrate loudness leveling into their existing workflows seamlessly. This new metadata-based approach provides an attractive method for high-quality loudness processing while retaining flexibility and control in playback devices. The technology can be tightly integrated with MPEG-D USAC, MPEG-H 3D Audio, or other audio codecs supporting MPEG-D DRC.

The Final Draft Amendment also includes conformance bitstreams for testing devices for compliance with the new technology, as well as a reference software implementation that can be used as a basis for building products that include MPEG-D DRC loudness leveling.

MPEG White Papers

At the 142nd MPEG meeting, MPEG Liaison and Communication (AG 3) approved the following two MPEG white papers, which are available at https://www.mpeg.org/whitepapers/.

White paper on Geometry-based Point Cloud Compression (G-PCC)

The MPEG-I standard aims to provide standardized solutions for encoding, encapsulation, and delivery of immersive media. Geometry-based Point Cloud Compression (G-PCC) provides a standard for coded representation of point cloud media. Point clouds may be created in various manners. Recently, 3D sensors such as Light Detection And Ranging (LiDAR) or Time of Flight (ToF) devices have been widely used to scan dynamic 3D scenes. To precisely describe 3D objects or real-world scenes, point clouds come with a large set of points in the 3D space with geometry information and attribute information. The geometry information represents the 3D coordinates of each point in the point cloud; the attribute information describes the characteristics (e.g., colour and reflectance) of each point. Point clouds require a large amount of data, bringing huge challenges to data storage and transmission.
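The split between geometry and attribute information, and the resulting data volume, can be sketched as follows. This is a hypothetical uncompressed in-memory layout for illustration, not the G-PCC bitstream format; the field widths, point count, and frame rate are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Point:
    position: Tuple[float, float, float]  # geometry: x, y, z coordinates
    colour: Tuple[int, int, int]          # attribute: R, G, B
    reflectance: int                      # attribute: e.g. LiDAR return intensity


def raw_size_bytes(num_points: int,
                   bytes_per_coordinate: int = 4,  # assumed 32-bit coordinates
                   colour_bytes: int = 3,          # assumed 8 bits per channel
                   reflectance_bytes: int = 2) -> int:  # assumed 16-bit value
    """Uncompressed size of a point cloud frame under the assumed field widths."""
    per_point = 3 * bytes_per_coordinate + colour_bytes + reflectance_bytes
    return num_points * per_point


sample = Point(position=(1.0, 2.5, -0.75), colour=(255, 128, 0), reflectance=512)

frame_bytes = raw_size_bytes(1_000_000)  # 17_000_000 bytes for one frame
per_second = frame_bytes * 30            # 510_000_000 bytes/s at 30 frames/s
```

With these assumed widths, a dynamic scene of one million points per frame at 30 frames per second already amounts to roughly 510 MB of raw data per second, which illustrates the storage and transmission challenge.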

White paper on Coding of Genomic Annotations

The introduction of high-throughput DNA sequencing has led to the generation of large quantities of genomic sequencing data that must be stored, transferred and analyzed. Parts 1 to 5 of the ISO/IEC 23092 family of standards address the problem of efficient representation, compression and transport of genome sequencing data. Once the sequencing data is available, an important use of it is its association with the analysis results generated by genomic processing pipelines and with the information added by analysts. Analysis results and additional information are referred to as “genomic annotations”. The newest ISO/IEC 23092 standard, Part 6, addresses the need to provide compressed representations of genomic annotations linked to the compressed representation of raw sequencing data and metadata.

By doing this, ISO/IEC 23092 Part 6 extends the MPEG Genomics standard to incorporate not only primary (raw sequencing data) and secondary (aligned sequencing data) genomic data but also tertiary genomic data, including variant calls, gene expressions, mapping statistics, contact matrices (e.g., Hi-C), genomic tracks information and functional annotations. These are collectively called Annotation Data in the ISO/IEC 23092 standard and are supported with efficient compression, indexing, and searching capabilities. The extended format also includes advanced features, including selective encryption and signing of the data, auditing support, data provenance information, traceability and support for direct linkage to external clinical data repositories expressed in common standard formats.