Zhao Chun1 and Jeon Byeungwoo1

1 Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, Korea
{zhaochun83, bjeon}@skku.edu
            
            
            Copyright © The Institute of Electronics and Information Engineers(IEIE)
            
            
            
            
            
               
                  
Keywords: Light field representation, Light field coding, All-in-focus image, Depth map, Focal stack reconstruction
             
            
          
         
            
                  1. Introduction
Light field (LF) cameras can capture the light coming from every point of a scene in all directions [1]. The rich data make it possible to realize various applications [2,3], such as refocusing, depth estimation, viewing-angle change, and three-dimensional (3D) object reconstruction. With the recent explosive interest in implementing and improving augmented reality (AR) and virtual reality (VR) systems [4], demand is increasing for rich information to provide more realistic visual experiences. While the light field image is one of the most important content sources, its data require much more storage space and incur high transmission costs. The collection and management of such large amounts of LF data are not easy for many practical applications, and therefore, efficient representation and compression with low computational requirements are essential for practical light field data storage, transmission, and display [5]. LF data contain a lot of redundancy since they capture image information of the same scene from different viewpoints [6,7].
               
While developments in the representation and compression of LF data have so far concentrated on maximizing compression in general, very little attention has been given to compression schemes that place special emphasis on keeping certain selected functionalities from being greatly affected. Among the different application requirements listed in Table 1, it may be desirable for a certain functionality to be less affected by compression than the others. For example, a smartphone [8,9] in daily casual use may only need the refocusing function for post-processing. In this regard, the motivation of this paper differs from the many existing compression approaches in that we design a representation and coding scheme for light field images that preserves the refocusing functionality of LF images as faithfully as possible at relatively low computational complexity.
               
               The rest of the paper is organized as follows. We briefly review related work in Section
                  2. Section 3 describes the proposed representation and coding scheme in detail. Experiment
                  results are given in Section 4, and Section 5 concludes the paper.
               
               
Table 1. Light Field Image Coding with Emphasis on Selected Functionality.

| Functionality | Coding with special emphasis on a certain functionality |
| --- | --- |
| Refocusing | Compression for refocusing can generate refocused images quite well with compressed data |
| Viewing angle change | Compression of the view angle can generate different images quite well with compressed data |
| Exposure adjust | Compression of exposure adjustments can generate different images quite well with compressed data |
                
             
            
                  2. Related Work
               Recently, many researchers have worked on advanced representation and coding techniques
                  to reduce redundancy in light field images. Some work provided comprehensive evaluation
                  of LF image coding schemes after grouping them into two main coding strategies [10-13]. The first strategy relates to the international standards in JPEG Pleno Part 1 (Framework)
                  [14] and Part 2 (LF coding) [15]. They support MuLE [16] and WaSP [17] as coding modes. Standardization of the JPEG LF image coding framework was described
                  in [18], and its 4D-Transform coding solution was explained in [19]. The core framework of the second strategy compresses LF data using the High Efficiency
                  Video Coding (HEVC) scheme by forming multiple views of light field images into one
                  pseudo-video sequence. Chen et al. [20] proposed a disparity-guided sparse coding scheme for light field data based on structural
                  key sub-aperture views. Jiang et al. [21] developed an LF compression scheme using a depth image-based view synthesis technique
                  in which a small subset of views is compressed using HEVC inter-coding tools, and
                  an entire light field is reconstructed using the subset. Jiang and colleagues [22] introduced another LF compression scheme based on homographic low-rank approximation
                  in which the LF views are aligned by homography and then compressed using HEVC. Han
et al. [23] compressed a pseudo-video sequence consisting of central sub-aperture images and
                  a sequence consisting of residual images between the central image and adjacent images
                  using HEVC.
               
               Additionally, we noted studies on converting a light field to a new representation
                  before encoding. Le Pendu et al. [24] used light field data for Fourier disparity layer (FDL) representation under which
                  the root image is encoded with FDL layers. This technique was shown to provide higher
                  coding performance than JPEG-based MuLE [16] and WaSP [17], which belong to the first category of LF coding schemes. Therefore, in a performance
                  evaluation of our proposed method, Le Pendu et al.’s FDL-based scheme [24] was one of the anchors for comparison. Duong et al. [25] proposed representing LF data in a focal stack (FS) in order to compress the given
                  LF data as a pseudo-video sequence using HEVC. This compression scheme was specifically
designed with the refocusing application in mind, showing that about 50% of the data amount is saved by compressing focal stack data consisting of sampled refocused images instead of compressing a pseudo-video sequence formed with sub-aperture views. Thus, the encoding scheme with the FS [25] was also used for comparison.
               
In this paper, we keep the refocus functionality from being affected by compression, as is done in [24] and [25], but in a different way. We represent light field data in the form of one single all-in-focus (AIF) image and its depth map, both of which are compressed using the well-known HEVC compression technique. The proposed scheme not only covers the full refocus range, but also achieves higher compression. Fig. 1 illustrates the proposed scheme together with two well-known anchors: the FDL-based method [24] and the method that compresses the images in a focal stack [25] as a pseudo-video sequence. The proposed representation and compression methods are shown in Fig. 1(c), and the detailed AIF image rendering and depth map generation are in Fig. 1(d). As illustrated in Fig. 1, a focal stack is generated by shifting and adding, as explained in [25]. Assume there are $K$ refocused images in a focal stack, and the $k$th refocused image is $I_{k\_org}\left(x,y\right)$, where its distance from the aperture plane is $F'=\alpha F$, in which $\alpha =F'/F$ is defined as the relative depth, written as

$$I_{k\_org}\left(x,y\right)=\frac{1}{N}\sum_{\left(u,v\right)}L^{\left(u,v\right)}\left(x+\Delta x,\,y+\Delta y\right)\qquad (1)$$

$$\left(\Delta x,\,\Delta y\right)=\left(u\left(1-\frac{1}{\alpha }\right),\,v\left(1-\frac{1}{\alpha }\right)\right)\qquad (2)$$

where $L^{\left(u,v\right)}$ represents a sub-aperture image at position $\left(u,v\right)$ on the main lens, $N$ is the number of sub-aperture images, and $\left(\Delta x,\Delta y\right)$ is the shift offset in the $x,y$ directions.
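For illustration, a minimal NumPy sketch of the shift-and-add refocusing in (1) and (2) is given below. The sub-aperture array layout, the view-coordinate arrays, and the integer-pixel shifting via np.roll are simplifying assumptions; a practical implementation would use sub-pixel interpolation.

```python
import numpy as np

def refocus_shift_and_add(sub_apertures, alpha, u_coords, v_coords):
    """Shift-and-add refocusing of Eqs. (1)-(2), integer-pixel approximation.

    sub_apertures : ndarray of shape (U, V, H, W) or (U, V, H, W, C),
                    holding the sub-aperture images L^(u,v).
    alpha         : relative depth alpha = F'/F of the target refocus plane.
    u_coords, v_coords : angular coordinates of the views relative to the
                    optical centre (e.g. -7..7 for a 15x15 Lytro grid).
    """
    acc = np.zeros_like(sub_apertures[0, 0], dtype=np.float64)
    for iu, u in enumerate(u_coords):
        for iv, v in enumerate(v_coords):
            # Eq. (2): shift offset (dx, dy) = (u(1 - 1/alpha), v(1 - 1/alpha))
            dx = int(round(u * (1.0 - 1.0 / alpha)))
            dy = int(round(v * (1.0 - 1.0 / alpha)))
            acc += np.roll(sub_apertures[iu, iv], shift=(dy, dx), axis=(0, 1))
    return acc / (len(u_coords) * len(v_coords))   # average over the N views, Eq. (1)

# A focal stack of K refocused images is obtained by sweeping alpha:
# focal_stack = [refocus_shift_and_add(L, a, us, vs) for a in alphas]
```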
               
             
            
                  3. The Proposed Scheme
In this section, we present the proposed representation and compression scheme, which preserves the refocusing functionality as much as possible under compression. Unlike existing methods that encode sub-aperture image sequences [20,21,23], the focal stack [25], or the hierarchical FDL [24], we first represent the light field as an all-in-focus image and a depth map, and then encode them. During decoding, a focal stack consisting of multiple images having different focus levels is reconstructed from the compressed all-in-focus image using the depth map. Fig. 1(c) shows the main structure of the proposed framework, which consists of three parts: refocusing representation, all-in-focus image and depth map generation, and focal stack reconstruction at the decoder.
               
               
                     3.1 Proposed Representation
The proposed light field representation aims at faithfully maintaining the refocusing functionality during compression. The refocusing functionality refers to how flexibly and accurately a desired refocused image can be generated. The array of refocused images is called the focal stack [26]. However, such a focal stack demands a huge volume of data.
                  
                  
Fig. 1. Different frameworks for light field representation and coding: (a) coding with the FDL model [24]; (b) coding with the focal stack [25]; (c) the proposed scheme with emphasis on the refocusing capability; (d) generation of the all-in-focus image and depth map in the proposed scheme.
 
In the proposed scheme, the AIF image and the depth map are used to represent the light field image to be encoded and transmitted for applications that put the emphasis on the refocusing functionality. The all-in-focus image and the depth map can replace a focal stack since the conversion between them is bi-directional: the all-in-focus image and the depth map can be generated from a focal stack, and the focal stack can be reconstructed from the all-in-focus image and the depth map as well. Refocused images at any depth can be generated from the decoded AIF image and the depth map by using a defocusing filter. These two conversions are used before encoding and after decoding, respectively. The AIF image and the depth map can thus effectively provide the refocusing functionality.
                  
                  The advantages of the proposed scheme are analyzed below. The first advantage is the
                     refocus coverage range. Since users may like to refocus at any depth, the refocusing
                     capability should be able to cover all potential refocusing ranges. Duong et al. [25] represented the light field with a focal stack that includes 24 refocused images
                     before compression, and thus, the refocusing range is limited to the 24 images. However,
                     since the AIF and the depth map data in the proposed scheme are encoded and transmitted,
                     any refocused image can be generated from the decoded AIF image with help from the
depth map by using a defocusing filter. Second, in terms of storage, representation and compression using sub-aperture images [20,21,23], a focal stack [25], or the hierarchical FDL [24] are much heavier than the proposed scheme, which deals with only one AIF image and one gray-level depth map. The third advantage is the generation complexity of
                     the refocused image at the decoder. Complexity is an important factor in practical
                     applications. In the FDL [24], the refocused image generation process should convert the Fourier disparity layer
                     to sub-aperture images. It further calculates shifting slopes and adds all sub-aperture
                     images for display rendering. The focal stack-based scheme [25] compresses only a few sample depth slices. Thus, pixel-wise interpolation should
                     be executed among relevant neighboring sample depth slices if the target refocused
                     depth is not the sampled depth. However, in our case, any refocused image can be generated.
                  
                  There are in-focus pixels and out-of-focus pixels in one refocused image. The in-focus
                     pixels are directly obtained from the all-in-focus image, and the out-of-focus pixels
                     are obtained by defocusing the relevant all-in-focus image using a predefined filter.
                     Our proposed method demands very low computational complexity.
                  
                  
                        Fig. 2. Volumetric comparison of light field refocusing representations.
 
                  
                        Fig. 3. The proposed difference focus measure with adaptive refinement.
 
                
               
                     3.2 All-In-Focus Image and Depth Map Generation
                  To generate the AIF image and the depth map, we investigated several state-of-the-art
                     methods. There are learning-based depth map estimation algorithms, most of which are
                     based on a fully convolutional neural network [27-30], and they provide high accuracy but with high complexity. On the other hand, rule-based
depth estimation methods [31] and AIF image rendering methods [34] that utilize a focal stack have relatively low complexity. In this paper, in order to balance accuracy and complexity in the encoding process as a whole, we utilize the focal stack to render both the all-in-focus image and the depth map. Therein, we define the focus map, which indicates how well a given pixel is focused. The degree of focus is measured by a selected focus measure [32,33]. The more in-focus a pixel is, the higher its value in the focus map. Note that in out-of-focus regions where blurred texture-rich pixels, blurred edges, or artifacts are statistically abundant, most of the well-known focus measures, such as LAP2 [35], STA2 [36], GRA7 [37], and RDF [31], may suffer from focus measure errors because they respond to the high local variance that is normally seen in in-focus regions. To overcome this problem, a new, very simple focus measure is proposed, which is shown in Fig. 3 and named the difference focus measure.
                  
The difference between two focus maps, one from the focal stack $I_{k\_org}$ and the other from the guided-filtered focal stack $I_{k\_GF}$, is designed to counteract this variance. The difference focus map $F_{k\_d}$ is defined as

$$F_{k\_d}=F_{k\_org}-F_{k\_GF}\qquad (3)$$

where $F_{k\_org}=FM\left(I_{k\_org}\right)$ and $F_{k\_GF}=FM\left(I_{k\_GF}\right)$, in which $I_{k\_GF}$ is a smoothed focal stack that preserves boundaries while smoothing the other regions with a guided filter $G\left(\cdot \right)$ [38], denoted as $I_{k\_GF}=G\left(I_{k\_org},I_{k\_org}\right)$. $FM\left(\cdot \right)$ indicates the focus measure of choice; in this paper, the ring difference filter [31] is selected owing to its robustness, which comes from incorporating both local and non-local characteristics in the filtering window. An example of the difference focus map is shown in Fig. 4. The first-row images show a case in which the entire image is out of focus; the proposed difference focus map gives the correct focus level (=0), whereas the focus maps from $I_{k\_org}$ or $I_{k\_GF}$ incorrectly detect the out-of-focus edge area as an in-focus region.
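To make the computation in (3) concrete, the sketch below builds a difference focus map in Python. The ring difference filter of [31] is not reproduced; a simple Laplacian-energy measure stands in for FM(.), the guided filter comes from the opencv-contrib module cv2.ximgproc, and the radius and eps values are illustrative assumptions.

```python
import cv2
import numpy as np

def focus_measure(img, ksize=9):
    """Stand-in for FM(.): local energy of the Laplacian (not the RDF of [31])."""
    gray = img if img.ndim == 2 else img.mean(axis=2)
    lap = cv2.Laplacian(gray.astype(np.float32), cv2.CV_32F)
    return cv2.boxFilter(np.abs(lap), -1, (ksize, ksize))

def difference_focus_map(I_k_org, radius=8, eps=1e-2):
    """Eq. (3): F_k_d = FM(I_k_org) - FM(I_k_GF), with I_k_GF = G(I_k_org, I_k_org)."""
    src = I_k_org.astype(np.float32)
    # Self-guided edge-preserving smoothing (requires opencv-contrib-python).
    I_k_GF = cv2.ximgproc.guidedFilter(src, src, radius, eps)
    return focus_measure(src) - focus_measure(I_k_GF)
```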
                  
Additionally, adaptive refinement is applied to the proposed difference focus map, $F_{k\_d}$, to more clearly distinguish in-focus and out-of-focus regions. The in-focus region (the white region) in $F_{k\_d}$ is enhanced, while the out-of-focus region (the black region) in $F_{k\_d}$ is smoothed with a Gaussian filter to remove occasional errors caused by noise or artifacts. To avoid gaps, $F_{k\_d}$ and its refined version are blended to generate the final focus map $F_{k}$.
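The refinement above is only described qualitatively, so the following sketch assumes one simple realization: the high-focus (white) region is boosted, the low-focus (black) region is replaced by a Gaussian-smoothed version, and the result is alpha-blended with the unrefined map. The threshold, gain, sigma, and blending weight are illustrative assumptions.

```python
import cv2
import numpy as np

def refine_focus_map(F_d, thr=0.5, gain=1.5, sigma=2.0, blend=0.5):
    """Adaptive refinement of the difference focus map F_k_d (illustrative only).

    F_d is assumed to be normalized to [0, 1] beforehand."""
    F = F_d.astype(np.float32)
    refined = F.copy()
    in_focus = F >= thr
    refined[in_focus] = np.clip(F[in_focus] * gain, 0.0, 1.0)   # enhance the in-focus (white) region
    smoothed = cv2.GaussianBlur(F, (0, 0), sigma)               # suppress noise and artifacts
    refined[~in_focus] = smoothed[~in_focus]                    # smooth the out-of-focus (black) region
    # Blend the refined map with the original F_k_d to avoid gaps between the regions.
    return blend * refined + (1.0 - blend) * F
```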
                  
To render the all-in-focus image as seen in Fig. 1(d), the best in-focus pixel at each position is collected. That is, for a pixel at position $\left(x,y\right)$, the best in-focus pixel value is selected from among $I_{1\_org}\left(x,y\right),\,I_{2\_org}\left(x,y\right),\ldots ,I_{K\_org}\left(x,y\right)$ by referring to the focus maps $F_{k}\left(x,y\right)$, $k=1,\ldots ,K$. The one giving the maximum focus at position $\left(x,y\right)$ among the $K$ refocused images is selected as the best in-focus pixel, and its image index, denoted by $k_{\max }\left(x,y\right)$, is decided as follows:

$$k_{\max }\left(x,y\right)=\underset{k\in \left\{1,\ldots ,K\right\}}{\arg \max }\,F_{k}\left(x,y\right)\qquad (4)$$
                  
                        Fig. 4. An example of the proposed difference focus map: (a) an image in focal stack $\boldsymbol{I}_{\boldsymbol{k}\_ \boldsymbol{org}}$; (b) focus map $\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{org}}$ from image $\boldsymbol{I}_{\boldsymbol{k}\_ \boldsymbol{org}}$; (c) focus map $\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{GF}}$ from guided-filtered image $\boldsymbol{I}_{\boldsymbol{k}\_ \boldsymbol{GF}}$; (d) the proposed difference focus map $\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{org}}-\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{GF}}$.
 
                  
Fig. 5. AIF images and depth maps (1st and 2nd rows are comparisons of the rendered AIF images; the 3rd row is a comparison of generated depth maps): (a) Jeon et al.’s method [31]; (b) Chantara and Ho’s method [34]; (c) the proposed difference focus measure with adaptive refinement. GT: ground truth.
 
                  
The best in-focus pixels at all $\left(x,y\right)$ positions are collected to form the rendered all-in-focus image as described in

$$I_{AIF}\left(x,y\right)=I_{k_{\max }\left(x,y\right)\_org}\left(x,y\right)\qquad (5)$$
                  
Depth map $D$ is a collection of the pixel-wise indices to the focal stack images that indicate the maximum focus:

$$D\left(x,y\right)=k_{\max }\left(x,y\right)\qquad (6)$$
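With the final focus maps $F_{k}$ available, (4)-(6) reduce to a per-pixel argmax; a compact NumPy sketch is shown below (depth indices run from 0 to K-1 here instead of 1 to K).

```python
import numpy as np

def render_aif_and_depth(focal_stack, focus_maps):
    """Eqs. (4)-(6): per-pixel argmax over the K focus maps.

    focal_stack : (K, H, W) or (K, H, W, C) array of refocused images I_k_org.
    focus_maps  : (K, H, W) array of final focus maps F_k.
    Returns the all-in-focus image I_AIF and the depth (index) map D.
    """
    D = np.argmax(focus_maps, axis=0)              # Eqs. (4), (6): k_max(x, y)
    rows, cols = np.indices(D.shape)
    I_AIF = focal_stack[D, rows, cols]             # Eq. (5): best in-focus pixel at each position
    return I_AIF, D
```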
                  
                  
                  
                        Fig. 6. The proposed focal stack reconstruction.
 
                  A comparison experiment was carried out for the proposed method and two state-of-the-art
methods [31,34] that also utilize a focal stack. The AIF image and depth map results shown in Fig. 5 demonstrate that the proposed method is much closer to the ground truth, with cleaner output, higher contrast, better quality, and fewer artifacts.
                  
                
               
                     3.3 Proposed Focal Stack Reconstruction
                  At the decoder, the focal stack is reconstructed from the AIF image and its depth
                     map. The proposed reconstruction method is explained in this section. The number of
images in a focal stack corresponds to the resolution of the depth map. Each depth level
                     corresponds to one image in the focal stack.
                  
In generating refocused image $I_{k\_est}$, which is focused at the $k$th depth, there are two cases to consider: in-focus and out-of-focus pixels. For in-focus pixels, that is, $D\left(x,y\right)=k$, the pixel values are directly available in the AIF image; for out-of-focus pixels, that is, $D\left(x,y\right)\neq k$, the pixel values are obtained by defocusing the AIF image with a blur filter whose defocusing strength depends on the distance between depth $D\left(x,y\right)$ and target depth $k$. When refocusing at depth level $k$, the estimated $k$th image in the focal stack, $I_{k\_est}\left(x,y\right)$, at position $\left(x,y\right)$ is computed as follows:

$$I_{k\_est}\left(x,y\right)=\begin{cases}I_{AIF}\left(x,y\right), & D\left(x,y\right)=k\\ \left(f\left(\sigma \right)*I_{AIF}\right)\left(x,y\right), & D\left(x,y\right)\neq k\end{cases}\qquad (7)$$

$$\sigma =g\left(\Delta k\right)\qquad (8)$$

$$\Delta k=\left|k-D\left(x,y\right)\right|\qquad (9)$$
where $f\left(\sigma \right)$ is a defocusing filter in which Gaussian blur is used, and $*$ is the convolution operator. A higher value of $\sigma $ indicates a higher blur strength. The defocusing filter parameter $\sigma $ is a function of $\Delta k$, which is the depth distance between the target focus depth $k$ and the depth level $D\left(x,y\right)$ at the given pixel position $\left(x,y\right)$.
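A minimal sketch of the reconstruction in (7)-(9) follows: in-focus pixels are copied from the decoded AIF image, and out-of-focus pixels are taken from Gaussian-blurred copies whose sigma grows with the depth distance. The coefficients of the linear model sigma = a*dk + b are placeholders for the fitted function of Fig. 7(b).

```python
import cv2
import numpy as np

def reconstruct_refocused(I_aif, D, k, a=0.6, b=0.2):
    """Eqs. (7)-(9): estimate the k-th focal stack image from the AIF image and depth map.

    I_aif : decoded all-in-focus image, (H, W) or (H, W, C).
    D     : decoded depth map, (H, W) integer depth indices.
    k     : target refocus depth index.
    a, b  : placeholder coefficients of the fitted linear model sigma = g(dk) = a*dk + b.
    """
    I_aif = I_aif.astype(np.float32)
    I_est = I_aif.copy()                               # Eq. (7), first case: in-focus pixels (dk = 0)
    delta_k = np.abs(D.astype(np.int64) - int(k))      # Eq. (9)
    for dk in np.unique(delta_k):
        if dk == 0:
            continue
        sigma = a * float(dk) + b                      # Eq. (8): sigma = g(dk)
        blurred = cv2.GaussianBlur(I_aif, (0, 0), sigma)
        mask = delta_k == dk
        I_est[mask] = blurred[mask]                    # Eq. (7), second case: defocused AIF pixels
    return I_est

# The whole focal stack follows by sweeping k over all depth levels:
# stack = [reconstruct_refocused(I_aif, D, k) for k in range(K)]
```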
                  
Fig. 6 depicts our method for focal stack reconstruction. The marked rectangles are local areas at different depth levels. Depending on the related depth level in the depth map, the defocusing strength of the green rectangle is weak, and the blur strength of the orange rectangle is strong. The blur strength is represented by the parameter $\sigma $, and therefore, a proper blur parameter is essential in order to generate the focal stack accurately. We define the difference between the generated pixel in the focal stack from (7) and the pixel in the original focal stack from (1) and (2) as follows:

$$V=\left|I_{k\_est}\left(x,y\right)-I_{k\_org}\left(x,y\right)\right|\qquad (10)$$
Note that a smaller value of $V$ implies a more accurate $\sigma $ value. Parameter $\sigma $ is a function of $\Delta k$, as shown in (8). To define function $g\left(.\right)$, we first select $N$ pairs of $\left(\Delta k,\sigma \right)$ values, and then these $N$ pairs are fitted to a linear function, as shown in Fig. 7(b).
                  
Regarding the $N$ pairs of $\left(\Delta k,\sigma \right)$ values, $\Delta k$ should be set to cover the range $\Delta k=1,2,\ldots ,K-1$. For each $\Delta k$, an appropriate $\sigma $ value is calculated by a full search. The search flowchart is shown in Fig. 7(a). For example, to estimate pixel $I_{k\_est}\left(x,y\right)$ with $\Delta k=1$, an appropriate $\sigma $ value is set as follows: using an initial value, $\sigma =\sigma _{0}$, calculate $V=V_{0}$ with (7) and (10); set $g=1$; update $\sigma =\sigma +g\times \Delta \sigma $ and calculate $V_{i}$; if $V_{i}<V_{i-1}$, then keep the sign of $g$ the same as before and update $\sigma =\sigma +g\times \Delta \sigma $; otherwise, change the sign of $g$ to its opposite, $g=g\times \left(-1\right)$, and update $\sigma =\sigma +g\times \Delta \sigma $; keep updating the $\sigma $ value until $V_{i}<V_{THD}$ or $i>I$. Here, $V_{THD}$ is a predefined threshold for a small $V$ value, and $I$ is a predefined number of iterations. The $N$ pairs are clustered into groups according to $\Delta k$ and then curve-fitted using a linear function model; the fitted linear function is presented in Fig. 7(b). This fitting model shows that the higher the depth distance $\Delta k$, the higher the defocusing filter strength parameter $\sigma $.
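The search of Fig. 7(a) and the subsequent fitting can be sketched as follows. Here $V$ is evaluated as a mean absolute difference over the pixels of one $\Delta k$ group rather than a single pixel, and the initial value, step size, threshold, and iteration cap are illustrative assumptions.

```python
import cv2
import numpy as np

def search_sigma(I_aif, I_k_org, mask, sigma0=1.0, step=0.1, v_thd=1.0, max_iter=50):
    """Sign-flipping search of Fig. 7(a) for the defocus parameter sigma of one dk group.

    mask selects the pixels whose depth distance to the target depth equals dk."""
    I_aif = I_aif.astype(np.float32)
    I_org = I_k_org.astype(np.float32)

    def cost(sigma):
        # Eqs. (7) and (10): blur the AIF image and measure the difference to the
        # original focal stack image over the selected pixels.
        blurred = cv2.GaussianBlur(I_aif, (0, 0), max(sigma, 1e-3))
        return float(np.mean(np.abs(blurred[mask] - I_org[mask])))

    sigma, g = sigma0, 1.0
    v_prev = cost(sigma)
    for _ in range(max_iter):
        sigma = sigma + g * step                 # update sigma along the current direction
        v_new = cost(sigma)
        if v_new >= v_prev:
            g = -g                               # got worse: reverse the search direction
        v_prev = v_new
        if v_prev < v_thd:
            break
    return sigma

def fit_sigma_model(dk_values, sigma_values):
    """Linear fit sigma = g(dk) = a*dk + b over the collected (dk, sigma) pairs, Fig. 7(b)."""
    a, b = np.polyfit(np.asarray(dk_values, dtype=float),
                      np.asarray(sigma_values, dtype=float), 1)
    return a, b
```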
                  
                  
                        Fig. 7. Decision on the defocusing filter parameter $\boldsymbol{\sigma }$: (a) the search process for $\boldsymbol{\sigma }$ (defocusing filter parameter); (b) linear fitting of defocusing filter parameter $\boldsymbol{\sigma }$.
 
                
             
            
                  4. Performance Evaluation
               In this section, we compare the proposed method with two state-of-the-art representation
                  and compression methods: one is Le Pendu’s method [24], which represents a light field image as the Fourier Disparity Layer and encodes
                  the FDL layers as a pseudo-sequence using HEVC; the other is Duong’s method [25], which converts a light field image to a focal stack, and compresses it as a pseudo-video
sequence using HEVC. In the experiment, the proposed method also employs the HEVC reference software (HM) version 16.17 [39] for encoding and decoding to keep the same test conditions as the two state-of-the-art methods. The encoder configuration is set as follows: the GOP structure is I-B-B-B, as in [40], and the tests use the six LF images (I01 to I06) of the JPEG Pleno dataset [41] (Bikes, Danger de Mort, Flowers, Stone Pillars Outside, Fountain Vincent 2, and Ankylosaurus and Diplodocus 1), captured with a Lytro Illum camera.
               
The performance comparison was made in terms of both the $PSNR$ of the YUV video and the refocusing capability loss due to compression. The $PSNR$ values of the individual focal stack images were averaged to obtain a representative PSNR value associated with the LF data. It is denoted as LF-PSNR and computed as in (11), where $I_{k\_comp}$ is the $k$th reconstructed focal stack image obtained using (7) at the decoder, and $I_{k\_org}$ is the anchor focal stack image rendered from the light field data as seen in (1).
               
The LF-PSNR performance of the proposed method and the two anchor methods is compared in Fig. 8; LF-PSNR is calculated as follows:

$$LF\text{-}PSNR=\frac{1}{K}\sum_{k=1}^{K}PSNR\left(I_{k\_comp},\,I_{k\_org}\right)\qquad (11)$$
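A short sketch of the LF-PSNR evaluation in (11), assuming 8-bit image data (peak value 255) and focal stacks available as arrays of matching shape:

```python
import numpy as np

def lf_psnr(stack_comp, stack_org, peak=255.0):
    """Eq. (11): average PSNR over the K reconstructed focal stack images."""
    psnrs = []
    for I_comp, I_org in zip(stack_comp, stack_org):
        mse = np.mean((I_comp.astype(np.float64) - I_org.astype(np.float64)) ** 2)
        psnrs.append(10.0 * np.log10(peak ** 2 / max(mse, 1e-12)))
    return float(np.mean(psnrs))
```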
               
               
Fig. 8 demonstrates that our proposed method attained the highest LF-PSNR among the three methods, especially at low bits per pixel (bpp). For example, for the I01 image, when bpp was 0.01, the proposed method’s LF-PSNR was 1.24 dB higher than FDL [24] and 1.74 dB higher than FS representation and compression [25]. When bpp was 0.02, the proposed method’s LF-PSNR values were 0.34 dB higher than FDL [24] and 0.09 dB higher than the FS [25]. Over the I01 to I06 results, the average LF-PSNR gain was about 2.38 dB and 1.60 dB over FDL [24] and FS [25], respectively, at bits per pixel less than 0.03 in most cases. In the other methods, a higher number of bits per pixel leads to less compression loss in the LF data representation sent to the encoder, that is, in the Fourier disparity layers in FDL [24] or in the focal stack in FS [25], and thus the focal stack PSNR increases according to the reduced coding loss at higher bits per pixel. In our scheme, the all-in-focus image and the depth map are sent to the encoder, and the focal stack is reconstructed from them. While the depth map suffers less compression loss at higher bits per pixel, unless the accuracy of the estimated depth map is sufficient, the consequent PSNR increase in the focal stack is expected to be limited, even as the bits per pixel get higher. This explains why the focal stack PSNR of the proposed scheme was not always higher than that of the other methods at high bits per pixel. It also suggests future research on improving the accuracy of the depth map estimation, so that our scheme can keep gaining PSNR at higher bits per pixel as well.
               
We analyzed the refocusing capability loss, $LF-RL$, evaluated as the ratio of absolute differences between the two focus maps, $F_{k\_comp}$ and $F_{k\_org}$, as calculated in (12), where RL stands for refocusing loss. $F_{k\_comp}$ is computed from the reconstructed focal stack $I_{k\_comp}$, that is, $F_{k\_comp}=FM\left(I_{k\_comp}\right)$, and $F_{k\_org}$ is the focus map of the original (that is, uncompressed) focal stack image $I_{k\_org}$, that is, $F_{k\_org}=FM\left(I_{k\_org}\right)$. In the experiment, we set $K=64$, which was the depth map resolution. For the focus measure operator $FM$ in (3), the proposed difference focus measure of Section 3.2 was applied. The range of the refocusing capability loss, $LF-RL$, is 0 to 1, where a higher value indicates a higher loss.
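Since (12) is not reproduced above, the sketch below only illustrates the idea: a sum-normalized absolute difference between the compressed and original focus maps, which yields a value between 0 and 1. The exact normalization used in (12) may differ.

```python
import numpy as np

def lf_rl(focus_maps_comp, focus_maps_org, eps=1e-12):
    """Refocusing capability loss as a normalized focus-map difference (illustrative).

    focus_maps_comp, focus_maps_org : (K, H, W) arrays, F_k_comp and F_k_org.
    The sum normalization is an assumption, not the exact form of Eq. (12)."""
    Fc = np.asarray(focus_maps_comp, dtype=np.float64)
    Fo = np.asarray(focus_maps_org, dtype=np.float64)
    num = np.abs(Fc - Fo).sum()
    den = np.abs(Fc).sum() + np.abs(Fo).sum() + eps
    return float(num / den)   # 0 = no loss; values near 1 = severe loss
```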
               
               
Fig. 8. PSNR comparison of the proposed method with state-of-the-art FDL representation & compression [24] and FS representation & compression [25].
 
               
Fig. 9. Refocusing capability loss ($\boldsymbol{LF}-\boldsymbol{RL}$) comparison of the proposed method with state-of-the-art FDL representation & compression [24] and FS representation & compression [25].
 
               
Fig. 9 compares the refocusing capability loss ($LF-RL$) of the proposed method and the two state-of-the-art methods. The result shows that the proposed method attained the minimum loss in refocusing capability at the same compression ratio. For example, at bpp = 0.01, FDL [24], FS [25], and the proposed method had refocusing capability losses of 0.30, 0.35, and 0.16, respectively. That means the refocusing capability loss under the proposed method was smaller by 0.14 and 0.19 (14% and 19%) compared to FDL [24] and FS [25], respectively. Thus, in practical applications targeting low transmission rates or limited storage space, such as mobile phones or head-mounted display devices, the proposed method is a good choice.
               
In our experiment, different bits per pixel were realized with QP settings from 17 to 42. Fig. 9 also indicates that the refocusing capability loss was less than 0.2 when bpp $\leq $ 0.05 (about QP $\leq $ 32). The refocusing capability is perceived as almost intact when $LF-RL\leq $ 0.2 according to our internal subjective perceptual evaluation. Thus, coding with QP $\leq $ 32 is considered an allowable range for practical applications as far as the refocusing functionality is concerned.
               
             
            
                  5. Conclusion
In this paper, we have presented an efficient representation and coding scheme for light field data designed to pay special attention to keeping the refocusing functionality as uncompromised as possible. We designed a scheme in which LF data are represented by an all-in-focus image and a depth map, both of which are encoded with HEVC. After decoding, the refocused focal stack is estimated by convolving the decoded all-in-focus image with a defocusing filter whose strength is controlled according to the desired focus level. Our experimental results indicated that, at the same compression ratio, the proposed representation and coding strategy had a 2.38 dB average PSNR improvement over the state-of-the-art Le Pendu FDL method [24], and a 1.60 dB improvement over Duong’s FS representation and coding method [25]. At the decoder, the proposed method also had smaller refocusing capability losses, 16.2% and 17.8% lower than the two well-known state-of-the-art methods [24,25]. The proposed representation and coding approach with an all-in-focus image and a depth map was shown to provide good compression performance while maintaining the refocusing capability very well.
               
             
          
         
            
                  ACKNOWLEDGMENTS
               This research was supported by the Basic Science Research Program through the National
                  Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2020R1A2C2007673).
                  
                  			
               
             
            
                  
                     REFERENCES
                  
                     
                        
                        Li H., Guo C., Jia S., 2017, High-Resolution Light-Field Microscopy, in Frontiers
                           in Optics 2017, OSA Technical Digest (online) (Optica Publishing Group), paper FW6D.3,

 
                      
                     
                        
                        Tsai D., Dansereau D. G., Peynot T., Corke P., 2017, Image-based visual servoing with
                           light field cameras, IEEE Robotics and Automation Letters, Vol. 2, No. 2, pp. 912-919

 
                      
                     
                        
Dricot A., Jung J., Cagnazzo M., Pesquet B., Dufaux F., Kovács P. T., Adhikarla V. K., 2015, Subjective evaluation of Super Multi-View compressed contents on high-end light-field 3D displays, Signal Processing: Image Communication, Vol. 39, pp. 369-385

 
                      
                     
                        
Vetro A., Yea S., Matusik W., Pfister H., Zwicker M., Mar. 29, 2011, Method and system for acquiring, encoding, decoding and displaying 3D light fields, U.S. Patent No. 7,916,934

 
                      
                     
                        
Wu G., et al., 2017, Light field image processing: An overview, IEEE Journal of Selected Topics in Signal Processing, Vol. 11, No. 7, pp. 926-954

 
                      
                     
                        
                        Rerabek M., Bruylants T., Ebrahimi T., Pereira F., Schelkens P., ICME 2016 grand challenge:
                           Light-field image compression, Call for proposals and evaluation procedure 2016.

 
                      
                     
                        
Takahashi K., Naemura T., 2006, Layered light-field rendering with focus measurement, Signal Processing: Image Communication, Vol. 21, No. 6, pp. 519-530

 
                      
                     
                        
                        Kim M., et al. , Mobile terminal and control method for the mobile terminal, 2018
                           Nov.20, US10135963B2

 
                      
                     
                        
Light Field Selfie Camera for smartphones, Wooptix Company

 
                      
                     
                        
                        Brites C., Ascenso J., Pereira F., Jan. 2021, Lenslet Light Field Image Coding: Classifying,
                           Reviewing and Evaluating, in: IEEE Transactions on Circuits and Systems for Video
                           Technology, Vol. 31, No. 1, pp. 339-354

 
                      
                     
                        
                        Viola I., Řeřábek M., Ebrahimi T., 2017, Comparison and evaluation of light field
                           image coding approaches, IEEE Journal of selected topics in signal processing, Vol.
                           11, No. 7, pp. 1092-1106

 
                      
                     
                        
                        Conti C., Soares L. D., Nunes P., 2020, Dense Light Field Coding: A Survey, IEEE Access,
                           Vol. 8, pp. 49244-49284

 
                      
                     
                        
                        Avramelos V., Praeter J. D., Van Wallendael G., Lambert P., Jun. 2019, Light field
                           image compression using versatile video coding, in: Proc. IEEE 9th Int. Conf. Consum,
                           Electron, pp. 1-6

 
                      
                     
                        
                        2020, ISO/IEC 21794-1:2020 Information technology - Plenoptic image coding system
                           (JPEG Pleno) - Part 1: Framework

 
                      
                     
                        
                        2021, ISO/IEC 21794-2:2021 Information technology - Plenoptic image coding system
                           (JPEG Pleno) - Part 2: Light field coding

 
                      
                     
                        
                        de Carvalho M. B., Pereira M. P., Alves G., da Silva E. A. B., Pagliari C. L., Pereira
                           F., et al. , Oct. 2018, A 4D DCT-based lenslet light field codec, in: Proc. 25th IEEE
                           Int. Conf. Image Process. (ICIP), pp. 435-439

 
                      
                     
                        
                        Astola P., Tabus I., Nov. 2018, Hierarchical warping merging and sparse prediction
                           for light field image compression, in: Proc. 7th Eur. Workshop Vis. Inf. Process.
                           (EUVIP), pp. 1-6

 
                      
                     
                        
                        Astola P., da Silva Cruz L. A., et al. , Jun. 2020, JPEG Pleno: Standardizing a coding
                           framework and tools for plenoptic imaging modalities, ITU J. ICT Discoveries, Vol.
                           3, No. 1, pp. 1-15

 
                      
                     
                        
                        De Oliveira Alves G., et al. , 2020, The JPEG Pleno Light Field Coding Standard 4D-Transform
                           Mode: How to Design an Efficient 4D-Native Codec, IEEE Access, Vol. 8, pp. 170807-170829

 
                      
                     
                        
Chen J., Hou J., Chau L. P., 2017, Light field compression with disparity-guided sparse coding based on structural key views, IEEE Transactions on Image Processing, Vol. 27, No. 1, pp. 314-324

 
                      
                     
                        
                        Jiang X., Le Pendu M., Guillemot C., 2017, Light field compression using depth image
                           based view synthesis, in: International Conference on Multimedia & Expo Workshops
                           (ICMEW), IEEE, pp. 19-24

 
                      
                     
                        
Jiang X., Le Pendu M., Farrugia R. A., Guillemot C., 2017, Light field compression with homography-based low-rank approximation, IEEE Journal of Selected Topics in Signal Processing, Vol. 11, No. 7, pp. 1132-1145

 
                      
                     
                        
                        Han H., Xin J., Dai Q., Sep. 2018, Plenoptic image compression via simplified subaperture
                           projection, Pacific Rim Conference on Multimedia, Springer, Cham, pp. 274-284

 
                      
                     
                        
                        Le Pendu M., Ozcinar C., Smolic A., 2020, Hierarchical Fourier Disparity Layer Transmission
                           For Light Field Streaming, in: IEEE International Conference on Image Processing (ICIP),
                           pp. 2606-2610

 
                      
                     
                        
                        Duong V. V., Canh T. N., Huu T. N., Jeon B., Dec. 2019, Focal stack based light field
                           coding for refocusing applications, Journal of Broadcast Engineering, Vol. 24, No.
                           7, pp. 1246-1258

 
                      
                     
                        
                        Ng R., Levoy M., Brédif M., et al. , 2005, Light field photography with a hand-held
                           plenoptic camera, Computer Science Technical Report CSTR, Vol. 2, No. 11, pp. 1-11

 
                      
                     
                        
                        Shin C., Jeon H. G., Yoon Y., Kweon I. S., Kim S. J., 2018, Epinet: A fully-convolutional
                           neural network using epipolar geometry for depth from light field images, in: Proceedings
                           of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4748-4757

 
                      
                     
                        
Mun J. H., Ho Y. S., 2018, Depth Estimation from Light Field Images via Convolutional Residual Network, in: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), IEEE, pp. 1495-1498

 
                      
                     
                        
                        Li K., Zhang J., Sun R., Zhang X., Gao J., 2020, EPI-based Oriented Relation Networks
                           for Light Field Depth Estimation, arXiv preprint arXiv:2007.04538

 
                      
                     
                        
                        Zhou W., Zhou E., Yan Y., Lin L., Lumsdaine A., 2019, Learning Depth Cues from Focal
                           Stack for Light Field Depth Estimation, in: 2019 IEEE International Conference on
                           Image Processing (ICIP), pp. 1074-1078

 
                      
                     
                        
                        Jeon H. G., Surh J., Im S., Kweon I. S., 2019, Ring difference filter for fast and
                           noise robust depth from focus, IEEE Trans. on Image Processing, Vol. 29, pp. 1045-1060

 
                      
                     
                        
                        Pertuz S., Puig D., Garcia M. A., 2013, Analysis of focus measure operators for shape-from-focus,
                           Pattern Recognition, Vol. 46, No. 5, pp. 1415-1432

 
                      
                     
                        
                        Zhao C., Jeon B., 2022, Refocusing Metric of Light Field Image using Region-Adaptive
                           Multi-Scale Focus Measure, in IEEE Access

 
                      
                     
                        
                        Chantara W., Ho Y. S., 2016, Focus Measure of Light Field Image Using Modified Laplacian
                           and Weighted Harmonic Variance, in: Proceedings of the International Workshop on Advanced
                           Image Technology, pp. 6-8

 
                      
                     
                        
Nayar S. K., Nakagawa Y., 1994, Shape from focus, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 8, pp. 824-831

 
                      
                     
                        
                        Wee C. Y., Paramesran R., 2008, Image sharpness measure using eigenvalues, in: IEEE
                           9th International Conference on Signal Processing, pp. 840-843

 
                      
                     
                        
                        Pech-Pacheco J. L., Cristóbal G., Chamorro-Martinez J., 2000, Diatom autofocusing
                           in brightfield microscopy: a comparative study, in: Proceedings 15th International
                           Conference on Pattern Recognition, Vol. 3, pp. 314-317

 
                      
                     
                        
                        He K., Sun J., 2015, Fast guided filter, arXiv preprint arXiv:1505.00996

 
                      
                     
                        
                        HEVC reference software, HM 16.17.

 
                      
                     
                        
                        Canh T. N., Duong V. V., Jeon B., Jan. 2019, Boundary handling for video based light
                           field coding with a new hybrid scan order, in: Proc. Inter. Workshop on Advanced Image
                           Tech., pp. 1-4

 
                      
                     
                        
                        Řeřábek M., Ebrahimi T., 2016, New Light Field Image Dataset, in: 8th International
                           Workshop on Quality of Multimedia Experience (QoMEX), Lisbon, Portugal

 
                      
                   
                
             
            Author
            
            
               			Chun Zhao received a BS in 2005 and an MS in 2008 from the Department of Electronics
               Science and Technology, North University of China, Shanxi, China. She joined the MS
               exchange student program in 2008, and started working in 2016 toward a PhD, in the
               Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon,
               Korea. From 2008 to 2014, she worked in the Research & Design Center, Samsung Electronics,
               Korea, on Image/Video Enhancement algorithm development and System on Chip (SOC) design,
               implementing an algorithm based on FPGA/Chip and RTL design. Since 2015, she has been
               a senior engineer for the Visual Display Business, Samsung Electronics, Korea, where
               she worked on practical algorithm development for various displays by analyzing panel
               characteristics. Her research interests include multimedia signal processing, panel
               color calibration, machine learning, and light field refocusing representation.
               		
            
            
            
               			Byeungwoo Jeon (M’90, SM’02) received a BS (Magna Cum Laude) in 1985 and an MS
               in 1987 from the Department of Electronics Engineering, Seoul National University,
               Seoul, Korea, and received a PhD from the School of Electrical Engineering, Purdue
               University, West Lafayette, USA, in 1992. From 1993 to 1997, he was in the Signal
               Processing Laboratory, Samsung Electronics, Korea, where he worked on research and
               development of video compression algorithms, design of digital broadcasting satellite
               receivers, and other MPEG-related research for multimedia applications. Since September
               1997, he has been at Sungkyunkwan University (SKKU), Korea, where he is currently
               a professor. His research interests include multimedia signal processing, video compression,
               statistical pattern recognition, and remote sensing. He served as Project Manager
               of Digital TV and Broadcasting in the Korean Ministry of Information and Communications
               from 2004 to 2006 where he supervised all digital TV-related R&D in Korea. From 2015
               to 2016, he was Dean of the College of Information and Communication Engineering,
               SKKU. In 2019, he was President of the Korean Institute of Broadcast and Media Engineers.
               Dr. Jeon is a senior member of IEEE, a member of SPIE, an associate editor of IEEE
               Trans. on Broadcasting and IEEE Trans. on Circuits and Systems for Video Technology.
               He was a recipient of the 2005 IEEK Haedong Paper Award from the Signal Processing
               Society in Korea, and received the 2012 Special Service Award and the 2019 Volunteer
               Award, both from the IEEE Broadcast Technology Society. In 2016, a Korean President’s
               Commendation was conferred upon him for his key role in promoting international standardization
               for video coding technology in Korea.