
Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, Korea ({zhaochun83, bjeon}@skku.edu)



Keywords: Light field representation, Light field coding, All-in-focus image, Depth map, Focal stack reconstruction

1. Introduction

Light field (LF) cameras can capture light coming from all directions at every point of a scene [1]. The rich data make it possible to realize various applications [2,3] such as refocusing, depth estimation, viewing angle change, and three-dimensional (3D) object reconstruction. With the recent explosive interest in implementing and improving augmented reality (AR) and virtual reality (VR) systems [4], demand is increasing for rich information to provide more realistic visual experiences. While the light field image is one of the most important content sources, its data require much more storage space or incur high transmission costs. The collection and management of such large amounts of LF data is not easy for many practical applications, and therefore, efficient representation and compression with low computational requirements are essential for practical light field data storage, transmission, and display [5]. LF data have a lot of redundancy since they capture image information of the same scene from different viewpoints [6,7].

While developments in the representation and compression of LF data have so far concentrated on maximizing compression in general, very little attention has been given to compression that places special emphasis on keeping certain selected functionalities from being greatly affected. Among the different application requirements listed in Table 1, it may be desirable for a certain functionality to be less affected by compression than the others. For example, a smartphone [8,9] in daily casual use may only need the refocusing function for post-processing. In this regard, the motivation of this paper differs from the many existing compression approaches in that we design a representation and coding scheme for light field images that keeps the refocusing functionality of LF images as faithful as possible at relatively low computational complexity.

The rest of the paper is organized as follows. We briefly review related work in Section 2. Section 3 describes the proposed representation and coding scheme in detail. Experiment results are given in Section 4, and Section 5 concludes the paper.

Table 1. Light Field Image Coding with Emphasis on Selected Functionality.

Functionality | Coding with special emphasis on a certain functionality
Refocusing | Compression for refocusing can still generate refocused images quite well from the compressed data
Viewing Angle Change | Compression for viewing angle change can still generate images at different viewing angles quite well from the compressed data
Exposure Adjustment | Compression for exposure adjustment can still generate images at different exposures quite well from the compressed data

2. Related Work

Recently, many researchers have worked on advanced representation and coding techniques to reduce redundancy in light field images. Some works provided comprehensive evaluations of LF image coding schemes after grouping them into two main coding strategies [10-13]. The first strategy relates to the international standards in JPEG Pleno Part 1 (Framework) [14] and Part 2 (LF coding) [15], which support MuLE [16] and WaSP [17] as coding modes. Standardization of the JPEG LF image coding framework was described in [18], and its 4D-Transform coding solution was explained in [19]. The core framework of the second strategy compresses LF data using the High Efficiency Video Coding (HEVC) scheme by forming multiple views of light field images into one pseudo-video sequence. Chen et al. [20] proposed a disparity-guided sparse coding scheme for light field data based on structural key sub-aperture views. Jiang et al. [21] developed an LF compression scheme using a depth image-based view synthesis technique in which a small subset of views is compressed using HEVC inter-coding tools, and the entire light field is reconstructed from the subset. Jiang and colleagues [22] introduced another LF compression scheme based on homographic low-rank approximation in which the LF views are aligned by homography and then compressed using HEVC. Han et al. [23] compressed a pseudo-video sequence consisting of central sub-aperture images and a sequence consisting of residual images between the central image and adjacent images using HEVC.

Additionally, we note studies on converting a light field to a new representation before encoding. Le Pendu et al. [24] represented light field data with Fourier disparity layers (FDL), under which the root image is encoded along with the FDL layers. This technique was shown to provide higher coding performance than the JPEG-based MuLE [16] and WaSP [17], which belong to the first category of LF coding schemes. Therefore, in the performance evaluation of our proposed method, Le Pendu et al.’s FDL-based scheme [24] was one of the anchors for comparison. Duong et al. [25] proposed representing LF data as a focal stack (FS) in order to compress the given LF data as a pseudo-video sequence using HEVC. This compression scheme was specifically designed with the refocusing application in mind, showing that about 50% of the data amount is saved by compressing focal stack data consisting of sampled refocusing images instead of compressing a pseudo-video sequence formed with sub-aperture views. Thus, the encoding scheme with the FS [25] was also used for comparison.

In this paper, we keep the refocusing functionality from being affected by compression, as is done in [24] and [25], but in a different way. We represent light field data in the form of one single all-in-focus (AIF) image and its depth map, both of which are compressed using the well-known HEVC compression technique. The proposed scheme not only covers the full refocus range, but also achieves higher compression. Fig. 1 illustrates the proposed scheme together with the two well-known anchors: the FDL-based method [24] and the method that compresses the images in a focal stack [25] as a pseudo-video sequence. The proposed representation and compression methods are shown in Fig. 1(c), and the detailed AIF image rendering and depth map generation are in Fig. 1(d). As illustrated in Fig. 1, a focal stack is generated by shifting and adding, as explained in [25]. Assume there are $K$ refocused images in a focal stack, and the $k$th refocused image is $I_{k\_org}\left(x,y\right)$, where its distance from the aperture plane is $F'=\alpha F$, in which $\alpha =F'/F$ is defined as the relative depth, written as

(1)
$I_{k\_ org}\left(x,y\right)=I_{\alpha }\left(x,y\right)$,
(2)
$ I_{\alpha }\left(x,y\right)=\sum _{u}\sum _{v}L^{\left(u,v\right)}\left(x+u\left(1-\frac{1}{\alpha }\right),y+v\left(1-\frac{1}{\alpha }\right)\right), $

where $L^{\left(u,v\right)}$ represents a sub-aperture image at position $\left(u,v\right)$ from the main lens, and $\left(u\left(1-\frac{1}{\alpha }\right),\,v\left(1-\frac{1}{\alpha }\right)\right)$ is the shift offset in the $x$ and $y$ directions.
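For illustration, a minimal Python/NumPy sketch of the shift-and-add operation in Eq. (2) is given below. The (U, V, H, W) array layout, the grayscale views, the u_coords/v_coords angular coordinates, and the final averaging are assumptions made for the sketch, not details taken from the paper.

```python
import numpy as np
from scipy.ndimage import shift as subpixel_shift

def refocus(sub_apertures, alpha, u_coords, v_coords):
    """Shift-and-add refocusing of Eq. (2).

    sub_apertures : (U, V, H, W) array of grayscale sub-aperture views L^(u,v)
    alpha         : relative depth F'/F of the target refocusing plane
    u_coords, v_coords : angular coordinates of the views (e.g., centered on 0)
    """
    U, V, H, W = sub_apertures.shape
    acc = np.zeros((H, W), dtype=np.float64)
    for i in range(U):
        for j in range(V):
            # Shift offset (u(1 - 1/alpha), v(1 - 1/alpha)) from Eq. (2);
            # the sign convention depends on how (u, v) and (x, y) are defined.
            dx = u_coords[i] * (1.0 - 1.0 / alpha)
            dy = v_coords[j] * (1.0 - 1.0 / alpha)
            acc += subpixel_shift(sub_apertures[i, j], (dy, dx),
                                  order=1, mode='nearest')
    return acc / (U * V)  # average so the output keeps the input dynamic range
```

Sweeping $\alpha$ over $K$ relative depths with such a routine yields the focal stack images $I_{k\_org}$, $k=1,\ldots ,K$, used in the rest of the paper.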

3. The Proposed Scheme

In this section, we describe the proposed representation and compression scheme, which preserves the refocusing functionality as much as possible under compression. Unlike existing methods that encode sub-aperture image sequences [20,21,23], the focal stack [25], or the hierarchical FDL [24], we first represent light fields as an all-in-focus image and a depth map, and then encode them. During decoding, a focal stack consisting of multiple images having different focus levels is reconstructed from the compressed all-in-focus image using the depth map. Fig. 1(c) shows the main structure of the proposed framework, which consists of three parts: refocusing representation, all-in-focus image and depth map generation, and focal stack reconstruction at the decoder.

3.1 Proposed Representation

The proposed light field representation aims at faithfully maintaining the refocusing functionality during compression. The refocusing functionality refers to how flexibly and accurately a desired refocused image can be generated. The array of refocused images is called a focal stack [26]. However, such a focal stack demands a huge volume of data.

Fig. 1. Different frameworks for light field representation and coding: (a) coding with the FDL model [24]; (b) coding with the focal stack [25]; (c) the proposed scheme with emphasis on the refocusing capability; (d) generation of the all-in-focus image and depth map in the proposed scheme.
../../Resources/ieie/IEIESPC.2022.11.5.305/fig1.png

In the proposed scheme, the AIF image and the depth map are used to represent the light field image to be encoded and transmitted for applications that put the emphasis on the refocusing functionality. The all-in-focus image and the depth map can replace a focal stack since the conversion is bi-directional: the all-in-focus image and the depth map can be generated from a focal stack, and the focal stack can be reconstructed from the all-in-focus image and the depth map as well. Refocused images at any depth can be generated from the decoded AIF image and the depth map by using a defocusing filter. These two conversions are used before encoding and after decoding, respectively. The AIF image and the depth map can thus effectively provide the refocusing functionality.

The advantages of the proposed scheme are analyzed below. The first advantage is the refocus coverage range. Since users may wish to refocus at any depth, the refocusing capability should cover all potential refocusing ranges. Duong et al. [25] represented the light field with a focal stack that includes 24 refocused images before compression, and thus, the refocusing range is limited to those 24 images. However, since the AIF and depth map data in the proposed scheme are encoded and transmitted, any refocused image can be generated from the decoded AIF image with help from the depth map by using a defocusing filter. Second, in terms of storage, representation and compression using sub-aperture images [20,21,23], a focal stack [25], or the hierarchical FDL [24] are much heavier than the proposed scheme, because the proposed compression deals with only one AIF image and one gray-level depth map. The third advantage is the generation complexity of the refocused image at the decoder. Complexity is an important factor in practical applications. In the FDL scheme [24], the refocused image generation process must convert the Fourier disparity layers to sub-aperture images; it further calculates shifting slopes and adds all sub-aperture images for display rendering. The focal stack-based scheme [25] compresses only a few sampled depth slices, so pixel-wise interpolation must be executed among relevant neighboring sample depth slices if the target refocused depth is not a sampled depth. In our case, however, any refocused image can be generated directly from the decoded AIF image and depth map.

There are in-focus pixels and out-of-focus pixels in one refocused image. The in-focus pixels are directly obtained from the all-in-focus image, and the out-of-focus pixels are obtained by defocusing the corresponding all-in-focus pixels using a predefined filter. The proposed method therefore demands very low computational complexity.

Fig. 2. Volumetric comparison of light field refocusing representations.
../../Resources/ieie/IEIESPC.2022.11.5.305/fig2.png
Fig. 3. The proposed difference focus measure with adaptive refinement.
../../Resources/ieie/IEIESPC.2022.11.5.305/fig3.png

3.2 All-In-Focus Image and Depth Map Generation

To generate the AIF image and the depth map, we investigated several state-of-the-art methods. There are learning-based depth map estimation algorithms, most of which are based on fully convolutional neural networks [27-30]; they provide high accuracy but with high complexity. On the other hand, rule-based depth estimation methods [31] and AIF image rendering methods [34] that utilize a focal stack have relatively low complexity. In this paper, in order to balance accuracy and complexity in the encoding process as a whole, we utilize the focal stack to render both the all-in-focus image and the depth map. Therein, we define the focus map, which indicates how well a given pixel is focused. The degree of focus is measured by a selected focus measure [32,33]; the more in-focus a pixel is, the higher its value in the focus map. Note that in out-of-focus regions where blurred texture-rich pixels, blurred edges, or artifacts are statistically abundant, most of the well-known focus measures, such as LAP2 [35], STA2 [36], GRA7 [37], and RDF [31], may suffer from focus measurement errors because such regions exhibit high variance, a characteristic typically seen in in-focus regions. To overcome this problem, a new, very simple focus measure is proposed, which is shown in Fig. 3 and named the difference focus measure.

The difference between two focus maps, one from focal stack $I_{k\_org}$ and the other from guided-filtered focal stack $I_{k\_GF}$, is designed to counteract this variance effect. A difference focus map $F_{k\_d}$ is defined as

(3)
$F_{k\_d}\equiv \left\| F_{k\_org}-F_{k\_GF}\right\| $,

where $F_{k\_org}=FM\left(I_{k\_org}\right)$ and $F_{k\_GF}=FM\left(I_{k\_GF}\right)$, in which $I_{k\_GF}$ is a smoothed focal stack that preserves boundaries while smoothing the other regions by a guided filter $G\left(\cdot \right)$ [38], denoted as $I_{k\_GF}=G\left(I_{k\_org},I_{k\_org}\right)$. $FM\left(\cdot \right)$ indicates the focus measure of choice; in this paper, the ring difference filter [31] is selected owing to its robustness, which comes from incorporating both local and non-local characteristics in the filtering window. An example of the difference focus map is shown in Fig. 4. The first-row images show a case where the entire image is out of focus, for which the proposed difference focus map gives the correct focus level (=0), whereas the focus maps from $I_{k\_org}$ or $I_{k\_GF}$ incorrectly detect the out-of-focus edge areas as in-focus regions.
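As a concrete illustration, the Python sketch below computes the difference focus map of Eq. (3) for one focal-stack slice. It substitutes a simple absolute-Laplacian response for the ring difference filter [31] used in the paper, relies on the guided filter from the opencv-contrib-python package, and uses illustrative radius and eps values; these choices are assumptions, not the authors' settings.

```python
import cv2
import numpy as np

def focus_measure(img):
    # Stand-in focus measure FM(.): absolute Laplacian response.
    # The paper uses the ring difference filter (RDF) [31] instead.
    return np.abs(cv2.Laplacian(img.astype(np.float32), cv2.CV_32F, ksize=3))

def difference_focus_map(slice_org, radius=8, eps=1e-3):
    """Difference focus map F_k_d of Eq. (3) for one slice I_k_org."""
    src = slice_org.astype(np.float32)
    # Edge-preserving smoothing with the slice as its own guide (guided filter [38]);
    # cv2.ximgproc requires the opencv-contrib-python package.
    slice_gf = cv2.ximgproc.guidedFilter(guide=src, src=src, radius=radius, eps=eps)
    return np.abs(focus_measure(src) - focus_measure(slice_gf))  # |F_k_org - F_k_GF|
```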

Additionally, adaptive refinement is applied to the proposed difference focus map, $F_{k\_d}$, to more clearly separate in-focus and out-of-focus regions. The in-focus region (the white region) in $F_{k\_d}$ is enhanced, while the out-of-focus region (the black region) in $F_{k\_d}$ is smoothed with a Gaussian filter to remove occasional errors caused by noise or artifacts. To avoid gaps, $F_{k\_d}$ and its refined version are blended to generate the final focus map $F_{k}$.
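The paper does not spell out the exact enhancement and blending rules, so the sketch below is only one plausible reading of this refinement step; the mean-based region split, boost factor, Gaussian sigma, and blend weight are all assumed values.

```python
import cv2
import numpy as np

def refine_focus_map(f_d, boost=1.5, sigma=2.0, blend=0.5):
    """Adaptive refinement and blending of a float-valued difference focus map F_k_d."""
    thresh = f_d.mean()                              # assumed in/out-of-focus split
    in_focus = f_d >= thresh
    refined = cv2.GaussianBlur(f_d, (0, 0), sigma)   # smooth the out-of-focus region
    refined[in_focus] = np.minimum(f_d[in_focus] * boost, f_d.max())  # enhance in-focus
    return blend * f_d + (1.0 - blend) * refined     # blend to avoid gaps -> F_k
```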

To render the all-in-focus image as seen in Fig. 1(d), the best in-focus pixels at each position are collected. That is, for a pixel at position $\left(x,y\right)$, its best in-focus pixel value is selected from among $I_{1\_org}\left(x,y\right),\,I_{2\_org}\left(x,y\right),\,\ldots ,\,I_{K\_org}\left(x,y\right)$ by referring to the focus maps $F_{k}\left(x,y\right)$, $k=1,\ldots ,K$. The one giving the maximum focus at position $\left(x,y\right)$ among the $K$ refocused images is selected as the best in-focus pixel, and its image index, denoted by $k\max \left(x,y\right)$, is decided as follows:

Fig. 4. An example of the proposed difference focus map: (a) an image in focal stack $\boldsymbol{I}_{\boldsymbol{k}\_ \boldsymbol{org}}$; (b) focus map $\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{org}}$ from image $\boldsymbol{I}_{\boldsymbol{k}\_ \boldsymbol{org}}$; (c) focus map $\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{GF}}$ from guided-filtered image $\boldsymbol{I}_{\boldsymbol{k}\_ \boldsymbol{GF}}$; (d) the proposed difference focus map $\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{org}}-\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{GF}}$.
../../Resources/ieie/IEIESPC.2022.11.5.305/fig4.png
Fig. 5. AIF images and depth maps (1st and 2nd rows are comparisons of the rendered AIF images; the 3rd row is a comparison of generated depth maps): (a) Jeon et al.’s method [31]; (b) Chantara and Ho’s method [34]; (c) the proposed difference focus measure with adaptive refinement. GT: ground truth.
../../Resources/ieie/IEIESPC.2022.11.5.305/fig5.png
(4)
$k\max \left(x,y\right)=\underset{k=1,\ldots ,K}{\arg \max }\left(F_{k}\left(x,y\right)\right)$.

The best in-focus pixels at all $\left(x,y\right)$ positions are collected to form the rendered all-in-focus image as described in

(5)
$AIF\left(x,y\right)=I_{k\max \left(x,y\right)}\left(x,y\right)$.

Depth map $D$ is a collection of pixel-wise indices to the focal stack images that indicate the maximum focus:

(6)
$D\left(x,y\right)=k\max \left(x,y\right)$.
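A compact NumPy sketch of Eqs. (4)-(6) is shown below; it assumes a grayscale focal stack and the final focus maps stacked as (K, H, W) arrays, which is an illustrative layout rather than one stated in the paper.

```python
import numpy as np

def render_aif_and_depth(focal_stack, focus_maps):
    """Eqs. (4)-(6): pick the best-focused slice index per pixel.

    focal_stack : (K, H, W) refocused images I_k_org (grayscale assumed)
    focus_maps  : (K, H, W) final focus maps F_k
    """
    k_max = np.argmax(focus_maps, axis=0)                          # Eq. (4)
    aif = np.take_along_axis(focal_stack, k_max[None], axis=0)[0]  # Eq. (5)
    depth = k_max.astype(np.uint8)                                 # Eq. (6), gray-level map
    return aif, depth
```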
Fig. 6. The proposed focal stack reconstruction.
../../Resources/ieie/IEIESPC.2022.11.5.305/fig6.png

A comparison experiment was carried out for the proposed method and two state-of-the-art methods [31,34] that also utilize a focal stack. The AIF image and depth map results shown in Fig. 5 demonstrate that the proposed method is much closer to the ground truth, producing cleaner results with higher contrast, better quality, and fewer artifacts.

3.3 Proposed Focal Stack Reconstruction

At the decoder, the focal stack is reconstructed from the AIF image and its depth map. The proposed reconstruction method is explained in this section. The number of images in a focal stack corresponds to the resolution of the depth map, and each depth level corresponds to one image in the focal stack.

In generating refocused image $I_{k\_est}$, which is focused at the $k$th depth, there are two cases to consider: in-focus and out-of-focus pixels. For in-focus pixels, that is, $D\left(x,y\right)=k$, the pixel values are directly available in the AIF image; for out-of-focus pixels, that is, $D\left(x,y\right)\neq k$, the pixel values are obtained by defocusing the AIF image with a blur filter whose defocusing strength depends on the distance between depth $D\left(x,y\right)$ and target depth $k$. When refocusing at depth level $k$, the estimated $k$th image in focal stack $I_{k\_est}\left(x,y\right)$ at position $\left(x,y\right)$ is computed as follows:

(7)
$I_{k\_ est}\left(x,y\right)=AIF\left(x,y\right)*f\left(\sigma \right)$,
(8)
$\sigma =g\left(\Delta k\right)$,
(9)
$\Delta k=\left| D\left(x,y\right)-k\right| $,

where $f\left(\sigma \right)$ is a defocusing filter in which a Gaussian blur is used, and $*$ is the convolution operator. A higher value for $\sigma $ indicates a higher blur strength. The defocusing filter parameter $\sigma $ is a function of $\Delta k$, which is the depth distance between target focus depth $k$ and depth level $D\left(x,y\right)$ at the given pixel position $\left(x,y\right)$.
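The reconstruction of Eqs. (7)-(9) can be sketched in Python as follows, assuming a grayscale AIF image. Grouping pixels by their depth distance $\Delta k$ avoids re-blurring the image separately for every pixel, and the linear coefficients of $g\left(\cdot \right)$ are placeholders rather than the fitted values of Fig. 7(b).

```python
import cv2
import numpy as np

def reconstruct_slice(aif, depth, k, g_slope=0.4, g_offset=0.0):
    """Estimate the k-th focal-stack image I_k_est from the decoded AIF and depth map.

    g_slope, g_offset : placeholder coefficients of the linear model sigma = g(dk).
    """
    aif = aif.astype(np.float32)
    out = np.empty_like(aif)
    dk_map = np.abs(depth.astype(np.int32) - k)            # Eq. (9)
    for dk in np.unique(dk_map):
        mask = dk_map == dk
        if dk == 0:
            out[mask] = aif[mask]                          # in-focus pixels: copied
        else:
            sigma = g_offset + g_slope * dk                # Eq. (8)
            blurred = cv2.GaussianBlur(aif, (0, 0), sigma) # Eq. (7): Gaussian defocus
            out[mask] = blurred[mask]                      # out-of-focus pixels
    return out
```

Calling this routine for $k=1,\ldots ,K$ rebuilds the full focal stack at the decoder.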

Fig. 6 depicts our method for focal stack reconstruction. The marked rectangles are local areas at different depth levels. Depending on the corresponding depth level in the depth map, the defocusing strength of the green rectangle is weak, while the blur strength of the orange rectangle is strong. How strong or weak the blur is will be represented by blur parameter $\sigma $, and therefore, a proper blur parameter is essential in order to generate the focal stack accurately. We define the difference between the generated pixel in the focal stack from (7) and the pixel in the original focal stack from (1) and (2) as follows:

(10)
$V=\left| I_{k\_ org}\left(x,y\right)-I_{k\_ est}\left(x,y\right)\right| $.

Note that a smaller value for $V$ implies a more accurate $\sigma $ value. Parameter $\sigma $ is a function of $\Delta k$, as shown in (8). To define function $g\left(\cdot \right)$, we first select $N$ pairs of $\left(\Delta k,\sigma \right)$ values, and then these $N$ pairs are fitted to a linear function, as shown in Fig. 7(b).

Regarding the $N$ pairs of $\left(\Delta k,\sigma \right)$ values, $\Delta k$ should be set to cover the range $\Delta k=1,2,\ldots ,K-1$. For each $\Delta k$, an appropriate $\sigma $ value is calculated by a search whose flowchart is shown in Fig. 7(a). For example, to estimate pixel $I_{k\_est}\left(x,y\right)$ with $\Delta k=1$, an appropriate $\sigma $ value is found as follows: using an initial value, $\sigma =\sigma _{0}$, calculate $V=V_{0}$ with (7) and (10), and set $g=1$; update $\sigma =\sigma +g\times \Delta \sigma $ and calculate $V_{i}$; if $V_{i}<V_{i-1}$, then keep the sign of $g$ the same as before and update $\sigma =\sigma +g\times \Delta \sigma $; otherwise, change the sign of $g$ to its opposite, $g=g\times \left(-1\right)$, and update $\sigma =\sigma +g\times \Delta \sigma $; keep updating the $\sigma $ value until $V_{i}<V_{THD}$ or $i>I$. Here, $V_{THD}$ is a predefined threshold for a small $V$ value, and $I$ is a predefined number of iterations. The $N$ pairs are clustered into different groups according to $\Delta k$ and are then curve-fitted using a linear function model; the fitted linear function is presented in Fig. 7(b). This fitting model shows that the higher the value of depth distance $\Delta k$, the higher the value of defocusing filter strength parameter $\sigma $.
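A sketch of this iterative search for $\sigma $ and the subsequent linear fit is given below; the step size, threshold, iteration limit, and the blur_fn callable standing in for the defocusing filter of Eq. (7) are illustrative assumptions.

```python
import numpy as np

def search_sigma(aif_patch, target_patch, blur_fn,
                 sigma0=1.0, d_sigma=0.1, v_thd=1.0, max_iter=50):
    """Sign-adjusting search for the defocusing parameter sigma (Fig. 7(a)).

    blur_fn(img, sigma) applies the Gaussian defocusing filter of Eq. (7).
    """
    sigma, sign = sigma0, 1
    v_prev = np.mean(np.abs(target_patch - blur_fn(aif_patch, sigma)))  # V_0, Eq. (10)
    for _ in range(max_iter):
        sigma = max(sigma + sign * d_sigma, 1e-3)   # update sigma by g * d_sigma
        v = np.mean(np.abs(target_patch - blur_fn(aif_patch, sigma)))
        if v >= v_prev:
            sign = -sign                            # error grew: reverse direction
        if v < v_thd:
            break                                   # V_i < V_THD: stop
        v_prev = v
    return sigma

# The collected (dk, sigma) pairs are then fitted with a line, e.g.:
# slope, offset = np.polyfit(dk_values, sigma_values, deg=1)
```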

Fig. 7. Decision on the defocusing filter parameter $\boldsymbol{\sigma }$: (a) the search process for $\boldsymbol{\sigma }$ (defocusing filter parameter); (b) linear fitting of defocusing filter parameter $\boldsymbol{\sigma }$.
../../Resources/ieie/IEIESPC.2022.11.5.305/fig7.png

4. Performance Evaluation

In this section, we compare the proposed method with two state-of-the-art representation and compression methods: one is Le Pendu’s method [24], which represents a light field image as Fourier disparity layers and encodes the FDL layers as a pseudo-video sequence using HEVC; the other is Duong’s method [25], which converts a light field image to a focal stack and compresses it as a pseudo-video sequence using HEVC. In the experiment, the proposed method also employs the HEVC reference software (HM) version 16.17 [39] for encoding and decoding to keep the same test conditions as the two state-of-the-art methods. The encoder configuration is set as follows: the GOP structure is I-B-B-B, as in [40]. The tests use the six LF images (I01 to I06) in the JPEG Pleno dataset [41] (Bikes, Danger de Mort, Flowers, Stone Pillars Outside, Fountain Vincent 2, and Ankylosaurus and Diplodocus 1), captured with a Lytro Illum camera.

The performance comparison was made in terms of both the PSNR of YUV video and the refocusing capability loss due to compression. The PSNR values for each focal stack image were averaged to obtain a representative PSNR value associated with the LF data. It is denoted as LF-PSNR and computed as in (11), where $I_{k\_comp}$ is the $k$th reconstructed focal stack image obtained using (7) at the decoder, and $I_{k\_org}$ is the anchor focal stack image rendered from the light field data as in (1).

The LF-PSNR performance of the proposed method and the two anchor methods is compared in Fig. 8, with LF-PSNR calculated as follows:

(11)
$\text{LF-PSNR}\equiv \frac{1}{K}\sum _{k=1}^{K}PSNR\left(I_{k\_org},I_{k\_comp}\right)$.
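For reference, LF-PSNR as defined in Eq. (11) can be computed with a few lines of Python; 8-bit samples (peak of 255) and per-slice averaging over all pixels are assumed here.

```python
import numpy as np

def lf_psnr(stack_org, stack_comp, peak=255.0):
    """LF-PSNR of Eq. (11): mean PSNR over the K focal-stack images."""
    psnrs = []
    for org, comp in zip(stack_org, stack_comp):
        mse = np.mean((org.astype(np.float64) - comp.astype(np.float64)) ** 2)
        psnrs.append(10.0 * np.log10(peak ** 2 / max(mse, 1e-12)))  # guard mse = 0
    return float(np.mean(psnrs))
```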

Fig. 8 demonstrates that our proposed method attained the highest LF-PSNR among the three methods, especially at low bits per pixel (bpp). For example, for the I01 image, when bpp was 0.01, the proposed method’s LF-PSNR was 1.24 dB higher than FDL [24] and 1.74 dB higher than FS representation and compression [25]. When bpp was 0.02, the proposed method’s LF-PSNR values were 0.34 dB higher than FDL [24] and 0.09 dB higher than the FS [25]. Over the I01~I06 results, the average LF-PSNR gain was about 2.38 dB over FDL [24] and 1.60 dB over FS [25] at bits per pixel less than 0.03 in most cases. In the other methods, a higher number of bits per pixel leads to less compression loss in the LF data representation fed to the encoder, that is, the Fourier disparity layers in FDL [24] or the focal stack in FS [25], and thus, the focal stack PSNR increases according to the reduced coding loss at higher bits per pixel. In our scheme, what is sent to the encoder is the all-in-focus image and depth map from which the focal stack is reconstructed. While the depth map suffers less compression loss at higher bits per pixel, unless the accuracy of the estimated depth map itself is sufficient, the consequent PSNR increase in the focal stack is expected to be limited even as the bits per pixel get higher. This explains why the focal stack PSNR performance of the proposed scheme was not always higher than the other methods at high bits per pixel. It also suggests future research on improving the accuracy of the depth map estimation so that our scheme can keep gaining PSNR at higher bits per pixel as well.

We analyzed the refocusing capability loss, LF-RL, evaluated as the ratio of absolute differences between the two focus maps, $F_{k\_comp}$ and $F_{k\_org}$, as calculated in (12), where RL stands for refocusing loss. $F_{k\_comp}$ is computed using the reconstructed focal stack, $I_{k\_comp}$, that is, $F_{k\_comp}=FM\left(I_{k\_comp}\right)$, and $F_{k\_org}$ is the focus map of the original (that is, uncompressed) focal stack image $I_{k\_org}$, that is, $F_{k\_org}=FM\left(I_{k\_org}\right)$. In the experiment, we set $K=64$, which was the depth map resolution. For focus measure operator $FM$ in (3), the proposed difference focus measure described in Section 3.2 was applied. The range of the refocusing capability loss, LF-RL, is 0 to 1, where a higher value indicates higher loss:

Fig. 8. PSNR comparison of the proposed and state-of-the-art FDL representation & compression [24], and FS representation & compression [25].
../../Resources/ieie/IEIESPC.2022.11.5.305/fig8.png
Fig. 9. Refocusing capability loss (LF-RL) comparison of the proposed and state-of-the-art FDL representation & compression [24] and FS representation & compression [25].
../../Resources/ieie/IEIESPC.2022.11.5.305/fig9.png
(12)
$ \text{LF-RL}\equiv \frac{1}{K}\sum _{k=1}^{K}\frac{\left| F_{k\_comp}-F_{k\_org}\right| }{F_{k\_org}}. $
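A matching sketch for LF-RL is given below; since Eq. (12) is a ratio of focus maps, averaging the per-pixel ratio over each map and adding a small eps to guard against division by zero are assumed readings rather than details given in the paper.

```python
import numpy as np

def lf_rl(focus_maps_org, focus_maps_comp, eps=1e-6):
    """Refocusing capability loss LF-RL of Eq. (12).

    focus_maps_org, focus_maps_comp : (K, H, W) focus maps FM(I_k_org), FM(I_k_comp).
    """
    losses = []
    for f_org, f_comp in zip(focus_maps_org, focus_maps_comp):
        ratio = np.abs(f_comp - f_org) / (f_org + eps)   # per-pixel relative error
        losses.append(np.mean(ratio))
    return float(np.mean(losses))
```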

Fig. 9 compares the refocusing capability loss (LF-RL) for the proposed method and the two state-of-the-art methods. The result shows that the proposed method attained the minimum loss in refocusing capability at the same compression ratio. For example, at bpp = 0.01, FDL [24], FS [25], and the proposed method had refocusing capability losses of 0.30, 0.35, and 0.16, respectively. That means the refocusing capability loss under the proposed method was smaller by 0.14 and 0.19 (14 and 19 percentage points) compared to FDL [24] and FS [25], respectively. Thus, in practical applications targeting low transmission speeds or less storage space, such as in mobile phones or head-mounted display devices, the proposed method is a good choice.

In our experiment, different bits per pixel were realized with QP settings from 17 to 42. Fig. 9 also indicates that the refocusing capability loss was less than 0.2 when bpp ≤ 0.05 (about QP ≤ 32). The refocusing capability is perceived as almost intact when LF-RL ≤ 0.2 according to our internal subjective perceptual evaluation. Thus, coding with QP ≤ 32 is considered an allowable range for practical applications as far as refocusing functionality is concerned.

5. Conclusion

In this paper, we have presented an efficient representation and coding scheme for light field data designed to keep the refocusing functionality as uncompromised as possible. We designed a scheme in which LF data are represented by an all-in-focus image and a depth map, and the AIF image/depth map pair is encoded with HEVC. After decoding, the refocused focal stack is estimated by convolving the decoded all-in-focus image with a defocusing function whose strength is controlled according to the desired focus level. Our experiment results indicated that at the same compression ratio, the proposed representation and coding strategy had a 2.38 dB average PSNR improvement over the state-of-the-art Le Pendu FDL scheme [24], and a 1.60 dB improvement over Duong’s FS representation and coding method [25]. At the decoder, the proposed method also had smaller refocusing capability losses, 16.2% and 17.8% lower than those of the two well-known state-of-the-art methods [24,25]. The proposed representation and coding approach with an all-in-focus image and a depth map was shown to provide good compression performance while maintaining the refocusing capability very well.

ACKNOWLEDGMENTS

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2020R1A2C2007673).

REFERENCES

[1] Li H., Guo C., Jia S., 2017, High-Resolution Light-Field Microscopy, in Frontiers in Optics 2017, OSA Technical Digest (online), Optica Publishing Group, paper FW6D.3.
[2] Tsai D., Dansereau D. G., Peynot T., Corke P., 2017, Image-based visual servoing with light field cameras, IEEE Robotics and Automation Letters, Vol. 2, No. 2, pp. 912-919.
[3] Dricot A., Jung J., Cagnazzo M., Pesquet B., Dufaux F., Kovács P. T., Adhikarla V. K., 2015, Subjective evaluation of Super Multi-View compressed contents on high-end light-field 3D displays, Signal Processing: Image Communication, Vol. 39, pp. 369-385.
[4] Vetro A., Yea S., Matusik W., Pfister H., Zwicker M., Mar. 2011, Method and system for acquiring, encoding, decoding and displaying 3D light fields, U.S. Patent No. 7,916,934.
[5] Wu G., et al., 2017, Light field image processing: An overview, IEEE Journal of Selected Topics in Signal Processing, Vol. 11, No. 7, pp. 926-954.
[6] Rerabek M., Bruylants T., Ebrahimi T., Pereira F., Schelkens P., 2016, ICME 2016 grand challenge: Light-field image compression, call for proposals and evaluation procedure.
[7] Takahashi K., Naemura T., 2016, Layered light-field rendering with focus measurement, Signal Processing: Image Communication, Vol. 21, No. 6, pp. 519-530.
[8] Kim M., et al., Nov. 2018, Mobile terminal and control method for the mobile terminal, U.S. Patent US10135963B2.
[9] Wooptix Company, Light Field Selfie Camera for smartphones.
[10] Brites C., Ascenso J., Pereira F., Jan. 2021, Lenslet Light Field Image Coding: Classifying, Reviewing and Evaluating, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 31, No. 1, pp. 339-354.
[11] Viola I., Řeřábek M., Ebrahimi T., 2017, Comparison and evaluation of light field image coding approaches, IEEE Journal of Selected Topics in Signal Processing, Vol. 11, No. 7, pp. 1092-1106.
[12] Conti C., Soares L. D., Nunes P., 2020, Dense Light Field Coding: A Survey, IEEE Access, Vol. 8, pp. 49244-49284.
[13] Avramelos V., Praeter J. D., Van Wallendael G., Lambert P., Jun. 2019, Light field image compression using versatile video coding, in Proc. IEEE 9th Int. Conf. Consumer Electronics, pp. 1-6.
[14] ISO/IEC 21794-1:2020, 2020, Information technology - Plenoptic image coding system (JPEG Pleno) - Part 1: Framework.
[15] ISO/IEC 21794-2:2021, 2021, Information technology - Plenoptic image coding system (JPEG Pleno) - Part 2: Light field coding.
[16] de Carvalho M. B., Pereira M. P., Alves G., da Silva E. A. B., Pagliari C. L., Pereira F., et al., Oct. 2018, A 4D DCT-based lenslet light field codec, in Proc. 25th IEEE Int. Conf. Image Process. (ICIP), pp. 435-439.
[17] Astola P., Tabus I., Nov. 2018, Hierarchical warping, merging, and sparse prediction for light field image compression, in Proc. 7th Eur. Workshop Vis. Inf. Process. (EUVIP), pp. 1-6.
[18] Astola P., da Silva Cruz L. A., et al., Jun. 2020, JPEG Pleno: Standardizing a coding framework and tools for plenoptic imaging modalities, ITU J. ICT Discoveries, Vol. 3, No. 1, pp. 1-15.
[19] De Oliveira Alves G., et al., 2020, The JPEG Pleno Light Field Coding Standard 4D-Transform Mode: How to Design an Efficient 4D-Native Codec, IEEE Access, Vol. 8, pp. 170807-170829.
[20] Chen J., Hou J., Chau L. P., 2017, Light field compression with disparity-guided sparse coding based on structural key views, IEEE Transactions on Image Processing, Vol. 27, No. 1, pp. 314-324.
[21] Jiang X., Le Pendu M., Guillemot C., 2017, Light field compression using depth image based view synthesis, in IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 19-24.
[22] Jiang X., Le Pendu M., Farrugia R. A., Guillemot C., 2017, Light field compression with homography-based low-rank approximation, IEEE Journal of Selected Topics in Signal Processing, Vol. 11, No. 7, pp. 1132-1145.
[23] Han H., Xin J., Dai Q., Sep. 2018, Plenoptic image compression via simplified subaperture projection, in Pacific Rim Conference on Multimedia, Springer, Cham, pp. 274-284.
[24] Le Pendu M., Ozcinar C., Smolic A., 2020, Hierarchical Fourier Disparity Layer Transmission for Light Field Streaming, in IEEE International Conference on Image Processing (ICIP), pp. 2606-2610.
[25] Duong V. V., Canh T. N., Huu T. N., Jeon B., Dec. 2019, Focal stack based light field coding for refocusing applications, Journal of Broadcast Engineering, Vol. 24, No. 7, pp. 1246-1258.
[26] Ng R., Levoy M., Brédif M., et al., 2005, Light field photography with a hand-held plenoptic camera, Computer Science Technical Report CSTR, Vol. 2, No. 11, pp. 1-11.
[27] Shin C., Jeon H. G., Yoon Y., Kweon I. S., Kim S. J., 2018, EPINET: A fully-convolutional neural network using epipolar geometry for depth from light field images, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4748-4757.
[28] Mun J. H., Ho Y. S., 2018, Depth Estimation from Light Field Images via Convolutional Residual Network, in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1495-1498.
[29] Li K., Zhang J., Sun R., Zhang X., Gao J., 2020, EPI-based Oriented Relation Networks for Light Field Depth Estimation, arXiv preprint arXiv:2007.04538.
[30] Zhou W., Zhou E., Yan Y., Lin L., Lumsdaine A., 2019, Learning Depth Cues from Focal Stack for Light Field Depth Estimation, in IEEE International Conference on Image Processing (ICIP), pp. 1074-1078.
[31] Jeon H. G., Surh J., Im S., Kweon I. S., 2019, Ring difference filter for fast and noise robust depth from focus, IEEE Transactions on Image Processing, Vol. 29, pp. 1045-1060.
[32] Pertuz S., Puig D., Garcia M. A., 2013, Analysis of focus measure operators for shape-from-focus, Pattern Recognition, Vol. 46, No. 5, pp. 1415-1432.
[33] Zhao C., Jeon B., 2022, Refocusing Metric of Light Field Image Using Region-Adaptive Multi-Scale Focus Measure, IEEE Access.
[34] Chantara W., Ho Y. S., 2016, Focus Measure of Light Field Image Using Modified Laplacian and Weighted Harmonic Variance, in Proceedings of the International Workshop on Advanced Image Technology, pp. 6-8.
[35] Nayar S. K., Nakagawa Y., 1994, Shape from focus, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 8, pp. 824-831.
[36] Wee C. Y., Paramesran R., 2008, Image sharpness measure using eigenvalues, in IEEE 9th International Conference on Signal Processing, pp. 840-843.
[37] Pech-Pacheco J. L., Cristóbal G., Chamorro-Martinez J., 2000, Diatom autofocusing in brightfield microscopy: a comparative study, in Proceedings of the 15th International Conference on Pattern Recognition, Vol. 3, pp. 314-317.
[38] He K., Sun J., 2015, Fast guided filter, arXiv preprint arXiv:1505.00996.
[39] HEVC reference software, HM 16.17.
[40] Canh T. N., Duong V. V., Jeon B., Jan. 2019, Boundary handling for video based light field coding with a new hybrid scan order, in Proc. Int. Workshop on Advanced Image Technology, pp. 1-4.
[41] Řeřábek M., Ebrahimi T., 2016, New Light Field Image Dataset, in 8th International Workshop on Quality of Multimedia Experience (QoMEX), Lisbon, Portugal.

Author

Chun Zhao
../../Resources/ieie/IEIESPC.2022.11.5.305/au1.png

Chun Zhao received a BS in 2005 and an MS in 2008 from the Department of Electronics Science and Technology, North University of China, Shanxi, China. She joined the MS exchange student program in 2008, and started working in 2016 toward a PhD, in the Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, Korea. From 2008 to 2014, she worked in the Research & Design Center, Samsung Electronics, Korea, on Image/Video Enhancement algorithm development and System on Chip (SOC) design, implementing an algorithm based on FPGA/Chip and RTL design. Since 2015, she has been a senior engineer for the Visual Display Business, Samsung Electronics, Korea, where she worked on practical algorithm development for various displays by analyzing panel characteristics. Her research interests include multimedia signal processing, panel color calibration, machine learning, and light field refocusing representation.

Byeungwoo Jeon
../../Resources/ieie/IEIESPC.2022.11.5.305/au2.png

Byeungwoo Jeon (M’90, SM’02) received a BS (Magna Cum Laude) in 1985 and an MS in 1987 from the Department of Electronics Engineering, Seoul National University, Seoul, Korea, and received a PhD from the School of Electrical Engineering, Purdue University, West Lafayette, USA, in 1992. From 1993 to 1997, he was in the Signal Processing Laboratory, Samsung Electronics, Korea, where he worked on research and development of video compression algorithms, design of digital broadcasting satellite receivers, and other MPEG-related research for multimedia applications. Since September 1997, he has been at Sungkyunkwan University (SKKU), Korea, where he is currently a professor. His research interests include multimedia signal processing, video compression, statistical pattern recognition, and remote sensing. He served as Project Manager of Digital TV and Broadcasting in the Korean Ministry of Information and Communications from 2004 to 2006 where he supervised all digital TV-related R&D in Korea. From 2015 to 2016, he was Dean of the College of Information and Communication Engineering, SKKU. In 2019, he was President of the Korean Institute of Broadcast and Media Engineers. Dr. Jeon is a senior member of IEEE, a member of SPIE, an associate editor of IEEE Trans. on Broadcasting and IEEE Trans. on Circuits and Systems for Video Technology. He was a recipient of the 2005 IEEK Haedong Paper Award from the Signal Processing Society in Korea, and received the 2012 Special Service Award and the 2019 Volunteer Award, both from the IEEE Broadcast Technology Society. In 2016, a Korean President’s Commendation was conferred upon him for his key role in promoting international standardization for video coding technology in Korea.