Zhao Chun1 and Jeon Byeungwoo1

1 Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, Korea
{zhaochun83, bjeon}@skku.edu
            
            
            Copyright © The Institute of Electronics and Information Engineers(IEIE)
            
            
            
            
            
               
                  
Keywords: Light field representation, Light field coding, All-in-focus image, Depth map, Focal stack reconstruction
             
            
          
         
            
                  1. Introduction
Light field (LF) cameras can capture the light coming from every point of a scene in all directions [1]. The rich data make it possible to realize various applications [2,3], such as refocusing, depth estimation, viewing-angle change, and three-dimensional (3D) object reconstruction. With the recent explosive interest in implementing and improving augmented reality (AR) and virtual reality (VR) systems [4], demand is increasing for rich information to provide more realistic visual experiences. While the light field image is one of the most important content sources, its data require much more storage space and incur high transmission costs. The collection and management of such large amounts of LF data are not easy for many practical applications, and therefore, efficient representation and compression with low computational requirements are essential for practical light field data storage, transmission, and display [5]. LF data contain a lot of redundancy since they capture image information of the same scene from different viewpoints [6,7].
               
While developments in the representation and compression of LF data have so far concentrated on maximizing compression in general, very little attention has been given to compression schemes that place special emphasis on keeping certain selected functionalities from being greatly affected. Among the different application requirements listed in Table 1, it may be desirable for a certain functionality to be less affected by compression than the others. For example, a smartphone [8,9] in daily casual use may only need the refocusing function for post-processing. In this regard, the motivation of this paper differs from the many existing compression approaches in that we design a representation and coding scheme for light field images that preserves the refocusing functionality of LF images as faithfully as possible at relatively low computational complexity.
               
               The rest of the paper is organized as follows. We briefly review related work in Section
                  2. Section 3 describes the proposed representation and coding scheme in detail. Experiment
                  results are given in Section 4, and Section 5 concludes the paper.
               
               
Table 1. Light Field Image Coding with Emphasis on Selected Functionality.

| Functionality | Coding with special emphasis on a certain functionality |
| --- | --- |
| Refocusing | Compression for refocusing can generate refocused images quite well with compressed data |
| Viewing angle change | Compression of the view angle can generate different images quite well with compressed data |
| Exposure adjust | Compression of exposure adjustments can generate different images quite well with compressed data |
                
             
            
                  2. Related Work
               Recently, many researchers have worked on advanced representation and coding techniques
                  to reduce redundancy in light field images. Some work provided comprehensive evaluation
                  of LF image coding schemes after grouping them into two main coding strategies [10-13]. The first strategy relates to the international standards in JPEG Pleno Part 1 (Framework)
                  [14] and Part 2 (LF coding) [15]. They support MuLE [16] and WaSP [17] as coding modes. Standardization of the JPEG LF image coding framework was described
                  in [18], and its 4D-Transform coding solution was explained in [19]. The core framework of the second strategy compresses LF data using the High Efficiency
                  Video Coding (HEVC) scheme by forming multiple views of light field images into one
                  pseudo-video sequence. Chen et al. [20] proposed a disparity-guided sparse coding scheme for light field data based on structural
                  key sub-aperture views. Jiang et al. [21] developed an LF compression scheme using a depth image-based view synthesis technique
                  in which a small subset of views is compressed using HEVC inter-coding tools, and
                  an entire light field is reconstructed using the subset. Jiang and colleagues [22] introduced another LF compression scheme based on homographic low-rank approximation
                  in which the LF views are aligned by homography and then compressed using HEVC. Han
et al. [23] compressed a pseudo-video sequence consisting of central sub-aperture images and
                  a sequence consisting of residual images between the central image and adjacent images
                  using HEVC.
               
               Additionally, we noted studies on converting a light field to a new representation
                  before encoding. Le Pendu et al. [24] used light field data for Fourier disparity layer (FDL) representation under which
                  the root image is encoded with FDL layers. This technique was shown to provide higher
                  coding performance than JPEG-based MuLE [16] and WaSP [17], which belong to the first category of LF coding schemes. Therefore, in a performance
                  evaluation of our proposed method, Le Pendu et al.’s FDL-based scheme [24] was one of the anchors for comparison. Duong et al. [25] proposed representing LF data in a focal stack (FS) in order to compress the given
                  LF data as a pseudo-video sequence using HEVC. This compression scheme was specifically
designed with the refocusing application in mind, showing that about 50% of the data amount is saved by compressing focal stack data consisting of sampled refocused images instead of compressing a pseudo-video sequence formed with sub-aperture views. Thus, the encoding scheme with the FS [25] was also used for comparison.
               
In this paper, we keep the refocus functionality from being affected by compression, as is done in [24] and [25], but in a different way. We represent light field data in the form of one single all-in-focus (AIF) image and its depth map, both of which are compressed using the well-known HEVC compression technique. The proposed scheme not only covers the full refocus range, but also achieves higher compression. Fig. 1 illustrates the proposed scheme together with two well-known anchors: the FDL-based method [24] and the method that compresses the images in a focal stack [25] as a pseudo-video sequence. The proposed representation and compression methods are shown in Fig. 1(c), and the detailed AIF image rendering and depth map generation are in Fig. 1(d). As illustrated in Fig. 1, a focal stack is generated by shifting and adding, as explained in [25]. Assume there are $K$ refocused images in a focal stack, and the $k$th refocused image is $I_{k\_org}\left(x,y\right)$, where its distance from the aperture plane is $F'=\alpha F$, in which $\alpha =F'/F$ is defined as the relative depth, written as

$$I_{k\_org}\left(x,y\right)=\frac{1}{N}\sum_{\left(u,v\right)}L^{\left(u,v\right)}\left(x+\Delta x,\,y+\Delta y\right)\qquad (1)$$

$$\left(\Delta x,\,\Delta y\right)=\left(u\left(1-\frac{1}{\alpha }\right),\,v\left(1-\frac{1}{\alpha }\right)\right)\qquad (2)$$

where $L^{\left(u,v\right)}$ represents a sub-aperture image at position $\left(u,v\right)$ on the main lens, $N$ is the number of sub-aperture images, and $\left(\Delta x,\Delta y\right)$ is the shift offset in the $x,y$ directions.
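For illustration, a minimal NumPy sketch of the shift-and-add refocusing in (1) and (2) is given below. The sub-aperture array layout, the view-coordinate arrays, and the integer-pixel shifting via np.roll are simplifying assumptions; a practical implementation would use sub-pixel interpolation.

```python
import numpy as np

def refocus_shift_and_add(sub_apertures, alpha, u_coords, v_coords):
    """Shift-and-add refocusing of Eqs. (1)-(2), integer-pixel approximation.

    sub_apertures : ndarray of shape (U, V, H, W) or (U, V, H, W, C),
                    holding the sub-aperture images L^(u,v).
    alpha         : relative depth alpha = F'/F of the target refocus plane.
    u_coords, v_coords : angular coordinates of the views relative to the
                    optical centre (e.g. -7..7 for a 15x15 Lytro grid).
    """
    acc = np.zeros_like(sub_apertures[0, 0], dtype=np.float64)
    for iu, u in enumerate(u_coords):
        for iv, v in enumerate(v_coords):
            # Eq. (2): shift offset (dx, dy) = (u(1 - 1/alpha), v(1 - 1/alpha))
            dx = int(round(u * (1.0 - 1.0 / alpha)))
            dy = int(round(v * (1.0 - 1.0 / alpha)))
            acc += np.roll(sub_apertures[iu, iv], shift=(dy, dx), axis=(0, 1))
    return acc / (len(u_coords) * len(v_coords))   # average over the N views, Eq. (1)

# A focal stack of K refocused images is obtained by sweeping alpha:
# focal_stack = [refocus_shift_and_add(L, a, us, vs) for a in alphas]
```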
               
             
            
                  3. The Proposed Scheme
In this section, we present the proposed representation and compression scheme, which preserves the refocusing functionality as much as possible under compression. Unlike existing methods that encode sub-aperture image sequences [20,21,23], the focal stack [25], or the hierarchical FDL [24], we first represent the light field as an all-in-focus image and a depth map, and then encode them. During decoding, a focal stack consisting of multiple images having different focus levels is reconstructed from the compressed all-in-focus image using the depth map. Fig. 1(c) shows the main structure of the proposed framework, which consists of three parts: refocusing representation, all-in-focus image and depth map generation, and focal stack reconstruction at the decoder.
               
               
                     3.1 Proposed Representation
The proposed light field representation aims at faithfully maintaining the refocusing functionality during compression. The refocusing functionality refers to how flexibly and accurately a desired refocused image can be generated. The array of refocused images is called the focal stack [26]. However, such a focal stack demands a huge volume of data.
                  
                  
Fig. 1. Different frameworks for light field representation and coding: (a) coding with the FDL model [24]; (b) coding with the focal stack [25]; (c) the proposed scheme with emphasis on the refocusing capability; (d) generation of the all-in-focus image and depth map in the proposed scheme.
 
In the proposed scheme, the AIF image and the depth map are used to represent the light field image to be encoded and transmitted for applications that put the emphasis on the refocusing functionality. The all-in-focus image and the depth map can replace a focal stack since the conversion between them is bi-directional: the all-in-focus image and the depth map can be generated from a focal stack, and the focal stack can be reconstructed from the all-in-focus image and the depth map as well. Refocused images at any depth can be generated from the decoded AIF image and the depth map by using a defocusing filter. These two conversions are used before encoding and after decoding, respectively. The AIF image and the depth map can thus effectively provide the refocusing functionality.
                  
                  The advantages of the proposed scheme are analyzed below. The first advantage is the
                     refocus coverage range. Since users may like to refocus at any depth, the refocusing
                     capability should be able to cover all potential refocusing ranges. Duong et al. [25] represented the light field with a focal stack that includes 24 refocused images
                     before compression, and thus, the refocusing range is limited to the 24 images. However,
                     since the AIF and the depth map data in the proposed scheme are encoded and transmitted,
                     any refocused image can be generated from the decoded AIF image with help from the
depth map by using a defocusing filter. Second, in terms of storage, representation and compression using sub-aperture images [20,21,23], a focal stack [25], or the hierarchical FDL [24] are much heavier than the proposed scheme, which deals with only one AIF image and one gray-level depth map. The third advantage is the generation complexity of
                     the refocused image at the decoder. Complexity is an important factor in practical
                     applications. In the FDL [24], the refocused image generation process should convert the Fourier disparity layer
                     to sub-aperture images. It further calculates shifting slopes and adds all sub-aperture
                     images for display rendering. The focal stack-based scheme [25] compresses only a few sample depth slices. Thus, pixel-wise interpolation should
                     be executed among relevant neighboring sample depth slices if the target refocused
                     depth is not the sampled depth. However, in our case, any refocused image can be generated.
                  
                  There are in-focus pixels and out-of-focus pixels in one refocused image. The in-focus
                     pixels are directly obtained from the all-in-focus image, and the out-of-focus pixels
                     are obtained by defocusing the relevant all-in-focus image using a predefined filter.
                     Our proposed method demands very low computational complexity.
                  
                  
                        Fig. 2. Volumetric comparison of light field refocusing representations.
 
                  
                        Fig. 3. The proposed difference focus measure with adaptive refinement.
 
                
               
                     3.2 All-In-Focus Image and Depth Map Generation
                  To generate the AIF image and the depth map, we investigated several state-of-the-art
                     methods. There are learning-based depth map estimation algorithms, most of which are
                     based on a fully convolutional neural network [27-30], and they provide high accuracy but with high complexity. On the other hand, rule-based
depth estimation methods [31] and AIF image rendering methods [34] that utilize a focal stack have relatively low complexity. In this paper, in order to balance accuracy and complexity in the encoding process as a whole, we utilize the focal stack to render both the all-in-focus image and the depth map. Therein, we define the focus map, which indicates how well a given pixel is focused. The degree of focus is measured by a selected focus measure [32,33]. The more in-focus a pixel is, the higher its value in the focus map. Note that in out-of-focus regions where blurred texture-rich pixels, blurred edges, or artifacts are statistically abundant, most of the well-known focus measures, such as LAP2 [35], STA2 [36], GRA7 [37], and RDF [31], may suffer from focus measure errors because they respond to the high local variance that is normally seen in in-focus regions. To overcome this problem, a new, very simple focus measure is proposed, which is shown in Fig. 3 and named the difference focus measure.
                  
The difference between two focus maps, one from the focal stack $I_{k\_org}$ and the other from the guided-filtered focal stack $I_{k\_GF}$, is designed to counteract this variance. The difference focus map $F_{k\_d}$ is defined as

$$F_{k\_d}=F_{k\_org}-F_{k\_GF}\qquad (3)$$

where $F_{k\_org}=FM\left(I_{k\_org}\right)$ and $F_{k\_GF}=FM\left(I_{k\_GF}\right)$, in which $I_{k\_GF}$ is a smoothed focal stack that preserves boundaries while smoothing the other regions with a guided filter $G\left(\cdot \right)$ [38], denoted as $I_{k\_GF}=G\left(I_{k\_org},I_{k\_org}\right)$. $FM\left(\cdot \right)$ indicates the focus measure of choice; in this paper, the ring difference filter [31] is selected owing to its robustness, which comes from incorporating both local and non-local characteristics in the filtering window. An example of the difference focus map is shown in Fig. 4. The first-row images show a case in which the entire image is out of focus; the proposed difference focus map gives the correct focus level (=0), whereas the focus maps from $I_{k\_org}$ or $I_{k\_GF}$ incorrectly detect the out-of-focus edge area as an in-focus region.
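To make the computation in (3) concrete, the sketch below builds a difference focus map in Python. The ring difference filter of [31] is not reproduced; a simple Laplacian-energy measure stands in for FM(.), the guided filter comes from the opencv-contrib module cv2.ximgproc, and the radius and eps values are illustrative assumptions.

```python
import cv2
import numpy as np

def focus_measure(img, ksize=9):
    """Stand-in for FM(.): local energy of the Laplacian (not the RDF of [31])."""
    gray = img if img.ndim == 2 else img.mean(axis=2)
    lap = cv2.Laplacian(gray.astype(np.float32), cv2.CV_32F)
    return cv2.boxFilter(np.abs(lap), -1, (ksize, ksize))

def difference_focus_map(I_k_org, radius=8, eps=1e-2):
    """Eq. (3): F_k_d = FM(I_k_org) - FM(I_k_GF), with I_k_GF = G(I_k_org, I_k_org)."""
    src = I_k_org.astype(np.float32)
    # Self-guided edge-preserving smoothing (requires opencv-contrib-python).
    I_k_GF = cv2.ximgproc.guidedFilter(src, src, radius, eps)
    return focus_measure(src) - focus_measure(I_k_GF)
```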
                  
Additionally, adaptive refinement is applied to the proposed difference focus map, $F_{k\_d}$, to more clearly distinguish in-focus and out-of-focus regions. The in-focus region (the white region) in $F_{k\_d}$ is enhanced, while the out-of-focus region (the black region) in $F_{k\_d}$ is smoothed with a Gaussian filter to remove occasional errors caused by noise or artifacts. To avoid gaps, $F_{k\_d}$ and its refined version are blended to generate the final focus map $F_{k}$.
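The refinement above is only described qualitatively, so the following sketch assumes one simple realization: the high-focus (white) region is boosted, the low-focus (black) region is replaced by a Gaussian-smoothed version, and the result is alpha-blended with the unrefined map. The threshold, gain, sigma, and blending weight are illustrative assumptions.

```python
import cv2
import numpy as np

def refine_focus_map(F_d, thr=0.5, gain=1.5, sigma=2.0, blend=0.5):
    """Adaptive refinement of the difference focus map F_k_d (illustrative only).

    F_d is assumed to be normalized to [0, 1] beforehand."""
    F = F_d.astype(np.float32)
    refined = F.copy()
    in_focus = F >= thr
    refined[in_focus] = np.clip(F[in_focus] * gain, 0.0, 1.0)   # enhance the in-focus (white) region
    smoothed = cv2.GaussianBlur(F, (0, 0), sigma)               # suppress noise and artifacts
    refined[~in_focus] = smoothed[~in_focus]                    # smooth the out-of-focus (black) region
    # Blend the refined map with the original F_k_d to avoid gaps between the regions.
    return blend * refined + (1.0 - blend) * F
```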
                  
To render the all-in-focus image as seen in Fig. 1(d), the best in-focus pixel at each position is collected. That is, for a pixel at position $\left(x,y\right)$, the best in-focus pixel value is selected from among $I_{1\_org}\left(x,y\right),\,I_{2\_org}\left(x,y\right),\ldots ,I_{K\_org}\left(x,y\right)$ by referring to the focus maps $F_{k}\left(x,y\right)$, $k=1,\ldots ,K$. The one giving the maximum focus at position $\left(x,y\right)$ among the $K$ refocused images is selected as the best in-focus pixel, and its image index, denoted by $k_{\max }\left(x,y\right)$, is decided as follows:

$$k_{\max }\left(x,y\right)=\underset{k\in \left\{1,\ldots ,K\right\}}{\arg \max }\,F_{k}\left(x,y\right)\qquad (4)$$
                  
                        Fig. 4. An example of the proposed difference focus map: (a) an image in focal stack $\boldsymbol{I}_{\boldsymbol{k}\_ \boldsymbol{org}}$; (b) focus map $\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{org}}$ from image $\boldsymbol{I}_{\boldsymbol{k}\_ \boldsymbol{org}}$; (c) focus map $\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{GF}}$ from guided-filtered image $\boldsymbol{I}_{\boldsymbol{k}\_ \boldsymbol{GF}}$; (d) the proposed difference focus map $\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{org}}-\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{GF}}$.
 
                  
Fig. 5. AIF images and depth maps (1st and 2nd rows are comparisons of the rendered AIF images; the 3rd row is a comparison of generated depth maps): (a) Jeon et al.’s method [31]; (b) Chantara and Ho’s method [34]; (c) the proposed difference focus measure with adaptive refinement. GT: ground truth.
 
                  
The best in-focus pixels at all $\left(x,y\right)$ positions are collected to form the rendered all-in-focus image as described in

$$I_{AIF}\left(x,y\right)=I_{k_{\max }\left(x,y\right)\_org}\left(x,y\right)\qquad (5)$$
                  
Depth map $D$ is a collection of the pixel-wise indices to the focal stack images that indicate the maximum focus:

$$D\left(x,y\right)=k_{\max }\left(x,y\right)\qquad (6)$$
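With the final focus maps $F_{k}$ available, (4)-(6) reduce to a per-pixel argmax; a compact NumPy sketch is shown below (depth indices run from 0 to K-1 here instead of 1 to K).

```python
import numpy as np

def render_aif_and_depth(focal_stack, focus_maps):
    """Eqs. (4)-(6): per-pixel argmax over the K focus maps.

    focal_stack : (K, H, W) or (K, H, W, C) array of refocused images I_k_org.
    focus_maps  : (K, H, W) array of final focus maps F_k.
    Returns the all-in-focus image I_AIF and the depth (index) map D.
    """
    D = np.argmax(focus_maps, axis=0)              # Eqs. (4), (6): k_max(x, y)
    rows, cols = np.indices(D.shape)
    I_AIF = focal_stack[D, rows, cols]             # Eq. (5): best in-focus pixel at each position
    return I_AIF, D
```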
                  
                  
                  
                        Fig. 6. The proposed focal stack reconstruction.
 
                  A comparison experiment was carried out for the proposed method and two state-of-the-art
methods [31,34] that also utilize a focal stack. The AIF image and depth map results shown in Fig. 5 demonstrate that the proposed method is much closer to the ground truth, with cleaner output, higher contrast, better quality, and fewer artifacts.
                  
                
               
                     3.3 Proposed Focal Stack Reconstruction
                  At the decoder, the focal stack is reconstructed from the AIF image and its depth
                     map. The proposed reconstruction method is explained in this section. The number of
images in a focal stack corresponds to the resolution of the depth map. Each depth level
                     corresponds to one image in the focal stack.
                  
In generating refocused image $I_{k\_est}$, which is focused at the $k$th depth, there are two cases to consider: in-focus and out-of-focus pixels. For in-focus pixels, that is, $D\left(x,y\right)=k$, the pixel values are directly available in the AIF image; for out-of-focus pixels, that is, $D\left(x,y\right)\neq k$, the pixel values are obtained by defocusing the AIF image with a blur filter whose defocusing strength depends on the distance between depth $D\left(x,y\right)$ and target depth $k$. When refocusing at depth level $k$, the estimated $k$th image in the focal stack, $I_{k\_est}\left(x,y\right)$, at position $\left(x,y\right)$ is computed as follows:

$$I_{k\_est}\left(x,y\right)=\begin{cases}I_{AIF}\left(x,y\right), & D\left(x,y\right)=k\\ \left(f\left(\sigma \right)*I_{AIF}\right)\left(x,y\right), & D\left(x,y\right)\neq k\end{cases}\qquad (7)$$

$$\sigma =g\left(\Delta k\right)\qquad (8)$$

$$\Delta k=\left|k-D\left(x,y\right)\right|\qquad (9)$$
where $f\left(\sigma \right)$ is a defocusing filter in which Gaussian blur is used, and $*$ is the convolution operator. A higher value of $\sigma $ indicates a higher blur strength. The defocusing filter parameter $\sigma $ is a function of $\Delta k$, which is the depth distance between the target focus depth $k$ and the depth level $D\left(x,y\right)$ at the given pixel position $\left(x,y\right)$.
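A minimal sketch of the reconstruction in (7)-(9) follows: in-focus pixels are copied from the decoded AIF image, and out-of-focus pixels are taken from Gaussian-blurred copies whose sigma grows with the depth distance. The coefficients of the linear model sigma = a*dk + b are placeholders for the fitted function of Fig. 7(b).

```python
import cv2
import numpy as np

def reconstruct_refocused(I_aif, D, k, a=0.6, b=0.2):
    """Eqs. (7)-(9): estimate the k-th focal stack image from the AIF image and depth map.

    I_aif : decoded all-in-focus image, (H, W) or (H, W, C).
    D     : decoded depth map, (H, W) integer depth indices.
    k     : target refocus depth index.
    a, b  : placeholder coefficients of the fitted linear model sigma = g(dk) = a*dk + b.
    """
    I_aif = I_aif.astype(np.float32)
    I_est = I_aif.copy()                               # Eq. (7), first case: in-focus pixels (dk = 0)
    delta_k = np.abs(D.astype(np.int64) - int(k))      # Eq. (9)
    for dk in np.unique(delta_k):
        if dk == 0:
            continue
        sigma = a * float(dk) + b                      # Eq. (8): sigma = g(dk)
        blurred = cv2.GaussianBlur(I_aif, (0, 0), sigma)
        mask = delta_k == dk
        I_est[mask] = blurred[mask]                    # Eq. (7), second case: defocused AIF pixels
    return I_est

# The whole focal stack follows by sweeping k over all depth levels:
# stack = [reconstruct_refocused(I_aif, D, k) for k in range(K)]
```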
                  
Fig. 6 depicts our method for focal stack reconstruction. The marked rectangles are local areas at different depth levels. Depending on the related depth level in the depth map, the defocusing strength of the green rectangle is weak, and the blur strength of the orange rectangle is strong. The blur strength is represented by the parameter $\sigma $, and therefore, a proper blur parameter is essential in order to generate the focal stack accurately. We define the difference between the generated pixel in the focal stack from (7) and the pixel in the original focal stack from (1) and (2) as follows:

$$V=\left|I_{k\_est}\left(x,y\right)-I_{k\_org}\left(x,y\right)\right|\qquad (10)$$
Note that a smaller value of $V$ implies a more accurate $\sigma $ value. Parameter $\sigma $ is a function of $\Delta k$, as shown in (8). To define function $g\left(.\right)$, we first select $N$ pairs of $\left(\Delta k,\sigma \right)$ values, and then these $N$ pairs are fitted to a linear function, as shown in Fig. 7(b).
                  
Regarding the $N$ pairs of $\left(\Delta k,\sigma \right)$ values, $\Delta k$ should be set to cover the range $\Delta k=1,2,\ldots ,K-1$. For each $\Delta k$, an appropriate $\sigma $ value is calculated by a full search. The search flowchart is shown in Fig. 7(a). For example, to estimate pixel $I_{k\_est}\left(x,y\right)$ with $\Delta k=1$, an appropriate $\sigma $ value is set as follows: using an initial value, $\sigma =\sigma _{0}$, calculate $V=V_{0}$ with (7) and (10); set $g=1$; update $\sigma =\sigma +g\times \Delta \sigma $ and calculate $V_{i}$; if $V_{i}<V_{i-1}$, then keep the sign of $g$ the same as before and update $\sigma =\sigma +g\times \Delta \sigma $; otherwise, change the sign of $g$ to its opposite, $g=g\times \left(-1\right)$, and update $\sigma =\sigma +g\times \Delta \sigma $; keep updating the $\sigma $ value until $V_{i}<V_{THD}$ or $i>I$. Here, $V_{THD}$ is a predefined threshold for a small $V$ value, and $I$ is a predefined number of iterations. The $N$ pairs are clustered into groups according to $\Delta k$ and then curve-fitted using a linear function model; the fitted linear function is presented in Fig. 7(b). This fitting model shows that the higher the depth distance $\Delta k$, the higher the defocusing filter strength parameter $\sigma $.
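The search of Fig. 7(a) and the subsequent fitting can be sketched as follows. Here $V$ is evaluated as a mean absolute difference over the pixels of one $\Delta k$ group rather than a single pixel, and the initial value, step size, threshold, and iteration cap are illustrative assumptions.

```python
import cv2
import numpy as np

def search_sigma(I_aif, I_k_org, mask, sigma0=1.0, step=0.1, v_thd=1.0, max_iter=50):
    """Sign-flipping search of Fig. 7(a) for the defocus parameter sigma of one dk group.

    mask selects the pixels whose depth distance to the target depth equals dk."""
    I_aif = I_aif.astype(np.float32)
    I_org = I_k_org.astype(np.float32)

    def cost(sigma):
        # Eqs. (7) and (10): blur the AIF image and measure the difference to the
        # original focal stack image over the selected pixels.
        blurred = cv2.GaussianBlur(I_aif, (0, 0), max(sigma, 1e-3))
        return float(np.mean(np.abs(blurred[mask] - I_org[mask])))

    sigma, g = sigma0, 1.0
    v_prev = cost(sigma)
    for _ in range(max_iter):
        sigma = sigma + g * step                 # update sigma along the current direction
        v_new = cost(sigma)
        if v_new >= v_prev:
            g = -g                               # got worse: reverse the search direction
        v_prev = v_new
        if v_prev < v_thd:
            break
    return sigma

def fit_sigma_model(dk_values, sigma_values):
    """Linear fit sigma = g(dk) = a*dk + b over the collected (dk, sigma) pairs, Fig. 7(b)."""
    a, b = np.polyfit(np.asarray(dk_values, dtype=float),
                      np.asarray(sigma_values, dtype=float), 1)
    return a, b
```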
                  
                  
                        Fig. 7. Decision on the defocusing filter parameter $\boldsymbol{\sigma }$: (a) the search process for $\boldsymbol{\sigma }$ (defocusing filter parameter); (b) linear fitting of defocusing filter parameter $\boldsymbol{\sigma }$.
 
                
             
            
                  4. Performance Evaluation
               In this section, we compare the proposed method with two state-of-the-art representation
                  and compression methods: one is Le Pendu’s method [24], which represents a light field image as the Fourier Disparity Layer and encodes
                  the FDL layers as a pseudo-sequence using HEVC; the other is Duong’s method [25], which converts a light field image to a focal stack, and compresses it as a pseudo-video
sequence using HEVC. In the experiment, the proposed method also employs the HEVC reference software (HM) version 16.17 [39] for encoding and decoding to keep the same test conditions as the two state-of-the-art methods. The encoder configuration is set as follows: the GOP structure is I-B-B-B, as in [40], and the tests use the six LF images (I01 to I06) of the JPEG Pleno dataset [41] (Bikes, Danger de Mort, Flowers, Stone Pillars Outside, Fountain Vincent 2, and Ankylosaurus and Diplodocus 1), captured with a Lytro Illum camera.
               
The performance comparison was made in terms of both the $PSNR$ of the YUV video and the refocusing capability loss due to compression. The $PSNR$ values of the individual focal stack images were averaged to obtain a representative PSNR value associated with the LF data. It is denoted as LF-PSNR and computed as in (11), where $I_{k\_comp}$ is the $k$th reconstructed focal stack image obtained using (7) at the decoder, and $I_{k\_org}$ is the anchor focal stack image rendered from the light field data as seen in (1).
               
The LF-PSNR performance of the proposed method and the two anchor methods is compared in Fig. 8; LF-PSNR is calculated as follows:

$$LF\text{-}PSNR=\frac{1}{K}\sum_{k=1}^{K}PSNR\left(I_{k\_comp},\,I_{k\_org}\right)\qquad (11)$$
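A short sketch of the LF-PSNR evaluation in (11), assuming 8-bit image data (peak value 255) and focal stacks available as arrays of matching shape:

```python
import numpy as np

def lf_psnr(stack_comp, stack_org, peak=255.0):
    """Eq. (11): average PSNR over the K reconstructed focal stack images."""
    psnrs = []
    for I_comp, I_org in zip(stack_comp, stack_org):
        mse = np.mean((I_comp.astype(np.float64) - I_org.astype(np.float64)) ** 2)
        psnrs.append(10.0 * np.log10(peak ** 2 / max(mse, 1e-12)))
    return float(np.mean(psnrs))
```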
               
               
Fig. 8 demonstrates that our proposed method attained the highest LF-PSNR among the three methods, especially at low bits per pixel (bpp). For example, for the I01 image, when bpp was 0.01, the proposed method’s LF-PSNR was 1.24 dB higher than FDL [24] and 1.74 dB higher than FS representation and compression [25]. When bpp was 0.02, the proposed method’s LF-PSNR values were 0.34 dB higher than FDL [24] and 0.09 dB higher than the FS [25]. Over the I01 to I06 results, the average LF-PSNR gain was about 2.38 dB and 1.60 dB over FDL [24] and FS [25], respectively, at bits per pixel less than 0.03 in most cases. In the other methods, a higher number of bits per pixel leads to less compression loss in the LF data representation sent to the encoder, that is, in the Fourier disparity layers in FDL [24] or in the focal stack in FS [25], and thus the focal stack PSNR increases according to the reduced coding loss at higher bits per pixel. In our scheme, the all-in-focus image and the depth map are sent to the encoder, and the focal stack is reconstructed from them. While the depth map suffers less compression loss at higher bits per pixel, unless the accuracy of the estimated depth map is sufficient, the consequent PSNR increase in the focal stack is expected to be limited, even as the bits per pixel get higher. This explains why the focal stack PSNR of the proposed scheme was not always higher than that of the other methods at high bits per pixel. It also suggests future research on improving the accuracy of the depth map estimation, so that our scheme can keep gaining PSNR at higher bits per pixel as well.
               
We analyzed the refocusing capability loss, $LF-RL$, evaluated as the ratio of absolute differences between the two focus maps, $F_{k\_comp}$ and $F_{k\_org}$, as calculated in (12), where RL stands for refocusing loss. $F_{k\_comp}$ is computed from the reconstructed focal stack $I_{k\_comp}$, that is, $F_{k\_comp}=FM\left(I_{k\_comp}\right)$, and $F_{k\_org}$ is the focus map of the original (that is, uncompressed) focal stack image $I_{k\_org}$, that is, $F_{k\_org}=FM\left(I_{k\_org}\right)$. In the experiment, we set $K=64$, which was the depth map resolution. For the focus measure operator $FM$ in (3), the proposed difference focus measure of Section 3.2 was applied. The range of the refocusing capability loss, $LF-RL$, is 0 to 1, where a higher value indicates a higher loss.
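Since (12) is not reproduced above, the sketch below only illustrates the idea: a sum-normalized absolute difference between the compressed and original focus maps, which yields a value between 0 and 1. The exact normalization used in (12) may differ.

```python
import numpy as np

def lf_rl(focus_maps_comp, focus_maps_org, eps=1e-12):
    """Refocusing capability loss as a normalized focus-map difference (illustrative).

    focus_maps_comp, focus_maps_org : (K, H, W) arrays, F_k_comp and F_k_org.
    The sum normalization is an assumption, not the exact form of Eq. (12)."""
    Fc = np.asarray(focus_maps_comp, dtype=np.float64)
    Fo = np.asarray(focus_maps_org, dtype=np.float64)
    num = np.abs(Fc - Fo).sum()
    den = np.abs(Fc).sum() + np.abs(Fo).sum() + eps
    return float(num / den)   # 0 = no loss; values near 1 = severe loss
```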
               
               
Fig. 8. PSNR comparison of the proposed method with state-of-the-art FDL representation & compression [24] and FS representation & compression [25].
 
               
Fig. 9. Refocusing capability loss ($\boldsymbol{LF}-\boldsymbol{RL}$) comparison of the proposed method with state-of-the-art FDL representation & compression [24] and FS representation & compression [25].
 
               
Fig. 9 compares the refocusing capability loss ($LF-RL$) of the proposed method and the two state-of-the-art methods. The result shows that the proposed method attained the minimum loss in refocusing capability at the same compression ratio. For example, at bpp = 0.01, FDL [24], FS [25], and the proposed method had refocusing capability losses of 0.30, 0.35, and 0.16, respectively. That means the refocusing capability loss under the proposed method was smaller by 0.14 and 0.19 (14% and 19%) compared to FDL [24] and FS [25], respectively. Thus, in practical applications targeting low transmission rates or limited storage space, such as mobile phones or head-mounted display devices, the proposed method is a good choice.
               
In our experiment, different bits per pixel were realized with QP settings from 17 to 42. Fig. 9 also indicates that the refocusing capability loss was less than 0.2 when bpp $\leq $ 0.05 (about QP $\leq $ 32). The refocusing capability is perceived as almost intact when $LF-RL\leq $ 0.2 according to our internal subjective perceptual evaluation. Thus, coding with QP $\leq $ 32 is considered an allowable range for practical applications as far as the refocusing functionality is concerned.
               
             
            
                  5. Conclusion
In this paper, we have presented an efficient representation and coding scheme for light field data designed to pay special attention to keeping the refocusing functionality as uncompromised as possible. We designed a scheme in which LF data are represented by an all-in-focus image and a depth map, both of which are encoded with HEVC. After decoding, the refocused focal stack is estimated by convolving the decoded all-in-focus image with a defocusing filter whose strength is controlled according to the desired focus level. Our experimental results indicated that, at the same compression ratio, the proposed representation and coding strategy had a 2.38 dB average PSNR improvement over the state-of-the-art Le Pendu FDL method [24], and a 1.60 dB improvement over Duong’s FS representation and coding method [25]. At the decoder, the proposed method also had smaller refocusing capability losses, 16.2% and 17.8% lower than the two well-known state-of-the-art methods [24,25]. The proposed representation and coding approach with an all-in-focus image and a depth map was shown to provide good compression performance while maintaining the refocusing capability very well.
               
             
          
         
            
                  ACKNOWLEDGMENTS
               This research was supported by the Basic Science Research Program through the National
                  Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2020R1A2C2007673).
                  
                  			
               
             
            
                  
                     REFERENCES
                  
                     
                        
                        Li H., Guo C., Jia S., 2017, High-Resolution Light-Field Microscopy, in Frontiers
                           in Optics 2017, OSA Technical Digest (online) (Optica Publishing Group), paper FW6D.3,

 
                      
                     
                        
                        Tsai D., Dansereau D. G., Peynot T., Corke P., 2017, Image-based visual servoing with
                           light field cameras, IEEE Robotics and Automation Letters, Vol. 2, No. 2, pp. 912-919

 
                      
                     
                        
Dricot A., Jung J., Cagnazzo M., Pesquet B., Dufaux F., Kovács P. T., Adhikarla V. K., 2015, Subjective evaluation of Super Multi-View compressed contents on high-end light-field 3D displays, Signal Processing: Image Communication, Vol. 39, pp. 369-385

 
                      
                     
                        
Vetro A., Yea S., Matusik W., Pfister H., Zwicker M., Mar. 29, 2011, Method and system for acquiring, encoding, decoding and displaying 3D light fields, U.S. Patent No. 7,916,934

 
                      
                     
                        
Wu G., et al., 2017, Light field image processing: An overview, IEEE Journal of Selected Topics in Signal Processing, Vol. 11, No. 7, pp. 926-954

 
                      
                     
                        
                        Rerabek M., Bruylants T., Ebrahimi T., Pereira F., Schelkens P., ICME 2016 grand challenge:
                           Light-field image compression, Call for proposals and evaluation procedure 2016.

 
                      
                     
                        
Takahashi K., Naemura T., 2006, Layered light-field rendering with focus measurement, Signal Processing: Image Communication, Vol. 21, No. 6, pp. 519-530

 
                      
                     
                        
                        Kim M., et al. , Mobile terminal and control method for the mobile terminal, 2018
                           Nov.20, US10135963B2

 
                      
                     
                        
Light Field Selfie Camera for smartphones, Wooptix Company

 
                      
                     
                        
                        Brites C., Ascenso J., Pereira F., Jan. 2021, Lenslet Light Field Image Coding: Classifying,
                           Reviewing and Evaluating, in: IEEE Transactions on Circuits and Systems for Video
                           Technology, Vol. 31, No. 1, pp. 339-354

 
                      
                     
                        
                        Viola I., Řeřábek M., Ebrahimi T., 2017, Comparison and evaluation of light field
                           image coding approaches, IEEE Journal of selected topics in signal processing, Vol.
                           11, No. 7, pp. 1092-1106

 
                      
                     
                        
                        Conti C., Soares L. D., Nunes P., 2020, Dense Light Field Coding: A Survey, IEEE Access,
                           Vol. 8, pp. 49244-49284

 
                      
                     
                        
                        Avramelos V., Praeter J. D., Van Wallendael G., Lambert P., Jun. 2019, Light field
                           image compression using versatile video coding, in: Proc. IEEE 9th Int. Conf. Consum,
                           Electron, pp. 1-6

 
                      
                     
                        
                        2020, ISO/IEC 21794-1:2020 Information technology - Plenoptic image coding system
                           (JPEG Pleno) - Part 1: Framework

 
                      
                     
                        
                        2021, ISO/IEC 21794-2:2021 Information technology - Plenoptic image coding system
                           (JPEG Pleno) - Part 2: Light field coding

 
                      
                     
                        
                        de Carvalho M. B., Pereira M. P., Alves G., da Silva E. A. B., Pagliari C. L., Pereira
                           F., et al. , Oct. 2018, A 4D DCT-based lenslet light field codec, in: Proc. 25th IEEE
                           Int. Conf. Image Process. (ICIP), pp. 435-439

 
                      
                     
                        
                        Astola P., Tabus I., Nov. 2018, Hierarchical warping merging and sparse prediction
                           for light field image compression, in: Proc. 7th Eur. Workshop Vis. Inf. Process.
                           (EUVIP), pp. 1-6

 
                      
                     
                        
                        Astola P., da Silva Cruz L. A., et al. , Jun. 2020, JPEG Pleno: Standardizing a coding
                           framework and tools for plenoptic imaging modalities, ITU J. ICT Discoveries, Vol.
                           3, No. 1, pp. 1-15

 
                      
                     
                        
                        De Oliveira Alves G., et al. , 2020, The JPEG Pleno Light Field Coding Standard 4D-Transform
                           Mode: How to Design an Efficient 4D-Native Codec, IEEE Access, Vol. 8, pp. 170807-170829

 
                      
                     
                        
Chen J., Hou J., Chau L. P., 2017, Light field compression with disparity-guided sparse coding based on structural key views, IEEE Transactions on Image Processing, Vol. 27, No. 1, pp. 314-324

 
                      
                     
                        
                        Jiang X., Le Pendu M., Guillemot C., 2017, Light field compression using depth image
                           based view synthesis, in: International Conference on Multimedia & Expo Workshops
                           (ICMEW), IEEE, pp. 19-24

 
                      
                     
                        
Jiang X., Le Pendu M., Farrugia R. A., Guillemot C., 2017, Light field compression with homography-based low-rank approximation, IEEE Journal of Selected Topics in Signal Processing, Vol. 11, No. 7, pp. 1132-1145

 
                      
                     
                        
                        Han H., Xin J., Dai Q., Sep. 2018, Plenoptic image compression via simplified subaperture
                           projection, Pacific Rim Conference on Multimedia, Springer, Cham, pp. 274-284

 
                      
                     
                        
                        Le Pendu M., Ozcinar C., Smolic A., 2020, Hierarchical Fourier Disparity Layer Transmission
                           For Light Field Streaming, in: IEEE International Conference on Image Processing (ICIP),
                           pp. 2606-2610

 
                      
                     
                        
                        Duong V. V., Canh T. N., Huu T. N., Jeon B., Dec. 2019, Focal stack based light field
                           coding for refocusing applications, Journal of Broadcast Engineering, Vol. 24, No.
                           7, pp. 1246-1258

 
                      
                     
                        
                        Ng R., Levoy M., Brédif M., et al. , 2005, Light field photography with a hand-held
                           plenoptic camera, Computer Science Technical Report CSTR, Vol. 2, No. 11, pp. 1-11

 
                      
                     
                        
                        Shin C., Jeon H. G., Yoon Y., Kweon I. S., Kim S. J., 2018, Epinet: A fully-convolutional
                           neural network using epipolar geometry for depth from light field images, in: Proceedings
                           of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4748-4757

 
                      
                     
                        
Mun J. H., Ho Y. S., 2018, Depth Estimation from Light Field Images via Convolutional Residual Network, in: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), IEEE, pp. 1495-1498

 
                      
                     
                        
                        Li K., Zhang J., Sun R., Zhang X., Gao J., 2020, EPI-based Oriented Relation Networks
                           for Light Field Depth Estimation, arXiv preprint arXiv:2007.04538

 
                      
                     
                        
                        Zhou W., Zhou E., Yan Y., Lin L., Lumsdaine A., 2019, Learning Depth Cues from Focal
                           Stack for Light Field Depth Estimation, in: 2019 IEEE International Conference on
                           Image Processing (ICIP), pp. 1074-1078

 
                      
                     
                        
                        Jeon H. G., Surh J., Im S., Kweon I. S., 2019, Ring difference filter for fast and
                           noise robust depth from focus, IEEE Trans. on Image Processing, Vol. 29, pp. 1045-1060

 
                      
                     
                        
                        Pertuz S., Puig D., Garcia M. A., 2013, Analysis of focus measure operators for shape-from-focus,
                           Pattern Recognition, Vol. 46, No. 5, pp. 1415-1432

 
                      
                     
                        
                        Zhao C., Jeon B., 2022, Refocusing Metric of Light Field Image using Region-Adaptive
                           Multi-Scale Focus Measure, in IEEE Access

 
                      
                     
                        
                        Chantara W., Ho Y. S., 2016, Focus Measure of Light Field Image Using Modified Laplacian
                           and Weighted Harmonic Variance, in: Proceedings of the International Workshop on Advanced
                           Image Technology, pp. 6-8

 
                      
                     
                        
Nayar S. K., Nakagawa Y., 1994, Shape from focus, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 8, pp. 824-831

 
                      
                     
                        
                        Wee C. Y., Paramesran R., 2008, Image sharpness measure using eigenvalues, in: IEEE
                           9th International Conference on Signal Processing, pp. 840-843

 
                      
                     
                        
                        Pech-Pacheco J. L., Cristóbal G., Chamorro-Martinez J., 2000, Diatom autofocusing
                           in brightfield microscopy: a comparative study, in: Proceedings 15th International
                           Conference on Pattern Recognition, Vol. 3, pp. 314-317

 
                      
                     
                        
                        He K., Sun J., 2015, Fast guided filter, arXiv preprint arXiv:1505.00996

 
                      
                     
                        
                        HEVC reference software, HM 16.17.

 
                      
                     
                        
                        Canh T. N., Duong V. V., Jeon B., Jan. 2019, Boundary handling for video based light
                           field coding with a new hybrid scan order, in: Proc. Inter. Workshop on Advanced Image
                           Tech., pp. 1-4

 
                      
                     
                        
                        Řeřábek M., Ebrahimi T., 2016, New Light Field Image Dataset, in: 8th International
                           Workshop on Quality of Multimedia Experience (QoMEX), Lisbon, Portugal

 
                      
                   
                
             
            Author
            
            
               			Chun Zhao received a BS in 2005 and an MS in 2008 from the Department of Electronics
               Science and Technology, North University of China, Shanxi, China. She joined the MS
               exchange student program in 2008, and started working in 2016 toward a PhD, in the
               Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon,
               Korea. From 2008 to 2014, she worked in the Research & Design Center, Samsung Electronics,
               Korea, on Image/Video Enhancement algorithm development and System on Chip (SOC) design,
               implementing an algorithm based on FPGA/Chip and RTL design. Since 2015, she has been
               a senior engineer for the Visual Display Business, Samsung Electronics, Korea, where
               she worked on practical algorithm development for various displays by analyzing panel
               characteristics. Her research interests include multimedia signal processing, panel
               color calibration, machine learning, and light field refocusing representation.
               		
            
            
            
               			Byeungwoo Jeon (M’90, SM’02) received a BS (Magna Cum Laude) in 1985 and an MS
               in 1987 from the Department of Electronics Engineering, Seoul National University,
               Seoul, Korea, and received a PhD from the School of Electrical Engineering, Purdue
               University, West Lafayette, USA, in 1992. From 1993 to 1997, he was in the Signal
               Processing Laboratory, Samsung Electronics, Korea, where he worked on research and
               development of video compression algorithms, design of digital broadcasting satellite
               receivers, and other MPEG-related research for multimedia applications. Since September
               1997, he has been at Sungkyunkwan University (SKKU), Korea, where he is currently
               a professor. His research interests include multimedia signal processing, video compression,
               statistical pattern recognition, and remote sensing. He served as Project Manager
               of Digital TV and Broadcasting in the Korean Ministry of Information and Communications
               from 2004 to 2006 where he supervised all digital TV-related R&D in Korea. From 2015
               to 2016, he was Dean of the College of Information and Communication Engineering,
               SKKU. In 2019, he was President of the Korean Institute of Broadcast and Media Engineers.
               Dr. Jeon is a senior member of IEEE, a member of SPIE, an associate editor of IEEE
               Trans. on Broadcasting and IEEE Trans. on Circuits and Systems for Video Technology.
               He was a recipient of the 2005 IEEK Haedong Paper Award from the Signal Processing
               Society in Korea, and received the 2012 Special Service Award and the 2019 Volunteer
               Award, both from the IEEE Broadcast Technology Society. In 2016, a Korean President’s
               Commendation was conferred upon him for his key role in promoting international standardization
               for video coding technology in Korea.