All work-items in a sub-group executing the kernel on a processor must execute this function before any are allowed to continue execution beyond the sub-group barrier.
void sub_group_barrier(cl_mem_fence_flags flags)

void sub_group_barrier(cl_mem_fence_flags flags, memory_scope scope)
All work-items in a sub-group executing the kernel on a processor must execute this function before any are allowed to continue execution beyond the sub-group barrier. This function must be encountered by all work-items in a sub-group executing the kernel. These rules apply to ND-ranges implemented with uniform and non-uniform work-groups.
If sub_group_barrier is inside a conditional statement, then all work-items within the sub-group must enter the conditional if any work-item in the sub-group enters the conditional statement and executes the sub-group barrier.
        
If sub_group_barrier is inside a loop, all work-items within the sub-group must execute the sub_group_barrier for each iteration of the loop before any are allowed to continue execution beyond the sub_group_barrier.
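
The conditional and loop rules can be illustrated with a minimal sketch. The kernel name barrier_placement and its arguments data, rounds, and enable are hypothetical. The sketch assumes a one-dimensional ND-range, a data buffer with one element per work-item, and OpenCL C 2.0 with the cl_khr_subgroups extension available.

```c
#pragma OPENCL EXTENSION cl_khr_subgroups : enable

// Hypothetical kernel showing where sub_group_barrier may legally appear.
__kernel void barrier_placement(__global int *data, int rounds, int enable)
{
    size_t gid = get_global_id(0);

    // Allowed: `enable` is a kernel argument, so it has the same value for
    // every work-item and the whole sub-group takes the same branch.
    if (enable) {
        data[gid] += 1;
        sub_group_barrier(CLK_GLOBAL_MEM_FENCE);
    }

    // Allowed: `rounds` is also uniform, so every work-item in the sub-group
    // executes the barrier once per iteration, the same number of times.
    for (int i = 0; i < rounds; ++i) {
        data[gid] *= 2;
        sub_group_barrier(CLK_GLOBAL_MEM_FENCE);
    }

    // Not allowed: the condition differs between work-items of one sub-group,
    // so only some of them would reach the barrier. Left commented out.
    // if (get_sub_group_local_id() == 0) {
    //     sub_group_barrier(CLK_GLOBAL_MEM_FENCE);
    // }
}
```

The key point in this sketch is that the branch condition and the loop trip count do not depend on the work-item's ID, which is one simple way to keep the barrier reachable by the whole sub-group.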
        
The sub_group_barrier function also queues a memory fence (reads and writes) to ensure correct ordering of memory operations to local or global memory.
        
The flags argument specifies the memory address space and can be set to a combination of the following values.
        
CLK_LOCAL_MEM_FENCE - The sub_group_barrier function will either flush any variables stored in local memory or queue a memory fence to ensure correct ordering of memory operations to local memory.
        
CLK_GLOBAL_MEM_FENCE - The sub_group_barrier function will queue a memory fence to ensure correct ordering of memory operations to global memory. This can be useful when work-items, for example, write to buffer objects and then want to read the updated data from these buffer objects.
        
CLK_IMAGE_MEM_FENCE - The sub_group_barrier function will queue a memory fence to ensure correct ordering of memory operations to image objects. This can be useful when work-items, for example, write to image objects and then want to read the updated data from these image objects.
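
As an illustration of the flags argument, the following sketch exchanges values between work-items of one sub-group through local memory, using CLK_LOCAL_MEM_FENCE to order the writes before the reads. The kernel name rotate_in_sub_group and the arguments in, out, and scratch are hypothetical. The sketch assumes a one-dimensional ND-range, OpenCL C 2.0 with the cl_khr_subgroups extension, and a scratch allocation of at least get_num_sub_groups() * get_max_sub_group_size() ints per work-group.

```c
#pragma OPENCL EXTENSION cl_khr_subgroups : enable

// Hypothetical kernel: each work-item receives the input value of the next
// work-item in its sub-group (the last one wraps around to the first).
__kernel void rotate_in_sub_group(__global const int *in,
                                  __global int *out,
                                  __local int *scratch)
{
    size_t gid  = get_global_id(0);
    uint   lane = get_sub_group_local_id();  // index within the sub-group
    uint   size = get_sub_group_size();      // size of this sub-group

    // Give each sub-group its own slice of scratch. Indexing by sub-group ID
    // avoids assuming how work-items are mapped to sub-groups.
    uint base = get_sub_group_id() * get_max_sub_group_size();

    scratch[base + lane] = in[gid];

    // Every work-item in the sub-group reaches this barrier unconditionally.
    // CLK_LOCAL_MEM_FENCE orders the local-memory writes above before the
    // local-memory reads below.
    sub_group_barrier(CLK_LOCAL_MEM_FENCE);

    out[gid] = scratch[base + (lane + 1) % size];
}
```

If the exchanged data lived in a buffer object instead of local memory, the same pattern would presumably pass CLK_GLOBAL_MEM_FENCE, and both flags can be combined when both address spaces are involved.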
        
 Copyright © 2007-2013 The Khronos Group Inc. 
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and/or associated documentation files (the
"Materials"), to deal in the Materials without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Materials, and to
permit persons to whom the Materials are furnished to do so, subject to
the condition that this copyright notice and permission notice shall be included
in all copies or substantial portions of the Materials.