
CN119343930A - Dynamic camera selection - Google Patents

Dynamic camera selection

Info

Publication number
CN119343930A
Authority
CN
China
Prior art keywords
camera
image
cameras
location
depth
Prior art date
Legal status
Pending
Application number
CN202380045271.6A
Other languages
Chinese (zh)
Inventor
G·E·威廉姆斯
S·鲍威尔斯
R·M·舒勒
Current Assignee
Apple Inc
Original Assignee
Apple Inc
Priority date
Filing date
Publication date
Priority claimed from US18/203,560 (published as US20230401732A1)
Application filed by Apple Inc filed Critical Apple Inc
Publication of CN119343930A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/2224Studio circuitry; Studio devices; Studio equipment related to virtual studio applications
    • H04N5/2226Determination of depth image, e.g. for foreground/background separation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • G06T7/596Depth or shape recovery from multiple images from stereo images from three or more stereo images
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/64Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/65Control of camera operation in relation to power supply
    • H04N23/651Control of camera operation in relation to power supply for reducing power consumption by affecting camera operations, e.g. sleep mode, hibernation mode or power off of selective parts of the camera
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/66Remote control of cameras or camera parts, e.g. by remote control devices
    • H04N23/661Transmitting camera control signals through networks, e.g. control via the Internet
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/667Camera operation mode switching, e.g. between still and video, sport and normal or high- and low-resolution modes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/90Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/268Signal distribution or switching
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30168Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract


The present disclosure provides more effective and/or efficient techniques for determining information about a physical environment. Such techniques may optionally supplement or replace other techniques for determining information about a physical environment. Some techniques described herein encompass switching which cameras are used to calculate the depth of a location in a physical environment. The switch may occur when the current image does not have sufficient feature correlations for calculating the depth of the location. Other techniques described herein encompass switching which cameras are used to obtain sufficient data for a location within a representation (e.g., a three-dimensional representation) of a physical environment. The switch may occur in response to determining that there is not enough data for the location.

Description

Dynamic camera selection
Cross Reference to Related Applications
The present application claims the benefit of U.S. provisional patent application Ser. No. 63/350,595, entitled "DYNAMIC CAMERA SELECTION," filed on June 9, 2022, which is hereby incorporated by reference in its entirety for all purposes.
Background
Some devices include a camera for capturing an image of a physical environment to determine information about the physical environment, such as computing the depth of a location. Such determination is limited by the images used and/or captured. Accordingly, there is a need to provide more effective and/or efficient techniques for determining information about a physical environment.
Disclosure of Invention
The present disclosure provides more effective and/or efficient techniques for determining information about a physical environment. Such techniques optionally supplement or replace other techniques for determining information about the physical environment.
Some techniques described herein encompass switching which cameras are used to calculate the depth of a location in a physical environment. The switching may occur when the current images do not have sufficient feature correlation for calculating the depth of the location. Other techniques described herein contemplate switching which cameras are used to obtain sufficient data for a location within a representation (e.g., a three-dimensional representation) of a physical environment. The switch may occur in response to determining that there is not enough data for the location.
In the techniques described herein, cameras may be configured differently on a device. For example, three cameras may be positioned in a triangular pattern, with two cameras on the horizontal axis and a third camera above or below the horizontal axis. For another example, four cameras may be positioned in a rectangular pattern.
Drawings
For a better understanding of the various described embodiments, reference should be made to the following detailed description taken in conjunction with the following drawings, in which like reference numerals designate corresponding parts throughout the several views.
FIG. 1 is a block diagram illustrating a computing system.
Fig. 2 is a block diagram illustrating a device having an interconnect subsystem.
Fig. 3 is a block diagram illustrating an apparatus for determining information about a physical environment.
Fig. 4A is a block diagram illustrating a camera array of three cameras.
Fig. 4B is a block diagram illustrating a camera array of four cameras.
Fig. 5 is a flowchart illustrating a method for calculating a depth of a location.
Fig. 6 is a flowchart illustrating a method for obtaining sufficient data about a physical environment.
Detailed Description
The following description sets forth exemplary methods, parameters, and the like. However, it should be recognized that such description is not intended as a limitation on the scope of the present disclosure, but is instead provided as a description of exemplary embodiments.
Some techniques described herein encompass switching which cameras are used to calculate the depth of a location in a physical environment. The switching may occur when the current images do not have sufficient feature correlation for calculating the depth of the location. In one example, a device includes a plurality of cameras having at least partially overlapping fields of view of a physical environment. The device causes a set of cameras to capture images of the physical environment. The images are then used to attempt to calculate the depth of different locations within the images. When the images do not have sufficient feature correlation to calculate the depth of a location, the device causes a different set of cameras to capture images of the physical environment. The depth of the location is then calculated using images captured by the different set. The different set may or may not include cameras from the original set of cameras. In some examples, the different set of cameras is selected from a plurality of possible different sets of cameras. In some examples, in response to the images not having sufficient feature correlation, the device causes a single camera to capture multiple images to be used to calculate the depth of the location.
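As a concrete illustration of this switching behavior, the following Python sketch (not part of the patent; the camera identifiers and helper callables are hypothetical) tries one camera set, checks whether the captured images yield enough feature correspondences, and falls back to a different set when they do not.

```python
from typing import Callable, List, Optional, Sequence

def depth_with_fallback(
    camera_sets: Sequence[Sequence[str]],
    capture: Callable[[str], object],
    count_correspondences: Callable[[List[object]], int],
    triangulate: Callable[[List[object]], float],
    min_correspondences: int = 8,
) -> Optional[float]:
    """Try each candidate camera set in order; switch to the next set whenever
    the captured images lack sufficient feature correlation for the location."""
    for camera_set in camera_sets:
        images = [capture(camera_id) for camera_id in camera_set]
        if count_correspondences(images) >= min_correspondences:
            return triangulate(images)
        # Insufficient feature correlation: fall through and try the next set.
    return None  # no candidate set produced enough correspondences

# Toy usage with stand-in callables (real implementations would wrap camera
# drivers and a feature matcher): the first pair "fails", the second succeeds.
fake_counts = {("cam_A", "cam_B"): 3, ("cam_A", "cam_C"): 12}
result = depth_with_fallback(
    camera_sets=[("cam_A", "cam_B"), ("cam_A", "cam_C")],
    capture=lambda camera_id: camera_id,                       # pretend image
    count_correspondences=lambda imgs: fake_counts[tuple(imgs)],
    triangulate=lambda imgs: 7.2,                              # pretend depth in meters
)
print(result)  # 7.2, obtained after switching from (A, B) to (A, C)
```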
Other techniques described herein contemplate switching which cameras are used to obtain sufficient data for a location within a representation of a physical environment. The switch may occur in response to determining that there is not enough data for the location. In one example, a device includes a plurality of cameras having at least partially overlapping fields of view of a physical environment. The device causes a set of cameras to capture images of the physical environment. The images are used to generate a depth map of the physical environment, the depth map comprising distances of different locations in the physical environment. In some examples, a representation of the physical environment is generated using the depth map and the images, the representation including locations of identified objects within the physical environment. The device then determines that the representation does not include sufficient data for a particular location. For example, the set of cameras may not be able to capture images of the particular location. After determining this shortcoming of the representation, the device causes a different set of one or more cameras to capture images of the physical environment.
In the techniques described herein, multiple cameras may be configured differently on a device. For example, the plurality of cameras may include three cameras positioned in a triangular pattern, with two cameras located on a horizontal axis and a third camera above or below the horizontal axis. When a problem arises, this configuration may allow one camera to be switched with another. For another example, the plurality of cameras may include four cameras positioned in a rectangular pattern. Such a configuration may allow a current pair of cameras having a particular distance between the current pair to be switched to a new pair of cameras having a different distance between the new pair.
In methods described herein in which one or more steps are contingent upon one or more conditions having been met, it should be understood that the described method may be repeated in multiple iterations such that, over the course of those iterations, all of the conditions upon which steps of the method depend have been met in different iterations of the method. For example, if a method requires performing a first step when a condition is satisfied and a second step when the condition is not satisfied, a person of ordinary skill will appreciate that the stated steps are repeated until the condition has been both satisfied and not satisfied, in no particular order. Thus, a method described with one or more steps that are contingent upon one or more conditions having been met may be rewritten as a method that is repeated until each of the conditions described in the method has been met. This, however, is not required of a system or computer-readable-medium claim in which the system or computer-readable medium contains instructions for performing the contingent operations based on the satisfaction of the corresponding one or more conditions, because such a system or medium is capable of determining whether the contingencies have or have not been met without explicitly repeating the steps of the method until all of the conditions upon which steps in the method are contingent have been met. A person of ordinary skill in the art will also understand that, similar to a method with contingent steps, a system or computer-readable storage medium may repeat the steps of a method as many times as needed to ensure that all of the contingent steps have been performed.
Although the following description uses the terms "first," "second," etc. to describe various elements, these elements should not be limited by the terms. In some examples, these terms are used to distinguish one element from another element. For example, a first device may be referred to as a second device, and similarly, a second device may be referred to as a first device, without departing from the scope of the various described embodiments. In some examples, the first device and the second device are two separate references to the same device. In some embodiments, the first device and the second device are both devices, but they are not the same device or the same type of device.
The terminology used in the description of the various described embodiments herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in the description of the various described embodiments and in the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The term "if" is optionally interpreted to mean "when..once", "at..once..once.," or "in response to a determination" or "in response to detection", depending on the context. Similarly, the phrase "if determined" or "if detected [ stated condition or event ]" is optionally interpreted to mean "upon determination" or "in response to determination" or "upon detection of [ stated condition or event ]" or "in response to detection of [ stated condition or event ]" depending on the context.
Turning now to FIG. 1, a block diagram of a computing system 100 is depicted. Computing system 100 is a non-limiting example of a computing system that may be used to perform the functions described herein. It should be appreciated that other computer architectures of a computing system can be used to perform the functions described herein.
In the illustrated example, the computing system 100 includes a processor subsystem 110 coupled (e.g., wired or wireless) to a memory 120 (e.g., system memory) and an I/O interface 130 via an interconnect 150 (e.g., a system bus, one or more memory locations, or other communication channel for connecting the various components of the computing system 100). Further, the I/O interface 130 is coupled (e.g., wired or wirelessly) to the I/O device 140. In some examples, I/O interface 130 is included with I/O device 140 such that both are a single component. It should be appreciated that there may be one or more I/O interfaces, where each I/O interface is coupled to one or more I/O devices. In some examples, multiple instances of processor subsystem 110 may be coupled to interconnect 150.
The computing system 100 may be any of a variety of types of devices including, but not limited to, a system on a chip, a server system, a personal computer system (e.g., an iPhone, iPad, or MacBook), a sensor, and the like. In some examples, computing system 100 is included with or coupled to the physical component for the purpose of modifying the physical component in response to the instruction (e.g., computing system 100 receives an instruction to modify the physical component and, in response to the instruction, causes the physical component to be modified (e.g., by an actuator)). Examples of such physical components include acceleration controls, brakes, gearboxes, motors, pumps, refrigeration systems, suspension systems, steering controls, vacuum systems, valves, and the like. As used herein, a sensor includes one or more hardware components that detect information about the physical environment in the vicinity (e.g., surrounding) of the sensor. In some examples, the hardware components of the sensor include a sensing component (e.g., an image sensor or a temperature sensor), a transmitting component (e.g., a laser or radio transmitter), a receiving component (e.g., a laser or radio receiver), or any combination thereof. Examples of sensors include angle sensors, chemical sensors, brake pressure sensors, contact sensors, non-contact sensors, electrical sensors, flow sensors, force sensors, gas sensors, humidity sensors, cameras, inertial measurement units, leak sensors, level sensors, light detection and ranging systems, metal sensors, motion sensors, particle sensors, photoelectric sensors, position sensors (e.g., global positioning systems), precipitation sensors, pressure sensors, proximity sensors, radio detection and ranging systems, radiation sensors, speed sensors (e.g., measuring the speed of an object), temperature sensors, time-of-flight sensors, torque sensors, and ultrasonic sensors. Although a single computing system is shown in fig. 1, computing system 100 may also be implemented as two or more computing systems operating together.
In some examples, the processor subsystem 110 includes one or more processors or processing units configured to execute program instructions to perform the functions described herein. For example, the processor subsystem 110 may execute an operating system, a middleware system, one or more application programs, or any combination thereof.
In some examples, the operating system manages the resources of computing system 100. Examples of types of operating systems contemplated herein include batch operating systems (e.g., multiple Virtual Storage (MVS)), time-shared operating systems (e.g., unix), distributed operating systems (e.g., advanced interactive execution (AIX)), network operating systems (e.g., microsoft Windows Server), real-time operating systems (e.g., QNX). In some examples, an operating system includes various programs, sets of instructions, software components, and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and for facilitating communication between various hardware and software components. In some examples, the operating system uses a priority-based scheduler that assigns priorities to different tasks to be performed by the processor subsystem 110. In such examples, the priority assigned to the task is used to identify the next task to be performed. In some examples, the priority-based scheduler identifies the next task to be performed when the previous task completes execution (e.g., the highest priority task runs to completion unless another higher priority task is ready).
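The run-to-completion, priority-based scheduling described above can be illustrated with a small Python sketch built on a binary heap. This is an illustrative model only (the class and its behavior are assumptions, not the patent's scheduler), but it captures the idea that the highest-priority ready task is always selected next.

```python
import heapq

class PriorityScheduler:
    """Toy run-to-completion scheduler: always runs the highest-priority ready task.
    Lower numbers mean higher priority, matching heapq's min-heap ordering."""
    def __init__(self):
        self._ready = []   # heap of (priority, sequence, task)
        self._seq = 0      # tie-breaker so insertion order is stable

    def submit(self, priority: int, task):
        heapq.heappush(self._ready, (priority, self._seq, task))
        self._seq += 1

    def run(self):
        while self._ready:
            priority, _, task = heapq.heappop(self._ready)
            task()  # runs to completion before the next task is selected

# Example: depth processing is given a higher priority than diagnostics.
scheduler = PriorityScheduler()
scheduler.submit(0, lambda: print("compute depth for current images"))
scheduler.submit(5, lambda: print("run low-rate diagnostic comparison"))
scheduler.run()
```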
In some examples, the middleware system provides one or more services and/or capabilities to applications (e.g., one or more applications running on processor subsystem 110) other than the services provided by the operating system (e.g., data management, application services, messaging, authentication, API management, etc.). In some examples, the middleware system is designed for heterogeneous computer clusters to provide hardware abstraction, low-level device control, implementation of common functions, messaging between processes, packet management, or any combination thereof. Examples of middleware systems include lightweight communication and grouping (LCM), PX4, robotic Operating Systems (ROS), zeroMQ. In some examples, middleware systems use graph construction to represent processes and/or operations, where processing occurs in nodes that can receive, publish, and multiplex sensor data, controls, states, plans, actuators, and other messages. In such examples, an application (e.g., an application executing on processor subsystem 110 as described above) may be defined using a graph architecture such that different operations of the application are included with different nodes in the graph architecture.
In some examples, a message is sent from a first node in a graph architecture to a second node in the graph architecture using a publish-subscribe model, wherein the first node publishes data on a channel to which the second node can subscribe. In such examples, the first node may store the data in a memory (e.g., memory 120 or some local memory of processor subsystem 110) and inform the second node that the data has been stored in the memory. In some examples, a first node informs a second node that data has been stored in memory by sending a pointer (e.g., a memory pointer, such as an identification of a memory location) to the second node so that the second node can access the data from a location where the first node stored the data. In some examples, the first node will send the data directly to the second node such that the second node will not need to access memory based on the data received from the first node.
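The pointer-passing publish/subscribe exchange described above might look like the following Python sketch, in which a publishing node stores data once and notifies subscribers with a reference rather than copying the data. The class names and the string used as the "pointer" are illustrative assumptions, not taken from any particular middleware.

```python
class SharedMemory:
    """Toy stand-in for shared memory: data is stored once and addressed by key."""
    def __init__(self):
        self._slots = {}

    def store(self, key, data):
        self._slots[key] = data
        return key  # the "pointer" handed to subscribers

    def load(self, key):
        return self._slots[key]

class Channel:
    """Publish/subscribe channel: publishers send only a reference to stored data."""
    def __init__(self, memory: SharedMemory):
        self._memory = memory
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, key, data):
        pointer = self._memory.store(key, data)
        for callback in self._subscribers:
            callback(pointer)  # notify with the reference, not the data itself

memory = SharedMemory()
images_channel = Channel(memory)
# Second node subscribes and dereferences the pointer when notified.
images_channel.subscribe(lambda ptr: print("received", len(memory.load(ptr)), "bytes"))
# First node publishes an image buffer by reference.
images_channel.publish("camera_410/frame_0", b"\x00" * 1024)
```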
Memory 120 may include a computer-readable medium (e.g., a non-transitory or transitory computer-readable medium) that may be used to store program instructions that may be executed by processor subsystem 110 to cause computing system 100 to perform various operations described herein. For example, memory 120 may store program instructions to implement the functions associated with the processes described in fig. 5 and/or fig. 6.
The memory 120 may be implemented using different physical non-transitory memory media, such as hard disk storage, floppy disk storage, removable disk storage, flash memory, random access memory (RAM-SRAM, EDO RAM, SDRAM, DDR SDRAM, RAMBUS RAM, etc.), read only memory (PROM, EEPROM, etc.), etc. The memory in computing system 100 is not limited to a primary storage device, such as memory 120. Rather, computing system 100 may also include other forms of storage such as cache memory in processor subsystem 110 and secondary storage (e.g., hard disk drives, storage arrays, etc.) on I/O device 140. In some examples, these other forms of storage may also store program instructions that are executed by processor subsystem 110 to perform the operations described herein. In some examples, processor subsystem 110 (or each processor within processor subsystem 110) includes a cache or other form of on-board memory.
The I/O interface 130 may be any of various types of interfaces configured to couple to and communicate with other devices. In some examples, I/O interface 130 includes a bridge chip (e.g., a south bridge) from a front-side bus to one or more back-side buses. The I/O interface 130 may be coupled to one or more I/O devices (e.g., I/O device 140) via one or more corresponding buses or other interfaces. Examples of I/O devices include storage devices (hard disk drives, optical disk drives, removable flash drives, storage arrays, SANs, or their associated controllers), network interface devices (e.g., to a local or wide area network), sensor devices (e.g., cameras, radar, lidar, ultrasonic sensors, GPS, inertial measurement devices, etc.), and audible or visual output devices (e.g., speakers, lights, screens, projectors, etc.). In some examples, computing system 100 is coupled to a network via a network interface device (e.g., configured to communicate over Wi-Fi, bluetooth, ethernet, etc.).
Fig. 2 depicts a block diagram of a device 200 having an interconnect subsystem. In the illustrated example, the device 200 includes three different subsystems (i.e., a first subsystem 210, a second subsystem 220, and a third subsystem 230) coupled (e.g., wired or wireless) to each other. An example of a possible computer architecture for a subsystem as included in fig. 2 (i.e., computing system 100) is depicted in fig. 1. Although three subsystems are shown in fig. 2, device 200 may include more or fewer subsystems.
In some examples, some subsystems are not connected to another subsystem (e.g., the first subsystem 210 may be connected to the second subsystem 220 and the third subsystem 230, but the second subsystem 220 may not be connected to the third subsystem 230). In some examples, some subsystems are connected via one or more wires, while other subsystems are connected wirelessly. In some examples, one or more subsystems are wirelessly connected to one or more computing systems external to device 200, such as a server system. In such examples, the subsystem may be configured to communicate wirelessly with one or more computing systems external to device 200.
In some examples, device 200 includes a housing that completely or partially encloses subsystems 210-230. Examples of the apparatus 200 include a home appliance apparatus (e.g., a refrigerator or an air conditioning system), a robot (e.g., a robot arm or a robot cleaner), a vehicle, and the like. In some examples, the device 200 is configured to navigate the device 200 (with or without direct user input) in a physical environment.
In some examples, one or more subsystems of device 200 are used to control, manage, and/or receive data from one or more other subsystems of device 200 and/or one or more computing systems remote from device 200. For example, the first subsystem 210 and the second subsystem 220 may each be cameras that are capturing images for use by the third subsystem 230 in making decisions. In some examples, at least a portion of device 200 functions as a distributed computing system. For example, the tasks may be divided into different portions, with a first portion being performed by the first subsystem 210 and a second portion being performed by the second subsystem 220.
Attention is now directed to techniques for determining information about a physical environment using an example of a device having a camera that captures an image of the physical environment. In some examples, the device determines that there is a lack of feature correlation between images from the current set of cameras, and selects a different set of cameras for achieving sufficient feature correlation. In other examples, the device determines that there is insufficient information in the representation of the physical environment and selects a new set of cameras for capturing sufficient information to add to the representation. It should be understood that more or fewer cameras (including single cameras) and other types of sensors are within the scope of the present disclosure and may benefit from the techniques described herein.
In accordance with the techniques described herein, the device uses sensors to locate objects within a physical environment. Such positioning may include estimating (e.g., calculating) a depth (e.g., distance from the device) of the object (e.g., substantially the object or a portion (e.g., not all) of the object). Different sensors or combinations of sensors may be used to estimate depth, including cameras and range of motion sensors (e.g., light or radio detection and ranging systems).
Estimating depth with a camera may use a monocular image (e.g., a single camera sensor that captures still or sequential images) or a stereo image (e.g., multiple camera sensors that capture still or sequential images). The following techniques for estimating depth may be used in any combination with each other.
One technique for estimating depth uses depth cues to identify the relative locations of different objects in a physical environment. Examples of depth cues include comparing the size of different objects (e.g., objects appear smaller when the objects are farther away), texture (e.g., textures of objects are less identifiable (e.g., lower quality) when the objects are farther away), shading (e.g., shadows of objects may indicate that a portion of an object is farther than another portion), linear perspective (e.g., objects converge toward the horizon as the objects get farther away), motion parallax (e.g., objects that are farther away appear to move more slowly than objects that are closer), binocular parallax (e.g., objects that are closer have greater parallax between two images than objects that are farther away), and apparent size of known objects (e.g., the size of an object may be constrained by the typical size of objects of that type once the type of object is identified). Using one or more of these depth cues, the technique determines that an object is closer or farther than another object.
Another technique for estimating depth identifies correspondences between two images (e.g., captured using two different image sensors (such as two different cameras) or a single image sensor (such as a single camera) at two different times). The technique then uses geometry (e.g., epipolar geometry) to estimate the depth of a region within the images. In one example, estimating the depth using the correspondences includes identifying features in the first image (e.g., one or more pixels in the image, such as corners, edges, or any distinguishing portion of the image) and identifying corresponding features in the second image (e.g., features in the second image determined to match features in the first image). In some examples, such features are identified independently and compared to each other. In other examples, features in the first image are used to identify corresponding features in the second image. After identifying the features, differences (sometimes referred to as shifts or disparities) between where the features are located in their respective images are calculated. The depth of a feature is determined based on the disparity, the focal length of the cameras capturing the images, and the distance between the cameras capturing the images (the baseline). In another example, estimating the depth using the correspondences includes dividing the image into a plurality of regions and identifying a plurality of features in each region. The different images are then compared using the features in each region to find corresponding features. If sufficient features are identified in sufficient regions (e.g., above a threshold), a depth is calculated for each such region using a calibrated model that relates the calculated disparity to the relative geometric positions of the cameras, as described above. Thus, in such an example, a threshold number of corresponding features is required for each region for which depth is being calculated.
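The disparity computation described here reduces to the standard stereo relation depth = focal length x baseline / disparity. The sketch below applies that relation per region and skips regions without enough corresponding features; the focal length, baseline, and threshold values are illustrative assumptions rather than values from the patent.

```python
import numpy as np

def depth_from_disparity(disparity_px: float, focal_length_px: float, baseline_m: float) -> float:
    """Standard stereo relation: depth = f * B / d (same units as the baseline)."""
    return focal_length_px * baseline_m / disparity_px

def region_depths(matches_per_region, focal_length_px=1200.0, baseline_m=0.12, min_matches=10):
    """matches_per_region maps a region id to a list of (x_left, x_right) feature
    columns. Depth is only computed for regions with enough correspondences."""
    depths = {}
    for region, matches in matches_per_region.items():
        if len(matches) < min_matches:
            continue  # insufficient feature correlation for this region
        disparities = np.array([x_left - x_right for x_left, x_right in matches], dtype=float)
        disparities = disparities[disparities > 0]  # ignore degenerate matches
        if disparities.size == 0:
            continue
        depths[region] = depth_from_disparity(np.median(disparities), focal_length_px, baseline_m)
    return depths

# Example: ~20 px of disparity with f = 1200 px and B = 0.12 m gives 7.2 m.
example = {"region_0": [(640.0 + i, 620.0 + i) for i in range(12)]}
print(region_depths(example))
```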
Another technique for estimating depth uses a neural network. For example, the neural network takes an image as input and outputs depth values based on depth cues (such as the depth cues described above). In some examples, the neural network learns to regress depth from depth cues in the image by minimizing a loss function (e.g., a regression loss) via supervised learning.
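A minimal sketch of this supervised depth-regression idea, written with PyTorch as an assumed framework (the patent does not name one): a small convolutional network maps an image to per-pixel depth and is trained by minimizing a regression (L2) loss against ground-truth depth.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny fully convolutional regressor: RGB image in, one depth value per pixel out.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def training_step(image: torch.Tensor, depth_gt: torch.Tensor) -> float:
    """One supervised step: regress depth and minimize the L2 (regression) loss."""
    optimizer.zero_grad()
    depth_pred = model(image)
    loss = F.mse_loss(depth_pred, depth_gt)
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with random tensors standing in for an image and its ground-truth depth map.
image = torch.rand(1, 3, 64, 64)
depth_gt = torch.rand(1, 1, 64, 64)
print(training_step(image, depth_gt))
```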
In the techniques described herein, one or more cameras may be calibrated before, during, or after performing particular steps. In some examples, calibration includes a process of determining specific camera parameters to determine an accurate relationship between a three-dimensional point in a physical environment and its corresponding two-dimensional projection (e.g., pixel) in an image. Such parameters eliminate distortions in the image, thereby establishing a relationship between image pixels and physical-environment dimensions. The distortion may be captured by distortion coefficients whose values reflect the amount of radial distortion (e.g., which occurs when light rays bend more at the edge of the lens than at its optical center) and tangential distortion (e.g., which occurs when the lens is not parallel to the image plane) in the image. The calibration parameters include intrinsic parameters (e.g., focal length and optical center) and extrinsic parameters (e.g., rotation and translation of the camera) for each camera. In some examples, the extrinsic parameters are used to transform between world (e.g., physical environment) coordinates and camera coordinates, and the intrinsic parameters are used to transform between camera coordinates and pixel coordinates.
The techniques described herein use images captured by a camera to perform calibration. For example, calibration may include comparing an image captured by a first camera with an image captured by a second camera to identify differences in order to determine distortion coefficients. In such an example, the distortion coefficients are determined by comparing image features captured by cameras in a controlled environment for which the ground-truth geometry of these features is known. In some examples, the camera used for calibration may or may not be included in a set of cameras used to determine information about the physical environment. For example, cameras in the set of cameras may be calibrated using cameras not included in the set of cameras, such as by determining distortion coefficients for the cameras in the set of cameras.
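The patent describes calibration by comparing image features whose ground-truth geometry is known in a controlled environment. One common, off-the-shelf way to do something similar is OpenCV's checkerboard calibration, sketched below purely as an illustration; the checkerboard approach, pattern size, and square size are assumptions, not the patent's procedure.

```python
import cv2
import numpy as np

def calibrate_from_checkerboard(image_paths, pattern=(9, 6), square_size_m=0.025):
    """Estimate intrinsics and distortion coefficients from checkerboard images.
    Returns (camera_matrix, dist_coeffs) or None if no corners were found."""
    # Known 3D layout of the checkerboard corners (the "ground truth" geometry).
    obj_template = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    obj_template[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_size_m

    object_points, image_points, image_size = [], [], None
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if gray is None:
            continue
        image_size = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, pattern, None)
        if found:
            object_points.append(obj_template)
            image_points.append(corners)

    if not object_points:
        return None
    # Intrinsics (focal length, optical center) and distortion coefficients;
    # rvecs/tvecs are the per-image extrinsics (rotation and translation).
    ret, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
        object_points, image_points, image_size, None, None)
    return camera_matrix, dist_coeffs
```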
The above-described techniques rely on capturing one or more images of an object to determine the depth of the object. Thus, some techniques described herein switch from using a first set of images to another set of images, such as images from different cameras oriented differently and/or located in different locations than at least some of the cameras used to capture the first set.
Fig. 3 is a block diagram illustrating an apparatus 300 for determining information about a physical environment in accordance with some techniques described herein. Although fig. 3 is described primarily with respect to computing depth, it should be appreciated that similar techniques may be used for other determinations (e.g., identifying, classifying, or determining information about objects or other elements in a physical environment).
The device 300 includes a plurality of subsystems that are at least partially interconnected. For example, device 300 includes a plurality of cameras (i.e., cameras 310) each connected (e.g., wired or wireless) to camera selector 320. It should be appreciated that one or more subsystems of device 300 may be combined or further broken down into more subsystems. For example, the camera subsystem may include a camera selector and a depth processor, or the depth processor may include a camera selector.
In some examples, each of the cameras 310 is configured to capture an image and send the image to the camera selector 320 or store the image in a particular location specific to each camera. In some examples where the camera stores the image in a particular location, the camera informs the camera selector 320 that the image has been stored (and optionally includes a location where the image has been stored by, for example, a pointer to a memory location), and the camera selector 320 is configured to access the location where the image has been stored. In other examples where the camera stores images in a particular location, the camera does not inform the camera selector 320 that the images have been stored, but rather the camera selector 320 requests access to a known location of one or more stored images. As used herein, "transmitting" an image from one subsystem to another may refer to actually transmitting the image to the other subsystem or storing the image so that the image is accessible to the other subsystem.
In some examples, cameras 310 correspond to two or more cameras configured to capture images of at least partially overlapping areas in a physical environment. Examples of configurations of cameras 310 are illustrated in fig. 4A and 4B, which are discussed further below.
As described above, the device 300 includes a camera selector 320. The camera selector 320 may be configured to identify one or more cameras from the cameras 310 for further processing, such as depth processing by the depth processor 330. In some examples, camera selector 320 receives images from each of cameras 310 and determines which images to send to depth processor 330. In other examples, the camera selector 320 receives images from only a subset (e.g., less than all) of the cameras in the cameras 310 (where the subset is selected by the camera selector 320 before or after the camera selector 320 receives any images from the cameras in the cameras 310) and sends all of the received images to the depth processor 330. In some examples, the camera selector 320 causes the camera to capture an image and then sends the captured image to the depth processor 330. In some examples, the camera selector 320 causes one or more cameras to cease capturing images and/or send images to the camera selector 320.
As shown in fig. 3, the apparatus 300 further includes a depth processor 330. The depth processor 330 may be configured to calculate a depth of a location within the physical environment based on images captured by one or more cameras selected by the camera selector 320. In other examples, the depth processor 330 is included in a device separate from the device 300 (such as a remote server). Depth processor 330 may calculate the depth using any one of the techniques described herein (or any combination thereof). For example, the depth processor 330 may (1) receive the first image and the second image, (2) identify features in the first image and corresponding features in the second image, and (3) calculate the depth of the features using the epipolar geometry.
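As one concrete, purely illustrative way a depth processor could implement the receive, match, and triangulate steps above, the sketch below uses OpenCV ORB features with brute-force matching and converts each match's horizontal disparity to depth using the stereo relation discussed earlier. It assumes rectified images (so corresponding features lie on the same row); the feature type, matcher, and parameters are assumptions rather than the patent's choices.

```python
import cv2

def match_and_estimate_depth(img_left, img_right, focal_length_px, baseline_m):
    """Detect ORB features in both (rectified) images, match them, and return a
    list of (x, y, depth_m) tuples for matches with positive disparity."""
    orb = cv2.ORB_create(nfeatures=500)
    kp_left, des_left = orb.detectAndCompute(img_left, None)
    kp_right, des_right = orb.detectAndCompute(img_right, None)
    if des_left is None or des_right is None:
        return []  # no features: insufficient feature correlation

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_left, des_right)

    results = []
    for m in matches:
        (x_l, y_l) = kp_left[m.queryIdx].pt
        (x_r, _) = kp_right[m.trainIdx].pt
        disparity = x_l - x_r
        if disparity <= 0:
            continue  # discard matches inconsistent with the camera geometry
        depth_m = focal_length_px * baseline_m / disparity
        results.append((x_l, y_l, depth_m))
    return results
```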
In some examples, different computing systems (e.g., different systems on a chip executing the camera selector 320 and/or the depth processor 330) are assigned to receive images from different cameras. For example, a first computing system may be configured to receive images from a first camera and a second camera to determine depth information for a location using the images from the first camera and the second camera, and another computing system may be configured to receive images from other cameras to determine depth information using the images from the other cameras. In such examples, the images may be stored on memory local to the respective computing systems to reduce the time to access the pixel information and/or the need to send the pixel information between the computing systems. In another example, a first computing system is configured to receive an image from a first camera and store the received image in a memory local to the first computing system, and a second computing system is configured to receive an image from a second camera and store the received image in a memory local to the second computing system, wherein one of the two computing systems is configured to determine depth information using the images from the two cameras. In such an example, the computing system that does not determine the depth information may send only a portion (e.g., less than all) of the image to the computing system that determines the depth information to reduce the amount of data that moves to a different computing system. The portion may correspond to lower resolution images and/or to only a subset of the images required for the computation (e.g., some images are sent and some images are not). In other examples, feature correlation is performed in object space such that objects are identified in a first image and objects are identified in a second image, and then objects are compared to each other to determine correspondence between the images. Such comparison would not require comparing individual pixels.
In some examples, depth processor 330 generates a depth map for the physical environment using the calculated depth for the location. In such examples, the depth map includes depths of different locations within the physical environment. The depth map may then be used by the device 300 to make decisions, such as how to navigate the physical environment. As used herein, a depth map is sometimes referred to as a representation of a physical environment. In some examples, depth processor 330 uses other data detected by other types of sensors in conjunction with the images to generate the depth map. For example, light or radio detection and ranging systems may be used to identify the depth of featureless areas and/or provide calibration for depth calculation. In such examples, the light or radio detection and ranging system may capture data only for a particular region and/or with a particular resolution (e.g., a lower resolution than an image captured by a camera (such as a camera in a ready mode as described herein)).
In some examples, device 300 generates a representation (e.g., a three-dimensional representation or an object view) of a physical environment using a depth map and one or more images captured by camera 310. In such examples, the representation includes additional information about the physical environment (i.e., in addition to depth), such as identification of objects within the physical environment, and any other information that will assist the device 300 in making decisions about the physical environment. In some examples, other data detected by other types of sensors (e.g., light or radio detection and ranging systems) are used in combination with the image to generate the representation.
In some examples, the depth processor 330 is unable to match a particular feature in the first image with a particular feature in the second image. In such examples, the depth of a particular feature cannot be calculated due to the lack of feature correlation. In other examples, depth processor 330 determines that, in addition to the particular feature, the subset of images lacks feature correlation. In such examples, the particular location cannot be located within the physical environment.
In some examples, the depth processor 330 sends a message to the camera selector 320 in response to determining that feature correlation is lacking. The message may include an identification of the location affected by the lack of feature correlation, a confidence level that the lack of feature correlation has occurred, a depth map (or an update to a depth map if, for example, the depth map is managed by or stored in memory local to the camera selector 320), a representation generated from the depth map (or an update to a representation if, for example, the representation is managed by or stored in memory local to the camera selector 320), an indication that the depth map or the representation generated from the depth map has been updated, or any combination thereof. In other examples, the lack of feature correlation is reflected in the depth map and/or a representation generated from the depth map, which may be accessed by the camera selector 320. In some examples, the camera selector 320 may be configured to operate at a fixed rate such that the camera selector 320 identifies one or more cameras from the cameras 310 for further processing according to the fixed rate (e.g., every 100 milliseconds).
In response to receiving the message, determining that information related to the physical environment has been updated, or initiating an operation, the camera selector 320 may determine whether to switch and to which camera or cameras to switch in order to obtain sufficient information. Such a determination may be based on (1) whether feature correlation is absent, (2) whether a representation of the physical environment lacks depth information for the location, (3) whether there is no data about the location, (4) whether there is sufficient information to classify an object located at the location, (5) whether there is a sufficient depth computation (e.g., the current depth computation has been determined to be incorrect or there is no depth computation), (6) whether an object is determined to be in the line of sight of a particular camera such that objects behind that object are hidden, or (7) any combination thereof. In some examples, the camera selector 320 uses the representation to perform a geospatial search to determine an area in which information is needed. The geospatial search may include identifying a portion of the physical environment in which the device 300 is moving and determining which portions of the representation are relevant to that region. In some examples, the geospatial search may include a spatial decomposition of one or more portions of the physical environment and ranking other objects in the physical environment in terms of their importance to the device 300, based on semantic knowledge of the physical environment, such as the location, speed, and heading of the device 300 within the physical environment, the location and classification of other objects in the physical environment, and particular goals of the device 300. In some examples, selection based on such criteria prioritizes high resolution in small windows in one direction at one time and low resolution in larger windows at another time. In other examples, selection based on such criteria prioritizes depth perception between twenty meters and one hundred meters at one time and depth resolution between five meters and twenty-five meters at another time.
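The multi-criteria decision in this paragraph can be thought of as a predicate over what is currently known about each location. The sketch below is a simplified illustration only; the field names, the grid-cell keys, and the way a candidate set is chosen are hypothetical and much cruder than the geospatial search the description contemplates.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

@dataclass
class LocationInfo:
    has_feature_correlation: bool = True
    has_depth: bool = True
    has_any_data: bool = True
    object_classified: bool = True
    occluded_for_current_set: bool = False

def needs_more_information(info: LocationInfo) -> bool:
    """True when any of the criteria listed in the description is unmet."""
    return (not info.has_feature_correlation
            or not info.has_depth
            or not info.has_any_data
            or not info.object_classified
            or info.occluded_for_current_set)

def select_camera_set(locations: Dict[Tuple[int, int], LocationInfo],
                      candidate_sets: Sequence[Sequence[str]]) -> Sequence[str]:
    """Very rough selector: keep the default (first) set unless some tracked
    location still needs information, in which case try the next candidate set."""
    if any(needs_more_information(info) for info in locations.values()):
        return candidate_sets[1] if len(candidate_sets) > 1 else candidate_sets[0]
    return candidate_sets[0]

# Example: the location at grid cell (4, 7) has no depth yet, so the selector switches.
locations = {(4, 7): LocationInfo(has_depth=False)}
print(select_camera_set(locations, [("cam_A", "cam_B"), ("cam_A", "cam_C")]))
```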
In some examples, the camera selector 320 identifies a portion (e.g., less than all) of the physical environment in which the device 300 needs additional information and sends an image corresponding to the portion to the depth processor 330. In such examples, a partial image (e.g., not an entire image) may be sent to the depth processor 330 to reduce the amount of data that the depth processor 330 needs to process.
In some examples, the camera selector 320 determines that depth information for the location is not needed for the current determination to be made by the device 300. For example, the device 300 may be traveling at a speed such that depth information for that location is not needed at this time to make the determination. For another example, device 300 may use information other than the images captured by the cameras (e.g., a map received by device 300 or a light or radio detection and ranging system) to identify sufficient information about the location.
In some examples, the camera selector 320 determines to select a new camera set in response to identifying a problem with the current camera set. For example, problems may include lens flare, veiling glare, occlusions, hardware failures, software failures, depth calculations determined to be incorrect for a location, areas of the physical environment that are not adequately covered by the current camera set, and so forth. In some examples, the device 300 continues to navigate the physical environment while attempting to solve the problem using a different set of one or more cameras.
With more than two cameras, the camera selector 320 may form a set of cameras (e.g., stereo pairs of two cameras) between different cameras to capture a portion of the physical environment, and/or have redundant sets of cameras to capture the same portion of the physical environment (e.g., multiple stereo pairs covering at least partially overlapping areas). In some examples, having multiple cameras increases availability (e.g., in the event of individual camera failure or occlusion) and/or information about the physical environment.
In some examples, a first set of cameras 310 is established by default (e.g., predefined before device 300 begins executing an application). In such examples, the first group may change to the second group of cameras (e.g., the second group may or may not include cameras in the first group) to alleviate the problem determined by the camera selector 320. In some examples, the first group and/or the second group are established based on a likelihood that cameras included in the respective groups are able to capture images of a particular location in the physical environment.
In some examples, the camera selector 320 determines the best camera set from a plurality of different camera sets. In some examples, the optimal group consists of one or more cameras. In one example, the optimal camera set is selected based on a lookup table that indicates the next set of cameras to use (e.g., in a particular situation). In some examples, the lookup table may be populated based on the distance of the location from the device 300. For example, a first ordered set of possible cameras that prioritizes short distances between cameras (e.g., small baselines) may be used when the location is within a certain distance from the device 300, and a second ordered set of possible cameras that prioritizes long distances between cameras (e.g., large baselines) may be used when the location exceeds the certain distance from the device 300. In some examples, the lookup table is generated (e.g., established) before any images are captured by the cameras 310. In other examples, the lookup table is generated (e.g., built up) based on images received from the cameras 310, such as by learning how best to respond to problems encountered by a particular configuration of cameras on the device 300.
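A minimal sketch of the distance-keyed lookup table described above: locations within a cutoff map to a short-baseline camera pair and locations beyond it map to a long-baseline pair. The cutoff distance and camera identifiers are illustrative assumptions, not values from the patent.

```python
from bisect import bisect_right

# Lookup table: (max distance in meters, camera set to use). Short baselines are
# preferred up close; long baselines are preferred farther away.
CAMERA_SET_BY_DISTANCE = [
    (25.0, ("camera_410", "camera_430")),          # short baseline for nearby locations
    (float("inf"), ("camera_410", "camera_420")),  # long baseline for distant locations
]

def camera_set_for(distance_m: float):
    """Return the camera set whose distance bucket contains distance_m."""
    cutoffs = [cutoff for cutoff, _ in CAMERA_SET_BY_DISTANCE]
    index = min(bisect_right(cutoffs, distance_m), len(CAMERA_SET_BY_DISTANCE) - 1)
    return CAMERA_SET_BY_DISTANCE[index][1]

print(camera_set_for(10.0))   # ('camera_410', 'camera_430')
print(camera_set_for(80.0))   # ('camera_410', 'camera_420')
```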
In some examples, the best camera set is selected based on information identified by the camera selector 320, such as information indicating the context of the device 300 (e.g., speed, acceleration, path, weather, time of day, etc.). In such examples, the images used to select the best group may be captured while the device 300 is moving.
In some examples, the feature comparison operation occurs simultaneously with respect to multiple different sets of cameras to determine which set of cameras is used by the depth processor 330. In some examples, different sets of cameras capture images at different rates and/or resolutions to reduce the computational cost of performing multiple feature comparison operations simultaneously. In such examples, some feature comparison operations are used for diagnostic operations, while other feature comparison operations are used by the depth processor 330 to calculate depth, where the feature comparison operations for diagnostic operations are performed at a lower rate than that used by the depth processor 330.
In some examples, different sets of cameras alternate at a faster rate than is needed for determination purposes. In such examples, a particular set of cameras captures images at a rate required for a determined purpose, and another set of cameras captures images between times when the particular set of cameras captures images for determining when to switch to the other set of cameras.
In some examples, the camera selector 320 determines whether the device 300 needs depth information in a location that is near or far from the device 300. In such examples, the camera selector 320 may cause one or more cameras to capture images at a lower resolution when the locations are closer and to capture images at a higher resolution when the locations are farther.
After using one or more cameras of the new group, the camera selector 320 may determine whether the new group is able to determine sufficient information for a location. In some examples, when the new group is able to be used to determine sufficient information for a location, the device 300 may perform one or more operations based on the information for the location, such as navigating the device 300 in a physical environment. In some examples, when the new group cannot be used to determine sufficient information for a location, the camera selector 320 may select a different group of one or more cameras based on one or more techniques discussed above. In such examples, the camera selector 320 may continue to select a different set of one or more cameras until sufficient information is determined for the location or a threshold number of sets of cameras are attempted. In some examples, when a threshold number of groups of one or more cameras are attempted and information is still needed to make the determination, device 300 may perform operations other than changing to a different group of one or more cameras to attempt to solve the problem. Examples of operations include changing a navigational characteristic of the device 300 (e.g., changing from a first path to a second path) or an operational characteristic of the device 300 (e.g., reducing a speed of the device 300).
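The retry-then-fall-back behavior in this paragraph might be organized as in the following sketch. The function names, the return convention, and the specific fallback are illustrative assumptions; the description mentions changing the navigation path or reducing the device's speed as example operations.

```python
from typing import Callable, Optional, Sequence

def resolve_location(candidate_sets: Sequence[Sequence[str]],
                     try_set: Callable[[Sequence[str]], Optional[dict]],
                     max_attempts: int,
                     fallback: Callable[[], None]) -> Optional[dict]:
    """Try up to max_attempts camera sets; if none yields sufficient information,
    perform a fallback operation (e.g., reduce speed or change path) instead."""
    for camera_set in list(candidate_sets)[:max_attempts]:
        info = try_set(camera_set)   # assumed to return None when data is insufficient
        if info is not None:
            return info              # sufficient information: proceed (e.g., navigate)
    fallback()                       # threshold reached: change how the device operates
    return None

# Toy usage: every candidate set fails, so the fallback (reducing speed) runs.
result = resolve_location(
    candidate_sets=[("cam_A", "cam_B"), ("cam_C", "cam_D"), ("cam_A", "cam_C")],
    try_set=lambda cams: None,
    max_attempts=2,
    fallback=lambda: print("reducing device speed"),
)
print(result)  # None
```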
In some examples, the camera selector 320 is configured to change the mode of the camera from the camera 310, such as changing the camera from standby (e.g., off or in a lower power mode) to ready (e.g., on or in a mode capable of capturing images at a particular rate and/or resolution). In some examples, the camera selector 320 predicts that a camera will be needed and changes the camera from standby to ready. In such examples, the camera selector 320 may cause the camera to change modes so that the camera may be used by the depth processor 330 without having to change modes when determined to be necessary. For example, the camera selector 320 may cause the camera to transition to a ready mode before determining that there is a lack of feature correlation between current images or that there is a lack of information in the representation of the physical environment.
In some examples, the camera selector 320 determines to transition from the current group of cameras to a previous group of cameras, such as a default group of cameras. In such examples, the determination may be based on determining that sufficient information has been determined for the location or that a cause of the lack of feature correlation for the previous group of cameras has been resolved (e.g., a predetermined amount of time has elapsed, images from the previous group of cameras have been determined to have sufficient feature correlation, or the operational state of the device 300 has changed (such as an operational state that may have caused the previous group of cameras to now have sufficient feature correlation)).
Fig. 4A is a block diagram of a camera array 400 illustrating three cameras. The camera array 400 includes a first camera 410, a second camera 420, and a third camera 430. In some examples, camera array 400 is attached to a device (e.g., device 300) and is configured to capture an image of a physical environment. In such examples, the fields of view of each camera in the camera array 400 may at least partially overlap such that all cameras are able to capture images of a particular region of the physical environment. In some examples, the first camera 410, the second camera 420, and the third camera 430 are each oriented in a different direction. In other examples, at least two of the first camera 410, the second camera 420, and the third camera 430 are each oriented in the same direction.
In some examples, the first camera 410 and the second camera 420 are on a first axis (e.g., a horizontal axis), with the cameras separated by a first distance. In such examples, the third camera 430 may be offset from the other cameras and on a second axis different from the first axis. The third camera 430 may be below (as shown in fig. 4A) or above (not illustrated in fig. 4A) the other cameras. Locating the cameras on different axes may allow different cameras to capture fields of view at different angles. In some examples, the third camera 430 is a second distance from the first camera 410 and a third distance from the second camera 420. In such examples, the second distance and the third distance may be the same or different. In some examples, the second distance and/or the third distance may be the same as or different from the first distance. Having cameras at different distances from each other may allow different groups of cameras to have different baselines, which changes how computations are performed when processing images for information about the physical environment. In some examples, the third camera 430 is offset from the vertical axis associated with the first camera 410 and the vertical axis associated with the second camera 420 such that the third camera 430 is between the vertical axis associated with the first camera 410 and the vertical axis associated with the second camera 420. As described above, having the cameras on different axes may allow different cameras to capture fields of view at different angles.
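The effect of a group's baseline on depth computation can be illustrated with the standard pinhole stereo relation, depth = focal length × baseline / disparity. The focal length, baselines, and disparity below are arbitrary illustrative numbers, not dimensions of camera array 400.

```python
# Standard pinhole stereo relation: depth = f * B / d, where f is the focal
# length in pixels, B the baseline between the two cameras in meters, and d
# the disparity in pixels. All values are arbitrary illustrative numbers.

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    return focal_px * baseline_m / disparity_px

f = 800.0  # assumed focal length in pixels
d = 16.0   # assumed measured disparity in pixels

long_baseline_depth = depth_from_disparity(f, baseline_m=0.12, disparity_px=d)
short_baseline_depth = depth_from_disparity(f, baseline_m=0.07, disparity_px=d)

# For the same disparity, a longer baseline implies a larger depth estimate,
# so the active group (and its baseline) must be known when interpreting images.
print(long_baseline_depth, short_baseline_depth)  # 6.0 m vs 3.5 m
```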
Fig. 4B is a block diagram of a camera array 440 illustrating four cameras. The camera array includes a first camera 450, a second camera 460, a third camera 470, and a fourth camera 480. In some examples, camera array 440 is attached to a device (e.g., device 300) and is configured to capture images of a physical environment. In such examples, the fields of view of each camera in the camera array 440 may at least partially overlap such that all cameras are able to capture images of a particular region of the physical environment. In some examples, the first camera 450, the second camera 460, the third camera 470, and the fourth camera 480 are each oriented in a different direction. In other examples, at least two of the first camera 450, the second camera 460, the third camera 470, and the fourth camera 480 are each oriented in the same direction (e.g., the first camera 450 and the second camera 460; the first camera 450 and the third camera 470; or the first camera 450 and the fourth camera 480).
In some examples, the first camera 450 and the second camera 460 are on a first axis (e.g., a horizontal axis), with the cameras separated by a first distance. In such examples, the third camera 470 and the fourth camera 480 may be offset from the other cameras and on a second axis that is different from the first axis (e.g., a different horizontal axis that is parallel to the other horizontal axis). The third camera 470 and/or the fourth camera 480 may be below (as illustrated in fig. 4B) or above (not illustrated in fig. 4B) the other cameras. As described above, having the cameras on different axes may allow different cameras to capture fields of view at different angles. In some examples, the third camera 470 and the fourth camera 480 are separated by a second distance. In such examples, the second distance may be the same as or different from the first distance. As described above, having cameras at different distances from each other may allow different groups of cameras to have different baselines, which changes how computations are performed when processing images for information about a physical environment. Having the camera groups separated by the same distance may allow the device to switch easily between different groups of cameras without changing the ability of a group to capture objects at a particular distance, regardless of orientation.
In some examples, the third camera 470 is a second distance from the first camera 450 and a third distance from the second camera 460. In such examples, the second distance and the third distance may be the same or different. In some examples, the second distance and/or the third distance may be the same as or different from the first distance. In some examples, the third camera 470 is offset from, but on an axis parallel to, the vertical axis associated with the first camera 450 and offset from the vertical axis associated with the second camera 460, such that the third camera 470 is below the first camera 450 and diagonal to the second camera 460. Having the camera groups along the same axes as each other may allow the device to switch easily between different groups of cameras without changing the ability of a group to capture objects from a particular point of view.
In some examples, the fourth camera 480 is a fourth distance from the second camera 460 and a fifth distance from the third camera 470. In such examples, the fourth distance and the fifth distance may be the same or different. In some examples, the fourth distance and/or the fifth distance may be the same as or different from the first distance, the second distance, or the third distance. In some examples, the fourth camera 480 is offset along the vertical axis associated with the second camera 460 and offset from the vertical axis associated with the first camera 450 such that the fourth camera 480 is below the second camera 460 and diagonal to the first camera 450. In some examples, small angular differences in the mounting positions of the cameras result in the cameras being affected differently by a given condition. In such examples, when one camera experiences a particular effect, another camera is less likely to suffer from the same problem, making the system more robust.
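A four-camera array such as the one described above can be thought of as a set of candidate stereo pairs, each with its own baseline and axis. The coordinates below are arbitrary illustrative placements (in meters), not dimensions of camera array 440; the sketch simply enumerates the pairs and their baselines.

```python
# Sketch: describe the four-camera array as candidate stereo pairs, each with
# its own baseline (distance between mounting positions). Placements are
# illustrative assumptions only.
from itertools import combinations
from math import dist

positions = {
    "cam_450": (0.00, 0.00),   # first axis (top row)
    "cam_460": (0.12, 0.00),
    "cam_470": (0.00, -0.05),  # second axis (bottom row)
    "cam_480": (0.12, -0.05),
}

# Every unordered pair is a potential camera group; its baseline is the
# distance between the two mounting positions.
pairs = {
    (a, b): dist(positions[a], positions[b])
    for a, b in combinations(positions, 2)
}

for (a, b), baseline in sorted(pairs.items(), key=lambda kv: kv[1]):
    print(f"{a} + {b}: baseline = {baseline:.3f} m")
```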
Fig. 5 is a flow chart illustrating a method 500 for computing depth of a location. Some operations in method 500 may optionally be combined, the order of some operations may optionally be changed, and some operations may optionally be omitted.
In some examples, the method 500 is performed at a computing system (e.g., computing system 100) in communication with a camera (e.g., a camera in the cameras 310, the first camera 410, or the first camera 450). In some examples, the computing system and camera are included in a device (e.g., device 200 or device 300). In some examples, the device includes one or more actuators and/or one or more sensors other than a camera. In some examples, the camera is connected to one or more processors of the device via at least one or more wires. In some examples, the camera is wirelessly connected to the one or more processors of the device. In some examples, the one or more processors are included in a component of the device separate from the camera. In some examples, the one or more processors are included in the camera. In some examples, a plurality of processors of the device perform the method, wherein at least one step is performed by one or more processors on a first system on a chip (i.e., SoC) and a second step is performed by a second SoC, and wherein the first SoC and the second SoC are located at different locations on the device that are separated by at least 12 inches. In some examples, the method 500 is performed while a device including a camera performs operations (such as navigating in a physical environment).
At 510, method 500 includes receiving a first image (e.g., a representation of a physical environment having one or more color channels (e.g., red, green, and blue channels)) captured by a first camera (e.g., computing system 100, first subsystem 210, a camera of the cameras 310, first camera 410, or first camera 450).
At 520, the method 500 includes receiving a second image captured by a second camera (e.g., the computing system 100, the first subsystem 210, a camera of the cameras 310, the second camera 420, or the second camera 460), wherein the second camera is different from the first camera, and wherein the first camera and the second camera are established as a first set of cameras (e.g., a camera pair) for computing a depth of a location (e.g., a point location) (in some examples, the device includes the first camera and the second camera; in some examples, the first image is captured before or after the first set of cameras is established; in some examples, receiving the first image includes accessing a memory location using an expected location or a location identified in a message received from the first camera; in some examples, the device establishes the first set of cameras for calculating a depth of the location before receiving the first image; in some examples, the first set of cameras does not include a third camera; in some examples, the first set of cameras includes one or more cameras in addition to the first camera and the second camera; in some examples, the first set of cameras is established by default (e.g., predefined before the device begins executing an application requiring depth calculations); in some examples, the first set of cameras is established based on a likelihood that the cameras included in the first set of cameras are capable of capturing an image of the location; in some examples, the second image is captured before or after the first set of cameras is established; in some examples, the second image is captured simultaneously with the first image (e.g., substantially simultaneously, such as in a manner indicating that the first camera and the second camera are capturing images at the same time); in some examples, receiving the second image includes accessing the memory location using the expected location or a location identified in a message received from the second camera).
At 530, the method 500 includes calculating a first depth of the location based on the first image and the second image in accordance with a determination that the first image and the second image have sufficient feature correlation for calculating a depth of the location (in some examples, there is no determination that the third image and the fourth image have sufficient feature correlation when the first depth is calculated based on the first image and the second image; in some examples, the device determines that the first image and the second image have sufficient feature correlation for calculating a depth of the location; in some examples, a second device different from the device determines that the first image and the second image have sufficient feature correlation for calculating a depth of the location; in some examples, determining that the first image and the second image have sufficient feature correlation includes determining that a threshold number of features can be identified in the two images; in some examples, determining that the first image and the second image have sufficient feature correlation includes determining that the calculated depth is similar to an expected depth, such as compared to a surrounding depth or a depth calculated for the location at a different (e.g., prior) time; in some examples, determining that the first image and the second image have sufficient feature correlation is based on one or more images in addition to the first image and the second image). In some examples, calculating the first depth of the location is not based on an image captured by the third camera (in some examples, the feature correlation for the first depth is not based on the image captured by the third camera, although calibration of the first image and/or the second image is performed based on an image captured by the third camera).
At 540, method 500 includes determining that the first image and the second image do not have sufficient feature correlation (e.g., lack feature correlation) for computing a depth of the location (in some examples, the device determines that the first image and the second image do not have sufficient feature correlation for computing a depth of the location; in some examples, a second device different from the device determines that the first image and the second image do not have sufficient feature correlation for computing a depth of the location; in some examples, determining that the first image and the second image do not have sufficient feature correlation includes determining that a threshold number of features cannot be identified in the two images; in some examples, determining that the first image and the second image do not have sufficient feature correlation includes determining that the depth computed for the location is different from an expected depth, such as compared to a surrounding depth or a depth computed for the location at a different (e.g., previous) time). In some examples, determining that the first image and the second image do not have sufficient feature correlation for computing the depth of the location includes identifying features (e.g., objects or portions of objects, such as edges) in the first image that are not included in the second image. In some examples, determining that the first image and the second image do not have sufficient feature correlation for computing the depth of the location includes identifying a fault in the first image (in some examples, the fault is a lens flare or an electrical/software fault). In some examples, the determination that the first image and the second image do not have sufficient feature correlation for computing the depth of the location includes a determination that a threshold number of features in the first image are not included in the second image, wherein the threshold number is at least two (in some examples, this includes a determination that one or more features included in the first image are not included in the second image and that one or more features included in the second image are not included in the first image). In some examples, determining that the first image and the second image do not have sufficient feature correlation for computing the depth of the location includes dividing the first image into a plurality of portions and, in accordance with a determination that a first portion of the plurality of portions does not have sufficient feature correlation and a second portion of the plurality of portions does not have sufficient feature correlation, determining that the first image and the second image do not have sufficient feature correlation for computing the depth of the location based on determining that a threshold number of the plurality of portions do not have sufficient feature correlation, wherein the first portion is different from the second portion (in some examples, the second portion does not overlap with the first portion).
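One possible form of such a sufficiency test is sketched below: features from the first image are matched against the second image by descriptor distance, counted per image portion, and the result is judged against thresholds. The descriptor format, matching rule, grid split, and all thresholds are assumptions for illustration and are not specified by this disclosure.

```python
# Self-contained sketch of one way to test "sufficient feature correlation":
# count features from image 1 that have a close descriptor match in image 2,
# evaluated per vertical portion of image 1.
from typing import List, Tuple

Feature = Tuple[float, float, Tuple[float, ...]]  # (x, y, descriptor)

def matches(f1: Feature, f2: Feature, max_desc_dist: float = 0.5) -> bool:
    d = sum((a - b) ** 2 for a, b in zip(f1[2], f2[2])) ** 0.5
    return d <= max_desc_dist

def correlated_count(feats1: List[Feature], feats2: List[Feature]) -> int:
    return sum(any(matches(f1, f2) for f2 in feats2) for f1 in feats1)

def sufficient_correlation(
    feats1: List[Feature],
    feats2: List[Feature],
    image_width: float,
    min_matches_per_portion: int = 2,
    num_portions: int = 4,
    max_bad_portions: int = 1,
) -> bool:
    """Divide image 1 into vertical portions; too many weak portions -> fail."""
    bad = 0
    for i in range(num_portions):
        lo, hi = i * image_width / num_portions, (i + 1) * image_width / num_portions
        portion = [f for f in feats1 if lo <= f[0] < hi]
        if correlated_count(portion, feats2) < min_matches_per_portion:
            bad += 1
    return bad < max_bad_portions
```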
At 550, the method 500 includes calculating a second depth of the location based on the third image and the fourth image in accordance with a determination that the third image and the fourth image have sufficient feature correlation for calculating the depth of the location (in some examples, the device determines that the third image and the fourth image have sufficient feature correlation for calculating the depth of the location; in some examples, a second device different from the device determines that the third image and the fourth image have sufficient feature correlation for calculating the depth of the location; in some examples, determining that the third image and the fourth image have sufficient feature correlation includes determining that a threshold number of features can be identified in the two images; in some examples, determining that the third image and the fourth image have sufficient feature correlation includes determining that the depth calculated for the location is similar to an expected depth, such as compared to a surrounding depth or a depth calculated for the location at a different (e.g., prior) time), wherein the third image is captured by the third camera (in some examples, the third camera is in an inactive state (e.g., a low power mode, off, capturing images less frequently, or capturing images at a lower resolution) while the first set of cameras is active; in some examples, the third camera is in an active state (e.g., similar to the first camera and the second camera, but not used to calculate the depth of the location)), the third camera is different from the first camera, the third camera is different from the second camera, the fourth image is captured by the fourth camera (in some examples, the fourth camera is the first camera or the second camera; in some examples, the fourth camera is different from the first camera and the second camera), the fourth camera is different from the third camera, and the third camera and the fourth camera are established as a second set of cameras for calculating the depth of the location (in some examples, the device includes the third camera and the fourth camera; in some examples, receiving the third image includes accessing a memory location using an expected location or a location identified in a message received from the third camera; in some examples, the device establishes the second set of cameras for calculating the depth of the location prior to receiving the third image; in some examples, the second set of cameras includes one or more cameras in addition to the third camera and/or the fourth camera; in some examples, the fourth image is captured before or after the second set of cameras is established; in some examples, the fourth image is captured simultaneously with the third image (e.g., substantially simultaneously, such as in a manner indicating that the third camera and the fourth camera are capturing images at the same time); in some examples, receiving the fourth image includes accessing the memory location using an expected location or a location identified in a message received from the fourth camera; in some examples, the second set of cameras is established after determining that the first image and the second image do not have sufficient feature correlation; in some examples, the second set of cameras is established in response to determining that a depth of the location is required; in some examples, the third image is captured before or after the second set of cameras is established; in some examples, the third image is captured simultaneously with the first image and/or the second image (e.g., substantially simultaneously, such as in a manner indicating that all three cameras are capturing images at the same time)). In some examples, calculating the second depth of the location is further based on a determination of selecting the second set of cameras from a plurality of different sets of cameras (in some examples, the plurality of different sets of cameras do not include the first set of cameras). In some examples, the determination to select the second set of cameras is based on images captured by the third camera while the third camera is in a lower power mode (in some examples, the lower power mode is an off mode) than the cameras in the first set of cameras. In some examples, the determination to select the second set of cameras is based on a priority order of the sets of cameras, and the priority order is established prior to receiving the first image. In some examples, the third camera is in a standby mode (e.g., a low power or off mode) when a determination is made that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location. In some examples, an image captured by the third camera is used to calibrate a feature correlation between the first image and the second image. In some examples, the third camera is the first camera. In some examples, calculating the second depth based on the third image and the fourth image is based on a determination that the first image and the second image do not have sufficient feature correlation (e.g., lack feature correlation) for calculating the depth of the location.
In some examples, the method 500 further includes, in accordance with a determination that the first image and the second image do not have sufficient feature correlation (e.g., lack feature correlation) for calculating the depth of the location, that the third image and the fourth image do not have sufficient feature correlation for calculating the depth of the location, and that a fifth image and a sixth image have sufficient feature correlation for calculating the depth of the location (in some examples, the device determines that the fifth image and the sixth image have sufficient feature correlation for calculating the depth of the location; in some examples, a second device different from the device determines that the fifth image and the sixth image have sufficient feature correlation for calculating the depth of the location), calculating a third depth of the location based on the fifth image and the sixth image, wherein the fifth image is captured by a fifth camera (in some examples, the fifth camera is in an inactive state (e.g., a low power mode, off, capturing images less frequently, or capturing images at a lower resolution) while the first set of cameras and/or the second set of cameras are active; in some examples, the fifth camera is in an active state), the fifth camera is different from each of the first camera, the second camera, the third camera, and the fourth camera, the sixth image is captured by a sixth camera, the sixth image is not captured by the fifth camera, and the fifth camera and the sixth camera are established as a third set of cameras (in some examples, the third set of cameras is different from the first set of cameras and the second set of cameras) for calculating the depth of the location.
In some examples, the method 500 further includes, in accordance with a determination that the first image and the second image do not have sufficient feature correlation (e.g., lack feature correlation) for calculating the depth of the location and a determination that a seventh image and an eighth image have sufficient feature correlation for calculating the depth of the location (in some examples, the device determines that the seventh image and the eighth image have sufficient feature correlation for calculating the depth of the location; in some examples, a second device different from the device determines that the seventh image and the eighth image have sufficient feature correlation for calculating the depth of the location), calculating a fourth depth of the location based on the seventh image and the eighth image, wherein the seventh image is captured by a seventh camera (in some examples, the seventh camera is in an inactive state (e.g., a low power mode, off, capturing images less frequently, or capturing images at a lower resolution); in some examples, the seventh camera is in an active state (e.g., similar to the first camera and the second camera) but not used to calculate the depth of the location), and wherein the eighth image is captured by the seventh camera.
In some examples, the method 500 further includes, in accordance with a determination that a cause of the lack of feature correlation with respect to the first set of cameras has been resolved (in some examples, the determination is based on an image, such as an image from a camera of the first set of cameras or an image from a camera not included in the first set of cameras), calculating a depth of a location based on an image captured by the first camera and an image captured by the second camera (in some examples, the first set of cameras is not used until the cause of the lack of feature correlation with respect to the first set of cameras is determined to have been resolved; in some examples, the calculation is for a new location; in some examples, the calculation is for the same location as before).
In some examples, the techniques described above are performed in a system for computing depth of a location. In such examples, the system includes a first camera, a second camera, a third camera, and a fourth camera, wherein the first camera and the second camera are on a first axis, and wherein the third camera and the fourth camera are on a second axis that is different from the first axis (in some examples, the first axis and the second axis are parallel; in some examples, the first camera and the third camera are on a third axis, the second camera and the fourth camera are on a fourth axis, and the third axis is parallel to the fourth axis; in some examples, the first axis and the third axis are perpendicular).
It is noted that the details of the process described below with respect to method 600 (i.e., fig. 6) also apply in a similar manner to method 500 of fig. 5. For example, method 500 optionally includes one or more of the features of the various methods described below with reference to method 600. For example, computing the second depth of the location according to the method 500 may further include a determination from the representation (e.g., three-dimensional representation) that sufficient data is available for the location within the representation.
Fig. 6 is a flow chart illustrating a method 600 for obtaining sufficient data about a physical environment. Some operations in method 600 may optionally be combined, the order of some operations may optionally be changed, and some operations may optionally be omitted.
In some examples, the method 600 is performed at a computing system (e.g., computing system 100) in communication with a camera (e.g., a camera of the cameras 310, the first camera 410, or the second camera 460). In some examples, the computing system and camera are included in a device (e.g., device 200 or device 300). In some examples, the device includes one or more actuators and/or one or more sensors other than a camera. In some examples, the camera is connected to one or more processors of the device via at least one or more wires. In some examples, the camera is wirelessly connected to the one or more processors of the device. In some examples, the one or more processors are included in a component of the device separate from the camera. In some examples, the one or more processors are included in the camera. In some examples, a plurality of processors of the device perform the method, wherein at least one step is performed by one or more processors on a first system on a chip (i.e., SoC) and a second step is performed by a second SoC, and wherein the first SoC and the second SoC are located at different locations on the device that are separated by at least 12 inches. In some examples, method 600 is performed while a device including a camera performs operations (such as navigating in a physical environment).
At 610, method 600 includes receiving a representation (e.g., a world representation, a virtual representation, an object view representation, a three-dimensional representation) of a physical environment (in some examples, the representation is not an image) wherein the representation is generated based on a first set of one or more images (e.g., a representation of a physical environment having one or more color channels (e.g., red, green, and blue channels)) captured by a first set of one or more cameras (e.g., camera 310, first camera 410, or first camera 450) (in some examples, the representation is generated based on one or more depth calculations, one or more lidar images, one or more depth maps, or any combination thereof; in some examples, the representation includes a representation of an object that has been identified as including one or more characteristics associated with a particular type of object; in some examples, the device includes the first set of one or more cameras). In some examples, the representation is generated based on a depth map (in some examples, the depth map includes information related to distances of surfaces on objects in the physical environment). In some examples, the depth map is generated using images captured by a set of one or more cameras (in some examples, the depth map is generated using lidar data; in some examples, the depth map is generated by feature correlation between the images).
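One way a depth map can be turned into such a representation is by unprojecting each pixel through a pinhole camera model into a set of 3D points. The intrinsics and the toy depth map below are illustrative assumptions; the disclosure does not specify how the representation is stored.

```python
# Sketch: convert a small depth map into a simple point-based representation
# of the physical environment by unprojecting each pixel.
from typing import List, Tuple

def unproject(depth_map: List[List[float]], fx: float, fy: float,
              cx: float, cy: float) -> List[Tuple[float, float, float]]:
    points = []
    for v, row in enumerate(depth_map):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue  # no depth available for this pixel
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

# Toy 3x3 depth map (meters); 0.0 marks pixels with no depth information.
depth_map = [[2.0, 2.0, 0.0],
             [2.1, 2.0, 2.2],
             [0.0, 2.3, 2.2]]
representation = unproject(depth_map, fx=1.0, fy=1.0, cx=1.0, cy=1.0)
```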
At 620 and 630, method 600 includes, in accordance with a determination that the representation does not include sufficient data for a location within the representation (in some examples, the determination that the representation does not include sufficient data for the location within the representation includes a determination that the representation does not include (1) any data about the location, (2) sufficient information to categorize objects located at the location, (3) depth calculations for the location, (4) sufficient depth calculations (e.g., current depth calculations have been determined to be incorrect), or any combination thereof), and in accordance with a determination that a second set of one or more cameras is able to capture images (e.g., one or more images from each of the one or more cameras) to obtain sufficient data for the location, sending instructions to use the second set of one or more cameras (in some examples, the instructions cause use of one or more images having a field of view corresponding to the location in the representation; in some examples, in accordance with a determination that the second set of one or more cameras is capable of capturing images to obtain sufficient data for the location, an operation (e.g., the operation mentioned in paragraph [0089], such as navigating a physical environment) is deferred until the representation is updated based on the images from the second set of one or more cameras; in some examples, the second set of one or more cameras includes a plurality of cameras, with instructions to each camera to capture an image), wherein the second set of one or more cameras is different from the first set of one or more cameras (in some examples, the second set of one or more cameras includes at least one camera not included in the first set of one or more cameras; in some examples, the second set of one or more cameras includes at least one camera included in the first set of one or more cameras). In some examples, the determination that the representation does not include sufficient data for a location within the representation includes performing a geospatial search of the representation (in some examples, the geospatial search includes identifying an area of the physical environment to which the device is moving; in some examples, the determination that the representation does not include sufficient data for a location within the representation includes a determination that the representation does not include depth information for the location). In some examples, the second set of one or more cameras consists of two cameras (in some examples, the first set of one or more cameras consists of two cameras). In some examples, when the representation is determined not to include sufficient data for a location within the representation, the cameras in the second set of one or more cameras are in a standby mode (in some examples, the standby mode is a low power mode or off).
At 640, method 600 includes, in accordance with a determination that the representation includes sufficient data for a location within the representation, forgoing sending an instruction to use the second set of one or more cameras (in some examples, initiating an operation to navigate based on the representation in addition to foregoing sending the instruction).
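The decision branch at 620-640 can be summarized in a short, self-contained sketch: an instruction to use the second set of cameras is produced only when the representation lacks data for the location and that set can actually cover it. The data layout, the "sufficient data" stub, and the coverage test are illustrative assumptions, not the representation or instruction format used by the disclosure.

```python
# Sketch of the branch: send an instruction to use the second camera set only
# when the representation lacks data for the location and the set covers it;
# otherwise forgo sending the instruction.
from typing import Dict, List, Optional, Tuple

Location = Tuple[int, int]

def instruction_for_location(
    representation: Dict[Location, float],   # e.g., per-cell depth data
    location: Location,
    second_set: List[str],                   # camera identifiers
    covers: Dict[str, List[Location]],       # locations each camera can image
) -> Optional[dict]:
    has_data = location in representation    # "sufficient data" stub
    can_cover = all(location in covers[cam] for cam in second_set)
    if not has_data and can_cover:
        # Instruction to capture images with the second set of cameras.
        return {"use_cameras": second_set, "target": location}
    return None  # forgo sending the instruction

# Example usage with a representation that is missing data at (2, 3).
rep = {(0, 0): 1.5, (1, 0): 1.6}
cams = ["cam_C", "cam_D"]
coverage = {"cam_C": [(2, 3)], "cam_D": [(2, 3)]}
print(instruction_for_location(rep, (2, 3), cams, coverage))
```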
In some examples, method 600 further includes, after sending the instruction to use the second set of one or more cameras, sending a navigation instruction (in some examples, the navigation instruction identifies a path to be taken by the device; in some examples, the navigation instruction identifies a driving characteristic (e.g., speed) of the device) based on an updated representation of the physical environment, wherein the updated representation is generated based on one or more images captured by the second set of one or more cameras (in some examples, the updated representation is generated based on images from the first set of one or more cameras), and, in accordance with a determination that the representation includes sufficient data for a location within the representation, sending an instruction to navigate based on the representation of the physical environment (i.e., in this branch, no instruction will be sent to the second set of one or more cameras to capture the one or more images).
In some examples, the method 600 further includes, prior to making the determination that the representation does not include sufficient data for a location within the representation, in accordance with a determination that a camera in the second set of one or more cameras is in a standby mode (in some examples, the standby mode is a low power mode or off) and that the second set of one or more cameras will be needed to update the representation (in some examples, the determination that the second set of one or more cameras will be needed to update the representation is based on a determination that the physical environment (1) is crowded, (2) has particular weather conditions, or (3) is about to change due to navigation of the device), sending an instruction to change the camera to a second mode (e.g., a ready, active, or higher power mode) different from the standby mode.
In some examples, method 600 further includes, in accordance with the determination that the representation does not include sufficient data for a location within the representation and in accordance with a determination based on a current navigation context (in some examples, the current navigation context is a speed of travel or a direction of travel), forgoing sending the instructions to use the second set of one or more cameras.
In some examples, the instructions include a request to capture one or more images in a first mode (e.g., a higher power mode or an active mode), and wherein the method 600 further includes, at the device, prior to transmitting the instructions, transmitting a second instruction to capture one or more images using a second set of one or more cameras in a second mode (e.g., a lower power mode, such as capturing images at a lower resolution and/or capturing images less frequently) different from the first mode, wherein the one or more images captured in the second mode are used to determine to transmit the instruction.
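The two-step capture just described (a lower-power, second-mode capture whose result is used to decide whether to send the first-mode instruction) can be sketched as follows. The mode names, the brightness-based decision rule, and the camera stub are illustrative assumptions only.

```python
# Sketch: request a low-power (second-mode) capture first, and use its result
# to decide whether to send the full (first-mode) capture instruction.
from dataclasses import dataclass

@dataclass
class CaptureResult:
    mean_brightness: float  # stand-in for "is this view usable at all?"

def capture(camera_id: str, mode: str) -> CaptureResult:
    # Stub: a real device would return an image captured in the given mode
    # (e.g., lower resolution / lower frame rate for "low_power").
    return CaptureResult(mean_brightness=0.4)

def request_full_capture_if_useful(camera_ids, min_brightness: float = 0.1) -> bool:
    previews = [capture(cam, mode="low_power") for cam in camera_ids]
    if all(p.mean_brightness >= min_brightness for p in previews):
        for cam in camera_ids:
            capture(cam, mode="full")  # first-mode instruction is sent
        return True
    return False  # previews suggest the images would not help; forgo

request_full_capture_if_useful(["cam_C", "cam_D"])
```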
In some examples, the method 600 further includes sending an instruction to change the movement characteristics (e.g., reduce the speed) in accordance with a determination that the updated representation of the physical environment does not include sufficient data for a location within the updated representation, wherein the updated representation is generated based on one or more images captured by the second set of one or more cameras, and wherein the location within the updated representation corresponds to the location within the representation (in some examples, the location within the representation is the same as the location within the updated representation).
It is noted that the details of the process described above with respect to method 500 (i.e., fig. 5) also apply in a similar manner to method 600 of fig. 6. For example, method 600 optionally includes one or more of the features of the various methods described above with reference to method 500. For example, the depth calculated in method 500 may be data included within the representation.
The foregoing description, for purposes of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the technology and its practical application. Those skilled in the art will be able to best utilize the techniques and various embodiments with various modifications as are suited to the particular use contemplated.
While the present disclosure and examples have been fully described with reference to the accompanying drawings, it is to be noted that various changes and modifications will become apparent to those skilled in the art. It should be understood that such variations and modifications are considered to be included within the scope of the disclosure and examples as defined by the claims.
As described above, one aspect of the present technology is to collect and use data available from various sources to improve certain information about the physical environment. The present disclosure contemplates that in some examples, the collected data may include personal information data that uniquely identifies or may be used to identify a particular person and/or a particular location. Such personal information data may include an image of a person, an image of data related to a person, an image of a location, or any other identification or personal information.
The present disclosure contemplates that entities responsible for the collection, analysis, disclosure, transmission, storage, or other use of such personal information data will adhere to well-established privacy policies and/or privacy practices. In particular, such entities should implement and adhere to privacy policies and practices that are recognized as meeting or exceeding industry or government requirements for maintaining the privacy and security of personal information data. Such policies should be easy for users to access and should be updated as the collection and/or use of data changes. Personal information from users should be collected for legitimate and reasonable uses and must not be shared or sold outside of those legitimate uses. In addition, policies and practices should be adapted to the particular type of personal information data collected and/or accessed, and to applicable laws and standards, including jurisdiction-specific considerations. Thus, different privacy practices may be maintained for different personal data types in each country.
Furthermore, it is intended that personal information data should be managed and processed in a manner that minimizes the risk of inadvertent or unauthorized access or use. Risk can be minimized by limiting the collection of data and by deleting the data once it is no longer needed.

Claims (44)

1. A method for obtaining sufficient data to make a decision regarding a physical environment, the method comprising:
at the device:
receiving a representation of a physical environment, wherein the representation is generated based on a first set of one or more images captured by a first set of one or more cameras;
in accordance with a determination that the representation does not include sufficient data for a location within the representation and a second set of one or more cameras is capable of capturing images to obtain sufficient data for the location, sending instructions to use the second set of one or more cameras, wherein the second set of one or more cameras is different from the first set of one or more cameras, and
In accordance with a determination that the representation includes sufficient data for the location within the representation, instructions to use the second set of one or more cameras are relinquished from transmission.
2. The method of claim 1, the method further comprising:
at the apparatus:
After sending the instructions to use the second set of one or more cameras, sending instructions to navigate based on an updated representation of the physical environment, wherein the updated representation is generated based on one or more images captured by the second set of one or more cameras, and
Instructions are sent to navigate based on the representation of the physical environment in accordance with a determination that the representation includes sufficient data for the location within the representation.
3. The method of any of claims 1-2, wherein the representation is generated based on a depth map.
4. A method according to claim 3, wherein the depth map is generated using images captured by a set of one or more cameras.
5. The method of any of claims 1-4, wherein the second set of one or more cameras consists of two cameras.
6. The method of any of claims 1-5, wherein the determination that the representation does not include sufficient data for a location within the representation includes performing a geospatial search of the representation.
7. The method of any of claims 1-6, wherein a camera of the second set of one or more cameras is in a standby mode when the determination is made that the representation does not include sufficient data for a location within the representation.
8. The method of any one of claims 1 to 7, further comprising:
at the apparatus:
Before making the determination that the representation does not include sufficient data for a location within the representation:
In accordance with a determination that a camera of the second set of one or more cameras is in a standby mode and will be required to update the representation, an instruction to change the camera to a second mode different from the standby mode is sent.
9. The method of any one of claims 1 to 8, the method further comprising:
at the apparatus:
In accordance with the determination that the representation does not include sufficient data for the location within the representation, and in accordance with a determination based on a current navigation context, instructions to send using the second set of one or more cameras are relinquished.
10. The method of any of claims 1-9, wherein the instruction comprises a request to capture the one or more images in a first mode, and wherein the method further comprises:
at the apparatus:
before transmitting the instructions, transmitting second instructions to capture one or more images using the second set of one or more cameras in a second mode different from the first mode, wherein the one or more images captured in the second mode are used to determine to transmit the instructions.
11. The method of any one of claims 1 to 10, the method further comprising:
at the device:
in accordance with a determination that an updated representation of the physical environment does not include sufficient data for a location within the updated representation, instructions to change movement characteristics are sent, wherein the updated representation is generated based on the one or more images captured by the second set of one or more cameras, and wherein the location within the updated representation corresponds to the location within the representation.
12. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a device, the one or more programs comprising instructions for performing the method of any of claims 1-11.
13. An apparatus, the apparatus comprising:
One or more processors, and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 1-11.
14. An apparatus, the apparatus comprising:
apparatus for performing the method of any one of claims 1 to 11.
15. A computer program product comprising one or more programs configured to be executed by one or more processors of a device, the one or more programs comprising instructions for performing the method of any of claims 1-11.
16. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a device, the one or more programs comprising instructions for:
receiving a representation of a physical environment, wherein the representation is generated based on a first set of one or more images captured by a first set of one or more cameras;
in accordance with a determination that the representation does not include sufficient data for a location within the representation and a second set of one or more cameras is capable of capturing images to obtain sufficient data for the location, sending instructions to use the second set of one or more cameras, wherein the second set of one or more cameras is different from the first set of one or more cameras, and
In accordance with a determination that the representation includes sufficient data for the location within the representation, instructions to use the second set of one or more cameras are relinquished from transmission.
17. An apparatus, the apparatus comprising:
A first camera;
a second camera, the second camera being different from the first camera;
A third camera, the third camera being different from the first camera and the second camera;
A fourth camera, the fourth camera being different from the third camera, the second camera, and the first camera, wherein the first camera and the second camera are on a first axis, and wherein the third camera and the fourth camera are on a second axis that is different from the first axis;
One or more processors, and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for:
receiving a representation of a physical environment, wherein the representation is generated based on a first set of one or more images captured by a first set of one or more cameras, and wherein the first set includes the first camera and the second camera;
According to a determination that the representation does not include sufficient data for a location within the representation and a second set of one or more cameras is capable of capturing images to obtain sufficient data for the location, sending instructions to use the second set of one or more cameras, wherein the second set of one or more cameras is different from the first set of one or more cameras, and wherein the second set includes the third camera and the fourth camera, and
In accordance with a determination that the representation includes sufficient data for the location within the representation, instructions to use the second set of one or more cameras are relinquished from transmission.
18. An apparatus, the apparatus comprising:
One or more processors, and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for:
receiving a representation of a physical environment, wherein the representation is generated based on a first set of one or more images captured by a first set of one or more cameras;
in accordance with a determination that the representation does not include sufficient data for a location within the representation and a second set of one or more cameras is capable of capturing images to obtain sufficient data for the location, sending instructions to use the second set of one or more cameras, wherein the second set of one or more cameras is different from the first set of one or more cameras, and
In accordance with a determination that the representation includes sufficient data for the location within the representation, instructions to use the second set of one or more cameras are relinquished from transmission.
19. An apparatus, the apparatus comprising:
receiving a representation of a physical environment, wherein the representation is generated based on a first set of one or more images captured by a first set of one or more cameras;
in accordance with a determination that the representation does not include sufficient data for a location within the representation and a second set of one or more cameras is capable of capturing images to obtain sufficient data for the location, sending instructions to use the second set of one or more cameras, wherein the second set of one or more cameras is different from the first set of one or more cameras, and
In accordance with a determination that the representation includes sufficient data for the location within the representation, instructions to use the second set of one or more cameras are relinquished from transmission.
20. A computer program product comprising one or more programs configured to be executed by one or more processors of a device, the one or more programs comprising instructions for:
receiving a representation of a physical environment, wherein the representation is generated based on a first set of one or more images captured by a first set of one or more cameras;
in accordance with a determination that the representation does not include sufficient data for a location within the representation and a second set of one or more cameras is capable of capturing images to obtain sufficient data for the location, sending instructions to use the second set of one or more cameras, wherein the second set of one or more cameras is different from the first set of one or more cameras, and
In accordance with a determination that the representation includes sufficient data for the location within the representation, instructions to use the second set of one or more cameras are relinquished from transmission.
21. A method for obtaining sufficient data to make a decision regarding a physical environment, the method comprising:
at the device:
receiving a first image captured by a first camera;
Receiving a second image captured by a second camera, wherein the second camera is different from the first camera, and wherein the first camera and the second camera are established as a first set of cameras for calculating depth of a location;
Calculating a first depth of the location based on the first image and the second image based on a determination that the first image and the second image have sufficient feature correlation for calculating the depth of the location, and
According to a determination that the first and second images do not have sufficient feature correlation for calculating the depth of the location and that third and fourth images have sufficient feature correlation for calculating the depth of the location, a second depth of the location is calculated based on the third and fourth images, wherein:
The third image is captured by a third camera,
The third camera is different from the first camera,
The third camera is different from the second camera,
The fourth image is captured by a fourth camera,
The fourth camera is different from the third camera, and
The third camera and the fourth camera are established as a second set of cameras for calculating the depth of the position.
22. The method of claim 21, wherein calculating the first depth of the location is not based on an image captured by the third camera.
23. The method of any one of claims 21 to 22, the method further comprising:
at the apparatus:
Calculating a third depth of the location based on a determination that the first and second images do not have sufficient feature correlation for calculating the depth of the location, the third and fourth images do not have sufficient feature correlation for calculating the depth of the location, and a fifth and sixth image have sufficient feature correlation for calculating the depth of the location, wherein:
The fifth image is captured by a fifth camera,
The fifth camera is different from each of the first camera, the second camera, the third camera and the fourth camera,
The sixth image is captured by a sixth camera,
The sixth image is not captured by the fifth camera, and
The fifth camera and the sixth camera are established as a third set of cameras for calculating the depth of the position.
24. The method of any one of claims 21 to 23, the method further comprising:
at the apparatus:
According to a determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location and a seventh image and an eighth image have sufficient feature correlation for calculating the depth of the location, a fourth depth of the location is calculated based on the seventh image and the eighth image, wherein the seventh image is captured by a seventh camera, and wherein the eighth image is captured by the seventh camera.
25. The method of any of claims 21-24, wherein calculating the second depth of the location is further based on a determination of selecting the second set of cameras from a plurality of different sets of cameras.
26. The method of claim 25, wherein the determination to select the second set of cameras is based on images captured by the third camera when the third camera is in a lower power mode than cameras in the first set of cameras.
27. The method of any of claims 25-26, wherein the determination of selecting the second set of cameras is based on a priority order of the set of cameras, and wherein the priority order is established prior to receiving the first image.
28. The method of any one of claims 21 to 27, the method further comprising:
at the apparatus:
In accordance with a determination that the cause of lack of feature correlation with respect to the first set of cameras has been resolved, a depth of a location is calculated based on an image captured by the first camera and an image captured by the second camera.
29. The method of any of claims 21-28, wherein the determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location includes identifying features in the first image that are not included in the second image.
30. The method of any of claims 21-29, wherein the determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location comprises identifying a fault in the first image.
31. The method of any of claims 21-30, wherein the determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location comprises a determination that a threshold number of features in the first image are not included in the second image, and wherein the threshold number is at least two.
32. The method of any of claims 21-31, wherein the determination that the first image and the second image do not have sufficient feature correlation for computing the depth of the location comprises:
Dividing the first image into a plurality of portions, and
In accordance with a determination that a first portion of the plurality of portions does not have sufficient feature correlation and a second portion of the plurality of portions does not have sufficient feature correlation, determining that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location based on determining that a threshold number of the plurality of portions do not have sufficient feature correlation, wherein the first portion is different from the second portion.
33. The method of any of claims 21-32, wherein the third camera is in a standby mode when the determination is made that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location.
34. The method of any of claims 21 to 33, wherein the image captured by the third camera is used to calibrate a feature correlation between the first image and the second image.
35. The method of any of claims 21-34, wherein the third camera is the first camera.
36. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a device, the one or more programs comprising instructions for performing the method of any of claims 21-35.
37. An apparatus, the apparatus comprising:
One or more processors, and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 21-35.
38. An apparatus, the apparatus comprising:
apparatus for performing the method of any one of claims 21 to 35.
39. A computer program product comprising one or more programs configured to be executed by one or more processors of a device, the one or more programs comprising instructions for performing the method of any of claims 21-35.
40. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a device, the one or more programs comprising instructions for:
receiving a first image captured by a first camera;
receiving a second image captured by a second camera, wherein the second camera is different from the first camera, and wherein the first camera and the second camera are established as a first set of cameras for calculating depth of a location;
in accordance with a determination that the first image and the second image have sufficient feature correlation for calculating the depth of the location, calculating a first depth of the location based on the first image and the second image; and
in accordance with a determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location and that a third image and a fourth image have sufficient feature correlation for calculating the depth of the location, calculating a second depth of the location based on the third image and the fourth image, wherein:
the third image is captured by a third camera,
the third camera is different from the first camera,
the third camera is different from the second camera,
the fourth image is captured by a fourth camera,
the fourth camera is different from the third camera, and
the third camera and the fourth camera are established as a second set of cameras for calculating the depth of the location.
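Reduced to control flow, the independent claim amounts to: use the first established pair if its images correlate, otherwise fall back to the second pair if its images do. A minimal sketch, assuming correlates and stereo_depth routines that the claims do not define:

def depth_at_location(location, first_image, second_image, third_image,
                      fourth_image, correlates, stereo_depth):
    # The first camera set (first_image, second_image) is preferred; the
    # second camera set (third_image, fourth_image) is used only on fallback.
    if correlates(first_image, second_image):
        return stereo_depth(first_image, second_image, location)
    if correlates(third_image, fourth_image):
        return stereo_depth(third_image, fourth_image, location)
    return None  # neither set has sufficient feature correlation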
41. An apparatus, the apparatus comprising:
a first camera;
a second camera, the second camera being different from the first camera;
a third camera, the third camera being different from the first camera and the second camera;
a fourth camera, the fourth camera being different from the third camera, the second camera, and the first camera, wherein the first camera and the second camera are on a first axis, and wherein the third camera and the fourth camera are on a second axis that is different from the first axis;
one or more processors; and
a memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for:
receiving a first image captured by the first camera;
receiving a second image captured by the second camera, wherein the first camera and the second camera are established as a first set of cameras for calculating a depth of a location;
in accordance with a determination that the first image and the second image have sufficient feature correlation for calculating the depth of the location, calculating a first depth of the location based on the first image and the second image; and
in accordance with a determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location and that a third image and a fourth image have sufficient feature correlation for calculating the depth of the location, calculating a second depth of the location based on the third image and the fourth image, wherein:
the third image is captured by the third camera,
the fourth image is captured by the fourth camera, and
the third camera and the fourth camera are established as a second set of cameras for calculating the depth of the location.
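For background only: whichever pair is selected, the depth it recovers follows the usual pinhole-stereo relation depth = focal_length x baseline / disparity, where the baseline is the distance between the two cameras of the set (along the first or second axis of claim 41). The function below restates that textbook relation and is not taken from the application.

def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    # Textbook pinhole-stereo relation: depth (m) = f (px) * B (m) / d (px).
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px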
42. An apparatus, the apparatus comprising:
one or more processors; and
a memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for:
receiving a first image captured by a first camera;
receiving a second image captured by a second camera, wherein the second camera is different from the first camera, and wherein the first camera and the second camera are established as a first set of cameras for calculating depth of a location;
in accordance with a determination that the first image and the second image have sufficient feature correlation for calculating the depth of the location, calculating a first depth of the location based on the first image and the second image; and
in accordance with a determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location and that a third image and a fourth image have sufficient feature correlation for calculating the depth of the location, calculating a second depth of the location based on the third image and the fourth image, wherein:
the third image is captured by a third camera,
the third camera is different from the first camera,
the third camera is different from the second camera,
the fourth image is captured by a fourth camera,
the fourth camera is different from the third camera, and
the third camera and the fourth camera are established as a second set of cameras for calculating the depth of the location.
43. An apparatus, the apparatus comprising:
means for receiving a first image captured by a first camera;
means for receiving a second image captured by a second camera, wherein the second camera is different from the first camera, and wherein the first camera and the second camera are established as a first set of cameras for calculating depth of a location;
means for calculating, in accordance with a determination that the first image and the second image have sufficient feature correlation for calculating the depth of the location, a first depth of the location based on the first image and the second image; and
means for calculating, in accordance with a determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location and that a third image and a fourth image have sufficient feature correlation for calculating the depth of the location, a second depth of the location based on the third image and the fourth image, wherein:
the third image is captured by a third camera,
the third camera is different from the first camera,
the third camera is different from the second camera,
the fourth image is captured by a fourth camera,
the fourth camera is different from the third camera, and
the third camera and the fourth camera are established as a second set of cameras for calculating the depth of the location.
44. A computer program product comprising one or more programs configured to be executed by one or more processors of a device, the one or more programs comprising instructions for:
receiving a first image captured by a first camera;
receiving a second image captured by a second camera, wherein the second camera is different from the first camera, and wherein the first camera and the second camera are established as a first set of cameras for calculating depth of a location;
in accordance with a determination that the first image and the second image have sufficient feature correlation for calculating the depth of the location, calculating a first depth of the location based on the first image and the second image; and
in accordance with a determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location and that a third image and a fourth image have sufficient feature correlation for calculating the depth of the location, calculating a second depth of the location based on the third image and the fourth image, wherein:
the third image is captured by a third camera,
the third camera is different from the first camera,
the third camera is different from the second camera,
the fourth image is captured by a fourth camera,
the fourth camera is different from the third camera, and
the third camera and the fourth camera are established as a second set of cameras for calculating the depth of the location.
CN202380045271.6A 2022-06-09 2023-06-08 Dynamic camera selection Pending CN119343930A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US202263350595P 2022-06-09 2022-06-09
US63/350,595 2022-06-09
US18/203,560 US20230401732A1 (en) 2022-06-09 2023-05-30 Dynamic camera selection
US18/203,560 2023-05-30
PCT/US2023/024871 WO2023239877A1 (en) 2022-06-09 2023-06-08 Dynamic camera selection

Publications (1)

Publication Number Publication Date
CN119343930A true CN119343930A (en) 2025-01-21

Family

ID=87136211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202380045271.6A Pending CN119343930A (en) 2022-06-09 2023-06-08 Dynamic camera selection

Country Status (3)

Country Link
EP (1) EP4505754A1 (en)
CN (1) CN119343930A (en)
WO (1) WO2023239877A1 (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9544574B2 (en) * 2013-12-06 2017-01-10 Google Inc. Selecting camera pairs for stereoscopic imaging
US9392189B2 (en) * 2014-02-28 2016-07-12 Intel Corporation Mechanism for facilitating fast and efficient calculations for hybrid camera arrays
GB2535706A (en) * 2015-02-24 2016-08-31 Nokia Technologies Oy Device with an adaptive camera array
JP2016197083A (en) * 2015-04-06 2016-11-24 ソニー株式会社 Control device, method, and program
KR102777120B1 (en) * 2015-04-19 2025-03-05 포토내이션 리미티드 Multi-baseline camera array system architectures for depth augmentation in vr/ar applications
US10067513B2 (en) * 2017-01-23 2018-09-04 Hangzhou Zero Zero Technology Co., Ltd Multi-camera system and method of use
KR102470465B1 (en) * 2018-02-19 2022-11-24 한화테크윈 주식회사 Apparatus and method for image processing

Also Published As

Publication number Publication date
WO2023239877A1 (en) 2023-12-14
EP4505754A1 (en) 2025-02-12

Similar Documents

Publication Publication Date Title
US10611023B2 (en) Systems and methods for performing occlusion detection
US8180107B2 (en) Active coordinated tracking for multi-camera systems
US10496104B1 (en) Positional awareness with quadocular sensor in autonomous platforms
US11427218B2 (en) Control apparatus, control method, program, and moving body
EP2571660B1 (en) Mobile human interface robot
US20170097643A1 (en) Systems and Methods for Performing Simultaneous Localization and Mapping using Machine Vision Systems
EP3788597A1 (en) Associating lidar data and image data
JP7166446B2 (en) System and method for estimating pose of robot, robot, and storage medium
JP7305768B2 (en) VEHICLE CONTROL METHOD, RELATED DEVICE, AND COMPUTER STORAGE MEDIA
CN113916230A (en) System and method for performing simultaneous localization and mapping using a machine vision system
WO2019168886A1 (en) System and method for spatially mapping smart objects within augmented reality scenes
WO2022179207A1 (en) Window occlusion detection method and apparatus
CN113014658B (en) Device control, device, electronic device, and storage medium
US20230401732A1 (en) Dynamic camera selection
CN119343930A (en) Dynamic camera selection
US20230177723A1 (en) Method and apparatus for estimating user pose using three-dimensional virtual space model
US20230041716A1 (en) Sensor object detection monitoring
CN117250956A (en) Mobile robot obstacle avoidance method and obstacle avoidance device with multiple observation sources fused
WO2022226989A1 (en) System and method for obstacle-free driving
US20240338835A1 (en) Dual image processing
US20240338838A1 (en) Contextual image processing
US20240338842A1 (en) Techniques for tracking one or more objects
US20240104895A1 (en) Data selection
US20240104907A1 (en) Data selection
Zhao et al. Non-Point Visible Light Transmitter Localization based on Monocular Camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination