
CN119343930A - Dynamic camera selection - Google Patents

Dynamic camera selection

Info

Publication number
CN119343930A
Authority
CN
China
Prior art keywords
camera
image
cameras
location
depth
Prior art date
Legal status
Pending
Application number
CN202380045271.6A
Other languages
Chinese (zh)
Inventor
G·E·威廉姆斯
S·鲍威尔斯
R·M·舒勒
Current Assignee
Apple Inc
Original Assignee
Apple Inc
Priority date
Filing date
Publication date
Priority claimed from US18/203,560 (published as US20230401732A1)
Application filed by Apple Inc filed Critical Apple Inc
Publication of CN119343930A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/2224Studio circuitry; Studio devices; Studio equipment related to virtual studio applications
    • H04N5/2226Determination of depth image, e.g. for foreground/background separation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • G06T7/596Depth or shape recovery from multiple images from stereo images from three or more stereo images
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/64Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/65Control of camera operation in relation to power supply
    • H04N23/651Control of camera operation in relation to power supply for reducing power consumption by affecting camera operations, e.g. sleep mode, hibernation mode or power off of selective parts of the camera
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/66Remote control of cameras or camera parts, e.g. by remote control devices
    • H04N23/661Transmitting camera control signals through networks, e.g. control via the Internet
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/667Camera operation mode switching, e.g. between still and video, sport and normal or high- and low-resolution modes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/90Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/268Signal distribution or switching
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30168Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract


The present disclosure provides more effective and/or efficient techniques for determining information about a physical environment. Such techniques may optionally supplement or replace other techniques for determining information about a physical environment. Some techniques described herein encompass switching which cameras are used to calculate the depth of a location in a physical environment. The switch may occur when the current image does not have sufficient feature correlations for calculating the depth of the location. Other techniques described herein encompass switching which cameras are used to obtain sufficient data for a location within a representation (e.g., a three-dimensional representation) of a physical environment. The switch may occur in response to determining that there is not enough data for the location.

Description

Dynamic camera selection
Cross Reference to Related Applications
The present application claims the benefit of U.S. provisional patent application Ser. No. 63/350,595, entitled "DYNAMIC CAMERA SELECTION," filed on June 9, 2022, which is hereby incorporated by reference in its entirety for all purposes.
Background
Some devices include a camera for capturing an image of a physical environment to determine information about the physical environment, such as computing the depth of a location. Such determination is limited by the images used and/or captured. Accordingly, there is a need to provide more effective and/or efficient techniques for determining information about a physical environment.
Disclosure of Invention
The present disclosure provides more effective and/or efficient techniques for determining information about a physical environment. Such techniques optionally supplement or replace other techniques for determining information about the physical environment.
Some techniques described herein encompass switching which cameras are used to calculate the depth of a location in a physical environment. The switching may occur when the current images do not have sufficient feature correlation for calculating the depth of the location. Other techniques described herein contemplate switching which cameras are used to obtain sufficient data for a location within a representation (e.g., a three-dimensional representation) of a physical environment. The switch may occur in response to determining that there is not enough data for the location.
In the techniques described herein, cameras may be configured differently on a device. For example, three cameras may be positioned in a triangular pattern, with two cameras on the horizontal axis and a third camera above or below the horizontal axis. For another example, four cameras may be positioned in a rectangular pattern.
Drawings
For a better understanding of the various described embodiments, reference should be made to the following detailed description taken in conjunction with the following drawings, in which like reference numerals designate corresponding parts throughout the several views.
FIG. 1 is a block diagram illustrating a computing system.
Fig. 2 is a block diagram illustrating a device having an interconnect subsystem.
Fig. 3 is a block diagram illustrating an apparatus for determining information about a physical environment.
Fig. 4A is a block diagram illustrating a camera array of three cameras.
Fig. 4B is a block diagram illustrating a camera array of four cameras.
Fig. 5 is a flowchart illustrating a method for calculating a depth of a location.
Fig. 6 is a flowchart illustrating a method for obtaining sufficient data about a physical environment.
Detailed Description
The following description sets forth exemplary methods, parameters, and the like. However, it should be recognized that such description is not intended as a limitation on the scope of the present disclosure, but is instead provided as a description of exemplary embodiments.
Some techniques described herein encompass switching which cameras are used to calculate the depth of a location in a physical environment. The switching may occur when the current images do not have sufficient feature correlation for calculating the depth of the location. In one example, a device includes a plurality of cameras having at least partially overlapping fields of view of a physical environment. The device causes a set of cameras to capture images of the physical environment. The images are then used to attempt to calculate the depth of different locations within the images. When the images do not have sufficient feature correlation to calculate the depth of a location, the device causes a different set of cameras to capture images of the physical environment. The depth of the location is then calculated using images captured by the different set. The different set may or may not include cameras from the original set of cameras. In some examples, the different set of cameras is selected from a plurality of possible different sets of cameras. In some examples, in response to the images not having sufficient feature correlation, the device causes a single camera to capture multiple images to be used to calculate the depth of the location.
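As a concrete illustration of this switching behavior, the following Python sketch (not part of the patent; the camera identifiers and helper callables are hypothetical) tries one camera set, checks whether the captured images yield enough feature correspondences, and falls back to a different set when they do not.

```python
from typing import Callable, List, Optional, Sequence

def depth_with_fallback(
    camera_sets: Sequence[Sequence[str]],
    capture: Callable[[str], object],
    count_correspondences: Callable[[List[object]], int],
    triangulate: Callable[[List[object]], float],
    min_correspondences: int = 8,
) -> Optional[float]:
    """Try each candidate camera set in order; switch to the next set whenever
    the captured images lack sufficient feature correlation for the location."""
    for camera_set in camera_sets:
        images = [capture(camera_id) for camera_id in camera_set]
        if count_correspondences(images) >= min_correspondences:
            return triangulate(images)
        # Insufficient feature correlation: fall through and try the next set.
    return None  # no candidate set produced enough correspondences

# Toy usage with stand-in callables (real implementations would wrap camera
# drivers and a feature matcher): the first pair "fails", the second succeeds.
fake_counts = {("cam_A", "cam_B"): 3, ("cam_A", "cam_C"): 12}
result = depth_with_fallback(
    camera_sets=[("cam_A", "cam_B"), ("cam_A", "cam_C")],
    capture=lambda camera_id: camera_id,                       # pretend image
    count_correspondences=lambda imgs: fake_counts[tuple(imgs)],
    triangulate=lambda imgs: 7.2,                              # pretend depth in meters
)
print(result)  # 7.2, obtained after switching from (A, B) to (A, C)
```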
Other techniques described herein contemplate switching which cameras are used to obtain sufficient data for a location within a representation of a physical environment. The switch may occur in response to determining that there is not enough data for the location. In one example, a device includes a plurality of cameras having at least partially overlapping fields of view of a physical environment. The device causes a set of cameras to capture images of the physical environment. The images are used to generate a depth map of the physical environment, the depth map comprising distances of different locations in the physical environment. In some examples, a representation of the physical environment is generated using the depth map and the images, the representation including locations of identified objects within the physical environment. The device then determines that the representation does not include sufficient data for a particular location. For example, the set of cameras may not be able to capture images of the particular location. After determining this shortcoming of the representation, the device causes a different set of one or more cameras to capture images of the physical environment.
In the techniques described herein, multiple cameras may be configured differently on a device. For example, the plurality of cameras may include three cameras positioned in a triangular pattern, with two cameras located on a horizontal axis and a third camera above or below the horizontal axis. When a problem arises, this configuration may allow one camera to be switched with another. For another example, the plurality of cameras may include four cameras positioned in a rectangular pattern. Such a configuration may allow a current pair of cameras having a particular distance between the current pair to be switched to a new pair of cameras having a different distance between the new pair.
In methods described herein in which one or more steps are contingent upon one or more conditions having been met, it should be understood that the described method may be repeated in multiple iterations such that, over the course of those iterations, all of the conditions upon which steps of the method depend have been met in different iterations of the method. For example, if a method requires performing a first step when a condition is satisfied and a second step when the condition is not satisfied, a person of ordinary skill will appreciate that the stated steps are repeated until the condition has been both satisfied and not satisfied, in no particular order. Thus, a method described with one or more steps that are contingent upon one or more conditions having been met may be rewritten as a method that is repeated until each of the conditions described in the method has been met. This, however, is not required of a system or computer-readable-medium claim in which the system or computer-readable medium contains instructions for performing the contingent operations based on the satisfaction of the corresponding one or more conditions, because such a system or medium is capable of determining whether the contingencies have or have not been met without explicitly repeating the steps of the method until all of the conditions upon which steps in the method are contingent have been met. A person of ordinary skill in the art will also understand that, similar to a method with contingent steps, a system or computer-readable storage medium may repeat the steps of a method as many times as needed to ensure that all of the contingent steps have been performed.
Although the following description uses the terms "first," "second," etc. to describe various elements, these elements should not be limited by the terms. In some examples, these terms are used to distinguish one element from another element. For example, a first device may be referred to as a second device, and similarly, a second device may be referred to as a first device, without departing from the scope of the various described embodiments. In some examples, the first device and the second device are two separate references to the same device. In some embodiments, the first device and the second device are both devices, but they are not the same device or the same type of device.
The terminology used in the description of the various described embodiments herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in the description of the various described embodiments and in the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The term "if" is optionally interpreted to mean "when..once", "at..once..once.," or "in response to a determination" or "in response to detection", depending on the context. Similarly, the phrase "if determined" or "if detected [ stated condition or event ]" is optionally interpreted to mean "upon determination" or "in response to determination" or "upon detection of [ stated condition or event ]" or "in response to detection of [ stated condition or event ]" depending on the context.
Turning now to FIG. 1, a block diagram of a computing system 100 is depicted. Computing system 100 is a non-limiting example of a computing system that may be used to perform the functions described herein. It should be appreciated that other computer architectures of a computing system can be used to perform the functions described herein.
In the illustrated example, the computing system 100 includes a processor subsystem 110 coupled (e.g., wired or wireless) to a memory 120 (e.g., system memory) and an I/O interface 130 via an interconnect 150 (e.g., a system bus, one or more memory locations, or other communication channel for connecting the various components of the computing system 100). Further, the I/O interface 130 is coupled (e.g., wired or wirelessly) to the I/O device 140. In some examples, I/O interface 130 is included with I/O device 140 such that both are a single component. It should be appreciated that there may be one or more I/O interfaces, where each I/O interface is coupled to one or more I/O devices. In some examples, multiple instances of processor subsystem 110 may be coupled to interconnect 150.
The computing system 100 may be any of a variety of types of devices including, but not limited to, a system on a chip, a server system, a personal computer system (e.g., an iPhone, iPad, or MacBook), a sensor, and the like. In some examples, computing system 100 is included with or coupled to the physical component for the purpose of modifying the physical component in response to the instruction (e.g., computing system 100 receives an instruction to modify the physical component and, in response to the instruction, causes the physical component to be modified (e.g., by an actuator)). Examples of such physical components include acceleration controls, brakes, gearboxes, motors, pumps, refrigeration systems, suspension systems, steering controls, vacuum systems, valves, and the like. As used herein, a sensor includes one or more hardware components that detect information about the physical environment in the vicinity (e.g., surrounding) of the sensor. In some examples, the hardware components of the sensor include a sensing component (e.g., an image sensor or a temperature sensor), a transmitting component (e.g., a laser or radio transmitter), a receiving component (e.g., a laser or radio receiver), or any combination thereof. Examples of sensors include angle sensors, chemical sensors, brake pressure sensors, contact sensors, non-contact sensors, electrical sensors, flow sensors, force sensors, gas sensors, humidity sensors, cameras, inertial measurement units, leak sensors, level sensors, light detection and ranging systems, metal sensors, motion sensors, particle sensors, photoelectric sensors, position sensors (e.g., global positioning systems), precipitation sensors, pressure sensors, proximity sensors, radio detection and ranging systems, radiation sensors, speed sensors (e.g., measuring the speed of an object), temperature sensors, time-of-flight sensors, torque sensors, and ultrasonic sensors. Although a single computing system is shown in fig. 1, computing system 100 may also be implemented as two or more computing systems operating together.
In some examples, the processor subsystem 110 includes one or more processors or processing units configured to execute program instructions to perform the functions described herein. For example, the processor subsystem 110 may execute an operating system, a middleware system, one or more application programs, or any combination thereof.
In some examples, the operating system manages the resources of computing system 100. Examples of types of operating systems contemplated herein include batch operating systems (e.g., multiple Virtual Storage (MVS)), time-shared operating systems (e.g., unix), distributed operating systems (e.g., advanced interactive execution (AIX)), network operating systems (e.g., microsoft Windows Server), real-time operating systems (e.g., QNX). In some examples, an operating system includes various programs, sets of instructions, software components, and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and for facilitating communication between various hardware and software components. In some examples, the operating system uses a priority-based scheduler that assigns priorities to different tasks to be performed by the processor subsystem 110. In such examples, the priority assigned to the task is used to identify the next task to be performed. In some examples, the priority-based scheduler identifies the next task to be performed when the previous task completes execution (e.g., the highest priority task runs to completion unless another higher priority task is ready).
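The run-to-completion, priority-based scheduling described above can be illustrated with a small Python sketch built on a binary heap. This is an illustrative model only (the class and its behavior are assumptions, not the patent's scheduler), but it captures the idea that the highest-priority ready task is always selected next.

```python
import heapq

class PriorityScheduler:
    """Toy run-to-completion scheduler: always runs the highest-priority ready task.
    Lower numbers mean higher priority, matching heapq's min-heap ordering."""
    def __init__(self):
        self._ready = []   # heap of (priority, sequence, task)
        self._seq = 0      # tie-breaker so insertion order is stable

    def submit(self, priority: int, task):
        heapq.heappush(self._ready, (priority, self._seq, task))
        self._seq += 1

    def run(self):
        while self._ready:
            priority, _, task = heapq.heappop(self._ready)
            task()  # runs to completion before the next task is selected

# Example: depth processing is given a higher priority than diagnostics.
scheduler = PriorityScheduler()
scheduler.submit(0, lambda: print("compute depth for current images"))
scheduler.submit(5, lambda: print("run low-rate diagnostic comparison"))
scheduler.run()
```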
In some examples, the middleware system provides one or more services and/or capabilities to applications (e.g., one or more applications running on processor subsystem 110) other than the services provided by the operating system (e.g., data management, application services, messaging, authentication, API management, etc.). In some examples, the middleware system is designed for heterogeneous computer clusters to provide hardware abstraction, low-level device control, implementation of common functions, messaging between processes, packet management, or any combination thereof. Examples of middleware systems include lightweight communication and grouping (LCM), PX4, robotic Operating Systems (ROS), zeroMQ. In some examples, middleware systems use graph construction to represent processes and/or operations, where processing occurs in nodes that can receive, publish, and multiplex sensor data, controls, states, plans, actuators, and other messages. In such examples, an application (e.g., an application executing on processor subsystem 110 as described above) may be defined using a graph architecture such that different operations of the application are included with different nodes in the graph architecture.
In some examples, a message is sent from a first node in a graph architecture to a second node in the graph architecture using a publish-subscribe model, wherein the first node publishes data on a channel to which the second node can subscribe. In such examples, the first node may store the data in a memory (e.g., memory 120 or some local memory of processor subsystem 110) and inform the second node that the data has been stored in the memory. In some examples, a first node informs a second node that data has been stored in memory by sending a pointer (e.g., a memory pointer, such as an identification of a memory location) to the second node so that the second node can access the data from a location where the first node stored the data. In some examples, the first node will send the data directly to the second node such that the second node will not need to access memory based on the data received from the first node.
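The pointer-passing publish/subscribe exchange described above might look like the following Python sketch, in which a publishing node stores data once and notifies subscribers with a reference rather than copying the data. The class names and the string used as the "pointer" are illustrative assumptions, not taken from any particular middleware.

```python
class SharedMemory:
    """Toy stand-in for shared memory: data is stored once and addressed by key."""
    def __init__(self):
        self._slots = {}

    def store(self, key, data):
        self._slots[key] = data
        return key  # the "pointer" handed to subscribers

    def load(self, key):
        return self._slots[key]

class Channel:
    """Publish/subscribe channel: publishers send only a reference to stored data."""
    def __init__(self, memory: SharedMemory):
        self._memory = memory
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, key, data):
        pointer = self._memory.store(key, data)
        for callback in self._subscribers:
            callback(pointer)  # notify with the reference, not the data itself

memory = SharedMemory()
images_channel = Channel(memory)
# Second node subscribes and dereferences the pointer when notified.
images_channel.subscribe(lambda ptr: print("received", len(memory.load(ptr)), "bytes"))
# First node publishes an image buffer by reference.
images_channel.publish("camera_410/frame_0", b"\x00" * 1024)
```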
Memory 120 may include a computer-readable medium (e.g., a non-transitory or transitory computer-readable medium) that may be used to store program instructions that may be executed by processor subsystem 110 to cause computing system 100 to perform various operations described herein. For example, memory 120 may store program instructions to implement the functions associated with the processes described in fig. 5 and/or fig. 6.
The memory 120 may be implemented using different physical non-transitory memory media, such as hard disk storage, floppy disk storage, removable disk storage, flash memory, random access memory (RAM-SRAM, EDO RAM, SDRAM, DDR SDRAM, RAMBUS RAM, etc.), read only memory (PROM, EEPROM, etc.), etc. The memory in computing system 100 is not limited to a primary storage device, such as memory 120. Rather, computing system 100 may also include other forms of storage such as cache memory in processor subsystem 110 and secondary storage (e.g., hard disk drives, storage arrays, etc.) on I/O device 140. In some examples, these other forms of storage may also store program instructions that are executed by processor subsystem 110 to perform the operations described herein. In some examples, processor subsystem 110 (or each processor within processor subsystem 110) includes a cache or other form of on-board memory.
The I/O interface 130 may be any of various types of interfaces configured to couple to and communicate with other devices. In some examples, I/O interface 130 includes a bridge chip (e.g., a south bridge) from a front-side bus to one or more back-side buses. The I/O interface 130 may be coupled to one or more I/O devices (e.g., I/O device 140) via one or more corresponding buses or other interfaces. Examples of I/O devices include storage devices (hard disk drives, optical disk drives, removable flash drives, storage arrays, SANs, or their associated controllers), network interface devices (e.g., to a local or wide area network), sensor devices (e.g., cameras, radar, lidar, ultrasonic sensors, GPS, inertial measurement devices, etc.), and audible or visual output devices (e.g., speakers, lights, screens, projectors, etc.). In some examples, computing system 100 is coupled to a network via a network interface device (e.g., configured to communicate over Wi-Fi, bluetooth, ethernet, etc.).
Fig. 2 depicts a block diagram of a device 200 having an interconnect subsystem. In the illustrated example, the device 200 includes three different subsystems (i.e., a first subsystem 210, a second subsystem 220, and a third subsystem 230) coupled (e.g., wired or wireless) to each other. An example of a possible computer architecture for a subsystem as included in fig. 2 (i.e., computing system 100) is depicted in fig. 1. Although three subsystems are shown in fig. 2, device 200 may include more or fewer subsystems.
In some examples, some subsystems are not connected to another subsystem (e.g., the first subsystem 210 may be connected to the second subsystem 220 and the third subsystem 230, but the second subsystem 220 may not be connected to the third subsystem 230). In some examples, some subsystems are connected via one or more wires, while other subsystems are connected wirelessly. In some examples, one or more subsystems are wirelessly connected to one or more computing systems external to device 200, such as a server system. In such examples, the subsystem may be configured to communicate wirelessly with one or more computing systems external to device 200.
In some examples, device 200 includes a housing that completely or partially encloses subsystems 210-230. Examples of the apparatus 200 include a home appliance apparatus (e.g., a refrigerator or an air conditioning system), a robot (e.g., a robot arm or a robot cleaner), a vehicle, and the like. In some examples, the device 200 is configured to navigate the device 200 (with or without direct user input) in a physical environment.
In some examples, one or more subsystems of device 200 are used to control, manage, and/or receive data from one or more other subsystems of device 200 and/or one or more computing systems remote from device 200. For example, the first subsystem 210 and the second subsystem 220 may each be cameras that are capturing images for use by the third subsystem 230 in making decisions. In some examples, at least a portion of device 200 functions as a distributed computing system. For example, the tasks may be divided into different portions, with a first portion being performed by the first subsystem 210 and a second portion being performed by the second subsystem 220.
Attention is now directed to techniques for determining information about a physical environment using an example of a device having a camera that captures an image of the physical environment. In some examples, the device determines that there is a lack of feature correlation between images from the current set of cameras, and selects a different set of cameras for achieving sufficient feature correlation. In other examples, the device determines that there is insufficient information in the representation of the physical environment and selects a new set of cameras for capturing sufficient information to add to the representation. It should be understood that more or fewer cameras (including single cameras) and other types of sensors are within the scope of the present disclosure and may benefit from the techniques described herein.
In accordance with the techniques described herein, the device uses sensors to locate objects within a physical environment. Such positioning may include estimating (e.g., calculating) a depth (e.g., distance from the device) of the object (e.g., substantially the object or a portion (e.g., not all) of the object). Different sensors or combinations of sensors may be used to estimate depth, including cameras and range of motion sensors (e.g., light or radio detection and ranging systems).
Estimating depth with a camera may use a monocular image (e.g., a single camera sensor that captures still or sequential images) or a stereo image (e.g., multiple camera sensors that capture still or sequential images). The following techniques for estimating depth may be used in any combination with each other.
One technique for estimating depth uses depth cues to identify the relative locations of different objects in a physical environment. Examples of depth cues include comparing the size of different objects (e.g., objects appear smaller when the objects are farther away), texture (e.g., textures of objects are less identifiable (e.g., lower quality) when the objects are farther away), shading (e.g., shadows of objects may indicate that a portion of an object is farther than another portion), linear perspective (e.g., objects converge toward the horizon as the objects get farther away), motion parallax (e.g., objects that are farther away appear to move more slowly than objects that are closer), binocular parallax (e.g., objects that are closer have greater parallax between two images than objects that are farther away), and apparent size of known objects (e.g., the size of an object may be constrained by the typical size of objects of that type once the type of object is identified). Using one or more of these depth cues, the technique determines that an object is closer or farther than another object.
Another technique for estimating depth identifies correspondences between two images (e.g., captured using two different image sensors (such as two different cameras) or a single image sensor (such as a single camera) at two different times). The technique then uses geometry (e.g., epipolar geometry) to estimate the depth of a region within the images. In one example, estimating the depth using the correspondences includes identifying features in the first image (e.g., one or more pixels in the image, such as corners, edges, or any distinguishing portion of the image) and identifying corresponding features in the second image (e.g., features in the second image determined to match features in the first image). In some examples, such features are identified independently and compared to each other. In other examples, features in the first image are used to identify corresponding features in the second image. After identifying the features, differences (sometimes referred to as shifts or disparities) between where the features are located in their respective images are calculated. The depth of a feature is determined based on the disparity, the focal length of the cameras capturing the images, and the distance between the cameras capturing the images (the baseline). In another example, estimating the depth using the correspondences includes dividing the image into a plurality of regions and identifying a plurality of features in each region. The different images are then compared using the features in each region to find corresponding features. If sufficient features are identified in sufficient regions (e.g., above a threshold), a depth is calculated for each such region using a calibrated model that relates the calculated disparity to the relative geometric positions of the cameras, as described above. Thus, in such an example, a threshold number of corresponding features is required for each region for which depth is being calculated.
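The disparity computation described here reduces to the standard stereo relation depth = focal length x baseline / disparity. The sketch below applies that relation per region and skips regions without enough corresponding features; the focal length, baseline, and threshold values are illustrative assumptions rather than values from the patent.

```python
import numpy as np

def depth_from_disparity(disparity_px: float, focal_length_px: float, baseline_m: float) -> float:
    """Standard stereo relation: depth = f * B / d (same units as the baseline)."""
    return focal_length_px * baseline_m / disparity_px

def region_depths(matches_per_region, focal_length_px=1200.0, baseline_m=0.12, min_matches=10):
    """matches_per_region maps a region id to a list of (x_left, x_right) feature
    columns. Depth is only computed for regions with enough correspondences."""
    depths = {}
    for region, matches in matches_per_region.items():
        if len(matches) < min_matches:
            continue  # insufficient feature correlation for this region
        disparities = np.array([x_left - x_right for x_left, x_right in matches], dtype=float)
        disparities = disparities[disparities > 0]  # ignore degenerate matches
        if disparities.size == 0:
            continue
        depths[region] = depth_from_disparity(np.median(disparities), focal_length_px, baseline_m)
    return depths

# Example: ~20 px of disparity with f = 1200 px and B = 0.12 m gives 7.2 m.
example = {"region_0": [(640.0 + i, 620.0 + i) for i in range(12)]}
print(region_depths(example))
```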
Another technique for estimating depth uses a neural network. For example, the neural network takes an image as input and outputs depth values based on depth cues (such as the depth cues described above). In some examples, the neural network learns to regress depth from depth cues in the image by minimizing a loss function (e.g., a regression loss) via supervised learning.
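A minimal sketch of this supervised depth-regression idea, written with PyTorch as an assumed framework (the patent does not name one): a small convolutional network maps an image to per-pixel depth and is trained by minimizing a regression (L2) loss against ground-truth depth.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny fully convolutional regressor: RGB image in, one depth value per pixel out.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def training_step(image: torch.Tensor, depth_gt: torch.Tensor) -> float:
    """One supervised step: regress depth and minimize the L2 (regression) loss."""
    optimizer.zero_grad()
    depth_pred = model(image)
    loss = F.mse_loss(depth_pred, depth_gt)
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with random tensors standing in for an image and its ground-truth depth map.
image = torch.rand(1, 3, 64, 64)
depth_gt = torch.rand(1, 1, 64, 64)
print(training_step(image, depth_gt))
```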
In the techniques described herein, one or more cameras may be calibrated before, during, or after performing particular steps. In some examples, calibration includes a process of determining specific camera parameters to determine an accurate relationship between a three-dimensional point in a physical environment and its corresponding two-dimensional projection (e.g., pixel) in an image. Such parameters eliminate distortions in the image, thereby establishing a relationship between image pixels and physical-environment dimensions. The distortion may be captured by distortion coefficients whose values reflect the amount of radial distortion (e.g., which occurs when light rays bend more at the edge of the lens than at its optical center) and tangential distortion (e.g., which occurs when the lens is not parallel to the image plane) in the image. The calibration parameters include intrinsic parameters (e.g., focal length and optical center) and extrinsic parameters (e.g., rotation and translation of the camera) for each camera. In some examples, the extrinsic parameters are used to transform between world (e.g., physical environment) coordinates and camera coordinates, and the intrinsic parameters are used to transform between camera coordinates and pixel coordinates.
The techniques described herein use images captured by a camera to perform calibration. For example, calibration may include comparing an image captured by a first camera with an image captured by a second camera to identify differences in order to determine distortion coefficients. In such an example, the distortion coefficients are determined by comparing image features captured by cameras in a controlled environment for which the ground-truth geometry of these features is known. In some examples, the camera used for calibration may or may not be included in a set of cameras used to determine information about the physical environment. For example, cameras in the set of cameras may be calibrated using cameras not included in the set of cameras, such as by determining distortion coefficients for the cameras in the set of cameras.
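The patent describes calibration by comparing image features whose ground-truth geometry is known in a controlled environment. One common, off-the-shelf way to do something similar is OpenCV's checkerboard calibration, sketched below purely as an illustration; the checkerboard approach, pattern size, and square size are assumptions, not the patent's procedure.

```python
import cv2
import numpy as np

def calibrate_from_checkerboard(image_paths, pattern=(9, 6), square_size_m=0.025):
    """Estimate intrinsics and distortion coefficients from checkerboard images.
    Returns (camera_matrix, dist_coeffs) or None if no corners were found."""
    # Known 3D layout of the checkerboard corners (the "ground truth" geometry).
    obj_template = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    obj_template[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_size_m

    object_points, image_points, image_size = [], [], None
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if gray is None:
            continue
        image_size = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, pattern, None)
        if found:
            object_points.append(obj_template)
            image_points.append(corners)

    if not object_points:
        return None
    # Intrinsics (focal length, optical center) and distortion coefficients;
    # rvecs/tvecs are the per-image extrinsics (rotation and translation).
    ret, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
        object_points, image_points, image_size, None, None)
    return camera_matrix, dist_coeffs
```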
The above-described techniques rely on capturing one or more images of an object to determine the depth of the object. Thus, some techniques described herein switch from using a first set of images to another set of images, such as images from different cameras oriented differently and/or located in different locations than at least some of the cameras used to capture the first set.
Fig. 3 is a block diagram illustrating an apparatus 300 for determining information about a physical environment in accordance with some techniques described herein. Although fig. 3 is described primarily with respect to computing depth, it should be appreciated that similar techniques may be used for other determinations (e.g., identifying, classifying, or determining information about objects or other elements in a physical environment).
The device 300 includes a plurality of subsystems that are at least partially interconnected. For example, device 300 includes a plurality of cameras (i.e., cameras 310) each connected (e.g., wired or wireless) to camera selector 320. It should be appreciated that one or more subsystems of device 300 may be combined or further broken down into more subsystems. For example, the camera subsystem may include a camera selector and a depth processor, or the depth processor may include a camera selector.
In some examples, each of the cameras 310 is configured to capture an image and send the image to the camera selector 320 or store the image in a particular location specific to each camera. In some examples where the camera stores the image in a particular location, the camera informs the camera selector 320 that the image has been stored (and optionally includes a location where the image has been stored by, for example, a pointer to a memory location), and the camera selector 320 is configured to access the location where the image has been stored. In other examples where the camera stores images in a particular location, the camera does not inform the camera selector 320 that the images have been stored, but rather the camera selector 320 requests access to a known location of one or more stored images. As used herein, "transmitting" an image from one subsystem to another may refer to actually transmitting the image to the other subsystem or storing the image so that the image is accessible to the other subsystem.
In some examples, cameras 310 correspond to two or more cameras configured to capture images of at least partially overlapping areas in a physical environment. Examples of configurations of cameras 310 are illustrated in fig. 4A and 4B, which are discussed further below.
As described above, the device 300 includes a camera selector 320. The camera selector 320 may be configured to identify one or more cameras from the cameras 310 for further processing, such as depth processing by the depth processor 330. In some examples, camera selector 320 receives images from each of cameras 310 and determines which images to send to depth processor 330. In other examples, the camera selector 320 receives images from only a subset (e.g., less than all) of the cameras in the cameras 310 (where the subset is selected by the camera selector 320 before or after the camera selector 320 receives any images from the cameras in the cameras 310) and sends all of the received images to the depth processor 330. In some examples, the camera selector 320 causes the camera to capture an image and then sends the captured image to the depth processor 330. In some examples, the camera selector 320 causes one or more cameras to cease capturing images and/or send images to the camera selector 320.
As shown in fig. 3, the apparatus 300 further includes a depth processor 330. The depth processor 330 may be configured to calculate a depth of a location within the physical environment based on images captured by one or more cameras selected by the camera selector 320. In other examples, the depth processor 330 is included in a device separate from the device 300 (such as a remote server). Depth processor 330 may calculate the depth using any one of the techniques described herein (or any combination thereof). For example, the depth processor 330 may (1) receive the first image and the second image, (2) identify features in the first image and corresponding features in the second image, and (3) calculate the depth of the features using the epipolar geometry.
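As one concrete, purely illustrative way a depth processor could implement the receive, match, and triangulate steps above, the sketch below uses OpenCV ORB features with brute-force matching and converts each match's horizontal disparity to depth using the stereo relation discussed earlier. It assumes rectified images (so corresponding features lie on the same row); the feature type, matcher, and parameters are assumptions rather than the patent's choices.

```python
import cv2

def match_and_estimate_depth(img_left, img_right, focal_length_px, baseline_m):
    """Detect ORB features in both (rectified) images, match them, and return a
    list of (x, y, depth_m) tuples for matches with positive disparity."""
    orb = cv2.ORB_create(nfeatures=500)
    kp_left, des_left = orb.detectAndCompute(img_left, None)
    kp_right, des_right = orb.detectAndCompute(img_right, None)
    if des_left is None or des_right is None:
        return []  # no features: insufficient feature correlation

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_left, des_right)

    results = []
    for m in matches:
        (x_l, y_l) = kp_left[m.queryIdx].pt
        (x_r, _) = kp_right[m.trainIdx].pt
        disparity = x_l - x_r
        if disparity <= 0:
            continue  # discard matches inconsistent with the camera geometry
        depth_m = focal_length_px * baseline_m / disparity
        results.append((x_l, y_l, depth_m))
    return results
```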
In some examples, different computing systems (e.g., different systems on a chip executing the camera selector 320 and/or the depth processor 330) are assigned to receive images from different cameras. For example, a first computing system may be configured to receive images from a first camera and a second camera to determine depth information for a location using the images from the first camera and the second camera, and another computing system may be configured to receive images from other cameras to determine depth information using the images from the other cameras. In such examples, the images may be stored on memory local to the respective computing systems to reduce the time to access the pixel information and/or the need to send the pixel information between the computing systems. In another example, a first computing system is configured to receive an image from a first camera and store the received image in a memory local to the first computing system, and a second computing system is configured to receive an image from a second camera and store the received image in a memory local to the second computing system, wherein one of the two computing systems is configured to determine depth information using the images from the two cameras. In such an example, the computing system that does not determine the depth information may send only a portion (e.g., less than all) of the image to the computing system that determines the depth information to reduce the amount of data that moves to a different computing system. The portion may correspond to lower resolution images and/or to only a subset of the images required for the computation (e.g., some images are sent and some images are not). In other examples, feature correlation is performed in object space such that objects are identified in a first image and objects are identified in a second image, and then objects are compared to each other to determine correspondence between the images. Such comparison would not require comparing individual pixels.
In some examples, depth processor 330 generates a depth map for the physical environment using the calculated depth for the location. In such examples, the depth map includes depths of different locations within the physical environment. The depth map may then be used by the device 300 to make decisions, such as how to navigate the physical environment. As used herein, a depth map is sometimes referred to as a representation of a physical environment. In some examples, depth processor 330 uses other data detected by other types of sensors in conjunction with the images to generate the depth map. For example, light or radio detection and ranging systems may be used to identify the depth of featureless areas and/or provide calibration for depth calculation. In such examples, the light or radio detection and ranging system may capture data only for a particular region and/or with a particular resolution (e.g., a lower resolution than an image captured by a camera (such as a camera in a ready mode as described herein)).
In some examples, device 300 generates a representation (e.g., a three-dimensional representation or an object view) of a physical environment using a depth map and one or more images captured by camera 310. In such examples, the representation includes additional information about the physical environment (i.e., in addition to depth), such as identification of objects within the physical environment, and any other information that will assist the device 300 in making decisions about the physical environment. In some examples, other data detected by other types of sensors (e.g., light or radio detection and ranging systems) are used in combination with the image to generate the representation.
In some examples, the depth processor 330 is unable to match a particular feature in the first image with a particular feature in the second image. In such examples, the depth of a particular feature cannot be calculated due to the lack of feature correlation. In other examples, depth processor 330 determines that, in addition to the particular feature, the subset of images lacks feature correlation. In such examples, the particular location cannot be located within the physical environment.
In some examples, the depth processor 330 sends a message to the camera selector 320 in response to determining that feature correlation is lacking. The message may include an identification of the location affected by the lack of feature correlation, a confidence level that the lack of feature correlation has occurred, a depth map (or an update to a depth map if, for example, the depth map is managed by or stored in memory local to the camera selector 320), a representation generated from the depth map (or an update to a representation if, for example, the representation is managed by or stored in memory local to the camera selector 320), an indication that the depth map or the representation generated from the depth map has been updated, or any combination thereof. In other examples, the lack of feature correlation is reflected in the depth map and/or a representation generated from the depth map, which may be accessed by the camera selector 320. In some examples, the camera selector 320 may be configured to operate at a fixed rate such that the camera selector 320 identifies one or more cameras from the cameras 310 for further processing according to the fixed rate (e.g., every 100 milliseconds).
In response to receiving the message, determining that information related to the physical environment has been updated, or initiating an operation, the camera selector 320 may determine whether to switch and to which camera or cameras to switch in order to obtain sufficient information. Such a determination may be based on (1) whether feature correlation is absent, (2) whether a representation of the physical environment lacks depth information for the location, (3) whether there is no data about the location, (4) whether there is sufficient information to classify an object located at the location, (5) whether there is a sufficient depth computation (e.g., the current depth computation has been determined to be incorrect or there is no depth computation), (6) whether an object is determined to be in the line of sight of a particular camera such that objects behind that object are hidden, or (7) any combination thereof. In some examples, the camera selector 320 uses the representation to perform a geospatial search to determine an area in which information is needed. The geospatial search may include identifying a portion of the physical environment in which the device 300 is moving and determining which portions of the representation are relevant to that region. In some examples, the geospatial search may include a spatial decomposition of one or more portions of the physical environment and ranking other objects in the physical environment in terms of their importance to the device 300, based on semantic knowledge of the physical environment, such as the location, speed, and heading of the device 300 within the physical environment, the location and classification of other objects in the physical environment, and particular goals of the device 300. In some examples, selection based on such criteria prioritizes high resolution in small windows in one direction at one time and low resolution in larger windows at another time. In other examples, selection based on such criteria prioritizes depth perception between twenty meters and one hundred meters at one time and depth resolution between five meters and twenty-five meters at another time.
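The multi-criteria decision in this paragraph can be thought of as a predicate over what is currently known about each location. The sketch below is a simplified illustration only; the field names, the grid-cell keys, and the way a candidate set is chosen are hypothetical and much cruder than the geospatial search the description contemplates.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

@dataclass
class LocationInfo:
    has_feature_correlation: bool = True
    has_depth: bool = True
    has_any_data: bool = True
    object_classified: bool = True
    occluded_for_current_set: bool = False

def needs_more_information(info: LocationInfo) -> bool:
    """True when any of the criteria listed in the description is unmet."""
    return (not info.has_feature_correlation
            or not info.has_depth
            or not info.has_any_data
            or not info.object_classified
            or info.occluded_for_current_set)

def select_camera_set(locations: Dict[Tuple[int, int], LocationInfo],
                      candidate_sets: Sequence[Sequence[str]]) -> Sequence[str]:
    """Very rough selector: keep the default (first) set unless some tracked
    location still needs information, in which case try the next candidate set."""
    if any(needs_more_information(info) for info in locations.values()):
        return candidate_sets[1] if len(candidate_sets) > 1 else candidate_sets[0]
    return candidate_sets[0]

# Example: the location at grid cell (4, 7) has no depth yet, so the selector switches.
locations = {(4, 7): LocationInfo(has_depth=False)}
print(select_camera_set(locations, [("cam_A", "cam_B"), ("cam_A", "cam_C")]))
```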
In some examples, the camera selector 320 identifies a portion (e.g., less than all) of the physical environment in which the device 300 needs additional information and sends an image corresponding to the portion to the depth processor 330. In such examples, a partial image (e.g., not an entire image) may be sent to the depth processor 330 to reduce the amount of data that the depth processor 330 needs to process.
In some examples, the camera selector 320 determines that depth information for the location is not needed for the current determination to be made by the device 300. For example, the device 300 may be traveling at a speed such that depth information for that location is not needed at this time to make the determination. For another example, device 300 may use information other than the images captured by the cameras (e.g., a map received by device 300 or a light or radio detection and ranging system) to identify sufficient information about the location.
In some examples, the camera selector 320 determines to select a new camera set in response to identifying a problem with the current camera set. For example, problems may include lens flare, veiling glare, occlusions, hardware failures, software failures, depth calculations determined to be incorrect for a location, areas of the physical environment that are not adequately covered by the current camera set, and so forth. In some examples, the device 300 continues to navigate the physical environment while attempting to solve the problem using a different set of one or more cameras.
With more than two cameras, the camera selector 320 may form a set of cameras (e.g., stereo pairs of two cameras) between different cameras to capture a portion of the physical environment, and/or have redundant sets of cameras to capture the same portion of the physical environment (e.g., multiple stereo pairs covering at least partially overlapping areas). In some examples, having multiple cameras increases availability (e.g., in the event of individual camera failure or occlusion) and/or information about the physical environment.
In some examples, a first set of cameras 310 is established by default (e.g., predefined before device 300 begins executing an application). In such examples, the first group may change to the second group of cameras (e.g., the second group may or may not include cameras in the first group) to alleviate the problem determined by the camera selector 320. In some examples, the first group and/or the second group are established based on a likelihood that cameras included in the respective groups are able to capture images of a particular location in the physical environment.
In some examples, the camera selector 320 determines the best camera set from a plurality of different camera sets. In some examples, the optimal group consists of one or more cameras. In one example, the optimal camera set is selected based on a lookup table that indicates the next set of cameras to use (e.g., in a particular situation). In some examples, the lookup table may be populated based on the distance of the location from the device 300. For example, a first ordered set of possible cameras that prioritizes short distances between cameras (e.g., small baselines) may be used when the location is within a certain distance from the device 300, and a second ordered set of possible cameras that prioritizes long distances between cameras (e.g., large baselines) may be used when the location exceeds the certain distance from the device 300. In some examples, the lookup table is generated (e.g., established) before any images are captured by the cameras 310. In other examples, the lookup table is generated (e.g., built up) based on images received from the cameras 310, such as by learning how best to respond to problems encountered by a particular configuration of cameras on the device 300.
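A minimal sketch of the distance-keyed lookup table described above: locations within a cutoff map to a short-baseline camera pair and locations beyond it map to a long-baseline pair. The cutoff distance and camera identifiers are illustrative assumptions, not values from the patent.

```python
from bisect import bisect_right

# Lookup table: (max distance in meters, camera set to use). Short baselines are
# preferred up close; long baselines are preferred farther away.
CAMERA_SET_BY_DISTANCE = [
    (25.0, ("camera_410", "camera_430")),          # short baseline for nearby locations
    (float("inf"), ("camera_410", "camera_420")),  # long baseline for distant locations
]

def camera_set_for(distance_m: float):
    """Return the camera set whose distance bucket contains distance_m."""
    cutoffs = [cutoff for cutoff, _ in CAMERA_SET_BY_DISTANCE]
    index = min(bisect_right(cutoffs, distance_m), len(CAMERA_SET_BY_DISTANCE) - 1)
    return CAMERA_SET_BY_DISTANCE[index][1]

print(camera_set_for(10.0))   # ('camera_410', 'camera_430')
print(camera_set_for(80.0))   # ('camera_410', 'camera_420')
```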
In some examples, the best camera set is selected based on information identified by the camera selector 320, such as information indicating the context of the device 300 (e.g., speed, acceleration, path, weather, time of day, etc.). In such examples, the images used to select the best group may be captured while the device 300 is moving.
In some examples, the feature comparison operation occurs simultaneously with respect to multiple different sets of cameras to determine which set of cameras is used by the depth processor 330. In some examples, different sets of cameras capture images at different rates and/or resolutions to reduce the computational cost of performing multiple feature comparison operations simultaneously. In such examples, some feature comparison operations are used for diagnostic operations, while other feature comparison operations are used by the depth processor 330 to calculate depth, where the feature comparison operations for diagnostic operations are performed at a lower rate than that used by the depth processor 330.
In some examples, different sets of cameras alternate at a faster rate than is needed for determination purposes. In such examples, a particular set of cameras captures images at a rate required for a determined purpose, and another set of cameras captures images between times when the particular set of cameras captures images for determining when to switch to the other set of cameras.
In some examples, the camera selector 320 determines whether the device 300 needs depth information in a location that is near or far from the device 300. In such examples, the camera selector 320 may cause one or more cameras to capture images at a lower resolution when the locations are closer and to capture images at a higher resolution when the locations are farther.
After using one or more cameras of the new group, the camera selector 320 may determine whether the new group is able to determine sufficient information for a location. In some examples, when the new group is able to be used to determine sufficient information for a location, the device 300 may perform one or more operations based on the information for the location, such as navigating the device 300 in a physical environment. In some examples, when the new group cannot be used to determine sufficient information for a location, the camera selector 320 may select a different group of one or more cameras based on one or more techniques discussed above. In such examples, the camera selector 320 may continue to select a different set of one or more cameras until sufficient information is determined for the location or a threshold number of sets of cameras are attempted. In some examples, when a threshold number of groups of one or more cameras are attempted and information is still needed to make the determination, device 300 may perform operations other than changing to a different group of one or more cameras to attempt to solve the problem. Examples of operations include changing a navigational characteristic of the device 300 (e.g., changing from a first path to a second path) or an operational characteristic of the device 300 (e.g., reducing a speed of the device 300).
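The retry-then-fall-back behavior in this paragraph might be organized as in the following sketch. The function names, the return convention, and the specific fallback are illustrative assumptions; the description mentions changing the navigation path or reducing the device's speed as example operations.

```python
from typing import Callable, Optional, Sequence

def resolve_location(candidate_sets: Sequence[Sequence[str]],
                     try_set: Callable[[Sequence[str]], Optional[dict]],
                     max_attempts: int,
                     fallback: Callable[[], None]) -> Optional[dict]:
    """Try up to max_attempts camera sets; if none yields sufficient information,
    perform a fallback operation (e.g., reduce speed or change path) instead."""
    for camera_set in list(candidate_sets)[:max_attempts]:
        info = try_set(camera_set)   # assumed to return None when data is insufficient
        if info is not None:
            return info              # sufficient information: proceed (e.g., navigate)
    fallback()                       # threshold reached: change how the device operates
    return None

# Toy usage: every candidate set fails, so the fallback (reducing speed) runs.
result = resolve_location(
    candidate_sets=[("cam_A", "cam_B"), ("cam_C", "cam_D"), ("cam_A", "cam_C")],
    try_set=lambda cams: None,
    max_attempts=2,
    fallback=lambda: print("reducing device speed"),
)
print(result)  # None
```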
In some examples, the camera selector 320 is configured to change the mode of the camera from the camera 310, such as changing the camera from standby (e.g., off or in a lower power mode) to ready (e.g., on or in a mode capable of capturing images at a particular rate and/or resolution). In some examples, the camera selector 320 predicts that a camera will be needed and changes the camera from standby to ready. In such examples, the camera selector 320 may cause the camera to change modes so that the camera may be used by the depth processor 330 without having to change modes when determined to be necessary. For example, the camera selector 320 may cause the camera to transition to a ready mode before determining that there is a lack of feature correlation between current images or that there is a lack of information in the representation of the physical environment.
In some examples, the camera selector 320 determines to transition from the current group of cameras to a previous group of cameras, such as a default group of cameras. In such examples, the determination may be based on determining that sufficient information has been determined for the location or that a cause of the lack of feature correlation for the previous group of cameras has been resolved (e.g., a predetermined amount of time has elapsed, images from the previous group of cameras have been determined to have sufficient feature correlation, or the operational state of the device 300 has changed (such as an operational state that may have caused the previous group of cameras to now have sufficient feature correlation)).
Fig. 4A is a block diagram of a camera array 400 illustrating three cameras. The camera array 400 includes a first camera 410, a second camera 420, and a third camera 430. In some examples, camera array 400 is attached to a device (e.g., device 300) and is configured to capture an image of a physical environment. In such examples, the fields of view of each camera in the camera array 400 may at least partially overlap such that all cameras are able to capture images of a particular region of the physical environment. In some examples, the first camera 410, the second camera 420, and the third camera 430 are each oriented in a different direction. In other examples, at least two of the first camera 410, the second camera 420, and the third camera 430 are each oriented in the same direction.
In some examples, the first camera 410 and the second camera 420 are on a first axis (e.g., a horizontal axis), with the cameras separated by a first distance. In such examples, the third camera 430 may be offset from the other cameras and on a second axis different from the first axis. The third camera 430 may be below (as shown in fig. 4A) or above (not illustrated in fig. 4A) the other cameras. Locating the cameras on different axes may allow different cameras to capture fields of view at different angles. In some examples, the third camera 430 is a second distance from the first camera 410 and a third distance from the second camera 420. In such examples, the second distance and the third distance may be the same or different. In some examples, the second distance and/or the third distance may be the same as or different from the first distance. Having cameras at different distances from each other may allow different groups of cameras to have different baselines, which changes how computations are performed when processing images for information about the physical environment. In some examples, the third camera 430 is offset from the vertical axis associated with the first camera 410 and the vertical axis associated with the second camera 420 such that the third camera 430 is between the vertical axis associated with the first camera 410 and the vertical axis associated with the second camera 420. As described above, having the cameras on different axes may allow different cameras to capture fields of view at different angles.
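The effect of a group's baseline on depth computation can be illustrated with the standard pinhole stereo relation, depth = focal length × baseline / disparity. The focal length, baselines, and disparity below are arbitrary illustrative numbers, not dimensions of camera array 400.

```python
# Standard pinhole stereo relation: depth = f * B / d, where f is the focal
# length in pixels, B the baseline between the two cameras in meters, and d
# the disparity in pixels. All values are arbitrary illustrative numbers.

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    return focal_px * baseline_m / disparity_px

f = 800.0  # assumed focal length in pixels
d = 16.0   # assumed measured disparity in pixels

long_baseline_depth = depth_from_disparity(f, baseline_m=0.12, disparity_px=d)
short_baseline_depth = depth_from_disparity(f, baseline_m=0.07, disparity_px=d)

# For the same disparity, a longer baseline implies a larger depth estimate,
# so the active group (and its baseline) must be known when interpreting images.
print(long_baseline_depth, short_baseline_depth)  # 6.0 m vs 3.5 m
```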
Fig. 4B is a block diagram of a camera array 440 illustrating four cameras. The camera array includes a first camera 450, a second camera 460, a third camera 470, and a fourth camera 480. In some examples, camera array 440 is attached to a device (e.g., device 300) and is configured to capture images of a physical environment. In such examples, the fields of view of each camera in the camera array 440 may at least partially overlap such that all cameras are able to capture images of a particular region of the physical environment. In some examples, the first camera 450, the second camera 460, the third camera 470, and the fourth camera 480 are each oriented in a different direction. In other examples, at least two of the first camera 450, the second camera 460, the third camera 470, and the fourth camera 480 are each oriented in the same direction (e.g., the first camera 450 and the second camera 460; the first camera 450 and the third camera 470; or the first camera 450 and the fourth camera 480).
In some examples, the first camera 450 and the second camera 460 are on a first axis (e.g., a horizontal axis), with the cameras separated by a first distance. In such examples, the third camera 470 and the fourth camera 480 may be offset from the other cameras and on a second axis that is different from the first axis (e.g., a different horizontal axis that is parallel to the other horizontal axis). The third camera 470 and/or the fourth camera 480 may be below (as illustrated in fig. 4B) or above (not illustrated in fig. 4B) the other cameras. As described above, having the cameras on different axes may allow different cameras to capture fields of view at different angles. In some examples, the third camera 470 and the fourth camera 480 are separated by a second distance. In such examples, the second distance may be the same as or different from the first distance. As described above, having cameras at different distances from each other may allow different groups of cameras to have different baselines, which changes how computations are performed when processing images for information about a physical environment. Having the camera groups separated by the same distance may allow the device to switch easily between different groups of cameras without changing the ability of a group to capture objects at a particular distance, regardless of orientation.
In some examples, the third camera 470 is a second distance from the first camera 450 and a third distance from the second camera 460. In such examples, the second distance and the third distance may be the same or different. In some examples, the second distance and/or the third distance may be the same as or different from the first distance. In some examples, the third camera 470 is offset from, but on an axis parallel to, the vertical axis associated with the first camera 450 and offset from the vertical axis associated with the second camera 460, such that the third camera 470 is below the first camera 450 and diagonal to the second camera 460. Having the camera groups along the same axes as each other may allow the device to switch easily between different groups of cameras without changing the ability of a group to capture objects from a particular point of view.
In some examples, the fourth camera 480 is a fourth distance from the second camera 460 and a fifth distance from the third camera 470. In such examples, the fourth distance and the fifth distance may be the same or different. In some examples, the fourth distance and/or the fifth distance may be the same as or different from the first distance, the second distance, or the third distance. In some examples, the fourth camera 480 is offset along the vertical axis associated with the second camera 460 and offset from the vertical axis associated with the first camera 450 such that the fourth camera 480 is below the second camera 460 and diagonal to the first camera 450. In some examples, small angular differences in the mounting positions of the cameras result in the cameras being affected differently by a given condition. In such examples, when one camera experiences a particular effect, another camera is less likely to suffer from the same problem, making the system more robust.
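A four-camera array such as the one described above can be thought of as a set of candidate stereo pairs, each with its own baseline and axis. The coordinates below are arbitrary illustrative placements (in meters), not dimensions of camera array 440; the sketch simply enumerates the pairs and their baselines.

```python
# Sketch: describe the four-camera array as candidate stereo pairs, each with
# its own baseline (distance between mounting positions). Placements are
# illustrative assumptions only.
from itertools import combinations
from math import dist

positions = {
    "cam_450": (0.00, 0.00),   # first axis (top row)
    "cam_460": (0.12, 0.00),
    "cam_470": (0.00, -0.05),  # second axis (bottom row)
    "cam_480": (0.12, -0.05),
}

# Every unordered pair is a potential camera group; its baseline is the
# distance between the two mounting positions.
pairs = {
    (a, b): dist(positions[a], positions[b])
    for a, b in combinations(positions, 2)
}

for (a, b), baseline in sorted(pairs.items(), key=lambda kv: kv[1]):
    print(f"{a} + {b}: baseline = {baseline:.3f} m")
```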
Fig. 5 is a flow chart illustrating a method 500 for computing depth of a location. Some operations in method 500 may optionally be combined, the order of some operations may optionally be changed, and some operations may optionally be omitted.
In some examples, the method 500 is performed at a computing system (e.g., computing system 100) in communication with a camera (e.g., a camera in the cameras 310, the first camera 410, or the first camera 450). In some examples, the computing system and camera are included in a device (e.g., device 200 or device 300). In some examples, the device includes one or more actuators and/or one or more sensors other than a camera. In some examples, the camera is connected to one or more processors of the device via at least one or more wires. In some examples, the camera is wirelessly connected to the one or more processors of the device. In some examples, the one or more processors are included in a component of the device separate from the camera. In some examples, the one or more processors are included in the camera. In some examples, a plurality of processors of the device perform the method, wherein at least one step is performed by one or more processors on a first system on a chip (i.e., SoC) and a second step is performed by a second SoC, and wherein the first SoC and the second SoC are located at different locations on the device that are separated by at least 12 inches. In some examples, the method 500 is performed while a device including a camera performs operations (such as navigating in a physical environment).
At 510, method 500 includes receiving a first image (e.g., a representation of a physical environment having one or more color channels (e.g., red, green, and blue channels)) captured by a first camera (e.g., computing system 100, first subsystem 210, a camera of the cameras 310, first camera 410, or first camera 450).
At 520, the method 500 includes receiving a second image captured by a second camera (e.g., the computing system 100, the first subsystem 210, a camera of the cameras 310, the second camera 420, or the second camera 460), wherein the second camera is different from the first camera, and wherein the first camera and the second camera are established as a first set of cameras (e.g., a camera pair) for computing a depth of a location (e.g., a point location) (in some examples, the device includes the first camera and the second camera; in some examples, the first image is captured before or after the first set of cameras is established; in some examples, receiving the first image includes accessing a memory location using an expected location or a location identified in a message received from the first camera; in some examples, the device establishes the first set of cameras for calculating a depth of the location before receiving the first image; in some examples, the first set of cameras does not include a third camera; in some examples, the first set of cameras includes one or more cameras in addition to the first camera and the second camera; in some examples, the first set of cameras is established by default (e.g., predefined before the device begins executing an application requiring depth calculations); in some examples, the first set of cameras is established based on a likelihood that the cameras included in the first set of cameras are capable of capturing an image of the location; in some examples, the second image is captured before or after the first set of cameras is established; in some examples, the second image is captured simultaneously with the first image (e.g., substantially simultaneously, such as in a manner indicating that the first camera and the second camera are capturing images at the same time); in some examples, receiving the second image includes accessing the memory location using the expected location or a location identified in a message received from the second camera).
At 530, the method 500 includes calculating a first depth of the location based on the first image and the second image in accordance with a determination that the first image and the second image have sufficient feature correlation for calculating a depth of the location (in some examples, there is no determination that the third image and the fourth image have sufficient feature correlation when the first depth is calculated based on the first image and the second image; in some examples, the device determines that the first image and the second image have sufficient feature correlation for calculating a depth of the location; in some examples, a second device different from the device determines that the first image and the second image have sufficient feature correlation for calculating a depth of the location; in some examples, determining that the first image and the second image have sufficient feature correlation includes determining that a threshold number of features can be identified in the two images; in some examples, determining that the first image and the second image have sufficient feature correlation includes determining that the calculated depth is similar to an expected depth, such as compared to a surrounding depth or a depth calculated for the location at a different (e.g., prior) time; in some examples, determining that the first image and the second image have sufficient feature correlation is based on one or more images in addition to the first image and the second image). In some examples, calculating the first depth of the location is not based on an image captured by the third camera (in some examples, the feature correlation for the first depth is not based on the image captured by the third camera, although calibration of the first image and/or the second image is performed based on an image captured by the third camera).
At 540, method 500 includes determining that the first image and the second image do not have sufficient feature correlation (e.g., lack feature correlation) for computing a depth of the location (in some examples, the device determines that the first image and the second image do not have sufficient feature correlation for computing a depth of the location; in some examples, a second device different from the device determines that the first image and the second image do not have sufficient feature correlation for computing a depth of the location; in some examples, determining that the first image and the second image do not have sufficient feature correlation includes determining that a threshold number of features cannot be identified in the two images; in some examples, determining that the first image and the second image do not have sufficient feature correlation includes determining that the depth computed for the location is different from an expected depth, such as compared to a surrounding depth or a depth computed for the location at a different (e.g., previous) time). In some examples, determining that the first image and the second image do not have sufficient feature correlation for computing the depth of the location includes identifying features (e.g., objects or portions of objects, such as edges) in the first image that are not included in the second image. In some examples, determining that the first image and the second image do not have sufficient feature correlation for computing the depth of the location includes identifying a fault in the first image (in some examples, the fault is a lens flare or an electrical/software fault). In some examples, the determination that the first image and the second image do not have sufficient feature correlation for computing the depth of the location includes a determination that a threshold number of features in the first image are not included in the second image, wherein the threshold number is at least two (in some examples, this includes a determination that one or more features included in the first image are not included in the second image and that one or more features included in the second image are not included in the first image). In some examples, determining that the first image and the second image do not have sufficient feature correlation for computing the depth of the location includes dividing the first image into a plurality of portions and, in accordance with a determination that a first portion of the plurality of portions does not have sufficient feature correlation and a second portion of the plurality of portions does not have sufficient feature correlation, determining that the first image and the second image do not have sufficient feature correlation for computing the depth of the location based on determining that a threshold number of the plurality of portions do not have sufficient feature correlation, wherein the first portion is different from the second portion (in some examples, the second portion does not overlap with the first portion).
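One possible form of such a sufficiency test is sketched below: features from the first image are matched against the second image by descriptor distance, counted per image portion, and the result is judged against thresholds. The descriptor format, matching rule, grid split, and all thresholds are assumptions for illustration and are not specified by this disclosure.

```python
# Self-contained sketch of one way to test "sufficient feature correlation":
# count features from image 1 that have a close descriptor match in image 2,
# evaluated per vertical portion of image 1.
from typing import List, Tuple

Feature = Tuple[float, float, Tuple[float, ...]]  # (x, y, descriptor)

def matches(f1: Feature, f2: Feature, max_desc_dist: float = 0.5) -> bool:
    d = sum((a - b) ** 2 for a, b in zip(f1[2], f2[2])) ** 0.5
    return d <= max_desc_dist

def correlated_count(feats1: List[Feature], feats2: List[Feature]) -> int:
    return sum(any(matches(f1, f2) for f2 in feats2) for f1 in feats1)

def sufficient_correlation(
    feats1: List[Feature],
    feats2: List[Feature],
    image_width: float,
    min_matches_per_portion: int = 2,
    num_portions: int = 4,
    max_bad_portions: int = 1,
) -> bool:
    """Divide image 1 into vertical portions; too many weak portions -> fail."""
    bad = 0
    for i in range(num_portions):
        lo, hi = i * image_width / num_portions, (i + 1) * image_width / num_portions
        portion = [f for f in feats1 if lo <= f[0] < hi]
        if correlated_count(portion, feats2) < min_matches_per_portion:
            bad += 1
    return bad < max_bad_portions
```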
At 550, the method 500 includes calculating a second depth of the location based on the third image and the fourth image in accordance with a determination that the third image and the fourth image have sufficient feature correlation for calculating the depth of the location (in some examples, the device determines that the third image and the fourth image have sufficient feature correlation for calculating the depth of the location; in some examples, a second device different from the device determines that the third image and the fourth image have sufficient feature correlation for calculating the depth of the location; in some examples, determining that the third image and the fourth image have sufficient feature correlation includes determining that a threshold number of features can be identified in the two images; in some examples, determining that the third image and the fourth image have sufficient feature correlation includes determining that the depth calculated for the location is similar to an expected depth, such as compared to a surrounding depth or a depth calculated for the location at a different (e.g., prior) time), wherein the third image is captured by the third camera (in some examples, the third camera is in an inactive state (e.g., a low power mode, off, capturing images less frequently, or capturing images at a lower resolution) while the first set of cameras is active; in some examples, the third camera is in an active state (e.g., similar to the first camera and the second camera, but not used to calculate the depth of the location)), the third camera is different from the first camera, the third camera is different from the second camera, the fourth image is captured by the fourth camera (in some examples, the fourth camera is the first camera or the second camera; in some examples, the fourth camera is different from the first camera and the second camera), the fourth camera is different from the third camera, and the third camera and the fourth camera are established as a second set of cameras for calculating the depth of the location (in some examples, the device includes the third camera and the fourth camera; in some examples, receiving the third image includes accessing a memory location using an expected location or a location identified in a message received from the third camera; in some examples, the device establishes the second set of cameras for calculating the depth of the location prior to receiving the third image; in some examples, the second set of cameras includes one or more cameras in addition to the third camera and/or the fourth camera; in some examples, the fourth image is captured before or after the second set of cameras is established; in some examples, the fourth image is captured simultaneously with the third image (e.g., substantially simultaneously, such as in a manner indicating that the third camera and the fourth camera are capturing images at the same time); in some examples, receiving the fourth image includes accessing the memory location using an expected location or a location identified in a message received from the fourth camera; in some examples, the second set of cameras is established after determining that the first image and the second image do not have sufficient feature correlation; in some examples, the second set of cameras is established in response to determining that a depth of the location is required; in some examples, the third image is captured before or after the second set of cameras is established; in some examples, the third image is captured simultaneously with the first image and/or the second image (e.g., substantially simultaneously, such as in a manner indicating that all three cameras are capturing images at the same time)). In some examples, calculating the second depth of the location is further based on a determination of selecting the second set of cameras from a plurality of different sets of cameras (in some examples, the plurality of different sets of cameras do not include the first set of cameras). In some examples, the determination to select the second set of cameras is based on images captured by the third camera while the third camera is in a lower power mode (in some examples, the lower power mode is an off mode) than the cameras in the first set of cameras. In some examples, the determination to select the second set of cameras is based on a priority order of the sets of cameras, and the priority order is established prior to receiving the first image. In some examples, the third camera is in a standby mode (e.g., a low power or off mode) when a determination is made that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location. In some examples, an image captured by the third camera is used to calibrate a feature correlation between the first image and the second image. In some examples, the third camera is the first camera. In some examples, calculating the second depth based on the third image and the fourth image is based on a determination that the first image and the second image do not have sufficient feature correlation (e.g., lack feature correlation) for calculating the depth of the location.
In some examples, the method 500 further includes, in accordance with a determination that the first image and the second image do not have sufficient feature correlation (e.g., lack feature correlation) for calculating the depth of the location, that the third image and the fourth image do not have sufficient feature correlation for calculating the depth of the location, and that a fifth image and a sixth image have sufficient feature correlation for calculating the depth of the location (in some examples, the device determines that the fifth image and the sixth image have sufficient feature correlation for calculating the depth of the location; in some examples, a second device different from the device determines that the fifth image and the sixth image have sufficient feature correlation for calculating the depth of the location), calculating a third depth of the location based on the fifth image and the sixth image, wherein the fifth image is captured by a fifth camera (in some examples, the fifth camera is in an inactive state (e.g., a low power mode, off, capturing images less frequently, or capturing images at a lower resolution) while the first set of cameras and/or the second set of cameras are active; in some examples, the fifth camera is in an active state), the fifth camera is different from each of the first camera, the second camera, the third camera, and the fourth camera, the sixth image is captured by a sixth camera, the sixth image is not captured by the fifth camera, and the fifth camera and the sixth camera are established as a third set of cameras (in some examples, the third set of cameras is different from the first set of cameras and the second set of cameras) for calculating the depth of the location.
In some examples, the method 500 further includes, in accordance with a determination that the first image and the second image do not have sufficient feature correlation (e.g., lack feature correlation) for calculating the depth of the location and a determination that a seventh image and an eighth image have sufficient feature correlation for calculating the depth of the location (in some examples, the device determines that the seventh image and the eighth image have sufficient feature correlation for calculating the depth of the location; in some examples, a second device different from the device determines that the seventh image and the eighth image have sufficient feature correlation for calculating the depth of the location), calculating a fourth depth of the location based on the seventh image and the eighth image, wherein the seventh image is captured by a seventh camera (in some examples, the seventh camera is in an inactive state (e.g., a low power mode, off, capturing images less frequently, or capturing images at a lower resolution); in some examples, the seventh camera is in an active state (e.g., similar to the first camera and the second camera) but not used to calculate the depth of the location), and wherein the eighth image is captured by the seventh camera.
In some examples, the method 500 further includes, in accordance with a determination that a cause of the lack of feature correlation with respect to the first set of cameras has been resolved (in some examples, the determination is based on an image, such as an image from a camera of the first set of cameras or an image from a camera not included in the first set of cameras), calculating a depth of a location based on an image captured by the first camera and an image captured by the second camera (in some examples, the first set of cameras is not used until the cause of the lack of feature correlation with respect to the first set of cameras is determined to have been resolved; in some examples, the calculation is for a new location; in some examples, the calculation is for the same location as before).
In some examples, the techniques described above are performed in a system for computing depth of a location. In such examples, the system includes a first camera, a second camera, a third camera, and a fourth camera, wherein the first camera and the second camera are on a first axis, and wherein the third camera and the fourth camera are on a second axis that is different from the first axis (in some examples, the first axis and the second axis are parallel; in some examples, the first camera and the third camera are on a third axis, the second camera and the fourth camera are on a fourth axis, and the third axis is parallel to the fourth axis; in some examples, the first axis and the third axis are perpendicular).
It is noted that the details of the process described below with respect to method 600 (i.e., fig. 6) also apply in a similar manner to method 500 of fig. 5. For example, method 500 optionally includes one or more of the features of the various methods described below with reference to method 600. For example, computing the second depth of the location according to the method 500 may further include a determination from the representation (e.g., three-dimensional representation) that sufficient data is available for the location within the representation.
Fig. 6 is a flow chart illustrating a method 600 for obtaining sufficient data about a physical environment. Some operations in method 600 may optionally be combined, the order of some operations may optionally be changed, and some operations may optionally be omitted.
In some examples, the method 600 is performed at a computing system (e.g., computing system 100) in communication with a camera (e.g., a camera of the cameras 310, the first camera 410, or the second camera 460). In some examples, the computing system and camera are included in a device (e.g., device 200 or device 300). In some examples, the device includes one or more actuators and/or one or more sensors other than a camera. In some examples, the camera is connected to one or more processors of the device via at least one or more wires. In some examples, the camera is wirelessly connected to the one or more processors of the device. In some examples, the one or more processors are included in a component of the device separate from the camera. In some examples, the one or more processors are included in the camera. In some examples, a plurality of processors of the device perform the method, wherein at least one step is performed by one or more processors on a first system on a chip (i.e., SoC) and a second step is performed by a second SoC, and wherein the first SoC and the second SoC are located at different locations on the device that are separated by at least 12 inches. In some examples, method 600 is performed while a device including a camera performs operations (such as navigating in a physical environment).
At 610, method 600 includes receiving a representation (e.g., a world representation, a virtual representation, an object view representation, a three-dimensional representation) of a physical environment (in some examples, the representation is not an image) wherein the representation is generated based on a first set of one or more images (e.g., a representation of a physical environment having one or more color channels (e.g., red, green, and blue channels)) captured by a first set of one or more cameras (e.g., camera 310, first camera 410, or first camera 450) (in some examples, the representation is generated based on one or more depth calculations, one or more lidar images, one or more depth maps, or any combination thereof; in some examples, the representation includes a representation of an object that has been identified as including one or more characteristics associated with a particular type of object; in some examples, the device includes the first set of one or more cameras). In some examples, the representation is generated based on a depth map (in some examples, the depth map includes information related to distances of surfaces on objects in the physical environment). In some examples, the depth map is generated using images captured by a set of one or more cameras (in some examples, the depth map is generated using lidar data; in some examples, the depth map is generated by feature correlation between the images).
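One way a depth map can be turned into such a representation is by unprojecting each pixel through a pinhole camera model into a set of 3D points. The intrinsics and the toy depth map below are illustrative assumptions; the disclosure does not specify how the representation is stored.

```python
# Sketch: convert a small depth map into a simple point-based representation
# of the physical environment by unprojecting each pixel.
from typing import List, Tuple

def unproject(depth_map: List[List[float]], fx: float, fy: float,
              cx: float, cy: float) -> List[Tuple[float, float, float]]:
    points = []
    for v, row in enumerate(depth_map):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue  # no depth available for this pixel
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

# Toy 3x3 depth map (meters); 0.0 marks pixels with no depth information.
depth_map = [[2.0, 2.0, 0.0],
             [2.1, 2.0, 2.2],
             [0.0, 2.3, 2.2]]
representation = unproject(depth_map, fx=1.0, fy=1.0, cx=1.0, cy=1.0)
```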
At 620 and 630, method 600 includes, in accordance with a determination that the representation does not include sufficient data for a location within the representation (in some examples, the determination that the representation does not include sufficient data for the location within the representation includes a determination that the representation does not include (1) any data about the location, (2) sufficient information to categorize objects located at the location, (3) depth calculations for the location, (4) sufficient depth calculations (e.g., current depth calculations have been determined to be incorrect), or any combination thereof), and in accordance with a determination that a second set of one or more cameras is able to capture images (e.g., one or more images from each of the one or more cameras) to obtain sufficient data for the location, sending instructions to use the second set of one or more cameras (in some examples, the instructions cause use of one or more images having a field of view corresponding to the location in the representation; in some examples, in accordance with a determination that the second set of one or more cameras is capable of capturing images to obtain sufficient data for the location, an operation (e.g., the operation mentioned in paragraph [0089], such as navigating a physical environment) is deferred until the representation is updated based on the images from the second set of one or more cameras; in some examples, the second set of one or more cameras includes a plurality of cameras, with instructions to each camera to capture an image), wherein the second set of one or more cameras is different from the first set of one or more cameras (in some examples, the second set of one or more cameras includes at least one camera not included in the first set of one or more cameras; in some examples, the second set of one or more cameras includes at least one camera included in the first set of one or more cameras). In some examples, the determination that the representation does not include sufficient data for a location within the representation includes performing a geospatial search of the representation (in some examples, the geospatial search includes identifying an area of the physical environment to which the device is moving; in some examples, the determination that the representation does not include sufficient data for a location within the representation includes a determination that the representation does not include depth information for the location). In some examples, the second set of one or more cameras consists of two cameras (in some examples, the first set of one or more cameras consists of two cameras). In some examples, when the representation is determined not to include sufficient data for a location within the representation, the cameras in the second set of one or more cameras are in a standby mode (in some examples, the standby mode is a low power mode or off).
At 640, method 600 includes, in accordance with a determination that the representation includes sufficient data for a location within the representation, forgoing sending an instruction to use the second set of one or more cameras (in some examples, initiating an operation to navigate based on the representation in addition to foregoing sending the instruction).
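The decision branch at 620-640 can be summarized in a short, self-contained sketch: an instruction to use the second set of cameras is produced only when the representation lacks data for the location and that set can actually cover it. The data layout, the "sufficient data" stub, and the coverage test are illustrative assumptions, not the representation or instruction format used by the disclosure.

```python
# Sketch of the branch: send an instruction to use the second camera set only
# when the representation lacks data for the location and the set covers it;
# otherwise forgo sending the instruction.
from typing import Dict, List, Optional, Tuple

Location = Tuple[int, int]

def instruction_for_location(
    representation: Dict[Location, float],   # e.g., per-cell depth data
    location: Location,
    second_set: List[str],                   # camera identifiers
    covers: Dict[str, List[Location]],       # locations each camera can image
) -> Optional[dict]:
    has_data = location in representation    # "sufficient data" stub
    can_cover = all(location in covers[cam] for cam in second_set)
    if not has_data and can_cover:
        # Instruction to capture images with the second set of cameras.
        return {"use_cameras": second_set, "target": location}
    return None  # forgo sending the instruction

# Example usage with a representation that is missing data at (2, 3).
rep = {(0, 0): 1.5, (1, 0): 1.6}
cams = ["cam_C", "cam_D"]
coverage = {"cam_C": [(2, 3)], "cam_D": [(2, 3)]}
print(instruction_for_location(rep, (2, 3), cams, coverage))
```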
In some examples, method 600 further includes, after sending the instruction to use the second set of one or more cameras, sending a navigation instruction (in some examples, the navigation instruction identifies a path to be taken by the device; in some examples, the navigation instruction identifies a driving characteristic (e.g., speed) of the device) based on an updated representation of the physical environment, wherein the updated representation is generated based on one or more images captured by the second set of one or more cameras (in some examples, the updated representation is generated based on images from the first set of one or more cameras), and, in accordance with a determination that the representation includes sufficient data for a location within the representation, sending an instruction to navigate based on the representation of the physical environment (i.e., in this branch, no instruction will be sent to the second set of one or more cameras to capture the one or more images).
In some examples, the method 600 further includes, prior to making the determination that the representation does not include sufficient data for a location within the representation, in accordance with a determination that a camera in the second set of one or more cameras is in a standby mode (in some examples, the standby mode is a low power mode or off) and that the second set of one or more cameras will be needed to update the representation (in some examples, the determination that the second set of one or more cameras will be needed to update the representation is based on a determination that the physical environment (1) is crowded, (2) has particular weather conditions, or (3) is about to change due to navigation of the device), sending an instruction to change the camera to a second mode (e.g., a ready, active, or higher power mode) different from the standby mode.
In some examples, method 600 further includes, in accordance with the determination that the representation does not include sufficient data for a location within the representation and in accordance with a determination based on a current navigation context (in some examples, the current navigation context is a speed of travel or a direction of travel), forgoing sending the instructions to use the second set of one or more cameras.
In some examples, the instructions include a request to capture one or more images in a first mode (e.g., a higher power mode or an active mode), and wherein the method 600 further includes, at the device, prior to transmitting the instructions, transmitting a second instruction to capture one or more images using a second set of one or more cameras in a second mode (e.g., a lower power mode, such as capturing images at a lower resolution and/or capturing images less frequently) different from the first mode, wherein the one or more images captured in the second mode are used to determine to transmit the instruction.
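The two-step capture just described (a lower-power, second-mode capture whose result is used to decide whether to send the first-mode instruction) can be sketched as follows. The mode names, the brightness-based decision rule, and the camera stub are illustrative assumptions only.

```python
# Sketch: request a low-power (second-mode) capture first, and use its result
# to decide whether to send the full (first-mode) capture instruction.
from dataclasses import dataclass

@dataclass
class CaptureResult:
    mean_brightness: float  # stand-in for "is this view usable at all?"

def capture(camera_id: str, mode: str) -> CaptureResult:
    # Stub: a real device would return an image captured in the given mode
    # (e.g., lower resolution / lower frame rate for "low_power").
    return CaptureResult(mean_brightness=0.4)

def request_full_capture_if_useful(camera_ids, min_brightness: float = 0.1) -> bool:
    previews = [capture(cam, mode="low_power") for cam in camera_ids]
    if all(p.mean_brightness >= min_brightness for p in previews):
        for cam in camera_ids:
            capture(cam, mode="full")  # first-mode instruction is sent
        return True
    return False  # previews suggest the images would not help; forgo

request_full_capture_if_useful(["cam_C", "cam_D"])
```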
In some examples, the method 600 further includes sending an instruction to change the movement characteristics (e.g., reduce the speed) in accordance with a determination that the updated representation of the physical environment does not include sufficient data for a location within the updated representation, wherein the updated representation is generated based on one or more images captured by the second set of one or more cameras, and wherein the location within the updated representation corresponds to the location within the representation (in some examples, the location within the representation is the same as the location within the updated representation).
It is noted that the details of the process described above with respect to method 500 (i.e., fig. 5) also apply in a similar manner to method 600 of fig. 6. For example, method 600 optionally includes one or more of the features of the various methods described above with reference to method 500. For example, the depth calculated in method 500 may be data included within the representation.
The foregoing description, for purposes of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the technology and its practical application. Those skilled in the art will be able to best utilize the techniques and various embodiments with various modifications as are suited to the particular use contemplated.
While the present disclosure and examples have been fully described with reference to the accompanying drawings, it is to be noted that various changes and modifications will become apparent to those skilled in the art. It should be understood that such variations and modifications are considered to be included within the scope of the disclosure and examples as defined by the claims.
As described above, one aspect of the present technology is to collect and use data available from various sources to improve certain information about the physical environment. The present disclosure contemplates that in some examples, the collected data may include personal information data that uniquely identifies or may be used to identify a particular person and/or a particular location. Such personal information data may include an image of a person, an image of data related to a person, an image of a location, or any other identification or personal information.
The present disclosure contemplates that entities responsible for the collection, analysis, disclosure, transmission, storage, or other use of such personal information data will adhere to well-established privacy policies and/or privacy practices. In particular, such entities should implement and adhere to privacy policies and practices that are recognized as meeting or exceeding industry or government requirements for maintaining the privacy and security of personal information data. Such policies should be easy for users to access and should be updated as the collection and/or use of data changes. Personal information from users should be collected for legitimate and reasonable uses and must not be shared or sold outside of those legitimate uses. In addition, policies and practices should be adapted to the particular type of personal information data collected and/or accessed, and to applicable laws and standards, including jurisdiction-specific considerations. Thus, different privacy practices may be maintained for different personal data types in each country.
Furthermore, it is intended that personal information data should be managed and processed in a manner that minimizes the risk of inadvertent or unauthorized access or use. Risk can be minimized by limiting the collection of data and by deleting the data once it is no longer needed.

Claims (44)

1. A method for obtaining sufficient data to make a decision regarding a physical environment, the method comprising:
at the device:
receiving a representation of a physical environment, wherein the representation is generated based on a first set of one or more images captured by a first set of one or more cameras;
in accordance with a determination that the representation does not include sufficient data for a location within the representation and a second set of one or more cameras is capable of capturing images to obtain sufficient data for the location, sending instructions to use the second set of one or more cameras, wherein the second set of one or more cameras is different from the first set of one or more cameras, and
In accordance with a determination that the representation includes sufficient data for the location within the representation, instructions to use the second set of one or more cameras are relinquished from transmission.
2. The method of claim 1, the method further comprising:
at the apparatus:
After sending the instructions to use the second set of one or more cameras, sending instructions to navigate based on an updated representation of the physical environment, wherein the updated representation is generated based on one or more images captured by the second set of one or more cameras, and
Instructions are sent to navigate based on the representation of the physical environment in accordance with a determination that the representation includes sufficient data for the location within the representation.
3. The method of any of claims 1-2, wherein the representation is generated based on a depth map.
4. A method according to claim 3, wherein the depth map is generated using images captured by a set of one or more cameras.
5. The method of any of claims 1-4, wherein the second set of one or more cameras consists of two cameras.
6. The method of any of claims 1-5, wherein the determination that the representation does not include sufficient data for a location within the representation includes performing a geospatial search of the representation.
7. The method of any of claims 1-6, wherein a camera of the second set of one or more cameras is in a standby mode when the determination is made that the representation does not include sufficient data for a location within the representation.
8. The method of any one of claims 1 to 7, further comprising:
at the apparatus:
Before making the determination that the representation does not include sufficient data for a location within the representation:
In accordance with a determination that a camera of the second set of one or more cameras is in a standby mode and will be required to update the representation, an instruction to change the camera to a second mode different from the standby mode is sent.
9. The method of any one of claims 1 to 8, the method further comprising:
at the apparatus:
In accordance with the determination that the representation does not include sufficient data for the location within the representation, and in accordance with a determination based on a current navigation context, instructions to send using the second set of one or more cameras are relinquished.
10. The method of any of claims 1-9, wherein the instruction comprises a request to capture the one or more images in a first mode, and wherein the method further comprises:
at the apparatus:
before transmitting the instructions, transmitting second instructions to capture one or more images using the second set of one or more cameras in a second mode different from the first mode, wherein the one or more images captured in the second mode are used to determine to transmit the instructions.
11. The method of any one of claims 1 to 10, the method further comprising:
at the device:
in accordance with a determination that an updated representation of the physical environment does not include sufficient data for a location within the updated representation, instructions to change movement characteristics are sent, wherein the updated representation is generated based on the one or more images captured by the second set of one or more cameras, and wherein the location within the updated representation corresponds to the location within the representation.
12. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a device, the one or more programs comprising instructions for performing the method of any of claims 1-11.
13. An apparatus, the apparatus comprising:
One or more processors, and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 1-11.
14. An apparatus, the apparatus comprising:
apparatus for performing the method of any one of claims 1 to 11.
15. A computer program product comprising one or more programs configured to be executed by one or more processors of a device, the one or more programs comprising instructions for performing the method of any of claims 1-11.
16. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a device, the one or more programs comprising instructions for:
receiving a representation of a physical environment, wherein the representation is generated based on a first set of one or more images captured by a first set of one or more cameras;
in accordance with a determination that the representation does not include sufficient data for a location within the representation and a second set of one or more cameras is capable of capturing images to obtain sufficient data for the location, sending instructions to use the second set of one or more cameras, wherein the second set of one or more cameras is different from the first set of one or more cameras, and
In accordance with a determination that the representation includes sufficient data for the location within the representation, instructions to use the second set of one or more cameras are relinquished from transmission.
17. An apparatus, the apparatus comprising:
A first camera;
a second camera, the second camera being different from the first camera;
A third camera, the third camera being different from the first camera and the second camera;
A fourth camera, the fourth camera being different from the third camera, the second camera, and the first camera, wherein the first camera and the second camera are on a first axis, and wherein the third camera and the fourth camera are on a second axis that is different from the first axis;
One or more processors, and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for:
receiving a representation of a physical environment, wherein the representation is generated based on a first set of one or more images captured by a first set of one or more cameras, and wherein the first set includes the first camera and the second camera;
According to a determination that the representation does not include sufficient data for a location within the representation and a second set of one or more cameras is capable of capturing images to obtain sufficient data for the location, sending instructions to use the second set of one or more cameras, wherein the second set of one or more cameras is different from the first set of one or more cameras, and wherein the second set includes the third camera and the fourth camera, and
In accordance with a determination that the representation includes sufficient data for the location within the representation, instructions to use the second set of one or more cameras are relinquished from transmission.
18. An apparatus, the apparatus comprising:
One or more processors, and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for:
receiving a representation of a physical environment, wherein the representation is generated based on a first set of one or more images captured by a first set of one or more cameras;
in accordance with a determination that the representation does not include sufficient data for a location within the representation and a second set of one or more cameras is capable of capturing images to obtain sufficient data for the location, sending instructions to use the second set of one or more cameras, wherein the second set of one or more cameras is different from the first set of one or more cameras, and
In accordance with a determination that the representation includes sufficient data for the location within the representation, instructions to use the second set of one or more cameras are relinquished from transmission.
19. An apparatus, the apparatus comprising:
receiving a representation of a physical environment, wherein the representation is generated based on a first set of one or more images captured by a first set of one or more cameras;
in accordance with a determination that the representation does not include sufficient data for a location within the representation and a second set of one or more cameras is capable of capturing images to obtain sufficient data for the location, sending instructions to use the second set of one or more cameras, wherein the second set of one or more cameras is different from the first set of one or more cameras, and
In accordance with a determination that the representation includes sufficient data for the location within the representation, instructions to use the second set of one or more cameras are relinquished from transmission.
20. A computer program product comprising one or more programs configured to be executed by one or more processors of a device, the one or more programs comprising instructions for:
receiving a representation of a physical environment, wherein the representation is generated based on a first set of one or more images captured by a first set of one or more cameras;
in accordance with a determination that the representation does not include sufficient data for a location within the representation and a second set of one or more cameras is capable of capturing images to obtain sufficient data for the location, sending instructions to use the second set of one or more cameras, wherein the second set of one or more cameras is different from the first set of one or more cameras, and
In accordance with a determination that the representation includes sufficient data for the location within the representation, instructions to use the second set of one or more cameras are relinquished from transmission.
21. A method for obtaining sufficient data to make a decision regarding a physical environment, the method comprising:
at the device:
receiving a first image captured by a first camera;
Receiving a second image captured by a second camera, wherein the second camera is different from the first camera, and wherein the first camera and the second camera are established as a first set of cameras for calculating depth of a location;
Calculating a first depth of the location based on the first image and the second image based on a determination that the first image and the second image have sufficient feature correlation for calculating the depth of the location, and
According to a determination that the first and second images do not have sufficient feature correlation for calculating the depth of the location and that third and fourth images have sufficient feature correlation for calculating the depth of the location, a second depth of the location is calculated based on the third and fourth images, wherein:
The third image is captured by a third camera,
The third camera is different from the first camera,
The third camera is different from the second camera,
The fourth image is captured by a fourth camera,
The fourth camera is different from the third camera, and
The third camera and the fourth camera are established as a second set of cameras for calculating the depth of the position.
22. The method of claim 21, wherein calculating the first depth of the location is not based on an image captured by the third camera.
23. The method of any one of claims 21 to 22, the method further comprising:
at the apparatus:
Calculating a third depth of the location based on a determination that the first and second images do not have sufficient feature correlation for calculating the depth of the location, the third and fourth images do not have sufficient feature correlation for calculating the depth of the location, and a fifth and sixth image have sufficient feature correlation for calculating the depth of the location, wherein:
The fifth image is captured by a fifth camera,
The fifth camera is different from each of the first camera, the second camera, the third camera and the fourth camera,
The sixth image is captured by a sixth camera,
The sixth image is not captured by the fifth camera, and
The fifth camera and the sixth camera are established as a third set of cameras for calculating the depth of the position.
24. The method of any one of claims 21 to 23, the method further comprising:
at the apparatus:
According to a determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location and a seventh image and an eighth image have sufficient feature correlation for calculating the depth of the location, a fourth depth of the location is calculated based on the seventh image and the eighth image, wherein the seventh image is captured by a seventh camera, and wherein the eighth image is captured by the seventh camera.
25. The method of any of claims 21-24, wherein calculating the second depth of the location is further based on a determination of selecting the second set of cameras from a plurality of different sets of cameras.
26. The method of claim 25, wherein the determination to select the second set of cameras is based on images captured by the third camera when the third camera is in a lower power mode than cameras in the first set of cameras.
27. The method of any of claims 25-26, wherein the determination of selecting the second set of cameras is based on a priority order of the set of cameras, and wherein the priority order is established prior to receiving the first image.
28. The method of any one of claims 21 to 27, the method further comprising:
at the apparatus:
In accordance with a determination that the cause of lack of feature correlation with respect to the first set of cameras has been resolved, a depth of a location is calculated based on an image captured by the first camera and an image captured by the second camera.
29. The method of any of claims 21-28, wherein the determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location includes identifying features in the first image that are not included in the second image.
30. The method of any of claims 21-29, wherein the determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location comprises identifying a fault in the first image.
31. The method of any of claims 21-30, wherein the determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location comprises a determination that a threshold number of features in the first image are not included in the second image, and wherein the threshold number is at least two.
32. The method of any of claims 21-31, wherein the determination that the first image and the second image do not have sufficient feature correlation for computing the depth of the location comprises:
Dividing the first image into a plurality of portions, and
In accordance with a determination that a first portion of the plurality of portions does not have sufficient feature correlation and a second portion of the plurality of portions does not have sufficient feature correlation, determining that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location based on determining that a threshold number of the plurality of portions do not have sufficient feature correlation, wherein the first portion is different from the second portion.
33. The method of any of claims 21-32, wherein the third camera is in a standby mode when the determination is made that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location.
34. The method of any of claims 21 to 33, wherein the image captured by the third camera is used to calibrate a feature correlation between the first image and the second image.
35. The method of any of claims 21-34, wherein the third camera is the first camera.
36. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a device, the one or more programs comprising instructions for performing the method of any of claims 21-35.
37. An apparatus, the apparatus comprising:
One or more processors, and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 21-35.
38. An apparatus, the apparatus comprising:
apparatus for performing the method of any one of claims 21 to 35.
39. A computer program product comprising one or more programs configured to be executed by one or more processors of a device, the one or more programs comprising instructions for performing the method of any of claims 21-35.
40. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a device, the one or more programs comprising instructions for:
receiving a first image captured by a first camera;
receiving a second image captured by a second camera, wherein the second camera is different from the first camera, and wherein the first camera and the second camera are established as a first set of cameras for calculating depth of a location;
in accordance with a determination that the first image and the second image have sufficient feature correlation for calculating the depth of the location, calculating a first depth of the location based on the first image and the second image; and
in accordance with a determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location and that a third image and a fourth image have sufficient feature correlation for calculating the depth of the location, calculating a second depth of the location based on the third image and the fourth image, wherein:
the third image is captured by a third camera,
the third camera is different from the first camera,
the third camera is different from the second camera,
the fourth image is captured by a fourth camera,
the fourth camera is different from the third camera, and
the third camera and the fourth camera are established as a second set of cameras for calculating the depth of the location.
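Reduced to control flow, the independent claim amounts to: use the first established pair if its images correlate, otherwise fall back to the second pair if its images do. A minimal sketch, assuming correlates and stereo_depth routines that the claims do not define:

def depth_at_location(location, first_image, second_image, third_image,
                      fourth_image, correlates, stereo_depth):
    # The first camera set (first_image, second_image) is preferred; the
    # second camera set (third_image, fourth_image) is used only on fallback.
    if correlates(first_image, second_image):
        return stereo_depth(first_image, second_image, location)
    if correlates(third_image, fourth_image):
        return stereo_depth(third_image, fourth_image, location)
    return None  # neither set has sufficient feature correlation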
41. An apparatus, the apparatus comprising:
a first camera;
a second camera, the second camera being different from the first camera;
a third camera, the third camera being different from the first camera and the second camera;
a fourth camera, the fourth camera being different from the third camera, the second camera, and the first camera, wherein the first camera and the second camera are on a first axis, and wherein the third camera and the fourth camera are on a second axis that is different from the first axis;
one or more processors; and
a memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for:
receiving a first image captured by the first camera;
receiving a second image captured by the second camera, wherein the first camera and the second camera are established as a first set of cameras for calculating a depth of a location;
in accordance with a determination that the first image and the second image have sufficient feature correlation for calculating the depth of the location, calculating a first depth of the location based on the first image and the second image; and
in accordance with a determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location and that a third image and a fourth image have sufficient feature correlation for calculating the depth of the location, calculating a second depth of the location based on the third image and the fourth image, wherein:
the third image is captured by the third camera,
the fourth image is captured by the fourth camera, and
the third camera and the fourth camera are established as a second set of cameras for calculating the depth of the location.
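For background only: whichever pair is selected, the depth it recovers follows the usual pinhole-stereo relation depth = focal_length x baseline / disparity, where the baseline is the distance between the two cameras of the set (along the first or second axis of claim 41). The function below restates that textbook relation and is not taken from the application.

def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    # Textbook pinhole-stereo relation: depth (m) = f (px) * B (m) / d (px).
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px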
42. An apparatus, the apparatus comprising:
one or more processors; and
a memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for:
receiving a first image captured by a first camera;
receiving a second image captured by a second camera, wherein the second camera is different from the first camera, and wherein the first camera and the second camera are established as a first set of cameras for calculating depth of a location;
in accordance with a determination that the first image and the second image have sufficient feature correlation for calculating the depth of the location, calculating a first depth of the location based on the first image and the second image; and
in accordance with a determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location and that a third image and a fourth image have sufficient feature correlation for calculating the depth of the location, calculating a second depth of the location based on the third image and the fourth image, wherein:
the third image is captured by a third camera,
the third camera is different from the first camera,
the third camera is different from the second camera,
the fourth image is captured by a fourth camera,
the fourth camera is different from the third camera, and
the third camera and the fourth camera are established as a second set of cameras for calculating the depth of the location.
43. An apparatus, the apparatus comprising:
means for receiving a first image captured by a first camera;
means for receiving a second image captured by a second camera, wherein the second camera is different from the first camera, and wherein the first camera and the second camera are established as a first set of cameras for calculating depth of a location;
means for calculating, in accordance with a determination that the first image and the second image have sufficient feature correlation for calculating the depth of the location, a first depth of the location based on the first image and the second image; and
means for calculating, in accordance with a determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location and that a third image and a fourth image have sufficient feature correlation for calculating the depth of the location, a second depth of the location based on the third image and the fourth image, wherein:
the third image is captured by a third camera,
the third camera is different from the first camera,
the third camera is different from the second camera,
the fourth image is captured by a fourth camera,
the fourth camera is different from the third camera, and
the third camera and the fourth camera are established as a second set of cameras for calculating the depth of the location.
44. A computer program product comprising one or more programs configured to be executed by one or more processors of a device, the one or more programs comprising instructions for:
receiving a first image captured by a first camera;
receiving a second image captured by a second camera, wherein the second camera is different from the first camera, and wherein the first camera and the second camera are established as a first set of cameras for calculating depth of a location;
in accordance with a determination that the first image and the second image have sufficient feature correlation for calculating the depth of the location, calculating a first depth of the location based on the first image and the second image; and
in accordance with a determination that the first image and the second image do not have sufficient feature correlation for calculating the depth of the location and that a third image and a fourth image have sufficient feature correlation for calculating the depth of the location, calculating a second depth of the location based on the third image and the fourth image, wherein:
the third image is captured by a third camera,
the third camera is different from the first camera,
the third camera is different from the second camera,
the fourth image is captured by a fourth camera,
the fourth camera is different from the third camera, and
the third camera and the fourth camera are established as a second set of cameras for calculating the depth of the location.
CN202380045271.6A 2022-06-09 2023-06-08 Dynamic camera selection Pending CN119343930A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US202263350595P 2022-06-09 2022-06-09
US63/350,595 2022-06-09
US18/203,560 US20230401732A1 (en) 2022-06-09 2023-05-30 Dynamic camera selection
US18/203,560 2023-05-30
PCT/US2023/024871 WO2023239877A1 (en) 2022-06-09 2023-06-08 Dynamic camera selection

Publications (1)

Publication Number Publication Date
CN119343930A true CN119343930A (en) 2025-01-21

Family

ID=87136211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202380045271.6A Pending CN119343930A (en) 2022-06-09 2023-06-08 Dynamic camera selection

Country Status (3)

Country Link
EP (1) EP4505754A1 (en)
CN (1) CN119343930A (en)
WO (1) WO2023239877A1 (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9544574B2 (en) * 2013-12-06 2017-01-10 Google Inc. Selecting camera pairs for stereoscopic imaging
US9392189B2 (en) * 2014-02-28 2016-07-12 Intel Corporation Mechanism for facilitating fast and efficient calculations for hybrid camera arrays
GB2535706A (en) * 2015-02-24 2016-08-31 Nokia Technologies Oy Device with an adaptive camera array
JP2016197083A (en) * 2015-04-06 2016-11-24 ソニー株式会社 Control device, method, and program
KR102777120B1 (en) * 2015-04-19 2025-03-05 포토내이션 리미티드 Multi-baseline camera array system architectures for depth augmentation in vr/ar applications
US10067513B2 (en) * 2017-01-23 2018-09-04 Hangzhou Zero Zero Technology Co., Ltd Multi-camera system and method of use
KR102470465B1 (en) * 2018-02-19 2022-11-24 한화테크윈 주식회사 Apparatus and method for image processing

Also Published As

Publication number Publication date
WO2023239877A1 (en) 2023-12-14
EP4505754A1 (en) 2025-02-12

Similar Documents

Publication Publication Date Title
US10611023B2 (en) Systems and methods for performing occlusion detection
US8180107B2 (en) Active coordinated tracking for multi-camera systems
US10496104B1 (en) Positional awareness with quadocular sensor in autonomous platforms
US11427218B2 (en) Control apparatus, control method, program, and moving body
EP2571660B1 (en) Mobile human interface robot
US20170097643A1 (en) Systems and Methods for Performing Simultaneous Localization and Mapping using Machine Vision Systems
EP3788597A1 (en) Associating lidar data and image data
JP7166446B2 (en) System and method for estimating pose of robot, robot, and storage medium
JP7305768B2 (en) VEHICLE CONTROL METHOD, RELATED DEVICE, AND COMPUTER STORAGE MEDIA
CN113916230A (en) System and method for performing simultaneous localization and mapping using a machine vision system
WO2019168886A1 (en) System and method for spatially mapping smart objects within augmented reality scenes
WO2022179207A1 (en) Window occlusion detection method and apparatus
CN113014658B (en) Device control, device, electronic device, and storage medium
US20230401732A1 (en) Dynamic camera selection
CN119343930A (en) Dynamic camera selection
US20230177723A1 (en) Method and apparatus for estimating user pose using three-dimensional virtual space model
US20230041716A1 (en) Sensor object detection monitoring
CN117250956A (en) Mobile robot obstacle avoidance method and obstacle avoidance device with multiple observation sources fused
WO2022226989A1 (en) System and method for obstacle-free driving
US20240338835A1 (en) Dual image processing
US20240338838A1 (en) Contextual image processing
US20240338842A1 (en) Techniques for tracking one or more objects
US20240104895A1 (en) Data selection
US20240104907A1 (en) Data selection
Zhao et al. Non-Point Visible Light Transmitter Localization based on Monocular Camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination