
CN109983784A - Information processing apparatus, method, and program - Google Patents

Information processing apparatus, method, and program

Info

Publication number
CN109983784A
Authority
CN
China
Prior art keywords
user
audio data
audio
information
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201780069477.7A
Other languages
Chinese (zh)
Other versions
CN109983784B (en)
Inventor
望月大介
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Corp
Original Assignee
Sony Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Corp filed Critical Sony Corp
Publication of CN109983784A publication Critical patent/CN109983784A/en
Application granted granted Critical
Publication of CN109983784B publication Critical patent/CN109983784B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00 Stereophonic arrangements
    • H04R5/02 Spatial or constructional arrangements of loudspeakers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00 Stereophonic arrangements
    • H04R5/04 Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/008 Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/02 Detecting, measuring or recording for evaluating the cardiovascular system, e.g. pulse, heart rate, blood pressure or blood flow
    • A61B5/024 Measuring pulse rate or heart rate
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012 Head tracking input arrangements
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/40 Rhythm
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/01 Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Stereophonic System (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An information processing apparatus is provided that includes an action recognition unit, an audio data selection unit, and an audio information generation unit. The action recognition unit is configured to recognize an action pattern of a user based on sensor information. The audio data selection unit is configured to select audio data corresponding to the action pattern of the user recognized by the action recognition unit. The audio information generation unit generates, based on the audio data selected by the audio data selection unit, multi-channel audio information that localizes the sound image of a sound source in the real space around the user. The information processing apparatus makes it possible to obtain augmented reality that follows qualitative changes in the user's actions.

Description

Information processing apparatus, method, and program

Technical Field

The present technology relates to improving the quality of augmented reality technology.

Background Art

In the field of wearable computing, a technology is known in which, when a user wearing a wearable computer moves, the amount of the user's spatial displacement is estimated by a sensor device included in the wearable computer (see, for example, Patent Document 1).

Patent Document 1 also discloses a technique for generating multi-channel audio information. The technique described in Patent Document 1 synthesizes audio so that a sound can be perceived as if it were emitted from a specific spatial position and, in particular, synthesizes the audio so that the perceived spatial position of the sound does not change even when the user changes position or orientation.

Patent Document 2 discloses a technique in which, when a person performs an action, information on another person's past real action is used to display a virtual object related to that information. An application example disclosed in Patent Document 2 shows running images of other people who ran the same route being displayed on a glasses-type display device during a run.

Citation List

Patent Literature

Patent Document 1: Japanese Patent Application Laid-Open No. 2013-005021

Patent Document 2: Japanese Patent Application Laid-Open No. 2013-167941

Summary of the Invention

Technical Problem

In the technical field of providing augmented reality to users through wearable computers, more realistic presentation is desired. However, the examples above do not take the content of the user's action into account when providing augmented reality. For example, even when the content of the action changes from a "walking" action to a "running" action, or from light-fatigue exercise to heavy-fatigue exercise, no output is produced that follows the qualitative change in the action.

In view of the above circumstances, an object of the present technology is to provide an information processing apparatus that makes it possible to realize augmented reality that follows qualitative changes in the user's actions.

Solution to Problem

To achieve the above object, an information processing apparatus according to an aspect of the present technology includes an action recognition unit, an audio data selection unit, and an audio information generation unit.

The action recognition unit is configured to recognize an action pattern of a user based on sensor information.

The audio data selection unit is configured to select audio data corresponding to the action pattern of the user recognized by the action recognition unit.

The audio information generation unit generates, based on the audio data selected by the audio data selection unit, multi-channel audio information for localizing the sound image of a sound source in the real space around the user.

With the information processing apparatus according to this aspect of the present technology, it is possible to provide the user with augmented reality that follows qualitative changes in the user's actions.

The audio data selection unit may be configured to select the audio data as audio emitted from a virtual object to be placed in the real space.

In this case, the audio information generation unit may be configured to perform, by generating the multi-channel audio information, sound image localization by which the virtual object is placed at the position of the sound source.

The audio data selection unit may be configured to, when the audio data to be selected changes as a result of recognition by the action recognition unit, select audio data corresponding to the switching pattern from the pre-change audio data to the post-change audio data, in addition to the post-change audio data.
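As a concrete sketch of this switching-pattern selection (the clip names and the transition mapping are hypothetical; the patent only specifies that transition audio is selected alongside the post-change audio):

```python
# Hypothetical mapping from (pre-change, post-change) action pairs to a
# transition clip that bridges the two audio loops.
TRANSITION_SOUNDS = {
    ("walking", "running"): "dog_speeding_up",
    ("running", "walking"): "dog_slowing_down",
}

def select_audio(prev_action, new_action):
    # Return the transition clip (if one is defined for this switching
    # pattern) followed by the loop for the post-change action.
    clips = []
    if (prev_action, new_action) in TRANSITION_SOUNDS:
        clips.append(TRANSITION_SOUNDS[(prev_action, new_action)])
    clips.append("dog_" + new_action + "_loop")
    return clips
```

For pairs with no dedicated transition clip, only the post-change loop is returned.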

The audio data selection unit may be configured to, when the audio data to be selected changes as a result of recognition by the action recognition unit and there are a plurality of pieces of audio data corresponding to the action pattern of the user, select the audio data that matches information associated with the virtual object.

The information processing apparatus may further include a displacement calculation unit that outputs, based on the sensor information, a user displacement including a relative change in the user's position.

The audio information generation unit may be configured to modulate the audio data selected by the audio data selection unit based on the user displacement output by the displacement calculation unit, thereby generating the multi-channel audio information.

The audio information generation unit may be configured to modulate the audio data selected by the audio data selection unit such that the sound source whose sound image is localized by the multi-channel audio information is placed at a position that follows the user displacement output by the displacement calculation unit, thereby generating the multi-channel audio information.

The audio information generation unit may be configured to generate the multi-channel audio information such that the sound source whose sound image is localized by the multi-channel audio information follows, with a time delay, a position in space determined from the user's position identified by the user displacement.
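One way to realize this time-delayed following is per-update exponential smoothing of the source position toward the user's position. This is a sketch only, under the assumption of periodic position updates; the function name and the smoothing factor are illustrative, not taken from the patent:

```python
def follow_with_lag(source_pos, user_pos, alpha=0.2):
    # Move the sound source a fraction (alpha) of the remaining distance
    # toward the user each update, so it trails the user with a delay.
    sx, sy = source_pos
    ux, uy = user_pos
    return (sx + alpha * (ux - sx), sy + alpha * (uy - sy))

# The user steps 5 m ahead; the virtual source catches up gradually.
pos = (0.0, 0.0)
history = []
for _ in range(20):
    pos = follow_with_lag(pos, (0.0, 5.0))
    history.append(pos[1])
```

A smaller `alpha` makes the virtual object trail the user more lazily; `alpha = 1.0` would pin it to the user with no delay at all.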

The audio information generation unit may be configured to generate the multi-channel audio information based on the user displacement output by the displacement calculation unit and on externally obtained map information including the position coordinates of buildings, such that the virtual object is not placed within the range of the position coordinates of any building included in the map information.

The audio information generation unit may be configured to generate multi-channel audio information including a collision sound in a case where the range of the position coordinates of a building included in the map information overlaps the position where the virtual object is placed.
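The building check in the two paragraphs above can be sketched minimally, assuming building footprints are given as axis-aligned bounding boxes in the map information (the data representation, function names, and clip name are assumptions):

```python
def point_in_building(pos, building):
    # building: axis-aligned bounding box (xmin, ymin, xmax, ymax).
    x, y = pos
    xmin, ymin, xmax, ymax = building
    return xmin <= x <= xmax and ymin <= y <= ymax

def choose_sounds(object_pos, buildings, base_sound):
    # If the virtual object's position overlaps a building footprint,
    # add a collision sound to the audio to be generated.
    sounds = [base_sound]
    if any(point_in_building(object_pos, b) for b in buildings):
        sounds.append("collision")
    return sounds
```

In a full implementation the same test would also be used to keep the object's target position out of building footprints in the first place.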

The information processing apparatus may further include a state analysis unit configured to analyze a state of the user that can change according to at least one of the sensor information and the action pattern of the user recognized by the action recognition unit.

The audio data selection unit may be configured to select audio data corresponding to the action pattern of the user and audio data corresponding to the state of the user analyzed by the state analysis unit.

The audio information generation unit may be configured to synthesize the audio data corresponding to the action pattern of the user selected by the audio data selection unit with the audio data corresponding to the state of the user, and to generate the multi-channel audio information based on the synthesized audio data.

The state analysis unit may be configured to assign a fatigue level per unit time according to at least one of the sensor information and the action pattern of the user recognized by the action recognition unit, and to accumulate the assigned fatigue levels per unit time, thereby calculating a fatigue level as the state of the user.
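The accumulation described here can be sketched as follows. The per-action fatigue rates are hypothetical values; the patent only specifies assigning a fatigue level per unit time and accumulating it:

```python
# Hypothetical fatigue assigned per second for each recognized action type.
FATIGUE_PER_SECOND = {"still": 0.0, "walking": 0.5, "running": 2.0}

def accumulate_fatigue(action_log):
    # action_log: list of (action_type, duration_seconds) tuples.
    # Accumulate the per-unit-time fatigue assigned to each action.
    total = 0.0
    for action, seconds in action_log:
        total += FATIGUE_PER_SECOND[action] * seconds
    return total

# Ten minutes of walking followed by five minutes of running.
fatigue = accumulate_fatigue([("walking", 600), ("running", 300)])
```

The accumulated value can then drive the audio data selection (for example, switching to a tired-sounding breathing loop above some threshold).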

The audio data selection unit may be configured to, in a case where the action pattern of the user recognized by the action recognition unit continues beyond a predetermined threshold, select audio data different from the audio data corresponding to that action pattern.

An information processing method according to another aspect of the present technology includes an action recognition step, an audio data selection step, and an audio information generation step.

In the action recognition step, an action pattern of a user is recognized based on sensor information.

In the audio data selection step, audio data corresponding to the action pattern of the user recognized in the action recognition step is selected.

In the audio information generation step, multi-channel audio information for localizing the sound image of a sound source in the real space around the user is generated based on the audio data selected in the audio data selection step.

A program according to still another aspect of the present technology causes a computer to execute an action recognition step, an audio data selection step, and an audio information generation step.

In the action recognition step, an action pattern of a user is recognized based on sensor information.

In the audio data selection step, audio data corresponding to the action pattern of the user recognized in the action recognition step is selected.

In the audio information generation step, multi-channel audio information for localizing the sound image of a sound source in the real space around the user is generated based on the audio data selected in the audio data selection step.

Advantageous Effects of the Invention

As described above, according to the present technology, augmented reality that follows qualitative changes in the user's actions can be realized.

It should be noted that the effects described above are not necessarily limiting. In addition to or instead of the above effects, any of the effects described in this specification, or other effects that can be understood from this specification, may be achieved.

Brief Description of Drawings

FIG. 1 is a diagram (part 1) showing an example of augmented reality provided to a user as an output result of an information processing apparatus according to an embodiment of the present technology.

FIG. 2 is a diagram (part 2) showing an example of augmented reality provided to a user as an output result of the information processing apparatus according to the embodiment of the present technology.

FIG. 3 is a diagram showing an example of the external configuration of the information processing apparatus.

FIG. 4 is a block diagram showing an example of the internal configuration of the information processing apparatus.

FIG. 5 is a flowchart showing the flow of processing executed by the information processing apparatus.

FIG. 6 is a diagram describing information processing by an audio data selection unit of the information processing apparatus.

FIG. 7 is a diagram describing information processing by a sound image position calculation unit of the information processing apparatus.

FIG. 8 is a block diagram showing an example of a configuration according to a further embodiment of the present technology.

FIG. 9 is a block diagram showing an example of a configuration according to a further embodiment of the present technology.

Detailed Description of Embodiments

Hereinafter, preferred embodiments of the present technology will be described in detail with reference to the accompanying drawings. It should be noted that components having substantially the same functional configuration are denoted by the same reference numerals, and redundant descriptions are omitted in this specification and the drawings.

Note that the description will be given in the following order.

1. Outline of the information processing apparatus according to an embodiment of the present technology

2. Configuration

2-1. External configuration

2-2. Internal configuration

3. Operation

4. Conclusion

5. Other embodiments

5-1. Further embodiment 1

5-2. Further embodiment 2

5-3. Further embodiment 3

5-4. Further embodiment 4

5-5. Further embodiment 5

5-6. Further embodiment 6

5-7. Further embodiment 7

6. Appendix

<1. Outline of the information processing apparatus according to an embodiment of the present technology>

FIGS. 1 and 2 are each a diagram showing an example of augmented reality provided to a user as an output result of the information processing apparatus 1 according to this embodiment. The information processing apparatus 1 outputs multi-channel audio information in which a sound image is localized so that a sound can be heard from a specific direction around the user. Sound image localization is performed by, for example, adjusting the volume of the sound entering each of the left and right ears.

Part (a) of FIG. 1 shows a state in which a virtual dog, an example of a virtual object, is walking 50 cm in front of the user. The dog's footsteps and breathing are rendered as multi-channel audio, and the volume or effects of the sound entering the left and right ears are adjusted, thereby providing the user with the augmented reality shown in the figure. Changing the balance between the left and right volumes in the multi-channel audio information then produces the sensation that the virtual object is walking at a position 100 cm behind and to the left of the user, as shown in part (b) of FIG. 1.
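The patent does not specify a particular localization algorithm, but the left/right volume adjustment described above can be illustrated with a minimal constant-power panning sketch (the function names, the panning law, and the 0.5 m reference distance are illustrative assumptions, not taken from the patent):

```python
import math

def pan_gains(azimuth_deg):
    # Constant-power panning: -90 deg = hard left, +90 deg = hard right.
    # The left/right gains always satisfy l^2 + r^2 = 1.
    theta = math.radians((azimuth_deg + 90.0) / 2.0)
    return math.cos(theta), math.sin(theta)

def distance_attenuation(distance_m, ref_m=0.5):
    # Simple inverse-distance rolloff relative to a 0.5 m reference.
    return ref_m / max(distance_m, ref_m)

# Part (a): dog 50 cm straight ahead -> equal gains at full level.
left_a, right_a = pan_gains(0.0)
# Part (b): dog behind and to the left -> louder left ear, quieter overall.
left_b, right_b = pan_gains(-60.0)
gain_b = distance_attenuation(1.0)
```

With these helpers, the dog at 50 cm straight ahead (part (a)) gets equal left/right gains, while the position behind and to the left (part (b)) yields a louder left channel and a lower overall level.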

Such sound image localization technology enables the user to feel the presence of a virtual object to a certain degree. At the same time, it is unnatural if the sound emitted from the virtual object does not change when the user's action changes qualitatively or the user's state changes. For example, when the user's state changes from walking (part (a) of FIG. 2) to running (part (b) of FIG. 2), it is unnatural if the virtual object modeled as a dog follows the user with the same breathing sound as when walking. It is likewise unnatural if the virtual object appears not to tire at all after running with the user for a long time.

In view of the above, this embodiment, described below, makes the augmented reality follow qualitative changes in the user's actions in order to provide augmented reality of higher quality.

Here, a qualitative change includes a change in the type of the user's action (such as "running" or "walking"). In existing wearable computing, the system has been able to grasp, through methods such as absolute position measurement, that the user's action is "moving". However, when the action changes from the "walking" type to the "running" type, the follow-up to this qualitative change has been insufficient. For this reason, there has been a possibility of providing augmented reality that makes the user feel uncomfortable.

As an example, in a case where the virtual object is treated as a character that exists virtually, the augmented reality provided to the user needs to change according to the type of the character's action. For example, a character's footsteps when running differ from its footsteps when walking (although both are "footsteps"), and if this difference is not reflected, the user may feel uncomfortable.

In this embodiment, augmented reality that follows qualitative changes in the user's actions is provided as follows: the user's action pattern is recognized based on sensor information input from the sensor 101, audio data corresponding to the recognized action pattern is selected, and the selected audio data is then modulated so that its sound image is shifted appropriately.

Note that in the following description, a virtual dog is used as an example of the virtual object. Furthermore, as the overall application, an application that enables the user to take a walk with the virtual dog by wearing the information processing apparatus 1 will be described as an example.

The outline of the information processing apparatus 1 according to this embodiment has been described above. Next, the configuration of the information processing apparatus 1 will be described with reference to FIGS. 3 and 4.

<2. Configuration>

<2-1. External configuration>

FIG. 3 is a diagram showing an example of the external configuration of the information processing apparatus according to this embodiment. As shown in FIG. 3, the information processing apparatus 1 is, for example, a neck-mounted wearable computer. The neck-mounted information processing apparatus 1 has a horseshoe shape as a whole, and the user wears it by hanging it from the back of the neck.

Furthermore, as shown in FIG. 3, the information processing apparatus 1 includes an audio output unit 109 and various sensors 101. The audio output unit 109 reproduces audio data. In particular, the speaker 15 according to this embodiment reproduces the audio signal of a virtual object on which sound image localization processing has been performed, which makes the user perceive the virtual object as if it actually existed in the real space.

<2-2. Internal configuration>

FIG. 4 is a diagram showing an example of the internal configuration of the information processing apparatus according to this embodiment. As shown in FIG. 4, the information processing apparatus 1 includes, as hardware, a central processing unit (hereinafter, CPU) 100, the sensor 101, a storage unit 107, and the audio output unit 109. Through information processing performed by a software program, the CPU 100 is configured to have the functional blocks shown in FIG. 4.

The sensor 101 is shown as an abstraction of the various groups of sensor devices of the information processing apparatus 1. Specific examples of the sensor devices include an acceleration sensor that detects acceleration in three directions (longitudinal, horizontal, and vertical), a gyroscope sensor that detects angular velocity about three axes, an atmospheric pressure sensor that measures atmospheric pressure, and a direction sensor that detects geomagnetism. A mechanism that receives signals from a GPS (Global Positioning System), a mobile communication system, or a wireless LAN and detects position information of the information processing apparatus 1 (hereinafter, "absolute position information") can also be regarded as one of the sensor device groups constituting the sensor 101. Other specific examples of the sensor 101 include a sensor device that detects the user's pulse, body temperature, and rises in body temperature, and a microphone for inputting sound. Note that information input from the sensor 101 to the CPU 100 will be referred to as sensor information.

The storage unit 107 includes a non-volatile storage device such as an electrically erasable programmable read-only memory (EEPROM). The storage unit 107 stores various types of audio data.

The audio output unit 109 is a device having a function of outputting, as sound waves, the multi-channel audio information generated by the audio information generation unit 108 for localizing the sound image of a sound source in the real space around the user. A specific example of the audio output unit 109 is a speaker of the form shown in FIG. 3.

The CPU 100 is an arithmetic processing device of the information processing apparatus 1. The CPU 100 is not necessarily the main arithmetic processing device of the information processing apparatus 1 and may serve as an auxiliary device. The CPU 100 executes a software program stored in the storage unit 107 or downloaded from outside. As a result, the CPU 100 is configured to include an action recognition unit 102, a fatigue level calculation unit 103, an audio data selection unit 104, a displacement calculation unit 105, a sound image position calculation unit 106, and an audio information generation unit 108, which have the following functions.

The action recognition unit 102 recognizes the type of the user's action, as an example of the user's action pattern.

The fatigue degree calculation unit 103 is an example of a state analysis unit that analyzes the user's state and calculates the user's degree of fatigue.

The audio data selection unit 104 selects appropriate audio data based on the type of user action recognized by the action recognition unit 102 and the degree of fatigue calculated by the fatigue degree calculation unit 103.

The displacement calculation unit 105 calculates, based on the information input from the sensor 101, the spatial displacement of the user that occurs between a time T0 and a later time Tn.

The sound image position calculation unit 106 calculates, based on the user displacement calculated by the displacement calculation unit 105, the position in real space at which a virtual object to be superimposed on the real space should be located.

The audio information generation unit 108 generates multi-channel audio information for localizing the sound image of a sound source in the real space around the user by modulating the audio data selected by the audio data selection unit 104. In this modulation, the sound image position set by the sound image position calculation unit 106 is used as a parameter.

<3. Operation>

FIG. 5 is a flowchart showing the flow of processing executed by the information processing apparatus according to this embodiment. In the following description of the processing shown in FIG. 5, unless otherwise indicated, the CPU 100 is the subject of each operation. First, the CPU 100 acquires sensor information and inputs it to each unit (S101).

Next, the action recognition unit 102 recognizes the user's action type (an example of the user's action pattern) based on the information input from the sensor 101 (S102). One example of information processing for recognizing the action type is a method that uses a determination device trained by machine learning, which takes sensor information as input and outputs the action type. Alternatively, a method of determining still/walking/running based on changes in the acceleration included in the sensor information may be used.
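The acceleration-based determination mentioned above can be sketched as follows. This is a minimal illustration rather than the patent's actual method: the use of the variance of the acceleration magnitude over a short window, and the two threshold values, are assumptions for illustration; a trained determination device would replace this threshold logic.

```python
import statistics

# Hypothetical thresholds (not specified in the patent) on the variance of
# the acceleration magnitude over a short window, in (m/s^2)^2.
STILL_THRESHOLD = 0.05
WALK_THRESHOLD = 2.0

def classify_action(accel_magnitudes):
    """Classify a window of acceleration-magnitude samples as an action type."""
    if len(accel_magnitudes) < 2:
        return "still"
    variance = statistics.variance(accel_magnitudes)
    if variance < STILL_THRESHOLD:
        return "still"
    if variance < WALK_THRESHOLD:
        return "walking"
    return "running"
```

A nearly constant magnitude around gravity reads as "still"; larger oscillations read as "walking" and then "running".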

Next, the fatigue degree calculation unit 103 calculates the user's degree of fatigue (S103). The user's degree of fatigue may be an accumulated parameter. In this case, the fatigue parameter is calculated by, for example, multiplying a per-unit-time fatigue degree, determined based on the action type recognized by the action recognition unit 102, by the duration of the action. The degree of fatigue may also be a parameter that gradually decreases over time.

For example, as the per-unit-time fatigue degree, a value assigned to each action type based on the result of action recognition by the action recognition unit 102 may be used: for example, -α for still, +β for walking, and +γ for running (α, β, and γ are positive values, with β < γ). Note that when the user's action type changes as a result of action recognition, the per-unit-time fatigue degree can be updated accordingly. The degree of fatigue can then be calculated by integrating the per-unit-time fatigue degree assigned in this way.
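The integration described above can be sketched as follows. The numeric rates standing in for -α/+β/+γ and the clamping of fatigue at zero are assumptions for illustration; the patent only fixes the signs and the ordering β < γ.

```python
# Hypothetical per-second fatigue rates for each action type; the patent
# states only that "still" decreases fatigue (-alpha) and that running
# increases it faster than walking (beta < gamma).
FATIGUE_RATE = {"still": -1.0, "walking": 2.0, "running": 5.0}

def accumulate_fatigue(segments, initial=0.0):
    """Integrate per-unit-time fatigue over (action_type, duration_sec) segments."""
    fatigue = initial
    for action, duration in segments:
        fatigue += FATIGUE_RATE[action] * duration
        fatigue = max(fatigue, 0.0)  # assumption: fatigue does not go negative
    return fatigue
```

For example, 10 s of walking, 4 s of running, then 8 s at rest accumulates 20 + 20 - 8 = 32 units under these assumed rates.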

The fatigue degree calculation unit 103 may also calculate the degree of fatigue by a simpler method, instead of using the recognition result of the action recognition unit 102. For example, a value calculated directly from the accumulated step count of the user captured by the sensor 101, or from the displacement detected by the acceleration sensor or the gyro sensor, may be output as the degree of fatigue. Alternatively, an activity amount based on the output of a sensor device that detects the user's pulse, body temperature, and rises in body temperature (an example of sensor information) may be output as the degree of fatigue.

Next, the audio data selection unit 104 selects audio data according to the user's action type recognized by the action recognition unit 102 (S104). The storage unit 107 stores in advance multiple audio data patterns corresponding to the assumed action types. Multiple audio data patterns may correspond to one action type; in that case, the audio data selection unit 104 randomly selects one of them. Note that, depending on the action type, it is not always necessary to select audio data, for example where outputting audio might make the user uncomfortable.

For example, when the action recognition unit 102 has recognized that the user's action is "walking", the audio data selection unit 104 randomly selects one of the pieces of audio data associated with "walking" from the motion sounds of the virtual object stored in advance in the storage unit 107. When the virtual object is the virtual dog described above and the user is walking, audio is selected that makes the user feel as if the virtual dog is walking at the same pace as his/her own.

Similarly, when the action recognition unit 102 has recognized that the user's action is "running", the audio data selection unit 104 randomly selects one of the pieces of audio data associated with "running" from the motion sounds of the virtual object stored in advance in the storage unit 107. Furthermore, when the action recognition unit 102 has recognized that the user's action is "still", the audio data selection unit 104 randomly selects one of the pieces of audio data associated with "still" from the motion sounds of the virtual object stored in advance in the storage unit 107. In the "still" case, the unit may also be configured not to select any audio data.

For example, audio data of footsteps or motion sounds while walking is associated with "walking". Footsteps while running, or motion sounds representing heavier breathing than while walking, are associated with "running".

The audio data selection unit 104 also selects audio data according to the degree of fatigue calculated by the fatigue degree calculation unit 103 (S104). The audio data selection unit 104 divides the degree of fatigue into "large fatigue" and "small fatigue" by a predetermined threshold. When the degree of fatigue is determined to be "large fatigue", the audio data selection unit 104 randomly selects one of the pieces of audio data associated with "large fatigue" from the motion sounds of the virtual object stored in advance in the storage unit 107. Likewise, when the degree of fatigue is determined to be "small fatigue", the audio data selection unit 104 randomly selects one of the pieces of audio data associated with "small fatigue" from the motion sounds of the virtual object stored in advance in the storage unit 107.

For example, the sound of short, heavy breathing may be associated with "large fatigue". The degree of fatigue may also be divided into three or more levels, for example large, medium, and small, and handled accordingly.

FIG. 6 is an explanatory diagram for understanding the information processing of the audio data selection unit 104. The two tables in part (a) of FIG. 6, which are stored in advance in the storage unit 107, show the association between audio data (such as motion sounds of the virtual object) and the user's action type, and the association between audio data and the degree of fatigue.

The audio data selection unit 104 selects the audio data corresponding to the user action type recognized by the action recognition unit 102, selects audio data according to the degree of fatigue calculated by the fatigue degree calculation unit 103, and outputs the two selected pieces of audio data to the subsequent stage. In part (a) of FIG. 6, the file mb002.mp3 corresponding to "walking" and the file tc001.mp3 corresponding to "large fatigue" are selected. The selected pieces of audio data are synthesized by the audio information generation unit 108.

As another example of audio data selection by the audio data selection unit 104, a table such as that shown in part (b) of FIG. 6 may be prepared in advance, in which the action type recognized by the action recognition unit 102 and the degree of fatigue calculated by the fatigue degree calculation unit 103 are combined, and processing to select the corresponding synthesized audio data may be performed. In this case, for example, a "sound of breathing heavily while walking" may be placed as the audio data corresponding to the combination of "walking" and "large fatigue".
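The two-table selection scheme of part (a) of FIG. 6 can be sketched as follows. Only mb002.mp3 and tc001.mp3 appear in the text; the other file names and the fatigue threshold are hypothetical placeholders for illustration.

```python
import random

# Hypothetical lookup tables modeled on FIG. 6(a); only mb002.mp3 and
# tc001.mp3 are named in the patent, the rest are invented placeholders.
ACTION_AUDIO = {
    "walking": ["mb001.mp3", "mb002.mp3"],
    "running": ["mc001.mp3", "mc002.mp3"],
    "still":   [],  # may be configured to select nothing while still
}
FATIGUE_AUDIO = {
    "large": ["tc001.mp3", "tc002.mp3"],
    "small": ["tb001.mp3"],
}
FATIGUE_THRESHOLD = 50.0  # assumed threshold splitting small/large fatigue

def select_audio(action_type, fatigue, rng=random):
    """Return (action clip, fatigue clip); either may be None (FIG. 6(a) scheme)."""
    level = "large" if fatigue >= FATIGUE_THRESHOLD else "small"
    action_clips = ACTION_AUDIO.get(action_type, [])
    action_clip = rng.choice(action_clips) if action_clips else None
    fatigue_clip = rng.choice(FATIGUE_AUDIO[level])
    return action_clip, fatigue_clip
```

The combined-table variant of part (b) would instead key a single lookup on the (action type, fatigue level) pair.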

As yet another example of audio data selection by the audio data selection unit 104, the audio data selection unit 104 may dynamically generate an audio data pattern and select the dynamically generated audio data. In this case, the user's action recognized by the action recognition unit 102 is a parameter with continuous values, and the degree of fatigue calculated by the fatigue degree calculation unit 103 is also a parameter with continuous values. As described above, the audio data selection unit 104 can dynamically generate audio data based on a parameter group including a parameter for the degree of movement speed, ranging from "walking" to "running", and a parameter for the degree of fatigue.

When audio data has been selected, the audio data selection unit 104 stores the time and the selection result as a pair in the storage unit 107 (S105). The paired time and selection result are used by the sound image position calculation unit 106 and other units.

Next, the displacement calculation unit 105 calculates the displacement from an arbitrary point in time (S106). The displacement calculated here represents the spatial displacement of the information processing apparatus 1. Since it is assumed that the user wears the information processing apparatus 1, the displacement calculated by the displacement calculation unit 105 will hereinafter be referred to as the "user displacement". The user displacement includes relative changes in the user's spatial position: the orientation, the horizontal position, the position in the vertical direction, and their displacements.

For example, the displacement of the orientation can be calculated by integrating the output of the gyro sensor among the pieces of sensor information input from the sensor 101. There is also a method of acquiring the absolute orientation from the output of a geomagnetic sensor; to compensate for the accuracy of the geomagnetic sensor, the output of the gyro sensor may be integrated with it. By these methods, the displacement calculation unit 105 calculates the orientation component of the user displacement (the user's orientation, i.e., the forward direction).

The displacement calculation unit 105 also calculates the displacement of the horizontal position as one component of the user displacement. The displacement of the horizontal position can be calculated by absolute position measurement that receives radio waves from GPS satellites, or by a method of determining the absolute position through wireless communication with multiple base stations. As another method, the displacement calculation unit 105 calculates the displacement of the relative position from a point in time based on the travel distance and the travel direction (the displacement of the orientation described above). Here, the travel distance can be obtained by integrating the output values of the acceleration sensor.

The travel distance can also be obtained by detecting walking steps from changes in acceleration and multiplying the stride length corresponding to a walking step by the number of steps. In this case, an average stride length may be used as a fixed value, or the stride length may be set by, for example, calculating the user's average stride from the relationship between the horizontal movement distance and the number of steps.
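The step-based dead reckoning described above amounts to combining the stride-times-steps distance with the travel heading obtained from the orientation displacement. A minimal sketch, assuming a fixed average stride and a heading measured in radians from the x-axis:

```python
import math

def horizontal_displacement(step_count, stride_m, heading_rad):
    """Dead-reckon a horizontal (x, y) displacement from a step count,
    an (assumed fixed) average stride length in meters, and the travel
    heading obtained from the orientation displacement."""
    distance = stride_m * step_count
    return (distance * math.cos(heading_rad), distance * math.sin(heading_rad))
```

For example, 10 steps with a 0.7 m stride along heading 0 yields a displacement of (7.0, 0.0).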

The displacement calculation unit 105 also calculates the displacement in height as one component of the user displacement. The displacement in the height direction (vertical direction) can be calculated by using the measurement values of the atmospheric pressure sensor, or by calculating the height displacement corresponding to the case where the user action type recognized by the action recognition unit 102 is found to alternate repeatedly between "standing" and "sitting". Note that "standing"/"sitting" can be identified from the change pattern of the measurement values of the acceleration sensor.

The user displacement calculated by the displacement calculation unit 105 is used by the sound image position calculation unit 106. The sound image position calculation unit 106 calculates the relative position of the virtual object as viewed from the user (S107). This position is the perceived sound source position (a position in real space) that the user senses from the finally synthesized audio when the audio is output from the audio output unit 109.

As the information processing of the sound image position calculation unit 106, an appropriate method can be selected according to the character that the application intends to give to the virtual object. That is, several modes are provided for the calculation method executed by the sound image position calculation unit 106, depending on what the virtual object is or what character it has. Two typical modes will be described below with reference to FIG. 7. FIG. 7 is a schematic diagram describing the information processing of the sound image position calculation unit 106, showing the user displacement and the displacement of the virtual object in each mode.

Part (a) of FIG. 7 shows a sound image position calculation mode in which the virtual object's movement traces the same positions as the user's displacement with a certain time delay. In the figure, the vertical axis is a one-dimensional representation of six-dimensional information including the three-axis position and the three-axis orientation, as an example of displacement. The horizontal axis represents time t.

The sound image position calculation mode shown in part (a) of FIG. 7, which traces the user displacement, can be realized by, for example, the following formula, where X(t) denotes the user displacement, X'(t) denotes the displacement of the virtual object, and K denotes the time delay before the virtual object starts to move. The larger the value of K, the longer the time (delay) before the virtual object starts to move.

X'(t) = X(t-K)

The tracking of the user displacement by the virtual object shown in part (a) of FIG. 7 is effective for presenting an entity that moves along with the user's movement. For example, this sound image position calculation mode can be adopted when it is desired to provide the user with an augmented reality in which the virtual object is a person, a robot, a car, an animal, or the like, that is, when the sound image localization position should trace the same positions as the user's displacement with a certain time delay.

Part (b) of FIG. 7 shows a sound image position calculation mode in which the virtual object moves with a certain time delay relative to the user displacement, heading directly toward the position where the user is. This sound image position calculation mode can be realized by, for example, the following formula, where a represents the moving speed of the virtual object. The closer a is to zero, the longer it takes to catch up with the user, that is, the slower the movement.

X'(t) = aX(t-K) + (1-a)X'(t-1)

The tracking of the user displacement by the virtual object shown in part (b) of FIG. 7 is effective, for example, for presenting an entity that follows the user through walls. It is suitable, for example, for representing a ghost character as the virtual object.
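The two modes of FIG. 7 can be sketched directly from the formulas X'(t) = X(t-K) and X'(t) = aX(t-K) + (1-a)X'(t-1), here in one spatial dimension for illustration; the sampled position history and the clamping at the start of the history are implementation assumptions.

```python
def delayed_track(user_positions, t, k):
    """Mode (a): X'(t) = X(t-K) -- the virtual object retraces the user's
    path with a delay of K samples (clamped at the start of the history)."""
    return user_positions[max(t - k, 0)]

def smooth_follow(user_positions, k, a, start):
    """Mode (b): X'(t) = a*X(t-K) + (1-a)*X'(t-1) -- the object heads
    straight toward the user's (delayed) position; smaller a means slower."""
    positions = [start]
    for t in range(1, len(user_positions)):
        target = user_positions[max(t - k, 0)]
        positions.append(a * target + (1 - a) * positions[-1])
    return positions
```

Mode (a) reproduces the user's exact trajectory shifted in time, while mode (b) cuts corners toward the user, which is why it suits a character that passes through walls.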

The sound image position calculation unit 106 uses the displacement X'(t) calculated by the above information processing, or a point at time t derivable from it, as the position of the virtual object at time t. Note that this point may also be used as a base point, and a point obtained by adding a predetermined positional change to the base point may be used as the position of the virtual object. For example, when the virtual object is a dog character, a point obtained by shifting the calculated base point to a lower position closer to the ground is output. Alternatively, when the virtual object is a ghost character, a calculation that adds upward and downward positional changes at regular intervals is performed to produce a floating feeling. With this configuration, more realistic character movement can be reproduced.

Furthermore, the sound image position calculation unit 106 calculates the relative position of the character as viewed from the user by taking the character's position with the user's position as the origin (X'(t) - X(t)) and accounting for the displacement of the user's orientation. As the sound image localization method, the method described in Patent Document 1 can be used.

The audio information generation unit 108 performs information processing for spatially arranging the audio information (S108). This information processing localizes the sound image at a position relative to the user, for example at a certain distance and direction centered on the user, and can use, for example, the method described in Patent Document 1.

The audio information generation unit 108 uses the audio data selected by the audio data selection unit 104 as the audio information to be output. However, when there is a delay in the sound image position calculation, the audio data selected at the time of the referenced user position is used. That is, when the sound image position is calculated by the formula X'(t) = X(t-K), the audio data selected at time (t-K) is used at time t. The audio information generation unit 108 extracts and uses the audio data that the audio data selection unit 104 stored in the storage unit 107 in association with the time information.
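The delay-compensated lookup described above pairs the selection history stored in S105 with the delay K of the position calculation. A minimal sketch, in which the fallback to the most recent earlier selection is an assumption not stated in the patent:

```python
# Sketch of the selection history of S105: each selection is stored with its
# timestamp, and when the sound image position is X'(t) = X(t-K), the clip
# selected at time t-K is the one replayed at time t.
selection_history = {}

def store_selection(t, audio_file):
    selection_history[t] = audio_file

def audio_for_output(t, k):
    """Return the clip matching the delayed user position; falls back to the
    most recent earlier selection if none was stored at exactly t-K
    (fallback behavior is an assumption for illustration)."""
    target = t - k
    candidates = [time for time in selection_history if time <= target]
    return selection_history[max(candidates)] if candidates else None
```

With selections stored at t = 0 and t = 5 and a delay of K = 2, output at t = 7 uses the clip chosen at t = 5.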

Furthermore, the audio information generation unit 108 designates the position calculated by the sound image position calculation unit 106 based on the user displacement output by the displacement calculation unit 105 as the perceived sound source position (the position at which the sound image is localized) that the user senses from the output audio information. The audio information generation unit 108 modulates the audio data so that it is heard from the designated position. In this embodiment, the audio is generated as 2-channel audio information; however, depending on the specific implementation of the audio output unit 109, it may be generated as 5.1-channel audio information.

Furthermore, the audio information generation unit 108 may adjust the reproduction speed of the modulated audio data according to the user's movement speed, which is calculated based on the user displacement calculated by the displacement calculation unit 105. For example, even when the audio data selected by the audio data selection unit 104 is in both cases the audio data corresponding to "walking", it is reproduced at different reproduction speeds depending on the difference in movement speed.

Next, the audio output unit 109 physically outputs the audio data generated by the audio information generation unit 108 as sound waves.

<4. Conclusion>

According to the embodiment described above, by recognizing the user's action pattern and switching the audio data to be reproduced based on it, an expression can be produced that follows changes in the user's action pattern. In addition, by changing the sound image localization position based on changes in the three-dimensional position or the user's orientation, a sound expression can be produced that follows the result of the user's actions or the position the user occupies in space. Moreover, since the sound image localization position or its base point moves with a predetermined delay relative to the user's actions, a sound expression that follows the user's positional changes can be produced. As described above, according to this embodiment, when a virtual character (for example, a dog) is set as the virtual object, a sound expression is realized as if the virtual character actually existed near the user.

<5. Other Embodiments>

Note that the present technology can also take the following configurations.

Although a neck-mounted speaker hung around the neck is shown in FIG. 3 as an example of the external configuration, the technology disclosed in the above embodiment can also be applied to other embodiments, for example a head-mounted display including a glasses-type display. In that case, it is also advantageous to present an image at the position of the virtual object output by the information processing of the displacement calculation unit 105 and the sound image position calculation unit 106. According to the present technology, a synergistic effect can be achieved by adding visual stimuli to auditory stimuli, and a higher-quality augmented reality can be provided to the user.

<5-1. Additional Embodiment 1>

A configuration according to Additional Embodiment 1 of the present technology will be described with reference to FIG. 8. In this embodiment, it is assumed that multiple persons are involved with the information processing apparatus 1, and two kinds of sensor information are input to the CPU 100: the one or more sensors that output the sensor information used by the displacement calculation unit 105 to calculate the user displacement are set as the sensor 101, and the one or more sensors that output the sensor information used for action recognition and fatigue calculation are set as the other-person sensor 110.

The displacement calculation unit 105 calculates the user displacement of the user who perceives the audio output by the audio output unit 109. The action recognition unit 102 and the fatigue degree calculation unit 103 recognize the action pattern and the degree of fatigue of another person, who is not the user, based on that person's sensor information. The other information processing is similar to that in the embodiment described above.

According to this embodiment, the actions of the other person are recognized, audio data according to that person's action pattern is selected, and the position in real space at which the sound image is localized follows the spatial displacement of the user listening to the audio. This embodiment can thus provide the user with an augmented reality in which a virtual avatar of the other person accompanies the user. The sound image position calculation unit 106 may also set the sound image localization position in the air to create the feeling that the other person's avatar is floating.

This embodiment can be applied to, but is not limited to, a running application in which the user can compete against another person running in a distant place. Alternatively, the embodiment can be applied to applications in which the user experiences another person's experience. For example, by applying the embodiment to a head-mounted display through which the user experiences another person's field of view, the user can be provided with an augmented reality in which he/she follows the movement of a distant athlete.

<5-2. Additional Embodiment 2>

The information processing described in the above embodiment does not depend, for its execution, on the hardware and software configurations shown in that embodiment. The present technology can also be implemented in a form in which some or all of the functional blocks shown in FIG. 4 are executed on separate hardware. As shown in FIG. 9, this embodiment configures the information processing apparatus 1 as a server-client system in which a server 2, including the CPU 100 and the storage unit 107, and a wearable device 3 communicate with each other via a network 4.

In this embodiment, the neck-mounted speaker shown in FIG. 3 can be adopted as the wearable device 3. A smartphone can also be used as an example of the wearable device 3. The server 2 is placed in the cloud, and the information processing according to the present technology is executed on the server 2 side. The present technology can also be implemented in such a form.

<5-3. Additional Embodiment 3>

In the embodiment described above, the action recognition unit 102 recognizes the user's action type in S102 of FIG. 5. In this embodiment, when a change in the user's action type is recognized there, the audio data selection unit 104 selects both the audio data corresponding to the switch from the pre-change audio data to the post-change audio data, and the post-change audio data itself.

The timing for switching the audio data is the timing at which an action starts or ends. For example, when the character given to the virtual object is "a character wearing a bell" and the action type changes from "running" to "still", the audio data selection unit 104 selects a jingling bell sound. That is, the audio data selection unit 104 selects both the audio data corresponding to "still" and the audio data of the bell sound.

With this configuration, in which audio data is selected according to a change in the action of the character corresponding to the virtual object, a more entertaining or more realistic augmented reality can be provided to the user.

Besides the bell sound, when the action type changes from "running" to "still", the audio data selection unit 104 may select a line of dialogue indicating that the character is surprised. In this case, an effect can be produced in which the character of the virtual object is startled when the running user suddenly stops. This makes the character more lifelike and provides the user with a more entertaining augmented reality.
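The selection logic described in this embodiment can be sketched as follows. This is a hypothetical illustration only, not the implementation of the embodiment; the action labels, file names, and transition table are invented for the example.

```python
# Sketch of Section 5-3: when the recognized action type changes, select both
# the audio corresponding to the switching pattern (e.g., a bell jingle or a
# surprised line) and the audio for the post-change action type.

TRANSITION_SOUNDS = {
    ("running", "still"): ["bell_jingle.wav", "surprised_voice.wav"],
    ("walking", "running"): ["bell_jingle.wav"],
}

ACTION_SOUNDS = {
    "running": "footsteps_fast.wav",
    "walking": "footsteps_slow.wav",
    "still": "breathing_idle.wav",
}

def select_audio(previous_action: str, current_action: str) -> list:
    """Return the audio data to reproduce when the action type may have changed."""
    selected = []
    if previous_action != current_action:
        # Audio corresponding to the switching pattern (pre-change -> post-change).
        selected.extend(TRANSITION_SOUNDS.get((previous_action, current_action), []))
    # Audio corresponding to the post-change action type.
    selected.append(ACTION_SOUNDS[current_action])
    return selected

print(select_audio("running", "still"))
# -> ['bell_jingle.wav', 'surprised_voice.wav', 'breathing_idle.wav']
```

When no transition entry exists for a given pair, only the post-change action audio is selected, which matches the default behavior of the earlier embodiments.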

<5-4. Another embodiment 4>

In the above embodiment, the action recognition unit 102 recognizes the action type of the user in S102 of FIG. 5. In this embodiment, when a change in the user's action type is recognized at this point, the CPU 100 performs a predetermined condition determination related to the post-change action type. In this condition determination, it is determined whether the post-change action type matches the information associated with the virtual object.

In the above embodiment and in this embodiment, the user is provided with an augmented reality (AR) in which the virtual object appears to follow the user. To this end, the audio emitted from the virtual object is also changed, where possible, according to the user's action type. The information processing apparatus 1 gives the virtual object, as a character that does not actually exist, a personality, characteristics, possessions, and the like. If this information associated with the virtual object does not match the post-change action type, the sense of augmented reality is diminished.

In this regard, in this embodiment, it is determined whether the post-change action type matches the information associated with the virtual object. When this condition determination regarding the match is performed, the audio data selection unit 104 may select predetermined audio data.

For example, when the action type changes from "walking" to "cycling", a condition determination is performed as to whether the possessions of the virtual object's character include a "bicycle". In this example, the condition determination of whether the possessions include a "bicycle" corresponds to the determination of whether the post-change action type matches the information associated with the virtual object.

When, as a result of this determination, the information associated with the virtual object (the character's possessions) does not include a bicycle, that is, when the character does not have a "bicycle", the audio data selection unit 104 does not select the audio data corresponding to cycling. Instead, it may select a voice in which the character murmurs, "I want to ride a bicycle, too."
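A minimal sketch of this condition determination follows; the mapping of actions to required possessions and the sound file names are assumptions invented for the illustration:

```python
# Sketch of Section 5-4: on an action-type change, check whether the new
# action matches the information associated with the virtual object
# (here, the character's possessions).

REQUIRED_POSSESSION = {"cycling": "bicycle"}  # actions that require an item

def select_on_action_change(new_action: str, possessions: set) -> str:
    required = REQUIRED_POSSESSION.get(new_action)
    if required is not None and required not in possessions:
        # Mismatch: do not play the action sound; let the character react instead.
        return "voice_i_want_to_ride_too.wav"
    return f"{new_action}.wav"

print(select_on_action_change("cycling", {"bell"}))      # character has no bicycle
print(select_on_action_change("cycling", {"bicycle"}))   # matches -> cycling sound
```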

With this configuration, in which the speech timing of the virtual object's character is controlled, a more entertaining or more realistic augmented reality can be provided to the user.

<5-5. Another embodiment 5>

In the above embodiment, the user's fatigue or degree of fatigue is calculated as an example of the user's state and is used to select audio data. However, as another example of the user's state, the user's emotion (e.g., joy, anger, sorrow, or pleasure) may be obtained through the sensor 101, and audio data may be selected based on the emotion. The sensor 101 is not particularly limited; as long as the user's emotion can be obtained by a biosensing device, for example from blood pressure or body temperature, audio data can be selected based on the emotion.

Furthermore, instead of or in addition to the user's state or emotion, environmental information around the user may be obtained, and audio data may be selected based on the environmental information. For example, when rainfall is detected as environmental information, the audio data selection unit 104 accordingly selects the sound of walking through puddles. With this configuration, a more entertaining or more realistic augmented reality can be provided to the user.
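The puddle example can be sketched as a simple rule combining the action type with environmental information; the function and file names are illustrative assumptions, not part of the embodiment:

```python
# Sketch of Section 5-5: pick footstep audio using environmental information
# (here, rainfall) in addition to the recognized action type.

def select_footstep_audio(action: str, raining: bool) -> str:
    if action in ("walking", "running") and raining:
        return "footsteps_puddle.wav"   # sound of walking through puddles
    return f"footsteps_{action}.wav"

print(select_footstep_audio("walking", raining=True))   # footsteps_puddle.wav
print(select_footstep_audio("walking", raining=False))  # footsteps_walking.wav
```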

<5-6. Another embodiment 6>

The sound image position calculation unit 106 may determine the position at which the virtual object is placed based on one or more of the following in combination:

· Information obtained from the sensor 101

· Information obtained from outside (map data, etc.)

· Information about the personality or possessions given to the virtual object

For example, when the information processing apparatus 1 can obtain detailed map data and the user's absolute position, based on the user displacement calculated by the displacement calculation unit 105, is near the wall of a building, the sound image position calculation unit 106 places the virtual object so that it does not face the user from across the wall. For example, when the virtual object is a character such as a dog, it would be unnatural for the character to pass to the other side of a wall while walking; therefore, when detailed map data can be obtained, the sound image position calculation unit 106 places the virtual object so that it moves around to the user's side.

The map data obtainable from outside includes not only the approximate latitude and longitude of buildings but also the position coordinates of the walls that mark the boundaries between buildings and roads or the like. When the information processing apparatus 1 can use such map data, a building can be regarded as the range enclosed by the position coordinates of its walls. Accordingly, the sound image position calculation unit 106 sets the sound image localization position of the virtual object, which is determined based on the user displacement output by the displacement calculation unit 105, within a range that excludes the coordinate ranges of buildings. Specifically, for example, the virtual object is placed at the side of the road. Alternatively, the sound image position calculation unit 106 places the virtual object in a direction in which the space is open (e.g., the direction in which the road extends).

Further, in this example, when the virtual object collides with an object (e.g., the wall of a building in the map data), audio data such as a collision sound may be reproduced. For example, when the position of the virtual object placed by the sound image position calculation unit 106 overlaps the coordinate range of a building, the audio information generation unit 108 reproduces audio data such as a collision sound.
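Treating a building as the range enclosed by its wall coordinates, the placement and overlap check could be sketched as below. The axis-aligned rectangle model and the nearest-wall clamping rule are simplifying assumptions made only for this illustration:

```python
# Sketch of Section 5-6: keep the virtual object outside a building's
# coordinate range and report a collision when the desired position overlaps it.

from dataclasses import dataclass

@dataclass
class Building:
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

def place_virtual_object(x, y, buildings):
    """Return ((x, y), collided): clamp the position out of any building range."""
    for b in buildings:
        if b.contains(x, y):
            # Push the object back to the nearest wall along x (the road side).
            x = b.x_min if abs(x - b.x_min) <= abs(x - b.x_max) else b.x_max
            return (x, y), True   # overlap -> reproduce a collision sound
    return (x, y), False

pos, hit = place_virtual_object(2.0, 5.0, [Building(1.0, 4.0, 0.0, 10.0)])
print(pos, hit)  # (1.0, 5.0) True
```

A ghost character that may pass through walls would simply skip the clamping step and instead trigger its specific jingle sound when the overlap is detected.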

Note that when the character given to the virtual object is one that can pass through walls (e.g., a ghost), the sound image position calculation unit 106 may place the character on the other side of the wall. In addition, a specific jingle sound may be played at the moment the ghost passes through the wall.

With the configuration of this embodiment, a more entertaining or more realistic augmented reality can be provided to the user.

<5-7. Another embodiment 7>

In this embodiment, in addition to the configurations disclosed in the above embodiments, the audio information generation unit 108 is configured to generate different audio information according to the action state of the virtual object's character. For example, when the action recognition unit 102 determines that the user's action pattern has been running for a long time exceeding a predetermined threshold, the audio data selection unit 104 may select different audio data, so that the audio information finally generated by the audio information generation unit 108 also differs. In this case, the audio data to be selected by the audio data selection unit 104 may be switched from the normal running audio data to, for example, audio data indicating that the character is tired, such as the sound of labored breathing or the spoken line "I'm tired."
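The long-duration switch described here can be sketched as a threshold check on how long the same action pattern has continued; the threshold value and file names below are assumptions for the sketch:

```python
# Sketch of Section 5-7: if the same action pattern continues beyond a
# predetermined threshold, switch from the normal audio to "tired" audio.

RUN_FATIGUE_THRESHOLD_S = 600.0  # e.g., 10 minutes of continuous running

def select_action_audio(action: str, continuous_duration_s: float) -> str:
    if action == "running" and continuous_duration_s > RUN_FATIGUE_THRESHOLD_S:
        # Character sounds tired: labored breathing / "I'm tired" line.
        return "running_tired.wav"
    return f"{action}.wav"

print(select_action_audio("running", 120.0))   # running.wav
print(select_action_audio("running", 900.0))   # running_tired.wav
```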

With such a configuration, an effect can be produced in which the virtual object's character also has an action state (tiredness, boredom, etc.), and a more entertaining or more realistic augmented reality can be provided to the user.

<6. Appendix>

Part of the technical ideas disclosed in this specification can be described as the following (1) to (17).

(1)

An information processing apparatus, comprising:

an action recognition unit configured to recognize an action pattern of a user based on sensor information;

an audio data selection unit configured to select audio data corresponding to the action pattern recognized by the action recognition unit; and

an audio information generation unit that generates, based on the audio data selected by the audio data selection unit, multi-channel audio information for localizing a sound image of a sound source in a real space around the user.

(2)

The information processing apparatus according to (1) above, wherein

the audio data selection unit is configured to select the audio data as audio emitted from a virtual object to be placed in the real space, and

the audio information generation unit is configured to perform sound image localization by generating the multi-channel audio information, the virtual object being placed at the position of the sound source by the sound image localization.

(3)

The information processing apparatus according to (1) or (2) above, wherein

the audio data selection unit is configured to, when the audio data to be selected is changed as a result of recognition by the action recognition unit, select both the audio data corresponding to the switching pattern from the pre-change audio data to the post-change audio data and the post-change audio data.

(4)

The information processing apparatus according to any one of (1) to (3) above, wherein

the audio data selection unit is configured to, when the audio data to be selected is changed as a result of recognition by the action recognition unit and there are a plurality of pieces of audio data corresponding to the action pattern of the user, select the audio data that matches the information associated with the virtual object.

(5)

The information processing apparatus according to any one of (1) to (4) above, further comprising:

a displacement calculation unit that outputs, based on the sensor information, a user displacement including a relative change in the position of the user.

(6)

The information processing apparatus according to (5) above, wherein

the audio information generation unit is configured to modulate the audio data selected by the audio data selection unit based on the user displacement output by the displacement calculation unit, thereby generating the multi-channel audio information.

(7)

The information processing apparatus according to (6) above, wherein

the audio information generation unit is configured to modulate the audio data selected by the audio data selection unit such that the sound source whose sound image is localized by the multi-channel audio information is placed at a position following the user displacement output by the displacement calculation unit, thereby generating the multi-channel audio information.

(8)

The information processing apparatus according to (7) above, wherein

the audio information generation unit is configured to generate the multi-channel audio information such that the sound source whose sound image is localized by the multi-channel audio information follows, with a time delay, positions in space starting from the position of the user identified by the user displacement.

(9)

The information processing apparatus according to any one of (5) to (8) above, wherein

the audio information generation unit generates the multi-channel audio information based on the user displacement output by the displacement calculation unit and map information that is obtained from outside and includes position coordinates of buildings, such that the virtual object is not placed within the range of the position coordinates of a building included in the map information.

(10)

The information processing apparatus according to (9) above, wherein

the audio information generation unit generates the multi-channel audio information including a collision sound when the range of the position coordinates of a building included in the map information overlaps the position at which the virtual object is placed.

(11)

The information processing apparatus according to any one of (1) to (10) above, further comprising:

a state analysis unit configured to analyze a state of the user, the state being changeable according to one of the sensor information and the action pattern of the user recognized by the action recognition unit.

(12)

The information processing apparatus according to (11) above, wherein

the audio data selection unit is configured to select audio data corresponding to the action pattern of the user and audio data corresponding to the state of the user analyzed by the state analysis unit.

(13)

The information processing apparatus according to (12) above, wherein

the audio information generation unit is configured to synthesize the audio data corresponding to the action pattern of the user and the audio data corresponding to the state of the user, both selected by the audio data selection unit, and to generate the multi-channel audio information based on the synthesized audio data.

(14)

The information processing apparatus according to any one of (11) to (13) above, wherein

the state analysis unit is configured to assign a degree of fatigue per unit time according to one of the sensor information and the action pattern of the user recognized by the action recognition unit, and to accumulate the assigned degrees of fatigue per unit time, thereby calculating a degree of fatigue as the state of the user.

(15)

The information processing apparatus according to any one of (1) to (14) above, wherein

the audio data selection unit selects, when the action pattern of the user recognized by the action recognition unit continues beyond a predetermined threshold, audio data different from the audio data corresponding to that action pattern.

(16)

An information processing method, comprising:

an action recognition step of recognizing an action pattern of a user based on sensor information;

an audio data selection step of selecting audio data corresponding to the action pattern of the user recognized in the action recognition step; and

an audio information generation step of generating, based on the audio data selected in the audio data selection step, multi-channel audio information for localizing a sound image of a sound source in a real space around the user.

(17)

A program that causes a computer to execute:

an action recognition step of recognizing an action pattern of a user based on sensor information;

an audio data selection step of selecting audio data corresponding to the action pattern of the user recognized in the action recognition step; and

an audio information generation step of generating, based on the audio data selected in the audio data selection step, multi-channel audio information for localizing a sound image of a sound source in a real space around the user.
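As a non-authoritative illustration of items (5) to (8) above, the sound source following the user displacement with a time delay could be modeled, for example, with a first-order lag; the smoothing factor and coordinates are assumptions made only for this sketch:

```python
# Sketch of items (5)-(8): the sound-image localization position trails the
# positions identified by the user displacement, with a time delay.

def follow_with_delay(user_positions, smoothing=0.5, start=(0.0, 0.0)):
    """Yield sound-source positions trailing the user's successive positions."""
    sx, sy = start
    for ux, uy in user_positions:
        # Move a fraction of the remaining distance each step (the time delay).
        sx += smoothing * (ux - sx)
        sy += smoothing * (uy - sy)
        yield (sx, sy)

path = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(list(follow_with_delay(path)))
# -> [(0.5, 0.0), (1.25, 0.0), (2.125, 0.0)]
```

Each yielded position would then be handed to the multi-channel audio generation step as the sound-image localization target.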

List of reference signs

1 Information processing apparatus

100 CPU

101 Sensor

102 Action recognition unit

103 Fatigue degree calculation unit (state analysis unit)

104 Audio data selection unit

105 Displacement calculation unit

106 Sound image position calculation unit

107 Storage unit

108 Audio information generation unit

109 Audio output unit

110 Other-person sensor

Claims (17)

1. An information processing apparatus, comprising:
an action recognition unit configured to recognize an action pattern of a user based on sensor information;
an audio data selection unit configured to select audio data corresponding to the action pattern of the user recognized by the action recognition unit; and
an audio information generation unit that generates, based on the audio data selected by the audio data selection unit, multi-channel audio information for localizing a sound image of a sound source in a real space around the user.
2. The information processing apparatus according to claim 1, wherein
the audio data selection unit is configured to select the audio data as audio emitted from a virtual object to be placed in the real space, and
the audio information generation unit is configured to perform sound image localization by generating the multi-channel audio information, the virtual object being placed at the position of the sound source by the sound image localization.
3. The information processing apparatus according to claim 1, wherein
the audio data selection unit is configured to, when the audio data to be selected is changed as a result of recognition by the action recognition unit, select both the audio data corresponding to the switching pattern from the pre-change audio data to the post-change audio data and the post-change audio data.
4. The information processing apparatus according to claim 1, wherein
the audio data selection unit is configured to, when the audio data to be selected is changed as a result of recognition by the action recognition unit and there are a plurality of pieces of audio data corresponding to the action pattern of the user, select the audio data that matches the information associated with the virtual object.
5. The information processing apparatus according to claim 1, further comprising:
a displacement calculation unit that outputs, based on the sensor information, a user displacement including a relative change in the position of the user.
6. The information processing apparatus according to claim 5, wherein
the audio information generation unit is configured to modulate the audio data selected by the audio data selection unit based on the user displacement output by the displacement calculation unit, thereby generating the multi-channel audio information.
7. The information processing apparatus according to claim 6, wherein
the audio information generation unit is configured to modulate the audio data selected by the audio data selection unit such that the sound source whose sound image is localized by the multi-channel audio information is placed at a position following the user displacement output by the displacement calculation unit, thereby generating the multi-channel audio information.
8. The information processing apparatus according to claim 7, wherein
the audio information generation unit is configured to generate the multi-channel audio information such that the sound source whose sound image is localized by the multi-channel audio information follows, with a time delay, positions in space starting from the position of the user identified by the user displacement.
9. The information processing apparatus according to claim 5, wherein
the audio information generation unit generates the multi-channel audio information based on the user displacement output by the displacement calculation unit and map information that is obtained from outside and includes position coordinates of buildings, such that the virtual object is not placed within the range of the position coordinates of a building included in the map information.
10. The information processing apparatus according to claim 9, wherein
the audio information generation unit generates the multi-channel audio information including a collision sound when the range of the position coordinates of a building included in the map information overlaps the position at which the virtual object is placed.
11. The information processing apparatus according to claim 1, further comprising:
a state analysis unit configured to analyze a state of the user, the state being changeable according to one of the sensor information and the action pattern of the user recognized by the action recognition unit.
12. The information processing apparatus according to claim 11, wherein
the audio data selection unit is configured to select audio data corresponding to the action pattern of the user and audio data corresponding to the state of the user analyzed by the state analysis unit.
13. The information processing apparatus according to claim 12, wherein
the audio information generation unit is configured to synthesize the audio data corresponding to the action pattern of the user and the audio data corresponding to the state of the user, both selected by the audio data selection unit, and to generate the multi-channel audio information based on the synthesized audio data.
14. The information processing apparatus according to claim 11, wherein
the state analysis unit is configured to assign a degree of fatigue per unit time according to one of the sensor information and the action pattern of the user recognized by the action recognition unit, and to accumulate the assigned degrees of fatigue per unit time, thereby calculating a degree of fatigue as the state of the user.
15. The information processing apparatus according to claim 1, wherein
the audio data selection unit selects, when the action pattern of the user recognized by the action recognition unit continues beyond a predetermined threshold, audio data different from the audio data corresponding to that action pattern.
16. An information processing method, comprising:
an action recognition step of recognizing an action pattern of a user based on sensor information;
an audio data selection step of selecting audio data corresponding to the action pattern of the user recognized in the action recognition step; and
an audio information generation step of generating, based on the audio data selected in the audio data selection step, multi-channel audio information for localizing a sound image of a sound source in a real space around the user.
17. A program that causes a computer to execute:
an action recognition step of recognizing an action pattern of a user based on sensor information;
an audio data selection step of selecting audio data corresponding to the action pattern of the user recognized in the action recognition step; and
an audio information generation step of generating, based on the audio data selected in the audio data selection step, multi-channel audio information for localizing a sound image of a sound source in a real space around the user.
CN201780069477.7A 2016-11-16 2017-10-17 Information processing apparatus, method and storage medium Expired - Fee Related CN109983784B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2016223275A JP2018082308A (en) 2016-11-16 2016-11-16 Information processing apparatus, method and program
JP2016-223275 2016-11-16
PCT/JP2017/037505 WO2018092486A1 (en) 2016-11-16 2017-10-17 Information processing device, method and program

Publications (2)

Publication Number Publication Date
CN109983784A true CN109983784A (en) 2019-07-05
CN109983784B CN109983784B (en) 2022-02-01

Family

ID=62145612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780069477.7A Expired - Fee Related CN109983784B (en) 2016-11-16 2017-10-17 Information processing apparatus, method and storage medium

Country Status (5)

Country Link
US (1) US10986458B2 (en)
EP (1) EP3544320A4 (en)
JP (1) JP2018082308A (en)
CN (1) CN109983784B (en)
WO (1) WO2018092486A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10973440B1 (en) * 2014-10-26 2021-04-13 David Martin Mobile control using gait velocity
US10692289B2 (en) * 2017-11-22 2020-06-23 Google Llc Positional recognition for augmented reality environment
CN115398935A (en) * 2020-02-14 2022-11-25 奇跃公司 Delayed audio follow
JP7011364B1 (en) 2021-10-15 2022-01-26 シンメトリー・ディメンションズ・インク Experience device, experience system, and display method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080228299A1 (en) * 2007-03-15 2008-09-18 Sony Computer Entertainment Inc. Audio Reproducing Apparatus And Audio Reproducing Method, Allowing Efficient Data Selection
US20100321519A1 (en) * 2003-05-30 2010-12-23 Aol Inc. Personalizing content based on mood
CN102855116A (en) * 2011-06-13 2013-01-02 索尼公司 Information processing apparatus, information processing method, and program
US20140306866A1 (en) * 2013-03-11 2014-10-16 Magic Leap, Inc. System and method for augmented and virtual reality
US20150117664A1 (en) * 2013-10-25 2015-04-30 GN Store Nord A/S Audio information system based on zones and contexts
WO2016002318A1 (en) * 2014-06-30 2016-01-07 ソニー株式会社 Information processing device, information processing method, computer program, and image processing system
US20160026856A1 (en) * 2013-10-24 2016-01-28 JayBird LLC System and method for identifying performance days using earphones with biometric sensors
CN105578355A (en) * 2015-12-23 2016-05-11 惠州Tcl移动通信有限公司 Method and system for enhancing sound effect when using virtual reality glasses
EP3067781A1 (en) * 2013-11-05 2016-09-14 Sony Corporation Information processing device, method of processing information, and program

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3712292A (en) * 1971-07-20 1973-01-23 Karen Lafley V Method and apparatus for producing swept frequency-modulated audio signal patterns for inducing sleep
AU2002232928A1 (en) * 2000-11-03 2002-05-15 Zoesis, Inc. Interactive character system
EP1846847A1 (en) * 2005-01-31 2007-10-24 Koninklijke Philips Electronics N.V. Categorizing songs on a physiological effect
KR101370322B1 (en) * 2005-02-14 2014-03-05 코닌클리케 필립스 엔.브이. Electronic device and method for reproducing human perceptual signal
JP5225548B2 (en) * 2005-03-25 2013-07-03 ソニー株式会社 Content search method, content list search method, content search device, content list search device, and search server
EP1811496B1 (en) * 2006-01-20 2009-06-17 Yamaha Corporation Apparatus for controlling music reproduction and apparatus for reproducing music
JP2007226935A (en) * 2006-01-24 2007-09-06 Sony Corp Audio reproducing device, audio reproducing method, and audio reproducing program
US7643895B2 (en) * 2006-05-22 2010-01-05 Apple Inc. Portable media device with workout support
JP2007328568A (en) * 2006-06-08 2007-12-20 Seiko Epson Corp Pedometer, portable information terminal, pedometer control method and program
JP5406880B2 (en) * 2011-04-28 2014-02-05 シャープ株式会社 Exercise instruction device
US9380978B2 (en) * 2011-06-29 2016-07-05 Bruce Reiner Method and apparatus for real-time measurement and analysis of occupational stress and fatigue and performance outcome predictions
JP5927966B2 (en) 2012-02-14 2016-06-01 ソニー株式会社 Display control apparatus, display control method, and program
US9179232B2 (en) * 2012-09-17 2015-11-03 Nokia Technologies Oy Method and apparatus for associating audio objects with content and geo-location
US20140281971A1 (en) * 2013-03-13 2014-09-18 United Video Properties, Inc. Methods and systems for generating objective specific playlists
US10269180B2 (en) * 2013-04-16 2019-04-23 Sony Corporation Information processing apparatus and information processing method, display apparatus and display method, and information processing system
US20140347368A1 (en) * 2013-05-21 2014-11-27 Telenav, Inc. Navigation system with interface modification mechanism and method of operation thereof
JP6201615B2 (en) * 2013-10-15 2017-09-27 富士通株式会社 Acoustic device, acoustic system, acoustic processing method, and acoustic processing program
US20160030809A1 (en) * 2013-10-24 2016-02-04 JayBird LLC System and method for identifying fitness cycles using earphones with biometric sensors
WO2016029039A1 (en) * 2014-08-20 2016-02-25 Puretech Management, Inc. Systems and techniques for identifying and exploiting relationships between media consumption and health
EP4254145A3 (en) * 2015-09-16 2023-11-01 Magic Leap, Inc. Head pose mixing of audio files
US9749766B2 (en) * 2015-12-27 2017-08-29 Philip Scott Lyren Switching binaural sound

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100321519A1 (en) * 2003-05-30 2010-12-23 Aol Inc. Personalizing content based on mood
US20080228299A1 (en) * 2007-03-15 2008-09-18 Sony Computer Entertainment Inc. Audio Reproducing Apparatus And Audio Reproducing Method, Allowing Efficient Data Selection
CN102855116A (en) * 2011-06-13 2013-01-02 索尼公司 Information processing apparatus, information processing method, and program
US20140306866A1 (en) * 2013-03-11 2014-10-16 Magic Leap, Inc. System and method for augmented and virtual reality
US20160026856A1 (en) * 2013-10-24 2016-01-28 JayBird LLC System and method for identifying performance days using earphones with biometric sensors
US20150117664A1 (en) * 2013-10-25 2015-04-30 GN Store Nord A/S Audio information system based on zones and contexts
EP3067781A1 (en) * 2013-11-05 2016-09-14 Sony Corporation Information processing device, method of processing information, and program
WO2016002318A1 (en) * 2014-06-30 2016-01-07 Sony Corporation Information processing device, information processing method, computer program, and image processing system
CN105578355A (en) * 2015-12-23 2016-05-11 惠州Tcl移动通信有限公司 Method and system for enhancing sound effect when using virtual reality glasses

Also Published As

Publication number Publication date
JP2018082308A (en) 2018-05-24
CN109983784B (en) 2022-02-01
WO2018092486A1 (en) 2018-05-24
US20200053501A1 (en) 2020-02-13
US10986458B2 (en) 2021-04-20
EP3544320A1 (en) 2019-09-25
EP3544320A4 (en) 2019-09-25

Similar Documents

Publication Publication Date Title
JP7002684B2 (en) Systems and methods for augmented reality and virtual reality
US11128972B2 (en) Information processing device, information processing method, and program
JP7109408B2 (en) Wide range simultaneous remote digital presentation world
CN102855116B (en) Messaging device and information processing method
US12315055B2 (en) Information processing apparatus and information processing method for processing sound of real environment for displaying virtual experience
US11051120B2 (en) Information processing apparatus, information processing method and program
CN109983784B (en) Information processing apparatus, method and storage medium
JP2020527432A (en) Racing simulation
US10820132B2 (en) Voice providing device and voice providing method
KR20190041002A (en) Content Discovery
KR20220014254A (en) Method of providing traveling virtual reality contents in vehicle such as a bus and a system thereof
JP4677633B2 (en) Odor presentation system
JP2003134510A (en) Image information distribution system
KR100697442B1 (en) Moving simulation method using sound reproducing apparatus and sound reproducing apparatus for performing the above method
JP7053074B1 (en) Appreciation system, appreciation device and program
JP2018072950A (en) Simulation apparatus and simulation system
JP2021156600A (en) Moving body position estimation device and moving body position estimation method
US12217343B2 (en) Information processing device, information processing method, and program for causing a virtual object to perform movement according to a sound reproduced from a real sound source
Mahalil et al. Integration of a heart rate monitoring system in a virtual reality relaxation therapy for supporting an immersion level measuring technique
CN108924761B (en) Information presentation method, electronic equipment and computer readable storage medium
KR20250061900A (en) Method for providing virtual reality or augmented reality service based on travel information
JP2023181567A (en) Information processing device, information processing method, information processing system, and data generation method
JP2022526216A (en) Assisting visual motion perception in the outermost part

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2022-02-01